
Big Data Applications: Credit Scoring and Model Principles Explained

Author: 大数据观察 | Source: 大数据观察 | Time: 2017-01-26 16:59:06

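Anyone can run an analysis on borrowers' historical loan data from Lending Club and Prosper, but I believe that understanding consumer credit behavior, scoring mechanisms, and the reasoning behind lending decisions helps investors make better decisions in the marketplace and earn a return.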

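Consumer credit has long been a major force behind the economies of the world's leading nations, and consumer spending has grown with it over the past 50 years. According to the Federal Reserve Bank of New York's Quarterly Report on Household Debt and Credit (August 2014), total consumer indebtedness was $11.63 trillion: 74% in mortgages and home equity loans, 10% in student loans, 8% in auto loans, and 6% in credit card debt. Demand for consumer credit is growing very quickly, which makes automated risk assessment systems a necessity.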

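Credit Scoring

Credit scoring dates back to the early 1950s. It began by applying statistical methods to separate good loans from bad ones. At first the focus was on whether to grant credit to a new applicant, an approach that later became known as applicant scoring. Credit scoring succeeded as an assessment system because of this single objective.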

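Credit scoring assumes that the factors behind creditworthiness stay stable over the next few years, and the lender assesses whether an applicant will go more than 90 days overdue on a payment within the next 12 months. The cut-off score at which applicants are accepted is set from the marginal good:bad odds at that score, that is, the ratio of additional good loans to additional bad loans that would be accepted. Data on applicants who borrowed 1 to 2 years ago, together with their subsequent credit history, is used to build the application scoring model applied over roughly the next 2 years.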

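Behavioral scoring supplements applicant scoring. It uses the applicant's payment and purchase behavior over the past year to forecast default risk over the next 12 months, and the data is typically updated monthly. Recent performance and current credit information carry more weight than the original application data.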

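Lenders today care less about default risk alone and more about lending strategies that meet their profitability targets. They can choose the loan amount, interest rate, and other terms to maximize profitability. The techniques that support profitability-based decisions are called profit scoring.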

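Unlike applicant scoring, which can use static credit scoring models, behavioral scoring and profit scoring need dynamic credit scoring models that take past credit behavior into account. Credit scoring models have generally modeled each loan separately. However, as defaults (credit risk) across loan portfolios have increased, the importance of any single amount lent has diminished. So far there is no widely accepted credit risk model for evaluating a loan portfolio.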

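You can evaluate a credit scoring model on three points: its ability to discriminate between good and bad loans, the accuracy of its probability predictions, and the correctness of its categorical forecasts.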

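Lending Decision Modeling

A lender's main objective is to maximize profit on its portfolio. For any single loan, the lender needs to consider the return on the amount lent. Earning $10 on $100 lent (a 10% return) is clearly not as good as earning $3 on $25 lent (a 12% return).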

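In some cases a borrower cannot repay the loan, which can mean a substantial loss for the lender. Risk can be quantified by analyzing the portfolio's default rate and the losses those defaults cause. A lender can also aim to keep risk and return within predefined limits.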

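The final decision to lend to a borrower rests on a series of decisions: what information helps in making the decision, what can happen to the loan during and after the decision process, and what outcomes are possible.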

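Influence Diagrams

Influence diagrams give a visual picture of how the main decisions, uncertainties, relevant information, and final outcomes affect one another.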

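An influence diagram identifies the important aspects of a decision, which data is relevant to it, and which aspect each piece of data relates to. The diagram has three kinds of nodes: decisions (rectangular nodes), uncertain events (circular nodes), and outcomes (diamond nodes), connected by arrows. Figure 1 is drawn from the perspective of a lender in the marketplace.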

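In Figure 1, the lender first obtains a forecast of whether the borrower will perform well. The forecast is a random event because the lender cannot decide what it will say. It influences the decision to invest or not (Loan or not) as well as the borrower's performance (Borrower good or bad). Next, the platform decides whether to issue the loan (Loan issued or not). For the lender this is a random event: the lender has no say in whether the loan is issued, except that the loan may not be issued if too few lenders fund it. Once the loan is issued, the lender can check whether income verification (Income verification) was performed, see whether the FICO score and payment history (FICO score and payment history) have changed, and update the forecast. Based on the updated forecast, the lender can decide whether to sell the loan on the FOLIOfn secondary trading platform. Likewise, other lenders can decide whether to buy the loan on the secondary platform based on the updated forecast. This chain of events ultimately affects the lender's return.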

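Decision Trees

Decision trees identify the optimal decisions for a loan and lay out the steps of the decision in the order in which information becomes available.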

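So how does a decision tree model take shape from the structure visualized in the influence diagram? A decision tree has a structure similar to an influence diagram. Its outcomes are pay-off events expressed as numbers. Each branch from a chance node (uncertain event) is assigned a weight equal to the probability that the outcome on that branch occurs.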

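Starting from the outcomes and working backward through all the decision and uncertain-event nodes, you can calculate the expected monetary value (EMV) for each outcome.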

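Figure 2 shows a simple decision tree for a lending decision. The lender first decides whether to invest. If the lender does not invest, the pay-off is 0. If the lender invests, there are two possibilities: the borrower repays (good) or defaults (bad).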

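Suppose the lender gains 10 when the borrower repays and loses 100 when the borrower defaults. If the probability of default is 5% and the lender invests, the lender's expected profit from the borrower is: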

0.95 x 10 + 0.05 x (-100) = 4.5

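If the lender does not invest, the profit is 0, so the decision tree indicates that the lender should invest. If the probability of default rises to 10%, the lender's expected profit from the borrower is: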

0.90 x 10 + 0.10 x (-100) = -1

So the decision tree indicates that the lender should not invest.

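In summary, if g is the lender's profit when the borrower repays, l is the lender's loss when the borrower defaults, and p is the probability that the borrower is good, then under the expected monetary value (EMV) criterion the lender should invest only when pg - (1-p)l > 0.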

p/(1-p), the ratio of the probability of being good to the probability of default, is also called the good:bad odds.

A decision tree that covers every lending decision is hard to build and unwieldy. Even so, decision trees can help lenders make decisions.

English original:

This series of blog posts tries to cover the theory behind credit scoring and the models typically used in the consumer lending domain. While anyone can perform statistical gymnastics given the historical loan data from Lending Club and Prosper, I believe that understanding the theory behind consumer credit behavior, scoring, and lending decision making is important to profit from the opportunities in marketplace lending.

Consumer credit has been the driving force behind the economies of leading nations. It has been responsible for growth in consumer spending over the last 50 years. According to the Federal Reserve Bank of New York's Quarterly Report on Household Debt and Credit (August 2014), total consumer indebtedness stands at $11.63 trillion, with 74% in mortgages and home equity lines of credit, 10% in student loans, 8% in auto loans, and 6% in credit card debt. The demand for consumer credit is growing at an extremely high rate, creating an opportunity for automated risk assessment systems.

Credit Scoring

Credit scoring is a risk assessment approach introduced in the 1950s. It began with the application of statistical classification methods to separate good loans from bad ones. Credit scoring initially focused on whether one should grant credit to a new applicant, which later came to be known as applicant scoring. Credit scoring has been successful because of this singular objective.

It assumed that the factors implying creditworthiness were relatively stable over a few years, and it assessed the chance of an applicant going 90 days overdue on their payments in the next 12 months. The cut-off score at which applicants are accepted is set using the marginal good:bad odds at that score, that is, the ratio of additional goods to additional bads that would be accepted if the cut-off score were lowered.
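One way to read the cut-off rule above is as a scan down the score bands. The sketch below is a minimal illustration, not a method taken from this post: the function name `choose_cutoff`, the `min_odds` parameter, and the band counts are all hypothetical, and it assumes the marginal odds fall as the score falls.

```python
def choose_cutoff(bands, min_odds):
    """Return the lowest score band worth accepting, or None if no band qualifies.

    bands    -- list of (score, goods, bads) tuples, one per score band
                (hypothetical counts of good and bad loans seen in that band)
    min_odds -- smallest marginal good:bad odds the lender will accept
                (an assumed input; the text does not say how it is chosen)
    """
    cutoff = None
    # Walk from the highest score band down, lowering the cut-off one band at a time.
    for score, goods, bads in sorted(bands, reverse=True):
        # Marginal odds: extra goods per extra bad gained by accepting this band.
        marginal_odds = goods / bads if bads else float("inf")
        if marginal_odds < min_odds:
            break  # accepting this band would add too many bads per good
        cutoff = score
    return cutoff


# Made-up band counts, for illustration only.
example_bands = [
    (700, 380, 10),
    (680, 300, 15),
    (660, 240, 20),
    (640, 170, 20),
    (620, 120, 20),
]
print(choose_cutoff(example_bands, min_odds=10))
```

With these made-up counts the marginal odds per band are 38, 20, 12, 8.5 and 6, so a required odds of 10 stops the scan at the 660 band.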

Data on those who applied for credit 1 or 2 years ago, together with their subsequent credit history, was used to build the application scoring model that would be used to determine for the next 2 years or so which credit applicants to accept.

Behavioral scoring, as an extension to applicant scoring, uses information on payment and purchase behavior in the past year. This data is used to forecast the default risk over the next 12 months and is typically updated monthly. Recent performance and current credit information are more powerful than the initial application data alone.

Lenders are now focused on lending strategies that meet their profitability objectives rather than just default risk. They can choose the loan amount, interest rate, and other terms to maximize profitability. The techniques that support profitability-based decisions are called profit scoring.

Unlike applicant scoring, which can use static credit models, behavioral and profit scoring require dynamic credit models that consider past behavior. Traditionally, credit modeling has modeled each loan individually. However, the importance of any individual amount lent diminishes as borrower default (credit risk) across a portfolio of loans increases. There is currently no widely accepted model of credit risk for a loan portfolio.

You can measure a credit scoring model by its discriminating ability between good and bad loans, by the accuracy of its probability predictions and by the correctness of its categorical forecasts.
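The three yardsticks above can each be given a concrete number. The sketch below is one possible reading, not the post's own method: it assumes AUC for discriminating ability, the Brier score for the accuracy of probability predictions, and accuracy at a 0.5 threshold for categorical correctness. The function names and the six sample loans are hypothetical.

```python
def auc(is_good, p_good):
    """Discrimination: chance that a random good loan is scored above a random bad one."""
    goods = [p for y, p in zip(is_good, p_good) if y == 1]
    bads = [p for y, p in zip(is_good, p_good) if y == 0]
    # Count good/bad pairs ranked correctly; ties count as half.
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0 for g in goods for b in bads)
    return wins / (len(goods) * len(bads))


def brier_score(is_good, p_good):
    """Probability accuracy: mean squared gap between forecast and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(is_good, p_good)) / len(is_good)


def accuracy(is_good, p_good, threshold=0.5):
    """Categorical correctness: share of loans classified correctly at the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(is_good, p_good))
    return hits / len(is_good)


# Made-up outcomes (1 = good, 0 = bad) and forecast probabilities of being good.
outcomes = [1, 1, 1, 0, 1, 0]
forecasts = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print("AUC:", auc(outcomes, forecasts))
print("Brier score:", brier_score(outcomes, forecasts))
print("Accuracy:", accuracy(outcomes, forecasts))
```

The three numbers can disagree: a model may rank loans well (high AUC) while its probabilities are poorly calibrated, which is why the text lists them as separate measures.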

Lending Decision Modeling

The main objective for a lender is to maximize profit on its loan portfolio. On an individual loan basis, the lender needs to do this by considering the return on the amount lent. A loan that earns $10 on $100 lent (a 10% return) is not as good as a loan that earns $3 on $25 lent (a 12% return).

There is a risk that a borrower will not repay the loan, in which case lending leads to substantial losses. Risk might be quantified by the default rate expected in the portfolio or by the losses these defaults lead to. An alternative objective for the lender may be to keep the risk and return profiles within predefined limits.

Lending to a borrower is based on a series of decisions: what information would be useful in making a decision, what chain of events could occur during and after the decision process, and what the possible outcomes of the decision are.

Influence Diagrams

Influence diagrams help one to visualize graphically how the decisions, the uncertainties, the information, and the outcomes are interrelated.

An influence diagram identifies the important aspects of the decision, what data is relevant to the decision, and to which aspect of the decision making that data relates. It consists of a graph with three types of nodes: decisions (rectangular nodes), uncertain events (circular nodes), and outcomes (diamond nodes), connected by arrows. Figure 1 shows the influence diagram from the perspective of a lender on a marketplace lending platform.

In Figure 1, there is first the lender's forecast of whether the borrower's performance will be good or bad. The forecast is a random event in that the lender cannot decide what it will say about a borrower. It influences both the Invest in Loan or not decision and whether the borrower will subsequently be good or bad. Second, the platform decides whether or not to issue the loan. For the lender this is a random event: the lender has no influence on or part in the decision to issue the loan, except that the loan may not be issued if too few lenders decide to fund it. Once the loan has been issued, the lender can update the forecast based on whether income verification was performed and on any changes in FICO score and payment history. Depending on the updated forecast, the lender can decide whether or not to sell the loan on the FOLIOfn secondary platform. Similarly, another lender may decide to purchase the loan on the secondary platform based on this updated forecast. These events affect the profitability of the loan to the lender.

Decision Trees

Decision trees identify the optimal decisions and lay out both the sequence in which decisions have to be made and the sequence in which information becomes available during the decision process.

Now consider how a decision tree model can be built from the decision structure visualized in the influence diagram. Decision trees have a similar structure to influence diagrams. The outcomes are now pay-off events represented by numerical values. Each path from a chance node (uncertain event) is given a weight representing the probability that the outcome listed on that path occurs.

By starting at the end of each outcome branch and working backward in time through all the decision and chance event nodes, the expected monetary value (EMV) can be calculated for each outcome.

Figure 2 shows the decision tree for a very simple lending decision. The lender first decides whether or not to invest in the loan. If the lender doesn't invest, the pay-off to the lender is 0. If the lender invests, there is a chance event: whether the borrower's repayments are good or bad (default).

Consider the situation where the profit to the lender is 10 if the borrower repays, while the loss is 100 if the borrower defaults. If the chance of default is 5%, the expected profit from the borrower, if the lender invests, is

0.95 x 10 + 0.05 x (-100) = 4.5

while if the lender doesn't invest the profit is 0. So the decision tree suggests that the lender should invest in the loan. If the probability of default increases to 10%, the expected profit from the borrower is

0.90 x 10 + 0.10 x (-100) = -1

So the decision tree now suggests that the lender shouldn’t invest in the loan.
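The two calculations above follow the same pattern, so they can be wrapped in a small helper. This is a minimal sketch of the simple two-outcome tree only; the names `expected_monetary_value` and `should_invest` are made up for illustration.

```python
def expected_monetary_value(p_default, gain, loss):
    """EMV of investing: probability-weighted gain minus probability-weighted loss."""
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be between 0 and 1")
    return (1.0 - p_default) * gain - p_default * loss


def should_invest(p_default, gain, loss):
    """Invest only if the EMV of investing beats the 0 pay-off of not investing."""
    return expected_monetary_value(p_default, gain, loss) > 0.0


# The two scenarios from the text: gain of 10 on repayment, loss of 100 on default.
for p_default in (0.05, 0.10):
    emv = expected_monetary_value(p_default, gain=10, loss=100)
    decision = should_invest(p_default, gain=10, loss=100)
    print(f"default chance {p_default:.0%}: EMV = {emv:.2f}, invest = {decision}")
```

Run as written, this reproduces the 4.5 and -1 figures from the text and flips the decision from invest to don't invest between the two default rates.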

Generalizing the above, if g is the profit the lender makes from a repaying borrower, l is the loss the lender suffers when the borrower defaults, and p is the probability of the borrower being good, then under the EMV criterion the lender should invest in the loan when pg - (1-p)l > 0.

p/(1-p), the chance of being good divided by the chance of being bad, is also called the good:bad odds.
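The EMV criterion and the good:bad odds are linked by a short rearrangement. As a sketch, assuming g > 0 and p < 1 (conditions the text implies but does not state):

```latex
\begin{align*}
pg - (1-p)\,l > 0
  &\iff pg > (1-p)\,l \\
  &\iff \frac{p}{1-p} > \frac{l}{g}
\end{align*}
```

With g = 10 and l = 100 from the example, the lender should invest only when the good:bad odds exceed 10. A 5% chance of default gives odds of 0.95/0.05 = 19, so the lender invests; a 10% chance gives 0.90/0.10 = 9, so the lender does not.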

A decision tree model that tries to cover all aspects of the lending decision can become very unwieldy. However, the exercise of drawing a decision tree helps in understanding the decision-making process.

In the next post in this series, I will discuss probabilities and odds concepts and apply them to real historical data from Lending Club.
