Assessment of Credit Portfolio Reserves is one of the tasks I have worked on extensively throughout my professional practice. It is an interesting and complex challenge, which I will discuss in detail.
In this article, I will explain what reserves are and why they are essential for banks, how banks assess reserves, and where machine learning can be applied to this task.
What Are Reserves?
Reserves, or Expected Credit Losses (ECL), represent a long-term forecast of how much money a bank might lose due to loan defaults. They are a crucial risk management tool that impacts a bank’s financial stability.
A bank’s credit portfolio includes all the loans it has issued to clients at a given time. Ideally, all loans will be repaid, but borrowers occasionally fail to meet their obligations, resulting in losses for the bank. To prepare for such cases, banks estimate potential losses in advance and create reserves to offset these losses.
The key event for calculating reserves is a “default.” This is the moment when a borrower is officially recognized as unable to fulfill their obligations. Typically, default is recorded if the borrower fails to make payments for 90 days or more.
Expected Credit Loss
Reserves (Expected Credit Loss) are not a single abstract sum but rather a collection of calculations for each individual loan. These calculations are based on the international standard IFRS 9, which uses the following formula:
ECL = EAD × PD × LGD × MR
- EAD (Exposure at Default) — the amount of outstanding debt at the moment of default.
- PD (Probability of Default) — the probability of a borrower defaulting on their obligations.
- LGD (Loss Given Default) — the share of the outstanding debt that the bank will lose in the event of default.
- MR (Macro Rate) — external macroeconomic risks affecting the quality of the credit portfolio.
If we break the ECL formula into its components, it turns out to be quite intuitive.
To calculate expected losses, knowing the current debt of a client is not enough — it is necessary to forecast its change by the time of default. This is where EAD comes into play: the projected amount the client will owe the bank at the time of default.
Obviously, a client will not necessarily default — default occurs only with a certain probability, denoted as PD. Therefore, the bank’s expected loss is the forecasted balance at the time of default multiplied by the probability of default: EAD × PD.
However, even in the event of default, the bank might recover part of the debt. Often, a portion of the funds can be recovered through restructuring, negotiations, or legal proceedings. Here, LGD comes into play, reflecting the portion of debt the bank will not recover. Taking this into account, the formula expands to EAD × PD × LGD.
Finally, MR — a macroeconomic indicator that reflects the influence of external factors — is incorporated into the calculations. For instance, during an economic crisis, MR increases, reflecting heightened risks of non-repayment. The final formula becomes EAD × PD × LGD × MR.
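To make the formula concrete, here is a minimal sketch in Python. The loan figures and the flat per-loan structure are invented for illustration, not real model outputs:

```python
# A minimal sketch of the per-loan ECL calculation; all numbers are invented.
loans = [
    # EAD: forecast balance at default, PD: 12-month default probability,
    # LGD: share of EAD the bank will not recover, MR: macro adjustment.
    {"id": "A-001", "ead": 120_000, "pd": 0.03, "lgd": 0.55, "mr": 1.10},
    {"id": "A-002", "ead": 45_000,  "pd": 0.12, "lgd": 0.70, "mr": 1.10},
]

for loan in loans:
    loan["ecl"] = loan["ead"] * loan["pd"] * loan["lgd"] * loan["mr"]
    print(f'{loan["id"]}: ECL = {loan["ecl"]:,.2f}')

# The portfolio reserve is the sum of the per-loan expected losses.
portfolio_reserve = sum(loan["ecl"] for loan in loans)
print(f"Portfolio reserve: {portfolio_reserve:,.2f}")
```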
To calculate reserves, a bank must be able to compute all components of the ECL formula for each client, and then simply multiply them to determine the reserve. Let’s discuss the specific tasks that arise in reserve estimation.
Portfolio Segmentation
When assessing reserves, the credit portfolio is typically divided into segments, each of which requires distinct models to calculate EAD, PD, LGD, and MR. This segmentation is essential because different types of loans and borrowers exhibit significantly different behaviors. I won’t go into detail on this topic now but will highlight the main points.
Classic Examples of Segmentation:
- Type of Product – For example, a loan for purchasing goods, a credit card, and a mortgage are entirely different financial products. They vary in terms, amounts, repayment structures, and the nature of the bank’s interaction with clients. Each type of loan requires a specific model that considers its unique characteristics.
- Presence of Delinquency – One of the key risk factors is payment delinquency. If a client misses a mandatory payment (fails to pay a loan on time), their probability of default (PD) increases significantly. As a result, loans with delinquencies are considered a higher risk, necessitating larger reserves.
Experience shows that developing separate models for major segments results in more accurate estimates compared to attempting to consolidate all data into a single universal model. This approach provides greater calculation flexibility and better accounts for the behavioral differences specific to each segment.
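As a toy illustration of the segmentation step, the sketch below derives a segment key from product type and a delinquency flag. The column names and the pandas-based layout are my assumptions, not a prescribed scheme:

```python
import pandas as pd

# Hypothetical portfolio snapshot; column names are illustrative.
portfolio = pd.DataFrame({
    "account_id":    [1, 2, 3, 4],
    "product":       ["credit_card", "mortgage", "credit_card", "consumer_loan"],
    "days_past_due": [0, 0, 35, 0],
})

# Segment key: product type plus a delinquency flag, so each segment
# can later receive its own EAD/PD/LGD models.
portfolio["delinquent"] = portfolio["days_past_due"] > 0
portfolio["segment"] = (
    portfolio["product"] + "_"
    + portfolio["delinquent"].map({True: "dpd", False: "current"})
)
print(portfolio[["account_id", "segment"]])
```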
EAD Models (Exposure at Default)
EAD (Exposure at Default) is the amount a client will owe the bank at the moment of default. It includes not only the current debt but also potential changes such as additional spending, interest, or partial repayments.
Let’s attempt to construct an EAD model for a specific segment (e.g., credit cards without delinquencies). To do this, we take the same segment as it looked 12 months ago (the 12-month horizon is a requirement of the standard). From this set, we keep only the accounts that went into default within the following 12 months. This subset will be our sample.
For the selected loans, we need to collect:
- Debt at the time of default — this is the value that needs to be predicted.
- Account information at the cut-off date, i.e., the account’s status exactly 12 months prior. This may include the client’s debt amount, client data (age, gender, etc.), and payment information.
Of course, in practice, the process is somewhat more complex. The evaluation is not based solely on data from exactly 12 months ago. To increase the sample size and avoid dependence on clients from a specific month, data from different periods, such as 12, 13, 14 months ago, and so on, is used. At the same time, the default window for each client is strictly fixed — it always spans exactly 12 months from the moment the client is included in the sample. This approach ensures greater statistical reliability and model accuracy.
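A rough sketch of how such a sample could be assembled is shown below, assuming hypothetical `snapshots` (monthly account states) and `defaults` (default events with the balance observed at default) tables with the column names used in the code:

```python
import pandas as pd

def build_ead_sample(snapshots: pd.DataFrame,
                     defaults: pd.DataFrame,
                     cutoff_dates: list[str]) -> pd.DataFrame:
    """Pool cohorts from several cut-off dates, each with a fixed 12-month window."""
    cohorts = []
    for cutoff in cutoff_dates:
        cutoff = pd.Timestamp(cutoff)
        window_end = cutoff + pd.DateOffset(months=12)  # fixed 12-month window
        # Features: the account's state exactly at the cut-off date.
        features = snapshots[snapshots["snapshot_date"] == cutoff]
        # Target: keep only accounts that defaulted inside the window.
        defaulted = defaults[
            (defaults["default_date"] > cutoff)
            & (defaults["default_date"] <= window_end)
        ]
        cohort = features.merge(
            defaulted[["account_id", "balance_at_default"]],
            on="account_id", how="inner",
        )
        cohorts.append(cohort)
    # Pooling cohorts from 12, 13, 14... months ago enlarges the sample.
    return pd.concat(cohorts, ignore_index=True)
```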
After collecting the data, we can proceed to build a machine-learning model. In this case, it will be a regression model that predicts a numerical value — the amount of debt at the time of default — based on the available information about the client and their loan.
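Continuing the sketch above, a regression model could be fitted to the pooled sample roughly as follows. The feature columns are assumptions, and gradient boosting simply stands in for whatever model the research phase settles on:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Pool several monthly cohorts into one training sample (see sketch above).
sample = build_ead_sample(snapshots, defaults,
                          cutoff_dates=["2023-01-31", "2023-02-28", "2023-03-31"])

# Hypothetical feature columns describing the account at the cut-off date.
feature_cols = ["current_balance", "credit_limit", "utilization", "months_on_book"]
X_train, X_test, y_train, y_test = train_test_split(
    sample[feature_cols], sample["balance_at_default"],
    test_size=0.2, random_state=42,
)

# Regression target: the debt amount at the time of default.
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```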
PD Models (Probability of Default)
Probability of Default (PD) is the likelihood that a borrower will default within a specified time period (e.g., over the next 12 months). Estimating PD is one of the key models in the reserve formation process.
In essence, this task is similar to credit scoring, where the bank determines whether to issue a loan to a client. However, in this case, the loan has already been issued, and the bank has much more data about the client, including information on how they are currently servicing their debt.
Let’s try to build a PD model for a specific segment (e.g., credit cards without delinquencies). To do this, we need to extract data for a similar segment from 12 months ago. Then each account is assigned a binary default indicator for the subsequent 12 months (1 — default occurred, 0 — no default).
For the selected loans, we need to collect:
- Default indicator (1 or 0) — this is the target variable to be predicted.
- Account status at the cut-off date, i.e., information about the account exactly 12 months prior.
As with EAD estimation, it is better to use data from different periods, such as 12, 13, 14 months ago, and beyond, to build PD models. This approach increases the dataset size, which enhances the reliability and accuracy of the model.
The resulting dataset can be used to build a PD model — in this case, a binary classification task. However, we are not interested in the predicted class itself (0 or 1), but rather in the probability that the account belongs to the default class.
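As a minimal sketch, assuming a prepared dataset `pd_dataset` with a binary `default_12m` target and a few illustrative feature columns, the PD estimate is the predicted probability of class 1:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the names are assumptions.
X = pd_dataset[["current_balance", "utilization", "months_on_book"]]
y = pd_dataset["default_12m"]  # 1 = default within 12 months, 0 = no default

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# predict_proba gives the probability of class 1: this is the PD estimate.
pd_estimates = clf.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, pd_estimates))
```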
LGD Models (Loss Given Default)
LGD (Loss Given Default) represents the amount of loss in the event of default. It reflects the proportion of the total outstanding debt at the time of default that the bank will lose if the borrower is unable to repay the loan.
LGD is often calculated as the complement of RR (Recovery Rate), which indicates the portion of the total debt that the bank managed to recover after the borrower’s default:
LGD = 1 – RR
When estimating RR (Recovery Rate), machine learning methods often prove to be ineffective. This is because, after a borrower defaults, their subsequent behavior is more influenced by the bank’s debt recovery actions than by the borrower’s own characteristics.
RR estimation is based on statistics on how accounts that have already defaulted repay their debts. Typically, accounts are divided into segments based on the stage of interaction with the bank and how long they have been delinquent. Within each segment, the proportion of recovered funds relative to the segment’s total debt is calculated.
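A minimal sketch of this statistical estimate, with invented segment names and figures:

```python
import pandas as pd

# Within each recovery segment, RR is total recovered cash over total
# debt at default; all values below are invented for illustration.
recoveries = pd.DataFrame({
    "segment":          ["early_collection", "early_collection", "legal", "legal"],
    "debt_at_default":  [10_000, 20_000, 15_000, 5_000],
    "amount_recovered": [6_000, 10_000, 3_000, 500],
})

rr = (recoveries.groupby("segment")["amount_recovered"].sum()
      / recoveries.groupby("segment")["debt_at_default"].sum())
lgd = 1 - rr  # LGD is the complement of the recovery rate
print(lgd)
```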
MR Models (Macro Rate)
Models for assessing macroeconomic risks are perhaps the most creative part of the reserve calculation process. The key task here is to understand how changes in economic conditions affect the bank’s losses.
The primary goal is to account for macroeconomic risks that are not yet reflected in the forecasts of PD, LGD, and EAD. Why is this important? Because all previous models were built on historical data (at least 12 months old). These data reflect the specific economic conditions that existed during that period and already include the macroeconomic risks relevant at the time.
For instance, if the models were built on data from a period of economic growth, when household incomes were rising and loans were being repaid on time, the reserves might be underestimated during a crisis. This is problematic because the increased risks won’t disappear — they will suddenly impact the bank, causing unexpected losses. To avoid this, the MR metric is introduced, which increases proactively, even before real risks materialize.
How to Build MR? There are many approaches to constructing MR, and the standards in this area allow flexibility. The key requirements are:
- Incorporate macroeconomic indicators such as GDP, unemployment rate, inflation, etc.
- Consider three scenarios:
  - Baseline — directly influences reserve calculations.
  - Optimistic and Pessimistic — used for analyzing potential deviations.
It is important to note that the baseline scenario directly impacts the size of the reserves, while the additional scenarios serve as reference tools.
One particularly promising approach involves forecasting not changes in PD, LGD, and EAD, but the redistribution of the portfolio between client segments. For example, during a crisis, some clients may begin to fall into delinquency, moving them into segments with higher PD, LGD, and EAD values.
To forecast such transitions, it is necessary to:
- Collect data on client transitions between segments in past periods.
- Assess how these transitions depend on macroeconomic indicators.
- Remove the average effect already accounted for in the historical data used to build the baseline models.
Here, it is difficult to avoid using machine learning (ML). Depending on how you analyze the data and what exactly you are forecasting, the task can be either:
- Regression (forecasting quantitative values), or
- Classification (determining the probabilities of belonging to specific classes).
For developing pessimistic and optimistic scenarios, confidence intervals for forecasts can be used.
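As a rough sketch of the transition-based approach, one could relate segment migrations to macro indicators and then score the three scenarios. The data, features, and scenario values below are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical observations: macro indicators at the start of each period...
X_hist = np.array([
    [2.1, 4.8],   # [GDP growth %, unemployment %]
    [1.5, 5.2],
    [-0.7, 6.9],
    [0.3, 6.1],
    [2.4, 4.5],
    [-1.2, 7.4],
])
# ...and whether a sampled account migrated into a delinquent segment.
y_hist = np.array([0, 0, 1, 0, 0, 1])

model = LogisticRegression().fit(X_hist, y_hist)

# Three macro scenarios: the baseline feeds the reserve calculation,
# the other two are used to analyze potential deviations.
scenarios = {
    "baseline":    [[1.0, 5.5]],
    "optimistic":  [[2.5, 4.5]],
    "pessimistic": [[-1.5, 7.5]],
}
for name, x in scenarios.items():
    print(name, model.predict_proba(x)[0, 1])
```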
Task Features
Interpretability. One of the key requirements when building models for reserve estimation is their transparency and interpretability. This is due to the expectations of regulators and audit firms, who require the bank to explain why the reserves are calculated in a particular way, down to any level of detail.
Neural networks, gradient boosting, and other complex algorithms, while demonstrating high accuracy, are often not used in the final version of the model due to their “black box” nature. Instead, such models are utilized during the research phase to determine the maximum achievable level of forecast quality. Afterward, interpretable models are developed that aim to achieve similar accuracy.
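A sketch of that research-phase workflow, reusing the PD training split assumed earlier: a black-box model sets the accuracy ceiling, and an interpretable model is tuned toward it.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# X_train, X_test, y_train, y_test are assumed to come from the PD sample above.
# The black-box model estimates the maximum achievable forecast quality.
ceiling = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
ceiling_auc = roc_auc_score(y_test, ceiling.predict_proba(X_test)[:, 1])

# The interpretable model is what actually goes into production.
final = LogisticRegression(max_iter=1000).fit(X_train, y_train)
final_auc = roc_auc_score(y_test, final.predict_proba(X_test)[:, 1])

print(f"ceiling AUC: {ceiling_auc:.3f}, interpretable AUC: {final_auc:.3f}")
# Coefficients can be explained to regulators and auditors feature by feature.
print(dict(zip(X_train.columns, final.coef_[0])))
```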
Significant Impacts. Exceptional accuracy is crucial in reserve forecasts, as even a 1–2% error can result in substantial financial losses for the bank. Moreover, the bank cannot abruptly change the model or suddenly increase or decrease the reserve amount. The bank’s actions are expected to be consistent and gradual to avoid raising concerns from regulators, shareholders, and other stakeholders.
Reporting. The work on the reserve model does not end with its development. It is essential to conduct an in-depth analysis of its forecasts, understand the model’s properties, and interpret its behavior. This not only helps improve the model but also builds trust in its results among internal and external users.
Conclusion
In this article, I discussed what the task of reserve estimation entails and the steps a bank needs to take to address it. Of course, real-life scenarios are much more complex, and I have deliberately simplified certain details for clarity. Additionally, there are internal aspects of the work that cannot be disclosed.
If you find this topic interesting, feel free to join the discussion in the comments! I’d be happy to dive deeper into specific parts of the reserve estimation process.