Thursday, 27 November 2014

More Credit Risk Analysis on Bondora Default Rates


I'm Brett, and in this article I'll present some more findings from my analysis of the Bondora loan dataset.

My previous analysis of this dataset looked at how you might reduce default rates by using the Portfolio Manager.

In this article I'll look at some of the factors you can use to choose loans through the Marketplace and Secondary Market. Note that I'm only presenting data from Estonian borrowers, and that past performance is no guarantee of future performance.

So with that in mind, what factors are useful in screening out higher risk borrowers, and which factors bear little relationship with loan default rates?

Read on to find out...


Gender

There is a significant difference between the default rates of male and female borrowers:

Percentage default rates (Y) plotted against borrower's gender (X)
I found that default rates for men were 10.36%, against 8.92% for women. So men appear to be 16% more likely to default on their loans.
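As a quick check of the 16% figure, here is the relative risk worked out from the two rates quoted above. It's just the ratio of the default rates, and it's worth keeping apart from the absolute gap:

$$\text{RR} = \frac{p_{\text{men}}}{p_{\text{women}}} = \frac{10.36\%}{8.92\%} \approx 1.161, \qquad p_{\text{men}} - p_{\text{women}} = 10.36\% - 8.92\% = 1.44 \text{ percentage points}$$

So "16% more likely" is a relative figure. In absolute terms the gap is about 1.4 defaults per 100 loans.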

Lending only to women would cut the number of available loans by more than half, though: men make up 53% of the market and women 47%.
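To put that trade-off in numbers, here is a rough sketch of the overall default rate. It assumes that the 53%/47% split refers to loans in the same sample the default rates were calculated on, which the article doesn't state explicitly. Under that assumption the overall rate is the share-weighted average of the two group rates:

$$\bar p = 0.53 \times 10.36\% + 0.47 \times 8.92\% \approx 5.49\% + 4.19\% = 9.68\%$$

So lending only to women would lower the expected default rate from roughly 9.7% to 8.92%, at the cost of giving up 53% of the available loans.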


Age

As far as age goes, the first observation is that the data quality isn't great for anyone younger than 21 or older than 65. So I'd definitely avoid anyone outside the core 21-60 age group.

But within this core age group, there's a clear relationship between age and default rates:

Percentage default rates (Y) plotted against age of borrower (X)
So younger people are a higher credit risk. There's a much lower credit risk once borrowers hit 30.

Again, I would avoid lending to borrowers older than 60 as there is a spike in defaults at 65 for some reason.
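The article doesn't include the query behind the age chart. Below is a minimal sketch in the same T-SQL style as the education query in the Data Generation section, grouping borrowers into 5-year bands within the core 21-60 group. It's an assumption-laden illustration rather than the query I actually ran. The column name `Age` is hypothetical, so check the real column name in the Bondora export. The other filters are copied from the education query.

```sql
-- Sketch: default rate by 5-year age band, core 21-60 age group only.
-- ASSUMPTION: Loans has a numeric column named Age (hypothetical name).
-- AD = 1 is assumed to mark a defaulted loan, as in the education query.
SELECT
    FLOOR(Age / 5) * 5 AS AgeBandStart,        -- lower bound of each band: 20 holds ages 21-24 here, 25 holds 25-29, ...
    SUM(AD) * 100.0 / COUNT(*) AS [Percentage Defaulted],  -- 100.0 avoids integer division
    SUM(AD) AS NumberInDefault,
    COUNT(*) AS NumberOfLoans
FROM Loans
WHERE country = 'EE'                           -- Estonian borrowers only
  AND creditdecision = 1
  AND Age BETWEEN 21 AND 60                    -- the core age group
  AND LoanApplicationStartedDate < '2014-10-27'
GROUP BY FLOOR(Age / 5) * 5
ORDER BY AgeBandStart;
```

FLOOR keeps the bucketing correct whether Age was imported from Excel as an integer or as a float. Band width is a judgment call: narrower bands show more detail but leave fewer loans in each group.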

Marital Status

There's also some useful information that can be found in the marital status factor:

Percentage default rates (Y) plotted against borrower's marital status (X)
So single people are the highest risk, with those co-habiting a safer credit risk and married people even safer.

But divorced people are the lowest risk of all. I'm a little surprised by this, so it could be a factor worth investigating further in a future article...

Use of Loan

I was somewhat disappointed by my previous attempt at analysing the factors you can use to select loans with the Bondora Portfolio Manager.

Not so with the factors you can use in the Marketplace or Secondary Market!

Another really useful factor is Use of Loan:

Percentage default rates (Y) plotted against borrower's declared use of loan monies (X)
According to my analysis, loan defaults are much lower for loans taken out for the purpose of business and travel.

I was a little surprised by this. Previously I avoided investing in any travel-related loans on Bondora.

I was flat wrong!

I guess that people who book vacations are feeling reasonably confident that their financial situation is stable and that they'll be able to repay their loan.

And I suppose that people who borrow for a business tend to be hard-working types who can find enough opportunities to repay their loans.

After producing this chart, I would definitely be more wary of loans relating to education and consolidation of existing loans.

I read on a forum that somebody likes to avoid vehicle loans. Well, these do look higher risk, but they're not the highest risk: that honour belongs to education.

Finally, I'll definitely be seeking out real estate or home improvement loans, which appear to have a lower risk profile compared to some other uses.

Education Level

From the chart below, it appears that loan risk is lower the more educated the borrower is:

Percentage default rates (Y) plotted against borrower's highest level of education (X)
I guess that more educated people find it easier to get good jobs in Estonia's labour market.

Incidentally, take the default rate for primary level education with a pinch of salt because this group has comparatively few borrowers.
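To see why small groups deserve a pinch of salt, consider the standard error of an observed default rate $\hat p$ calculated from $n$ loans. Under the usual normal approximation for a proportion, it is:

$$\hat p = \frac{\text{NumberInDefault}}{\text{NumberOfLoans}}, \qquad \mathrm{SE}(\hat p) \approx \sqrt{\frac{\hat p\,(1-\hat p)}{n}}$$

For illustration only, using hypothetical numbers that are not from the Bondora data: with $\hat p = 0.10$ and $n = 100$, $\mathrm{SE} = \sqrt{0.09/100} = 0.03$. That is about 3 percentage points of uncertainty per standard error. With $n = 10{,}000$ it shrinks to $0.003$, or 0.3 points. The approximation itself gets shaky when $n$ is very small.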

Data Generation

To generate this data, I used the following process:
  1. I downloaded the loan Excel spreadsheet from Bondora.
  2. I imported the Excel spreadsheet into SQL Server.
  3. I wrote some custom SQL queries to analyse the data.
  4. I exported the results sets from SQL Server back into Excel in order to turn them into charts.

I have assumed that an AD value of 1 indicates that a loan has defaulted. I have excluded loans that were applied for within the last 3 months or so. Finally, I've only included Estonian loans in all the queries except for the one relating to country.
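One way to make the "exclude recent loans" rule explicit is to compute the cutoff from the date the spreadsheet was exported, rather than hard-coding a date. Below is a minimal T-SQL sketch of that idea. `@ExportDate` and `@ExcludeMonths` are hypothetical parameters. The export date shown is only a placeholder (the date of this post), so set it to the day the data was actually downloaded.

```sql
-- Sketch: exclude loans applied for within the last N months before export.
-- @ExportDate is a placeholder: set it to the actual download date.
DECLARE @ExportDate date = '2014-11-27';
DECLARE @ExcludeMonths int = 3;     -- increase for a more conservative cut

SELECT
    SUM(AD) * 100.0 / COUNT(*) AS [Percentage Defaulted],  -- AD = 1 assumed to mean defaulted
    SUM(AD) AS NumberInDefault,
    COUNT(*) AS NumberOfLoans
FROM Loans
WHERE country = 'EE'                -- Estonian loans only
  AND creditdecision = 1
  AND LoanApplicationStartedDate < DATEADD(month, -@ExcludeMonths, @ExportDate);
```

Keeping the cutoff in a variable means the same filter can be pasted into every factor query and changed in one place.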

If you want to have a go at analysing the data yourself, then this is the basic SQL query I used, in this case the query for the education_id factor:

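    -- Default rate by education level for Estonian loans (AD = 1 is assumed to mean defaulted)
    SELECT
        CASE education_id
            WHEN 1 THEN 'Primary'
            WHEN 2 THEN 'Basic'
            WHEN 3 THEN 'Vocational'
            WHEN 4 THEN 'Secondary'
            WHEN 5 THEN 'Higher'
        END AS Education,
        -- Multiplying by 100.0 first avoids integer division if AD is stored as an integer
        SUM(AD) * 100.0 / COUNT(*) AS [Percentage Defaulted],
        SUM(AD) AS NumberInDefault,
        COUNT(*) AS NumberOfLoans
    FROM Loans
    WHERE country = 'EE'
      AND creditdecision = 1
      AND education_id BETWEEN 1 AND 5
      AND LoanApplicationStartedDate < '2014-10-27'   -- exclude recent applications
    GROUP BY education_id
    ORDER BY education_id;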

Summary and Conclusions

The basic message is that if you want more control over your future loan default rates, then you have to buy loans on the Marketplace, and possibly in the Secondary Market as well.

There are definitely some good factors that can be used to lower your potential default rates.

One thing I'll point out is that in this article I've only considered the effect of one factor at a time on loan default rates.

I'd like to drill down to the likely default rates of a particular group (e.g. 50-55 year old women who want travel loans). I suspect that default rates for groups like these will be significantly lower than the market average.
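As a starting point for that kind of drill-down, here is a minimal T-SQL sketch that combines several filters in one query, following the pattern of the education query. Everything beyond that pattern is an assumption. The column names `gender`, `Age` and `use_of_loan_id` are hypothetical, and so are the code values assigned to `@FemaleCode` and `@TravelCode`. Look up the real names and codes in Bondora's data dictionary before trusting any output.

```sql
-- Sketch: default rate for one narrow segment (women aged 50-55, travel loans).
-- HYPOTHETICAL: column names gender, Age, use_of_loan_id and both code values.
DECLARE @FemaleCode int = 1;   -- placeholder: replace with Bondora's code for female
DECLARE @TravelCode int = 7;   -- placeholder: replace with Bondora's code for travel

SELECT
    SUM(AD) * 100.0 / NULLIF(COUNT(*), 0) AS [Percentage Defaulted],  -- NULL if no rows match
    SUM(AD) AS NumberInDefault,
    COUNT(*) AS NumberOfLoans      -- check this: narrow segments can be very small
FROM Loans
WHERE country = 'EE'
  AND creditdecision = 1
  AND LoanApplicationStartedDate < '2014-10-27'
  AND gender = @FemaleCode
  AND Age BETWEEN 50 AND 55
  AND use_of_loan_id = @TravelCode;
```

The more factors you combine, the fewer loans end up in the segment, so NumberOfLoans matters as much as the default rate itself.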

Comments? Questions? Suggestions? Leave feedback below!

1 comment:

  1. You should probably cut out more of the sample from the end (4-6 months). Also, I think there might be some differences in the results in reality because it seems you didn't account for some additional nuances in your analyses.

    In general it's good enough to do the analyses in this way I guess, but for at least the loan purpose I think you'll get somewhat different results if you do the calculation properly.

    I'll look into it more thoroughly in the 4th post of my blog post series. It may have changed, but when I did the calculations rather recently, the results were different from yours once you account for all the necessary criteria.