Week 2: tools - "Loan approval" dataset
Context
- Source: Kaggle
- Description: complete dataset of 50,000 loan applications across Credit Cards, Personal Loans, and Lines of Credit. Includes customer demographics, financial profiles, credit behavior, and approval decisions based on real US & Canadian banking criteria.
- Credit: Brian Risk on Kaggle
Load dataset
In [1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("datasets/Loan_approval_data_2025.csv", delimiter=',', encoding='ascii')
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
categorical_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()
# 🧾 Display dataset information
print("Dataset shape:", df.shape)
Dataset shape: (50000, 20)
Explore content
In [2]:
df.head()
Out[2]:
| | customer_id | age | occupation_status | years_employed | annual_income | credit_score | credit_history_years | savings_assets | current_debt | defaults_on_file | delinquencies_last_2yrs | derogatory_marks | product_type | loan_intent | loan_amount | interest_rate | debt_to_income_ratio | loan_to_income_ratio | payment_to_income_ratio | loan_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CUST100000 | 40 | Employed | 17.2 | 25579 | 692 | 5.3 | 895 | 10820 | 0 | 0 | 0 | Credit Card | Business | 600 | 17.02 | 0.423 | 0.023 | 0.008 | 1 |
| 1 | CUST100001 | 33 | Employed | 7.3 | 43087 | 627 | 3.5 | 169 | 16550 | 0 | 1 | 0 | Personal Loan | Home Improvement | 53300 | 14.10 | 0.384 | 1.237 | 0.412 | 0 |
| 2 | CUST100002 | 42 | Student | 1.1 | 20840 | 689 | 8.4 | 17 | 7852 | 0 | 0 | 0 | Credit Card | Debt Consolidation | 2100 | 18.33 | 0.377 | 0.101 | 0.034 | 1 |
| 3 | CUST100003 | 53 | Student | 0.5 | 29147 | 692 | 9.8 | 1480 | 11603 | 0 | 1 | 0 | Credit Card | Business | 2900 | 18.74 | 0.398 | 0.099 | 0.033 | 1 |
| 4 | CUST100004 | 32 | Employed | 12.5 | 63657 | 630 | 7.2 | 209 | 12424 | 0 | 0 | 0 | Personal Loan | Education | 99600 | 13.92 | 0.195 | 1.565 | 0.522 | 1 |
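Beyond `df.head()`, a quick check of missing values and of the approval rate per category can help orient the exploration. The sketch below is only illustrative: the helper `approval_rate_by` is hypothetical, the small `sample` frame copies a few columns from the five rows shown above, and it assumes that `loan_status == 1` means the application was approved, which the dataset description does not state explicitly.

```python
import pandas as pd

# Stand-in frame built from a few columns of the df.head() rows shown above.
# In the notebook, pass the full `df` instead of `sample`.
sample = pd.DataFrame({
    "occupation_status": ["Employed", "Employed", "Student", "Student", "Employed"],
    "product_type": ["Credit Card", "Personal Loan", "Credit Card", "Credit Card", "Personal Loan"],
    "credit_score": [692, 627, 689, 692, 630],
    "loan_status": [1, 0, 1, 1, 1],
})


def approval_rate_by(frame, column, target="loan_status"):
    """Mean of the 0/1 target per category (assumption: 1 means approved)."""
    return frame.groupby(column)[target].mean().sort_values(ascending=False)


# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Share of rows with loan_status == 1 for each product type.
rate_by_product = approval_rate_by(sample, "product_type")

print(missing_per_column)
print(rate_by_product)
```

On the full dataset, the same helper could be applied to each name in `categorical_cols` (except `customer_id`, which is unique per row) to compare rates across occupations, products, and loan intents.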
Plot the distribution of each numeric column
In [3]:
plt.figure(figsize=(12, 8))
for idx, col in enumerate(numeric_cols):
    plt.subplot(4, 4, idx+1)
    sns.histplot(df[col], kde=True, bins=30)
    plt.title(col)
plt.tight_layout()
plt.show()