A spin on the famous GermanCredit machine learning dataset.

A dataset containing the attributes of 1,000 fake loans. It is intended for building classification models, including logistic regression models that can be converted into scorecards.

Usage

loans

Format

A data frame with 1,000 rows and 11 variables:

loan_id: identifier representing a unique loan
amount_of_existing_debt: interval representing the amount of existing debt the customer on the loan has outstanding, in dollars
term: original length of the loan term, in months
industry: primary industry farmed by the primary customer on the loan
loan_amount: original amount of the loan, in dollars
other_debtors_guarantors: status of the customer on the loan as a "co-applicant", "guarantor", or "none"
years_at_current_address: length of time the customer has lived at their current address, in years
collateral_type: type of collateral used to back the loan
housing_status: whether the primary customer on the loan owns or rents their residential address
count_loan_facilities: number of loan facilities the customer on the loan has with the institution
default_status: binary "good"/"bad" classification of the loan's default status (i.e., the dependent variable)

Source

http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
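
Examples

As a rough illustration of the classification use case described above, the sketch below fits a logistic regression with base R's glm(). It assumes the package that provides loans is attached. Because that package is not named here, the code falls back to a small simulated stand-in (hypothetical values, not the real data) that carries a few of the documented columns. The names model_data, is_bad, and fit, the choice of predictors, and the "own"/"rent" levels are assumptions made for this example only.

```r
# Use the real `loans` data if it is available; otherwise build a small
# simulated stand-in with a subset of the documented columns.
# The simulated values are hypothetical and carry no real signal.
if (exists("loans")) {
  model_data <- loans
} else {
  set.seed(1)
  n <- 200
  model_data <- data.frame(
    loan_id        = seq_len(n),
    term           = sample(c(12, 24, 36, 48), n, replace = TRUE),
    loan_amount    = round(runif(n, min = 500, max = 15000)),
    housing_status = sample(c("own", "rent"), n, replace = TRUE),  # assumed levels
    default_status = sample(c("good", "bad"), n, replace = TRUE,
                            prob = c(0.7, 0.3))
  )
}

# Encode the dependent variable as 1 = "bad", 0 = "good".
model_data$is_bad <- as.integer(model_data$default_status == "bad")

# Fit a logistic regression on a few candidate predictors.
fit <- glm(
  is_bad ~ term + loan_amount + housing_status,
  data   = model_data,
  family = binomial()
)

# Inspect the coefficient table and a few predicted default probabilities.
print(summary(fit)$coefficients)
print(head(predict(fit, type = "response")))
```

Converting a fitted model like this into a scorecard typically involves binning the predictors and scaling the coefficients to points, which is outside the scope of this sketch.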