cai() computes the characteristic analysis index (CAI) for an independent variable and each of its classes, using the scorecard, the model development sample and the model monitoring sample.

Usage

cai(data, x, y, verbose = TRUE)

Arguments

data

A two-column data frame that contains the unique classes in a single independent variable (located in the first column) and the associated points assigned to each class (located in the second column); data should be a subset of the overall scorecard

x

A vector containing the values of the independent variable in the development sample

y

A vector containing the values of the independent variable in the monitoring sample

verbose

(Logical) Should information on the CAI calculation be printed in the console?

Value

A list with two elements:

  • classes: data frame with the CAI computed for each class

  • net: numeric value giving the index for the entire independent variable, i.e., the sum of the per-class CAI values in classes

Details

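The characteristic analysis index (CAI) measures how the population has shifted across the classes of an independent variable between the development sample and the monitoring sample, weighting each shift by the scorecard points assigned to that class.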

The CAI is derived via the formula

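$$\text{CAI}_{c} = \left(P(Y = c) - P(X = c)\right) \cdot t_{c}$$

for each class \(c\), where \(X\) is the development sample, \(Y\) is the monitoring sample and \(t_{c}\) is the number of scorecard points assigned to class \(c\). The net index for the variable is the sum over all classes, \(\sum_{c} \text{CAI}_{c}\).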
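
The calculation described above can be sketched in a few lines of base R. This is only an illustration of the formula, not the package's implementation: `manual_cai()` is a hypothetical helper, and the toy data is invented for the example. The sketch assumes every value in `x` and `y` appears in the first column of `data`.

```r
# Hypothetical helper (not part of the package): recompute the CAI by hand
manual_cai <- function(data, x, y) {
  classes <- as.character(data[[1]])  # first column: class labels
  points <- data[[2]]                 # second column: scorecard points

  # Share of each class in the development (x) and monitoring (y) samples
  prop_development <- as.numeric(prop.table(table(factor(x, levels = classes))))
  prop_monitoring <- as.numeric(prop.table(table(factor(y, levels = classes))))

  # Per-class index: shift in proportion, weighted by the class points
  cai <- (prop_monitoring - prop_development) * points

  list(
    classes = data.frame(class = classes, prop_development, prop_monitoring, points, cai),
    net = sum(cai)
  )
}

# Toy data: class "a" falls from 50% to 25%, class "b" rises from 50% to 75%
toy_points <- data.frame(class = c("a", "b"), points = c(10, 20))
result <- manual_cai(
  data = toy_points,
  x = c("a", "a", "b", "b"),
  y = c("a", "b", "b", "b")
)
result$classes$cai  # -2.5 and 5
result$net          # 2.5
```

A positive net value here means the monitoring population has shifted toward classes with more points, which matches the sign convention of the formula (monitoring minus development).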

References

Siddiqi, Naeem (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. 2nd ed., Wiley, pp. 284-287.

Examples

set.seed(123)
industry_classes <- c("dairy", "fruit", "grain", "poultry")

# Simulate development data
development_values <- sample(
  x = industry_classes,
  size = 1000,
  replace = TRUE,
  prob = c(0.25, 0.20, 0.25, 0.30)
)

# Simulate monitoring data
monitoring_values <- sample(
  x = industry_classes,
  size = 200,
  replace = TRUE,
  prob = c(0.20, 0.15, 0.10, 0.55)
)

# Points assigned to industry classes
characteristic_points <- tibble::tribble(
  ~class,  ~points,
  "dairy",      73,
  "fruit",      65,
  "grain",      87,
  "poultry",    97
)

# Calculate characteristic analysis index for the "industry" independent
# variable
cai(
  data = characteristic_points,
  x = development_values,
  y = monitoring_values
)
#> • Using "class" as the independent variable classes
#> • Using "points" as the scorecard points
#> $classes
#>     class prop_development prop_monitoring points     cai
#> 1   dairy            0.245           0.220     73  -1.825
#> 2   fruit            0.198           0.150     65  -3.120
#> 3   grain            0.261           0.065     87 -17.052
#> 4 poultry            0.296           0.565     97  26.093
#> 
#> $net
#> [1] 4.096
#>
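
As a hand check on the printed output, the dairy row follows from the formula in Details, using the proportions and points exactly as printed:

$$(0.220 - 0.245) \times 73 = -1.825$$

The net value is the sum of the four class values:

$$-1.825 - 3.120 - 17.052 + 26.093 = 4.096$$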