cai() computes the characteristic analysis index (CAI) for an independent variable and each of its classes, using the scorecard, the model development sample, and the model monitoring sample.


cai(data, x, y, verbose = TRUE)



data: A two-column data frame containing the unique classes of a single independent variable (first column) and the points assigned to each class (second column); data should be a subset of the overall scorecard


x: A vector containing the values of the independent variable in the development sample


y: A vector containing the values of the independent variable in the monitoring sample


verbose: (Logical) Should information on the CAI calculation be printed in the console?


A list with two elements:

  • classes: data frame with the CAI computed for each class

  • net: numeric value that represents the index for the entire independent variable; i.e., the net of the CAI values in classes
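To make the shape of the return value concrete, here is a minimal base R sketch that builds a list with the same two elements. The helper name `cai_sketch()` and the toy two-class scorecard are hypothetical and only for illustration; this is not the package's implementation, and it assumes the column names shown in the example output on this page.

```r
# Hypothetical helper: a sketch of the return structure, not the real cai()
cai_sketch <- function(data, x, y) {
  classes <- as.character(data[[1]])  # first column: class labels
  points <- data[[2]]                 # second column: scorecard points

  # Share of observations in each scorecard class; values that do not
  # match a scorecard class are dropped before proportions are computed
  prop_dev <- as.numeric(prop.table(table(factor(x, levels = classes))))
  prop_mon <- as.numeric(prop.table(table(factor(y, levels = classes))))

  out <- data.frame(
    class = classes,
    prop_development = prop_dev,
    prop_monitoring = prop_mon,
    points = points,
    cai = (prop_mon - prop_dev) * points
  )

  # classes: per-class values; net: sum of the per-class CAI values
  list(classes = out, net = sum(out$cai))
}

# Toy data, made up for this sketch
scorecard <- data.frame(class = c("a", "b"), points = c(10, 20))
result <- cai_sketch(
  data = scorecard,
  x = c("a", "a", "b", "b"),
  y = c("a", "b", "b", "b")
)

result$classes
result$net
```

With this toy input, class "a" falls from half of the development sample to a quarter of the monitoring sample, so its contribution is negative, while class "b" contributes positively.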


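The characteristic analysis index (CAI) quantifies shifts in the population distribution across the classes of an independent variable between the development and monitoring samples, expressed in scorecard points.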

The CAI is derived via the formula

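$$\text{CAI}_{c} = \left(P(Y = c) - P(X = c)\right) \cdot t_{c}$$

for each class \(c\), where \(X\) is the development sample, \(Y\) is the monitoring sample and \(t_{c}\) is the number of scorecard points assigned to class \(c\). The net index for the variable is the sum over all classes:

$$\sum_{c} \left(P(Y = c) - P(X = c)\right) \cdot t_{c}$$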
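As a rough check of the formula, the sketch below applies it by hand in base R to the class proportions and points shown in the example output on this page. The variable names are chosen here for illustration only.

```r
# Scorecard points and class proportions, copied from the example output
points <- c(dairy = 73, fruit = 65, grain = 87, poultry = 97)
prop_development <- c(dairy = 0.245, fruit = 0.198, grain = 0.261, poultry = 0.296)
prop_monitoring <- c(dairy = 0.220, fruit = 0.150, grain = 0.065, poultry = 0.565)

# Per-class value: (P(Y = c) - P(X = c)) * t_c
cai_by_class <- (prop_monitoring - prop_development) * points

# Net value: sum of the per-class values
cai_net <- sum(cai_by_class)

round(cai_by_class, 3)
round(cai_net, 3)
```

The rounded results agree with the cai column and the $net value in the example output (for instance, grain: (0.065 - 0.261) * 87 = -17.052).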


Siddiqi, Naeem (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. 2nd ed., Wiley, pp. 284-287.


industry_classes <- c("dairy", "fruit", "grain", "poultry")

# Simulate development data
development_values <- sample(
  x = industry_classes,
  size = 1000,
  replace = TRUE,
  prob = c(0.25, 0.20, 0.25, 0.30)
)

# Simulate monitoring data
monitoring_values <- sample(
  x = industry_classes,
  size = 200,
  replace = TRUE,
  prob = c(0.20, 0.15, 0.10, 0.55)
)

# Points assigned to industry classes
characteristic_points <- tibble::tribble(
  ~class,  ~points,
  "dairy",      73,
  "fruit",      65,
  "grain",      87,
  "poultry",    97
)

# Calculate characteristic analysis index for the "industry" independent
# variable
cai(
  data = characteristic_points,
  x = development_values,
  y = monitoring_values
)
#> • Using "class" as the independent variable classes
#> • Using "points" as the scorecard points
#> $classes
#>     class prop_development prop_monitoring points     cai
#> 1   dairy            0.245           0.220     73  -1.825
#> 2   fruit            0.198           0.150     65  -3.120
#> 3   grain            0.261           0.065     87 -17.052
#> 4 poultry            0.296           0.565     97  26.093
#> $net
#> [1] 4.096