Calculate the Characteristic Analysis Index
cai() computes the characteristic analysis index for an
independent variable and its classes using the scorecard, the model
development sample, and the model monitoring sample.
Arguments
- data
A two-column data frame that contains the unique classes in a single independent variable (located in the first column) and the associated points assigned to each class (located in the second column); data should be a subset of the overall scorecard
- x
A vector containing the values of the independent variable in the development sample
- y
A vector containing the values of the independent variable in the monitoring sample
- verbose
(Logical) Should information on the CAI calculation be printed in the console?
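As a rough sketch of how the data argument might be prepared, the snippet below subsets a full scorecard down to one independent variable. The scorecard layout, its column names (variable, class, points), and the "housing" rows are hypothetical and purely illustrative; an actual scorecard may be structured differently.

```r
# Hypothetical full scorecard; column names and values are assumptions
scorecard <- data.frame(
  variable = c("industry", "industry", "housing", "housing"),
  class = c("dairy", "fruit", "own", "rent"),
  points = c(73, 65, 80, 60)
)

# Keep the rows for a single independent variable and only the two
# columns expected by `data`: classes first, points second
industry_points <- scorecard[
  scorecard$variable == "industry",
  c("class", "points")
]
```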
Value
A list with two elements:
- classes
Data frame with the CAI computed for each class
- net
Numeric value that represents the index for the entire independent variable; i.e., the net of the CAI values in classes
Details
The characteristic analysis index details shifts in the population within each independent variable.
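The CAI for each level \(c\) is derived via the formula
$$(P(Y = c) - P(X = c)) \cdot t_{c}$$
and the net CAI for the variable is the sum of these values over all levels:
$$\sum_{c}(P(Y = c) - P(X = c)) \cdot t_{c}$$
where \(X\) is the development sample, \(Y\) is the monitoring sample, and \(t_{c}\) is the number of scorecard points assigned to level \(c\).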
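As a rough illustration of the formula, the calculation can be sketched in base R. This is not the package's implementation: manual_cai is a hypothetical helper, and it assumes every value in x and y appears among the classes in data. The sample counts are chosen to match the proportions printed in the Examples output.

```r
# Hypothetical helper (an illustrative sketch, not the package's code)
manual_cai <- function(data, x, y) {
  classes <- as.character(data[[1]])
  points <- data[[2]]
  # Share of each class in the development (x) and monitoring (y) samples
  prop_development <- as.numeric(table(factor(x, levels = classes))) / length(x)
  prop_monitoring <- as.numeric(table(factor(y, levels = classes))) / length(y)
  # Per-class term: (P(Y = c) - P(X = c)) * t_c
  cai <- (prop_monitoring - prop_development) * points
  list(
    classes = data.frame(
      class = classes, prop_development, prop_monitoring, points, cai
    ),
    net = sum(cai)
  )
}

toy_points <- data.frame(
  class = c("dairy", "fruit", "grain", "poultry"),
  points = c(73, 65, 87, 97)
)
# Counts reproducing the proportions shown in the Examples output
x <- rep(toy_points$class, times = c(245, 198, 261, 296))
y <- rep(toy_points$class, times = c(44, 30, 13, 113))

res <- manual_cai(toy_points, x, y)
round(res$net, 3)
```

With these counts the per-class values are -1.825, -3.120, -17.052 and 26.093, and the net comes to 4.096, agreeing with the Examples output.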
References
Siddiqi, Naeem (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. 2nd ed., Wiley, pp. 284-287.
Examples
set.seed(123)
industry_classes <- c("dairy", "fruit", "grain", "poultry")
# Simulate development data
development_values <- sample(
  x = industry_classes,
  size = 1000,
  replace = TRUE,
  prob = c(0.25, 0.20, 0.25, 0.30)
)
# Simulate monitoring data
monitoring_values <- sample(
  x = industry_classes,
  size = 200,
  replace = TRUE,
  prob = c(0.20, 0.15, 0.10, 0.55)
)
# Points assigned to industry classes
characteristic_points <- tibble::tribble(
  ~class, ~points,
  "dairy", 73,
  "fruit", 65,
  "grain", 87,
  "poultry", 97
)
# Calculate characteristic analysis index for the "industry" independent
# variable
cai(
  data = characteristic_points,
  x = development_values,
  y = monitoring_values
)
#> • Using "class" as the independent variable classes
#> • Using "points" as the scorecard points
#> $classes
#>     class prop_development prop_monitoring points     cai
#> 1   dairy            0.245           0.220     73  -1.825
#> 2   fruit            0.198           0.150     65  -3.120
#> 3   grain            0.261           0.065     87 -17.052
#> 4 poultry            0.296           0.565     97  26.093
#>
#> $net
#> [1] 4.096
#>