Calculate the Population Stability Index
psi() computes the population stability index (PSI) based on the score range bin assigned to each observation in the development and monitoring samples.
Arguments
- x
Factor vector representing the score range bin assigned to each observation in the development sample
- y
Factor vector representing the score range bin assigned to each observation in the monitoring sample
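Both arguments expect binned scores, not raw scores. The following is a minimal sketch of how such factors might be built with base R's `cut()`. It assumes raw numeric scores and assumes the bin edges are fixed once and reused unchanged for both samples. The scores and breaks below are hypothetical.

```r
# Hypothetical raw scores for the development and monitoring samples
dev_scores <- c(188, 200, 215, 216, 221, 225, 230, 237)
mon_scores <- c(190, 219, 226, 228, 236)

# Hypothetical bin edges, reused for both samples so the levels match
breaks <- c(188, 215, 221, 227, 237)

# include.lowest = TRUE closes the first interval, giving "[188,215]";
# ordered_result = TRUE returns an ordered factor, as in the example below
dev_bins <- cut(dev_scores, breaks = breaks, include.lowest = TRUE,
                ordered_result = TRUE)
mon_bins <- cut(mon_scores, breaks = breaks, include.lowest = TRUE,
                ordered_result = TRUE)

levels(dev_bins)
#> [1] "[188,215]" "(215,221]" "(221,227]" "(227,237]"
```

Monitoring scores that fall outside the chosen edges become `NA` under `cut()`, so the outer edges need to cover the full range of scores expected in monitoring.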
Details
The population stability index compares how observations are distributed across score range bins in the development data with how they are distributed in the monitoring data. It helps answer the question, “Does our recent applicant pool continue to look like the applicant pool that the model was trained on?”
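The PSI is computed as
$$\mathrm{PSI} = \sum_{b} \big(P(Y = b) - P(X = b)\big) \ln\!\left(\frac{P(Y = b)}{P(X = b)}\right)$$
where the sum runs over all score range bins \(b\), \(X\) is the development sample, \(Y\) is the monitoring sample, and \(P(X = b)\) and \(P(Y = b)\) are the proportions of each sample that fall in bin \(b\).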
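The formula translates directly into a few lines of base R. The following is a minimal sketch, not the package's `psi()` implementation. The name `psi_sketch()` is hypothetical, and the sketch assumes `x` and `y` are factors with identical levels and no missing values.

```r
# Hypothetical helper: a sketch of the PSI formula, not the package's psi()
psi_sketch <- function(x, y) {
  # Both samples must use the same bin levels so the proportions line up
  stopifnot(is.factor(x), is.factor(y), identical(levels(x), levels(y)))
  # table() counts every level, including empty ones; prop.table() rescales to proportions
  p_x <- as.numeric(prop.table(table(x)))  # P(X = b), development sample
  p_y <- as.numeric(prop.table(table(y)))  # P(Y = b), monitoring sample
  # Sum over bins; an empty bin in either sample yields Inf or NaN
  sum((p_y - p_x) * log(p_y / p_x))
}

# Toy check: development splits 50/50, monitoring splits 25/75
bins <- c("low", "high")
x <- factor(c("low", "low", "high", "high"), levels = bins)
y <- factor(c("low", "high", "high", "high"), levels = bins)
psi_sketch(x, y)  # equals 0.25 * log(3), about 0.2747
```

Each term is non-negative, because \(P(Y = b) - P(X = b)\) and \(\ln(P(Y = b) / P(X = b))\) always share the same sign. The PSI is therefore zero only when the two distributions match bin for bin.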
The population stability index can be interpreted as follows:
| PSI Value | Interpretation |
|---|---|
| < 0.10 | No significant shift in population |
| [0.10, 0.25) | Minor shift in population |
| ≥ 0.25 | Significant shift in population |
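The interpretation bands can also be applied programmatically. The following is a minimal sketch using base R's `cut()`, with a hypothetical helper named `interpret_psi()`. It assumes that a value of exactly 0.25 falls in the top band, which is consistent with the half-open interval [0.10, 0.25).

```r
# Hypothetical helper mapping PSI values to the interpretation bands
interpret_psi <- function(psi) {
  # right = FALSE makes the intervals left-closed:
  # [-Inf, 0.10), [0.10, 0.25), [0.25, Inf)
  cut(
    psi,
    breaks = c(-Inf, 0.10, 0.25, Inf),
    labels = c("No significant shift", "Minor shift", "Significant shift"),
    right = FALSE
  )
}

interpret_psi(c(0.05, 0.10, 0.25, 0.3404592))
```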
References
Siddiqi, Naeem (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. 2nd ed., Wiley, p. 369.
Examples
set.seed(123)
score_ranges <- c("[188,215]", "(215,221]", "(221,227]", "(227,237]")
# Simulate development data
development <- sample(
  x = score_ranges,
  size = 1000,
  replace = TRUE,
  prob = c(0.25, 0.25, 0.25, 0.25)
) |>
  factor(
    levels = score_ranges,
    ordered = TRUE
  )
# Simulate monitoring data
monitoring <- sample(
  x = score_ranges,
  size = 200,
  replace = TRUE,
  prob = c(0.10, 0.20, 0.30, 0.40)
) |>
  factor(
    levels = score_ranges,
    ordered = TRUE
  )
# Compute the population stability index
psi(
  x = development,
  y = monitoring
)
#> [1] 0.3404592
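Because the example draws both samples from known bin probabilities, the PSI formula can also be evaluated on those probabilities directly. This sketch is for intuition only and does not call `psi()`. It shows the value that the sample estimate approximates.

```r
# Bin probabilities used to simulate the two samples in the example above
p_dev <- c(0.25, 0.25, 0.25, 0.25)
p_mon <- c(0.10, 0.20, 0.30, 0.40)

# Apply the PSI formula directly to the generating probabilities
psi_population <- sum((p_mon - p_dev) * log(p_mon / p_dev))
round(psi_population, 4)
#> [1] 0.2282
```

The population-level value is about 0.228, which falls in the minor-shift band. The sample estimate above is about 0.340. The gap comes from sampling variability in the finite samples, particularly the 200 monitoring observations, so a PSI computed on a small monitoring sample can land in a different interpretation band than the underlying shift would suggest.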