
psi() computes the population stability index (PSI) from the score range bins assigned to each observation in the development and monitoring samples.

Usage

psi(x, y)

Arguments

x

Factor vector representing the score range bin assigned to each observation in the development sample

y

Factor vector representing the score range bin assigned to each observation in the monitoring sample

Value

A numeric value representing the population stability index

Details

The population stability index compares the sample distributions between the development data and the monitoring data across score ranges. It helps answer the question, “Does our recent applicant pool continue to look like the applicant pool that the model was trained on?”

The PSI is derived via the formula

$$\mathrm{PSI} = \sum_{b} \big(P(Y = b) - P(X = b)\big) \, \ln\Big(\frac{P(Y = b)}{P(X = b)}\Big)$$

where the sum runs over the score range bins \(b\), \(X\) is the development sample, and \(Y\) is the monitoring sample.
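The formula can be traced by hand with a few lines of base R. The sketch below is only an illustration of the arithmetic, not the package's implementation of psi(): the helper name psi_sketch and the three-bin toy data are assumptions made up for this example, and it assumes both factors share the same levels in the same order.

```r
# Hypothetical helper for illustration; this is NOT the package's psi().
psi_sketch <- function(x, y) {
  # Assumption: both inputs are factors with identical levels (bins).
  stopifnot(is.factor(x), is.factor(y), identical(levels(x), levels(y)))

  # Share of observations per bin: P(X = b) and P(Y = b).
  # table() on a factor keeps every level, so the vectors line up by bin.
  p_x <- as.numeric(prop.table(table(x)))
  p_y <- as.numeric(prop.table(table(y)))

  # Per-bin contribution (P(Y = b) - P(X = b)) * ln(P(Y = b) / P(X = b)),
  # summed over all bins.
  sum((p_y - p_x) * log(p_y / p_x))
}

# Toy data: three made-up bins with known proportions.
bins <- c("low", "mid", "high")
dev <- factor(rep(bins, times = c(50, 30, 20)), levels = bins)
mon <- factor(rep(bins, times = c(40, 30, 30)), levels = bins)

# Contributions: 0.1 * ln(1.25) + 0 + 0.1 * ln(1.5), roughly 0.063.
psi_sketch(dev, mon)
```

Each term is non-negative because the difference and the log ratio always share a sign, so identical distributions give exactly 0. In this sketch, a bin that is empty in one sample makes its term infinite; how psi() itself treats empty bins is not stated on this page.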

The population stability index can be interpreted as follows:

PSI Value       Interpretation
< 0.10          No significant shift in population
[0.10, 0.25)    Minor shift in population
≥ 0.25          Significant shift in population
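These thresholds can be applied programmatically with base R's cut(). The following is a minimal sketch: interpret_psi is a hypothetical helper, not a function exported by the package, and it assumes that a PSI of exactly 0.25 is treated as a significant shift.

```r
# Hypothetical helper for illustration; not part of the package.
interpret_psi <- function(value) {
  cut(
    value,
    breaks = c(-Inf, 0.10, 0.25, Inf),
    labels = c(
      "No significant shift in population",
      "Minor shift in population",
      "Significant shift in population"
    ),
    # right = FALSE makes the intervals left-closed: [0.10, 0.25), [0.25, Inf).
    # Assumption: exactly 0.25 falls in the "significant" band.
    right = FALSE
  )
}

# 0.3404592 is the PSI reported in the Examples section of this page.
interpret_psi(c(0.05, 0.10, 0.3404592))
```

The result is a factor with one label per input value, which makes it convenient to tabulate PSI bands across several monitoring periods.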

References

Siddiqi, Naeem (2017). Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. 2nd ed. Wiley, p. 369.

Examples

set.seed(123)
score_ranges <- c("[188,215]", "(215,221]", "(221,227]", "(227,237]")

# Simulate development data
development <- sample(
  x = score_ranges,
  size = 1000,
  replace = TRUE,
  prob = c(0.25, 0.25, 0.25, 0.25)
) |>
  factor(
    levels = score_ranges,
    ordered = TRUE
  )

# Simulate monitoring data
monitoring <- sample(
  x = score_ranges,
  size = 200,
  replace = TRUE,
  prob = c(0.10, 0.20, 0.30, 0.40)
) |>
  factor(
    levels = score_ranges,
    ordered = TRUE
  )

# Compute the population stability index
psi(
  x = development,
  y = monitoring
)
#> [1] 0.3404592