Perform Quantile Binning of a Numeric Vector
bin_quantile() takes a vector of numeric values and converts them into
intervals using the quantile() and cut() functions in R, such that the data
is distributed as evenly as possible across the unique intervals.
Binning continuous independent variables into categorical representations
is an important pre-processing step for building credit scorecards, and
needs to take place prior to fitting a logistic regression model.
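The general technique described above can be sketched in base R. This is a minimal illustration only, not the package's implementation: sketch_bin_quantile() is a hypothetical helper, and it ignores the rounding, capping, and missing-value options that bin_quantile() exposes.

```r
# Hypothetical helper (an assumption for illustration, not bin_quantile() itself)
sketch_bin_quantile <- function(x, n_bins = 4L) {
  # Evenly spaced probabilities: 0, 1/n_bins, ..., 1
  probs <- seq(0, 1, length.out = n_bins + 1L)
  # Quantiles of x become the interval boundaries; drop duplicates caused by ties
  breaks <- unique(stats::quantile(x, probs = probs, names = FALSE))
  # Open the outer boundaries so every finite value falls into some interval
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(42)
x <- runif(1000)
bins <- sketch_bin_quantile(x, n_bins = 4L)

# Each interval should hold roughly the same number of observations
table(bins)
```

With heavily tied data (such as the rounded iris measurements used in the examples), the counts per interval can differ noticeably, because many observations share the same boundary value.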
Usage
bin_quantile(
  x,
  n_bins = 4L,
  min_value = -Inf,
  max_value = Inf,
  decimals = 0L,
  digits = 10L,
  na.rm = FALSE
)
Arguments
- x
A numeric vector to convert to a categorical factor vector
- n_bins
(Integer) The number of "bins" (unique intervals) to be returned; default is 4
- min_value
(Numeric) The value to floor the lowest interval at; default is -Inf (no floor)
- max_value
(Numeric) The value to ceiling the highest interval at; default is Inf (no ceiling)
- decimals
(Integer) The number of decimals to round the quantile values to before creating intervals; default is 0 (round to the nearest integer)
- digits
(Integer) Number of digits to display in the console for each interval; this helps avoid scientific notation (default is 10)
- na.rm
(Logical) Should NA and NaN (missing) values be removed from x before calculating the quantile bins? Default is FALSE.
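To make the interplay of decimals, min_value, and max_value more concrete, here is a rough sketch of how interval boundaries could be derived from them. The helper sketch_breaks() is hypothetical and only assumes the behaviour the argument descriptions suggest; the real function may differ in its details.

```r
# Hypothetical helper (an assumption for illustration, not part of the package)
sketch_breaks <- function(x, n_bins = 4L, min_value = -Inf, max_value = Inf,
                          decimals = 0L) {
  probs <- seq(0, 1, length.out = n_bins + 1L)
  q <- stats::quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
  # Keep only the interior cut points and round them to `decimals` places
  interior <- round(q[-c(1L, length(q))], digits = decimals)
  # The outermost boundaries come from the floor and ceiling values
  unique(c(min_value, interior, max_value))
}

x <- c(12.4, 18.9, 23.1, 27.6, 31.2, 36.8, 41.5, 47.3)
breaks <- sketch_breaks(x, n_bins = 2L, min_value = 0, max_value = 100)

# dig.lab plays the role that `digits` appears to play: it controls label formatting
binned <- cut(x, breaks = breaks, include.lowest = TRUE, dig.lab = 10)

# A value beyond the ceiling has no interval, so cut() returns NA for it
outside <- cut(150, breaks = breaks, include.lowest = TRUE, dig.lab = 10)
```

In this sketch, finite min_value and max_value settings leave out-of-range inputs without an interval, which is the situation the warning in the last example below refers to.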
Value
A factor vector with the same length as x, representing the interval
that corresponds to each value in x.
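The following base R sketch shows why missing values matter for the length guarantee. It relies only on the documented behaviour of quantile() and cut(), not on the internals of bin_quantile(), so treat it as an illustration of the underlying functions rather than of the package itself.

```r
x <- c(2.5, 3.1, NA, 3.8)

# quantile() refuses missing values unless na.rm = TRUE
q_err <- tryCatch(
  stats::quantile(x, probs = 0.5),
  error = function(e) "error"
)

# With na.rm = TRUE the median is computed from the non-missing values only
q_ok <- stats::quantile(x, probs = 0.5, na.rm = TRUE, names = FALSE)

# cut() keeps one output element per input element; NA inputs stay NA
binned <- cut(x, breaks = c(-Inf, q_ok, Inf), include.lowest = TRUE)

length(binned) == length(x)
is.na(binned)
```

Removing missing values for the quantile calculation therefore does not shorten the result: the NA positions are still present in the output factor.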
Examples
# Use the function's defaults
bin_quantile(
  x = iris$Sepal.Width * 100
) |>
  levels()
#> [1] "[-Inf,280]" "(280,300]" "(300,330]" "(330, Inf]"
# Specify the number of bins, and cap the lowest and highest intervals
# with `min_value` and `max_value`
bin_quantile(
  x = iris$Sepal.Width * 100,
  n_bins = 3,
  min_value = -5,
  max_value = 9999
) |>
  levels()
#> [1] "[-5,290]" "(290,320]" "(320,9999]"
# Handle small `x` values by rounding the interval boundaries to 2 decimals
# instead of to the nearest whole number
bin_quantile(
  x = iris$Sepal.Width,
  decimals = 2
) |>
  levels()
#> [1] "[-Inf,2.8]" "(2.8,3]" "(3,3.3]" "(3.3, Inf]"
# Remove missing values in `x` before the binning calculation
bin_quantile(
  x = c(iris$Sepal.Width, NA),
  decimals = 2,
  na.rm = TRUE
) |>
  levels()
#> Warning: `bin_quantile()` produced some NA values for input values that were outside of the defined `min_value` or `max_value` specifications
#> [1] "[-Inf,2.8]" "(2.8,3]" "(3,3.3]" "(3.3, Inf]"