LikeLike

]]>I think I found the answer to where this blasted stat was derived! It’s in the first few pages of that information theory book you linked to which is available on Google, in the section on Divergence – this is what would apparently now be called symmetric kullback-leibler divergence according to wikipedia.

Lets say we’re doing WOE/IV on age binning and I will summarize my possibly incorrect understanding. We have our distributions b(age) and g(age) which give the probability that a sample falls in a given age bucket given that they are bad or good respectively.

We start with the Bayes Rule that tells us that LogOdds(Bad:Good given Age) – LogOdds(Bad:Good) = WOE(bad), i.e. your WOE(bad)=log(b(age)/g(age)) gives you a measure of how much you are increasing or decreasing your log odds that the sample is bad when you take into account age vs. when you don’t. WOE(good) is just negative WOE(bad).

The KL(Bad, Good) divergence is the expected WOE(bad), i.e. average change in log-odds by taking into account age, for samples that are actually bad. It’s always positive because on average bad samples are going to be in age buckets where bad is concentrated more than good is concentrated i.e. they end up in buckets with positive WOE, This is intuitively satisfying as well because on average adding data such as age vs. not adding should only ever increase the log-odds in favour of bad for samples that are actually bad, suggesting positive average WOE again. So KL(Bad, Good) is used as a measure of information gained (discriminative power gained) by taking into account age for bad samples.

Symmetric KL is KL(Bad, Good) + KL(Good, Bad), it’s the information gained for samples which are in fact bad plus information gained for samples which are in fact good. It can be written in the form sum((g(age)-b(age))log(g(age)/b(age))) i.e. it is the IV stat.

LikeLike

]]>LikeLike

]]>