Archive for January, 2016
When I’m not writing SQL, it’s a good bet that I’m building a scorecard – in consumer finance, they’re ubiquitous: customer accept/reject, fraud detection, marketing, collections, etc.
In ideal circumstances, the code that performs the scoring of the data is completely within my control (usually written in T-SQL). However, I’ve recently built a scorecard that needed to be implemented in a third-party system – and this particular system just spits out a score, it can’t currently tell you the contribution of each variable. But why might this be a problem? I mean, I specified the scorecard in the first place, and I can get at the raw data, so…?
As is typical, the issue arises when something goes wrong: data is run through the scoring system, but the score isn’t as expected. How can we help ourselves with debugging?
Here’s an example scorecard, which for the purposes of demonstration, I’ve written in SQL:
SELECT Score = 497 -- Customer age in years + CASE WHEN Age < 20 THEN -10 WHEN Age < 40 THEN -2 WHEN Age < 60 THEN 2 WHEN Age < 80 THEN 10 END -- BLR = Balance-to-Limit Ratio, as % + CASE WHEN BLR < 50.0 THEN 30 WHEN BLR < 90.0 THEN 10 WHEN BLR < 100.0 THEN 2 WHEN BLR >= 100.0 THEN -45 END -- (A real scorecard would have more variables.) FROM dbo.Data
Given just these two variables, you can see that there are two ways in which the overall score could be 509: either the customer is aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%; or they are aged between 60 and 79, with a BLR between 90.0% and 99.9%. But which is it? As it stands, we can’t know from the score alone.
However, if we make some minor changes, it becomes possible. Take a look at the following tweaks to the CASE statements:
CASE WHEN Age < 20 THEN -10.0001 WHEN Age < 40 THEN -2.0002 WHEN Age < 60 THEN 2.0004 WHEN Age < 80 THEN 10.0008 END CASE WHEN BLR < 50.0 THEN 30.0016 WHEN BLR < 90.0 THEN 10.0032 WHEN BLR < 100.0 THEN 2.0064 WHEN BLR >= 100.0 THEN -45.0128 END
To each possible score, I’ve added a small decimal that is a power of 2, divided by some large enough power of 10. Now we can work out from the overall score alone, exactly how it was built up.
A customer aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%, now has a score of 509.0036; whereas a customer aged between 60 and 79, with a BLR between 90.0% and 99.9% now has a score of 509.0072. And there is only one way in each case that the score could have been achieved.
Fairly obviously, you should round the score before making the Pass/Refer/Fail decision. But if it’s not possible, those extra digits after the decimal point are only going to have a tiny effect on the predictive power of the scorecard.
(Of course, the above only works if your scoring system can cope with decimal scores. If it can’t, you’re out of luck.)
One final thing: I’ve recently seen some scorecard errors that were due the scorecard not specifying scores for all possible values of a variable. The commonly accepted convention is that if a value isn’t present (usually blank or NULL), then it contributes zero – but if you aren’t aware of all the possible values, or the code is wrong, then you’ll have problems. If it’s under my control, I make sure all possible values are covered (even if they’d normally score zero), and add a catch-all for each variable, e.g.:
CASE WHEN Age >= 18 AND Age < 20 THEN -10.0001 WHEN Age >= 20 AND Age < 40 THEN -2.0002 WHEN Age >= 40 AND Age < 50 THEN 0.0004 WHEN Age >= 50 AND Age < 60 THEN 2.0008 WHEN Age >= 60 AND Age < 80 THEN 10.0016 ELSE -99999.99 END
, and then Decline or Refer all scores less than zero.
Do you have any scorecard tricks that have made your life easier? Please tell us about them in the comments below.