Archive for January, 2016
Identifying individual variable contributions to a scorecard
Posted by sqlpete in scorecards on January 24, 2016
When I’m not writing SQL, it’s a good bet that I’m building a scorecard – in consumer finance, they’re ubiquitous: customer accept/reject, fraud detection, marketing, collections, etc.
In ideal circumstances, the code that performs the scoring of the data is completely within my control (usually written in T-SQL). However, I’ve recently built a scorecard that needed to be implemented in a third-party system – and this particular system just spits out a score, it can’t currently tell you the contribution of each variable. But why might this be a problem? I mean, I specified the scorecard in the first place, and I can get at the raw data, so…?
As is typical, the issue arises when something goes wrong: data is run through the scoring system, but the score isn’t as expected. How can we help ourselves with debugging?
Here’s an example scorecard, which for the purposes of demonstration, I’ve written in SQL:
SELECT
Score = 497
-- Customer age in years
+ CASE
WHEN Age < 20 THEN -10
WHEN Age < 40 THEN -2
WHEN Age < 60 THEN 2
WHEN Age < 80 THEN 10
END
-- BLR = Balance-to-Limit Ratio, as %
+ CASE
WHEN BLR < 50.0 THEN 30
WHEN BLR < 90.0 THEN 10
WHEN BLR < 100.0 THEN 2
WHEN BLR >= 100.0 THEN -45
END
-- (A real scorecard would have more variables.)
FROM dbo.Data
Given just these two variables, you can see that there are two ways in which the overall score could be 509: either the customer is aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%; or they are aged between 60 and 79, with a BLR between 90.0% and 99.9%. But which is it? As it stands, we can’t know from the score alone.
However, if we make some minor changes, it becomes possible. Take a look at the following tweaks to the CASE statements:
CASE
WHEN Age < 20 THEN -10.0001
WHEN Age < 40 THEN -2.0002
WHEN Age < 60 THEN 2.0004
WHEN Age < 80 THEN 10.0008
END
CASE
WHEN BLR < 50.0 THEN 30.0016
WHEN BLR < 90.0 THEN 10.0032
WHEN BLR < 100.0 THEN 2.0064
WHEN BLR >= 100.0 THEN -45.0128
END
To each possible score, I’ve added a small decimal that is a power of 2, divided by some large enough power of 10. Now we can work out from the overall score alone, exactly how it was built up.
A customer aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%, now has a score of 509.0036; whereas a customer aged between 60 and 79, with a BLR between 90.0% and 99.9% now has a score of 509.0072. And there is only one way in each case that the score could have been achieved.
Fairly obviously, you should round the score before making the Pass/Refer/Fail decision. But if it’s not possible, those extra digits after the decimal point are only going to have a tiny effect on the predictive power of the scorecard.
(Of course, the above only works if your scoring system can cope with decimal scores. If it can’t, you’re out of luck.)
One final thing: I’ve recently seen some scorecard errors that were due the scorecard not specifying scores for all possible values of a variable. The commonly accepted convention is that if a value isn’t present (usually blank or NULL), then it contributes zero – but if you aren’t aware of all the possible values, or the code is wrong, then you’ll have problems. If it’s under my control, I make sure all possible values are covered (even if they’d normally score zero), and add a catch-all for each variable, e.g.:
CASE
WHEN Age >= 18 AND Age < 20 THEN -10.0001
WHEN Age >= 20 AND Age < 40 THEN -2.0002
WHEN Age >= 40 AND Age < 50 THEN 0.0004
WHEN Age >= 50 AND Age < 60 THEN 2.0008
WHEN Age >= 60 AND Age < 80 THEN 10.0016
ELSE -99999.99
END
, and then Decline or Refer all scores less than zero.
Do you have any scorecard tricks that have made your life easier? Please tell us about them in the comments below.
Recent Comments