Archive for January, 2016

Identifying individual variable contributions to a scorecard

When I’m not writing SQL, it’s a good bet that I’m building a scorecard – in consumer finance, they’re ubiquitous: customer accept/reject, fraud detection, marketing, collections, etc.

In ideal circumstances, the code that performs the scoring of the data is completely within my control (usually written in T-SQL). However, I’ve recently built a scorecard that needed to be implemented in a third-party system – and this particular system just spits out a score, it can’t currently tell you the contribution of each variable. But why might this be a problem? I mean, I specified the scorecard in the first place, and I can get at the raw data, so…?

As is typical, the issue arises when something goes wrong: data is run through the scoring system, but the score isn’t as expected. How can we help ourselves with debugging?

Here’s an example scorecard, which for the purposes of demonstration, I’ve written in SQL:

SELECT
	Score = 497
		-- Customer age in years
		+ CASE
			WHEN Age < 20 THEN -10
			WHEN Age < 40 THEN  -2
			WHEN Age < 60 THEN   2
			WHEN Age < 80 THEN  10
		END
		-- BLR = Balance-to-Limit Ratio, as %
		+ CASE
			WHEN BLR <   50.0 THEN  30
			WHEN BLR <   90.0 THEN  10
			WHEN BLR <  100.0 THEN   2
			WHEN BLR >= 100.0 THEN -45
		END
		-- (A real scorecard would have more variables.)
	FROM dbo.Data

Given just these two variables, you can see that there are two ways in which the overall score could be 509: either the customer is aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%; or they are aged between 60 and 79, with a BLR between 90.0% and 99.9%. But which is it? As it stands, we can’t know from the score alone.

However, if we make some minor changes, it becomes possible. Take a look at the following tweaks to the CASE statements:

CASE
	WHEN Age < 20 THEN -10.0001
	WHEN Age < 40 THEN  -2.0002
	WHEN Age < 60 THEN   2.0004
	WHEN Age < 80 THEN  10.0008
END

CASE
	WHEN BLR <   50.0 THEN  30.0016
	WHEN BLR <   90.0 THEN  10.0032
	WHEN BLR <  100.0 THEN   2.0064
	WHEN BLR >= 100.0 THEN -45.0128
END

To each possible score, I’ve added a small decimal that is a power of 2, divided by some large enough power of 10. Now we can work out from the overall score alone, exactly how it was built up.

A customer aged between 40 and 59, with a balance-limit ratio (BLR) between 50.0% and 89.9%, now has a score of 509.0036; whereas a customer aged between 60 and 79, with a BLR between 90.0% and 99.9% now has a score of 509.0072. And there is only one way in each case that the score could have been achieved.

Fairly obviously, you should round the score before making the Pass/Refer/Fail decision. But if it’s not possible, those extra digits after the decimal point are only going to have a tiny effect on the predictive power of the scorecard.

(Of course, the above only works if your scoring system can cope with decimal scores. If it can’t, you’re out of luck.)

One final thing: I’ve recently seen some scorecard errors that were due the scorecard not specifying scores for all possible values of a variable. The commonly accepted convention is that if a value isn’t present (usually blank or NULL), then it contributes zero – but if you aren’t aware of all the possible values, or the code is wrong, then you’ll have problems. If it’s under my control, I make sure all possible values are covered (even if they’d normally score zero), and add a catch-all for each variable, e.g.:

CASE
	WHEN Age >= 18 AND Age < 20 THEN -10.0001
	WHEN Age >= 20 AND Age < 40 THEN  -2.0002
	WHEN Age >= 40 AND Age < 50 THEN   0.0004
	WHEN Age >= 50 AND Age < 60 THEN   2.0008
	WHEN Age >= 60 AND Age < 80 THEN  10.0016
	ELSE -99999.99
END

, and then Decline or Refer all scores less than zero.

Do you have any scorecard tricks that have made your life easier? Please tell us about them in the comments below.

, ,

Leave a comment