Posts Tagged Visual Studio

Inverting a matrix in SQL Server

Apologies in advance, this post is a little light on certain details. This is because (a) I don’t have time to write a big, exhaustive piece, that could easily be a 30-page document, and (b) the actual creation of the SQLCLR function in Visual Studio was more akin to witchcraft than coding (I’ll attempt to explain later on…) Also, there’s some waffle around credit risk, you can safely ignore it! 😉

My credit risk use-case

In credit risk, we always want to know what’s going to happen to the loans in our loan book. Will they eventually pay us back, or will they default? Clearly, this is very important information, and you can’t run a successful lending business without it.

Markov chains

One of the ways we predict what’s going to happen, is using a Markov Chain model. We generate a transition matrix from our data, and using matrix arithmetic, repeatedly multiply the matrix by itself to calculate what happens in the future. The Markov property is that the future states of a process only depend on the current state, not the past. (This is also referred to as ‘memoryless’.)

Here’s an example transition matrix that shows the statuses of loans last month, compared to this month.


(An old FD used to call this the ‘Was/Is’ matrix)

The numbers 0 to 6 refer to how many payments behind the customer is, D is default, and S is settled. For example, of the loans that were three payments down last month, 35% stayed three payments down; 4% returned to zero (‘up to date’), and 7% settled their account. None of them went straight to default. Note that the percentages add up to 100 row-wise, but not column-wise.

D (default) and S (settled) are called absorbing states — once you enter them, that’s it, you can’t come back. (In real life, this isn’t true, but we can ignore it here.)

Because of these absorbing states, our Markov chain is a special case, an Absorbing Markov Chain. Using the above transition matrix, we can calculate the long-term probabilities of settlement (repayment) or default for all our loans, given their current position.

To get these probabilities, we need to calculate (I-Q)^{-1}, where I is the identity matrix, Q is our transition matrix, and the ‘-1’ superscript means inverse. Calculating this inverse matrix, and multiplying by our initial distribution of statuses, gives us:

E.g. an up-to-date loan currently has a 6% chance of defaulting. A loan that is 6 payments down only has a 25% chance of being repaid.

Inverting a matrix in Excel

Okay, couldn’t we just use Excel to do these calculations? Excel has an inbuilt function MINVERSE, see here for an example.

In the past, I’d done these calculations for a different loan book, using Excel, and the modelling had worked and told us what we needed to know. For the loan book I was working on now, I wanted to automate the calculations, over several different partitions of the data — for instance, if you have multiple products, then typically you’d need to do the calculations per product. Hence, the requirement to do everything within SQL Server.

Inverting a matrix from within SQL Server

I chose the title of this post carefully: we’re inverting a matrix using SQL Server, but the code isn’t written in SQL (T-SQL) itself. I expect someone somewhere has converted pre-existing Fortran or C/C++ code into pure T-SQL. (The algorithm is fairly straightforward, so I doubt it’s that hard to do the conversion.)

The two methods I’m proposing here both involve installing something to SQL Server itself, which admittedly may not be possible for everyone.

  1. Executing R code from within SQL Server, an example of running R code is given here. There are several libraries and functions dedicated to inverting matrices; solve from the base installation and inv from the matlib package are two of the most popular.
  2. Creating a SQLCLR function using C#

It’s the second one we’ll look at for the remainder of this post.

Creating an assembly using SQLCLR

Below is the exact (not terribly well-written) C# code that I used to create an assembly in Visual Studio, that I could then publish to my instance of SQL Server, and then use the function MatrixInvert as I would any other SQL function call.

using System;
using System.Data;
using System.Linq;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Text;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString MatrixInvert(SqlString matrixAsString)
    {
        string[][] strMatrix = (matrixAsString.ToString()).Split(';').Select(x => x.Split(',')).ToArray();

        int matrixX = strMatrix.Length;
        for (int i = 0; i < matrixX; i++)
        {
            if (strMatrix[i].Length != matrixX)
            {
                throw new Exception("Matrix not square NxN");
            }
        }

        int matrixSize = matrixX;

        double[][] matrix = new double[matrixSize][];

        for (int i = 0; i < matrixSize; i++)
        {
            matrix[i] = new double[matrixSize];
        }

        for (int i = 0; i < matrixSize; i++)
        {
            for (int j = 0; j < matrixSize; j++)
                matrix[i][j] = Convert.ToDouble(strMatrix[i][j]);
        }

        double[][] inverseMatrix = MatrixInverse(matrix);
        double[][] identityMatrix = MatrixIdentity(matrixSize);
        double[][] product = MatrixProduct(matrix, inverseMatrix);

        String result = null;

        Boolean matrixAreEqual = MatrixAreEqual(product, identityMatrix, 1.0E-8);

        if (matrixAreEqual == true)
        {

            for (int i = 0; i < matrixSize; i++)
            {
                for (int j = 0; j < matrixSize; j++)
                {
                    result += inverseMatrix[i][j].ToString();
                    if (j < (matrixSize - 1)) result += ',';
                }
                if (i < (matrixSize - 1)) result += ';';
            }
        }
        else
        {
            result = null;
        }
        return (result);

    }

    // here: add code from https://jamesmccaffrey.wordpress.com/2015/03/06/inverting-a-matrix-using-c/

}

As you’ll no doubt spot, the code is incomplete — see the final comment. Because it was convenient, I used the C# code from James D. McCaffrey’s blog post Inverting a Matrix using C#. (I don’t want to reproduce his code here, you’ll have to grab it for yourself – but you literally don’t have to do anything to it, just copy and paste.) If this code hadn’t been available, I was going to adapt the code in my venerable copy of ‘Numerical Recipes in C’ (current website here).

Other points to note:

  • My function MatrixInvert calls the functions MatrixInverse, MatrixIdentity, MatrixProduct and MatrixAreEqual from James D. McCaffrey’s code. My function is not much more than a wrapper around the clever stuff.
  • The figures are passed to the function as a single delimited string (comma / semi-colon), and returned similarly. There’s undoubtedly a better way of doing this, but it works.
  • The code checks that the original matrix multiplied by its inverse equals the identity, to some tolerance level, and returns null if this isn’t the case.
  • There could be more error-checking — there could always be more error-checking!

But how did I get this code into SQL Server? So here come the excuses…

The last time I wrote a SQLCLR (maybe 2 years ago?), I downloaded the latest Visual Studio, created a ‘SQL Server Database Project’, wrote my code, hit ‘Publish’, entered the database details, and it sorta just worked. (Least, that’s how I remember it.) While writing this post, I followed the same path… and it took ages, with a ton of googling throughout. Finally, I got it working again, but it wasn’t a pleasant experience. In the end, I had to manually paste in SQL similar to the following:

and set/unset some configuration values:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'clr strict security', 0;
RECONFIGURE;

EXEC sp_configure 'clr enabled', 1;  
RECONFIGURE;  

I’m sure it wasn’t this hard last time!

At this point, the SQL function dbo.MatrixInvert is available for me to use. To test it out, let’s take an integer matrix that has an inverse that only contains integers; I got my example from the blog post “Random Integer Matrices With Inverses That Are Also Integer Matrices. The code below builds up our input matrix (as a string), calls the function, then parses the string that’s returned.

DECLARE @myMatrix VARCHAR(MAX)

SET @myMatrix = '9,21,23,8,18,7,12,7;
8,13,15,5,10,5,9,6;
0,13,11,2,13,1,3,0;
13,11,16,13,6,11,13,10;
11,21,23,7,17,7,13,8;
10,8,12,9,4,8,10,8;
11,16,19,7,12,7,12,8;
6,12,14,6,10,5,8,5'

DECLARE @invertedMatrix VARCHAR(MAX)
SELECT @invertedMatrix = dbo.MatrixInvert(@myMatrix)
SELECT
	[row]
	,[col] = ROW_NUMBER() OVER
		(PARTITION BY [row] ORDER BY GETDATE())
			,ROUND(CAST(s.[value] AS FLOAT), 2) AS [value]
	FROM (
		SELECT [row] = ROW_NUMBER() OVER (ORDER BY GETDATE())
			,[value] AS rowStr
		FROM STRING_SPLIT(@invertedMatrix, ';')
) x
CROSS APPLY STRING_SPLIT(x.rowStr, ',') s
ORDER BY [row],[col]
GO

The result set has 64 rows, and starts like this:

All good, it matches the inverse from the original blog post!

Of course, you can pass floats into the function, I used this integer matrix because it looked ‘clean’.

One thing I can’t let pass without comment: why the heck have I used ORDER BY GETDATE() in my ROW_NUMBER window function? Because, somewhat unbelievably, SQL Server’s STRING_SPLIT function doesn’t return an ordinal associated with the position, so I’m trying to force it to behave repeatably; see Row numbers with nondeterministic order for more information. In a production setting, I’d probably use my own hand-rolled function to split strings.


An admission of failure

Finally, an admission of failure: not the code, that all worked perfectly, I was able to calculate my inverse matrices and ‘final state’ probabilities. Except… the models didn’t work. I’d started off looking at older data, seeing if I could replicate the state of loans a few years down the line, and I couldn’t get it to work. I split the data into products, into cohorts, by customer attributes, started looking at second and higher order Markov chains (where the ‘present’ doesn’t just depend on the immediate past, but two, three, etc. steps ago), and nothing worked. In the end, I had to admit defeat. A few months later, I was talking to a Financial Director about my lack of success on this particular loan book — and he said he wasn’t surprised, that in his experience, this sort of modelling didn’t work on some shorter-term financial products. That made me feel somewhat better, but I still want to know what the ‘correct’ model was!

Postscript: Several years later, I was chatting with an agent who’d worked in the operations department, who told me that there was a manual component to the status changes (the statuses weren’t the traditional ‘0-6 payments down’) that I wasn’t aware of when I did the original modelling. I can’t put all the blame on this, but it did go some way to explaining the lack of consistency I was seeing in the data, and I should’ve pushed harder to understand it at the time.

, , , , , , , , , ,

Leave a comment