Milestone tables

Here’s a common-enough scenario: I have a fairly complex query, over multiple tables, where one or more of the component tables are large (tens of millions of rows) and with multiple indexes that are almost certainly not optimal for what’s needed. My query relates to a specific and well-defined date-range, but even though there are explicit indexes over the date columns, it still runs too slowly.

Given that tuning queries isn’t really my day job – I just need to get the data out so I can get on with analysing it – I tend to go for the quick win. In most cases, putting ranges on the integer primary keys can dramatically speed up the query. To this end, I keep several ‘milestone’ tables in the database, that look something like this:

StartDate       ID
---------- ----------
2015-01-01   20017996
2015-02-01   20843583
2015-03-01   21529164
2015-04-01   22231934
2015-05-01   22981850
2015-06-01   23723066
 ...          ...

(The primary key on these milestone tables will be a date – for me, it’s always the first of a month, but there’s no reason you couldn’t have weekly or daily tables.)

So now, either I look in the milestone table first and manually adjust the range on the primary key(s) for my query, or I’ll JOIN directly to the milestone table.

To keep the milestone tables updated, there’s a SQL Server job that runs just after midnight on the first of the month to INSERT the latest row. It looks something like this:

INSERT dbo.MyMilestoneTable
(
	StartDate
	,ID
)
	SELECT
			StartDate	= CAST(GETDATE() AS DATE)
			,ID		= MAX(t.MyTableID) 
		FROM dbo.MyTable t
		JOIN dbo.MyMilestoneTable m
			ON m.StartDate = DATEADD(month, -1, CAST(GETDATE() AS DATE))
		WHERE t.MyTableID > m.ID
		AND t.DateCreated < CAST(GETDATE() AS DATE)
GO

And that’s it, that’s all there is to them. There’s also a nice side-effect: it’s pretty trivial to work out how fast data is growing, month-on-month, which can help with resource planning.

I’m acutely aware of the fact that I’m ignoring the root cause of the queries being so slow, but when you need the data now, this approach works.

PS I should ‘fess up here: I’d never put a name to them before writing this post, but “milestone tables” seems to fit.

Advertisements

, , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: