In my first proper post, I want to share with you the number one tip that (a) I wish I’d been told 20 years ago, and (b) I endeavour to tell as many people about as will care to listen. It’s very simply this: put a default create date on every row. In every table. In every database. No exceptions (ok, some exceptions, I’ll come to that).
Why should we do this? Because it’s one of the most fundamental questions you can ask of your data: When was it created? When did that customer sign up? When was that email scheduled? When did we start selling product ABC? But it also gives you information about the opposite: when did something not happen? E.g. the website went down, and no orders came through – how long did the outage last? With a default create date, you can answer that. (Also, it’s insanely useful for debugging, I don’t know how you’d cope without it.)
To add a default create date to an existing table called MyTable, you would run the following command:
ALTER TABLE dbo.MyTable ADD CreatedOn DATETIME NOT NULL CONSTRAINT DF_MyTable_CreatedOn DEFAULT(GETDATE())
If you’re creating a table from scratch using DDL (i.e. typing it into a query window, not using a wizard), you would include the line:
CreatedOn DATETIME NOT NULL CONSTRAINT DF_MyTable_CreatedOn DEFAULT(GETDATE())
Let’s pull that apart:
CreatedOn : Call it what you want: DateCreated, DateInserted, anything that you like (as long as it makes sense).
DATETIME NOT NULL : It’s of type DATETIME, and it can’t be blank (NULL).
CONSTRAINT DF_MyTable_CreatedOn : In SQL Server, defaults are declared as constraints; you don’t have to name your constraints, but if you want people to like you, you should. (I always start my default constraints with DF_.)
DEFAULT(GETDATE()) : GETDATE() is the built-in SQL Server function for returning the current date/time. DEFAULT() means “unless specified otherwise, this is what will be put in the column when you insert a row”.
If you’re ALTER-ing an existing table to add a new column with a default date defined on it, then you have two choices:
- Define the column as NOT NULL-able. Every existing row will get the default value as it stands now. That means every existing row will get the date of when you issued the command. Each new row will get the current date at the point it’s inserted, as you’d expect.
- Define the column as NULL-able. Every existing row will have a NULL, but each new row will get the current date. However, by inserting a row where the CreatedOn is it explicitly set to NULL, you won’t get the default date, you’ll get a NULL.
Exceptions: Nearly every time I thought I didn’t need a default create date, I was wrong. Lookup / type tables? Someone will always come up with another category; just because you can’t think of it now, doesn’t mean someone else won’t in the future. And then you’ll want to know when that new category was added. The only real exceptions I can think of are (a) tables that are fully under your control (e.g. I regularly extract XML into hierarchies of tables; I only care about the create date on the table at the top of the tree), or (b) situations where diskspace is extremely limited – a DATETIME takes up 8 bytes of space, multiply that by a few billion rows, and you may not have room… but in which case, you’ve bigger problems anyway!
What next? Well, I don’t only have default create dates; if it’s appropriate, I’ll also have a default create user. That is, what was the SQL username of the user responsible for inserting a row into the database?
CreatedBy NVARCHAR(128) NOT NULL CONSTRAINT DF_MyTable_CreatedBy DEFAULT(SUSER_SNAME())
, where SUSER_NAME() is a built-in function that “returns the login identification name of the [current] user”.
Also, I like to keep track of when/who last updated a row:
UpdatedOn DATETIME NULL UpdatedBy NVARCHAR(128) NULL
, where these are maintained using a trigger:
CREATE TRIGGER dbo.tr_MyTable_Update ON dbo.MyTable AFTER UPDATE AS BEGIN SET NOCOUNT ON UPDATE mt SET mt.UpdatedOn = GETDATE() ,mt.UpdatedBy = SUSER_SNAME() FROM dbo.MyTable mt JOIN Inserted i ON i.MyTableID = mt.MyTableID END
Of course, this simple trigger isn’t a replacement for real auditing, it’s just the headlines: who last updated the row, and when? It doesn’t tell you what data they updated.
One final point: sometimes I see that the developer has put a CreatedOn column in the table, but left it without a DEFAULT. They plan to set the date from their own application code, and think this is sufficient. There are two problems with this:
- Any rows inserted outside of their code won’t get a date.
- The clock on the webserver (say) may not be sync-ed to the clock on the database server. Potentially, the CreatedOn dates will be offset depending on which webserver is talking to the database. (Note: If you’ve got the capacity, I’m completely supportive of having a second DATETIME column that records the webserver’s timestamp…)
Any thoughts or comments? Please let me know.