XSDs FTW

I’m a big fan of passing data to and from stored procedures (sprocs) as XML, especially XML that represents a complete object, or a list or hierarchy of objects. For a start, XML is perfectly human-readable (if you’re doing it right), and nearly every system and language knows how to work with it, SQL Server / TSQL included. What makes it even better, is being able to validate the XML before you even begin to parse it, using an XSD (XML Schema Definition).

Here’s a complete example you can copy and run:

USE tempdb
GO

-- If the XSD already exists, drop it:

IF EXISTS (
  SELECT xsc.name 
    FROM sys.xml_schema_collections xsc
    WHERE xsc.name='TestSchema'
)
BEGIN
  DROP XML SCHEMA COLLECTION TestSchema
END
GO

-- Create the schema:

CREATE XML SCHEMA COLLECTION TestSchema AS '
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:simpleType name="ST_EmailAddress">
      <xsd:restriction base="xsd:string">
      <xsd:pattern value="[^@]*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9}"/>
      </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="ST_Usage">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="home"/>
      <xsd:enumeration value="work"/>
      <xsd:enumeration value="other"/>
    </xsd:restriction>  
  </xsd:simpleType>

  <xsd:complexType name="CT_EmailAndUsage">
    <xsd:simpleContent>
      <xsd:extension base="ST_EmailAddress">
        <xsd:attribute name="usage" use="required" type="ST_Usage" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:element name="emailList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="email" type="CT_EmailAndUsage" minOccurs="1" maxOccurs="3" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>'
GO

-- Make some dummy data that conforms to the schema above:

DECLARE @testXML AS XML(TestSchema)

SET @testXML = '
<emailList>
  <email usage="home">pete@home.com</email>
  <email usage="work">pete@work.com</email>
  <email usage="other">pete@other.com</email>
</emailList>
'

-- Query it:

SELECT
    id = ROW_NUMBER() OVER (ORDER BY e.i)
    ,EmailAddress = e.i.value('(.)[1]','VARCHAR(255)')
    ,Usage = e.i.value('(@usage)[1]', 'VARCHAR(20)')
  FROM @testXML.nodes('//email') AS e(i)
GO

The result set is:


id   EmailAddress    Usage
---  --------------  ------
1    pete@home.com   home
2    pete@work.com   work
3    pete@other.com  other

(3 row(s) affected)

Now, try messing around with the contents of the @testXML variable, e.g.:

  1. Set usage to a string that’s not ‘home’, ‘work’ or ‘other’
  2. Add a fourth email address
  3. Take the ‘@’ symbol out of an email address
  4. Put in some extra nodes that don’t belong

,then re-run the code. They all fail, because the XML has to conform to the XSD we’ve defined as TestSchema. So, SQL Server automatically rejects any input that fails data validation (e.g. format of email address) or breaks business logic (‘no more than three emails’); if the XML was being passed to a sproc, the call would fail, and no code inside would ever run.

Obviously, you may not want to automatically reject ‘broken’ XML, you’ll probably want to record this fact. That’s fine – your code (sproc) can accept a schema-less XML, and attempt the cast itself; and if it fails, you can respond however you like.

There’s certainly an overhead in learning the language of XSDs, but because it’s a generic technology, there are plenty of online resources, e.g. w3schools. When it comes to transferring complex objects around as data, I don’t know of a better way than using XML and XSD.

Note

Because Microsoft haven’t got round to coding it yet, you can’t query the text value of a node that’s been defined as a type in XSD. That is, I’d ordinarily like to be able to query the email address itself like this:

EmailAddress = e.i.value('(./text())[1]','VARCHAR(255)')

, because directly accessing the node text is faster (presumably because it doesn’t have to do any more parsing). But sadly, it’ll just fail with an error. However, this is unlikely to cause practical problems, it’s just a mild annoyance that’s vastly outweighed by the benefits that come from validated XML.

Advertisements

, ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: