Posts Tagged xml

Passing structured data to stored procedures

As I wrote about here, I like to pass structured data around using XML (where applicable). While I think it’s the most flexible way, it’s understandable that people might not want to learn XML-parsing syntax, if they don’t have to. Another way of passing data into stored procedures (sprocs) is to use user-defined types (UDTs).

Here’s a simple example:

-- If the UDT already exists, delete it:
IF EXISTS (
	SELECT *
		FROM sys.types
		WHERE is_table_type = 1
		AND [name] = 'ListOfDates'
		AND [schema_id] = SCHEMA_ID('dbo')
	)
BEGIN
	DROP TYPE dbo.ListOfDates
END
GO

-- Create our UDT:
CREATE TYPE dbo.ListOfDates AS TABLE
(
	DateTypeID TINYINT NOT NULL
	,[Date] DATE NOT NULL
)
GO

-- Let's test it out:

SET DATEFORMAT YMD

DECLARE @MyListOfDates AS dbo.ListOfDates

INSERT @MyListOfDates(DateTypeID, [Date])
	VALUES(1, '2016Jan01'),(1, '2016Jan02'),(1, '2016Jan03')
	,(2, '2016Oct10'),(2, '2017Jan01')

SELECT * FROM @MyListOfDates
GO

DateTypeID Date
---------- ----------
1          2016-01-01
1          2016-01-02
1          2016-01-03
2          2016-10-10
2          2017-01-01

Hopefully, that looks straightforward; the way we’ve used it here is not dissimilar to a table variable (e.g. DECLARE @MyTable AS TABLE...), but we can’t pass a table variable to a sproc. We’ll create a test sproc and pass our new user-defined type to it:

-- If the sproc already exists, delete it:
IF OBJECT_ID('dbo.usp_MySproc') IS NOT NULL
BEGIN
	DROP PROCEDURE dbo.usp_MySproc
END
GO

-- Create our sproc:
CREATE PROCEDURE dbo.usp_MySproc
(
	@ListOfDates AS dbo.ListOfDates READONLY
)
AS
BEGIN
	
	SELECT
			DateTypeID
			,MIN([Date]) AS MinDate
			,MAX([Date]) AS MaxDate
		FROM @ListOfDates
		GROUP BY DateTypeID
		ORDER BY DateTypeID

END
GO

-- Test it out:
DECLARE @MyListOfDates AS dbo.ListOfDates
INSERT @MyListOfDates(DateTypeID, [Date])
	VALUES(1, '2016Jan01'),(1, '2016Jan02'),(1, '2016Jan03')
	,(2, '2016Oct10'),(2, '2017Jan01')
	,(3, '2017Feb01'),(3, '2017Feb03'),(3, '2017Feb28')

EXEC dbo.usp_MySproc @ListOfDates = @MyListOfDates
GO

DateTypeID MinDate    MaxDate
---------- ---------- ----------
1          2016-01-01 2016-01-03
2          2016-10-10 2017-01-01
3          2017-02-01 2017-02-28

See the READONLY in the CREATE PROCEDURE? Here’s what happens when you omit it:

The table-valued parameter "@ListOfDates" must be declared with the READONLY option.

This is one of the limitations of user-defined types: they can’t be used to pass data back. So we couldn’t make any changes to @ListOfDates in the sproc, and see those changes reflected outside of the sproc.

There’s another limitation: UDTs can’t be used cross-database. Here’s what happens:

The type name 'MyOtherDatabase.dbo.ListOfDates' contains more than the
maximum number of prefixes. The maximum is 1.

Even if you create the exact same type in another database, with the same name, it won’t work.

(Note that UDTs don’t have to be tables: go here [MSDN] for more info.)


Just for kicks, here’s how I’d replicate the above functionality, but using XML:

-- If the sproc already exists, delete it:
IF OBJECT_ID('dbo.usp_MySproc') IS NOT NULL
BEGIN
	DROP PROCEDURE dbo.usp_MySproc
END
GO

-- Create the sproc:
CREATE PROCEDURE dbo.usp_MySproc
(
	@ListOfDates XML
)
AS
BEGIN
	
	WITH cte_ExtractFromXML AS
	(
		SELECT 
				DateTypeID	= d.i.value('(@type)[1]', 'INT')
				,[Date]		= d.i.value('.[1]', 'DATE')
			FROM @ListOfDates.nodes('//date') d(i)
	)
	SELECT
			DateTypeID
			,MIN([Date]) AS MinDate
			,MAX([Date]) AS MaxDate
		FROM cte_ExtractFromXML
		GROUP BY DateTypeID
		ORDER BY DateTypeID

END
GO

-- Test it with some XML data:
DECLARE @MyListOfDates XML
SET @MyListOfDates = '
<listOfDates>
	<date type="1">2016Jan01</date>
	<date type="1">2016Jan02</date>
	<date type="1">2016Jan03</date>
	<date type="2">2016Oct10</date>
	<date type="2">2017Jan01</date>
	<date type="3">2017Feb01</date>
	<date type="3">2017Feb03</date>
	<date type="3">2017Feb28</date>
</listOfDates>'

EXEC dbo.usp_MySproc @ListOfDates = @MyListOfDates
GO

DateTypeID  MinDate    MaxDate
----------- ---------- ----------
1           2016-01-01 2016-01-03
2           2016-10-10 2017-01-01
3           2017-02-01 2017-02-28

If we wanted to, we could declare our @ListOfDates parameter as OUTPUT, and make changes to it in the sproc, e.g.:

Drop the sproc if it exists:
IF OBJECT_ID('dbo.usp_MySproc') IS NOT NULL
BEGIN
	DROP PROCEDURE dbo.usp_MySproc
END
GO

-- Create our (new and improved) sproc:
CREATE PROCEDURE dbo.usp_MySproc
(
	@ListOfDates XML OUTPUT
)
AS
BEGIN
	
	DECLARE @Aggregated XML

	;WITH cte_ExtractFromXML AS
	(
		SELECT 
				DateTypeID	= d.i.value('(@type)[1]', 'INT')
				,[Date]		= d.i.value('.[1]', 'DATE')
			FROM @ListOfDates.nodes('//date') d(i)
	)
	SELECT @Aggregated = (
		SELECT
				DateTypeID AS '@id'
				,MIN([Date]) AS '@MinDate'
				,MAX([Date]) AS '@MaxDate'
			FROM cte_ExtractFromXML
			GROUP BY DateTypeID
			ORDER BY DateTypeID
			FOR XML PATH('DataType'), ROOT('Aggregated')
	)

	SELECT @ListOfDates = (
		SELECT @ListOfDates, @Aggregated FOR XML PATH('Output'), TYPE
	)

END
GO

-- Run it:
DECLARE @MyListOfDates XML
SET @MyListOfDates = '
<listOfDates>
	<date type="1">2016Jan01</date>
	<date type="1">2016Jan02</date>
	<date type="1">2016Jan03</date>
	<date type="2">2016Oct10</date>
	<date type="2">2017Jan01</date>
	<date type="3">2017Feb01</date>
	<date type="3">2017Feb03</date>
	<date type="3">2017Feb28</date>
</listOfDates>'

EXEC dbo.usp_MySproc @ListOfDates = @MyListOfDates OUTPUT

SELECT @MyListOfDates AS MyListOfDates
GO

<Output>
  <listOfDates>
    <date type="1">2016Jan01</date>
    <date type="1">2016Jan02</date>
    <date type="1">2016Jan03</date>
    <date type="2">2016Oct10</date>
    <date type="2">2017Jan01</date>
    <date type="3">2017Feb01</date>
    <date type="3">2017Feb03</date>
    <date type="3">2017Feb28</date>
  </listOfDates>
  <Aggregated>
    <DataType id="1" MinDate="2016-01-01" MaxDate="2016-01-03" />
    <DataType id="2" MinDate="2016-10-10" MaxDate="2017-01-01" />
    <DataType id="3" MinDate="2017-02-01" MaxDate="2017-02-28" />
  </Aggregated>
</Output>

As you can see, we’ve returned our original list, along with the aggregated data, both under a new parent node.

, , , , ,

Leave a comment

XSDs FTW

I’m a big fan of passing data to and from stored procedures (sprocs) as XML, especially XML that represents a complete object, or a list or hierarchy of objects. For a start, XML is perfectly human-readable (if you’re doing it right), and nearly every system and language knows how to work with it, SQL Server / TSQL included. What makes it even better, is being able to validate the XML before you even begin to parse it, using an XSD (XML Schema Definition).

Here’s a complete example you can copy and run:

USE tempdb
GO

-- If the XSD already exists, drop it:

IF EXISTS (
  SELECT xsc.name 
    FROM sys.xml_schema_collections xsc
    WHERE xsc.name='TestSchema'
)
BEGIN
  DROP XML SCHEMA COLLECTION TestSchema
END
GO

-- Create the schema:

CREATE XML SCHEMA COLLECTION TestSchema AS '
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:simpleType name="ST_EmailAddress">
      <xsd:restriction base="xsd:string">
      <xsd:pattern value="[^@]*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9}"/>
      </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="ST_Usage">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="home"/>
      <xsd:enumeration value="work"/>
      <xsd:enumeration value="other"/>
    </xsd:restriction>  
  </xsd:simpleType>

  <xsd:complexType name="CT_EmailAndUsage">
    <xsd:simpleContent>
      <xsd:extension base="ST_EmailAddress">
        <xsd:attribute name="usage" use="required" type="ST_Usage" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:element name="emailList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="email" type="CT_EmailAndUsage" minOccurs="1" maxOccurs="3" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>'
GO

-- Make some dummy data that conforms to the schema above:

DECLARE @testXML AS XML(TestSchema)

SET @testXML = '
<emailList>
  <email usage="home">pete@home.com</email>
  <email usage="work">pete@work.com</email>
  <email usage="other">pete@other.com</email>
</emailList>
'

-- Query it:

SELECT
    id = ROW_NUMBER() OVER (ORDER BY e.i)
    ,EmailAddress = e.i.value('(.)[1]','VARCHAR(255)')
    ,Usage = e.i.value('(@usage)[1]', 'VARCHAR(20)')
  FROM @testXML.nodes('//email') AS e(i)
GO

The result set is:


id   EmailAddress    Usage
---  --------------  ------
1    pete@home.com   home
2    pete@work.com   work
3    pete@other.com  other

(3 row(s) affected)

Now, try messing around with the contents of the @testXML variable, e.g.:

  1. Set usage to a string that’s not ‘home’, ‘work’ or ‘other’
  2. Add a fourth email address
  3. Take the ‘@’ symbol out of an email address
  4. Put in some extra nodes that don’t belong

,then re-run the code. They all fail, because the XML has to conform to the XSD we’ve defined as TestSchema. So, SQL Server automatically rejects any input that fails data validation (e.g. format of email address) or breaks business logic (‘no more than three emails’); if the XML was being passed to a sproc, the call would fail, and no code inside would ever run.

Obviously, you may not want to automatically reject ‘broken’ XML, you’ll probably want to record this fact. That’s fine – your code (sproc) can accept a schema-less XML, and attempt the cast itself; and if it fails, you can respond however you like.

There’s certainly an overhead in learning the language of XSDs, but because it’s a generic technology, there are plenty of online resources, e.g. w3schools. When it comes to transferring complex objects around as data, I don’t know of a better way than using XML and XSD.

Note

Because Microsoft haven’t got round to coding it yet, you can’t query the text value of a node that’s been defined as a type in XSD. That is, I’d ordinarily like to be able to query the email address itself like this:

EmailAddress = e.i.value('(./text())[1]','VARCHAR(255)')

, because directly accessing the node text is faster (presumably because it doesn’t have to do any more parsing). But sadly, it’ll just fail with an error. However, this is unlikely to cause practical problems, it’s just a mild annoyance that’s vastly outweighed by the benefits that come from validated XML.

, ,

Leave a comment

Converting JSON to XML : A Gateway to Cygwin

Of the files I have to deal with on a weekly basis, I’d put the breakdown at 50% Excel, 40% CSV, and 10% XML. This is fine, I can reliably transfer data from those files into SQL Server without too many tears. However, today I was presented with JSON-formatted versions of files, that I’d normally get as XML; and I haven’t had to deal with JSON since I last wrote PHP/AJAX code, about five years ago.

Now, SQL Server 2016 can natively read/write JSON code (see, for example, this MSDN blog), but I use SQL Server 2014, which knows nothing about JSON.

Of course, I googled for JSON to XML converters. There are many, mostly in the form of libraries for other systems, and even a few online converters that would do the job ‘in-browser’. Unfortunately, the data I needed to convert was credit file data, and that data is sensitive. I can’t just go pasting it into unknown websites without completely understanding what’s going to happen to it – if there’s the slightest chance my data could get uploaded and saved elsewhere, I can’t use that site. I did find an online site that did the conversion purely in javascript (no POSTs back to the server), so I copied the code locally, pasted my JSON in… and it crashed the browser (Chrome). Turns out 80kb of JSON was too much for this javascript, and in fact, a couple of the standalone converters I tried also had trouble with this (small) amount of code.

There was even a pure T-SQL converter (written as a function) that I tried, but unfortunately, that didn’t work out either. Which is a shame, as a SQL-based solution appeals to me greatly!

To cut a dull story short, here’s how I did it: perl. Thanks to the third most popular answer to this stackoverflow question, the answer was to open up a cygwin window, and type:


cat MyFile.json | perl -MJSON -MXML::Simple -e 'print XMLout(decode_json(do{local$/;}),RootName=>"json")' > MyFile.xml

(Thank you very much, stackoverflow user azatoth!)

And that did the trick; I had to do some minor tidying up (due to @ symbols, and node names starting with a number), but in the main, it did the job for me, with a minimum of effort.

The point of this post is two-fold:

  1. When this requirement crops up again, I only have to look here to remind myself, and…
  2. To spread the word about cygwin.

Cygwin (www.cygwin.com) is a way to get hold of Unix/Linux-style functionality on Windows. I’ve used it for years now, and it’s an invaluable part of what I do; it’s literally one of the first things I install on any new machine.

If you do any significant amount of text file processing, there are many great command-line tools to be found within the cygwin environment; just a few I use on at least a weekly, if not daily, basis:

  • grep: for searching through text files, using regular expressions
  • sed: for basic text transformations
  • awk: a programming language for text processing
  • perl: a programming language widely used in the *nix world

The beauty of these tools, is that they’re so widely used, it’s almost guaranteed that whatever you want to do, someone else has already put the correct syntax online (cf. my JSON to XML problem). Usually, some light googling (often incorporating the term ‘+stackoverflow’) will get you your answer. I wouldn’t claim for a second that I ‘knew’ these tools (apart from maybe grep), but being able to work with them is enough.

If you’re a developer or analyst who has to routinely work with data files, I can’t recommend cygwin highly enough.

, , ,

Leave a comment