Converting JSON to XML : A Gateway to Cygwin

Of the files I have to deal with on a weekly basis, I’d put the breakdown at 50% Excel, 40% CSV, and 10% XML. This is fine, I can reliably transfer data from those files into SQL Server without too many tears. However, today I was presented with JSON-formatted versions of files, that I’d normally get as XML; and I haven’t had to deal with JSON since I last wrote PHP/AJAX code, about five years ago.

Now, SQL Server 2016 can natively read/write JSON code (see, for example, this MSDN blog), but I use SQL Server 2014, which knows nothing about JSON.

Of course, I googled for JSON to XML converters. There are many, mostly in the form of libraries for other systems, and even a few online converters that would do the job ‘in-browser’. Unfortunately, the data I needed to convert was credit file data, and that data is sensitive. I can’t just go pasting it into unknown websites without completely understanding what’s going to happen to it – if there’s the slightest chance my data could get uploaded and saved elsewhere, I can’t use that site. I did find an online site that did the conversion purely in javascript (no POSTs back to the server), so I copied the code locally, pasted my JSON in… and it crashed the browser (Chrome). Turns out 80kb of JSON was too much for this javascript, and in fact, a couple of the standalone converters I tried also had trouble with this (small) amount of code.

There was even a pure T-SQL converter (written as a function) that I tried, but unfortunately, that didn’t work out either. Which is a shame, as a SQL-based solution appeals to me greatly!

To cut a dull story short, here’s how I did it: perl. Thanks to the third most popular answer to this stackoverflow question, the answer was to open up a cygwin window, and type:


cat MyFile.json | perl -MJSON -MXML::Simple -e 'print XMLout(decode_json(do{local$/;}),RootName=>"json")' > MyFile.xml

(Thank you very much, stackoverflow user azatoth!)

And that did the trick; I had to do some minor tidying up (due to @ symbols, and node names starting with a number), but in the main, it did the job for me, with a minimum of effort.

The point of this post is two-fold:

  1. When this requirement crops up again, I only have to look here to remind myself, and…
  2. To spread the word about cygwin.

Cygwin (www.cygwin.com) is a way to get hold of Unix/Linux-style functionality on Windows. I’ve used it for years now, and it’s an invaluable part of what I do; it’s literally one of the first things I install on any new machine.

If you do any significant amount of text file processing, there are many great command-line tools to be found within the cygwin environment; just a few I use on at least a weekly, if not daily, basis:

  • grep: for searching through text files, using regular expressions
  • sed: for basic text transformations
  • awk: a programming language for text processing
  • perl: a programming language widely used in the *nix world

The beauty of these tools, is that they’re so widely used, it’s almost guaranteed that whatever you want to do, someone else has already put the correct syntax online (cf. my JSON to XML problem). Usually, some light googling (often incorporating the term ‘+stackoverflow’) will get you your answer. I wouldn’t claim for a second that I ‘knew’ these tools (apart from maybe grep), but being able to work with them is enough.

If you’re a developer or analyst who has to routinely work with data files, I can’t recommend cygwin highly enough.

Advertisements

, , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: