Consuming JSON Strings in SQL Server

It has always seemed strange to Phil that SQL Server has such complete support for XML, yet is completely devoid of any support for JSON. In the end, he was forced, by a website project, into doing something about it. The result is this article, an iconoclastic romp around the representation of hierarchical structures, and some code to get you started.

Updated 2nd May 2013
Updated 8th May 2014

“The best thing about XML is what it shares with JSON, being human readable. That turns out to be important, not because people should be reading it, because we shouldn’t, but because it avoids interoperability problems caused by fussy binary encoding issues.

Beyond that, there is not much to like. It is not very good as a data format. And it is not very good as a document format. If it were a good document format, then wikis would use it.”

Doug Crockford March 2010

This article describes a TSQL JSON parser and its evil twin, a JSON outputter, and provides the source. It is also designed to illustrate a number of string manipulation techniques in TSQL. With it you can do things like this to extract the data from a JSON document:
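(A hedged sketch of the call, reconstructed from the usage shown later in the comments; the full source of the parser is in the download, and the exact output columns are described below.)

SELECT *
FROM dbo.parseJSON(N'{"Person":{"Name":"Phil","Skills":["TSQL","XML"]}}');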

And get:

[Figure: the rows of the adjacency-list hierarchy table produced by parseJSON]

…or you can do the round trip:
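(Again a sketch only: it assumes the JSONHierarchy table type defined later in the article, and that parseJSON's column order matches it.)

DECLARE @MyHierarchy JSONHierarchy;
INSERT INTO @MyHierarchy
  SELECT * FROM dbo.parseJSON(N'{"Person":{"Name":"Phil"}}');
SELECT dbo.ToJSON(@MyHierarchy);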

To get the original JSON document back.

Background

TSQL isn’t really designed for doing complex string parsing, particularly where strings represent nested data structures such as XML, JSON, YAML, or XHTML.

You can do it, but it is not a pretty sight; and why would you ever want to do it anyway? Surely, if anything was meant for the ‘application layer’ in C# or VB.net, then this is it. ‘Oh yes’, will chime in the application thought police, ‘this is far better done in the application or with a CLR.’ Not necessarily.

Sometimes, you just need to do something inappropriate in TSQL.

There are a whole lot of reasons why this might happen to you. It could be that your DBA doesn’t allow a CLR, for example, or you lack the necessary skills with procedural code. Sometimes, there isn’t any application, or you want to run code unobtrusively across databases or servers.

I needed to interpret or ‘shred’ JSON data. JSON is one of the most popular lightweight data-interchange formats, and is probably the best choice for transferring object data from a web page. It is, in fact, executable JavaScript, so it is very quick to produce in the browser in order to dump the contents of a JavaScript object, and lightning-fast to load into the browser’s objects from the database, since you are passing it executable code (you need to parse it first for security reasons – passing executable code around is potentially very risky). AJAX can use JSON rather than XML, so you have the opportunity of a much simpler route for data between database and browser, with less opportunity for error.

The conventional way of dealing with data like this is to let a separate business layer parse a JSON ‘document’ into some tree structure and then update the database by making a series of calls to it. This is fine, but can get more complicated if you need to ensure that the updates to the database are wrapped into one transaction so that, if anything goes wrong, the whole operation can be rolled back. This is why a CLR or TSQL approach has advantages.

“Sometimes, you just
need to do something
inappropriate in TSQL…”

I wrote the parser as a prototype because it was the quickest way to determine what was involved in the process, so that I could then re-write it as a CLR routine in a .NET language. It takes a JSON string and produces a result in the form of an adjacency-list representation of that hierarchy. In the end, the code did what I wanted with adequate performance (it reads a JSON file of 540 name/value pairs and creates the SQL hierarchy table in 4 seconds), so I didn’t bother with the added complexity of maintaining a CLR routine. In order to test more thoroughly what I’d done, I wrote a JSON generator that uses the same adjacency list, so you can now import and export data via JSON!

Formats such as JSON and XML all represent object data as hierarchies. Although it looks very different from the entity-relational model, it isn’t; it is rather a different perspective on the same model. The first trick is to represent it as an adjacency-list hierarchy in a table, and then to use the contents of this table to update the database. This adjacency list is really the database equivalent of any of the nested data structures that are used for the interchange of serialized information with the application, and can be used to create XML, OSX Property lists, Python nested structures or YAML as easily as JSON.

Adjacency-list tables have the same structure whatever the data in them. This means that you can define a single table-valued type and pass data structures around between stored procedures. However, they are best held at arm’s length from the data, since they are not relational tables, but something more like the dreaded EAV (Entity-Attribute-Value) tables. Converting the data from its hierarchical table form will be different for each application, but is easy with a CTE, as in the sketch below. You can, alternatively, convert the hierarchical table into XML and interrogate that with XQuery.
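For instance, here is a hedged sketch of such a CTE. It assumes the column meanings used by the parser below (parent_ID holding the Object_ID of the containing object or array, and a root row with no parent) and the JSONHierarchy table type defined later in the article; the sample document is made up for the purpose.

DECLARE @hierarchy JSONHierarchy;  -- the table type created later in the article
INSERT INTO @hierarchy
  SELECT * FROM dbo.parseJSON(N'{"order":{"id":42,"lines":[{"sku":"A1"}]}}');

WITH Containers (Object_ID, JSONPath) AS
  (-- anchor: the root object
   SELECT Object_ID, CAST('$' AS NVARCHAR(MAX))
     FROM @hierarchy
     WHERE COALESCE(parent_ID, 0) = 0
   UNION ALL
   -- recursion: each contained object or array extends its parent's path
   SELECT h.Object_ID, c.JSONPath + '.' + COALESCE(h.NAME, '')
     FROM @hierarchy h
     INNER JOIN Containers c ON h.parent_ID = c.Object_ID
     WHERE h.ValueType IN ('object', 'array'))
-- attach the scalar values to the path of their containing structure
SELECT c.JSONPath + '.' + COALESCE(h.NAME, '') AS JSONPath,
       h.StringValue, h.ValueType
  FROM @hierarchy h
  INNER JOIN Containers c ON h.parent_ID = c.Object_ID
  WHERE h.ValueType NOT IN ('object', 'array');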

JSON format

JSON is designed to be as lightweight as possible and so it has only two structures. The first, delimited by curly brackets, is a collection of name/value pairs, separated by commas. The name is followed by a colon. This structure is generally implemented in the application-level as an object, record, struct, dictionary, hash table, keyed list, or associative array. The other structure is an ordered list of values, separated by commas. This is usually manifested as an array, vector, list, or sequence.
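For example, this small document (wrapped in a TSQL string variable) uses both structures: an object, delimited by curly brackets, one of whose values is an array, delimited by square brackets.

DECLARE @JSON NVARCHAR(MAX) =
  N'{"name":"Phil", "pets":["cat","dog"], "registered":true}';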

“Using recursion in TSQL is
like Sumo Wrestlers doing Ballet.
It is possible but not pretty.”

The first snag for TSQL is that curly or square brackets are not ‘escaped’ within a string, so there is no simple way of shredding a JSON ‘document’. It is difficult to differentiate a bracket used as the delimiter of an array or structure from one that is within a string. Also, interpreting a string into a SQL string isn’t entirely straightforward, since hex codes can be embedded anywhere to represent complex Unicode characters, and all the old C-style escaped characters are used. The second complication is that, unlike YAML, the datatypes of values can’t be explicitly declared. You have to sniff them out by applying the rules from the JSON specification.
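A hypothetical fragment of the sort of sniffing involved: anything quoted is a string, true/false/null are literals, and otherwise the token must parse as a number. (In the real parser the strings have already been tokenised out by this stage, so this is only the flavour of it.)

DECLARE @value NVARCHAR(MAX) = N'-100.5';
SELECT CASE
         WHEN @value IN ('true', 'false') THEN 'boolean'
         WHEN @value = 'null' THEN 'null'
         WHEN @value NOT LIKE '%[^0-9.Ee+-]%' THEN 'number'  -- crude numeric sniff
         ELSE 'string'
       END AS SniffedType;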

Obviously, structures can be embedded in structures, so recursion is a natural way of making life easy. Using recursion in TSQL is like Sumo Wrestlers doing Ballet. It is possible but not pretty.

The implementation

Although the code for the JSON Parser/Shredder will run in SQL Server 2005, and even in SQL Server 2000 (with some modifications required), I couldn’t resist using a TVP (Table Valued Parameter) to pass a hierarchical table to the function, ToJSON, that produces a JSON ‘document’. Writing a SQL Server 2005 version should not be too hard.

First, the function replaces all strings with tokens of the form @Stringxx, where xx is the foreign key of the table variable where the strings are held. This takes them, and their potentially difficult embedded brackets, out of the way. In JSON, names, as well as string values, are always strings.

Then the routine iteratively finds the next structure that has no structure contained within it (and is, by definition, a leaf structure), and parses it, replacing it with an object token of the form ‘@Objectxxx’ or ‘@arrayxxx’, where xxx is the object id assigned to it. The values, or name/value pairs, are retrieved from the string table and stored in the hierarchy table. Gradually, the JSON document is eaten until there is just a single root object left.
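To picture the process, here is a hypothetical trace on a tiny document (the token names are illustrative; the real parser assigns its own ids):

{"a":"x", "b":{"c":"y"}}                              -- the original
{@String1:@String2, @String3:{@String4:@String5}}     -- strings tokenised
{@String1:@String2, @String3:@Object1}                -- leaf structure parsed
@Object2                                              -- the root: parsing complete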

The JSON outputter is a great deal simpler, since one can be surer of the input, but essentially it does the reverse process, working from the root to the leaves. The only complication is working out the indent of the formatted output string.

In the implementation, you’ll see a fairly heavy use of PATINDEX. This uses a poor man’s RegEx, a starving man’s RegEx. However, it is all we have, and can be pressed into service by chopping the string it is searching (if only it had an optional third parameter like CHARINDEX that specified the index of the start position of the search!). The STUFF function is also a godsend for this sort of string-manipulation work.
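As a hypothetical fragment of the chopping trick: since PATINDEX has no start-position argument, you search a SUBSTRING and add the offset back on, and STUFF overwrites the span once it is found.

DECLARE @Source NVARCHAR(MAX) = N'{"a":1,"b":{"c":2}}';
DECLARE @Start INT = 8;  -- begin the search at the eighth character
DECLARE @Hit INT =
  PATINDEX('%[{[]%', SUBSTRING(@Source, @Start, LEN(@Source)));
-- translate back to a position in the whole string (12: the inner '{')
SELECT CASE WHEN @Hit = 0 THEN 0 ELSE @Hit + @Start - 1 END AS PositionInWholeString;
-- STUFF then replaces the parsed span with its token
SELECT STUFF(@Source, 12, 7, '@Object1');  -- gives {"a":1,"b":@Object1}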

So once we have a hierarchy, we can pass it to a stored procedure. As the output is an adjacency list, it should be easy to access the data. You might find it handy to create a table type if you are using SQL Server 2008. Here is what I use. (Note that if you drop a Table Valued Parameter type, you will have to drop any dependent functions or procedures first, and re-create them afterwards).
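The CREATE TYPE statement itself is in the download; a sketch of the shape it takes, with the column list assumed from the parser’s output described above, is below, together with a trivial procedure to show the TVP plumbing.

CREATE TYPE dbo.JSONHierarchy AS TABLE
 (
  element_id INT NOT NULL,            -- surrogate key of the element
  sequenceNo INT NULL,                -- order within the containing structure
  parent_ID INT NULL,                 -- Object_ID of the containing object or array
  Object_ID INT NULL,                 -- set where the element is itself an object or array
  NAME NVARCHAR(2000) NULL,           -- the name of a name/value pair; NULL for array elements
  StringValue NVARCHAR(MAX) NOT NULL, -- the value, held as a string
  ValueType VARCHAR(10) NOT NULL      -- 'string', 'number', 'boolean', 'null', 'object' or 'array'
 );
GO
-- passing the hierarchy to a procedure: TVPs must be declared READONLY
CREATE PROCEDURE dbo.ProcessJSONHierarchy @TheHierarchy dbo.JSONHierarchy READONLY
AS
BEGIN
  -- a placeholder body; the real procedure would update base tables
  -- from the hierarchy, wrapped in a single transaction
  SELECT NAME, StringValue, ValueType FROM @TheHierarchy;
END;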

ToJSON. A function that creates JSON Documents

Firstly, we need a simple utility function:
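The utility in question is the string-escaping function that the comments below refer to as JSONEscaped; the full version is in the download. A minimal sketch of the idea, with the substitution order already amended as jens.frandsen suggests in the comments, might look like this:

CREATE FUNCTION dbo.JSONEscaped (@Unescaped NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
  -- Relies on TSQL's row-by-row variable assignment to apply each
  -- substitution in turn. The backslash must be done first, or the
  -- backslashes added by the later substitutions would themselves
  -- get doubled up (the bug reported by jens.frandsen below).
  SELECT @Unescaped = REPLACE(@Unescaped, FromString, ToString)
    FROM (SELECT '\' AS FromString, '\\' AS ToString
          UNION ALL SELECT '"', '\"'
          UNION ALL SELECT '/', '\/'
          UNION ALL SELECT CHAR(8), '\b'
          UNION ALL SELECT CHAR(12), '\f'
          UNION ALL SELECT CHAR(10), '\n'
          UNION ALL SELECT CHAR(13), '\r'
          UNION ALL SELECT CHAR(9), '\t') AS Substitutions
  RETURN @Unescaped
END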

And now, the function that takes a JSON Hierarchy table and converts it to a JSON string.

ToXML. A function that creates XML

The function that converts a hierarchy table to XML gives us a JSON-to-XML converter. It is surprisingly similar to the previous function.

This provides you with the means of converting a JSON string into XML.
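A hedged usage sketch, chaining the two routines together (again assuming the JSONHierarchy type and column order from earlier):

DECLARE @MyHierarchy JSONHierarchy;
INSERT INTO @MyHierarchy
  SELECT * FROM dbo.parseJSON(N'{"book":{"title":"Slow and Steady","pages":120}}');
SELECT dbo.ToXML(@MyHierarchy);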

This gives the result…

Wrap-up

The so-called ‘impedance mismatch’ between applications and databases is, I reckon, an illusion. The object-oriented nested data-structures that we receive from applications are, if the developer has understood the data correctly, merely a perspective, from a particular entity, of the relationships it is involved with. Whereas it is easy to shred XML documents to get the data from them to update the database, it has been trickier with other formats such as JSON. By using techniques like this, it should be possible to liberate the application, or website, programmer from having to do the mapping from the object model to the relational model, and from spraying the database with ad-hoc TSQL that uses the base tables or updateable views. If the database can be provided with the JSON, or the table-valued parameter, then there is a better chance of maintaining full transactional integrity for the more complex updates.

The database developer already has the tools to do the work with XML, but why not the simpler, and more practical JSON? I hope these two routines get you started with experimenting with this.

Interesting JSON-related articles and sites

Since writing this article, Phil has also developed a CSV parser and outputter, and an XML parser (see Producing JSON Documents from SQL Server queries via TSQL).

Downloads



  • realSantaClara

‘impedance mismatch’
Phil … you are amazing … but doesn’t this very article demonstrate the real existence of the ‘impedance mismatch’?

  • Phil Factor

Re: impedance mismatch
Er, maybe. However, if we had sensible, built-in ways of processing JSON or YAML strings from applications, as we have for XML, then a lot of problems would vanish. What I meant to say was that JSON is so easy to produce from an application, particularly from Javascript, and relatively easy to process from a stored procedure, that if we shared this language a lot of problems would go away. No, getting XML from Javascript objects isn’t as neat or trivial. I’m happy to make JSON the language of an application interface, both ways.

  • Sunny

.NET Assembly
As realSantaClara said: amazing T-SQL code! But it seems to me that this is an example of where to leave T-SQL and implement such things in a .NET assembly 😉

  • BuggyFunBunny

    You quote Crockford, good
    — Using recursion in TSQL is like Sumo Wrestlers doing Ballet. It is possible but not pretty.

    Well, no. That’s what the CTE paradigm is for. Without CTE, then yes.

— The so-called ‘impedance mismatch’ between applications and databases is, I reckon, an illusion.

    It is. While the term was first used by (IIRC) OO programmers who hated the RM, the instance data of an object (and any included objects) is purely relational.

    — The database developer already has the tools to do the work with XML, but why not the simpler, and more practical JSON?

    Because the Suits that run companies producing Database Engines are just as ignorant as Suits everywhere else (I’m talking to you, Armonk). XML is the Emperor’s New Clothes, and loverly they are, too. Remember, it was created by LAWYERS and propelled by OO coders. Not a data geek to be found.

  • Phil Factor

    Re: Recursive CTEs
    True but what about performance, compared with a procedural language? I use recursive CTEs, but I get better timings with an iterative solution in TSQL. Maybe I’m doing something wrong, so we might set this as a challenge in the Phil Factor Speed Phreak competition!

  • Thought Police

    Disappointed
    “‘Oh yes’, will chime in the application thought police, ‘this is far better done in the application or with a CLR.'”
    *chime*
While this is a good trick, the only possible reason I can think of to use this monstrosity is the scenario that you point out, where there is no application infrastructure in place.
Where a CLR application layer is available, you can do JSON serialization/deserialization in a few lines of code. Using a hundred lines of complex nested SQL calls instead goes against all basic principles of creating maintainable systems.
One whose rationale is that they “lack the necessary skills with procedural code” is the proverbial man whose only tool is a hammer.

  • Anonymous

CLR? Your DBA lets you use that scary thing?
    lol

  • Anonymous

JSON shred in CLR
It’s much too slow and CPU-heavy in T-SQL for JSON of any size.

I tried various methods before opting to use an open-source .NET JSON library in a CLR proc.

Described here: http://www.sql-library.com/?p=140

  • Phil Factor

    Re: JSON Shred in CLR
Sure. As I say in the article, CLR is the best choice for this sort of thing, if you have a choice. I don’t know which is the best version of a CLR for shredding JSON, so it would be great if anyone who has had a good experience with a JSON CLR would leave a comment. I’d suspect that the best idea would be to have one that ships with the SQL Server product, particularly now that Microsoft supports OData. http://www.odata.org/developers/protocols/json-format

    I certainly wouldn’t want to shred Facebook info in TSQL! I wrote this code for a particular purpose that used reasonably short documents.

  • Phil Factor

    Oops
Just fixed a bug that I noticed in the XML outputter, and added the source to the speech-bubble at the top of the article.

  • Robert Towne

    Nice job
    Thanks, I thought about attempting something like this because my shop is anti-CLR for the most part (at least for unsafe assemblies).

    To do this with a CLR function (someone please correct me if I am wrong) you need to add several unsafe assemblies to your production environment. For example, if I wanted to use http://json.codeplex.com/, you would need to add 3 unsafe assemblies to get this to work. Maybe not a problem for some shops, but understandably a problem for others.

    Anyway, great post and good discussions.

  • ricva

    another Json split for tsql
I also wrote a parser, fnSplitJson2: also all T-SQL, and available for SQL Server 2005 and below, or SQL Server 2008. It would have saved you half the work 🙂

http://www.sqlservercentral.com/articles/JSON/68128/

I find it very useful because it will also tell me the length, and other errors that may occur in the string.

    • Phil Factor

      Re: another Json split for tsql
Ric’s JSON splitter is on our sister publication, SQL Server Central. It is well worth a look. I hadn’t realised it was there. I guess a lot of TSQL programmers have been working on this particular problem.

  • John Galt

    This is brilliant.
I designed a far less robust SQL JSON serializer/deserializer than this one two years ago, using the CLR. This one is much more robust and likely more reliable.

I do strongly encourage you to build a CLR-based version at your earliest opportunity. I think you will be very pleasantly surprised at the performance gains you will achieve. The gains may be so significant as to throw open the doors I cracked when I first endeavored to store “documents” in SQL Server.

As I find time I will work on a CLR version based upon the work you have done here, unless and until you build one of the same quality as this T-SQL version.

It may be worth the effort to reverse-engineer the DataContractJsonSerializer available in .NET. I don’t believe it is compilable into SQL’s CLR.

    • Phil Factor

      Re: This is brilliant.
Yes, I’d be really interested to see the CLR version and try it out.

If you email the editor<at>simple<hyphen>talk<dot>com, he’ll pass it on to me.

Thanks for the kind words. If I were rolling up the sleeves and doing a CLR, I’d be inclined to ‘go the whole hog’ and do a CLR YAML parser. It will parse JSON as well, of course, but it would be great to be able to process YAML documents, and CLR is the only way to do it. There are several open-source versions that you could use. When you achieve it, let the editor of Simple-Talk know!
The algorithm in this JSON serialiser/deserialiser article was imposed on me by the limitations of TSQL, but it would be just as appropriate to any language and would be as efficient in C# or VB. The particular interest for me is to be able to use it for converting JSON to XML and other formats, since it converts to a neutral table which can then generate a number of different serialisations.


  • John Galt

    new CLR Json Parser
As indicated in my post of 5/28/2011, I have completed the first draft of a CLR conversion of your T-SQL jsonParse solution. I would like to share it with you, given that my CLR version is a .NET conversion of your original T-SQL solution. Presently it produces an exact copy of the results from your solution, but it is over 20% faster on the low end. The performance improvement increases as the object size and complexity increase, and as the number of objects parsed increases.

    Please let me know how I can get this solution to you.

    Thank you again for your solution I hope you and yours will enjoy the improvements I have made.

  • Etriaph

    JSON vs. XML
I just wanted to comment on the quote at the top. Comparing JSON to XML is a little strange. XML describes data (the key here is describes), where JSON is a serialization format for objects. You can describe objects in XML, but that’s only a portion of its usage. That we use one of the two to transport data between systems tends to be the only great similarity, aside from the capability of both to possess a hierarchy.

You could just as easily describe a `forums` and a `posts` table as a comparable format to XML, but it’s really not.

    At any rate, interesting SQL functions; scary, but interesting. 😀

  • Mike Stankavich

    Thanks for sharing helpful function + bug report
    Thanks for sharing this solution. It helped me quickly complete a data migration for some SQL data with an embedded JSON column.

I did find one small bug though. The @token variable needs to be NVARCHAR(MAX), not NVARCHAR(200); otherwise any string longer than 200 characters gets truncated down to 200.

  • nmcdermaid

    I love a good iconoclastic romp
    really I do.

  • John Galt

    New release of J-SQL solution now available on GitHub
    I have finally found time to publish my latest release of J-SQL to https://github.com/jgcoding/J-SQL.git.

Although Phil’s solution is technically brilliant, it is implemented in T-SQL, which to me only adds to my amazement. But string manipulation and concatenation perform very poorly in T-SQL. It is very slow.

J-SQL’s ToJsonTable is 40 times faster than a T-SQL solution.

Although I am aware of other solutions by those who have attempted to compile an existing JSON library into the CLR as an assembly, I am unaware of any that will do so unless the permission level is set to "UNSAFE". As a result, this is not a solution at all.

J-SQL may be compiled as a CLR assembly with the SAFE permission setting.

Please note: linefeeds, tabs, spaces, or other "invisible" special characters preceding a valid JSON delimiter must be removed. Those residing within a valid quoted string item value do not pose a problem. See the notes for details.

    Enjoy

  • mikko

    Be aware of space between name and colon
I noticed that if you have the kind of JSON where there’s a space between an element’s name and the colon, the code won’t work properly.
It took me hours to figure out what was wrong in my code.

So the JSON must be of the form
"name": "stringValue"
not:
"name" : "stringValue"

  • jens.frandsen

    Fixing problem with quotes.
    First I’d like to say thank you, great job on this.

    We found the following changes are needed to the JSONEscaped function

Replace:
… FROM (SELECT
'"' AS FromString, '\"' AS ToString
UNION ALL SELECT '\', '\\' …
With:
… FROM (SELECT
'\' AS FromString, '\\' AS ToString
UNION ALL SELECT '"', '\"' …

    This enables a value with a quote in it to be properly encoded.

    Thanks!

  • wmvijay

    Problem with negative values in json
    This is great and I have used it extensively. Just realized though that negative values are parsed as positive. e.g.

SELECT * FROM [dbo].[parseJSON]('{"json": [{"Qty":-100}]}') where [ValueType] not in ('object','array')

    parses Qty as 100.

Is there any way to handle negative values?

    Thanks

  • marinb

    Problem with arrays in JSON
When all arrays in the JSON input have single values, parseJSON and then ToJSON returns the JSON without keys.
Example:
'{"array1":["xxx"],"array2":["xxx"],"boolean":true}'
returns: { ["xxx" ], ["xxx"],true}

But when one of the arrays has two values, the output is correct:
'{"array1":["xxx","yyy"],"array2":["xxx"],"boolean":true}'
returns: '{"array1":["xxx","yyy"],"array2":["xxx"],"boolean":true}'

  • Phil Factor

    Re: problems
    Hmm. Thanks for these recent bug reports! I’ll fix these as soon as I get a moment!


  • warrenblaisdell

    Negative numbers
    I just ran across the same problem with negative numbers. Here’s the fix:

    this line:

@start=PATINDEX('%[^A-Za-z0-9@+.e][A-Za-z0-9@+.e]%', ' '+@contents collate SQL_Latin1_General_CP850_Bin)

    Should be changed to this:

@start=PATINDEX('%[^A-Za-z0-9@+-.e][A-Za-z0-9@+-.e]%', ' '+@contents collate SQL_Latin1_General_CP850_Bin)

    Then this line a little later that determines real vs. int:

IF PATINDEX('%[^0-9]%', @value collate SQL_Latin1_General_CP850_Bin)>0

    changes to this:

IF PATINDEX('%[^0-9-]%', @value collate SQL_Latin1_General_CP850_Bin)>0

    Now it seems to be working for me.

  • kdinuk

    load to table
Hi, how do I load a JSON file into a table?

  • JSON4SQL

    JSON4SQL
    I have developed a JSON type for SQL Server available for purchase at http://www.json4sql.com. It is a port of the binary JSON type just released for PostgreSQL.

    It also offers:
• Table value functions for returning elements of JSON objects / columns as a result set
• A fluent API for creating, modifying, and accessing object properties
• JSONPath expressions to set, remove, and get data within JSON objects (similar to XPath expressions for XML)
• Aggregations for converting result sets into objects or arrays
• Binary storage for fast data access without reparsing

  • simontrain

    TEXTSIZE working with JSON
    Hi,

    Great article. Just sharing a gotcha I found when working with JSON in T-SQL.

    You may have to change the TEXTSIZE setting to support large JSON text strings in TEXT, NTEXT, VARCHAR(MAX) and NVARCHAR(MAX) variables or columns. I found this necessary when running a SQL Agent job where TEXTSIZE defaults to 1024. You can increase that using SET TEXTSIZE 2147483647 (2 GB).

    • Phil Factor

      Re: TEXTSIZE working with JSON
This only happens in SSMS, of course, where there is a limit to the size of a string displayed in the results pane of the query window.

SSMS displays only a limited length of string: 8192 characters. If this is the case, you can, with SSMS, export the data to a flat file, which will not be truncated. To do this:

Right-click the database and then click Tasks -> Export Data. Select your Data Source and choose “Flat File Destination” for the Destination type. Then choose a file name for the output.
On the menu item “Specify Table Copy or Query”, choose “Write a query to specify the data to transfer”. Then paste in the query.

The rest should be pretty obvious. This will output the file to text and you can open it in Notepad or whatever text editor you choose.

You can use SQLCMD as well.

When I do this sort of work, I use a stored procedure of my own to save JSON files to disk, called spSaveTextToFile, which I’ve already published on the site: https://www.simple-talk.com/sql/t-sql-programming/the-tsql-of-text-files/

  • shashatriv

    Error occurs on providing incorrect json [I need to handle this in function]
The following error occurs when parseJSON is given incorrect JSON:

Cannot insert the value NULL into column 'valuetype', table '@hierarchy'; column does not allow nulls. INSERT fails.

Please let me know how to handle this in the function itself.

  • jsegarra

    Another JSON parser in TSQL
    Just to let you know that I’ve implemented another JSON parser in SQL.

This one is full T-SQL, but uses a procedural approach. Despite being less orthodox, it is much faster. I have also included a JSON2STRING function and two querying (à la XPath) functions.

    You can find it at: http://www.codeproject.com/Articles/1000953/JSON-for-Sql-Server-Part

  • young.programmer

    Parser Fails
When I try to parse JSON like '{"FirstName":"John\","LastName":"Doe"}', I get the following error:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value '2Last' to data type int.

    Is there a workaround for this?

  • Chuck

    Hello, I am new to JSON parsing in SQL Server. I just need to parse a JSON object coming out of a C# MVC application using a stored procedure which will be looking for data to return to the application. The database is in SQL Server 2014 and we are not creating tables from the JSON object we are just trying to return the data from the SQL database like a search function. Can someone help me out? Thanks, Chuck

  • Nancy Forbes

I got this added and it works fine if I manually run my stored procedure, but if I try to run it in the Job Scheduler I’m seeing “invalid length parameter passed to the LEFT or SUBSTRING function”. I added the parseJSON function in this article to my database. (Does it need to be in master instead?) I tried the modifications noted by warrenblaisdell but it didn’t make any difference. I’ve added my SQL Agent account to the database security, even making it a dbo in case there was some permission issue that was masquerading as the invalid length error. Ideas?

  • Michael Vincent Waller

Hi, this function was great for SQL 2012… Is there a way to create the exact same resulting table @Hierarchy, with Parent_Id, using SQL 2016 OPENJSON only? (So no loops or character-index logic.) This function works great, but it fails with huge JSON strings in the hundreds of thousands of characters, and OPENJSON should perform more efficiently using the SQL engine in 2016.