Manipulating XML Data in SQL Server

When the average database developer is obliged to manipulate XML, either shredding it into relational format, or creating it from SQL, it is often done 'at arms length'. A shame, since effective use of techniques that go beyond the basics can save much code, and are likely to perform better.

It often becomes necessary to create, shred, combine, or otherwise reconstitute XML data, in order to make it fit a specific purpose. Sometimes business requirements dictate that XML fragments should be merged, while other requests call for XML documents or fields to be shredded and their values imported into tables. Sometimes, XML Data must be created directly from existing tables. SQL Server provides plenty of XML-related tools, but how can we know which ones to use, and when?

Let’s examine some of these tasks that require XML manipulation, using the sample AdventureWorks2012 database (Other AdventureWorks versions should work OK, but there may be variations in the data and/or table schemas).

Creating XML

One common requirement is to create an XML structure that is based on the schema of an existing table. Let’s assume that we’ve received a request to create XML data from relevant fields in the Person.Person table, for the person having BusinessEntityID 10001. We need to gather the values from this row:

1585-1-dd221bb4-51fd-4759-847e-dc0a3b960

We need to include the BusinessEntityID field and some of the name-data columns in the new XML structure. Note that there happens to be an existing XML column in the table – the Demographics field.

SQL Server provides an XML option to use with the FOR clause, allowing for an easy method of converting table data into XML nodes. FOR XML can take different arguments – let’s find out which one works for us.

The AUTO argument is one of the simplest to use. It creates one node for each record returned by the SELECT clause:

1585-1-bf14dd7c-2785-4fac-927e-169aec625

By default, the AUTO argument organizes every non-XML field into a node attribute. To specify that the values be created as node elements, not attributes, we can additionally specify the ELEMENTS argument:

The resulting XML:

1585-1-af99ca70-419c-41fc-8b44-04d37130a

The ELEMENTS argument causes every value to be created as a node element. Now we have a separate node for every value, and a wrapper root node. In the first example the resulting XML had the same data, but the values were rendered as attributes.

We notice that the root node name is the schema and table name (Person.Person). We would like to change this to ‘Person‘. To designate a custom root element, we’ll use the PATH argument instead of AUTO:

The results:

1585-1-90d45dec-f3f9-451e-8986-66204f168

Using the PATH argument has enabled us to specify the name ‘Person’ in our root wrapper.

Combining Node Attributes and Elements

What if we want one or a few values to be created as node attributes, thereby resulting in a combination of node attributes and elements? We can create node attribute values by simply designating column aliases that use the ‘@’ symbol:

The resulting XML:

1585-1-05fc0f0c-049c-40c1-b96c-2dd9de947

Including XML Columns

What happens if we add the XML field (Demographics) to our SELECT clause?

1585-1-2460a973-7412-4eed-9ade-570e92b7c

We see that an existing XML field is created as a nested node element. Note that the XML namespace data is included in the nested node.

Shredding XML

‘Shredding’ XML data is another common request. To ‘shred’ means to strip the actual data away from the markup tags, and organize it into a relational format. For example, shredding is what happens when an XML document is imported into a table, when each node value is mapped to a specific field in the table. A popular method to use for this is to use the OPENXML() function, but XQuery methods can also be engaged to perform the same tasks. OPENXML() was available to use for shredding before the SQL Server XQuery methods were introduced, and is somewhat faster for larger data operations. However, it is decidedly more complex to use, and is more memory intensive. Also, OPENXML() cannot take advantage of XML indexes as XQuery methods can.

We’ve received another request: pull data from some of the nodes (Occupation, Education, HomeOwnerFlag, NumberCarsOwned) that are contained in the Person.Person table’s Demographics XML column for BusinessEntityID 15291, and display it along with other non-XML field values (FirstName, MiddleName, LastName) from the table.

The Person.Person record for BusinessEntityID 15291, and their expanded Demographics XML instance are shown below:

1585-1-4cc25090-8bb1-4318-a1ae-8eaf7c89a

1585-1-04042007-3f5c-4266-bb35-2aa6983d5

The XQuery value() method is an easy way to extract values from XML data while preserving the data types:

1585-1-7d946213-f2ea-4357-aeae-f6d4dd49d

XML Namespaces

While this returns the shredded result that we want, the repetitive namespace declarations expand the size of our query – since we are returning four XML node values, we have to declare the namespace four times. Declaring the namespace is necessary because the Demographics XML structure uses typed XML – its XML data is associated with an XML schema. However, we can use a WITH XML NAMESPACES clause to declare the XML namespace instead – this lets us to declare the namespace only once for the entire code block:

XQuery nodes() Method

The XQuery nodes() method is another option that allows us to specify a particular node set in which to look for the desired child nodes:

We’ve used the nodes() method to drill down (one level) to the location of the ‘IndividualSurvey‘ node, and then returned the actual values via the XQuery value() method. We used CROSS APPLY to join the node set back to the table. A CROSS APPLY would not have been necessary if we were shredding an XML variable, instead of from an XML column in a table. The nodes() method easily shreds XML data from XML columns as well as from XML variables. The nodes() method requires table and column aliases (T(C)) in order for other XQuery methods (like the value() method) to access the node set that the nodes() method returns. The column and table alias names are irrelevant.

Notice that we’ve used a WITH XML NAMESPACES clause to declare the XML namespace. Declaring the namespace is necessary because the Demographics XML structure uses typed XML – its XML data is associated with an XML schema.

Now that our XML data has been shredded, the results can be stored in a table or combined with other queries.

Nodes() Method Application and Efficiency

Keep in mind that using the nodes() method in simple queries can unnecessarily reduce query efficiency. When run in the same batch, our query that used the nodes() method cost 54% of the batch, where the value()-method-only query cost just 46%:

1585-1-4694e9cb-4e9c-4855-a343-4aaa2670b

In light of this, why use the nodes() method at all? For simple queries, it’s probably better not to use it, although the batch cost or query time differences may be negligible. We’ve shown the use of nodes() in a very basic example, but it can also be used in more complex queries where it is necessary to return subsets of node sets – using nodes() on a nodes() result. It’s also very convenient to use for constructing new XML from existing nodes. Since nodes() works by rendering logical portions of XML instances as node sets, it’s ideal for when query results must be returned in node form.

Combining XML

Another, perhaps more uncommon procedure, would be to merge XML data from different instances. Let’s do just that to demonstrate – we’ll combine all of the store survey data from the Sales.Store table into one XML structure, for SalesPersonID 282. The store survey data is in the Demographics XML column. We also want to include 2 non-XML fields for each store: Name and BusinessEntityID, as node attributes in a parent ‘Store‘ node. To complete the process, we’ll wrap the final XML structure with a ‘StoreSurveys‘ root node.

Let’s get a 5-record sample from the Sales.Store table, for SalesPersonID 282:

1585-1-f2b19492-7397-4ea4-9d15-2b3c2acbe

The store survey XML data (Demographics column) for the Vinyl and Plastic Goods Corporation store looks like this:

1585-1-a1758079-2489-4817-b02e-5918efb77

To collect all of the store survey data for one sales person into one XML instance, we can again use the FOR XML clause. As with our previous use of FOR XML, the AUTO argument will be the first one we try. Remember that the AUTO argument, by default, gathers every non-XML field into an XML structure in node attribute form. Like the PATH argument, it also nests any existing XML column data as element nodes:

1585-1-d09c2718-0053-4293-8773-0dbdae8bd

We used the XQuery query() method to query starting at the root of each XML instance (by using the ‘/‘ in the path expression). It looks like we have the result we want, except that there is no root node surrounding the entire structure. We used the PATH argument in our other FOR XML example, but we cannot use it here, since we are using AUTO to obtain the node attribute values. To build the wrapper node while using the AUTO argument, we’ll use the ROOT argument:

1585-1-d7e15cc7-fcd5-41e8-92e8-5199562d0

The addition of the ROOT argument gives us the top-level wrapper node we wanted, merging all store survey records for the sales person into one XML instance with a root node.

Separating XML

If we were to perform the converse of the previous example, we would need to pull out the individual ‘StoreSurvey‘ node block from the combined store surveys XML instance (shown above), for each store name or ID. Separating out chunks of XML data into logical records essentially involves the same procedures we use when shredding XML, but in this case we will preserve of a portion of the XML structure. To demonstrate, let’s combine the store surveys for SalesPersonID 282 again; this time, however, we will use an XML variable to store the combined store surveys:

At this point, we have the same XML structure as before:

1585-1-3fdaad5a-44e2-499c-b573-c6f6d05ca

Now we will break apart the XML into logical records, again using the XQuery nodes() method:

1585-1-9ccb272c-899f-4b8d-922d-5c93d5416

The new Demographics XML structure for BusinessEntityID 312:

1585-1-3d41f4de-3b55-423d-8498-545df32c3

Note that we did not use CROSS APPLY this time, since the XML data was in a variable instead of an XML column in a table. Also, we used the ‘.‘, a self::node() abbreviation, in the query() method’s path expression – to pull out the store survey data. This indicates that the XML portion should be extracted from the nodes() method’s node set (the Store node), whereas a ‘/‘ root path expression would have caused the XML to be extracted from the root node (thus returning the entire XML structure). Also, notice that we used the parent::node() abbreviation ‘..‘ in order to reference attribute values ‘Name‘ and ‘ID‘ that are in the node above the ‘StoreSurvey‘ node (the parent ‘Store‘ node). A variation on the script that does not require a namespace declaration could be written as follows:

The above script uses the Child axis to drill down one level further than the nodes() method’s node set, thereby eliminating the need to reference the ‘StoreSurvey‘ node directly.

Summary

We’ve taken a look at some ways to coerce XML data to fit specific needs. We handled one common request, creating XML instances from existing tables, by using the FOR XML clause – applying appropriate arguments to design the XML structure to fit specific aesthetic requirements. We demonstrated another very common procedure, shredding XML data, by employing the XQuery nodes() method. We saw that multiple XML fragments and instances can also be merged into a single instance by using the FOR XML tools. We then reversed that operation by splitting the XML instance into logical records of XML data.

Manipulating XML data to fit your needs may take creativity and some experimentation with new tools. We’ve worked out solutions for a few basic problems, but there is more to learn. The techniques that we have introduced here should help to get you started.

Tags: , , , , , , ,

  • 207878 views

  • Rate
    [Total: 12    Average: 4.4/5]
  • Anonymous

    Creating XML and writing to the file system
    Excellent article. I wish I’d found it six-odd months ago when I was starting on the current project.

    We need to communicate with an external system using XML messages stored on the file system. I eventually managed to work out how to generate the XML I needed but was completely defeated by the problem of how to get this data out into a file. All the mechanisms I could find (I’m still pretty much a T-SQL newbie now, I was a complete virgin back then) seemed to require a text variable type that was limited to 8000 characters or so, and XML can outgrow that very easily if you turn your back.

    The only mechanism I could find to get data from a T-SQL variable of type XML out onto the file system was to use an SSIS package, feed the variable to an XSLT transform that did practically nothing, then and specify output to a file. Unfortunately it took me too long to work this out so the project went ahead using "arms length" methods.

    Is there a better way?

  • Anonymous

    Creating XML and writing to the file system
    Excellent article. I wish I’d found it six-odd months ago when I was starting on the current project.

    We need to communicate with an external system using XML messages stored on the file system. I eventually managed to work out how to generate the XML I needed but was completely defeated by the problem of how to get this data out into a file. All the mechanisms I could find (I’m still pretty much a T-SQL newbie now, I was a complete virgin back then) seemed to require a text variable type that was limited to 8000 characters or so, and XML can outgrow that very easily if you turn your back.

    The only mechanism I could find to get data from a T-SQL variable of type XML out onto the file system was to use an SSIS package, feed the variable to an XSLT transform that did practically nothing, then and specify output to a file. Unfortunately it took me too long to work this out so the project went ahead using "arms length" methods.

    Is there a better way?

  • delcons

    RE:Creating XML and writing to the file system
    As much as I love doing everything I can via SQL Server, this sounds to me like it might be more of a job for a .NET service. C# has awesome SQL Server integration and tools, and I’m sure it would be fairly simple to write something that could do the job for you…

  • delcons

    RE:Creating XML and writing to the file system
    As much as I love doing everything I can via SQL Server, this sounds to me like it might be more of a job for a .NET service. C# has awesome SQL Server integration and tools, and I’m sure it would be fairly simple to write something that could do the job for you…

  • ken ambrose

    Returning resultset from XML when structure is not known at compile time
    Hi,
    I have stored a "flat" resultset as XML in an XML column and now I need to query the XML and return the original resultset.

    At runtime when extracting the XML back into resultset, I don’t have access to the original resultset’s structure, only the XML representation.

    Is there a simple syntax to query an unknown XML structure from an XML columns and return a resultset?

    In Sql, it is "select *". Seems like it should be fairly simple when XML is the source, but I have not yet found it to be simple.

  • delcons

    RE:Returning resultset from XML when structure is not known at compile time
    Unfortunately, it’s not that simple to shred XML data…if you think about it for a minute, it’s easy to see why — XML data can hold RELATIONAL data for multiple tables…so it’s not really comparable to a SELECT * from a SQL table. You have to know at least some of the node names if you want to shred it out, like this (the XML column being "XML_DETAILS"):

    SELECT XML_DETAILS.value(‘(/Employee/FirstName)[1]’,’varchar(20)’) as FirstName,
    XML_DETAILS.value(‘(/Employee/LastName)[1]’,’varchar(20)’) as LastName,
    XML_DETAILS.value(‘(/Employee/@ssn)[1]’,’int’) as SSN
    FROM EMPLOYEE_TBL

  • ken ambrose

    RE:Returning resultset from XML when structure is not known at compile time
    Thanks for the reply delcons.

    I hadn’t thought about the relational nesting problem, but I can see your point.

    I did find about 300 lines of code on sql server central dot com that is a very recent (as of 10/2012) attempt to make such an XML shredder. It seems to work great but I don’t dare deploy it to production because I have no clue how or why the damn thing works!

    As an alternative, we’ve decided to generate the query syntax to shred the XML dynamically, at the same time we convert the recordset to XML, and save both together in the record. Because of course at that time we do have access to the original table structure, so we know what the XML structure is. But its kind of a new concept for me- saving the data structure along with the data in every record. I thought that was what XML was supposed to prevent the need for!

  • Jeff Moden

    Was it mine?
    @Ken Ambrose,

    Just curious, Ken… Was the 300 line auto-xml shredder the one I posted?

  • Jeff Moden

    Well done!
    I was looking for something else having to do with XML and ran into this fine article. Thought I’d stop long enough to read it and take my hat off to you for such a fine article. Well done, Seth!

  • delcons

    RE:Well done!
    Thank you Jeff 🙂

  • Andrea S.

    Thanks + Column naming question
    Thank you for such a clear concise and practical introduction to using xml in Sql Server, this post made me wonder whether there is a better method than this one (1) to explicitly name the column generated by the third sql script you posted.
    Thanks in advance and keep up the good work 🙂
    (1)
    select (
    — create XML structure using FOR XML AUTO
    SELECT BusinessEntityID,
    PersonType,
    Title,
    FirstName,
    MiddleName,
    LastName,
    Suffix
    FROM Person.Person
    WHERE BusinessEntityID = 10001
    FOR XML AUTO, ELEMENTS
    GO
    ) as NamedColumn

  • umairaslam22

    Get XML From SQL
    Hey nice one . I wrote a simple method to get XML from SQL can read write XML by SQL

    http://umairaslam.com/index.php/sql-to-xml/

  • doupanpan

    Great article
    This one combines the one you wrote at mssqltips are great
    http://www.mssqltips.com/sqlservertip/2889/basic-sql-server-xml-querying/#comments

    I feel like after reading these two articles, I am ready to do anything with XML in SQL Server, thank you Seth.

    BTW, I also recommend people to read this
    https://www.simple-talk.com/sql/learn-sql-server/using-the-for-xml-clause-to-return-query-results-as-xml/
    This is also great if people want to more a bit more about XML.

  • XMLLearner

    How to shred children nodes?
    This is a great article and I learned a lot from you. Thanks for sharing. I have a challenge shredding child nodes and I hope you can shred some light. I added a FamilyInformation and Education child nodes from your same XML. How can I shred the child nodes elements like TotalChildren, MarriedDate, Highschool, MiddleSchool information from XQuery?

    <IndividualSurvey xmlns="http://Schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"&gt;
    <TotalPurchaserYTD>4431.4</TotalPurchaserYTD>
    <DateFirstPurchase>2003-04-102</DateFirstPurchase>
    <BirthDate>1956-01-20</BirthDate>
    <MaritalStatus>S<MaritalStatus>
    <YearlyIncome>50000-75000</YearlyIncome>
    <Gender>F</Gender>
    <FamilyInformation>
    <TotalChildren>2</TotalChildren>
    <NumberChildrenAtHome>0</NumberChildrenAtHome>
    <MarriedDate>1980-12-25</MarriedDate>
    </FamilyInformation>
    <Education>
    <HighSchool>Standley High School</HighSchool>
    <MiddleSchool>St. John Middle School</MiddleSchool>
    <Elementary>Davidson Elementary</Elementary>
    </Education>
    <Occupation>Skilled Manual</Occupation>
    <HomeOwnerFlag>1</HomeOwnerFlag>
    <NumberCarsOwned>3</NumberCarsOwned>
    <CommuteDistance>5-10 Miles</CommuteDistance>
    </IndividualSurvey>

    Please advise.

  • XMLLearner

    How to shred children nodes?
    This is a great article and I learned a lot from you. Thanks for sharing. I have a challenge shredding child nodes and I hope you can shred some light. I added a FamilyInformation and Education child nodes from your same XML. How can I shred the child nodes elements like TotalChildren, MarriedDate, Highschool, MiddleSchool information from XQuery?

    <IndividualSurvey xmlns="http://Schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"&gt;
    <TotalPurchaserYTD>4431.4</TotalPurchaserYTD>
    <DateFirstPurchase>2003-04-102</DateFirstPurchase>
    <BirthDate>1956-01-20</BirthDate>
    <MaritalStatus>S<MaritalStatus>
    <YearlyIncome>50000-75000</YearlyIncome>
    <Gender>F</Gender>
    <FamilyInformation>
    <TotalChildren>2</TotalChildren>
    <NumberChildrenAtHome>0</NumberChildrenAtHome>
    <MarriedDate>1980-12-25</MarriedDate>
    </FamilyInformation>
    <Education>
    <HighSchool>Standley High School</HighSchool>
    <MiddleSchool>St. John Middle School</MiddleSchool>
    <Elementary>Davidson Elementary</Elementary>
    </Education>
    <Occupation>Skilled Manual</Occupation>
    <HomeOwnerFlag>1</HomeOwnerFlag>
    <NumberCarsOwned>3</NumberCarsOwned>
    <CommuteDistance>5-10 Miles</CommuteDistance>
    </IndividualSurvey>

    Please advise.

  • Syed

    Thanks for great article.
    I have odd requirement in which I need to have both Attribute as well Element for a node
    for e.g.

    Carolyn
    Alonso

    Is it possible to generate ?