Click here to monitor SSC
  • Av rating:
  • Total votes: 57
  • Total comments: 4
Robert Sheldon

Converting String Data to XML and XML to String Data

01 February 2012

We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In SQL Server, XML variables and columns are instead tokenised to allow rapid access to the data within. This is fine, but can cause some odd problems, auch as 'entitization'. What, also, do you do if you need to preserve the formatting? As usual Rob Sheldon comes to our aid.

When you’re working with XML data, you might find that you want to convert a value from the XML data type to another type, or from another type to the XML type. SQL Server lets you do both, to a limited degree. You can convert XML data to any of the string or binary types, and you can convert any of the string or binary types to XML. The process you follow if fairly straightforward in either case. Mostly, you need to know when SQL Server will do implicit conversions or when you must explicitly cast data into another type. And you should be aware of a couple other subtleties when converting to or from XML data.

NOTE: This article is the fourth in a series about working with XML data in SQL Server. The first three articles cover the XML data type (“Working with the XML Data Type in SQL Server”), its methods (“The XML Methods in SQL Server”), and incorporating XML into database objects (“Incorporating XML into Your Database Objects”).

Converting String Data to XML

In SQL Server, you can convert data configured with any of the character or binary data types—such as CHAR, VARCHAR, and VARBINARY—to the XML data type. You can use the CAST() or CONVERT() function to explicitly cast the data to a different type, or you can let SQL Server implicitly convert the data. For instance, SQL Server will automatically convert a string value stored with the NVARCHAR data type into an XML value, as shown in the following example:

DECLARE @string NVARCHAR(MAX);
DECLARE @xml XML;
SET @string =
  
'<bookstore><book>Candide</book></bookstore>';
SET @xml = @string;
SELECT @xml;

In this set of statements, I first declare the @string variable with the NVARCHAR(MAX) data type and then the @xml variable with the XML data type. I assign a string value—an XML fragment—to the @string data type, then set the value of @xml to equal @string. Because SQL Server can implicitly convert an NVARCHAR value to an XML value, the assignment is very straightforward, which I confirm by querying the @xml variable. As expected, the SELECT statement returns the following results:

<bookstore><book>Candide</book></bookstore>

The result set contains the XML fragment as it was originally assigned to the @string variable. The same holds true for other character data types, in terms of performing implicit conversions. For instance, in the following example, I replace the NVARCHAR(MAX) data type with VARCHAR(100):

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
  
'<bookstore><book>Candide</book></bookstore>';
SET @xml = @string;
SELECT @xml;

This example is identical to the preceding one, except for switching the character data types. As a result, the SELECT statement returns the same XML fragment as before:

<bookstore><book>Candide</book></bookstore>

Again, SQL Server has implicitly converted the string data to XML. However, you can explicitly convert that data if you want. In the following example, I use the CAST() function to convert the @string value:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
  
'<bookstore><book>Candide</book></bookstore>';
SET @xml = CAST(@string AS XML);
SELECT @xml;

When you specify the CAST() function, you must pass in an expression that references the data to be converted, in this case, the @string variable, followed by the AS keyword and the name of the target data type, XML. All this is enclosed in parentheses, as I’ve done here. I then assign the value returned by the CAST() function to the @xml variable. As to be expected, the SELECT statement returns the same results as the previous examples:

<bookstore><book>Candide</book></bookstore>

You can just as easily use the CONVERT() function to achieve the same results. The main difference is how you structure the arguments that you pass into the function. For CONVERT(), you must first specify the target data type (XML), followed by the expression that references the source data (@string), as shown in the following example:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
  
'<bookstore><book>Candide</book></bookstore>';
SET @xml = CONVERT(XML, @string);
SELECT @xml;

Notice that I follow the XML data type argument with a comma, then I specify the @string variable. Once again, the SELECT statement returns the XML fragment as it is assigned to the @string variable:

<bookstore><book>Candide</book></bookstore>

You might be wondering why you would explicitly cast string or binary data to the XML type if SQL Server handles such conversions implicitly. Let’s look at the CAST() function first. One reason you might want to use CAST()is if you plan to run your SQL script against a database management system in addition to SQL Server and that system doesn’t support implicit conversions. Because the CAST() function conforms to ANSI specifications, you can use it with any database systems that conform to those standards.

That’s not the case with the CONVERT() function, which is specific to Transact-SQL in SQL Server. If you want to run your script against another database system, you have to modify your code to conform to that system’s specifications. However, if you’re running your script only against SQL Server, you can take advantage of the CONVERT() function’s support for additional options, which let you better refine the conversion process.

For instance, in the following example I use tabs between the parent and child elements in the XML fragment:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
  
'<bookstore> <book>Candide</book> </bookstore>';
SET @xml = CONVERT(XML, @string);
SELECT @xml;

The tabs serve only to add white space between the XML elements and do not affect the elements themselves. Then, as I do in the previous example, I use the CONVERT() function to cast the string as XML data. However, when I run the SELECT statement, the data no longer includes the white space, as shown in the following results:

<bookstore><book>Candide</book></bookstore>

But I can preserve the white space in the original string by adding a third argument to the CONVERT() function, as shown in the following example:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
  
'<bookstore> <book>Candide</book> </bookstore>';
SET @xml = CONVERT(XML, @string, 1);
SELECT @xml;

Notice that the function’s third argument is 1, which tells SQL Server to preserve the white space in the source data during the conversion process. Now the SELECT statement returns the following results:

<bookstore><book>Candide</book></bookstore>

This time the tabs are intact. I could have used spaces instead and received similar results. Now let’s look at what happens when we add linefeeds to the XML fragment. In the following example, I modify the string to include an additional book along with linefeeds and tabs:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>'
;
SET @xml = CONVERT(XML, @string, 1);
SELECT @xml;

Because I pass the third argument (1) into the CONVERT() function, as I did in the previous example, the conversion process preserves the white space and linefeeds, as shown in the results returned by the SELECT statement:

<bookstore>
   <book>Candide</book>
   <book>Pride and Prejudice</book>
</bookstore>

Another thing worth noting about converting data to the XML data type is that once you’ve converted the data, you can use the data type’s methods to retrieve data. For example, I modified the SELECT statement in the previous example to include the query() method:

DECLARE @string VARCHAR(100);
DECLARE @xml XML;
SET @string =
'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>'
;
SET @xml = CONVERT(XML, @string, 1);
SELECT @xml.query('/bookstore/book[2]');

As you can see, I pass in the second instance of the child element (/bookstore/book[2]) as an argument to the query() method. The SELECT statement now returns the following results.

<book>Pride and Prejudice</book>

You can, of course, use the query() method to perform more complex queries, and you can use other methods. The key is to save the string data to the XML type and then call the method.

Converting XML to String Data

Although SQL Server implicitly converts string and binary data to the XML data type, it doesn’t work the other way. All conversions from the XML type to string and binary types must be done explicitly using the CAST() or CONVERT() function. If you try to perform an implicit conversion, your statement will fail. For instance, the following example tries to convert the XML fragment to VARCHAR(100):

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml =
  
'<bookstore><book>Candide</book></bookstore>';
SET @string = @xml;
SELECT @string;

After I declare the @xml and @string variables, I assign the XML fragment to the @xml variable. I then try to set the @string variable to equal the @xml variable, but when I try to run these statements, SQL Server returns the following error:

Implicit conversion from data type xml to varchar is not allowed. Use the CONVERT function to run this query.

The problem is easy enough to fix. In the following example, I include the CAST() function to explicitly convert the XML value to VARCHAR(100):

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml =
  
'<bookstore><book>Candide</book></bookstore>';
SET @string = CAST(@xml AS VARCHAR(100));
SELECT @string;

Now the XML value is converted into a string with no problem. When I retrieve the value of @string, the SELECT statement returns the XML fragment, as shown in the following results:

<bookstore><book>Candide</book></bookstore>

I could have just as easily used the CONVERT() function to cast the XML value to VARCHAR:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml =
  
'<bookstore><book>Candide</book></bookstore>';
SET @string = CONVERT(VARCHAR(100), @xml);
SELECT @string;

As to be expected, the conversion occurs without a hitch and the XML fragment is assigned to the @string variable, which the SELECT statement confirms. Now suppose you insert linefeeds and tabs into your XML fragment, as I do in the following example:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml =
'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>'
;
SET @string = CONVERT(VARCHAR(100), @xml, 1);
SELECT @string;

Because the XML fragment contains these new elements, you might expect that you can simply add the third argument to the CONVERT() function to preserve the tabs and linefeeds. However, although I take this approach, the value saved to the @string variable does not preserve these elements, as shown in the results returned by the SELECT statement:

<bookstore><book>Candide</book><book>Pride and Prejudice</book></bookstore>

The problem is not with how I convert the @xml value, but with the way I assign the XML fragment to that variable. Notice that I assign the value simply by setting @xml to equal the XML fragment, enclosed in single quotes. What is essentially happening here is that SQL Server is implicitly converting a string value to the XML type, which means that the tabs and linefeeds are not being preserved during that assignment. The way to get around this is to use the CONVERT() function to explicitly cast the string to the XML type, as I do in the following example:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>'
, 1);
SET @string = CONVERT(VARCHAR(100), @xml, 1);
SELECT @string;

As you can see, I include a third argument in the CONVERT() function that specifies these elements be preserved. The SELECT statement returns the following results:

<bookstore>
   <book>Candide</book>
   <book>Pride and Prejudice</book>
</bookstore>

Now the string includes the tabs and linefeeds as they exist in the original XML fragment. Let’s take a closer look at this process so you can better understand what’s happening behind the scenes when you convert XML data.

XML Entitization

As the last example demonstrates, you can use the CONVERT() function to preserve characters that the XML parser normally ignores, such as spaces and tabs between elements. What this points to is that SQL Server handles certain components of an XML fragment—whether converting to or from the XML type—differently from the basic elements, attributes, and their values.

To better understand what’s happening, take a look at another example. The following set of Transact-SQL statements are similar to the last example, except that I’ve included a second SELECT statement to retrieve the @xml value and I’ve modified the second instance of the CONVERT() function so it doesn’t include the third argument (1):

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>'
, 1);
SELECT @xml;
SET @string = CONVERT(VARCHAR(100), @xml);
SELECT @string;

As you might expect, the new SELECT statement returns the XML fragment with the linefeeds and tabs in place:

<bookstore>
   <book>Candide</book>
   <book>Pride and Prejudice</book>
</bookstore>

However, the second SELECT statement returns several unexpected values:

<bookstore>
&#x09;<book>Candide</book>
&#x09;<book>Pride and Prejudice</book>&#x0A;</bookstore>

NOTE: If I were to try to run this example without converting the original XML fragment, or converting the fragment without specifying the third argument in the CONVERT() function, the tabs and linefeeds would not be preserved, and the two SELECT statements would return the string without those tabs and linefeeds.

When the XML parser stores characters such as tabs and linefeeds within an XML document or fragment, it must do so in a way that preserves them throughout the parser’s normalization process, allowing these elements to be maintained whether being stored or retrieved or whether being converted to or from XML. Tabs and linefeeds are just two of the special characters that need to be preserved in this way, as the above example demonstrates, with tabs being saved as &#x09; and linefeeds saved as &#x0A;. When you retrieve the XML value directly, these special characters are automatically displayed in a readable format.

The process of preserving certain characters with these special symbols is known as entitization. Unfortunately, the XML parser is not always consistent in entitizing characters, as demonstrated by the fact that only one linefeed is stored as &#x0A;. Another issue is the way the parser handles special characters that appear in an element or attribute value. In such cases, SQL Server simply returns an error, rather than trying to entitize them. For instance, in the following example I replace the word “and” in the second book title with an ampersand (&):

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>'
, 1);
SELECT @xml
SET @string = CONVERT(VARCHAR(100), @xml);
SELECT @string;

When I run this statement, I receive the following error:

XML parsing: line 3, character 15, illegal name character

The XML parser doesn’t like characters such as ampersands, left brackets (<), and right brackets (>) within element and attribute values. To store XML that contains these characters, you must manually entitize them by replacing the character with their symbols. In this case, I replace the ampersand with &amp;:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride &amp; Prejudice</book>
</bookstore>'
, 1);
SELECT @xml;
SET @string = CONVERT(VARCHAR(100), @xml);
SELECT @string;

The parser can now parse the XML fragment. In doing so, it preserves the entitized ampersand, as shown in the results returned by the first SELECT statement:

<bookstore>
   <book>Candide</book>
   <book>Pride &amp; Prejudice</book>
</bookstore>

As you can see, the second book title contains the entitized ampersand. Now SQL Server has no problem handling the element’s value when that value is stored, retrieved, or converted. The entitized ampersand is also preserved when you convert the XML value to a string, which the second SELECT statement confirms:

<bookstore>
&#x09;<book>Candide</book>
&#x09;<book>Pride &amp; Prejudice</book>&#x0A;</bookstore>

Of course, chances are pretty good you won’t want your string value to show the entitized characters. As before, you can try to get rid of them by adding the third argument to the second CONVERT() function, as shown in the following Transact-SQL:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride &amp; Prejudice</book>
</bookstore>'
, 1);
SELECT @xml;
SET @string = CONVERT(VARCHAR(100), @xml, 1);
SELECT @string;

The second SELECT statement now returns the following results:

<bookstore>
   <book>Candide</book>
   <book>Pride &amp; Prejudice</book>
</bookstore>

Almost there. The entitized tab and linefeed characters are returned as regular characters, but the ampersand within the element value is still entitized. To address this issue, you have to specifically remove it from your string. For example, in the following Transact-SQL I use the REPLACE() function to replace the &amp; value with an ampersand:

DECLARE @xml XML;
DECLARE @string VARCHAR(100);
SET @xml = CONVERT(XML,
'<bookstore>
<book>Candide</book>
<book>Pride &amp; Prejudice</book>
</bookstore>'
, 1);
SELECT @xml;
SET @string =
  
REPLACE(CONVERT(VARCHAR(100), @xml, 1), '&amp;', '&');
SELECT @string;

For the first argument of the REPLACE() function, I pass in the converted @xml value, then specify the value to be replaced (&amp;) and the new value (&). Now the second SELECT statement returns the value we want:

<bookstore>
   <book>Candide</book>
   <book>Pride & Prejudice</book>
</bookstore>

As the results show, the string value no longer contains any entitized characters. As you work with XML data, you’ll get a better feel for when to manually entitize and de-entitize data. For more information about the entitization process in SQL Server, see the topic “Serialization of XML Data” in SQL Server Books Online.

XML Data Conversion

For the most part, converting data from the XML data type to another type or converting it from another type to the XML type is a fairly painless process. Not surprisingly, there are a few gotchyas along the way, entitization being one of them. For more details about converting XML data, see the topics “Generating XML Instances” and “CAST and CONVERT (Transact-SQL)” in SQL Server Books Online. Also worth checking out is how to incorporate the FOR XML clause in a SELECT statement to return results as XML. You can find details about that in my Simple-Talk article “Using the FOR XML Clause to Return Query Results as XML.” As you can see, SQL Server provides a number of ways to work with XML data. This article, along with the first three in the series, should have provided you with a good overview of the various considerations to take into account when working with XML.

Robert Sheldon

Author profile:

After being dropped 35 feet from a helicopter and spending the next year recovering, Robert Sheldon left the Colorado Rockies and emergency rescue work to pursue safer and less painful interests—thus his entry into the world of technology. He is now a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. He has also written news stories, feature articles, restaurant reviews, legal summaries, and the novel 'Dancing the River Lightly'. You can find more information at http://www.rhsheldon.com.

Search for other articles by Robert Sheldon

Rate this article:   Avg rating: from a total of 57 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: XML and string!!
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 07, 2012 at 12:48 AM
Message: Many thanks for this summary and All the Best for your endeavours .....

Subject: XML Conversion services
Posted by: kalpanaceo (not signed in)
Posted on: Tuesday, February 21, 2012 at 1:52 AM
Message: I like this post. this post very important. we can get lot of information thought this post and this site. thanks for giving these information, good luck...!!!!

Informatics Outsourcing - XML Conversion services

Subject: Converting XML via FOR XML to String with preserved entity formatting question
Posted by: bkt (view profile)
Posted on: Wednesday, January 23, 2013 at 5:11 AM
Message: Thanks Robert for a great article!

I'm trying to use the above in conjunction with the FOR XML (path, elements) to then create a string which retains the line/tab and line feeds but can't seem to get it to work.

My example is:

DECLARE @RetXML XML
DECLARE @ConvertXML XML
DECLARE @RetStr nvarchar(max)

Select @RetXML = (

SELECT Student.Id,
Student.First_Name,
Student.Last_Name,

(
SELECT
SA.Address1 As Address1,
SA.Address2 As Address2,
SA.Address3 As Address3,
SA.City As TownCity
FROM Students_Addresses AS SA
WHERE SA.Student_Id = Student.Id

FOR XML PATH('Student_Addresses_Local'), TYPE)

FROM Students AS Student
WHERE Student.Id = 54000

FOR XML PATH ('Students'), ELEMENTS )

SET @ConvertXML = CONVERT(XML,@RetXML,1)

SET @RetStr = CONVERT(nvarchar(max),@ConvertXML,1)

PRINT @RetStr


but the @RetStr is unformatted:

<Students><Id>54000</Id><First_Name>MyFirstName</First_Name><Last_Name>MyLastName</Last_Name><Student_Addresses_Local><Address1>123 My Street</Address1></Student_Addresses_Local></Students>

Is there any way to retain the text formatting in this case?

Thanks in advance for any help :)

Brett






Subject: Entitization of ampersand
Posted by: binki (view profile)
Posted on: Wednesday, February 26, 2014 at 11:45 AM
Message: You say “the ampersand within the element value is still entitized. To address this issue, you have to specifically remove it from your string. For example, in the following Transact-SQL I use the REPLACE() function to replace the &amp; value with an ampersand.”. Why would anyone ever want to do this? The correct way to extract a string from an XML document is *not* to first make the document invalid. Simply, if your data looks like “<book>Pride & Prejudice</book>”, it is not well-formed and, thus, is NOT even XML. You should just leave it looking like “<book>Pride &amp; Prejudice</book>” since that actually means something in XML.

If you want to extract a string from an XML, use an XML parser in your application layer or, in TSQL, use something like the following which lets TSQL’s XML parser unserialize the node’s value for you: “SELECT CAST('<bookstore><book>Candide</book><book>Pride &amp; Prejudice</book></bookstore>' AS XML).value('(/bookstore/book)[2]', 'VARCHAR(MAX)')”. Never try to programmatically parse, regex, or rewrite XML yourself *unless* if you are actually the author of the XML parser. If TSQL’s XML parser is broken, everyone is in trouble. If you go out of your way to handle serialized XML directly in your own code, you’re only going to end up getting yourself in trouble.

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

Searching for Strings in SQL Server Databases
 Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.