21 February 2006

Beginning SQL Server 2005 XML Programming

XML has been used to represent semi-structured (as well as unstructured) data such as documents and emails. If information in these models has to be queried, then XML is probably the simplest way to represent such information.

XML has been widely adopted as a platform-independent mechanism for representing data, and is also commonly used to exchange data between disparate and loosely-coupled systems, such as B2B applications and workflow solutions. More recently, XML has been used to represent semi-structured (as well as unstructured) data such as documents and emails. If information in these models has to be queried, then XML is probably the simplest way to represent such information.

For example, if documents are represented in XML, it is very easy to write a DOM or XPATH query to extract the contents of a section titled, for example, “Recipe”. Many applications also require other pieces of information that typically reside in a database. For example, consider an application that allows a user to personalize the look and feel of an application. It is not uncommon to store the preferences of the user against the user record in the database. XML lends itself well to storing such unstructured information to cope with an ever expanding list of user preferences.

When we store this kind of semi-structured or unstructured information as XML in the database, it is not always feasible to extract the content to the application tier and then process the XML using XML parsers. If the database platform is able to provide native XML processing capabilities then that is a massive advantage. In that way we can also take advantage of other database capabilities such as query optimization, indexing, backups and, most importantly of all, interoperability with relational data.

Fortunately, SQL Server 2005 now natively supports an XML data type and enterprise applications that rely heavily on XML processing can take advantage of these native capabilities. This article will provide:

  • A brief overview of the basic techniques for getting XML into and out of SQL Server. This section provides basic code examples and descriptions, together with links for further reading.
  • An introduction to the new XML data type, how to define XML columns and variables and load them with data
  • How to define typed XML columns using XML schema

XML Support in SQL Server 2000

SQL Server 2000 provided both server-side and client-side XML support and it is useful to briefly review these capabilities in order to fully appreciate how SQL Server 2005 extends and improves them. We will focus on the server-side capabilities in this article, since XML management in the database is a relatively new phenomenon. For further information, MSDN provides a very useful Survey of SQL Server 2000 XML Features article.

SQL Server 2000 Server-Side XML

The major elements of server-side XML support in SQL Server 2000 are as follows:

  1. The creation of XML fragments from relational data using the FOR XML extension to the SELECT statement
  2. The ability to shred XML data, using the OPENXML function, so that it can be imported into relational tables.
  3. Storing XML data natively in the database.

FOR XML

The FOR XML extension allows the creation of XML from relational data. It supports several “modifiers” that dictate the shape of the resulting XML fragment. Following is the full syntax of the FOR XML clause:

The following table explains each of the available options:

Clause

Description

XML

When specified, the results of a query are returned as an XML document. One of the three options, from RAW, AUTO and EXPLICIT, must be specified

RAW

Transforms each row in the result set into an XML element with a generic identifier <row /> as the element tag

AUTO

Returns query results in a simple, nested XML tree. For each table in the FROM clause, for which there is at least one column listed in the SELECT clause, an XML element is created. The columns are mapped to the appropriate element attributes

EXPLICIT

Specifies the shape of the resulting XML tree explicitly. Using this mode, queries must be written in a particular way so that additional information about the nesting is specified

XMLDATA

Returns the schema, but does not add the root element to the result

ELEMENTS

Specifies that the columns are returned as sub-elements to the table element. Otherwise, they are mapped as attributes

BINARY BASE64

Specifies that the query returns the binary data in binary base64-encoded format

The following query uses AUTO mode and returns as an XML fragment information about a particular author. Note that we are using the pubs sample database that is shipped with SQL Server2000.

This query returns the following output:

228-FORXMLOutput.gif

OPENXML

The OPENXML function provides a rowset view over an XML document. Following is the full syntax of the OPENXML command:

Basically, the input to OPENXML is a “handle” to an XML document that is generated by making a call to the sp_xml_preparedocument procedure, and then a “row pattern” which is an XPATH used to identify the nodes in the XML document that are to be processed as rows. We can also specify the mapping information between the XML document and the relational rowset that is generated. The following example shows a simple usage of OPENXML:

The nodes are processed based on the rowset schema declaration provided after the WITH clause. The “2” in the query represents that we want to use element-centric mapping. The output of the above command is as follows:

228-OPENXMLOutput.gif

You can refer to the full syntax of OPENXML, along with other complex examples, in the OPENXML section of SQL Server 2000 Books Online.

Storing XML in the Database

SQL Server 2000 also allows you to store XML documents and fragments in the database. However, since SQL Server 2000 does not support an XML data type, the XML needs to be stored in a TEXT, NTEXT or IMAGE column. When there is a need to process this XML, applications have to retrieve this XML into the application tier and then process the same using the MSXML APIs.


SQL Server 2000 Client Side XML

Client-side XML support in SQL Server 2000 comes in the form of SQLXML. Following is a brief summary of the main SQLXML technologies:

  • XML Views, which provide a bidirectional mapping between XML schemas and underlying relation tables. Essentially, mapping annotations are added to the XML Schema that defines your particular business object. This produces an an XML View that allows you to present the underlying data in a hierarchical, semi-structured format. A nice example of this is given in the previously-cited Survey of SQL Server 2000 XML Features article. XML views also support querying using XPATH and packaging the result as XML.
  • Creation of XML Templates that allow creation of dynamic sections in XML documents. FOR XML queries can be embedded within the XML document, and/or XPATH expressions over mapping queries. When the template is executed, the query block is replaced with the result of the query.

In SQL Server 2000, there are two ways to access SQLXML functionality:

  1. Via a SQLXMLOLEDB Provider that exposes SQLXML functionality through ADO
  2. Via HTTP access through an ISAPI filter. Using a configuration tool, you can set up a web site to receive incoming requests to execute XML templates, FOR XML and XPATH statements, and then apply transformations to the results


SQL Server 2005 XML Enhancements

Although SQL Server 2000 provides quite extensive XML support, as described, the major drawback is that it is not native, and also does not provide all of the features that traditional XML processing requires. Some of the limitations are:

  1. No support for XML schemas in the database for validating XML documents
  2. XML is only stored as text and thus cannot be queried using XPATH syntax. The only option is to extract the XML to the application tier using ADO (or other data access APIs) and perform manipulations
  3. Results generating XML cannot be stored in variables inside of procedures and functions. The only option available is to stream it to the client tier using FOR XML
  4. Support only for XPATH 1.0 queries. The poses restrictions on some of the types of queries that can be written

However, in spite of these limitations, the XML features of SQL Server 2000 are extensively used and there are many enterprise applications that actively depend on it.

SQL Server 2005 builds removes these limitations and offers XML as a full-fledged native XML data type along with querying capabilities. Enhancements have also been made to the FOR XML syntax (see, for example, What’s new in FOR XML in SQL Server 2005). SQLXML 4.0 also brings with it many improvements. In this article, we will focus on the basics of the new XML data type.


The XML Data Type

The XML data type is a built-in data type similar to other built-in types in SQL Server 2005. You can define XML columns, XML variables, XML parameters for procedures and functions, and can also return XML as return values from functions. Internally, the XML data type is stored in a proprietary binary format. The maximum permissible size of the XML content is 2GB.


Defining XML Variables and Columns

To declare a variable of type XML is as simple as this:

DECLARE @xmlDoc XML

In order to create an XML column (using untyped XML – more on this later), we could use the following:


Loading Data into XML Columns and Variables

Having defined an XML variable or column, we can store information in it via the following methods:

  1. Assigning a string to the XML variable / column
  2. Using the output of the SELECT…FOR XML clause
  3. Using bulkload


Simple String Assignment

We can load a string into our @xmlDoc variable as follows:

Note that string content is implicitly converted to the XML type. You can also explicitly convert the string to XML as shown:

In each example, we then select this XML content. If you run this example in SQL Server Management Studio with the “results to grid” option enabled, the XML output will be as shown below:

228-SimpleStringAssignment.gif

Notice that the XML content is hyperlinked and if you click this link, a new window opens and shows the entire XML document. This is a great improvement over SQL Server 2000 Query Analyzer, which would show the results of XML queries as a single line of output and also truncate the results if the result size is greater than 8192 bytes.

We can insert regular string content into the feedXML column of our rssFeeds table using the following method:


Using FOR XML

In the following example, we initialize the @xmlDOC variable using a SELECT…FOR XML statement:

The output of the above command is as follows (only partial results are shown. To see the full results, you need to click on the hyperlink in Management Studio):

228-UsingFORXML.gif

Streaming results from FOR XML statements directly into XML variables is a great improvement over SQL Server 2000 (which did not allow this). Also note that the above example returns an XML fragment and not an XML document. If you need to return an XML document, you can use the new ROOT directive in the FOR XML command as shown:

Executing the above command produces the following output (again, only partial results are shown):

228-FORXMLwithROOT.gif

Bulkloading XML data using the OPENROWSET Function

SQL Server 2005 provides extensions to the OPENROWSET function to bulk load an XML document from the file system. The following example shows how to bulkload our @xmlDoc variable using this technique:

The RSS.XML file that is referred to in the above query is a sample RSS XML that I generated from the MSDN site. To do this, you can query the MSDN RSS site at: http://msdn.microsoft.com/rss.xml and then save results as XML from Internet Explorer.

The output of the above command is as follows:

228-BulkLoadingXML.gif

Similarly, the following code uses the OPENROWSET command to load an XML document into our feedXML column:

The following output is generated:

228-LoadedFeedData.gif

Typing XML

Up to this point, the XML content that we have used is referred to as untyped XML. An untyped XML variable/column is one that is not associated with a schema. If we associate a schema with an XML type, it is then referred to as typed XML. Once an XML variable/column is typed using a schema, any data that is inserted into it is automatically validated against the schema, and SQL Server 2005 raises an error if the data does not conform to the defined schema rules.

It is usually recommended to associate schemas to XML content in SQL Server, since query optimization can use the information and also it provides better operational semantics. For example, we can perform arithmetic operations on numbers, but not strings. If no schema is attached to an XML, then all values are considered to be strings. MSDN provides further information about schemas and their maintenance.

Creating typed XML involves two basic steps: 1) Create an XML schema and 2) Register it in the database.


Creating a Schema

Using tools such as Visual Studio 2005 and XML Spy, creating a schema is pretty simple. The following link contains some excellent tutorials on this topic: Creating Schemas in Visual Studio. Using the tool of your choice, create the following schema definition:

This schema defines an XML document of the form:

Save this schema in a file called ContactsSchema.xsd in the C:\Temp folder.


Registering a Schema in SQL Server 2005

You register a schema in SQL Server 2005 using the CREATE XML SCHEMA COLLECTION command. The following script loads our XML schema into the database.

In the following code, we type an XML variable to the MyContacts schema and then assign both valid and invalid content to the XML (by invalid, we mean that the content does not adhere to the schema):

Note how we have declared the XML variable. The schema reference is specified in parentheses after the variable. We have also used the new TRY…CATCH error handling syntax of SQL Server 2005 to capture the error and display the corresponding message.

When you execute the above snippet, the following output is generated:

228-TypingXML.gif

You should also receive the following error message (I have wrapped the error message to make it readable):

228-TypingXMLError.gif

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 261697 times – thanks for reading.

Tags: , , , ,

  • Rate
    [Total: 326    Average: 3.8/5]
  • Share

Srinivas Sampath works at SCT Software Solutions (http://sungardsct.com) as Head - Technology Labs. Technology Labs is a group focused on evangelizing on various Microsoft Technologies for applicability into the SCT Line of Business. Srinivas is also an MVP in SQL Server for the past 3 years and is a regular speaker at various Microsoft events like TechEd, DevDays and MSDN India. He runs a web site at: http://www32.brinkster.com/srisamp and a blog at: http://blogs.sqlxml.org/srinivassampath. You can reach him at: srisamp@gmail.com.

View all articles by Srinivas Sampath

  • Anonymous

    Broken Images
    Hi, all the images embedded within the article are broken

  • Anonymous

    Broken Images
    yes, all the images embedded within the article are broken

  • Anonymous

    Great Tutorial Srini
    Keep up good work

    Imran

  • Anonymous

    Good Work!
    YES!
    I definitely agree that it’s a GREAT work. But the broken images are really really needs to be fixed.
    Thanks for the stuff.

    Kathy

  • Anonymous

    Great Document
    Great Document – well written, shame about the broken images.

  • Anonymous

    Good Work
    Great work-easy to understand but the broken images needs to be fixed

    Sumaiya

  • Anonymous

    xlm data dtype
    work is good but not very clear

  • Anonymous

    appreciated
    thank you!

  • Anonymous

    Broken images
    They seem OK in firefox

  • Anonymous

    Firefox
    Tried Firefox… No dice on the images. Good article none the less.

  • Bango Ppong

    Error
    Tried..
    what am I doing wrong?

    Msg 4806, Level 16, State 1, Line 2
    SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.

  • Anonymous

    Images
    It would be nice if the images on the web page would appear

    Ya think

  • VIshal

    Greate
    Its a very good article,every one must read it

  • Kevin Wager, Moneta, VA

    Thank You.
    Very nice web site. Thank you for this.

  • Ariel

    Great!
    Just what I needed to start.
    All images look OK

  • Anonymous

    Goodjob
    This article is very nice.I understood the concepts clearly.

  • Abdullah

    good information
    good article and useful informatiom

  • FTO

    Danke
    Vielen Dank

  • Srinivas

    simple and superb explanation
    I got a good understanding of xml impolementation in both sql2000 and 2005.

  • andera

    great examples
    the exemplification is great. this is how we can learn easier.
    Thanks!

  • Rana Guha

    Article out of the top draw !! Keep it up
    clear and simple overview supported with good examples. wonderful Article to read ; helped me a lot…
    Best Wishes

  • Anonymous

    Great work!!
    When is the next version coming out??

  • Anonymous

    gr8 work
    very good article for those who are new to XML concept.
    What’s that broken image issue?????

  • sathish.R Dhanus Tecnlogies

    Good!
    simple & Good.

  • Anonymous

    Where are the rest of the articles?
    Where’s the follow-up article to explain the rest of this stuff?

  • RanjitK@Gmail.com

    Thanks!
    Srinivas, Thanks for sharing this excellent knowing. BTW, is there a way to have the database refer to URL based schema, rather than containing native XSD? Also, Does your next article cover XML views over non-XML (or relational) columns 😉 That’s what I am googling, appreciate any pointers.

  • Anonymous

    query xml type variable that contains data and inline schema.
    Can i query xml variable that contains multiple schema and data. It would be nice to have an example.

  • itizme

    Hi!
    It is really good artice to understand SQL SERVER WORKING with XML.

    Can u give me an example for reading xml data and writing it in the respective table columns?

    e.g.
    <first>
    <dataarea>
    <person>
    <id>1233</id>
    <name>myname</name>
    <address>
    <street>street1</street>
    <country>Country1</country>
    </address>
    </person>

    <person>
    <id>1111</id>
    <name>myname1</name>
    <address>
    <street>street2</street>
    <country>Country2</country>
    </address>
    </person>

    <person>
    <id>3333</id>
    <name>myname3</name>
    <address>
    <street>street3</street>
    <country>Country3</country>
    </address>
    </person>
    </DataArea>

    I need to store the person name,id,street,country in the coulmns of a table.

    Regards,
    Itizme

  • pkn

    hi from pk
    good work >>>>

  • Karthik

    hi …. Answer for that xml statement…..
    DECLARE @xmlDoc NVARCHAR(MAX)
    DECLARE @handle INT
    SET @xmlDoc = N’
    <first>
    <dataarea>
    <person>
    <id>1233</id>
    <name>myname</name>
    <address>
    <street>street1</street>
    <country>Country1</country>
    </address>
    </person>
    <person>
    <id>1111</id>
    <name>myname1</name>
    <address>
    <street>street2</street>
    <country>Country2</country>
    </address>
    </person>
    <person>
    <id>3333</id>
    <name>myname3</name>
    <address>
    <street>street3</street>
    <country>Country3</country>
    </address>
    </person>
    </dataarea>
    </first>’
    EXEC sp_xml_preparedocument @handle OUTPUT, @xmlDoc
    SELECT * FROM OPENXML (@handle,’/first/dataarea/person/address’,2) WITH
    (id NVARCHAR(200), [name] NVARCHAR(200),
    street NVARCHAR(200),country NVARCHAR(200)) ORDER BY id
    EXEC sp_xml_removedocument @handle

  • Nick Williams

    Good Article!
    Well written!! i’m a bit of a novice with SQL server (always been a MySQL boy up until now!), and it was easy enough for me to understand but it didn’t skim over the details – definitely a good thing.

    also, lots of good points that i haven’t seen elsewhere 🙂

    bookmarked for sure!

  • Dewy

    Multiple tables
    I have nested xml and as such need a way to extract the xml into multiple relational tables. Is this possible using this approach?

    Example:

    <xml>
    <child>
    <name></name>
    <dob></dob>
    <addresses>
    <address>
    <no></no>
    <line1></line1>
    <line2></line2>
    </address>
    <address>
    <no></no>
    <line1></line1>
    <line2></line2>
    </address>
    </addresses>
    </child>
    </xml>

    and then have a CHILD table and an ADDRESS table where the ADDRESS has a FK to CHILD the table.

    Cheers
    Dewy

  • ntn

    Srinivas Sampath is a PLAGIARIST!
    SRINIVAS HAS FLAT OUT PLAGIARIZED MSDN (see comparison below). ONE MUST CARFFULLY CONSIDER, CAN SRINIVAS BE TRUSTED??

    http://msdn.microsoft.com/en-us/library/ms345117(v=sql.90).aspx

    SRINIVAS “says”:
    XML has been widely adopted as a platform-independent mechanism for representing data, and is also commonly used to exchange data between disparate and loosely-coupled systems, such as B2B applications and workflow solutions. More recently, XML has been used to represent semi-structured (as well as unstructured) data such as documents and emails.”

    MICROSOFT MSDN:
    “eXtensible Markup Language (XML) has been widely adopted as a platform-independent format for data representation. It is useful for exchanging information among loosely coupled, disparate systems, such as in business-to-business (B2B) applications and workflow situations. Data interchange has been a major driver of XML technologies. XML is increasingly present in enterprise applications that are used for modeling semi-structured and unstructured data. One such application is document management. Documents (e-mail messages, for example) are semi-structured by nature.”

    … you decide… I for one cannot stand plagiarism, or plagiarists who pass of their work as “original”