28 April 2009

XML Data Modification Language Workbench

XML Data Modification Language (XML DML) allows you to modify and update XML data. When working with SQL Server Databases, this is the most efficient way to modify elements in an XML column, yet the techniques of using XML-DML have not been well, and simply, described - up until now. Robert Sheldon presents a practical workbench to show the various modify methods

The XML data type – first introduced in SQL Server 2005 and carried over to SQL Server 2008 – brought with it valuable functionality that supports XML documents or their fragments. You can assign the XML data type to columns, variables, or parameters, and you can configure those objects as either typed or untyped. A typed XML column (or variable or parameter) is one that is associated with an XML schema. An untyped column is not.

The XML data type also supports several methods that let you work directly with the XML data. Two common methods are query() and value(). Each one supports XQuery expressions that allow you to retrieve data from the column on which the method is called. However, neither of these methods or the XQuery language let you manipulate the XML data.

For this reason, SQL Server also introduced an extension of XQuery called the XML Data Modification Language (XML DML). You can issue an XML DML query by calling the modify() method supported by the XML data type. In this workbench, I show you several examples of how to manipulate XML data in both typed and untyped XML columns. To support these examples, I created the JobCandidates table in the AdventureWorks database (tested in both SQL Server 2005 and 2008.) The table includes the CandidateResume column, which is a typed XML column, and the CandidateRating column, which is an untyped XML column.

The following set of statements create and populate the table:

Notice that the CandidateResume column uses the schema collection named HumanResources.HRResumeSchemaCollection, which is the same collection used by the source data in the HumanResources.JobCandidate table. The CandidateRating column, on the other hand, is not associated with a schema, so I just pass in the simple XML document.

Inserting data into an untyped column

When you call the modify() method on an XML statement, you must specify whether you are adding, updating, or deleting data. To add data, you must use the ‘insert’ keyword, followed by the XML data you want to add.

When you insert data into an XML column, you can choose from one of three methods: directly into an element (by using the ‘into’ keyword), before an element (by using the ‘before’ keyword), or after an element (by using the ‘after’ keyword). If you are using the ‘into’ keyword to insert data into an element that contains child elements, you must specify whether the new element goes before the other elements (by using the ‘as first’ keywords) or after the other elements (by using the ‘as last’ keywords). This will all become clearer as you work through the examples.

In the following example, I insert the <Communication> element as a child element of the first <Rating> element in the CandidateRating column. When you call the modify() method, you pass the XML DML expression as a string (enclosed in a set of single quotes). The string begins with the ‘insert’ keyword, followed by the element that you want to insert. Next, you must specify where to insert the data.

In this case, I add that data directly into the first <Rating> element by specifying the ‘into’ keyword. Because the <Rating> element includes child elements, I also specify the ‘at last’ keywords (preceding ‘into’). As a result, the <Communication> element will be added as the last child element within the <Rating> element.

After specifying the ‘into’ keyword, you must provide the path of the target element (<Rating>). The path expression is similar to an XML Path Language (XPath) location path, which follows the hierarchical order of the XML elements. XML DML uses the same path structure as XQuery. (See SQL Server Books Online for more details about XPath and XQuery.)

In the following example, when I call the <Rating> element, I specify the element that has the RatingId attribute value of 1. Notice that the attribute name is preceded by the at (@) symbol. Also notice that the path is followed by [1]. The [1] represents the first element in the list of possible elements that are returned by the path expression. Because the modify() method requires that the path expression reference only a single path, you must include the [1] to ensure that you are returning a scalar value. Even if the path can return only one value, you still must include the [1].

The following results show the XML that is now stored in the CandidateRating column. As you can see, the first <Rating> element now included the <Communication> child element, which was been added as the last element.

You can achieve the same results as those shown in the previous example by using the ‘after’ keyword instead of ‘insert’. This eliminates the need to include the ‘as last’ keywords. However, you must define a more complete target path. Notice that in the following example, I’ve added the <ToolSkills> element to the path. This way, the <Communcation> element will be added directly after <ToolSkills>.

Note that, if you run the following example directly after the preceding example, your <Rating> element will include two <Communication> elements. To prevent this, rerun the code near the beginning of the workbench to re-create and repopulate the table before running the following example. This is the approach I took for the next few insert statements in this section.

You can also define an expression in your insert statement. For example, the following statement includes an if/then/else expression. If the value in the <DbDevelopment> element is greater than 4, text is added to the first <Rating> element.

As you can see below, the example uses the text() function to add text to the <Rating> element, rather than a child element. However, you can also use an expression to add an element, other types of nodes, or attributes.

Because the <DbDevelopment> value was greater than 4, the text was added, as shown in the following results.

You can also add two elements at the same time. In the following example, I add both the <Communication> element and the <FormalTraining> element. When adding multiple elements, enclose them in parentheses and separate them with commas.

As you can see in the following results, the first <Rating> element now includes both of the new child elements, added after the existing ones.

In addition to adding elements (or text) to an XML document, you can also add other types of XML nodes, such as comments. In the following example, I add a comment along with the two elements you saw in the previous example. Notice that I include the comment as I would an element. The comment itself follows the conventions of XML comments – enclosed in brackets and the necessary dashes and exclamation mark.

Notice also that the example uses the ‘before’ keyword instead of ‘after’. Now the new elements will be added before the existing child elements.

As you can see the following results, the comment is now included in the XML and is located at the beginning of the set of child elements.

In addition to adding elements to an XML document, you can also add attributes to an existing element. To add an attribute, you must include the ‘attribute’ keyword, followed by the attribute name and its value, which is enclosed in curly brackets. The following example adds the RatingName and RatingType attributes to each <Rating> element.

When you add an attribute, you should use the ‘insert’ keyword. However, you do not need to specify the ‘as first’ or ‘as last’ keywords because it does not matter whether there are child elements when you’re adding an attribute to the parent element.

Note that the following example, like all the remaining examples in this workbench, build on the preceding examples. This approach helps to demonstrate how to update and delete data, which are described in later sections.

The following results from the CandidateRating column show how the two new attributes and their values have been added to each of the <Rating> elements.

Updating data in an untyped column

As you saw in the previous examples, when you add data to an XML document, you use the ‘insert’ keyword. However, to update data, you instead use the ‘replace value of’ keywords. The keywords are followed by the XML path of the element or attribute whose value you want to update. After you specify the path, you then specify the ‘with’ keyword and the new value.

In the following example, I update the value in the <DbDevelopment> element. Notice that the XML path includes the text() node function. The function is necessary because it indicates that the path expression is specifically referencing only the element’s value, and not the tags as well.

As the following results show, the <DbDevelopment> element now contains the new value, but nothing else has changed.

If you want to update an attribute’s value, you must specify that attribute in the path expression by including it at the end of the path, as shown in the following example. In this case, I am updating the value of the RatingType attribute.

As you can see in the following results, the RatingType attribute of the first <Rating> element now includes the new value.

In the previous two examples, the value defined after the ‘with’ keyword is a simple string value (enclosed in double quotes). However, you can instead include an expression that defines more complex logic in determining the new value.

For instance, in the following example, I follow the ‘with’ keyword with an if/then/else expression. To demonstrate how this works, I first add the MeetsMinimum attribute to the second <Rating> element. I then update the attribute based on the value determined by the if/then/else expression. Specifically, if the <Experience> value is greater than eight and the <DbDevelopment> value is greater than five, I set the MeetsMinimum value to Yes, otherwise, I set the value to No.

As the following results show, the MeetsMinimum attribute has been set to Yes because the candidate has the necessary years of experience.

Deleting data from an untyped column

Deleting data from an XML column is very straightforward. Simply include the ‘delete’ keyword in your XML DML expression, followed by the path expression that points to the XML node or attribute you want to delete.

In the following example, I delete the comment in the first <Rating> element. Notice that the path expression includes the comment() function. Following the function, I include [1] to indicate that I want to delete the first comment. Note that this is necessary only if your element includes multiple comments and you want to delete a comment other than the first one. I include the [1] only to demonstrate how it works, but I could have left it out.

As you can see in the following results, the comment has been deleted from the first <Rating> element

It is just as easy to delete an attribute. Simply include the attribute name at the end of your path expression, as I’ve done in the following example. In this case, I’m deleting the MeetsMinimum attribute (created in an earlier example).

The following results show that the MeetsMinimum attribute as been deleted from the second <Rating> element.

To delete an XML element node, define a path expression that points to the element. In the following example, I delete the <FormalTraining> child element in the first <Rating> element.

As you can see in the following results, the <FormalTraining> element has been removed.

Inserting data into a typed xml columns

As mentioned earlier, you can use the XML modify() function to update typed or untyped XML data. The examples you’ve seen so far, have all modified untyped XML. To modify typed XML data, your XML DML expression must also include a namespace declaration. The namespace must match the namespace associated with the XML column, variable, or parameter.

The XML DML expression that modifies typed XML data must include two parts. The first part is the namespace declaration, and the second part is the actual data modification, similar to what you’ve seen in the preceding examples. The two parts are divided by a semi-colon; however, the entire XML DML expression is still enclosed in one set of single quotes.

In the following example, I use the modify() method to insert a new <Employment> element into the CandidateResume column in the JobCandidates table. To declare the namespace, I specify the ‘declare namespace’ keywords, followed by an alias (in this case ‘ns’), then an equals sign, and finally by the name of the schema, enclosed in double quotes.

After the namespace declaration, I add the semi-colon and then the part of the expression that manipulates the data. This part is similar to what you saw in the previous examples except that each node in the new <Employment> element and in the path expression must reference the namespace alias. For example, the <Resume> and <Employment> nodes in the path expression are each preceded by ‘ns:’, as are the elements listed in the new <Employment> element.nt.

When working with typed XML, you can add only those elements and attributes that are supported by the associated schema. If you try to add unsupported elements or attributes, your query will fail.

The following results show the new <Employment> element and its children elements that have been added to the <Resume> root element. Each of these elements conforms to the schema associated with the CandidateResume column.

The process of updating data in a typed XML column is similar to an untyped column. Once again, you must declare the namespace and include the namespace alias in your XML path expressions.

In the following example, I update the <Emp.Responsibility> child element in the <Employment> element that was added in the previous example. As you can see, I declare the namespace and include the alias in each component of the path expression.

The following results show that the new text has been added to the <Emp.Responsibility> element.

Deleting data from a typed column is also similar to deleting data from an untyped column. As the following example shows, you must once again declare the namespace and include the namespace alias in each node of your path expression.

As you have seen, you can use the XML modify() method to update XML data, whether typed or untyped. The primary difference between the two is that you must reference the schema that is associated with a typed column, variable, or parameter. And any changes you try to make to a typed column must conform to that schema. But once you understand the basic principles of using the modify() method and defining XML DML expressions, you can begin to use the full power of XML DML to manipulate your XML data.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter


This post has been viewed 25402 times – thanks for reading.

Tags: , , , , ,

  • Rate
    [Total: 34    Average: 4.7/5]
  • Share

After being dropped 35 feet from a helicopter and spending the next year recovering, Robert Sheldon left the Colorado Rockies and emergency rescue work to pursue safer and less painful interests—thus his entry into the world of technology. He is now a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. He has also written news stories, feature articles, restaurant reviews, legal summaries, and the novels 'Last Stand' and 'Dancing the River Lightly'. You can find more information at http://www.rhsheldon.com.

View all articles by Robert Sheldon

  • Rui Carvalho

    Great Article!
    I really love this kind of articles: code & comments inside. This technique is not really well documented and exemples provided are not very good too, your article correct that and give us a real world overview about xml dml. Thank you for that!

  • Ranga

    Great article…
    Great article…prsented in a simple format with coments examples…please keepem coming….

  • Conrad Rowlands

    Great Article
    Great Article Robert. Really well laid out and broken down. I can see this being utilised in a project that needs some refactoring…. The only thing that I dont really see is what are the advantages of using a typed xml column. Just looks like more of a Faff really.

  • AdamA

    Superb article
    Thanks for this one – it really makes it clear!

  • Dean Roush

    Article was great and comment on why typed XML is valuable
    I am very new to using XML in the database. I plan to use it extensively in a new application I am designing. When I started researching the use of XML in SQL it seemed as though DML functionality should exist but I was not aware of it until now. Thanks for the very clearly explained examples!

    As for why typed XML is important? In one scenario we will use it to record collections of multiple-choice type answers in one data column (one test per row, one question’s answer set per column). We want to ensure that we have all answers (not null), and that they are data typed correctly so we can do the appropriate grading and/or statistical analysis. We believe that in most cases no data is better than corrupt data. This feature is a great defense against poorly written modification statements, including my own.

  • KJC

    Very good article
    I only wish I’d come across it when it was posted, I’ve just gone through learning all of this the hard way from a dozen different resources!

  • Anonymous

    Excellent Article to learn XML operations
    Excellent Article to learn XML operations in sql server 2005 & 2008.