Jim Fuller
XML and RDBMS: 10 years on
25 August 2006

"XML is the digital dial tone of the web" -- Jon Bosak (The father of XML)

November 2006 will mark the 10th year of XML's rise to prominence in a world that continues to be dominated by RDBMS and SQL. During this time, XML has caused noisy debate within every corner of our software development multi-verse. A recent XML-DEV discussion spurred me to review the reasons for its success, and especially how it intersected with past, and will intersect with future, uses of RDBMS.

Attractions of XML

Hard-bitten SQL yeomen denigrated the performance characteristics of a half-baked hierarchical data model instantiated in a concrete syntax of angle brackets. The heir-apparent SGML purists thought that XML was finally going to sort out publishing on the web, with the more esoteric imagining a way to hitch a ride to the stratospheric Semantic Web.

Then there were the unwashed many, like you and me, spending cycles on serving up data for web sites and wondering why we needed to do anything new. Then something surprising happened…XML started to be used in every conceivable form, propelling its status from the hyperbole of the 'new' to an essential tool in the kit of the perspicacious developer. To understand how this happened in a world full of tried and true solutions, I have listed some of the characteristics that made XML attractive to me personally.

Easy to learn

People learn better when they have an existing analogy with which to 'map' a concept. For example, those familiar with HTML were one small step away from using and understanding XML. I should probably remind everyone that there are a *lot* more people who have heard of, and even seen, HTML than there are people using tables in a relational database, and it is only natural for them to seek the simplest 'next step', regardless of whether that next step is the right or wrong solution.

Easy to make

All one needs to work with XML is a text editor. Finally, an emacs lisper in the same room as a Dreamweaver designer has something other than HTML to talk about. 'Data for the masses' is a term I like to use, though in design-pattern speak...lingua franca feels more appropriate.

Easy to debug

Finding what is wrong with your XML data is as easy as 'view source' in an HTML browser…easy to cogitate and faster than grokking some binary data format indirectly through some proprietary debugging tool.
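
As a minimal sketch of what 'easy to debug' can look like in practice, the snippet below feeds a deliberately broken document to the parser in Python's standard library and reads back where the problem is. The document and the helper name describe_xml_error are made up for illustration; only xml.etree.ElementTree comes from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the closing tag on line 3 does not match
# its opening tag (XML names are case-sensitive).
broken = """<order>
  <customer>Ann</customer>
  <item>Widget</Item>
</order>"""

well_formed = broken.replace("</Item>", "</item>")


def describe_xml_error(text):
    """Return None if text is well-formed, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        # ParseError carries the position reported by the underlying parser.
        return err.position


print(describe_xml_error(broken))
print(describe_xml_error(well_formed))
```

The parser points at the offending line, so the fix is a matter of opening the file in any text editor, which is the whole attraction.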

Informal and lightweight

No dependency on server side application software means that people like designers and business domain experts can get on without waiting on the overworked DBA to set something up for them. Admittedly few designers and even fewer business domain experts generate XML directly...though it was dead simple to get the tools they were working with to produce XML.

Unicode

Taking a cue from Java and insisting on Unicode at its foundation meant that XML was prepared for multilingual data from the outset, making it a more complete solution.

Existing technologies were immediately applicable and available

Technologies such as DOM and SAX were already parsing HTML in 'anger' and ready to be applied to XML.

XML and the Web

There was a lot of semi-structured, document-type data being normalised into RDBMS tables at the time of XML's birth (and there still is). A lot of work was, and is, spent marshalling data between the web and relational tables. There was a benefit to having data marked up using XML, as it could go over the wire, be stored as a document, and generally 'play nice' in the web stratum. Using XML to take care of 'document orientated data' and RDBMS to manage 'data orientated data' became an architectural breakpoint in data modelling. Technologies such as XSLT also started to come into play, applying the 'separate data from presentation' mantra.
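
To make the marshalling work concrete, here is a rough sketch of 'shredding' a small document into two relational tables, using Python's standard sqlite3 and xml.etree.ElementTree modules. The article/para vocabulary, the table layout, and the function name shred are all assumptions invented for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document-orientated data.
doc = """<article id="1">
  <title>XML and RDBMS</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</article>"""


def shred(conn, xml_text):
    """Normalise one <article> document into the article and para tables."""
    root = ET.fromstring(xml_text)
    article_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO article (id, title) VALUES (?, ?)",
        (article_id, root.findtext("title")),
    )
    # Document order is implicit in XML but must be stored explicitly
    # in a table, hence the position column.
    for position, para in enumerate(root.findall("para"), start=1):
        conn.execute(
            "INSERT INTO para (article_id, position, body) VALUES (?, ?, ?)",
            (article_id, position, para.text),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE para (article_id INTEGER, position INTEGER, body TEXT)")
shred(conn, doc)

rows = conn.execute("SELECT position, body FROM para ORDER BY position").fetchall()
print(rows)
```

Even this toy case shows where the effort goes: ordering and containment come for free in the document, but need their own columns and keys once the data is normalised.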

Barriers to XML adoption

At this point, I list some of the factors that should have been barriers to XML adoption:

  • Existing RDBMS technology is more performant by one or more orders of magnitude in most data scenarios
  • We were all just getting used to mapping RDBMS to OO in our code, not to mention that a lot of useful tools were emerging to help us do this
  • XSLT's functional approach was deemed to have a 'steep' learning curve for those more procedurally minded...not to mention sending 'SQL join freaks' into a spin when they found out how difficult it was to do what they had already been doing easily using SQL.

On top of this, a phalanx of related technologies came riding on the back of the XML beast and simply failed to make an impact…

Where are the links?

It is still hard for me to believe that one of the primary features that made HTML so popular would be 'missing' or woefully represented in efforts such as XLink.

Web Services distraction

All I can think about is 'Enterprise cathedral building' at its best when I see the SOAP stack and all its complementary bits. A lot of brainpower was drained into this bottomless pit with the most useful result being the rise of REST, which is essentially the web as we know it + XML.

To schema or not?

I have no doubt that activities like the W3C's XML Schema will eventually inform us on how we will all validate and constrain our XML data...but for the time being one can opt in or out, or use some other schema technology such as RELAX NG, XSLT, or Schematron. Having the ability to have formal and informal data means you can choose which best suits the application requirements; e.g. a web Content Management system can live with informal data, whilst your company's finance systems will need a rigorous data definition.

Semantic Web distraction

The application of XML in the development of a smarter web 'smells' of the early 80's and all the effort poured into such things as AI, genetic algorithms and LISP...it's great to see the old become new again, but I can't help but worry that a new generation will get seduced by the promise of the impossible.

Why XML gained a foothold

Ultimately, XML's adoption was probably a natural result of the unique process by which the original specification was created. A hard core of SGML old salts had absolute control over what went into the specification itself, though they allowed themselves to be informed by a layer of software developers who provided in-depth comment, advice, tests, and use cases.

These software developers in turn opened up the review process to the public. These days this is the normal way in which specification bodies (even those who have hard commercial cores!) go about their duties, and it mirrors Open Source software development in general. With no lack of experience and the 'long memory' that is SGML, along with groups such as HyTime, it shouldn't surprise us that the XML specification has had an impact.

Note
XML was 'fortunate' to have been created near enough to the Internet bubble bursting; a lot of idle hands were 'ready to serve' the cause.

With all this said I think there are some deeper reasons why XML gained a foothold in a world dominated by RDBMS.

Multiple data models: hierarchical versus relational data

We already mentioned how XML helped characterise 'document orientated data' whereas RDBMS were good at managing 'data orientated data'. XML can also be considered a good choice for representing hierarchical data, e.g. data in the form of a tree. In fact we use hierarchical data models all the time; one only has to interact with an OS file system to know that.
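
The file system analogy can be sketched directly. In the example below, a made-up dir/file vocabulary describes a small directory tree, and a short recursive walk recovers the path of every file; the element names and the paths helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory tree expressed as nested elements.
tree_xml = """<dir name="home">
  <dir name="jim">
    <file name="notes.txt"/>
    <dir name="src">
      <file name="main.pl"/>
    </dir>
  </dir>
  <file name="readme"/>
</dir>"""


def paths(node, prefix=""):
    """Yield a slash-separated path for every <file> at or below node."""
    here = prefix + "/" + node.get("name")
    if node.tag == "file":
        yield here
    # Containment in the document is the parent/child relationship,
    # so walking the tree needs no keys or joins.
    for child in node:
        yield from paths(child, here)


root = ET.fromstring(tree_xml)
file_paths = list(paths(root))
print(file_paths)
```

The same data in relational form would typically need a parent-id column and some recursive querying to rebuild each path, which is the trade-off between the two models in miniature.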

Over the past 20 years developers have gotten used to working 'at the interfaces' of their code, so working with multiple data models was and is nothing new, albeit still painful.

Hybrid approach: RDBMS is the anchor, XML is the sail...

Long-term data storage requirements are different from, let's say, the needs of a client application to query a subset of data. Having two approaches in the form of RDBMS and XML means you can better fulfil both sets of requirements.

Using RDBMS as the foundation of your data layer and XML for distributing data can be an effective technique, though the performance and functionality of today's RDBMS mean that it won't sit idly by as holder of a company's 'crown jewels' and not participate in any of the fun. At a minimum, the ability to marshal data back and forth between RDBMS and XML means that you can accommodate future integration requirements more ably.
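
As a hedged illustration of the 'anchor and sail' idea, the sketch below keeps the data in a relational table and serialises a query result as XML for distribution. It uses only Python's standard sqlite3 and xml.etree.ElementTree modules; the customer table, the rows_to_xml helper, and the choice to reuse column names as element names (which assumes they are valid XML names) are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Serialise the result of a query as one element per row, one child per column."""
    cursor = conn.execute(query)
    # Column names become element names; this assumes they are valid XML names.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        elem = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(elem, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob & Co")],
)

xml_out = rows_to_xml(
    conn, "SELECT id, name FROM customer ORDER BY id", "customers", "customer"
)
print(xml_out)
```

Letting the serialiser handle escaping (note the ampersand in the second row) is one reason to build the document through an XML library rather than by string concatenation.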

The use of XML and RDBMS together represents a sophisticated hybrid approach to solving all your data problems and not having to make compromises.

Outside developments

"Memory is the new hard drive and hard drive is the new tape drive"

Let's not forget that developments outside the world of software can have an impact on the architectural decisions we make. I have taken the above unattributed quote as an example of how advances in the production of memory and hard drives are driving applications to 'keep more in RAM' and use cheap, robust hard drives to take care of long-term storage. This trend is evidenced by the emergence of things such as 'persistence layers' in software development.

For example, there are many arguments about how efficient XML is as a text based encoding. These arguments may become moot with more processing power and cheaper storage. The point is that we should be aware that new technology can render our current assumptions invalid.

Where is XML going?

I would like to think that using XML *and* RDBMS represents a better way of solving problems. Inevitably there will be those specialists who have invested considerable time and effort in one approach and will argue against the use of any other.

One exciting development is the emergence of native XML Databases...I am a fan of eXist-db (exist.sourceforge.net). Recently, I was on the selection committee for XML Prague 2006 (www.xmlprague.cz) and was able to get the core development team of eXist-db over to Prague. In my opinion, eXist-db has some impressive features:

  • Core level 1 XML:DB compliance
  • Efficient indexing
  • Full database recovery
  • XQuery, XPath, XSLT
  • Full DOM/SAX support
  • Multiple interfaces e.g. SOAP, XML-RPC, WebDAV
  • HTTP/REST API providing all data via its built in web server
  • Updates achieved using XQuery Update or XUpdate
  • Unix-like access permissions and XACML for XQuery access control

The challenge of using native XML Databases (NXD), such as eXist-db, relates to how easy it will be to 'fit together' with RDBMS implementations, without duplicate effort or unnecessary complexity. Perhaps we will see database vendors picking up a few tricks from the NXD crowd. I for one would rather work with a single database product, but it might be a while before this occurs.

Other developments illustrate just how good things can get when using RDBMS and XML together; one such development that springs to mind is Microsoft's SQL Server Reporting Services and its XML based report objects, which:

  • Uses the XML-based Report Definition Language
  • Can perform XSLT transformations
  • Provides a sophisticated Web Services interface

Since SSRS is included for free in SQL Server 2005, I would highly recommend taking it for a spin to understand where convergence can really provide true benefit.

In any event, this November will mark 10 years that XML has been with us, just as long as "established" web technologies such as Flash; it's hard to believe!




Author profile: Jim Fuller

Jim Fuller has been working on software for 20 years. He designs and builds software for large J2EE implementations and is stand-in Technical Director of a number of web companies (www.flamedigital.com, www.webcomposite.com). He is an organiser of XML Prague (www.xmlprague.cz), co-author of EXSLT (www.exslt.org), and would like to end his programming career writing in Perl.

Subject: XML blues
Posted by: Andrew Clarke (view profile)
Posted on: Friday, August 25, 2006 at 3:52 AM
Message: I've done a couple of large projects involving the use of real-life hierarchical data. In both cases, I wanted to use taxonomies of data based on XML; after all, we are all told how good it is at representing hierarchical information. I'm one of the species unkindly known as 'a hairy-arsed RDBMS man'. Whereas it is possible to represent relationships, hierarchies and categorisations in the traditional RDBMS, it ain't straightforward, since SQL is inherently non-recursive. I was therefore keen on XML initially, very keen after years of SGML and HTML.

I had high hopes of XML, but was amazed how difficult it was to stray beyond the simplest structures. Real-life business relationships are diffuse and reticular in nature. The idea of entities being 'contained' within other entities doesn't even hold true for the academic examples such as company organisational structures.

We had to hurriedly abandon XML for any real work, though there is such public pressure to use it that we cheerfully used it for all public interfaces, data feeds, configuration files etc. 'Oh yes, sir, it is an XML-compliant system'.

Yes, XML has its place, though I still find it tricky to take any data-representation seriously where the names of elements have to be case-sensitive. So that a customer is quite different from a Customer, and different indeed from a cusTomer.

Subject: using XML
Posted by: Anonymous (not signed in)
Posted on: Monday, August 28, 2006 at 3:53 AM
Message: interesting comment, let me respond;

* isn't 'the idea of entities being contained within other entities' one of the tenets of the hierarchical data model? perhaps you really didn't need XML in the first place.


*I would argue that there is no one data model sufficient to model all use cases. Though, if you are more comfortable using RDBMS to shoehorn all use cases then I would advocate your final result e.g. 'do what you want internally, but provide a minimal access via XML'.

It doesn't make any sense to make everything XML for the sake of it...me, I do everything in Perl...having interfaces exposed as XML-RPC and generating and consuming XML means I don't have to explain to people how to do things in Perl.

Sounds like your development got hit by 'buzz word technical management' antipattern.

I am interested, though, in your assertion that XML is somehow not useful in modeling business relationships...ebXML and BPEL come to mind (if it's enterprise you are in). Or were you able to do a survey of what already exists? I would bet there are quite a few respectable efforts in existence to model business org charts in XML (microformats).

Your last comment about data representation has nothing to do with XML; I would propose putting case handling in your code (perhaps as callbacks from an XML parser, for example)...don't embed it within your data model.

Also I would argue that the scenario of naming elements e.g. 'Customer versus cusTomer' is not valid...why would I not want to know that an error has occurred in my data? Do you remember the bad old days of HTML and tag soup circa 1995-7?

It clearly states in the XML spec that 'XML shall be formal and concise' and I agree with it wholeheartedly.

Though if one must there is nothing stopping you from encoding Customer or cusTomer, expecting your XML parser or processor to think that these are the same element is another thing.

Practically, though, it just means you have to handle relaxed case handling in your own code; either at the parser level, or within XQuery, XSLT, XPath, etc...or at worst use something like HTML Tidy to fix your XML so it is well formed.

--Jim Fuller






Subject: XML for serialisation
Posted by: Anonymous (not signed in)
Posted on: Friday, September 01, 2006 at 2:42 AM
Message: Radu D.: We use class / instance hierarchies so XML is perfect for serialization, instead of doing it binary, but keeping an eye on performance!

I have seen XML as the best choice when you want to persist in an RDBMS some frequently changing data structures, e.g. documents in enterprise systems, where the document structure must follow the enterprise changes and the RDBMS must assure high performance for concurrent access against large volumes of data.

To use an ORM like Hibernate to "easily" adapt business logic to a permanently changing database structure... is not an optimal cost solution. ORM may speed up the app. development, but the app. must be designed in the first place to be reliable.

Subject: Tools and RSS
Posted by: Anonymous (not signed in)
Posted on: Monday, September 04, 2006 at 7:42 AM
Message: As development tools evolve we are also seeing XML become more integrated. .NET's extensive use of XML was a key element in enabling MS developers to leverage this technology. I could not imagine .NET without XML. The popularity of RSS is going to be a significant factor in the continuing growth of XML.

Subject: The round file
Posted by: Anonymous (not signed in)
Posted on: Monday, September 11, 2006 at 2:00 AM
Message: XML is cool but retarded at the same time. It would be cool if it was recognized as just another method for mapping, one that you should be prudent about sending over the wire. It would be cool if it was just joined with XSL to produce HTML. But XML everywhere? What was wrong with TYPE, LENGTH, VALUE pairs or .INI files (other than that Microsoft thought them up)? The core problem is that text is dumb: reasonable enough to send over the local network, or over the corporate network, but not over the wide area unless you cannot understand debuggers and client/server debugging. As a messaging protocol, it is at least 3X bigger, which is dumb. We could have saved 5 years of energy if we had just used EDI + gzip + MIME (Nathaniel had it right). Additionally, to actually execute commerce over the public internet you have to encrypt anyway, so anything you thought you gained is lost by the obscurity of the encryption. You will have to compress these large documents to get performance comparable to all that old middleware stuff: DCE, CORBA, TI-RPC. I would say that I have wasted at least a couple of years absorbing this crap, and I am ashamed of this wasted effort. In the meantime, the newbie engineers with master's degrees from India, China and Russia have taken your jobs.

Subject: Re: The Round File: The square hole.
Posted by: Phil Factor (view profile)
Posted on: Monday, September 11, 2006 at 1:53 PM
Message: I have quite a lot of sympathy with this viewpoint. The only reason that XML is useful is because of the man-centuries of effort that have gone into turning a sow's ear into a silk purse.

We now have something that is pretty close to being a silk purse, but I can't help being aggrieved at the blatant over-selling of the technology over the past few years.

The use of INI files is not so daft as it sounds. Microsoft put a lot of effort into making them quick and easy to read. I suspect that if we spent the same effort into turning INI files into news feeds, UML diagrams, hierarchical data and so on, then we'd have more for our money than we've achieved with XML.

One point I disagree on. The newbie engineers with Masters degrees from India, China and Russia have filled their brains with the same XML stuff as we have. If they are likely to take our jobs it is because they work harder than we do.

Subject: Just Say No to XML
Posted by: Andrew Clarke (view profile)
Posted on: Monday, September 25, 2006 at 12:41 PM
Message: I was amused to read the article by Allen Holub, which you all ought to see. It is at http://www.sdtimes.com/fullcolumn/column-20060901-05.html

"Just Say No to XML
By: Allen Holub

September 1, 2006 — XML is perhaps the worst programming language ever conceived. I’m not talking about XML as a data-description language, which was its original design. I’m talking about perverting XML for programming applications. It’s inappropriate to use XML as a scripting language (e.g., ANT), a test-description language (e.g., TestNG), an object-relational mapping language (e.g., Hibernate, JDO), a control-flow language (e.g., JSF), and so forth. These sorts of XML “programs” are unreadable, unmaintainable, an order of magnitude larger than necessary, and audaciously inefficient at runtime."


Subject: Argaiv
Posted by: Anonymous (not signed in)
Posted on: Wednesday, October 04, 2006 at 1:59 PM
Message: I think it was best site i ever visit.

Subject: XML is bad?
Posted by: Anonymous (not signed in)
Posted on: Thursday, October 05, 2006 at 6:11 AM
Message: I waited a bit before responding to various posts...to be honest I thought there would be more vitriol from this crowd than most with respect to XML technologies.

Everyone should first be aware of the 'rule of mediocrity' and how it plays out within technology. Things get adopted not because they are the best, but because they are what everyone could agree upon. And usually commercial interests precede the interests of the practitioner.

That is why we see so many 'less than' technologies get adopted where 'better' gets ignored...this could be due to trying to satisfy multiple requirements all at the same time; I would rather have a 'benevolent king', e.g. a wizened old master, tell us the right way rather than the 'mob'. Though we know that all kings get corrupted with power over time, which is why the 'wisdom of mobs' tends to come out as a better way over time. There will be many (old hands) who say that the problem domain XML is trying to solve was already solved by such things as SGML and HyTime....one person's sow's ear is someone else's silk purse.

Now, if we delve into useful 'critical comments' instead of just iterating how bad a technology is without backing it up with technical argument, I am all for it; though stating a technology 'sucks' to me is just 'fashion' in software...and I won't get into a 'my dad can beat your dad up' type argument with any software.

I have failed at using so many technologies over the past 20 years that I tend to leave the room when such debates start...though I will weakly defend Perl or Lisp (perhaps when a pint is in front of me), I won't belabour the point.

Another thing I would avoid is the rathole that is discussing XML as a 'syntax'; if we all just knew Extended Backus-Naur Form (EBNF) notation and thought in terms of 'data models' then life would be much simpler.

XML, as with any successful technology/software, is a/bused because of its success...we find it being applied 'golden hammer' style because people frankly resist using energy in learning the 'proper way' to do things. This is natural human behavior and I agree with the poster's reference to Allen Holub's article.

Though when Allen goes on to blithely state that everyone should be able to write a compiler, it becomes a bit of a 'look how smart I am' article....there will always be older people saying 'do it in assembly'.

Even though I agree that there are too few people doing the right thing, it is no reason to eschew the useful abstractions we have in computing...there are analogies; for example, when building a house should I start mining materials for making bricks and smelting steel, or chopping down trees to process into wood planks? Or should I exist further up 'the food chain' and take advantage of the skills of experts to produce these materials for me?

The point is that abstraction in computing allows scenarios such as 'an inexperienced javascript programmer to do useful work' without having to do this in assembly.

Back to XML, the 'efficiency' argument is a bit of a red herring. I remember trying to shoehorn everything into 64kb of RAM and using tape drives as my hard drive 20 years ago...as time moved on, I moved on.

I know I could spend time optimising my programs, but it's always a sliding scale. There are lots of efficiencies in developing in XML, not least that it's human readable. I agree with the points that XML is inefficient 'over the wire', but in today's world I have to prioritise what I spend my time on; I spent 2 years learning CORBA and 2 years learning DCOM....CORBA to me represents a pinnacle and DCOM was a nightmare. Neither is being used today in anger (OK, CORBA thankfully is, though in a limited way), so how stupid am I? Of course all knowledge is good (if one can consider abstractions in computing real 'knowledge') and I know for a fact that the failures and successes of these preceding technologies informed later technologies.

In any event we should always view optimisation as a late step in any software development; personally it's something I see very much as a problem for hardware to solve.

Lastly, arguing about whether XML is successful or not is a bit late in the day; it is a successful technology, though blindly applying it to all problems is, well....stupid. I don't do it and you shouldn't either. All critical analysis should review past and future approaches; it just so happens that at this moment in time XML comes up as the answer time and time again for me, though that doesn't mean I will cease critical analysis.

Perhaps we will see on the back of XML a post-XML world, with no angle brackets, binary representation, etc....for now, though, I would challenge anyone to suggest another approach that would do any better at solving the wide and deep variety of problems XML is applied to.

cheers, Jim Fuller

ps: for the record, I would like to state now that I personally think all technology 'sucks' to some degree, though my opinion will be formed directly on the basis of how much time is diverted away from more important activities, e.g. time away from my loved ones, various hobbies, and other activities I would rather be doing than working.

Subject: Can XML replace RDBMS?
Posted by: Anonymous (not signed in)
Posted on: Friday, October 06, 2006 at 11:20 AM
Message: I am not highly skilled in XML, but I use it from time to time. I am investigating if it's possible to use XML in place of an RDBMS database.

The idea is not to have a database at all for simple queries such as select, etc, where instead of querying the db one just returns info from the xml file (which will be loaded in memory).

thanks

Subject: karma@moment.com
Posted by: Anonymous (not signed in)
Posted on: Thursday, October 19, 2006 at 4:22 PM
Message: Good work. Interesting posts, besides those spam...

Subject: re:Interesting posts, besides those spam
Posted by: Tony Davis (view profile)
Posted on: Friday, October 20, 2006 at 6:15 AM
Message: This article has been subject to relentless SPAM so I've had no choice but to disable anonymous comments. You will now need to be signed in to Simple-Talk to post comments here.

For more information, please refer to our general policy on anonymous posts.

Best,

Tony (Simple-Talk Editor-in-chief)

Subject: Good Job
Posted by: DotNetGuts (view profile)
Posted on: Friday, July 18, 2008 at 3:24 PM
Message: Good Job

 

















