06 July 2007

RSS Newsfeed Workbench

Robyn and Phil decide to build an RSS newsfeed in TSQL, using the power of SQL Server's XML.

Have you ever fancied being able to put Simple-Talk’s content into a database? We haven’t either, but we’ll do it just for practice. The objective of this workbench is to show just how easy it is to implement an RSS-based datafeed in SQL Server 2005. The purpose is to try out downloading an XML file from the internet, checking it for validity, and if not, seeing if it is a redirect. We’ll show a cool SQL Server 2005 way of reading a text file into a VARCHAR(MAX) variable. We’ll do some shredding of an XML file too. By the time we’ve finished we hope we will have illustrated some useful techniques and given you an application that you can turn into a production-quality newsreader.

It is very easy to read RSS feeds and OPML feeds into SQL Server 2005. This is because the files are XML and are easily eaten by the XML functions. RSS, which doesn’t actually stand for anything other than RDF Site Summary, is a simple way of publishing, or broadcasting, frequently updated Web content such as articles and blog entries. OPML, originally conceived for transferring outlines, is a format that has been used widely for distributing collections of RSS feeds. However, it has a large number of uses in transferring simple structured list-based information. As it is used for exchanging lists of RSS feeds, we’ll show how to read in an OPML list of feeds.

Please remember before we start that this code is for illustration and is not ‘production quality’. You will need to sort out the security issues (see the process delegation workbench) and add logging and more error-checking.

We won’t attempt to store all the information that RSS2 defines, simply because we can’t find any sites that use all this information. In fact, most feeds use only a small part of the information and few aggregators seem to display the extra information anyway. We’ll boil it down to two relational tables.

Firstly, create a new database called RSSFeedWorkbench. Make sure you have the rights to execute xp_cmdshell too!

The RSSFeed table contains every RSS feed that we want to read from. It is used to poll around the feeds. This routine checks to see whether the TYPE field is set to RSS (or RSS2 etc). If it isn’t then it isn’t read.

Although there are more portable ways of getting the contents of the URL, we’ll use CURL.EXE, which is free, and very easy to install, and useful for a wide range of purposes. You’ll need to download and install this to get the Workbench to run. You’ll need to install the OpenSSL package too from here .

Some RSS feeds will try to stop non-browser software from reading an RSS feed. Curl has useful facilities that allow it to mimic a browser to the extent of allowing cookies and letting you specify the user agent.

This is the stored procedure that grabs the contents of the feed, checks to see if is one we know about and, if not, saves its attributes. then it saves any items it has not already saved. As it saves the url of the feed, you can use it to add the feed to your repository, and then it will automatically refresh it (if you install the spUpdateAllFeeds stored procedure in the right place. This stored procedure is more complex than it might be because one can try to access a feed only to find it is a redirection to the actual feed. If this is the case then one has to read the file as an HTML file, get the anchor, and use the contents of the HREF as the source of the newsfeed, before trying again!

In the list of files to download will be some sample OMPL files to try out. Now when you’ve loaded in a few RSS feeds, all you have to do is to put this stored procedure on the scheduler and this will keep your News up-to-date. Instant Blogrolls! The first person to write an OMPL exporter for this will get a Simple-Talk goodie bag!

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

Downloads

This post has been viewed 21709 times – thanks for reading.

Tags: , , , , , ,

  • Rate
    [Total: 17    Average: 4.3/5]
  • Share

Robyn Page and Phil Factor

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 20 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

View all articles by Robyn Page and Phil Factor

  • Darren

    ISNULL OR COALESCE?
    hi, was just wondering why u are using coalesce instead of isnull in ur code:

    SELECT @ErrorMessage = COALESCE(@errorMessage, ”) + COALESCE(line, ”)

    Thanks.

  • Andy

    Security risks
    Are there potential security risks here by allowing the database to access websites? Could a malicious user us it to download and run code on the server?

  • Robyn Page

    Re IsNull and Security Risks.
    Darren, is the IsNull function better in some way than the COALESCE function? We use COALESCE because its use seems clearer in meaning, but we’re open to persuasion about its use.

    Andy,
    We never give ordinary users access to xp_cmdshell. We implemented this system to run on the SQL Server Agent under a special user created for the job with just the sp and xp permissions required. We say in paragraph 3 ‘You will need to sort out the security issues’ (this article is to illustrate techniques) but even as it stands it wouldn’t have to be unsafe unless your system is wide-open already. If it is, then you have problems enough already!

  • Anonymous

    IsNull
    Hi Robyn,
    i was asking the question just because i was unaware of any advantages between them both -so i did a little research and it seems they are practically the same perf wise -with IsNull very,very slightly coming out on top. Aside from all that i think IsNull is easier to read.
    While reading about it i found this perf tip that i think is very relevant indeed -“IsNull() is unquestionably faster and easier to spell than Coalesce()” LOL

    Darren.