Click here to monitor SSC
  • Av rating:
  • Total votes: 15
  • Total comments: 10
Tony Davis

SQL Server, PostgresSQL and Fish Curry

16 August 2006

An interview with Adam Machanic

At Tech Ed 2006 in Boston I cornered Adam for 45 mins and talked to him about SQL Server 2005, stored procedures, beer music and various other things. I finally got round to transcribing the interview. Enjoy!
–Tony Davis, Simple-Talk editor.

Adam is an independent database software consultant, based in Boston, Massachusetts. He has implemented SQL Server solutions for a variety of high-availability OLTP and large-scale data warehouse applications, and also specializes in .NET data access layer performance optimization. He is a SQL Server MVP, is co-author of Pro SQL Server 2005 and is currently in the process of solo-authoring "Expert SQL Server 2005". He regularly speaks at user groups and community events.

Small talk

[TD] Tell me a little bit about what you're up to right now, the sort of projects you're working on at moment…

[AM] I work mainly with start ups and ISVs…most businesses I work with are smaller companies who bring me in at beginning of projects to help with development work, data architecture etc. I just started up a consulting firm called Data Manipulation Group, Inc. It's just me at the moment but if I can find some people in the area (Boston) who are qualified they are welcome to email me and I'd love to talk to them.

[TD] It does sound like a busy time…because you're also writing a book at the moment, right?

[AM] Yeah, it's been (laughs wryly)…a tough process. Hopefully it will be out in November 2006 but progress has been a little slower than I anticipated. I'm fighting hard to maintain a very high quality level but I find I need a full day or two days solid focus on it otherwise it's very hard to make good progress. And of course finding those solid blocks of time is incredibly difficult. It's very different from writing an article…I can bang out a 4-5 page article in 2 hours, but the book…a page every two hours…if I'm lucky.

How 2005 changes things for developers and DBAs

[TD] I know you've already working quite extensively with 2005, and obviously playing with it quite a lot while writing the book…what's your overall impression…what's better/worse than in 2000? Have they made the DBMS easier to use, harder to use?

[AM] In my opinion SQL Server always has been and is still a very easy database to install and manage and, in fact, admin continues to get easier. Paul Flessner talked recently about the self-tuning, self administering database and I think they have taken it quite a way in that direction already. For small-to-medium databases…you really don't need to do that much with it.

[TD] But surely 2005 has added considerable complexity…there are so many elements to it now…

[AM] It's certainly true that you can't really be a "SQL Server 2005 expert" in the same way you could be a 2000 expert. With 2000 it was possible to know pretty much everything there was to know about the database. With 2005 you can spend all your time just being a Service Broker expert or an Analysis Services expert. I know SSIS experts who don't touch the relational engine…ever….and they make plenty of money. Same with Analysis Services and Reporting Services.

[TD] So how is all this affecting the "DBA-Developer divide"? Before I started doing SQL Server, I did quite a lot of work with Oracle so I guess that's my reference point…in Oracle this divide was strong, with the DBA very much the "gatekeeper" and final arbiter of what could and couldn't go on in the database…and developers often feeling they needed to subvert the DBA in order to get things done. I've never sensed that in SQL Server but wondered whether the complexity of 2005 would move things in this direction…

[AM] In my limited experience with Oracle…and I have to stress that it is limited…I found that managing Oracle is a lot harder that managing SQL Server. I worked in one company where we had two SQL Server DBA/developers and we were managing over 100 servers. We also had three Oracle servers and we had three Oracle DBAs…and these DBAs were constantly complaining at how underwater they were and had no time for anything…and there were two of us managing 40 times more servers without too much trouble.

Even installing Oracle seems difficult. I tried to install it and I had to get a DBA to come and help me for 4 hours to get it to install properly. Installing SQL Server is a 5 min job.

[TD] Oracle has never had a great rep for its usability…though that's something that is changing with 10g. But even with SQL Server… 2005 is such a leap in complexity …surely the days of the one-man SQL Server shop, the guy who could do anything and everything, are coming to an end?

[AM] For small to medium systems, if you've done at least the basics right, then SQL Server will just run. A lot of the shops I go into run into problems because they didn't even have a DBA…and they had grown beyond the point where they could cope with that. Eventually you do, but for small shops you still don't really need a DBA and I think that's the allure of SQL Server in many ways.

[TD] So what is "getting the basics right"?

[AM] It really is down to basic database design a lot of the time. You would not believe the number of shops I've gone into that didn't have primary keys…they didn't define a single PK, so there are no indexes in the database at all…no constraints on any of the data…

[TD] !!!????

[AM] It's more common than you might think…it's OK for me because I just come in, create a PK and the app is running 20x faster, and I'm the hero...but all I've done is what they should have done in the first place.

The inevitable stored procedure debate

[TD] In terms of application design, I know that you're a big fan of using stored procedures. You've just written an article for Simple-Talk on the big stored procedure debate and it caused quite a bit of controversy and debate. Why do you think this topic so polarizing?

[AM] I think a lot of people are swayed by anecdotal evidence rather than actual experience. They heard someone had a bad experience with technique "x" and then they just brand "x" as "bad technology". Also, there's a bit of fanaticism around …if author "y" says stored procedures are bad then they must be bad--even if you don't really understand the arguments. I happen to like stored procedures….probably because I've spent a so much of my time fixing problems with incorrect use of ad-hoc SQL!

[TD] There seem to be so many different strands to it…people talk about tying yourself in to SQL Server and having to make big changes to move to another database but I find that a tired argument…but I mean how many times do you actually need to migrate to another database?

[AM] The migration argument is very tired but not because it doesn't happen a lot...it does…but because nobody using Oracle or DB2 or MySQL or any other platform is only going to use the standard SQL part of that platform. Everyone uses the vendor extensions anyway! And that's because you have to. If you don't you're not fully utilizing the DBMS and your app will simply not perform as well

And even the standard SQL is not consistent from platform to platform so whatever it is you’re doing, you'll need to make changes when you migrate. For example SQL Server fully "entry level" compliant to SQL-92 and supports bits and pieces of SQL-99…extensions such as recursive CTEs…and you can bet that Oracle will be different and DB2 different again.

[TD] I've always bought in to the idea that if you have logic protecting the data then the best place for it is in the database where it can't be bypassed.

[AM] My rule is database as final arbiter of all matters data. Let the business tier deal with the business questions but if it's a data question, put it in the database. That's why I get into another set of arguments with people about business logic in database…I always ask them…well, what is business logic? They mumble a bit and generally have trouble answering.

That's why I have a very tight definition of business logic that I live with. I partition things into data logic, business logic and application logic. Data logic is anything to do with data consistency, verifiability and integrity…the core data rules.

Above that we have the business logic. What do we do with the data, how do we manipulate the data to get the business answers that we require. This probably doesn't belong in a stored procedure as often as it belongs somewhere else--i.e., in a business logic layer.

Then above that we have application rules…what color do I paint the UI? Where should this button show up? And so on. These are clearly UI questions and you want those as far out of the database as possible.

If you layer your application in this way you end up with solid data core that is verifiable and then you create multiple business layers over the data layer…all talking to the same data layer and following the same rules but answering different sets of questions independently of each other. Then if you like you can add multiple UI layers over those business layers. By properly layering you can eliminate dependencies and make yourself really flexible…but…that's a long way off for some people.

What's hot in 2005?

[TD] Before I came to Tech Ed, I spent some time on newsgroups and forums trying to seek out a common thread to the interest in SQL Server 2005 and came to conclusion that…there wasn't one….there is just so much that is new that there seems to be no single issue that is more prominent than any other. Do you agree?

[AM] I spend a lot of time on newsgroups and forums. Most of last year things were really boring…I was answering the same questions over and over. The most common questions on forums were things like: how do I do dynamic parameters for a query? How do I concatenate string at a row/set-based level? I still see those questions…but suddenly a whole host of new questions are appearing.

[TD] So have you detected any particular "hot" areas?

[AM] The traffic on forums is kind of bursty. For example, I can log on to the CLR forum and find 20 questions posted in one day, then the next 3 weeks there will be none. Same goes for service broker. I never worked out how users all seem to arrive on the same topic simultaneously and then disappear again!

Initially there was a lot of interest in SQL CLR but now that has died out – I'm now seeing very few questions on that.

[TD] Why do you think that is?

[AM] I think people are still a little scared of the technology, but more than that… there really aren't that many use-cases for it. A lot of the interest came from people who hadn't really researched it. I've actually tried to use CLR on a couple of projects. I had a couple of scenarios where I thought it would be great but it turned out to have a few problems.

[TD] Performance-related?

[AM] Well there are cases where CLR will definitely out-perform T-SQL but you don't tend to run into these cases that often. One example is heavy math -- but not too many apps try to do heavy math in the DB...it's done in the business tier so it becomes a non-issue.

Using CLR functions with regular expression is another popular sample that you see in books and online…but it turns out that if you filter data based on a regular expression from CLR UDF the optimizer can't do anything with it and you end up with a table scan and horrible performance. This is a classic instance where you can have functionality or you can have performance – and for database apps, it's performance every time.

Then there's the 8000 byte limit on UDAs and UDTs…a lot of people are frustrated by that. It's V.1 and it will be really interesting to see how it progresses but right now not that many people can use it.

[TD] If CLR, SSAS, SSIS, etc are the bells and whistles, what is the best really core new functionality that they have put in the database…what should a programmer be looking to exploit in their code right now?

[AM] For me the coolest thing is the new exception handling. I already did three migration projects this year and for the first one I really wasn't up to speed on it, in the second I started to realize how useful it was and now I use it ALL the time.

[TD] What's the primary difference?

[AM] Instead of just letting the error go up to the application you catch it in the database. If you care about the error you can log it right there. You don't have to bubble it up to the app and tell the app to go back to the DB and log it. I usually catch the exception add a marker to it that says "I caught an exception here", logging it and then throwing an exception back to the app. So the app knows an error occurs but I've already intercepted it, logged it, and maybe taken some corrective action. It's amazing how many places you can fit it in to make your code more readable and flow a lot better, by handling exceptions in the data layer instead of the app.

That's my number 1, but Service Broker is close behind. I'm working on an app now that has a lot of offline, disconnected asynch processes. They had written four different windows apps that would sit there and ping the DB from time to time waiting for a "start doing some work" message. We converted it all to SB and it all works amazingly well.

The database engine

[TD] Do you think the database engine is well instrumented? If you have made a mistake in your design that means your database is performing poorly, is it easy to find out exactly where the problem is? Again, taking Oracle as my reference point… if it is anything, the Oracle database is very well instrumented…you can get very granular, wait-based information out of the engine…you can find out exactly how long each small part of the whole operation took. That sort of capability wasn't there in 2000 and I'm interested in how 2005 has changed that…if at all.

[AM] Yeah…I don't think SQL Server is quite there yet. The new DMVs…Dynamic Management Views…are taking it a lot further and Profiler 2005 has added a whole host of new events, so you can get a lot more performance data out of the engine, but you still can't get really granular information and it's hard get it in real time.

[TD] OK, so you don't often run into a performance issue and think…I can't find where the problem is!

[AM] Yes that happens all the time! (laughs). In my experience, blocking issues are very common, as are IO issues.

[TD] Tell us a bit about the load tool you've developed to investigate execution times, IO stats and so on.

[AM] It's currently a very simple load tool that I developed for readers of my book, to help people collect basic timing and IO statistics for themselves. It let's you put in a batch of SQL or calls to stored procedures and call them in a loop. There is also a multi-threaded mode that let's you have up to 200 simultaneous virtual users. Of course, when you're testing SQL you don't want same data to be queried over and over…because Server will just cache it. Performance will be great but guess what…you need to test the IOs. So the tool lets you substitute in values for parameters in the SQL. It runs a query and returns the values, then you can map those values into the queries that you are actually testing, so you can test a range of values instead of one value at a time.

[TD] Sounds very cool. Will future versions go any further then that? For example, provide stats on parsing and latching etc so you can gauge a solution's likely scalability as well as performance?

[AM] At the moment, my tool just does very simple reporting on logical IOs – logical reads, which includes both physical and cache reads. It also gives CPU timing stats and total time reporting. It pulls these out using the simple SQL SET STATISTICS IO ON/TIME ON, and with the result of the query SQL Server sends back the stats, so this makes it very easy.

Going a step further I would have to latch into the tracing API and trace in the background while running a load and then correlate based on process IDs. But if I put that much work into it, it will move from the free product range to the paid product range…so I apologize to my readers in advance if that happens--but I plan to always maintain a free version, either way.

Is there life after SQL Server?

[TD] A bit of an off-the wall question …but if you suddenly found yourself working on Linux, which database would you use?

[AM] Easy question. Easy question. PostgresSQL. I've been studying it as much as possible in my spare time…I've installed it (on Windows as it happens). It's an awesome DBMS and has some really cool features

[TD] You surely can't mean that (mock horror)… it does some things better than SQL Server?

[AM] Oh yes. For example, it has a lot of index types that SQL Server doesn't support. It also has this really interesting API that lets a user create their own index types...it's an open index API. I'm not yet advanced enough to use it but you feed it an algorithm (e.g. tree algorithm) and it will index your data based on that. Very cool.

[TD] Do you ever see yourself making the switch?

[AM] I'm not tied to Windows/SQL Server. Well…I know it really well right now and it would be painful to switch but I'd be open to the idea. But right now I'm very happy in this space. I really enjoy it…it's a great community to be in. You get a real sense for that...especially at conferences like PASS, where you really feel that you are part of something.

Winding down

[TD] So where are the best online resources? Which are the handful of places that you visit every day?

[AM] You mean apart from Simple-Talk.com of course (laughs)

[TD] Simple-Talk you say. Interesting. How do you spell that…S.I.M.P.L.E….(laughs)

[AM] Microsoft seem to be putting a lot of resources into making the MSDN forums (forums.microsoft.com/msdn) very high quality – they have a lot of really good MS guys answering questions there and I tend to visit them most days. I also hang around a bit on sqljunkies.com and answer questions on the forum. In all honesty I don't actually surf round that much…I tend to stick to the sites I know.

[TD] What about outside of SQL Server. Any beer related sites? I know you're a keen home-brewer…

[AM] I used to go to brewrats.org, but one of the best one is beeradvocate.com. Actually, though, a lot of the non-SQL Server stuff I read is food-related. One of the best sites out there is actually run by Kalen Delaney's daughter –travelerslunchbox.com – where she posts recipes but also the background story to it, like where she found the recipe, plus some gorgeous photos of the food. It's a really great site.

[…big cheer in background as Italy score against Ghana in the World Cup…]

[TD] So you are a keen cook?

[AM] Yes, mainly Asian food…Japanese, Indian…lots of spices!

[TD] I always wished I could cook a really good curry.

[AM] I actually created a great fish curry…I'll send you the recipe and you can check it out.

[TD] Cheers Adam

[Ed Note: Adam did send me the recipe...I will try it out and report back soon in my blog]

Tony Davis

Author profile:

Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page.

As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management.

In his spare time, he enjoys running, football, contemporary fiction and real ale.

Search for other articles by Tony Davis

Rate this article:   Avg rating: from a total of 15 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: PostgreSQL
Posted by: Andrew Clarke (view profile)
Posted on: Friday, August 25, 2006 at 5:09 AM
Message: Adam says, when asked what database he would use on Linux, 'Easy question. Easy question. PostgresSQL. I've been studying it as much as possible in my spare time…I've installed it (on Windows as it happens). It's an awesome DBMS and has some really cool features'

PostgreSQL! Although I have a certain fondness for it and very much hope it will eventually succeed, we must be realistic. Species have evolved in the time it takes to execute a decent bit of SQL. It runs sometimes at a tenth of the speed of SQLite. Have a look at some of the public comparative benchmarks.

As an exercise, I once created a reasonably simple customer database containing a million customer records along with all the usual NAD data. I installed it on SQL Server and PostgreSQL. (thanks to the EMS tools, bless them). They were both on the same Windows 2000 box.

The SQL Server system, on average, took a twentieth of the time to produce results from SQL, and the more joins, the more astonishing the difference. I'd assumed that I'd made some ghastly mistake in the installation of PostgreSQL so I got a PostgreSQL expert to check my installation. No mistake. He said that maybe it would run faster on Linux. I tried that, but failed to be excited.

I can now see why it is that I get such a difference of opinion with the J2EE-freaks over the extent of normalisation of databases. I know that a result from a SQL Query in SQL Server or Oracle joining maybe five or six tables will execute with almost the same speed as with just one. I've seen, on the other hand, J2EE/PostgreSQL systems that have to rely on a great deal of highly risky and complex 'Cacheing' of data inside volatile objects just to get the performance of web-based applications at a reasonable level.

I'm a great supporter of Linux and open-source databases. I really want them to succeed. There are some really excellent commercial databases available on Linux. However, one should be careful to run benchmarks on real data before selecting PostgreSQL or MySQL on a production commercial system

Subject: Interesting...
Posted by: Adam Machanic (view profile)
Posted on: Friday, August 25, 2006 at 4:54 PM
Message: Andrew,

I can't say I've done any performance comparisons, nor have I tried large data sets. But so far in my limited exposure PostgreSQL has been OK.

Have you played with Firebird at all? That's another interesting alternative DBMS. I'm morally opposed to MySQL, so any option other than that is what I'd tend to gravitate towards :)

Subject: What database to use with Linux
Posted by: Andrew Clarke (view profile)
Posted on: Monday, August 28, 2006 at 5:22 PM
Message: I haven't tried Firebird. I've used Sybase and Solid, which are both excellent. For J2EE, I'm beginning to think that SQL Server is an excellent choice, just as long as people don't get excited about the fact that it is running under Windows. The JDBC driver was always pretty reliable but the Data Programmability team at Mocrosoft have worked pretty hard on the V1.1 JDBC for SQL Server 2000 and 2005 released last week.

Subject: PostgreSQL > MySQL
Posted by: Anonymous (not signed in)
Posted on: Saturday, November 11, 2006 at 10:09 PM
Message: Mr Clarke, one should not really compare PostgreSQL to MySQL. PostgreSQL is much faster than MySQL.

As for your data points, it would be really interesting if you posted a page with your experience details. The free software community has been known to thrive on critiques and comparisions, sometimes showing they were mistaken, sometimes arriving at real improvements.

Subject: Andrew, what did explain analyze say?
Posted by: Anonymous (not signed in)
Posted on: Sunday, November 12, 2006 at 2:25 AM
Message: I'm wondering what explain analyze said about your joins? I've seen plenty of people using PostgreSQL who hadn't analyzed their data or created proper indexes or joined on unmatched types have this type of problem.

As a development and production PostgreSQL and Oracle DBA, I've not seen the vast difference you mention between Oracle and PostgreSQL. In fact, where I work, we pump data out of our production Oracle databases and into a PostgreSQL database for reporting, because we don't want to overload the working primary transactional database with queries. The PostgreSQL servers routinely return data in <1 second from millions of rows, and yes, they are often joining several tables (5 to 10).

We also run our internal support and decision making systems on PostgreSQL, where it also runs very quickly. However, without a knowledgeable Oracle or PostgreSQL DBA, neither of those databases run well.

Subject: PostgreSQL vs SQLite
Posted by: Anonymous (not signed in)
Posted on: Sunday, November 12, 2006 at 4:49 PM
Message: As soon as you start comparing PostgreSQL with SQLite I stopped reading. You could just as well compare Oracle with text files, you'd get a similar outcome. You don't get quite the same feature set though.

Ben

Subject: Re: PostgreSQL
Posted by: Phil Factor (view profile)
Posted on: Monday, November 13, 2006 at 12:48 PM
Message: Ben/Anonymous,

Many thanks for the comments and observations. Sorry to be a bit long in replying, but I've been trying to find the 'explain analyze' dumps for the sample database. I still have one or two which I've found, but only for the simple queries. It is the ones for the five-table joins that you'd be interested in. As they had a lot of rows, I'm afraid I deleted the PostgreSQL database some time ago. I've still got the SQL Server version.

It would be interesting to re-run the tests if you are willing to help. (I'll have to take out some copyright code)

I can do the conversion to PostgreSQL, but the problem with running the PostgreSQL versions of the tests is that when I get a bad result, someone says, ah well there are ways... and expert would....Explain analyse will tell you... If you can check it out in Oracle too then we could get a rather good SimpleTalk feature

In the subject of SQLite, I used to poke fun at it, but now I'm a convert. I think it is wonderful, and it fills a gap in the market where no other database can compete. If you haven't tried it lately, please give it a go. you'll be amazed. ( I thought I'd leave mention of SQLite to the end so that Ben didn't stop reading too soon!)

Subject: Postgresql is NOT Slow
Posted by: Anonymous (not signed in)
Posted on: Wednesday, January 17, 2007 at 1:21 PM
Message: People,
Postgresql might have been slow in the 7.x and lower versions, but the latest 8.2 version ROCKS!!!
Postgresql performance is tied to the postgresql.conf and if you leave it as is you get settings meant for a low resource server.
Also MS SQL server will blindly create indexes for you on the fly without you ever adding them. Postgresql does not automatically create indexes, you must do that yourself. If you add the same indexes in Postgresql as the MS SQL server you will get similar performance out of Postgresql. MS SQL server does a LOT of hand holding and I don't think that is a good thing for serious databases.

Subject: Automatic Indexes?
Posted by: GSquared (view profile)
Posted on: Friday, July 06, 2007 at 11:12 AM
Message: I don't think I've ever seen a query execution plan include any sort of "automatically generated index" in SQL 2000 or SQL 2005. Am I missing something?

Unless you're talking about clustered indexes on primary keys, but even those have to be created when you create the table.

But I've definitely never seen, "MS SQL server ... blindly create indexes ... on the fly...". Am I missing something on that?

Subject: Anonymous Comments Disabled
Posted by: Sarah Grady (view profile)
Posted on: Monday, November 19, 2007 at 3:44 AM
Message: Due to a high volume of spam, anonymous comments have been disabled.

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.