Phil Factor's Phrenetic Phoughts

Simple-Talk columnist
The wilder shores of Transact SQL

Sir! My dog ate my database.

Published Thursday, February 09, 2006 3:04 PM

In asking various colleagues about the disasters or near disasters they have had with databases, I find to my astonishment that I am the only one who has ever, in a moment when my attention wandered, deleted a live database. It is odd, though, how products geared solely to mopping up from database disasters sell so well. It must be that these virtuous people are stocking up with the metaphorical mop-and-bucket as an insurance against the near impossible.

The worst event I ever suffered was due in part  to the wretchedness of the way that BLOBS (Text and images) were implemented in SQL Server. Whereas even poor PostgreSQL can handle strings of arbitrary length, SQL Server finds it awfully difficult. In fact, I have a theory that the guy originally tasked with implementing TEXT fields in SQL Server got more and more stressed until one day he looked up from his desk, shouted out ‘must join the penguins!’, flung of all his clothes, and disappeared forever. His colleagues were too distressed to clear up his work, which remains today unfinished inside SQL Server.  I once had a database implemented in SQL Server 6.5 that used Text fields. I could have made a mistake somewhere but some loose pointer sprayed angry bytes silently and stealthily all over my data. Unknowingly, I innocently did a neat and virtuous backup regime, until finally, the cancer had spread to the point that the database came up with a string of errors, failed the DBCC checks, and shut up shop. Vainly I went back through the backups. I had, for some time, unknowingly backed-up mangled data until I’d over-written the good ones, the 'Last-Known good backup' as the documentation describes it. The result was a great-deal of embarrassing downtime until I could retrieve the database.  It taught me a great deal about how SQL Server worked in the process, and possibly made me a more saintly person.

Another way I’ve seen production databases disappear is the build script. Some programmers like to work on the entire database script at once. This is fine in a development environment but, if you choose all the options in the build script, you are in some potential peril. I knew a DBA who was trying to fix a problem with the live database late at night, chose to work on a build script,  and who went to select a stored procedure and recompile it, but hit the ‘Execute Query’ button when there was no selection. It rebuilt the entire database from scratch but without the …er… data.

In the course of a busy day, I’ll be working on several databases at once. I wish one could colour-code the Query-analyser or attach different icons to different databases. I have been doing this so long that I always go through a little ritual before I hit that deadly ‘Execute Query’ button. It is so easy to think you are on the development database, decide to truncate a table and….

So I’d be really interested in hearing about how the great database disasters really happen, and how you managed to restore the data. Surely someone out there has experienced that ghastly clammy feeling of shock as one realises that one has just destroyed a live database system on which a corporation depends for its revenue, and for which even a short down-time is a disaster. No, sir I haven’t but I have a ..er.. friend…… And, of course, a prize for the best story.

Comments

 

Chaklun said:

Once I killed 350000 rows containing discounts... And went for lunch, left my cellular charging on desk. It was for 1 hour when our clients were robbed :)
No one noticed. Good to me.
February 13, 2006 9:29 AM
 

Mike said:

I once killed our year-end process 7 days into it's run (had about another couple of hours to run before it was completed)
February 13, 2006 10:13 AM
 

Clive said:

Not me personally...I was on holiday and a colleague managed to delete 20m+ rows of data from a table forgetting to include the where clause. The db was a reporting db and only ever updated by sql server jobs but for some bizarre reason, it was decided to rerun the job to pull 3 years of data from the as/400 system (between the UK & US) instead of doing a restore. It took 19 hours (as well as impacting on general user connectivity to the as/400) when it would have taken maybe 20 minutes to do a restore!
February 13, 2006 10:22 AM
 

Mike Rodriguez said:

I was called from a client to give "support" (save his job, actually) about midnight, and the first statement he gave me was "I accidentally deleted all the databases and the backups"... I remembered a Dilbert's joke that begins just like that. I couldn't help but laugh and tell him: "Well, I suppose you have a plane ticket, just in case".

Mike
February 13, 2006 4:31 PM
 

Dave said:

I was working at a large insurance company with sales offices nationwide. We had a complex client-server application with users connected to it in about twelve states. Another developer and I were trying to troubleshoot a slow query that was key for a report, so we copied a table from the production database down to dev to fiddle with it. I had one QA connection pointing to the dev database and one pointing to the production database. We found a solution we liked, and rather than take up all that space on the dev server, the other guy deleted the table. Or so we thought; he had actually went to the wrong window and deleted the production table. It was one of those situations where you know about 100ms after you do something that it was the absolutely wrong thing to do, like rolling through a stop light in front of a speeding fire engine. And it meant that client apps were puking nationwide with "object not found" messages (hey, we'll put more graceful error handling in the next version.) We hurried up and ran a list of who was connected, then he took half and I took half and we phoned them all to let them know we were working on the "hardware" problem. This was before clustered servers with failover, so we could get away with blaming the hardware guys. Then we just copied the table back up from dev, and, since the system was offline anyway, we plopped in the new query, too. This took about 20 minutes. Only then did we call our boss who had been at lunch: "Bob, the funniest thing just happened..." We were able to say that everything was hunky-dory. The following week we got a message from the key user group committee complimenting us on keeping everyone informed during an outage, and hey that one report runs a lot faster now, too. nyuk nyuk nyuk.
February 13, 2006 6:34 PM
 

Phil Factor said:

The distinguished panel of judges from Simple Talk have conferred and overwhelmingly voted Dave's contribution to be the prizewinner of this competition.
It shows high professional skill to manage a mistake like this so skilfully that the team actually gets praise for fixing the problem the team has caused.

Dave, can you please get in youch so we can give you the prize!
February 24, 2006 11:01 PM
You need to sign in to comment on this blog

















<February 2006>
SuMoTuWeThFrSa
2930311234
567891011
12131415161718
19202122232425
2627281234
567891011
Reporting on Mobile Device Activity Using Exchange 2007 ActiveSync Logs
 In this new column giving practical advice on all things Sys Admin related, Ben Lye takes on the often... Read more...

The Bejeweled Puzzle in SQL
 Alex Kozak provides another SQL puzzle to hone your SQL Skills with.  Read more...

Using Powershell to Generate Table-Creation Scripts
 For all of us who learn best by trying out examples, Bob Sheldon produces a PowerShell script file for... Read more...

Configuring Exchange Server 2007 to Support Information Rights Management
 In Exchange Server 2007, Information Rights management is easy to set up once you have set up the... Read more...

SQL Response: The dim sum interview
 Richard Morris met David and Nigel of the SQL Response team, in a dim sum Restaurant in Cambridge. They... Read more...