|
|
Simple-Talk columnist
-
Posted Thursday, January 05, 2012 2:32 AM |
(Updated 14th Jan 2012, and again 26th Jan 2012)
Every so often, the question comes up on forums of how to pass a list as a parameter to a SQL
procedure or function. There was a time
that I used to love this question because one could spread so much happiness and gratitude by showing how to parse a comma-delimited
list into a table. Jeff Moden has probably won the laurels for the ultimate list-parsing TSQL function, the amazing ‘DelimitedSplit8K’.
Erland Sommarskog has
permanently nailed the topic with a very complete coverage of the various methods for using lists in SQL Server on his website.. With Table-Valued parameters, of course, the necessity for having any lists
in SQL Server is enormously reduced, though it still crops up.
Element-based XML seems, on the surface, to provide a built-in way of handling lists as parameters. No need for all
those ancillary functions for splitting lists into tables, one might think. Yes, indeed, but be careful of the XQuery
syntax that you use, as we'll see.
Let’s just take the very simplest example of taking a list of integers and turning it into a table
of integers. One can use a simple element-based XML list based on a fragment like this…
DECLARE @XMLlist XML
SELECT @XMLList
=
'<list><i>2</i><i>4</i><i>6</i><i>8</i><i>10</i><i>15</i><i>17</i><i>21</i></list>'
…to give a table like this ..
IDs -----------
2
4
6
8
10
15
17
21
(8 row(s) affected)
..by using this TSQL expression
SELECT x.y.value('.','int')
AS IDs
FROM @XMLList.nodes('/list/i')
AS x ( y
)
It isn’t as intuitive as a simple comma-delimited list to be sure, but it is bearable. In small
doses, this shredding of element-based XML lists works fine, but,<added 26th January> using this particular syntax for getting the value of the element </added 26th January> one soon notices the sigh from the server as the length of the list
starts to increases into four figures. Try it on a list of a hundred thousand integers and you’ll have time to eat a
sandwich.
OK. Let’s do a few measurements and do a graph.

Jeff Moden’s Split function, scales beautifully as shown by the red line, and it is difficult to
measure it, it is so quick. The XML eShred technique by contrast, using the element-based list, and this XQuery syntax, exhibits horrible scaling. This sort of
curve is enough to strike terror into the developer. Just as a comparison, I did
the process of taking a list and turning it into a table by creating a VALUES expression from the list. Even this took
longer than Jeff’s function, presumably because of the overhead of compiling the expression. Of course, the comparison
is rather unfair because this third approach , the ‘Using Values Script’ approach, does no validation, but still one
wonders what on earth the XML expression is DOING all that time. It must be gazing out the window, reading the paper and
scratching itself, one assumes. No. Actually, the CPU is warming up on the task, so it involves frantic activity.

Operations involving XML can be startlingly fast in SQL Server, but this particular critter seems startlingly slow
once one gets to a three-figure number in the list that you’re turning into a relation that can then be used in SQL
Expressions. Imagine this getting loose in a production system when someone starts passing more values in the list and
the database suddenly slows down. How would you track the problem down?
<added 26th January>The big problem here is one of the expression. As pointed out by a reader of the blog, if you modify the expression slightly to
SELECT x.y.value('.','int')
AS IDs
FROM @XMLList.nodes('/list/i/text()')
AS x ( y
) The shredding suddenly goes like a rocket</added 26th January>.
There is a second, and more wordy version of the simple XML list, The attribute-based list. Here
DECLARE @XMLlist XML SELECT @XMLList = '<list><y i="2" /><y i="4" /><y i="6" /><y i="8" /><y i="10" /><y i="15" /><y i="17" /><y i="21" /></list>'
…to give exactly the same table ..
IDs -----------
2
4
6
8
10
15
17
21
(8 row(s) affected)
by using a very similar syntax like this...
SELECT x.y.value('@i','int') FROM @XMLList.nodes('list/y') AS x( y )
'Eh? What difference could this make?', you're wondering. Well, rather a lot, it turns out. Thanks to a reader comment on the first version of this blog
by wBob,
we can reveal that this version scales as well as Jeff's amazing 'split' function. I must admit to being rather
surprised at the difference in performance of the two. By using the Attribute-based list, or the modified XQuery expression, you can cheerfully pass lists
of substantial length to procedures without worrying, it seems. It reminded me of an argument I had some years ago with
a distinguished Database Developer who was advising us all not to use XML-based lists because of their performance
problems. I'd been using attribute-based lists simply because I copied Bob Beauchamin's example code in 'The
Book', and hadn't hit any problem. We couldn't convince each other. Perhaps we'd hit this performance difference.
<added 26th January>We can even improve on this. as Sviridov Konstantin has pointed out, you can tweak it to the point that it goes
around twice the speed of Jeff's TSQL split function by using syntax like this...
SELECT x.y.value('.','int') FROM @XMLList.nodes('list/y/@i') AS x( y )
Which will give a performance comparison with the the TSQL split function. These modified XML shreds must be the fastest non-CLR
method of transferring list-based data as a variable in SQL.

</added 26th January>XML in
SQL Server seems to remind me of the old poem by Henry Wadsworth Longfellow. (1807-1882)
There was a little girl, who had a little curl Right in the middle of her forehead, And
when she was good, she was very, very good, But when she was bad she was horrid.'.
Just as a comparison, here is the horrid XML performing against the first attribute-based XML Shred, and the TSQL split function

As always, one should measure everything you can, and take nothing for granted if you have to
deliver a scalable application.
Here is the simple code I used to do the metrics. The results were simply pasted into excel and
graphed.
--in order to record the timings, we prepare a log. DECLARE @log TABLE ( Log_Id INT IDENTITY(1, 1),TheeVent VARCHAR( 2000 ), [Values] INT, CPU FLOAT DEFAULT @@CPU_BUSY, DateAndTime DATETIME DEFAULT GETDATE()) ; --define working variables DECLARE @List VARCHAR(MAX),@XMLList XML,@AttributeBasedXMLList XML, @buildScript VARCHAR(MAX), @XMLbuildScript NVARCHAR(MAX), @ii INT, @Values INT; --and table variables to receive the results/relations DECLARE @ByteBucket TABLE (TheNumeral INT) DECLARE @ByteBucket2 TABLE (TheNumeral INT) DECLARE @ByteBucket3 TABLE (TheNumeral INT) DECLARE @ByteBucket4 TABLE (TheNumeral INT) DECLARE @ByteBucket5 TABLE (TheNumeral INT) DECLARE @ByteBucket6 TABLE (TheNumeral INT) SET NOCOUNT ON SELECT @Values=1 --start with one item in the list WHILE @Values<=3000 BEGIN --build up the list with random integers SELECT @List='1',@ii=@Values WHILE @ii>1 BEGIN SELECT @List=@List+','+CONVERT(VARCHAR(3),CONVERT(INT,RAND()*100)) SELECT @ii=@ii-1 -- .. to the required length END --and pre-prepare the XML List and VALUES script SELECT @XMLList='<list><i>'+REPLACE(@List,',','</i><i>')+'</i></list>' SELECT @BuildScript='SELECT x FROM (values ('+REPLACE(@List,',','),(')+'))d(x)' --and pre-prepare the XML List and VALUES script SELECT @XMLBuildScript='SELECT @AttributeBasedXMLList = ( SELECT x AS "@x" FROM (values (' +REPLACE(@List,',','),(')+'))d(x) FOR XML PATH(''y''), ROOT(''root''), TYPE )' EXEC sp_executesql @XMLBuildScript, N'@AttributeBasedXMLList XML OUT', @AttributeBasedXMLList OUT --try doing the delimited list function INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Split function',@Values INSERT INTO @ByteBucket (TheNumeral) SELECT item FROM dbo.DelimitedSplit8K (@list,',') --use the XML Shred trick INSERT INTO @log (TheEvent, [Values]) SELECT 'Element-based XML',@Values INSERT INTO @ByteBucket2 (TheNumeral) SELECT x.y.value('.','int') AS IDs FROM @XMLList.nodes('/list/i') AS x ( y ) --use the XML Shred trick INSERT INTO @log (TheEvent, [Values]) SELECT 'Element-based XML (text())',@Values INSERT INTO @ByteBucket6 (TheNumeral) SELECT x.y.value('.','int') AS IDs FROM @XMLList.nodes('/list/i/text()') AS x ( y ) --try the VALUES method INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Values script',@Values INSERT INTO @ByteBucket3 (TheNumeral) EXECUTE (@BuildScript) --use the XML Shred trick INSERT INTO @log (TheEvent, [Values]) SELECT 'Attribute-based XML',@Values INSERT INTO @ByteBucket4 (TheNumeral) SELECT x.y.value('@x','int') FROM @AttributeBasedXMLList.nodes('root/y') AS x(y) INSERT INTO @log (TheEvent, [Values]) SELECT 'finished',@Values --use the XML Shred trick INSERT INTO @log (TheEvent, [Values]) SELECT 'Attribute-based XML (@)',@Values INSERT INTO @ByteBucket5 (TheNumeral) SELECT x.y.value('.','int') FROM @AttributeBasedXMLList.nodes('root/y/@x') AS x(y) INSERT INTO @log (TheEvent, [Values]) SELECT 'finished',@Values SELECT @Values=@Values+100 END
--Yes, we need to check that they all agree! SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket UNION ALL SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket2 UNION ALL SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket3 UNION ALL SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket4 UNION ALL SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket5 UNION ALL SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket6
SELECT TheStart.[Values], MAX(CASE WHEN TheStart.TheeVent ='Using Split function' THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) ELSE 0 END) AS [Using Split function], MAX(CASE WHEN TheStart.TheeVent ='Element-based XML' THEN DATEDIFF( ms, TheStart.DateAnddTime, Theend.DateAndTime ) ELSE 0 END) AS [Element-based XML], MAX(CASE WHEN TheStart.TheeVent ='Using Values script' THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) ELSE 0 END) AS [Using Values script], MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML' THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) ELSE 0 END) AS [XML Attributes],*/ MAX(CASE WHEN TheStart.TheeVent ='Element-based XML (text())' THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) ELSE 0 END) AS [Element-based XML (text())], MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML (@)' THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime ) ELSE 0 END) AS [Attribute-based XML (@)] FROM @log TheStart INNER JOIN @log Theend ON Theend.Log_Id = TheStart.Log_Id + 1 WHERE TheStart.TheEvent<>'finished' GROUP BY TheStart.[Values];
SELECT TheStart.[Values], MAX(CASE WHEN TheStart.TheeVent ='Using Split function' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [Using Split function], MAX(CASE WHEN TheStart.TheeVent ='Element-based XML' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [Element-based XML], MAX(CASE WHEN TheStart.TheeVent ='Using Values script' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [Using Values script], MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [XML Attributes], MAX(CASE WHEN TheStart.TheeVent ='Element-based XML (text())' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [Element-based XML (text())], MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML (@)' THEN theEnd.cpu-TheStart.cpu ELSE 0 END) AS [Attribute-based XML (@)] FROM @log TheStart INNER JOIN @log Theend ON Theend.Log_Id = TheStart.Log_Id + 1 WHERE TheStart.TheEvent<> 'finished' GROUP BY TheStart.[Values];
|
-
Posted Monday, December 05, 2011 4:24 PM |
(Guest Editorial for the SysAdmin/ITPro Newsletter)
It's been many years
since I had to set up and maintain the network servers for a large office LAN; for it is an activity that requires youthful
optimism and a high tolerance towards frustration. Whatever the reason, network maintenance is a task I've been keen to
avoid ever since, even for my own office. Recently, however, a smell of burning recently alerted me to the possible
demise of my hoary Windows-based office domain server. I dreaded the pain of replacing it, but it croaked its last soon
afterwards and I was left with no choice.
After a couple of
frustrating false starts with a new Windows server and a NAS device, I got one of the new generation of
'Smart Storage' servers that
masquerade as a rather plump
external hard drive. I was amazed. Experience had led me to expect nothing much more than a file server, but as soon
as I was in the management interface, I had file sharing via an ADS domain, an LDAP, FTP, WebDAV, and NFS. There was
automated backup to a USB device, a Webserver (with MySQL and PHP5), VPN server, Firewall, Mail server with Webmail, and
a Print server with AirPrint. I could even set up SNMP and event notifications.
It took far longer to
test all this than to set it up, but two hours later, I realised with a start of surprise that it all worked fine. I
could access all my PDF documentation and manuals from a central resource and read them on the iPad directly just by
using WebDav and a cute iPad app. Linux machines were able to access all the files perfectly. I could print from Linux
and iOS too, as well as Windows. Of course, to get all this plus a good RAID10 array, in a rack system, costs four
figures, but it is a great solution for the average SME to put alongside a Windows Domain server installation
Then, I noticed the toys
which are more appropriate for home use. Because the particular brand I chose had a whole range going from home
fileserver up to enterprise-level NAS, it is stuffed with recreational stuff one wouldn’t want on a work-based LAN.
I could, of course, play my MP3s all from the network drive on my iPad. There was
also streaming of music. One can set up a DLNA media server for all sorts of media devices. I could even access
everything from Android devices. There was iTunes Service so that an iPhone/iPad could browse and play songs or Videos.
A’Smart Storage’ server box to support all this costs less than an iPhone.
It strikes me that these
smart storage devices are at least as radical as the iPad was. When I think of the many hours I've spent just setting up
an LDAP, or fiddling with a PHP installation, I weep. Now, any home network can be tooled up to a level that I never
aspired to in my geeky past. Sure, the
Turnkey
Linux packages are great, and
I admire the WebMatrix Windows application, but for those of us who just want the functionality
and integration without the hassle, surely the new generation of NAS 'Smart Storage' boxes from Synology, Cisco, Qnap or
Iomega is the way to go. I'm now looking at a pile of components, and building up the mental energy, to rebuild my local
SQL Server test installation. Let's hope it will be as simple!
|
-
Posted Thursday, October 27, 2011 9:06 PM |
It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter
as ‘Horrible’, and to have it criticized on the MVP forum. When this
happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these
critics were experts whose views I respect. What was my crime? It was, I
think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use
local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number
of rows.
What are these quirks? Well, table variables have
advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and
how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t
allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make
assumptions about the data rather than using column distribution statistics when a table variable is involved in a join,
because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even
distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved
in queries. However complex the heuristics that are used might be in determining
the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally
with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful.
The experienced developer or DBA will be on the lookout for this sort of problem.
In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate
the quirks, and include a subsequent example supplied by Kevin Boles.
A simplified example.
We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll
create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget
indexes altogether for this example, and use heaps. We’ll try the same Join
with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables.

It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the
millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary
table up to 10,000 rows. Actually, even without a use of the OPTION
(RECOMPILE) hint, it is doing well. What happens when your table size increases?

The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you
use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it
has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows,
at least. It is performing better than a Temporary table, and in a good linear
fashion.
What about mixed table joins, where you are joining a temporary table to a table variable? You’d
probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table
variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it
behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows
since we know the bigger picture now)

Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re
dealing with a very complex beast: the Query Optimizer. It can come up with surprises
What if we change the query very slightly to insert the results into a Table Variable? We change
nothing else and just measure the execution time of the statement as before.

Suddenly, the table variable isn’t
looking so much better, even taking into account the time involved in doing the
table insert.

OK, if you haven’t used OPTION
(RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the
Table variable and the temporary table. The table variable
is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the
8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating
has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t
use constraints, you’re going to need that Option (RECOMPILE) hint.
Count Dracula and the Horror Join.
These tables of integers provide a rather unreal example, so let’s try a rather different example,
and get stuck into some implicit indexing, by using constraints.
What unusual words are contained in the book ‘Dracula’ by Bram Stoker?
Here we get a table of all the common words in the English language (60,387 of them) and put them
in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of
them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the
words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the
right-hand value is null.
The results show a huge variation, between the sublime and the gorblimey.
- If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms.
- If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms.
- If both Table Variables use a unique constraint, then the query takes 36 Ms.
- If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!!
- If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table,
it took 113 Ms.
- If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.
- If both tables are temporary tables and both have primary keys, it took 46 Ms.
Here we see table variables which are joined on their primary key again enjoying a slight performance
advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes
nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the
hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans. So where have we got to? Without drilling down into
the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your
tables are relatively small, they are faster than temporary tables, but as the number of rows increases
you need to do one of two things: either you need to have a primary key on the
column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a
join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the
table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably
according to context).
Kevin’s Skew
In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an
interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very
skewed table. In the original, pasted into my article as a comment,
a column consisted of 100000 rows
in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan.
This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up
a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the
temporary table and then all was well.

Undeniably, real data does occasionally cause problems to the performance of joins in Table
Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real
live data. As in Kevin’s example, indexes merely make matters worse, and the
OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table.
However, one has to note that once the slight de-skew had taken place, then the
plans were identical across a huge range.
Conclusions
Where you need to hold intermediate results as part of a process, Table Variables offer a good
alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows
is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and
when the nature of the processing makes an index seek very effective. Table
Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer
required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware,
however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor
execution plans, and this must be cured either by adding constraints
(UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is
impossible. Occasionally, the way that the data is distributed prevents the
efficient use of Table Variables, and this will require using a temporary table instead.
Tables Variables require some awareness by the developer about the potential hazards and how to
avoid them. If you are not prepared to do any performance monitoring of
your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such
as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables
without considering the niceties of processing just what is required and no more, then temporary tables provide a safer
and less fragile means-to-an-end for you.
|
-
Posted Thursday, September 29, 2011 3:59 AM |
Ah, SQL Pass is imminent; and all over the world, people are preparing their presentations. There is great satisfaction to be had through facing a specialist audience. For the rest of the year, even devoted friends and family tire quickly of hearing your breathless account of the inner workings of execution plans and distribution statistics; but here the audience hang on your every word, if you've pitched it right. Mind you, getting it wrong carries with it a great deal of pain. As you hold forth strongly on what you imagine to be a really hot subject, such as virtualization, suddenly hundreds of faces are illuminated in front of you as laptops are opened by bored members of the audience, who start checking their mail or updating their Facebook pages. You struggle on, and they start Twittering, damn them. I remember more than one occasion where the presenter, stung by the twittered criticisms, twittered back defiantly. Admirable spirit, I reckon.
There are, of course, occasions when the audience has some justification for feeling a bit cheated. One famous SQL expert accidentally left his mobile phone on and had a family member ring him half way through. In his panic, he answered it without backing away from the microphone, saying, "Not now, dear, I'm doing a presentation at SQLPass". This is a line I have cherished ever since. A tense conversation ensued, and the audience was agog to hear both sides. Eventually, the caller was soothed enough to hang up and we carried on with the presentation. However, five minutes later, the phone rang again and was again answered! .
Actually, the presentation was made more memorable by the intermission, and was subsequently listened to intently by the audience. I must admit I thought it was a gag and, afterwards, I put it to the presenter that what we'd witnessed was a trick to engage the audience and induce them to turn off their laptops and listen. Nobody forgets to switch their phone off twice surely? Truth, he assured me, was stranger than fiction; it was a brand new phone, bought that morning as a replacement for a broken one. He hadn't had time to learn which button did what..
All sorts of things can go wrong in a presentation; technology grenades, power gets cut, fire sprinklers go off, members of the audience get taken ill, police raid (wrong address). The seasoned presenter will be prepared for almost anything, with strategies for coping with any predictable turn of events..
When thinking of presentations that can go wrong, I'm always reminded of the time that the great poet and author, Hugh Iztu, got invited to China for a university symposium that included a celebration of his creative work. Naturally, I've disguised or changed the details, and it isn't his real name..
Hugh is a celebrated writer, one of the last of the 'beat' poets. He'd led an innocent life, gradually descending from those heady, reckless days of post-war rebellion into a cosseted life as a lecturer in a leafy Colorado university, lauded by intense-looking students wearing berets and dark glasses. He was able to savor those wonderful years when he hung out with Dylan and Joan Baez, and went carousing with Elvis Presley. His first books were a magical invocation of an angst-fuelled adolescence, a discovery of both rock and roll and sex, and the giddy cultural revolution of sixties San Francisco. Now a plump and dignified academic, with a string of acclaimed books that never really made it to the mainstream, he was surprised to receive an invitation to speak in China about his life and work, as part of a symposium about American art and culture. It was a vast venue, and the fees they wanted to pay seemed astronomical.
When the plane landed, a large number of people met him. Some of them could speak perfect English, and knew his writings far better than he did. All of them were keen to shake his hand and express how privileged they were to meet him. For the first time in his life, he felt he was a real celebrity. He was given a tour of the university and was amazed to find how rigorously the cultural development of post-war western culture, and its many twists and turns, had been studied, documented and understood.
After a day of meetings with editors, film directors, intellectuals, and well-known Chinese artists and authors, he left quietly and was taken to his luxury hotel. The following morning, all was quiet. His hosts assumed that he wished to meditate before the headline talk he would give at the city's stadium in the afternoon.
As is usual in modern international hotels, the blues quickly descend. Left alone, Hugh was suddenly struck by the immensity of the task before him; to engage that vast audience, and to transmit to them the emotional intensity of the beat era, the camaraderie with the Nashville set, and his subsequent elevation into a literary icon of the Sixties West-coast.
At that point he spied the drinks cabinet and remembered his host's cheery invitation to help himself. He did, and each glass increased his confidence. When the car arrived to take him to the stadium, he was surprised by the difficulty he had in negotiating the stairs. Inside his brain, the lights were slowly going out. The only bright glow was his alcohol-fuelled confidence.
At the packed venue, a huge screen flashed photos of him, young and fit, writing lyrics with Dylan, or speaking with sixties gurus. Even younger versions of himself appeared with Mae Boren, Col Parker and Elvis, or doing public readings in a baggy jersey with Ginsburg and Ferlinghetti.
As he shook hands and exchanged greetings with a queue of dignitaries, as he was guided towards the stage, the whiskey finally began to anesthetize the cerebral cortex, and Hugh was dimly aware that he had perhaps been unwise; he had almost lost the capacity to string two meaningful words together. An intellectual retrospective, a broad account of a lifetime's poetic work, a glimpse of the artist at work; these were all now out of the question. With his last lucid moments, he started to grasp about in his memory, like a drowning man reaching vainly for a lifebelt, and then, finally, grasped something solid from the recesses of his memory.
The audience saw Hugh's small figure appear on the distant stage, suddenly magnified by the huge screens behind him and leaned forward expectantly to hear his words. Here, they thought, was something they'd be able to tell their grandchildren in years to come. Hugh went up to the microphone, swayed uncertainly and blinked at the vast sea of heads in front of him. From some entrenched resource, deep in the cerebellum, Hugh wrenched his Elvis impression, one that had entertained generations of students.
"Well, since my baby left me.
I found a new place to dwell
It's down at the end of lonely street
At heartbreak hotel…"
It is a useful fact about the human brain that, long after the capacity for logical thought is neutralized by alcohol, one can still do impressions. So engrossed had he become that he had no thought but to make it good. As he did it, the energy and excitement of his time hanging out with the Memphis beats and Rock artists came flooding back.
The audience was stunned. They gaped at the huge figure on the screens as he pitched into the song. They were expecting a rather boring lecture full of worthy thoughts about literature and poetry. Instead, one star of western culture had somehow morphed into another, and here was a brilliant poet who had transcended language, with a piece of theatre that expressed, in an immediate way, that which mere language could not; the broad sweep of post-war western culture that had been denied to them for so long.
According to the account I heard, no visiting celebrity had ever received applause like Hugh received when the adrenaline finally ran out and he slumped semi-conscious into his seat. Tears of emotion ran down the cheeks of even the coldest intellectuals. It had tapped straight into something deep in the psyche of the newly-liberated intellectual life of China.
The moral? With public speaking, whatever the emergency, there is always a way of pulling splendid victory from the jaws of defeat, but you may, like Hugh, need to dig deep to find it.
|
-
Posted Friday, June 10, 2011 1:29 PM |
Guest Editorial for Simple-Talk Newsletter... in which Phil Factor reacts with some exasperation when coming across a report that a majority of companies were still using financial and personal data for both developing and testing database applications. If you routinely test your development work using real production data that contains personal or financial information, you are probably being irresponsible, and at worst, risking a heavy financial penalty for your company. Surprisingly, over 80% of financial companies still do this. Plenty of data breaches and fraud have happened from the use of real data for testing, and a data breach is a nightmare for any organisation that suffers one. The cost of each data breach averages out at around $7.2 million in the US in notification, escalation, credit monitoring, fines, litigation, legal costs, and lost business due to customer churn, £1.9 million in the UK. 70% of data breaches are done from within the organisation.
Real data can be exploited in a number of ways for malicious or criminal purposes. It isn't just the obvious use of items such as name and address, date of birth, social security number, and credit card and bank account numbers: Data can be exploited in many subtle ways, so there are excellent reasons to ensure that a high priority is given to the detection and prevention of any data breaches. You'll never successfully guess all the ways that real data can be exploited maliciously, or the ease with which it can be accessed.
It would be silly to argue that developers never need access to a copy of the database containing live data. Developers sometimes need to track a bug that can only be replicated on the data from the live database. However, it has to be done in a very restrictive harness. The law makes no distinction between development and production databases when a data breach occurs, so the data has to be held with all appropriate security measures in place. In Europe, the use of personal data for testing requires the explicit consent of the people whose data is being held. There are federal standards such as GLBA, PCI DSS and HIPAA, and most US States have privacy legislation. The task of ensuring compliance and tight security in such circumstances is an expensive and time-consuming overhead. The developer is likely to suffer investigation if a data breach occurs, even if the company manages to stay in business.
Ironically, the use of copies of live data isn't usually the most effective way to develop or test your data. Data is usually time-specific and isn't usually current by the time it is used for testing, Existing data doesn't help much for new functionality, and every time the data is refreshed from production, any test data is likely to be overwritten. Also, it is not always going to test all the 'edge' conditions that are likely to flush out bugs. You still have the task of simulating the dynamics of actual usage of the database, and here you have no alternative to creating 'spoofed' data.
Because of the complexities of relational data, It used to be that there was no realistic alternative to developing and testing with live data. However, this is no longer the case. Real data can be obfuscated, or it can be created entirely from scratch. The latter process used to be impractical, now that there are plenty of third-party tools to choose from. The process of obfuscation isn't risk free. The process must access the live data, and the success of the obfuscation process has to be carefully monitored.
Database data security isn't an exciting topic to you or I, but to a hacker it can be an all-consuming obsession, especially if there is financial or political gain involved. This is not the sort of adversary one would wish for and it is far better to accept, and work with, security restrictions that exist for using live data in database development work, especially when the tools exist to create large realistic database test data that can be better for several aspects of testing.
|
-
Posted Wednesday, June 08, 2011 4:11 PM |
Guest Editorial for Simple-Talk IT Pro newsletter'DBAs and SysAdmins generally prefer an expression of calmness under
adversity. It is a subtle trick, and requires practice in front of a
mirror to get it just right. Too much adversity and they think you're
not coping; too much calmness and they think you're under-employed'
I dislike the term 'avatar', when used to describe a portrait photograph. An avatar, in the sense of a picture, is merely the depiction of one's role-play alter-ego, often a ridiculous bronze-age deity. However, professional image is important. The choice and creation of online photos has an effect on the way your message is received and it is important to get that right. It is fine to use that photo of you after ten lagers on holiday in an Ibiza nightclub, but what works on Facebook looks hilarious on LinkedIn.
My splendid photograph that I use online was done by a professional photographer at great expense and I've never had the slightest twinge of regret when I remember how much I paid for it. It is me, but a more pensive and dignified edition, oozing trust and wisdom. One gasps at the magical skill that a professional photographer can conjure up, without digital manipulation, to make the best of a derisory noggin (ed: slang for a head). Even if he had offered to depict me as a semi-naked, muscle-bound, sword-wielding hero, I'd have demurred. No, any professional person needs a carefully cultivated image that looks right. I'd never thought of using that profile shot, though I couldn't help noticing the photographer flinch slightly when he first caught sight of my face.
There is a problem with using an avatar. The use of a single image doesn't express the appropriate emotion. At the moment, it is weird to see someone with a laughing portrait writing something solemn. A neutral cast to the face, somewhat like a passport photo, is probably the best compromise.
Actually, the same is true of a working life in IT. One of the first skills I learned was not to laugh at managers, but, instead, to develop a facial expression that promoted a sense of keenness, energy and respect. Every profession has its own preferred facial cast. A neighbour of mine has the natural gift of a face that displays barely repressed grief. Though he is characteristically cheerful, he earns a remarkable income as a pallbearer. DBAs and SysAdmins generally prefer an expression of calmness under adversity. It is a subtle trick, and requires practice in front of a mirror to get it just right. Too much adversity and they think you're not coping; too much calmness and they think you're under-employed.
With an appropriate avatar, you could do away with a lot of the need for 'smilies' to give clues as to the meaning of what you've written on forums and blogs. If you had a set of avatars, showing the full gamut of human emotions expressible in writing: Rage, fear, reproach, joy, ebullience, apprehension, exasperation, dissembly, irony, pathos, euphoria, remorse and so on. It would be quite a drop-down list on forums, but given the vast prairies of space on the average hard drive, who cares? It would cut down on the number of spats in Forums just as long as one picks the right avatar.
As an unreconstructed geek, I find it hard to admit to the value of image in the workplace, but it is true. Just as we use professionals to tidy up and order our CVs and job applications, we should employ experts to enhance our professional image. After all you don't perform surgery or dentistry on yourself do you?
|
-
Posted Friday, May 27, 2011 1:20 AM |
Guest Editorial for Simple-Talk newsletterIn a guest editorial for the Simple-Talk Newsletter, Phil Factor wonders if we are still likely to find some more novel and unexpected ways of using the newer features of Transact SQL: or maybe in some features that have always been there!
There can be a great deal of fun to be had in trying out
recent features of SQL Expressions to see if
they provide new functionality.
It is surprisingly rare to find things that couldn’t be done before, but
in a different and more cumbersome way;
but it is great to experiment or to read of someone else making that
discovery. One such recent feature is the
‘table value
constructor’, or ‘VALUES constructor’, that
managed to get into SQL Server 2008 from Standard SQL. This allows you to create derived tables of
up to 1000 rows neatly within select statements that consist of lists of row values. E.g.
SELECT Old_Welsh, number FROM (VALUES ('Un',1),('Dou',2),('Tri',3),('Petuar',4),('Pimp',5),('Chwech',6),('Seith',7),('Wyth',8),('Nau',9),('Dec',10)) AS WelshWordsToTen (Old_Welsh, number)
These values can be expressions that return single values,
including, surprisingly, subqueries. You can use this device to create views, or
in the USING clause of a MERGE statement. Joe Celko covered this here
and here. It can become extraordinarily handy to use
once one gets into the way of thinking in these terms, and I’ve rewritten a lot
of routines to use the constructor, but the old way of using UNION can be used
the same way, but is a little slower and more long-winded.
The use of scalar SQL subqueries as an expression in a
VALUES constructor, and then applied to a MERGE, has got me thinking. It looks
very clever, but what use could one put it to? I haven’t seen anything yet that
couldn’t be done almost as simply in SQL
Server 2000, but I’m hopeful that someone will come up with a way of solving a
tricky problem, just in the same way that a
freak of the XML syntax forever made the in-line production of delimited lists from an
expression easy, or that a weird
XML pirouette could do an elegant
pivot-table rotation.
It is in this sort of experimentation where the community of
users can make a real contribution. The dissemination of techniques such as the
Number, or Tally table, or the unconventional ways that the UPDATE statement
can be used, has been rapid due to articles and blogs. However, there is plenty
to be done to explore some of the less obvious features of Transact SQL. Even
some of the features introduced into SQL Server 2000 are hardly well-known.
Certain operations on data are still awkward to perform in
Transact SQL, but we mustn’t, I think, be too ready to state that certain
things can only be done in the application layer, or using a CLR routine. With
the vast array of features in the product, and with the tools that surround it,
I feel that there is generally a way of getting tricky things done. Or should
we just stick to our lasts and push anything difficult out into procedural
code? I’d love to know your views.
|
-
Posted Thursday, April 28, 2011 3:16 PM |
A guest editorial for Simple TalkThe greatest struggle for anyone designing, building or maintaining applications is the fight against complexity. It chokes the lifeblood from any application. Chuck Moore, one of the founding fathers of IT recently said, in an interview with Simple-Talk:
“Complexity is the problem. Moving it from hardware to software, or vice versa, doesn’t help. Simplicity is the only answer. … I despair. Technology, and our very civilization, will get more and more complex until it collapses. There is no opposing pressure to limit this growth.”
This is not the howl of a lone wolf. Brian Kernighan once wrote, “Controlling complexity is the essence of computer programming.” And Ray Ozzie too, “Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test, it introduces security challenges and it causes end-user and administrator frustration.” Probably the most memorable take on the problems of software complexity comes from Tony Hoare. “There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.”
A long time back, commercial-scale applications became too complex for mortals to track without using complex application-management applications. Project and configuration management systems were joined by bug-tracking systems, build management, automated testing, continuous integration and release management. With tools like these, we could all cope with more complexity, and run larger software teams. Integration was ever the issue, especially with the temptation to include the entire life-cycle of an application, including the architectural, deployment, and production-management phases; this is particularly true when sourcing your tools from different vendors. Within the past five years, Application Lifecycle Management (ALM) tools, such as Microsoft’s TFS, HP’s ALM Suite and IBM Rational Team Concert have promised the obvious solution: accepting a degree of single-vendor lock-in in return for a single integrated set of tools in which application providers can manage all aspects of their work.
This increasing automation is well-suited to promote ‘Agile’ methodologies, since it allows much shorter build cycles and provides a much closer cooperation between developers and testers. It certainly promotes the flow of information and makes it easier to spot potential problems.
My own nervousness comes partly from thinking of those ground-breaking tools such as distributed source control, software testing frameworks or development wikis that would never have penetrated through wall-to-wall management suites based on less open-ended ideas. I also wince at the subtle problems that would ensue from integrating cross-application databases into an ALM. Most of all, I wonder if there is now even less motivation to tackle the root-cause of the increasing inflexibility of application management, the uncontrolled complexity of application. Now that we can throw more silicon at corpulent software so cheaply, and use applications to manage the complexity of applications, where is the cost-benefit of keeping things as simple as possible?
Your views would be very welcome, as always.
|
-
Posted Monday, April 04, 2011 2:28 AM |
A database chum recently told me that he was giving up the drudgery of being a front-line DBA in order to become a database consultant. My face contorted in the rictus of polite grief. ‘That’s very sad for you; I’m so sorry that you are being ‘transitioned’’. There is an established set of advice given out by HR people when they hand people their cards. I was about to quote from it such gems as “Care for yourself by eating well, exercising, and resting when needed. Avoid stimulants (such as caffeine, chocolate and nicotine) and depressants (such as alcohol)”. He then surprised me by protesting that he had resigned, and he was looking forward to his new career.
Why my concern? Surely, this is a glorious career progression?
I should start by defining what a database consultant is. Nowadays, the process of job-title inflation has progressed to the extent that even my barber calls himself a Hair management consultant. There was a time that ‘Consultant’ meant a professional person who provides advice and assistance to his or her colleagues in a particular area of expertise. Sounds great doesn't it?
No real IT consultancy is as glamorous a job as you might imagine, though. Many a child has watched Superman, Emergency Ward 10 or the Lone Ranger, and dreamed that, once they are grown-up , they would sweep into the site of a crisis and make things right. The life of a database consultant would seem irresistible.
Tonto: “Well Kemo Sabe, you sure told them of the consequences of using the GUID datatype for a primary key didn’t you?”
The Lone Ranger: (Solemnly patting his trusty white stallion) “Sometimes, a man has gotta do what a man has gotta do. ”
(At this, the Lone Ranger leapt on the saddle and departed, shouting,) "Hi-yo, Silver! Away!"
DBAs and Developers (in wonderment): "Who was that masked man anyway?"
Like Little Red Riding Hood, the simple moral tales contained in such epics as The Fugitive, the cowboy B movie, Spiderman or Star Trek make few nods to reality. I have to admit that there is, for example, fun to be had in discovering that a production database has its MDF file on a network drive, but generally the life of a consultant isn’t a shoot-em-up. You will have an urge to whoop with joy, gloat about how easily you found the problem, and then cast aspersions on the intelligence of the person who did such a thing. Don’t do it, since your actions will have catastrophic consequences to someone, probably yourself. A cast to the face much like the solemnity of the undertaker is required. Laughing at the footling nature of the errors you identify is as tasteless as laughing at the predicament of the late departed lamented loved ones.
The other misconception of a consultant’s role is that he needs extraordinary intelligence or knowledge. In reality, any form of smartness is a handicap to the job. I once pointed out to the owner of an exceptionally successful IT consultancy that his foot soldiers dressed impeccably in charcoal suits and black shoes, and could give faultless Powerpoint presentations, but they were completely lacking in any understanding of IT, and proceeded to tell their clients what they knew already whilst charging exorbitantly for the privilege. He smiled bashfully and proudly, thanking me for the compliment, saying that this was by far the best way of doing it: the client got a consistent quality of service, and he never had to deal with prima-donnas who became a nuisance by threatening to leave unless they had a pay rise. His consultants were never tempted to irritate the client by bragging about their knowledge and expertise, since they didn’t have any, and, more to the point, consultants were always told what the technical problems were, and they just had to tackle the over-riding one: that the organisation was too gridlocked by vested interests and fragmentation to be able to decide how to tackle the technical issues: What they wanted was a smooth and socially-aware arbitrator to un-freeze the decision-making process without ruffling feathers. If they needed technical support, the consultancy always plenty of geeks back at base to help over the phone. These geeks were never let anywhere near a customer site, since they were mostly entirely socially-inept, bad-tempered and quarrelsome. More to the point, his clients didn’t want heroes.
There is some truth in what he said. For a start, everyone generally knows what the problem is before you get there, but someone from outside the organisation must tactfully arbitrate amongst the various factional interests within the organisation to put matters right. All you really need to do is to interview all the people who are involved with the problem, write a learned-sounding report that identifies how to put it right, and who should do it, whilst tactfully ascribing the blame to fickle fate, or a contractor. Nobody will then shake their heads in wonderment, asking ‘Who was that masked man anyway?’ Once your task is done, the curse is placed on you, and you must leave for the next job.
What is this curse I mention? It is the curse upon the bearer of bad news. If you steer too close to the truth, you will inevitably make parts of the organisation look ridiculous. It doesn’t make good commercial sense to make your client look ridiculous. It is difficult to reach such a state of conflict over technical issues that warrant a consultant’s arbitration without some closet foolishness within the management. In order to retain the client, you have to borrow techniques from psychotherapy in order to rebuild the self-esteem of the management. My technique is to drop in phrases such as ‘It is very difficult for someone within an organisation to get a broad perspective on their technical decision-making process’, or ‘It was a bold and perceptive decision to choose a NoSQL database, which could easily have paid dividends’.
Whenever I have made a good relationship with a client, it has been because I’ve taken the time and energy to work out how they came to find themselves in the predicament that precipitated the crisis. This can help to avoid all sorts of embarrassment. I remember one guy who started out work as a consultant who was carried out of the headquarters building of a major computer company, struggling vainly between two security guards, legs thrashing impotently in mid-air. I suspect that he’d read too many Marvel comics as a youth, and bought into the dream of the underwear-over-the-trousers approach to consultancy. When he’d come across a ridiculous-looking technical problem in the first week, he’d expostulated to the CTO ‘Who the hell was the daft idiot who did this?’ ‘Me, actually’, replied the CTO, reaching for the phone.
If you work full-time as a consultant, you will eventually lose the edge of your technical skills. Technical excellence is hard-won through struggle with recalcitrant technology under pressure. It can’t be picked up purely from reading. Most of us retain, especially, the knowledge that is gained through the sweat of purposeful activity. To work as a consultant, you are on a slow and easy downhill descent, with your feet off the pedals.
Most professions have been able to provide career structures that allow for technical excellence to be rewarded and encouraged. Information Technology hasn’t yet come to grips fully with the idea of offering a good technical career path, despite the blueprint offered in the brilliant, and well-reasoned book ‘The Mythical Man-Month: Essays on Software Engineering’ by Fred Brooks. Until that time, we’re in a quandary. Unless we, who enjoy doing technical stuff , are able to swim against the tide, we are faced with either playing career ‘snakes-and-ladders’ in IT management, or hitting the road as an IT Consultant.
|
-
Posted Monday, February 14, 2011 12:22 AM |
The senior corporate dignitaries settled into their seats looking important in a blue-suited sort
of way. The lights dimmed as I strode out in front to give my presentation. I
had ten vital minutes to make my pitch. I was about to dazzle the top management
of a large software company who were considering the purchase of my software product. I would present them with a
dazzling synthesis of diagrams, graphs, followed by a live demonstration of my
software projected from my laptop. My preparation had been meticulous: It had to be: A year’s hard
work was at stake, so I’d prepared it to perfection. I stood up and took them all
in, with a gaze of sublime confidence.
Then the laptop expired.
There are several possible alternative plans of action when this happens
A. Stare at the smoking laptop vacuously, flapping ones mouth slowly up and down
B. Stand frozen like a statue, locked in indecision between fright and flight.
C. Run out of the room, weeping
D. Pretend that this was all planned
E. Abandon the presentation in favour of a stilted and tedious dissertation about the
software
F. Shake your fist at the sky, and curse the sense of humour of your preferred deity
I started for a few seconds on plan B, normally referred to as the ‘Rabbit in the headlamps of the
car’ technique. Suddenly, a little voice inside my head spoke. It spoke the famous inane words of Yogi Berra; ‘The
game isn't over until it's over.’ ‘Too right’, I thought.
What to do? I ran through the alternatives A-F inclusive in my mind but none appealed to me.
I was completely unprepared for this. Nowadays, longevity has since taught me more than I wanted to
know about the wacky sense of humour of fate, and I would have taken two laptops. I hadn’t, but decided to do the
presentation anyway as planned. I started out ignoring the dead laptop, but pretending, instead that it was still
working. The audience looked startled. They were expecting plan B to be succeeded by plan C, I suspect. They weren’t
used to denial on this scale.
After my introductory talk, which didn’t require any visuals, I came to the diagram that described
the application I’d written. I’d taken ages over it and it was hot stuff.
Well, it would have been had it been projected onto the screen. It wasn’t.
Before I describe what happened then, I must explain that I have thespian tendencies.
My triumph as Professor Higgins in My Fair Lady at the local operatic
society is now long forgotten, but I remember at the time of my finest performance, the moment that, glancing up over
the vast audience of moist-eyed faces at the during the poignant scene between Eliza and Higgins at
the end, I realised that I had a talent that one day could possibly
be harnessed for commercial use
I just talked about the diagram as if it was there, but throwing in some extra description. The
audience nodded helpfully when I’d done enough.
Emboldened, I began a sort of mime, well, more of a ballet, to represent each slide as I came to
it. Heaven knows I’d done my preparation and, in my mind’s eye, I could see every detail, but I had to somehow project
the reality of that vision to the audience, much the same way any actor playing Macbeth should do the ghost of Banquo.
My desperation gave me a manic energy. If you’ve ever
demonstrated a windows application entirely by mime, gesture and florid description, you’ll understand the scale of the
challenge, but then I had nothing to lose. With a brief sentence of description here and there, and arms flailing whilst outlining the size and shape of graphs
and diagrams, I used the many tricks of mime, gesture and body-language
learned from playing Captain Hook, or the Sheriff of Nottingham in pantomime. I
set out determinedly on my desperate venture.
There wasn’t time to do anything but focus on the challenge of the task: the world around me
narrowed down to ten faces and my presentation: ten souls who had to be hypnotized into seeing a Windows application:
one that was slick, well organized and functional
I don’t remember the details. Eight minutes of my life are gone completely. I was a thespian
berserker. I know however that I followed the basic plan of building the
presentation in a carefully controlled crescendo until the dazzling finale where the results were displayed on-screen.
‘And here you see the results, neatly formatted and grouped carefully to enhance the significance of the figures,
together with running trend-graphs!’ I waved a mime to signify an animated window-opening,
and looked up, in my first pause, to gaze defiantly at the audience.
It was a sight I’ll never forget. Ten pairs of eyes were gazing in rapt attention
at the imaginary window, and several pairs of eyes were glancing at the imaginary graphs and figures.
I hadn’t had an audience like that since my starring role in Beauty and
the Beast. At that moment, I realized that my desperate ploy might work.
I sat down, slightly winded, when my ten minutes were up.
For the first and last time in my life, the audience of a ‘PowerPoint’
presentation burst into spontaneous applause.
‘Any questions?’
‘Yes, Have you got an agent?’
Yes, in case you’re wondering, I got the deal. They bought the software product from me there and
then. However, it was a life-changing experience for me and I have never ever again trusted technology as part of a
presentation. Even if things can’t go wrong, they’ll go wrong and they’ll
kill the flow of what you’re presenting. if you can’t do something without
the techno-props, then you shouldn’t do it. The greatest lesson of all is that
great presentations require preparation and ‘stage-presence’ rather than
fancy graphics. They’re a great supporting aid, but they should never dominate to the point that you’re lost without
them.
|
-
Posted Thursday, January 13, 2011 6:34 AM |
In SQL Server, probably the best-known 'broken' function is poor ISNUMERIC() . The documentation says 'ISNUMERIC returns 1 when the input expression evaluates to a valid
numeric data type; otherwise it returns 0. ISNUMERIC returns 1 for some characters that are not numbers, such as
plus (+), minus (-), and valid currency symbols such as the dollar sign
($).'
Although it will take numeric data types (No, I don't understand why either), its main use is supposed to be to test strings to make sure that you can convert them to whatever numeric datatype you are using (int, numeric, bigint, money, smallint, smallmoney, tinyint, float, decimal, or real). It wouldn't actually be of much use anyway, since each datatype has different rules. You actually need a RegEx to do a reasonably safe check. The other snag is that the IsNumeric() function is a bit broken. SELECT ISNUMERIC(',')
This cheerfully returns 1, since it believes that a comma is a currency symbol (not a thousands-separator) and you meant to say 0, in this strange currency. However, SELECT ISNUMERIC(N'₤') isn't recognized as currency. '+' and '-' is seen to be numeric, which is stretching it a bit. You'll see that what it allows isn't really broken except that it doesn't recognize Unicode currency symbols: It just tells you that one numeric type is likely to accept the string if you do an explicit conversion to it using the string. Both these work fine, so poor IsNumeric has to follow suit.
SELECT CAST('0E0' AS FLOAT) SELECT CAST (',' AS MONEY)
but it is harder to predict which data type will accept a '+' sign. SELECT CAST ('+' AS money) --0.00
SELECT CAST ('+' AS INT) --0
SELECT CAST ('+' AS numeric) /* Msg 8115, Level 16, State 6, Line 4 Arithmetic overflow error converting varchar to data type numeric.*/
SELECT CAST ('+' AS FLOAT) /*Msg 8114, Level 16, State 5, Line 5 Error converting data type varchar to float. */>
So we can begin to say that the maybe IsNumeric isn't really broken, but is answering a silly question 'Is there some numeric datatype to which i can convert this string? Almost, but not quite. The bug is that it doesn't understand Unicode currency characters such as the euro or franc which are actually valid when used in the CAST function. (perhaps they're delaying fixing the euro bug just in case it isn't necessary). SELECT ISNUMERIC (N'₣23.67') --0 SELECT CAST (N'₣23.67' AS money) --23.67 SELECT ISNUMERIC (N'£100.20') --1 SELECT CAST (N'£100.20' AS money) --100.20
Also the CAST function itself is quirky in that it cannot convert perfectly reasonable string-representations of integers into integers SELECT ISNUMERIC('200,000') --1 SELECT CAST ('200,000' AS INT) --0 /*Msg 245, Level 16, State 1, Line 2 Conversion failed when converting the varchar value '200,000' to data type int. */ A more sensible question is 'Is this an integer or decimal number'. This cuts out a lot of the apparent quirkiness. We do this by the '+E0' trick. If we want to include floats in the check, we'll need to make it a bit more complicated. Here is a small test-rig.
SELECT PossibleNumber, ISNUMERIC(CAST(PossibleNumber AS NVARCHAR(20)) + 'E+00') AS Hack, ISNUMERIC (PossibleNumber + CASE WHEN PossibleNumber LIKE '%E%' THEN '' ELSE 'E+00' END) AS Hackier, ISNUMERIC(PossibleNumber) AS RawIsNumeric FROM (SELECT CAST(',' AS NVARCHAR(10)) AS PossibleNumber UNION SELECT '£' UNION SELECT '.' UNION SELECT '56' UNION SELECT '456.67890' UNION SELECT '0E0' UNION SELECT '-' UNION SELECT '-' UNION SELECT '.' UNION SELECT N'₵' UNION SELECT N'¢' UNION SELECT N'₠' UNION SELECT N'₠34.56' UNION SELECT '-345' UNION SELECT '3.332228E+09') AS examples
Which gives the result ...
PossibleNumber Hack Hackier RawIsNumeric -------------- ----------- ----------- ------------ ₵ 0 0 0 - 0 0 1 , 0 0 1 . 0 0 1 ¢ 0 0 1 £ 0 0 1 ₠ 0 0 0 ₠34.56 0 0 0 0E0 0 1 1 3.332228E+09 0 1 1 -345 1 1 1 456.67890 1 1 1 56 1 1 1
I suspect that this is as far as you'll get before you abandon IsNumeric in favour of a regex.
You can only get part of the way with the LIKE wildcards, because you cannot specify quantifiers. You'll need full-blown Regex strings like these
.. [-+]?\b[0-9]+(\.[0-9]+)?\b #INT or REAL [-+]?\b[0-9]{1,3}\b #TINYINT [-+]?\b[0-9]{1,5}\b #SMALLINT
.. but you'll get even these to fail to catch numbers out of range. So is IsNumeric() an out and out rogue function? Not really, I'd say, but then it would need a damned good lawyer.
|
-
Posted Friday, December 10, 2010 1:01 AM |
It is an awkward feeling. You’ve just delivered a database application that
seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug
that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe
you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected.
You investigate, and find the damage. You are somehow going to have to repair it.
Yes, it still very occasionally happens to me. It is not a nice feeling, and I
do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t
necessarily bad practices, but just show you where to focus your attention when checking an application.
Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the
language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this
is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re
concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even
more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to
explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to
variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit
conversion to work its magic.For details of the problem and the consequences, see here: SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story.
This is why one of the SQL Code Smells that makes me edgy, in my own or other
peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas
quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either
sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80
or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is
excellent for tidying up your code before you check it back into source Control!
1/ Checking Parameters only
If you were just going to check parameters, you might just do this. It simply
groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their
name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the
examples and their origin (the routine)
Even this little check can occasionally be scarily revealing.
;WITH userParameter AS ( SELECT c.NAME AS ParameterName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, t.name + ' ' + CASE --we may have to put in the length WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.name IN ('nchar', 'nvarchar') THEN c.max_length / 2 ELSE c.max_length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType] FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' AND parameter_id>0) SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameter WHERE ParameterName IN (SELECT ParameterName FROM UserParameter GROUP BY ParameterName HAVING MIN(Datatype)<>MAX(DataType)) ORDER BY ParameterName
so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems to let in a string of ten characters. Worth investigating. Then we have a
@Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX)

2/ Columns and Parameters
Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in
the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this
by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string
version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed
the same way. After all, we want the datatypes to match
Unfortunately, parameters do not expose all the attributes we are interested
in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have
to leave them out for this check.
Voila! A slight modification of the first routine
;WITH userObject AS ( SELECT Name AS DataName,--the actual name of the parameter or column ('@' removed) --and the qualified object name of the routine OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName, --now the harder bit: the definition of the datatype. TypeName + ' ' + CASE --we may have to put in the length. e.g. CHAR (10) WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN MaxLength = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN TypeName IN ('nchar', 'nvarchar') THEN MaxLength / 2 ELSE MaxLength END) END + ')' WHEN TypeName IN ('decimal', 'numeric')--a BCD number! THEN '(' + CONVERT(VARCHAR(4), Precision) + ',' + CONVERT(VARCHAR(4), Scale) + ')' ELSE '' END --we've done with putting in the length + CASE WHEN XML_collection_ID <> 0 --tush tush. XML THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE( (SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')' ELSE '' END AS [DataType], DataObjectType FROM (Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID, is_XML_Document,'P' AS DataobjectType FROM sys.parameters c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID AND parameter_id>0 UNION all Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength, c.precision AS [Precision], c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document, 'C' AS DataobjectType FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys' )f) SELECT CONVERT(CHAR(80),objectName+'.' + CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType FROM UserObject WHERE DataName IN (SELECT DataName FROM UserObject GROUP BY DataName HAVING MIN(Datatype)<>MAX(DataType)) ORDER BY DataName Hmm. I can tell you I found quite a few minor issues with the various
tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the
start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table.
Hmm. odd. Why is a city fifty characters long in that view? The idea of the description of a colour being 256
characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess.
We started out with a listing to examine parameters, then we mixed parameters
and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve
delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us
false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help
in some circumstances) then uncomment them!
;WITH userColumns AS ( SELECT c.NAME AS columnName, OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName, REPLACE(t.name + ' ' + CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column (SELECT definition FROM sys.computed_columns cc WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID) --we may have to put in the length WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar') THEN '(' + CASE WHEN c.Max_Length = -1 THEN 'MAX' ELSE CONVERT(VARCHAR(4), CASE WHEN t.Name IN ('nchar', 'nvarchar') THEN c.Max_Length / 2 ELSE c.Max_Length END) END + ')' WHEN t.name IN ('decimal', 'numeric') THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')' ELSE '' END + CASE WHEN c.is_rowguidcol = 1 THEN ' ROWGUIDCOL' ELSE '' END + CASE WHEN XML_collection_ID <> 0 THEN --deal with object schema names '(' + CASE WHEN is_XML_Document = 1 THEN 'DOCUMENT ' ELSE 'CONTENT ' END + COALESCE((SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name) FROM sys.xml_schema_collections sc INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID WHERE sc.xml_collection_ID = c.XML_collection_ID), 'NULL') + ')' ELSE '' END + CASE WHEN is_identity = 1 THEN CASE WHEN OBJECTPROPERTY(object_id, 'IsUserTable') = 1 AND COLUMNPROPERTY(object_id, c.name, 'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id, 'IsMSShipped') = 0 THEN '' ELSE ' NOT FOR REPLICATION ' END ELSE '' END + CASE WHEN c.is_nullable = 0 THEN ' NOT NULL' ELSE ' NULL' END + CASE WHEN c.default_object_id <> 0 THEN ' DEFAULT ' + object_Definition(c.default_object_id) ELSE '' END + CASE WHEN c.collation_name IS NULL THEN '' WHEN c.collation_name <> (SELECT collation_name FROM sys.databases WHERE name = DB_NAME()) COLLATE Latin1_General_CI_AS THEN COALESCE(' COLLATE ' + c.collation_name, '') ELSE '' END,' ',' ') AS [DataType] FROM sys.columns c INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'
) SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumns WHERE columnName IN (SELECT columnName FROM UserColumns
GROUP BY columnName
HAVING MIN(Datatype)<>MAX(DataType)) ORDER BY columnName
If you take a look down the results against Adventureworks, you'll see once again that there are things to
investigate, mostly, in the illustration, discrepancies between null and non-null datatypes

So I here you ask, what about temporary variables
within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are
not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid,
have to wait!
|
-
Posted Monday, November 22, 2010 12:16 AM |
Once you’ve done a number of SQL Code-reviews, you’ll know those signs in the code that all might
not be well. These ’Code Smells’ are coding styles that don’t
directly cause a bug, but are indicators that all is not well with the code. .
Kent Beck and Massimo Arnoldi seem to have coined the phrase in the "OnceAndOnlyOnce"
page of www.C2.com, where Kent also said that code "wants to be simple".
Bad Smells in Code was an essay by Kent Beck and
Martin Fowler, published as Chapter 3 of the book ‘Refactoring: Improving the
Design of Existing Code’ (ISBN 978-0201485677) Although there are generic code-smells, SQL has its own particular coding habits that will alert the programmer to the need to re-factor what has been written.
See
Exploring
Smelly Code and Code Deodorants for Code
Smells by Nick Harrison for a grounding in Code Smells in C#
I’ve always been tempted by the idea of automating a preliminary code-review for SQL. It would be so useful to trawl through code and pick up the various problems, much like the classic ‘Lint’ did
for C, and how the Code Metrics plug-in for .NET Reflector
by
Jonathan 'Peli' de
Halleux is used for finding Code Smells in .NET code. The problem is that few of the
standard procedural code smells are relevant to SQL, and we need an agreed list of code smells. Merrilll Aldrich made a
grand start last year in his blog
Top 10 T-SQL Code
Smells.However, I'd like to make a start by discovering if there is a general opinion amongst Database developers what the most important SQL Smells are.
One can be a bit defensive about code smells. I will cheerfully write very long stored procedures,
even though they are frowned on. I’ll use dynamic SQL occasionally. You can
only use them as an aid for your own judgment and it is fine to ‘sign them off’ as being appropriate in particular
circumstances. Also, whole classes of ‘code smells’ may be irrelevant for a
particular database. The use of proprietary SQL, for example, is only a
‘code smell’ if there is a chance that the database will have to be ported
to another RDBMS. The use of dynamic SQL is a risk only with certain security models. As the saying goes,
a CodeSmell is a hint of possible bad practice to a pragmatist, but a sure sign of bad practice to a purist.
Plamen Ratchev’s wonderful article
Ten Common SQL
Programming Mistakes lists some of these ‘code smells’ along with
out-and-out mistakes, but there are more. The use of nested transactions, for example, isn’t entirely incorrect, even
though the database engine ignores all but the outermost: but it does flag
up the possibility that the programmer thinks that nested transactions are supported.
If anything requires some sort of general agreement, the definition of code smells is one. I’m
therefore going to make this Blog ‘dynamic, in that, if anyone twitters a suggestion with a #SQLCodeSmells tag (or sends me a twitter) I’ll update the list
here. If you add a comment to the blog with a suggestion of what should be added or removed, I’ll do my best to oblige.
In other words, I’ll try to keep this blog up to date. The name against each 'smell' is the name of the person who
Twittered me, commented about or who has written about the 'smell'. it does not imply that they were the first ever to
think of the smell!
- Use of deprecated syntax such as *= - see
SR0010: Avoid using deprecated syntax when you join tables or views (Dave Howard)
- Denormalisation that requires the shredding of the contents
of columns. (Merrill Aldrich)
- Contrived interfaces
- Using VARCHAR and VARBINARY for datatypes that will be very small (size 1 or 2) and consistent - see
SR0009: Avoid using types of variable length that are size 1 or 2
- Use of deprecated datatypes such as TEXT/NTEXT (Dave Howard)
- Datatype mis-matches in predicates that rely on implicit
conversion. see SR0014: Data loss might occur when
casting from {Type1} to {Type2} (Plamen Ratchev)
- Using Correlated subqueries instead of a join (Dave_Levy/
Plamen Ratchev)
- The use of Hints in queries, especially NOLOCK (Dave Howard /Mike Reigler)
- Few or No comments.
- Use of functions in a WHERE clause. (Anil Das)
- Overuse of scalar UDFs (Dave Howard, Plamen Ratchev)
- Excessive ‘overloading’ of routines.
- The use of Exec xp_cmdShell (Merrill Aldrich)
- Excessive use of brackets. (Dave Levy)
- Lack of the use of a semicolon to terminate statements
- Use of non-SARGable functions on indexed columns in predicates (Plamen Ratchev)
- Duplicated code, or strikingly similar code.
- Misuse of SELECT * -see SR0001: Avoid
SELECT * in stored procedures, views and table-valued functions (Plamen Ratchev)
- Overuse of Cursors (Everyone. Special mention to Dave Levy & Adrian Hills)
- Overuse of CLR routines when not necessary
(Sam Stange)
- Same column name in different tables with different datatypes. (Ian Stirk)
- Use of ‘broken’ functions such as ‘ISNUMERIC’ without additional checks.
- Excessive use of the WHILE loop (Merrill Aldrich)
- INSERT ... EXEC (Merrill Aldrich)
- The use of stored procedures where a view is sufficient (Merrill Aldrich)
- Not using two-part object names (Merrill Aldrich)
- Using INSERT INTO without specifying the columns and their order (Merrill Aldrich)
- Full outer joins even when they are not needed. (Plamen Ratchev)
- Huge stored procedures (hundreds/thousands of lines).
- Stored procedures that can produce different columns,
or order of columns in their results, depending on the inputs.
- Code that is never used.
- Complex and nested conditionals
- WHILE (not done) loops without an error exit.
- Variable name same as the Datatype
- Vague identifiers.
- Storing complex data or list in a character map, bitmap or XML field
- User procedures with sp_ prefix (Aaron Bertrand)
- Views that reference views that reference views that reference views (Aaron Bertrand)
- Inappropriate use of sql_variant (Neil Hambly)
- Errors with identity scope using SCOPE_IDENTITY @@IDENTITY or IDENT_CURRENT- see
SR0008: Consider using SCOPE_IDENTITY instead of
@@IDENTITY (Neil Hambly, Aaron Bertrand)
- Schemas that involve multiple dated copies of the same table instead of partitions (Matt Whitfield-Atlantis UK)
- Scalar UDFs that do data lookups (poor man's join) (Matt Whitfield-Atlantis UK)
- Code that allows SQL Injection (Mladen Prajdic)
- Tables without clustered indexes (Matt Whitfield-Atlantis UK)
- Use of "SELECT DISTINCT" to mask a join problem (Nick Harrison)
- Multiple stored procedures with nearly identical implementation. (Nick Harrison)
- Excessive column aliasing may point to a problem or it could be a mapping implementation. (Nick Harrison)
- Joining "too many" tables in a query. (Nick Harrison)
- Stored procedure returning more than one record set. (Nick Harrison)
- A NOT LIKE condition (Nick Harrison)
- Not setting an output parameter for all code paths through a stored procedure, see
SR0013: Output parameter (parameter) is not
populated in all code paths
- excessive "OR" conditions. (Nick Harrison)
- User procedures with sp_ prefix (Aaron Bertrand)
- Views that reference views that reference views that reference views (Aaron Bertrand)
- sp_OACreate or anything related to it (Bill Fellows)
- Prefixing names with tbl_, vw_, fn_, and usp_ ('tibbling') (Jeremiah Peschka)
- Aliases that go a,b,c,d,e... (Dave Levy/Diane McNurlan)
- Overweight Queries (e.g. 4 inner joins, 8 left joins, 4 derived tables, 10 subqueries, 8 clustered GUIDs, 2 UDFs, 6 case statements = 1 query)
(Robert L Davis)
- Order by 3,2 (Dave Levy)
- MultiStatement Table functions which are then filtered 'Sel * from Udf() where Udf.Col = Something' (Dave Ballantyne)
- running a SQL 2008 system in SQL 2000 compatibility mode(John Stafford)
|
-
Posted Monday, November 15, 2010 6:14 AM |
For several years, I was responsible for the smooth running
of a large number of enterprise database servers. We ran a network monitoring
tool that was primitive by today’s standards but which performed the useful
function of polling every system, including all the Servers in my charge. It
ran a configurable script for each service that you needed to monitor that was
merely required to return one of a number of integer values. These integer
values represented the pain level of the service, from 10 (“hurtin’ real bad”)
to 1 (“Things is great”). Not only could you program the visual appearance of
each server on the network diagram according to the value of the integer, but
you could even opt to run a sound file. Very soon, we had a large TFT Screen,
high on the wall of the server room, with every server represented by an icon,
and a speaker next to it that would give out a series of grunts, groans,
snores, shrieks and funeral marches, depending on the problem. One glance at
the display, and you could dive in with iSQL/QA/SSMS and check what was going
on with your favourite diagnostic tools. If you saw a server icon burst into
flames on the screen or droop like a jelly, you dropped your mug of coffee to
do it.
It was real fun, but I remember it more for the huge
difference it made to have that real-time visibility into how your servers are
performing. The management soon stopped making jokes about the real reason we
wanted the TFT screen. (It rendered DVDs beautifully they said; particularly
flesh-tints). If you are instantly alerted when things start to go wrong, then
there was a good chance you could fix it before being alerted to the problem by
the users of the system.
There is a world of difference between this sort of tool,
one that gives whoever is ‘on watch’ in the server room the first warning of a
potential problem on one of any number of servers, and the breed of tool that attempts
to provide some sort of prosthetic DBA Brain. I like to get the early warning, to
get the right information to help to diagnose a problem: No auto-fix, but just
the information. I prefer to leave the task of ascertaining the exact cause of
a problem to my own routines, custom code, intuition and forensic instincts. A
simulated aircraft cockpit doesn’t do anything for me, especially before I know
where I should be flying.
Time has moved on, and that TFT screen is now, with SQL Monitor,
an iPad or any other mobile or static device that can support a browser. Rather
than trying to reproduce the conceptual topology of the servers, it lists them
in their groups so as to give a display that scales with the increasing number
of databases you monitor. It gives the
history of the major events and trends for the servers. It gives the icons and
colours that you can spot out of the corner of your eye, but goes on to give
you just enough information in drill-down to give you a much clearer idea of
where to look with your DBA tools and routines. It doesn't swamp you with
information.
Whereas a few server and database-level problems are pretty
easily fixed, others depend on judgement and experience to sort out. Although the idea of an application that
automates the bulk of a DBA’s skills is attractive to many, I can’t see it
happening soon. SQL Server’s complexity increases faster than the panaceas can
be created. In the meantime, I believe that the best way of helping DBAs
is to make the monitoring process as simple and effective as possible, and provide the right sort of detail and
‘evidence’ to allow them to decide on the fix. In the end, it is still down to
the skill of the DBA.
|
-
Posted Thursday, September 23, 2010 2:30 PM |
/*When you are exploring a database, it is interesting to check out the foreign key constraints. These will only tell you the relationships between tables if they have been properly filled-in. However, if they have been, then it becomes absurdly easy to construct queries, either graphically, or via system views. I've used several ways of 'walking' the foreign Key relationships, in the past, but I've grown to prefer seeing the SQL Clause defining the relationship, like this.... */ ON Sales.SalesOrderDetail.SpecialOfferID = Sales.SpecialOfferProduct.SpecialOfferID AND Sales.SalesOrderDetail.ProductID = Sales.SpecialOfferProduct.ProductID /*...or even the whole query so you can then quickly execute it to see the data, or paste it into SSMS to get a graphical representation*/
 SELECT TOP 20 * FROM Sales.SalesOrderDetail INNER JOIN Sales.SpecialOfferProduct on Sales.SalesOrderDetail.SpecialOfferID = Sales.SpecialOfferProduct.SpecialOfferID AND Sales.SalesOrderDetail.ProductID = Sales.SpecialOfferProduct.ProductID -- FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID
/*You'll notice that this example from AdventureWorks, uses a compound foreign key containing more than one column. There could, in theory, be more. You could sniff them out in your database by using this code*/
SELECT fk.name FROM sys.foreign_Key_columns fkc INNER JOIN sys.foreign_keys fk ON fkc.constraint_object_id = fk.object_id GROUP BY fk.name HAVING COUNT(*) > 1
/*This complicates the SQL that you'd use to generate them. Here is the SQL to get all the relationships for a particular table in the database */
SELECT SUBSTRING( (SELECT ' AND ' + Object_Schema_name(fk.Parent_object_ID) + '.' + OBJECT_NAME(fk.Parent_object_ID) + '.' + cr.name + ' = ' + Object_Schema_name(fkc.referenced_object_id) + '.' + OBJECT_NAME(fkc.referenced_object_id) + '.' + c.NAME FROM sys.foreign_Key_columns fkc INNER JOIN sys.columns c ON fkc.referenced_column_id = c.column_id AND fkc.referenced_object_id = c.object_id INNER JOIN sys.columns cr ON fkc.parent_column_id = cr.column_id AND fkc.parent_object_id = cr.object_id WHERE fkc.constraint_object_id = fk.OBJECT_ID FOR XML PATH('') ), 6, 2000) + ' -- ' + fk.name FROM sys.foreign_keys fk WHERE fk.referenced_object_ID = OBJECT_ID('HumanResources.Employee') OR fk.parent_object_id = OBJECT_ID('HumanResources.Employee') /* we would, of course, enshrine this in a procedure or function. Here is an example procedure that produces the SQL to then look at a sample of the result that would be created. Try it and see. It is quite handy.*/
CREATE PROCEDURE FKRelationshipsFor @objectName SYSNAME=NULL /*summary: >
This procedure gives SQL Scripts to show sample (20 row) results of the joins that can be made from relationships defined in all the foreign key constraints associated with the table specified (all if no parameter give)
Author: Phil Factor Revision: - 1.1 added select statement to the ON clause - date: 20 sept 2010 Revision - 1.1 added facility to list everything - date: 21 sept 2010 example: - code: FKRelationshipsFor 'HumanResources.Employee' example: - code: FKRelationshipsFor 'Sales.SpecialOfferProduct' example: - code: FKRelationshipsFor 'Production.ProductInventory' example: - FKRelationshipsFor --do them all Returns: > single column Varchar(MAX) result called 'Script'.
**/ AS IF @objectName IS NOT NULL IF OBJECT_ID(@objectName) IS NULL BEGIN RAISERROR( 'Hmm. Couldn''t find ''%s''. Have you qualified it with the Schema name?', 16,1,@ObjectName) RETURN 1 END
SELECT 'SELECT TOP 20 * FROM ' + Object_Schema_name(fk.referenced_object_id) + '.' +OBJECT_NAME(fk.referenced_object_id) + ' INNER JOIN ' + Object_Schema_name(fk.parent_object_id) + '.' +OBJECT_NAME(fk.parent_object_id)+' ON '+ SUBSTRING( (SELECT ' AND ' + Object_Schema_name(fk.Parent_object_ID) + '.' + OBJECT_NAME(fk.Parent_object_ID) + '.' + cr.name + ' = ' + OBJECT_NAME(fkc.referenced_object_id) + '.' + c.NAME FROM sys.foreign_Key_columns fkc INNER JOIN sys.columns c ON fkc.referenced_column_id = c.column_id AND fkc.referenced_object_id = c.OBJECT_ID INNER JOIN sys.columns cr ON fkc.parent_column_id = cr.column_id AND fkc.parent_object_id = cr.OBJECT_ID WHERE fkc.constraint_object_id = fk.OBJECT_ID FOR XML PATH('') ), 5, 2000) + CHAR(13)+CHAR(10)+' -- ' + fk.name AS script FROM sys.foreign_keys fk WHERE fk.referenced_object_ID = COALESCE(OBJECT_ID(@objectName),fk.referenced_object_ID) OR fk.parent_object_id = COALESCE(OBJECT_ID(@objectName),fk.parent_object_ID) GO
--this produces the following.... --------------------------------------------------------------------------- SELECT TOP 20 * FROM Sales.SpecialOfferProduct INNER JOIN Sales.SalesOrderDetail ON Sales.SalesOrderDetail.SpecialOfferID = SpecialOfferProduct.SpecialOfferID AND Sales.SalesOrderDetail.ProductID = SpecialOfferProduct.ProductID -- FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID SELECT TOP 20 * FROM Production.Product INNER JOIN Sales.SpecialOfferProduct ON Sales.SpecialOfferProduct.ProductID = Product.ProductID -- FK_SpecialOfferProduct_Product_ProductID SELECT TOP 20 * FROM Sales.SpecialOffer INNER JOIN Sales.SpecialOfferProduct ON Sales.SpecialOfferProduct.SpecialOfferID = SpecialOffer.SpecialOfferID -- FK_SpecialOfferProduct_SpecialOffer_SpecialOfferID
---------------------------------------------------------------------------
/* Well, this is useful, but what about something a bit cleverer that does the same thing if you give it a list of tables, up to six, maybe, and it then gives you the entire SQL to do the joins? Oh, just noticed, it is time for Sherry so I don't have time to do it in this blog. What a shame. Never mind, an exercise for the reader? */
|
|
|