Who the Devil Wrote This SQL Code?

The way that you format T-SQL code can affect the productivity of the people who have to subsequently maintain your work. It is never a good experience to see SQL Code, cry out “Who the devil wrote this code?”, and then realise that it was you. Grant gives some examples of bad formatting and explains why you should never check-in badly-formatted SQL code.

smalldevil

Have you ever had that moment when you open up some T-SQL code and you involuntarily shout out, “Who the devil wrote this SQL code?” It’s that moment when you spot code that’s not just visually wrong, such as in putting commas in front of column names, but wrong because of the mistakes that people are going to make because of all the difficulty in reading, understanding and editing that code. There are horrific things that people do to their T-SQL that absolutely shouldn’t happen, let’s talk about them and the problems that they cause.

Alphabet Soup

We’ll look first at three, very simplified, examples of what I call alphabet soup. It basically consists of aliasing the tables using the alphabet and the order in which they were referenced: It is a strange habit:

You are going to cause all sorts of problems if you write code like this. First off, if you really do have these three queries within the same stored procedure, how hard will it be to confuse which is table ‘a’ in each of the queries when you go back to edit them? Pretty easy.

It gets worse though. I know that none of us will ever write a query that exceeds 26 tables in a JOIN… well, except that one time… and that other time. In fact, it happens. Oh, it’s not always a good thing, but it’s a thing. How do we respond to that? I’ve seen this:

Yeah, turn on case-sensitivity and start referring to tables ‘a’ and ‘A’. What could ever go wrong there? What happens when you exceed 52 tables? Or when people do this:

Yeah, because table ‘aaa’ is going to be so easy to discern from ‘aa’ and ‘a’. At least we don’t have to worry about running out of letters, ‘aaaaaaa’ table is so clear.

This approach does nothing but cause confusion and a lack of clarity. I absolutely believe that we must use aliases, if for no other reason, I’m lazy and I want to be able to type less while still referring to object owners. Instead of alphabet soup, let’s name aliases in a meaningful fashion:

Not only is it easy to discern which table is which in each of the queries, but the consistency makes the code that much more readable. You will have fewer errors and better understanding of what the code is doing.

Object Owners

While we’re on the subject, there is no excuse for code like this:

There are a couple of problems here, and they will cause you lots of headaches. First off, which table does OrderDate belong to? Can’t tell, can you? Next, why doesn’t Application.People have an alias? Because it doesn’t need one. See, they only put aliases on tables that have columns that might repeat in the query. For example, many of the tables have a LastEditedWhen column, so it needs an owner to distinguish it. The Same thing goes for PurchaseOrderID which is in more than one table, so they supply aliases or, in some cases not use aliases but completely retype the column name.

This kind of inconsistency makes lots of retyping and re-editing necessary. As soon as you add in the LastEditedWhen column from the People table, you’ll have to either add an alias to that table or supply the full table name where that column is referenced. Either way, you’re going to be doing a lot more typing to get your code to work correctly. Let’s also note that we’re adding a little bit of work to SQL Server because it will have to go and find each of the objects that don’t have specific owners. Yes, that amount of work is vanishingly miniscule, but it’s extra work that it could be doing on your queries.

Let’s just do this:

With over 10,000 executions of each query on my system, the average for the query without object ownership was 129ms while the average for the query with object ownership was 115ms. The difference was the work required to discern object owners.

Statement Terminators

Oh, I get it. Semi-colons are syntactic sugar. Sure they are. Well, as long as you don’t count the fact that more and more statements are now requiring sugary icing in order to even function. What am I talking about? This horror:

The semi-colon is a statement terminator. Terminator, the end of something, like, I don’t know, Sarah Connor. No, I mean the end of the statement. So why is it going at the start of the statement above? Because people kept getting this error:

Rather than change their code to start using terminators, they threw it at the front of the WITH statement. Heck, I’ve seen internal documents for companies that have declared that a coding standard. Since this is only for a CTE, what the heck, we don’t really have to worry about just doing that, right? Put the semi-colon just in front of the WITH statement and we’re all good… as long as we’re not using Service Broker, because those commands require it too. Oh who cares, no one is using Service Broker. Just so long as you don’t need to ENABLE/DISABLE a trigger, ever. Oh, and use the THROW command or MERGE…

Microsoft is requiring the semi-colon in more and more statements. You’re going to hit issues if you don’t use it as a statement-terminator in all your statements. Yes, older code can still work, but as you add newer functions, you’re going to hit the issue. Eventually they’re going to require the terminator for all statements, just start writing it that way now and avoid the issue.

Single Line Queries

I see these all the time and I just wonder, how on earth people manage to get their tasks done when they’re looking at this:

Seriously? It’s a single line in SSMS. It’s a tiny query and it’s practically unreadable. No wonder people have a hard time query tuning, they don’t know what they’re looking at. It’s all one amorphous blob. Approaching coding like this will absolutely slow you down and lead to errors. You can’t help but mess up in a situation like this. For example, did you notice that there is a missing object owner up there? I put it near the end so you might be able to catch it. Imagine how much harder that would be if it was somewhere in the middle of that mess.

What should we use instead? Line breaks. I actually don’t care where, that much, but somewhere. Please. There are multiple studies on how white space enables readability and provides a mechanism for focus. A consistent set of line breaks and maybe some spaces or tabs will provide a common structure that you can easily communicate to your team, making all your code that much easier to read, making it easier to avoid errors or fix them.

Too Much White Space

On the other hand, it’s entirely possible to have too much of a good thing. It’s like the development team that was told a query hint would help this one query, this one time, and all they heard was that this query hint made queries run faster, so they put it everywhere. Others hear that white space makes things more readable, so we end up stuff like this:

At least it has a semi-colon.

I’m using a screen capture on this one so you can see how line 12 shoots out to character 119 on the far side of the screen. Thankfully this is a very small and simple query otherwise I’d be scrolling all over creation to try to read it. This suffers from the same problems as the previous example, but instead of no white space, this is ALL THE WHITE SPACE. Not only is this query hard to read, and therefore hard to understand, you can easily think you’ve moved between queries at a certain point.

Abnormal Characters

When you crack open a database and you see this, do you let slip some expletives before you regain control, causing yet another trip to HR, because it sure does happen to me:

Oh yeah, please, this is just what I need to see on a Friday afternoon when there’s a performance problem in a query. We have a mixed case of restricted characters, the space in the name of the table, [Table Name], reserved words, IDENTITY, SELECT, NOT, NULL, and, oh, just to add insult to injury, we’re using quotation marks around the object name so that the name can contain brackets.

From this point forward, we’re in hell. Some of you actually like using brackets around all the object names in T-SQL. Heck, the SSMS scripting tool does it by default (because, you guys and your poor choices force it to). However, admit it, you don’t want to look at this any more than I do:

Seriously, you want to type [p] over and over instead of just p? And you’re OK that you ALIAS to po, but then you have to reference it as [po]? The clutter is insane, but the inconsistency is worse. That’s before we get to the issues that can be caused by using the quotes around, not only reserved words, but special characters such as brackets. Now, we can’t even have a consistent use of brackets, but instead are forced to write our SELECT statement against the [Table Name] table like this:

Even if we wanted to use brackets, we can’t. This will end in tears, not counting what I’m going to do. What do we do instead? Follow the rules for Identifiers as laid out by Microsoft. Just because you can do something, doesn’t mean you should. As your mom always said, “Just because all your friends are using non-standard characters in T-SQL are you going to get shoved off that cliff by Grant too?”

Conclusion

I understand. You’re in a hurry. The niceties of coding well are not necessarily always going to be observed because… No, actually I don’t understand. There are simple and clear standards out there. Following them doesn’t add time, especially if you use a tool like SQL Prompt to help you do it. In fact, following them makes for greater clarity and understanding, which speeds things up. Avoiding errors and eliminating non-standard approaches is a performance enhancing mechanism. So, no, I don’t understand. You’re in a hurry, so do it right the first time. 

And put your commas at the end of the line like a good pagan.

References

  1. Cleaning up common T-SQL coding issues with SQL Prompt: Grant Fritchey
  2. Formatting SQL Code – Part the First: Joe Celko
  3. Formatting SQL Code – Part the Second: Joe Celko
  4. Transact-SQL Formatting Standards (Coding Styles): Robert Sheldon
  5. SQL Code Layout and Beautification: William Brewer
  6. Laying Out SQL Code: Phil Factor
  7. How Should You Lay Our Code: Tony Davis
  8. T-SQL Formatting: A Sure-Fire Way to Start a Conversation: Jonathan Allen

  • 21211 views

  • Rate
    [Total: 48    Average: 4.5/5]
  • Peter Schott

    I’ve learned to _never_ ask “who wrote this” because invariably I’ll find out that it was a younger version of me. I am grateful for tools like SQL Prompt that help with many of these issues, though. It won’t help with people choosing absolutely ridiculous names, but it helps with formatting and aliases. It can even take someone else’s query and make it legible. 🙂

  • rlDawson

    Grant,
    I agree with “almost” everything you wrote in this blog. The one thing that bothers me though is idea that these are hard and fast rules that seems to flow through your text. We as DBAs whether primarily concerned with the infrastructure or development do not have an intrinsic requirement to format t-Sql code in certain way. SSMS being the primary tool that most folks use (not having a nice generous company that will buy things like SqlPrompt.) to write their t-Sql with don’t have the benefit of our IDE automatically formatting our code for us. The Sql Server data engine certainly doesn’t require it nor does it care. Formatting t-Sql code in a certain way is strictly to make it easier when you inevitably have to look it over to troubleshoot a part of it or change it.

    I’ve been typing t-Sql code for so long that my fingers know what I’m doing before my brain does most of the time. If I were to stop actually think about it it would slow me down. I believe I have a clear easy to read style that I even had one manager comment that it was “too pretty” and that I was wasting time by keeping it that way.

    The two points I am trying to make are that first, I agree with you the code should be easily human readable. I have frequently ranted to our dev team that I am tired of straining my eyes trying to read their monoblock style. Second, what that style may be is very often a personal choice not enforced by any one or thing other than self. Even in the case of the “too much white space” comment above. IF I remember rightly the optimizer tokenizes the t-Sql code and extra white space doesn’t enter into it.

    Finally, great article and very well written. I’ve always enjoyed your sessions when you come to our SqlSaturday or I see an online session.

    Richard Dawson

    BTW, I do put my commas in front of the column names in a select list. It works for me. (grin)

    -r

    • ScaryDBA

      Hey Richard,

      Yeah, I’m probably over stating the case in some regard. The point is clarity because communication is frankly really hard. We just don’t get it right in ANY walk of life all that well. Code being what it is, clarity ought to be relatively simple to do. However, we go out of our way sometimes to make it not so. I’m actually cool with variation on style, especially if it’s consistently applied. There’s just far too much slop out there though.

      Thanks for the feedback.

      And move your commas.

      • TheEmirOfGroofunkistan

        I’m with commas on the left 🙂 That enables me to quickly comment out a line/column (ctrl+k, ctrl+c) from any query, especially in an ad-hoc way. Plus, Sql Prompt will auto format it this way!

        Also, the nice auto format feature of Sql Prompt makes anyone else’s slop easily readable. Unfortunately, I can’t just check it back in that way. Thanks for the post.

        • I am in the habit of adding a code snippet, a hard-coded “done” field, as the last line of the SELECT:

          ….,
          Person.Name AS “name_long”,
          –Person.SSN AS “SSN”,
          “done” as “done”
          FROM….

          This allows me to leave the commas at the ends of lines, and still comment out any line ad hoc, including the next to last as above. If I am saving things, I’ll just remember to remove my “done” field.

      • Philly Eagle

        Always comma left, it’s a hard and fast…personal preference, just like your comma right preference. You nailed it on the consistency though. I’ve actually seen stuff like this in production:
        select
        a.col1,
        a.col2,
        a.col3
        –some comment about how dev X added the following line a year later
        , a.col 4
        FROM a
        THAT is irritating

    • JerryK

      Look at ApexSQL Complete. It’s free and completes code you YOUR standards. Nice.

  • SQLWayne

    Grant, I think you’re in need of a Clue By Four, hanging on brackets on the wall behind your chair, as a reminder to programmers that you won’t hesitate to beat some sense in to them. 😉

    Reserved words and brackets for table/field names? Clue By Four, applied repeatedly.
    Excess white space? Definite Clue By Four.
    One-line? I can reformat that much easier than excess white space. Reinforce the concept to the perpetrator with a few meaningful and solid *whumps* of the Clue By Four in your hand while explaining their sin.

    I think an appropriately labeled cricket bat could also be used.

    • ScaryDBA

      So much violence… besides, I use an Arkansas tire thumper, hickory weighted with lead and a retention lanyard. Much more effective.

    • Ray Herring

      I worked in one database where the developer named tables and columns with the C++ object, property, method like syntax. So the table might be theDoc and then an column would be theDoc.docId. Some of them were nested objects three or more levels deep. so a column name could be theDoc.theComment.theSubmitter

  • sterbalr

    While this is true for checked in queries, there are a lot of one time queries that benefit from being one line, or with odd comma configurations.

    When I connect to a server, in a comment area of my default query are a half a dozen queries I need to run ad hoc. I know what they do. Grabbing 1 line is vastly easier than any other way of actually running the queries.

    When I document the query I display it both as a one liner and in a block format.

  • I guess we all have bugaboos with coding style. I’ve met coders who don’t care one little whit. And I’ve met coders who are obsessive-compulsive about it (that list would include me). However, mostly, it’s aesthetics. Some of the things that make code easier for me to read seem to annoy others.

    E.g., in one of your examples above…

    select a.Name,b.Name,c.Name from production.product as a
    join production.ProductSubcategory as b
    on a.productsubcategoryid = b.productsubcategoryid
    join production.productcategory as c
    on b.productcategoryid = c.productcategoryid
    where a.class = ‘L’

    Here’s how it would look if I coded it…

    SELECT a.name
    , b.name
    , c.name
    FROM production.product prod
    JOIN production.productsubcategory psub ON prod.productsubcategoryid = psub.productsubcategoryid
    JOIN production.productcategory pcat ON psub.productcategoryid = pcat.productcategoryid
    WHERE prod.class = ‘L’

    I lined this stuff up when I typed it, but have no idea whether Disqus will preserve my indentations once I post.

    If there were more criteria in the WHERE clause, I would like the ‘=’ sign up on every AND.

    Also, I’m presuming case insensitivity.

    I always want the keywords to bark loudly, so I capitalize them. Could go the other way, I suppose — if the data objects and items were capitalized, the keywords could be small-case. Just something so they stand out.

    FROM is more important than JOIN, so the JOINs are indented.

    WHERE is as important as JOIN, but more important than the ANDs and ORs that follow.

    For myself, I find it easier to keep track of data items when I’m reading vertically, not horizontally.

    Like I said, a lot of this is taste. I’ve been chided by former bosses for changing the formatting of existing code. I just shrug and say, well, I’m the one maintaining it now, and I’m in favor of making it readable.

    • ScaryDBA

      You are absolutely going to fall in love with the new formatting tool in SQL Prompt once it gets finished.

      Honestly, there’s nothing in your style that makes me crazy. It’s different than mine, but that’s OK. It’s not about my style being right and yours being wrong. It’s about having some consistency in order to promote clarity. What you have is clear. I’m in if you’re setting the standard.

      • I thought the comma in front was for two reasons, firstly if your using source control then a newly added lines or removes are clear in any diffs, i.e. less changes, secondly easier to comment out while your working on it.

        • ScaryDBA

          It’s easier to comment out the last line, harder to comment the first line, both for the same reasons. It doesn’t truly matter to me, but it’s fun to point out because people tend to fall pretty hard into one camp or the other.

          • Ron Clarke

            Not advocating it, but what about the “solution” of commas on their own line. Then it is just as easy to comment out the first and last line(s). But talk about excessive whitespace!

          • ScaryDBA

            Yikes!

            However, it would work.

        • MidnightDBA

          Strongly agree.

      • Tim Robertson

        I’ve been using SQL Prompt for several years and really like it. I’m currently on Version 7.2.4.291. It saves me several hours each week and the formatting is awesome.

  • DWalker07

    The hard-to-decipher table aliases are EXACTLY why I generally don’t use table aliases at all.

    Table aliases in Join statements are ALMOST never required, but they are taught as “part of the syntax”. The code is far easier to read if unnecessary table aliases are eliminated.

    And don’t try to tell me that table aliases save typing time; most people’s speed of writing SQL code is not limited by “how fast you can type”. There’s copy and paste for the table names too. 🙂

    I find code that doesn’t use table aliases in its joins to be FAR easier to read. SQL code is meant to be executed AND it also needs to be read and understood by humans.

    • Ray Herring

      Sorry, disagree on this one. Consistent table aliases are very beneficial.
      On the other hand inconsistent aliases are the devil’s spawn.

      • DWalker07

        Beneficial HOW? Declaring that table aliases are beneficial does not make it true.
        SQL code without table aliases is easier for humans to read.

  • SarekOfVulcan

    I like starting lines with commas, operators, and booleans because it immediately tags that line as a continuation of the one above it. I picked the style up from Conway’s Perl Best Practices, if I remember correctly.

    • Ray Herring

      agree

  • Richard Polichetti

    ;made me laugh!

  • Scott

    Thank you. This is one of those areas which is actually more important than many realize. Having worked in as many poor development environments as I have good ones, your words ring so true.

    I have found myself in environments where

    1. Keywords are used as column names. Really?
    2. Order By clauses have ordinals rather than column names. Come on. You can use the alias in the order by. It’s not that hard. You are not that busy dude.
    3. There are misspelled words in tables and columns for industry standard terms. Oh man. Google it if you don’t know the proper spelling. Once it is a table or column name it is a little late to change it. Change it as soon as it is recognized.

    But these environments are far better than the one I came from, which could be called “SQL Hell”.

    1. The entire data warehouse written in ANSI 89 instead of ANSI 92 and was a SQL 2005 database. ANSI 89 is the “select column, column from table, table where “column 1” = “column 2”. It has never been recommended by Microsoft (since they split off from Sybase in the 90’s) even though it runs, albeit slowly at times. Some die-hard programmers refuse to change. Some continue to code like this because it gives them the appearance of being “old school”. You’re not “old school”. That standard changed when you were in diapers.
    2. All the aliases were using a,b,c. like you mentioned above.
    3. There weren’t any comments anywhere.
    4. The code was terribly slow. Some jobs that took hours to run could be tuned simply by re-writing the code to industry standard and run in minutes.
    5. The code was not formatted in some instances. It looked like a dump truck backed up and dropped it into the procedure.
    6. Stored Procedure parameters did not have meaningful names, and used values like 1,2,3,4,5. There was no explanation as to what these parameters did and it took hours to sift through the code and determine it.
    7. There were misspelled industry standard words in tables and procedures and as folders on the file shares.
    8. Stored procedures that were a mile long that seemed to address every issue poorly rather than a single issue properly.
    9. It was near impossible to fix anything because if you did, things would come tumbling down like dominoes.
    10. “Retired” and “Temporary” tables in the databases prefixed with “ZZZ” or “AAA” and never cleaned up. It looked like an ape cage with banana peels and other food scraps lying on the floor.

    This happens when people come up with a company. Sometimes they come from another platform and never had the opportunity to learn Microsoft T-SQL properly, or work with other developers. They use a version of SQL from some ancient platform, and write in something similar to broken English, and some speak in broken English though they have been in the US for years. Some of them are promoted by attrition and were IT employee number 5 or something like that, and now the company has fifty IT staffers and a half a dozen on the data warehouse team alone. Sometimes they have bosses that don’t know the first thing about code but just know they’re a genius.

    I watch it happen when new employees come and go. I can read it in the code from those who came before me. I watch them struggle to learn the code, difficult because it is so archaic and ill organized. It is nearly impossible to grasp the business process because it is not written anywhere. I watch them become intimidated because they are struggling to grasp what the bosses make seem easy (because they wrote the crap). I watch them learn their bad habits and start to code in ANSI-89 and mimic the bosses. No! Don’t go there…Bad habits over time become set in stone. An environment built on a shoddy foundation will become bloated and stagnant over time.

    The road to SQL Hell is wide and easy, and many are merrily making the journey. They just don’t realize yet where it will get them.

    So keep up the good work, Grant. Go and tell it on the Mountains of SQL Code. I have watched many discard their good habits and trade them for bad ones in the data mines of the modern corporate world. But then again, isn’t that what the devil makes one do?

    Scott

  • Recce

    The place I’m working at now, I’ve inherited all these SQL statements where the JOIN is at the end of the line, so refers to the table at the start of the line below. I can’t think of a single reason why someone thought this was a good idea.

    • Henk van der Geld

      I totally agree with you. It’s there for years and it’s not likely it will be done in another way soon.

      • Ray Herring

        Me Too

  • Henk van der Geld

    As said in several comments hereafter, I prefer to have commas in front of the column name. I also apply this approach when writing C# code.

  • ShyMagpie

    I’m in the ‘commas first’ crowd. It makes it much easier to see the list of columns and to comment out what’s not needed. This is also why I highly promote using a code formatter of some kind so that there’s consistency and some kind of order between developers.

    • Ray Herring

      Concur completely

  • Rayis Imayev

    Very good post! I fully support a well written code, it not only help others to maintain it, it will help me to understand it better after a while when I forget it.

    I’ve written a blog post about this similar thing 2 years ago:
    http://www.sqlservercentral.com/blogs/data-adventures/2014/04/30/and-i-wonder-where-that-column-is/

  • John

    I came in here ready to disagree with you Grant, but I agree on everything. I detest front commas. They’re for sissies.

  • Evaldas Buinauskas

    I really love the way SQL Prompt works, however my first preference still is Poor Man’s T-SQL Formatter. Using both of these to get desired result.

    • Shawn

      ht to you, sir!

      • Evaldas Buinauskas

        What’s ht? 🙂

        • Shawn

          hat tip. It’s internet for “thanks!” haha.

          • Evaldas Buinauskas

            Well thanks! Never knew this one. 🙂