Formatting SQL Code – Part the Second

When you're formatting SQL Code, your objective is to make the code as easy to read with understanding as is possible, in a way that errors stand out. The extra time it takes to write code in an accessible way is far less than the time saved by the poor soul in the future, possibly yourself, when maintaining or enhancing the code. There isn't a single 'best practice, but the general principles, such as being consistent, are well-established. Joe Celko gives his take on a controversial topic.

A few decades ago, I wrote SQL columns for the two trade press in-newsstand database magazines that were current at the time. The two titles were “Database Programming and Design” and “DBMS”, from Miller Freeman and CMP publishers respectively. In 1999, Miller Freeman merged with CMP Media (a division of United Business Media plc) and the combined publication became “Intelligent Enterprise”, which ceased publication in February 2007.

At one point, Chris Date and I had columns in the two magazines, opposite each other. We were sort of the RDBMS version of the 1937 mock radio feud between Fred Allen and fellow comedian Jack Benny.

It was at this point that we needed a standard SQL code format that would be used as a standard across the publications, and then later could be used in books. Chris and I, with the help of the editors, set to work to create a standard for code that would make it easier for our readers to understand what we were arguing about. The exercise was quite a learning-experience, even though I did some work at AIRMICS (Army Institute for Research in Management, Information and Computer Science) and already had a fair amount of research material on code readability, debugging, and the other bits and pieces of software engineering (I actually started writing columns on software engineering in the trade press before I became a SQL guru).

You might have thought that it would be easy to come up with a standard for laying out code so readers could ‘take it in’ easily. We couldn’t just clone existing standards from another language because SQL is unusual for several reasons: SQL was designed to have an unambiguous grammar, allow for multi-word ‘keywords’ (i.e., “INSERT INTO”, “GROUP BY”, “ORDER BY”, etc.) and to allow unary operators in math (leading negative and positive signs in expressions). We wanted a computer language that was close to the English language you’d naturally use for set-based operations, yet one that was easy to parse unambiguously. To keep us on the right track, one of the members on the original ANSI X3H2 committee had access to a parser generator, a really good one. Anytime we, in the committee, would pass a paper, Steve would run it through his tool at work and bring back the results to the next committee meeting. Our goal was explicitly to create an LALR(1) grammar. This would make it possible to build a simpler parser. Yes, we actually worried about stuff like this, but that is why SQL is such a great language.

If you would like some of the details, then read this article: https://en.wikipedia.org/wiki/LALR_parser

If you would like something with more details, then look at this article: http://web.cs.dal.ca/~sjackson/lalr1.html

If you really do not care about writing compilers, then the ‘Management-Overview’ version is that this grammar leads to certain typographical conventions that would not be otherwise necessary. The multi-word ‘keywords’ are unusual in programming languages. We are also the only language that has embedded-in keywords within a parameter list. If you are writing T-SQL, you will use “CAST(<expression> AS <data type>)”, but will not use the ANSI/ISO standard “SUBSTRING (<string expression> FROM <start position> FOR <length>)” and other options that do not match the old Sybase syntax.

All these factors mean that you can assume that you can transfer your ‘best-practices’ unchanged from other languages. SQL is unusual.

Here are my rules for formatting SQL code, with comments that link back to the principals involved that I outlined in the first part of this series.

Mother Celko’s SQL Formatting Rules

  1. Do not use color to display the code (Stroop effect). Besides making the code harder to read, you just cannot go throwing around eight or so different colored inks in a publication!
  2. Use a monospace font. This will let you create rivers to make the code easy to read vertically. I will get back to this later.
  3. The worst thing to do is to be inconsistent. There are many ways to be inconsistent. A data element should have one and only one name in the schema and preferably in the universe at large. This is why we build data dictionaries. Otherwise, your schema will fill with crap like “account_number”, “account_nbr”, “acct_nbr”, “acct_num”, and so forth.
  4. In general, avoid using abbreviations and defaults. The full keywords are easier to read and we do not have to save space on a punch card anymore. For example, write out DECIMAL(s,p) instead of DEC(s,p); INTEGER instead of INT; “CASE.. ELSE NULL END” instead of letting it default to an implied else clause.
  5. Uppercase all of the reserved words in SQL (Bouma principle) so they can be clearly seen. This is surprisingly easy because the ANSI/ISO SQL standards actually have a list of all the reserved words that you can cut-and-paste. You probably wanted to add vendor specific reserved words and you have to get those from your vendor’s documentation.

    There is another advantage to this. When someone uses a Pretty Printer with this list, and they have used a reserved word as a column or data element name of some kind, it will be flagged in the formatting by being shifted to uppercase. In particular, the newbie will see “DATE”, “VALUE” and other generic data element names sticking out of his code like a sore thumb. He can then go back and fix his bad design by following ISO 11179 rules.

  6. Lowercase all scalar values and column names. This goes back to the newspaper readability rules about lowercase type. When I see “SELECT distinct”, it looks like “distinct” is a column name and not a reserved word.

    You might notice the use of the underscore. ISO likes this because it is part of a required subset in Unicode of punctuation marks for representing ISO standards. Typographers like it because it looks like the baseline that we been writing on in our loose leaf binders and parchment manuscripts for centuries. Latin alphabet users will flow over the underscore, but camelCase and Pascal case, which use an uppercase letter to separate pieces of a name, make the eye jerk and stop. We found that this sort of thing can add 8 to 12% more reading time to debug code.

    The ISO-11179 rules use a “[<role>_]<attribute>_<attribute property>” syntax in which the <role>

    tells us how multiple copies of the same data element are used within the same scope. The <attribute> is the name of the data element involved, and the <attribute property> is drawn from a small vocabulary that classifies what the attribute is doing. For example, if I have a table called “Tree”, this would tell us that I have only one tree and it is so unique, so well-defined that I do not need to specify anything else about it. Clearly this is stupid; what we wanted for the table name was a plural or collective noun.

    The name “Trees” is weak, and I probably wanted the collective noun “Forest” instead. Within that table, I might have columns named “tree_id”, “tree_diameter”, “tree_species” and so forth.

    Please notice that an attribute property cannot be used as an attribute. This is why names like “code_id” are a code smell. This names are simply too generic to have any meaning (the Law of Identity from formal logic). Years ago in an article I made fun of this sort of thing by stringing together three or four attribute properties, as a reducto ad absurdum. The next year. I found someone had actually done this in real code.

  7. Capitalize the names of schema objects that are constructs, such as tables, views, sequences, stored procedures and so forth. Remember that users of Latin alphabets read capitalized words as proper nouns or in some way special. They will not confuse them with the scalar values in SQL.

    This convention also does some other work for you. If I have a table named “Employee” and a column named “employee”, this is legal SQL. But it is really, really bad coding. We should have made multiple name spaces when we were writing the standards for the language back in the 1980’s, but we did not.

  8. In general, avoid using abbreviations. The full keywords are easier to read and we do not have to save space on the punch card anymore. For example, write out DECIMAL(s,p) instead of DEC(s,p); INTEGER instead of INT; “CASE.. ELSE NULL END” instead of letting it default to an else clause.

    Consistency is very important. Do not mix abbreviations with full words in your code. For example, if you use LEFT OUTER JOIN, then do not use LEFT JOIN anywhere in the program text. Likewise, if you use LEFT OUTER JOIN, then use RIGHT OUTER JOIN and FULL OUTER JOIN to match the style.

  9. Do not indent more than three spaces within a block (Law of Proximity). The readers I will get lost if he has to make huge jumps across the page. Likewise, do not set text in long lines. This is why newspaper columns are made up of relatively short lines of text. The rule of thumb in typography was that a newspaper or magazine column should be no more than 2 ½ times the length of the alphabet sets in the type font being used.

    If you have a long line of code, then split it on multiple lines. Sometimes a small indent of one to two spaces on the second and following lines will help. But be sure to end the line where the split occurs with language token that lets the reader know that there is more to come. This is why putting a comma at the front of the line is a bad practice; when you see a comma you know there is something coming after it. Likewise, when you see a semi-colon, you know it is a terminator. That lets the reader mentally close up that unit of code, “clear his buffers” and parse the next statement.

An Example

We’re going to start with code that hasn’t been formatted well, and improve it in stages. I want to use an example that is well understood, so I picked relational division. Relational division is one of the eight basic operations in Codd’s relational algebra. The idea is that a divisor table is used to partition a dividend table and produce a quotient or results table. The quotient table is made up of those values of one column for which a second column had all of the values in the divisor.

We have a table of pilots and the planes they can fly (dividend); we have a table of planes in the hangar (divisor); we want the names of the pilots who can fly every plane_name (quotient) in the hangar. To get this result, we divide the Pilot_Skills table by the planes in the hangar.

Here is the worst DDL and DML I can imagine:

It is in the proportional font, in all uppercase, crammed together as a continuous unreadable blob of text. There are needless parentheses, too. If I forgot an error, it is only because I was getting sick while doing this. Please don’t think I’m exaggerating: I have actually seen stuff like this posted on forums and in production.

Change Font, Add Line Spacing and Capitalization

Let us make a couple of basic changes. First, use Courier New for the text. Next, lower case column names and scalar variables, but keep reserved words in uppercase. Let us actually space out the text; the space around the equal signs, a space after commas, etc. Basically, follow the rules you would for English text; that is how people have been trained to read Latin alphabet text.

Next, we capitalize the table names. This will make them be seen as “special units of text” and not scalar or column names. Notice that the open parentheses in a data declaration are placed next to the keyword of the data type. This is the Bouma principle; they are actually part of unit. Another thing about open parentheses is that they announce subclauses in SQL. For example, a subquery is always enclosed in parentheses.

Overuse of parentheses is often a sign that the programmer came from a different language, with different parsing rules. Or that he has crammed his code text so close together, he needs to have a physical separator rather than being able to easily read it.

Languages which require a lot of parentheses are usually low level systems languages (LISP is the exception). Overusing parentheses results in a really awful debugging session. You will need to get a highlighter, and highlight matching pairs of parentheses. A good SQL programmer will use spacing to aid the debugger (remember someday you will be the poor bastard who has to go back over sloppy code; a professional writes code that is merciful to those that come after him).

This is clearer, but we’re not finished yet.

Add the Rivers and Indents

We come to the tricky part in formatting SQL code. We need to add rivers and indentation because SQL code is two-dimensional, not one-dimensional like text. Look at the Pilot_Skills table; it has two columns shown on one line each. The definition of a simple column should be written on one line because they are easily read as a Bouma. The PRIMARY KEY constraint is indented just one space to show that it is a separate unit of code. The close parentheses and semi-colon can be put together because they are terminators that tell the reader where the end of a unit of code is.

There is a line space between the table declarations to give clear separation. Again, the simple single column can get all of its information in one line. If we had CHECK() constraints, REFERENCES, DEFAULT or other DDL clauses, we would have gone to a new line and indented a single space.

Finally, the DML query needs explaining. The keywords SELECT and DISTINCT are together because they are actually a single unit in the SQL grammar. The outermost SELECT-FROM-WHERE construct needs a river to “anchor” those three clauses vertically

When we drop down into the level of the outermost WHERE clause, we have a NOT EXISTS() predicate. This is one of the few predicates in SQL that has a query as its parameter. Subqueries are always enclosed in parentheses – – we made sure of that when we were defining the ANSI standards. So to be consistent, the parameter must be formatted with the same convention of a river on the SELECT-FROM-WHERE construct (the consistency principle). Repeat this logic for the next nesting of an EXISTS() predicate.

Please notice how the “AND” appears at the left-hand shore of each river in its subclause. If you put it at the end of the line, the reader would expect to see something immediately after it. This is a word that announces something to follow, rather than terminating something.

Notice how the keyword GROUP is on the left-hand shore of the river, but the other part of the syntax, BY, goes on the right shore. These extra words are a peculiarity of SQL grammar and do not always add meaning to the code. I like to think that they add to the readability of the code.

Here is the final formatting:

I think you will agree that this is considerably easier to read, and therefore to understand, than the original solid blob of SQL.

  • 16534 views

  • Rate
    [Total: 27    Average: 4.3/5]
  • ammarmar

    There are multiple things in this article I do not agree with.

    For example, naming scheme with table name “Tree” vs “Trees” vs “Forest”. If you use any kind of ORM, the best practive is to name your table exactly like the object type it holds. So the best name for table holding “Tree” objects would be “Tree”. Some ORMs are smart enough to understand plural “Trees”, too, so it’s not the worst choice, either. The weakest name in this case would be “Forest”, because it creates confusion – where should I look for my “Tree” objects if I do not know table name? ORM won’t know this as well, you would specifically have to supply that information by creating non-standard mapping.

    Also, whether to use under_scores or camelCase or PascalCase for tables and columns – this may depend on the language your application is written in. In my case, it’s C# and the natural way of naming objects (and, by extension – tables that hold them), is PascalCase. The same for columns, which translate to properties in my classes.
    I find the “8-12% faster reading” with underscores names not worth the effort here. You never work with SQL alone, it’s always in a context of some application.

    The final formatting you present is still too crammed for my liking. My typical SELECT query would look like this:

    SELECT
    SomeColumn1,
    SomeColumn2
    FROM
    SomeTable
    WHERE
    SomeColumn3 = @SomeValue

    Is it too verbose? Maybe – but if you create a complicated query and it takes two screens, it may make you think twice whether it is a good idea in the first place. Database engines do not like multiple subselects and you shouldn’t like them, too.

    I especially hate parenthesis in the same line as content. I prefer using them like blocks of code:

    INSERT INTO SomeTable
    (
    SomeColumn1,
    SomeColumn2
    )
    VALUES
    (
    @SomeValue1,
    @SomeValue2
    )

    (well, indents did not make it into comment, but you get my meaning, I hope 🙂

    • You cannot just disagree; you have to give a reason. I worked with a guy who set his screen to red type on black background. This is why I did all that topography, visual psychology, etc in Part I. Quote research, etc and do not assume that “I do it this way, so it is the will of Ghod!” is a good thing.

      Singular names were used in IDEF and other early pre-RDBMS data modeling/analysis systems because they use a “record-at-a-time” model of data processing (Read Jeff Modem’s rants on this mindset; he is good!). You did not handle the entire set of Personnel; you did the job one employee at a time (loops, not joins or set operations). I really hate ORM tools; not only is there a classic “impedance mismatch”, but they force a set-oriented problem into a sequential solution.

      The use of underscores is another ISO thling. This is part of the common Unicode Latin set required to represent those international standards. COBOL uses a dash (which I read as a minus sign) , other languages use periods, etc. But the underscore is the most portable.

      PascalCase is better than camelCase, but it is still bad. Your eye jumps and pauses on UPPER case letters; this is how the Latin alphabet works. But camelCase jumps the eye to the middle of the word, then back. We used to do this research with video and a grad student; today, we have tools for this (there was a Springer-Verlag book on this on this stuff — got a ton of $$$ to buy it?) .

      The worst was Hungarian notation. Another topic; perhaps this is why Magyar is a minority language ?(Slavic infighting here :))

      We experimented with your “narrow newspaper column” layout. I think (not sure) it was done at University of Maryland under Ben Shneiderman. It does not work. By the time that your eye gets to bottom of the narrow column (again, a leftover from punch cards), you lost information. Test it for yourself; take a paragraph and turn the spaces to line returns. Try to read it.:) Did you read the “double the” example in Part I? Look up “Chunking” in human comprehension studies. .

      The old Sybase “50 lines to a proc” or one-page heuristic also had to do with eye movement. I have almost no peripheral vision (scarred retina) , but this is exceptional. Most programmers see what is above and below the line they are reading (if not exactly, but as hints and reminders).

      We found that when we split code over the pages of a printout, we messed up comprehension. Things are better (8-12% better) if the program text is in the same field of vision.

      • ammarmar

        You quote research, I quote real life. Do not close yourself in a box of standards and research documents. Many things were proven to be worse than optimal, yet are used everyday, including the QWERTY keyboard you probably typed your answer on.

        What kind of argument is “I really hate ORM tools”? ORM exists, it is used on a daily basis in plethora of projects, both commercial and other. It won’t go away just because it breaks your perfect idea of SQL. So are programming languages. Anybody using Java/C/C# etc won’t even look twice at names with underscores.
        And it’s actually not bad to “jump on upper case letters” in PascalCase. Even tools are built to support it – e.g. if you search for ThisUglyName in some IDE, you often just need to type “tug” in search box.

        My argument for “narrow newspaper column”, as you called it, is that it shows just how complicated the query is. It’s also easier to edit, e.g. to delete a column from SELECT query, you simply delete the whole line.
        I’ve seen your own format many times and all it does is hiding complicated queries just because they can be written in few lines of code.
        In real life, most often it’s not a problem to read queries. It’s a problem to write queries that are _efficient_. Best queries are short and simple. If you write something you have trouble comprehending, you can bet your house it will run even worse. Queries with multiple subselects that span over multiple pages of code are simply unacceptable and rejected right away. In fact, even a single subselect is bad in most cases.

        BTW, Hungarians are not Slavic people. Not that I care, but you seem to like your facts straight.

        • David Edwards

          In response to “Real Life”, I’ve been on the receiving end of Joe’s withering scorn, while asking for help creating a nasty query elsewhere on an inherited, off-the-shelf system.

          At the time I bristled a little – the DB wasn’t my design, but even so it suited the Real World and its purpose. However, I think it’s incredibly important that gurus like Joe do what they do and always push the importance of “purity” and The Right Way.

          I see it as a similar situation to the English language: constantly evolving and changing, which is what makes it so versatile, but the people who misuse apostrophes and choose the wrong word because it sounds similar etc need tempering to keep everyone speaking and understanding a common language. Ok, not an exact analogue but I hope you see what I’m getting at.
          Things change and develop, best practice changes accordingly, but the advice of someone intimately involved with the creation of anything is worth keeping at the back of one’s head, even if not followed to the letter – it comes from a place of deep understanding, and care I imagine,

          I largely agree with Joe’s post, already do most of it (but find leading commas more readable like Rob Crichton) so will continue to ignore that part, but not try to convert anyone – it may not be “perfectly” readable but it very readable.

      • buggyfunbunny

        — We experimented with your “narrow newspaper column” layout. I think (not
        sure) it was done at University of Maryland under Ben Shneiderman. It
        does not work.

        it is easy to prove. just go to your local library and pull any volume from the Great Books collection (not sure if they’re still for sale; don’t waste money if they are) and try to read it. I asked for a used set decades ago for birthday/christmas/Passover/whatever sight unseen. Bad move. I got a headache in a couple of minutes. the text is divided as two columns/page, just like most Bibles. impossible to scan.

      • Robert Sterbal

        Just a not about research: Research is a process not a destination. There has been a lot of change since your research was done, and you failed to note what the bias of the research was (all research has important bias).

  • buggyfunbunny

    since java has been my primary normal language for some years, the use of “.” has semantic meaning. imagine my frustration with R, a language of increasing interest here. well, in R “.” is just another character, and historically used as multi-word separator (which is odd, given that much of R is written C and treats the “.” semantically). a few newer R developers, notably Wickham, have finally said enough to that.

  • Thomas

    In this article you have done one item which is incredibly valuable: you have explained *why* a given standard was set forth. However, it does not appear that you then followed through to determine if the “why” is still applicable.

    1. Color – Frankly, your argument against color does not apply. Why? Because the purpose of color is primarily to *locate* keywords not identify what keyword it is which is where the Stroop Effect would be applicable. Color also provides feedback into whether a given word is a keyword.

    4. Abbreviations. You need to differentiate the use of abbreviations in keywords vs. abbreviations used in object names. I completely agree that abbreviations in object names should be avoided because the next developer will not necessarily know what the ARTU abbreviation means. However, every developers should be able to pick up on “int” meaning “integer” or “varchar” meaning “variable character” (the latter being absurdly verbose). Thus, I think it best to qualify where certain abbreviations make sense.

    5. Uppercase keywords. Completely disagree because of #1 and because of this statement made by someone you know:
    > Lowercase letters are rather important for readability.

    Your reason for using uppercase appears to be for reasons of visibility (“clearly seen”). However, color does a much better job of this. The only exception is color blindness for which there are solutions.

    6. Lowercase all scalar values and column names – Again, I disagree because words that are all in the same case are difficult to read. Underscores are frankly evil. They are cumbersome to write and see in some cases.

    > “Pascal case, … make the eye jerk and stop. We found that this sort of thing can add 8 to 12% more reading time to debug code. “”

    If that were true, every book written since the advent of lowercase letters would be 8-12% harder to read. Again, it appears you want to use all lowercase object names to make them easier to locate which can be accomplished with PascalCase and color coding of keywords.

    8. This brings up a good point in relation to noise words. E.g. “Into” and “Outer”. You can’t have a “Left Inner Join” or an “Insert Outof” statement. Almost every DBMS allows for stripping these out and IMO, they serve no purpose other than clutter.

    9. Don’t use spaces for indenting period. Use tabs. I’ll leave it to the author to go watch Silicon Valley.

    The one item on which we definitely agree is consistency…even if it blows. If consistent coding standards are used, it is easier to adjust them and existing code after the fact.

    • Kurt

      I agree with Color being a good thing. How many of use would turn off
      syntax highlighting in SSMS (is that even an option)? Not many, because I
      believe it aids in readability…whether actively coding or displaying
      in other contexts.

  • Rob Crichton

    For the most part I agree with the resultant formatting that is provided here. However …

    Is there an inconsistency with your argument for putting commas at the end of the line compared with AND at the beginning? You say that both suggest that something will follow but come to different conclusions.

    Personally I have opted for commas and ANDs at the beginning of the line. I recently changed from comma at the end to comma at the beginning. I know the argument about reading code more often than writing it but I changed because all too often I hit an error or, worse, aliased a column by pasting and forgetting the comma at the end of the previous line. By creating a ‘river’ of commas I can immediately see what is expected.

    Also I’m not too zealous about one thing per line as occasionally it is better to keep the page short by putting multiple select columns on a single line (within the 80 char width!) – often I’ll group related items in that case. Sorry no data for that one – just a preference.

    On colour – I would be very troubled if at least comments didn’t appear in a different colour! Often you will temporarily remove a few lines of code using a comment – it would be awful if these weren’t highlighted in some way. But you do hint that you are only really bothered about print for that where inline comments wouldn’t normally feature.

    I’m glad to subjectively contribute to the never-ending formatting wars ;¬>

    • Kurt

      I agree with the “commas and ANDs”. I feel they are in the same category. Funny, I always put them both at the end of a line.
      My additional reasoning for commas at the end of a line: As Joe put it, “follow the rules you would for English text”. When word-wrapping normal text, commas never end up at the beginning of the line. It would look “unnatural”.

      My 2 cents

    • Joe Celko

      The original reason for commas at the beginning of a line was punch cards! We saved our old punchcards and reuse them because it was a really hard job to get a free keypunch machine for programmers. In fact, we first had to fill out coding sheets with a specialized block printing alphabet, then we got our program deck. Saving and reusing the cards saved hours of waiting and work.

      Another bit of style was to save a card and hit the dupe key up to the point you wanted to reuse the code on that card. For example, in PL/1. You might type “IF A = 1 THEN B = 0;” and later find out that you needed another assignment statement. The better way to do it would be “IF A = 1 THEN BEGIN B = 0; C = 0;END;”, but that requires a lot more typing. So what we would do is hit the dupe key on the keypunch machine and generate a second card to give deck with “IF A = 1 THEN B = 0; IF A = 1 THEN C = 0;” in it. This was fine with us because it is how it would have to do it in Fortran (early Fortran did not have a block structure).

  • JaiNormosone

    I will format code using some of the “rules” you mention and ignore some of the “must haves” that you state.
    No problem there, of course, because a person needs to create a style that suits them and their requirements. I can pick up something that I wrote 10 years ago and almost instantly recognise it as one of mine.

    By far, the best line in the entire piece: “…remember someday you will be the poor bastard who has to go back over
    sloppy code; a professional writes code that is merciful to those that
    come after him…”

    I will expand on that and state unequivocally that anyone who writes code without comments and without a header section that defines what it does, is not only unprofessional but downright lazy and totally discourteous to colleagues and anyone else who has the misfortune to read their trash code in the future.

    I have been on the receiving end of pathetic object aliases and rubbish code while having to rationalise same to be tolerant of it.

    If this statement offends you then the correct action is to be professional and pick up your game.

    • JaiNormosone

      Whoops… I meant to qualify the rubbish code thing… Lack of comments, etc., was acceptable many years ago (like 30-40 years) when space was a premium and code needed to run top-down to be fast.
      This is not an acceptable practice any longer as memory is near the cheapest hardware component available.

  • pwo311

    As for formatting, it does not really matter in most cases anymore. Pick your language and there is probably a formatter for it. Most allow changing the formatting rules. Some languages force some of the formatting to provide scope. I do not care. Just so I can format the code automatically before I have to look at it. Some of the other points can also be addressed in some of the formatters such as reserved words and their case.
    I generally agree with most of the other points. For most of the languages I need to review I have written small and simple parsers to assert headers and documentation tags (and some content), error handling and a few copyright related items. I enforce code being brought up to our standards if it is touched. For each language we have a lint like process that’s gets executed on the code as well. We flag items that must we addressed via rules.
    As for the modeling, for the love of Pete, we need less so called “Modelers” and “Architects” and utilize the good ones more. If you code, it does not mean you can model. If you code, it does not mean you understand SQL or RDBMS’s. If you are really smart and are great at analysis, it does not mean you can model. My two cents is you really need to have an aptitude for it. I think the same can be said for the really good coders as well.

  • Blair

    The only possible issue I have with this is the insistence on lower case for field/schema names. Make your field names exactly match the CASE of the actual field names. This is less important in Oracle databases where upper case is assumed and double quotes are used when lowercase is used, but in Microsoft databases case sensitivity can be a real headache and matching case is critical. Microsoft Dynamics GP is a particular case.

  • Sam Vella

    I was recently thinking about the table alias and how readability can be improved there.
    As you have in your example, the usual standard is to simply keep the alias inline with the call for the table name:
    FROM Pilot_Skills AS PS1
    INNER JOIN Pilot_Training as PT1
    on PS1.Skill_id = PT1.Skill_ID

    however when there are many joins, and/or fully referenced table names i.e. .. then quickly scanning for an alias to determine which table a column originally comes from can be tricky as the table alias is not always the same number of characters in from the left.
    If the alias is instead pushed down to the next line then it will be easier to find:

    FROM Pilot_Skills
    AS PS1
    INNER JOIN Pilot_Training
    AS PT1
    on PS1.Skill_id = PT1.Skill_ID

  • Michael Jones

    I use lowercase for SQL reserved words and uppercase for all database objects like table and column names. The reason I use uppercase for database objects is because, on the database I work on, the database meta data is published in uppercase by default, unless one defines the object name in the first place in mixed case enclosed in double quotes, and references said object in code in double quotes (every code reference). Keeping database object code references in the same case as the published meta data (uppercase default) also assists with generating SQL code from the DB meta data.

    I also prefer SQL code editors that use 4 different colors for each of: SQL reserved words, database objects, numeric constants, string literals.

    Other than that, my SQL formatting is very close to Joe’s guidelines. Quality SQL formatting really helps productivity.

    Joe, thanks for sharing these tips. I need to make time to read more of your excellent SQL material.