Tony Davis

Simple-Talk Editor
News, views and good brews

How should you layout code?

Published Wednesday, February 27, 2008 11:02 AM

It is strange to see the heat generated over arguments about how code should be formatted. With Visual Studio, of course, it isn't much of an issue, as it is done for you, but the closer you get to the 'live free and die' communities of the LAMP platforms, the more contentious it gets. Likewise, SQL Server's TSQL inhabits a strange land where there seems little consensus over the correct way to lay it out.

 

There are some great tools for doing automatic layout of code. Visual Studio sets a high standard for this, but other IDEs have caught up. Visual Slickedit is hard to beat as a general-purpose programmers' editor, with facilities to tidy up almost any language. Nevertheless, for the most part, each language has its favourite tool. A few examples that spring to mind:

 

  • A perennial favourite for .NET code is Jetbrains ReSharper, thought by many to offer better formatting capabilities than native Visual studio.
  • For PHP, I've always liked Waterproof's PHPedit, and their command-line phpCodeBeautifier works well and is free.
  • There is nothing to beat HTMLTidy for tackling HTML and XML layout
  • Topstyle is the only tool I know of that sorts out CSS layout properly
  • Aptana studio does a great job in laying out JavaScript.
  • SQL Refactor is an essential tool for untidy SQL Server 2005, programmers. I've found very little else to touch it as a SQL code beautifier.

 I was recently re-reading Joe Celco's excellent book 'SQL Programming Style' and it struck me that there really wasn't much in what he says about the layout of code that one could argue with. However, the feedback that Red Gate gets over the SQL Refactor product is that providing 40 different options for laying out SQL code just isn't enough.

 

After years of 'anything goes' SQL coding, it is hard to get any consensus on the right way to lay it out. Every large IT department seems to have its own standard, ranging from incredibly strict, to "everything in lower case". I have a sneaking tolerance towards the latter, liberal approach but, as the editor of a website that has a lot of SQL on it, I have my own reasons to hope for some sort of consensus.

 

It is worse still for our sister website, SQLServerCentral.com. In this case, people describe their coding problems in forums, and sometimes paste in code that makes one involuntarily flinch. I can see how years of suffering in this way occasionally turns Joe Celco from his usual mild Dr Jeckell persona into a Forum Mr Hyde. Jeff Moden is now famous for his attempts to try to persuade forum users to adopt a Forum etiquette that includes writing SQL that can easily be understood by others.

 

In short, there would seem to be an obvious need to tighten up standards somewhat. So, how's this for a suggestion:

 

Identifiers:

  • No Reverse Hungarian notation
  • Use ISO-11179 where appropriate
  • Up to 30 characters.
  • Avoid abbreviations.
  • Avoid quoted (delimited) identifiers,
  • Use standard postfixes
  • Use only letters, digits and underscores for names
  • Scalars should be in lower case
  • Schema object names should be capitalised
  • Reserved words should be in upper case.

Layout:

  • Use a comma, followed by one space (or newline), as a delimiter in a list.
  • Use one space only between language tokens
  • Use a tab-space of three ems (spaces)
  • Put each clause on a new line, indenting if it is a subordinate clause or subquery.

Disagree? We'd love to hear from you - jsut add your suggestions as a comment to this blog (you'll need to be signed in). The best contribution will receive a $50 Amazon voucher!

 

Cheers,

 

Tony.

Comments

 

kamau said:

I think that the best way to layout code is just to keep it simple. Simple that is in a way that a developer that reviews your code can make sense of what you were trying to do. All this talk of identifiers etc (in my opinon) just makes thing more complicated, don't get me wrong, I actually do enforce some of the suggestions made here, all I'm saying is if it doesn't make sense to you right now, then it definately won't make sense to the guy that'll be tweaking it 6 months down the road.
Being consistent and writing good comments is more important than following strange rules, IMHO. If your script is all uppercase, then keep it that way, if you're going to abbreviate (which I don't), then abbreviate everything, as long as the abberviations make sense!
We've all (i think), checked out someone elses code and wondered what the hell they were thinking writing code/script like that, then after some time, when you've realised how they write their code, you start to know where/what needs to be fixed. Simply because they were consistent in how they wrote their code.
Being consistent is an easier thing to emphasize to developers, as compared to having them change how they've been coding for years. Of course I don't mean that if the code is bad, it should be consistently bad, just that if the code is easily understood and he/she is consistent, then, what's the big deal?
Just my opinion though.

John K.
February 27, 2008 8:35 AM
 

LaptopHeaven said:

I agree with code standardization, but one should not go broke trying to standardize code formatting.

ReSharper C#: $249
PHPedit: Free
HTMLTidy: Free
Topstyle: $79.95
Aptana: $99
SQL Refactor: $295
February 27, 2008 8:36 AM
 

Jason Haley said:

February 27, 2008 8:55 AM
 

Jason Haley said:

February 27, 2008 8:55 AM
 

HotAir said:

local @ variables should begin with a lowercase (camel cased); Input @parameters should be Pascal cased.

Specify joins using "INNER JOIN table ON <columns>" rather than using the where clause.
February 27, 2008 9:05 AM
 

wonderbison said:

There are 2 points concerning code in my mind. The first is that anyone in an organization should follow the standard set there, whether its what they are used to or not. Those who are out on their own then you need a standrd. This brings me to the second point. What is needed is some sort of standards organization like the W3C, and so on. They would then set a standard where companies can sign up to their standard and would be a good global place for people to see a set of standards. Companies like RedGate could then produce tools that "made" code to those standards. In fact....
February 27, 2008 9:55 AM
 

Phil Factor said:

If I had a songsheet to sing to as WonderBison suggests, then even I could easily re-write the SQL Prettifier to work to those standards. Poor RedGate has to cater for a huge veriety of tastes with their 'SQL Refactor' and 'SQL Prompt'. Anyone who aims to write a tool to 'beautify' SQL faces a daunting task. If we had an accepted SQL Layout standard, particularly if it was appropriate for all flavours: Oracle, MySQL etc. it would surely make everyone's life easier. The only problem with a standards body is that they occasionally do loony things, such as XHTML1.1 but if it was given a clear brief to reach as close a consensus with 'Best Practice' then it would be a grand idea.
February 27, 2008 10:34 AM
 

hammond13 said:

I really just love having a tool doing my layout for me, even if it is not perfect.  It took me a little while to get used to the way that Visual Studio lays out html, but as long as it doesn't break my code and it is consistent I would prefer to let the tool do it.  The same goes for SQL Refactor's layout function.  It makes the code more readable BECAUSE it is consistent.

That said, I don't know if an industry-wide standard is necessary.  It took me very little time to get used to the way SQL Refactor formats SQL, and it wouldn't take me long to get used to something else.  I just use all the default options, and I'm fine.

There is only one problem I have noticed with the way SQL Factor formats code.  If there is a string of IF/ELSE IF statements, it increases the indent with each ELSE statement.  To fix this, "ELSE IF" would have to be treated as an exception to the subordinate clause rule.
February 27, 2008 11:38 AM
 

Anil Das Q said:

Tony wrote, "SQL Refactor is an essential tool for untidy SQL Server 2005, programmers. I've found very little else to touch it as a SQL code beautifier." Now, I use SQL Refactor at work because my company provides it to me, but the lack of support for :setvar and other SQLCMD tokens makes it very, very difficult for me to use. Otherwise it's a pretty good product. A competitor that does handle SQLCMD tokens and allows more customization of formatting is SQL Assistant from SoftTree Technologies.
February 27, 2008 11:56 AM
 

Tore said:

To me, there are severeal areas touched in this article and comments:

- Coding Standards
    Use of aliases, parentheses, acceptable and unacceptable constructs (joins in WHERE clause), Keyword case, etc.

- Naming Standards
    The article refers to this as "Identifiers" - it would include use of cAsE, pre- and postfixes, abbreviations, use of special characters, quoted/delimited, etc.

- Layout
    To me, layout is simply placement of text within its container.  And this is quite often one of the biggest readability obstacles for SQL code.

- Style
    Style is Layout++.  It may include part of naming standards as well as touch on some coding standards.

The other issue (in addition to Layout) that I find to causes problems with readability, is the lack of (brief) table aliases in a query.  When the column's table prefix competes with the column name, it takes much longer to visually identify the columns in the query.   Personally, I prefer aliases to be as short as possible, and I try to keep them to 1-3 uppercase letters.

SQL Prompt has made me lazy, and I tend to accept whatever alias it comes up with, even when it is 5 or more any case characters.  Only when developing a new database, did I take the time to configure default aliases in SQL Prompt - and I seemed to lose them at one point....

As for layout rules, the biggest disturbance to me is when lines do not follow the syntax - when a new clause doesn't start on a new line usually followed by an arbitrary new line in the middle of the clause.  Indentation gives visual structure to complex statements that can be hard to interpret without it.  I like my lists to have spaces after commas, just to make it easier to navigate word by word through a long select list.  It also makes it easier to read.

Occasionally, I will break up a select list and put each column on a separate line, but that can make for excessive scrolling.

I prefer to use tabs as opposed to spaces for indentation.  I like my tabs to be four or five "positions" wide, while I know people who prefer two.  Although the "hard" tabs may result in somewhat jumbled spacing when used in the middle of a line, that is typically only an issue in comments.

I prefer my CASE expressions to be on multiple lines, with CASE [optional value] by itself, each WHEN (& ELSE) to be indented underneath, and the corresponding THEN to either be on the same line or indented under its WHEN, and the END to be at the same indentation level as the CASE.

When I have complex parenthesis structures, I indent according to the parenthesis level.  In some cases, I will develop the expression with full indentation for each parenthesis, but collaps or simplify after the expression is complete and fully tested.

I place commas and expression operators at the end of the line and logical operators (AND/OR) at the beginning of a line.  I usually keep a single logical construct (predicate) on a single line.

I use blank lines to separate most multi-line statements from other statements as well as to indicate blocks of single line statements.

In short, I try to make my statements as easy to read as possible - unless it conflicts with an enforced standard at work.  Of course, I use myself as the subjective measuring stick for that readability... :-]
February 27, 2008 1:00 PM
 

MacDBA said:

I agree with most of the author's formatting suggestions.  However, I REALLY would like the SQL Refactor tool to have the option to LOWERCASE keywords (in addition to uppercasing them).  IMHO, uppercasing keywords may have been a good idea many years ago when few editors would color-code syntax.  But now, it's harder to read--almost like the text is shouting at you.  

As for other formatting, I prefer the convention:
select  
  staffId
, staffName
, grossSalary
from dbo.personnel
where (
                empType = 'm'
           or empType = 'n'
         )
  and ( grossSalary - @minSalary ) > 0
February 27, 2008 3:22 PM
 

The Acne IT Agency said:

I've never been able to understand the use of lower case in reserved keywords. It would seem obvious to me that one has to differentiate keywords such as SELECT from  scalars and schema object names. I must admit to a liking for CamelCase though.
The thing that irritates me the most in rereading other people's code is to see the tbl prefix to tables. Odd, that when I tend to use the sp prefix for stored procedures.
February 27, 2008 4:08 PM
 

Reflective Perspective - Chris Alcock » The Morning Brew #42 said:

February 28, 2008 1:22 AM
 

cmt1 said:

I've found Steve McConnell's 'Code Complete 2nd Edition' (Microsoft Press, 2004) a great lead for setting coding standards within our organiization for our Visual Studio projects.

How about taking his 'industry standard' recommendations and translating them to the T-SQL declarative context, but trying to keep them as close as possible between the two environments? This is what we have sought to do.

This makes it a bit easier then for developers who are frequently moving between VB.NET/C# and T-SQL, or who have a working knowledge of both.
February 28, 2008 5:08 AM
 

billweh said:

My current company we have a standard of putting the commas before each line like MacDBA has.

I like that SQL Prompt automatically upper cases my reserved words.  I've gotten so used to that, when I got to my text editor du jour, I am confounded that it doesn't do this for me. :)

One of the worst formatting schemes I've come across from one of my coworkers is where he puts his join conditions on another line:

SELECT col1
 , col2  -- I like this part - easy to comment out a field later on for testing
 , col3
FROM tablea a
JOIN tableb b
ON b.col1 = a.col1

When you have a number of tables joined together, the code is just about unreadable to me.

I'm also big on using good aliases for tables that I'm using.  Our tables tend to be LONG on the name side - I will use a alias that is a "first letter of each word" sort of short hand for the table.  E.G. tbCompanyProductInfoStatusXref - the alias becomes cpisx.  We have some code in our system where the aliases are FK and a number.  So all of the tables are FK1...FKxx, but it's not like table A is always FK1 - it depends and they are not always numbered sequentially, so sometimes it might be FK1, FK2, FK10.  No rhyme or reason as far as I can tell.

We do have a standard on naming things and I think that as long as everyone follows that scheme, it can make reading others code easier down the line.  Sometimes it may not be the best naming scheme, but it definitely helps to pick things out of the lineup when you are reading someone elses code.

I think one of the biggest things is to use TABS as opposed to spaces.  Using a single indent between levels as well.  This way if user A prefers a four space tab, it will look just fine when user B opens it up with a two space tab. (my preference)

I would like to see a standards group come up with a list of prefixes perhaps for various types of items within a DB.  Present the naming solution as a suggestion then as you went from project to project, you would be able to pick out what each item was.

Now there's a suggestion for SQL Refactor.  If you were to create a document that had the spec in it (sprocs all start with sp, tables with tb, etc or whatever) - then you could give the program what your current naming convention is and it would go through and fix it all.  Giving you the output first so that you could go through it and decide if you really wanted to do this.  The down side would be if it didn't also update your code in say .NET or <shudder> ASP. :)
February 28, 2008 11:18 AM
 

Alex Kuznetsov said:

Working with a team of talented professional where everyone comes from different countries,
here is our small company approach to coding standards:

1. A team of skilled and motivated professionals is your biggest asset. Coding standards are
good for the business only if there is a consensus, only if everyone supports them.
Because many skilled professionals need to have some personal style, the standards must not be too tight,
so that the creativity and the bottom line do not suffer.

2. A full set of rules, written in clear and unambiguous language, must fit on a printed page.
References to outside sources such as ISO-11179 should not count as a single line, unless everyone
knows perfectly well all the contents of outside source. It is better to have good compliance with a simple set of rules.

3. if possible, rules should be enforced automatically. In our C# project some rules are enforced as unit tests.
Examples:
3.1 Warnings should be treated as errors for all the projects and all the configurations. Once we the team agreed
on this rule, we added a unit test which verifies it against all the projects in the solution.
3.2 There must be no tabs in the source code. The reason is simple: different tools render tabs as different number of spaces.
Again, a unit test enforces it.

4. The width of source code should fit on everybody's' screens. A method/procedure/query should not exceed
one or two screens. If some team members frequently work on laptops, or if someone needs a large font,
accommodate accordingly. Typically an identifier should not exceed 50 characters.

5. Only commonly accepted and easily understood abbreviations are OK.
SqlCommand and TpsReport are OK - everyone know what SQL and TPS reports are ;).
StructuredQueryLanguageCommand looks ridiculous to me. But avoid inventing your own nomenclature and abbreviations,
unless there is a very strong need for it.

6. All code should be reviewed by peers, and easy to understand. If a team member does not understand the code,
always comment it and/or give methods and variables more meaningful names. If the comment is not clear, always fix it.

Some more specifics for SQL:
1. Prefix all your columns, otherwise you may get subtle errors.
2. Qualify all your objects with schema names.
February 29, 2008 1:02 PM
 

louisducnguyen said:

Regarding the layout of T-SQL code:

HUNGARIAN object naming.  Hungarian and non-hungarian both have their strong points.  Hungarian e.g. tbl_salesInventory makes it easier to search hundreds of (usually horribly written) stored procedures, for troubleshooting purposes.  Non hungarian e.g. SALESINVENTORY is more concise.  Either method is acceptable.

Reserved words and objects in UPPER CASE.  T-SQL code should be no different than the convention followed when writing email, text messages, or normal English.  WRITING IN UPPER CASE IS CONSIDERED SHOUTING.  It is also my opinion the most interesting bits in T-SQL code are the objects: tables, views, columns; not the reserved words.  Writing object names in upper case is acceptable.  Reserved words in modern editors are color coded and have no need to SHOUT.

LAYOUT: SELECT clause.  The select clause can be written as one single line if followed by 4 columns or less (1).  If more than 4 columns, each column should be on its own line (2).  
Example (1)
 select firstName, lastName, phoneNum
Example (2)
 select firstName,
   lastName,
   streetAddress1,
   streetAddress2,
   city,
   state,
   postalCd,
   phoneNum
   country

LAYOUT: CASE statement.  With the case statement, the case when else & end reseerved words should start on their own line.   When and then should be on the same line.  
Example (1)
select case software
 when 'TSQL' then 'cool'
 when 'C#' then 'ok'
 when '.NET Framework' then 'alphabet soup'
 else 'no comment'
 end
Example (2)
select case
 when software = 'TSQL' then 'cool'
 else 'no comment'
 end
 
   
LAYOUT: FROM clause. The from clause has two special cases, DERIVED TABLES and JOINS.  In the case of derived tables, the from should be followed by a left parentheses.  The derived table code should start on a new line.  The closing parentheses and table alias follow on its own line.  Table aliases should be letters a, b, c, & d (1).  If the query involves more than four tables, it should be broken up into small steps using temporary tables.  This is done to improve readability.  Joining more than four tables in a given query, also usually hurts performance.  In the case of joins, each join and on operator should start on its own line.  For on operators with multiple conditions, each condition should start on its own line.  The from and join clause should be followed either by the table name or the opening parentheses (2).
Example (1)
 select a.cust_id, a.lastContactDt
 from (
   select cust_id, max(contactDt) as lastContactDt
   from customerContact
   group by cust_id
 ) a
Example (2)
 select a.name, b.lastContactDt
 from customer a
 join (
   select cust_id, max(contactDt) as lastContactDt
   from customerContact
   group by cust_id,
 ) b
   on a.cust_id = b.cust_id and
   a.cust_id <> -1

LAYOUT: WHERE GROUP BY & HAVING clauses. These clauses can be written in one line if they short, usually followed by four words or less (1).  Otherwise, each object following the reserved word should be on its own line (2).
Example (1)
 where contactDt is not null
 group by contactDt
 having count(*)>1
Example (2)
 where contactDt is not null and
   name is not null and
   cust_id <> -1
March 1, 2008 11:23 PM
 

duetsch.info » Blog Archiv » Fundstuecke - 04.03.2008 said:

March 4, 2008 3:34 AM
 

http://www.simple-talk.com/community/blogs/tony_davis/archive/2008/02/27/44572.aspx said:

March 22, 2008 1:56 AM
 

examples of a subordinate clause said:

May 2, 2008 10:46 PM
 

css reserved words said:

May 18, 2008 3:47 PM
 

Naming Convention question | keyongtech said:

January 18, 2009 11:36 AM
You need to sign in to comment on this blog


















<February 2008>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
2425262728291
2345678
Finding Stuff in SQL Server Database DDL
 You'd have thought that nothing would be easier than using SQL Server Management Studio (SSMS) for... Read more...

Mission Critical: SQL Server 2008 Performance Tuning Task List
 In which Buck Woody imagines how the US military would have tackled DBA checklists for... Read more...

Simple Query tuning with STATISTICS IO and Execution plans
 A great deal can be gleaned from the use of the STATISTICS IO and the execution plan, when you are... Read more...

Switching rows and columns in SQL
 When they use SQL Server, one the commoner questions that Ms Access programmers ask is 'Where's the... Read more...

Writing Efficient SQL: Set-Based Speed Phreakery
 Phil Factor's SQL Speed Phreak challenge is an event where coders battle to produce the fastest code to... Read more...