Listing common SQL Code Smells.

Once you’ve done a number of SQL Code-reviews, you’ll know those signs in the code that all might not be well. These ‘Code Smells’ are coding styles that don’t directly cause a bug, but are indicators that all is not well with the code. . Kent Beck and Massimo Arnoldi seem to have coined the phrase in the “OnceAndOnlyOnce” page of www.C2.com, where Kent also said that code “wants to be simple”. Bad Smells in Code was an essay by Kent Beck and Martin Fowler, published as Chapter 3 of the book ‘Refactoring: Improving the Design of Existing Code’ (ISBN 978-0201485677) Although there are generic code-smells, SQL has its own particular coding habits that will alert the programmer to the need to re-factor what has been written.

See Exploring Smelly Code   and Code Deodorants for Code Smells by Nick Harrison for a grounding in Code Smells in C#

I’ve always been tempted by the idea of automating a preliminary code-review for SQL. It would be so useful to trawl through code and pick up the various problems, much like the classic ‘Lint’ did for C, and how the Code Metrics plug-in for .NET Reflector by Jonathan ‘Peli’ de Halleux is used for finding Code Smells in .NET code. The problem is that few of the standard procedural code smells are relevant to SQL, and we need an agreed list of code smells. Merrilll Aldrich made a grand start last year in his blog Top 10 T-SQL Code Smells.However, I’d like to make a start by discovering if there is a general opinion amongst Database developers what the most important SQL Smells are.

One can be a bit defensive about code smells. I will cheerfully write very long stored procedures, even though they are frowned on. I’ll use dynamic SQL occasionally. You can only use them as an aid for your own judgment and it is fine to ‘sign them off’ as being appropriate in particular circumstances. Also, whole classes of ‘code smells’ may be irrelevant for a particular database. The use of proprietary SQL, for example, is only a ‘code smell’ if there is a chance that the database will have to be ported to another RDBMS. The use of dynamic SQL is a risk only with certain security models. As the saying goes,  a CodeSmell is a hint of possible bad practice to a pragmatist, but a sure sign of bad practice to a purist.

Plamen Ratchev’s wonderful article Ten Common SQL Programming Mistakes lists some of these ‘code smells’ along with out-and-out mistakes, but there are more. The use of nested transactions, for example, isn’t entirely incorrect, even though the database engine ignores all but the outermost: but it does flag up the possibility that the programmer thinks that nested transactions are supported.

If anything requires some sort of general agreement, the definition of code smells is one. I’m therefore going to make this Blog ‘dynamic, in that, if anyone twitters a suggestion with a #SQLCodeSmells tag (or sends me a twitter) I’ll update the list here. If you add a comment to the blog with a suggestion of what should be added or removed, I’ll do my best to oblige. In other words, I’ll try to keep this blog up to date. The name against each ‘smell’ is the name of the person who Twittered me, commented about or who has written about the ‘smell’. it does not imply that they were the first ever to think of the smell!

  • Use of deprecated syntax such as *= – see SR0010: Avoid using deprecated syntax when you join tables or views  (Dave Howard)
  • Denormalisation that requires the shredding of the contents of columns. (Merrill Aldrich)
  • Contrived interfaces
  • Using VARCHAR and VARBINARY for datatypes that will be very small (size 1 or 2) and consistent – see SR0009: Avoid using types of variable length that are size 1 or 2
  • Use of deprecated datatypes such as TEXT/NTEXT (Dave Howard)
  • Datatype mis-matches in predicates that rely on implicit conversion. see SR0014: Data loss might occur when casting from {Type1} to {Type2} (Plamen Ratchev)
  • Using Correlated subqueries instead of a join   (Dave_Levy/ Plamen Ratchev)
  • The use of Hints in queries, especially NOLOCK (Dave Howard /Mike Reigler)
  • Few or No comments.
  • Use of functions in a WHERE clause. (Anil Das)
  • Overuse of scalar UDFs (Dave Howard, Plamen Ratchev)
  • Excessive ‘overloading’ of routines.
  • The use of Exec xp_cmdShell (Merrill Aldrich)
  • Excessive use of brackets. (Dave Levy)
  • Lack of the use of a semicolon to terminate statements
  • Use of non-SARGable functions on indexed columns in predicates (Plamen Ratchev)
  • Duplicated code, or strikingly similar code.
  • Misuse of SELECT *  -see SR0001: Avoid SELECT * in stored procedures, views and table-valued functions  (Plamen Ratchev)
  • Overuse of Cursors (Everyone. Special mention to Dave Levy & Adrian Hills)
  • Overuse of CLR routines when not necessary (Sam Stange)
  • Same column name in different tables with different datatypes. (Ian Stirk)
  • Use of ‘broken’ functions such as ‘ISNUMERIC’ without additional checks.
  • Excessive use of the WHILE loop (Merrill Aldrich)
  • INSERT … EXEC (Merrill Aldrich)
  • The use of stored procedures where a view is sufficient (Merrill Aldrich)
  • Not using two-part object names (Merrill Aldrich)
  • Using INSERT INTO without specifying the columns and their order (Merrill Aldrich)
  • Full outer joins even when they are not needed. (Plamen Ratchev)
  • Huge stored procedures (hundreds/thousands of lines).
  • Stored procedures that can produce different columns, or order of columns in their results, depending on the inputs.
  • Code that is never used.
  • Complex and nested conditionals
  • WHILE (not done) loops without an error exit.
  • Variable name same as the Datatype
  • Vague identifiers.
  • Storing complex data  or list in a character map, bitmap or XML field
  • User procedures with sp_ prefix (Aaron Bertrand)
  • Views that reference views that reference views that reference views (Aaron Bertrand)
  • Inappropriate use of sql_variant (Neil Hambly)
  • Errors with identity scope using SCOPE_IDENTITY @@IDENTITY or IDENT_CURRENT- see SR0008: Consider using SCOPE_IDENTITY instead of @@IDENTITY  (Neil Hambly, Aaron Bertrand)
  • Schemas that involve multiple dated copies of the same table instead of partitions (Matt Whitfield-Atlantis UK)
  • Scalar UDFs that do data lookups (poor man’s join) (Matt Whitfield-Atlantis UK)
  • Code that allows SQL Injection (Mladen Prajdic)
  • Tables without clustered indexes (Matt Whitfield-Atlantis UK)
  • Use of “SELECT DISTINCT” to mask a join problem (Nick Harrison)
  • Multiple stored procedures with nearly identical implementation. (Nick Harrison)
  • Excessive column aliasing may point to a problem or it could be a mapping implementation. (Nick Harrison)
  • Joining “too many” tables in a query. (Nick Harrison)
  • Stored procedure returning more than one record set. (Nick Harrison)
  • A NOT LIKE condition (Nick Harrison)
  • Not setting an output parameter for all code paths through a stored procedure, see SR0013: Output parameter (parameter) is not populated in all code paths
  • excessive “OR” conditions. (Nick Harrison)
  • User procedures with sp_ prefix (Aaron Bertrand)
  • Views that reference views that reference views that reference views (Aaron Bertrand)
  • sp_OACreate or anything related to it (Bill Fellows)
  • Prefixing names with tbl_, vw_, fn_, and usp_ (‘tibbling’) (Jeremiah Peschka)
  • Aliases that go a,b,c,d,e… (Dave Levy/Diane McNurlan)
  • Overweight Queries (e.g. 4 inner joins, 8 left joins, 4 derived tables, 10 subqueries, 8 clustered GUIDs, 2 UDFs, 6 case statements = 1 query) (Robert L Davis)
  • Order by 3,2 (Dave Levy)
  • MultiStatement Table functions which are then filtered ‘Sel * from Udf() where Udf.Col = Something’ (Dave Ballantyne)
  • running a SQL 2008 system in SQL 2000 compatibility mode(John Stafford)

For more articles like this, sign up to the fortnightly Simple-Talk newsletter.

  • 6767 views

  • Rate
    [Total: 0    Average: 0/5]