Plamen Ratchev

Ten Common SQL Programming Mistakes

20 August 2009

It is not always easy to spot "antipatterns" in your SQL, especially in more complex queries. In this article, Plamen demonstrates some of the most common SQL coding errors that he encounters, explains their root cause, and illustrates potential solutions.

Programming in SQL can be both a fun and a challenging task. Those with a background in traditional programming languages such as Java, C, C++, and VB often find it difficult to adapt to the "set-based" mindset. Even seasoned SQL developers and DBAs can get trapped in one of the many pitfalls of the SQL language. I have been there and I'm willing to bet that you have too. Mastering the fundamentals takes time, and even then some mistakes are hard to spot.

My intention with this article was to highlight some of the more common mistakes that I've seen people make when coding SQL. The list reflects my direct experience working with different development teams and developers, performing code review sessions, and so on, as well as the issues I see every day on the newsgroups and forums. The list of common mistakes is far from exhaustive, and is presented in no particular order of severity. In fact, the list could be regarded as the result of the following query:

SELECT TOP(10) mistake
FROM CommonSQLProgrammingMistakes
ORDER BY CHECKSUM(NEWID());

Without further ado, here is the list:

  1. NULLs and the NOT IN predicate
  2. Functions on indexed columns in predicates
  3. Incorrect subquery column
  4. Data type mismatch in predicates
  5. Predicate evaluation order
  6. Outer joins and placement of predicates
  7. Subqueries that return more than one value
  8. Use of SELECT *
  9. Scalar user-defined functions
  10. Overuse of cursors

The examples are presented using SQL Server's Transact-SQL dialect, but most of the concepts are valid in any SQL implementation.

NULLs and the NOT IN predicate

One of the most common requests is to retrieve data based on some column value not included in a list of values. The following two tables illustrate the scenario. We have tables with colors and products:

Colors table:

color
----------
Black
Blue
Green
Red

Products table:

sku  product_description  color
---- -------------------- ------
1    Ball                 Red
2    Bike                 Blue
3    Tent                 NULL

Note that these tables do not represent a perfect design, following normalization rules and best practices. Rather, it is a simplified scenario to help illustrate this example better. In reality, the colors table would most likely contain a color code key column that would be referenced in the products table.
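If you want to recreate the scenario, the two sample tables can be sketched along the following lines (the column types here are assumptions for illustration, not taken from the article's download scripts):

CREATE TABLE Colors (
 color VARCHAR(10) NOT NULL PRIMARY KEY);

CREATE TABLE Products (
 sku INT NOT NULL PRIMARY KEY,
 product_description VARCHAR(35) NOT NULL,
 color VARCHAR(10) NULL); -- nullable, which is what makes NOT IN misbehave

INSERT INTO Colors (color) VALUES ('Black');
INSERT INTO Colors (color) VALUES ('Blue');
INSERT INTO Colors (color) VALUES ('Green');
INSERT INTO Colors (color) VALUES ('Red');

INSERT INTO Products (sku, product_description, color) VALUES (1, 'Ball', 'Red');
INSERT INTO Products (sku, product_description, color) VALUES (2, 'Bike', 'Blue');
INSERT INTO Products (sku, product_description, color) VALUES (3, 'Tent', NULL);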

The request is to select a list of colors that have not previously been used on products. In other words, we need to construct a query that returns only those colors for which there is no product with that color. It might seem, at first glance, that the NOT IN predicate provides a very intuitive way to satisfy this request, very close to how the problem would be stated in plain English:

SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P);

You may have been expecting this query to return two rows (for 'black' and 'green') but, in fact, it returns an empty result set:

color
----------

(0 row(s) affected)

Obviously this is 'incorrect'. What is the problem? It's simply that SQL uses three-valued logic, driven by the existence of NULL, which is not a value but a marker to indicate missing (or UNKNOWN) information. When the NOT operator is applied to the IN predicate over the list of values returned by the subquery, it is translated like this:

"color NOT IN (Red, Blue, NULL)"

This is equivalent to:

"NOT(color=Red OR color=Blue OR color=NULL)"

The expression "color=NULL" evaluates to UNKNOWN and, according to the rules of three-valued logic, NOT UNKNOWN also evaluates to UNKNOWN. As a result, all rows are filtered out and the query returns an empty set.

This mistake will often surface if requirements change, and a non-nullable column is altered to allow NULLs. It also highlights the need for thorough testing. Even if, in the initial design, a column disallows NULLs, you should make sure your queries continue to work correctly with NULLs.

One solution is to use the EXISTS predicate in place of IN, since EXISTS uses two-valued predicate logic evaluating to TRUE/FALSE:

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS(SELECT *
                 FROM Products AS P
                 WHERE C.color = P.color);

This query correctly returns the expected result set:

color
----------
Black
Green

Other possible solutions are as follows:

/* IS NOT NULL in the subquery */
SELECT C.color
FROM Colors AS C
WHERE C.color NOT IN (SELECT P.color
                      FROM Products AS P
                      WHERE P.color IS NOT NULL);

/* EXCEPT */
SELECT color
FROM Colors
EXCEPT
SELECT color
FROM Products;

/* LEFT OUTER JOIN */
SELECT C.color
FROM Colors AS C
LEFT OUTER JOIN Products AS P
  ON C.color = P.color
WHERE P.color IS NULL;

While all solutions produce the desired results, using EXCEPT may be the easiest to understand and use. Note that the EXCEPT operator returns distinct values, which works fine in our scenario but may not be correct in another situation.
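As a quick illustration of that caveat (my own example, which assumes SQL Server 2008 or later for the VALUES row constructor), EXCEPT collapses duplicates from its left input:

SELECT color
FROM (VALUES ('Black'), ('Black'), ('Green')) AS T(color)
EXCEPT
SELECT color
FROM Products;
-- returns a single 'Black' row and a single 'Green' row, even though 'Black' appears twice in the input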

Functions on indexed columns in predicates

We often tend to write code as a direct translation of a given request. For example, if we are asked to retrieve all customers whose name starts with the letter L, it feels very natural to write the query like this, using the LEFT function to return the first character of their name:

SELECT customer_name
FROM Customers
WHERE LEFT(customer_name, 1) = 'L';

Alternatively, if we are asked to calculate the total sales for January 2009, we might write a query like the following, which uses the DATEPART function to extract the relevant month and year from the sale_date column:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE DATEPART(YEAR, sale_date) = 2009
  AND DATEPART(MONTH, sale_date) = 1;

While these queries look very intuitive, you will find that the indexes that you (of course!) have on your customer_name and sale_date columns remain unused, and that the execution plan for these queries reveals index scans.

The problem arises from the fact that the index columns are being passed to a function, which the query engine must then evaluate for every single row in the table. In cases such as these, the WHERE clause predicate is deemed "non-SARGable" and the best that the query optimizer can do is perform a full index or table scan.
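For reference, the discussion assumes indexes along these lines (the index names are hypothetical, since the article does not show its index definitions):

CREATE INDEX IX_Customers_customer_name ON Customers (customer_name);
CREATE INDEX IX_Sales_sale_date ON Sales (sale_date);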

To make sure the indexes get used, we need to avoid the use of functions on the indexed columns. In our two examples, it is a relatively simple task to rewrite the queries to use SARGable predicates. The first request can be expressed with this logically equivalent query:

SELECT customer_name
FROM Customers
WHERE customer_name LIKE 'L%';

The equivalent for the second query is as follows:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= '20090101'
  AND sale_date <  '20090201';

These two queries are most likely to use an index seek to retrieve the data quickly and efficiently.

It's worth noting that SQL Server is getting "smarter" as it evolves. For example, consider the following query, which uses the CAST function on the indexed sale_date column:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE CAST(sale_date AS DATE) = '20090101';

If you run this query on SQL Server 2005 or earlier, you'll see an index scan. However, on SQL Server 2008 you'll see an index seek, despite the use of the CAST function. The execution plan reveals that the predicate is transformed into something like the following:

SELECT SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= '20090101'
  AND sale_date <  '20090102';

However, in general, you should use SARGable predicates where possible, rather than rely on the evolving intelligence of the optimizer.

Incorrect subquery column

When writing a subquery, it is very easy to abstract yourself from the main query logic and concentrate on the subquery itself. This can lead to the innocent mistake of referencing a column from the main query's table when you intended to use a similarly named column from the subquery's source table.

Let's look at two very simple tables; one is a Sales table containing sales data, and the other is an auxiliary Calendar table that has all calendar dates and holidays (abbreviated here):

Sales table:

sale_date  sale_amount
---------- -----------
2009-01-01 120.50
2009-01-02 115.00
2009-01-03 140.80
2009-01-04 100.50

Calendar table:

calendar_date holiday_name
------------- ----------------
2009-01-01    New Year's Day
2009-01-02    NULL
2009-01-03    NULL
2009-01-04    NULL
2009-01-05    NULL
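A minimal sketch to recreate these two tables (column types are assumed for illustration):

CREATE TABLE Sales (
 sale_date DATETIME NOT NULL,
 sale_amount DECIMAL(15, 2) NOT NULL);

CREATE TABLE Calendar (
 calendar_date DATETIME NOT NULL PRIMARY KEY,
 holiday_name VARCHAR(35) NULL);

INSERT INTO Sales (sale_date, sale_amount) VALUES ('20090101', 120.50);
INSERT INTO Sales (sale_date, sale_amount) VALUES ('20090102', 115.00);
INSERT INTO Sales (sale_date, sale_amount) VALUES ('20090103', 140.80);
INSERT INTO Sales (sale_date, sale_amount) VALUES ('20090104', 100.50);

INSERT INTO Calendar (calendar_date, holiday_name) VALUES ('20090101', 'New Year''s Day');
INSERT INTO Calendar (calendar_date, holiday_name) VALUES ('20090102', NULL);
INSERT INTO Calendar (calendar_date, holiday_name) VALUES ('20090103', NULL);
INSERT INTO Calendar (calendar_date, holiday_name) VALUES ('20090104', NULL);
INSERT INTO Calendar (calendar_date, holiday_name) VALUES ('20090105', NULL);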

Our task is to retrieve sales data for holiday dates only. It seems like a trivial query to write:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

However, you'll find that this query simply returns all rows from the Sales table! A closer look at the query reveals the culprit to be the SELECT list of the subquery. It accidentally references the sale_date column from the Sales table, instead of the calendar_date column from the Calendar table.

If that is the case, why did we not get an error? Although the outcome was not what we expected, this is still a valid SQL statement. When using a subquery, the outer query's columns are exposed to the inner query. Here, we unintentionally converted what was meant to be a self-contained subquery (executed once, with its result passed to the outer query) into a correlated subquery, which is logically executed once for every row returned by the outer query.

As a result, the subquery evaluates to sale_date IN (sale_date) which is always true, as long as there is at least one holiday date in the Calendar table, and so our result set returns all rows from the Sales table. Of course, the fix is easy in this case; we simply use the correct date column from the Calendar table:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.calendar_date
                    FROM Calendar AS C
                    WHERE C.holiday_name IS NOT NULL);

This illustrates another important point: it is a best practice to prefix columns in subqueries with table aliases. For example, if we had used an alias like this:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT C.sale_date
                    FROM Calendar AS C
                    WHERE holiday_name IS NOT NULL);

Then this query would have resulted in an error – "Error: Invalid column name 'sale_date'".

Data type mismatch in predicates

This is another typical mistake that is sometimes hard to catch. It is very easy to mismatch data types in predicates. It could be in a stored procedure where the parameter is passed as one data type and then used in a query to filter data on a column of a different data type. Another example is joining tables on columns with different data types, or simply using a predicate where data types are mismatched.

For example, we may have a Customers table where the last_name column is of type VARCHAR:

CREATE TABLE Customers (
 customer_nbr INT NOT NULL PRIMARY KEY,
 first_name VARCHAR(35) NOT NULL,
 last_name VARCHAR(35) NOT NULL);

Then the following stored procedure is used to retrieve the customer information by customer last name:

CREATE PROCEDURE GetCustomerByLastName
 @last_name NVARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

Notice here the parameter @last_name is of data type NVARCHAR. Although the code "works", SQL Server will have to perform implicit conversion of the last name column to NVARCHAR, because NVARCHAR is of higher data precedence. This can result in a performance penalty. The implicit conversion is visible in the query plan as CONVERT_IMPLICIT. Based on collation, and other factors, a data type mismatch may also preclude the use of an index seek. Use of the correct data type resolves the problem:

CREATE PROCEDURE GetCustomerByLastName
 @last_name VARCHAR(35)
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE last_name = @last_name;

In many cases, this mistake is the result of splitting responsibilities on the team, and having one team member design the tables and another implement stored procedures or code. Another reason could be using different data sources to join data where the join columns have different data types in the source systems. The same advice applies not only to character data type mismatches, but also to mismatches between numeric data types (like INT and FLOAT), or the mixing of numeric and character data types.
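The same precedence rules apply to numeric types. In the hypothetical procedure below (my own example, reusing the Customers table above), the parameter is declared as FLOAT while the customer_nbr column is INT; because FLOAT has higher precedence, it is the column side that gets implicitly converted:

CREATE PROCEDURE GetCustomerByNumber
 @customer_nbr FLOAT -- mismatched: the column is INT
AS
 SELECT first_name, last_name
 FROM Customers
 WHERE customer_nbr = @customer_nbr;

Declaring the parameter as INT, to match the column, avoids the implicit conversion.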

Predicate evaluation order

If you are familiar with the logical query processing order, then you may expect that a query is executed in the following order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT

The sequence above outlines the logical order for executing a query. Logically, the FROM clause is processed first, defining the source data set; next the WHERE predicates are applied, followed by GROUP BY, and so on.

However, physically, the query is processed differently and the query optimizer is free to move expressions in the query plan in order to produce the most cost efficient plan for retrieving the data. This leads to a common misunderstanding that a filter in the WHERE clause is applied before the next phases are processed. In fact, a predicate can be applied much later in the physical execution plan. Also, there is no left to right order for execution of predicates. For example, if you have a WHERE clause containing "WHERE x=1 AND y=2", there is no guarantee that "x=1" will be evaluated first. They can be executed in any order.

For example, consider the following Accounts table where, in the account_reference column, Business accounts are denoted by a numeric reference and Personal accounts by a character reference:

account_nbr account_type    account_reference
----------- --------------- -----------------
1           Personal        abc
2           Business Basic  101
3           Personal        def
4           Business Plus   5

In general, this table indicates bad design. The account_reference column should be split into two different attributes, specific to business and personal accounts, each with the correct data type (and arguably not even belonging to the same table). However, in practice, we very often have to deal with systems designed with shortcomings, where altering the design is not an option.
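To experiment with the scenario, the table can be sketched as follows (column types assumed; whether the error described below actually appears can depend on the plan the optimizer picks, which is precisely the point):

CREATE TABLE Accounts (
 account_nbr INT NOT NULL PRIMARY KEY,
 account_type VARCHAR(35) NOT NULL,
 account_reference VARCHAR(35) NOT NULL);

INSERT INTO Accounts (account_nbr, account_type, account_reference) VALUES (1, 'Personal', 'abc');
INSERT INTO Accounts (account_nbr, account_type, account_reference) VALUES (2, 'Business Basic', '101');
INSERT INTO Accounts (account_nbr, account_type, account_reference) VALUES (3, 'Personal', 'def');
INSERT INTO Accounts (account_nbr, account_type, account_reference) VALUES (4, 'Business Plus', '5');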

Given the above scenario, a valid request is to retrieve all business type accounts with an account reference that is greater than 20 (assuming account reference has some meaningful numeric value for business type accounts). The query may look like this:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CAST(account_reference AS INT) > 20;

However, the query results in an error:

"Conversion failed when converting the varchar value 'abc' to data type int"

The query fails because, as noted earlier, there is no prescribed order for evaluating predicates and nothing guarantees that the predicate "account_type LIKE 'Business%'" will be evaluated before the predicate "CAST(account_reference AS INT) > 20". In our case, the second predicate is evaluated first, resulting in a conversion error due to the incompatible values in the account_reference column for personal accounts.

One attempt to resolve this issue might be to use a derived table (or common table expression) to filter the business type accounts first, and then apply the predicate for account_reference column:

SELECT account_nbr, account_ref_nbr
FROM (SELECT account_nbr,
             CAST(account_reference AS INT) AS account_ref_nbr
      FROM Accounts
      WHERE account_type LIKE 'Business%') AS A
WHERE account_ref_nbr > 20;

However, this results in the exact same error because derived tables and CTEs are expanded in the query plan and a single query plan is produced, where predicates can again be pushed up or down in the plan.

As indicated earlier, the problem here is a mix of bad design and misunderstanding of how SQL Server performs physical query execution. What is the solution? The best solution is to design the table correctly and avoid storing mixed data in a single column. In this case, a workaround is to use a CASE expression to guarantee that only valid numeric values will be converted to the INT data type:

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
  AND CASE WHEN account_reference NOT LIKE '%[^0-9]%'
           THEN CAST(account_reference AS INT)
      END > 20;

The CASE expression uses a LIKE pattern to check for valid numeric values (a double negation logic is used which can be translated as "there is not a single character that is not a digit"), and only for those values performs the CAST. For the rest of the values the CASE expression results in NULL, which is filtered out because NULL is not matched with any value (even with NULL).
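To see how that pattern classifies values, here is a quick throwaway check against a few literals (my own example, assuming SQL Server 2008 or later for the VALUES row constructor):

SELECT account_reference,
       CASE WHEN account_reference NOT LIKE '%[^0-9]%'
            THEN 'digits only - safe to CAST'
            ELSE 'contains a non-digit'
       END AS classification
FROM (VALUES ('101'), ('5'), ('abc'), ('12a')) AS T(account_reference);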

Outer joins and placement of predicates

Outer joins are such a great tool but are also much misunderstood and abused. Some people seem to like them so much that they throw one into almost every query, regardless of whether or not it is needed!

The key to correct use of outer joins is an understanding of the logical steps required to process an outer join in a query. Here are the relevant steps from the query processing phases:

  1. A cross join (Cartesian product) is formed for the two input tables in the FROM clause. The result of the Cartesian product is every possible combination of a row from the first table and a row from the second table.
  2. The ON clause predicates are applied filtering only rows satisfying the predicate logic.
  3. Any outer rows filtered out by the predicates in step 2 are added back. Rows from the preserved table are added with their actual attribute values (column values), and the attributes (columns) from the non-preserved table are set to NULL.
  4. The WHERE clause predicates are applied.

An outer join query can produce completely different results depending on how you write it, and where predicates are placed in that query. Let's look at one example, based on the following two tables, Customers and Orders:

Customers table:

customer_nbr customer_name
------------ --------------
1            Jim Brown
2            Jeff Gordon
3            Peter Green
4            Julie Peters

Orders table:

order_nbr   order_date customer_nbr order_amt
----------- ---------- ------------ ----------
1           2008-10-01 1            15.50
2           2008-12-15 2            25.00
3           2009-01-02 1            18.00
4           2009-02-20 3            10.25
5           2009-03-05 1            30.00

Our task is to retrieve a list of all customers, and the total amount they have spent on orders, since the beginning of year 2009. Instinctively, one may write the following query:

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;

But the results do not look good:

customer_name  total_2009
-------------- ------------
Jim Brown      48.00
Peter Green    10.25

Customers Jeff and Julie are missing from the result set. Where is the problem? In order to understand what went wrong, let’s play back this query one step at a time following the logical processing order. The first step is a cross join between the two input tables:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
CROSS JOIN Orders AS O;

This results in every possible combination of rows from both input tables:

customer_name    order_amt  order_date
---------------- ---------- ----------
Jim Brown        15.50      2008-10-01
Jim Brown        25.00      2008-12-15
Jim Brown        18.00      2009-01-02
Jim Brown        10.25      2009-02-20
Jim Brown        30.00      2009-03-05
Jeff Gordon      15.50      2008-10-01
Jeff Gordon      25.00      2008-12-15
Jeff Gordon      18.00      2009-01-02
Jeff Gordon      10.25      2009-02-20
Jeff Gordon      30.00      2009-03-05
Peter Green      15.50      2008-10-01
Peter Green      25.00      2008-12-15
Peter Green      18.00      2009-01-02
Peter Green      10.25      2009-02-20
Peter Green      30.00      2009-03-05
Julie Peters     15.50      2008-10-01
Julie Peters     25.00      2008-12-15
Julie Peters     18.00      2009-01-02
Julie Peters     10.25      2009-02-20
Julie Peters     30.00      2009-03-05

The next step is applying the ON predicates of the JOIN clause:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
INNER JOIN Orders AS O
   ON C.customer_nbr = O.customer_nbr;

The result of this query includes only customers with orders. Since customer Julie does not have any orders, she is excluded from the result set:

customer_name  order_amt  order_date
-------------- ---------- ----------
Jim Brown      15.50      2008-10-01
Jeff Gordon    25.00      2008-12-15
Jim Brown      18.00      2009-01-02
Peter Green    10.25      2009-02-20
Jim Brown      30.00      2009-03-05

The third step of the logical processing order is adding back the outer rows. These rows were excluded in the prior step because they did not satisfy the join predicates:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
   ON C.customer_nbr = O.customer_nbr;

Now customer Julie is added back in the result set. Notice the added outer rows from the preserved table (Customers) have values for the selected attributes (customer_name) and the non-preserved table (Orders) rows have NULL for their attributes (order_amt and order_date):

customer_name  order_amt  order_date
-------------- ---------- ----------
Jim Brown      15.50      2008-10-01
Jim Brown      18.00      2009-01-02
Jim Brown      30.00      2009-03-05
Jeff Gordon    25.00      2008-12-15
Peter Green    10.25      2009-02-20
Julie Peters   NULL       NULL

The last step is applying the WHERE clause predicates:

SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
   ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101';

Now the picture is clear! The culprit is the WHERE clause predicate. Customer Jeff is filtered out from the result set because he does not have orders past January 1, 2009, and customer Julie is filtered out because she has no orders at all (since the outer row added for Julie has NULL for the order_date column). In effect, in this case, the predicate in the WHERE clause turns the outer join into an inner join.

To correct our initial query, it is sufficient to move the WHERE predicate into the join condition:

SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
 AND O.order_date >= '20090101'
GROUP BY C.customer_name;

Now, the query returns correct results because Jeff's and Julie's rows are filtered out by the join predicates, but then added back when the outer rows are added:

customer_name  total_2009
-------------- ------------
Jeff Gordon    0.00
Jim Brown      48.00
Julie Peters   0.00
Peter Green    10.25

In a more complex example, with multiple joins, the incorrect filtering may happen in a subsequent table operator (like a join to another table) instead of in the WHERE clause. For example, say we have an OrderDetails table containing product SKU and quantity, and the request is to retrieve a list of all customers, with order amount and quantity, for selected product SKUs. The following query may seem correct:

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
INNER JOIN OrderDetails AS D
  ON D.order_nbr = O.order_nbr
 AND D.sku = 101;

However, here the INNER join with the OrderDetails table plays the exact same role as the predicate in the WHERE clause in our previous example, in effect turning the LEFT OUTER join into an INNER join. The correct query to satisfy this request needs to use a LEFT OUTER join to join to the OrderDetails table:

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
LEFT JOIN OrderDetails AS D
  ON D.order_nbr = O.order_nbr
 AND D.sku = 101;
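For completeness, the OrderDetails table assumed in this example might look something like the following (a guess at the structure, since the article only describes it):

CREATE TABLE OrderDetails (
 order_nbr INT NOT NULL,
 sku INT NOT NULL,
 qty INT NOT NULL,
 PRIMARY KEY (order_nbr, sku));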

Subqueries that return more than one value

A very frequent request is to retrieve a value based on some correlation with the main query table. For example, consider the following two tables, storing details of products and the plants that manufacture these products:

Products table:

sku   product_description
----- ------------------
1     Bike
2     Ball
3     Phone

ProductPlants table:

sku   plant_nbr
----- -----------
1     1
2     1
3     2

The request is to extract the manufacturing plant for each product. One way to satisfy the request is to write the following query, using a correlated subquery to retrieve the plant:

SELECT sku, product_description,
      (SELECT plant_nbr
       FROM ProductPlants AS B
       WHERE B.sku = A.sku) AS plant_nbr
FROM Products AS A;

Note that the point here is to illustrate a technique; there could be a more efficient way to accomplish the same task. However, all works fine and we get the correct result set:

sku  product_description plant_nbr
---- ------------------- -----------
1    Bike                1
2    Ball                1
3    Phone               2

The query will continue to work happily until the day arrives that the company decides to start manufacturing Balls at plant 3, to cope with increasing demand. The ProductPlants table now looks like this:

sku   plant_nbr
----- -----------
1     1
2     1
2     3
3     2

All of a sudden, our query starts generating the following error:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

The error is descriptive enough. Instead of the expected scalar value, our subquery returns a result set, which breaks the query. Based on our business requirements, the fix is simple. To list all plants manufacturing a particular product, we simply use a JOIN:

SELECT A.sku, A.product_description, B.plant_nbr
FROM Products AS A
JOIN ProductPlants AS B
  ON A.sku = B.sku;

Now the query completes without errors and returns the correct results:

sku  product_description  plant_nbr
---- -------------------- -----------
1    Bike                 1
2    Ball                 1
2    Ball                 3
3    Phone                2

Note that the same error can occur in a predicate where a column or expression is tested against a subquery, for example "… column = (SELECT value FROM Table)". In that case, the solution is to use the IN predicate in place of "=".
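For example, here is a sketch of that variation (my own example, reusing the Products and ProductPlants tables to find products manufactured at plant 1):

/* Fails with Msg 512 once the subquery returns more than one plant for a product */
SELECT sku, product_description
FROM Products
WHERE 1 = (SELECT plant_nbr
           FROM ProductPlants AS B
           WHERE B.sku = Products.sku);

/* The IN predicate tolerates multiple rows from the subquery */
SELECT sku, product_description
FROM Products
WHERE 1 IN (SELECT plant_nbr
            FROM ProductPlants AS B
            WHERE B.sku = Products.sku);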

Use of SELECT *

On first encounter with SQL we always praise the genius who invented the SELECT * syntax! It's so handy and easy to use! Instead of explicitly listing all column names in our query, we just use the magic wildcard '*' and retrieve all columns. For example, a common misuse of SELECT * is to extract a set of all plastic products and to insert them into another table with the same structure:

INSERT INTO PlasticProducts
SELECT *
FROM Products
WHERE material_type = 'plastic';

Job done! However, one day business requirements change and two new columns are added to the Products table:

ALTER TABLE Products
ADD effective_start_date DATETIME,
    effective_end_date DATETIME;

All of a sudden, the magic query results in an error:

Msg 213, Level 16, State 1, Line 1
Insert Error: Column name or number of supplied values does not match table definition.

The fix is to explicitly list the column names in the query:

INSERT INTO PlasticProducts (sku, product_description, material_type)
SELECT sku, product_description, material_type
FROM Products
WHERE material_type = 'plastic';

The situation can get even worse if a view is created using SELECT *, and later the base tables are modified to add or drop columns.

Note: If a view is created using the SCHEMABINDING option, then the base tables cannot be modified in a way that would affect the view definition.
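A brief sketch of that option (the view name is hypothetical; note that SCHEMABINDING requires two-part object names and does not allow SELECT * in the view definition):

CREATE VIEW dbo.PlasticProductsView
WITH SCHEMABINDING
AS
SELECT sku, product_description, material_type
FROM dbo.Products
WHERE material_type = 'plastic';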

To conclude, do not use SELECT * in production code! One exception here is when using the EXISTS predicate. The select list in the subquery for the EXISTS predicate is ignored since only the existence of rows is important.

Scalar user-defined functions

Reuse of code is one of the fundamental principles we learn when programming in any language, and the SQL language is no exception. It provides many means by which to logically group code and reuse it.

One such means in SQL Server is the scalar user-defined function. It seems so convenient to hide away all those complex calculations in a function, and then simply invoke it in our queries. However, the hidden "sting in the tail" is that it can bring a heavy toll in terms of performance. When used in a query, scalar functions are evaluated for each row and, with large tables, this can result in very slow running queries. This is especially true when the scalar function needs to access another table to retrieve data.

Here is one example. Given tables with products and sales for products, the request is to retrieve total sales per product. Since the total sales value can be reused in another place, you decide to use a scalar function to calculate the total sales for a product:

CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS DECIMAL(15, 2)
AS
BEGIN
  RETURN(SELECT SUM(sale_amount)
         FROM Sales
         WHERE sku = @sku);
END

Then the query to retrieve the total sales for each product will look like this:

SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales
FROM Products;

Isn't this a very neat and good-looking query? But just wait until you run it over a large data set. The total sales calculation will be repeated for each and every row, and the overhead will be proportional to the number of rows. The correct way to handle this, if possible, is to rewrite the function as a table-valued function, or simply perform the calculation in the main query. In our example, performing the calculation in the query will look like this:

SELECT P.sku, P.product_description, SUM(S.sale_amount) AS total_sales
FROM Products AS P
JOIN Sales AS S
  ON P.sku = S.sku
GROUP BY P.sku, P.product_description;

And here is a table-valued function that can be used to calculate total sales:

CREATE FUNCTION dbo.GetTotalSales(@sku INT)
RETURNS TABLE
AS
RETURN(SELECT SUM(sale_amount) AS total_sales
       FROM Sales
       WHERE sku = @sku);

Now the table-valued function can be invoked in the query using the APPLY operator:

SELECT sku, product_description, total_sales
FROM Products AS P
CROSS APPLY dbo.GetTotalSales(P.sku) AS S;

Overuse of cursors

Let's face it – we love loops! Whether we start programming with VB, C, C++, Java, or C#, one of the first constructs we encounter is some form of a loop. They can helpfully solve pretty much any challenge you might face.

And so, it is only natural on the day we start programming with SQL to seek out our favorite loop construct. And here it is – the mighty cursor (and its little WHILE brother)! Then we hurry to put the well known tool to use in solving our problems.

Let's look at one example. Given a table with product prices, we have to perform a monthly update of prices for products; the price updates are stored in another table with new prices.

ProductPrices table:

sku  price  effective_start_date effective_end_date
---- ------ -------------------- ------------------
1    10.50  2009-01-01           NULL
2    11.50  2009-01-01           NULL
3    19.00  2009-01-01           NULL
4    11.25  2009-01-01           NULL

NewPrices table:

sku  price
---- ------
2    11.25
4    12.00

A cursor solution may look like this:

DECLARE @sku INT;
DECLARE @price DECIMAL(15, 2);

DECLARE PriceUpdates
CURSOR LOCAL
       FORWARD_ONLY
       STATIC
       READ_ONLY
FOR SELECT sku, price
    FROM NewPrices;

OPEN PriceUpdates;

FETCH NEXT FROM PriceUpdates
      INTO @sku, @price;

WHILE @@FETCH_STATUS = 0
BEGIN

  UPDATE ProductPrices
  SET price = @price,
      effective_start_date = CURRENT_TIMESTAMP
  WHERE sku = @sku;

  FETCH NEXT FROM PriceUpdates
        INTO @sku, @price;

END

CLOSE PriceUpdates;
DEALLOCATE PriceUpdates;

Mission accomplished! Now we can take a well-deserved break while the query is running. Soon, the realization dawns that procedural row-by-row processing does not work well in SQL. Besides being very slow, our solution is long, and hard to read and maintain. This is the moment we understand that the power of SQL lies in its set-based nature. The same task can be accomplished using a very efficient set-based query that is easier to understand and maintain:

UPDATE ProductPrices
SET price = (SELECT N.price
             FROM NewPrices AS N
             WHERE N.sku = ProductPrices.sku),
    effective_start_date = CURRENT_TIMESTAMP
WHERE EXISTS(SELECT *
             FROM NewPrices AS N
             WHERE N.sku = ProductPrices.sku);

There are different ways to write a set-based query to solve this problem: using the MERGE statement, an update with a common table expression, or the SQL Server-specific update with a join. But the point is to utilize the natural power of the SQL language, and to use set-based techniques to solve problems and avoid procedural solutions.
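For instance, on SQL Server 2008 the same update can be expressed with MERGE (a sketch of that alternative, not code from the article):

MERGE INTO ProductPrices AS P
USING NewPrices AS N
   ON N.sku = P.sku
WHEN MATCHED THEN
    UPDATE SET P.price = N.price,
               P.effective_start_date = CURRENT_TIMESTAMP;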

Note: While you should avoid cursors as much as possible, there are certain problems, such as running total aggregations, that today are still best solved using cursors. We can be optimistic that future enhancements will provide better tools to solve those problems in a set based way.

Summary

Maybe now is a good time for the next code review of your SQL code. Hopefully some of these examples will help you be a better developer/DBA and produce more reliable and efficient code. It is not always easy to spot these patterns, especially in more complex queries. But as long as you understand the fundamentals, avoiding these mistakes is easy.

Happy SQL coding!

This article is part of our Database Lifecycle Management patterns & practices.

Return to the library to find more articles on DLM, or visit red-gate.com/dlm for more information about SQL Server products for database lifecycle management.

Plamen Ratchev

Author profile:

Plamen Ratchev has over two decades of experience in the software architecture and development field. He has enjoyed building solutions for UNIX, DOS and Windows platforms, with a main focus on SQL Server since version 4.21. Plamen is the founder of Tangra, specializing in relational database applications analysis, implementation, and tuning. His particular interest is in design patterns, performance and optimization. Plamen has a master's degree in computer science with a specialization in algorithms and compilers, and holds MCSE, MCSD/.NET, MCDBA, MCSA, MCAD.NET, and MCITP certifications. He is a SQL Server MVP.

Have Your Say


Subject: Good work
Posted by: Anonymous (not signed in)
Posted on: Thursday, August 20, 2009 at 3:53 PM
Message: Simple, but very good work.

Keep writing!

~R

Subject: Nicely Written
Posted by: Chuck (view profile)
Posted on: Friday, August 21, 2009 at 11:33 AM
Message: And good advice.

Subject: Excellent!
Posted by: Anonymous (not signed in)
Posted on: Friday, August 21, 2009 at 2:19 PM
Message: Very well written, good examples of common traps!

Subject: Excellent work
Posted by: Neel Desai (not signed in)
Posted on: Saturday, August 22, 2009 at 5:15 AM
Message: Every one must read this. Even if you write simple query, you should know these basic things. Otherwise you will end up with error prone queries.

Thanks a lot. Explained very well with proper examples.

Subject: Excelent work
Posted by: rajesh (view profile)
Posted on: Monday, August 24, 2009 at 12:16 AM
Message: Very very good wok keep posting these types of articles

Subject: Great article, you should write an article on set based computation using SQL!
Posted by: James Zhuo (not signed in)
Posted on: Monday, August 24, 2009 at 1:29 AM
Message: Great article! I think most of these would be pretty quick to spot for the seasoned SQL developer, but nevertheless good to have them compiled in an article. You should write an article on set based computation using SQL with particular emphasis on how to replace cursors with set based processing under different scenarios :)

Subject: Excellent Article
Posted by: Anonymous (not signed in)
Posted on: Monday, August 24, 2009 at 2:41 AM
Message: Very good stuff - keep on writing.

Subject: Title Change
Posted by: Steve (view profile)
Posted on: Monday, August 24, 2009 at 3:53 AM
Message: Shouldn't the title of this article be "Ten Common SQL Server Programming Mistakes".


Subject: Great Article
Posted by: Anonymous (not signed in)
Posted on: Monday, August 24, 2009 at 4:38 AM
Message: I especially like the "Incorrect subquery column" section. I try to get the guys I work with here to use aliases more often. I find aliases help readability and maintenance, and more often than not help to catch simple mistakes like the one you highlighted. Even with these benefits they are still having none of it though and would rather just dump out column names.

Subject: Top 11
Posted by: Sam Stange (not signed in)
Posted on: Monday, August 24, 2009 at 7:07 AM
Message: Good article, I can't tell you how many times a few of these top 10's have made my queries go from 5 minutes to 5 seconds.

If I could add one more item to your list, I would add the times when you insert a lot of data into a table, you'll need to reindex that table before any subsequent queries. Other adds might include avoiding the CLR when possible. For those ASP.NET devs, use LINQ.

Subject: I Agree, Great Article
Posted by: BrentP (not signed in)
Posted on: Monday, August 24, 2009 at 7:21 AM
Message: Bravo. I am going to send the link to my DBA's and Programmers. I have seen at least 2 of these mistakes in the last week of looking over scripts and stored procedures.

Subject: One of my favorite articles
Posted by: rakhi.bhatia@gmail.com (view profile)
Posted on: Monday, August 24, 2009 at 10:01 AM
Message: Wonderful work ! Keep it up !

Subject: Fix mismatched datatypes at source
Posted by: Anonymous (not signed in)
Posted on: Monday, August 24, 2009 at 10:08 AM
Message: Hi,

here's a great article that identifies all the 'broken' datatypes on your database.

http://www.sqlservercentral.com/articles/Admin/65138/


Subject: Great job
Posted by: Charles Kincaid (view profile)
Posted on: Monday, August 24, 2009 at 11:33 AM
Message: Enjoyed this article very much. I even learned something new. I did not know about the "scoping" issue with sub-querries.

One bad spot though. Your posting of table data in the examples caused an issue. I tried to recreate eeach one step by step. In "Predicate evaluation order" it's not clear that some rows have missing data for 'account_type'.

So the issue is not with you or the article but with the site software.

Subject: Not Just SQL Server
Posted by: Anonymous (not signed in)
Posted on: Monday, August 24, 2009 at 1:16 PM
Message: "Shouldn't the title of this article be 'Ten Common SQL Server Programming Mistakes'."

At least 8 of these errors apply in any SQL dialect.

Subject: Very cool article
Posted by: jb (not signed in)
Posted on: Monday, August 24, 2009 at 7:41 PM
Message: This is truly based on experiences.. Lots of stuffs learned on this article.

Subject: Didn't specified Dynamic SQL
Posted by: jb (not signed in)
Posted on: Monday, August 24, 2009 at 7:50 PM
Message: It's better you should also include some topics on dynamic SQL queries...

Subject: Very Good
Posted by: Anonymous (not signed in)
Posted on: Tuesday, August 25, 2009 at 6:30 AM
Message: Thanks Plamen, excellent as always :)

Subject: very good
Posted by: anvarsha (not signed in)
Posted on: Tuesday, August 25, 2009 at 10:52 AM
Message: pretty good list. good work.

thanks


Subject: Data type mismatch in predicates
Posted by: anvarsha (not signed in)
Posted on: Tuesday, August 25, 2009 at 11:08 AM
Message: i think it is possible to handle this kind of error using isnumeric function. this function returns 1 when the character string is indeed numeric

that way the query may be rewritten as

SELECT account_nbr, account_reference AS account_ref_nbr
FROM Accounts
WHERE account_type LIKE 'Business%'
AND isnumeric(account_reference)=1
AND CAST(account_reference AS INT) > 20;

i see this to be working when i tried. however, as there are no guarantees on the order, we may rewrite it as

SELECT account_nbr, account_reference AS account_ref_nbr
FROM (select account_nbr,account_reference
from Accounts
where
isnumeric(account_reference)=1 ) as a
WHERE account_type LIKE 'Business%'
AND CAST(account_reference AS INT) > 20;


this should work fine. but i am not losing your point. just thought of pointing out a minor work-around

Subject: Using * in EXISTS predicate
Posted by: Anonymous (not signed in)
Posted on: Tuesday, August 25, 2009 at 11:22 AM
Message: I totally agree with your comments -- To conclude, do not use SELECT * in production code! One exception here is when using the EXISTS predicate. The select list in the subquery for the EXISTS predicate is ignored since only the existence of rows is important --

Except that I believe it's just as efficient to say

where exists (select null from tbl)

instead of using select * -- I don't know if it makes any difference in terms of speed, but I read somewhere that this was a good idea and I've adopted it in my own code.

Subject: LEFT JOINs
Posted by: John (view profile)
Posted on: Tuesday, August 25, 2009 at 11:42 AM
Message: Unless I've misunderstood the problem statement, the solution presented isn't correct given the requirement "for selected product SKUs":

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
LEFT JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101;

Since OrderDetails is LEFT JOINed, it doesn't filter the result set at all.
What's wanted is:
SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN (Orders AS O
ON C.customer_nbr = O.customer_nbr
INEER JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101);

This gives all customers, and only their orders for SKU 101.

Subject: Bravo!
Posted by: Alex_Kuznetsov (view profile)
Posted on: Tuesday, August 25, 2009 at 11:55 AM
Message: Plamen,

Bravo!


anvarsha,

your "workaround" may or may not work. There is no guarantee it will work. It can break at any moment - the optimizer will rewrite your second query and evaluate the predicates in any way it wishes.

Subject: Data type mismatch in predicates
Posted by: anvarsha (not signed in)
Posted on: Tuesday, August 25, 2009 at 12:23 PM
Message: Alex..

you are right. my 'workaround' has this serious flaw... Plamen's solution will always work

thanks

Subject: Re: Data type mismatch in predicates
Posted by: Plamen (view profile)
Posted on: Tuesday, August 25, 2009 at 9:42 PM
Message: anvarsha,

In addition to what Alex pointed out, the ISNUMERIC function has other issues as it can recognize currency and other special symbols as valid numeric values.

Subject: LEFT JOINs
Posted by: Plamen (view profile)
Posted on: Tuesday, August 25, 2009 at 9:53 PM
Message: John,

If you read the request carefully it asks to return a list all customers – "...and the request is to retrieve a list of all customers...". Only the quantity for the selected sku is needed. Download the SQL scripts from the SCRIPTS link and try it.

Subject: Re: Using * in EXISTS predicate
Posted by: Plamen (view profile)
Posted on: Tuesday, August 25, 2009 at 9:57 PM
Message: Using "SELECT NULL" or "SELECT 1" were old tricks that do not make any difference today. Using "SELECT *" is just fine with the EXISTS predicate.

Subject: Incorrect subquery column appears to be wrong
Posted by: Anthony (view profile)
Posted on: Wednesday, August 26, 2009 at 5:41 PM
Message: I tested the Incorrect subquery column note you made both because I was curious, and worried. Could I actually of been writing SQL like that all these years and had it been wrong? I created tables in SQL server and ran this sql against it:

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date
FROM Calendar AS C
WHERE holiday_name IS NOT NULL);

and I indeed did NOT get all the rows from the Calendar table as you indicated I would, instead I only got the rows where the dates matched in both tables as I expected. Perhaps there is a version difference between my version of SQL server and yours? Or did I do something wrong?

I'm using this version of SQL server:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

Subject: Re: Incorrect subquery column appears to be wrong
Posted by: Plamen (view profile)
Posted on: Wednesday, August 26, 2009 at 9:02 PM
Message: Anthony,

The problem exists regardless of the SQL Server version. There must be something incorrect in the way you recreated the sample. Please download the full SQL code from the SCRIPTS link on the top of the article and try that.

Subject: Thanks Plamen!
Posted by: SQL Babe (view profile)
Posted on: Friday, August 28, 2009 at 9:42 AM
Message: I've seen you present this at SQL Saturday. So glad it is in an article. All my staff will be required to read this!

Cheers!
Karla Kay

Subject: great article
Posted by: darrell (not signed in)
Posted on: Monday, August 31, 2009 at 8:58 PM
Message: very interesting article, thanks!

Subject: Incorrect subquery column - appears to be a misunderstanding
Posted by: Anthony (view profile)
Posted on: Wednesday, September 2, 2009 at 1:03 PM
Message: First off, sorry my post shows up here three times. I'm not sure what I did to cause that.. perhaps I was clicking my back button, I'm not sure.

In any case, I ran your sql and re-reviewed my test I finally realize where the issue was. In my test I had a table called
sales
with columns:
transaction_nbr, sale_date, sale_amount

and a table called
calendar
with columns:
sale_date, holiday_name

The difference was your calendar table had these columns:
calendar_date, holiday_name

So your nested subquery was dependant on a column from the parent query. When I first read your notes I thought you were saying that the fix was to use table alias', where the fix was to use the correct column, using table alias' was just the best practice.

Thanks for the clarification, re-reading your notes today it's hard to see where I wound up having that confusion.

Thanks allot for your tips!
-- Anthony

Subject: Question with subquery??
Posted by: JD (not signed in)
Posted on: Saturday, September 5, 2009 at 11:10 AM
Message: Great great article....one question though..for the correlated subquery i did not understand how it was possible for the subqyery in this query
to return a value at all.

SELECT sale_date, sale_amount
FROM Sales AS S
WHERE sale_date IN (SELECT sale_date
FROM Calendar AS C
WHERE holiday_name IS NOT NULL)

I mean isnt sale_date an invalid column with respect to the Calendar table. I tried it out and it works, but i am still confused how sql allows this...please advise

Subject: RE: Question with subquery??
Posted by: Plamen (view profile)
Posted on: Saturday, September 5, 2009 at 5:39 PM
Message: JD,

Columns from the outer/main query are always visible inside subqueries. This is how the scoping of nested queries works. Otherwise you will not be able to write correlated subqueries.

The predicate with the subquery will always evaluate to TRUE, because it gets translated to "WHERE sale_date = sale_date". Of course, as long as the predicate in the subquery "holiday_name IS NOT NULL" returns at least one row, which is true in our case.

Subject: Plamen
Posted by: Jude D (view profile)
Posted on: Sunday, September 6, 2009 at 3:58 PM
Message: Thanks for pointing that out!. It makes this concept very clear to me now..

Subject: Re: Overuse of cursors
Posted by: calebsj79 (view profile)
Posted on: Monday, September 7, 2009 at 2:58 AM
Message: Plamen, You are good.
Using a set based query in the following way may help us more

Alternate set based query with Join:

update ProductPrices
set price = N.price
,effective_start_date = CURRENT_TIMESTAMP
from
NewPrices N
inner join ProductPrices PP
on N.sku=PP.sku

Subject: Article
Posted by: marbouh (not signed in)
Posted on: Tuesday, September 8, 2009 at 7:59 AM
Message: thank you very much, it's very interesting

Subject: Cartesian Join????
Posted by: Jeff Moden (view profile)
Posted on: Friday, September 11, 2009 at 3:50 AM
Message: Good article and good tips.

I would like to know, though, why anyone thinks the first step the optimizer would take on a Left Outer Join would be to create a Cartesian Product and filter from there?

Subject: Great article
Posted by: ALZDBA (view profile)
Posted on: Friday, September 11, 2009 at 7:08 AM
Message: 11 - not using object aliasses in a query

Where does my column come from ?

Subject: Re: Cartesian Join????
Posted by: Plamen (view profile)
Posted on: Friday, September 11, 2009 at 10:07 AM
Message: Jeff,

Not sure what you reference here... If this relates to processing of outer joins, then note the article refers to "logical query processing". Forming a Cartesian product is the first step in the logical processing phases. The following shows this visually much better than described with words: http://www.solidq.com/insidetsql/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf.

How the optimizer will treat the join and what plan will be picked for physical execution is a completely different story.

Subject: Re: 11 - not using object aliasses in a query
Posted by: Plamen (view profile)
Posted on: Friday, September 11, 2009 at 10:11 AM
Message: ALZDBA,

This is a good point. The problem was emphasized in "3.Incorrect subquery column".

Subject: Re "Scalar user-defined functions - use Views
Posted by: JendaPerl (view profile)
Posted on: Friday, September 11, 2009 at 12:38 PM
Message: Re "Scalar user-defined functions" ... I think the easiest way to write and reuse such things is to define them as views.

CREATE VIEW TotalSales
AS
SELECT sku, SUM(sale_amount) AS total_sales
FROM Sales
GROUP BY sku
GO
SELECT P.sku, P.product_description, TS.total_sales
FROM Products AS P
JOIN TotalSales AS TS ON P.sku = TS.sku


I agree with calebsj79 on the "Overuse of cursors" solution. Your solution looks too complicated.

Subject: Re "Scalar user-defined functions - use Views
Posted by: Plamen (view profile)
Posted on: Friday, September 11, 2009 at 3:48 PM
Message: JendaPerl,

With views you cannot use parameters, and the point here was to demonstrate you can directly write a query without the need of function or if needs to be reused with parameters then it is better to use table-valued function.

As far as the solution provided by celebsj79 - I noted this is one of the options. However, I do prefer to use the ANSI update syntax. The SQL Server specific UPDATE with join has problem of non-deterministic updates on multiple matches, while the ANSI syntax generates error. My preferred approach would be to use the MERGE statement on SQL Server 2008.

Subject: Awesomeness
Posted by: Paul White NZ (view profile)
Posted on: Friday, September 11, 2009 at 10:29 PM
Message: Superb work Plamen.

Subject: Great article...
Posted by: Diana (view profile)
Posted on: Monday, September 14, 2009 at 8:46 AM
Message: ...thank you.
I've made some of these mistakes as a developer and I've seen them done by others as a code reviewer.
What about dynamic SQL? Despite the recent SQL injection attacks it seems that dynamic SQL made up from string concatenation (no parameters, no sp_executesql) is quite common...Maybe this is more a “bad practice” than a ‘’mistake”…
Another little annoyance is the lack of code formatting - of course this is not a mistake, but makes the reviewer's life very ugly sometimes..:)

Subject: Re: Dynamic SQL
Posted by: Plamen (view profile)
Posted on: Monday, September 14, 2009 at 9:22 PM
Message: Diana,

Overuse of dynamic SQL could be more of bad practice, but it becomes a mistake when it is incorrectly used and exposes to SQL injection attacks.

Code formatting and comments are very important for anyone who considers writing professional grade code, but unfortunately not on the top of the list for many coders...

Subject: Not to use dynamic queries
Posted by: avishwakarma (view profile)
Posted on: Friday, September 18, 2009 at 6:55 AM
Message: As you suggested for not using dynamic queries.

If i need to pass a comma-separated string in the stored procedure as a parameter. And this string will be used in the where clause. Then how to write an sql query without using dynamic SQL

Subject: Re: Not to use dynamic queries
Posted by: Plamen (view profile)
Posted on: Friday, September 18, 2009 at 9:10 AM
Message: avishwakarma,

There are different methods to handle this without the need for dynamic SQL. It depends a lot on the version of SQL Server you have. On SQL Server 2000 and prior you can use different techniques to parse the list to table format and then use is like a normal table. Here is a great article on the topic by Erland Sommarskog: http://www.sommarskog.se/arrays-in-sql-2000.html. On SQL Server 2005 you can use additional techniques that utilize the XML XQuery capabilities, or CLR. The following article explores that: http://www.sommarskog.se/arrays-in-sql-2005.html. On SQL Server 2008 you can use table-valued parameters to pass the list in table format. Here are more details about using table-valued parameters: http://msdn.microsoft.com/en-us/library/bb510489.aspx.

Subject: thumbs up!
Posted by: marleng (view profile)
Posted on: Saturday, November 3, 2012 at 12:19 PM
Message: Great job going over the main errors and how to handle them. This is a must read for anyone starting out on SQL. This is now in my team's tool box.

Subject: Useful article
Posted by: crisred (view profile)
Posted on: Thursday, May 9, 2013 at 3:37 AM
Message: Excellent article written by the author to express views and explanations related to the subject. Thanks for your great efforts.

 