When Stored Procedures Go Bad

It is far easier to work out what is wrong with a slow-running SQL query than one that shows erratic performance. DBAs will groan when a process occasionally performs terribly without apparent reason. There is always a good chance that this is a symptom of the painful problem of parameter sniffing. It can afflict database applications badly enough that SQL Server 2016 introduces an option to disable sniffing at the database level.

Let’s say you have a stored procedure or parameterized query that allows you the flexibility of providing a specific ProductID (PID) value or setting it to null, in which case you want data for all products. When the SQL Server optimizer creates or recompiles an execution plan for this query, it will “sniff” the input parameter value, let’s say we provided a value of 10. It uses column statistics to estimate how many rows might be returned for a PID of 10 (40 rows, say), selects an appropriate plan, passes it on for execution, and places it in the plan cache.

That last part is what occasionally causes trouble. If the optimizer sees the same query text again, just with different parameter values, it checks the plan cache and reuses the existing plan, if it’s in there. This is fine but if subsequent executions set the PID to NULL, and the Product table is large, then we’re now using a plan devised to return 40 rows for a query execution that actually returns thousands. That is likely to end badly. There are other common causes of this problem too, such as filtering on columns with skewed data distributions, or filtering on a date range and ending up with a cached plan for a very atypical range.

There’s no guaranteed way of avoiding this, but you can, during development, detect a procedure that is likely to cause problems. You first clear the cache to prevent the optimizer just grabbing the cached plan, and then capture the estimated execution plan for several different executions, using different parameter values. Estimated plans don’t get cached, and you’ll see the optimizer’s plan for each individual execution. If you’re seeing different estimated plans for some parameter values, that’s a warning sign. If it’s a flexible stored procedures that performs a range of duties, you should consider refactoring it. In any event, it’s an incentive to check how bad the performance gets if certain executions end up using the wrong plan, because that’s what is in the plan cache at that time.

This sort of testing can become tedious but you can automate it by using a test harness to capture the estimated plans for a wide range of parameter values. If you see different plans then depending on the frequency with which each one occurs, you have firmer evidence on which to base a decision to refactor the code, use an OPTIMIZE FOR hint, or even recompile the stored procedure on every execution. Of course, even this technique becomes impractical if the procedure accepts many parameters: you might end up needing to test millions of different possible executions.

I’m interested to hear how badly you’ve been stung by parameter sniffing problems, and any techniques you’ve used to gauge the extent of the likely problem, during testing.

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.

  • 5856 views

  • Rate
    [Total: 5    Average: 4.6/5]
  • Radu Gheorghiu

    What about using OPTION (RECOMPILE)?

    You could and I recommend (as well as Aaron Bertrand, below) to use OPTION (RECOMPILE) at statement level, instead of at procedure level.

    https://youtu.be/XUCxQkFoqpw?t=55m18s

  • Peter Schott

    We had this issue a bunch w/ a SQL 2005 server (no OPTION(RECOMPILE) there). Multi-tenant DB with some tenants much larger than the others. We’d sometimes get a very small client hit the system for a query and that then stuck in the cache. The larger clients would try to run the same query, but be unable to do so because the query was optimized for a very small dataset. We eventually worked around it with a hard-coded index hint to force all queries to use the index, whether SQL thought it needed it or not. Never saw that issue again.

    Going forward, we tried to run some larger and smaller workloads against a server that had an older production dataset. The sizes were still within the same range of production, but much larger than Dev/Test. We could estimate some queries taking much longer and would test those accordingly to make sure the system was fully optimized and able to handle whatever we threw at it.

  • willliebago

    This also happens on other database platorms and is time consuming to test. I like the idea of automating it using the test harness link! We’ve had some success using recompile and pinning plans.

  • rogerthat

    We are undergoing migrating from an in-house CRM to Salesforce and we have had to generate hundreds of thousands of test data that, due to PCI concerns, could not be scrubbed real data.
    One issue we found early on was due to repeat or missing data elements that would never happen in real life.
    In general, we have found that hand-tuning would work far better than automated tuning. Our production hits follow production flows that vary greatly — during one part of the month, we hit client subscription data, other parts, mail merge data.
    We do log our queries and performance information and have an automated alert when performance is under threshold. That triggers a deep dive into root cause and may or may not result in another round of tuning.
    That’s the best we’ve found.