Jodie Beay Makes an Uncontrolled Release
Jodie Beay was checking all her database servers as a routine. One of the important checks to start her day was to make sure that all the overnight jobs had finished. It sounds strange, I know, but overnight jobs, such as ETL, reporting, clearing out historic data, and reconciliation just seem to grow and grow. Because of the interdependencies within the data, these overnight jobs have to be done in a particular order and if one job gets delayed, all the jobs behind it get delayed. It’s like the check-in queue at an airport where it only takes a couple of complicated queries and the whole queue has to wait. In this case, the overnight jobs on the production Forex database were still running and none of the daily reports had gone out to the dealers. This spelled trouble.
Quickly, Jodie scanned the application logs. This told her that a particular table had grown to the point where the query that once took 30 milliseconds had taken half an hour due to repetitive table scans. The routine was being called in several parts of the batch process. She sighed: Blooming Devs. Quickly, she created an appropriate covering index, and went back to her task of checking databases whilst the batch process cleared itself, rather like a drain unblocking.
Derek and the Domino Effect
Later that day, Derek, the developer, was making some changes to the GetLatestExchangeRate stored procedure in the Forex database. He’d done the work on his development machine and checked his changes into SQL Source Control. He then deployed the version of the database from the code in Version Control to the Quality Assurance (QA) environment using SQL Release. This was done to ensure that the tests were run on the code that was in source control. It also meant that the database that was tested became the production database after a successful deployment. The deployment to QA went without a hitch, and the automated test process kicked in.
Once the automated tests had run, Derek then tried to deploy to the Staging environment. He first made sure that Staging was in an identical state to the production database. He then created a new release with the Staging database as the target to be updated, and the code in Version Control as the source. This seemed to work fine. But when Derek reviewed the SQL Release change report he noticed something surprising. As part of the deployment, there was going to be a change to a table that nobody in development had touched. Someone had added an index to the ExchangeRate table in the production database. The index would be dropped because it wasn’t present in Version Control. This revealed that the state of the production database had changed without being recorded in Version Control – It had ‘drifted’ from a known version.
There was some perplexity in the development team. QA performance-testing hadn’t shown up a requirement for this index, but they decided that they ought to keep this change because this additional index definitely improved performance. But because this index was not checked into version control, it was not in the version to be deployed, and so it would have been dropped as part of any deployment had it not been pointed out by the deployment system.
The SQL Release change report showed that the ExchangeRate table had been modified, even though Derek had only made a change to the GetLatestExchangeRate stored procedure:
Expanding the ExchangeRate table modification in the report showed the index being deleted as part of the deployment. The relevant sections of the change script are prefixed with minus signs and highlighted in red:
Not Deployed? Phil’s Annoyed.
“Jodie? We’ve asked around the Forex team to find out who added an index and nobody has claimed it. Was it you?”
Jodie flushed slightly. “I had no choice. The overnight batch was still running at 8.15 this morning and the dealers weren’t going to get their Forex fluctuation reports. I was going to tell you but I then hit a hardware problem with another server and that took priority. I’ll email you the script.”
Phil was scowling at the screen when Derek came over. “Derek, what happened to the deployment this morning? I thought we were doing Continuous Integration, rather than Sporadic Integration.”
“It’s the old problem, Phil.” Derek paused. “We can test the database code until we collapse with exhaustion, but the problems show up on the production server because the batch overnight processes aren’t part of the database. They are in SQL Agent. We can’t even see when things go wrong, let alone test them in development. Forgive me repeating myself but we need access to the production server just to see what’s going on. Jodie was right about doing that emergency fix. But we should have detected it immediately and fed it into source control without her needing to tell us she’d done it.”
Phil sighed. The agent jobs were a mess. There was some Forex code in there, but a lot of BI stuff and some of the ETL code was provided by a third team. They were all hitting the ExchangeRate table even though they all needed the same aggregated information for the previous week’s fluctuations. It was difficult to test this stuff but the team should have spotted that missing index.
“Derek, we’re not going to start ‘hunting in packs’. We’ve also got to keep to the principal that the only way we can access the production server is through SQL Release. This stuff is already monitored by production. All we need are the right reports. I bet that nobody in the Dev team have figured out what information they really require.”
The colour of Derek’s face told Phil everything he needed to know.
“Let me sort this, Derek. In the meantime, let’s get the current problem sorted so we can do a deployment with Jodie’s index included. And another thing, let’s make sure that the drift is detected and fixed before we attempt to deploy to staging. We’ve wasted an automated test session and messed up a deployment.”
The Enjoyment of Deployment
Derek’s mailbox went ‘ping’, and there was an email from Jodie with the code for the index. Derek checked this into source control and then deployed to QA and reran the tests.
He contacted Jodie and asked her to drop the index on the production server just before he reran the deployment process. This time when he deployed, to staging by migrating a backup of the production database to the new version, the index was deployed as part of the release. The change report now had the ExchangeRate table marked as being modified by the release as expected. From Staging, the final deployment went through without a hitch.
This solved their immediate problem. Next, how could they stop this from happening again?
Having DLM Dashboard installed would have been useful because it would have notified the team as soon as Jodie had made her change to the production database. This would have prompted them to sort out the problem before they even tried to make a database deployment. For example, the production drift would have been flagged up on the DLM Dashboard homepage:
By clicking the Review button, the developers could have reviewed this change in the associated drift report:
Phil phoned Jodie. “Phil, you reprobate, how can you hinder me today?”
Phil laughed. “Well, making me God on all your systems might satisfy my ego, but in the meantime there are a couple of daily reports you could send to the Forex team.”
“Hmm. That overnight batch job will fall apart one day, if we keep adding reports.”
“Nah. You already do a ‘long-running queries’ report, and we just need a copy of yesterday’s application log which should tell us how long the main part of the batch takes. Also, how about the ‘Missing Indexes’ warnings from the Plan Cache?”
“Phil, if it wasn’t you, I’d make you grovel a bit and might even make you fill in a change request to production, but as it is… Hmm let’s see I’m really busy.”
“OK. A week’s supply of Cappuccino.”
“I suddenly notice I have a cancellation this afternoon. I’ll do it then.”
Putting an Interface in Place
A few days later Derek and Riza were doing the day’s deployment when Phil wandered over. “All well now?”
Derek pulled a face. “Well, yes, I’d say so. At least the deployments are now so regular that Jodie can pass her fixes over to us to put in production. We now have all the information we need from those reports to home-in on missing or incorrect indexes, so I doubt she’ll ever need to make that kind of change again. What bothers me is that the overnight batch jobs are a mess. We’re just papering over the cracks.”
Phil laughed. “Yes. The BI guys had got hold of sysadmin rights on the production server and were going in there and adding reports on SQL Agent without thinking it through. They’ve been kicked off by Jodie, but only on the promise that you’d provide an interface for the information they need about the last week’s Forex fluctuations. You need only do the aggregations once and just serve them as a result when they call a stored procedure. They are probably going to come and see you about help to get their testing automated. You can probably delete Jodie’s index once you’ve done all that.” He caught Derek’s glare. “No, only kidding!”