Click here to monitor SSC

Tony Davis

Simple-Talk Editor
News, views and good brews

  • IT Admin for Thrill Seekers

    Posted Friday, April 27, 2012 11:01 AM | 5 Comments

    A developer suggested to me recently that the life of the DBA was, surely, a dull one. My first reaction was indignation, but quickly followed by the thought that for many people excitement isn't necessarily the most desirable aspect of their job.

    It's true that some aspects of the DBA role seem guaranteed to quieten the pulse; in the days of tape backups, time must have slowed to eternity for the person whose job it was to oversee this process, placing tapes into secure containers, ensuring correct labeling, and.sorry, I drifted off there for a second. On the other hand, if you follow the adventures of the likes of Brent Ozar or Tom LaRock, you'd be forgiven for thinking that much of a database guy's time is spent, metaphorically, diving through plate glass windows in tight fitting underwear in order to extract grateful occupants from burning database applications.

    Alas it isn't true of the majority, but it isn't as dull as some people imagine, and is a helter-skelter ride compared with some other IT roles. Every IT department has people who toil away in shadowy corners doing quiet but mysterious tasks. When you ask them to explain what they do, you almost immediately want them to stop, but you hear enough to appreciate that these tasks are often absolutely vital to the smooth functioning of an IT organization. Compared with them, the DBAs are prima donnas.

    Here are a few nominations:

    • Installation engineer - install all of the company's laptops and workstations, and software, deal with licensing, shipping and data entry.many organizations, especially those subject to tight regulation, would simply grind to a halt without their efforts.
    • Localization engineer - Not quite software engineering, not quite translation, the job is to rebuild a product in a different language and make sure everything still works.
    • QA Tester - firstly, I should say that the testers at Red Gate seem to me some of the most-fulfilled in the company. I refer here to the QA Tester whose job is more-or-less entirely to read a script, click some buttons and make sure the actual and expected values match.
    • Configuration manager - for example, someone whose main job is to configure build environments so that devs can access their source code; assuredly necessary for the smooth functioning and productivity of the team, and hopefully well-paid.

    So what other sort of job in IT should one choose if the work of a DBA proves to be too exciting? Or are these roles secretly more exciting than many imagine? I invite you all to put forward your own suggestions.

    Cheers,
    Tony.

  • Going Metro

    Posted Friday, April 13, 2012 10:53 AM | 11 Comments

    When it was announced, I confess was somewhat surprised by the striking new "Metro" User Interface for Windows 8, based on Swiss typography, Bauhaus design, tiles, touches and gestures, and the new Windows Runtime (WinRT) API on which Metro apps were to be built. It all seemed to have come out of nowhere, like field mushrooms in the night and seemed quite out-of-character for a company like Microsoft, which has hung on determinedly for over twenty years to its quaint Windowing system.

    Many were initially puzzled by the lack of support for plug-ins in the "Metro" version of IE10, which ships with Win8, and the apparent demise of Silverlight, Microsoft's previous 'radical new framework'. Win8 signals the end of the road for Silverlight apps in the browser, but then its importance here has been waning for some time, anyway, now that HTML5 has usurped its most compelling use case, streaming video. As Shawn Wildermuth and others have noted, if you're doing enterprise, desktop development with Silverlight then nothing much changes immediately, though it seems clear that ultimately Silverlight will die off in favor of a single WPF/XAML framework that supports those technologies that were pioneered on the phones and tablets.

    There is a mystery here. Is Silverlight dead, or merely repurposed? The more you look at Metro, the more it seems to resemble Silverlight. A lot of the philosophies underpinning Silverlight applications, such as the fundamentally asynchronous nature of the design, have moved wholesale into Metro, along with most the Microsoft Silverlight dev team. As Simon Cooper points out, "Silverlight developers, already used to all the principles of sandboxing and separation, will have a much easier time writing Metro apps than desktop developers".

    Metro certainly has given the framework formerly known as Silverlight a new purpose. It has enabled Microsoft to bestow on Windows 8 a new "duality", as both a traditional desktop OS supporting 'legacy' Windows applications, and an OS that supports a new breed of application that can share functionality such as search, that understands, and can react to, the full range of gestures and screen-sizes, and has location-awareness.

    It's clear that Win8 is developed in the knowledge that the 'desktop computer' will soon be a very large, tilted, touch-screen monitor. Windows owes its new-found versatility to the lessons learned from Windows Phone, but it's developed for the big screen, and with full support for familiar .NET desktop apps as well as the new Metro apps. But the old mouse-driven Windows applications will soon look very passé, just as MSDOS character-mode applications did in the nineties.

    Cheers,
    Tony.

  • Concurrent Affairs

    Posted Friday, March 16, 2012 11:15 AM | 4 Comments

    I once wrote an editorial, multi-core mania, on the conundrum of ever-increasing numbers of processor cores, but without the concurrent programming techniques to get anywhere near exploiting their performance potential. I came to the.controversial.conclusion that, while the problem loomed for all procedural languages, it was not a big issue for the vast majority of programmers. Two years later, I still think most programmers don't concern themselves overly with this issue, but I do think that's a bigger problem than I originally implied.

    Firstly, is the performance boost from writing code that can fully exploit all available cores worth the cost of the additional programming complexity? Right now, with quad-core processors that, at best, can make our programs four times faster, the answer is still no for many applications. But what happens in a few years, as the number of cores grows to 100 or even 1000? At this point, it becomes very hard to ignore the potential gains from exploiting concurrency. Possibly, I was optimistic to assume that, by the time we have 100-core processors, and most applications really needed to exploit them, some technology would be around to allow us to do so with relative ease.

    The ideal solution would be one that allows programmers to forget about the problem, in much the same way that garbage collection removed the need to worry too much about memory allocation. From all I can find on the topic, though, there is only a remote likelihood that we'll ever have a compiler that takes a program written in a single-threaded style and "auto-magically" converts it into an efficient, correct, multi-threaded program.

    At the same time, it seems clear that what is currently the most common solution, multi-threaded programming with shared memory, is unsustainable. As soon as a piece of state can be changed by a different thread of execution, the potential number of execution paths through your program grows exponentially with the number of threads. If you have two threads, each executing n instructions, then there are 2^n possible "interleavings" of those instructions. Of course, many of those interleavings will have identical behavior, but several won't. Not only does this make understanding how a program works an order of magnitude harder, but it will also result in irreproducible, non-deterministic, bugs. And of course, the problem will be many times worse when you have a hundred or a thousand threads.

    So what is the answer? All of the possible alternatives require a change in the way we write programs and, currently, seem to be plagued by performance issues. Software transactional memory (STM) applies the ideas of database transactions, and optimistic concurrency control, to memory. However, working out how to break down your program into sufficiently small transactions, so as to avoid contention issues, isn't easy. Another approach is concurrency with actors, where instead of having threads share memory, each thread runs in complete isolation, and communicates with others by passing messages. It simplifies concurrent programs but still has performance issues, if the threads need to operate on the same large piece of data.

    There are doubtless other possible solutions that I haven't mentioned, and I would love to know to what extent you, as a developer, are considering the problem of multi-core concurrency, what solution you currently favor, and why.

    Cheers,

    Tony.

  • A Community Cure for a String Splitting Headache

    Posted Friday, March 02, 2012 3:25 AM | 6 Comments

    A heartwarming tale of dogged perseverance and Community collaboration to solve some SQL Server string-related headaches.

    Michael J Swart posted a blog this week that had me smiling in recognition and agreement, describing how an inquisitive Developer or DBA deals with a problem. It's a three-step process, starting with discomfort and anxiety; a feeling that one doesn't know as much about one's chosen specialized subject as previously thought. It progresses through a phase of intense research and learning until finally one achieves breakthrough, blessed relief and renewed optimism. In this case, the discomfort was provoked by the mystery of massively high CPU when searching Unicode strings in SQL Server. Michael explored the problem via Stack Overflow, Google and Twitter #sqlhelp, finally leading to resolution and a blog post that shared what he learned.

    Perfect; except that sometimes you have to be prepared to share what you've learned so far, while still mired in the phase of nagging discomfort. A good recent example of this recently can be found on our own blogs. Despite being a loud advocate of the lightning fast T-SQL-based string splitting techniques, honed to near perfection over many years by Jeff Moden and others, Phil Factor retained a dogged conviction that, in theory, shredding element-based XML using XQuery ought to be even more efficient for splitting a string to create a table.

    After some careful testing, he found instead that the XML way performed and scaled miserably by comparison. Somewhat subdued, and with a nagging feeling that perhaps he was still missing "something", he posted his findings. What happened next was a joy to behold; the community jumped in to suggest subtle changes in approach, using an attribute-based rather than element-based XML list, and tweaking the XQuery shredding. The result was performance and scalability that surpassed all other techniques.

    I asked Phil how quickly he would have arrived at the real breakthrough on his own. His candid answer was "never".

    Both are great examples of the power of Community learning and the latter in particular the importance of being brave enough to parade one's ignorance. Perhaps Jeff Moden will accept the string-splitting gauntlet one more time. To quote the great man: you've just got to love this community!

    If you've an interesting tale to tell about being helped to a significant breakthrough for a problem by the community, I'd love to hear about it.

    Cheers,

    Tony.

  • To Not CI to Eye

    Posted Friday, February 03, 2012 11:21 AM | 6 Comments

    Many developers, including Troy Hunt, here on Simple-Talk, have argued persuasively that each database developer in a team needs to work as sole user of a dedicated database-development environment whilst creating or updating databases. Troy makes a good case, listing several shortcomings of the shared development model: developers are no longer free to experiment and pursue evolutionary design of the database layer, in the same way as any other component of the application, without unduly interfering with the other developers; without the required discipline of checking in changes for continuous integration, source control for the database is hard to enforce; it's very hard to run reliable tests with data dependencies when everyone is free to change the data structures and data.

    And yet our surveys suggest that around 80% of database development still takes place in a shared environment, and for valid practical reasons. It isn't just about the data, though for enterprise databases, holding perhaps terabytes of data, and with that test data generated by complex ETL processes, it is no trivial task to replicate this environment in each developer's sandbox database.

    Troy's model seemed to assume that developers within the shared model couldn't also access their own sandbox servers, and also that they had no source control. Source Control certainly can be done in the shared model, but just requires more supervision when changes appear that weren't in source control. Many of the consequences he described were due more to this lack of source control, rather than working together on a development server.

    Perhaps the strongest argument for the dedicated model, however, is the need to maintain a constant, stable base against which to develop, in which the DDL code in Source Control is used for Continuous Integration. However, does this CI argument really apply equally to the database as to the application? For any enterprise database of reasonable complexity, the up-front design effort should mitigate the need for continuous evolution of the database design, during development. Also, with many database changes being immediately validated, the value to be gained from dedicated development plus CI, to find out if you broke anything, is surely lessened?

    Could it be that many of the difficulties of integration and deployment that have been so thoroughly documented are more due to poorly thought-out application-interfaces within the database? Integration problems that are due to version differences are far more likely where this has been omitted from the design, and applications have unrestricted access through the database. Are we in danger of trying to change database development practices to try to cure a problem that is far easier to solve by proper system design?

    I'm sure many will disagree; as always I'd love to hear about it.

    Cheers,
    Tony.

  • SQL Server's Big Red Buttons

    Posted Friday, January 06, 2012 12:24 PM | 9 Comments

    One of the most reassuring aspects of watching a vintage James Bond film is the comfort of knowing that, just when there seems no further hope that the villain's plans for world domination will be thwarted, Bond will glance up at the wall and notice a big red button. Instantly, he knows that all he has to do is press it and the villain's lair will self-destruct messily, with plenty of pyrotechnics, and armed men being tossed into the air like rag dolls.

    Of course, you have to wonder why the technologists who built the lair put that big red button on the wall. It seems to be an irresistible urge, and one to which the creators of SQL Server are not immune, as @SQLPoolBoy noted last week in one of his tweets...

            "Looking at a database that has 99% fragmentation across the board. The cause, AutoShrink"

    ...a database brought to its knees by some poor soul who had accidentally hit one of SQL Server's big red buttons.

    You have to feel sorry for anyone who accidentally accepts the default database sizing and auto-growth settings, or turns on AutoShrink, or accidentally creates a collation conflict, or falls foul of any other of a host of 'Red Button' actions that can eventually lead to metaphorical pyrotechnics and DBAs being tossed in the air like dolls.

    SQL Server makes it very easy to tweak its various database- and server-level settings and so it's easy for inconsistency to creep in between database and servers, and it's easy for someone to unwittingly hit one of the red buttons. However, for the DBA managing tens of servers, it's not necessarily easy to find out which buttons have been pressed where, or to find best practice advice on how some of these settings really should be configured for each environment.

    However, help in various forms is slowly emerging. Brent Ozar has made publicly available his SQL Server Blitz script, which helps you verify some of the absolute fundamentals (Are backups being taken? Are DBCC checks being run?), and then seeks out a few of the more common red buttons, which may need deactivating.

    Then there is also SQLCop, a free community tool for "detecting common problems with database configurations and TSQL code". Named after a similar .NET tool (FxCop), it's broader in scope than the Blitz script, checking everything from configuration settings, to fragmented indexes, to missing Foreign Keys, to "code smells" in stored procedures. Red Gate has done some work with the makers of this tool, in incorporating some of their tests into the SQLTest tool, the idea being that the final step to resolving such problems may be automated testing.

    However, there is still much work to be done. What are your favorite "Red Button" actions in SQL Server? What is the best way to find and deactivate them before they cause havoc in your Server and databases?

    Cheers,

    Tony.

  • A suitable present, whatever one's past

    Posted Thursday, December 15, 2011 5:32 PM | 14 Comments

    Even DBAs have devoted aunts. They are probably also oblivious to the mental anguish they cause to their relatives in the run-up to Christmas. What would be a suitable gift for someone so deeply in the grip of technophilia that they can tell you the difference between ten apparently identical brands of Smartphone, and have a couple of them stuffed in their pockets? Who then use them to discuss with colleagues the finer points of query-plan caching or buffer-chaining?

    I've got it you cry.some amusing, techno-gadgets! Something they'd want to keep with them, at their desk at work, and maybe be reminded of the source of the gift. We've all seen them: the USB lava lamps, a Voltron USB drive, a USB-heated blanket, and toast hand warmers. No; this stuff might have some value to the type of geek that holes up in a drafty basement, secretly inventing new programming languages, or trying to break the latest record for overclocking a CPU. Many DBAs, however, would open the parcel with a frozen rictus of a smile, a gurgle of feigned delight.

    A bit more imagination is required. So, here is an idea: a USB Home Brew kit. Plug it into your server or laptop and, with a bit of control software, you can keep the fermentation temperature at a steady 68F, and watch proudly as the brew progresses. Plus, the gentle bubbles of fermentation coupled with the faint odour of malt will be more satisfying, and relaxing, than any lava lamp.

    Could I be wrong? I encourage you to submit your ideas for the most delightful, but yet-to-be-manufactured, Christmas gift for IT professionals. The winner will receive one each of every gift linked in this blog (or equivalent value voucher). Three runners up will receive a gift each.

    Since this is my final Simple-Talk editorial of 2011, I would like to thank everyone for their support over the year, and to wish you all a happy festive season and 2012. I will probably be back in the New Year, depending on how well my new USB Home Brew business takes off.

    Cheers,

    Tony.

  • Fair Comments

    Posted Thursday, November 10, 2011 4:23 PM | 17 Comments

    To what extent is good code self-documenting? In one of the most entertaining sessions I saw at the recent PASS summit, Jeremiah Peschka (blog | twitter) got a laugh out of a sleepy post-lunch audience with the following remark:

    "Some developers say good code is self-documenting; I say, get off my team"

    I silently applauded the sentiment. It's not that all comments are useful, but that I mistrust the basic premise that "my code is so clearly written, it doesn't need any comments". I've read many pieces describing the road to self-documenting code, and my problem with most of them is that they feed the myth that comments in code are a sign of weakness. They aren't; in fact, used correctly I'd say they are essential.

    Regardless of how far intelligent naming can get you in describing what the code does, or how well any accompanying unit tests can explain to your fellow developers why it works that way, it's no excuse not to document fully the public interfaces to your code. Maybe I just mixed with the wrong crowd while learning my favorite language, but when I open a stored procedure I lose the will even to read it unless I see a big Phil Factor- or Jeff Moden-style header summarizing in plain English what the code does, how it fits in to the broader application, and a usage example. This public interface describes the high-level process and should explain the role of the code, clearly, for fellow developers, language non-experts, and even any non-technical stake holders in the project.

    When you step into the body of the code, the low-level details, then I agree that the rules are somewhat different; especially when code is subject to frequent refactoring that can quickly render comments redundant or misleading. At their worst, here, inline comments are sticking plaster to cover up the scars caused by poor naming conventions, failure in clarity when mapping a complex domain into code, or just by not entirely understanding the problem (/ this is the clever part).

    If you design and refactor your code carefully so that it is as simple as possible, your functions do one thing only, you avoid having two completely different algorithms in the same piece of code, and your functions, classes and variables are intelligently named, then, yes, the need for inline comments should be minimal. And yet, even given this, I'd still argue that many languages (T-SQL certainly being one) just don't lend themselves to readability when performing even moderately-complex tasks. If the algorithm is complex, I still like to see the occasional helpful comment.

    Please, therefore, be as liberal as you see fit in the detail of the comments you apply to this editorial, for like code it is bound to increase its' clarity and usefulness.

    Cheers,

    Tony.

  • A temporary disagreement

    Posted Friday, October 28, 2011 10:54 AM | 18 Comments

    Last month, Phil Factor caused a furore amongst some MVPs with an article that attempted to offer simple advice to developers regarding the use of table variables, versus local and global temporary tables, in their code. Phil makes clear that the table variables do come with some fairly major limitations.no distribution statistics, no parallel query plans for queries that modify table variables.but goes on to suggest that for reasonably small-scale strategic uses, and with a bit of due care and testing, table variables are a "good thing". Not everyone shares his opinion; in fact, I imagine he was rather aghast to learn that there were those felt his article was akin to pulling the pin out of a grenade and tossing it into the database; table variables should be avoided in almost all cases, according to their advice, in favour of temp tables. In other words, a fairly major feature of SQL Server should be more-or-less 'off limits' to developers.

    The problem with temp tables is that, because they are scoped either in the procedure or the connection, it is easy to allow them to hang around for too long, eating up precious memory and bulking up the shared tempdb database. Unless they are explicitly dropped, global temporary tables, and local temporary tables created within a connection rather than within a stored procedure, will persist until the connection is closed or, with connection pooling, until the connection is reused. It's also quite common with ASP.NET applications to have connection leaks, as Bill Vaughn explains in his chapter in the "SQL Server Deep Dives" book, meaning that the web page exits without closing the connection object, maybe due to an error condition. This will then hang around in the heap for what might be hours before picked up by the garbage collector.

    Table variables are much safer in this regard, since they are batch-scoped and so are cleaned up automatically once the batch is complete, which also means that they are intuitive to use for the developer because they conform to scoping rules that are closer to those in procedural code. On the surface then, an ideal way to deal with issues related to tempdb memory hogging.

    So why did Phil qualify his recommendation to use Table Variables? This is another of those cases where, like scalar UDFs and table-valued multi-statement UDFs, developers can sometimes get into trouble with a relatively benign-looking feature, due to way it's been implemented in SQL Server. Once again the biggest problem is how they are handled internally, by the SQL Server query optimizer, which can make very poor choices for JOIN orders and so on, in the absence of statistics, especially when joining to tables with highly-skewed data. The resulting execution plans can be horrible, as will be the resulting performance. If the JOIN is to a large table, that will hurt.

    Ideally, Microsoft would simply fix this issue so that developers can't get burned in this way; they've been around since SQL Server 2000, so Microsoft has had a bit of time to get it right. As I commented in regard to UDFs, when developers discover issues like with such standard features, the database becomes an alien planet to them, where death lurks around each corner, and they continue to avoid these "killer" features years after the problems have been eventually resolved.

    In the meantime, what is the right approach? Is it to say "hammers can kill, don't ever use hammers", or is it to try to explain, as Phil's article and follow-up blog post have tried to do, what the feature was intended for, why care must be applied in its use, and so enable developers to make properly-informed decisions, without requiring them to delve deep into the inner workings of SQL Server?

    Cheers,
    Tony.

  • Time for a rethink on SQL CLR?

    Posted Thursday, September 29, 2011 12:10 PM | 14 Comments

    It is almost seven years since Microsoft announced the sensational news that, with their new SQL Server 2005, you could do data manipulation in Visual Basic or C# as well as SQL. The marketing people got overexcited, stabbing out clichés from their keyboards such as "new era", and "revolutionary". However, they had run off barking in a different direction to the technologists, missing the whole value of SQLCLR for providing specialized extensions to SQL Server

    Unsurprisingly, the technology was misunderstood and many DBAs have never recovered from their initial feelings of dread, still shuddering at the merest mention of the term "CLR function". In the opening passage of his recent review of SQL#, a suite of very useful SQL CLR functions for string manipulation, regular expressions, and more, Grant uses terms such as "throwing your arms up" "screaming", "crying out loud", "nuts" and "stupid", which seems a pretty fair assessment of the attitude of many DBAs toward CLR functions in SQL Server.

    Several factors have added to its bad reputation. Some of this is misuse, using SQL CLR for functionality that belongs in the application tier, rather than the database. Security issues have compounded the anathema toward CLRs. Much harm was done by poor implementation, mainly on the part of the coder, but also on the part of Microsoft and SQL Server.

    The worst problem is the complexity of writing effective CLR routines for high performance and scalability. The key to success is the constant streaming of data, using IEnumerable methods and the SQLDataReader, so that SQL Server never has to process too much data at once. The use, instead, of the ADO.NET dataset, and various custom user collections, can make place huge memory demands on SQL Server, involving heavy multi-page memory allocations in the non-buffer pool region. This issue, especially in 32-bit versions of SQL Server, with access to limited amounts of Virtual Address Space, was the cause of paging problems, and specific errors such as "App Domain marked for unload due to memory pressure", or "701" errors, where SQL Server fails to allocate a contiguous region of VAS for an operation.

    Unfortunately, SQL Server does not necessarily make it easy for the SQL CLR programmer. SQL CLR functions cannot use a SQLDataReader over the context connection, so the efficient IEnumerable-based streaming simply isn't available and the programmer is forced to use less efficient techniques, or to find ways around the problem, which ultimately involves classifying your assembly as EXTERNAL ACCESS, or possibly even UNSAFE.

    If you're someone like Adam Machanic, you persevere. His blog, and various presentations, detail his journey through writing custom IEnumerable classes to enable streaming, creating multiple threads in the SQL CLR to help deal with issues such as "rude aborts" and evolving all this into a powerful "query parallelizer" component.

    If all of this is enough to discourage the rest of us from writing our own SQL CLR components, it still doesn't mean we should deny ourselves access to the useful functionality and performance that they can provide. Professionally-written and properly-tested, SQLCLR libraries, such as SQL#, may cause us to refine or qualify the general anti-CLR position. Thorough testing by the likes of the author of SQL#, Solomon Rutzky, has shown that for functions that require heavy computational power, SQL CLR routines have obvious performance advantages over T-SQL. The SQL CLR gives us a "turbo-charged cursor". Any type of calculation that lends itself naturally to cursor-based logic, such as the running total calculation, will run faster as a SQL CLR.

    For accessing important CLR libraries such as the Regex functions SQL CLR assemblies are essential. There are a host of specialized requirements where CLR can replace the clunky and antiquated OLE Automation techniques, and there is the added value of allowing new complex data types such as vectors and coordinates. Yes, the technology has its quirks, but CLR libraries are worth having. In fact, unless you have kicked out the Hierarchy functions and geospatial functions, you already probably have CLR libraries installed!

    I'd be interested to hear if, and how, you're currently using SQL CLR, or at least if your attitude towards it is softening.

    Cheers,

    Tony.

  • The ALMs Race

    Posted Friday, September 16, 2011 11:43 AM | 13 Comments

    How did we get to the point where Visual Studio, and Team Foundation Server, so strongly influences our way of developing applications?

    Visual Studio's history is convoluted, and it is still expanding its remit. Microsoft originally bought in a prototype visual form generator. It was combined with the dialect of Basic originally developed for the Omega project (later Access) to create Visual Basic. Although the Xerox Star workstations were the pioneers of this sort of visual interface, Windows users had never seen an IDE like this; it was popular, and duly proliferated, with Microsoft releasing Visual C++, Visual Interdev, Visual Foxpro and so on. Eventually, Microsoft undertook Project Boston, which combined many of these tools into one product, and the result was Visual Studio 97.

    For many developers Visual Studio is a still just a convenient, visual tool with which to build and compile their application code. However, the first hesitant steps in expanding this role came with the acquisition of Visual SourceSafe, to provide a rudimentary source control system. Since then Microsoft has been working hard to reposition VS as an Application Lifecycle Management (ALM) tool; i.e. one that not only helps you create code, but also provides a means to support all the operational processes that are necessary to guide an application from initial development (agile, of course) through testing, to release and maintenance.

    Visual Studio 2005 Team System (rebranded in VS 2010 as Visual Studio ALM) and Team Foundation Server provides a central, source-controlled data store accessed by the client (VS) via a set of web services, and providing features to support project tracking, build management, deployment, reporting and so on.

    It's a strong industry trend amongst the larger players. IBM offers its "Rational" set of integrated ALM tools and Hewlett-Packard has its own ALM offering. The advantage is an integrated, consistent approach to all aspects of DevOps, allowing the team to focus on business requirements, writing good code, and delivering, regular, incremental improvements to the users. The downside is, of course, vendor lock-in. A disadvantage of being cocooned with within a single-vendor ALM solution is that you could end up having to use products like Visual SourceSafe.

    By adopting single-vendor ALM, you embrace the specific development model of your ALM platform and adapt established processes accordingly. Remote team-working, for example, isn't easy within VS without Hosted TFS, so you tend not to do it. There is no notion of off-line working; if the network goes down, you stop working, which can be a considerable restraint.

    You come to rely on that platform's particular tools, and styles of working, even if it doesn't really match your particular requirements. You may, for example, need a distributed rather than centralized source control system that allow for off-line working. A significant weakness in any one of the tools (bug tracking within TFS is not universally admired, for example) can frustrate the team and slow them down considerably. ALM components tend to be incompatible with any other but their own.

    ALMs have the allure of any shrink-wrapped solution, but even if they offer all the tools you need to in order to support development processes, they should, surely, have open APIs and more pluggable architecture so as to allow development teams to swap specific tools in and out in order to fit ALMs to their own requirements, rather than the reverse? This is the area into which the likes of Collabnet and Atlassian seem to be moving.

    I'd be interested to hear your thoughts.

    Cheers,
    Tony.

  • SQL Code Reuse: teaching a dog new tricks

    Posted Thursday, September 01, 2011 5:42 PM | 8 Comments

    Developers, by every natural instinct and training, strive to make their code reusable and generic. Dissuading them from doing so, in certain circumstances, is akin to trying to persuade a dog not to fetch a stick. However, when Gail Shaw commented on Twitter last week that "For the record, code reuse in SQL.is not always a good thing", it was more than a causal remark; it was borne out of bitter experience.

    The problem is that, in the absence of their usual armory O-O techniques such as encapsulation and inheritance, the price of making database code easier to maintain, by such obvious methods, can be high. The "generic" views, stored procedures and functions that result, may seem elegant and reusable, but can destroy performance, because it is tough for the query optimizer to produce an efficient execution plan. It hurts to make SQL code generic.

    At some point, nearly every SQL Programmer gets infected with the feverish idea of passing table names to stored procedures. "Hey, why write scores of procedures to do this process on each table when I can write a generic, reusable procedure that does it on any table!" Bad idea; behind every stored procedure is an execution plan and a stored procedure designed to work with "any table" will result in a generic execution plan that will perform very poorly for a majority of tables. It is far better if they are tailored for specific tables and specific needs.

    Another typical example is where the logic for some seemingly-complex calculation has been "abstracted" into a monstrous, but reusable, view, which performs tortuous aggregations and multiple joins, executes appalling slowly, acquires numerous long-held locks and causes severe blocking in the database. Often, such twisted logic can be replaced by simple, easily optimized SQL statements. Granted, it isn't "reusable" and flaunts the 'DRY' (Don't repeat yourself) principle, but it is relatively easy to write and will often perform orders of magnitude faster.

    User-defined Functions (UDFs) are another favorite mechanism for promoting code reuse, and are often even more problematic. In-line logic is always much faster, even if to the sensitive developer it has the look of hippos doing line-dancing. Memories of the overuse of UDFs can make any seasoned DBA flinch. If you ever bump into Grant Fritchey at a community event, buy him a beer and ask him about the case of the application with multi-statement UDFs that called other multi-statement UDFs in an attempt at enforcing inheritance in a database. Also ask him how well it scaled beyond a single-user and a single row.

    Should SQL Server simply get better at adopting and supporting such basic and universally-accepted programming practices as putting logic in a function? Probably, yes, but in the meantime, we must measure any code reuse in the database against the likely performance penalty.

    Perhaps the most effective form of code reuse is via constraints, though it requires lateral thinking to extend this beyond simple data rules. Functions can be used, but extra care and effort is required to write them as inline functions; in-line code or calculated columns will always outperform UDFs. Stored procedure use is to be actively encourage; just don't try to make them generic.

    On Simple-Talk we've published a lot about execution plans, query optimization and performance. We believe that, once a developer is aware of the process, they are better able to judge that fine balancing point in the compromise between performance and maintainability. Even better, we hope we've also given a glimpse of an alternative path to those goals, by means of intelligent database design. A neat trick, if you can do it.

    Cheers,

    Tony.

  • A DBA's best friend is his tempdb

    Posted Thursday, August 18, 2011 5:22 PM | 6 Comments

    There is a saying amongst welfare agencies that one can tell how well a family is functioning by looking at their dog. If the dog is neurotic, neglected or maltreated, one fears for the welfare of the children. Likewise, you can tell a lot about the skills of a team of DBAs and developers by looking at the tempdbs on their servers.

    The tempdb database is available to all users of a SQL Server instance to house temporary objects such as cursors and tables, and it is where SQL Server creates various internal objects for sorting and spooling operations, as well as index modifications. It can get really busy in there, especially if there are unruly processes. The wise DBA will look after tempdb, giving it plenty of space, making sure it is never mistreated. In short, a happy, moist-nosed tempdb is the mark of a nurturing DBA.

    By default, tempdb will be installed, with the other system databases, on the c:\ drive of the SQL Server machine. This is far from ideal in almost all cases. Tempdb requires a lot of space, pre-allocated so files aren't constantly growing. You need more than one tempdb data file; one data file per CPU core is a common recommendation. These files need to be located on drives with the highest write performance possible; a RAID 10 array is a good choice. Also, tempdb storage is one area where Solid State Disks are becoming a popular storage choice in preference to conventional magnetic drives, again due to their vastly higher write performance.

    Of course, not all DBAs can afford the luxury of RAID 10 arrays and expensive SSDs. In such cases, special training and vigilance is required. The developers, who exercise the dog, must be encouraged away from complex, unwieldy routines. Structured activity is best, breaking down routines into a series of well-defined steps, and storing intermediate results in a set of explicit temporary tables. In this way, tempdb usage patterns become much easier to predict. Also these tables will hopefully be cached, and the bigger ones can be indexed, both of which will help reduce contention. The DBA must diligently monitor the health of tempdb, detecting the SPIDs of wild processes, investigating and killing them if necessary (the SPIDs, not the developers).

    Even given all this, one can't help feel that Microsoft could do more to prevent a lot of tempdb agony. After all, it feels like an area that's going to get worse rather than better, especially as use of Snapshot isolation becomes more prevalent. Aaron Bertrand has filed several tempdb-related Connect items urging Microsoft to offer better advice during the installation process, but to little avail. Judging by the number of forum questions relating to the often-painful process of moving tempdb and reallocating disk space, more action is needed.

    Some early adopters have their eyes on the new Contained Databases feature in Denali, hopeful that the concept of user databases independent of "plumbing features" like logins and roles may be a move toward selective workspaces for user processes.

    In the meantime, how healthy is the dog in your server? If you've got tales of exemplary nurturing or scandalous abuse, we'd love to hear them. The best story will, as always, win a prize.

    Cheers,

    Tony.

  • Bug Me Not

    Posted Friday, July 08, 2011 10:10 AM | 4 Comments

    Bug metrics are a notoriously erratic way to judge the performance of a development team and project, but despite this almost all software projects use them. There is a lot of data you can get from an electronic bug-tracking system, from bugs per lines of code, bugs per component, to defect trend graphs and bug fix rates. It is tempting to try to find meaning in the data, but how useful is this data, ultimately, in driving up software quality, in the long term?

    If you judge software testers on the number of bugs that they find, then more bugs will be found. If you judge developers on the number of bugs for which their code is responsible, then you'll get much less buggy code, but you'll probably struggle to ship a product on any reasonable timescale. Over the course of a project, it's easy for the team and even individual developers to feel oppressed by the bugs, and under intense pressure to produce 'better' code. Bugs continue to be logged and reported assiduously, but many of them simply disappear into the backlog, to be fixed "at some later time". As the pressure of the ship date mounts, developers are simply forced to cut corners, to change their perception of what "done" really means, in order to increase their velocity, and meet the deadline. Software quality and team morale suffers as result, and despite being rigorously tracked and reported, bugs fester from release to release, since there is never time to fix them. Before long the team finds itself mired in the oubliette.

    So how can we use bug metrics to drive up software quality software, over the long term, while enabling the team to ship on a predictable and reasonable timescale? In all likelihood, the surprising answer is "we can't". In fact, the ultimate goal of an agile development team might be to dispense with the use of an electronic bug tracking system altogether!

    Certainly at Red Gate, some teams are using JIRA for incoming customer defects, but also a more holistic "technical debt wall", consisting of post-it notes describing the most important issues causing "drag" on a team. They then collectively seek to resolve these issues, whilst striving to remain close to zero new internal defects.

    The team works to cultivate an atmosphere of zero tolerance to bugs. If you cause a bug, fix it immediately; if you find a bug in the area in which you're working, tidy it up, or find someone on the team who can. If you can't fix a bug as part of the current sprint, decide, very quickly, how and even if it will be fixed. This is not easy to achieve; it requires, among other things, an environment where it is "safe" for the team to stop and fix bugs, where developers and testers work very closely together, and both are strongly aligned with the customer, so they understand what they need from the software and which bugs are significant and which not.

    However, when you get there, what becomes important is not the number of bugs, and how long they stay open in your bug-tracking system, but a deeper understanding of the types of bugs that are causing the most pain, and their root cause. The team are then judged on criteria such as how quickly they learn from their mistakes, by for example, tightening up automated test suites so that the same type of bug doesn't crop up time and again, or by improving the acceptance criteria associated with user stories, so that the team can focus on fixing what's important, as soon as the problem arises.

    These are criteria that really will drive up software quality over the long term, and allow teams to produce software on predictable timescales, and with the freedom to feel they can "do the right thing".

    What do you think? Is this a truly achievable goal for most teams, or just pie-in-the-sky thinking?

    Cheers,
    Tony.

  • Cloud Backup: Getting the Users' Backs Up

    Posted Friday, June 24, 2011 2:40 PM | 4 Comments

    On Wednesday last week, Microsoft announced that as of July 1, all data transfers into its Microsoft Azure cloud will be free (though you have to pay for transferring data out). On Thursday last week, SQL Azure in Western Europe went down. It was a relatively short outage, but since SQL Azure currently provides no easy way to take a standard backup of a database and store it locally, many people had no recourse but to wait patiently for their cloud-based app to resume. It seems that Microsoft are very keen encourage developers to move their data onto their cloud, but are developers ready to do it, given that such basic backup capabilities are lacking?

    Recently on Simple-Talk, Mike Mooney described a perfect use case for the Microsoft Cloud. They had a simple web-based application with a SQL Server backend; they could move the application to Windows Azure, and the data into SQL Azure and in the process free themselves from much of the hassle surrounding management and scaling of the hardware, network and so on. It was a great fit and yet it nearly didn't happen; lack of support for the BACKUP command almost proved a show-stopper. Of course, backups of Azure databases are always and have always been taken automatically, for disaster recovery purposes, but these are strictly on-cloud copies and as of now it is not possible to use them to them to restore a database to a particular point in time.

    It seems that none of those clever Microsoft people managed to predict the need to perform basic backups of Azure databases so that copies could be stored locally, outside the Azure universe. At the very least, as Mike points out, performing a local backup before a new deployment is more or less mandatory.

    Microsoft did at least note the sound of gnashing teeth and, as a stop-gap measure, offered SQL Azure Database Copy which basically allows you to create an online clone of your database, but this doesn't allow for storing local archives of the data. To that end MS has provided SQL Azure Import/Export, to package up and export a database and its data, using BACPACs. These BACPACs do not guarantee transactional consistency; for example, if a child table is modified after the parent is copied, then the copied database will be in inconsistent state (meaning, to add to the fun, BACPACs need to be created from a database copy). In any event, widespread problems with BACPAC's evil cousin, the DACPAC have been well-documented, and it seems likely that many will also give BACPAC the bum's rush.

    Finally, in a TechEd 2011 presentation tagged "SQL Azure Advanced Administration", it was announced that "backup and restore" were coming in the next SQL Azure CTP. And yet this still doesn't mean that we'll get simple backups as DBAs know and love them. What it does mean, at least, is the ability to restore any given database to a point in time within a 2-week window.

    For the time being, if you want a local copy of your data and don't want to brave the BACPAC, one is left with SSIS or BCP, creative use of schema and data comparison tools, or use of SQL Azure Backup (currently in beta) in order to perform this simple but vital task.

    Cheers,

    Tony.

More Posts Next page »
Latest articles
A first look at SQL Server 2012 Availability Group Wait Statistics
 If you are trouble-shooting an AlwaysOn Availability Group topology, a study of the wait statistics... Read more...

SQL Server Prefetch and Query Performance
 Prefetching can make a surprising difference to SQL Server query execution times where there is a high... Read more...

SSIS Basics: Setting Up Your Initial Package
 When working with databases, the use of SQL Server Integration Services (SSIS) is a skill that often... Read more...

Checking Out SQL Backup Pro 7’s New Automatic Backup Verification
 Wouldn't it be great to offload the daily chore of checking the integrity of your production... Read more...

Chuck Lathrope: DBA of the Day
 Chuck Lathrope was a finalist for the Exceptional DBA of the Year award in 2009. We contacted him to... Read more...