|
|
Simple-Talk Editor
News, views and good brews
-
Posted Friday, February 05, 2010 10:54 AM |
The English language has, within a lifetime, emerged as the ubiquitous 'international language' of scientific, political and technical communication. On the one hand, learning a single, common language, International English, has made it much easier to participate in and adopt new technologies; on the other hand it must be exasperating to have to use English at international conferences, or on community sites, when your own language has a long tradition of scientific and technical usage. It is also hard to master the subtleties of using a foreign language to explain advanced ideas. This requires English speakers to be more considerate in their writing. Even if you’re used to speaking English, you may be brought up short by this sort of verbiage… "Business Intelligence delivering actionable insights is becoming more critical in the enterprise, and these insights require large data volumes for trending and forecasting" It takes some imagination to appreciate the added hassle in working out what it means, when English is a language you only use at work. Try, just to get a vague feel for it, using Google Translate to translate it from English to Chinese and back again. "Providing actionable business intelligence point of view is becoming more and more and more business critical, and requires that these insights and projected trends in large amounts of data" Not easy eh? If you normally use a different language, you will need to pause for thought before finally working out that it really means … "Every Business Intelligence solution must be able to help companies to make decisions. In order to detect current trends, and accurately predict future ones, we need to analyze large volumes of data" Surely, it is simple politeness for English speakers to stop peppering their writing with a twisted vocabulary that renders their writing inaccessible to everyone else. It isn’t just the problem of writers who use long words to give added dignity to their prose. It is the use of Colloquial English. This changes and evolves at a dizzying rate, adding new terms and idioms almost daily. As such, it is almost a new and separate language. By contrast, ‘International English', is gradually evolving separately at its own, more sedate, pace. As such, all native English speakers need to make an effort to learn, and use it, switching from casual colloquial patter into a simpler form of communication that can be widely understood by different cultures, even if gives you less credibility on the street. Simple-Talk is based, at least in part, on the idea that technical articles can be written simply and clearly in a form of English that can be easily understood internationally, and that they can be written, with a little editorial help, by anyone, and read by anyone, regardless of their native language. Cheers, Tony.
|
-
Posted Thursday, January 21, 2010 2:50 PM |
In a fascinating interview in this issue of Simple-Talk, Don Syme discusses his work on the F# programming language, its journey from it academic roots in ML and Ocaml, to its burgeoning status as the hottest new thing in .NET. A central pin in the forthcoming "parallel processing, multi-core revolution", it may or may not be, but F# has certainly captured people's attention as a simple, multi-purpose language that can be used for anything from performing complex calculations, building websites, or as a powerful scripting language. I've been slow to latch on to F#, but it was this apparent enthusiasm for F# as a scripting language that finally prompted me to take a look, along with an observation by Simple-Talk's Laila Lotfi that it was a language "that is surprisingly intuitive for any SQL Developer". Could this possibly be the "unifying" scripting language for developers and DBAs? Being essentially a functional language, F# has a markedly different philosophy to an imperative language such as C#. Instead of providing a detailed set of instructions that must be followed to achieve the desired result, F# applies a series of transformations and filters (in the form of functions). This certainly sounds very "SQL like" in its approach, but how does it work in practice? Coincidentally, Chad Miller wrote a small blog post on SQLServerCentral showing how to script out all the tables in a SQL Server database using F# and SMO. Leaving out the few lines required to import the SMO libraries and so on, the code is as follows: let svr = Server(@"ServerName\InstanceName") let db = svr.Databases.["DatabaseName"] for t in db.Tables do for s in t.Script() do printfn "%s" s;; It is certainly a very clean, short, uncluttered script, but it isn't, as the author acknowledges, very F#-like. One of the strengths, though some would argue weaknesses, of F# is that it allows many different ways to tackle the same problem. Here, we simply waddle through the usual for loop, scripting out the contents of the SMO Tables object. One could write a very similar-looking script in PowerShell or Python. If we express the script in a way more idiomatic of the F# language, it might look something like this: db.Tables |> Seq.cast |> Seq.collect (fun (t:Table) -> t.Script() |> Seq.cast) |> Seq.iter (fun s -> printfn "%s" s);; What this code does, in simple terms, is get a list of tables into a parameter of type table (t:Table), and then apply a pipeline of "transformations", scripting out each table, concatenating the results (seq.collect) and then "selecting" each one out (seq.iter) and printing it. The seq.cast function translates the Ienumerable type that's delivered by SMO into something F# can use. As a fully-fledged part of the .NET framework, one would hope that over time the need for these sorts of translations will disappear. Overall the code is still short, at least, but certainly looks a lot less intuitive at first glance! However, if you start to think of those pipeline characters (|>), which funnel values from one transformation to the next, as akin to connecting arrows in a SQL execution plan, then perhaps one can start to see how, after a little practice, this style of scripting may come fairly naturally to the SQL developer and DBA. Or maybe not! Perhaps I will forever hope in vain for a common scripting language to unite DBAs and developers, but I do see some potential in F# and, as powerful as PowerShell undoubtedly is for database maintenance, you can't reuse what you've learned to build websites or "bond" with your developers. How proud would developers be of their DBA if he scripted database tasks using F#?! As always, we love to hear your thoughts and objections, so please add them as comments. This week, a $50 Amazon voucher goes to randyvol for his excellent contribution to the "Reckless Drivers" blog. Cheers, Tony.
|
-
Posted Thursday, January 07, 2010 11:53 AM |
When I first joined the industry in the late 90's, Microsoft was in the process of shunting the ODBC driver into the background in favour of OLE-DB and ADO. The ODBC driver was still included in Microsoft' Windows MDAC framework, but it had gained a rather unfair reputation for obtuse connections strings, unreliability and slow data access. They had to include ODBC, as it was and remains, the only way to connect to the more exotic data sources. Most VB/ASP developers adopted ADO and OLEDB, as they were promised enhanced performance. The connection strings were hardly any less obtuse but developers persevered. Lucky .NET programmers are now provided with ADO.NET, which is the central database access machinery and allows you to connect with SQL Server via OLEDB, SQLClient, or ODBC. However, things are more confusing for those stuck with unmanaged applications. When SQL Server 2005 appeared, Microsoft chose not to update MDAC to support the new features and data types. Instead, they introduced SQL Native client (SQLNCLI), offering combined ODBC and OLEDB functionality in a single DLL and included support for all the new data types – UDTs, XML, varchar(max) – as well as snapshot isolation, Multiple Active Result Sets, and so on. Any stuffy old unmanaged ADO /MDAC apps that didn't need access to these exciting new features could continue to use MDAC, which Microsoft hurriedly renamed to 'Windows DAC'; otherwise the apps had to be migrated to the SQL Server Native Client. It does seem that the native client is more powerful and reliable than the MDAC ones, which had a tendency to "break" an application every time a new Windows service pack was applied. Unfortunately, those brave souls who chose to move from MDAC to the native client, in order to exploit the new features of SQL Server entered a minefield. While MDAC was tolerant of less than strict adherence to the specs (such as starting parameter names with "@"), the native client isn't, and will issue errors. There are also all sorts of subtle variations in behaviour to deal with, in regard to connection strings, dealing with failed connections, warning and error handling, and even in basic transaction handling. And it isn't as if the support for new features is complete. Yes, the native client supports the xml data types, and FOR XML queries…but that's it. No other part of XML is supported by the native client. You have to use SQLXML for this. How did we find ourselves in this mess on an issue as fundamental as database connectivity? Ten years ago Microsoft lost enthusiasm for the ODBC standard, and it is left mainly to "sticks in the mud" like Phil Factor to point out that what we dismissed was actually worth having: a fast, truly open standard for accessing all data sources. If it had been properly nurtured and developed by Microsoft so that it supported the full standard, it would doubtless have been loved by all. Instead new drivers come and go, along with new standards for data access, such as LINQ and EF, so that people are left paralyzed by choice, and unable even to safely upgrade their applications to support the newest version of the database. Cheers, Tony.
|
-
Posted Friday, December 11, 2009 10:05 AM |
I've always taken comfort in the fact that T-SQL belongs to an enlightened age of computer languages in which the operations that were specified were close to real language. When one sees a command called DBCC_CHECKDB, then one could feel pretty confident that it checks your database for any problems. Or so I thought…until I discovered DBCC DBREPAIR. If you're feeling confident, without looking it up, that DBREPAIR will perform some sort of repair work on a damaged database, then you're in for a shock. With an extraordinary irony, the command does the opposite of what you might think and destroys your database entirely. It is like using a Kalashnikov rifle to cure a headache. The DBCC DBREPAIR command has been in SQL Server and Sybase from the start and one can still find the occasional forum reference to a time when it clearly did have some use for 'repairing a database that had failed another DBCC check'. However, over time, the command was 'simplified' to the point where it meant 'please drop my database from a great height'. Use it and you need never again worry about that torn page in your database! Mercifully, the DBREPAIR command has been 'repaired' to the Grim Deprecator. However, its existence is a salutary lesson to all who are lulled into such a sense of confidence in the straightforward nature of T-SQL. When it lobs you a curve ball, it can have a devastating effect. And what next one wonders? What other tools and procedures could Microsoft provide to keep under-worked DBAs on their toes? Perhaps a DBCC CLEANTABLE command that relieves your database of its messy tables? Or a DBCC DBRECOVER command that pitches your healthy Database into 'recovery mode'? I planned to finish this editorial with further examples of amusingly paradoxical Transact SQL. However, it appears at this point as if some celestial DBA has mistakenly done a DBCC DBREPAIR on my brain, so consider it the Simple-Talk Christmas challenge. The best example of paradoxical T-SQL, added as a comment the editorial blog, will win a very nice iPod Nano.
|
-
Posted Thursday, November 26, 2009 12:05 PM |
Tools such as Visual Basic (VB) and Visual FoxPro succeeded and became wildly popular because they provided simple and effective abstractions of a programming language. This allowed relatively inexperienced developers to rapidly prototype and produce small business applications. The .NET languages and platform that superseded them have grown rapidly in complexity and this sort of simple abstraction has proved much more elusive. Web Forms attempted to abstract the complexities of the stateless Web model and to speed up the development of Rich Internet Applications (RIA), by making them work more like a traditional desktop application. This was done to minimise the culture-shock for those VB 6 / ASP developers who did not have a strong background in HTTP/HTML Web development. In the event, it confused the VB6 code-jockeys, and presented a complexity-overkill for the seasoned Web developer. The problems seem to stem from the fact that the event-driven model, which creates state where there is none, and relies heavily on Postback, is fragile and makes needlessly complex the seemingly- straightforward task of rendering HTML reliably to a browser. Criticisms levelled at Web forms include difficulty in testing, bloated ViewStates, a lack of standards-compliance, which made cross-browser support painful, and generally sucking all the fun out of web programming. As such, the advent of ASP.NET MVC (Model-View-Controller) is a blessed relief to many. It is based on solid and well-established principles of “separation of concerns”, and actually simplifies the process of rendering standard HTML to a range of browsers. And yet, as presented by Scott Guthrie in his recent DevConnections keynote, and reported recently by Laila Lotfi on Simple-Talk, it is aimed squarely at the experienced programmer who wants “full control over their HTML”, and at large-scale applications. If ASP.NET MVC is for the experienced programmer only, and Web Forms are fundamentally flawed, or as Rob Conery puts it are “an abstraction wrapped in deception covered in lie sauce presented on a plate full of diversion and sleight of hand”, then where does the small shop turn for a simple and rapid development platform for RIAs? It seems that, for these people, the answers may lie elsewhere. Ruby on Rails (RoR), also based on an MVC design, has grown rapidly in popularity due to a simplicity and clarity of structure that often seems to elude Microsoft frameworks. Likewise, PHP has always been massively popular for its relative simplicity and ubiquity, and it now has it own CakePHP framework, which begs, steals and borrows more than a few tricks from RoR. Similarly, the Python fraternity is warming to Django. On top of this, there are ExtJS, jQuery, MooTools, Dojo, Rialto, Qooxdoo, and a host of other JavaScript platforms, offering an easy route to RIA paradise. A generation of entrepreneurs grew successful businesses using simple RAD tools such as VB and FoxPro. These simple tools don’t seem to exist for the next generation of predominately web-based developers, at least not in the Microsoft platform. As always, we’d love to hear what you think. Many thanks to everyone who contributed to the previous Scalar UDFs editorial; the Amazon voucher goes to BuggyFunBunny for an interesting peak into a possible future direction. Cheers, Tony.
|
-
Posted Friday, November 13, 2009 10:24 AM |
Many developers seem to regard SQL Server as if it were a science-fiction alien planet where unsuspecting crew-members in blue jumpers occasionally die horribly; everything is suddenly unsafe, and potentially malicious: nothing really works properly and so any serious code should be kept well away from it. Is this developer ignorance, or is their fear justified?
The answer is, of course, somewhere in between; but SQL Server certainly does itself no favors by tolerating, and avoiding, features with ‘known problems’, rather than fixing them. It continues to ignore issues such as the performance of scalar UDFs. The problem is essentially that the optimizer deals with them very inefficiently. It generally can't decompose and rearrange the function as if it were inline, so the function gets executed once for every single row returned. This often has a huge performance impact.
Very recently, I had a conversation with a developer who had encountered exactly this problem during a commercial development project. It was a very familiar tale. As he said it to me, he was just a .NET guy, not a "SQL Server expert", but it was clear that he was simply astonished that something as fundamental as encapsulating reusable logic in a function could cause such drastic performance problems in SQL Server.
The problem has been known ever since UDFs were introduced but it has never been resolved, despite an apparently reasonable and supported Connect request. The reason for Microsoft dragging its feet on this issue appears to be that the workaround is relatively straightforward. It sometimes takes a bit of thinking and ingenuity but, in most cases, a developer can turn a scalar UDF into an inline UDF, as documented by Itzik Ben-Gan, among others.
Seasoned Database Developers just know about the problem and avoid it. One can blame developers that fall foul of the dangers of UDFs for not making the effort to understand how SQL Server works before using it, but on the other hand, if SQL Server cannot make even these basic concessions to universally-accepted ideas of sound programming practices, then maybe it can't complain about the feeling of apprehension amongst developers who use it only occasionally. UDFs are not the only flawed feature that adds to the perception that SQL Server is quirky. For years, error handling was another, lack of good documentation tools, and so on.
It isn’t even a good idea for seasoned Database Developers to just know about the problem and avoid it. Problems such as UDFs and SELECT..INTO get mythologized, and DBAs start to flinch from even the most harmless usage of flawed parts of the language and avoid features long after they are fixed. Given how persistent such myths can be, even if Microsoft fixed scalar UDFs tomorrow it could still be another 10 years before the "functions don't work in SQL Server" myth worked its way out of the system, and developers and DBAs were convinced that they were safe to use again.
Cheers,
Tony
|
-
Posted Wednesday, October 28, 2009 12:09 PM |
The doggedness with which certain SQL Server "myths" cling to the coat of the SQL Server community is really quite astonishing. They are the proverbial "sheep ticks"; very annoying and incredibly difficult to dislodge. And even when you think you've managed it, their embedded jaws remain to provoke further irritation. A classic case in point is the "SELECT…INTO" myth, which states that its use should be avoided as it will cause serious blocking issues in SQL Server. Way back in the days of version 6.5, i.e. circa 1996, SQL Server had a serious problem with SELECT…INTO. Due to limitations in the way page allocation worked, these statements caused massive contention in tembdb. In effect, tempdb was "locked" for the duration of the statement, and so any other process that required access to tempdb was blocked. Many a time, a long-running query would pour its results into a new table via a SELECT…INTO, and effectively lock an entire production application for maybe an hour until KILLed by the DBA. It was an unforgettable experience whenever it happened, remembered vividly both by the users and the harassed DBAs. The problem was essentially fixed in version 7.0, more than 10 years ago, and yet the memory of the problem lives on so vividly in the minds of many DBAs that, to this day, the myth persists, and SELECT…INTO is effectively banned from the SQL Servers they manage. Of course, other DBAs have accepted that SELECT…INTO has been rendered harmless, and have cautiously released it into the wild. And yet, it seems that the jaws of this particular tick remain resolutely embedded in our skin. In SQL Server 2000, the allocation system for tempdb was improved to greatly reduce contention issues, but the fact remained that SELECT…INTO, and other processes that made heavy use of tempdb, could still cause significant contention. In response, with SQL 2000 SP3, Microsoft introduced a new trace flag, -T1118. When activated, it forced "uniform extent allocations instead of mixed page allocations" and helped with tempdb scalability. Although still available in SQL Server 2005 and 2008, further architectural improvement has rendered this flag redundant…or has it? Speculation persists to the effect that if you are performing SELECT…INTOs then you should set the -T1118 flag in order to deal with the resulting tempdb contention. And yet others in the community have reported not being able to reproduce a single case where the flag has helped improve throughput in SQL Server 2005 or 2008. It's not the fact such myths arise that surprises me, but the way they manage to persist for so long, and morph into new forms. Why is this? Is it a lack of clear information? Or difficulty in finding the information that is out there? What other "ticks" do we need to kill off once and for all, and how can we do it? Could a resource such as asksqlservercentral help de-flea the community? Cheers, Tony.
|
-
Posted Friday, October 16, 2009 1:11 PM |
One of the big advances in Microsoft's "2008 platform", with regard to Reporting Services, was that there would be a single, consistent Report Definition Language (RDL) across all the products. This means that reports developed in Report Builder can be shared with reports developed in BIDS, and vice-versa. While one can immediately appreciate the advantages of this, it's disappointing that it seems, on this occasion, to have been at the expense of compatibility efforts. If you've developed reports in Visual Studio 2008 and expect to be able to deploy them to SSRS 2005, then think again. You can't. They will only work with SSRS 2008. This one is perhaps more understandable, given the extent of the changes to RDL, although it has inconvenienced more than a few developers. Conversely, however, if you are a customer who has made the technical investment in SSRS 2008, you have every right to look forward using your Visual Studio ReportViewer controls across both VS 2008 and SSRS 2008. Well, think again (again), because you can't. While Microsoft made big efforts to improve the number and quality of the report controls available in SSRS 2008, the VS ReportViewer 2008 control is not one of them. It still based on the 2005 version of RDL and so won't work with SSRS 2008. One can appreciate the extreme difficulties in coordinating releases, and dealing with dependencies across all the different products and versions, but it's still hard to understand why Microsoft has not rectified this issue. About four years ago many people discovered the joy of being able to deliver reports to SQL Express clients, allowing them to move away from disparate reporting solutions based on Crystal Reports, MySQL and so on. In many cases, this ultimately led to a purchase of a full SQL Server 2005 license. However, it also means that there are now thousands of reporting clients that are dependent on the VS ReportViewer control, and so are "blocking" any potential upgrade to SQL Server 2008! The lack of clear communication on this issue is even more difficult to accept. Back in August 2008, Bill Vaughan printed a retraction to his earlier assertion that SSRS 2008 and the ReportViewer control would work together, suggesting that the problem would be resolved in around 6 months. It has still not happened. Since then, there has been very little news. Indeed, if you read http://msdn.microsoft.com/en-us/library/ms345248.aspx perhaps you could be forgiven for being led to believe that SSRS 2008 and the VS ReportViewer control play nicely together and so pressing forward with the SSRS 2008 upgrade. Having migrated and tested potentially hundreds of reports and overcome numerous architectural changes and challenges, many will find it extremely disappointing that the anticipated reward to their reporting clients remains elusively out of reach. As always, you'd love to hear your experiences with this issue! Cheers, Tony.
|
-
Posted Thursday, October 01, 2009 5:47 PM |
Do you suffer from Index Anxiety? Do you lie awake, worrying about whether you are neglecting to defrag your indexes? We’ve all heard the chestnuts: rebuild all indexes that are over 30% fragmented; reorganize those that are between 10-30% fragmented; don't bother at all for small indexes. If you have the database equivalent of an eating disorder, worrying whether your indexes are too fat and fragmented, then I’d like to provide reassurance: they are often designed to be that way. If Index rebuilding is the database equivalent of a body-image neurosis, then the SQL Server Maintenance Plan Wizard often delivers slimmer's snake-oil: if in doubt, rebuild! Even in cases where at least some effort is made to track index fragmentation, using DBCC SHOWCONTIG or the sys.dm_db_index_physical_stat DMV, I wonder whether the index-rebuild task is done more for reassurance than in confident expectation of a sustained performance boost. The way it was explained to me is that, like people, indexes gravitate towards a certain natural weight. Some are naturally "skinny" and others naturally "fat". Let’s suppose that you have an index on a "name" column for a table storing members of your website. Entries will be regularly added to and deleted from the index column, space will be created and the index will grow fatter. However, this space will be reused quite effectively. As index entries are removed, the space will be reused as similar entries are added, and entries will naturally migrate from place to place in the index. If you regularly remove that space with a rebuild (or reorganization), all that will happen is that the engine will have to recreate it – working hard in the process – and the index will naturally grow fat again. It's the ultimate yo-yo diet, and it is all to little avail. I'm not arguing that there aren't valid reasons to rebuild indexes – just betting that over ninety percent of index rebuilds are, at best, a waste of time, if not positively harmful. Too often, it seems to me, there is a missing task, which is to understand exactly how a table and its indexes are used by an application, to know the I/O that the most expensive queries perform, and to determine whether this work really will be reduced, for any significant period of time, by a rebuild. If you are constantly rebuilding indexes to maintain performance, then you need to look at your underlying schema design, and at the way your queries work, and see if there is a way to tune them to reduce range scans, and so on. Crash Maintenance Plan slimming diets, followed by binge index growth, won't help to give your indexes the perfect figure. It is much better to give your indexes a healthy lifestyle. As always, we'd love to hear what you think. The best contribution, made as a comment to the blog, will win a $50 Amazon voucher. Cheers, Tony.
|
-
Posted Thursday, September 17, 2009 5:12 PM |
How many DBAs, I wonder, really know how to go about detecting potential rootkits in their SQL Servers? To install a simple database rootkit is an easier task than you might imagine, and once it's there it can be very difficult to uncover, as it goes about capturing passwords, stealing data, tampering with user accounts, or performing similar nefarious activities. The term rootkit emerged from the UNIX world, where the ROOT privilege was top of the tree. A database rootkit is essentially an application or procedure that it used to covertly maintain unauthorized administration-level access to that database, as opposed to the underlying operating system that is the prey of traditional rootkits. A SQL Server rootkit can take many guises, but in its simplest form it can be a function whose logic has been subtly altered to return different results, or a stored procedure slipped quietly into a system database and executed by unsuspecting users, in place of the legitimate target procedure. System views, the results from which seem genuine to the untrained eye, have been tweaked to mask the changes, and hide the existence of the rogue database login. What may be surprising to some is just how easy it is to install this type of rootkit on a SQL Server, once an intruder has obtained administration-level access. Kevvie Fowler's excellent book, SQL Server Forensic Analysis, explains in step-by-step detail how this might be done on SQL Server 2000 and 2005. Certain entry points have been closed in SQL 2008 but it's still entirely possible, for example, to exploit the way in which SQL Server searches the Master database for a referenced object, if it cannot be found locally. The relative ease of introducing such rootkits stands in stark contrast to the task of detecting them, and the damage that they have potentially inflicted. It involves identifying and collecting the various data fragments (artifacts) that are the 'fingerprints' left by the nefarious hacker, developing incident response scripts, installing "Windows Forensic Toolchests", collecting, collating and analysing all of this data in order to painstakingly reconstruct the activity of the intruder. It's a fascinating journey but one feels exhausted almost before one begins, and I wonder how many DBAs have the processes in place to detect this sort of activity or, indeed, the sort of layered security measures that can help prevent such intrusions in the first place, as described in John Magnabosco's new book, Protecting SQL Server Data. How real is the threat of SQL Server Rootkits? Are they a serious danger, or just another security issue that is being conjured up to instil unjustified fear and dread in the DBA? As always, it is your comments that will lead opinion. Cheers, Tony.
|
-
Posted Thursday, September 03, 2009 2:21 PM |
What is wrong with benchmarking software? Not much, you’d have thought. A while back, Laila Lotfi wrote an editorial on the need for a standard benchmark for Object-Relational mappers, such as Entity Framework and nHibernate. By how much do they really slow down database applications? When the developers over at x-tensive.com, creators of the DataObjects.Net ORM tool, created a series of benchmarks much as Laila suggested, they published the results on the ORMbattle.NET website. Nobody could have imagined the resulting Brouhaha. Derision has been heaped on the stated intent of the site, to provide an honest objective comparison of the performance of various different ORM tools for .NET. Instead, it has been portrayed as a "sleazy", "classless", underhand marketing trick using biased, inaccurate and unrealistic benchmark tests that were deliberately designed to show competing tools in the worst possible light. There many complaints to the effect that the benchmark tests are useless, because the tool "should never be used in that way". If it can be, it almost certainly will be; and it is up to the tool creator to make sure that it stands up as well as its competitors. Although some interesting points emerged about the "bulk processing" nature of the benchmarks, and the manner in which transactions were used, any "knowledge shrapnel arising from the explosion of ORM minds" was largely drowned in a sea of acrimony, personal affront and mud-slinging. I imagine the sight of such a brawl sent a chill down the spine of managers who may have been planning to use ORM technology. This comes at an unfortunate time. The IT industry is increasingly coming to suspect that the performance and scalability issues that come from use of ORMs outweigh the benefits of ease and the speed of development. DBAs will point unfailingly at the poorly optimised SQL that these ORM tools often produce. ORM supporters often accuse developers of not understanding of their tool and its features. If the latter is true, then such benchmarks are doubly important. It is not easy to produce realistic and fair benchmarks, especially for complex ORM tools, but if the community engages and perseveres, meaningful comparisons can be achieved, and one can learn a great deal from the process. We are crying out for objective benchmarks and if the ORM industry itself cannot hope to agree on how to do it, then perhaps benchmarks will have to be imposed on them. As always, we'd love to hear what you think. Cheers, Tony.
|
-
Posted Friday, August 21, 2009 11:28 AM |
The relational data model is 40 years old this month, and SQL is not much younger. By any standards, it is a mature, well-documented and well-understood language. So why do we still find SQL code in production that lacks resilience, is so vulnerable to breakage due to unexpected usage patterns, small changes in schema design, or even a standard SQL Server upgrade? Other languages provide IDEs that will warn you of some vulnerability in your code but the equivalent is not currently available for SQL. There are practices we all engage in that just aren't good for the long-term resilience of the database. A tool such as SQL Prompt can certainly help you to code quickly and accurately, and to produce maintainable and readable SQL. It can expand wildcards, replacing '*' with a column list, and can dynamically quote identifiers in square brackets, and qualify object names, which will help protect against certain changes. However it can't prevent you from doing things that are going to return to haunt you. It stops short of warning you if you err from good SQL "best practices": maybe this is a blessing. 'Best practices' implies some sort of universally correct way to code, appropriate for all circumstances, and is generally a term that has me reaching for my holster. Of course, the recommendations and advice provided by the likes of Alexander Kuznetsov, Plamen Ratchev, Joe Celko and others are valuable, and all the more powerful for their recognition that there is never a single "correct" way to solve an SQL problem. There are always exceptions to every rule. Of course, some advice, such as "Don't shrink your database files" is copper-bottomed. However, for every person who cautions against use of BETWEEN due to inaccuracies when dealing with time, there will be another who reasons that it makes for more readable code. For every piece of sound advice to avoid "*" and always to use fully-qualified object names, there will be equally valid exceptions, such as the need to preserve the contents of a table of unknown structure in a temporary table. Even rules as commonsense as 'never use cursors' can have exceptions. As Joe Celko points out, with traditional programming languages, every time you compile a program, you get the same executable code. It doesn't work like that in SQL. Data changes, data is added and removed, and query execution plans evolve accordingly. What one moment is sound advice quickly shades to gray and then is soon redundant. Every time a new edition of SQL Server is released, old assumptions (such as 'never use 'SELECT…INTO') are invalidated, although many continue to rely on them. Defensive programming is vital, but one cannot build resilient code simply by adopting a set of prescribed best practices, but instead by relentlessly revisiting and retesting your code and your assumptions. Or am I wrong? Could a 'best practices' feature in a tool like SQL Prompt be made to work, for a language as flexible and dynamic as SQL? What advice do you hear that you think should be emphasized more strongly? Do you know of a 'best practice' that should be kicked into touch? I’d love to hear about them from you. Cheers, Tony.
|
-
Posted Thursday, August 06, 2009 11:35 AM |
For the older generation of SQL developers, discouraging the use of views is tantamount to accusing a respectable aging dowager of immorality. No way! They've always been such benign and conventional database objects. Views are virtual tables, constructed from base tables and other views. They behave just like tables, and are a very simple means to "pre-assemble" data into an easily-queried format for the end user, thus protecting them from the underlying complexity of the SQL, and providing a layer of protection to your base tables. Views are doubtless an occasionally-useful tool for the database developer, when designed wisely. Unfortunately, abuse of views has increased proportionately with the widening mismatch between the relational model and the object-oriented world of C# development. The result has been the increasing use of "Godzilla Views" that attempt to deliver entire business objects to LINQ or EF users, hiding dozens of tables, hundreds of columns, and almost always returning far more data than is really necessary. Like Godzilla, such views will inevitably turn on their creators and wreak awful havoc on the databases they infest, through the performance and maintenance problems they cause. So when should you use views? Under what circumstances is it a bad idea? Much of the SQL literature is curiously silent on the issue. Surely, in the light of advice discouraging their use, it is time to take a reasoned stand and start a debate about what guidance IT departments should give over the use of views. I'd like to start by suggesting a few 'best practices'. - DO use views as the interface to grid controls, and so on, in applications that are designed to only work against table structures. You should never expose base tables to data controls
- DON'T use views as a general-purpose means to deal with ad-hoc, or parameterized queries from applications. Use a properly-designed interface that is based on stored procedures, wherever possible.
- DO use views to promote flexibility and portability – for example, isolating all query cross-references to a remote database in a single view.
- DO consider the careful use of Table-Valued UDFs as an alternative to views, where you also need to perform processing, pass parameters and so on.
Help us finish the list, and correct the advice we've given. Better still, if you have view code on which you'd like a second opinion, send it to us and we'll get our authors to review it and potentially offer alternative solutions. We'd also like to hear of real instances of Godzilla views of frightening complexity which have caused difficulties with production systems. The best example will receive a prize. Cheers, Tony.
|
-
Posted Thursday, July 23, 2009 4:12 PM |
|
It was one of those wonderful moments when a task that was, a moment ago, frustrating and nigh-impossible, suddenly became less daunting.
I'd been wrestling with Powershell. We were checking and editing a Powershell article for Simple Talk and I'd started the ritual of testing out the code and making sure that all the instructions worked. It is often difficult for the authors of articles to think themselves into the role of educated beginner in a technology, so someone usually has to check that instructions are graded and complete. As usual, I was voted by the editorial committee into the role.
My epiphany happened when I typed in
Get-Help Get-Command -examples
...and
Get-Help Get-Command -full
... and wondrous stuff spilled forth down the screen. Suddenly, I could make progress. Obviously, I had to type in the name of the particular cmdlet for which I wanted examples of use but, hey, I can cope with that.
It was beautifully written help, brief but to the point. With that, and a few simple examples of full Powershell scripts, I was airborne. It made me think how unnecessary so much of the training and publication industry around Microsoft actually is. There is a huge industry that interprets the information that Microsoft provides and re-interprets it for mere mortals without otherwise adding much extra value. Even Microsoft itself has joined in the game. It gives us the information from the developers distorted by 'chinese whispers', rather than from the source. SQL Server Books-on-Line has always provided a rich harvest for the trainers. Everything is there, if you can find it, but has, in the past, been painfully structured, short on practical examples and often written in a style that refuses to compromise with the less brain-bound reader in any way. Inevitably, an industry has developed to interpret the occasionally strangulated text of some of the MSDN information into reasonable, simple English.
There are, within Microsoft, people who really understand their products and can communicate them well. Whoever provided the terse but valuable help text for Powershell was, surely, a developer talking direct to other developers. It has always been a difficult struggle for any software provider such as Microsoft to pitch it right: to make sure that the key people have the time to trasmit their knowledge and understanding to others. This is a difficult balancing act to perform, since few of the creative developers are natural communicators. But when they are, then we all gain.
With any software product of any complexity, the job isn't finished until its use is made crystal-clear to the end-user. It is not a peripheral task. The people who developed the system are always best placed to write about it. This is why books such as Brian Kernighan and Dennis Ritchie's "The C Programming Language" (Dennis Ritchie wrote C), and Brian Kernighan and P. J. Plauger, "The Elements of Programming Style" (Kernighan wrote AWK and a lot else besides) remain classics of the art of explaining complex technology in a simple approachable way.
What do you think? Is it better for developers to stick to coding and leave the wider aspects of communicating with the users of their applications entirely to others; or is it better to encourage them to participate more in explaining how their applications work?
Cheers,
Tony
|
-
Posted Thursday, July 09, 2009 12:03 PM |
There are many good reasons for building your data integrity logic into constraints. When such business rules become part of your DDL, they can never be circumvented. Any change to a rule can be enforced by a single update to a constraint in the database. It is a simple and safe architecture for managing your data integrity. But is it always that simple in practice? It is much better to put this logic in the database than to have similar logic spread scattergun throughout the many tiers and applications that access the database. With the latter approach, one then faces the task of ensuring that the rules are consistent in all places, and that no process can bypass the checks. However, even with all your integrity logic safely corralled into the database DDL, things can still get complicated, and difficult to track, pretty quickly. Most DBAs will employ the full range of simple CHECK, DEFAULT, UNIQUE and KEY constraints that they have available. They may also have a few more complex rules to enforce that require UDF-based constraints. Of course, there are some checks that you can't realistically do in constraints, so you'll have some triggers thrown into the mix as well. In addition, you may have some "cascading" Referential Integrity requirements (such as "cyclic cascades") that are not supported in SQL Server, so you'll implement these rules using stored procedures. At this point your logic is beginning to look a little less neat. Although it is all within the database, it can become very easy to forget or miss something when the business rules change and need to be updated. At least, it will take some time to locate all the places where changes need to be made. If it is not possible to satisfy all integrity rules using constraints alone, then should one, instead, implement them all using triggers, so that all the rules really are in one place and can be maintained easily? This wouldn't be a free lunch. The cost-based optimizer uses the information it finds in constraints to optimize query execution. For example, if a query being optimised uses the LOWER function, and you already have a constraint in place on the column that restricts all entries to lowercase, then the optimizer can, if conditions are right, ignore the function altogether and arrive at a faster execution plan using just the constraint. How much performance are you actually missing out on? After all, the optimizer won't always use constraints, if they are not trusted because they were disabled at one point, or if they use functions. Does it outweigh the relative advantage of having a single, consistent "layer" of triggers, for example, to implement your rules? What would you do? What is your overall strategy towards implementing, and easing the management of, business rules in the database? Cheers, Tony.
|
|
|