Click here to monitor SSC

Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page. As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management. In his spare time, he enjoys running, football, contemporary fiction and real ale.

Fair Comments

Published 10 November 2011 4:23 pm

To what extent is good code self-documenting? In one of the most entertaining sessions I saw at the recent PASS summit, Jeremiah Peschka (blog | twitter) got a laugh out of a sleepy post-lunch audience with the following remark:

"Some developers say good code is self-documenting; I say, get off my team"

I silently applauded the sentiment. It’s not that all comments are useful, but that I mistrust the basic premise that "my code is so clearly written, it doesn’t need any comments". I’ve read many pieces describing the road to self-documenting code, and my problem with most of them is that they feed the myth that comments in code are a sign of weakness. They aren’t; in fact, used correctly I’d say they are essential.

Regardless of how far intelligent naming can get you in describing what the code does, or how well any accompanying unit tests can explain to your fellow developers why it works that way, it’s no excuse not to document fully the public interfaces to your code. Maybe I just mixed with the wrong crowd while learning my favorite language, but when I open a stored procedure I lose the will even to read it unless I see a big Phil Factor- or Jeff Moden-style header summarizing in plain English what the code does, how it fits in to the broader application, and a usage example. This public interface describes the high-level process and should explain the role of the code, clearly, for fellow developers, language non-experts, and even any non-technical stake holders in the project.

When you step into the body of the code, the low-level details, then I agree that the rules are somewhat different; especially when code is subject to frequent refactoring that can quickly render comments redundant or misleading. At their worst, here, inline comments are sticking plaster to cover up the scars caused by poor naming conventions, failure in clarity when mapping a complex domain into code, or just by not entirely understanding the problem (/ this is the clever part).

If you design and refactor your code carefully so that it is as simple as possible, your functions do one thing only, you avoid having two completely different algorithms in the same piece of code, and your functions, classes and variables are intelligently named, then, yes, the need for inline comments should be minimal. And yet, even given this, I’d still argue that many languages (T-SQL certainly being one) just don’t lend themselves to readability when performing even moderately-complex tasks. If the algorithm is complex, I still like to see the occasional helpful comment.

Please, therefore, be as liberal as you see fit in the detail of the comments you apply to this editorial, for like code it is bound to increase its’ clarity and usefulness.

Cheers,

Tony.

17 Responses to “Fair Comments”

  1. GilaMonster says:

    Something I heard a while back about inline comments… “The code should say what, the comments should say why.”

    An inline comment should not say what the code does. The code says that and, if it’s written in a way that’s not clear, then the code should be fixed. The comment should say why something is being done.

    A comment that reads ‘insert the account totals for this period into a temp table’ right above a SELECT .. INTO .. FROM Accounts is about as useless as it gets. A comment that reads ‘Store the latest balance and total movements per account for the period to avoid the cost of recomputing them multiple times, ‘ is a little more helpful when coming along and modifying the procedure.

    Header comments of some form – essential.

  2. Alex_Kuznetsov says:

    As our tools improve, they automate out much of the need for comments.

    For example, 15 years ago it was common practice to begin a file with the history of modifications, whodunit, why, and when. Old hands remember that. Version control rendered that practice obsolete, and nobody does it anymore – you can get complete and accurate log of changes straight from Git for no additional cost.

    If a DBA pulls a stored procedure straight out of Object Explorer, if they want to tune it up, and they have no other information about it whatsoever except for the body of that procedure, of course they want as much information as possible right in that procedure.

    It does not have to be always like this – there are other tools and approaches.

    When we do OO programming, we have no need for “how it fits in to the broader application, and a usage example.” comments at the beginning of our classes – JetBrains automated that out for us. We can just right-click “Usages”, and we immediately get precise and up-to-date usage information.

    Similarly, when our database code lives in the same solution with C# and C++, we can easily and quickly search on stored proc’s name, and get accurate and complete information about where and why it is used, as well as working examples. Unlike unit tests, examples embedded in comments may not be up-to-date and may not be working. As such, they may be misleading, and having them might be worse than having no comments at all.

    For more information about usage/examples etc., we can also use things like wiki and/or forums – that improves communication, and allows everyone to modify documentation without having to redeploy modules ;) – and that would be a lot of work. However, if we do not redeploy every time we modify documentation, than the copy of documentation embedded in the module is no longer up-to-date…

    Still, sometimes comments are useful, and I have lots of them in my system, but not nearly as much as just a few years ago.

  3. Jeff Moden says:

    Great article, Tony, and I couldn’t agree with you and Jerimiah more. DBA’s shouldn’t have to open another tool to find out if the correct revision of code is being promoted to or has been deployed to their databases. People troubleshooting the code shouldn’t have to guess what the code does based on a (usually poorly implemented) naming convention. People troubleshooting code shouldn’t have to study the code just to figure out where they need to make a change. And, sketching out the code in the form of comments explaining the “why” prior to writing a lick of actual code will actually make it easier on the original developer during development.

    We went through all of this at one of the companies I worked for. The code was virtually devoid of comments and readability was all but satisfactory. It would take as long as two days to research a piece of code to correctly make a simple modification simply because you had to read/study the code just to figure out which section of code did what. The original code was usually fraught with errors and required a lot of rework before it actually went to production. Even after such rework, “Change Controls” were an oxy-moron that took hours to accomplish just for a few stored procedures because they still needed for errors found by sanity checks to be repaired during the Change Control.

    It took a while to fix all of that and proper comments and readability standards played the most important factors in the fix. As a result, research to modify a sproc dropped from days to minutes. Pre-production rework nearly disappeared as did the time required for Quality Assurance testing because everything usually worked correctly the first time. Production Change Controls dropped from hours for just a couple of procs to minutes for dozens of procs and post implementation error rates dropped to very nearly zero.

    It takes only scant seconds extra per statement in a proc, function, or view, to add extraordinary clarity (even for the original developer) that no tool or naming convention can add because you need to know the “why” (that Gail was talking about) even when developing original code. Tools/naming conventions just can’t explain the “why” to the same level especially if you want “newbies” to be able to make modifications. If the simple act of including proper embedded documentation is too much trouble, especially for a supposed “Ninja” of a Developer because of the advanced methods such a developer will use, then I agree 100% with Jerimiah… “Get off my team”. ;-)

    –Jeff Moden

  4. Jeff Moden says:

    BWAA-HAAAA!!!! Even the SPAM above is more clear because of comments! ;-)

  5. Alex_Kuznetsov says:

    >>DBA’s shouldn’t have to open another tool to find out if the correct revision of code is being promoted to or has been deployed to their databases.< <

    using comments for this task is slow and unreliable. It would be way better to pull the right revision out of version control, and use SQL Compare - that would give you complete and accurate information.

    >> People troubleshooting the code shouldn’t have to guess what the code does based on a (usually poorly implemented) naming convention.< <

    I concur - only professionally developed unit test harness can give you a complete and precise answer what a module is supposed to accomplish.

    >> People troubleshooting code shouldn’t have to study the code just to figure out where they need to make a change.< <

    Really? If someone makes a change without studying code, they are very likely to make a mistake, right?

    >> And, sketching out the code in the form of comments explaining the “why” prior to writing a lick of actual code will actually make it easier on the original developer during development. <<

    Not really – there are many other alternatives, such as TDD and BDD.

    Anyway, if it is you who is supposed to take a support call at 3 AM and it is you who is supposed to fix that bug, then of course you should have it your way – it must be convenient for you to troubleshoot. If this is the case, then your efforts must be respected and appreciated, and your need to quickly grasp what a module is about must be addressed – no argument here.

    However, not in all IT shops T-SQL code belongs to DBAs. Quite a few shops find it more profitable when developers own all their code, including T-SQL, and only developers can change it. This seems to be especially attractive in Agile environments, where speed of development is essential.

    Come to think of it, we have comprehensive suites of unit tests, so we don’t have to rely on comments to understand what the module does, like we did back then in 1995. Even if someone writes the best comments in the world, they are no match for proper unit tests including all use cases edge cases and such, no match for requirements executable in the language of the domain. As a result, in 2011 typical developers can ensure much better quality than we usually did in 1995, and with less effort. Whoever does not want to use modern tools such as unit tests are stuck with 1995 level of quality and productivity.

    On top of that, we have tools for stress tests, so our modules withstand high concurrency and do not embrace in deadlocks, and we have performance baselines, so we know how a module performed last week and three months ago – all with very little effort once the tools are in place. In other words, in the year of 2011 developers may have tools that make T_SQL development several times more efficient, dramatically reducing maintenance costs as well as reducing number of bugs, deadlocks and such.

    When this is the case, then it just makes sense to have people with the right tools fix T-SQL – it takes less time and is less prone to errors.

    On the other hand, if someone does not want to “open another tool”, without modern tools they are less productive and substantially more prone to errors, just like all of us were in 1995…

    So, if T-SQL belongs to developers – and in quite a few cases it does, and developers are supposed to fix it if something goes wrong at 3 AM, it might be a good idea to leave it to developers to decide if and how to comment it.

    I hope this post brings more understanding of developers’ point of view – this is why I am posting this thing. I really appreciate DBA’s efforts and do not intend to offend anyone.

  6. Jeff Moden says:

    I guess I’ll have to leave it as I agree to disagree with you, Alex. ;-)

  7. Jeff Moden says:

    For anyone reading this thread, the SPAM I spoke of in my second post on this thread has been removed.

  8. paschott says:

    I’ll tend to agree with the first comment. The comments in the code should say “why”. I also tend to disagree with Alex on this. Sometimes a tweak gets made in a system, but doesn’t make it to source control. Having a quick comment on that change in the stored proc helps avoid those situations when pulling latest and comparing. Sometimes we have the same proc being worked in different branches (rare, but it’s happened). Those comments inside the stored proc can be a life saver when you merge those changes.

    Also, we tend to look directly at what’s running on the system because sometimes we have bugs in the code or situations that only can be reproduced on that system. Seeing the changelog there makes it a bit easier to know when something was last changed and why. Sure that’s all stored in the various comments in source control, but it’s a lot easier for me to just look at the code currently running and to know what changed so I have an idea of the history easily and without going to yet another system to find out something that should just be there.

    Finally, there are times that I’ll resort to using an efficient, but perhaps somewhat more lengthy, bit of T-SQL to accomplish a task. I’ll definitely add a comment about both the “what” and the “why” at that point because reading it can be pretty difficult even for the one who wrote it. I’ll also make sure that the formatting doesn’t follow the “put it all on one line” method or the “line break randomly” method. :)

  9. Keith Rowley says:

    I agree with paschott, sometimes the most efficient running piece of code is NOT the clearest reading piece of code, and my users don’t care how beautiful my code is, they care how fast the data loads.

    So particularly in these cases comments can be helpful since three months down the road I may not remember why I did something the way I did it.

  10. Mischa says:

    There’s no one answer to what makes code understandable. Comments that explain intent are good; comments that describe assumptions are good IF you have no way to express them as assertions. I tend to trust code that has a large body of small unit tests: comments can be out of date with the code, but runnable tests never can! Examples of usage, guarantees of behaviour, and even a history of edge cases and bug fixes: give me test-driven dev any day!

  11. timothyawiseman@gmail.com says:

    @Alex_Kuznetsov I must join Jeff Moden in respectfully disagreeing with some of your points, but I think it worth explaining why.

    I will partially agree that version control has made change logs inside the code less significant, but I don’t think it has entirely removed it. When code is refactored or an algorithm is tweaked for effeciency, then version control fully documents that. When what a function does is even minutely adjusted, then I think a huge flag indicating when and why that was done is invaluable.

    I also do not find unit tests or searching for where a function was used to be the best way to determine what a function was meant for. It is much easier to read for the developer to just explain his mindset and why he wrote it then for someone else to guess. Unit tests in some ways are more reliable, but they cannot show why and their way of showing what can be obtuse and hard to read.

    As for your comment about examples in comments not necessarily bein up to date, you have a point. But there is an automated way to deal with that. In Python for instance, I include a call to DocTest as part of my test suite and it tells me if any of the examples are out of date automatically.

  12. Alex_Kuznetsov says:

    @timothyawiseman: As Scott Meyers aptly put it, we need to “program in the future tense” – I encourage you to read last chapters of his great book “Effective C++”. Basically he advices that we should not have to change our code if its usage changes, and his advice makes a lot of practical sense.

    Suppose we have used comments “to determine what a function was meant for” – we have the following comment:
    /* used by Sales Team in their Monthly Totals report */

    Short term, this looks great. Suppose, however, that marketing team begins to use this module in their Global Trends web page. What do we do? Should we redeploy the module with updated comments? Is it efficient? Should we keep the module as is, even though its comments are misleading?

    So, IMO commenting usage directly in the code contradicts long established best practices in software engineering, because we should not have to change our code if its usage changes.

    Regarding “Unit tests in some ways are more reliable, but they cannot show why and their way of showing what can be obtuse and hard to read. ” – that really depends on which tools you use for unit tests. If you use something like tSQLt, then I agree, it can be hard to read.

    Suppose, however, that our tests look BDD-ish, and are in fact executable requirements, as follows:

    – if no rows match, must return empty result set
    EXEC dbo.SelectCustomers @Firstname = ‘qwerty123′, @LastName = ‘ASDFGHJKL098′ ;

    – if multiple rows match, return them all ordered by SSN
    EXEC dbo.SelectCustomers @Firstname = ‘Joe’, @LastName = ‘Brown’ ;

    (snip – we have a complete set of all edge cases here)
    Of course, this script is followed by expected results in another file.

    This script is used both as a way of communicating with customers and as a unit test harness, and as such it is both easy to understand for everyone and complete and precise. “When what a function does is even minutely adjusted”, version control should give you precise information of what exactly has changed.

    I agree that “a huge flag indicating when and why that was done is invaluable”, but in the long term comments in code is not a good place for such flags. I remember working with some core java files back then in the ’90s – they were more than 90% comments about ” when and why that was done”, and less than 10% actual code. This is simply not practical if the half life of our project is more than a few months.

    The best way to store ” when and why that was done” I can think of for a long term project is version control – we just store executable requirements like the one I have shown above, and git does everything else for us.

    Also I am a little bit surprised: I never stated that I do not use comments at all. I do use comments as needed, they are all over my system, just not for the reasons specified by Tony and Jeff.

  13. Alex_Kuznetsov says:

    @paschott
    >>I also tend to disagree with Alex on this. Sometimes a tweak gets made in a system, but doesn’t make it to source control. Having a quick comment on that change in the stored proc helps avoid those situations when pulling latest and comparing. < <

    Interesting, because I do agree with you - if you are tweaking modules directly in production, with running unit and other automated tests, then of course you need comments.

    >>Also, we tend to look directly at what’s running on the system because sometimes we have bugs in the code or situations that only can be reproduced on that system. Seeing the changelog there makes it a bit easier to know when something was last changed and why. Sure that’s all stored in the various comments in source control, but it’s a lot easier for me to just look at the code currently running and to know what changed so I have an idea of the history easily and without going to yet another system to find out something that should just be there. <<

    If this is the case, then you probably would be better off if your comments describing log of changes were automatically generated off version control, and embedded in the body of stored procedure as part of deployment. This is not difficult to develop.

    If your change log is maintained manually, it is very likely to have errors. You don’t want developers to maintain such log manually.

  14. Gavin says:

    There’s some good stuff in “Code Complete” on commenting style as well; in particular about explaining intentions rather than code content, which can be useful in refactoring.

  15. Jeff Moden says:

    >>Suppose we have used comments “to determine what a function was meant for” – we have the following comment:
    /* used by Sales Team in their Monthly Totals report */

    My suggestion would be that whom ever wrote that, needs to learn how to comment. It doesn’t effectively explain the “What” or the “Why”.

    /* This report produces monthly totals of sales quanties by product */

  16. Seven24 says:

    The point that Alex is making is that the “Why” changes with usage. Further, the “why” of a particular change can and should be part of the check-in comments in source control (no one in the right mind makes changes directly to production systems right?). Source control is the tool that should be used to determine the history of changes; not procedure comments.

    In addition, it is often the case that reports for example have multiple intents. Saying “produces monthly totals of sales quantities by product” may expand to the point where the monthly totals is just one column of a much larger report output used for multiple purposes including the determination of the monthly totals.

    There is also the problem that a comment like “produces monthly totals of sales quantities by product” isn’t helpful. What information does it give us? At best, it tells us at one point that was the developer’s interpretation of the intent of the report but it does not tell us if that is still the intent or its only intent. When I work on T-SQL procedures, I do not trust comments to tell me what is actually going on or how a procedure is actually used. The only thing that can be trusted is the query and code itself. Having readable code is many orders of magnitude more useful than having lots of comments which may not reflect what is actually happening.

  17. Dave.Poole says:

    As IDEs have got better and source control tools have matured the nature of the comments has changed. This does assume that everyone is using some form of source-control for DBs. In this day and age we should be but I seem to remember a recent survey on SSC indicated otherwise.

    When I started in computing many of us were self taught and learnt by typing in listings from magazines.

    Clean efficient code should require minimal commenting however comments that say “why” also have a teaching aspect to them. I’d say that a lot is learnt by inspecting code.

    There is sometimes a need for a bizarre, counter-intuitive approach in T-SQL. If someone chances upon it as part of a development task they may be tempted to remove that feature. A comment that says “Why” will show you why.

    For example, back in the days of SQL 7 there was a strange hack where if you included a range clause against the clustered index in your WHERE statement you got a clustered index seek. The range would have been across the entire set so it would appear that it was a superfluous clause.
    Ditto query/locking hints

    MY other observation is that some queries are self evident in what they are doing but the verbose nature of SQL makes some very hard to read irrespective of how they are laid out.

    I’ve recently been reverse engineering someones code and eventually found out that the query in question was fundamentally flawed. Nothing appeared to be broken, it produced the correct recordset structure with credible, but in reality rubbish, data.

    A few simple comments would have sped up the whole reverse engineering task and probably have prevented the flaw in the first place because the mismatch between what the code actually did and what was intended would have been emphasised.

    I find a barrier to comments is the implicit simplicity of SQL. Developers see simple CRUD and think that is the extent of SQL. They then assume that this is the equivalent of their reduced and refactored code apply a their comment paradigm to the entire estate. I’m not advocating commenting CRUD procs.

    Finally, tools such as Red-Gate SQLDoc pick up on the extended properties and some competitors pick up on specifically marked comments in order to produce a full data dictionary and documentation set. Strip out the comments and all you have is a list of database objects.

Leave a Reply