
Developer, manager and marketing idiot. I've been working at Red Gate since 2005 on products as diverse as Red Gate Cloud Services, SQL Data Compare and ANTS Performance Profiler. I do a lot of DIY and home-brew because there's only so much time I can stare at a monitor.

When is a Bug NotABug

Published 23 January 2009 1:08 am
So we’re coding away – as you do – on our lovely new product (Blatant Plug: Exchange Server Archiver). Things are going well: code monkeys are ooking nicely, testers are being evil as only they know how.

Now I’m all for thoroughly tested products. Things weren’t always this way: I remember when I started working with full-time testers and took every bug as a personal insult. Afterwards I came to realise that we’re all working to get the best product we possibly can, in the time available, out to the punters.

OK, back to the story. A bug was raised. This is a good thing. Let’s have a look at what the testers have done now.

Hmmm.

What.

What!

Let me get this straight: you started the archive service and it wrote an XML configuration file out to a directory. Things are going well so far. We have a running service and a config file. You then, whilst the service was running, after it had successfully saved the config file, removed the write access to the location where it saved the XML file. It carried on running, but no, that wasn’t enough for you – you had to go and make a change, prompting the archive service to try to save its XML config file again. It knows it has write access, it just did it when it started up, yet funnily enough it now fails.
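For anyone who wants to picture the failure, here is a minimal sketch in Python. It is not the product's code: `save_config`, the logger name and the file paths are all made up for illustration. It only assumes that a save which worked at start-up can later fail with an OS-level error, and shows one way of reporting that instead of letting the exception escape.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("archive_service")


def save_config(path: str, xml_text: str) -> bool:
    """Try to write the config file; report failure instead of crashing.

    Returns True if the file was written, False if the operating system
    refused the write (permissions removed, directory gone, disk full...).
    """
    try:
        with open(path, "w", encoding="utf-8") as config_file:
            config_file.write(xml_text)
        return True
    except OSError as exc:
        # PermissionError, FileNotFoundError and friends are all OSError
        # subclasses, so one handler covers the whole family.
        logger.error("Could not save config to %s: %s", path, exc)
        return False
```

What the caller does with a `False` result (warn the user, retry, shut down politely) is left open here; the sketch only shows that an earlier successful save proves nothing about the next one.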

Now I don’t argue that technically this is a bug, but really, I mean, is this actually ever going to happen in the real world? My argument was no, and I wanted to close the bug as “NotABug”, but the tester wouldn’t budge. I brought the test lead in, they wouldn’t budge. I brought in a tester from another team. Still no. At which point I admitted defeat and set the bug to “Future” instead.

Now opinions wanted – is this a bug or should I be allowed to close it as “Won’t Fix”?

If there is no consensus there may have to be a paintball battle to solve this thorny problem.

This post has been brought to you with :) placed wherever you feel like in the text.

15 Responses to “When is a Bug NotABug”

  1. RobertChipperfield says:

    I still reckon we should configure Jira (our bug tracking system) to have different strings for bug priorities depending on whether it’s a tester or developer logged in:

    Developer – Tester
    =============
    Low – High
    Medium – Critical
    High – Bug of DOOOOOOOOOOOOM

  2. reka.burmeister says:

    Well, from the evil tester’s (that would be me :) ) point of view, the bug wasn’t raised because I received a nice and kind error message informing me that the archive service doesn’t seem to have permission to its config file, therefore it can’t write to it, and the system will be a bit broken until the problem is solved. The bug was raised because this little evilness caused an unhandled exception, and the program died.

    Quality software shouldn’t allow any unhandled exception to reach the user. If a critical error occurs, it should send an error message and shut itself down gently.

    I know it’s not really probable that this situation (write-protecting a directory’s content) would happen, but I think we all agree that the probability is higher than 0, therefore the error should be handled. Users tend to do weird things over time out of negligence, and even though in this situation it is clearly a user fault, it just leaves a bitter aftertaste if the software blows up.

  3. radeldudel says:

    Ahem. So your tester handed you a bug.
    You ask the next person, he confirms: it is a bug.
    You ask another one, again, confirmation: it is a bug.
    You try to bring someone else in, still the testers insist: it is a bug.

    And now you are trying to appeal to the public, what do you expect to hear? The program crashes, it still is a bug!

    Kudos to your testers for sticking to it!

    What can be discussed is what to do about it. First, how important is the bug? Second, how hard is the easiest fix (graceful death instead of crash, it seems)?

    You should triangulate all bugs on a scale depending on ‘how likely to happen’ and ‘how disastrous are the results’ (data loss?). But don’t try and argue if it is a bug or not. You just make yourself look silly, really.

  4. keithr says:

    I disagree with the idea that every issue is a bug.

    Bugs are things that could go wrong with the software when used properly or even improperly within the application.
    This is a vulnerability. The only way for a user to cause this is to change the operating environment outside of the application. It’s not possible to plan for every random action that someone may have.

    Should it be trapped for? Should you expect that someday there will be some overzealous IT admin who thinks that directory should be “protected”? That depends on the confidence you have in your users.

    And yes, I am a developer and this is also the opinion of another developer in my office.

    Personally, I think you should feel proud. In an effort to crash your system, the tester had to step outside the bounds of normal use and resort to an act of intentional malice.

  5. reka.burmeister says:

    Well, from a quality perspective, I have to disagree that something that doesn’t do what the specification of the software says (e.g. sadly in many cases, the customer documentation – but that’s a separate issue…) is not an error/fault (also known as a bug).

    There is a set of things that software has to do, and another set that it must not do (please notice the difference). Allowing an exception to reach the user is definitely something that should never ever happen if you’re using well-designed, quality software.

    You’re right, the thing that I did is, from some perspective, outside the bounds of the software, but I’m afraid you’ll have to specify the bounds of the application first to be able to safely state that. Does it use parts of the hard disk? Yes it does, therefore the developer has to make sure that every such data transfer is made safely, and that all possible errors that might occur while doing so are handled properly.

    I’m afraid what you’re saying is that if, for example, I’m using Excel, have been working for an hour on some project and forgot to save (and I have no autosave for some reason), and then I save without noticing that the hard disk is full, then the software blowing up and my losing all my work is perfectly normal, since it’s out of Excel’s bounds, and it was completely my fault anyway…

    Our opinions (a difference possibly rooted in the tester <-> developer POV) are completely different. I think irritating the user (blowing up for no obvious reason… an exception is usually not the most obvious thing one can see in software…) is something that is not acceptable. All possible errors should be trapped and the user should be informed why the operation couldn’t be performed (in our case: ‘The config file appears to be write protected…’). Just try to imagine you were using the software: what would you like to see? Exception, or error message? Not to mention that at any given company many sysadmins can work at a time, or in shifts. What if one of them write-protects the file (for any reason really), and the other has to search for the error for hours, because the software just died?

    You mentioned that you have discussed this with another developer at your company. Would you discuss it with another tester? I’m fairly sure you’ll get a similar answer to mine.

  6. RobertChipperfield says:

    Whilst I agree that in an ideal world, end users would never see a stack trace, and to that end, it is a bug, there’s also a limit to what can be reasonably caught, in the interests of actually finishing a project :-) .

    For example, System.IO.File.Create can throw no less than 29 different types of exception as a result of various things going wrong.

    At one end of the spectrum, a crash resulting from a user mistyping a path in a Save As dialog and entering an invalid character is clearly a Very Bad Thing, and really should be fixed.

    At the other end, it’s pretty reasonable to expect Something Really Weird to happen if someone goes into the GAC and deletes System.dll, or starts editing your process’s memory while it’s running. Somewhere in between lies the point where you need to start fixing things.

    I think the “NotABug” wording is probably the problem here – it *is* a bug, but is it one that’s worth fixing, or could the time be spent better elsewhere?

  7. Ralf Bachle says:

    Writes to files can fail for all sorts of reasons, even without mad sysadmins and lunatic users throwing a wrench into a well oiled machinery. When the latest nazi security software says “no”. When a disk volume goes r/o due to a failed disk drive. Some disk drives will switch themselves to r/o when overheated, for example. The possible scenarios in relation to exotic filesystems, storage management, SAN & NAS are endless. So I’d call this a bug, though, assuming no data gets corrupted, not a show stopper.

    So unless you know (I mean *KNOW*) that an error does not matter, deal with it even if you think it’s an unlikely case.

    Don’t show users any sort of dump and traces unless you want to concern them. It can be the difference between a peaceful incoming mailbox or one full of emails that scream for blood for no reason ;-)

    cout << "This does not happen\n"; // unreached

    Ralf

  8. Jonathan Steinberg says:

    I have to agree with Evil Tester. This is a bug. A minor one, but a bug.
    Jeers to Richard for not mentioning that this situation caused an exception that caused a program crash.
    However, this particular scenario seems so unlikely as to be the lowest possible priority, so the appropriate status for the bug is “Future”, as there must be more important things to work on.
    One last point in favor of the tester is that while removing write access is a particularly unusual case, it highlights a defect in the config file logic. Are you so sure that this error cannot also be caused by some other situation not yet thunk of by evil testers? Handle the error.

  9. keithr says:

    I have seen issues in our application that kicked off the global error handling because the user did something which makes little to no sense (blanked out a text field that would normally have its default value and didn’t enter a new value, causing a null exception). The testers & product managers came to the conclusion that it’s a non-issue because it isn’t something clients would normally do, since there is a default, and since it’s only happened to a single client out of 200+ over the course of 4 years. It didn’t cause a fault, but it did shut down the application through the error handler. So I know from experience our testers would agree with me.

    But I should concede a couple of points here. Our products can’t be compared to the quality of Red-Gate’s products. We use the Toolkit within our database update application, and for the entire time I have worked here I have never had any errors working with Red-Gate’s products (I believe we were on version 3 when I started). I can’t even say that about the programs I work on (and I am not counting issues found during development or testing stages). So my experience brings with it lower standards.

    Also, since this is a vulnerability that has been documented on the internet, it’s now possible for a malicious 3rd party to target this specific instance if they intend to target the application.

  10. reka.burmeister says:

    I have to admit, this is really interesting. You have a bug, which is a bug without any doubt. The error causes the software to crash, and can clearly be hit, since a user encountered it already. Instead of thinking that there is a mistake in your code, you state that the user was silly, and you won’t fix the problem. I’m sorry, but that is just terribly appalling; such practices are not followed here.

    I wouldn’t mind if a 3rd party company checked our product, since I believe that it has been tested thoroughly and deeply, and it would be a good measurement of the skills of our test team here.

    Just an afterthought… have you ever heard the expression “The customer is always right”? If you’re selling software, your customers are the users, and no matter what they do, they are always right. There’s no real “the user did a silly thing”…

  11. Bart Read says:

    OK, it’s a bug, but I’d probably have resolved this as WONTFIX, or maybe would have added an exception handler to either log or report the exception directly.

    Interestingly I have noticed a new resolution status has been added to our bug tracking system in the past few days. This status is called CANTFIX, and it’s given me a cunning idea for a TV show called “Can’t fix! Won’t fix!” in which obstreperous developers are taught how to fix bugs in front of a studio audience in a vaguely comedic fashion. I imagine Channel9 would be well up for screening this.


  12. WillN says:

    As a lunatic sysadmin, I say, handle the exception and fail gracefully, log an error, and bam, your user will not call support…..

    I can’t tell you how frustrating it is as a system administrator trying to manage dozens of servers and applications where error logging is poor or nonexistent. Where it’s good, I’m highly impressed and will always recommend that software to others.

    The question isn’t “Will a user ever do this?”, because the answer is probably YES, and if not the users themselves, some other program or process will do it for them…

    As someone else commented, there are many unforeseeable reasons write access to a file can fail.

    I’m also jeering Richard for not mentioning the unhandled exception part…

    Bravo to the tester.

  13. Gene Myers says:

    Hi,

    I’ve just stumbled upon this post, quite some time after the fact, but thought I’d throw in my 2p’s, for what they are worth.

    This is the type of question I’ve often been called in to moderate in my past roles, so here’s my take on it. An unhandled exception is an issue, regardless of how much of an edge case it is. Having your application exit ungracefully is an embarrassment, and smells bad. This would prompt me to add it to the agenda for Root Cause Analysis, to determine whether the coding standards are inadequate for exception handling, and why this wasn’t caught in peer code reviews – assuming you have at least peer code reviews, or a more formal process.

    Even though this is an edge case, because it did not exit gracefully, I would define it as a Priority 2, which would require it to be fixed to clear Defect Zero before release, or a Root Cause Analysis to shed new light on how it fell through the cracks. That this exception wasn’t handled really does indicate to me that either your coding standards or your review process needs to be fixed – you are missing the big picture here, in my opinion.

    So, effectively, I agree with your QA dept (evil tester Reka), even though I’m officially from the code monkey side of the fence ;-).

    Oh, and I firmly believe that every issue is a bug as long as it can be reproduced, but every bug has a priority and some just will not be fixed (as Robert noted).

    I have a question for you – how closely do your Dev and QA teams work? Do they review the initial product specification and do the initial estimates together? It’s nice that Dev and QA are communicating here, but wouldn’t it be nice if they just sat in a room and came to a common understanding in person!

  14. Richard Mitchell says:

    Our Dev and Test teams work very very very closely together. We tend to work on the same things at the same times so we can understand what things should do and even negotiate the behaviour.

    We don’t tend to have much in the way of written specs and work in a very agile manner.

    I think I’d probably disagree with my own blog post at this point as the very blog post caused a good conversation internally about what constitutes a bug and how best to prioritise them. We’ve now got a few more categories of bugs – “No Action Planned” (we have better things to do), “Low Occurrence” (we’ve seen it but can’t reproduce it), “Quick Wins” (things that are so small and easy to fix they’d never get to the top of a big list of bugs – developers and testers can scratch their itches here) and of course the huge bucket “Future”.

    We’re still learning as a team and are all prepared to compromise to get the best products we can out of the door. But we all pull in the same direction.

    For this bug in particular we’re probably going to do another intensive week of error handling improvements. We’ve done it in the past and it has worked well; hopefully we can repeat that.

  15. shenoyroopesh says:

    BUGGGGG!!!! :)

    I agree with Reka in this regard because she makes it amply clear – the bug is not just this particular situation, the bug is raised because unforeseen exceptions are not handled gently by the software – there is just no way software can be allowed to crash and burn just because the developer didn’t think the user would do something.

    The repro steps are just one way of reproducing this bug; basically you do anything in this software that it is not designed to handle, it will crash badly.

    At least a simple error message like ‘Oops.. something went wrong, let’s try again’ should be good enough to fix this bug when it is not a standard exception it is catching – showing anything better than this is a bonus.

    And yeah, just for the record – I’m a developer, but still agree that this is a bug.
