Phil Factor's Phrenetic Phoughts

Simple-Talk columnist
The wilder shores of Transact SQL

  • TSQL Spam-killer

    Posted Friday, April 11, 2008 7:53 AM | 8 Comments

    It's very refreshing to take a break from a tedious bit of routine code to try out something unusual or tricky. Sometimes one can be quite startled by finding out how easy it is to do in TSQL.

    Jonathan Snook's blog is my favourite "developer's blog" of all. It is packed with ideas. I was reading it the other day and came across How I built an effective blog comment spam blocker. This, I thought, was great fun, because it told you how to build it but didn't give you the code. The algorithm scores a string according to how likely it is to be spam. For everything in a comment that the filter likes like, the string gets a point. For everything it don't like, the string loses points. If the string totals 1 or higher, it lands on the site as a valid comment. If it scores a 0, it's set for moderation, If it's below 0, it's marked as spam. Some of the things that score negative are rather amusing ( .pl or .cn sites for example) but done from experience.

    With Jonathan's permission, here is a TSQL implementation. I've simplified it slightly, to fit it in a blog, by using temporary tables for the word-banks, dodgy URL suffixes and so on. Normally, you'd put these in permanent tables and fine-tune the system as the language of spam changes, without having to alter the code.

    CREATE FUNCTION [dbo].[fnSpamScore] (@comment VARCHAR(MAX))
    RETURNS INT
    AS
    BEGIN
    DECLARE
    @links INT
    DECLARE
    @LenComment INT
    DECLARE
    @points INT
    DECLARE
    @hit INT
    DECLARE
    @ii INT
    DECLARE
    @Start INT
    DECLARE
    @Length INT
    DECLARE
    @KeywordLength INT
    DECLARE
    @Starta INT
    DECLARE
    @urls TABLE(url VARCHAR(255))

    DECLARE @SpamWords TABLE(spamword VARCHAR(30))
    --stock the spam-words table
    INSERT INTO @spamwords(spamword)
    SELECT 'Levitra' UNION SELECT 'viagra' UNION SELECT 'casino' UNION
    SELECT
    'cialis' UNION SELECT 'nude' UNION SELECT 'tramadol' UNION
    SELECT
    'phentermine' UNION SELECT 'xanax' UNION SELECT 'alprazolam' UNION
    SELECT
    'amoxicillin' UNION SELECT 'xxx' UNION SELECT 'porn'

    DECLARE @commentstart TABLE(word VARCHAR(30))
    ---stock the comment-start table
    INSERT INTO @commentstart(word)
    SELECT 'interesting' UNION SELECT 'cool' UNION SELECT 'sorry' UNION
    SELECT
    'nice'

    DECLARE @keyWords TABLE(keyword VARCHAR(30))
    --stock the keyword table
    INSERT INTO @keywords(keyword)
    SELECT '.html' UNION SELECT 'free' UNION SELECT '?' UNION
    SELECT
    '&' UNION SELECT '.info' UNION SELECT '.pl' UNION
    SELECT
    '.de' UNION SELECT '.cn'

    DECLARE @URLStarts TABLE(start VARCHAR(30),offset INT)
    --stock the URL Start table
    INSERT INTO @URLStarts(start,offset)
          
    SELECT 'HREF=',5--unquoted
    UNION SELECT 'HREF="',6--quoted
    UNION SELECT 'HREF=" ',7--I've seen this trick in spam
    UNION SELECT 'HTTP://',0--not in an anchor
    UNION SELECT 'HTTPS://',0--not in an anchor
    UNION SELECT 'mailto://',0--not in an anchor

    --Get the length of the comments and initialise things
    SELECT @LenComment=LEN(REPLACE(@comment,' ','|')),@points=0,
      
    @ii=@LenComment,@links=0
    WHILE @ii>0--find every URL in the comments and put them in a table
      
    BEGIN--check for the next HREF, possibly 'quoted'
      
    SELECT @start=0
      
    SELECT TOP 1 @start=hit,@keywordLength=offset FROM (
          
    SELECT [hit]=PATINDEX ('%'+start+'%',RIGHT(@comment,@ii)),offset
                  
    FROM @urlStarts)f
        
    WHERE hit >0 ORDER BY hit ASC, offset DESC

       IF
    COALESCE(@start,0)=0 BREAK--no more?
       --so we isolate the actual URL in the anchor
      
    SELECT @Length= PATINDEX ('%["> ]%',
                  
    RIGHT(@comment,@ii-@start-@keywordLength))
      
    SELECT @links=@links+1,--increment the URL tally
              
    @Length=CASE @length WHEN 0 THEN @ii ELSE @length END
                                                          
    --no termination?
      
    INSERT INTO @urls(url) --add to our URL table
              
    SELECT LEFT(SUBSTRING(RIGHT(@comment,@ii),
                  
    @start+@keywordLength,@Length),255)
      
    --and reduce the length of the string we look at past the URL
      
    SELECT @ii=@ii-@start-@keywordLength-@Length
      
    END
    --How many links are in the body?  if more than 2 then -1 point per link
    -- if Less than 2 then +2 points
    SELECT @points=@points+ CASE WHEN @links < 2 THEN 2 ELSE -@links END
    --How long is the comment? More than 20 characters and there's no links
                                                           --then + 2 points
    -- Less than 20 characters then -1 point
    SELECT @points=@points+CASE WHEN @links=0
          
    AND @lencomment>20 THEN +2 ELSE -1 END

    --Number of previous comments from same ID Approved comments +1 point per
    -- Marked as spam  -1 point per

    --Keyword search in body of comments (viagra, casino, etc.) -1 point per
    SELECT @points=@points-COUNT(*)
              
    FROM @spamwords WHERE CHARINDEX(spamword,@comment)>0

    --URLs that have certain words or characters in them
    --                                .html, .info, ?, & or free -1 point per
    --or URLs that have certain TLDs .de, .pl, or .cn (sorry guys) -1 point
    SELECT @points=@points-COUNT(*)
              
    FROM @keywords INNER JOIN @urls ON CHARINDEX(keyword,url)>0

    --URL length More than 30 characters -1 point
    SELECT @points=@points-COUNT(*)
              
    FROM @urls WHERE LEN(url)>30
    --Body starts with... Interesting, Sorry, Nice or Cool. -10 points
    SELECT @points=@points-(10*COUNT(*))
              
    FROM @commentStart WHERE CHARINDEX(word,@comment)>0

    --Random character match 5 consonannts -1 point per
    SELECT @ii=@LenComment
    WHILE @ii>0
      
    BEGIN
       SELECT
    @hit= PATINDEX ('%'+REPLICATE('[bcdfghjklmnpqrstvwxyz]',5)
                           +
    '%',RIGHT(@comment,@ii))
      
    IF @hit=0 BREAK
       SELECT
    @points=@points-1, @ii=@ii-(@hit+5)
      
    END
    RETURN
    @points
    END

    SELECT
    dbo.fnSpamScore('this is a perfectly legitimate comment that
    points out that phil''s code is horribly broken due to him being called
    out for a beer half way through writing it.'
    )
    --  +4
    SELECT dbo.fnSpamScore('Cool. Buy herbal viagra at http:\\DodgySite.cn
    and impress your neighbours.'
    )
    --  -7

    The system seems pretty effective. If a spam slips through, or a legitimate comment falls foul of the system, you can tweak it very simply . It is in place in the forthcoming Simple-Talk/SSC Wiki which I'm currently helping to build.

    As a strange by-product of developing this, I've taken a sudden liking to spam, purely to test the efficiency of the algorithm. Unfortunately, Simple-Talk is a martyr to it, but I can't find a way of plugging this routine into community server. Still, it means we have a vast test-bank of the stuff to check the routine with!

    It occurs to me that there must be a number of ways of writing the SQL code to Jonathan Snooks algorithm. What is the fastest and most effective way? Perhaps we should have a 'Lionel-style' competition.

    For anyone without Firefox who can't copy n' paste, here is the source file

  • The Glittering Concrete

    Posted Wednesday, April 02, 2008 11:01 AM | 0 Comments

    I called in to Red-Gate the other day to see Andrew Clarke, the Simple-Talk subeditor. As I arrived, he was excitedly rummaging in Tony Davis's shoebox. This shoebox contains several contributions that time forgot. When Tony forgets one of my articles, it is usually his polite way of saying that it is beyond even his powers of rescue. What treasures lie therein? As we chatted, Andrew pulled out the following gem from an anonymous contributor whose identity is lost in the mists of time. I can see entirely how the healing process of amnesia kicked into Tony's conscious brain, but I feel that, at long last, it should be given life.

    It goes as follows...

    My Visit To Red-Gate

    After all that time using SQL Compare and SQL Data Compare, it was exciting to get invited to go to Red Gate's prestigious International Headquarters in Cambridge, to give a short talk on 'Advanced Paging in SQL Server 2008'.


     Ah, Cambridge. What thoughts well up in one's minds' eye. What images are conjured up? The stone cloisters, the glittering spires, musty bookshops, vibrant Cafe-culture, and college chapels. Keen students in scarves and blazers, riding on bicycles, the sound of distant church bells.


     I eagerly imagined Red Gate's offices. The ivy clad walls, the oak bookcases, oil paintings of the founders severe in their Victorian dignity, the shambling donnish figures in tweed jackets and bottle glasses clutching well-thumbed books. A few museum cases lit by gothic stone windows.


      So it was rather a shock to arrive at the Science Park, on the Milton road, full of concrete and steel architecture. The Jeffries Building was a typical bold, uncompromising, modernist statement in chrome and glass by, one imagines, a little known architect determined to make a name as a radical force in the profession. I stepped up to the door which didn't open. A sign indicated that I should go to a door at the side, where were placed a number of bell pushes and a speakerphone grill. Inside the building, through the glass, I could see  a deserted hallway, lightly furnished with 'calming potted trees' , made of plastic, from the Viking Catalogue. Strangely, the fallen leaves in the hall were real.


     I may be sneered at as a country cousin, but I know what to do in such circumstances. You press the right button and, eventually, a strange squeaky voice like an owl being strangled comes from the grill. You say your name, there is a disembodied click, and the door opens.


     I stumbled across the hall to the toilet. Now, here is a tip for any visitor eager to impress the Red Gate people he has come to see. Be careful with the taps of the basin. Turn the tap on with anything less than extreme caution and a Niagara Falls of water sprays the front on ones trousers in an uncompromisingly modernist way. My trousers were drenched. I stared in the full-length mirror in horror: it looked as if I had just had an episode of severe incontinence.


     After hurriedly dabbing myself down with paper towels, I felt ready for the next task: finding Red Gate. No sign visible.


     This new task turned out to be easy, however. I simply resolved to follow the first geeky-looking person I spotted. Luckily, I stumbled across a perfect specimen, like a 'Far Side' cartoon made flesh. He wandered across the hall lost in strange esoteric thoughts and led me, inexorably, to Red Gate.


     Now, any follower of the old blog of  'Disgruntled Brian'  might mistakenly expect the Red Gate offices to be ringing with diabolic laughter as marketing people hatched duplicitous, cunning, plans, the air reeking with sulphur and the crackle of static electricity.   Unfortunately the marketing department were all out that day on a team building excercise. Instead, the sunlight flooded through the gaunt steel windows, illuminating a tranquil, meditative, scene as deskbound geeks tapped away thoughtfully at several keyboards and screens at once, as they constructed huge, complex applications


     Still dripping slightly down my trousers, I made my way to the stairs at the back of the office to get to the meeting room upstairs where my talk was to be held, nodding amiably at the Geeks.  The door closed behind me with a click. I went upstairs. The door on the first floor required a swipe card. No human effort would persuade it to open.  It was immovable. I retreated back downstairs. To my dismay, I realized that, to get back in, also required a swipe-card! Trapped in the stairwell. Through the window of the door, I  could see the Geeks placidly keying in intricate code. I tapped on the door, but their concentration was complete, and their iPods were on . After ten minutes, a startled geek found me sitting disconsolately on the stairs as the audience for my talk sat stolidly in the meeting room upstairs, talking amongst themselves.


     Back in the office, the next task was to get a swipe card so I could move unhindered through all doors. This was given to me by a charming lady who apologized for my predicament.  Back up the stairs I went, and presented the card smartly to the offending door, only to be assaulted by a series of beeps followed shortly by a loud ringing noise. I  was aghast. The Architect of the building, as an act of vengeance on a humanity he obviously despised, had put two sensors on the door, one of which opens the door, and the other of which sets the burglar alarm. In my haste to get through the door to start my talk, I got it wrong. The alarm had gone off, and as an added bonus, also triggered a buzzer in the local police station. Everyone was very nice about it.


     And so it was, that fifteen minutes late, dripping from my trousers, and leaving chaos downstairs as the car-park filled with patrol cars, I burst into the meeting room, to be met by the startled gaze of my audience.


     Someone leapt to his feet and shook my hand warmly. 'Did you find the place easily?' he asked affably. I stepped back to shake his hand . My back  struck the partition wall of the meeting room. It made a crashing noise like the last trump, and woke a nearby dozing  support engineer in the open-plan area outside the meeting room.  Various  sales people looked up, startled, from their work. 


    'Oh yes, fine, no problems at all', I replied.

     


  • The Technically minded subclass, and the fog of misperception.

    Posted Wednesday, February 06, 2008 6:45 PM | 2 Comments

    I spent several years in a team that advised a large international manufacturing company on their software-purchasing strategy. It always amazed me how far the software companies misunderstood our core requirements, but never really took the time to find out what they really were. All their information about us was gleaned from their own non-technical sales force whose scientific knowledge stopped short of ruling out the existence of fairies. And even these guys talked only to our senior management who understood nothing of the details of what went on in their empires, but were happy to bluff their way through in return for a hearty lobster lunch. The technically minded subclass who actually understood the business processes and the technical details of product development were treated with general contempt. They were usually kept concealed from visitors due, it is said, to their bad-tempered candour and their propensity for blurting out embarrassing details of management mistakes.

    One Hardware and software supplier, in particular, based their continued prosperity on supplying the company. We were visiting their development Labs one day when they excitedly guided us into a large room where a number of developers were working on a wonderful and exciting top-secret project. They’d been at it for two years. Basically, it was a system for modelling complex castings on the computer, and putting all sorts of stresses on them to see how the structure would flex; based on finite element analysis. For two hours, expert after expert presented aspects of the application. It would, they told us, save huge sums of money in building prototypes, and test equipment. It would revolutionise our manufacturing processes. At the end, there was an expectant hush, and all eyes were on us to respond. I said that any company who purchased the system would have a handy tool, but that it would be no use to us. There was an amazed silence. I asked, in some alarm, who on earth had they had in mind as customers for the product?. 'Why you, of course', they replied pathetically. We then had to explain that we already had a CAE system that was far more sophisticated than the one that they’d shown which was able to take development of components from concept through to development, prototyping, test, costing, release and manufacture, all from the same basic wire-frame data model. It was designed to allow multidisciplinary teams in different organisations and countries to work together on complex design processes. Their application could never be made to fit into the way we did business.

    There were emotional scenes. They’d spent two years, and enormous expense (investment they called it, somewhat optimistically), doing this work with only us as customers. They had created purpose-built graphical workstations. They had relied only on the occasional remark by senior management of our company, but had never thought to penetrate through to the people who were actually doing the work. We left rather hurriedly, and they were immediately on the phone to those managers in our company who had lulled them into proceeding with the project. Their initial ploy in maintaining that we were mere running dogs, incapable of understanding corporate strategy, didn’t wash. In the end, everyone had to admit that this was another project that would bite the dust.

    This story is re-enacted over and over again, with different details, in government and industry throughout the world. Almost without exception, the same mistake is made every time. It is not entirely the case that development team doesn’t do their research first, although cutting code is much more fun than understanding in detail what the application needs to achieve. More commonly, they do not have the experience to identify where the real knowledge about business requirements and processes lies. They never penetrate the fog of misperception that exists in every large company.At an intellectual level, it is obvious that IT initiatives have to be fit for purpose, but emotionally, there is always the siren voice saying ‘This time it will be different, we can evolve prototypes and elicit the full requirements along the way; we can re-engineer business processes, rather than to fit with the existing ones.’ It never is different. Sadly, history just goes on repeating itself.

  • Actionable waffle

    Posted Monday, February 04, 2008 8:08 PM | 3 Comments

    One of the funnier malapropisms that have recently emerged from Microsoft’s Marketing arm is the word ‘Actionable’ . You can see how it happened. They wanted a word that sounded vaguely a ‘good thing’ without being too precise. We all like things we can act on don’t we. ‘Actionable' sounds good eh?

     

    ‘… to provide IT managers with an actionable set of prescriptive guidelines for improving operations…’

    (http://www.microsoft.com/Downloads/details.aspx?familyid=A030626B-C2E4-4D7C-AB75-832D360C86ED&displaylang=en)

    ‘… Deliver Actionable Insight Throughout Your Organization with Data Mining…’

    (http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032355543&CountryCode=US)

    ‘… Actionable audience intelligence and effective targeting technologies are essential for marketers.’

    (http://advertising.microsoft.com/research/travel-marketing-i)

    ‘… delivered real-time actionable information needed to help grow a business through search engine…’

    (http://advertising.microsoft.com/uk/advertising-events?Adv_EventID=97)

     ‘…It offers easy inventory, powerful assessment and actionable recommendations for Windows Server 2008, …’

    (http://connect.microsoft.com/site/sitehome.aspx?SiteID=297)

     

    Oops.  Typically of the purveyors of word salad, they didn’t check their dictionaries first. In this case they got it comically wrong.

     

    Actionable has a definite and precise meaning in both American English and in Queens English that is well known in the Legal Profession.  If you say, or do, something Actionable, it means you could be sued for it.  To quote from Webster, ‘furnishing grounds for a lawsuit’.  www.TheFreeDictionary.com talks of Actionable  meaning  ‘affording grounds for legal action’, giving the example  "slander is an actionable offense", and gives the synonym ‘unjust - violating principles of justice’. The Oxford English Dictionary yields only one meaning: ‘Subject or liable to an action at law. Of such a character that an action on account of it will lie’, and gives examples going back to the seventeenth century. To be fair, recent Websters have listed an ancillary meaning of ‘capable of being acted upon or readily used’, but it may just be a response to a frequently used malapropism. No, if you key the word ‘actionable’ into the Wikipedia,  you are straight into  the realms of Litigation.

     

    So sit back and read the marketing blurb with a lighter heart and a new cheerfulness. They are, unintentionally, telling you that the stuff they’re trying to sell you will get you sued!

  • Microsoft Boy announces his School Homework

    Posted Sunday, January 27, 2008 5:21 PM | 16 Comments

    Continuing in our series of attempts to imagine how Microsoft Marketing people relate to their fellow men outside work, we give you a glimpse of Microsoft Boy at school, before the start of his splendid career at Redmond.

    Scene: The History lesson in school. The teacher wearily calls Microsoft Boy to his desk to try to discover where his homework is.
    _______________________________________________

    Teacher:

    "Well, young William, (looks over his glasses severely) where is your homework? It should have been handed in today, I'm afraid."

    Microsoft Boy:

    (with a smug ingratiating smile redolent of sincerity) "The past week has been an amazing time for the me as I geared up to announce the delivery of my essay. The response to my announcement from friends and parents has been overwhelmingly positive – in fact, even my aunt Edith wants to read it. What is catching users' eyes? Legibility, correctness, conciseness….the list goes on and on. Simply put, this history essay is a significant release for me – one that builds on all of the great things that I was able to deliver last year in the Lower fifth. I see it as a critical step forward for my academic life here, and the foundation of the broader vision for my school career. Based on what we are hearing from people who have seen the current version of my essay, it seems that everyone agrees."

    Teacher

    (impatiently) "Well, that may be the case, but you haven't actually handed your work in. Where is it for heavens sake? The others have managed to hand their work in!"

    Microsoft Boy:

    (earnestly) Not surprisingly, one of the top areas of focus for me is always to deliver high quality homework, and in a very predictable manner. This is vital for my dazzling school career – which is why I've frequently discussed my goal of releasing my history essay within three months of the last one. I am on track to reach this goal. (folds his arms with a smile of achievement)

    Teacher: (whilst rustling about, searching on his desk)

    "I don’t see it, I really can't find your essay on my desk. It was supposed to have been handed in today."

    Microsoft Boy: (sensing something not quite right in his relationship)

    "To continue in this spirit of open communication between us, I want to provide clarification on the roadmap for my essay. Over the coming months, you, and the other teaching staff here can look forward to significant milestones in the delivery of my homework.  I am excited to deliver a release candidate of the essay in a month's time, at Scout Camp, with final Release of the entire homework expected in another couple of months. My goal is to deliver the highest quality History essay possible and I simply want to use the time to reach the high bar that you, my teacher, has set."

    Teacher: (Head in hands, dispairingly)

    "I really don't understand. Have you handed in your homework or not?"

    Microsoft Boy

    "I have not, in any way, changed my plans for launching the essay today. What I have done today is to announce to you the delivery of my essay, and I'm proud to have met this target. Please keep the great feedback coming and thank you again for your ongoing support of my  'best-in-class' academic work!" (Proudly walks out of the classroom)

  • The Three Little Pigs in Java

    Posted Tuesday, January 22, 2008 2:19 PM | 1 Comments

    Once upon a time, in a land .far off  to the west of Krakatoa called Java, .there lived three little pigs. They were called by the unique identifiers John, jOhn, and johN. The first little pig was known to his devoted parents as piglet[0], the second as piglet[1] and the third as piglet[2].

    Unfortunately the pigsty was too small, due to inadequate requirements-gathering at the time of construction, a familiar anti-pattern. The old sow said, ‘Now my dears, upon reflection, we seem to have a scalability problem. We need therefore create a distributed architecture. I have no inheritance, so you must to go off into the woods and build your own homes.. But, whatever you do, make them resilient and beware of the Big Bad Tester, for he will huff, and puff, and blow them away’.

    So the three little piglets all went tearfully out onto the road to look for a suitable platform on which to base their architecture. The first little pig met a man with some java beans.

     “Would you like to buy some of my beans in return for a cow?”

    “No fear, My mother told me the story of a little pig called Jack who bought some of your beans, had a big scalability issue with the resulting tree structure and had Gigantic problems as a result.’

    Then the little pig,  piglet[‘John’]  ( or piglet[0]), met a man with a cart of straw. ‘This looks to me to be a component that can be manipulated visually as a builder tool.’ He said gleefully. I can make a nice property with this! So he bought the straw, and extended the house base class with straw, using late bindings, and settled down contentedly in a local field, relishing the persistence of his house object.

    Then along came the tester, who said ‘Is this house of production quality? The little pig said, ‘Of course, lets’ ship it, by the hair of my chinny chin chin’.. So the tester smiled with glee and caused an unhandled exception by huffing and puffing and causing an array-out-of-bounds. Piglet[0] wiped the straw from his hair, exclaiming ‘It must be a hardware fault. I haven’t touched that module in weeks’. Perhaps he should have used curly braces.

    The second little pig wandered down a lane and came across a farmer hauling a load of Sticks I shall build a fantastick house! He cried, and bought the sticks from the farmer. The house class had multiple constructors, and he overrode the wall methods for sticks..

    After great effort, he sat down in his house of sticks, contented. At that moment the big bad tester knocked on the door. ‘Little Pig, Little pig, let me come in.

    “No, No, said the little pig, By the hair of my chinny chin chin, I will not let you come in. The door is a critical section with an exclusive lock.’

    “But we have to check the house’s resilience.” Pleaded the Tester,”It needs to undergo stress-testing”’.

    “..but I’ve used Agile Methodology in developing this house!”, the pig expostulated.

    But the Tester just laughed. He then huffed and he puffed, causing a null reference exception thrown up to the top level thereby causing premature object decomposition.

    Piglet[1]  ran squealing from a pile of sticks exclaiming “I thought I fixed that. Somebody must have changed the code”.

    The third little pig, piglet[2], decided that his two brothers had practiced a number of antipatterns, and determined to select a resilient architecture with bricks. As you generally hear far more about the successful developments in Java, I suspect I have no need to go into the details because you'll have already been told them. You will know that the Testers even climbed down the chimney to try to cause the system integrity to fail. All to no avail.  The Big Bad Tester had an unfortunate accident whilst regression-testing the roof Struts, and the third little pig lived happily ever after.

  • A great programmer.

    Posted Friday, January 18, 2008 5:01 PM | 3 Comments

    I sat back in my chair the other day, and my eye happened to fall on a bookshelf with some old A4 binders in it. The brain works in mysterious ways, well mine does certainly. I'd been brooding recently on the thought that the real breakthroughs in software are all made by individuals, not teams. Teams, by their very nature, try to perpetuate the status quo, not change it. I was struck with the strange urge to see what was in those old A4 binders.

    A dimly remembered individual had, in 1980, carefully and neatly bound the documentation and source code of a program called STOIC, written in 1977. That person was me. It came flooding back, the memory of that wonderful craftsmanlike code that I used to pore over. It taught me so much. STOIC was a language, like a rationalised and tidied dialect of FORTH. STOIC code was written in RPN, and most of the language itself was written in STOIC with just a small kernel in machine code. The program had been written on a Nova Minicomputer and Cross-assembled for the new 8080 chip, ancestor of the current Intel and ADM range. It was written by John Sachs the MIT and Harvard Biomedical Engineering Centre in Boston, (part of the Health, Science and Technology Division) in February 1977. It had a built-in Operating system, assembler, floating-point package, interrupt handler, and display editor. It was a work of brilliance. I wondered what happened to John Sachs, programmer. Is he now a grey bearded professor at MIT? I had the urge to contact him and thank him for the great help he gave me when I was a novice programmer.

    A quick 'google' brought up the fact that the famous programmer who created the first modern spreadsheet, Lotus 123, and thereby launched the first PC revolution, was also called Jonathan Sachs. (Visicalc was the first spreadsheet as such) Single-handed, he had worked for almost a year, crafting it from machine code. When launched, it was bug-free and ran on any old PC. It sold so fast that it caused a shortage of ring-binders for documentation. Jonathan Sachs had been a co-founder of Lotus Corp. Sachs, left Lotus in 1985, less than three years after the program was launched. He now develops photo-editing software such as PictureWindow for his own company, Digital Light and Color.

    I was curious. Could this be the same programmer who, five years after writing STOIC, went on to write what is generally acknowledged to be the definitive PC application? I was so intrigued that I asked him via email. The reply came back 'Yes, I am the same one'. So, thirty years later than I should have done, I thanked him for writing such wonderful code as was in STOIC. I read all through it again. It was just as great as I remembered it.

    It set me thinking that behind many of today's applications lay stories of individual feats of creativity and intellectual endurance. It would be a great idea to try to tell some of these stories and dispel the myth that truly great applications could ever be written by a committee.

  • Taking the Soup

    Posted Wednesday, January 02, 2008 11:04 PM | 4 Comments

    A while back, I had to review a book by an ‘award-winning’ author. It was an excellent book. I phoned the writer to get some background information and to try to ascertain whether my assumptions about the author’s background knowledge were right. ‘That award you won… What was it?’

    ‘Oh, just one of those awards you know, like the Whitbread, or the Booker.’

    ‘…and its name?’ I pressed, baffled by his vagueness,

    ‘Well, if you must know, I awarded it to myself,’ He confessed.

    I remarked on his initiative.

    ‘Yes, the established literary cartels were slow in coming forward to recognise new and interesting talent, due to their entrenched self-interest, so I pre-empted them and awarded myself a literary prize. It has done wonders for my book sales and everyone seems impressed. Even my mother refers to it to her friends.’

    We moved on to other topics, but before we ended the call, he said ' You know, there were two strange things about that award, …Firstly, after I awarded it to myself, I felt oddly elated, as if some august academic body had suddenly realised my true worth as an author and had strained every sinew to ensure that my talent was acknowledged. ' ’

    Pause

    ‘… and what was the other strange thing?’

    ‘You are the first person ever to have asked me precisely what award it was that I’d won. Everybody else has just taken it for granted.’

    ‘I work in IT. It makes one cautious of trusting qualifications and awards.’

    We parted on excellent terms. He recently sent me his latest book, a bestseller by an award-winning novelist.

    Awards are tricky things to get right. For an award to be universally supported, it has to have a transparent and scrupulously independent selection process. Nobel, Booker and Whitbread occasionally get it right, but then you’d be very hard pushed to find a Nobel prize-winner for Literature in your bookshelf or made into a Movie. Elias Canetti? Toni Morrison? Pearl Buck? Heinrich Böll?

    Professional awards in general are a minefield. The checks and balances that have to be put in place in order to make the selection and nomination process fair and visible are Byzantine in their complexity. The election processes are always constructed to prevent any hint of a possibility of unfair influence.

    I find the Microsoft MVP ‘award’ troubling. This is nothing to do with the MVPs themselves. I have many good friends who are annually, and with excellent judgement, awarded MVP status for their excellent work. People who are MVPs seem to know their specialised subject well. For all you know, I may even be an MVP myself. That has nothing to do with the fact that the MVP ‘gong’ itself is an insidious thing. The M at the beginning is enough to impress anybody’s aunt. MSc, MBE, MBA, MVP. It looks like it is a Master of something or other. It masquerades as an academic or professional award. Traditionally, MVP was the ‘most valued player’ in a baseball team, chosen by the rest of the team. In IT, the MVP is promoted and funded by one commercial organisation, and the final selection of MVPs is made by employees of that organisation in closed session. The M either stands for ‘Microsoft’ or ‘Most’, depending on who you ask.

    ‘Each year a panel of Microsoft employees reviews the contributions of each nominee for quality, quantity, and level of impact on the technical community.’ (from the MVP website)

    The award is given annually, and a recipient cannot claim the ‘qualification’ after the year is up unless re-nominated and re-selected, the ‘Lord giveth and the Lord taketh away’. It is, I suppose, a good way of ensuring ‘the expectations of courtesy, professionalism, code of conduct, and adherence to the community rules’. However, ask yourself this, could it also be a way of pitching the loose canons overboard? 

    Imagine that Microsoft has decided to take yet another slice of the IT market, in an area previously dominated by another player. Let us, for the sake of argument, imagine that Microsoft has produced a product, we'll call it Silverlight for the sake of argument, which takes on Adobe's Flash. This isn't our concern. We can look at the two offerings, and look at Adobe AIR, the commercial response to Silverlight, and make a judgement, based on merit, on how the two products fit with requirements. However, as an MVP, can one then blog on the superiority of the Adobe AIR package without the nagging thought that the panel of those Microsoft employees are going to make sucking sounds through their teeth, shake their heads sadly, and move onto the next nominee the next time your name comes up?

    ‘Eloquent, yes, but is he really singing from the same hymn sheet? Is he really the sort of Microsoft Valued Professional we want?’ The influence of Marketing has an insidious way of penetrating ‘Chinese walls’.

    In reality, Microsoft would never actually have to ‘pull the choke-chain’ at all. There will always be that small lingering doubt in the mind of the MVP who wants to stay an MVP. ‘Would that upset them?’ It is the whole basis of the award which is wrong rather than the way it is implemented and run.

    I realise that the vast majority of people who receive the award have put good selfless service into assisting the community of people who are using Microsoft products. Nobody is saying that the award is equivalent to Reginald Molehusband’s CDM. Can, though, anyone who puts the letters MVP after their name be, in addition,  accepted as an independent industry expert, or do the three letters just mean that the person has 'got the ring though the nose’?

    Naturally, the answer is ‘The thought of Microsoft’s reaction makes not a hoot of difference in what the average MVP will do or say. The letters MVP mean that the recipient has worked hard to support the community, rather than act as an unpaid cheerleader for Microsoft. The award merely shows that the recipient knows the particular Microsoft product he got the award for, and has used it to help the community of users: nothing more.’ Why then dress it up to make it look like more than it really is? Sadly, perception is everything, and it could well be the perception of the IT indiustry as a whole that you’ve ‘Taken the Soup’.

  • A temporary inconvenience

    Posted Monday, November 12, 2007 11:08 AM | 5 Comments

    Here is an interesting interview question. You have a PC in front of you, switched off,  with a database on it. You don’t know any of the passwords and you want to get at the database. Is this possible? If so, then how?

    This happened to me recently, due to a freakish accident concerning me reacting stupidly and impetuously to the death of a domain. I was left with a development database I had to get to urgently, (Backup of development work? Of course, on the local hard disk!) and I had no idea of any of the passwords. Normally, I'd never have bothered to find out by trying.

    In my case, it was ridiculously easy, once the feelings of panic had subsided. I just downloaded a utility from the internet that blanked out all the Windows passwords. Because the BIOS was not secured by any password, I could boot up with a CDROM, blank out the Windows passwords, and then, once more, I was god in this little PC world. At first, I stopped the SQL Service and copied the MDF files off and re-attached them to another SQL Server. Then I realised that I had gained admin rights to the database anyway through a local account. If all else had failed the backups weren’t encrypted anyway, so I could have got at them without any bother.

    I was just chucking to myself over a cup of coffee about my foolishness in getting in a panic about losing the database. It then occurred to me how wise it is to treat server rooms like forts. I could immediately think of several commercial databases with unsecured BIOSs.

    The problem with Database Developers and DBAs dealing with security issues at this level is that they have the wrong mindset. Finding security loopholes is a job for a different sort of thinking. The best security experts I know have a built-in malicious streak. They are like hunters that thrill to run down, and kill, a beautiful wild creature.

    In the meantime, we innocents carry on believing that intruders cannot get at our data by gaining admin rights to the database. I realise that most production servers are properly nailed down and their server rooms secure and monitored, but for the rest of us, maybe it is time to think again.

  • On the trail with the Cowboy Coders.

    Posted Wednesday, October 24, 2007 11:43 AM | 7 Comments

    One of the signs of increasing age in the IT industry is that of 'déjà vu'. New things that are laboriously explained to you ring all sorts of bells. The past flashes before one's eyes.

    I visited a company developing Internet-based applications the other day. They were proud of their progressiveness and explained to me that they developed database systems using the radical new Agile XP (Extreme Programming) methodology. Wow, I thought. That must have impressed the shareholders. Perhaps they don't realize that Agile XP is known in the industry as 'Cowboy Coding'.

    I feigned ignorance, which is not usually hard for me to do.

    'Well,' said my genial and pleasant host, 'one of the great things about Agile XP is that one tests every part of your code as soon as you've written it and you don't go on to the next bit until it has been thoroughly tested. Flaws in the system are easily communicated by writing a unit test that proves a certain piece of code will break.'

    'Gosh! You mean you test your code rather than just write it?'.

    'Ingenious isn't it. When writing code, the unit test provides direct feedback as to how the system reacts to the changes one has made. If, for instance, the changes affect a part of the system that is not in the scope of the programmer who made them, that programmer will not notice the flaw. There is a large chance that this bug will appear when the system is in production.'

    'Funny how nobody has ever thought of this before.' My face goes red when I'm desperately trying to stop myself guffawing with laughter. What I love best about working in IT is its rich unintended comedy. My host misunderstood my visceral difficulties as excitement

    'Because you can never be certain that what you coded is what you meant XP uses what it calls 'Unit Tests'. These are automated tests that test the code. The programmer will try to write as many tests he or she can think of that might break the code he or she is writing; if all tests run successfully then the coding is complete, and the programmer can then go on to develop more code.'

    My eyes bulged. My face flushed red.

    Seeing how interested I was he proudly produced an elaborate Java-based module he'd written that provided a test harness for SQL queries and procedure. 'It is great, you can change the parameters in the query (Klop Klop Klop on the keyboard) just type in what results you expect', (Klop Klop Klop Klop), 'and then Run the test harness' (Klop Klop Klop Klop Klop Klop)

    Various windows flashed on the screen in front of me, full of inscrutable commands. It was all very 'Seventies'.

    Finally a message popped up from the depths saying 'OK'. He was so proud; it could have been his baby.

    'Look, all we have to do is to change SQL code a bit in order to simulate an error and you'll see what happens!'

    (Klop Klop Klop Klop Klop Klop)

    Wow. An error message popped up saying there was an error in the code.

    'To think that, for all these years, we've been doing it so stupidly' I sighed, shaking my head in mock-regret. I'm afraid I wasn't able to show him the way that a SQL Programmer like myself would go around doing the same thing in their humble rudimentary way. I hope I explained enough to suggest that that perhaps there has been some sort of primitive thought before the genesis of Agile XP.

    Before you write code, you write a test-harness. (it is often best to get someone else to do it, preferably someone with malice in their soul). When you write the piece of SQL code, such as a function, you append the test harness for unit-testing purposes. In some places I've worked, you'd be given a rough time at code-review if these were missing or inadequate. Here is an example…. (normally, you comment out the test harness when the code is 'put to bed' and the GO moved to the end, so that the harness stays with the code, or you can, in SQL Server, save it as an extended property.)


    -- =============================================
    -- Author: Phil Factor
    -- Create date:
    -- Description: Reverse Charindex. Gives the index
    -- INTO the string OF the LAST occurence OF the
    -- substring
    -- =============================================
    ALTER FUNCTION [dbo].[rCharindex] --reverse Charindex
    (
      
    -- Add the parameters for the function here
      
    @needle VARCHAR(80),    --the substring
      
    @HayStack VARCHAR(8000) --the string
    )
    RETURNS INT
    AS
    BEGIN
       IF
    @needle+@Haystack IS NULL RETURN NULL
      
    IF CHARINDEX(@needle,@haystack)=0 RETURN 0
      
    RETURN COALESCE(LEN(REPLACE(@haystack,' ','|'))
                 -
    CHARINDEX(REVERSE(@needle),
                          
    REVERSE(@needle+@haystack)
                           )
                  -
    LEN(REPLACE(@needle,' ','|'))+2,0)

    END
    GO
    DECLARE @Test VARCHAR(80)
    SET @Test='First test'----------------------------
    IF dbo.rcharindex ('this','This can NOT be a thistle ')
       <>
    19
      
    RAISERROR ('Failed the %s',16,1,@test)
    SET @Test='Second test'---------------------------
    IF dbo.rcharindex ('pr',
      
    '  May the lord preserve me from the priests of the IT industry  ')
       <>
    37
      
    RAISERROR ('Failed the %s',16,1,@test)
    SET @Test='Third test'---------------------------
    IF dbo.rcharindex ('l',
      
    '...who maintain an evangelical following.. ')
       <>
    35
      
    RAISERROR ('Failed the %s',16,1,@test)
    SET @Test='Fourth test'---------------------------
    IF dbo.rcharindex (NULL,
      
    '...by waving their hands a lot.. ')
      
    IS NOT NULL
      
    RAISERROR ('Failed the %s',16,1,@test)
    SET @Test='Fifth test'---------------------------
    IF dbo.rcharindex (' without their ties on.',NULL)
      
    IS NOT NULL
      
    RAISERROR ('Failed the %s',16,1,@test)
    SET @Test='Sixth test'---------------------------
    IF dbo.rcharindex ('','')
       <>
    0
      
    RAISERROR ('Failed the %s',16,1,@test)  
    SET @Test='Seventh test'---------------------------
    IF dbo.rcharindex ('','and looking sincere')
       <>
    0
      
    RAISERROR ('Failed the %s',16,1,@test)  
    SET @Test='Eighth test'---------------------------
    IF dbo.rcharindex ('but it adds to life''s rich humour','')
       <>
    0
    SET @Test='Ninth test'---------------------------
    IF dbo.rcharindex ('Klop','so perhaps one should be tolerant')
       <>
    0
      
    RAISERROR ('Failed the %s',16,1,@test)  
    SET @Test='Tenth test'---------------------------
    IF dbo.rcharindex ('Klop','KlopKlopKlop ')
       <>
    9
      
    RAISERROR ('Failed the %s',16,1,@test)  
    SET @Test='Eleventh test'---------------------------
    IF dbo.rcharindex (' Klop ','Klop Klop Klop ')
       <>
    10
      
    RAISERROR ('Failed the %s',16,1,@test)  
    SET @Test='Twelveth test'---------------------------
    IF dbo.rcharindex (' ','      ')
       <>
    6
      
    RAISERROR ('Failed the %s',16,1,@test)  

    SET @Test='Thirteenth test'---------------------------
    IF dbo.rcharindex (' ! ',' ! !!!  !  !!!     ')
       <>
    8
      
    RAISERROR ('Failed the %s',16,1,@test)  

    There are many cleverer ways of doing it but I blush to think of the stupid bugs that this sort of system catches, that slip through casual testing.

  • Enumerating Group Members

    Posted Tuesday, September 25, 2007 10:20 AM | 3 Comments

    
    /*
    
    Problem: You have a table like this, recording what people eat.

    jane   pie
    fred   sandwich
    fred   sausage
    fred   biscuits
    jane   cake
    Rajiv  pastie
    Rajiv Croissants

    You want it to look like this...

    jane   pie, cake
    fred   sandwich, sausage, biscuits
    Rajiv  pastie, Croissants

    or maybe even you have the latter and you want the former!

    Robyn and I showed how this sort of thing was done in the SQL Server
    Grouping Workbench
     under the title 'enumerating group members'. Let's
    use a slightly more complicated example to show how it is done....

    */

    CREATE TABLE #schedule
     
    (
     
    EventStart DATETIME,
     
    description VARCHAR(2000)
     )
    /* so now we can get some sample data into the table */
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '12 Oct 2007 10:00''Meeting with Bill'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '12 Oct 2007 12:00''visit Crawley site'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '12 Oct 2007 13:00''Lunch with Evelyn'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '12 Oct 2007 14:00''Review of CDW27'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '12 Oct 2007 17:00''admin'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '13 Oct 2007 10:00''Finance committee'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '13 Oct 2007 12:00''lunch with Bob'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '13 Oct 2007 14:00''Weekly SH meeting'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '14 Oct 2007 11:00''interviews'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '14 Oct 2007 15:00''office plan presentation'
    INSERT INTO #schedule EventStartdescription )
      
    SELECT '14 Oct 2007 16:00''performance reviews'


    /* now we produce the report that enumerates the group members */
    DECLARE @list VARCHAR(8000

    SELECT @List COALESCE(@list ',''') + '|' 
                     
    CONVERT(CHAR(11), eventStart113) + '| ' 
                     
    RIGHT(CONVERT(CHAR(17), eventStart113), 5
                     + 
    ' - ' description '|' 
                     
    CONVERT(CHAR(11), eventStart113) + '|'
    FROM #schedule
    ORDER BY eventstart
    /* we'll put the results in a temporary table so we can show the
    reverse process 
    */
    CREATE TABLE #DailySchedule
     
    (
     
    [date] CHAR(11),
     
    events VARCHAR(2000)
     )

    INSERT INTO #DailySchedule
      
    [date]events )
      
    SELECT 
        
    [Date] CONVERT(CHAR(11), TheDate),
        
    [Events] LEFT(membersCHARINDEX(',|'members ',|') - 1)
      
    FROM SELECT TheDate,
           
    'members' REPLACE(SUBSTRING(@list
                               
    CHARINDEX('|' dates.theDate '|'@list),
                               
    8000),
                           
    '|' dates.theDate '|''')
         
    FROM  SELECT [theDate] CONVERT(CHAR(11), eventStart113)
                 
    FROM #schedule
                 
    GROUP BY CONVERT(CHAR(11), eventStart113)
               ) 
    dates
        

    SELECT FROM #DailySchedule

    /* job done, but you may want to do the process in reverse. 
    no problem let's take the end-results and get back to the
    original */

    DECLARE @rollout VARCHAR(8000)

    SELECT @rollout COALESCE(@rollout'
    '
    ) + [date] 
                      
    ' ' REPLACE(REPLACE(events', ''
    [date] ' '), ' - ''|') + '
    '
    FROM #DailySchedule


    SELECT 
        
    [EventStart] CONVERT(DATETIMESUBSTRING(entry
                                            
    1
                                            
    CHARINDEX('|'entry) - 1)),
        
    [Description] SUBSTRING(entry,
             
    CHARINDEX('|'entry) + 1,
             
    8000)
    FROM SELECT [entry] LEFT(line,
             
    CHARINDEX(CHAR(13) + CHAR(10),
               
    line) - 1)
       
    FROM SELECT [line] SUBSTRING(@rolloutnumber 28000)
         
    FROM numbers
         
    WHERE SUBSTRING(@rolloutnumber2= CHAR(13) + CHAR(10
             AND 
    number LEN(@rollout)) - 2
         
    g
      
    h
    ORDER BY eventstart


    --Remember that you'll need a number table for this 
    CREATE PROCEDURE spMaybeBuildNumberTable
    @size INT=10000
    AS
    BEGIN
    SET 
    NOCOUNT ON
    IF 
    NOT EXISTS (SELECT FROM dbo.sysobjects
      
    WHERE id OBJECT_ID(N'[dbo].[Numbers]')
       AND 
    OBJECTPROPERTY(idN'IsUserTable'1)
        
    BEGIN
        CREATE TABLE 
    [dbo].[Numbers](
         
    [number] [int],
        
    CONSTRAINT [Index_Numbers] PRIMARY KEY CLUSTERED
        
    (
         
    [number] ASC
        
    ON [PRIMARY]
        
    ON [PRIMARY]

        
    DECLARE @ii INT
        SELECT 
    @ii=1
        
    WHILE (@ii<=@size)
         
    BEGIN
         INSERT INTO 
    NUMBERS(NUMBERSELECT @II
         
    SELECT @II=@II+1
         
    END
        END