SQL Style Habits: Attack of the Skeuomorphs

Although we like to think that our programming techniques are progressive and in tune with the bleeding edge of software development practices, too often they are directly influenced by restrictions faced in the post-war decades when computers
first became mainstream. As these restrictions no longer apply, is it time to relinquish such things as cursors,
'tibbling', storing display formats, and using short names for symbols and primary keys?

I love the word “skeuomorphism” because it sounds so weird and has a lot of Scrabble points. The dictionary definition is that a skeuomorph is “a derivative object that retains ornaments from structures that were necessary in the original”, and the term is usually applied to physical design. For example, something was once made in wood, but now we make it in plastic; rather than use the properties of plastic, we keep the colors and woodgrain textures of the original wood.

However, things tend to find their own voice over time and evolve into a new creature. The first cellphone ring tones mimicked the traditional desk telephone bell; later you got to pick from a list and use the tones as a signal of the caller’s identity: “Chopin’s Funeral March” for your boss, something romantic for a lover, and so forth. The problem is that the old skeuomorph too often hangs around as a vestigial organ. It is not just physical things, like the watch pocket on Levi’s jeans (http://www.levisguide.com/watchpocket/watchpocket.html). It includes pieces of your mindset that you have picked up, often without knowing what they are or why they exist. I now have to tell an old joke:

A young woman was preparing a ham by cutting off the protruding ham bone before putting it in the oven. Her friend asked her, “Why did you cut off the bone?”

And she replied, “I really don’t know but my mother always did, so I thought you were supposed to.”

Later when talking to her mother she asked her why she cut off the bone before baking it, and her mother replied, “I really don’t know, but that’s the way my mom always did it.”

A few weeks later while visiting her grandmother, the young woman asked, “Grandma, why is it that you cut off the ham bone before you bake it?”

Her grandmother replied, “When I was first married, we lived in a small apartment with a tiny stove. The oven was so small you had to cut off the ham bone to get it to fit and cook it a long time.”

Mainframes & Tapes

This rambling vocabulary lesson leads us to our “first small apartment”: mainframe computers and the FORTRAN I and COBOL programming languages. Most readers today have never even seen a mainframe.


Fig 1: IBM System 360 Model 65 administered to by Mark 1 geeks.

Originally, this hardware used magnetic tape drives and punch cards. If you want to look at EBCDIC, ASCII, Fielddata, TTY and the several other competing binary encoding schemes that were in use in those days, get a copy of Coded Character Sets: History and Development by Charles E. Mackenzie (ISBN 0-201-14460-3). The important thing about a magnetic tape is that it is a sequential-access-only file structure, with records made of physically contiguous fields.

In order to know where the read/write head of the tape drive is located on the tape, the system has to count the physical records that have been read. The first tape drives mimicked punch card readers, in that they read in one direction only. Old T-SQL programmers will remember the early Sybase implementations, whose cursors could read forward only.

When the read/write head is placed on the start of a record, it increments a counter and looks at a flag. The flag tells the system if the record is active or not. Think about the alternatives. You could write over the record with zeroes, but that takes time. You could physically move the active records down toward the start of the tape, but that takes insanely more time. Tapes are not random access media.

This is where the concept of a Master file or tape comes in. We have one tape that holds the current data. On a regular schedule, we take other tapes with new data, sort them in the same order as the master and merge them into a new Master. The old Master and the transaction tapes are then archived.

Tape is cheap, but it is bulky. Managing the tapes required a tape library program and human operators to mount and dismount the tapes from drives. The reels of tapes had physical labels, and the most common labeling system was based on the ISO-8601 ordinal date. But in those days we only used the year-within-century for the first two digits; this was part of the “Y2K problem” that scared us back then, but that is another story.


Fig 2: ‘Since punch cards have 80 physical columns in them, these records also had 80 columns.’

There were also header records on the tape. Vendors had various header record formats, but there was a standard (http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-013.pdf). Since punch cards have 80 physical columns in them, these records also had 80 columns. They gave the size of the actual data records, how they were blocked, and other things you needed to know to read the data off the tape. Today, we would call this metadata. But besides the physical layout descriptors, there were things like “creation date” and data source information.

Now go to any SQL forum and text-search the postings. You will find thousands of postings with DDL that includes columns named “createdby“, “createddate“, “modifiedby” and “modifieddate” – that same metadata, tacked onto the end of the row declaration. It is the old mag tape header label written in a new language! Déjà vu!

The header records appeared only once on a tape, but these metadata values appear over and over, on every row in the table. One of the main reasons for using databases (not just SQL) was to remove redundancy from the data; this just adds more redundancy. Now think about the audit trail: what happens to it when a row is deleted? When a row is updated? The trail is destroyed. Audit data should be separated from the schema. Would you put the log file on the same disk drive as the database? Would an accountant let the same person approve and receive a payment?
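Here is that anti-pattern in miniature, with invented names; the last four columns are the tape header label riding along on every row:

CREATE TABLE Orders
(order_nbr INTEGER NOT NULL PRIMARY KEY,
 order_amt DECIMAL(12,2) NOT NULL,
 createdby VARCHAR(25) NOT NULL, -- header label data,
 createddate DATETIME NOT NULL,  -- repeated on every row
 modifiedby VARCHAR(25),
 modifieddate DATETIME);

The audit trail belongs in a separate audit table or audit tool, where an UPDATE or DELETE on Orders cannot destroy it.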

Today, we have tiered architectures. We have front-end layers, presentation layers, a report tool, analysis & statistics, audit tools and other such things which reside inside their own fiefdoms (often their own servers), with well-defined interfaces among them.

But when you work with monolithic programs, there are no layers. Everything has to be in the file. This leads to a lot of redundancy and puts a burden on the programmer, who has to keep everything in sync in application code. In an RDBMS, the schema is the unit of work. Tables in the same schema can reference each other via DRI (declarative referential integrity). If I change a customer address, all the tables (present and future) reference the same row in the referenced table, and the change cascades to them. The principle is “one fact, one time, one way, one place” – the heart of normalization.
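A minimal sketch of that principle in DDL, with invented names: the address is stored exactly once, and a change to the customer key cascades to every referencing table.

CREATE TABLE Customers
(customer_id INTEGER NOT NULL PRIMARY KEY,
 customer_addr VARCHAR(50) NOT NULL); -- one fact, one place

CREATE TABLE Invoices
(invoice_nbr INTEGER NOT NULL PRIMARY KEY,
 customer_id INTEGER NOT NULL
  REFERENCES Customers (customer_id)
  ON UPDATE CASCADE); -- DRI keeps present and future tables in sync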

The PRIMARY KEY is another skeuomorph from tape files. Obviously, you can sort a tape in one, and only one, order. This fact was so ingrained in our mindset that, when we built the first SQLs on top of sequential file systems, we preserved the old familiar model. Logically, all keys have equal “key-ness” to them (i.e. the NOT NULL and UNIQUE properties), so there is no reason to mark one of them as special. Except that it feels good to us.

Dr. Codd later corrected this in his relational models, but the damage was done in SQL. In particular, in SQL Server, the PRIMARY KEY defaults to a CLUSTERED index to mimic the original tape files.
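The default is only a default, though. If you want the key without the tape-file storage model, you have to say so explicitly; a sketch with invented names:

CREATE TABLE Personnel
(emp_id INTEGER NOT NULL,
 emp_name VARCHAR(35) NOT NULL,
 CONSTRAINT Personnel_PK
  PRIMARY KEY NONCLUSTERED (emp_id)); -- suppress the clustered-index default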

This “mag tape mindset” was also the reason that GROUP BY was implemented by sorting to create the groupings, so that an ORDER BY clause was not needed in early T-SQL programs. Nobody thought about implementing grouping with hashing on parallel processors.
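That habit still bites today. A grouped query may come back in any order, so the ORDER BY that the old T-SQL programmers left out is now required (invented names again):

SELECT dept_name, COUNT(*) AS emp_cnt
FROM Personnel
GROUP BY dept_name
ORDER BY dept_name; -- a hash aggregate makes grouped order a coincidence, not a promise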

FORTRAN I

One of the first programming languages was FORTRAN (short for “Formula Translator”), designed by John Backus; it first appeared in 1957 on IBM hardware. It was a simple procedural language with fewer than fifty statements and only two data types: INTEGER and FLOAT. There were no data declarations in the language at this point. Instead, the first time the compiler read a variable name, it determined the data type from the first letter. Those that began with I through N were INtegers (get it?) and all the others were floating point.

It is also worth noting that FORTRAN I allowed only six-letter names, in uppercase letters only. No, really. I recently saw a posting on a SQL forum that still followed those limits.

The early versions of BASIC used $ for strings. And you probably have noticed that Sybase T-SQL uses @, #, @@, and ## for the same reason. It is now too late to change it, thanks to the “code museum” effect of legacy code.

The use of an initial letter or prefix was carried over from the early operating systems, which also used one-pass compilers. A file might have a stylized name with a number (MT0, MT1, MT2, …) that identified the physical device, not the data on the device. Every time you see a FROM clause with single-letter aliases assigned in alphabetical order, the programmer has mimicked the physical tape or disk drives that held files instead of tables.


Fig 3: ‘One day perhaps we’ll persist as skeuomorphs’

Those SQL programmers who code on auto-pilot keep this skeuomorph by using data types as prefixes, but they carry the design flaw further. They tibble! They put an affix like “tbl_“, “_table“ or “tb_” in a table name. Think how silly this is! SQL has one, and only one, data structure: the table. This is like prefixing every word with “word-” in your sentences.

But it does not end there. They will add data type affixes to column names. They prefix VIEWs with “vw_” (this is called “Volkswagen-ing”) and use “udf_” or “fn_” prefixes on user-defined functions. This last one is special because the FN- prefix actually was required by FORTRAN (and later by BASIC) for in-line functions.
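Put the two styles side by side and the noise is obvious (names invented):

-- tibbled: every name repeats what the object already is
CREATE TABLE tbl_Customer
(intCustomerID INTEGER NOT NULL PRIMARY KEY,
 vchCustomerName VARCHAR(35) NOT NULL);

-- named for what it models, not how it is stored
CREATE TABLE Customers
(customer_id INTEGER NOT NULL PRIMARY KEY,
 customer_name VARCHAR(35) NOT NULL);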

The final ghost of FORTRAN is in how we had to implement a function. Stealing a simple example from the great Hugo Kornelis:
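CREATE FUNCTION dbo.Triple(@input INTEGER)
RETURNS INTEGER
AS
BEGIN
DECLARE @result INTEGER;
SET @result = @input * 3;
RETURN @result;
END;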

Did you notice the local variable @result? Why not just write:
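CREATE FUNCTION dbo.Triple(@input INTEGER)
RETURNS INTEGER
AS RETURN (@input * 3);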

and not bother to create and load a local variable? Other languages have always allowed this, and declarative languages in particular really like doing it. The answer is a skeuomorph from IBM hardware. Early machines had registers, accumulators and other specialized hardware. The result of a function had to come back via a register in the first FORTRAN compilers because of hardware limits.

COBOL

Today, we need to name a thing for what it is by its fundamental nature. This is the ISO-11179 standard in ten words or less! A data element uses the syntax “[<role>_]<attribute>_<attribute property>” so that the name is complete in itself, not dependent on a context.

But in COBOL, all data is kept in strings that are stored as physically contiguous records. The fields that make up a record are sub-strings, usually nested in a hierarchy. For example, a customer address record is made up of house number, street name, city, state and postal code sub-fields. The sub-fields are always in the context of the “customer address” and are never considered alone. This is the nature of a hierarchy versus a set.

But in RDBMS, they are separate data elements, and I need to know if this is “city_name“, “city_population“, “city_area” or whatever. I have to have what ISO calls an attribute property to name it correctly. If I use that data element in multiple roles in the same scope, I need to qualify it. For example “boss_emp_id” and “emp_id” are (possibly) different elements from the set “Personnel“, which play two roles in the data model.

COBOL has no presentation layer, so the data has to be put in display format in the DATA DIVISION (the COBOL version of DDL) by using a PICTURE clause. This lets you put commas, dollar signs, decimal points and other punctuation marks in the strings. Ever wonder about the MONEY and SMALLMONEY data types in T-SQL? They do what PICTURE does! Likewise, the CONVERT() string function with temporal data types is another hidden COBOL skeuomorph.
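For example (invented names), this is a PICTURE clause by another name, building a display string inside the database:

SELECT order_nbr,
       CONVERT(CHAR(10), order_date, 101) AS order_date_usa -- style 101 = mm/dd/yyyy
FROM Orders;

That display string belongs in the presentation layer, not in the query.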

The main verb (“verb” is the COBOL term for an operator) is MOVE. This shows just how much the physical model of data dominates the language. Strings are copied from field to field and get formatted by the destination PICTURE clauses. The language also likes to use flags to track changes to the data and computations passed along from other levels of aggregation. In the old days, we did not have much storage, so an application was broken into a series of steps, each passing its output along to the next process. Transaction tapes would be sorted with a polyphase merge (look it up; it is a fun algorithm), edited, then merged into a new Master, and finally taken over to a printer to produce hardcopy output.

None of the steps had direct contact with the previous or following step; they communicated by setting flags or doing totals that are passed along with the data.

You can find this same logic done in SQL.

Today, we can use a MERGE statement and get some safety, but the original UPDATE statement works just fine. The flag-setting step would look something like this (the table and column names are invented for illustration):
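UPDATE Salesmen
SET quota_flag = 'Y'
WHERE sales_total >= sales_quota; -- computed once, stored, and trusted until the next batch run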

The quota flag can be set only after you have computed the total; it is then passed along to the next process step, exactly like the flags on the tapes.

But the “SQL way” would be a view that leaves the flag out altogether – something like this (again, the names are invented):
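CREATE VIEW Salesman_Totals (salesman_id, sales_total, sales_quota)
AS
SELECT S.salesman_id, SUM(O.order_amt), S.sales_quota
FROM Salesmen AS S
JOIN Orders AS O
ON O.salesman_id = S.salesman_id
GROUP BY S.salesman_id, S.sales_quota; -- no flag column; totals are computed fresh at every invocation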

The quota flag is not in the VIEW. Let the presentation layer decide what the business rules should be. The view is not materialized until it is invoked, and it is always current and correct – just as the ringtone on your cell phone is no longer hard-wired to a physical bell inside a fixed land line.

  • Anonymousest

    Wow, walk down memory lane
    I remember late nights with data tapes, and master tapes, and freaking out when the tapes wore out.

    I remember my first card machine. I thought it was awesome.

    More than these, I remember the guys who thought they were better than me because while they went to school to learn, I was self-taught. I taught myself so I could write my own computer games.

    The skill it taught me that those college-educated guys rarely got was that I can learn a new language in a short time. THEY ARE THE REASON WE HAVE LEGACY CODE!!! They are why we still have FORTRAN in SQL.

  • Robert young

    hams or rails?
    For myself, I prefer the story of how the standard gauge railroad track evolved. I’ll leave it as a mystery for the reader, if so inclined, to solve.

  • Anonymous

    rails
    According to a railroad museum, wider tracks were tried, but they found that, all the physics considered, the current width is best. Wider rails mean slower, wider curves, more possible derailments, and greater wear on the wheels, for not enough benefit in increased cargo capacity.

  • Phil Factor

    The Royal Navy Field Gun competition
    Here in Blighty, at the Royal Tournament, two crews compete to transport a 12-pounder field gun and limber over a series of obstacles, including a 5 ft wall and a 25 ft chasm. It is a great spectacle. On one occasion, at the ‘run-out’, where each crew fires three rounds, the show was being watched by several military dignitaries and their spouses. As the gun was assembled and fired in a whirl of activity, one of the spectators, the Brigadier’s wife, was puzzled by the fact that one of the team merely stood rigidly to attention with an arm held out. She asked why. After asking around, none of the spectators could explain. After the Royal Tournament ended, the Brigadier determined to find out. It turned out that, back when the tournament was first invented during the Boer War, this man used to hold the reins of the horses!

    In IT departments, there are many people ‘holding the reins of the horses’.

  • John Rogerson

    IBM 360
    A big pang of nostalgia for me, seeing a picture of an IBM 360. I started my journey as a computer operator in the late 1970s on an IBM 360/30 – which had been manufactured in 1967. It had 32K of memory, and 8K was dedicated to the print spooler. I still have 80-column cards in my drawer at work with some of my COBOL notes scribbled on the back (e.g. how to quickly initialize a table in working storage by redefining it, initializing just the first entry, then moving it to itself from the first entry to the second entry – as it was a byte-by-byte move, the whole table got initialized). Nowadays I’m a SQL Server database developer in the BI sphere. Every now and again my colleagues wander around and prod me to see if I’m still alive. At a SQL course the other year I had to take along a picture of what core memory looked like, as no-one on the course believed me. Kids!

  • Jani

    vw_
    I actually use vw_ prefix deliberately for most views to make a distinction between tables and views in queries. I also prefix views that access other databases with xvw_ (eXternal VieW). This way it is easy to locate and change inter-database connections in SQL Server.
    I do have some views that mimic the structure and names of tables that were modified, for the purpose of backward compatibility. Sometimes it’s the easiest way to go when implementing new functionality in old schemas. A good database design can last more than 10 years if the software is maintained correctly.

  • bourgmw

    IBM 360s too
    John Rogerson, you may be a bit less superannuated than I am. I started in 1970 with a 360/20. We would boot up OS, TOS, or DOS depending on what legacy code needed to run.

    I remember one report that had been written to run on an IBM 1401 machine which had had a whopping 6k of memory. We operators had to run a program deck and cards to print out the report, tear off the paper and feed it back into the 1403 printer, and then run another deck to print page headers on the report. Good times.

    I’m a SQL Server DBA now and get similar inquiries about my wakefulness, pulse, etc… Those young developers won’t get off my lawn either.

  • Robert young

    Hollerith Hell
    @ Mr. Rogers(on)
    I started my journey as a computer operator in the late 1970’s on an IBM 360/30

    So did I, as it happens: a contracted machine situated over a Baskin-Robbins ice cream parlor in Harvard Square. Although it was the mid-70s for me.

    Which fact reminded me of the worst skeuomorph: the continued insistence that all input (source, data, what have you) be limited to 80 columns. Nothing makes code harder to scan for meaning than disconnected phrases. Or breaking "records" into 80-byte units. Still done by Java folks of my acquaintance.

  • Bart Read

    Local variables… or not
    This made me smile, but there’s one place I take issue…

    "The final ghost of FORTRAN is in how we had to implement a function. Stealing a simple example from the great Hugo Kornelis:

    CREATE FUNCTION dbo.Triple(@input INTEGER)
    RETURNS INTEGER
    AS
    BEGIN
    DECLARE @result INTEGER;
    SET @result = @input * 3;
    RETURN @result;
    END;
    Did you notice the local variable @result? Why not just write:

    CREATE FUNCTION dbo.Triple(@input INTEGER)
    RETURNS INTEGER
    AS RETURN (@input * 3);"

    This may be so, and when switching from SQL over to C++, Java, C# or JavaScript I *always* used to do the latter, because why not? It’s more efficient, after all. Right?

    Except it’s a pain in the arse when you step through with a debugger. Nowadays you generally get the return value but if you do some complex calculation, member access, or sequence of method invocations and don’t assign to intermediate results along the way you may not easily be able to view them in the debugger.

    Nowadays therefore I’ve started to use *more* local variables to hold intermediate results because:

    (1) it makes debugging easier in debug builds (and if anyone suggests that if you have tests you don’t need a debugger, I will beat them with a fire extinguisher, because how else do you find out what caused that integration test failure?),

    (2) I type fast so slightly more typing for the sake of clarity won’t kill me,

    (3) compilers are generally smart enough to optimise such intermediates away in release builds.

    Totally agree on all other fronts though. I have a particular beef with terse/non-descriptive variable and method names. Why? Why do this? Especially if you’re then going to write a blasted comment that nobody will bother to maintain.

  • Robert young

    Don’t Bug Me
    @ Mr. Read
    Except it’s a pain in the arse when you step through with a debugger.

    There was a time when the mantra was, "a programmer/coder spends most of his/her time thinking, so fancy IDEs don’t add much value; just use EMACS or vi". Or, as the trades still say, "measure twice, cut once".

    Since the rise of the web and koders who know nothing else, it’s become clear that coding for such folk is just the precursor to actually divining the logic… in the debugger. "I don’t have time to think, I must crank out more LoC!" I’m not at all sure that this is a note of intellectual progress.

    "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
    — Kernighan & Plauger

    One might add a corollary: if you write code as sloppily as possible, no debugger will help.

    Now, as to local variables, they’re useful to the debugger for the same reason they’re useful to the human: a lexically explicit logic trail. No need to claim the debugger needs them. Humans need them.

  • Anonymous

    Not so unusual
    Looks like an ashtray on the left of the guy at the table in Fig 1. That was not unusual in those days.

  • Jonathan Shields

    Easy to read can be better
    There is an obsession these days with writing fewer lines of code, yet often the "saved" lines would have made the code more readable and saved time when debugging/problem-solving.

    I have also recently worked with "experts" who never wrote comments and never used error handling as "the code itself should deal with any problems". Sounds a bit like "no lifejackets as the ship shouldn’t sink" to be honest.

    As I approach the big 40, I have realised that fashions change and technologies come and go, but the need for common sense stays the same.

  • Anonymous

    Ashtray
    And a pack of cigs as well. Good times!

    Agree about local variables making the code easier to read – our brains can only understand stuff in stages

  • Doug Johnson

    thanks
    Thanks for the walk. I spent most of my early years on the user side, running code, with some development. Back then the computer room was IBM-only, and you only saw the actual equipment through a window! Now I am still a combination user/developer, but I do everything! Thanks for making me smile this morning!

  • Anonymous

    More skeuomorphs
    The first skeuomorph I thought of was the fake wood trim on my family’s old station wagon. I agree with the above comments on affixing "vw_" to views (easy to spot the views when having to improve the performance of a query) and on using local variables (cleaner debugging; easier to modify). But since I started programming on an RCA Spectra in the early 70s, I’m sure I have creeping FORTRAN-ism in my code. Probably worse in C# than in SQL.

  • sly bry

    programming paradigms
    I started my "mainframe" career in mid-70’s with summer internship at Palo Alto IBM learning the awesome APL language. I remember wrestling with Cobol code and going to great lengths to make it much more structured and eliminating spaghetti logic. To the point of anticipating object orientation before it became a buzzword. I was a big believer of commenting the code.

  • Anonymous

    Tibbling
    Tibbling object names produces better hits (reduces spurious hits) when crawling code.

  • Eric

    Some afterthoughts
    I can see myself as a middle-aged "mother" who still cuts off the ham bone in Joe’s joke.

    I started programming with BASIC in school, and COBOL was the first language I used in my IT career, so some habits may have carried along without my noticing, even after switching to a DBA role. So I feel a little shame, as I may still not grasp all the merits behind what Joe is telling us… and would be glad if anyone can shed some light to lead me out…

    1) In my early days learning about MSSQL Server, one "dream" for the DBA (which I have still never achieved) was to grant developers access only through views and hide all direct table access, so that any refactoring could be carried out by the DBA without code changes.

    Hence I still think the naming convention seen in quite a lot of sample code, where views (vw_) are separated from tables (tbl_), could be helpful: it can catch such undesired access during code inspection…

    2) I am aware that nowadays the language syntax no longer requires a special prefix to differentiate attribute types. However, after past problems with numeric values, I am extremely cautious about any comparison of a floating point value with an integer (where 0 = 0 may not always be true, if 0 is only an approximation in floating point), particularly when there is an implicit data type conversion within a calculation; these are all culprits of potential problems.

    So I may still not see how the harm caused by such prefixes on variables outweighs their downside, as that practice may aid detection of the problem in the first place…

    3) As for the primary key in SQL Server defaulting to a clustered index, I still find many web sources promoting its performance merit in "most" scenarios,
    https://www.brentozar.com/archive/2015/07/finding-tables-with-nonclustered-primary-keys-and-no-clustered-index/
    and SQL Server already allows the designer to choose otherwise, if they know better.

    4) What I would most like to know more about is what a "proper" name for an individual element would be, as seen in Joe’s COBOL section. Joe has already pointed to an ISO standard, though I understand that many times such standards tell you what should be done without telling you why…

  • Anonymous

    Stunned!
    It appears from the dates mentioned that I may be one of the oldest developers still on the go: 1966, Forward Trust (a 1401), and still going strong as a contractor to fund the artistic endeavours these days.

    Oh, my CV does not refer to those early days.

  • Robert young

    Emulation is the Finest form of Flattery
    — 1966, Forward Trust (1401)

    Hmm. Does z/OS still emulate the machine? IIRC, it was up through the 370?

  • Anonymous

    railroads
    I think Robert young was referring to the width of a horse’s behind – one of the more ancient skeuomorphs – which used to extend forward to the Space Shuttle via the solid fuel rockets, but unfortunately they are all now decommissioned.

    I really like historical stuff like this. If I had to sort punch cards I would probably smoke too! LOL

  • Robert young

    We have a winner!
    Yes, the legend is that early Roman roads were spaced for two-horse chariots, which were built to span said two horses. That span, naturally, is two horse butts plus a bit to avoid constant contact. The chariots wore ruts into soft roadways. Roadways were repurposed for rails. And so on.

    Whether this is any more than folklore, I suspect not. But it is a fun story.

  • Ed K

    The punch card skeuomorph
    The size of the 80-column punch card is itself a skeuomorph. As you may know, its first use was to record census data in the late 1800s. Since the Census Bureau was based in Washington, they got help from the Bureau of Engraving to cut the cards to the size of the dollar bill. Of course, someone will point out that the dollar is smaller than the punch card. That’s because some time in the 1920s they changed the size of the dollar, but the punch card never changed. (God created the heavens and the earth in six days, but He didn’t have an installed user base to keep happy!)

    Someone earlier commented on being the oldest having started in 1966, I’m only two years behind him. I currently use SQL server to analyze Medicare usage records. I occasionally claim to be the only Medicare data analyst who is also a Medicare recipient.
