06 June 2013

SQL Server CREATE TABLE syntax diagrams

Many of us have seen, on MSDN, the heading 'Syntax', followed by a rash of nested brackets and keywords, enough to put off the most resolute of code-cutters. However, there is a goldmine of information there, and Phil had an ambition to get at it, and share the gold. The result is this article, full of railroad diagrams

Most of the time, we can create tables perfectly adequately using our preferred GUI tool. However, for complex tables, a GUI can’t always provide what we want, or even if it does, it is often quicker and easier to use code.

The “Backus-Naur Form (BNF)” diagrams on MSDN, the format of which I’ll review briefly towards the end of the article, explain every syntax detail, but in a form that is very hard to digest and keep in the head, well in my head anyway. Often, I find myself reverse engineering the syntax from the examples.

The aim of this article is to provide a smooth route through the syntax, by means of railroad-style diagrams. I developed these diagrams, initially, while creating a SQL-based build-script generator and trying to learn some of the more esoteric aspects of the SQL Server 2012  CREATE TABLE syntax. However, I soon became convinced that others might find them useful. I found that the details “stuck” far more quickly when learning from a diagram and through their use, I uncovered several interesting facts about how SQL Server 2012  CREATE TABLE works that had eluded me previously.

Diagrams, not words

To create a table in SQL Server, we just need to use the CREATE TABLE command, give the table’s name and a list of the columns we need. We can also specify where the table is to be stored and provide some table options, including constraints.

To describe this in more detail would take too many words, so I’m going to use diagrams instead. These are railroad, or syntax, diagrams and you encounter them occasionally in SQL. SQLite, for example has them for its documentation, and so has Oracle.

With a syntax diagram, there is only one entry point and one exit point. We start at the entry point and read from left to right. We can take any of the possible paths or railway lines to the exit, but we can only travel in one direction along each path.

I’ve changed tradition slightly by running down the page as well as from left to right, but elsewhere, I’ve adhered to the following conventions:

  • Commands and other keywords appear in UPPERCASE inside rectangles, and we type them literally, exactly as shown in the rectangles.
  • Parameters appear in lowercase inside ovals. We generally substitute our own information, such as names of tables, columns, numbers or expressions  for the parameters . If parameters are within angle-brackets they represent an entire diagram.
  • Punctuation, operators, delimiters, and terminators appear inside circles.

Where the syntax gets a bit dense, I’ve taken the liberty of putting the punctuation with the associated symbol rather than separating out the punctuation in its own circle. I’ve only done this where space is a bit tight and the usage is unambiguous.

Using a Syntax Diagram

Once you get the hang of a railroad diagram, they are a glorious way of getting to understand syntax quickly. Start at the top, the terminus, and travel along the railway system. You can only go one way. As with any railway map, there are certain sections, akin ot the suburban lines around and through major cities, which require an “exploded view” to show the details.

At such points, the parameter in the oval appears in angle brackets (<>). This signals that, for detail, we need to change temporarily to a more detailed map. When a syntax diagram aims to be so complete that someone can use it to engineer a parser, it can get quite intricate. Fortunately, this isn’t the case with our diagrams.

The SQL Server 2012 CREATE TABLE Syntax Diagrams

Evey diagram that follows is done in reduced definition so it fits on the page. Click it to see it a decent size, or use the PDF file to view it or print it out in a good working size. The PDF files are in the downloads at the bottom of the article.

Although there is only one high-level diagram to consider for creating a table, we may require five other detail diagrams at certain parts of our journey. I provide each diagram as a separate PDF file, as part of the code download with this article (see the speech bubble to the right of the article title). They are readable when printed out in A4 but I like to print them as A3 wall-charts.

The annotations that you’ll see on each diagram are quotes taken from MSDN, to whom all credit must go. The words in the MSDN CREATE TABLE page are all smithed with great care. Whereas I’ve spotted errors in the syntax BNF on the page, the words have always been spot on, and packed with information.

Let’s take a short example of use, taken from the create_table_overview diagram.

1815-img33.jpg

The entry point is CREATE TABLE. From here, we can qualify the table name with the database name and schema name, each separated by a full stop (MyDB.MySchema.MyTable) or we can just qualify with the schema, or we can simply enter the table name.

After an opening bracket, we hit the <column_definition> junction, where if required we can jump to the Column Definition syntax diagram for details of how to define the required columns.

A comma separates one column definition from another, or from a computed column definition, or a column constraint definition, and so on. In this fashion, we work our way through the railroad network, in the directions of the arrows, and making a choice at every junction.

CREATE TABLE overview syntax

1815-TableOverview.png

The Column Definition syntax

1815-ColumnDefinition.png

Column Constraint syntax

1815-ColumnConstraint.png

Computed Column Definition

1815-ComputedColumnDefinition.png

Table Options

1815-TableOptions.png

Table Constraint

1815-TableConstraint.png

Index Options

1815-IndexOption.png

Using the Diagrams

I find that I learn a diagram more rapidly than a raw syntax expression. If you are similar, then you soon find that you can dispense with most of the diagrams except for some of the more obscure table creation operations.

When I read through these diagrams, I get the occasional surprise, or reminder of an odd fact I’ve forgotten.  For example, the computed_column_definition diagram shows that one can use a computed column, if persisted, to be a reference to a foreign key.  One doesn’t even have to specify the column in the foreign table if it is the primary key. IDENTITY fields only need to specify start and increment if they vary from the default.

 The MSDN Syntax descriptions

The syntax descriptions provided on MSDN are like Extended Backus-Naur Form (EBNF) but have some conventions from BNF. This is a very simple language.

Angled brackets <> around a symbol, such as the data type example shown below, shows that it is non-terminus and needs a separate definition.

  • Optional items are enclosed in square brackets
  • Groups of expressions are enclosed in curly braces
  • A ‘|’ is a binary operator meaning ‘or’.

This definition of a data type…

… says that we are defining a data_type symbol(<symbol> ::=). This can start with the schema of the data type (if it is user-defined!), followed by the obligatory type_name. After that, we have a bracket, followed by one of these:

  • A specification of precision ,  and maybe scale (if it is a numeric data type)
  • A MAX qualifier if it is a VARCHAR, NVARCHAR  or VARBINARY  data types
  • A specification for an XML schema collection, only to associate an XML type with a schema collection

Finally, we have a closing bracket.

If you see an “n”, it means that the preceding symbol or group can be repeated. For example, [,...n] means that there can be any number of repeats, each separated by a comma.

Summary

The raw BNF of the MSDN documentation is a difficult and error-prone way of trying to understand the syntax of the CREATE TABE statement, or any SQL for that matter. We already know from other database systems that a form of railroad diagrams are a useful aid for understanding how SQL statements work, so why not have them for SQL Server TSQL?

 I did these to improve my own understanding and very much hope that you’ll find them as useful as I did. I realize that, for most purposes, a well-designed GUI is the easiest way to do a job on a Windows system but just occasionally,  code is just plain easier: and to get up and running, nothing beats a clear syntax diagram.

Acknowledgements

Many thanks to the pioneers of the railroad diagram. Also the MSDN CREATE TABLE  page from where the explanatory quotes within the diagrams were taken and adapted.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

Downloads

This post has been viewed 21337 times – thanks for reading.

Tags: , , ,

  • Rate
    [Total: 65    Average: 4.7/5]
  • Share

Phil Factor

Follow on

View all articles by Phil Factor

  • SAinCA

    Brilliant!
    Thanks, so much, Phil! These are SO much easier to follow and understand, and thanks for the clear annotations that are so educational. Would that MS would adopt this style. I totally agree that the EBNF diagrams, even after a decade of working with them, are non-intuitive and often a roadblock instead of a facilitator.

    MS should commission you to extend railroad diagrams to all T-SQL constructs! The benefits to the community will be HUGE.

    Cheers!

  • macxima

    Great!
    Hi Phil, I’m working with SQL last 8 years but did not get so deeply understand as you described. Really It’s great work.

    Cheers,
    Mukesh

  • Robert young

    Genesis
    For as long as anyone knows, IBM manuals (user, engineer, support, …) have been founded on railroad diagrams. They had a generic name within IBM, but it escapes me for the moment. Backus, after all, was an IBMer.

  • testuser

    Nitpicking
    The diagrams are great, but I really wish that you had used a different font for the text annotations.

    By the way, which software did you use to draw the diagrams?

  • Phil Factor

    Re nitpicking
    Smartdraw. I wouldn’t say it is ideal for this sort of work but it does the job. Some people, such as Doug Crockford, used Visio. The font is my own handwriting. I had the crazy idea, at one time, of doing some of the annotations by hand so I turned my own handwriting into a TTF so I could do both and they’d blend. . Eventually, when i realised how many syntax diagrams there were, i gave up the idea, but I thought ‘ hell, at least it isn’t comic sans’ and kept it. I tried changing it to something like verdona, but it just looked kinda wrong.

  • veritas

    DB2 Diagrams
    I have to agree with Genesis here, although I’m fairly sure I recall when IBM first abandoned the various squiggly brackets and introduced them. Here’s a link to CREATE INDEX, scroll to the bottom:
    http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000927.htm
    Note that they only use standard characters to create this one, although the paper manuals use solid lines and arrowheads. Far better once you try them.

  • Frank Quintana

    Deja vu
    In 1998 I bought a book titled “SQL Programmer Reference” by Kishore Bhamidipati, published by Osborne/McGraw-Hill ISBN)-)7-882460-5
    I highly recommend this book.
    Mr. Bhamidipati divided his book in three parts
    I-SQL Commands
    II-Syntax Flowcharts
    III- Keywords and Parameters
    The Part II Syntax Flowchart is a display of railroad diagrams to depict DDL, DML and DCL statements.
    Mr. Bhamidipati was a Senior Technical Analyst at Oracle at that time.

  • Tim

    precision, scale
    It looks like you’ve missed a comma between precision and scale in the column definition railroad diagram.

  • Topcat

    Optional vs Required
    This is great Phil, the world runs on images now more than on words. May I suggest one improvement and that is to indicate somehow whether a clause is required or optional. For example common sense and experiment tells me I can’t create a table with only a table constraint and no columns, but the railroad diagram allows this – unless I’m not understanding correctly…

  • veritas

    Re: Optional vs Required
    Looks like a slight error in the structure. The theory is that the required components are on the ‘main line’, in this case left, top to bottom. This is illustrated in defining the table name where the database and/or schema can be ommited. I’ve seen a convention where defaults are in BOLD. DB2 diagrams run left-right with keywords in CAPS and default values above the main line.

  • veritas

    Re: Optional vs Required
    Looks like a slight error in the structure. The theory is that the required components are on the ‘main line’, in this case left, top to bottom. This is illustrated in defining the table name where the database and/or schema can be ommited. I’ve seen a convention where defaults are in BOLD. DB2 diagrams run left-right with keywords in CAPS and default values above the main line.

  • Phil Factor

    Re: suggested improvements
    @Tim Now fixed.
    @Topcat Yes, This is a good idea. I suspect that we’ll hit this problem quite often. At the moment I can’t think how to show this clearly. There has to be a column, unless it is a filetable, but it can be preceded by a computed column. I have another complaint, which is that there seems no way of showing what the default setting is if you don’t make it explicit. e.g. That IDENTITY has a seed of 1 and increment of 1, or that the default setting of IGNORE_DUP_KEY is ‘off’

  • Ray

    Syntax Diagrams
    Thanks for the brain bump Phil. I have not seen the "railroad track" approach in 30+ years. The original "Pascal Report" by Niklaus Wirth, July 1973 includes the full language specification in the this format. I think that was one of the reasons I found Pascal syntax generally easy to master.
    WikiPedia has a nice article which includes a link to the original report in pdf. Scroll to the appendix and you will see the pages that filled many of my grad school nights back in the dawn of time 🙂
    http://en.wikipedia.org/wiki/Pascal_(programming_language)
    http://www.standardpascal.com/The_Programming_Language_Pascal_1973.pdf

  • roundand

    Railroad Diagrams are under-appreciated!
    And it’s great to be endorsed by Phil Factor himself! Like you I sometimes write them to ensure I understand syntax constraints.
    They can be used to illustrate any more-or-less linear state machine. Some years back I documented a development tool we wrote with various APIs for integrating with external systems, each API consisting of many function calls, so used railroad diagrams to document the legal sequence of function calls in each API.

  • Colin Daley

    Optional vs Required
    @Topcat Well, the Books Online grammar also says that you can create a table with a table constraint but no columns. To fix the problem Phil would have to start there, because he is simply representing that grammar visually.

    However, you must remember that it is not always possible to state *everything* in a context-free grammar, the classical example being the "dangling else"[1].

    When such anomalies are found, the clarification can be put in text annotations ("the else belongs to the innermost if"). Default values can go there too.

    [1]http://en.wikipedia.org/wiki/Dangling_else

  • JonathanRoberts

    Great way to display syntax
    I first used these diagrams when I started programming in Pascal (over 25 years ago) from Kathleen Jensen and Niklaus Wirth "Pascal User Manual and Report" which was published in 1970. I have always thought them a great way to display syntax.

  • Jeff Moden

    Mixed Feelings
    While railroad diagrams are nice to look at and can certainly be easier to read than code with a million optional braces, they cannot be copied and pasted into SSMS for the simple stuff. For example, the syntax layout for BULK INSERT is easy to copy and paste and then you simply delete what you don’t want to use.

    I agree with most that railroad diagrams can make it very simple to understand syntactical requirements but that should never be a reason not to provide copy-able code examples.