Click here to monitor SSC
Av rating:
Total votes: 91
Total comments: 32


Joe Celko
Getting rid of SQL Code
20 August 2009

Joe becomes intrigued by the way that experts make errors in any area of technology, and suggests that the problem is more that of mindsets than lack of knowledge. He illustrates the point with SQL Development by means of the "Britney Spears, Automobiles and Squids" table, and  the tangled Stored procedure, and shows ways of getting rid of both procedural and non-procedural code by adopting a different programming mindset..

Errors in complex technology

The book "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations" by Dietrich Dörner (ISBN-13: 978-0201479485; 1997) explains the mindsets of people who were trying to solve a problem and got it wrong. He goes into details of the action of the operators of the Chernobyl reactor, who were all trained experts. They weren’t incompetent.  They ignored safety standards because they knew what they were doing. Dörner identifies the mindsets that cause failures, and we can apply this insight to programming.

First mindset problem:

Once you have started in one direction, you stick to it.  But the Turkish proverb is right: “No matter how far you have travelled down the wrong road, turn around”. 

Second mindset problem:

Most programmers started with procedural languages and still carry that mindset into declarative SQL.  This is another proverb: “To a child with a hammer, everything looks like a nail.”

In SQL, I usually find this error in attempts to save bad DDL with TRIGGERs, CURSORs, stored procedures and application code.  Those are the ways to get procedural code into SQL. 

Third mindset problem:

People prefer static mental models, rather than a dynamic model.  This was also the model for traditional programming languages – when you compile a program, you get the same executable code every time.  That does not work in SQL; this is why we have query optimizers rather than procedural code in RDBMS. 

An example of that mindset was a small company that did medical lab reports over the internet.  They had to deliver 24 hour or less turn-around of accurate data.  The clumsy schema required convoluted queries and lots of procedural code to validate the data.  Most of the data validation could have been done declaratively much faster. Hey, it works, so why change it?   My personal favourite was keeping numeric data in strings, then casting both blanks and NULLs to zero. 

The business was successful and the data volume started growing.  But the queries were growing slower at a faster rate than the company, so you could see the train wreck coming.  Management’s solution was to buy more hardware.  This is like adding more rails to the end of the track; if the bridge is wrecked this does not solve the fundamental problem. 

Fourth mindset problem: 

People have a strong tendency to focus on immediately problems and not ask what will happen next.  This will then lead to problems caused by the solutions.  This is also known as the Law of Unintended Consequences.  There is a wonderful book "The Systems Bible: The Beginner's Guide to Systems Large and Small (Third Edition of SYSTEMANTICS)" by John Gall (ISBN 0-9618251-7-0; 2003) that details this phenomena. 

In IT, there is a proverb: “Nothing is more permanent than a ‘temporary patch’ in software”; I will let the reader fill in his favourite horror story here – we all have one.

We’ll illustrate a few of these mindset problems in some code written by a competent expert.  

Getting Rid of Procedural Code

This is from a real email I got.  I was asked if I could see any problems with the stored procedure below and if I had any tips regarding naming conventions or a better approach.   

ALTER PROCEDURE [dbo].[SaveEmployee]
 @Employee_Id INT OUTPUT,
 
@Employee_Username VARCHAR(25),
 
@Employee_FirstName VARCHAR(25),
 
@Employee_LastName VARCHAR(25),
 
@Employee_Status INT,
 
@Employee_Created DATETIME OUTPUT,
 
@Employee_Modified DATETIME OUTPUT,
 
@UsernameExists INT OUTPUT,
 
@CommandStatus INT
AS
BEGIN
 SET NOCOUNT ON;
 
IF @Employee_Id > 0
  
BEGIN
-- If employee exists, check if the employee's username already belongs to another employee
  
IF NOT EXISTS
      (
SELECT TOP 1 Employee_Id
        
FROM dbo.Employees AS A
        
WHERE NOT A.Employee_Id = @Employee_Id
          
AND Employee_Username = @Employee_Username)
  
BEGIN
  
SET @Employee_Modified = GETDATE();
  
UPDATE dbo.Employees
      
SET Employee_Username = @Employee_Username,
          
Employee_FirstName = @Employee_FirstName,
          
Employee_LastName = @Employee_LastName,
         
Employee_Status = @Employee_Status,
          
Employee_Modified = @Employee_Modified
    
WHERE Employee_Id = @Employee_Id;    
    
-- If employee is deleted, delete the employee orders
    
IF @Employee_Status = 3 -- Employee is deleted
      
BEGIN
       UPDATE
Employee_Orders
      
SET Employees_Orders_Status = 3
    
WHERE Employee_Id = @Employee_Id;
      
END
    END
   ELSE
    BEGIN
     SET
@UsernameExists = 1;
    
SET @CommandStatus = 3; -- Indicate failure
    
END
  END
 ELSE
  BEGIN
-- If new employee, check if employee's username already exists
  
IF NOT EXISTS
      (
SELECT TOP 1 Employee_Id
        
FROM dbo.Employees AS A
      
 WHERE Employee_Username = @Employee_Username)
    
BEGIN
    
INSERT INTO Employees (Employee_Username, Employee_FirstName, Employee_LastName, Employee_Status, Employee_Created)
   
VALUES (@Employee_Username, @Employee_FirstName, @Employee_LastName, @Employee_Status, @Employee_Created);
    
    
SET @Employee_Id = SCOPE_IDENTITY();
    
END
   ELSE
    BEGIN
     SET
@UsernameExists = 1;
    
SET @CommandStatus = 3; -- Indicate failure
    
END
   END
END
;

There is a lot of stuff that we don’t do in SQL here.  It looks more like a COBOL or BASIC program written in T-SQL.

  1. The “Employees” table ought to be named “Personnel” because it is a set.  But if you are still writing sequential file processing, your mindset only lets you see the trees, one at a time, instead of the forest. 
    Can you figure out why he would use “A” as an alias for the Employees table?  Alphabetical order and sequential thinking again!  In later code he used “B” for the second alias, etc.

  2. I'd suggest that Pascal and camelCase is bad for maintaining code. 

  3. What do the OUTPUT values do?  Well, @Employee_Id returns an IDENTITY value, so we have no verification or validation possible for this data element.  You need better ids than a count of physical insertion attempts to the hardware.
    @Employee_Created and @Employee_Modified are audit meta-data and should not be here at all.  You never put audit data in the table being audited.  But even here, he used getdate()  instead CURRENT_TIMESTAMP just to be proprietary. 
    @UsernameExists is a local flag, just like you would write in Assembly language.  Remember those days?  Make a test and look at a register!  He does not understand that SQL would use a predicate. 

  4. Look at more assembly language programming with @CommandStatus

  5. Why write “NOT EXISTS (SELECT TOP 1 ..)”?  This is silly, proprietary and it doesn’t even have an ORDER BY clause.  Now look at “WHERE NOT A.employee_id = @employee_id” and think about trying to maintain the code. 

  6. @CommandStatus is an example of “flag coupling”, one of the worst kinds of coupling in Software Engineering.  This procedure depends on that flag, this module will do completely different things. Look at what happens when you add a new guy with (@CommandStatus = 3); insert him into the Employees table, flag his orders for deletion and finally report an error with invented codes.  ARRGH! 

When you write older procedural code, the application and the data are coupled in the same module.  You have to do everything in one place.  There were no log files for audit data.  Files had no constraints.  The closest thing to limiting user access was a write-protect ring on a reel of magnetic tape (Google it if you are too young to remember). 

SQL uses DDL, DML and DCL sublanguages to assure data integrity and DDL is the most important line of defense. Let’s put all of those IF-THEN-ELSE control flows from the original procedure into constraints:

CREATE TABLE Personnel
(emp_email VARCHAR(255) NOT NULL
    
CONSTRAINT no_dup_emails PRIMARY KEY
    CONSTRAINT CHECK
(<< company VALIDATION RULE >>),
 
emp_username VARCHAR (25) NOT NULL
  
CONSTRAINT no_dup_usernames UNIQUE,
 
emp_firstname VARCHAR (25) NOT NULL,
 
emp_lastname VARCHAR (25) NOT NULL,
 
..);

Now back to a coherent module that does one job in itself.

CREATE PROCEDURE InsertNewEmployee
(@in_emp_email VARCHAR(255), -- company assigned
 @in_emp_username VARCHAR(25), -- employee choice
 @in_emp_firstname VARCHAR(25),
 
@in_emp_lastname VARCHAR(25))
AS
BEGIN
INSERT INTO
Personnel (emp_email, emp_username, emp_firstname, emp_lastname)
VALUES (@in_emp_email, @in_emp_username, @in_emp_firstname, @in_emp_lastname);
 
<< error handling here >>
END;

Pick your favorite error handling method here.  There is the traditional @@ERROR and RAISERROR or the new TRY-CATCH method. You will get the name of the constraint for display in the error message.

There is no need to invent a new @EmployeeStatus code to pass along to calling programs.  That was COBOL, 40 years ago.  The only serious problem is that SQL Server does not support the Standard SQLSTATE yet. 

The audit data is being done with a third party tool outside the table; deleting a row will not destroy the audit trail anymore.  

For this problem, I would put the update of an existing employee in a separate “CREATE PROCEDURE UpdateEmployee()” so that I can add some DCL on it so nobody can change other people’s data.  But if you want to have this in one module, you can use the new MERGE statement:

MERGE INTO Personnel -targettable

USING (SELECT emp_email, emp_username, emp_firstname, emp_lastname

        FROM (VALUES (@in_emp_email, @in_emp_username,

                      @in_emp_firstname, @in_emp_lastname)) AS X)

ON Personnel.emp_email = X.emp_email

WHEN MATCHED

THEN UPDATE SET emp_username = X.emp_username,

            emp_firstname = X.emp_firstname,

            emp_lastname = X.emp_lastname

WHEN NOT MATCHED

THEN INSERT (emp_email, emp_username, emp_firstname, emp_lastname)

     VALUES (X.emp_email,X.emp_username, X.emp_firstname, X.emp_lastname);

Remember to catch errors, of course. The advantages of the MERGE over all that procedural code are huge because

  1.  It is shorter and easier to read than line after line of procedural code.  

  2. It is Standard SQL!  It will port.  Other SQL programmers can read it and maintain it. 

  3. It is one statement, so there is no worry about isolation levels and other sessions.  Your error messages are for just one SQL statement rather than several different ones in a multi-statement block of code. 

  4. It is one statement, so it can be optimized.  Procedural language compilers do their optimizations in multiple passes and have elaborate symbol tables and other structures to get a good executable module. T-SQL is a simple one-pass compiler.  That is why you have to use the “@” prefix.  The prefix tells the compiler about the nature of the variable so it does not have to use a symbol table. 

Getting Rid of Non-Procedural Code

One common problem in SQL Server (but not other SQLs) is that it overly sensitive to referential cycles.  If Table A references Table B twice or more, you get an error message.  This makes it hard to have a data element play two or more roles in a table. 

But most of the time, the problem is a design flaw.  Here is an actual skeleton schema from a Newsgroup posting.  The poster asked what the design flaw(s) were.

CREATE TABLE Employees
(emp_ssn CHAR(9) NOT NULL,
 
emp_name VARCHAR(50) NOT NULL,
 
works_for CHAR(4) NOT NULL,
 
CONSTRAINT pk_Emp PRIMARY KEY(emp_ssn));
 
CREATE TABLE Departments
(dept_code CHAR(4) NOT NULL,
 
dept_name VARCHAR(30) NOT NULL,
 
dept_boss CHAR(9) NOT NULL,
  
CONSTRAINT pk_Dept PRIMARY KEY (dept_code),  
  
CONSTRAINT uq_Dept UNIQUE (dept_name));
ALTER TABLE Employees ADD CONSTRAINT fk_Emp_Dept
 FOREIGN KEY (works_for)
 
REFERENCES Departments (dept_code)
  
ON DELETE CASCADE
  ON UPDATE CASCADE
;
ALTER TABLE Departments ADD CONSTRAINT fk_Dept_Emp
 FOREIGN KEY (dept_boss)
 
REFERENCES Employees (emp_ssn)
  
ON DELETE CASCADE
  ON UPDATE CASCADE
;

SQL Server will return an error when you run this code:

Msg 1785, Level 16, State 0, Line 1 Introducing FOREIGN KEY constraint 'fk_Dept_Emp' on table 'Departments' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

Msg 1750, Level 16, State 0, Line 1 Could not creates constraint. See previous errors..

The immediate flaws are improper data element names, which is easy to fix with a global text replacement, but let’s get this out of the way. 

  1. The Employees table should be named Personnel.  It models a set and not one record at a time.  Even Microsoft gave up on camelCase, so the constraint names should be changed, too. 

  2. The "works_for" column is a bad data element name.  It is a verb that describes a relationship and not the name of an attribute.  We will get to that problem shortly.  It is serious and fundamental.

  3. "dept_code" is not an identifier any more that "shipping_code" would be.  A code is a different kind of attribute than a tag number or name. It would be something like “unfunded”, “over budget”, etc. 

  4. Being the department boss is not a property of a department, but of an employee's job assignment.  Is the driver a part of the automobile? 

  5. Likewise, "dept_boss" is a role played by some emp_ssn.  It is not some special attribute of a department. 

In short, you have built "Britney Spears, Automobiles and Squids" tables.  You have used improper data element names.  Let's fix the data element names first and consolidate the table declaration into one declarative statement per table. 

CREATE TABLE Personnel_and_OrgChart -- mixed table
(emp_ssn CHAR(9) NOT NULL PRIMARY KEY,
 
emp_name VARCHAR(50) NOT NULL,
 
works_for_dept_nbr CHAR(4) NOT NULL
  
REFERENCES Departments (dept_nbr)
    
ON DELETE CASCADE
    ON UPDATE CASCADE
);
CREATE TABLE Departments_and_Job_Assignments --mixed table
(dept_nbr CHAR(4) NOT NULL PRIMARY KEY,
 
dept_name VARCHAR(30) NOT NULL UNIQUE,
 
dept_boss_emp_ssn CHAR(9) NOT NULL
  
REFERENCES Personnel (emp_ssn)
    
ON DELETE CASCADE
    ON UPDATE CASCADE
);

While it is a good idea to name constraints, remember that you do it so that you can see those names in errors messages or use them to alter the table.  Naming a PRIMARY KEY is a bit redundant, isn’t it?  The error message will tell you about such violations.

Let’s next get Britney Spears and the Squids into their own tables by splitting out entities and relationships. 

-- tables that model entities
CREATE TABLE Personnel
(emp_ssn CHAR(9) NOT NULL PRIMARY KEY,
 
emp_name VARCHAR(50) NOT NULL,
 
..);
CREATE TABLE Departments
(dept_nbr CHAR(4) NOT NULL PRIMARY KEY,
 
dept_name VARCHAR(30) NOT NULL UNIQUE
 ..);
-- tables that model relationships
CREATE TABLE OrgChart
(job_title CHAR(9) NOT NULL,
 
dept_nbr CHAR(4) NOT NULL
  
REFERENCES Departments (dept_nbr)
  
ON DELETE CASCADE
  ON UPDATE CASCADE
,
 
PRIMARY KEY (job_title, dept_nbr),
  <<
nested set model>>);

For this example, let’s not bother with the organizational hierarchy and just assume that we have a nested sets model in the rest of the table.  I also made a two-column the key just to make the next table more interesting. 

CREATE TABLE Job_Assignments
(dept_nbr CHAR(4) NOT NULL
 
dept_name VARCHAR(30) NOT NULL,
 
FOREIGN KEY (job_title, dept_nbr)
  
REFERENCES OrgChart (job_title, dept_nbr)
    
ON DELETE CASCADE
     ON UPDATE CASCADE
,
 
emp_ssn CHAR(9) NOT NULL
  
REFERENCES Personnel (emp_ssn)
    
ON DELETE CASCADE
     ON UPDATE CASCADE
,
 
PRIMARY KEY (job_title, dept_nbr, emp_ssn),
 
..);

The DRI cycle problem is now gone. 



This article has been viewed 20486 times.
Joe Celko

Author profile: Joe Celko

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 91 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: comment
Posted by: BuggyFunBunny (view profile)
Posted on: Friday, August 21, 2009 at 8:09 AM
Message: >> “No matter how far you have travelled down the wrong road, turn around”.

Ah. In the first day of any Econ 101 course, a decent instructor will share with the students the Prime Directive: sunk costs are irrelevant to decision making.


>> “To a child with a hammer, everything looks like a nail.”

Ah. To a coder with a task, any answer looks like a loop.


>> “Nothing is more permanent than a ‘temporary patch’ in software”;

Ah. Useless factoid: TSO (IBM mainframe timesharing for those not quite old enough) was for, at least most if not all, of its existence a PTF (program temporary fix).


>> It looks more like a COBOL or BASIC program written in T-SQL.

Ah. In my experience, most DB2 is written by COBOL coders. I'd wager a nickel that for SQLServer it's BASIC. For the same reason.





Subject: Weinberg's "Psychology of Computer Programming"
Posted by: Joe Celko (not signed in)
Posted on: Friday, August 21, 2009 at 10:43 AM
Message: >> Ah. In my experience, most DB2 is written by COBOL coders. I'd wager a nickel that for SQL Server it's BASIC. For the same reason. <<

No bet. "The Psychology of Computer Programming: Silver Anniversary Edition" by Gerald M. Weinberg had a study of how you could tell the native programming language of the student by their PL/I code.

Years ago, I did an article in the JOURNAL OF STRUCTURED PROGRAMMING about how you could tell the native spoken language of the student by their code.

Subject: You ignore cultural differences
Posted by: Anonymous (not signed in)
Posted on: Friday, August 21, 2009 at 1:04 PM
Message: I don't think it makes much sense to make the following statements without qualifying which culture(s) you are speaking about:

"The “Employees” table ought to be named “Personnel” because it is a set."

"I'd suggest that Pascal and camelCase is bad for maintaining code."

There is a lot of talented and skilled people who are reading articles on this site, and who are not native English speakers. Working in a team where everyone come from different countries, I think both these statements are wrong.

1. Employees table name requires less knowledge of subtleties of English language and as such may be strongly preferable to Personnel.

2. Perception of CamelCase may be very different for people who typically write from right to left, and/or for people who typically write in Cyrillic letter, Chinese, or Hindu.

Subject: camelCase
Posted by: timothyawiseman@gmail.com (view profile)
Posted on: Friday, August 21, 2009 at 9:50 PM
Message: As always an excellent article, but I am confused as to why you are so against camelCase. Perhaps it is merely that I am accustomed to it, but I find it easier to read than most other variable/column/table naming schemes.

Also, is the JOURNAL OF STRUCTURED PROGRAMMING on finding languages still available somewhere? It sounds rather interesting.

Subject: what is wrong with my column names
Posted by: Anonymous (not signed in)
Posted on: Saturday, August 22, 2009 at 8:46 PM
Message: I liked much of the article, but I cannot see what is wrong with column names such as CreatedAt and ReviewedBy. Sure they are verbs, so what? They are easy to understand and unambiguous.

Also the following makes little or no practical sense: "@Employee_Created and @Employee_Modified are audit meta-data and should not be here at all. You never put audit data in the table being audited."

How do we know if these columns are used for audit purposes or not? Maybe you know, but you did not communicate it to us. In my experience, many of such columns are not used for audit purposes, so your assumptions may be unfounded.

Subject: Correction to table definition
Posted by: Andy Dent (not signed in)
Posted on: Sunday, August 23, 2009 at 9:28 PM
Message: CREATE TABLE Job_Assignments
(dept_nbr CHAR(4) NOT NULL
dept_name VARCHAR(30) NOT NULL,


should be
CREATE TABLE Job_Assignments
(dept_nbr CHAR(4) NOT NULL
job_title VARCHAR(30) NOT NULL,


Subject: A habit for the irrelevant
Posted by: CrankyRat (not signed in)
Posted on: Monday, August 24, 2009 at 1:15 AM
Message: Some good info here but Celko has the irritating propensity to highlight the utterly unimportant. His discussion of procedural habits being used in SQL is sound but his endless comments on column naming, proprietary language use (CURRENT_TIMESTAMP ), CamelCase, etc. is nonsense and silly. Better writers emphasize consistency and leave it at that.

Subject: SQL Programming
Posted by: peacekalya@gmail.com (not signed in)
Posted on: Monday, August 24, 2009 at 1:17 AM
Message: I have always taught i was a supperb SQl programmer but after having alook at the above samples of SQL codes, i now know for sure that i vahe to improve.
One thing i am using and failing to understand why i must stop using is employees instead of personel. i dont andurstand the objection reason. could someone plz ellaborate further on that.

Otherwise thanks for the Lesson.

Subject: CamelCase.
Posted by: Phil Factor (view profile)
Posted on: Monday, August 24, 2009 at 3:17 AM
Message: Joe has written at some length on both the CamelCase issue and the 'Employees/Personnel' preference. On page 14 of his great book 'SQL Programming Style' he says...

'Collective or class names are better than singular names because a table is a set and not a scalar value. If I say "Employee", the mental picture is of Dilbert standing by himself - one generic employee. If I say "Employees", the mental picture is of the crew from Dilbert-a collection of separate employees. If I say "Personnel", the mental picture is suddenly more abstract- a class without particular faces on it.'

Again, CamelCase is covered on page 29, in more detail than I can reproduce here. Although he argues persuasively, and I agree with many of his points, I'd like to read in more detail the Psychological research that provides the evidence that it slows reading speed etc.

Until someone else can produce a book on SQL Programming Style that supplants Joe's, Id suggest it is the best basis for standards that we've got, and any team leader will, I'm sure, agree that the most comprehensive standard in place at the time is the best one to adopt, with whatever exceptions are required for particular conditions.

Subject: violent agreement
Posted by: randyvol (view profile)
Posted on: Monday, August 24, 2009 at 6:32 AM
Message: Joe - I could not agree more with your observations, except one. In reviewing your article and all the posts, apparently many from wizened, gray-beards, I'm surprise that anyone would choose to name a set 'Personnel' any longer, as that is soooooooo '70s.

Any person with a passing regard for the PC-mentality of our culture knows the real name for such a set is 'Human_Resources' ;-)

Subject: Interesting
Posted by: bayonet-recon (view profile)
Posted on: Monday, August 24, 2009 at 9:39 AM
Message: "Is the driver a part of the automobile?"

In NASCAR s/he is. Just kidding. ;)

Another good article. I find myself guilty of a few things and guess I have more to learn than I thought.

As others have said, I do disagree with the Employees vs. Personnel name. It's an issue of semantics and depends on which definition of "Personnel" you use (http://www.answers.com/personnel).

I have and will continue to use Employees even if it bars me from SQL heaven.

http://www.answers.com/topic/employee

" A person who works for another in return for financial or other compensation."

That seems to fit the bill of employees, consultants, and contract positions.

Subject: Nice article
Posted by: Ryan Cristobal (not signed in)
Posted on: Tuesday, August 25, 2009 at 12:49 AM
Message: Very good article.

Subject: CamelCase and Personnel
Posted by: TrahtenPizdulBljahaMux (not signed in)
Posted on: Tuesday, August 25, 2009 at 12:28 PM
Message: I do strongly agree with camelCase views in this article.
It makes it easier to read which is bad for the code keeping because it speeds up the troubleshooting which in turn reduces the final payment for those who are on per hour rate.

Employees table should be named Slave because each record identify a single person who works for another person which makes the first one a slave.
A free man always works for himself.
And also it will attract much more attention due to it's political incorrectness, thus make it easier to book more time later on to rename a bloody table to something like PerSonNel which covers up the truth better and increases the final payment for those who are on per hour rate becaue eventualy it will be renamed to Employees as it provides the same cover for the truth but is more widely used and therefore better understood. It must never ever be called Human_Resources in any case because it is a disgrace to call a living human creature a resourse as if it was oil or gold or something which is good for you and can be resold with a healthy margin.


And the final one.
The IF clauses should be banned in SQL as they are for girls. The real boys go ahead whith no IFs or BUTs , just plain and straight "format c:" or whatever is the cooliest SQL statement.


Subject: Curious
Posted by: WarpedSQL (not signed in)
Posted on: Tuesday, August 25, 2009 at 2:12 PM
Message: Nice article, but why point out issues in someone elses code when your code contains the same issues? I don't understand..

"Can you figure out why he would use “A” as an alias for the Employees table? Alphabetical order and sequential thinking again! In later code he used “B” for the second alias, etc."

but in your code..

"..MERGE INTO Personnel -– targettable
USING (SELECT emp_email, emp_username, emp_firstname, emp_lastname

FROM (VALUES (@in_emp_email, @in_emp_username,

@in_emp_firstname, @in_emp_lastname)) AS X)
"

What does 'X' mean?

Subject: X marks the spot
Posted by: BuggyFunBunny (view profile)
Posted on: Tuesday, August 25, 2009 at 6:54 PM
Message: >> What does 'X' mean?

it's a required (by the parser) correlation (in some dialects) name which is not required in most others. it's a placeholder.

Subject: Employees vs Departments
Posted by: Anonymous (not signed in)
Posted on: Wednesday, August 26, 2009 at 2:52 PM
Message: I name my "Personnel" table "Employee" because I spend most of my time working with the "Employee" class in c#, not the table. Thinking about an employee makes more sense in that context.

Subject: as always, very interesting
Posted by: Alex_Kuznetsov (view profile)
Posted on: Thursday, August 27, 2009 at 1:36 PM
Message: As always, very interesting! A very consistent and logical approach. Yet as usual, I am getting the feeling that I live and work in a similar but definitely different world. Regarding moving functionality to constraints, eliminating IFs and such, I cannot agree more. Regarding style, I cannot agree at all. I agree with the previous commenter - having a class named Employee, for my team it is simpler and better to have a table named Employees.

BTW when _I_ read Personnel, it is very similar to Russian word Personal, so _my_ mental picture is of some office in Russia. So all the further reasoning does not apply to me. You need to be very careful when you extend the conclusions based on your own perception to a broader audience.

And as usual, unfortunately this excellent advice is followed by well known fallacies such as:
"@Employee_Created and @Employee_Modified are audit meta-data and should not be here at al" when nothing indicates if these columns are used for audit or for other needs, standard rant about identities, standard rant about CamelCase which does not apply for many parts of the world, and so on. We have debunked all this many times...

Subject: Quick replies
Posted by: Joe Celko (not signed in)
Posted on: Thursday, August 27, 2009 at 2:32 PM
Message: Thanks to everyone for the feedback!

The JOURNAL OF STRUCTURED PROGRAMMING went out of business a long time ago. My academic journal writing for Springer was done in ABACUS, a publication aimed at naive computer users. It was supposed to the "Popular Science of Comp Sci" and was related to the ACM at time. I ran the Puzzle Column.

>> Perception of CamelCase may be very different for people who typically write from right to left, and/or for people who typically write in Cyrillic letter, Chinese, or Hindu. <<

Latin, Cyrillic and Greek have two cases and the reader’s eye jump to the uppercase letters in all of them. Arabic has four (initial, middle, terminal and stand-alone in the connected script). I taught at a school that was largely Arabic; that is where you learn about how much left-to-right versus right-to-left is engrained. Where do you staple your home work?

>> .. I cannot see what is wrong with column names such as CreatedAt and ReviewedBy. Sure they are verbs, so what? They are easy to understand and unambiguous. <<

Verbs are for actions; nouns are for names. So you would use "<something>_creation_date" and "reviewer_emp_id"; note the guesses I made about the role prefixed to the base data element name. The “reviewed_by" might be a person, a deadline, a process name, etc.

>> should be :
CREATE TABLE Job_Assignments
(dept_nbr CHAR(4) NOT NULL
job_title VARCHAR(30) NOT NULL,
..);

Opps!

>> Some good info here but Celko has the irritating propensity to highlight the utterly unimportant. His discussion of procedural habits being used in SQL is sound but his endless comments on column naming, proprietary language use (CURRENT_TIMESTAMP ), CamelCase, etc. is nonsense and silly. Better writers emphasize consistency and leave it at that. <<

Human factors research and ISO disagree. When I was at AIRMICS (old US Army research group), we found that global naming conventions saved 8 to 12% of maintenance time.

>> Any person with a passing regard for the PC-mentality of our culture knows the real name for such a set is 'Human_Resources' ;-) <<

Ghod! I hate that name, too. If I have “Natural resources” does that make me an Un-Natural resource?

>> but in your code..

"..MERGE INTO Personnel -– targettable
USING (SELECT emp_email, emp_username, emp_firstname, emp_lastname
FROM (VALUES (@in_emp_email, @in_emp_username, @in_emp_firstname, @in_emp_lastname)) AS X)
"

What does 'X' mean? <<

I agree with you on that. The table constructor has to be given a name for this to work, but a better would have been something like "new_emp_record", "new_login_form" or something descriptive. But I have global change & replace in my editor!

>> I name my "Personnel" table "Employee" because I spend most of my time working with the "Employee" class in c#, not the table. Thinking about an employee makes more sense in that context. <<

This is record-at-a-time mindset versus set-at-a-time mindset. In a few years, you will be using F# or some other functional language and only have a set-at-a-time mindset. We had the same thing in IDEF and earlier naming systems for files or non-RDBMS products.


Subject: micromanaging is not always welcome
Posted by: Alex_Kuznetsov (view profile)
Posted on: Friday, August 28, 2009 at 9:03 AM
Message: Well, it might be true that "global naming conventions saved 8 to 12% of maintenance time. "
However, other approaches such as defensive programming and proper unit testing may cut your maintenance time several times, not several percent. If you invest too much of your time in naming conventions, you are not investing it in much more profitable improvements. And you know what? Many of the very best developers do not like too much micromanagement. If you try to impose the standards they do not agree with and do not consider important, they simply move on and innovate somewhere else. If that happens, then your maintenance time goes up because your best developers are gone - "the road to hell is paved with good intentions"...

Subject: challenge
Posted by: Mad (not signed in)
Posted on: Thursday, September 03, 2009 at 7:34 AM
Message: First of all, I speak german. Nouns for columnnames are nice for me. However camelCase is a lot easier to read than lowercasecolumnnamesblabla. It's just that my eyes snap to from one important word to the next in the identifier. Might be that the english speaking programmers find it easier the other way around, but not us.

Thinking of Sets and Collections is irrelevant in my opinion. It all comes down to what you make of it. In c,c#... you treat all of the items of a collection in a loop. In Sql the loop is hidden in the SqlEngine. For example:

Tablescan -> foreach
Clustered Index on an int -> SortedList<number>

select * from table order by column -> list.Sort(delegate{...});

thats all the same! You operate on the whole list(set). You can also operate on single items, its just that sql is not optimized for that.

In the end you have to agree, that a table is just a table, being a list of rows, that are tuples of values. If you really want to point it out that it is a set name it employeeset(or EmployeeSet).

I find it nice to store everything that logically belongs together in a seperate table. But sometimes it simply does not make sense. If the table stores all Employees, the Job belongs to the Person, from a certain Perspective. Dealing with more tables is just increasing Maintenance time. Of course having too much or the wrong things in a table does also. Its just about finding the right portion.

Having to insert, update, delete two tables for just one column looks kind of stupid for me.

I could also:

create table emails
(
id int identity(1,1),
email varchar(64)
)

create table names
(
emailid int references emails(id)
firstname varchar(64),
LastName varchar(64)
)

I could even split first and lastname. But you would agree that would be stupid.

If the OrgChart is just that simple, it is and it is the BEST solution to keep job and employee together.

One last word to your mindsets:

No. 1: thats a general one

No. 2: it's not how something looks to you, it's just about whether you know how it should look. If you write sql procedural, you just don't know(understand) it.

No. 3: Thats general stupidity. You wouldn't store a number in a string in c,c#,asm...

No. 4: Thats Project Management and Planning

Write a tutorial, with Emploees, Jobs and Departments nothing else. Every Employer has exactly one Job and belongs exactly to one dep. And explain why you chose you schema, so that others, such as myself might benefit from it
Send me a link to mhn@gmx.at

Regards

Subject: Touché Alex_Kuznetsov :)
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 03, 2009 at 9:28 AM
Message: The old saying "Don't sweat it 100 years from now no none one will care...well in the software world it more like "10 years from now"... :)



Subject: Table aliases considered harmful
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 03, 2009 at 10:30 AM
Message: You mention the "problem" of using "A" as an alias for the Employees table. WHY USE TABLE ALIASES AT ALL?

Occasionally, table aliases are required (such as derived table), but only in special cases. The usual Join syntax doesn't need table aliases AT ALL, and I hate it when I see people "learning" that the Join syntax needs table aliases.

Sure, it can save the programmer from typing 6 or 8 extra characters. But I don't think the speed of SQL programming is limited by how fast you can type. What's more important is how readable the code is. REMOVE the aliases completely!

I really, really wish programmers and DBAs would stop using table aliases when they are not necessary. If you can't type "Personnel" or "Employees" three or four times without hurting your poor, overworked fingers, then use Copy and Paste to copy the table name.

Aliases hurt readability to a greater degree than programmers realize.

Subject: Re: Table aliases considered harmful
Posted by: Phil Factor (view profile)
Posted on: Thursday, September 03, 2009 at 10:35 AM
Message: I have to agree with  the comment 'Table aliases considered harmful'. They are grossly overused and seem to be often used as an amateur obfuscation device. There seems to be a common assumption that tables should always be aliased. I've never understood why.



Subject: Table aliases considered harmful
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 03, 2009 at 10:37 AM
Message: You mention the "problem" of using "A" as an alias for the Employees table. WHY USE TABLE ALIASES AT ALL?

Occasionally, table aliases are required (such as derived table), but only in special cases. The usual Join syntax doesn't need table aliases AT ALL, and I hate it when I see people "learning" that the Join syntax needs table aliases.

Sure, it can save the programmer from typing 6 or 8 extra characters. But I don't think the speed of SQL programming is limited by how fast you can type. What's more important is how readable the code is. REMOVE the aliases completely!

I really, really wish programmers and DBAs would stop using table aliases when they are not necessary. If you can't type "Personnel" or "Employees" three or four times without hurting your poor, overworked fingers, then use Copy and Paste to copy the table name.

Aliases hurt readability to a greater degree than programmers realize.

Subject: Aliases are GRRRREAT!
Posted by: JJEugene (view profile)
Posted on: Thursday, September 03, 2009 at 11:01 AM
Message: To those who think that aliases hurt readability, I disagree. I find a properly aliased query to be a lot easier to read than repeating the entire table name or using an unrelated set of characters (like 'A' for Employee table). It has nothing to do with "poor fingers" having to type more characters.

Aside from readability, I find aliases help with content absorbtion. Suppose a query has 5 tables and a list of 20 fields in the select statement. Further, suppose I'm not familiar with the database. Knowing which tables the fields come from by looking at the query and not having to look it up in 5 different tables is a great help to internalizing what the query is doing. You may be familiar with your database and query, but what about your future replacement? Don't you owe it to your company to create code that is as easily maintainable as possible?

Subject: I use to work where Procedural Coders made all the DB Design Decisions
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 03, 2009 at 11:58 AM
Message: And the general thinking (because these were procedural folks making set based decisions) was along these lines:

Why fix correctly this week that which you can patch today and put off until another release?

Or this one which is even better:

"A set based solution? I'm talking programming here not matching up outfits" ~ any Procedural programmer.

Subject: TOP 1
Posted by: Anonymous (not signed in)
Posted on: Friday, September 04, 2009 at 5:48 AM
Message: Why use "TOP 1": it might just be that it was faster and that was a factor... and yes: an optimizer should make such practice redundant, but what should (hopefully) happen and what does happen are not always the same thing.

Subject: More ways to Rome
Posted by: Anonymous (not signed in)
Posted on: Friday, September 04, 2009 at 6:29 AM
Message: Joe makes some good points but also makes the impression that there is only one truth: his truth. It might be a good idea to be a bit more open to other styles/conventions/opinions.

Subject: TOP 1
Posted by: Anonymous (not signed in)
Posted on: Friday, September 04, 2009 at 6:46 AM
Message: Why use "TOP 1": it might just be that it was faster and that was a factor... and yes: an optimizer should make such practice redundant, but what should (hopefully) happen and what does happen are not always the same thing.

Subject: VARCHAR
Posted by: Izhar (not signed in)
Posted on: Saturday, September 05, 2009 at 10:37 AM
Message: Not all the names are English (ASCII), we have NVARCHAR for it.

Subject: Programming style
Posted by: Seven24 (view profile)
Posted on: Thursday, October 15, 2009 at 1:50 PM
Message: As is not uncommon, I agree with some things Joe states and not others. I agree that camelCase for table names or column names is a poor choice but, for the same reasons, all upper case for keywords is a bad choice. PascalCase for keywords is a better choice because it is harder to read words in all caps as it creates words of identical shape whereas PascalCase creates words of unique shape and thus easier to read. In the days of monochrome monitors, it made sense to distinguish keywords from object identifiers using capitalization but with color monitors and editors, that need has long past.

In addition, I personally hate columns with underscores in them. Only someone that does not write a lot of code would do this. They make for a far more painful writing process.

> When I was at AIRMICS
> (old US Army research group),
> we found that global naming
> conventions saved 8 to 12%
> of maintenance time.

Joe, that speaks to consistency of conventions rather than the specifics of the conventions themselves.

RE: Eliminating table aliases

First, in many cases that makes the code harder to read by making the query substantially wordier. Table names should be clearly named which tends to make them longer. Second, that is not always possible. In subqueries for example, that use the outer table, you must have an alias.

Subject: Naming conventions
Posted by: Baxter (view profile)
Posted on: Wednesday, July 28, 2010 at 5:01 AM
Message: "Staff" is a better name for a table than "Personnel".

;-p

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk