Click here to monitor SSC
  • Av rating:
  • Total votes: 28
  • Total comments: 12
Joe Celko

Database Normalization Basics

07 April 2014

The task of Database Normalization doesn't have to be painful, especially if you follow Old Mother Celko's Normalization Heuristics.

Have you reached the point of having seen the term "normalization" used database literature, but you are still unsure as to just what you have to do to get a normalized database? If so, then you’re not alone.

We will be showing, in this article, how  Normal Forms keep data integrity. Some people will argue that it is all right to "denormalize" a schema for efficiency reasons. If we denormalize the schema, then we are either not enforcing some of the user-required constraints or we have to write addition code to enforce the constraint. If we choose to write additional code, we have to duplicate the functionality in every application, present and future. Normalization reduces programming effort; rules are enforced in one place, one way, one time.

Let’s start in a simple way with some general heuristics about normalization.

Heuristics are exact and often sound a little funny when you say them. One of my favorites is “if you cannot think of it as a percentage, think of it as a ratio!” (From a mathematical view point, a ratio and percentage are the same. But percentages have a base while the two things in a ratio are seen as equal in some sense).

Mother Celko's Thirteen Normalization Heuristics

  • Does the table model either a set of one and only one kind of entity or one and only one relationship. This is what I call disallowing a “Automobiles, Squids and Lady GaGa” table. Following this rule will prevent ‘Multi-valued dependencies’ (MVD) problems and it is the basis for the other heuristics.
  • Does the entity table make sense? Can you express the idea of the table in a simple collective or plural noun? To be is to be something in particular; to be everything in general or nothing in particular is to be nothing at all (this is known as the Law of Identity in Greek logic). This is why EAV does not work – it is everything and anything.
  • Do you have all the attributes that describe the thing in the table? In each row? The most important leg on a three-legged stool is the leg that is missing.
  • Are all the columns scalar? Or is a column serving more than one purpose? Did you actually put hat size and shoe size in one column? Or store a CSV list in it?
  • Do not store computed values, such as (unit_price * order_qty). You can compute these things in VIEWs or computed columns.
  • Does the relationship table make sense? Can you express the idea of the table in a simple sentence, or even better, a name for the relationship? The relationship is “marriage” and not “person_person_legal_thing”
  • Did you check to see if the relationship is 1:1, 1:m or n:m? Does the relationship have attributes of its own? A marriage has a date and a license number that does not belong to either  of the people involved. This is why we don't mind tables that model 1:1 relationships.
  • Does the entity or relationship have a natural key? If it does, then you  absolutely have to model it as the PRIMARY KEY or a UNIQUE constraint. Is there a standard industry identifier for it? Let someone else do all that work for you.
  • If you have a lot of NULL-able columns, the table is probably not normalized.
  • The NULLs could be non-related entities or relationships.
  • Do the NULLs have one and only one meaning in each column?
  • If you have to change more than one row to update, insert or delete a simple fact, then the table is not normalized.
  • Did you confuse attributes, entities and values? Would you split the Personnel table into “Male_Personnel” and ”Female_Personnel” by splitting out the sex code? No, sex is an attribute and not an entity. Would you have a column for each shoe size? No, a shoe size is a value and not an attribute.

The Writing on the Wall

Many years ago, there was a magazine named Database Programming & Design which published a poster on Normalization by Marc Rettig as a subscription renewal premium. It was so popular that it is still around after all this time. It follows one example, the Daisy Hill Puppy Farm database (this is the birth place of Snoopy in the Peanuts comic strip).

You really ought to have a copy hanging in your cube. It is that good. It is called '5 Rules of Database Normalization'. You can now find it here:
 http://www.databaseanswers.org/downloads/Marc_Rettig_5_Rules_of_Normalization_Poster.pdf

The Relational Model

The Relational Model and Normal Forms of the Relational Model were first defined by Dr. E. F. Codd then extended by other writers after him. He borrowed the term "normalized relations" from the US and Soviet political jargon back when the USSR still existed.

Relations are a branch of mathematics which deal with mappings among sets defined by predicate calculus from formal logic. Just like an algebraic equation there are many forms of the same relational statement but the "normal forms" of relations are certain formally defined desirable constructions. The goal of Normal Forms is to avoid certain data anomalies that can occur in un-normalized tables. Data anomalies are easier to explain with examples. When Dr. Codd defined the relational model it added some additional requirements to the visualization of the relation as a table:

  1. If a row in a table can exist without a certain attribute being known, then this is considered a NULL state for that attribute and the DBMS must use a special symbol for that state. We have that in SQL, but NULLs have a data type to signal the SQL engine how to handle the physical storage.
  2. No two rows are identical. Furthermore it must be possible to define a subset of the known attributes of a relation which have no identical values. This subset of attributes is known as a "key". SQL was built on old file systems, and the language can allow redundant duplicates. But that is a sign of really bad programming
  3. One key in a relation is the "primary key". This key cannot have any NULL attributes, of course. This initial definition was later modified to regard all keys as “equally key-like”, but the notion was set in the early literature and it got into SQL. The actual motive was from sequential files; a sequential tape file has to be sorted on a sort key to work. ISAM files on disk also had to have a sort key. This is why T-SQL has clustered indexes and and clustering as the default for the PRIMARY KEY. The new technology mimics to old technology, until it learns better and has its own voice.
  4. The order of rows and columns are irrelevant and do not contain any information. This leads us to the requirement that any atomic fact in the database (a collection of such tables) can be found by a table name, column names, and a primary key value.

    Again, SQL assumes that it can go to the ANSI/ISO Standard Schema Information meta-data tables and get a default list  of columns to fill in the column lists in various statements. It is a shorthand.

    Anyone tackling relational databases who  is familiar only with file-based data will assumes that there is a physical ordering because that is how all of the flies he has ever worked with are stored –-- physically contiguous tracks on a disk. Nope. Wrong. Columnar databases are not contiguous storage. Neither is a hashed database. The first thing you have to do to learn SQL is throw out a physical model mindset and think in abstractions.

Dr. Codd added quite a few other requirements (initially a total of 12 rules and eventually 333) but those listed here will suffice for our discussion.

Anomalies

If you do not normalize a table, you get data anomalies. These are situations where data integrity cannot be maintained without extra code, NULLs and convoluted programming. Let's create a simple example. The table holds information about college students, their majors and their departmental advisers.

CREATE TABLE StudentMajors

(student_name CHAR(10) NOT NULL,

 major CHAR(10) NOT NULL,

 adviser_name CHAR(10) NOT NULL,

 PRIMARY KEY (student_name, major, adviser_name));

 

INSERT INTO StudentMajors

VALUES ('Higgins', 'Comp Sci', 'Celko'),

 ('Jones', 'Comp Sci', 'Celko'),

 ('Smith', 'English', 'Wilson');

UPDATE Anomaly

Let's try to update Higgins to change from Computer Science to Sanskrit and suddenly Celko is advising Sanskrit!

UPDATE StudentMajors

 SET major = 'Sanskrit'

 WHERE student_name = 'Higgins';

The actual erroneous row is ('Higgins', 'Sanskrit', 'Celko')

DELETE Anomaly

Mr. Smith drops out of school and Professor Wilson loses his job!!

DELETE FROM StudentMajors

 WHERE student_name = 'Smith';

The row ('Smith', 'English', 'Wilson') happens to be the only row that tells us that Professor Wilson is the English department's faculty adviser. If this sample table and the English department had been bigger, this problem could have been hidden for a long time. But we would have been carrying redundant data.

INSERT Anomaly

Mr. F. L. Wright starts an Architecture department, but we cannot put him into the database without using NULLs or a dummy student name.

INSERT INTO StudentMajors (student_name, major, adviser_name)--- fails!

VALUES (NULL, 'Architecture', 'Wright')'

SQL will prevent this row from getting into the table the ways I declared this example. It is bad practice to  declare an IDENTITY as the key and all other columns are NULL-able. Since the table's IDENTITY property is not a column nor an attribute of any possible data model, it cannot ever be used as a key. It is the count of the physical insertion attempts.  You can use it to mimic the record numbers that are so familiar from using sequential files, but they have no place in the relational model.

First, Second, Third, and Boyce-Codd Normal Forms

Let's talk about going from a file system to an RDBMS. Let's start with a file to maintain data about class schedules at an imaginary school. We keep the course_nbr, section_nbr, time_period, room_nbr, professor_name, student_name, student_major, and course_grade. Assume we started with a Pascal file, declared like this, and wanted to move it to a relational database. I picked Pascal because it is easy to read and understand, unlike currently popular languages.

RECORD Classes =

 course_nbr: ARRAY [1:7] OF CHAR;

 section: CHAR;

 time_period: INTEGER;

 room_nbr: INTEGER;

 room_size: INTEGER;

 professor_name: ARRAY [1:25] OF CHAR;

 Students : ARRAY [1:class_size]

 OF RECORD

     student_name ARRAY [1:25] OF CHAR;

     student_major ARRAY [1:10] OF CHAR;

     course_grade CHAR;

     END;

END;

To convert this sequential file of records into a table, we first have to "flatten" it out, making sure that each row has full information for all the columns. Since Pascal, or any other language, does not have NULLs, the columns will be NOT NULL. The first thought is to simply rewrite every field over as a column, a line-for-line translation, as I said earlier. It does not really work because a column is nothing like a field; did you notice that “Students” is itself an array of records whose size is determined by an external variable? This would have been done with an OCCURS clause in COBOL and other programming language have similar features.

But let's do anyway.

CREATE TABLE Classes

(course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 time_period INTEGER NOT NULL,

 room_nbr INTEGER NOT NULL,

 room_size INTEGER NOT NULL,
 professor_name VARCHAR(25) NOT NULL,

 student_name VARCHAR(25) NOT NULL,

 student_major VARCHAR(10) NOT NULL,

PRIMARY KEY (course_nbr, section_nbr, student_name));

We declare PRIMARY KEY (course_nbr, section_nbr, student_name) so we have a proper table. But what we are doing is hiding the Students record array, which has not changed its nature by being flattened.

To get this table in 1NF we "decompose" it into two or more tables so that no table has repeating groups, hidden or otherwise.

CREATE TABLE Classes

(course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 time_period INTEGER NOT NULL,

 room_nbr INTEGER NOT NULL,

 room_size INTEGER NOT NULL,
 professor_name VARCHAR(25) NOT NULL,

 PRIMARY KEY (course_nbr, section_nbr));

 

CREATE TABLE StudentCourses

(student_name VARCHAR(25) NOT NULL,

 course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,
 student_major VARCHAR(10) NOT NULL,

 course_grade CHAR(1) NOT NULL,

PRIMARY KEY (student_name, course_nbr, section_nbr),

FOREIGN KEY (course_nbr, section_nbr)

  REFERENCES Classes (course_nbr, section_nbr));

Since we were told that many students could sign up for a single course we know that the student information was a repeating group. Note that the foreign key declaration allows us to enforce the constraint that a student may only sign up for a valid course and section. Files do not do that; they are raw data without any constraints that get meaning from the host program that reads them.

With the second table structure there are no repeating groups but there are other problems. For example, what if a student changes his/her major? We need to change student_major for every (course_nbr, section_nbr) that student_name is enrolled in.

We now need to introduce the concept of a functional dependency (FD) or determiner to borrow a British term for the same idea. The notation is "X → Y", and the arrow is read "X determines Y" in English, where X and Y are sets of attributes that will be used to define columns in tables. A functional dependency means that if I know a value of X, then I can determine a value of Y, just as you would in a mathematical function.

A table is in Second Normal Form (2NF) if it has no "partial key" dependencies. That is, if X and Y are columns or subsets of columns and X is a key then for any Z which is proper subset of X, it cannot be the case that Z → Y. In the example, our users tell us that knowing the (student_name, course_nbr) is sufficient to determine the section_nbr (since students cannot sign up for more than one section of the same course) and course_grade. This is the same as saying that (student_name, course_nbr) → (section_nbr, course_grade). After more analysis we also discover from our users that student_name → student_major; a student has only one major, in English. This leads us to the following decomposition:

CREATE TABLE Classes

(course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 room_nbr INTEGER NOT NULL,

 room_size INTEGER NOT NULL,

 time_period INTEGER NOT NULL,
 professor_name VARCHAR(25) NOT NULL,

 PRIMARY KEY (course_nbr, section_nbr));

 

CREATE TABLE Enrollment

(student_name VARCHAR(25) NOT NULL,

 course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 course_grade CHAR(1) NOT NULL)

 PRIMARY KEY (student_name,  course_nbr),

 FOREIGN KEY (course_nbr, section_nbr)

  REFERENCES Classes (course_nbr, section_nbr),

 FOREIGN KEY (student_name)

  REFERENCES Students (student_name));

 

CREATE TABLE Students

(student_name VARCHAR(25) NOT NULL PRIMARY KEY,

 student_major);

At this point, we are in 2NF. Every attribute depends on the entire key. Now if a student changes majors it can be done in one place. Furthermore, a student cannot sign up for different sections of the same class because we changed the key of Enrollment. Unfortunately, we have still have problems.

Notice that while room_size depends on the entire key of Classes it also depends on room_nbr. If the room_nbr is changed for a (course_nbr, section_nbr) we may also have to change the room_size and if the room_nbr is modified (we knock down a wall) we may have to change room_size in several rows in Classes for that room_nbr. Obviously, the room size is a property of the room.

Another normal form can address these problems. A table is in Third Normal Form (3NF) if for all X → Y where X and Y are columns of a table then X is a key or Y is part of a "candidate" key (a key which is not necessarily the primary key). This implies that the table in 2NF since a partial key dependency is a type of transitive dependency. To get our example into 3NF we make the following decomposition:

CREATE TABLE Rooms

(room_nbr INTEGER NOT NULL PRIMARY KEY,

room_size INTEGER NOT NULL);

 

CREATE TABLE Classes

(course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 room_nbr INTEGER NOT NULL,

 time_period INTEGER NOT NULL,

 PRIMARY KEY (course_nbr, section_nbr),

 FOREIGN KEY (room_nbr)

   REFERENCES Rooms (room_nbr));

 

CREATE TABLE Enrollment

(student_name VARCHAR(25) NOT NULL,

 course_nbr CHAR(7) NOT NULL,

 section_nbr CHAR(1) NOT NULL,

 course_grade CHAR(1) NOT NULL,

 PRIMARY KEY (student_name, course_nbr),

 FOREIGN KEY (course_nbr, section_nbr)

   REFERENCES Classes (course_nbr, section_nbr),

 FOREIGN KEY (student_name)

   REFERENCES Students (student_name));

 

CREATE TABLE Students

(student_name VARCHAR(25) NOT NULL PRIMARY KEY,

 student_major VARCHAR(10) NOT NULL);

A common misunderstanding about relational theory is that 3NF has no transitive dependencies. As indicated above, if X → Y then X does not have to be a key. If Y is part of a candidate key. We still have a transitive dependency in the example - (room_nbr, time_period) → (course_nbr, section_nbr) - but since the right side of the dependency is a key in is technically in 3NF. The unreasonable behavior this table structure still has is that several courses can be assigned to the same room_nbr at the same time_period.

Boyce-Codd Normal Form (BCNF)

The normal form that actually removes all transitive dependencies is Boyce-Codd Normal Form or BCNF. A table is in BCNF if for all X → Y, X is a key; period. We can go to this normal form just by adding another key with UNIQUE (room_nbr, time_period) constraint clause to the table Classes. There are some other interesting and useful "higher" normal forms but that is out of the scope of this discussion. In our example we have removed all of the important anomalies with BCNF.

A table can be in 3NF, but still have problems. Let's declare an abbreviated table for mailing labels.

CREATE TABLE Mailing_Labels

(street_address VARCHAR(35) NOT NULL,

 city_name VARCHAR(25) NOT NULL,

 zip_code CHAR(5) NOT NULL

 CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'),

 PRIMARY KEY (street_address, city_name)

);

The nature of US mailing addresses and the ZIP code gives us these two functional dependencies, ignoring the fact that many cities have the same name within different states.

(street_address, city_name) → zip_code

zip_code → city_name

The first FD can give us a key, but we are not enforcing the second one. Consider this:

INSERT INTO Mailing_Labels

VALUES ('100 Main Street', 'Springfield', '99999'),

 ('100 Main Street', 'Yorktown', '99999');

In this example, we have to add another table to implement the second FD. Notice the overlapping unique constraints.

CREATE TABLE Zip_Cities

(city_name VARCHAR(25) NOT NULL,

 zip_code CHAR(5) NOT NULL UNIQUE

 CHECK(zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]',

 PRIMARY KEY (city_name, zip_code));

and modify the original table:

CREATE TABLE Mailing_Labels

(street_address VARCHAR(35) NOT NULL,

 city_name VARCHAR(25) NOT NULL,

 zip_code CHAR(5) NOT NULL

 CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'),

 

 PRIMARY KEY (street_address, city_name),

 FOREIGN KEY (city_name, zip_code)

 REFERENCES Zip_Cities (city_name, zip_code));

This is a bit elaborate, but it is all declarative code. Using overlapping constraints is a good technique that does not get used as often as it should. Let's go to the next normal form. BCNF removes all the transitive dependencies, not 3NF.

Fourth Normal Form

A table in BCNF can still have problems with multi-valued dependencies. Here is another overly simple table.

CREATE TABLE Inventory

(item_nbr CHAR(13) NOT NULL,

 dept_name CHAR(15) NOT NULL,

 supplier_name CHAR(20) NOT NULL,

 PRIMARY KEY (item_nbr, dept_name, supplier_name));

It is all key, so we have a BCNF table. Now let's enforce the following two multi-valued dependencies.

item_nbr →→ dept_name

item_nbr →→ supplier_name

The MVDs say that an item can be in one or more departments --- every department has trash cans. The second multi-valued dependencies say that an item can have different suppliers –-- trash can come from many suppliers. But there is a problem with this table. Imagine that I get a new shiny red trash can. I can not do anything with it until I find some department that wants a shiny red trash can. I need to recognize that MVD constraints on the departments and suppliers are independent.

CREATE TABLE Department_Inventory

(item_nbr CHAR(13) NOT NULL,

 dept_name CHAR(15) NOT NULL,

 PRIMARY KEY (item_nbr, dept_name));

 

CREATE TABLE Suppliers_Inventory

(item_nbr CHAR(13) NOT NULL,

 supplier_name CHAR(7) NOT NULL,

 PRIMARY KEY (item_nbr, supplier_name));

 Fifth Normal Form

You can have a n-ary relationship that cannot be decomposed into lower degree relations. If you ever have had a mortgage or made any major purchase that involved a lender, then this Shelton will look familiar.

CREATE TABLE Escrows

(buyer_name CHAR(15) NOT NULL,

 seller_name CHAR(15) NOT NULL,

 lender_name CHAR(15) NOT NULL,

 PRIMARY KEY (buyer_name, seller_name, lender_name));

 

INSERT INTO Escrows(buyer_name, seller_name, lender_name)

VALUES ('Smith', 'Jones ', 'National Bank'),

('Smith ', 'Wilson', 'Home Bank'),

('Nelson', 'Jones ', 'Home Bank');

Let's try to split this into three binary relations.

(CREATE TABLE Buyer_Seller

(buyer_name CHAR(15) NOT NULL,

 seller_name CHAR(15) NOT NULL

 PRIMARY KEY (buyer_name, seller_name));

 

CREATE TABLE Buyer_Lender

(buyer_name CHAR(15) NOT NULL,

 lender_name CHAR(15) NOT NULL,

 PRIMARY KEY (buyer_name, lender_name));

 

CREATE TABLE Seller_Lender

(seller_name CHAR(15) NOT NULL,

 lender_name CHAR(15) NOT NULL,

 PRIMARY KEY (seller_name, lender_name));

The trouble is that when you try to assemble the original information by joining pairs of these three tables together, thus:

SELECT BS.buyer_name, SL.seller_name, BL.lender_name

 FROM BuyerLender AS BL, SellerLender AS SL, BuyerSeller AS BS

 WHERE BL.buyer_name = BS.buyer_name

  AND BL.lender_name = SL.lender_name

  AND SL.seller_name = BS.seller_name;

This will add false rows, such as ('Smith', 'Jones', 'Home Bank'). These three tables should have been implemented as VIEWs and not as base tables

 Other Normal Forms and non-NF Redundancies

There are more normalizations, but they involve temporal consideration and other things. There are also redundancies that involve more than one table. The most common non-normal form redundancy is a summary column in a referenced table. In English, imagine the usual skeleton of an Orders table and its referencing order details table:

CREATE TABLE Orders

(order_nbr CHAR(10) NOT NULL PRIMARY KEY,

 ..,

 order_amt_tot DECIMAL (12,2) NOT NULL,

 CHECK(order_amt_tot > 0.00),

 ..);

 

CREATE TABLE Order_Details

(order_nbr CHAR(10) NOT NULL

  REFERENCES Orders(order_nbr)

  ON UPDATE CASCADE

  ON DELETE CASCADE,

 sku CHAR(15) NOT NULL,

 PRIMARY KEY (order_nbr, sku),

 unit_price DECMAL (8,2) NOT NULL

  CHECK (unit_price > 0.00),

 order_qty INTEGER NOT NULL

  CHECK (order_qty > 0),

 ..);

where order_amt_tot is the total amount of the order as computed by the query:

SELECT order_nbr, SUM(unit_price * order_qty) AS order_amt_tot

 FROM Order_Details

 GROUP BY order_nbr;

People who write an Orders table like this will be updating it constantly, often with procedural code in a TRIGGER or stored procedure.

So, do we have to learn a lot of math just to get a correct schema? Well, yes. But you can use some basic heuristics and feel that you have gotten it right 95% of the time

Joe Celko

Author profile:

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 28 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: The Two Issues
Posted by: Robert young (view profile)
Posted on: Tuesday, April 8, 2014 at 11:24 AM
Message: -- Normalization reduces programming effort; rules are enforced in one place, one way, one time.

1) Thus we find that coders run 99.44% of development shops. Such coders aren't really dense, and generally understand both the RM and SQL. They don't oppose RDBMS as the repository of the constraints of record out of ignorance. Rather, they're only worried about continued employment as coders. So, the notion of reduced effort is seen as reduced income. Or, as one former colleague said to a prospect, "We prefer to manage transactions in our application code". So long as development is run by coders, so too will RDBMS stored data be treated as dumb-as-a-sack-of-hair bytes. Just as it was by today's coders' Grand Daddies pushing out COBOL.

2) It's worth mentioning that SQL is a bad image of the RM due to Chamberlin, not Codd who was banned by Armonk from defining the language for the RM.

Subject: Other language for RDBMS
Posted by: Joe Celko (not signed in)
Posted on: Thursday, April 10, 2014 at 5:20 PM
Message: SQL is the dominant language for RDBMS and it is so firmly entrenched it will not be replaced. But Dr. Codd had his own language called Alpha that was based on his relational algebra. Ingres had QUEL that was based on the relational calculus, which was better than SQL for years. Sterling Software had one based on ER diagrams, but I cannot remember the name. Then Chris Date likes Tutorial D, which looks a good bit like Alpha.

Subject: A developer's viewpoint
Posted by: Eric P (not signed in)
Posted on: Monday, April 14, 2014 at 10:48 AM
Message: In answer to Robert young, I've been around a long time and I don't recall ever seeing developers like the ones he describes, who favor one DB over another because of wanting to write more code. There's always far more code to write than can be completed in a lifetime as it is, without trying to drum up more work. I am always looking for the easiest, most efficient (and stable) way to write code so that I can get more done and make my customer happy.

In our company, the chief database architect is trying to shove EAV down everyones' throats. Most have buckled under, but not my team. We have fought it for all we're worth. The code required to work with EAV is a nightmare - but the architect maintains that we will fail because we are not using EAV. I guess it's clear that we are in that .56% group that is not run by coders.

Subject: Poster
Posted by: Anonymous (not signed in)
Posted on: Monday, April 14, 2014 at 12:43 PM
Message: What's the deal with the poster? I followed the link and find a poster with one rule - Eliminate Repeating Groups. Duh, aren't there more?

Subject: Igor: Abi Normal
Posted by: Robert young (view profile)
Posted on: Monday, April 14, 2014 at 2:46 PM
Message: @Eric P

The LoC centric behavior describes coders, whose job it is to write C++/java/COBOL/etc., not database developers. These are the folks who aim to keep the end of coding at some point well past retirement; "far more code" is the mantra. If DRI is your friend, then java is your enemy, so to speak.

Building Organic Normal Form databases reduces, sometimes considerably, the amount of client-side code needed (i.e., all that C++/java), and thus is a threat to them. I only cited one instance; I've experienced many.

Anyone who's an EAV zealot reads the wrong magazines. You have my sympathy.

Subject: Re the poster
Posted by: simple Talk Editor (not signed in)
Posted on: Tuesday, April 15, 2014 at 12:56 AM
Message: It may not be obvious, but the link leads to a page that gives the background to the poster, and has an image of part of the poster. The poster itself is a PDF file that is a download link from that page. it contains a lot more than just repeating groups. Sorry it wasn't obvious but we are not in control of that page!

Subject: Typos
Posted by: Anonymous (not signed in)
Posted on: Wednesday, April 16, 2014 at 6:03 AM
Message: Hi Joe, Thanks for the article
4 typos and 1 mystery:
used database literature
we have to write addition code
but since the right side of the dependency is a key in is technically in 3NF
Unfortunately, we have still have problems

"then this Shelton will look familiar"??

HTH

Subject: Normalization is Essential
Posted by: Anonymous (not signed in)
Posted on: Thursday, April 17, 2014 at 9:25 PM
Message: Great article - unfortunately our java developers insist on keeping logic and constraints in the java application, generating keys and maintaining tables as slot buckets with an ORM object relational mapper based on HIbernate.

Of course their database is impossible to decipher, queries using HQL rather than Sql are a disaster and the simplest of queries requires massive table scans and spikes in CPU usage.

This feels like flat file land - did I mention that one sql developer has the productivity of 5 java coders? The only true agility comes from using practical theory not homespun agile coder mythology.

Subject: Missing logical design, only physical
Posted by: ScottPletcher (not signed in)
Posted on: Wednesday, April 23, 2014 at 12:52 PM
Message: I was deeply disappointed to see no mention of a logical design process, instead straight to "tables" and "indexes". The biggest problem today is not lazy or greedy coders, it's a failure to do a proper *logical* design before proceeding to the physical design.

Subject: Poster
Posted by: Anonymous (not signed in)
Posted on: Wednesday, April 23, 2014 at 4:03 PM
Message: I have had that poster on my wall for years! I even put it in a cheap plastic frame.

Subject: Summary of Normalization
Posted by: VJ (not signed in)
Posted on: Thursday, April 24, 2014 at 8:39 AM
Message: Each field depends on the key, the whole key and nothing but the key, so help me Codd

Subject: No More Poster
Posted by: SQLWayne (view profile)
Posted on: Monday, November 10, 2014 at 4:05 PM
Message: Unfortunately Rettig's site has gone dark, though there are other places that have the PDF. Problem is, it's questionable if you can have it printed large.

 
Simple-Talk Database Delivery

DLM
Patterns & Practices Library

Visit our patterns and practices library to learn more about database lifecycle management.

Find out how to automate the process of building, testing and deploying your database changes to reduce risk and make rapid releases possible.

Get started

Phil Factor
Microsoft and Database Lifecycle Management (DLM): The DacPac

The Data-Tier Application Package (DacPac), together with the Data-Tier Application Framework (DacFx), provides an... Read more...

 View the blog

Top Rated

The Evolution of SQL Server BI
 It is sometimes hard to keep up with Microsoft's direction in Business Intelligence. Over the years,... Read more...

Microsoft and Database Lifecycle Management (DLM): The DacPac
 The Data-Tier Application Package (DacPac), together with the Data-Tier Application Framework (DacFx),... Read more...

A Start with Automating Database Configuration Management
 For a number of reasons, it pays to have the up-to-date source of all the databases and servers that... Read more...

Archiving Hierarchical, Deleted Transactions Using XML
 When you delete a business transaction from the database, there are times when you might want to keep a... Read more...

Rollback and Recovery Troubleshooting; Challenges and Strategies
 What happens if your database deployment goes awry? Do you restore from a backup or snapshot and lose... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Temporary Tables in SQL Server
 Temporary tables are used by every DB developer, but they're not likely to be too adventurous with... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.