Click here to monitor SSC

Design questions - Advanced

Last post 06-21-2009, 3:20 PM by Phil Factor. 3 replies.
Sort Posts: Previous Next
  •  06-19-2009, 8:48 PM Post number 73819

    Design questions - Advanced

    These are the types of problems that's never in the SQL books!

    1) I have these columns in most of my tables: created_date, creator_email, modified_date, modifier_email. Should I make both modified_date and modifier_email NULL? I know SQL is a predicate logic based language which means that I could figure out by comparing created_date = modified_date. If I do make it a NON-NULL,the modified_date and modifier_email will contain the values of created_date and creator_email (space is wasted). In the application layer, having modifier_email and modified_date to NULL makes more sense. How can a member that just signed up have a modified date and a modifier email?? That's a logical error.

    2) I read an article where it says that primary keys are not required if you have unique constraints on the keys that define the data entity. However, isn't a table not a table if it doesn't have a primary key by definition???

    3) Take a forum data model. Forums would be aware of topics, topics would be aware of forums, messages are aware of topic....should it be aware of forums? (this would require all forum composite keys to be in the messages table as a foreign key)

    4) Take a social networking application model. Members would have friends. The friends table would consist of the member and the friend. Both would require foreign key references back to the members table. The problem with this is that SQL Server (even the latest version) does not support cyclic cascades (almost EVERY other vendor supports it INCLUDING ACCESS). The answer is triggers but I don't like triggers for debugging reasons (and the fact that it persists before trigger fires). I like to use the DRI to do as much of the "what should be automated" work. So the alternative would be to delete the friend references when a member is deleted. The problem with this is that some will be done in the DRI and some in the SP. It's like using natural keys and surrogates in the data model!

    5) In the friends table, a member can be the friend or the friend requestor. To get a members friends, one would have to do a UNION ALL (constraints prevents duplicates). How is the performance of the UNION ALL operator (I read some articles mentioning bad performance)? Is there any other way to model this?

    6) In the memebers table, should the primary key be the username or the email? By buisiness logic, an email would likely change in the real world while a username would be for "aesthetics".

    7) This is the first data model I'm doing without using surrogates. I'm worried about massive cascades occuring cause of primary key changes or doing joins on non-number columns. Yes, there will be joins almost everytime even though I'm using natural keys. This is because the application layer uses a layered architecture which does not do any type of lazy loading other than nested entity properties. Any tips?

    8) Should you seperate stored procedures by CRUD or put them into a single SP. For an example, CreateMember, UpdateMember,DeleteMember OR InsertOrUpdateOrDeleteMember (I prefer SaveMember but Celko advised against it). Celko also recommends that I should not do any boolean logic inside the procedure to check if the data already exists and just do an update/insert/delete which is valid EVEN on an empty set. With surrogates, I used to check (ID < 0 OR ID > 0 ) for INSERT/UPDATES. With natural keys, I just check the created_date (a non persisted entity has a NULL created_date) but I do the check directly in the update/insert query (using merge statement).

    9) Regarding candidate keys, sometimes a table will require 2-4 candiate keys which become the primary key. This would mean that any foriegn keys would also have those 2-4 candidate key references. This would take a lot of space but how is it in performance? (celko mentions minimal hashing algorithm etc)
  •  06-20-2009, 1:05 PM Post number 73822 in reply to post number 73819

    Re: Design questions - Advanced

    Question 1
    If you're worried about that little space, then you ought to look elsewhere for a saving. No, really, the creator modified it by changing it from nothing into something. Initialize it to the same as the creation date/person. It makes sense and makes it easier for reporting

    Question 2
    It is a good idea to have a primary key so that processes that need a unique, non-null index, (query builders, comparison tools, visual SQL tools, application widgets) know where to look

    Question 3
    No, I don't think so since the extra effort to get the information isn't huge. Why not do some timings? I am always wary of adding redundancy

    Question 4
    Ive dome it both ways. I actually like the SP way because you can see exactly what it going on  from the logic without poring through build scripts.

    Question 5
    Don't believe anything you read without testing it.

    Question 6
    I tend to prefer using Email, but there is a potential danger here. A user can have several emails and can change them. However, from a practical point of view, emails are unique and I've never come a cropper using them.

    Question 7
    People always talk darkly of performance problems with natural keys, but I don't trust this sort of vague talk and usually run up a test harness to see for myself. So far, I've never seen a performance problem from them.  I've done 'lazy loading' at the database end. It worked wonderfully, I ought to write it up. Mind you, I only did it with string results, not tabular tesults

    Question 8
    I wouldn't map tables to stored procedures 1 to 1 at the interface level. However, if you do, dont mix operations in the one SP. It really doesn't work well.

    Question 9
    I've never been able to detect much of a performance degradation in doing this, even on a million row table. Logic says yes, it will degrade performance but I reckon it is worth it.
  •  06-20-2009, 1:38 PM Post number 73823 in reply to post number 73822

    Re: Design questions - Advanced



    Thanks for the reply.

    Question 3:
    Are you saying that it's good to put the forum foreign keys on the message table or not?

    Question 4:
    Yes, I don't like mixing cascades and doing it in the SPs (same as I don't like using both surrogates and natural keys).

    Question 8:
    So you are suggesting to split the CRUD operations into seperate SPs? If the application has 20 tables, that would be at least 80 stored procedures lol.

    If the database has over 100 million rows and a 10 members decide to change their email address, what would the "cascade" effect be? I'm guessing there would be massive read/write locks everywhere.

  •  06-21-2009, 3:20 PM Post number 73825 in reply to post number 73823

    Re: Design questions - Advanced

    Question 3
    No it is not a good idea as this is hierarchical data and should be treated as such. I must have said this too politely. You can always scatter foreign keys all over hierarchical data but it does you little good. I'd just be reconciled to the idea that you will need to access elements in a stored procedure which will walk the hierarchy to get the forum. It makes a lot of sense as you can delete anything just by adding it to the deleted forum in a simple operation.

    Question 4
    Quite right. Just keep in the forefront of your mind the poor soul who will have to maintain the code. That poor soul could be you, so look after him/her. Cascading constraints are great just as long as you don't forget you've got them.

    Question 8
    A customer record is typically distributed amongst eight tables in a production database. you only need four stored procedures.  Think in terms of business objects such as invoice entries, purchases, customers rather than tables.  Some databases, I read, have 25,000 stored procedures so consider yourself lucky!  Actually,  I find that I very rarely exceed the 100 stored procedures.

    Regarding your parting comment. The answer is, yes, if they all do it simultaneously. But the chances of this happening are in the same region as you being beamed up by aliens to take part in bizarre biological experiments.

    Do it right, and then mess with your relational purity only when your tests show it to be absolutely necessary.  My belief is that most compromises and 'Denormalisation' is done through irrational fear long before you hit any sort of performance problem.


View as RSS news feed in XML