Database Documentation – Lands of Trolls: Why and How?

When database documentation is mentioned in an IT Department, everybody nods wisely, yet everyone does their best to avoid doing it. Attention to the database documentation can be the best invertment in time a development group can make. It is essential, and no system can be properly maintained without it. Feodor gives a sensible explanation and guideline for the unloved task of creating database documentation.

Database documentation often seems to be one of those uncharted territories, unknown grounds, lands of trolls and even monsters. In my experience as a DBA, some 80% of production databases don’t have sufficient documentation. This seems quite well accepted, perhaps even as an implicit rule. It seems that the larger the organization and the management overhead therein, the more cursory the documentation is.

What is the purpose of the documentation (the WHY)

Just as there is sufficient documentation in the cockpit of every airplane, so, in a production database, there should be the equivalent of maintenance checklists, pre-flight checklists, and onboard manuals that give the detail of all the routines that should be executed in case of a significant malfunction or untoward event. On an airliner, if one of the thousands of indicators on the dashboard lights up, the co-pilot checks the documentation, determines the reason for the event, and starts executing a routine to correct the reason.

So it should be with an IT organization; even if failure doesn’t cost lives, the risk is as high. The mission of the DBA team can be critical to the survival of an organisation.

Database documentation is important because it meets several needs. For example, it:

  • Provides a common language between business decision makers and IT personnel – two different breeds, usually with no common language, but with a shared ability to read a nice flow chart and discuss it.
  • Provides a shortcut to finding ‘hot-spots’ – by looking at a global functionality chart, one can easily outline the most troublesome parts of the system. In this way, educated decisions can be made about hardware and software requirements, and the outcome can be easily communicated to the financial department.
  • Facilitates a ‘no-panic’ rule – by having proper documentation and using it, the chances of making a wrong decision are diminished because the risk is easier to assess.
  • Makes maintenance easier, and reduces risk when extending or upgrading a system.
  • Reduces training costs, by acting as a mediator between newcomers and existing staff. For example, when a consultant or a new hire comes to the company, having up-to-date documentation reduces the time required from the existing staff to transfer knowledge.
  • Improves productivity of both newcomers and seasoned employees, reducing the likelihood of costly misunderstandings by providing a glossary of commonly used terms, naming conventions, and even commonly-used strategy patterns.

Who will use the documentation (the WHO)?

In a company, we usually have quite a few roles participating in the development process: testers, developers, team leaders, project managers and so on. Whatever the organizational structure might be, the database documentation proves necessary as a way of coordinating the development activities and should be accessible to all involved.

Of course, when it comes to the maintenance of the documentation, it is most likely to be a common task of the testers and the developers to keep the documentation up-to-date.

Aside from the technical staff, the documentation is a powerful tool in the hands of the business executives; the people who vote on the budget and decide on the company’s priorities.

Here are couple examples from everyday life:

Business people and upper management staff respond to pictures and well-documented proof as to why they need to make a decision and/or release some of that precious budget. If you walk up to your manager and say ‘we really need a new server, because…’, the manager will look at you and will say ‘Hmm, we’ll see.’ And that’s about it. However, if you walk up to your manager and back up your request with a picture of the environment highlighting the hotspots in red, you will get a more positive response.

Very simply put, management tends to respond very well to highlighted reports and schemas. Nevertheless, you should keep in mind that however ‘pretty’ a schema, it still needs a solid base of factual data and impeccable documentation logic in order to offer a complete picture.

The database documentation must cater for a variety of audiences, so there must be different representations of the same documentation for each target audience.

Walk up to a developer or IT person, with a ‘pretty’ diagram, highlighted in red, and they will almost certainly frown. This is simply because a developer or an IT specialist is usually first on the line and, logically, when they see red it tells them that they will have some kind of trouble to deal with. It is a compelling message. You will be confronted by a defensive wall even before you have started talking, a wall that will take time to get around. Technical personnel tend to react much more favourably to a factual consolidation of application logic, instead of business mechanics highlighted in red

In conclusion, database documentation is important for a wide range of roles: we have already mentioned technical staff (developers, DBAs, testers and so on), business executives, and IT specialists, but there is a whole other set of people involved in the work process, such as external consultants (on the business or on the IT sides), auditors, and even potential buyers of the company. However, it is important to remember that the representation of the documentation has to be tailored according to the target audience.

Documentation types (the WHAT)

The most common classification of the different aspects of database documentation is by tier: for example, application logic documentation, database logic documentation. There is also hardware environment documentation – installation, maintenance, DR (disaster recovery) planning and testing.

Different levels of documentation exist at different stages of a project. For example, there is a project planning documentation, which is used as a common language between the business department and the technical department; there is development cycle documentation during the development of new system modules; finally, there is a deployment documentation, which ensures that releases and release upgrades are not an unpleasant surprise for the end user.

The process of defining the documentation model is critical, as parts of it will reflect the unique requirements, goals, and resources of the particular organization. It will vary according to the point in time at which the documentation is created; whether, for example, it is part of the development process, or whether it is a retrospective gift for the tenth anniversary of the existence of the project or organization.

Methodology (the HOW)

Before starting the documentation

There is probably no company or organization which does not have some kind of documentation. At the very least, there will be some financial information about the project, its participants, and its purpose. Quite often though, the existing documentation is scattered throughout the organization in various forms and versions.

Start by locating whatever documentation exists already. This could be in a ‘volatile’ state in someone’s head, on someone’s workstation, on a common shared directory, in a central management system etc.

Also, try to locate any person who is directly involved in the development of the system, or who has been in touch with the vendors and has been passing application specifications and requirements to them.

The questionnaire

Nowadays, there are probably very few businesses that have a one-to-one correspondence between application and database. A database might serve several applications, and an application may span several databases.

In complex environments, we usually see many applications, related to different departments, developed and maintained by different people. This would be perfectly fine if the applications did not depend on each other and the company could afford one server (or one clustered instance) per application. Wouldn’t this be nice? In reality, however, the environment is shared, sliced and diced, the hardware is exposed to different kinds of workloads and the overall responsibility has to reside somewhere.

In order to get a clear picture of an undocumented or poorly documented system, we would need to create a standard questionnaire and require all the application owners to fill it in, so as to gather the basic information about the application, its description, demands and expectations, as well as technical and owner contact information.

The general topics touched in the application/database questionnaire should be as follows:

  • Responsible contacts: owner, technical, other
  • Performance: expected load, peak times
  • Scalability: initial and expected data growth
  • Security: auditing, encryption
  • Compatibility: collations, versioning (which SQL Server version is the application tested for, etc)
  • Expected network bandwidth needs
  • Network connectivity specifications
  • Other: depends on individual business priorities

The detailed documentation

In a standard organization we would usually have several modules or departments, and their data would probably overlap.

Let’s say, for the purposes of this example, that we have several modules such as a user interface (e.g. a banking system with a web UI), a financial/billing department, customer support, and an administrative interface.

Each department will consume data from the same source, but with different requirements: the user interface will always use the ‘hottest’, most current data, plus some historical data. The financial/billing department will deal with reports on the data from the previous day backwards, mostly aggregations and analytical dimensions. And so on.

The point is that the documentation should eventually bring an overview of the processes of the entire system and the way the data is treated by the different departments, which will make the changes in the system easier, better, and more reliable.

The priorities of the attributes of the application, such as performance, security, and scalability, are unique to each organisation. Therefore, the results of the questionnaire should be discussed inside the organization in order to clarify the relative priorities.

This is a good time to mention that there are plenty of methods of documenting a system; the idea of this article is not to list the various methods, but to outline a ‘bird’s-eye view’ methodology of what to document in an organized manner, no matter what approach is used. I strongly recommend the Red Gate Guide to SQL Server Team-based Development, which will give you plenty of options and detailed examples of the database development, documentation and refactoring.

There are various tools that can be used to document a system – visual tools, analytical tools, metadata tools and so on. Whether you use Red Gate’s SQL Doc, SQL Dependency Tracker 2, MS Visio, or even Adobe Photoshop, the following points will give you a brief idea of what to document.

As I suggested earlier, you should look into how processes are defined in the organization, establish the order in which they should be documented, and attempt to identify the main aspects of the system and sort them by importance.

A documentation template

Here is an example of the contents of a process documentation template:

  • Process flow description: A short introduction, what is the flow for each process? How is it related to the system as a whole?
  • Flow classification: What characterizes the process? What makes it similar or distinguishes it from other processes?
    • data traffic and the load on the database:
                      <insert performance metrics here>
    • primary users: …
    • etc …
                     
  • List of objects involved in the process:
    • file group: …
    • schema: …
    • full-text catalog: …
    • etc …
  • Business logic flow paths: Is this process wrapped in a bigger piece of application logic? Is it a multi-step process itself?And so on…
  • Business logic restrictions: What conditions exist in the application layer in order for this process to take place?
  • Expected input parameters based on the business logic flow and restrictions:
    • prerequisites
    • mandatory information
    • optional information
  • Output: What is the expected outcome of the process? For example, returns some information to the user/application, confirmation, and so on.
  • Error handling: How are the errors handled? Is there a ‘try-catch’ code? And so on.
  • Create diagrams, flow charts, dependency lists etc â⬔ this will make it easier for people from different backgrounds to talk about the process.

When the template is ready, start filling it out until you have all processes documented.

Where should the documentation be kept and how should it be distributed (the WHERE)?

To recap, here are the most common documentation storage methods in a company:

  • in the mind of the senior developer or system designer
  • in a file on some computer
  • in an unknown share
  • in a centralized repository or under source control
  • in a database
  • in a specialised Wiki

It is generally up to the organization itself to decide which way is best; however, you can guess that the risks of the first few storages mentioned above are much higher than the rest. What I would recommend is to use a content management system which allows searching and indexing.

How to encourage the use of the documentation (the PERSISTENCE)

Of course, creating the documentation is just part of the road to ultimate happiness in database development. The rest of the path is to encourage the use of it.

Again, it is up to the organization to encourage the use of the documentation in meetings, discussions, and in the decision making processes.

Keep in mind that proper documentation can increase productivity and minimize the risk of incorrect decisions.

How to make sure the documentation is up to date (the RELEVANCE)

It would be so nice if we could use Policy Based Management to track if changes introduced to the system are also well documented… It would save a lot of grief in production if this were enforceable.

Well, unfortunately, it does not work this way. It is up to the developers and system administrators to evaluate changes and document them as they happen.

Depending on how the organization is structured, it might be a good idea to introduce rules that enforce teams to sign off their output by presenting the documentation for it. Or, if the organization can afford it, there could be a separate role defined in the organization for ‘documentation officer’.

Keep in mind that out-of-date or inaccurate documentation is almost as dangerous as the lack of documentation.

Conclusion

Documentation is a time-consuming, but rewarding process. It is also unique to each organization and the way it operates. The quality of documentation depends greatly on resources and time allocated to it and the priority it is given.

After reading this article I hope you have developed your own ideas on how to develop and use documentation, as well as a clear vision of how you will benefit from it.

Tags: , , ,

  • 38851 views

  • Rate
    [Total: 3    Average: 4.7/5]
  • Thomas Williams

    The Why
    Great article Feodor, particularly appreciated the bit on the “Why”, any manager would appreciate those rationales.

  • Anonymous

    Agree on the goal – but how to change organsiation ethos?
    With documentation at the level you propose, how many databases/servers would you expect a DBA to be supporting?

    I strongly believe in the ethos of documentation, and am battling to make a change in the organsiation I am in because “it has always been done this way”, “documentation will always be out of date when you need it” and “no-one has time, because there is too much to do”.

    Of the 3 most common responses I get from our global team, the last really wrings true. The other 2 are simply a facet of the fact that no-one has done documentation properly, and taken on board teh key point that without a commitment to keep it current, documentation is at worst only partly useful, and sometimes an actual risk as someone may use old documentation to re-introduce a problem that was solvedm but not documented. Living documentation is KEY to any documentation, imho.

    BUT – the behaviour change is problematic when the DBA team supports 3000+ servers, none of which have any documentation, and which are on a wdie variety of configurations and stages of life-cycle, and the sheer volume of change/incident means that every DBA has a backlog of problems. Also, given the global economy, and the typical reaction of large organsiations, the team has been shrinking – the only sensible solution to our problem (get more people) is not even vaguely likely.

    Thoughts? Shoulder to cry on?

  • SQLGuyChuck

    Changing company motivation for documentation
    I struggle with creating, maintaining, and motivating management to make documentation a priority also. So, I can sympathize with anonymous. Being on the IT operations side of the fence trying to convince the development team to spend more time on documentation has been a failure. IT turnover is around 1-2 years, so much knowledge is lost when people leave, leaving the rest of the company to reverse engineer what code they wrote or decisions they made.

    My answer was to create an automated documentation process that goes out and hits every server to get any kind of information we can using powershell and t-sql. Then with that information, create reporting services reports that display this information and show the value add of this to business. I haven’t completed it yet, as I have 60 hours of work to do a week already, but we are getting close. My hope is that once they see the value of the portal, we can work together to create living documentation. Only difficult part is reporting services isn’t very good mechanism to write back to our repository. Hopefully the choice of using SharePoint 2010 as a foundation will allow us to do this (it does have the capability).
    -Chuck

  • Anonymous

    anonymous back for #2
    @Chuck:

    Yeah, we have a inventory process in place, with processes that automatically connect to servers gather information like drives space, DB space, object space, version, job history, audit information, configurations (memory etc.). But that only covers the bare bones of the infraxtructure and some details of usage.

    It doesn’t go to the detail that the author refer’s to – processing flows (aka) transactional analysis, business logic, implementation of design-by-contract (i.e expect outputs, etc.). IT doesn’t transactional volumetrics to allow for the linking of that so we can start doing more powerful cause-effect and root-cause analysis (20% increase in transaction A, overrunning batch B, leads to CPU 100%, and then overrun into Batch C, impacting on acheiving SLa’s).

    etc.