
Simple-Talk columnist

Silos and Monoliths – Moving On to the Past

Published 11 April 2014 1:27 pm

I was at an interesting presentation recently on using MongoDB with .NET. The presenter positively glowed with enthusiasm as she showed us how easily the schemas of collections of documents in MongoDB could follow the changes in the way you structure the data in your application. She explained, brightly, that using MongoDB was a liberating experience because it meant that one could break free of having to share databases between applications by creating many different applications, each of which had its own database. This meant that each application domain could develop its own private schema. To reinforce the point, she had a PowerPoint slide, with the left-hand side showing the bad old days, with applications having to share a corporate database, what she called the ‘monolithic’ design, and on the right the new architecture, where each service uses its own data storage technology. In this bright new world, sharing data via a central database was no longer necessary.

Applications can share data via RESTful services, and one could see the attraction of a solution composed of a loosely-coupled federation of applications communicating via HTTP. This leaves each development team free to choose their preferred database technology and design, since it is private to them. There is no up-front need to decide on a shared understanding of the application domain, since that is a matter for the interface, which is a ‘consumer-driven’ contract to supply data in the right form.

One can see the attraction, of course, but are we sleepwalking into a strange repetition of history? In the early eighties, many large organizations struggled to control the proliferation of small, ‘siloed’ applications, each solving a specific business problem and each having a ‘private’ database. By the mid-eighties, we reached the crisis of ‘siloed’ information systems. Such were the difficulties of getting the right information from them, in the right form, for any kind of cooperation or reporting, that we then spent the next fifteen years having to re-engineer them. The struggle gave rise to whole industries such as Data Warehousing, and gave me a good living for many years. Of course, network bandwidth has grown vastly since then, but so has the complexity and volume of corporate data.

The problem of developing and releasing individual applications, where several were closely coupled to one monolithic server, was solved many years ago, in the 1990s, by creating abstraction layers within databases to provide an interface to applications. We have subsequently reinvented the problem for which siloed applications are the “solution” by ignoring the obvious best practice of having an abstraction layer for each application, based in the database and defined by a customer-driven contract. By doing this, we decouple the application sufficiently from the database to allow asynchronous releases, and we don’t need to create a rat’s nest of RESTful services to share information!
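
The decoupling that an in-database abstraction layer buys can be sketched in a few lines. This is a minimal illustration using Python’s built-in sqlite3 module; the table, view, and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table: the database's internal design, free to evolve.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)")
conn.execute("INSERT INTO customer (full_name, email) VALUES ('Ada Lovelace', 'ada@example.com')")

# Abstraction layer: a view acting as the application's contract.
conn.execute("CREATE VIEW app_customer AS SELECT id, full_name AS name, email FROM customer")

# The application reads only the contract, never the base table.
print(conn.execute("SELECT name, email FROM app_customer").fetchall())

# The base table can now change (say, a column added for another team)
# without touching the contract, so releases stay asynchronous.
conn.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT")
print(conn.execute("SELECT name, email FROM app_customer").fetchall())
```

The application codes against the app_customer contract only, so the base table is free to change between releases without breaking it.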

14 Responses to “Silos and Monoliths – Moving On to the Past”

  1. Robert Young says:

    This reactionary movement goes back at least to Fowler, and at least to the early 00s:

    “I use the term Application Database for a database that is controlled and accessed by a single application, (in contrast to an IntegrationDatabase). Since only a single application accesses the database, the database can be defined specifically to make that one application’s needs easy to satisfy. This leads to a more concrete schema that is usually easier to understand and often less complex than that for an IntegrationDatabase.”

    This statement is undated, though he also has another, nearly identical one that is dated (2004).

  2. Proxide says:

    I agree with you entirely. At my previous place there were so many “cottage industries” that they restructured the company to sort it out. Lots of small pots of data all over the place that don’t link nicely cause no end of issues. For instance, how many of x customers have been sold product a? We need to do a mailshot to them about b. Ahhh, the system doesn’t store that information ’cause no-one’s requested the change. It’s stored in a spreadsheet being maintained by the local management team? They’ve put the customer ref in the spreadsheet? Yes, but it’s not consistent and some of the columns have been transposed!

    Good blog post! So what’s the golden bullet? People need to talk early and plan well.

  3. Dimitrios Kalemis says:

    Although we should not over-generalize, and we should not say that one solution is *always* best for every situation, I agree with your database-centric view of applications. For those applications whose database is their most important pillar, which completely depend on it and are “based” and “centered” around it, “having an abstraction layer for each application, based in the database and defined by a customer-driven contract” (as you wrote) is the most sane approach.

    For these applications the database is “King” and should be the first and foremost concern. This is why I do not like the “impedance mismatch” that sometimes exists between code and data. Having an impedance mismatch means that the developers created code using a principle different from the one that was used for the organization of the data. Example: code in OOP, data in a relational database.

    Am I against OOP? Of course not! I am against OOP code that does not respect the fact that the database is the first and most important concern, thus creating programming objects that are mismatched in relation to the underlying database schema. If we accept and embrace the fact that the database is “King”, we will come up with designs that are easier to implement and support and which are leaner and meaner. I think that your points also steer us to this direction.

  4. Phil Factor says:

    The answer?
    If you, as data architect for the enterprise, have been through the exercise of identifying the data that is used, you will know what is ‘corporate/financial data’ and what is deemed to be ‘private data’. You will have needed to do this in order to ensure that the important data is under the right recovery regime. It is plain good housekeeping.
    ‘Private data’ aka ‘departmental data’ is data that is required for one or more applications, but is of no interest to the enterprise as a whole beyond maybe aggregate reporting.
    If the developers of an application wish to closet ‘corporate/financial data’ within an inaccessible system, providing only a contract-driven interface, this would be a ‘silo-alert’, flagged as an issue to IT management.
    If, on the other hand, the data is ‘private’ or ‘departmental’, then bless them, the development team can negotiate with the Ops/support staff to introduce whatever database technology they wish.
    In the future, we may devise a system for sharing large-enterprise volumes of data in a resilient, error-free way via RESTful interfaces. However, experience suggests that it is far easier, and more reliable, to share corporate data via large databases. Store the data in one place, and just take what you need for your application.

    • paschott says:

      I’ve rarely found data that is truly “private” or “departmental”. At some point it all seems to need to show up in a report somewhere. The concept sounds really cool, but we then run into issues with using common data incorrectly, poor design, poor integration, and really poor support. I know there are several bits of data in a poorly designed legacy internal app we have that were never intended to go beyond the app, but are needed elsewhere for all sorts of things.

  5. Keith Rowley says:

    The problem with relying on an abstraction layer defined by a contract is that if the DBA is not part of the development team and the “contract” is not flexible, it creates an administrative hurdle to every single little change of the application.

    Need to add a field to store some new piece of data? Fill out a request to change the contract, wait for it to be processed through a committee meeting and approved, wait for a DBA to find time to make the change to the database and the abstraction layer, test those changes, and then, and only then, can you add the field to your application.

    This creates a huge temptation to use old fields for new things instead of wading through this process every single time the dev needs a change.

    This isn’t to say that single databases with abstraction layers are the wrong solution; I still think you are correct, and they often are. It is to say that if we are going in this direction, we need to eliminate as many hurdles as possible to getting changes made to the database.

  6. Robert Young says:

    – The problem with relying on an abstraction layer defined by a contract is that if the DBA is not part of the development team and the “contract” is not flexible, it creates an administrative hurdle to every single little change of the application.

    IFF the database’s tables are designed to be orthogonal from the start, and developers/coders eschew SELECT * FROM FOO… , changes to the tables are ignored by disinterested code, and vice-versa. It’s when you have something like an xml/NoSql/FooFile store that any change to the datastore has to be parsed against all code, since the physical structure is seen by code. COBOL/VSAM coders went through that decades ago; still do.
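
    A minimal illustration of the point about SELECT *, using Python’s built-in sqlite3 module (the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, name TEXT)")
conn.execute("INSERT INTO foo VALUES (1, 'widget')")

# Code that names its columns sees only the logical contract.
def get_name(conn):
    return conn.execute("SELECT name FROM foo WHERE id = 1").fetchone()[0]

# Code that uses SELECT * with positional access sees the physical structure.
def get_name_star(conn):
    return conn.execute("SELECT * FROM foo WHERE id = 1").fetchone()[1]

# The table is later rebuilt with a new column placed before 'name'.
conn.execute("DROP TABLE foo")
conn.execute("CREATE TABLE foo (id INTEGER, price REAL, name TEXT)")
conn.execute("INSERT INTO foo VALUES (1, 9.99, 'widget')")

print(get_name(conn))       # still 'widget': disinterested code ignores the change
print(get_name_star(conn))  # now 9.99: silently wrong
```

    The column-naming code is indifferent to orthogonal table changes; the SELECT * code is coupled to the physical layout and breaks silently.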

    The best, so far anyway, implementation of database schemas is in DB2/LUW. With schemas, one can have the silos in one place, at least. And since the data has a single, sort of, repository, there’s a lower likelihood of duplicate data.

  7. Phil Factor says:

    – the “contract” is not flexible, it creates an administrative hurdle to every single little change of the application

    Agreed, and I’ve suffered from this myself on occasion. Where the DBA is not particularly competent as well, it can become a nightmare. However, where there has been the will to make it happen, and it is properly supervised, it is a good technical solution. In fact, even when I’ve undertaken both roles, DBA and application Dev, I still do it, because it solves so many integration problems. I don’t think one should discount a technical solution just because management processes are too flawed to allow it to happen. After all, even practices like pair programming and TDD can only happen effectively in the right development setting with management buy-in.

  8. sdmcnitt says:

    “Departmental Data” should trigger a “Shadow IT Alert”. I cannot wait until I get my first MongoMess. I assume it will be created by a “technical user / code enthusiast” just as it has in every company I have worked for in the past.

    Like Lotus Notes and Excel spreadsheets and Microsoft Access databases, I will take ownership of them, spend nights and weekends keeping the desktop that they run on up and running, and nurture and cajole them into supporting the critical business processes for which they were built.

    I am not saying we take away the ability of the Business to create their own technical solutions. IT just needs to be invited to the party earlier.

  9. willliebago says:

    Did you question the presenter about this? It would be interesting to know how they would respond.

  10. paschott says:

    Reading through this I was reminded of several “small” projects that quickly became essential to the business. Many were spreadsheets or Access files that helped out some user. Once someone else saw them, they grew and spread until it finally came to the attention of someone who made it an official company app. Sadly, many of these stayed isolated, but needed to integrate somehow with other systems. We exported data, imported data, hacked around various methods, and sometimes even made them a supported app that lived on a server with real backups and everything. :)

    I don’t necessarily have a problem with departmental apps, but too many of them are done without thinking of any possible growth or future use. They often use the wrong data types, depend on manual copying of data, may have incorrect formulas, or all sorts of other issues. Sometimes the apps are very well done, but those who wrote the app leave and the app is unsupported and largely unknown until it breaks.

    I definitely appreciate that people want to get things done and often see IT as the impediment, but to do something just because you can isn’t usually the best solution. Of course, this brings back the whole “DevOps” conversation – getting the developers and IT folk on the same page to build the best solutions possible.

  11. JonRobertson says:

    Data access through a layer such as REST certainly has some benefits. I think such a layer would be useful if the databases were different platforms (SQL Server and Oracle), on different servers, or even distributed across different locations. Even though you can share data in these environments, those methods are often touchy to the slightest change. Or worse, go away completely when the vendor drops support for something like linked servers. But such data access layers also add complexity or performance issues if not implemented well.

    When all of the data is stored on the same platform, it seems schemas could provide the isolation desired. At least with SQL Server, the type of access allowed can also be managed, either with broad strokes or at a very granular level.

    When it comes to making small schema changes just to store one piece of data, there are many times that the database schema doesn’t care about the data. A lot of our columns never see a WHERE, GROUP BY, or ORDER BY clause (so there is also no need to index them). We stream many objects and store the entire object in a varchar column. Changes can be made without any schema changes.
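
    A small sketch of this pattern, using Python’s sqlite3 and json modules for illustration (the table and field names are invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema defines columns only for what the database searches on;
# the rest of the object travels as an opaque serialized string.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, body TEXT)")

order = {"customer": "acme", "items": ["bolt", "nut"], "notes": "rush"}
conn.execute("INSERT INTO orders (customer, body) VALUES (?, ?)",
             (order["customer"], json.dumps(order)))

# A new field appears in the object without any schema change.
order2 = {"customer": "acme", "items": ["washer"], "gift_wrap": True}
conn.execute("INSERT INTO orders (customer, body) VALUES (?, ?)",
             (order2["customer"], json.dumps(order2)))

rows = [json.loads(b) for (b,) in
        conn.execute("SELECT body FROM orders WHERE customer = 'acme'")]
print(rows[1].get("gift_wrap"))
```

    Only the indexed customer column ever sees a WHERE clause; the object itself can grow new fields freely.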

    One problem I’ve seen with separate databases per application is the amount of duplicate data. Oftentimes, data retrieved from another database is then stored in the application’s database. If either application updates the data, the changes have to be synchronized in some way. That introduces complications and data integrity issues that can be a nightmare, and in my experience are best avoided.

    There are scenarios where multiple databases make sense. We have several databases that are shared across multiple applications and customers. Those databases store various sets of data that are searched and retrieved but never updated by the application. When the data sets are updated (usually monthly or quarterly), it is much easier to maintain a master “current” copy of the database and simply replace each deployed database with an updated version than it is to maintain scripts for each update. Replacing the database always ensures the most recent version is deployed.

    Keith: As for a development team not having at least one person responsible for database design: That’s a db application that I wouldn’t want to be involved with. I’ve served both roles, database developer responsible for design decisions/implementation and an application developer. The database developer doesn’t have to be a certified DBA. But needs to know the do’s and don’ts of database design and work directly with both developers and the DBA.

    Dimitrios: I agree with you about “impedance mismatch” issues. I’ve always said what defines a database application is what is needed out of the application, which in turn defines what is needed in the database. The rest of the application is built based on the database. Not the other way around (i.e. the application design should not define the database design).

    In the end, I don’t believe there is a single solution for all scenarios. I’m sure there are scenarios where MongoDB is an excellent choice. The problem is when people start trying to fit a square peg into a round hole. And when it doesn’t fit, they start searching for a hammer.

  12. DeafProgrammer says:

    The most common problem I have seen in these underlying issues is the “source identification” across the different platforms and departments.

    I agree with Phil Factor that the data architect is the solution to the “source identification” issues.

    My personal view is that operating without corporate data means poor governance, and it breaks “data integrity”, creating the issues we face today. Why would you then have to invest money to solve these issues? Crazy, isn’t it?

  13. Louis Somers says:

    I totally agree with your point, and no I’m not a DBA but a developer. Over the last years I have found myself writing synchronization solutions for many different applications and on different scales, some of which cannot be avoided, like offline (mobile) data or cached data for performance reasons and so on.

    The problem I have with distributed data is the lack of a single point of truth. It is almost impossible to enforce even the simplest form of constraint over a distributed system of RESTful services. In code you will always have to prepare for mismatches or “not found” replies, since any part of the whole could have failed and restored a previous backup that does not include the records on hand. Data Warehousing and other BI solutions also become a lot harder to tackle with distributed systems.

    That said, the object-oriented databases can be a great solution for offline storage or local caching. Juniors especially can become extremely religious about a certain technology or principle, and we should use that energy to our advantage instead of deflating it with a dagger. I like to start off talking about conflicts and tombstoning (usually you’ll have their attention when mentioning tombstones :-) ).

    They have their place for sure, but to speak in architectural terms, a distributed RESTful architecture with distributed datastores is like a bungalow park, while a single-datastore architecture would be more like an office building. So pick the architecture needed for your solution: if you’re building mobile homes, don’t bother to use SQL; XML serialization might be a better choice.
