Buck Woody

Data Science Laboratory System – Object-Oriented Databases

31 January 2014

Object-Oriented Databases (OOD) avoid the object-relational impedance mismatch altogether by tightly integrating into the user-level OOP code to the extent that they are simply an engine that ships with the code itself. The developer is able to instantiate OOD objects directly in the code. Buck Woody explores the Object-Oriented breed of database in his Data Science lab.

This is the eleventh in a series on setting up a Data Science Laboratory server – the first is located here.

My plan is to set up a system that allows me to install and test various methods to store, process and deliver data. These systems range from simple text manipulation to Relational Databases and distributed file and compute environments. Where possible, I plan to install and configure the platforms and code locally. The outline of the series so far looks like this:

I’ll repeat a disclaimer I’ve made in the previous articles - I do this in each one because it informs how you should read the series:

This information is not an endorsement or recommendation to use any particular vendor, software or platform; it is an explanation of the factors that influenced my choices. You can choose any other platform, cloud provider or other software that you like - the only requirement is that it fits your needs. As I always say – use what works for you. You can examine the choices I’ve made here, change the decisions to fit your needs and come up with your own system. The choices here are illustrative only, and not meant to sell you on a software package or vendor.

In this article, I’ll explain my choices for working with an Object-Oriented database system. I’ll briefly explain the concepts and then move on to the methods you can use to work with and manage the one I’ve chosen.

Concepts and Rationale

As far as database management systems go, the trinity of technologies has been Flat Files, the Relational Database Management System, and an offering that many new data technologists may not be familiar with: Object-Oriented Database systems.

The concepts behind an Object-Oriented Database (OOD) are similar to the concepts for Object-Oriented Programming (OOP). This article isn’t intended to make you an OOP expert, but a quick overview of those concepts will help frame the discussion for OOD systems. I’ll be a little loose with the terminology here, so if you’re a purist writing Machine-Level or C++ code, you may want to look away for a few moments. If you’re interested in a more complete, longer, and far more technically accurate discussion of Object Oriented Programming, there’s a decent explanation here: http://www.codeproject.com/Articles/22769/Introduction-to-Object-Oriented-Programming-Concep

When you write software code, you’re really creating a higher-level description of what you want a machine to do. In the early days of computing, the programmer sent specific instructions to move a datum (such as a 1 or a 0) to a particular memory location, perform some math function on it, copy it, delete it, re-arrange it or carry out some other operation. Later in the code the datum might be operated on again, or read from memory and displayed to some output. It was similar to what you would do with a hand-held calculator.

Working with a list of instructions (a program) in this way is cumbersome and prone to error. Humans don’t read instructions this way, and the input from keyboards, mice, other programs, machines and a myriad of other sources, along with an equally complex set of outputs, made understanding and changing a program almost impossible.

So the first levels of abstraction in programming began. Programmers used the lower-level languages to write “higher” level ones that were more human-language friendly. As time passed, higher and higher levels of abstraction made programming easier, at the cost of larger programs and higher memory and CPU use.

In the 1950s and 1960s, the term “Object Oriented” programming first appeared, and it really meant a high level of code re-use. Later, Object Oriented Programming came to mean creating a “model” of something you want to work with (the Object).

The Object has Properties (information about the object) and Methods (things the Object can do). You can use an Object to create another Object, and you can encapsulate the information in the Object – meaning that changes you make to the copy don’t affect the first one.

Another advantage is that you can make a “child” Object that inherits all of the Properties and Methods of the parent Object, and extends it with new information or actions. It’s this last feature that makes Object Oriented technology particularly interesting to the data professional.

Later, OOP added Events, which are notifications about an Object’s state or activities that can be watched to trigger even more Events, Methods or Properties. Object Oriented Programming now encompasses dozens of concepts, but for this article, I’ll stick with encapsulation, Properties, and Methods.

Let’s take a look at an analogy to help explain OOP - which of course will sacrifice some fidelity for understanding these concepts.

Assume that you have an Object called “Pizza”. A Pizza, at its very simplest, contains some sort of crust made from dough, and a sauce - for this example, all Pizza objects have tomato-based sauce. All Pizza objects also have cheese. In this case, since programming is abstract and Pizzas are real, we’ll call a recipe a Pizza object. All of the components of the recipe are Properties of the pizza Object.

Now let’s assume that you have a customer that would like a Pizza, but with onions. On an order slip you write down that you want a “Pizza with Onions” object. The cook takes the original Pizza recipe, and creates another one – but adds an additional Property to it: Onions. The customer gets a Pizza object, but with Onions. The next customer might get a Pizza object with Onions, and also with Jalapeños. And so it goes on.


Now assume that the original Pizza recipe changes to add more salt. That information doesn’t have to be transmitted to all of the new recipes – the information is encapsulated inside the original Pizza recipe, and all of the new child objects can simply inherit more salt. (And yes, the child Object can change something it inherited from the parent Object by simply overriding the Property. For instance, if the recipe calls for less salt, not more, the child object can override the Salt property.)

A pizza can’t do a lot, of course – but stretching the analogy a bit we could say that it gets cooked, transported and eaten. Those might be thought of as Methods for the pizza object. We could take the original pizza recipe, extend the ingredients, cook, transfer and eat a pizza.
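The analogy can be sketched in C#. This is a minimal illustration only, and every name in it is invented for the purpose (SaltGrams, PizzaWithOnions, LowSaltPizza, Describe and RecipeDemo do not come from any library): a parent recipe, a child that adds a Property, and a child that overrides an inherited one.

```csharp
using System;

// Parent Object: the base recipe. All names here are illustrative only.
public class Pizza
{
    public string Crust { get; set; }
    public string Sauce { get; set; }

    // virtual lets a child recipe override the inherited value
    public virtual int SaltGrams { get { return 5; } }

    // A Method: something the Object can do
    public virtual string Describe()
    {
        return string.Format("{0} crust, {1} sauce, {2}g salt", Crust, Sauce, SaltGrams);
    }
}

// Child Object: inherits everything from Pizza and adds a Property
public class PizzaWithOnions : Pizza
{
    public bool Onions { get { return true; } }

    public override string Describe()
    {
        return base.Describe() + ", onions";
    }
}

// Child Object that overrides a Property it inherited
public class LowSaltPizza : Pizza
{
    public override int SaltGrams { get { return 2; } }
}

public static class RecipeDemo
{
    public static void Main()
    {
        Pizza[] orders =
        {
            new Pizza { Crust = "Thin", Sauce = "Tomato" },
            new PizzaWithOnions { Crust = "Thin", Sauce = "Tomato" },
            new LowSaltPizza { Crust = "Thick", Sauce = "Tomato" }
        };

        foreach (Pizza p in orders)
        {
            Console.WriteLine(p.Describe());
        }
    }
}
```

If the SaltGrams value in the parent Pizza class changes, PizzaWithOnions picks it up without any edit of its own, while LowSaltPizza keeps its override.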

This simple example has some holes in it, but we’ll stick with it for now. Let’s take what we’ve talked about in the OOP world and apply that to OOD systems.

Object-Oriented Databases (OOD) have some salient characteristics, but the most specific is that OOD systems are very tightly integrated into the user-level OOP code – in some cases, so integrated that they are simply an engine that ships with the code itself. The developer references the engine at the top of their code, and then can proceed to instantiate OOD objects directly into the code. This means there isn’t as much separation as you might see with a C# program, for instance, and a SQL Server database. In that case, the code calls up data, and the database handles retrieving and persisting the data down to a physical store. In the case of an OOD, that separation doesn’t exist.

You’ll find that various OOD systems either lend themselves well to a specific language, such as Delphi and Smalltalk, or have a separate query language that most any program can use. And if the data is fairly small, in some cases there is no OOD at all – the OOP language simply works with objects as normal and persists them down to storage from within the code.
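As a rough sketch of that no-database case, the following console program writes a list of objects to a file and reads them back using the XmlSerializer class from the .NET Framework. The Pizza and PizzaFile names and the pizzas.xml file name are made up for the illustration, and XML is just one of several possible formats.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// XmlSerializer needs a public type with a parameterless constructor
public class Pizza
{
    public string Crust { get; set; }
    public string Sauce { get; set; }
}

public static class PizzaFile
{
    // Write the whole list of objects to a file
    public static void Save(List<Pizza> pizzas, string path)
    {
        var serializer = new XmlSerializer(typeof(List<Pizza>));
        using (var stream = File.Create(path))
        {
            serializer.Serialize(stream, pizzas);
        }
    }

    // Read the file back into live objects
    public static List<Pizza> Load(string path)
    {
        var serializer = new XmlSerializer(typeof(List<Pizza>));
        using (var stream = File.OpenRead(path))
        {
            return (List<Pizza>)serializer.Deserialize(stream);
        }
    }

    public static void Main()
    {
        // Hypothetical file name in the temp folder
        string path = Path.Combine(Path.GetTempPath(), "pizzas.xml");

        Save(new List<Pizza>
        {
            new Pizza { Crust = "Thin", Sauce = "Regular" },
            new Pizza { Crust = "Thick", Sauce = "Light" }
        }, path);

        foreach (Pizza p in Load(path))
        {
            Console.WriteLine("Crust: {0} Sauce: {1}", p.Crust, p.Sauce);
        }
    }
}
```

Nothing in this sketch handles two programs writing the file at once, or finding one object without loading them all.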

Rationale and Examples

If the developer works with objects natively, and the code can write data down to the hard drive, why have an OOD at all? Why not just leave that in the code?

Well, this gets back to why we have databases to begin with: you have to deal with multiple people using the same data at the same time, and you have to have a reliable structure to find and use data. An OOD system does just that – using various mechanisms to handle locking, schemas, data protection and other data requirements. It’s this multi-use requirement that makes a good case for using an Object-Oriented Database system.

And working within a single database environment allows the developer to focus more on what the code does at the logic-layer rather than handling all of the vagaries of data management. In addition, some systems allow a type of “journaling” feature, which gives you the ability to track changes in the data through time. This is very advantageous in Computer Aided Design (CAD) and other systems.

So the key to choosing an OOD over an RDBMS or a NoSQL engine (or perhaps in addition to those) is that it is very developer-centric. It works with objects the way a developer might directly in their code, so there is a good connection between the concepts the developer uses and how the database works.

Database theorists debate the relative merits of OODs and RDBMS

However…

Most developers have learned and adapted well to working with an RDBMS or a NoSQL engine. Because of that, and the strength and variety of capabilities in the RDBMS engines, there are far fewer Object-Oriented Database systems in wide use today. Most of the original OOD vendors have either gone out of business, merged with other companies, or changed owners. In the end, there are only a few to choose from, and even fewer that run on a given platform.

My research led me to choose db4o – db4objects from Versant. I used the evaluation version for my lab system. This product seems to be widely used in production and to have good support – as I’ve mentioned before in this series, if I intend to spend time learning a product, I want to make sure it’s something I’ll be able to use in production if needed.

Installation

I started at the main db4objects page at http://db40.com:

At the top is the “Download Now” button, and this is what I chose. This installs the local developer edition – which is what I’ll focus on in this article. You can use db4o in a shared mode, but I’m not covering that process in this lab – my focus is learning the language and platform in this system.

After selecting the button, I was brought to a screen where I selected the underlying product that I wanted to install. Here is where the power of this platform lies – it meets the developer at the platform they work with.

Since my Java-fu isn’t very good, I selected the .NET 4.0 release.

It’s a quick download, and then the installer starts.

Clicking next brings up the panel to select the features to install.

I picked “Custom” so that I could set the install path to the “S:” drive I have on my Windows Azure Virtual Machine, although your installation choices might be different.

After I set the path, the installation continues.

And at the last screen there is a fantastic option – the installer opens a tutorial! I think this is brilliant, and I’m surprised that every product doesn’t follow this model.

At this point I can start using the product, and there’s a simple step-by-step process of working with it.

I’ll leave you to walk through that tutorial, since it would be redundant to cover it here. It covers working with the native query language, which is really just working directly with the objects using the Application Programming Interface (API) for data objects.

What I wanted to focus on in my test is working in a development environment, specifically using Language-Integrated Query (LINQ) to create and query data in a familiar way.

Example

I start by opening Visual Studio (I have the full product on my Lab system; you can get the free version here if you want to follow along) and creating a new Project:

I’ll create a simple console application to test the process – I’ll try to create a couple of Pizza objects from my earlier explanation.

Once I created the Project and Solution, the first thing I did (following the instructions from the tutorial provided) was to add two references (right-click the project name and select Add…References) in the paths shown in the screen below. I’m adding two key references: one for Db4objects, which gets me access to the engine, and the other for LINQ, which allows me to use LINQ for queries in addition to the native query methods in the db4o API.

From there I added a couple of statements to use those references, and then added some code to create the database, instantiate some objects as data, and query that data with LINQ. The code is shown in more detail in the next section.

Before we take a closer look at that code, a couple of notes are in order. First, this code isn’t very comprehensive – it isn’t meant to be a full demo of the product. Second, you can perform this process without using db4o as the persistence layer. In fact, you don’t need a database engine at all to replicate the demo here – but that’s the point.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using Db4objects.Db4o;
    using Db4objects.Db4o.Linq;
    namespace db40Pizza
    {
        class Program
        {
            static void Main(string[] args)
            {
                System.IO.File.Delete("Linq.db4o"); //starts with a new DB - comment out to re-use
                using (var container = Db4oFactory.OpenFile(Db4oFactory.NewConfiguration(), "Linq.db4o"))
                {
                    container.Store(new Pizza { Crust = "Thin", Sauce = "Regular" });
                    container.Store(new Pizza { Crust = "Thick", Sauce = "Light" });
                    container.GetPizzas();
                }
                Console.Write("\nPress any key to continue...\n");
                Console.ReadKey();
            }
        }
        public static class ExtendContainer
        {
            public static void GetPizzas(this IObjectContainer c)
            {
                var r = from Pizza p in c select p;
                foreach (Pizza p in r)
                {
                    Console.Write(string.Format("Pizza Type - Crust: {0} Sauce: {1}\n", p.Crust, p.Sauce));
                }
            }
        }
        public class Pizza
        {
            public string Crust { get; set; }
            public string Sauce { get; set; }
        }
    }

Whenever I test and explore a product, I start with something I know well, and then add in the unknown. In this case, I know how to create and query data objects in .NET and LINQ – and in this simple example I’ll add in db4o as the persistence layer.

So let’s take a look at that code, block by block. I won’t dive into every statement, but I’ll show the general things that make it db4o-specific.

  • 1.  using System;
  • 2.  using System.Collections.Generic;
  • 3.  using System.Linq;
  • 4.  using System.Text;
  • 5.  using System.Threading.Tasks;

In this section, I’m simply adding in the standard components for a console application, and I include LINQ as well.

  • 6.  using Db4objects.Db4o;
  • 7.  using Db4objects.Db4o.Linq;

In lines 6 and 7 I’m setting up “using” statements for the db4o references I added earlier.

  • 8.  namespace db40Pizza
  • 9.  {
  • 10.      class Program
  • 11.      {

Lines 8-11 start a simple namespace for the program, and set up the outer Program class.

  • 12.   static void Main(string[] args)
  • 13.      {
  • 14.      System.IO.File.Delete("Linq.db4o"); //starts with a new DB - comment out to re-use

After the main program entry point starts in line 12, line 14 deletes any database file left over from the previous program run. Of course, you wouldn’t do that in production! For testing, I can comment out this line to keep the data from the previous run, or leave it in to start fresh as I have here.

  • 15.   using (var container = Db4oFactory.OpenFile(Db4oFactory.NewConfiguration(), "Linq.db4o"))

And in line 15 I create a new container for the data, of the db4o type. I use the Factory API call to create a new database file called Linq.db4o.

  • 16.   {
  • 17.      container.Store(new Pizza { Crust = "Thin", Sauce = "Regular" });
  • 18.      container.Store(new Pizza { Crust = "Thick", Sauce = "Light" });
  • 19.      container.GetPizzas();

        

In lines 17-18 I create two new Pizza objects, with two attributes each, and store them in the container. The Pizza Class itself is defined further down, but at the moment I simply create two new objects from it. I’ll make one with a thin “Crust” and a regular “Sauce”, and the other with a thick “Crust” and light “Sauce”. Once again, I have a lot of leeway in what goes into a Pizza Class (from which I can create more Objects). Line 19 reads the values back from the database.

  • 20.         }
  • 21.      Console.Write("\nPress any key to continue...\n");
  • 22.      Console.ReadKey();
  • 23.      }
  • 24.   }

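In lines 21-22 I’m simply waiting for a key to be pressed before the program ends – perhaps here the program could do more from the data input side and so on. Line 20 closes the using block from line 15, which disposes of the container, and lines 23-24 close Main and the Program class.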

  • 25.   public static class ExtendContainer
  • 26.   {
  • 27.      public static void GetPizzas(this IObjectContainer c)
  • 28.      {
  • 29.        var r = from Pizza p in c select p;
  • 30.        foreach (Pizza p in r)
  • 31.        {
  • 31a.          Console.Write(string.Format("Pizza Type - Crust: {0} Sauce: {1}\n", p.Crust, p.Sauce));
  • 32.        }
  • 33.      }
  • 34.    }

In lines 25-34 I define an extension method on the container. Line 29 uses a LINQ query to pull the Pizza objects back out of the container. Then I iterate through that list with a foreach loop down to line 31a, where I write the data to the screen.
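The query in line 29 returns every Pizza. LINQ queries can also filter and sort; the sketch below shows the shape of such a query against an ordinary in-memory list, so it runs without any db4o references. The PizzaQueries class and WithCrust method are hypothetical names. The assumption, not tested here, is that the same where clause could be written against the container once the Db4objects.Db4o.Linq reference is in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Pizza
{
    public string Crust { get; set; }
    public string Sauce { get; set; }
}

public static class PizzaQueries
{
    // Returns only the pizzas whose Crust matches, ordered by Sauce
    public static List<Pizza> WithCrust(IEnumerable<Pizza> source, string crust)
    {
        var r = from Pizza p in source
                where p.Crust == crust
                orderby p.Sauce
                select p;
        return r.ToList();
    }

    public static void Main()
    {
        // In-memory stand-in for the stored objects
        var pizzas = new List<Pizza>
        {
            new Pizza { Crust = "Thin", Sauce = "Regular" },
            new Pizza { Crust = "Thick", Sauce = "Light" },
            new Pizza { Crust = "Thin", Sauce = "Light" }
        };

        foreach (Pizza p in WithCrust(pizzas, "Thin"))
        {
            Console.WriteLine("Pizza Type - Crust: {0} Sauce: {1}", p.Crust, p.Sauce);
        }
    }
}
```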

  • 35.         public class Pizza
  • 36.      {
  • 37.         public string Crust { get; set; }
  • 38.         public string Sauce { get; set; }
  • 39.      }
  • 40.   }

And down in lines 35-40 I define the Class for Pizza. This is fairly simplistic, of course, and has only two Properties, both publicly accessible and writable. In a real application the Class would be far more complex, with non-settable “base” Properties, Methods for baking and delivering and so on.
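A slightly fuller version of the Class might look like the sketch below. It is an illustration, not part of the program above: the Bake and Deliver Methods, the IsBaked and IsDelivered Properties and the KitchenDemo class are invented here, and whether a given persistence engine can store a class without public setters or a parameterless constructor is something to check before relying on it.

```csharp
using System;

public class Pizza
{
    // "Base" Properties: set once in the constructor, read-only to callers
    public string Crust { get; private set; }
    public string Sauce { get; private set; }

    // State that only the Methods below may change
    public bool IsBaked { get; private set; }
    public bool IsDelivered { get; private set; }

    public Pizza(string crust, string sauce)
    {
        Crust = crust;
        Sauce = sauce;
    }

    public void Bake()
    {
        IsBaked = true;
    }

    public void Deliver()
    {
        // Guard the order of operations inside the Object itself
        if (!IsBaked)
        {
            throw new InvalidOperationException("Bake the pizza before delivering it.");
        }
        IsDelivered = true;
    }
}

public static class KitchenDemo
{
    public static void Main()
    {
        var pizza = new Pizza("Thin", "Regular");
        pizza.Bake();
        pizza.Deliver();
        Console.WriteLine("Baked: {0} Delivered: {1}", pizza.IsBaked, pizza.IsDelivered);
    }
}
```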

It’s a short program – could be even shorter if the formatting were different, but it definitely shows the integration with the data persistence layer for the developer.

As a side note, there is an Object Browser you can use with db4o without diving into code. In my experiments so far, it seems to be much easier to evaluate the product in either Java or .NET.

In the next installment, I’ll cover Distributed File Databases that I’ll work with in the laboratory.

Buck Woody

Author profile:

Buck Woody has been working with Information Technology since 1981. He has worked for the U.S. Air Force, at an IBM reseller as technical support, and for NASA as well as U.S. Space Command as an IT contractor. He has worked in almost all IT positions from computer repair technician to system and database administrator, and from network technician to IT Manager and with multiple platforms as a Data Professional. He has been a DBA and Database Developer on Oracle systems running on a VAX to SQL Server and DB2 installations. He has been a Simple-Talk DBA of the Day.
