Monitoring In-Memory OLTP: What’s Important?

In-Memory OLTP has evolved to be a great solution. However, a production system that uses it needs careful monitoring to avoid stress conditions becoming problems. As with many new features of SQL Server, it pays to plan a monitoring strategy to ensure that you are alerted as soon as possible when things go awry.

‘To expect the unexpected shows a thoroughly modern intellect.’
Oscar Wilde.

It is important to keep an eye on servers if we are to keep the services running. As the technology evolves, we need to take into account new metrics and components so that we don’t miss any indications of potential failure.

It isn’t enough just to check whether a service is responding. Failure doesn’t always equate to a service going offline. A performance problem will directly affect the end user and can even take the entire service down, in some extreme cases. There is plenty to check.

The way to monitor is dynamic

The monitoring process needs to evolve at the same speed as the technology. A recent example of the way that the demands of monitoring are increasing and changing is the AlwaysOn Availability Groups feature, which is a kind of database mirroring based on a failover cluster and running on standalone instances. There is nothing particularly new in that; the innovation is in the behavior of the secondaries: it is possible to have multiple Secondary Replicas that are readable. This opens up the possibility of offloading operations such as report generation or database backups, leaving the Primary Replica free to meet the demands of the users or applications. This way, a single database could be open for reads and writes on Primary Replica "A", have its FULL backup taken on Secondary Replica "B" and its Transaction-Log backups performed on Secondary Replica "C". The scenario can be even more complex, depending on the way that the servers have been configured.

Before the introduction of that feature, all the backups were done on a single instance, whether it was a standalone or a clustered instance. Now, with AlwaysOn AG, the monitoring needs to be smarter, checking different instances and evaluating several conditions in order to have enough data to be sure that the backup was done and, if not, to issue a "missing backup" alert.

What about In-Memory OLTP?

Microsoft introduced the In-Memory OLTP feature in SQL Server 2014. The first public release of the feature came with limitations that prevented its widespread adoption. There were some difficult restrictions such as the fact that the important “ALTER TABLE” command wasn’t supported.

From SQL Server 2016, the In-Memory OLTP feature is greatly improved: most of the main limitations have been removed, and the internal processes and algorithms have been refined. This has transformed it into a revolutionary feature which can electrify the performance of highly-concurrent systems.

If In-Memory OLTP becomes more popular, it will need to be monitored. My objective in writing this article is to indicate the main areas where we need to adjust the monitoring as soon as the In-Memory OLTP feature is enabled and being used in a specific system. Because In-Memory OLTP uses both existing components and new ones, it is a matter of both changing existing metrics and adding some new ones.

If an instance has a database containing memory-optimized tables (MOTs), the kind of database table used by the In-Memory OLTP feature, we need to monitor several components, including the Resource Governor, memory allocation, disk space and bucket count. We'll discuss these one by one.

Resource Governor

The Resource Governor was introduced in SQL Server 2008 in order to allow the system to limit workloads and resource consumption from incoming database requests. The Resource Governor is a very important tool when using In-Memory OLTP, as this will protect the memory intended for MOTs and ensure that the MOTs won’t consume more memory than expected. If MOTs are allowed to do so, it can lead to a serious performance problem.

If a database contains MOTs, it is important to bind it to a Resource Pool, so it is worth checking to make sure this has been done. It is easy to miss: for example, a simple request to restore a backup file in production may bring an MOT to your instance. A database is able to operate with both MOTs and traditional disk-based tables, so it is easy enough for a busy DBA not to notice that there's an MOT in the database.
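
The binding itself is straightforward; something like the following will do it (a sketch only: "imoltp_pool", "MyDatabase" and the 50% cap are placeholder values to adjust for your environment):

-- Sketch: create a pool that caps the memory available to In-Memory OLTP
-- ("imoltp_pool" and "MyDatabase" are placeholder names).
CREATE RESOURCE POOL imoltp_pool WITH (MAX_MEMORY_PERCENT = 50);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Bind the database to the pool; the binding takes effect the next time
-- the database is brought online.
EXEC sys.sp_xtp_bind_db_resource_pool
     @database_name = N'MyDatabase',
     @pool_name     = N'imoltp_pool';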

To avoid surprises, two steps are needed.

Check if there are databases containing memory-optimized tables

To detect whether In-Memory OLTP is enabled and being used, you can check whether a Memory-Optimized Filegroup has been created in the database and, in addition, make sure that one or more MOTs exist in the database. We can use the system catalog views "sys.data_spaces" and "sys.tables" in order to detect this.
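
For example (a minimal sketch; a memory-optimized data filegroup appears in sys.data_spaces with type 'FX'):

-- Run in the database being checked: lists the memory-optimized tables,
-- provided a memory-optimized data filegroup (type 'FX') exists.
SELECT t.name AS table_name
FROM sys.tables AS t
WHERE t.is_memory_optimized = 1
  AND EXISTS (SELECT 1 FROM sys.data_spaces AS ds WHERE ds.type = 'FX');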

Check if the detected databases are bound into a Resource Pool

After evaluating the existence of MOTs in the database, it is time to verify whether the database is bound to a Resource Pool. The following query will return 0 if that condition is not met.
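
Something like this will do it (a sketch based on the resource_pool_id column of sys.databases, which is NULL when the database is not bound to a pool):

-- Returns 1 if the current database is bound to a resource pool, otherwise 0.
SELECT CASE WHEN d.resource_pool_id IS NOT NULL THEN 1 ELSE 0 END AS is_bound
FROM sys.databases AS d
WHERE d.name = DB_NAME();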

The final step is to verify that the Resource Governor is enabled. The following query will return 1 if the Resource Governor is enabled, otherwise it will return 0.
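
For instance (one possible version):

-- Returns 1 when the Resource Governor is enabled, otherwise 0.
SELECT is_enabled
FROM sys.resource_governor_configuration;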

Putting all the code together and adding a “RAISERROR” will complete our task:
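
A possible version, combining the checks above (the severity and message text are, of course, adjustable):

-- Sketch: raise an alert when MOTs exist but are not protected by the
-- Resource Governor.
DECLARE @has_mot bit = CASE WHEN EXISTS (SELECT 1 FROM sys.tables
                                         WHERE is_memory_optimized = 1)
                            THEN 1 ELSE 0 END;
DECLARE @is_bound bit = (SELECT CASE WHEN resource_pool_id IS NOT NULL
                                     THEN 1 ELSE 0 END
                         FROM sys.databases
                         WHERE name = DB_NAME());
DECLARE @rg_enabled bit = (SELECT is_enabled
                           FROM sys.resource_governor_configuration);

IF @has_mot = 1 AND (@rg_enabled = 0 OR @is_bound = 0)
    RAISERROR ('Memory-optimized tables detected, but the Resource Governor is disabled or the database is not bound to a resource pool.', 16, 1);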

As a result, when MOTs are detected and the Resource Governor is disabled or the database is not bound to a specific Resource Pool, an error will be raised.

Memory Allocation

Memory monitoring becomes crucial when using In-Memory OLTP, because memory is the primary location of your data while your instance is running. This means that, if the available memory is not enough, SQL Server will start throwing out-of-memory errors, disrupting the activity of users and applications. Although DBAs are generally familiar with the need to keep an eye on memory stress via monitoring, the consequences of running short of memory are likely to be more severe if even one active database on any of the server's instances contains MOTs.

We are not talking only about the total available memory in the server, but about the memory that is available to the MOTs. As I've already mentioned, it is good practice to use the Resource Governor on databases that contain MOTs so, to avoid any risk of memory starvation, the defined memory limit should be taken into account as well.

When estimating the memory needed for MOTs, you need to take into account more than just the projected data size. There are other structures to consider, such as indexes, row versions, checkpoint file buffers, and transactional and garbage-collection structures.

Sometimes we plan for a certain rate of data growth, but the table grows unexpectedly. This needs to be detected early, in order to allow more memory to be added to the server. The following DMV is useful for checking the memory consumption per table (data and indexes):
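
For example (a sketch; the DMV reports sizes in KB):

-- Memory consumed per memory-optimized table, data and indexes, in KB.
SELECT OBJECT_NAME(tms.object_id) AS table_name,
       tms.memory_used_by_table_kb,
       tms.memory_used_by_indexes_kb
FROM sys.dm_db_xtp_table_memory_stats AS tms
ORDER BY tms.memory_used_by_table_kb DESC;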

In order to identify all the memory used by the In-Memory OLTP engine, you can use the following query, based on the "sys.dm_os_memory_clerks" DMV.
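
A sketch:

-- Total memory used by the In-Memory OLTP (XTP) memory clerks, in MB.
SELECT mc.type,
       SUM(mc.pages_kb) / 1024 AS memory_mb
FROM sys.dm_os_memory_clerks AS mc
WHERE mc.type LIKE '%XTP%'
GROUP BY mc.type;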

Disk Space

In-Memory OLTP is not all about memory: the disk is also used, for the purpose of keeping the data safe. There are two kinds of memory-optimized tables: Durable and Non-durable tables. The type that you select depends on the trade-off you choose between performance and data persistence.

The data contained by Non-durable tables is not persisted on disk and the operations targeting this table are not logged, which means that this table won’t access the disk at all so as to gain the most performance-advantage. As the data will live only in the RAM memory, which is a volatile data storage medium, the data will be lost if the instance is restarted. On the other hand, the Durable tables are concerned with data persistence, this way operations are logged and the data needs to be stored in a non-volatile storage form, which is the disk, through special files identified as “Checkpoint Files”.
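
The choice is made at table-creation time, through the DURABILITY option. For example, a Non-durable table is created with DURABILITY = SCHEMA_ONLY (a sketch; the table and column names are placeholders):

-- Sketch: a non-durable MOT; the schema survives a restart, the data does not.
CREATE TABLE dbo.SessionCache
(
    session_id INT NOT NULL PRIMARY KEY NONCLUSTERED,
    payload    NVARCHAR(4000) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);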

Checkpoint Files are stored in a container defined in the Memory-Optimized Filegroup of the respective database; for example, in my lab I defined the container "F:\IMOLTP\". The Memory-Optimized Filegroup is based on FILESTREAM, and a database can have multiple containers.
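
A sketch of that setup (the database, filegroup and file names are placeholders; the path matches my lab):

-- Create the memory-optimized filegroup and add a container to it.
ALTER DATABASE MyDatabase
ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE MyDatabase
ADD FILE (NAME = 'imoltp_container1', FILENAME = 'F:\IMOLTP')
TO FILEGROUP imoltp_fg;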

This is how the checkpoint files look on disk:

[Image: the checkpoint files listed in the container folder]

The Checkpoint Files work in pairs: a Data file and a Delta file. The Data file is responsible for storing the data rows, and the Delta file is responsible for identifying stale rows in its corresponding Data file. There are multiple contiguous Checkpoint File pairs, each one responsible for storing data corresponding to a specific timestamp range.

[Image: a sequence of Checkpoint File pairs (Data and Delta files), each covering a timestamp range]

This use of checkpoint files on disk to read and write data means that we must always ensure, by correct hardware configuration and monitoring, that the disk performance is adequate and that there’s a good percentage of free space.

Regarding the disk performance, a good recommendation is to create multiple containers allocated to different disks. Microsoft also recommends that you have enough free disk space to store four times the total size of all the durable tables.
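
For instance, a second container on another disk can be added to the filegroup created earlier (again, the names and paths are placeholders):

-- Spread checkpoint file I/O by adding a container on a different disk.
ALTER DATABASE MyDatabase
ADD FILE (NAME = 'imoltp_container2', FILENAME = 'G:\IMOLTP')
TO FILEGROUP imoltp_fg;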

Bucket Count – Hash Index

To maintain the best performance, Hash indexes on memory-optimized tables need to be monitored to make sure that the bucket count value is neither too high nor too low.

In-Memory OLTP is also unique in the way that it handles indexes. The first characteristic is that index data is not persisted on disk: taking advantage of the speed of memory, the indexes are rebuilt every time the instance restarts.

Another characteristic is the type of the indexes themselves, which are very particular. In-Memory OLTP supports two kinds of index: Range and Hash indexes.

The Hash index is nothing more than an array of buckets in which the data rows are placed. This kind of In-Memory OLTP index has a parameter called "Bucket Count" that can directly affect performance; it needs to be defined when the index is created, and its value is sometimes difficult to predict.

[Image: a hash index mapping index-key values, via a hash function, to an array of buckets]

When a Hash index is created, an "index key" must be specified. Based on that index key, a deterministic hash function will generate a hash value that identifies the bucket in which the row will be stored, and here is the danger: because the hash function is deterministic, it will always return the same hash value for the same key value.

As an example, if you create a table named "Person" – to store users' data – and define a Hash index with the column "first_name" as the index key, every row containing the same value in the "first_name" column will be part of a chain in the same bucket. This way, if 50% of the rows contain the same value in the "first_name" column, the index won't be very efficient, as a long chain will be generated in one bucket while all the other buckets will hold little or no data.
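
A sketch of that "Person" table (the column definitions and the bucket count of 1,000,000 are just illustrative figures):

-- Sketch: a hash index on first_name; rows with the same first_name always
-- hash to, and chain within, the same bucket.
CREATE TABLE dbo.Person
(
    person_id  INT NOT NULL PRIMARY KEY NONCLUSTERED,
    first_name NVARCHAR(50) NOT NULL,
    INDEX ix_first_name HASH (first_name) WITH (BUCKET_COUNT = 1000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);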

Based on this example, we can understand that the index key should be chosen carefully, so that it is 'discriminating' enough to avoid that kind of situation. But what about the Bucket Count: what is its impact? There are two problem situations: too low a value and too high a value.

If the value defined for the Bucket Count is too low, relative to the number of distinct values of the index key, the same problem of long chains will occur, because each bucket will be used for more than a single hash value, mixing rows with different index keys – this is known as a hash collision. On the other hand, setting too high a value for the Bucket Count will lead to wasted memory, as many of the buckets will never be used.

In summary, both too low and too high a Bucket Count value is a problem for Hash indexes. In SQL Server 2014, indexes on MOTs cannot be changed, which makes the Bucket Count decision even more important. From SQL Server 2016, altering an index is supported, which helps with index tuning.
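
In SQL Server 2016 the bucket count can be adjusted in place; a sketch, reusing the hypothetical table above:

-- Rebuild the hash index with a larger bucket count (SQL Server 2016+).
ALTER TABLE dbo.Person
ALTER INDEX ix_first_name
REBUILD WITH (BUCKET_COUNT = 2000000);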

The following query looks at all the Hash indexes in a specific database, showing the total and used number of buckets per index.
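
One possible version (a sketch based on sys.dm_db_xtp_hash_index_stats):

-- Bucket usage and chain lengths for the hash indexes in the current database.
SELECT OBJECT_NAME(hs.object_id)                     AS table_name,
       i.name                                        AS index_name,
       hs.total_bucket_count,
       hs.total_bucket_count - hs.empty_bucket_count AS used_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON i.object_id = hs.object_id
   AND i.index_id  = hs.index_id;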

Summary

Any production SQL Server database needs to be carefully monitored to ensure that it is operating as efficiently as possible, and that there are no signs of impending problems. This task becomes even more important in a system running In-Memory OLTP, especially where the system is working hard. If it is done routinely, it can allow you to avoid unexpected problems that come from memory pressure or disk usage.
