Microsoft announced SQL Server 2014 at TechEd in early June, and the biggest news for SQL Server OLTP in years is the in-memory OLTP engine. This article is an introduction to the features and limitations of this new technology.
There is much to cover here. I will assume that you understand the current versions of SQL Server and so I will only discuss the differences.
Three important points about this new feature that should be stated up front:
- SQL Server 2014 will be a 1.0 version of this feature, and the development focus was on mainstream OLTP processing. Think light-weight transactional records processed with stored procedures. Many of the outlier OLTP features are not yet supported. Some of these gaps may be addressed by RTM, but some are not even planned until a later version.
- Hekaton can be leveraged inside of a normal SQL Server database. It does not require special hardware, nor does it require a separate database. The expectation is that you will migrate only a portion of your tables and stored procedures to Hekaton. That means your database will have a mix of standard and memory-optimized objects in the same database.
- There are customers who have been already running early releases in production with very good results.
CTP1 of SQL Server 2014 is planned for release at TechEd Europe in late June. Release of SQL Server 2014 is expected “in the second half of calendar year 2013.”
What is SQL Server In-memory OLTP Engine (Code-name: Hekaton)
Before the announcement of SQL Server 2014 and the sessions at TechEd 2013 in June, very little information was available about project Hekaton. We knew it was a new technology that 1) allowed tables to be resident in memory, 2) allowed for machine-compiled stored procedures, and 3) was claimed to be very fast.
Now that we have more details, here is a full picture of the two basic features of what is now officially called SQL Server In-memory OLTP Engine.
“Memory-optimized” tables (as opposed to standard tables, now called “disk-based” tables) are completely in memory.
Memory-optimized tables do not use pages for storage. The record location mechanism employed is pointers stored in hash tables.
Optimistic Concurrency Control
There are new algorithms that process records without locking and blocking. First, since there are no pages, there are no page latches. Also, writes to memory-optimized tables use versioning similar to RCSI (read committed snapshot isolation) except that tempdb is not used for row versions. Every write has an associated ever-increasing transaction number that allows for straightforward handling of reads. Uncommitted records are stored in-memory but are not visible until the transaction commits, so there are no “dirty reads” of uncommitted data.
Native Stored Procedures
We have talked for years about Standard SQL Server stored procedures being “compiled”, but the end-product of the compilation is code that is then interpreted at run-time. Hekaton has the ability to take high-level code and translate it to C and then build a DLL. When the stored procedure is called at run-time, the compilation steps and associated CPU overhead have already been completed. The goal here is to do the same work with many fewer CPU instructions. Based on reported results, the savings are dramatic. Microsoft is talking about 10X to 25X performance gains overall.
Native stored procedures are “atomic” in the sense that they run in a single CPU until they complete. This reduces context-switching, which is very costly in terms of CPU cycles.
An Early Caveat
Be aware that the only tables that can be involved in a native stored procedure are memory-optimized tables. Disk-based tables are not allowed. Stored procedures that access both memory-optimized and disk-based tables are called “interop” procedures.
Block Transaction Log Writes
Whereas disk-based tables write individual records to the transaction log, memory-optimized tables write what we’ll call consolidated records. If you wrote a hundred records to a disk-based table that had a single non-clustered index (and were updating a column in that index), you would write two hundred records to the transaction log. With a memory-optimized table and a best-case scenario, you would write exactly one record to the transaction log with all the required detail. (Each log block is 24KB, so depending on the size of the records being written and the number of records, it could write multiple records.)
Disk-based tables store non-clustered indexes on disk. For many OLTP tables, the size of the indexes is larger than the size of the base table. Creation and updates to indexes are written to the transaction log which takes CPU cycles and often very large amounts of disk space.
With memory-optimized tables, non-clustered indexes are not persisted to disk, so they take up no disk space, and updates do not have to be written to the transaction log. When a SQL Server 2014 database is started up, indexes for memory-optimized tables are built at the time that a table is loaded into memory.
Streaming Data Files
Hekaton leverages filestream capabilities to stream data records to disk. Records are written as a continuous stream; there are no UPDATE records, only INSERT and DELETE records. INSERTs are stored in “data files” and DELETEs are stored in “delta files”.
Memory-optimized tables can have durable schema and non-durable data meaning that on start-up, the table is recreated, but no data is loaded.
How could this be used? Besides the ETL scenario described below, the obvious use is ASP session state. This also could be used anywhere you use a #temp or @temp table and you know that only one process is using the table at a given time. The usage of all the other features in Hekaton is rather straightforward. I expect that the provision of non-durable tables is the feature that will be used the most creatively of all. People will use this in ways that the SQL Server team never imagined. (Initially, non-durable tables are only supported in native stored procedures. By RTM, the plan is to support non-durable tables in interop stored procedures as well.)
No Buffer Pool
In Hekaton, there is no buffer pool. All the data for memory-optimized tables is in memory. Note that there are buffer pool-related changes in SQL Server 2014 that allow you to use SSDs to augment the size of your buffer pool.
What Problems Does Hekaton Address?
Scenario 1: Write-intensive Audit Log
Assume you have a centralized audit logging table for an OLTP system. You’ve designed it with an integer or bigint clustered primary key which means all new records are written to the end of the table. This design allows every page to be filled completely without any page fragmentation.
With a very high number of cores (past 12 to 16) and a high-performance OLTP workload, page latches can cause write delays because each writer has to acquire a page latch before it can write a new record.
With memory-optimized tables, there are no pages, so there are no page latches. Also, the writes are so fast that it is much less likely that multiple writers would try to access the same record at once. In disk-based tables, the waits are often because writers are trying to access the same page, rather than the same record. Without pages, this is no longer an issue.
Partitioning a Large Table
An application’s audit log table is likely to be partitioned by time, say a single partition per week. Since Hekaton does not (initially) support partitioning, we need to find a creative solution. The recommendation here is to have the active partition in-memory and all other partitions in a partitioned disk-based table. A view can be created to abstract away the complexity of the split tables.
Say you have two memory-optimized tables: CurrentAuditA and CurrentAuditB. Initially, the application is writing to A until the time boundary is hit. Then it switches and writes to B. (The view must be updated at this time to include table B instead of table A.) After the switch, a simple INSERT/SELECT statement writes the records from memory-optimized table A to disk-based table Audit2013W26 representing the 26th week. When that table has been fully processed (PK, indexes), it can be switched in to the partitioned audit log table.
Scenario 2: Read-intensive Table
Consider a large reference table; say a list of drugs with details about quantities, prices, etc. In an on-line consumer order application, access to this table would be almost all reads with occasional writes. And reads on popular drugs could be very high. The pages that contain popular drugs would likely be cached in memory even with a disk-based table.
Because memory-optimized tables use hash tables and pointers directly to memory-resident records, simple reads are much faster. For a read on a disk-based table, even if the pages are already loaded into memory, the system has to traverse multiple layers in a B-tree index. If the index is non-clustered and not a covering index, then the actual record is accessed. This results in more operations to return the same data.
Scenario 3: Mixed Read-write Customer Table
In that on-line drug order system, the table that holds orders would receive a large amount of writes along with some updates, deletes, and also many select statements. If one process is trying to update the status of an order to in-process and another process is trying to update the quantity, these two processes could end up locking or blocking each other.
Scenario 4: ETL Staging Table
Traditionally, staging tables for ETL processing have used heaps (tables without clustered indexes) and minimal or bulk logging. This was the way to get the best throughput because it minimized the amount of processing that was required with each write.
Memory-optimized tables allow the option of schema-persistence without data-persistence. Writes to these tables don’t cause any transaction logs writes because data written here is transient. If the server crashed during processing, the data can be simply reloaded from the source.
Caveats and Recommendations
If two processes try to write to the same record at the same time, the first will succeed and the second will fail because there is no locking. This requires changes to stored procedures to handle the failure. Whether that “failure” should be followed by a retry or an error should be evaluated on a case-by-case basis. Think through the implications of two writers updating the same record. Retry basically mimics last-writer-wins logic. That is often the simplest choice but not the best choice.
Native stored procedures do not recompile automatically based on statistics updates. This will have to be handled through scripting or manual processing.
Running Out of Memory
As you might expect, running out of memory in a memory-optimized database is a problem. If your tables grow to the point where you run out of memory, write activity stops, but read activity can continue. Even many SELECTs would fail, so your database is fundamentally inoperable. The important thing to remember is that durable in-memory tables are fully ACID-compliant, so you will not have data loss in committed records. The SQL Server will continue to improve memory management before RTM.
Record size is limited to 8,060 bytes. There are a number of creative solutions that should allow you to store your LOB data and long string or binary data in disk-based tables, but the important thing here is that this will require changes to your application.
Durable memory-optimized tables are limited to 512 GB. (Non-durable tables have no size limit.)
Foreign Keys and Check Constraints
My biggest disappointment with the initial version is that it does not support foreign key constraints or check constraints. For a database designer/developer, “let the application handle data quality” is like fingernails on a chalkboard.
IDENTITY and SEQUENCE are not supported. Do a quick search on the web, and you’ll quickly find solutions like this one.
Even interop-mode procedures do not support MERGE. The explanation is that the team concentrated on “mainstream” OLTP workloads, but this will cause grief for many of us that have widespread use of MERGE in our current code.
More on the not-supported list:
- DML triggers
- Data types
- Schema changes are not support in the same way as disk-based tables.
- There are no heaps in memory-optimized tables. (This is not a bad thing, just interesting. It makes sense though. A heap is just a bag of records in a particular spot. Records in memory have to have pointers for look-up.)
- Compression is not supported.
- Range indexes are expected to be available in CTP2.
You likely will not want to put everything in memory-optimized tables. In a normal OLTP database, the 80-20 rule is more like 95-5. If you put effort into moving the dozens of tables that make up just 5% of your database’s processing load, your gain is minimal. Target the tables that are used most heavily. Partition large tables into hot and cold. Keep the hot data in memory-optimized tables and everything else in disk-based tables.
There are reasons to move even the cold part of a large table into memory-optimized tables. Memory-optimized tables do not persist indexes, so depending on the number of indexes on your big tables, you could save significant storage by migrating. Weigh the cost of memory against the fully-realized cost of disk storage. It may tip toward disk-based today, but as memory prices continue to drop, reevaluate.
Transaction Log Performance
Evaluate the implications of pumping maybe ten times as many transactions through your system and consider upgrading the storage for the transaction log.
Consider how your instrumentation will need to change if transactions complete in microseconds instead of seconds or milliseconds.
SQL Server 2014: A Closer Look at SQL Server Blog
Microsoft Announces SQL Server 2014 at Visual Studio Magazine
SQL Server 2014: What’s vNew for vNext by Thomas LaRock
SQL Server 2014! by James Serra