Click here to monitor SSC

John Magnabosco

SQL Server Development and Data Security

Searching Encrypted Data

Published Friday, January 23, 2009 4:25 PM

Frank is a DBA. He maintains a database that contains many columns of data that has been classified as sensitive. He has been assigned the task of protecting this data and has been instructed to use encryption to do so. "No problem" says Frank, "I'll utilize cell-level encryption. I should have this task completed in a day and then I can kick back and sip margaritas with the peace of mind that this data is safe."

Poor Frank. The request that he received was not specific to the additional requirements of how the end users will utilize this sensitive data. Frank, in his excitement to have the opportunity to utilize this feature of SQL Server, did not initially ask for these tiny details. As he began reviewing the code that is associated with the sensitive data he soon realized that the unspoken expectation was to protect the data while maintaining the ability to search the data.

If Frank is to consider what his options are with his intent to implement cell-level encryption he will need to know the following:

  • The encrypted value will need to be stored in a varbinary data type.
  • You cannot index varbinary data types.
  • You will have to explicitly open and close the keys prior to their use.
  • When the data is decrypted it is stored in cache as plain text.

With this knowledge Frank decides to implement the following code into the previously existing stored procedure that performs the search:

OPEN SYMMETRIC KEY SymPassKey 
      DECRYPTION BY CERTIFICATE MyServerCert 
GO

      SELECT 
            Customer_Name,
            Customer_Address_1,
            Customer_City,
            Customer_State,
            Customer_Zip,
            Customer_Account
      FROM
            dbo.CustomerData
      WHERE
            CONVERT(varchar, DecryptByKey(Customer_SSN)) = @SSN_Value
GO

CLOSE SYMMETRIC KEY SymPassKey
GO

Frank rolls this code into a test environment and uses his trusty DBA tools to evaluate the performance against a database that contains millions of records in the CustomerData table. "Holy shnikeys!" exclaims Frank. His is disappointed in the performance of this query. It is significantly greater than the original query. Not only is the encrypted field causing a table scan, the value contained within the Customer_SSN field of each record must be decrypted prior to considering whether or not it should be provided in the result set. Back to the drawing board.

After a tall cup of strong coffee Frank decides to consider one-way encryption as a possibility. In SQL Server this is accomplished by utilizing the HashBytes method. The concept is that when the data is recorded to the table the plain text is encrypted. When the data is searched, the plain text that is submitted is encrypted using the same algorithm and then both cipher text values are compared. His greatest concern is that this method does present a dictionary attack vulnerability.

With this approach Frank must consider the following:

  • Since decryption does not occur, there are no key management requirements.
  • While HashBytes returns a varbinary data type it can be stored in a nvarchar data type.
  • If nvarchar is used to store the encrypted value it can be indexed.
  • Since the data is not decrypted the value in cache remains protected.

Frank decides to consult with his fellow DBA friend, Jerry, who has been sitting quietly all morning monitoring database statistics while enjoying his Grateful Dead collection on his Zen. After a brief discussion regarding the challenge. Jerry suggests that he consider salting his plain text values prior to utilizing the HashBytes method. This approach makes the dictionary process much more difficult because the offending party must either gain or guess the secret value before successfully mounting such an attack.

Rather than place the secret value in plain text into each stored procedure that utilizes the Customer_SSN data, Frank decides to implement a table that will contain the secret values. To further protect this information, he implements cell-level encryption upon it.

When the secret value that salts the one-way encrypted is needed, the following is executed:

OPEN SYMMETRIC KEY SymPassKey
     DECRYPTION BY CERTIFICATE MyServerCert
GO

SELECT 
      @SaltValue = SaltValue 
FROM 
      dbo.SaltValues 
WHERE 
      SaltValueSource = 'SSN' 
GO

CLOSE SYMMETRIC KEY SymPassKey
GO

Since the SaltValues table will contain very few records the performance of utilizing this approach for this table does not negatively affect the entire process. Once the secret value is obtained, an insert statement would appear like the following:

INSERT INTO dbo.CustomerData (
      Customer_Name,
      Customer_Address_1,
      Customer_City,
      Customer_State,
      Customer_Zip,
      Customer_SSN)
VALUES (
      'Jane Smith'

      '123 Main Street', 
      'Indianapolis', 
      'IN', 
      '46204', 
      HashBytes('SHA1', @SaltValue + '555-55-5555') )
GO

A search on this data would be executed with the following statement:

SELECT
      Customer_Name,
      Customer_Address_1,
      Customer_City,
      Customer_State,
      Customer_Zip,
      Customer_Account
FROM
      dbo.CustomerData
WHERE
    Customer_SSN = HashBytes('SHA1', @SaltValue + @SSN_Value)
GO

Once again Frank implements the above approach into a test environment to see how it affects performance. "Holy shnikeys!" Frank exclaims once again; although this time it was a positive response. There was a little performance hit in using this approach the increase was very minimal while protecting sensitive data and preserving the ability to search the sensitive data. Also, a limitation to this approach is that a wild card search could not be provided with this solution; although, it could be argued that a wild card search is a variation of a dictionary attack.

Despite some limitations, Frank's solution proved to be successful. Frank could finally sit back and sip that margarita he dreamed about days ago.

by Johnm

Comments

 

TadRichard said:

Very cool.

It reminds me of the PGP dual-key encryption system.  Because of the huge performance issues of perfoming dual-key encryption over a large plaintext, the creators of PGP decided to use a two step process.  The plaintext is initially encrypted with single key encryption using a psuedorandom key.  Then that relatively small key is encrypted using a dual-key system.
January 24, 2009 6:36 PM
 

DotNetShoutout said:

Thank you for submitting this cool story - Trackback from DotNetShoutout
January 26, 2009 12:06 AM
You need to sign in to comment on this blog

About Johnm

John Magnabosco manages the Data Services Group at one of the fastest growing companies in the United States. He is also a Co-Founder of the Indianapolis Professional Association for SQL Server (IndyPASS), Co-Founder of IndyTechFest, the author of the book titled "Protecting SQL Server Data" and contributing author of "SQL Server MVP Deep Dives Volume 2".
<January 2009>
SuMoTuWeThFrSa
28293031123
45678910
11121314151617
18192021222324
25262728293031
1234567
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...

Geek of the Week: Don Syme
 With the arrival of F# 3.0 Microsoft announced a wide range of improvements such as type providers that... Read more...

How to Document and Configure SQL Server Instance Settings
 Occasionally, when you install identical databases on two different SQL Server instances, they will... Read more...

What's the Point of Using VARCHAR(n) Anymore?
 The arrival of the (MAX) data types in SQL Server 2005 were one of the most popular feature for the... Read more...