In the efforts of preventing unauthorized access to sensitive data, scrambling production data for testing, staging and development environments is a recommended practice. There are many ways to approach data scrambling. One option for numeric values is a process called "numeric variance".
Numeric variance is a process in which the actual numeric values contained within a specified column are increased or decreased by a given percentage. For example, an original value of a bank account balance for a customer is $500. With a numeric variance of 10% applied, the new value for the account balance will be changed to $550.
The addition of a randomizer will strengthen the results of the numeric variance. This strength is that each row of the affected column will be increased or decreased by a different percentage than the other rows.
The RAND function can be utilized to generate a random number. By leaving out the seed argument, SQL Server will dynamically determine the seed that will be used. If a hard-coded value is entered into the seed argument the value returned will be the same for each execution.
Below is some sample code that can be used to perform a randomized numeric variance:
-- Set original value
DECLARE @OrigVal int
SET @OrigVal = 500
-- Set variance to 10%
DECLARE @VarPct numeric (5,2)
SET @VarPct = 10
The preceding code simply provides us with variables to store our original value ($500) and our maximum variance percentage (10%). This process can be created either as a stored procedure or a user defined function; if created as such, these can be used arguments.
-- Set number range for random number
DECLARE @Range int
SET @Range = (((0-@VarPct)+1)-@VarPct)
-- Set random element
DECLARE @Rand int
SET @Rand = CONVERT(int,(@Range * RAND() + @VarPct))
The preceding code uses the RAND function to create a randomized value. When you attempt to create a user-defined function with RAND function contained within it you will receive the following error:
Msg 443, Level 16, State 1, Procedure <UDF NAME>, Line <LOCATION OF RAND>
Invalid use of a side-effecting operator 'rand' within a function.
A work around this issue is to create a simple view that can be referenced to obtain the RAND value. In our specific example, the resulting @Rand variable will contain the value of -19. This value represents the range from -9% to 10%, including 0%.
-- Determine numeric variance
DECLARE @Variance int
SET @Variance = CONVERT(INT,((@OrigVal*@Rand)/100))
-- Return new value
SELECT @OrigVal + @Variance
GO
The preceding code uses our randomized value and generates a new variance with each execution. For illustration purposes let's say that the @Variance value returned is 35. This value is then added to our original value to change 500 to 535. If @Variance returned -45 the adding of a negative number is the same as subtracting a positive number; therefore 500 + -45 = 455.
This process is repeated for each row. At the end, our result is a successfully scrambled set of numeric values that protects the sensitive data that is contained within these columns.