Rob Garrison

What's the Point of Using VARCHAR(n) Anymore?

30 January 2012

The arrival of the (MAX) data types in SQL Server 2005 was one of the most popular features for the database developer. At the time, there was much discussion about whether this freedom from having to specify string length came at a cost. Rob attempts to give a final answer as to any downside.

When you are storing large strings, or any other data types that you’d want to store as VARCHAR or VARBINARY, there are factors that affect how and where that data is stored inside SQL Server. This in turn will affect the performance of inserts and updates.

Basics: Allocation Unit Types, Row and Page Size Limits

SQL Server’s unit of physical storage is a page. Each page has a fixed size of 8KB (8,192 bytes) (ref). Where possible, data is stored in rows within these ‘pages’. A row cannot overlap a page boundary. If the space allocated to a data type is fixed, or if the variable-sized data is small enough to let the row fit, the base allocation unit type, “IN_ROW_DATA”, is used. However, there are two ways of storing data that would otherwise overflow a page boundary.

  1. When a single row grows too large to be stored in-row, data can be offloaded to "ROW_OVERFLOW_DATA" pages.
  2. When a single column stores more than 8,000 bytes, or if the developer chooses to force this behavior, data is stored in "LOB_DATA" pages.

This becomes much clearer through examples.
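The examples that follow all create tables in a `demo` schema, which the scripts assume already exists. If you are following along in a sandbox database, a minimal setup (not part of the original scripts) is:

```sql
-- Create the schema that all of the demo tables below live in.
-- Run this once in a scratch database before the CREATE TABLE statements.
CREATE SCHEMA demo;
GO
```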

Exercising Row and Page Size Limits

Single String Column with VARCHAR(n) and VARCHAR(MAX)

8,000 Characters

First, let’s build two tables and exercise some long strings to see how SQL Server handles both normal and large strings. Here, we will be exercising row size limits.

CREATE TABLE demo.OneColVarcharN   (Col1 VARCHAR(8000));
CREATE TABLE demo.OneColVarcharMax (Col1 VARCHAR(MAX));

INSERT INTO  demo.OneColVarcharN   (Col1)
SELECT REPLICATE('x', 8000);

INSERT INTO  demo.OneColVarcharMax (Col1)
SELECT REPLICATE('x', 8000);

After each insert, we will use versions of the following query to see into which type of page these records were loaded (see MSDN for details on the DMV):

SELECT
    OBJECT_NAME([object_id])        AS TableName,
    alloc_unit_type_desc            AS AllocUnitTp,
    page_count                      AS PgCt,
    avg_page_space_used_in_percent  AS AvgPgSpcUsed,
    record_count                    AS RcdCt,
    min_record_size_in_bytes        AS MinRcdSz,
    max_record_size_in_bytes        AS MaxRcdSz,
    forwarded_record_count          AS FwdRcdCt
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'DETAILED');
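In DETAILED mode this DMV scans every page of every table in the database, which can be slow. To limit the scan to a single demo table, pass its object ID instead of NULL for the second parameter (a convenience variant, not from the original article):

```sql
-- Restrict the physical-stats scan to one table
SELECT alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID('demo.OneColVarcharMax'), NULL, NULL, 'DETAILED');
```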

The query results show that only “IN_ROW_DATA” pages were written for both tables, and the pages are 98.97% full.

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMax IN_ROW_DATA 1 98.97 1 8011 8011 0
OneColVarcharN IN_ROW_DATA 1 98.97 1 8011 8011 0

8,001 Characters

Clearly we can’t insert 8,001 characters into an 8,000-character column, but we can insert them into the VARCHAR(MAX) column.

INSERT INTO demo.OneColVarcharMax (Col1)
SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 8001);

The query results here show that pages were written to both “IN_ROW_DATA” and “LOB_DATA” pages.

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMax IN_ROW_DATA 1 0.43 1 35 35 0
OneColVarcharMax LOB_DATA 1 99.02 1 8015 8015 NULL

Note that the “in row” record is very small. In this case, the only thing stored “in row” is the 24-byte pointer. The full 8,001-character string is moved to the LOB page.

So SQL Server is storing normal VARCHAR(n) columns and VARCHAR(MAX) columns “in row” by default. When VARCHAR(MAX) exceeds 8,000 characters, the pointer is stored “in row”, and the string is stored in “LOB” pages.

Two String Columns with VARCHAR(n) and VARCHAR(MAX)

Let’s try this again but with two columns. This time, we will exercise page size limits.

8,000 Characters

As expected, two 4,000-character strings fit fine in both tables.

CREATE TABLE demo.TwoColVarcharN   (Col1 VARCHAR(8000), Col2 VARCHAR(8000));
CREATE TABLE demo.TwoColVarcharMax (Col1 VARCHAR(MAX),  Col2 VARCHAR(MAX));

INSERT INTO demo.TwoColVarcharN   (Col1, Col2)
SELECT REPLICATE('x', 4000), REPLICATE('x', 4000);

INSERT INTO demo.TwoColVarcharMax (Col1, Col2)
SELECT REPLICATE('x', 4000), REPLICATE('x', 4000);

The query results show that only “IN_ROW_DATA” pages are written for each table, and the pages are 99.00% full.

8,060 Characters

The maximum size of a row is 8,060 bytes, not the full 8,192-byte page size. Let’s see how this works.

It actually doesn’t require the full 8,060 characters to fill that page. 8,047 characters get written to just “in row” pages, but at 8,048 characters, it writes to both “in row” pages and either “row overflow” pages (VARCHAR(n)) or “LOB” pages (VARCHAR(MAX)).

Results with 8,047 Characters (4,023 and 4,024 characters)
TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
TwoColVarcharMax IN_ROW_DATA 1 99.58 1 8060 8060 0
TwoColVarcharN IN_ROW_DATA 1 99.58 1 8060 8060 0
Results with 8,048 Characters (4,024 and 4,024 characters)
TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
TwoColVarcharMax IN_ROW_DATA 1 50.17 1 4061 4061 0
TwoColVarcharMax LOB_DATA 1 49.89 1 4038 4038 NULL
TwoColVarcharN IN_ROW_DATA 1 50.17 1 4061 4061 0
TwoColVarcharN LOB_DATA 1 49.98 1 4038 4038 NULL

Notice that at 8,047 characters, the record size is exactly 8,060 bytes, the per-row maximum. The extra 13 bytes are row overhead: the record header, column count, null bitmap, and the variable-length column offset array. The rest of the page is taken up by page-level metadata: “Each data page contains a page header which stores page meta-data, such as database file identifier, current page number, previous and next page numbers, number of free bytes per page and so forth.” (ref)

200,000 Characters

For the VARCHAR(MAX) table, the results are basically the same even if the column width is pushed far past the 8,000-character limit. Here, we write two 100,000-character strings.

INSERT INTO demo.TwoColVarcharMax (Col1, Col2)
SELECT
    REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000),
    REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000);
Results
TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
TwoColVarcharMax IN_ROW_DATA 1 0.75 1 61 61 0
TwoColVarcharMax LOB_DATA 27 91.89 28 228 8054 NULL

Here, we have a single page storing two pointers and 27 pages storing the large strings.

In-Row vs. Out-of-Row LOB Storage

For “large-value data types” like VARCHAR(MAX), SQL Server allows the data to be stored in-row (up to 8,000 bytes) or out-of-row (ref).

We will use two new tables, one in-row and one out-of-row:

CREATE TABLE demo.OneColVarcharMaxIn  (Col1 VARCHAR(MAX));
CREATE TABLE demo.OneColVarcharMaxOut (Col1 VARCHAR(MAX));

To change a VARCHAR(MAX) column from the default (in row) to out-of-row, execute this statement:

EXEC sp_tableoption 'demo.OneColVarcharMaxOut', 'large value types out of row', 1;
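You can confirm the option is set by reading the table’s metadata; sys.tables exposes a large_value_types_out_of_row flag. A quick check, and the statement to revert to the default:

```sql
-- 1 = large value types stored out of row; 0 = stored in row (the default)
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = 'OneColVarcharMaxOut';

-- Revert to the default in-row behavior
EXEC sp_tableoption 'demo.OneColVarcharMaxOut', 'large value types out of row', 0;
```

Note that flipping the option does not move existing values; each stored value moves to the new location the next time it is updated.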

1 Character

Writing a single-character string to each column in these tables creates an interesting result:

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMaxIn IN_ROW_DATA 1 0.15 1 12 12 0
OneColVarcharMaxOut IN_ROW_DATA 1 0.33 1 27 27 0
OneColVarcharMaxOut LOB_DATA 1 1.04 1 84 84 NULL

The in-row table created just one page. The out-of-row table created one page for the pointer record and one page for the out-of-row string. Notice the byte-count difference: the in-row record takes 12 bytes, while the out-of-row rows take 111 bytes in total (27 for the pointer record plus 84 on the LOB page).

This illustrates two downsides of out-of-row storage when small records are written:

  1. Multiple pages are affected regardless of the size of the data being written.
  2. Extra storage space is required.

8,000 Characters

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMaxIn IN_ROW_DATA 1 98.97 1 8011 8011 0
OneColVarcharMaxOut IN_ROW_DATA 1 0.33 1 27 27 0
OneColVarcharMaxOut LOB_DATA 2 50.02 2 84 8014 NULL

Notice that the out-of-row table writes two pages instead of just one (“PgCt” column). It actually takes only 65 characters in the out-of-row table to cause it to create the second page.

8,001 Characters

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMaxIn IN_ROW_DATA 1 0.43 1 35 35 0
OneColVarcharMaxIn LOB_DATA 1 99.02 126 8015 8015 NULL
OneColVarcharMaxOut IN_ROW_DATA 1 0.33 1 27 27 0
OneColVarcharMaxOut LOB_DATA 2 98.43 127 84 8015 NULL

Once the byte count goes over 8,000, the in-row table writes to a LOB page.

1,000,000 Characters

TableName AllocUnitTp PgCt AvgPgSpcUsed RcdCt MinRcdSz MaxRcdSz FwdRcdCt
OneColVarcharMaxIn IN_ROW_DATA 1 0.43 1 35 35 0
OneColVarcharMaxIn LOB_DATA 126 98.42 126 2020 8054 NULL
OneColVarcharMaxOut IN_ROW_DATA 1 0.33 1 27 27 0
OneColVarcharMaxOut LOB_DATA 126 98.43 127 84 8054 NULL

With such large strings, the results are very similar between default (in-row) and out-of-row settings.

Performance

So how do these various choices affect performance? Let’s code some inserts and record results with different settings and values.

INSERT – 0 to 8,000 Characters

First, we’ll test the performance of a process that inserts strings which fit in VARCHAR(n).

The basic idea here is to insert 10,000 rows of data and record the timing. I ran the tests on my desktop; it is not a server-class machine, but it is a high-powered one. All tests were executed on SQL Server 2008 R2 and Windows 7 Enterprise.
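The linked scripts do the actual timing; the basic shape of each test is the pattern below (a simplified sketch, not the exact script — the table, string length, and variable names are illustrative):

```sql
-- Time 10,000 single-row inserts of a fixed-length string
DECLARE @i  INT          = 0,
        @s  VARCHAR(8000) = REPLICATE('x', 1000),
        @t0 DATETIME2     = SYSDATETIME();

WHILE @i < 10000
BEGIN
    INSERT INTO demo.OneColVarcharN (Col1) VALUES (@s);
    SET @i += 1;
END;

-- Elapsed microseconds for the whole batch
SELECT DATEDIFF(MICROSECOND, @t0, SYSDATETIME()) AS ElapsedMcs;
```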

The results here show microseconds per insert for record lengths between 0 and 8,000 characters. The complete collection of test scripts is linked in the References section at the end of the article.

Performance of a process that inserts strings which fit in VARCHAR(n)

See detailed results in Table 1.

The results here show that VARCHAR(n) is slightly faster than VARCHAR(MAX) (in-row) only for 0-character and 1,000-character strings. At 2,000 characters or above, VARCHAR(MAX) (in-row) is fastest. This result was not expected.

Intuitively, we expect that VARCHAR(n) should be faster than VARCHAR(MAX) (in-row), and in some cases it is. What I think this shows is that, if there is a performance difference, it is small enough that we can’t measure it consistently.


INSERT – Greater Than 8,000 Characters

These tests are limited to VARCHAR(MAX) but compare in-row to out-of-row.

The results here show microseconds per insert for record lengths between 9,000 and 30,000 characters.

Microseconds per insert for record lengths between 9,000 and 30,000 characters

See detailed results in Table 2.

UPDATE – Column Limit

When a VARCHAR(MAX) in-row column goes over the 8,000-character limit, the data is moved to a LOB_DATA page. The linked script Allocation Units - UPDATE.sql tests this 8,000-, then 8,001-, then 8,000-character data shift. This test illustrates the move of data from an IN_ROW_DATA page to a LOB_DATA page and then back for the in-row table.
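The data shift the script exercises can be reproduced with a few statements (an illustrative sketch of the linked script, not the script itself):

```sql
-- Start with an 8,000-character value, which fits in row
INSERT INTO demo.OneColVarcharMaxIn (Col1)
SELECT REPLICATE('x', 8000);

-- Grow it past the limit: the value moves to a LOB_DATA page,
-- leaving only a pointer in the IN_ROW_DATA page
UPDATE demo.OneColVarcharMaxIn
SET Col1 = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 8001);

-- Shrink it back: the value moves back into the IN_ROW_DATA page
UPDATE demo.OneColVarcharMaxIn
SET Col1 = REPLICATE('x', 8000);
```

Rerunning the allocation-unit query after each statement shows the LOB_DATA pages appearing and disappearing.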

These operations can be expensive, but how expensive? Make a quick mental prediction before reading on.

The linked script PerfTest - UPDATE.sql tests both the 8,000-to-8,001 scenario and the 8,001-to-8,000 scenario. The results were actually quite surprising. I kept checking the code to see if I had done something wrong because the results didn’t match my expectations at all.

UPDATE Test Results

In/Out     From   To     ElapsedMcs  Penalty vs Out-of-Row
In-Row     8,000  8,001  34,238,839  82%
Out-of-row 8,000  8,001  18,804,768
In-Row     8,001  8,000   9,969,989  -47%
Out-of-row 8,001  8,000  18,804,768
As expected, the 8,000-to-8,001 update takes longer for in-row than for out-of-row. The surprise is that the 8,001-to-8,000 update is faster for in-row. Based on these results, there must be a shortcut employed inside SQL Server that optimizes this move of data from a LOB_DATA page to an IN_ROW_DATA page.

Unicode

All of these tests were conducted using non-Unicode strings. Without compression, all of the byte counts double for Unicode (NVARCHAR) strings.
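The doubling is easy to verify with DATALENGTH, which reports storage bytes rather than characters:

```sql
-- Each Unicode (UCS-2) character takes two bytes without compression
SELECT DATALENGTH('x')  AS VarcharBytes,   -- 1
       DATALENGTH(N'x') AS NvarcharBytes;  -- 2
```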

Recommendations

Don’t Use VARCHAR(MAX) Everywhere

One design-simplification technique that has been suggested is to use VARCHAR(MAX) for every string column. Though the tests here show that using VARCHAR(MAX) (in-row) instead of VARCHAR(n) is not a performance problem for strings under 8,000 characters, there are a number of reasons to limit the length of strings in your database.

UI Issues

Anyone with experience creating an application UI knows that handling long strings is difficult. When laying out the UI for an application, it is important to know the expected and maximum length of strings that will be displayed. Leaving this open to whatever someone writes to the database makes designing, building, and testing an application very difficult.

Performance – Writing or Updating Long Strings

From the performance results, you can see that writing long strings definitely affects write delays. While writing a record of any size (even zero characters) takes time, writing longer strings takes more time. As an example, using VARCHAR(n), writing 1,000 characters takes an average of 217 microseconds while writing 8,000 characters takes an average of 448 microseconds.

Compression

What I believe is the most compelling reason to avoid over-8,000-character strings is compression: LOB data stored out of row is never compressed.

“When considering data compression one very key thing to remember is that out of row data (LOB data) isn’t compressed. If the LOB data is stored in row, then it will be compressed, but only when you compress the table with PAGE level compression. If however you use ROW level compression then the LOB data will never be compressed, no matter if it is stored in row or if it is stored out of row.” (ref)
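So if compression matters, keep MAX values in row and compress the table at PAGE level (data compression requires SQL Server 2008 or later, Enterprise edition). For example:

```sql
-- PAGE compression compresses in-row data, including in-row (MAX) values;
-- out-of-row LOB pages are left uncompressed
ALTER TABLE demo.OneColVarcharMaxIn
REBUILD WITH (DATA_COMPRESSION = PAGE);
```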

Conclusions

  • Where appropriate, use VARCHAR(n) over VARCHAR(MAX)
    • for reasons of good design if not performance benefits, and
    • because VARCHAR(MAX) data does not compress
  • Storing large strings takes longer than storing small strings.
  • Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
  • Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
  • Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.

All the scripts used here are attached, so you are encouraged to retest the results.

References

Appendix

Table 1

TableName     MaxRecords  StrLen  ElapsedMcs
VarcharMaxIn  10,000      0       1,341,626
VarcharMaxOut 10,000      0       1,950,038
VarcharN      10,000      0       1,216,823
VarcharMaxIn  10,000      1,000   2,246,443
VarcharMaxOut 10,000      1,000   2,511,649
VarcharN      10,000      1,000   2,168,441
VarcharMaxIn  10,000      2,000   2,308,844
VarcharMaxOut 10,000      2,000   3,104,460
VarcharN      10,000      2,000   2,823,654
VarcharMaxIn  10,000      3,000   2,667,651
VarcharMaxOut 10,000      3,000   3,057,659
VarcharN      10,000      3,000   2,698,852
VarcharMaxIn  10,000      4,000   2,839,255
VarcharMaxOut 10,000      4,000   3,307,264
VarcharN      10,000      4,000   3,010,858
VarcharMaxIn  10,000      5,000   3,229,262
VarcharMaxOut 10,000      5,000   3,525,668
VarcharN      10,000      5,000   3,603,669
VarcharMaxIn  10,000      6,000   3,416,466
VarcharMaxOut 10,000      6,000   4,056,078
VarcharN      10,000      6,000   3,712,871
VarcharMaxIn  10,000      7,000   3,728,472
VarcharMaxOut 10,000      7,000   4,602,088
VarcharN      10,000      7,000   4,056,078
VarcharMaxIn  10,000      8,000   4,321,284
VarcharMaxOut 10,000      8,000   4,945,295
VarcharN      10,000      8,000   4,477,286

Table 2

TableName     MaxRecords  StrLen  ElapsedMcs
VarcharMaxIn  10,000      9,000   5,670,605
VarcharMaxOut 10,000      9,000   6,326,708
VarcharMaxIn  10,000      10,000  5,295,689
VarcharMaxOut 10,000      10,000  5,280,067
VarcharMaxIn  10,000      11,000  5,717,469
VarcharMaxOut 10,000      11,000  5,123,852
VarcharMaxIn  10,000      12,000  6,014,277
VarcharMaxOut 10,000      12,000  5,623,740
VarcharMaxIn  10,000      13,000  6,514,166
VarcharMaxOut 10,000      13,000  6,686,002
VarcharMaxIn  10,000      14,000  6,576,651
VarcharMaxOut 10,000      14,000  6,873,460
VarcharMaxIn  10,000      15,000  6,764,110
VarcharMaxOut 10,000      15,000  6,748,488
VarcharMaxIn  10,000      16,000  6,639,137
VarcharMaxOut 10,000      16,000  7,154,647
VarcharMaxIn  10,000      17,000  7,279,619
VarcharMaxOut 10,000      17,000  7,420,213
VarcharMaxIn  10,000      18,000  7,435,834
VarcharMaxOut 10,000      18,000  7,089,147
VarcharMaxIn  10,000      19,000  7,804,550
VarcharMaxOut 10,000      19,000  7,070,923
VarcharMaxIn  10,000      20,000  7,336,277
VarcharMaxOut 10,000      20,000  8,023,077
VarcharMaxIn  10,000      21,000  8,538,178
VarcharMaxOut 10,000      21,000  7,976,250
VarcharMaxIn  10,000      22,000  7,679,677
VarcharMaxOut 10,000      22,000  8,007,468
VarcharMaxIn  10,000      23,000  7,617,241
VarcharMaxOut 10,000      23,000  7,788,941
VarcharMaxIn  10,000      24,000  7,835,768
VarcharMaxOut 10,000      24,000  8,194,778
VarcharMaxIn  10,000      25,000  8,475,741
VarcharMaxOut 10,000      25,000  8,506,960
VarcharMaxIn  10,000      26,000  8,819,141
VarcharMaxOut 10,000      26,000  8,382,087
VarcharMaxIn  10,000      27,000  9,084,496
VarcharMaxOut 10,000      27,000  8,881,578
VarcharMaxIn  10,000      28,000  8,975,233
VarcharMaxOut 10,000      28,000  9,053,278
VarcharMaxIn  10,000      29,000  9,505,942
VarcharMaxOut 10,000      29,000  8,944,014
VarcharMaxIn  10,000      30,000  9,100,105
VarcharMaxOut 10,000      30,000  9,022,060

Rob Garrison

Author profile:

Rob Garrison is a Data Architect for Nike in Beaverton, Oregon. He has over twenty years of IT experience, as well as having had a number of articles published and speaking regularly at user groups and other events. His immediate family includes his wife, Karen, and his son Tanner. Outside of work, much of their time is spent in activities and service projects involving their church (http://pfcn.org).




Subject: Not the entire picture
Posted by: GilaMonster (view profile)
Posted on: Friday, February 03, 2012 at 11:59 AM
Message: There are two things that have been missed here.

1) Online index rebuilds

If you have a varchar(max) in the table, you cannot (in SQL 2005, 2008 or 2008 R2) rebuild the index online. It has to be an offline operation.

For a 24x7x365 system, this is a major reason to avoid varchar/nvarchar/varbinary(max) or XML as much as possible (and where it is necessary, to maybe even consider vertically partitioning the table to move the LOB column to another table)

2) Moving LOB data

Moving tables or indexes to another filegroup is easy. Just rebuild the clustered/nonclustered index on the target filegroup. Well, when there's no LOB data that is. If there is LOB data (stored out of row), it stays behind on the original filegroup and the only way to move it is to create a new table on the desired filegroup WITH TEXT_IMAGE on the desired filegroup, copy the rows across, delete the old table and then add all the indexes, constraints, etc to the new table.

Sure, moving data to a new filegroup isn't something you do every day, but when it does need doing, that decision to use VARCHAR(max) is likely to be a regretted one

Subject: another drawback
Posted by: Jesse (view profile)
Posted on: Sunday, February 05, 2012 at 11:23 PM
Message: another drawback of varchar(max) is that it can't be part of an index key.

This is rather important for things like phonebooks where you look up things by people's names. :)

Subject: Varchar(max) or 8000?
Posted by: Neil Burnett (not signed in)
Posted on: Sunday, February 05, 2012 at 11:26 PM
Message: I was confused by your recommendations. The article's thrust is using max instead of n, but then the recommendations are about using 8000+ or <8000 char string data. If your data is 8000+ then you have no choice.

Subject: Rebuild index online
Posted by: Neil Burnett (not signed in)
Posted on: Sunday, February 05, 2012 at 11:37 PM
Message: @gilamonster - I didn't know that you can't rebuild an index online if it contains a varchar(max). For me, that is sufficient reason to use varchar(n) wherever possible.

Btw I can't imagine why one would ever use a column with >8000 chars in an index. Any suggestions?

Subject: Rebuild online
Posted by: GilaMonster (view profile)
Posted on: Monday, February 06, 2012 at 2:33 AM
Message: Well the clustered index contains every single column in the table, so if you have a table with a LOB column and the table has a clustered index, then that clustered index has the LOB column as part of it and it cannot be rebuilt online

Nonclustered indexes there are very few reasons why you would include a LOB column (and they certainly can't be part of the index key).

That said, when it comes to rebuilding online, the clustered index is usually the one you most want to rebuild online.

Subject: LOBs
Posted by: Phil Factor (view profile)
Posted on: Monday, February 06, 2012 at 2:45 AM
Message: Do XML columns suffer from the same drawbacks?

Subject: XML columns
Posted by: GilaMonster (view profile)
Posted on: Monday, February 06, 2012 at 3:09 AM
Message: As in clustered index can't be rebuilt online and the LOB pages are hard to move?

Yes.

Msg 2725, Level 16, State 2, Line 1
An online operation cannot be performed for index 'PK__testing' because the index contains column 'Blob' of data type text, ntext, image, varchar(max), nvarchar(max), varbinary(max), xml, or large CLR type.

The limitation around online index rebuilds has been removed in SQL 2012, but that still leaves those on 2005, 2008 and 2008 R2 the problem to deal with.

Subject: Typo?
Posted by: Karen (view profile)
Posted on: Monday, February 06, 2012 at 7:00 AM
Message: In the "Single String Column with VARCHAR(n) and VARCHAR (MAX)" > "8,001 Characters" section, you have OneColVarcharN as the TableName for the second row in the table. Should this be OneColVarcharMax?

Subject: Said it best
Posted by: RowlandG (view profile)
Posted on: Monday, February 06, 2012 at 8:30 AM
Message: I was going to answer along the lines of Gila Monster but s/he beat me to it!

Anyone relying on 24x7 index rebuilds should hate the VARCHAR(MAX) approach. Nasty! Ick! Spit it out!

Subject: What about data modeling fundamentals
Posted by: twilson (view profile)
Posted on: Monday, February 06, 2012 at 10:46 AM
Message: Before one even gets to thinking about the performance issues of using varchar(max) everywhere - what about the fundamentals of data modeling? If an attribute has a maximum length in the real world - say, 11 characters for an SSN with embedded dashes - then making it be char(10) or varchar(11) in the database eliminates an entire category of invalid data, and documents to a consumer of the data that this column has a definite maximum length.

Subject: Typo
Posted by: RGarrison (view profile)
Posted on: Monday, February 06, 2012 at 10:57 AM
Message: Karen, good catch. We'll get that updated.


Subject: Responding to Neil
Posted by: RGarrison (view profile)
Posted on: Monday, February 06, 2012 at 2:14 PM
Message: Neil, you wrote, "The article's thrust is using max instead of n, ..."

I would say that the article's thrust was exploration. Given the various choices available, how do those choices affect performance and storage? Even with MAX, you still have the choice of in-row versus out-of-row.

As you point out, three of the four recommendations are only valid if you have an option of using VARCHAR(n). The other, regarding INSERT/UPDATE performance, simply states what is intuitively obvious: writing or updating larger strings takes more time. But now you have a graph and real results that you can reference.

Subject: 8060 Character
Posted by: MidBar (not signed in)
Posted on: Thursday, February 09, 2012 at 10:34 AM
Message: As per Microsoft
http://msdn.microsoft.com/en-us/library/ms190969.aspx

Page size is 8192 bytes and header is 96 bytes
so remaining is 8096

1: so what is this 8060 and 8047 or 8048 limitation?

Please clarify or also include in the initial description:

Page size limitation, i.e. 8,192
Record size limitation, i.e. 8,060
Column size limitation for varchar, i.e. 8,000 even if it is varchar(max)

thanks.


Subject: Responding to MidBar
Posted by: RGarrison (view profile)
Posted on: Friday, February 10, 2012 at 1:57 PM
Message: There are two references in the article that relate to page size:

Row-Overflow Data Exceeding 8 KB
http://msdn.microsoft.com/en-us/library/ms186981.aspx

Table and Record Structure
http://sqlserverpedia.com/wiki/Table_and_Record_Structure

What is the 8060 limitation? From the MSDN reference: "A table can contain a maximum of 8,060 bytes per row."

Kevin adds details in the SQLServerPedia article.

The exact number of bytes that can be written was found through experimentation, not calculation.

Rob

Subject: RE: UPDATE Performance Difference with IN/OUT of Row
Posted by: srutzky (view profile)
Posted on: Saturday, February 11, 2012 at 11:33 PM
Message: Hi Rob. I was testing the "PerfTest - UPDATE.sql" script in the In-Row vs. Out-of-Row LOB Storage | UPDATE | Performance section and noticed a minor typo. When displaying the results at the end, you have the @etOut8001 variable used for both out-of-row rows when the @etOut8000 variable should be used for the last row: 'out-of-row, 8000'.

After making this change I noticed that the time difference for the out-of-row set varied as to which direction was faster but the in-row set was always faster going to 8000.

My theory is that the out-of-row set fluctuates due to both directions most likely being the same basic operation in that the same LOB pages are being overwritten in both cases. But rather than there being a short-cut for going from LOB to IN_ROW pages, couldn't it be that going from IN_ROW to LOB pages is two steps (write the LOB page and then write the pointer to the LOB page in the IN_ROW page) whereas going from LOB to IN_ROW pages is a single step (write the IN_ROW page)?

Take care,
Solomon...

Subject: SSIS
Posted by: phorstink (not signed in)
Posted on: Wednesday, February 15, 2012 at 12:22 AM
Message: I have had bad experience with varchar(MAX) when using it in SSIS. By default the data will be typed as DB_TEXT, which will introduce all kinds of problems, like not being able to use it in a lookup.
That means we have to 'substring' the data, but how many characters? This goes back to the design consideration mentioned before...

Subject: Indexing VARCHAR columns
Posted by: Dave.Poole (view profile)
Posted on: Wednesday, February 15, 2012 at 1:32 AM
Message: As the maximum size of a VARCHAR column that can participate in an index is 900 bytes (see Index Key size in http://msdn.microsoft.com/en-us/library/ms188783.aspx) the actual ability of a VARCHAR(n) to store more than that isn't relevant from an indexing perspective.

Incidentally if you are looking for an exact match in a large VARCHAR column then putting an index on a computed column that holds the CHECKSUM of that VARCHAR column is a useful technique.

WHERE YourCheckSumColumn = @CheckSumValue AND YourGreatBigunindexedString = @YourGreateBigunindexedStringValue

The CHECKSUM value is often selective enough to get an index seek so for the cost of a 4 byte integer plus an index on it you gain from not having to have a great big index on your large string.

It's probably preaching to the converted but if your database can have dodgy data in it then it will have dodgy data in it. I've just profiled data where the UK postcode field is a VARCHAR(20) and even though there are no UK postcodes wider than 8 characters there are all sorts of strings longer than that in the field. And this is from something that is allegedly populated from a PAF system!
Sticking VARCHAR(MAX) everywhere is committing a bizarre form of data suicide worthy of an entry in the Darwin awards.

Subject: Insert Test Code?
Posted by: TheSQLGuru (view profile)
Posted on: Wednesday, February 15, 2012 at 7:54 AM
Message: How did you do the insert tests? Were they singular inserts one row at a time? If so the implicit transactions and/or log buffer flushes would likely overwhelm any other performance issues.

Subject: Re: LOB and Compression - NVARCHAR
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 15, 2012 at 8:38 AM
Message: I am surprised that no one mentions NVARCHAR and NVARCHAR(max) as there are really interesting and difficult issues with index size, data size, and performance. This is the only solution for UTF characters, and UTF 8 remains unsupported in favor of UCS-2 and compression. Might be good as a follow-up article.

Subject: Re: Indexing VARCHAR columns
Posted by: Brian Feifarek (not signed in)
Posted on: Wednesday, February 15, 2012 at 10:20 AM
Message: Regarding Dave.Poole's suggestion to use a CHECKSUM value to look for exact matches in large varchar columns, my exploration of CHECKSUM in SQL Server 2000 found it to not be very useful under the 'real world' data values I was working with. I found very many identical check sums for very different data values in the small data set I was testing (unfortunately I do not recall details, but it was not remotely helpful for flagging identical matches)

Subject: Well Done
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 15, 2012 at 12:31 PM
Message: great article

Subject: There IS a performance hit with VARCHAR(MAX)
Posted by: Jeff Moden (view profile)
Posted on: Wednesday, February 15, 2012 at 12:34 PM
Message: Actually, there IS a performance hit with VARCHAR(MAX) for many things including doing delimited splits and the like. Even if the MAX data is <8000 characters, the code will run at least twice as slow because SQL Server really doesn't like joining to the MAX datatypes even if it's just a Tally table or Tally cte.

Subject: Compression
Posted by: Martin Smith (view profile)
Posted on: Wednesday, January 02, 2013 at 5:59 AM
Message: The section on compression seems somewhat confusing. To be clear row compression has no effect on varchar(x) stored either in row or off row either (see http://msdn.microsoft.com/en-us/library/cc280576.aspx). It is true that if the strings are greater than 8,000 characters they will necessarily be stored off row but I'm not sure what solution you propose to that. Split them up into multiple rows?

Subject: Performance issue with 0 rows?
Posted by: lewisar (view profile)
Posted on: Tuesday, November 19, 2013 at 5:53 AM
Message: I came across this discussion as a result of searching for anyone experiencing this bizarre and unexpected behaviour... and I'm just wondering whether anyone here has come across this?

I am inserting using a statement of the form

INSERT INTO A columnlist SELECT column list FROM B JOIN C WHERE B.X NOT IN (SELECT X FROM A)

So just inserting into a table only if the data doesn't already exist.

Basically it worked fine until the situation where the SELECT returned 0 rows - then it would take up to a minute to complete, even with no other activity on the whole db server.

I took all the indexes off table A and it was still the same. So I started inserting data one column at a time, then 2 columns and so on, and it was fine until it hit the first VARCHAR(MAX) column - and suddenly it took a minute to insert ZERO rows!

I have now changed all VARCHAR(MAX) columns to VARCHAR(N) and the problem has gone away!

If it was actually inserting data it was fine performance-wise, it was only when inserting 0 rows it ground to a halt, and it was definitely the VARCHAR(MAX) columns which killed it.

Any ideas anyone?

Ade.

 
