Robert Sheldon

Bulk Inserts via TSQL in SQL Server

31 January 2011

The trouble with using BCP for reading data into SQL Server is that BCP is a command-line program. Fortunately, the same input functionality is available from TSQL within SQL Server. In fact there are two ways, and it is not always obvious which to use. Rob Sheldon explains...

In a past article on the bcp command-line utility, I demonstrated how to use the utility to bulk load external data into a SQL Server database. The advantage of using bcp is that you can quickly and easily import data from a Windows command-line process without having to open an Integrated Development Environment (IDE) such as SQL Server Management Studio, or use SQLCMD, to issue a Transact-SQL statement. However, there might be times when you want to use Transact-SQL to bulk load data, in which case you can use one of two statements supported by SQL Server:

  • BULK INSERT—available in SQL Server 2005 and 2008
  • INSERT…SELECT * FROM OPENROWSET(BULK…)— available in SQL Server 2005 and 2008.

In this article, I show you how to use both statements to bulk import external data, and then I discuss some of the performance considerations to take into account when using these methods. I’ve included a number of examples, which I created on a local instance of SQL Server 2008 (using the AdventureWorks2008 sample database). To develop these examples, I first used the following Transact-SQL to create the EmployeeDB database and the dbo.Employees table:

USE master;
IF DB_ID('EmployeeDB') IS NOT NULL
  DROP DATABASE EmployeeDB;
CREATE DATABASE EmployeeDB;
GO

USE EmployeeDB;
IF OBJECT_ID ('Employees', 'U') IS NOT NULL
  DROP TABLE dbo.Employees;

CREATE TABLE dbo.Employees
(
  EmployeeID INT NOT NULL,
  FirstName NVARCHAR(50) NOT NULL,
  LastName NVARCHAR(50) NOT NULL,
  JobTitle NVARCHAR(50) NOT NULL,
  City NVARCHAR(30) NOT NULL,
  StateProvince NVARCHAR(50) NOT NULL,
  CountryRegion NVARCHAR(50) NOT NULL,
  CONSTRAINT PK_Employees PRIMARY KEY CLUSTERED (EmployeeID ASC)
);

In addition to creating the Employees table, I used the bcp utility to export data from the vEmployee view in the AdventureWorks2008 database. I saved the data to a folder on my local drive (C:\Data\). I then created format files based on the newly created Employees table.

The first bcp command I ran exports the data as character data to a file named EmployeeData_c.dat:

bcp "SELECT BusinessEntityID, FirstName, LastName, JobTitle, City, StateProvinceName, CountryRegionName FROM AdventureWorks2008.HumanResources.vEmployee ORDER BY BusinessEntityID" queryout C:\Data\EmployeeData_c.dat -c -t, -r \r\n -S localhost\SqlSrv2008 -T

(For these commands, you’ll need to change the location of the file and the name of the server. You may also need to change ‘AdventureWorks2008’ to the current name of the sample database: the BusinessEntityID column, introduced for Entity Framework, has now been changed back to EmployeeID, so you might need to rename it too.)

The second bcp command creates a format file named EmployeeFormat_c.fmt, based on character data:

bcp EmployeeDB.dbo.Employees format nul -c -t, -r \r\n -f C:\Data\EmployeeFormat_c.fmt -S localhost\SqlSrv2008 -T

 Next I exported the data in the wide native format and saved the data to the EmployeeData_n.dat file:

bcp "SELECT BusinessEntityID, FirstName, LastName, JobTitle, City, StateProvinceName, CountryRegionName FROM AdventureWorks2008.HumanResources.vEmployee ORDER BY BusinessEntityID" queryout C:\Data\EmployeeData_n.dat -N -S localhost\SqlSrv2008 -T

Finally, I created a format file based on the wide native format and named the file EmployeeFormat_n.fmt:

bcp EmployeeDB.dbo.Employees format nul -N -f C:\Data\EmployeeFormat_n.fmt -S localhost\SqlSrv2008 -T

You now have all the setup information you need to try out the examples in this article. Even if you don’t run the examples, the Transact-SQL and bcp commands will at least help you better understand how the external data was prepared in order to import that data back into the database.

NOTE: For most of the examples, you’ll need to truncate the Employees table after each bulk load to prevent primary key errors when loading the data. This is because you’re loading the same source data into the same target table for each example.
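For example, you can run a statement such as the following between examples to clear the table:

-- Clear the target table so the next bulk load doesn't violate the primary key
TRUNCATE TABLE EmployeeDB.dbo.Employees;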

BULK INSERT

The first statement we’ll look at is BULK INSERT, which lets you import data from a data file into a table or view. You can specify the format of the imported data, based on how that data is stored in the file. In the following example, I import the data from the EmployeeData_c.dat file into the Employees table:

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_c.dat'
WITH
  (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\r\n'
  );

Notice that I begin the statement by first specifying the BULK INSERT keywords, followed by the name of the target table, that is, the table in which the data will be loaded. In the next line, I define a FROM clause that includes the full path to the source data file.

The last part of the statement is the optional WITH clause, which is where you define any additional arguments, such as those related to formatting. In this case, I specify that the data file type is char data, which means that the data has been saved in the character format. Next, I specify that each field is terminated with a comma and each row is terminated with a carriage return/line feed (\r\n).
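After running the statement, a quick count against the target table confirms that the rows arrived:

-- Verify the number of rows loaded into the target table
SELECT COUNT(*) AS RowsLoaded
FROM EmployeeDB.dbo.Employees;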

That’s all there is to using the BULK INSERT statement to import data from a file. In some cases, however, you might want to use a format file when importing your data. In such situations, rather than specifying the formatting details in the WITH clause, you specify the name of the format file, as shown in the following example:

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_c.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_c.fmt'
  );

As you can see, when you use a format file to import data, you don’t need to know the format of the source data; you need only know the location of the format file.
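If you’re curious what such a format file contains, the non-XML format file that bcp generates for the character export looks something like the following. This is an illustrative sketch only: the exact field lengths, spacing, and collation names depend on your table definition and server collation.

10.0
7
1   SQLCHAR   0   12    ","      1   EmployeeID      ""
2   SQLCHAR   0   100   ","      2   FirstName       SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   100   ","      3   LastName        SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   100   ","      4   JobTitle        SQL_Latin1_General_CP1_CI_AS
5   SQLCHAR   0   60    ","      5   City            SQL_Latin1_General_CP1_CI_AS
6   SQLCHAR   0   100   ","      6   StateProvince   SQL_Latin1_General_CP1_CI_AS
7   SQLCHAR   0   100   "\r\n"   7   CountryRegion   SQL_Latin1_General_CP1_CI_AS

Each row maps a field in the data file (host data type, prefix length, field length, terminator) to a column in the target table.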

In the previous examples, I imported data that had been saved in the character format. However, when possible, you should import your data in native format to help make your bulk load more efficient. In the following example, I specify the data file that’s been saved in the wide native format:

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    DATAFILETYPE = 'widenative'
  );

As you can see, I specified only one argument in the WITH clause and used the widenative option to indicate that the source data is stored with native data types, except char, varchar, and text columns, which are stored as Unicode.

I can also use a format file to import data saved in the wide native format, as shown in the following example:

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt'
  );

In this case, my only argument in the WITH clause is the location of the format file, just like you saw when I used a format file to import character data.

That’s all there is to using the BULK INSERT statement to bulk load data into a SQL Server database. There are, of course, additional arguments you can specify in the WITH clause, some of which I cover later in the article. However, for a description of all arguments available to the WITH clause, check out the topic “BULK INSERT (Transact-SQL)” in SQL Server Books Online.
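To give a flavor of those additional arguments, the following variation on the first example is a sketch that loads the character file in batches of 100 rows, tolerates up to 10 rejected rows, and logs any rejected rows to an error file (the error-file path and the values shown are purely illustrative):

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_c.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_c.fmt',
    BATCHSIZE = 100,        -- commit the load in batches of 100 rows
    MAXERRORS = 10,         -- abort the load after 10 rejected rows
    ERRORFILE = 'C:\Data\EmployeeData_errors.log'  -- rejected rows are written here
  );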

INSERT…SELECT

Starting with SQL Server 2005, the OPENROWSET function supports bulk operations through the use of the BULK provider. You can use the function and provider within an INSERT…SELECT statement to bulk load data from a data file into a table or view, just like you saw with the BULK INSERT statement. In the following INSERT…SELECT statement, I call the OPENROWSET function and BULK provider in order to import data from the EmployeeData_c.dat file:

INSERT INTO
  EmployeeDB.dbo.Employees
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_c.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_c.fmt'
  ) AS e;

I start the statement as I would any other INSERT…SELECT statement: I first specify the INSERT INTO keywords, followed by the name of the target table. I then define the SELECT statement.

Within the SELECT statement, I use the OPENROWSET function in the FROM clause. Notice that in the function’s first argument I invoke the BULK provider, followed by the name of the source file. For the second argument, I specify the path to the format file. When you use OPENROWSET to bulk load data, you must specify a format file as the second argument. Unlike the BULK INSERT statement, you cannot simply specify formatting information.

Another important point to note about the example above is that I provide an alias (e) for the output of the OPENROWSET function. The alias is required in order to use the function in this way.

In the example above, I import data that’s been saved in the character format. However, I can also use INSERT…SELECT to import data saved in the native format, as shown in the following example:

INSERT INTO
  EmployeeDB.dbo.Employees
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt') AS e;

As you can see, there’s little difference between this statement and the preceding one. In the case of the latter, I simply specify the source data file that was saved in the native format and the format file that works with the native data. For more details about using the OPENROWSET function in your statements, see the topic “OPENROWSET (Transact-SQL)” in SQL Server Books Online.
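One practical difference between the two approaches is worth noting: because OPENROWSET(BULK…) appears in an ordinary SELECT, you can filter or reshape the rows as they are loaded, which BULK INSERT cannot do. The following statement is a sketch of that idea; the column names come from the format file, and the filter value is just an example:

INSERT INTO EmployeeDB.dbo.Employees
SELECT e.*
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_c.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_c.fmt') AS e
WHERE e.CountryRegion = 'United States';  -- load only one country's rows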

Performance

When bulk importing data into a SQL Server database, there are several steps you can take to maximize performance. One of these steps I mentioned earlier: when possible, work with data in SQL Server native format. Native format retains SQL Server’s native data types. Even if you use wide native format (preserving the Unicode character data), as I did in several of the examples above, you’re still taking advantage of the non-character native data types.

Of course, you can’t always control whether the source file is in native or wide native format, but there are other steps you can take to maximize performance. In this section, we’ll look at several of those steps and review examples that demonstrate how to implement them.

NOTE: Microsoft documentation on bulk loading operations can often be confusing, especially when trying to understand the BULK INSERT and INSERT…SELECT statements—and the differences between them. In researching these statements, you might find contradictory or inaccurate information, or information that simply doesn’t make sense. Whenever you’re developing a bulk load solution, your best bet is to test different scenarios and see for yourself what works best.

Table Locking

One way you can help to improve performance in your bulk load operations is to specify table locking for the duration of the data load. You can do this by including the TABLOCK argument with your statement. The argument issues a bulk update (BU) lock on the table. In the following BULK INSERT statement, I’ve included the TABLOCK argument in the WITH clause:

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
    TABLOCK
  );

Notice that I simply specify the TABLOCK keyword in the WITH clause. However, to use the TABLOCK option in an INSERT…SELECT statement, you must include it as part of the INSERT clause, as shown in the following example:

INSERT INTO
  EmployeeDB.dbo.Employees WITH(TABLOCK)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt') AS e;

As you can see, I’ve simply added a WITH subclause, with the TABLOCK hint enclosed in parentheses. Now when I run either of these statements, a BU lock will be issued on the table.

The TABLOCK argument can be particularly useful if you want to perform parallel data loads into a single table. However, you should use TABLOCK for parallel operations only if the table is unindexed (is a heap). For nonparallel operations, TABLOCK is useful whether or not the table is indexed. The lock prevents operations that are not related to bulk loading data from accessing the table.
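If you want to see which lock the load actually takes, you can query sys.dm_tran_locks from a second session while the statement is running. With TABLOCK on a heap you would expect to see a BU request on the OBJECT resource. This is just a monitoring sketch; adjust the database name as needed:

-- Run from another session while the bulk load is in progress
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('EmployeeDB')
  AND resource_type = 'OBJECT';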

Minimal Logging

Another method you can use to maximize performance is to minimally log the operation when bulk loading data. Minimal logging can make bulk load operations more efficient and minimize the risk that the transaction log will fill up. To minimally log a bulk load operation, the TABLOCK option must be specified and the table must not be replicated. In addition, the operation can be minimally logged only under one of the following conditions:

  • If the table has no indexes, the data pages can be minimally logged.
  • If the table has no clustered indexes, has nonclustered indexes, and is empty, data pages and index pages can be minimally logged.
  • If the table has no clustered indexes, has nonclustered indexes, and has data, data pages can be minimally logged but index pages cannot.
  • If the table has a clustered index but is empty, data pages and index pages can be minimally logged. (Both types of pages are fully logged whenever the table contains data.)

To support minimal logging, you must change the database’s recovery model to BULK_LOGGED. You should then change the recovery model back to its original state as soon as the bulk load operation is over. In the following example, I set the recovery model before and after I issue a BULK INSERT statement:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
    TABLOCK
  );

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

Notice that I issue an ALTER DATABASE statement before and after my BULK INSERT statement. Each ALTER DATABASE statement includes a SET RECOVERY clause; the first sets the recovery model to BULK_LOGGED, and the second sets it back to FULL.

You can do the same thing when you issue an INSERT…SELECT statement, as shown in the following example:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

INSERT INTO
  EmployeeDB.dbo.Employees WITH(TABLOCK)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt') AS e;

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

Note that, if you bulk load data in batches, the table is considered empty only at the start of the first batch, which means that minimal logging will apply to only the first batch. For this reason, it is better to bulk load your data in a single batch, when possible. If it’s not possible, then you should maintain a close watch on your transaction logs. Again, you want to be sure to test different scenarios when preparing to bulk load data.
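A simple way to keep that watch on the log is to check log usage before and after the operation; for example:

-- Report log size and percentage used for every database on the instance
DBCC SQLPERF(LOGSPACE);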

Dropping Indexes

Bulk load operations can be affected by the indexes defined on a table, not only with regard to minimally logging your operations, but also when you’re importing a large amount of data compared to the amount already in the table. In such cases, it is sometimes useful to drop the indexes before you load the data, and then re-create those indexes after the operation has completed.

Dropping indexes can significantly improve the performance of a bulk load operation. However, it’s not always a clear-cut decision when to drop them. If you’re importing a small amount of data compared to the amount of data in the table, then you probably shouldn’t drop the indexes; otherwise, you’ll spend more time rebuilding the indexes than it takes to load the data. However, when the situation is reversed, you’ll often benefit from dropping the indexes.

In the following example, I first include an ALTER TABLE statement that drops the primary key constraint (and its associated clustered index) and then include a second ALTER TABLE statement that re-creates the constraint and its index:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

IF OBJECT_ID ('PK_Employees', 'PK') IS NOT NULL
ALTER TABLE Employees
DROP CONSTRAINT PK_Employees;

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
    TABLOCK
  );

ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY CLUSTERED (EmployeeID);

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

As you can see, I drop the clustered index, run the BULK INSERT statement, and then re-create the clustered index. I take the same approach in the next example, except that I issue an INSERT…SELECT statement:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

IF OBJECT_ID ('PK_Employees', 'PK') IS NOT NULL
ALTER TABLE Employees
DROP CONSTRAINT PK_Employees;

INSERT INTO
  EmployeeDB.dbo.Employees WITH(TABLOCK)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt') AS e;

ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY CLUSTERED (EmployeeID);

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

If a table has an index other than one associated with a primary key or unique constraint, you should use the DROP INDEX statement to remove the index and then use the CREATE INDEX statement to re-create the index.
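For example, if the Employees table had a nonclustered index on LastName (a hypothetical index, purely for illustration), the pattern would look like this:

-- Drop the hypothetical nonclustered index before the bulk load...
DROP INDEX IX_Employees_LastName ON dbo.Employees;

-- ...run the BULK INSERT or INSERT...SELECT statement here...

-- ...and re-create the index once the load has completed
CREATE NONCLUSTERED INDEX IX_Employees_LastName
ON dbo.Employees (LastName, FirstName);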

Ordering

By default, SQL Server assumes that the source data is unordered when you perform a bulk load operation. However, in some cases, you can improve the performance of that operation by specifying the sort order of the imported data. But you can do this only if the target table has a clustered index and the source data file is sorted in the same way as defined by the index. If these conditions are met, you can add an ORDER option to your bulk load statement to help increase the speed of the operation. The ORDER option specifies the sort order of the incoming data. The columns listed along with that option should match the columns as defined in the clustered index and be in the same order.

In the following BULK INSERT statement, I’ve added an ORDER option to the WITH clause and specified the EmployeeID column as the sort column:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
    TABLOCK,
    ORDER (EmployeeID ASC)
  );

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

The EmployeeID column, as it’s listed with the ORDER option, matches how the clustered index is defined on the Employees table. Because the data in the source file matches the clustered index, I’m able to use the sort order to improve the performance of the bulk load operation, without having to drop the index.

I can do the same thing with the INSERT…SELECT statement, as shown in the following example:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

INSERT INTO
  EmployeeDB.dbo.Employees WITH(TABLOCK)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
  ORDER (EmployeeID ASC)
  ) AS e;

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

Notice that I’ve included the ORDER option in the FROM clause and, as with the BULK INSERT statement, specified the EmployeeID as the sort column.

There’s one other action you can take with an INSERT…SELECT statement when bulk loading data into a table with a clustered index. You can turn on trace flag 610, which controls minimal logging of a bulk load operation. Under certain circumstances, you can achieve minimal logging when inserting data into an indexed table. For instance, when a new page is allocated, all the rows added sequentially to that page are minimally logged, which can lead to improved performance.

In the following example, I turn on trace flag 610, issue the INSERT…SELECT statement, and then turn off the trace flag:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

DBCC TRACEON(610);

INSERT INTO
  EmployeeDB.dbo.Employees WITH(TABLOCK)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
  ORDER (EmployeeID ASC)
  ) AS e;

DBCC TRACEOFF(610);

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

As you can see, I simply issued a DBCC statement and specified the TRACEON option to turn on the trace flag, and then issued a DBCC statement and specified the TRACEOFF option to turn off the trace flag.

Constraints and Triggers

When running BULK INSERT and INSERT…SELECT statements, you can control whether check constraints and triggers are enforced during the bulk load operation. By disabling the constraints and triggers, you can see an improvement in the bulk load operations. However, if you do, you must be sure to follow up on any critical operations the constraints and triggers perform, after you’ve completed your bulk load operation.

When it comes to check constraints and triggers, the default behavior of the BULK INSERT and INSERT…SELECT statements is different:

  • By default, the BULK INSERT statement ignores constraints and disables triggers.
  • By default, the INSERT…SELECT statement executes check constraints and triggers.

For both statements, you can override the default behavior. For instance, in the following BULK INSERT statement, I override the default behavior in order to run the check constraints and triggers:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

IF OBJECT_ID ('PK_Employees', 'PK') IS NOT NULL
ALTER TABLE Employees
DROP CONSTRAINT PK_Employees;

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_n.dat'
WITH
  (
    FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt',
    TABLOCK,
    CHECK_CONSTRAINTS,
    FIRE_TRIGGERS
  );

ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY CLUSTERED (EmployeeID);

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

Notice that I’ve added the CHECK_CONSTRAINTS and FIRE_TRIGGERS arguments to the WITH clause. Now when I use the BULK INSERT statement to load data, the check constraints and triggers will run, which can negatively affect the performance of the bulk load operation, but it does ensure that the constraint and trigger operations are performed.

I take an opposite approach in the following INSERT…SELECT statement by specifying that constraints and triggers be ignored:

ALTER DATABASE EmployeeDB
SET RECOVERY BULK_LOGGED;

IF OBJECT_ID ('PK_Employees', 'PK') IS NOT NULL
ALTER TABLE Employees
DROP CONSTRAINT PK_Employees;

INSERT INTO
  EmployeeDB.dbo.Employees
  WITH(TABLOCK, IGNORE_CONSTRAINTS, IGNORE_TRIGGERS)
SELECT *
FROM
  OPENROWSET(BULK 'C:\Data\EmployeeData_n.dat',
  FORMATFILE = 'C:\Data\EmployeeFormat_n.fmt') AS e;

ALTER TABLE Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY CLUSTERED (EmployeeID);

ALTER DATABASE EmployeeDB
SET RECOVERY FULL;

Notice that I’ve added the IGNORE_CONSTRAINTS and IGNORE_TRIGGERS arguments to the WITH subclause of the INSERT clause. Now check constraints and triggers will be ignored. However, whenever ignoring constraints and triggers, be sure that the time and effort it takes to revalidate the data and take any other necessary actions doesn’t outweigh the gains you made during the bulk load operation.
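One follow-up step worth knowing about: when constraints are skipped during a bulk load, SQL Server marks them as not trusted. You can revalidate them afterward with something like the following (a sketch only; the Employees table in these examples doesn’t actually define any check or foreign key constraints):

-- Revalidate all check and foreign key constraints against the existing data
ALTER TABLE dbo.Employees WITH CHECK CHECK CONSTRAINT ALL;

-- List check constraints that are still untrusted (foreign keys are listed in sys.foreign_keys)
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('dbo.Employees');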

Conclusion

As you can see, trying to determine the best way to bulk load data can be a complicated process. It often takes testing various scenarios to come up with the best approach. In addition, it’s not always clear, based on Microsoft documentation, whether to use BULK INSERT or INSERT…SELECT * FROM OPENROWSET(BULK…) when importing external data. Some information indicates that INSERT…SELECT will not support concurrent bulk load operations into a single table. (I haven’t tested this scenario.) It also appears that INSERT…SELECT doesn’t let you specify multiple batches in a single operation (no BATCHSIZE option). Even so, for most bulk load operations, you’ll probably do fine with either statement, as long as you take other steps to improve performance. You’ll have to take each operation on a case-by-case basis and determine what works best for your specific circumstances.

 Appendix

Here are the railroad diagrams for Bulk Insert and OpenRowSet. The PDF versions are attached as files to this article. For completeness, the OpenRowSet diagram shows all the available syntax as well as the BULK functionality.

BULK INSERT

OpenRowSet

Robert Sheldon

Author profile:

After being dropped 35 feet from a helicopter and spending the next year recovering, Robert Sheldon left the Colorado Rockies and emergency rescue work to pursue safer and less painful interests—thus his entry into the world of technology. He is now a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. He has also written news stories, feature articles, restaurant reviews, legal summaries, and the novel 'Dancing the River Lightly'. You can find more information at http://www.rhsheldon.com.


Subject: Where's the BULK EXPORT?
Posted by: Paul Hunter (view profile)
Posted on: Wednesday, February 02, 2011 at 6:35 AM
Message: I keep BCP around (and therefore xp_cmdshell enabled) because there is no simple T-SQL export mechanism. SSIS can be used, but if all you're trying to do is export the contents of a table/view then SSIS is a bit like using a sledge hammer to kill a fly. If Microsoft can create a routine with access to the file system (BULK INSERT) then why not use a similar metaphor for a BULK EXPORT?

Subject: PDF versions
Posted by: Anonymous SQL Guy (not signed in)
Posted on: Wednesday, February 02, 2011 at 7:01 AM
Message: Both attachments are BULK INSERT, please post OPENROWSET

Subject: re: PDF version
Posted by: Andrew Clarke (view profile)
Posted on: Wednesday, February 02, 2011 at 7:30 AM
Message: Now fixed!  Heaven only knows how that happened!

Subject: Great!
Posted by: Jeanine (not signed in)
Posted on: Wednesday, February 02, 2011 at 7:35 AM
Message: One of the best learning aids I've seen yet on bcp!

Subject: Thank you!
Posted by: Andre Guerreiro (not signed in)
Posted on: Wednesday, February 02, 2011 at 6:52 PM
Message: I will bookmark this article and read it as often as I can to assimilate its content. Bravo. Very detailed explanations.

One question: on the first code of the article, why do you need to check if the "Employees" table already exists even after you have dropped the database to which it belongs in the previous commands?

Thanks a lot for such a fine contribution. Now I'm going back to study. :)

Regards,

Andre Guerreiro Neto
MCP-MCTS SQL Server 2008 Database Development
www.novosis.com.br

Subject: Railroad Diagrams
Posted by: Andrew Freeman (not signed in)
Posted on: Monday, February 07, 2011 at 1:50 AM
Message: Great Article!
I find the railroad diagrams you use to illustrate your articles very easy to read:
Do you generate them yourself? If you do, do you generate them by hand or do you use a tool. I'd love to have a complete set for all SQL server commands, is this available anywhere?

Regards
Andrew

Subject: Very Good Article..
Posted by: Sreyas MN (not signed in)
Posted on: Monday, February 07, 2011 at 3:48 AM
Message: Hi

It is an amazing article...Great.I am waiting more and more article like this

Thanks and Regards
Sreyas MN

Subject: Excellent Article
Posted by: Hank Freeman - Senior SQL DBA/Architect (not signed in)
Posted on: Monday, February 07, 2011 at 5:05 AM
Message: Robert,

You have taken the time to work with this arduous subject matter that will befuddle less senior technicians. Meaning your article is really helpful for most and hopefully the comment of others and mine below can bring in another real-world set of constraints that you did not discuss for it may have been out of scope. The following is my concise synopsis of the problem I had to solve, for which your instructions in the article are absolute SPOT ON.

In my specific case, I had 146 million and 196 million rows of data to load into a table for subsequent updating of a master table in a SEPARATE database for separate sequential load and update requirements. The word SEPARATE is key here, for I did this so that I could do the work in the separate database and set the recovery mode to SIMPLE and keep it that way and never reset it to FULL. Meaning, I knew what I was doing and I did not need to create log files that would be 1.4 terabytes in size for just the BULK INSERT alone, not even taking into consideration the updates.

The whole concept of truncating the target table and then loading it was complicated by so many foreign key constraints on the table. Additionally, the whole process was incredibly CPU (93% utilization) and disk I/O (95%) bound for over 30 minutes. So to solve the problem I had to perform the BULK INSERT on an ancillary server (a test server) so as to never interfere with the production server’s performance.

In short I used BULK INSERT as you have discussed above with my separate database on a separate server to nullify the performance issue. The update was complicated as well, for I had to efficiently copy the information from the production server (141 million rows), update the copy on the separate server and then reverse direction and update the production server... Needless to say there was a whole bunch of table locking going on at the right time.

Additionally, I had to use dreaded Linked Server to get the data and update it… This required some simple select into statements to get the table from the production server and then another from the production server to put the update table back as a separate table so that an efficient update between the Updated table could be performed on the production table at the production server. For those reading these post, NEVER EVER perform an update on a large table using LINKED Servers. You may lose your job because you locked out the production or target system during the operation for an absurd amount of time.

Bottom line I had to test every part of this operation and tweak it for the best performance and then I isolated the BULK INSERT operations to avoid the assault on the production server. This was accomplished for the final update of millions of rows of data was performed on the production table With UPDLOCK in less than one minute.
I hope this help those who read the article and these subsequent post.
Hank Freeman - Metro Atlanta, GA

Hfreeman@msn.com
End

Subject: Where's the BULK EXPORT?
Posted by: Anon (not signed in)
Posted on: Monday, February 07, 2011 at 6:00 AM
Message: I can support Paul Hunter... how about a TSQL bulk export.

I wrote one ages ago using DMO, specifically SQLDMO.BulkCopy, which would be SMO today. I'm sure that I saw a discussion of using OPENROWSET for exporting tables without the need for either xp_cmdshell or for sp_OACreate.

Is the use of OPENROWSET for exporting data possible? Can anyone demonstrate it?

Thanks in advance.

Subject: Export yes, but no BULK export.
Posted by: Phil Factor (view profile)
Posted on: Monday, February 07, 2011 at 6:41 AM
Message: It is certainly possible to export data from a table with OPENROWSET, but not using the BULK feature (AFAIK).
Here is an example from an old article that Robyn and I wrote four years ago!
SQL Server Excel Workbench

UPDATE OPENROWSET('Microsoft.Jet.OLEDB.4.0',
  'Excel 8.0;DATABASE=c:\CambridgePubs.xls',
  'Select * from CambridgePubs')
SET Address = '34 Glemsford Road'
WHERE Address = '65 Cavendish Road'

INSERT INTO OPENROWSET ('Microsoft.Jet.OLEDB.4.0',
  'Excel 8.0;DATABASE=c:\CambridgePubs.xls',
  'Select * from CambridgePubs')
  (Pubname, Address, Postcode)
SELECT 'The Bull', 'Antioch Road', 'CB2 5TY'


for a few other techniques for exporting stuff, try
The TSQL of Text files


Subject: INSERT INTO SELECT ... with parallel updates.
Posted by: gooram (view profile)
Posted on: Wednesday, August 31, 2011 at 11:25 PM
Message: According to Microsoft (and as you say their documentation can be less than succinct), using the WITH (TABLOCK) hint in an INSERT INTO SELECT ... statement, issues an exclusive lock (X) on the table and not a (BU) lock.

See this statement from http://msdn.microsoft.com/en-us/library/ms174335.aspx:

"Specifying the TABLOCK hint on a table that is the target of an INSERT statement has the same effect as specifying the TABLOCKX hint. An exclusive lock is taken on the table."

My testing does seem to produce different results when trying to do bulk inserts into the same table using multiple concurrent threads, but it's inconclusive.

Does anyone have any experience here they could share? Thanks

 
