I started using SQL Server Integration Services (SSIS) when I had a job that required me to move and manipulate data between files and other data sources. I did a bit of research using the resources available-Twitter, Simple-Talk, SQL Server Central, etc.-and concluded that SSIS was the right way to go. I had to get my head around it quite rapidly, but I couldn’t find help at the level I required. For that reason, I noted the points where I had struggled so that, once I’d learned more, I could help others who might otherwise struggle as I did.
In this article, the first of the “SSIS Basics” series, I go through the basics required for anyone starting out with SSIS, before he or she can venture off into more exotic uses for the tool. In subsequent articles, we’ll cover such topics as variables, for-each loops, and XML. If you’re already a regular SSIS user, this series is not for you!
What can you use SSIS for?
Essentially, SSIS can be used for any task related to files or data. You can, for example,
- Move and rename files
- Delete, update or insert data
- Execute SQL scripts or stored procedures
- Import and export data between different file types, such as Access, Excel, or any data source that supports an ODBC connection
These are, of course, only a few of the tasks you can perform in SSIS. As we work through this series, you’ll get a better sense of how extensive SSIS actually is.
SSIS is available only in SQL Server 2005 onwards. You create and develop SSIS in SQL Server Business Intelligence Development Studio (BIDS), a visual development tool based on Microsoft Visual Studio. (BIDS has morphed into SQL Server Data Tools (SSDT) in SQL Server 2012.)
Before going further, there is some terminology that’s important to understand. SSIS files are organized into packages, projects and solutions. The package is at the bottom of the hierarchy and contains the tasks necessary to perform the actual extract, transform, and load (ETL) operations. Each package is saved as a .dtsx file and is part of a project. You can include one or more packages in a project. That project, in turn, is part of a solution, which is at the top of the hierarchy. You can include one or more projects within a solution.
When you first open BIDS, you’re presented with the interface shown in Figure 1.
To create an SSIS package, point to the
File menu, point to
New, and click
Project. This launches the
Project dialog box, shown in Figure 2.
Project dialog box, select the
Project template. Then, provide a name for the project in the
Name text box. Next, in the
Location text box, specify the folder where your project files should be saved, and then provide a name for the solution in the
Name text box.
After you’ve entered the project and solution information, click
OK. Your new package will open in the SSIS window, as shown in Figure 3.
Notice that the SSIS interface is divided into the following five sections (windows):
Control Flow Items: The components necessary to control a package’s workflow. For example, the section includes components that let you move or copy data, run SQL statements, or send emails. (The components will be explained in more detail in this article and in articles that will follow.)
Connection Managers: The connections to your data sources (whether retrieving or loading data). Your data sources can include SQL Server databases, CSV files, Excel spreadsheets, and a variety of other sources.
Solution Explorer: A hierarchical view of the data sources, data source views, packages, and other components included within the current solution.
Properties: The properties and their values for the package or the selected component within that package.
SSIS Designer: The main working area for developing your SSIS package.
SSIS Designeris broken into four tabs:
Explorer. We’ll look at each of these in greater detail as we progress through this series.
Control Flow Items
In this article, I focus on setting up the SSIS package and defining the data connections. I do not cover all the components in the
Control Flow Items window. In the next article, I will demonstrate using, what I think is, the most important of these components-the
Data Flow Task-and cover other control flow tasks in subsequent articles
I will now explain how to create connection managers that connect to both Excel files and a SQL Server database. However, it is important to note that any connection created through the
Connection Manager window is available only to the package it is created in.
Connecting to an Excel File
One of the first steps you’ll often take when developing an SSIS package is to create the connection managers necessary to retrieve data from or load data into your data sources. You can also set up connections “on the fly,” so if you miss creating one here it can be done as part of other tasks. This approach is most commonly used when you wish to create a connection based on the source. For example, if you wish to copy data out of a SQL Server database and export it to an Excel spreadsheet, you can create the connection manager when you set up your data flow.
To add a connection manager, right-click the blank area in the
Manager window, where it says
Right-click here to add a new connection manager to the SSIS package, as shown in Figure 4.
This will launch a context menu that provides a number of options for creating various types of connections, as Figure 5 illustrates.
Notice you can create connections for such sources as OLE DB, ADO.NET, Analysis Services, and different types of files. In this case, we want to create a connection to an Excel file, so click the
Connection option. This will launch the
Editor dialog box, shown in Figure 6.
For this example, we’ll be connecting to an Excel file I created for demonstration purposes. Figure 7 shows the worksheet I set up in this file.
I named the Excel file
Employees.xlsx and saved it in the
Usage type drop-down list in the
File Connection Manager Editor dialog box, select
file. Next, click the
Browse button, navigate to the folder that contains the Excel file, and select the file. The dialog box should now look like the one shown in Figure 8.
Once you’ve selected the file, click
OK. The new connection manager will be added to the
Managers window and will be assigned the name of the file, as shown in Figure 9.
It is very easy to rename the connection manager to something that may be more appropriate. To do so, right-click the new connection manager and select
Rename from the context menu, as show in Figure 10
The name then becomes updateable and you can rename it to whatever you like. In this case, I renamed the connection manager
Employees (Excel), as shown in Figure 11.
When you view a connection manager in the
Managers window, you’ll see that each connection type is associated with a different icon. If you created an Excel connection from here, it is displayed with the same icon used for any flat file connection. However if you create an Excel connection when adding a component to the
Data Flow tab, the connection manager will display an Excel Icon.
Connecting to a SQL Server Table
Because our example will retrieve data from a SQL Server database, you’ll also need to create a connection manager for that database. To do so, you should again right-click the
Managers window to open the context menu, but this time, click the
Connection option. The
Configure OLE DB Connection Manager dialog box will appear, as shown in Figure 12.
OLE DB connections have already been defined on the package, they will appear in
connections list. You can use one of these, if it fits your needs, or you can create a new one. To create a new connection, click the
New button to launch the
Manager dialog box, shown in Figure 13.
To configure the connection manager, select the SQL Server instance from the
name drop-down list, and then select the authentication type. In this case, I selected the
Authentication option and provided a username and password. You might decide to select the
Authentication option, in which case your current Windows credentials will be used to establish the connection with SQL Server. In a later article, when we look at deploying the package, we will look at how the connections can be altered at run time and therefore how the login details can be changed then. For now, ensure that you set up the login the way you need it to run the package while you’re developing it.
Select or enter a database name drop-down list, select the name of the
AdventureWorks database. Your
Manager dialog box should now look similar to the one shown in Figure 14.
Be sure to click the
Connection button to verify that you can connect to the target database. The system will display a message similar to the one in Figure 15 to confirm whether you’ve successfully connected to the database.
After you’ve confirmed your connection, click
OK to close the message box, and then click
OK to close the
Manager dialog box. You will be returned to the
Manager dialog box, shown in Figure 16.
Notice that your new connection has been added to the
connections list. Click
OK to close the dialog box. The
Managers window will show your two connections. You’re now ready to start working with them.
Solution Explorer, you can view all projects, packages, data sources and data source views that make up the solution.
Adding New Projects
If you wish to add an additional project to a package, point to
File on the menu bar, point to
Add, and click
New Project, as shown in Figure 17.
Add New Project window opens. Select
Integration Services Project and in the Name box enter the name you wish to call the new project as shown in Figure 18.
As you can see in Figure 19 a new project is added to the solution named “Dev” and will appear in
Solution Explorer. The project will contain three empty folders named
Data Source Views and
Miscellaneous. The project will also contain a folder named
SSIS Packages and within the folder a file named
Package.dtsx, which is an empty SSIS package created automatically when the project is created. Figure 19 shows the new project and its folders in
Earlier I showed you how to create connections in the
Connection Managers window. As I mentioned, if a connection is created in the
Connection Managers window, it is available only to the package it was created in. However, you can also create connections known as data sources, which are available to all packages in a project.
To create a new data source, right-click
Data Sources in
Solution Explorer to open the
Connection Manager dialog box (shown in Figure 20). Then fill in the options as you did when you created an
OLE DB connection manager. Be sure to click
Test Connection to confirm the connection has been created successfully.
Data Source Wizard will appear, with the new data connection highlighted, as shown in Figure 21. After you review the settings, click
When the next page of the wizard appears, type in a name for the data source. As this is project wide, I would recommend you fully describe the source using the server and database name. I have renamed my data source
RGTest_AdventureWorks2008, as shown in Figure 22. I try to set up and follow consistent naming conventions.
After you’ve renamed the data source, click
Data Source should now be listed under
Data Sources in
Solution Explorer, as shown in Figure 23. Notice that the data source is saved with the .ds file extension to indicate that it is indeed a data source.
Initially, the new data source is not listed in your package’s
Connection Managers window; however, it is available to your package. Once you have made use of the data source in the package it will be visible in the
Connection Managers window.Data Source Views
Data source views, like data sources, are available to all packages in a project. A data source view is used to define a subset of a data from a data source. The data source view can include only some of the tables or it can be used to define relationships or calculated columns.
Because a data source view is based on a data source, you would normally create the data source before starting to create the data source view. However, this is not compulsory because you can create the data source when you’re creating the data source view. To create a data source view, right-click the
Data Source Views folder and click
New Data Source View, as shown in Figure 24.
Data Source View Wizard appears, click
Next. The next page shows the data sources available to the project, as shown in Figure 25. (In this case, there’s only one.)
As you can see, the page shows the name of the data source in the
Relational data sources list. The properties for the selected data source appear to the right, in the
Data source properties window. A data source must be selected before you can continue with the wizard. If you haven’t created the data source you need, you can create one now by clicking the
New Data Source button.
Once you’ve selected the necessary data source, click
Next. The new page provides a list of the tables and views available through the selected data source. If you wish to filter the list, type the filter criteria in the
Filter text box below the
Available Objects list, and then click the filter icon to the right of the text box. For example, I typed
Emp in the
Filter text box, which reduced the list of available objects to those that contain “Emp” in their name, as shown in Figure 26.
The next step is to determine which tables and views you want to include in your data source view. From the filtered list of tables and views in the
Available Objects list, select the objects you want to include. You can select more than one object by clicking the first one, holding down the Ctrl key, and then clicking the additional objects. Once you’ve selected the objects, click the single right arrow button to move those objects to the
Included Objects window. If you want to move all the listed objects, simply click the double right arrow button.
Once an object has been moved to the
Included Objects list, the single left arrow button and double left arrow button become active. These work the same as the right arrows. The single left arrow moves a single selected object or multiple selected objects from the
Included objects list back to the
Available objects list. The double left arrow moves all objects in the
Included objects list back to the
Available objects list.
Figure 27 shows the full list of available objects (without filters), minus the two objects that have been moved to the
Included objects list. Notice that two additional objects are selected in the
Available objects window. As you would expect, you can move the files to the
Included objects list by clicking the single right arrow button.
If you click the
Add Related Tables button beneath the
Included objects list, all tables related to the objects in the
Included objects list will be automatically added.
Once all required objects have been selected, click
Next. You can now see a preview of what you have selected, and you can rename the data source view to something more appropriate. If you have missed an object, click the
Back button to return to the previous page.
For this example, I renamed my data source view
AW2008-Employees. As you’re changing the name in the
Name textbox, the name is also updated in the
Preview window, as shown in Figure 28.
If you are happy with the configuration, click
Finish. The data source view is saved with the .dsv file extension and is added to the
Data Source Views folder in
Solution Explorer. A new window appears in
SSIS Designer and shows the data source view in design mode, as shown in Figure 29.
Amending a Data Source View
SSIS provides a number of options for modifying a data source view. Most of those options are at the table level. If you right-click the table name either on the design surface or in the
Tables pane (on the left side of the screen), you can choose from the following options:
Adding a calculation
Adding a relationship
Replacing a table
Deleting a table
Deleting an object
Suppose I added the
Store table in error. I can delete from table from my data source view by right-clicking the table name and selecting the
Delete table from DSV option, as shown in Figure 30.
You’ll then be prompted to confirm your deletion. When the
Delete Objects message box appears, click
OK, as shown in Figure 31.
When you click
OK, the object is permanently removed from the data source view.
Adding a new column
To add a calculated column to a data source view, right-click the table name and select
New Named Calculation to open the
Create Named Calculation dialog box. Enter the new column name in the
Column name text box, add an appropriate description in the
Description text box, if required, and then create the calculation in the
Expression text box. For this example, I’ve assigned the name
Age to the column and added the description
Current Age based on Birth Date. For the expression, I added the one shown in Figure 32. Note that, at this stage, there is no way to test whether your code is correct!
Figure 33 shows us that the
Age column has been added to our table. The icon next to the column shows that it is a calculated column.
To view the data in the table and verify that the new column has been created correctly, right-click one of the columns and then click
Explore Data, as shown in Figure 34.
Explore Employee Table window appears, as shown in Figure 35. We can now view all the data in the
Employee table. Notice that the
Age column has been added to the table (on the far right side) and displays the data returned by our expression.
Once you have made all the necessary changes, save the data source view. It will then be available for you to use in any of your packages in the project.
In this article, I’ve shown you how to create an SSIS package and set up connection managers, data sources, and data source views. In the next article, I will show you how to set up a package that retrieves data from a SQL Server database and loads it into an Excel file. I will also show you how to add a derived column that calculates the data to be inserted into the file. In addition, I will demonstrate how to run the package.
In future articles, I plan to show you how to deploy the package so it can be run as part of a scheduled job or called in other ways. I also plan to cover how to use variables and how they can be passed between tasks. I also aim to cover more control flow tasks and data flow components, including those that address conditional flow logic and for-each looping logic. There is much much more that can be done using SSIS, and I hope over the course of this series to cover as much information as possible.