The Conditional Split can route data rows to different outputs depending on whatever criteria of the data that you wish. It works rather like the SWITCH block in a procedural language. Annette shows how to add a conditional split to your data flow.
In previous articles in this SQL Server Integration Services (SSIS) series, you’ve seen how to add a
Data Flow task to your control flow and then add components to the data flow. In this article, you’ll learn how to add the
Conditional Split transformation to your data flow. The transformation lets you route your data flow to different outputs, based on criteria defined within the transformation’s editor.
The example I demonstrate in this article uses the
Conditional Split transformation to divide data retrieved from a table in a SQL Server database into two data flows so we can save the data into separate tables. We’ll use an
DB connection to retrieve the source data from the
Employee table in the
AdventureWorks database. We’ll then separate the data into two sets. The employees who started before 1 March 1999 will be loaded into the
PreMar1999 table, and those on or after that date will be loaded into the
PostMar1999 table. We’ll create these tables as we define our data flow.
Adding a Connection Manager to Our SSIS Package
Before we do anything else, we need to set up a connection manager to the database that contains our source data. To do this, right-click the
s window and then click
OLE DB Connection, as shown in Figure 1.
Manager dialog box appears, click
New. This launches the
Manager dialog box. In the
name drop-down list, select the name of the SQL Server instance that hosts the database containing the source data you want to use for this exercise. (Note that it should be some version of the
AdventureWorks database.) Then configure the authentication type you use to connect to the database. Finally, in the
database drop-down list, select the source database. On my system, I’m connecting to a SQL Server instance named
SQL Server Authentication as my authentication type, and connecting to the
AdventureWorks database, as shown in Figure 2.
After you set up your connection manager, click
Test Connection to ensure that you can properly connect to the database. If you receive a message confirming that the test connection succeeded, click
OK to close the
Manager dialog box. Your new connection manager should now be listed in the
Adding an OLE DB Source to Our Data Flow
Now that we’ve established a connection to our database, we can set up our data flow to retrieve data from the
Employee table and then split that data into two sets. That means we must first add a
Flow task to our control flow. To do so, drag the task from the
Control Flow Items section within the
Toolbox window to the control flow design surface, as shown in Figure 3.
After you’ve added the
Flow task, double-click the task to go to the
Flow tab. Once there, drag an
DB source from the
Toolbox window to the data flow design surface.
The next step is to configure the
OLE DB source. To do so, double-click the component to open the
OLE DB Source Editor. Because we already created an
OLE DB connection manager, it will be the default value in the
OLE DB connection m
anager drop-down list. That’s the connection manager we want. Next, ensure that the
view option is selected in the
mode drop-down list, and then select the
HumanResources.Employee table from the
view drop-down list. Your
Editor should now look similar to the one shown in Figure 4.
Now we need to select the columns we want to include in our data flow. Go to the
Columns page of the
Editor. You’ll see a list of available columns in the
Columns table, as shown in Figure 5. We don’t want to include all those columns, however. We want to include only those shown in the
Column list in the grid at the bottom of the page. To remove the unwanted columns, deselect them in the
Once your selected columns are correct, click
OK to close the
Adding a Conditional Split Transformation to the Data Flow
Now we need to add in the
Split transformation to the data flow. To do so, drag the transformation from the
Data Flow Transformations section of the
Toolbox window to the data flow design surface, beneath the
DB source, as shown in Figure 6.
We now need to connect the
DB source to the
Split transformation. Click the green arrow (the data flow path) at the bottom of the
DB source and drag it to the
Split transformation, as shown in Figure 7.
After we add the
Conditional Split transformation, we need to do something with it. As I mentioned previously, we’re going to use the transformation to divide the data flow into two separate paths.
To do this, double-click on the
Conditional Split transformation to open the
Conditional Split Transformation Editor. The editor is divided into three main windows, as shown in Figure 8.
Notice in Figure 8 that I’ve labeled the windows
3 to make them easier to explain. The windows provide the following functionality:
- Columns and variables we can use in our expressions that define how to split the data flow.
- Functions we can use in our expressions that define how to split the data flow.
- Conditions that define how to split the data flow. These need to be set in priority order; any rows that evaluate to true for one condition will not be available to the condition that follows.
For this example, we’ll define our condition based on the
e field in the data flow. So the first step to take is to expand the
Columns node in window 1 and drag the
HireDate field to the
Condition column of the grid in window 3. When you drag the file to the
Condition column, SSIS will populate the
Output Name columns, with all three columns written in red text as shown in Figure 9.
As you can see in Figure 9, SSIS populates the columns with default values. For the
Order column, stick with the default value of
1. For the
Output Name column, we’ll provide a name for our outbound data path. (We’ll be creating two data paths, based on our split condition.) On my system, I used the name
1999, so type that or another name into the
Name column. Now we must define the expression that determines how the condition will evaluate the data. This we do in the
As previously stated, we’ll split the data flow based on those employees who started before 1 March 1999 and those who started on or after that date. So we need to define two conditions. We’ve already been working on the first condition. To finish defining that one, we need to add a comparison operator to the expression in the
Condition column, in this case, the lesser-than (
<) symbol. There are two ways to add the operator. You can type it directly into the
Condition column (after the column name), or you can expand the
Operators node in section 2 and drag the less-than symbol to the
Next, we need to add a date for our comparison, which we add as a string literal, enclosed in double quotes. However, because we’re comparing the date to the
HireDate column, we need to convert the string to the
DT_DBDATE SSIS data type. Luckily, SSIS makes this very easy. We enclose the data type in parentheses and add it before the string value. Once we’ve done this, the equation in the
Condition column should look as follows:
HireDate < (DT_DBDATE)"01-Mar-1999"
When you include the data type in this way, SSIS automatically converts the string literal to the specified data type (assuming that the string conforms to the data type’s specifications).
There are two ways you can add a data type to your equations. Either type it directly into the
Condition column, or expand the
Casts node in window 2 and drag the data type to the
Condition column. Once you’ve defined your equation in the
Condition column, you’re first data path is ready to go.
We now need to define our second data path. To do so, follow the same procedures as described above, only this time, use the greater-than-or-equal-to (
>=) operator instead of the lesser-than one. Your second expression should be as follows:
HireDate >= (DT_DBDATE)"01-Mar-1999"
That’s all there is to it. Your
Editor should now look similar to the one shown in Figure 10.
Once you’re satisfied that you’ve configured the
Split transformation the way you want it, click
OK to close the
Adding Data Flow Destinations to the Data Flow
Now that we’ve split the data flow into multiple data paths, we now need to add a destination for each of those paths. So let’s start with the first data path. Drag an
DB destination from the
Data Flow Destinations section of the
Toolbox window to the data flow design surface, somewhere beneath the
Split transformation. Next, drag the green data path arrow from the
Split transformation to the
DB destination. When you connect the data path to the destination, the
Input Output Selection dialog box appears, as shown in Figure 11. The dialog box lets us choose which output we want to direct toward the selected destination.
Notice that the dialog box includes the
Output drop-down list. These are the data path outputs available from the
Split transformation. In this case, the drop-down list will include three options:
We’ve just set up the first two, so these should be self-explanatory. However, there is a third output data path,
Output, which captures any records that don’t meet the conditions defined in the first two outputs. In our example, there shouldn’t be any records in this category, but later we’ll configure this one anyway, just to demonstrate how it works.
For the first
DB destination, select the first option,
PreMar1999, and then click
OK. SSIS will assign the output name to the data path. We now need to configure the
DB destination. Double-click the component to open the
OLE DB Destination Editor. Next to the
Name of the table or the view drop-down list, click the New button. This launches the Create
Table dialogue box, which includes a
TABLE statement that defines a table based on the data flow, as shown in Figure 12. The only change you might want to make to the statement is to modify the table name. I renamed mine to
Once you’re satisfied with the
TABLE statement, click
OK to close the
TABLE dialog box. You should be returned to the
Manager page of the
OLE DB Destination Editor. For all other settings on this page, stick with the default values, as shown in Figure 13.
Now go to the
Mappings page of the
OLE DB Destination Editor to check that the columns have mapped correctly. You can also click the
Preview button on that page to ensure that the data looks as you would expect. At this stage, however, it’s likely to appear empty.
OK to close the
Next, you should consider renaming your
DB destination to avoid confusion down the road. On my system, I renamed the destination to
PreMar1999. To rename a component, right-click it, click
Rename, and then type in the new name. Renaming components makes it easier to distinguish one from the other and for other developers to better understand their purpose.
Once you’ve renamed your destination, you’re ready to set up the other destinations. Repeat the data flow configuration process for the
PostMar1999 data path and the default data path. You’ll notice that once a data path has been used, it’s no long available as an option.
When you’re finished configuring your
DB destinations and their data paths, your data flow should look similar to the one shown in Figure 14.
Now it’s time to test how it works. To do this, click the
Execute button (the green arrow) on the menu bar. If the package runs successfully, it should look similar to the data flow shown in Figure 15.
If all your data flow components turn green, you know that your package is probably running correctly. However, another important indicator is the number of rows that were loaded into each table. As Figure 15 indicates, the package inserted 136 rows into the
PreMar1999 table and 154 rows into the
PostMar1999 table. (These numbers could vary, depending on which version of the
AdventureWorks database you’re using.) As expected, no rows were passed along the default data path. If you were to check the source data (the
Employee table), you should come up with a matching number of records.
In this article, we created an SSIS package with a single data flow. We added a
Conditional Split transformation to the data flow in order split the data into multiple paths. We then directed each of those data paths to a different destination table. However, our destinations did not have to be SQL Server tables. They could have been flat files, spreadsheets or any other destination that SSIS supports. In future articles, we’ll use the
Conditional Split with other data flow components, such as the
Merge Join transformation, to insert, update and delete data.