Subscribe

RSS Feed (xml)

Powered By

Skin Design:
Free Blogger Skins

Powered by Blogger


Saturday, September 27, 2008

Pivot and UnPivot with SSIS

Introduction

The presentation of your data is required for easy analysis. Turning columns into rows and rows into columns is another way of presenting your data so that end-users can understand it easily. Other than presentation purposes, you need to transform your data within your data warehouse application into different data formats. This process can be referred to as ‘Pivoting’, and the reversing of this process can be referred to as ‘UnPivoting’.

In this article, we will discuss how you can use the Pivot and Unpivot processes in SQL Server Integration Service (SSIS).


Requirements

As this article will not tell you how to write SSIS packages, it is a requirement to have a fair amount of experience in creating SSIS packages. However, I will try to discuss details of creating SSIS packages wherever necessary without disturbing the main topics of the article. In addition, it is necessary to know how to write a query with joining three or more tables.

Software requirements include SQL Server 2005 installed with SQL Server Business Intelligence Development Studio. This article is written with the assumption that the Adventureworks database’s data is being used. It would be optimal for the reader to have an installed Adventureworks database along with a SQL Server 2005 database server.

The data file for this exercise can be found here .

Pivot

To understand what pivoting is, let’s see an example. Below illustrates the relationship in the Advenureworks database of Sales.SalesOrderHeader, Sales.SalesOrderOrder, Prodcution.Product and Production.ProductCategory tables.

From the above relationship, we can assume that we need following output.

Product

Qrt1

Qtr2

Qrt3

Qrt4

Accessories

870

411

Bikes

1167

1369

2844

2495

Components

2641

2966

6173

5253

It is known that you will not retrieve the above data set by a simple T-SQL query. However, we can build the following format with a simple T-SQL code.

Name

Qtr

OrderQty

Accessories

3

870

Accessories

4

411

Bikes

1

1167

Bikes

2

1369

Bikes

3

2844

Bikes

4

2495

Components

1

2641

Components

2

2966

Components

3

6173

Components

4

5253

Turning the above table into the required format is an example of “pivoting”.

Let’s start with the design package. First, add a new package after creating Integration Project.

Then add a Data flow task to control flow. Next, add an OLE DB connection and configure that OLE DB connection to the Adventureworks database.

Finally, add an OLE DB source to the added data flow. While we are going to design the SSIS package with minimum coding, we will use as many SSIS controls as possible. Thus, we will add following T-SQL to the added OLE DB Source.

SELECT PC.Name

,soh.OrderDate

,SOD.OrderQty

FROM Sales.SalesOrderDetail SOD

INNER

JOIN Sales.SalesOrderHeader SOH

ON SOH.SalesOrderID = SOD.SalesOrderID

INNER

JOIN Production.Product PROD

ON Prod.ProductID = SOD.ProductID

INNER JOIN production.ProductCategory PC

ON PROD. ProductSubcategoryID = PC.ProductCategoryID

After including the above code, the OLE DB Source will resemble the screenshot below.


We are going to add a condition split since we are doing this pivot for only a ‘year’. To improve the usability, I have included a variable name called ‘intYear’. Therefore, if you want to change the year, it is just a matter of changing the value of the variable ‘intYear’.

The following is a screenshot of the conditional split.


In the above case, the value of @User::intYear is set to 2001.



Next, we need to derive the Quarter. Even though we can modify the initial T-SQL to return the Quarter, I have used derive column data flow transformation task. The following expression is used to derive the Quarter.

MONTH(OrderDate) >= 1 && MONTH(OrderDate) <= 3 ? 1 : MONTH(OrderDate) >= 4 && MONTH(OrderDate) <= 6 ? 2 : MONTH(OrderDate) >= 7 && MONTH(OrderDate) <= 9 ? 3 : MONTH(OrderDate) >= 10 && MONTH(OrderDate) <= 12 ? 4 : 0

We now need to group the above data with Category and Quarter. We can use aggregate transformation and configure it to be grouped by Name and intQtr.


Next we need to add a sort transformation, and here I have used category to sort. We also need to sort the key column, otherwise pivot will not work properly. To see the data up to this point,you can add a data viewer.

Below is the scrennshot of the data set should be getting, which is the data set we need to pivot.


We have now reached the core part of this article- pivoting. For pivoting, there is a pivot transformation confirguration which is not exactly straight forward. At input tab of the pivot transformation, you need to select columns that you would use in the pivot operation, which in this case would be all three available columns.

The next most important tab is the ‘Input and Output’ properties tab, pictured below.


For input columns, we need to configure the pivot usage attribute.

Option

Description

0

The column is not pivoted, and the column values are passed through to the transformation output.

1

The column is part of the set key that identifies one or more rows as part of one set. All input rows with the same set key are combined into one output row.

2

The column is a pivot column. At least one column is created from each column value.

3

The values from this column are placed in columns that are created as a result of the pivot.

Source: Books on line, SQL Server 2005

According to the above table, Name column should be Option 1 , intQtr should have Option 2 and OrderQty should have Option 3 for pivot usage attribute value.



The next step is to configure the output columns.

By clicking Add column button, ProductCategory, Qtr1, Qtr2, Qtr3 and Qtr4 are added. We have to configure the SourceColumn and PivotKeyValue attributes for those four columns.

For the ProductCategory column, assign LineageID of the Name input column to the SourceColumn. In this sample, it is 1161. For the Qtr1 column, assign LineageID of the OrderQty input column to the SourceColumn, which is 1166 in this example. PivotKeyValue is the column value of the initial data set. For Qtr1, we need records which has the value 1 for intQtr. So the PivotKeyValue for Qtr1 is 1.

Similary, Qtr2 will have 1166 for the SourceColumn and 2 for the PivotKeyValue. Qtr3 will have 1166 for the SourceColumn and 3 for the PivotKeyValue. Qtr4 will have 1166 for the SourceColumn and 4 for the PivotKeyValue.

Here we have also added another data viewer after the pivot transformation. Following is the output of the data viewer, thus achiving our goal.

The following depicts the final layout for the package.


We can extend this package to view Months from Quarters. For this, we need to change the derive column transformation. The following shows how this can be done.

MONTH(OrderDate) == 1 ? "Jan" : MONTH(OrderDate) == 2 ? "Feb" : MONTH(OrderDate) == 3 ? "Mar" : MONTH(OrderDate) == 4 ? "Apr" : MONTH(OrderDate) == 5 ? "May" : MONTH(OrderDate) == 6 ? "Jun" : MONTH(OrderDate) == 7 ? "Jul" : MONTH(OrderDate) == 8 ? "Aug" : MONTH(OrderDate) == 9 ? "Sep" : MONTH(OrderDate) == 10 ? "Oct" : MONTH(OrderDate) == 11 ? "Nov" : MONTH(OrderDate) == 12 ? "Dec" : "UNK"

The sorting was done for the Month column in the Sort transformation.



The next key configuration is the Input and Output Configuration of the Pivot transformation. We will need to add all twelve columns for Month apart from the Product Category column.


The configuration is similar as in the first example. For example, the Jan column PivotKeyValue is Jan and the SourceColumn is LinegeID of OrderQty of InuputColumns.

Below depicts the final output of this package.


In both of these cases, we have used one column to pivot, which is ProductCategory. Now we will attempt to add two columns for pivoting - Year and Month.

We will need to add the Year column, which is YEAR(OrderDate) at derive column. Year column was included for Aggregate transformation. Sorting is done for Month and Year.

Let us see what we have to do for Pivot transformation. PivotUsage of intYear column is 1. Source column of Year output column LineageID of intYear input column.

Below depicts the output for the above package.


Unpivot

Now that we have reviewed pivoting, it is time to learn how to “Unpivot”. The unpivot transformation makes a de-normalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records within the same values in a single column. It is the exact the opposite of pivoting.

We can begin with our earlier data format, which is in a comma separated value (csv) format.

2002,Accessories,,,,,,,204,353,313,117,166,128

2003,Accessories,51,75,80,100,138,156,215,381,386,120,131,188

2004,Accessories,40,76,75,91,188,178,,,,,,

2001,Bikes,,,,,,,148,519,383,213,660,492

2002,Bikes,206,516,445,234,635,500,751,1088,1005,646,968,881

2003,Bikes,580,887,680,746,937,753,876,1312,1333,878,1234,1332

2004,Bikes,758,1084,1033,940,1332,1336,,,,,,

2003,Clothing,,,,,,,890,1474,1700,829,1254,1610

2004,Clothing,732,1018,1447,890,1301,1606,,,,,,

2001,Components,,,,,,,453,764,575,667,1258,1007

2002,Components,618,1161,862,738,1282,946,1865,2526,1782,1468,2229,1556

2003,Components,1176,2029,1286,1454,2283,1485,1225,1562,1311,1082,1436,1408

2004,Components,965,1374,1170,1217,1597,1379,,,,,,

You can get the above data in a text file name PivotedData.txt, which you will find with the sample package.



1 comment:

  1. It's really excellent blog, I just share your blog because it's really nice. Just look at this MSBI Online Training

    ReplyDelete

Recent Posts

Archives