Subscribe

RSS Feed (xml)

Powered By

Skin Design:
Free Blogger Skins

Powered by Blogger


Saturday, September 27, 2008

SSIS Usage of Checkpoint File

If you are an ETL application developer or administrator, one problem you may have faced is rerunning the entire ETL application in the event of failure. Rerunning the application from the point of failure is often not possible due to many reasons, but mainly due to the difficulty in implementing logging and maintaining each step of the application and configuring the execution based on the available log. SQL Server Integration Services provides a mechanism to automate the restarting process from the point of failure up to some extent. This mechanism has been provided using Checkpoint files. It helps us configure the ETL solution's execution flow in the event of a previous execution failure. This article explains the usage of checkpoint files with some tasks available with Integration Services.

This article Includes:

  • Setting up the checkpoint file
  • Checkpoint usage with for loop
  • Checkpoint usage with events
  • Other constraints

Setting up the checkpoint file

The configuration of the checkpoint file starts with setting up three properties of the package; CheckpointFileName, CheckpointUsage, SaveCheckpoints. These properties are available with the package properties, under Checkpoints category (See the properties with categorized view). CheckpointFileName allows us to set the path of the checkpoint file. Since the info is stored in an XML format, giving an xml extension to the file would be better though it is not required (eg. D:\Checkpoint Files/LoadDataPackage.xml).

The second property, CheckpointUsage tells Integration Services whether this package requires a checkpoint file or not. Three values are accepted; Never, IfExists and Always. Never indicates that the checkpoint file should not be used and execution of the package must be started (or restarted after failure) from the beginning. Always requires the checkpoint file for every execution. Once set to Always, the package will not be start without a checkpoint file. IfExists instructs the package to use the checkpoint file if it exists only. IfExists is the best setting for CheckpointUsage because the checkpoint file will exist only if the package ends in failure. Remember that once the checkpoint setting is set, the package creates the checkpoint file at the given location during execution and then erases it on successful execution. The checkpoint file will only remain if the package ends in a failure.

SaveChekpoints indicates whether the package needs to save checkpoints. It accepts either true or false. This needs to be set to true if the package needs to be restarted from the point of failure.

The last step is, configuring each task, specifying the involvement in the checkpoints. This is done through the property called FailPackageOnFailure. This property needs to be set to True for enabling the task in the checkpoint.

Okay, let's set this up. Open Business Intelligence Development Studio and start a new Integration Services project. Add two File System Tasks. Create three variables for holding two physical paths for two text files and another path for destination. Name the variables as illustrated below. You may change the path according to your folder structure.

Name Data Type Value
DestinationFolder String D:\FilesReceived
SourceFile1Path String E:\Clients\Files\Transaction1.txt
SourceFile2Path String E:\Clients\Files\Transaction2.txt

This is the assumption: We receive two text files from the client to a folder called E:\Clients\Files\. A Package loads the files and copies them to the destination (D:\FilesReceived). Once copied, it extracts data and transforms and transfers to a relevant tables in the database. Your package may look this this:



Note that, since this demo shows the implementation of checkpoints, transforming and transferring have not been implemented, hence they are represented with two Script Tasks that just pop-up messages saying "Records successfully saved!". Now create two text file as Transaction1.txt and Transaction2.txt. Do not copy them to E:\Clients\Files\. Now Configure File System Tasks like below:



Add messages to both the Script Tasks too. Now the package is done. Note that we have NOT configured anything related to the checkpoint. First we run the package without setting it, and understand the problem, then try to solve the problem with checkpoint.




Before starting it, Let's make sure that package runs. Place both text files in the E:\Clients\Files\ folder. Run the package and see whether the files are copied to the destination and messages from both Script Tasks are displayed. It should work. Okay, let's start testing the checkpoint. Remove the Transaction2.txt file from the source folder and run it. The output will be something like this:



The task Load Transaction2 File failed because of the unavailability of the Transaction2.txt file. We can run the package again once the file is available but the biggest problem is, it runs the first task again that loads Transaction1.txt too. Because of that, it may introduce data (or business) errors or redundancy (if the transformation/transferring is implemented). This is what we want to avoid and this is where we have use checkpoint files. If the package fails, we need to restart the package from the point of failure, not from scratch.
Let's configure it. Change the properties as illustrated below:

Property Value
Package
CheckpointFileName D:\CheckpointFiles\SamplePackageCheckpoint.xml
CheckpointUsage IfExists
SaveCheckpoints True
All Other Tasks
FailPackageOnFailure True

Let's run the package again. Make sure the source folder contains only the Transaction1.txt file. Just like the first time, the package fails after the second task. Now open the folder which is supposed to hold the checkpoint file. You will notice that SamplePackageCheckpoint.xml has been created. This file has been created because we have set the value as True for SaveCheckpoints in the package. It has not been erased due to the failure in the package. The failure resulted because of the FailPackageOnFailure property of the Load Transaction2 File is set to True.

Place the Tranaction2.txt file in the source folder and run the package. You will notice that the package starts from the Load Tranaction2 File task and continues:



It started from the third task because the checkpoint file instructs it to, and the package has used this checkpoint because the CheckpointUsage property was set to IfExists. If you check the checkpoint file's folder, you will not see the file now. It has been deleted since the package was successful.

Checkpoint usage with for loop
Checkpoint settinga are different with for loops. I came across this while trying to implement it in one of my ETL test applications. To understand it, let's try out some simple code.

Open a new SSIS project and add a For Loop Container. Create a variable called Year and set the type as Int32. Set the default value of the Year as 2004. Think about a scenario where you need to do some processing from a given year to the current year. So the given year would be the value in the Year variable. Add a script task onto the For Loop Container that represents this process. Add the Year variable in the ReadOnlyVaribles of the ScriptTask and add a message box that shows a message like below:

' code 1
MsgBox("Calculation done for year " & Dts.Variables("User::Year").Value.ToString() & ".")
Dts.TaskResult = Dts.Results.Success

Now configure the For Loop Container like below. Note that we do not set the value for @Year at the InitExpression. The initial value should be taken from the variable.



Done! Run the package. You will see messages for year 2004, 2005, 2006, and 2007. Let's configure the checkpoint file now. Set properties like below:

Property Value
Package
FailPackageOnFailure True
CheckpointUsage IfExists
SaveCheckpoints True
For Loop Container
FailPackageOnFailure True
Script Task

FailPackageOnFailure True

Since we need to fail the package at a certain point, change the code of the Script Task like below.

' code 2
If (CInt(Dts.Variables("User::Year").Value) = 2005) Then
MsgBox("Error occurred while processing for " & Dts.Variables("User::Year").Value.ToString() & ".")
Dts.TaskResult = Dts.Results.Failure
Else
MsgBox("Calculation done for year " & Dts.Variables("User::Year").Value.ToString() & ".")
Dts.TaskResult = Dts.Results.Success
End If

Run the package again. The Package fails after processing year 2004. You will see that checkpoint file has been created and has not been removed. Change the code of the Script Task as it was (code 1). Since the package ended up in failure and a checkpoint file is available, if we restart the package, it should start from the point of failure, it should start processing 2005. Restart the package. It starts from point of failure but it starts the processing from year 2004. This is because the checkpoint file says that the value of Year variable was 2004 at the failure, which is wrong. The value of variable Year has not been saved correctly. This happens when the values of FailPackageOnFailure property is set to true for tasks inside the For Loop Container. Get the properties of ScriptTask inside the For Loop Container and set the FilaPackageOnFailure to false. Delete the checkpoint file. Change the code of the ScriptTask back to code 2. Run the package again.

Once the package fails, change the code of ScriptTask back to code 1. The package should start from the point of failure. Check the checkpoint file created by the previous execution. You will see that the value of the variable Year has been saved as 2005. Run the package. Notice that For Loop Container starts counting from 2005, as we expected. Important point here is, we should not set the property FailPackageOnFailure to True for tasks inside the For Loop Container even if you have another For Loop Container (nested) inside.

Further, I noticed that when the For Loop Container contains more than one task that are connected via precedence constraints, when a restart after failure occurs, it starts the For Loop Container from the point of failure but starts with the first task within the loop without starting from the failed task. This happens, no matter what setting is configured; which proves that we have no way of controlling the tasks inside the for loop for checkpoints.


Checkpoint usage with events
I found that saving checkpoints at event execution is bit different too. And it is very important to be aware of it because we often write codes in events. Once the package is restarted using a checkpoint file, depending on the point of failure, it fires some events but some are not. Let's configure one package for testing. Create a new package and add a Script Task. Set the checkpoint configuration like below:

Property Value
Package
CheckpointFileName D:\CheckpointFiles\SamplePackageCheckpoint.xml
CheckpointUsage IfExists
SaveCheckpoints True
Script Task

FailPackageOnFailure True

Add code like below to the Script Task. It just pops-up a message.

' code 1
MsgBox("Script task executed!")
Dts.TaskResult = Dts.Results.Success

Configure the events below in the Script Task. Add a Script Task to every event and code for displaying proper message boxes about the event.

Event Event Script Task message
OnPreExecute "OnPreExecute event fired!"
OnPreValidate "OnPreValidate event fired!"
OnPostValidate "OnPostValidate event fired!"
OnInformation "OnInformation event fired!"
OnPostExecute "OnPostExecute event fired!"

Note that no checkpoint-related settings are configured in events. Run the package for testing. You will get messages ordered as the above list. The message "Script task executed" will be shown between OnPostValidate and OnInformation events. Change the code of the main Script Task in Control Flow.

' code 2
MsgBox("Script task failed!")
Dts.TaskResult = Dts.Results.Failure

This causes the package to fail. Run the package and notice the messages. Package displays event messages from OnPreExecute to OnPostValidate, main script task message and then fails. This causes the checkpoint file to remain in the folder. If you correct the error by replacing code 2 with code 1 and execute the package again, you will notice that the task starts but the first three events are not fired. It starts firing events from OnInformation event onwards. Remember this happens even if the script tasks inside the events are set with FailPackageOnFailure = true. Be cautious when you implement checkpoints with packages that have events implemented.

Other constraints
Checkpoints work with Control Flow tasks only. Tasks related to Data Flow cannot be controlled by checkpoints. In addition to that, the ForEach Loop Container cannot be controlled either. It starts iterating the loop from the beginning without starting from the point of failure. If you need to start the iteration from the failure task inside the ForEach Loop, you need to implement your own codes for it.

You may be holding some secure information inside the package. If the checkpoint is enabled, it may store secure information in the checkpoint file at a failure. Because of this, it is better to set a secure location for the checkpoint file that everyone cannot access.

No comments:

Post a Comment

Recent Posts

Archives