Subscribe

RSS Feed (xml)

Powered By

Skin Design:
Free Blogger Skins

Powered by Blogger


Friday, September 26, 2008

Data Comparison with Data Dude

I have already written two articles regarding Data Dude (aka Visual Studio Team Edition for Database Professionals); one published as Schema Comparison with Data Dude and other published as Test Data Generation with Data Dude. This is my third article regarding Data Dude, and in this article I will illustrate the way of doing "Data Comparison" with Data Dude. Since both my previous articles provided an introduction to Data Dude, I decided not to add the same here. Let's see what "Data Compare" is.

Data Compare allows us to make a comparison of data between two databases and update the target. Just like the "Schema Compare" operation, this requires a "source" and "target" too. On comparison, it allows us to see the differences and update the "target" as we want. We will perfrom a simple comparison and see how it can be used and how it is useful for us.

The first step is, creating a database project. And we need two databases with the same schema for comparing data. Since I have covered the initial steps for creating the project model with my previous articles, I am not going to show them here again. Simply follow the steps below;

  • Open "SQL Server Business Intelligence Development Studio" and create a "SQL Server 2005" database project. Since we do not want to load a schema in to the project, do not use the "SQL Server 2005 wizard".
  • Let's use ther "AdventureWorks" database as the source for the comparison. As we need to have a target database with the same schema, let's duplicate the "AdventureWorks" database. So open SQL Server Management Studio.
  • Backup the "AdventureWorks" database and restore the database as "AdventureWorksTemp".
  • Now go back to the project in BIDS. Though there are no differences between data in the two databases, we will see a way of performing the data comparison between the two databases. Click on "Data -> Data Compare -> New Data Comparison" menu item.
  • It opens the "New Data Comparison" window as below.



We need to set the source as the "AdventureWorks" database. If the drop-down contains the connection for the database, select it, or else make a new connection by clicking on the "New Connection" button and select it.

  • Select the target as "AdventureWorksTemp". Again, if it is not in the drop-down, create and select.
  • There are four Data Compare options in the screen;
    • Different Records: This results the records available in both databases that have differences. Records will be updated in the target database.
    • Only in Source: This results the records only in the source database. Records will be inserted to the target.
    • Only in Target: This results the records only in the target database. Records will be deleted from the target.
    • Identical Records: This results similar records in source and target. No action will be performed against target for these records.
  • Select all checkboxes. Click on "Next" to continue.
  • It opens a window like below;

This allows us to select the objects we want to do the comparison of data on. If you want to exclude any object, expand "Tables" and de-select items you want to exclude. Click on the "Finish" button to start the operation.


The result of the comparison looks like below;



The result window of the comparison contains three sections but by default, you will see only two sections. The first section shows each object selected for the comparison and the number of records identified under specific category. For example, it says that there are 451 identical records in both databases for the table of "dbo.DatabaseLog". Since there is no difference between the two selected databases, you will not see any tables that have values for columns except "Identical Records".

The second section shows the records related to the selected row(object). It is a tab-sheet that has four tabs, each representing columns in above section. The tab, "Different Records" shows two columns for one data column (field) representing both source and target whereas other tabs use single column to a data column. If you click on "Identical Records" tab sheet, you will see records for the selected object.

The last section of the result is, by default not shown, the script for updating the target. It can be opened by clicking the icon "Show Data Update Script" (or Data -> Data Compare -> Show Update Script menu item). The default setting instructs to drop foreign keys before updating data and recreate them. We will see other setting later in this article. Examine the code and see.

As no data to be updated in the target, the buttons "Write updates" that allows to update the target database directly and "Export to Editor" that allows to load the update script onto separate editor for modification and execution, are disabled. These two buttons are mainly used for updating the target.

Now you have seen the way of doing the data comparison. Let's discuss few more things and then see how changes are reflected through the comparison.

Comparison Key
Data Dude requires a column that has unique values for each record in the table for data comparison. If the table does not contains a unique key, it will be not be listed in the second screen of "New Data Comparison" and it will not be taken into comparison. So, remember a unique key is a must for data comparison.

If the table contains more than one unique key, both will be listed in the drop-down under the "Comparison Key" column. Data Dude will select the first one for doing the comparison, but you can change it as you want. The scrip below creates a table in both databases and shows how it is displayed in the second screen.

use AdventureWorks
go
create table TestTable (Column1 int primary key not null, Column2 varchar(100))
go
create unique index IX_TestTable on TestTable (Column2)
go
insert into TestTable values (1, 'Value1')
insert into TestTable values (2, 'Value2')
go
use AdventureWorksTemp
go
create table TestTable (Column1 int primary key not null, Column2 varchar(100))
go
create unique index IX_TestTable on TestTable (Column2)
go


Once executed, start a new data comparison and note the second screen;



It shows both the primary key and the unique key. Anyway I am not too sure whether the operation or performance will be affected by changing the comparison key but selecting the clustered index (PK) may improve the performance of data comparison.

Reflection of data changes
Now let's change some data in both database and see how data changes are shown and scripted by Data Dude by following the steps below.
  • Open Management Studio and open a new query window.
  • Run the below script;

USE AdventureWorks
GO
UPDATE Sales.SalesOrderDetail
SET OrderQty = 3
WHERE SalesOrderDetailID = 27070
GO

USE AdventureWorksTemp
GO
INSERT Person.ContactType
(Name) VALUES ('Assistant Account Manager')

  • Open the project we created and start a new Data Compare.
  • Select "AdventureWorks" as the source and "AdventureWorksTemp" as the target. Click on "Finish" to start the process.
  • Analyze the result of the comparison. You will see the first difference for the "Person.ContactType" table. Note the value "1" for "Only in Target" column and it is supposed to be deleted from the target.



Note the next change on the "Sales.SalesOrderDetail" table. Though we changed the record only in the "Sales.SalesOrderDetail", a trigger attached to the table has updated both 'Sales.SalesOrderHeader" and "Production.TransactionHistory" tables.



The changed will be applied to the target as "UPDATE".

  • Note that the "Write Updates" and "Export to Editor" buttons are enabled now. Click on the "Export to Editor" and analyze the script generated for the target database. The script contains;

DROP CONSTRAINT and CREATE CONSTRAINT statements for all foreign keys.
DELETE statement for Person.ContactType table.
UPDATE statement for Sales.SalesOrderDetail and Sales.SalesOrderHeader tables.
INSERT statement for Production.TransactionHistory table.

  • Go back to the DataCompare tab and click on "Write Updates". It updates the target database.
  • Query tables and see whether updates are gone to target database.

Is everything in the target updated as the source? Note the message at the top of the DataCompare window. It says x number of items are compared and updated. To make sure that every change has gone, let's compare again. Click on the "Refresh" icon in the "DataCompare" toolbar to start the comparison. Are all records identical? You will see that still data in the "Order" tables mismatched. Find the reason for the mismatch. The column "ModifiedDate" is the problematic column. Let's update the target again. Click on the "Write Updates" to pass the correct updates from the source to target. Refresh and compare. Are records are still mismatched? Yes, this behavior continues because of the trigger attached to the "Sales.SalesOrderDetails" table. Once updates are received from the source, trigger gets fired at the target and updates the "ModifiedDate" column and "Sales.SalesOrderHeader" table. This default behavior can be stopped by enabling the "Disable triggers" option.

Data Compare Options
There are 7 Data Compare options available to change the behavior of the update process. Enabling and disabling allow you to pass the changes from source to target as you want. Options can be seen with "Main Options window (Tools -> Options)" under Database tools -> Data compare node. Here are the available options.

  1. Disable foreign keys
    This is enabled by default. That is why we see drop and re-create of all foreign keys in the script. Enabling this will stop checking the foreign key availability at the execution, when records are updated and inserted but disabling this option enforce the checking.
  2. Disable triggers
    Enabling this stops firing triggers. If you had enabled this option before running the previous update, you would not see records mismatched. You can try above again by enabling this.
  3. Drop primary keys, indexes and unique constrains
    Once you enable the option, it suppose to drop all primary keys, all indexes, and unique constraints but I did not come across a script that drops primary keys and clustered indexes. Dropping indexes would be a good option if you update a table that is heavily indexed with large data set.
  4. Include timestamp columns
    The default setting for this option is "on". When it is on, it compares the timestamp columns and shows if there are differences. But it does not update the target or generate script for update. Disabling the option stops comparing timestamp values.
  5. Do not use transactions
    Enabling this stops the script execution under a transaction. By default it is on.
  6. Exclude IDENTITY columns
    The default setting of this is "off", hence it compares IDENTITY values. It does not compare IDENTITY values and show if there are differences when the setting is "on". I found that it does compares IDENTITY even with the option "on" when the IDENTITY is a part of primary key.
  7. Trim trailing
    Enabling this option trims the trailing spaces in columns when the target is updated.

Hope I have given enough knowledge to you about the feature "Data Compare" and you will find it very useful. If you have not started using it, download the latest from here. Power tools for it can be downloaded from here.

Here are some of the references for Data Dude.

  • Gert's Data Dude Blog: http://blogs.msdn.com/gertd/
  • Visual Studio Team Edition for DB Pros: http://blogs.msdn.com/camerons/
  • Article I wrote about "Test Data Generation": http://sqlserveruniverse.com/content/ADMN0100212262007DataGenerationwithDataDude.aspx
  • Article I wrote about "Schema Comarison":
    http://sql-server-performance.com/articles/per/Schema_Comparison_with_VSTE_DBPro_p1.aspx

I would appreciate your comments on this, so please send your comments to dinesh@sqlserveruniverse.com.

No comments:

Post a Comment

Recent Posts

Archives