Subscribe

RSS Feed (xml)

Powered By

Skin Design:
Free Blogger Skins

Powered by Blogger


Saturday, September 20, 2008

How Change Data Capture Can be Used

Introduction
Change Data Capture or CDC is a new feature included in SQL Server 2008. There are several excellent introductionary articles on sql-server-performancesql-server-performance that cover what is CDC, how to configure CDC and where CDC data is captured.

This article discusses how we can use CDC in our day to day wok.

Uses of CDC
There three ways you can use CDC:

  1. As your data Auditing feature
  2. To monitor DDL changes
  3. Load your data warehouse with net changes

Auditing
Auditing is a very important feature to have for your database systems. There are now a lot of legislations in placed around the world such as Data Protection Law, Sarbeny-Oxley. These are just some of the laws which require you to audit your databases.

The following image shows how to use CDC for auditing in SQL Server 2008.



Source: Books On line, SQL Server 2008

After configuring CDC, you can see that you won’t get modified date and time in the CDC tables. If you do a select, you can see that there is a column called _$start_lsn which contains a hex number. LSN stands for Log Sequence Number. If you have a close look at cdc tables, you can see that there is a table called cdc.lsn_time_mapping. This table is used to map between log sequence number (LSN) commit values and the time the transaction committed. Entries may also be logged for which there are no change tables entries. This allows the table to record the completion of LSN processing in periods of low or no change activity.

Rather than querying the CDC tables directly, there are two functions to get relevant data from these tables. sys.fn_cdc_map_lsn_to_time and sys.fn_cdc_map_time_to_lsn are the two functions and these functions will be created in the database upon enabling the CDC.

DECLARE @max_lsn binary(10);
SELECT @max_lsn = MAX(__$start_lsn)
FROM cdc.dbo_Products_CT
SELECT sys.fn_cdc_map_lsn_to_time(@max_lsn);
GO

The above script will return the mapping time for the maximum LSN number for the Products_CT table(Products_CT is the CDC table for Products).

Next, is to include this function in a standard select query.

SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) TransactionTime,
__$operation Operatio,
ID,
ProductName ,
Date
FROM cdc.dbo_Products_CT

The above query will return following results:

sys.fn_cdc_map_time_to_lsn function will perform works in the opposite manner. That is it will return the LSN for the relevant datetime value. This function has two parameters; The first parameter is a relational operator which can be either one of following operators.

  • largest less than
  • largest less than or equal
  • smallest greater than
  • smallest greater than or equal

The above operators will allow a user to return the exact LSN number as most of the time there won’t be a LSN for the exact date time. For example, If you want to get data from 2008-08-25 09:45, there won’t be an LSN record which matches that time.

DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);

SET @begin_time = '2008-08-15 09:45:00.000'
Set @end_time = '2008-08-15 10:15:00.000'

SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
print @begin_lsn
SELECT sys.fn_cdc_map_lsn_to_time(@begin_lsn)

Returns 2008-08-15 09:46:53.090

SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
print @end_lsn
SELECT sys.fn_cdc_map_lsn_to_time(@end_lsn)


Returns 2008-08-15 10:12:13.457

There are another two functions to query data. cdc.fn_cdc_get_all_changes_capture_instance and cdc.fn_cdc_get_net_changes_capture_instance are those function. cdc.fn_cdc_get_all_changes_capture_instance will be created after enabling cdc but to create second one you need to set 1 for @supports_net_changes parameter when using sys.sp_cdc_enable_table stored procedure.

EXECUTE sys.sp_cdc_enable_table
@source_schema = N'dbo'
, @source_name = N'Products'
, @role_name = N'cdc_Admn'
,@captured_column_list ='ID,ProductName,Date'
,@supports_net_changes=1
GO

cdc.fn_cdc_get_all_changes_capture_instance returns one row for each change applied to the source table within the specified log sequence number (LSN) range. In this function, there are three parameters. First two parameters are from and to LSNs. Last parameter is filter option. For the filter option you have two operations. all filter return all records but it returns only the updated records not before update record. This means that you will see records with operation 1,2 and 4. The other parameter, all update old will return all the records including before update records.

cdc.fn_cdc_get_net_changes_capture_instance returns one net change row for each source row changed within the specified LSN range. That is, when a source row has multiple changes during the LSN range, a single row that reflects the final content of the row is returned by the function. For example, if a transaction inserts a row in the source table and a subsequent transaction within the LSN range updates one or more columns in that row, the function returns only one row, which includes the updated column values.

DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_get_min_lsn ('dbo_Products')
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
declare @row_filter nvarchar(30)
Set @row_filter ='all'
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Products(@from_lsn, @to_lsn,@row_filter);

Returns 3 records one each for the records

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Products(@from_lsn, @to_lsn,@row_filter);

Retruns all records for this time range.

The main drawback of this method of auditing is that it does not have user information.

DDL Changes
Monitoring your schema changes is very important as it will impact your future releases. In SQL Server 2005, you could use DDL triggers to capture those changes. However, it is not an easy task as you need to create a DDL trigger for each and every table. However, with cdc things have become easy. After enabling CDC in the database, there will be a table called [cdc].[ddl_history]. Rather than simply executing a SELECT statement, there is a stored procedure to execute to get the DDL changes.

exec sys.sp_cdc_get_ddl_history
@capture_instance ='dbo_Products'


The above T-SQL script will return the DDL history for the products table.

Like the previous case, you will not be able to find the user information from this query.

Conclusion
Although there are limitations (Enterprise edition is needed for CDC) in CDC, it is a perfect solution to capture data changes in your database.

CDC can also be used in data warehouses. As it needs discussion, I will leave this discussion for another article. Like other articles, I wish to open this for discussion.

No comments:

Post a Comment

Recent Posts

Archives