MS SQL Server: Speeding up the Performance of Table Counts in SQL Server 2005

Introduction
When we want to get the row count of a table, we use the transact-SQL function count (*). In the case of large table which may contain millions of rows, it can take a long time to return the row count of the whole table, and that leads to slow performance for the query.

Count () function
Every DBA know how to use count (*) and know how suffering the performance. SQL server makes a full index/ table scan to return the total table's row count. There was advice for the DBA to avoid using the aggregate function count () for the whole table because it affects the database performance. When the following query is executed in query analyzer for the demo database AdventureWorks:

use AdventureWorks
go
select count (*) from Sales.SalesOrderDetail

The Query analyzer returns 121317 rows.

When we click the icon "display estimated execution plan" in the toolbar of the management studio for SQL server 2005, we get the following graph:

As we see in the figure from right to left, the following actions are applied:

Index scan for the whole table, which is time consuming process.
Apply stream aggregate in the next step.

The New Method row_count ()
The information included in Object Catalog Views of SQL Server 2005: sys.partitions and sys.allocation_units are used to get the count of the rows of the whole table. This function can be used in SQL Server 2005.

sys.partitions View
sys.partitions View contains a row for each partition of all the tables and indexes in the database. All tables and indexes in SQL Server 2005 contain at least one partition in that view, even if they are not explicitly partitioned.

The View includes the following fields which are used in the new method:

Field Name	Data Type	Description
partition_id	bigint	ID of the partition. Is unique within a database.
object_id	int	ID of the table to which this partition belongs. Every table is composed from one partition at least.
index_id		ID of the index within the object to which this partition belongs. 0 : heap table 1: with clustered index
rows	bigint	number of rows in the partition.

sys.allocation_units
The sys.allocation_units view contains a row for each allocation unit in the database.

The View includes the following fields which are used in the new method:

Field Name	Data Type	Description
container_id	bigint	The container_id = sys.partitions.partition_id.
Type	tinyint	Type of allocation unit: 0 = Dropped 1 = In-row data (all data types, except LOB data types) 2 = Large object (LOB) data (text, ntext, image, xml) 3 = Row-overflow data

In the new UDF row_count, [sys.partitions] view is joined with [sys.allocation_units] view. The filter of selection based on the following criteria:

[sys.allocation_units].type = 1 which retrieve only the in-row data except the large object (LOB) of type like text, ntext, image.
[sys.partitions].index_id in (0,1): 0 is the heap table , 1 : Clustered table
[sys.partitions].rows is not null

The UDF row_count is executed in every database , and is given a permission public. The UDF row_count is described in the next section.

The UDF Function Code
IF EXISTS (SELECT name FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[row_count]') )
DROP FUNCTION [dbo].[row_count]
GO

CREATE FUNCTION dbo.row_count (@table_name sysname)
-- @table_name we want to get count

RETURNS bigint
/*
-------------------------------------------------------
-- Function Name: row_count
-- Author: Mohamed Hassan
-- Email: moh_hassan20@yahoo.com
-- Development Date: 08/11/2008
-- Version: 1.0
-- Description: Return row count of the whole table, as a replacement for count(*) , give extra performance at least 70% over , than count(*) for large tables with millions of rows
-- SQL Server: SQL server 2005
-- Usage Example: select dbo.row_count ('Sales.SalesOrderDetail')
-- Copyright:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3 of the
License, or any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-------------------------------------------------------
*/
AS
BEGIN
DECLARE @nn bigint -- number of rows

IF @table_name IS NOT NULL
BEGIN
SELECT @nn = sum( p.rows )
FROM sys.partitions p
LEFT JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE
p.index_id in(0,1) -- 0 heap table , 1 table with clustered index
and p.rows is not null
and a.type = 1 -- row-data only , not LOB
and p.object_id = object_id(@table_name)
END
RETURN (@nn)
END
GO

The function usage
The function row_count is called by passing the full name of the table schema.table name

Example1:
Select dbo.row_count (schema.[table name]), as in the following example:
use AdventureWorks
go
select dbo.row_count ('Sales.SalesOrderDetail')

The query analyzer return the result of that query with 121317 rows in the table, which is the same as count(*) but quicker and achieve the best performance than count(*).

Example 2:
SELECT top 5 TABLE_SCHEMA, TABLE_NAME, (TABLE_SCHEMA +'.'+TABLE_NAME) 'Full table name',
dbo.row_count(TABLE_SCHEMA +'.'+TABLE_NAME) rows
FROM INFORMATION_SCHEMA.TABLES
where TABLE_TYPE ='BASE TABLE'
ORDER BY rows desc

The result of the query is in the following table:

Schema	Table Name	Full Table Name	rows
Sales	SalesOrderDetail	Sales.SalesOrderDetail	121317
Production	TransactionHistory	Production.TransactionHistory	113443
Production	TransactionHistory	Production.TransactionHistoryArchive	89253
Production	WorkOrder	Production.WorkOrder	72591
Production	WorkOrderRouting	Production.WorkOrderRouting	67131

The function is tested carefully by adding /deleting rows , truncating table , bulk insert /bulk delete , and return accurate results as given by count(*).

Performance Evaluation
A batch is executed to compare the performance of count(*) and UDF row_count ,as in the following code:

select dbo.row_count ('Sales.SalesOrderDetail')
go
select count (*) from Sales.SalesOrderDetail
go

When the execution plan is reviewed, we found the first query (row_count) cost relative to the batch is 7% and estimated subtree cost is 0.03 as in the following figure:

The second query (count(*)) cost relative to the batch is 93% and estimated subtree cost is 0.37 as in the following figure:

The performance of the UDF function row_count () is the best, and consume low resources of the database. The cost ratio between count () and the new UDF "row_count is" is 93: 7, which means that performance of row_count is better ten times more than count (*).

These numbers may vary based on the rows in the table, but in general it gives an indication how the new UDF is better than Count (*).

The UDF function can be efficient in the case you need to get the count of the whole table.

Conclusion
The built-in function count (*) is very consuming time to get the count of the whole table, especially in tables with millions of row.

The new method of the user defined function "row_count" which depends on Object Catalog Views give accurate results and quicker and achieve best performance than count(*).

This function can be used in SQL Server 2005, because these new views are available only on these versions.

MS SQL Server

Subscribe

Powered By

Saturday, September 20, 2008

Speeding up the Performance of Table Counts in SQL Server 2005

No comments:

Post a Comment

Recent Posts

Archives

Followers

Recent Comments