Introduction The New Method row_count () Field Name Data Type Description
When we want to get the row count of a table, we use the transact-SQL function count (*). In the case of large table which may contain millions of rows, it can take a long time to return the row count of the whole table, and that leads to slow performance for the query.
Count () function
Every DBA know how to use count (*) and know how suffering the performance. SQL server makes a full index/ table scan to return the total table's row count. There was advice for the DBA to avoid using the aggregate function count () for the whole table because it affects the database performance. When the following query is executed in query analyzer for the demo database AdventureWorks:
use AdventureWorks
go
select count (*) from Sales.SalesOrderDetail
The Query analyzer returns 121317 rows.
When we click the icon "display estimated execution plan" in the toolbar of the management studio for SQL server 2005, we get the following graph:
As we see in the figure from right to left, the following actions are applied:
The information included in Object Catalog Views of SQL Server 2005: sys.partitions and sys.allocation_units are used to get the count of the rows of the whole table. This function can be used in SQL Server 2005.
sys.partitions View
sys.partitions View contains a row for each partition of all the tables and indexes in the database. All tables and indexes in SQL Server 2005 contain at least one partition in that view, even if they are not explicitly partitioned.
The View includes the following fields which are used in the new method:
partition_id bigint ID of the partition. Is unique within a database. object_id int ID of the table to which this partition belongs. Every table is composed from one partition at least. index_id ID of the index within the object to which this partition belongs.
0 : heap table
1: with clustered index rows bigint number of rows in the partition.
sys.allocation_units
The sys.allocation_units view contains a row for each allocation unit in the database.
The View includes the following fields which are used in the new method:
Field Name | Data Type | Description |
container_id | bigint | The container_id = sys.partitions.partition_id. |
Type | tinyint | Type of allocation unit: 0 = Dropped 1 = In-row data (all data types, except LOB data types) 2 = Large object (LOB) data (text, ntext, image, xml) 3 = Row-overflow data |
In the new UDF row_count, [sys.partitions] view is joined with [sys.allocation_units] view. The filter of selection based on the following criteria:
- [sys.allocation_units].type = 1 which retrieve only the in-row data except the large object (LOB) of type like text, ntext, image.
- [sys.partitions].index_id in (0,1): 0 is the heap table , 1 : Clustered table
- [sys.partitions].rows is not null
The UDF row_count is executed in every database , and is given a permission public. The UDF row_count is described in the next section.
The UDF Function Code
IF EXISTS (SELECT name FROM dbo.sysobjects WHERE id = Object_id(N'[dbo].[row_count]') )
DROP FUNCTION [dbo].[row_count]
GO
CREATE FUNCTION dbo.row_count (@table_name sysname)
-- @table_name we want to get count
RETURNS bigint
/*
-------------------------------------------------------
-- Function Name: row_count
-- Author: Mohamed Hassan
-- Email: moh_hassan20@yahoo.com
-- Development Date: 08/11/2008
-- Version: 1.0
-- Description: Return row count of the whole table, as a replacement for count(*) , give extra performance at least 70% over , than count(*) for large tables with millions of rows
-- SQL Server: SQL server 2005
-- Usage Example: select dbo.row_count ('Sales.SalesOrderDetail')
-- Copyright:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3 of the
License, or any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-------------------------------------------------------
*/
AS
BEGIN
DECLARE @nn bigint -- number of rows
IF @table_name IS NOT NULL Schema Table Name Full Table Name rows
BEGIN
SELECT @nn = sum( p.rows )
FROM sys.partitions p
LEFT JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE
p.index_id in(0,1) -- 0 heap table , 1 table with clustered index
and p.rows is not null
and a.type = 1 -- row-data only , not LOB
and p.object_id = object_id(@table_name)
END
RETURN (@nn)
END
GO
The function usage
The function row_count is called by passing the full name of the table schema.table name
Example1:
Select dbo.row_count (schema.[table name]), as in the following example:
use AdventureWorks
go
select dbo.row_count ('Sales.SalesOrderDetail')
The query analyzer return the result of that query with 121317 rows in the table, which is the same as count(*) but quicker and achieve the best performance than count(*).
Example 2:
SELECT top 5 TABLE_SCHEMA, TABLE_NAME, (TABLE_SCHEMA +'.'+TABLE_NAME) 'Full table name',
dbo.row_count(TABLE_SCHEMA +'.'+TABLE_NAME) rows
FROM INFORMATION_SCHEMA.TABLES
where TABLE_TYPE ='BASE TABLE'
ORDER BY rows desc
The result of the query is in the following table:
Sales SalesOrderDetail Sales.SalesOrderDetail 121317 Production TransactionHistory Production.TransactionHistory 113443 Production TransactionHistory Production.TransactionHistoryArchive 89253 Production WorkOrder Production.WorkOrder 72591 Production WorkOrderRouting Production.WorkOrderRouting 67131
The function is tested carefully by adding /deleting rows , truncating table , bulk insert /bulk delete , and return accurate results as given by count(*).
Performance Evaluation
A batch is executed to compare the performance of count(*) and UDF row_count ,as in the following code:
select dbo.row_count ('Sales.SalesOrderDetail')
go
select count (*) from Sales.SalesOrderDetail
go
When the execution plan is reviewed, we found the first query (row_count) cost relative to the batch is 7% and estimated subtree cost is 0.03 as in the following figure:
The second query (count(*)) cost relative to the batch is 93% and estimated subtree cost is 0.37 as in the following figure:
The performance of the UDF function row_count () is the best, and consume low resources of the database. The cost ratio between count () and the new UDF "row_count is" is 93: 7, which means that performance of row_count is better ten times more than count (*).
These numbers may vary based on the rows in the table, but in general it gives an indication how the new UDF is better than Count (*).
The UDF function can be efficient in the case you need to get the count of the whole table.
Conclusion
The built-in function count (*) is very consuming time to get the count of the whole table, especially in tables with millions of row.
The new method of the user defined function "row_count" which depends on Object Catalog Views give accurate results and quicker and achieve best performance than count(*).
This function can be used in SQL Server 2005, because these new views are available only on these versions.
No comments:
Post a Comment