
MS SQL Server Database, Queries, Sample Codes and Errors
In the different projects where I perform a performance audit or am involved in performance tuning, I often find that while queries and stored procedures function correctly, they are not efficient at processing larger data sets. Many OLAP databases are set up like OLTP databases, handling single records instead of a record set as a whole.
By tuning a query with proper WHERE clauses, indexes, and so on, you can often achieve a performance gain. However, when you tune the complete process and handle record sets as a whole, the gain can be many times greater.
Tuning is the process of achieving optimal speed for the mutations using the least amount of resources possible, while still keeping optimal performance for selections of data. Adding an index to a table will speed up selections, but slow down any mutations (inserts, updates, or deletes) on the table. Finding the optimal balance between the two depends not only on the queries being executed against the database, but also on their frequency and priority.
If, for instance, all mutations are handled during off-hours, it is often enough that they complete within those hours, while maximum performance is reserved for the selections during office hours, when people are waiting for the results of a selection.
The performance is determined by the limitations of the available resources. The specific hardware available, type of resource needed to perform the requested query and the concurrent use of the server and its resources determine the time needed. Often one of the resources determines the major part of the query cost. When performing on the fly calculations, the processor is a key issue. When the amount of data increases, memory and disk I/O are a large influence. When tuning, the biggest gain can be reached by addressing these resources first. The query execution plan gives insight into the use of the resources.
In query tuning, the main focus is to execute the existing queries with the best performance possible. By creating indexes, retrieving only the necessary columns and rows with correct where clauses, using indexed views, using pre-calculated values, spreading tables over multiple disks, etc., a given query's speed can be increased tremendously. However, there is a limit to the extent this can be achieved. After this, extra resources like more memory or faster disks can be added.
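As a small, hedged illustration of this kind of query-level tuning (the table, column, and index names are invented for the example, not taken from any project): support the predicate with an index and return only the columns you actually need.
-- Hypothetical table dbo.Orders: index the filter column and include the selected
-- columns so the query can be answered from the index alone.
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate, CustomerID, TotalAmount)

SELECT CustomerID, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= '20060101' AND OrderDate < '20060201'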
By altering the complete process of handling the dataflow, the target is to use as few connections as possible, and limiting the number of query executions and complexity. This may require the data model to be altered. Because the process is always very specific to the given situation and often influences many aspects of the database, there are no general guidelines to lead through this tuning. However, by identifying the area where the largest amount of time or resources are used in handling the data, a critical look at the current situation can lead to new methods. Most of the time, there is more than one method of handling the dataflow.
Below are some illustrative examples.
Making a connection to a database takes time. Executing even a simple query takes time to compile and execute. This overhead is partly dependent on the table and its content, but it is always there. For instance, the following code creates a table with one field. Without inserting any data, I query the table repeatedly and note the time:
CREATE TABLE tbl_Test (TestID CHAR(1))
DECLARE @Time1 DATETIME, @Time2 DATETIME, @i INT
SELECT @Time1 = GETDATE(), @i=0
AGAIN:
SELECT * FROM tbl_Test
SELECT @Time2 = GETDATE(), @i=@i+1
PRINT 'TIME IS ' + CONVERT(CHAR, @Time2, 14) + ', i = ' + CONVERT(CHAR, @i) + ', TIMEDIFF = ' + CONVERT (CHAR, DATEDIFF(ms, @Time1, @Time2))
SELECT * FROM tbl_Test
SELECT @Time1 = GETDATE(), @i=@i+1
PRINT 'TIME IS ' + CONVERT(CHAR, @Time2, 14) + ', i = ' + CONVERT(CHAR, @i) + ', TIMEDIFF = ' + CONVERT (CHAR, DATEDIFF(ms, @Time2, @Time1))
IF @i < 1000 GOTO AGAIN
This will produce 1000 empty selections. The messages tell me the time difference between the previous selection and the current one. The first 160 or so selections are executed within the same millisecond. However, even in this small selection, after 160 selections there is some overhead that can be measured.
TIME IS 14:54:29:430, i = 158, TIMEDIFF = 0
(0 row(s) affected)
TIME IS 14:54:29:430, i = 159, TIMEDIFF = 0
(0 row(s) affected)
TIME IS 14:54:29:430, i = 160, TIMEDIFF = 16
(0 row(s) affected)
TIME IS 14:54:29:447, i = 161, TIMEDIFF = 0
(0 row(s) affected)
TIME IS 14:54:29:447, i = 162, TIMEDIFF = 0
As the table definitions grow more complex and the number of records in them increases, this will occur faster and with greater time loss.
The selection speed for different record sets is not linear in the number of rows. Because many steps have to be taken for any selection, getting extra records out of the database often takes hardly any more time. In a typical database, I have about 17 million records in a table. By making selections of 20,000, 50,000, 100,000, and 150,000 records, I calculated the execution time per record. These are some of the results:
Rows | Rows / Second |
---|---|
20,000 | 476 |
51,987 | 456 |
20,000 | 377 |
51,987 | 702 |
50,000 | 704 |
133,276 | 1,293 |
50,000 | 694 |
133,276 | 1,211 |
100,000 | 1,369 |
282,818 | 2,643 |
100,000 | 1,388 |
282,818 | 2,525 |
150,000 | 2,027 |
421,581 | 3,798 |
150,000 | 2,027 |
421,581 | 3,603 |
20,000 | 408 |
51,987 | 577 |
20,000 | 400 |
51,987 | 742 |
50,000 | 735 |
133,276 | 1,402 |
50,000 | 735 |
133,276 | 1,373 |
100,000 | 1,449 |
282,818 | 2,525 |
100,000 | 1,470 |
282,818 | 2,459 |
150,000 | 2,173 |
421,581 | 4,093 |
150,000 | 2,142 |
421,581 | 4,053 |
This test indicates that one selection of 100,000 records is about three times as fast as four selections of 20,000 records each. So if possible, get all the information you need in one selection instead of going back to the database many times.
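As a hedged sketch of what this means in practice (the table and key values are invented for illustration): instead of one query per key, fetch everything in a single statement.
-- Four round trips, one per customer:
SELECT OrderID, TotalAmount FROM dbo.Orders WHERE CustomerID = 1001
SELECT OrderID, TotalAmount FROM dbo.Orders WHERE CustomerID = 1002
SELECT OrderID, TotalAmount FROM dbo.Orders WHERE CustomerID = 1003
SELECT OrderID, TotalAmount FROM dbo.Orders WHERE CustomerID = 1004

-- One round trip returning the same rows:
SELECT CustomerID, OrderID, TotalAmount
FROM dbo.Orders
WHERE CustomerID IN (1001, 1002, 1003, 1004)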
A well-designed relational OLAP database receives all of its mutations via source files. In such a file, a complete record for an entity is given on a single line. The record definition is as follows:
First 50 characters identify the file and its origin with fixed field lengths.
After this, one or more categories are listed. These correspond to one or more tables in our database. The first four characters identify the category. The next three characters identify the length of the category content. After this, the next category starts.
Within a category, one or more elements are listed. These correspond to fields in the database. The first four characters identify the element, the next three characters identify the length of the content.
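To make the element layout concrete, here is a small hedged example with a made-up category body; it only demonstrates the SUBSTRING arithmetic that the parsing code below relies on.
DECLARE @CatBody VARCHAR(100)
SET @CatBody = '0110005HELLO0120003ABC' -- two elements: 0110 with 5 characters, 0120 with 3
SELECT
SUBSTRING(@CatBody, 1, 4) AS ElementNumber, -- '0110'
CONVERT(INT, SUBSTRING(@CatBody, 5, 3)) AS ContentLength, -- 5
SUBSTRING(@CatBody, 8, CONVERT(INT, SUBSTRING(@CatBody, 5, 3))) AS ElementContent -- 'HELLO'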
Because the number of categories and the number of elements vary, and they all have to be linked to a single entity, the chosen method was to parse the file in a .NET application and split each record into the relational model. From there, an INSERT with VALUES was issued against each table in the database for every record. Loading files of a million records (as was common), with an average of nine tables, leads to nine million connections / executions against the database.
I set up a file import table, where the complete record was bulk loaded into the database. I added an identity field to uniquely identify a record / entity. From here, I parsed the identifying fields for the file, wrote them to a separate table, and removed this part of the record.
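A minimal sketch of that first import step, assuming a hypothetical file location and a simple VARCHAR body column (the real table layout is not shown here):
-- Bulk load the raw lines into a single-column staging table, then add the identity.
-- Assumes the record lines contain no tab characters.
CREATE TABLE dbo.tbl_RecordImport (RecordBody VARCHAR(8000) NOT NULL)

BULK INSERT dbo.tbl_RecordImport
FROM 'C:\Import\Records.txt' -- hypothetical source file
WITH (ROWTERMINATOR = '\n')

ALTER TABLE dbo.tbl_RecordImport ADD RecordID INT IDENTITY(1,1) NOT NULL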
Next, I inserted all first categories into a separate table, removed this category from the body, and repeated the step until all categories were split. I repeated this step for each element in a category.
I used this code for it:
-- CAPTURE RECORD INFO
INSERT INTO dbo.tbl_RecordInfo (RecordID, IDField, Status, ...)
SELECT RecordID, SUBSTRING (RecordBody, 1, 10), SUBSTRING (RecordBody, 11, 1), ....
FROM dbo.tbl_RecordImport
--REMOVE RECORD INFO FROM BODY
UPDATE dbo.tbl_RecordImport
SET RecordBody = SUBSTRING(RecordBody, 51, LEN(RecordBody) - 50)
DECLARE @CategoryCount INT, @Rows INT
SET @CategoryCount = 0
CATEGORY_LOOP:
SET @CategoryCount = @CategoryCount + 1
--INSERT CATEGORY
INSERT INTO dbo.tbl_RecordCategory (RecordID, SortOrder, CatNumber, CatBody)
SELECT RecordID, @CategoryCount, SUBSTRING (RecordBody, 1, 2), SUBSTRING (RecordBody, 6, CONVERT(INT, SUBSTRING(RecordBody, 3, 3)))
FROM dbo.tbl_RecordImport
--REMOVE ALL RECORDS OF WHICH ALL CATEGORIES ARE HANDLED
DELETE FROM dbo.tbl_RecordImport
WHERE (LEN(RecordBody)= CONVERT(INT, SUBSTRING(RecordBody, 3, 3)) + 10)
--REMOVE HANDLED CATEGORY FROM BODY
UPDATE dbo.tbl_RecordImport
SET RecordBody = SUBSTRING(RecordBody, CONVERT(INT, SUBSTRING(RecordBody, 3, 3)) + 6, LEN(RecordBody) - CONVERT(INT, SUBSTRING(RecordBody, 3, 3)) + 5)
SET @Rows = @@ROWCOUNT
--IF NOT ALL CATEGORIES ARE HANDLED, LOOP AGAIN
IF @Rows > 0 GOTO CATEGORY_LOOP
ELEMENT_LOOP:
--INSERT ELEMENT
INSERT INTO dbo.tbl_Element (CatID, ElementNumber, ElementContent)
SELECT CatID, SUBSTRING (CatBody, 1, 4), SUBSTRING (CatBody, 8, CONVERT(INT, SUBSTRING (CatBody, 5, 3)))
FROM dbo.tbl_RecordCategory
WHERE CatBody IS NOT NULL
--REMOVE BODY IF ALL ELEMENTS ARE HANDLED
UPDATE dbo.tbl_RecordCategory
SET CatBody = NULL
WHERE LEN(CatBody) = CONVERT(INT, SUBSTRING(CatBody, 5, 3)) + 7
--REMOVE ELEMENT FROM BODY
UPDATE dbo.tbl_RecordCategory
SET CatBody = SUBSTRING (CatBody, CONVERT(INT, SUBSTRING(CatBody, 5, 3)) + 8, LEN(CatBody) - CONVERT(INT, SUBSTRING(CatBody, 5, 3)) + 7 )
WHERE CatBody IS NOT NULL
SET @Rows = @@ROWCOUNT
--IF NOT ALL ELEMENTS ARE HANDLED, LOOP AGAIN
IF @Rows > 0 GOTO ELEMENT_LOOP
This code splits the elements into readable pieces for the database. In order to populate the tables in the database, the following view gives the results in a table-like record set:
SELECT
C.RecordID AS 'RecordID',
C.CatNumber AS 'Category',
C.SortOrder AS 'SortOrder',
MAX(CASE E.ElementNumber WHEN '0110' THEN E.ElementContent ELSE NULL END) AS 'E0110',
MAX(CASE E.ElementNumber WHEN '0120' THEN E.ElementContent ELSE NULL END) AS 'E0120',
MAX(CASE E.ElementNumber WHEN '0130' THEN E.ElementContent ELSE NULL END) AS 'E0130'
FROM dbo.tbl_RecordCategory C
LEFT JOIN dbo.tbl_Element E ON C.CatID = E.CatID
WHERE C.CatNumber = '01'
GROUP BY C.CatNumber, C.SortOrder, C.RecordID
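From there, loading the destination table is a single set-based statement; a hedged sketch, assuming the query above is saved as a view named vw_Category01 and the target table has matching columns:
INSERT INTO dbo.tbl_Category01 (RecordID, SortOrder, E0110, E0120, E0130)
SELECT RecordID, SortOrder, E0110, E0120, E0130
FROM dbo.vw_Category01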
The whole process loops through all categories and then through each element within them. This is about 20 loops each, hence roughly 40 set-based executions against the database. A file of 100,000 records is completely handled in the database in about two minutes. The alternative, handling each record separately, takes close to an hour. By changing the process, performance is roughly 25 times better.
As illustrated, there can be a great performance boost by altering the data-process. There is often more than one way to insert or manipulate the data in an OLAP database. By trying more than one method, insight can be gained on the options, flexibility, and processing speed. When working with large datasets, the objective is to handle as many records as possible in one set. This may result in a significant performance gain.
Nils Bevaart has been a DBA for over seven years and has had hands-on experience in server management, database design, data warehousing, and performance tuning. Starting in 2006, he set up a company specializing in data processing, covering issues of effective maintenance, dataflow, database architecture, consultancy, and training.
A proper, technically correct system and storage sizing assessment of a SQL Server system and storage configuration needs to consider a broad range of CPU, memory and IO usage analysis. The cost of this assessment might be $10-20K with proper accounting for personnel time, depending on whether there is an existing production environment from which forward projections can be made, or whether it has to be done entirely from theoretical models and a database prototype. If it turns out that a $20K server and storage system is already overkill for the final requirements, it is fair to ask whether a rigorous technical assessment makes sense at all. Fortunately, a reasonable set of system configurations can be arrived at without heavy technical analysis, driven mostly by the cost structure of the Windows operating system, SQL Server licensing, and the server platform. In cases where a large system with more than four processor sockets is required, it still does make sense to conduct a proper technical analysis.
Management Summary
Upper management frequently does not have time to internalize the detailed rationale for each system configuration decision and only wants a quick summary. The simple picture and answer, without the detailed underlying rationale, is presented below. In case it is not already apparent, the general theme is prolific IO bandwidth and random IOPS capability. In theory, the realizable sequential bandwidth over 2 internal SCSI and 8 external SCSI channels is 2.5GB/sec, but this has not been verified.
The table below is a quick summary of system configuration options for systems using the Intel E7230, E7520 and E8500 chipsets. The configurations are based on U320 SCSI storage technology. Over the next year, expect SCSI to be replaced by a combination of SAS and SATA as the technology of choice in a wide range of server and storage systems.
Processor Sockets | Single | Dual | Quad |
Chipset | E7230 | E7520 | E8500 |
Dual Core Processors | 1 | 2 | 4 |
Memory Sockets | 4 | 6-8 | 16-32 |
Native PCI-Express config. | 1 x8, 1 x4 | 3 x8 | 3 x8, 1x4 |
Embedded controller: | SATA | SCSI | SCSI |
SCSI Channels | Int. SATA | 2 Internal | 2 Internal |
External Storage Enclosures | 1 | 2 | 4 |
Disk Drives - Internal | 2-4 | 4-6 | 4-8 |
Sequential Disk Bandwidth | 600MB/sec+ | 1.2GB/sec+ | 2GB/sec+ |
*A single x8 slot can be configured as two x4 slots or converted to 2 PCI-X busses.
The software, system, memory, and storage choices are briefly described below.
One final point is of critical importance. Test the performance of the final configuration. What a disk configuration should be able to do and can actually do are not the same thing.
Software Versions
For practical purposes, the software licensing fee structure helps in narrowing the recommended set of system configurations. The table below shows the Windows Server 2003 R2 processor and memory support by edition for both 32 and 64-bit versions.
Windows Server 2003 R2 Processor and Memory support by Edition | Standard Ed. 32-bit / 64-bit | Enterprise Ed. 32-bit / 64-bit | Datacenter Ed. |
Max Processor (Sockets) | 4 / 4 | 8 / 8 | 32 / 64 |
Max Memory | 4GB / 32GB | 64GB / 1TB | 64GB / 1TB |
The table below shows SQL Server 2000 processor and memory support by edition. The SQL Server 2000 64-bit version is for Itanium only.
SQL Server 2000 Processor and Memory support by Edition | Standard Ed. 32-bit / 64-bit | Enterprise Ed. 32-bit / 64-bit | |
Max Processor (Sockets) | 4 | No Limit | |
Max Memory | 2GB | 64GB / 512GB |
The table below shows SQL Server 2005 processor and memory support by edition for 32-bit and 64-bit versions.
SQL Server 2005 Processor and Memory support by Edition | Workgroup | Standard Ed. 32-bit / 64-bit | Enterprise Ed. 32-bit / 64-bit |
Max Processor (Sockets) | 2 | 4 | No Limit |
Max Memory | 3GB | OS Max | OS Max |
The main point of the above is that the older software products restricted the ability to use memory. At the time of product launch, the restrictions were reasonable. The 3GB and AWE options were limited to the Enterprise Editions, which was not unreasonable in 2000, when 4GB of memory was expensive and required an expensive system as well. Since then, 8-16GB of memory has become reasonable at the Standard Edition price point, but the lack of new releases meant the only option was SQL Server 2000 Enterprise Edition. With the release of SQL Server 2005, memory restrictions have been lifted in Standard Edition. The OS does have memory restrictions, but these are more relaxed, especially for the 64-bit versions, and the price step to Enterprise Edition is not as steep.
In any case, the preferred solution is the 64-bit version of Windows Server 2003. Either Standard or Enterprise Editions will fit most needs. SQL Server 2005 is strongly favored over SQL Server 2000 because of the increased memory capability, more so for the Standard Edition. The 64-bit version has unimpeded access to >4GB memory compared with 32-bit versions, which must use AWE.
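For reference, on the 32-bit builds the memory above 4GB is reached through AWE, which is switched on at the server level; a hedged sketch (the memory cap is only an example value):
-- 32-bit SQL Server only: enable AWE and cap the buffer pool. The service account
-- also needs the "Lock Pages in Memory" privilege.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1
EXEC sp_configure 'max server memory (MB)', 12288 -- example value for a 16GB server
RECONFIGURE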
Platforms
For the very large majority of needs today, the 1, 2 and 4 socket systems are more than adequate. Most sites buy either the 2 or 4 socket system. However, given the capability of microprocessors today and the prevalent availability of multi-core processors, serious consideration should also be given to the single socket system when appropriate. Vendors are reluctant to recommend this system over higher margin dual socket systems.
The table below summarizes the processor options. Dual core processors are currently available for all options except Itanium, for which dual core should be available in mid or late 2006. The single core processors are usually available at a slightly higher frequency than the comparable dual core versions. The software licensing costs will strongly favor the dual core option except for code that must execute in a serialized manner.
Processor | Sockets | Notes |
AMD Opteron | 1-4 | General purpose, best overall characteristics. |
Intel Pentium D | 1 | Entry |
Intel Core Duo | 1 | Entry, low power, 32-bit only |
Intel Xeon | 2 | Mid-range, excellent table scan performance, Hyper-threading best for high network round-trip applications. |
Intel Xeon MP | 4-32 | Same |
Itanium | 2-64 | High-end, favored in full 64-bit, scale-up environments. Dual core in mid-2006. |
The AMD Opteron processor probably has the highest performance for a broad range of SQL Server operations in the early 2006 time frame. The Opteron also does exceptionally well where serialized memory accesses are required. The Intel Pentium D, Xeon and Xeon MP processor line is somewhat behind at this point in time, mostly due to thermal limitations capping the top operating frequency far below the maximum that could be supported by the transistor switching rate. The Pentium 4 / Xeon family does have the best table scan performance of all processor lines. The Hyper-Threading feature in this line does improve performance in high network round-trip volume applications. Unfortunately, HT does not improve performance uniformly for other operations, and may even cause performance degradation in some, so the use of HT requires careful analysis. HT does improve compression performance by 50% per core, which is very impressive. This characteristic benefits database backup and restore operations when using third-party backup compression software such as Quest (formerly Imceda) LiteSpeed.
The Intel Core Duo and its predecessor, the Pentium M, are 32-bit only processors, but have excellent performance and very low power consumption. They are most suited to 32-bit only environments and single socket systems, particularly in high density or otherwise power constrained environments. The Itanium 2 line is currently lagging in the 2-4 socket system space because of the delay in releasing a dual-core version and because the current 130nm Madison processor is competing against 90nm Opteron and Xeon processors. However, in high-end scale-up environments, Itanium systems warrant serious consideration for Windows and SQL Server based solutions. Unisys offers a high-end product line based on the Xeon MP processor. Prior to the introduction of 64-bit capability in the Xeon MP line, there were serious limitations in the use of 32-bit Xeon MP processors in large NUMA systems, mostly due to the limited 4GB address space. It is unclear today how the market for systems with more than four sockets will split between Itanium, Xeon MP and possible Opteron contenders, now that full 64-bit capability is available in all processor lines.
Below is a simplified representation of a single socket system based on the Intel E7230 chipset. The IO capability of this desktop derived chipset is quite impressive. There is one x8 PCI-Express port that is bridged into 2 PCI-X busses. The first at 100MHz has 2 slots and the second at 133MHz with 1 slot. Alternative configurations include a single x8 PCI-E port, two x4 PCI-E ports or one x4 PCI-E and one PCI-X bus. In general, PCI-E configurations should be preferred if PCI-E adapters are available. The DMI port connects the memory controller to the ICH7 with the same bandwidth as one x4 PCI-E port. The ICH7 has 4 SATA ports at 3.0Gbit/sec, 2 PCI-E x1 ports for gigabit Ethernet and one x4 PCI-E port.
The E7230 chipset should have no problems driving 4 internal SATA drives plus 2 RAID controllers and 4 SCSI channels. However, a single RAID controller with 2 SCSI channels should be adequate for this single socket system. It could be mentioned that the E7230 is derived from the 9XX desktop chipset line, which actually has one x16 PCI-E port for graphics and one DMI port, which is essentially a PCI-E x4 port. There is no reason the x16 graphics port could not have been configured as two x8 ports, except that IO capability is already beyond the needs for most single socket systems.
The figure below is a representation of a two socket system based on the Intel E7520 chipset. The E7520 has three x8 PCI-E ports and one HI interface for low bandwidth legacy devices. Each x8 PCI-E port has a very impressive 2GB/sec nominal bandwidth in each direction. The actual sustainable bandwidth has not been verified.
As shown, one x8 PCI-E port is bridged into two PCI-X busses. Each PCI-X bus has an embedded PCI-X device in addition to the available open slots. Alternative configurations for the E7520 chipset include 3 x8 PCI-E ports, or 1 x8 PCI-E and four PCI-X busses, two each on two IO bridges. Each x8 PCI-E port can also be configured as two x4 ports. For the two socket system, 4-8 internal SCSI disks on the two embedded SCSI channels and two dual channel RAID controllers for a total of 2 internal and 4 external SCSI channels is a good combination.
The figure below shows a four socket system based on the Intel E8500 chipset. There are two independent processor busses, each capable of supporting two processor sockets for a total of four sockets. There are the four independent memory interfaces (IMI) that attach to the north bridge. Each IMI channel can be expanded with an external memory bridge (XMB) device into two DDR-II memory channels. In the IO subsystem, there are three x8 PCI-E ports, one x4 port and the legacy HI link for legacy IO devices. As with the other chipsets, each x8 port can be configured as two x4 ports. As shown below, one x8 port is bridged into two PCI-X busses, with embedded SCSI and GE. One x8 port is left as is, the third x8 port is configured as x4 ports for a configuration of one 1 x8 and three x4 PCI-E slots.
The disk configuration is 4-8 internal disks on the two embedded SCSI channels and four dual channel RAID controllers on each of the available PCI-E slots, for a total of 10 SCSI channels, 8 connected to four external dual-channel disk enclosures. The objective of this configuration is to achieve >2GB/sec sequential disk bandwidth. It is unclear whether the E8500 can actually sustain this bandwidth, but at least 1.5GB/sec has been verified.
In any case, both the E7520 and E8500 have very impressive IO bandwidth capability. If there is any disappointment, it is the continued use of the HI interface for legacy devices, considering that the much improved DMI interface is available on desktop chipsets. It could be that Intel did not want to risk a delay in the launch of the E7520 and E8500 chipsets on non-essential new technology, but a follow-on chipset could have corrected this deficiency.
The AMD Opteron chipset has been verified to sustain 2.4GB/sec in disk bandwidth. (This was reported by Microsoft Research).
The technically correct assessment of memory requirement should consider disk load versus memory to determine the best cost and system characteristics. The cost of the technical assessment will probably exceed the cost of memory except for very large systems. In most cases, a reasonable memory configuration can be arrived at from the operating system and SQL Server edition support and the system DIMM sockets. The table below shows the common number of DIMM sockets by system type.
Processor | Chipset | Processor Sockets | DIMM Sockets | Max Memory |
Opteron | 8000 | 2 | 8 | 16-32GB |
Opteron | | 4 | 32 | 32-128GB |
Pentium D | E7230 | 1 | 4 | 8GB |
Xeon | E7520 | 2 | 6-8 | 12-16GB |
Xeon MP | E8500/8501 | 4 | 16-32 | 32-64GB |
The table below shows the current price of DDR2 PC2-3200 registered ECC memory from Crucial (www.crucial.com) as of Mar 2006.
Capacity | Mar-2006 Price |
512MB | $100 |
1GB | $190 |
2GB | $1000 / $550 |
At any given point in time, the high volume DIMM products will have an essentially linear price-capacity relation. The low volume, high capacity DIMM products will have a much higher price on a per GB basis. Approximately every two years, the high-end DIMM doubles in capacity. Until recently, the price of the 2GB ECC DIMM was more than quadruple the price of the 1GB ECC DIMM, or twice the cost per GB. Just recently, a new 2GB DIMM SKU has appeared at a much more favorable price relative to the 1GB DIMM, while the original 2GB DIMM retains the unfavorable price. It will probably be another two years before the 4GB ECC DIMM becomes favorably priced relative to the 2GB DIMM.
In most cases it is simpler to fill the system with the largest capacity commodity DIMM, currently a 1GB or 2GB module, than to conduct a detailed analysis, except when otherwise limited by OS or SQL Server edition. If it is known that disk IO is sensitive to memory and the cost of the storage system is high, then the large capacity DIMM can make sense.
Storage System
The biggest and most frequent mistake people make with regard to the storage system is in considering only capacity as the primary requirement. Storage system vendors should know better, but tend to propagate this mistake. The storage system needs to be sized for both performance and capacity. Storage performance is further distinguished in terms of random IO capability and sequential transfer bandwidth. Given the capacity of individual disk drives today, it actually turns out that on meeting storage performance requirements, capacity is essentially irrelevant.
In most cases, one should buy the lowest sensible capacity 10K or 15K disk drives. Today, the sensible capacities are the 73GB 10K and the 36GB 15K disk drives. Both drives are typically $200 per disk when purchased as bare drives. Prices from system vendors can be 50% higher. The cost per drive in a SAN system can be over $2000 per drive. When the cost of the next capacity up (146GB 10K and 73GB 15K) is less than 30% higher, those become the lowest sensible capacity drives. The reason one should not wait for price parity is that a larger drive can be short-stroked for better random IO performance for a fixed data size. In terms of cost per IOP, the 10K and 15K drives are reasonably comparable. The storage configuration with the 10K drives will have higher capacity, while the 15K version will have lower latency in random IO operations.
As mentioned in the previous section, the technically correct assessment of storage performance should consider the memory versus disk system performance characteristics. Without knowing the actual application characteristics and needs, the following can be used as a minimum baseline configuration. Most transactional databases will perform both transactions and some amount of reporting, even if replication is used to move many reports to a replicated database.
Reporting queries will frequently involve either large table scans or sizeable index range scans involving a large number of rows. If these operations need to go to disk for data, the storage system could be saturated and effectively shutdown the system transaction processing capability. One reason is that the report query may generate a large number of disk read-ahead calls, which leads to a high queue depth in the disk system. Normal transactions may generate only a handful of disk reads, probably without read-ahead reads. So even if the intended priority of the transaction is higher than that of the report, in effect, the storage system has been prioritized to favor the report IO requests. Transaction driven disk reads will sit behind a long line of report driven disk reads.
Now it turns out SQL Server 2000 has limited ability to read from disk in a table scan operation. Without hints, SQL Server 2000 table scan operation in most systems will generate between 300-400MB/sec in disk reads. With the NOLOCK hint, the table scan might generate between 700-800MB/sec in disk traffic. If the configured storage system has less sequential capacity than this, a table scan can saturate the disk system, and cause a substantial degradation in transaction driven disk IO. If the storage system can handle both the sequential table scan and transaction random IO load, then there may be little to no degradation in transaction throughput, depending on the availability of CPU and other resources.
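A hedged illustration of the difference (the table name is made up):
-- Plain table scan: SQL Server 2000 limits read-ahead to roughly 300-400MB/sec.
SELECT COUNT(*) FROM dbo.tbl_SalesHistory

-- With NOLOCK the scan can push roughly twice the disk traffic, at the cost of
-- reading uncommitted data.
SELECT COUNT(*) FROM dbo.tbl_SalesHistory WITH (NOLOCK)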
SQL Server 2005 has no such limitation in sequential disk operations. The SQL Server 2005 table scan operation on the HP Superdome with dozens of dual-channel U320 SCSI RAID controllers and 3-5 disks in each channel can drive over 12GB/sec in disk traffic. So it may not be possible or practical to configure the storage system to exceed the capability of a SQL Server 2005 table scan. On the positive side, it will also be possible to complete a report query that much quicker, thus minimizing the period during which transactions are disrupted. An infrequent 2-3 sec disruption should be barely noticeable. This cautions against the setting of arbitrary Service Level Agreements on response time without a comprehensive and detailed understanding of database and system performance characteristics, especially so if the user does not perceive a meaningful impact. It is always better to specify a high percentage of responses under a reasonable time, and allow for a small percentage at a longer interval, unless true absolute requirements mandate otherwise.
The storage performance goals are then as follows. Sequential bandwidth for the SQL Server main database data files should exceed 400-800MB/sec, depending on requirements. The recommended minimum configuration to meet this goal is described for SCSI, SAS and FC storage systems.
Direct Attach Storage on U320 SCSI Technology
The rationale for using the internal drive bays is that the drive bays are already paid for, along with an embedded controller. The bare drive cost is relatively low compared with the fully amortized external storage cost covering the disk drive, the external enclosure and the controller. Even in a clustered environment where the clustered resources must be shared, it never hurts to have extra bandwidth on a non-shared resource.
The rationale for external storage is as follows. Each U320 SCSI channel with a nominal bandwidth of 320MB/sec can in fact sustain approximately 240-260MB/sec in large block sequential disk transfers. It takes three current generation disk drives to saturate a single U320 channel. It is not always possible to use the fastest portion of each disk drive, or to achieve purely sequential transfers, and the external enclosures typically have a minimum of 7 bays per channel. So, for this combination of reasons, anywhere from 4-7 disks per SCSI channel is practical. External 3U SCSI storage enclosures are available with single or dual SCSI channels. Configuring more than 7 disks per channel is a waste of sequential bandwidth capability, so avoid single channel enclosures unless a very high spindle count is required that would exceed the capacity of up to 7 disks per channel. Obsolete technologies tend to persist in storage products, so avoid the U160 enclosures.
The figures below show the sequential disk read and write characteristics for a current generation 36GB 15K SCSI drive. The front portion of a 15K drive can transfer data at over 90MB/sec. The inner portion is still over 50MB/sec. Data placement can affect performance. A common practice is to use the front portion of each disk drive for the most performance critical data, the middle for less critical data, and the end for backup and auxiliary storage.
The choice of RAID controllers then falls to either dual or quad channel controllers. The PCI-X bus at 133MHz and 64-bit has a nominal bandwidth of 1GB/sec, which might seem appropriate for four U320 channels capable of sustaining approximately 250MB/sec each. However, it is not clear that any of the current quad-channel PCI-X SCSI controllers can actually sustain more than 500-600MB/sec. So the best choice is the dual channel model unless a very high spindle count is required that would exceed the capacity of dual channel controllers.
Fiber Channel / SAN Storage
Fiber channel was designed for the high-end storage market. For some reason, the FC/SAN vendors did not increase FC bandwidth from 2Gbit/sec to 4Gbit/sec in an expeditious time frame. In the meantime, U320 SCSI is being replaced by 3Gbit/sec SAS, and even the original 1.5Gbit/sec SATA is being replaced with an improved 3Gbit/sec version. One might argue that FC at 4Gbit/sec, finally becoming available after a long pause since the 2Gbit/sec launch, is now faster than SAS and SATA. However, the SAS and SATA 3Gbit/sec links are meant to connect 1-4 disk drives, while SAN systems intend to cram 14-28 disks over a single 2-4Gbit/sec link, severely constraining sequential bandwidth.
Another issue with mid-range SAN systems is that most are built from components adapted from existing server systems, either the Xeon processor with the ServerWorks GC-LE chipset, or RISC processors and chipsets with even less capability. At the time, none of the existing chipsets could deliver massive IO bandwidth the way current chipsets can. So the mid-range SAN systems from 2003-2005 were not the most practical solutions for delivering high sequential bandwidth at a reasonable price point compared with direct attach storage options.
For example: in the EMC Clarion line, the CX 700 supports 8 FC ports to hosts and 8 ports to storage. Configuring 8 racks of 14 drives over 8 FC ports from storage to SAN to host should deliver nearly the full 1.3GB/sec maximum sustained transfer rate of the CX 700. For the HP EVA line, the underlying storage controllers are the HSV110/210, which have 2 ports to hosts, and 4 ports to storage. Each rack of 14 disks connected to a single HSV controller can deliver 300MB/sec sequential bandwidth, but more drives are required for high write bandwidth. For best overall performance and practicality, 2 racks of disks per HSV controller is probably the better balance over 1 disk rack per HSV controller.
Serial Attached SCSI (SAS)
The recommended configuration for the newer SAS technology drives is not yet clear. Of the major vendors, only HP currently offers SAS with the external storage unit holding 10 SFF drives in a 1U enclosure. The SAS RAID controller has 8 3Gbit/sec ports. Based on current disk drives, 2-3 disk drives per port is probably reasonable, pending verification that a single adapter can handle the combined sequential throughput of 12-18 disk drives. Otherwise consider 1-2 disk drives per port. Two 6 port adapters, 2 1U storage units and the internal bays should meet the basic SQL Server 2000 storage performance level. When configuring for higher sequential transfer rate, verify the bandwidth possible in each configuration to determine the best controller – disks combination.
SATA Controllers and Disk Drives
Many people today ask about SATA controllers and disk drives, especially now that native command queuing (NCQ) is finally supported by both SATA controllers and disk drives. The low cost and high capacity of SATA drives is particularly attractive. The main issue is that most SATA drives are 7200RPM drives targeted at the desktop market. The design life of a desktop drive is typically 1-2 years at 20% duty cycle, while most server drives are designed for 5 years at 100% duty cycle. Primarily for this reason, SATA drives should be restricted to development environments, possibly some data warehouse environments, and potentially backup storage until SATA drive reliability is better understood. SATA drives should not be used in production transaction processing environments at this time.
Summary
The main points are as follows. The 64-bit version of Windows Server 2003 has much expanded memory capability, even in Standard Edition. SQL Server 2005 Standard has removed memory constraints. The 64-bit version has full access to large address space, while the 32-bit version must use the PAE/AWE combination. Dual core processors are preferred over single core processors in most circumstances due to the favorable licensing terms. Storage configuration should factor random and sequential IO performance as the primary consideration. This means distributing IO load over as many spindles (disk drives) and IO channels as practical for the system. As always, verify the actual performance characteristics of the final configuration.
Enterprise-grade reporting and data analysis have become more decision oriented with Microsoft's SQL Server 2005. Moreover, the low cost of the software has kept down the expense of building and deploying applications. SQL Server 2005, being a highly reliable database platform for vital applications, is capable of delivering high levels of performance, security, and availability.
ASP.NET 2.0 lets developers add data to a Web site in a manner that is almost identical to how they would normally add another server control with the help of the control-based approach to data access. The performance of your Web application can be enhanced with the introduction of several new features of ASP.NET 2.0. Combined with SQL Server 2005, the new features of ASP.NET 2.0 allow data to be accessed only when changes have taken place in the database, resulting in extraordinary database scalability enhancements.
By default, almost all ASP.NET 2.0 application services are configured to use the built-in SQL Express provider. When you start using any of these application services for the first time, this provider will automatically create a new database and offer a simple way to get started without any setup problems.
SQL Server 2005 can be installed in two forms: the full-blown version or the compact Express edition. The default is SQL Express; when developing large-scale N-tier applications, the full version is preferred.
To use a complete SQL Server 2005 database instance in place of SQL Express for applications development, you need to follow the steps below.
The first step you should take is to create or obtain a connection string to an empty SQL database.
Run the aspnet_regsql.exe utility, installed with ASP.NET 2.0 under C:\WINDOWS\Microsoft.NET\Framework\v2.0.xyz\aspnet_regsql.exe.
You can also run the aspnet_regsql.exe utility from the command line. This will start the wizard and display the Welcome screen.
You can use this wizard to walk through the process of creating the schema, tables, and sprocs for the built-in SQL providers that come with ASP.NET 2.0:
The first screen that appears is Welcome to the ASP.NET SQL Server Setup Wizard. Click Next to proceed with the wizard.
On the next screen (Select a Setup Option), you will find two setup options.
You need to select the Configure SQL Server for application services radio button to run a script that will configure an existing database or generate a new database for storing the ASP.NET membership, profiles, role management, personalization, and SQL Web event provider information.
Once selected, click Next to proceed.
Now you need to specify your Connection Information.
First specify your SQL Server name. Then choose the authentication method that will subsequently provide SQL Server 2005 access to all your Windows users.
Select Windows Authentication if you want to use your Windows user name and password. Select SQL Server Authentication by specifying the SQL Server username and password if you want to restrict access to SQL Server 2005 for other users.
Finally, select the database that you need to configure to support ASP.NET 2.0 features.
Once done, click Next to proceed.
On the Confirm Your Settings screen, click Previous to change your settings or click Next to proceed.
At this point you will get a screen telling you that the database has been created or modified, which implies that all the database schema and sprocs to support the application services have been installed and configured. Now just simply click Finish to exit the wizard.
Note: Your database administrator (DBA) can see the exact process running in the background by viewing the raw .sql files below the above framework directory. Your DBA can go through those raw .sql files and run them manually to install the database.
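If you prefer to script the setup rather than click through the wizard, aspnet_regsql.exe also accepts command-line switches; a hedged example with placeholder server and database names:
aspnet_regsql.exe -S (local) -E -d appservicesdb -A all
Here -S names the server, -E uses Windows authentication (-U and -P take a SQL Server login instead), -d names the database, and -A all installs all of the application services.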
Your web.config file in ASP.NET 2.0 now supports a new section called connectionStrings.
Another feature of ASP.NET 2.0 is that it supports encrypting any section stored in the web.config files. This means you can securely store private data like connection strings without writing any of your own encryption code. By default, the built-in connection string called "LocalSqlServer" that comes with ASP.NET 2.0 points to a SQL Express database, which in turn is used by the Membership, Roles, Personalization, Profile, and Health Monitoring services.
Replacing the connection string value of "LocalSqlServer" in your application's local web.config file makes your application automatically take advantage of your newly created SQL database. For example, if you create your database on the local machine in an "appservicesdb" database and connect to it using Windows Integrated security, you should change your local web.config file to specify this:
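A hedged sketch of what that override might look like for the "appservicesdb" example; the exact attribute values depend on your environment:
<connectionStrings>
<remove name="LocalSqlServer"/>
<add name="LocalSqlServer" connectionString="Data Source=(local);Initial Catalog=appservicesdb;Integrated Security=True" providerName="System.Data.SqlClient"/>
</connectionStrings>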
Save this information to start using the newly created and defined SQL Server database for all of the built-in application services.
Note: The use of the "LocalSqlServer" connection string name is the only disadvantage of the above approach, as it seems odd once the database lives on another machine. You can give it your own name by adding an entirely new connection string and pointing the existing providers at that new connection string name instead of the default LocalSqlServer.
The list of common DBA tasks includes setting up a server performance baseline and comparing the performance of two or more servers. We often have to do this urgently when a customer calls and expects us to identify and remove a performance bottleneck right away. The following article includes code and hands-on instructions on how to automate such tedious tasks as reusing preset monitor parameters and analyzing the results. Given the ever-increasing size of databases and the importance of database performance, this article can also help you monitor and tune the performance of your system, as well as improve your performance as a DBA.
The Performance Monitor console for Microsoft Windows 2000 Server can monitor both the operating system and the utilization of SQL Server resources. Unfortunately, it stores the results of the monitoring as a binary file or comma delimited text file, which is inconvenient for further analysis. Although Windows XP and Windows 2003 can write directly to the SQL Server database, the format of the stored data is also hard to use. However, we can work around this problem.
A typical monitoring setup includes the following scripts: a Counter Log settings file (saved as an HTML file), a script that creates the destination table, and a script that imports the logged data into that table.
The typical SQLperfmon Counter Logs I usually use contain the following counters:
Memory counters
- Pages/sec
- Page Reads/sec
Disk counters for every physical disk
- Avg. Disk Queue Length,
- Avg. Disk sec/Transfer
Processor counters
- % Processor Time - _Total
System counters
- Processor Queue Length
SQL Server counters
- Access Methods - Page Splits/sec
- General Statistics - User Connections
- Buffer Manager - Buffer Cache Hit Ratio
Looking at the Counter Log HTML file, you'll notice that a computer name is not mentioned anywhere in the script, so this SQLperfmon.htm file can be used on any machine running Windows 2000 Server with SQL Server 2000. Here is the script for SQLperfmon.htm:
In order to prepare and execute the monitoring processes you should perform the following steps on the server you will monitor:
Another monitoring setup that could be useful for SQL Server performance monitoring was recommended in "Performance Monitoring — Basic Counters" by Steve Jones. It also uses the Counter Log file with the corresponding table and import scripts. I included all the counters recommended in that article except the "Network Interface Object" because it uses a particular network card name, which may not be the same on different servers. The following script assumes that the monitored server has a physical disk, "0 C:", and that the Transactions/Sec counter will check the "pubs" database.
SJperformanceset.htm:
SJperformanceset Table Script:
CREATE TABLE [SJperformanceset] (
[CounterDateTime] [datetime] NOT NULL,
[Available MBytes] [numeric](18, 2) NULL,
[Page Reads/sec] [numeric](18, 2) NULL,
[Percent Idle Time] [numeric](18, 2) NULL,
[Avg Disk Queue Length] [numeric](18, 2) NULL,
[Processor Time] [numeric](18, 2) NULL,
[Full Scans/sec] [numeric](18, 2) NULL,
[Cache Hit Ratio] [numeric](18, 2) NULL,
[Transactions/Sec] [numeric](18, 2) NULL,
[User Connections] [numeric](18, 2) NULL,
[Average Wait Time] [numeric](18, 2) NULL,
[Processor Queue Length] [numeric](18, 2) NULL
)
GO
ALTER TABLE [SJperformanceset] WITH NOCHECK ADD
CONSTRAINT [PK_SJperformanceset] PRIMARY KEY CLUSTERED
(
[CounterDateTime]
)
GO
Insert Monitoring Data Script:
INSERT INTO [SJperformanceset] (
[CounterDateTime]
,[Available MBytes]
,[Page Reads/sec]
,[Percent Idle Time]
,[Avg Disk Queue Length]
,[Processor Time]
,[Full Scans/sec]
,[Cache Hit Ratio]
,[Transactions/Sec]
,[User Connections]
,[Average Wait Time]
,[Processor Queue Length]
)
SELECT
[(PDH-CSV 4#0) (Pacific Standard Time)(480)]
,cast([\Memory\Available MBytes] as float)
,cast([\Memory\Pages/sec] as float)
,cast([\PhysicalDisk(0 C:)\% Idle Time] as float)
,cast([\PhysicalDisk(0 C:)\Avg# Disk Queue Length] as float)
,cast([\Processor(_Total)\% Processor Time] as float)
,cast([\SQLServer:Access Methods\Full Scans/sec] as float)
,cast([\SQLServer:Buffer Manager\Buffer cache hit ratio] as float)
,cast([\SQLServer:Databases(pubs)\Transactions/Sec] as float)
,cast([\SQLServer:General Statistics\User Connections] as float)
,cast([\SQLServer:Locks(_Total)\Average Wait Time (ms)] as float)
,cast([\System\Processor Queue Length] as float)
FROM OpenDataSource( 'Microsoft.Jet.OLEDB.4.0',
-- csv file on a local drive
'Data Source=C:\PerfLogs;Extended properties=Text')...SJperformanceset_02272122#csv
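One caveat: if you run this import on SQL Server 2005 rather than SQL Server 2000, ad hoc access through OpenDataSource is disabled by default and has to be enabled first; a hedged sketch:
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'Ad Hoc Distributed Queries', 1
RECONFIGURE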
If you need to consistently monitor the database server's performance and analyze the log data, you can run both the file copying process and the data importing process using the scheduler.
A command to copy a file is shown below:
copy /Y SJperformanceset_03010952.csv \\yeto\c$\perflogs
The /Y option overwrites the existing performance log file without prompting.
The following template of an import script with an added WHERE clause will insert only new records into the SJperformanceset table — those added after the last insert:
INSERT INTO [SJperformanceset] (
[CounterDateTime]
...
)
SELECT
[(PDH-CSV 4#0) (Pacific Standard Time)(480)]
...
FROM OpenDataSource( 'Microsoft.Jet.OLEDB.4.0',
-- csv file on a local drive
'Data Source=C:\PerfLogs;Extended properties=Text')...SJperformanceset_02272122#csv
-- add new records only
WHERE [(PDH-CSV 4#0) (Pacific Standard Time)(480)] > (SELECT max([CounterDateTime]) FROM SJperformanceset)
The monitoring sets provided in this article should be enough for most of your database performance analysis, but you can modify the monitoring set to your liking by adding or deleting particular counters and modifying the tables and import scripts accordingly. Using these examples, you can also create your own monitoring tool set for fine-grained monitoring of resource utilization in particular areas such as memory, I/O, processor, and so on.
Predefined monitoring sets will increase your ability to set up a performance baseline, to compare the performance of multiple servers, to find possible performance bottlenecks, and to monitor the overall health of your system. You can also use these sets for additional system analysis, generating reports, and predicting future problems.
Ranking functions, introduced in SQL Server 2005, are a great enhancement to Transact-SQL. Many tasks, like creating arrays, generating sequential numbers, finding ranks, and so on, which in pre-2005 versions required many lines of code, can now be implemented much more easily and quickly.
Let's look at the syntax of ranking functions:
ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)
RANK () OVER ([<partition_by_clause>] <order_by_clause>)
DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
NTILE (integer_expression) OVER ([<partition_by_clause>] <order_by_clause>)
All four functions have "partition by" and "order by" clauses, which makes them very flexible and useful. However, there is one nuance in the syntax that deserves your attention: the "order by" clause is not optional.
Why should you worry about the "order by" clause?
Well, as a DBA or database programmer you know that sorting is a fairly expensive operation in terms of time and resources. And if you were forced to use it always, even in a situation where you didn't need it, you could expect degradation of performance, especially in large databases.
Is it possible to avoid sorting in ranking functions? If possible, how would it improve performance?
Let's try to answer these questions.
Create a sample table (Listing 1):
-- Listing 1. Create a sample table.
CREATE TABLE RankingFunctions(orderID int NOT NULL);
INSERT INTO RankingFunctions VALUES(7);
INSERT INTO RankingFunctions VALUES(11);
INSERT INTO RankingFunctions VALUES(4);
INSERT INTO RankingFunctions VALUES(21);
INSERT INTO RankingFunctions VALUES(15);
Run the next query with the ROW_NUMBER() function:
SELECT ROW_NUMBER () OVER (ORDER BY orderID) AS rowNum, orderID
FROM RankingFunctions;
If you check the execution plan for that query (see Figure 1), you will find that the Sort operator is very expensive and costs 78 percent.
Run the same query, leaving the OVER() clause blank:
SELECT ROW_NUMBER () OVER () AS rowNum, orderID
FROM RankingFunctions;
You will get an error:
Msg 4112, Level 15, State 1, Line 1
The ranking function "row_number" must have an ORDER BY clause.
Since the parser doesn't allow you to avoid the "order by" clause, maybe you can force the query optimizer to stop using the Sort operator. For example, you could create a computed column that consists of a simple integer, 1, and then use that virtual column in the "order by" clause (Listing 2):
-- Listing 2. ORDER BY computed column.
-- Query 1: Using derived table.
SELECT ROW_NUMBER () OVER (ORDER BY const) AS rowNum, orderID
FROM (SELECT orderID, 1 as const
FROM RankingFunctions) t1
GO
-- Query 2: Using common table expression (CTE).
WITH OriginalOrder AS
(SELECT orderID, 1 as const
FROM RankingFunctions)
SELECT ROW_NUMBER () OVER (ORDER BY const) AS rowNum, orderID
FROM OriginalOrder;
If you check the execution plans now (see Figure 2), you will find that query optimizer doesn't use the Sort operator anymore. Both queries will generate the row numbers and return the orderID values in the original order.
RowNum | orderID |
1 | 7 |
2 | 11 |
3 | 4 |
4 | 21 |
5 | 15 |
There is a small problem with the queries in Listing 2 — they need time (resources) to create and populate the virtual column. As a result, the performance gains that you achieve by avoiding the sort operation may disappear when you populate the computed column. Is there any other way to skip the sort operation?
Let's try to answer this question.
The "order by" clause allows the expressions. The expression can be simple, constant, variable, column, and so on. Simple expressions can be organized into complex ones.
What if you talk to the query optimizer in the language of expressions? For example, try using a subquery as an expression:
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT orderID FROM RankingFunctions)) AS rowNum, orderID
FROM RankingFunctions;
No, you can't bypass the parser. You will get an error:
Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
O-o-o-p-s, here's the hint! The expression (or in our case, the subquery) has to produce a single value.
This should work:
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT MAX(OrderID) FROM RankingFunctions)) AS rowNum, orderID
FROM rankingFunctions;
Bingo! That query is working exactly as you wanted — no Sort operator has been used.
Now you can write an expression in the "order by" clause that returns a single value, forcing the query optimizer to refrain from using a sort operation.
By the way, the solutions in Listing 2 worked because the computed column holds the same integer value in every row, so it was effectively treated as a single value.
Here are some more examples of expression usage in an "order by" clause (Listing 3):
-- Listing 3. Using an expression in an ORDER BY clause.
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT 1 FROM sysobjects WHERE 1<>1)) AS rowNum, orderID
FROM RankingFunctions;
GO
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT 1)) AS rowNum, orderID
FROM RankingFunctions;
GO
DECLARE @i as bit;
SELECT @i = 1;
SELECT ROW_NUMBER () OVER (ORDER BY @i) AS rowNum, orderID
FROM RankingFunctions;
Figure 3 shows the execution plans for the queries in Listing 3.
Before we move forward, we should check the correctness of the solutions for the rest of the ranking functions.
Let's create a few duplicates in the RankingFunctions table and start testing the RANK() and DENSE_RANK() functions:
-- Listing 4. RANK() and DENSE_RANK() functions with expressions in an ORDER BY clause.
-- Create duplicates in table RankingFunctions.
INSERT INTO RankingFunctions VALUES(11);
INSERT INTO RankingFunctions VALUES(4);
INSERT INTO RankingFunctions VALUES(4);
GO
-- Query 1: (ORDER BY orderID).
SELECT RANK () OVER (ORDER BY orderID) AS rankNum,
DENSE_RANK () OVER (ORDER BY orderID) AS denseRankNum,
orderID
FROM RankingFunctions;
GO
-- Query 2: (ORDER BY expression).
SELECT RANK () OVER (ORDER BY (SELECT 1)) AS rankNum,
DENSE_RANK () OVER (ORDER BY (SELECT 1)) AS denseRankNum,
orderID
FROM RankingFunctions;
GO
If you check the execution plans (see Figure 4), you will find that the first query in Listing 4 requires a lot of resources for sorting. The second query doesn't have a Sort operator. So the queries behave as expected.
However, when you run the queries, the second result will be wrong:
Query 1 retrieves the correct result:
RankNum | denseRankNum | orderID |
1 | 1 | 4 |
1 | 1 | 4 |
1 | 1 | 4 |
4 | 2 | 7 |
5 | 3 | 11 |
5 | 3 | 11 |
7 | 4 | 15 |
8 | 5 | 21 |
Query 2 retrieves the wrong result:
rankNum | denseRankNum | orderID |
1 | 1 | 7 |
1 | 1 | 11 |
1 | 1 | 4 |
1 | 1 | 21 |
1 | 1 | 15 |
1 | 1 | 11 |
1 | 1 | 4 |
1 | 1 | 4 |
Even though the expressions in the "order by" clause help to skip sorting, they can't be applied to the RANK() and DENSE_RANK() functions. Apparently, these ranking functions must have a sorted input to produce the correct result.
Now let's look at the NTILE() function:
-- Listing 5. NTILE() function with expressions in an ORDER BY clause.
-- Query 1: ORDER BY orderID.
SELECT NTILE(3) OVER (ORDER BY orderID) AS NTileNum, orderID
FROM RankingFunctions;
GO
-- Query 2: ORDER BY expression.
SELECT NTILE(3) OVER (ORDER BY (SELECT 1)) AS NTileNum, orderID
FROM RankingFunctions;
Analyzing the execution plans for both queries (see Figure 5), you will find that the optimizer uses Nested Loops to implement the NTILE() function in both cases, and that the first query also includes a Sort operator while the second does not.
Now that you know how to avoid sorting in ranking functions, you can test their performance.
Let's insert more rows into the RankingFunctions table (Listing 6):
-- Listing 6. Insert more rows into the RankingFunctions table.
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[RankingFunctions]') AND type in (N'U'))
DROP TABLE RankingFunctions
SET NOCOUNT ON
CREATE TABLE RankingFunctions(orderID int NOT NULL);
INSERT INTO RankingFunctions VALUES(7);
INSERT INTO RankingFunctions VALUES(11);
INSERT INTO RankingFunctions VALUES(4);
INSERT INTO RankingFunctions VALUES(21);
INSERT INTO RankingFunctions VALUES(15);
DECLARE @i as int, @LoopMax int, @orderIDMax int;
SELECT @i = 1, @LoopMax = 19;
WHILE (@i <= @LoopMax)
BEGIN
SELECT @orderIDMax = MAX(orderID) FROM RankingFunctions;
INSERT INTO RankingFunctions(OrderID)
SELECT OrderID + @orderIDMax FROM RankingFunctions;
SELECT @i = @i + 1;
END
SELECT COUNT(*) FROM RankingFunctions;
-- 2,621,440.
UPDATE RankingFunctions
SET orderID = orderID/5
WHERE orderID%5 = 0;
The INSERT and SELECT parts of the INSERT…SELECT statement use the same RankingFunctions table.
The number of generated rows can be calculated as:
generated rows number = initial rows number * power(2, number of loop iterations)
Since RankingFunctions initially has 5 rows and @LoopMax = 19, the number of generated rows will be:
5 * POWER(2,19) = 2,621,440
To increase the entropy in the row order, I changed (updated) the orderID values in the rows where orderID can be divided by 5 without the remainder.
Then I tested the INSERT and DELETE commands, using ranking functions with and without sorting (Listing 7 and Listing 8).
-- Listing 7. Performance tests 1 (Inserts, using SELECT ...INTO).
-- Query 1: Using ORDER BY orderID.
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].RankingFunctionsInserts') AND type in (N'U'))
DROP TABLE RankingFunctionsInserts;
GO
SELECT ROW_NUMBER () OVER (ORDER BY OrderID) AS rowNum, OrderID
INTO RankingFunctionsInserts
FROM RankingFunctions;
-- Drop table RankingFunctionsInserts and run Query 2.
-- Query 2: Without sorting.
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT 1)) AS rowNum, OrderID
INTO RankingFunctionsInserts
FROM RankingFunctions;
-- Drop table RankingFunctionsInserts and run Query 3.
-- Query 3: Using a pre-2005 solution.
SELECT IDENTITY(int,1,1) AS rowNum, orderID
INTO RankingFunctionsInserts
FROM RankingFunctions;
Each of the three queries in Listing 7 inserts the generated row number and orderID into the RankingFunctionsInserts table, using the SELECT…INTO statement. (This technique is very helpful when you're trying to create pseudo-arrays in SQL.)
For the sake of curiosity, I tested a solution with an IDENTITY column (Query 3). That solution is very common in pre-2005 versions of SQL Server.
-- Listing 8. Performance tests 2 (Delete every fifth row in the RankingFunctions table).
-- Query 1: Without sorting.
-- Run the script from Listing 6 to insert 2,621,440 rows into RankingFunctions.
WITH originalOrder AS
(SELECT ROW_NUMBER ( ) OVER (ORDER BY (SELECT 1)) AS rowNum, OrderID
FROM RankingFunctions)
DELETE originalOrder WHERE rowNum%5 = 0;
-- Query 2: With ORDER BY OrderID.
-- Run the script from Listing 6 to insert 2,621,440 rows into RankingFunctions.
WITH originalOrder AS
(SELECT ROW_NUMBER ( ) OVER (ORDER BY OrderID) AS rowNum, OrderID
FROM RankingFunctions)
DELETE originalOrder WHERE rowNum%5 = 0;
Deleting every Nth row or duplicates in the table are common tasks for a DBA or database programmer. In Listing 8, I used CTE to delete every fifth row in the RankingFunctions table.
Here are the results that I got on a regular Pentium 4 desktop computer with 512 MB RAM running Windows 2000 Server and Microsoft SQL Server 2005 Developer Edition:
 | ROW_NUMBER() | RANK() | DENSE_RANK() | NTILE(3) |
INSERT 2,621,440 rows | ||||
without sorting | 5 sec. | N/A | N/A | 35 sec. |
with sorting | 14 sec. | 14 sec. | 14 sec. | 40 sec. |
with IDENTITY | 8 sec. | N/A | N/A | N/A |
DELETE each 5th row | ||||
without sorting | 5 sec. | |||
with sorting | 24 sec. |
As you can see, the ROW_NUMBER() function works much faster without sorting. It also performs better than the IDENTITY solution, which is unsorted as well.
The RANK() and DENSE_RANK() functions, as we found earlier, don't work properly without sorting. NTILE() shows a very small improvement, about 10 percent. This can be explained.
As I mentioned earlier, the optimizer is using Nested Loops to implement the NTILE() function. For large data sets, without the indexes (as in our case), Nested Loops can be very inefficient. However, you will find that they are inexpensive in the execution plan (see Figure 6), because sorting helps to make Nested Loops lighter.
When sorting is missing (see Figure 7), the Nested Loops become much heavier and almost "eat" the performance gains that you achieve by avoiding sorting.
As you know, all the pages of non-clustered indexes, and the intermediate-level pages of clustered indexes, are linked together and sorted in key sequence order. The leaf-level of a clustered index consists of data pages that are physically sorted in the same key sequence order as the clustered index key. All that means is that you already store some part(s) of your table's data in a particular order. If your query can use that sorted data — and this is what happens when you have a covering index — you will increase the performance of your query dramatically.
Take any table with many columns and rows (or create and populate one using the technique from Listing 6). Then create different indexes and test the ranking functions. You will find that for covered queries the optimizer won't use a Sort operator. This is what makes the ranking function as fast as, or even faster than, the functions with an expression in an "order by" clause.
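A minimal sketch of that effect on the sample table from this article (the index name is my own):
-- With an index on orderID the rows can be read in index order, so the plan for the
-- ORDER BY orderID ranking query no longer needs a Sort operator.
CREATE INDEX IX_RankingFunctions_orderID ON RankingFunctions (orderID)

SELECT ROW_NUMBER() OVER (ORDER BY orderID) AS rowNum, orderID
FROM RankingFunctions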
This article explains ranking functions and helps you understand how they work. The techniques shown here, in some situations, can increase the performance of ranking functions 3-5 times. In addition, this article discusses some common trends in the behavior of an ORDER BY clause with expressions.