A proper, technically correct sizing assessment of a SQL Server system and storage configuration needs to consider a broad range of CPU, memory and IO usage analysis. The cost of this assessment might be $10-20K with proper accounting for personnel time, depending on whether forward projections can be made from an existing production environment or whether the work has to be done entirely from theoretical models and a database prototype. Now, if it turns out that a $20K server and storage system is more than adequate for the final requirements, it is fair to ask whether a rigorous technical assessment makes sense at all. Fortunately, a reasonable set of system configurations can be arrived at without heavy technical analysis, driven mostly by the cost structure of the Windows operating system, SQL Server licensing, and the server platform. In cases where a large system with more than four processor sockets is required, it still makes sense to conduct a proper technical analysis.
Management Summary
Upper management frequently does not have time to internalize the detailed rationale for each system configuration decision and only wants a quick summary. The simple picture and answer, without the detailed underlying rationale, is presented below. In case it is not already apparent, the general theme is prolific IO bandwidth and random IOPS capability. In theory, the realizable sequential bandwidth over 2 internal and 8 external SCSI channels is 2.5GB/sec, but this has not been verified.
The table below is a quick summary of system configuration options for systems using the Intel E7230, E7520 and E8500 chipsets. The configurations are based on U320 SCSI storage technology. Over the next year, expect SCSI to be replaced by a combination of SAS and SATA as the technology of choice in a wide range of server and storage systems.
Processor Sockets | Single | Dual | Quad |
Chipset | E7230 | E7520 | E8500 |
Dual Core Processors | 1 | 2 | 4 |
Memory Sockets | 4 | 6-8 | 16-32 |
Native PCI-Express config.* | 1 x8, 1 x4 | 3 x8 | 3 x8, 1 x4 |
Embedded controller | SATA | SCSI | SCSI |
SCSI Channels | Int. SATA | 2 Internal | 2 Internal |
External Storage Enclosures | 1 | 2 | 4 |
Disk Drives - Internal | 2-4 | 4-6 | 4-8 |
Sequential Disk Bandwidth | 600MB/sec+ | 1.2GB/sec+ | 2GB/sec+ |
*A single x8 slot can be configured as two x4 slots or converted to 2 PCI-X busses.
Software, system, memory and storage choices are briefly described below.
- Software: Windows Server 2003 64-bit and SQL Server 2005 64-bit are preferred over 32-bit versions, including SQL Server 2000, if full 64-bit operation is supported by other required components. Some performance tuning may be required due to changes in behavior from SQL Server 2000 to 2005 and from 32-bit to 64-bit versions.
- System: any suitable 1, 2 or 4 socket server system with dual core processors. Even if it turns out that a big NUMA system is required, just rotate the 1-4 socket system to a QA function. Consider that a single socket dual-core system today is comparable to a 4-way Xeon MP 2.0GHz system of the 2002 vintage.
- Memory: fill available DIMM sockets with 1GB or 2GB DIMMs, unless it is otherwise known that the 4GB DIMM is required. Choose the 2GB module if the price is favorable relative to the 1GB module. Memory requirements may influence system selection. It can be more effective to buy a four socket system with more DIMM sockets even if only 2 processor sockets are actually needed, especially if this avoids the need to configure 4GB memory modules.
- Storage: Distribute load across 4-10 or more IO channels and 16-60+ disk drives (a file-layout sketch follows below). Any of the U320 SCSI, FC or SAS interfaces can meet most performance requirements with proper distribution. The disk drives of choice for early 2006 are the 15K 36GB drive or the 10K 72-74GB drive. Do not let anyone talk you into the big 146GB 15K or 300GB 10K drives on the argument that fewer large drives meet the capacity requirement at lower cost; capacity is not the point, performance is. For SFF 2.5" SAS drives, either the 36GB or 72GB 10K is acceptable.
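To make the storage point concrete from the SQL Server side, below is a minimal sketch of spreading a database across multiple data files, one per RAID array and IO channel. The database name, drive letters and sizes are hypothetical placeholders; the point is that SQL Server fills the files of a filegroup proportionally, so placing each file on its own array spreads both random and sequential IO across all the channels.

-- Hypothetical layout: E:, F:, G:, H: are separate RAID arrays on separate IO channels
CREATE DATABASE Sales
ON PRIMARY
  ( NAME = Sales_Data1, FILENAME = 'E:\MSSQL\Data\Sales_Data1.mdf', SIZE = 20480MB ),
  ( NAME = Sales_Data2, FILENAME = 'F:\MSSQL\Data\Sales_Data2.ndf', SIZE = 20480MB ),
  ( NAME = Sales_Data3, FILENAME = 'G:\MSSQL\Data\Sales_Data3.ndf', SIZE = 20480MB ),
  ( NAME = Sales_Data4, FILENAME = 'H:\MSSQL\Data\Sales_Data4.ndf', SIZE = 20480MB )
LOG ON
  ( NAME = Sales_Log, FILENAME = 'L:\MSSQL\Log\Sales_Log.ldf', SIZE = 8192MB );
-- The log is sequential write traffic and is best kept on its own array, away from the data files.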
One final point is of critical importance. Test the performance of the final configuration. What a disk configuration should be able to do and can actually do are not the same thing.
Software Versions
For practical purposes, the software licensing fee structure helps in narrowing the recommended set of system configurations. The table below shows the Windows Server 2003 R2 processor and memory support by edition for both 32 and 64-bit versions.
Windows Server 2003 R2 Processor and Memory support by Edition | Standard Ed. 32-bit / 64-bit | Enterprise Ed. 32-bit / 64-bit | Datacenter Ed. 32-bit / 64-bit |
Max Processor (Sockets) | 4 / 4 | 8 / 8 | 32 / 64 |
Max Memory | 4GB / 32GB | 64GB / 1TB | 64GB / 1TB |
The table below shows SQL Server 2000 processor and memory support by edition. The SQL Server 2000 64-bit version is for Itanium only.
SQL Server 2000 Processor and Memory support by Edition | Standard Ed. (32-bit only) | Enterprise Ed. 32-bit / 64-bit |
Max Processor (Sockets) | 4 | No Limit |
Max Memory | 2GB | 64GB / 512GB |
The table below shows SQL Server 2005 processor and memory support by edition for 32-bit and 64-bit versions.
SQL Server 2005 Processor and Memory support by Edition | Workgroup | Standard Ed. 32-bit / 64-bit | Enterprise Ed. 32-bit / 64-bit |
Max Processor (Sockets) | 2 | 4 | No Limit |
Max Memory | 3GB | OS Max | OS Max |
The main point of the above is that the older software products restricted the ability to use memory. At the time of product launch, the restrictions were reasonable. The /3GB and AWE options were limited to the Enterprise Editions, which was not unreasonable in 2000, when 4GB of memory was expensive and required an expensive system as well. Since then, 8-16GB of memory has become reasonable at the Standard Edition price point, but the lack of new releases meant the only option was SQL Server 2000 Enterprise Edition. With the release of SQL Server 2005, the memory restriction has been lifted in Standard Edition. The OS does have memory restrictions, but these are more relaxed, especially in the 64-bit versions, and the price step to Enterprise Edition is not as steep.
In any case, the preferred solution is the 64-bit version of Windows Server 2003. Either Standard or Enterprise Editions will fit most needs. SQL Server 2005 is strongly favored over SQL Server 2000 because of the increased memory capability, more so for the Standard Edition. The 64-bit version has unimpeded access to >4GB memory compared with 32-bit versions, which must use AWE.
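For the 32-bit case, below is a minimal sketch of the AWE setup, assuming a dedicated server with 16GB; the 12288MB value is only an example. The /PAE switch must also be set in boot.ini and the SQL Server service account needs the Lock Pages in Memory right; the 'awe enabled' setting takes effect after a service restart.

-- 32-bit SQL Server on a machine with more than 4GB of memory
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled', 1;
RECONFIGURE;
-- With AWE on SQL Server 2000 the memory grab is static, so cap it explicitly (example: 12GB)
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;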
Platforms
For the very large majority of needs today, the 1, 2 and 4 socket systems are more than adequate. Most sites buy either a 2 or 4 socket system. However, given the capability of today's microprocessors and the widespread availability of multi-core processors, serious consideration should also be given to the single socket system when appropriate. Vendors are reluctant to recommend it over higher-margin dual socket systems.
The table below summarizes the processor options. Dual core processors are currently available for all options except Itanium, for which dual core should be available in mid or late 2006. The single core processors are usually available at slightly higher frequency than the comparable dual core versions. Software licensing costs will strongly favor the dual core option except for code that must execute in a serialized manner.
Processor | Sockets | Notes |
AMD Opteron | 1-4 | General purpose, best overall characteristics. |
Intel Pentium D | 1 | Entry |
Intel Core Duo | 1 | Entry, low power, 32-bit only |
Intel Xeon | 2 | Mid-range, excellent table scan performance, Hyper-threading best for high network round-trip applications. |
Intel Xeon MP | 4-32 | Same characteristics as Xeon. |
Itanium | 2-64 | High-end, favored in full 64-bit, scale-up environments. Dual core in mid-2006. |
The AMD Opteron processor probably has the highest performance across a broad range of SQL Server operations in the early 2006 time frame. The Opteron also does exceptionally well where serialized memory accesses are required. The Intel Pentium D, Xeon and Xeon MP processor lines are somewhat behind at this point in time, mostly due to thermal limitations capping the top operating frequency far below the maximum that the transistor switching rate could otherwise support. The Pentium 4 / Xeon family does have the best table scan performance of all processor lines. The Hyper-Threading feature in this line does improve performance in applications with a high network round-trip volume. Unfortunately, HT does not improve performance uniformly for other operations, and may even cause degradation in some, so the use of HT requires careful analysis. HT does improve compression performance by 50% per core, which is very impressive. This characteristic benefits database backup and restore operations when using third party backup compression software such as Quest (formerly Imceda) LiteSpeed.
The Intel Core Duo and its predecessor, the Pentium M, are 32-bit only processors, but have excellent performance and very low power consumption. These are best suited to 32-bit only environments and single socket systems, particularly in high density or otherwise power constrained environments. The Itanium 2 line is currently lagging in the 2-4 socket system space because of the delay in releasing a dual-core version and because the current 130nm Madison processor is competing against 90nm Opteron and Xeon processors. However, in high-end scale-up environments, Itanium systems warrant serious consideration for Windows and SQL Server based solutions. Unisys offers a high-end product line based on the Xeon MP processor. Prior to the introduction of 64-bit capability in the Xeon MP line, there were serious limitations in the use of 32-bit Xeon MP processors in large NUMA systems, mostly due to the limited 4GB address space. It is unclear today how the market for systems with more than four sockets will split between Itanium, Xeon MP and possible Opteron contenders, now that full 64-bit capability is available in all processor lines.
Below is a simplified representation of a single socket system based on the Intel E7230 chipset. The IO capability of this desktop-derived chipset is quite impressive. There is one x8 PCI-Express port that is bridged into 2 PCI-X busses, the first at 100MHz with 2 slots and the second at 133MHz with 1 slot. Alternative configurations include a single x8 PCI-E port, two x4 PCI-E ports, or one x4 PCI-E port and one PCI-X bus. In general, PCI-E configurations should be preferred if PCI-E adapters are available. The DMI port connects the memory controller to the ICH7 with the same bandwidth as one x4 PCI-E port. The ICH7 has 4 SATA ports at 3.0Gbit/sec, 2 PCI-E x1 ports for gigabit Ethernet and one x4 PCI-E port.
The E7230 chipset should have no problems driving 4 internal SATA drives plus 2 RAID controllers and 4 SCSI channels. However, a single RAID controller with 2 SCSI channels should be adequate for this single socket system. It could be mentioned that the E7230 is derived from the 9XX desktop chipset line, which has one x16 PCI-E port for graphics and one DMI port (essentially a PCI-E x4 port). There is no reason the x16 graphics port could not have been configured as two x8 ports, except that the IO capability is already beyond the needs of most single socket systems.
The figure below is a representation of a two socket system based on the Intel E7520 chipset. The E7520 has three x8 PCI-E ports and one HI interface for low bandwidth legacy devices. Each x8 PCI-E port has a very impressive 2GBytes/sec nominal bandwidth in each direction. The actual sustainable bandwidth has not been verified.
As shown, one x8 PCI-E port is bridged into two PCI-X busses. Each PCI-X bus has an embedded PCI-X device in addition to the available open slots. Alternative configurations for the E7520 chipset include 3 x8 PCI-E ports, or 1 x8 PCI-E and four PCI-X busses, two each on two IO bridges. Each x8 PCI-E port can also be configured as two x4 ports. For the two socket system, 4-8 internal SCSI disks on the two embedded SCSI channels and two dual channel RAID controllers for a total of 2 internal and 4 external SCSI channels is a good combination.
The figure below shows a four socket system based on the Intel E8500 chipset. There are two independent processor busses, each capable of supporting two processor sockets, for a total of four sockets. Four independent memory interfaces (IMI) attach to the north bridge. Each IMI channel can be expanded with an external memory bridge (XMB) device into two DDR-II memory channels. In the IO subsystem, there are three x8 PCI-E ports, one x4 port and the legacy HI link for legacy IO devices. As with the other chipsets, each x8 port can be configured as two x4 ports. As shown below, one x8 port is bridged into two PCI-X busses with embedded SCSI and GE, one x8 port is left as is, and the third x8 port is configured as two x4 ports, for a configuration of one x8 and three x4 PCI-E slots.
The disk configuration is 4-8 internal disks on the two embedded SCSI channels, plus four dual channel RAID controllers, one in each of the available PCI-E slots, for a total of 10 SCSI channels, 8 of which connect to four external dual-channel disk enclosures. The objective of this configuration is to achieve >2GB/sec sequential disk bandwidth. It is unclear whether the E8500 can actually sustain this bandwidth, but at least 1.5GB/sec has been verified.
In any case, both the E7520 and E8500 have very impressive IO bandwidth capability. If there is any disappointment, it is the continued use of the HI interface for legacy devices, considering that the much improved DMI interface is available on desktop chipsets. It could be that Intel did not want to risk a delay in the launch of the E7520 and E8500 chipsets on non-essential new technology, but a follow-on chipset could have corrected this deficiency.
An AMD Opteron based system has been verified to sustain 2.4GB/sec in disk bandwidth (reported by Microsoft Research).
Memory
The technically correct assessment of memory requirements should weigh disk load against memory to determine the best cost and system characteristics. The cost of such a technical assessment will probably exceed the cost of the memory itself, except for very large systems. In most cases, a reasonable memory configuration can be arrived at from the operating system and SQL Server edition limits and the number of system DIMM sockets. The table below shows the common number of DIMM sockets by system type.
Processor | Chipset | Processor Sockets | DIMM Sockets | Max Memory |
Opteron | 8000 | 2 | 8 | 16-32GB |
Opteron |  | 4 | 32 | 32-128GB |
Pentium D | E7230 | 1 | 4 | 8GB |
Xeon | E7520 | 2 | 6-8 | 12-16GB |
Xeon MP | E8500/8501 | 4 | 16-32 | 32-64GB |
The table below shows the current price of DDR2 PC2-3200 registered ECC memory from Crucial (www.crucial.com) as of Mar 2006.
Capacity | Mar-2006 Price |
512MB | $100 |
1GB | $190 |
2GB | $1000 (original SKU) / $550 (new SKU) |
At any given point in time, the high volume DIMM products will have an essentially linear price-capacity relation. The low volume, high capacity DIMM products will have a much higher price on a per GB basis. Approximately every two years, the high end DIMM doubles in capacity. Until recently, the price of the 2GB ECC DIMM was more than quadruple the price of the 1GB ECC DIMM, or twice the cost per GB. Just recently, a new 2GB DIMM SKU has appeared at a much more favorable price relative to the 1GB DIMM, while the original 2GB DIMM retains the unfavorable price. It will probably be another 2 years before the 4GB ECC DIMM becomes favorably priced relative to the 2GB DIMM.
In most cases it is simpler to fill the system with the largest capacity commodity DIMM, currently a 1GB or 2GB module, than to conduct a detailed analysis, except when otherwise limited by OS or SQL Server edition. If it is known that disk IO is sensitive to memory and the cost of the storage system is high, then the large capacity DIMM can make sense.
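A quick way to gauge whether disk IO is sensitive to memory on an existing system is to look at the buffer manager counters; a low page life expectancy generally indicates the buffer pool is churning and that additional memory would directly reduce read IO. A sketch against the SQL Server 2000 sysperfinfo view follows (the same counters are visible in Performance Monitor):

-- Page life expectancy: seconds an unreferenced page stays in the buffer pool
SELECT object_name, counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE counter_name IN ('Page life expectancy', 'Buffer cache hit ratio');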
Storage System
The biggest and most frequent mistake people make with regard to the storage system is in treating capacity as the primary requirement. Storage system vendors should know better, but tend to propagate this mistake. The storage system needs to be sized for both performance and capacity. Storage performance is further distinguished in terms of random IO capability and sequential transfer bandwidth. Given the capacity of individual disk drives today, it turns out that once the performance requirements are met, capacity is essentially a non-issue.
In most cases, one should buy the lowest sensible capacity 10K or 15K disk drives. Today, the sensible capacities are the 73GB 10K and the 36GB 15K disk drives. Both are typically $200 per disk when purchased as bare drives. Prices from system vendors can be 50% higher, and the cost per drive in a SAN system can be over $2000. When the next capacity point (146GB 10K and 73GB 15K) is less than 30% more expensive, those become the lowest sensible capacity drives. The reason not to wait for price parity is that a larger drive can be short-stroked for better random IO performance at a fixed data size. In terms of cost per IOP, the 10K and 15K drives are reasonably comparable: the storage configuration with the 10K drives will have higher capacity, while the 15K version will have lower latency in random IO operations.
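As a rough worked example of the cost per IOP comparison (the IOPS figures are assumed ballpark values for current generation drives, not measurements):

15K 36GB: ~180 small random IOPS at ~$200 per bare drive, roughly $1.10 per IOP
10K 73GB: ~130 small random IOPS at ~$200 per bare drive, roughly $1.55 per IOP

The two land in the same range on cost per IOP, while the 10K configuration carries twice the capacity and the 15K configuration returns individual IOs sooner.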
As mentioned in the previous section, the technically correct assessment of storage performance should consider the memory versus disk system performance characteristics. Without knowing the actual application characteristics and needs, the following can be used as a minimum baseline configuration. Most transactional databases will perform both transactions and some amount of reporting, even if replication is used to move many reports to a replicated database.
Reporting queries will frequently involve either large table scans or sizeable index range scans touching a large number of rows. If these operations need to go to disk for data, the storage system can be saturated, effectively shutting down the system's transaction processing capability. One reason is that the report query may generate a large number of disk read-ahead calls, which leads to a high queue depth in the disk system. Normal transactions may generate only a handful of disk reads, probably without read-ahead. So even if the intended priority of the transaction is higher than that of the report, in effect the storage system has been prioritized to favor the report IO requests: transaction driven disk reads will sit behind a long line of report driven disk reads.
Now it turns out SQL Server 2000 has a limited ability to drive disk reads in a table scan operation. Without hints, a SQL Server 2000 table scan on most systems will generate between 300-400MB/sec in disk reads. With the NOLOCK hint, the table scan might generate between 700-800MB/sec in disk traffic. If the configured storage system has less sequential capacity than this, a table scan can saturate the disk system and cause a substantial degradation in transaction driven disk IO. If the storage system can handle both the sequential table scan and the transaction random IO load, then there may be little to no degradation in transaction throughput, depending on the availability of CPU and other resources.
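To illustrate the point (the table name is a hypothetical placeholder), the same scan issued with and without the NOLOCK hint behaves quite differently on SQL Server 2000, and running both against a cold cache with the statistics options on is a quick way to see what a given storage configuration actually sustains:

-- Hypothetical large table; compare scan throughput with and without NOLOCK
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DBCC DROPCLEANBUFFERS;  -- force reads from disk rather than the buffer pool (requires sysadmin)
SELECT COUNT(*) FROM dbo.SalesHistory;                -- default scan, ~300-400MB/sec ceiling on most systems
DBCC DROPCLEANBUFFERS;
SELECT COUNT(*) FROM dbo.SalesHistory WITH (NOLOCK);  -- dirty-read scan, ~700-800MB/sec ceiling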
SQL Server 2005 has no such limitation in sequential disk operations. A SQL Server 2005 table scan on the HP Superdome, with dozens of dual-channel U320 SCSI RAID controllers and 3-5 disks on each channel, can drive over 12GB/sec in disk traffic. So it may not be possible or practical to configure the storage system to exceed the capability of a SQL Server 2005 table scan. On the positive side, the report query also completes that much quicker, minimizing the period during which transactions are disrupted. An infrequent 2-3 sec disruption should be barely noticeable. This cautions against setting arbitrary Service Level Agreements on response time without a comprehensive and detailed understanding of database and system performance characteristics, especially if the user does not perceive a meaningful impact. It is better to specify a high percentage of responses under a reasonable time and allow for a small percentage at a longer interval, unless true absolute requirements mandate otherwise.
The storage performance goals are then as follows. Sequential bandwidth for the SQL Server main database data files should exceed 400-800MB/sec, depending on requirements. The recommended minimum configurations to meet this goal are described below for SCSI, FC and SAS storage systems.
Direct Attach Storage on U320 SCSI Technology
- 2-8 disk drives in the system internal drive bays on the embedded SCSI controller.
- 1-4 external enclosures (14-15 bays) with dual-channel SCSI.
- 4-7 disk drives in each channel of the external storage unit, for a total of 8-14 drives per unit.
- 1 PCI-X or PCI-Express dual channel RAID controller per storage unit.
The rationale for using the internal drive bays is that the drive bays are already paid for, along with an embedded controller. The bare drive cost is relatively low compared with the fully amortized external storage cost covering the disk drive, the external enclosure and the controller. Even in a clustered environment where the clustered resources must be shared, it never hurts to have extra bandwidth on a non-shared resource.
The rationale for the external storage is as follows. Each U320 SCSI channel, with a nominal bandwidth of 320MB/sec, can in fact sustain approximately 240-260MB/sec in large block sequential disk transfers. It takes 3 current generation disk drives to saturate a single U320 channel. Now, it is not always possible to use the fastest portion of each disk drive or to achieve a pure sequential transfer, and the external enclosures typically have a minimum of 7 bays per channel. For this combination of reasons, anywhere from 4-7 disks per SCSI channel is practical. External 3U SCSI storage enclosures are available with single or dual SCSI channels. Configuring more than 7 disks per channel is a waste of sequential bandwidth capability, so avoid single channel enclosures unless a very high spindle count is required that would exceed the capacity when configuring up to 7 disks per channel. Obsolete technologies tend to persist in storage products, so avoid the U160 enclosures.
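A quick back-of-the-envelope check using the ~250MB/sec sustained figure per U320 channel:

1 dual-channel enclosure: 2 x ~250MB/sec ≈ 500MB/sec, plus whatever the internal drives contribute
4 dual-channel enclosures: 8 x ~250MB/sec ≈ 2GB/sec, plus ~500MB/sec from the 2 internal SCSI channels ≈ 2.5GB/sec

These figures line up with the per-platform sequential bandwidth estimates in the summary table above.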
The figures below show the sequential disk read and write characteristics for a current generation 36GB 15K SCSI drive. The front portion of a 15K drive can transfer data at over 90MB/sec; the inner portion still delivers over 50MB/sec. Data placement can therefore affect performance. A common practice is to use the front portion of each disk drive for the most performance critical data, the middle for less critical data, and the end for backups and auxiliary storage.
The choice of RAID controllers then falls to either dual or quad channel controllers. The PCI-X bus at 133MHz and 64-bit has a nominal bandwidth of 1GB/sec, which might seem appropriate for four U320 channels capable of sustaining approximately 250MB/sec each. However, it is not clear that any of the current quad-channel PCI-X SCSI controllers can actually sustain more than 500-600MB/sec. So the best choice is the dual channel model, unless a very high spindle count is required that would exceed the capacity of dual channel controllers.
Fiber Channel / SAN Storage
Fiber Channel was designed for the high-end storage market. For some reason, the FC/SAN vendors did not increase FC bandwidth from 2Gbit/sec to 4Gbit/sec in an expeditious time frame. In the meantime, U320 SCSI is being replaced by 3Gbit/sec SAS, and even the original 1.5Gbit/sec SATA is being replaced with an improved 3Gbit/sec version. One might argue that FC at 4Gbit/sec, finally becoming available after a long pause since the 2Gbit/sec launch, is now faster than SAS and SATA. However, the SAS and SATA 3Gbit/sec links are meant to connect 1-4 disk drives, while SAN systems intend to cram 14-28 disks onto a single 2-4Gbit/sec link, severely constraining sequential bandwidth.
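A rough illustration of the constraint, converting link rates at approximately 100MB/sec of payload per 1Gbit/sec of signaling:

2Gbit/sec FC ≈ 200MB/sec shared by 14-28 disks ≈ 7-14MB/sec per disk
4Gbit/sec FC ≈ 400MB/sec shared by 14-28 disks ≈ 14-28MB/sec per disk
3Gbit/sec SAS/SATA ≈ 300MB/sec shared by 1-4 disks ≈ 75-300MB/sec per disk

Compare these per-disk figures with the 50-90MB/sec a single current generation drive can stream on its own, per the drive characteristics discussed in the SCSI section above.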
Another issue with mid-range SAN systems is that most are adapted from existing server components, either the Xeon processor with the ServerWorks GC-LE chipset, or RISC processors and chipsets with even less capability. At the time, none of the existing chipsets could deliver massive IO bandwidth the way current chipsets can. So the mid-range SAN systems of 2003-2005 were not the most practical solutions for delivering high sequential bandwidth at a reasonable price point compared with direct attach storage options.
- If available, 4Gbit/sec Fiber Channel is preferred over 2Gbit/sec FC.
- Dual channel controllers should be employed with 2Gbit/sec FC on PCI-X and 4Gbit/sec FC on PCI-X 2.0 or PCI-Express.
- 1-2 FC ports per 14 disk external storage unit.
For example, in the EMC CLARiiON line, the CX700 supports 8 FC ports to hosts and 8 ports to storage. Configuring 8 enclosures of 14 drives over 8 FC ports from storage to SAN to host should deliver nearly the full 1.3GB/sec maximum sustained transfer rate of the CX700. For the HP EVA line, the underlying storage controllers are the HSV110/210, which have 2 ports to hosts and 4 ports to storage. Each enclosure of 14 disks connected to a single HSV controller can deliver 300MB/sec sequential bandwidth, but more drives are required for high write bandwidth. For best overall performance and practicality, 2 disk enclosures per HSV controller is probably a better balance than 1 per controller.
Serial Attached SCSI (SAS)
The recommended configuration for the newer SAS technology drives is not yet clear. Of the major vendors, only HP currently offers SAS, with an external storage unit holding 10 SFF drives in a 1U enclosure. The SAS RAID controller has 8 3Gbit/sec ports. Based on current disk drives, 2-3 disk drives per port is probably reasonable, pending verification that a single adapter can handle the combined sequential throughput of 12-18 disk drives; otherwise consider 1-2 disk drives per port. Two 6 port adapters, two 1U storage units and the internal bays should meet the basic SQL Server 2000 storage performance level. When configuring for a higher sequential transfer rate, verify the bandwidth possible in each configuration to determine the best controller-disk combination.
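A rough per-port check on the 2-3 drives per port suggestion (the per-drive streaming rate for current SFF 10K drives is an assumed ballpark figure):

each 3Gbit/sec SAS port ≈ 300MB/sec of link bandwidth
2-3 SFF drives x ~50-70MB/sec ≈ 100-210MB/sec per port

So the individual ports have headroom; the open question is whether the controller and its PCI-Express interface can sustain the aggregate of all ports at once, which is why the configuration should be verified before counting on it.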
SATA Controllers and Disk Drives
Many people ask about SATA controllers and disk drives, especially now that native command queuing (NCQ) is finally supported by both SATA controllers and disk drives. The low cost and high capacity of SATA drives is particularly attractive. The main issue is that most SATA drives are 7200RPM drives targeted at the desktop market. The design life of a desktop drive is typically 1-2 years at 20% duty cycle, while most server drives are designed for 5 years at 100% duty cycle. Primarily for this reason, SATA drives should be restricted to development environments, possibly some data warehouse environments, and potentially backup storage, until SATA drive reliability under server duty cycles is better understood. SATA drives should not be used in production transaction processing environments at this time.
Summary
The main points are as follows. The 64-bit version of Windows Server 2003 has much expanded memory capability, even in Standard Edition. SQL Server 2005 Standard Edition has removed the memory constraints. The 64-bit version has full access to a large address space, while the 32-bit version must use the PAE/AWE combination. Dual core processors are preferred over single core processors in most circumstances due to the favorable licensing terms. Storage configuration should treat random and sequential IO performance as the primary consideration, which means distributing the IO load over as many spindles (disk drives) and IO channels as practical for the system. As always, verify the actual performance characteristics of the final configuration.