OLTP vs OLAP/DSS
When classifying the nature of I/O, two main terms are used: OLTP and OLAP. An example of an OLTP (OnLine Transaction Processing) database is one that stores data for a Point of Sale application, typically consisting of a high percentage of simple, short transactions from a large number of users. Such transactions generate what's referred to as Random I/O, where the physical disks spend a measurable percentage of time seeking data from various parts of the disk for read or write purposes.
In contrast, an OLAP (Online Analytical Processing) or DSS (Decision Support System) database is one that stores data for reporting applications which typically have a smaller number of users generating much larger queries, typically resulting in Sequential I/O, where the physical disks spend most of their time scanning a range of data clustered together in the same part of the disk. Unlike OLTP databases, OLAP databases have a much higher percentage of read activity.
It's important to note that even for classic OLTP applications such as point of sale systems, actions such as backups and database consistency checks will still generate large amounts of sequential I/O. For the purposes of I/O workload classification, we consider only the main I/O pattern.
As we'll see a little later, the difference between sequential and random I/O has an important bearing on the storage system design.
I/O Metrics
In order to design a storage system for a database application, we need to know not only the type of workload it produces (OLTP vs OLAP) but also the volume of that workload, typically measured as the number of disk reads and writes per second.
The process of obtaining or deriving these figures depends on the state of the application. If the application is an existing production system, the figures can be easily obtained using Windows Performance Monitor. Alternatively, if the system is yet to be commissioned, estimates are derived using various methods, a common one being to profile the reads and writes per transaction type and then multiply by the expected number of transactions per second of each type.
Existing Systems
For the purposes of this section, let's assume that disk I/O is determined to be a significant bottleneck and we need to redesign the storage system to correct it. The task, then, is to collect I/O metrics to assist in this process.
The Windows Performance Monitor tool can be used to collect, among others, the disk I/O metrics that we need. For each logical disk volume (i.e., the drive letter corresponding to a data or log drive), the average values of the following counters should be collected (a short sketch for deriving these averages from an exported counter log appears after the list):
- PhysicalDisk : Disk Reads/sec
- PhysicalDisk : Disk Writes/sec
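If the counter log is relogged or exported to CSV, the averages can be derived with a few lines of script. The following is a minimal sketch only; the file name is hypothetical, and the exact column headers will vary with the machine and disk instance names in your environment.

# Minimal sketch: average the Disk Reads/sec and Disk Writes/sec counters from a
# Performance Monitor log exported to CSV. The file name is hypothetical, and the
# column headers (which PerfMon prefixes with machine and instance names) should be
# adjusted to match your environment.
import csv

LOG_FILE = "disk_counters.csv"   # hypothetical PerfMon CSV export

def average_counter(rows, counter_name):
    """Average every column whose header contains counter_name."""
    values = []
    for row in rows:
        for header, value in row.items():
            if counter_name in header:
                try:
                    values.append(float(value))
                except (TypeError, ValueError):
                    pass   # skip blank or non-numeric samples
    return sum(values) / len(values) if values else 0.0

with open(LOG_FILE, newline="") as f:
    rows = list(csv.DictReader(f))

print(f"Average reads/sec:  {average_counter(rows, 'Disk Reads/sec'):.0f}")
print(f"Average writes/sec: {average_counter(rows, 'Disk Writes/sec'):.0f}")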
In the next section, we'll use these values to approximate the number of physical disks required for optimal I/O performance.
New Systems
For a system not yet in production, application I/O is estimated per transaction type in an isolated test environment. Projections are then made based on the maximum expected transactions/sec, with an appropriate adjustment for future growth.
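As a rough sketch of such a projection (the transaction types, per-transaction I/O counts, and growth allowance below are illustrative assumptions, not measurements):

# Minimal sketch of projecting I/O for a new system. The profile figures are
# illustrative assumptions only; in practice they come from measuring each
# transaction type in an isolated test environment.
transaction_profile = {
    # name: (reads per transaction, writes per transaction, expected peak txns/sec)
    "new_order":    (8, 4, 50),
    "stock_lookup": (3, 0, 120),
}
GROWTH_FACTOR = 1.3   # assumed 30% allowance for future growth

reads_per_sec = sum(r * tps for r, w, tps in transaction_profile.values())
writes_per_sec = sum(w * tps for r, w, tps in transaction_profile.values())

print(f"Projected reads/sec:  {reads_per_sec * GROWTH_FACTOR:.0f}")
print(f"Projected writes/sec: {writes_per_sec * GROWTH_FACTOR:.0f}")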
Armed with these metrics, let's proceed to the next section where we'll use them to project the estimated number of disks and controllers required to design a high performance storage system capable of handling the application load.
Determining the Required Number of Disks & Controllers
In the previous section we covered the process of measuring, or estimating, the number of database disk reads and writes generated by an application per second. In this section we'll cover the formula used to estimate the number of disks and controllers required to design a storage system capable of handling the expected application I/O load.
Note that the calculations presented in this section are geared towards Direct Attached Storage solutions using traditional RAID storage. Configuring SAN based Virtualized RAID (V-RAID) storage is a specialist skill, and one that differs between various SAN solutions and vendors. As such, the calculations presented here should be used as a rough guideline only.
Calculating Number of Disks Required
In calculating the number of disks required to support a given workload, two values must be known: the required disk I/Os per second, which is the sum of the reads and writes that we looked at in the previous section, and the I/O per second capacity, or IOPS, of the individual disks involved.
The IOPS value of a given disk depends on many factors, including the type of disk (SCSI, SAS, SATA, or Fibre Channel), the spin speed (e.g., 10,000 RPM or 15,000 RPM) and the I/O type (random vs sequential). Tools such as SQLIO can be used to measure a disk's IOPS capacity. The sidebar “Disk Drive Technologies” covers the different types of disk storage available, and the attributes of each that affect the calculations presented below.
Storage Virtualization
The process of selecting RAID levels and calculating the required number of disks is significantly different in a SAN (and between different SAN vendors) compared to a traditional Direct Attached Storage (DAS) solution. Configuring and monitoring virtualized SAN storage is a specialist skill, and DBAs should insist on SAN vendor involvement in the setup and configuration of storage for SQL Server deployments. The big four SAN vendors (EMC, Hitachi, HP and IBM) are all capable of providing their own consultants, usually well versed in SQL Server storage requirements, to set up and configure storage and related backup solutions to maximize the SAN investment.
For the purposes of calculating required disk numbers, an often-used average is 125 IOPS per disk for random I/O. Whilst commonly used server-class 15,000 RPM SCSI disks are capable of higher speeds, particularly for sequential I/O, the 125 IOPS figure is a reasonable average for estimation purposes and builds a comfortable margin for error into the calculated disk number for handling peak, or higher than expected, loads.
Let's look at a commonly used formula for calculating disk numbers:
Required # Disks = (Reads/Sec + (Writes/Sec * RAID adjuster)) / Disk IOPS
As shown above, dividing the sum of the reads per second and the RAID-adjusted writes per second by the disk's IOPS capacity yields the number of disks required to support the workload.
The RAID adjuster takes into account the additional writes incurred by a RAID system in providing fault tolerance at the disk level. RAID 0, which provides no fault tolerance, has no write overhead, hence a RAID adjuster of 1. RAID 1 and 10 incur two physical writes to mirror each requested write, hence they have a RAID adjuster of 2. RAID 5, in maintaining parity, has a RAID adjuster of 4.
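To make the formula concrete, consider a workload of 1,200 reads/sec and 400 writes/sec destined for RAID 10 storage, figures we'll return to in the bandwidth and capacity discussions below. A minimal sketch, assuming the 125 IOPS per disk average discussed above:

# Worked example of the disk number formula: 1,200 reads/sec and 400 writes/sec
# on RAID 10 (adjuster of 2), assuming an average of 125 IOPS per disk.
import math

def disks_required(reads_per_sec, writes_per_sec, raid_adjuster, disk_iops=125):
    return math.ceil((reads_per_sec + writes_per_sec * raid_adjuster) / disk_iops)

print(disks_required(1200, 400, raid_adjuster=2))   # (1200 + 800) / 125 = 16 disks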
Disk Drive Technologies
ATA
Using a parallel interface, ATA is one of the original implementations of disk drive technologies for the personal computer. Also known as IDE or Parallel ATA, it integrates the disk controller on the disk itself and uses ribbon style cables for connection to the host.
SATA
In widespread use today, SATA, or Serial ATA, drives are an evolution of the older Parallel ATA drives, offering numerous improvements such as faster data transfer, thinner cables for better air flow, and a feature known as Native Command Queuing (NCQ), whereby queued disk requests are reordered to maximize throughput. Compared to SCSI drives, discussed next, SATA drives offer much higher capacity per disk, with multi-terabyte drives available today. The downside of very large SATA disk sizes is the increased latency of disk requests, partially offset by Native Command Queuing.
SCSI
Generally offering higher performance than SATA drives, albeit for a higher cost, SCSI drives are commonly found in server based RAID implementations and high end workstations. Paired with a SCSI controller card, up to 15 disks can be connected to a server for each channel on the controller card. Dual channel cards enable 30 disks to be connected per card, and multiple controller cards can be installed in a server, allowing a large number of disks to be directly attached to a server. It's increasingly common for organizations to use a mixture of both SCSI drives, for performance sensitive applications, and SATA drives, for applications requiring high amounts of storage. An example of this for a Database application is to use SCSI drives for storing the database, and SATA drives for storing online disk backups.
SAS
SAS, or Serial Attached SCSI, disks connect directly to a SAS port, unlike traditional SCSI disks, which share a common bus. Borrowing from aspects of Fibre Channel technology, SAS was designed to break past the performance barrier of existing Ultra320 SCSI technology, and offers numerous advantages owing to its smaller form factor and backwards compatibility with SATA disks. As a result, SAS drives are growing in popularity as an alternative to SCSI.
Fibre Channel
Fibre Channel allows high speed, full duplex serial communications between storage systems and server hosts. Typically found on Storage Area Networks, Fibre Channel offers more flexibility than a SCSI bus, with support for more physical disks, more connected servers, and longer cable lengths.
Solid State Disks
Used today primarily in laptops and SAN cache, Solid State Disks (SSD) are gaining momentum in the desktop and server space. As the name suggests, SSDs use Solid State Memory to persist data in contrast to rotating platters in a conventional Hard Disk. With no moving parts, SSDs are more robust and promise (near) zero seek time, high performance and low power consumption. They're an exciting future prospect for SQL Server storage.
Bus Bandwidth
Once we determine the number of disks required, we need to ensure the I/O bus is capable of handling the I/O throughput. With today's disks capable of sustained sequential throughput of greater than 100MB/sec, it's easy to overwhelm an I/O channel.
Disk throughput varies depending on the load and type of I/O. As we covered in the previous section, OLTP applications consist of random I/O, with the disks spending a moderate percentage of time seeking data and disk latency (the time between disk request and response) being an important attribute. In contrast, OLAP applications spend a much higher percentage of time performing sequential I/O, so throughput is greater and bandwidth requirements are higher.
There are a variety of bus types matching the different disk types. For example, in a direct attached SCSI disk enclosure, the typical bus used today is Ultra320, with a maximum of 320 MB/sec per channel. In contrast, today's 2 Gigabit Fibre Channel offers approximately 200 MB/sec throughput.
In our above example of 2,000 disk transfers/sec (1,200 reads/sec plus 400 writes/sec adjusted for RAID 10), and assuming an OLTP application with random I/O and 8K transfers (the SQL Server transfer size for random I/O), the bandwidth requirement can be calculated as 2,000 * 8K, a total of approximately 16MB/sec, well within the capabilities of either Ultra320 SCSI or 2 Gigabit Fibre Channel.
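A quick sketch of that bandwidth check, using the channel limits quoted above:

# Bandwidth check for the example above: 2,000 adjusted disk transfers/sec at
# SQL Server's 8K random I/O transfer size, compared against the quoted channel
# limits (Ultra320 SCSI ~320 MB/sec, 2 Gigabit Fibre Channel ~200 MB/sec).
TRANSFER_SIZE_KB = 8       # SQL Server random I/O transfer size
transfers_per_sec = 2000   # 1,200 reads/sec + 400 writes/sec x RAID 10 adjuster

required_mb_per_sec = transfers_per_sec * TRANSFER_SIZE_KB / 1024
for channel, limit_mb in (("Ultra320 SCSI", 320), ("2 Gigabit Fibre Channel", 200)):
    status = "OK" if required_mb_per_sec <= limit_mb else "additional channels required"
    print(f"{channel}: need {required_mb_per_sec:.0f} of {limit_mb} MB/sec -> {status}")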
Should the bandwidth requirements exceed the maximum throughput, additional disk controllers and/or channels will be required to support the load.
A Note on Capacity
You'll note that we haven't addressed storage capacity requirements yet. This is a deliberate decision to ensure the storage system is designed for throughput and performance as the highest priority.
A common mistake made when designing storage for SQL Server databases is to base the design on capacity requirements alone. A guiding principle in designing high performance storage solutions for SQL Server is to stripe data across a large number of dedicated disks and multiple controllers. The resultant performance is much greater than what would be achieved with fewer, higher capacity disks. Storage solutions designed in this manner usually exceed the capacity requirements as a consequence of the performance centric approach.
In our above example where we calculated the need for 16 disks, assuming we use 73GB 15,000 RPM SCSI disks on an Ultra320 bus, we have a total available capacity of 1.1TB. Usable space, after RAID 10 is implemented, would come down to around 500GB. Higher capacity can obviously be achieved with higher capacity disks, or more disks of the same size. Given the choice, go with more disks of a lower capacity, particularly for systems with a high percentage of read activity.
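The capacity arithmetic is a simple calculation; a minimal sketch (the raw mirrored figure comes out around 584GB, with the "around 500GB" quoted above additionally allowing for formatting and other overheads):

# Capacity arithmetic for the example above: 16 x 73GB disks in RAID 10.
DISK_COUNT = 16
DISK_SIZE_GB = 73
RAID10_USABLE_FRACTION = 0.5   # mirroring halves raw capacity (before formatting overhead)

raw_gb = DISK_COUNT * DISK_SIZE_GB
usable_gb = raw_gb * RAID10_USABLE_FRACTION
print(f"Raw capacity:   {raw_gb} GB (~{raw_gb / 1024:.1f} TB)")
print(f"RAID 10 usable: ~{usable_gb:.0f} GB")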
If the projected capacity requirements for our database only total 50GB, then so be it. We end up with 10% storage utilization as a consequence of a performance centric design.
In contrast, a design that was capacity centric would probably choose a single 73GB disk, or two disks to provide redundancy. What are the consequences of this for our example? Assuming 125 IOPS per disk, we would experience extreme disk bottlenecks with massive disk queues handling close to 2000 required IOPS!
Whilst low utilization levels will probably be frowned upon, this is the price of performance, and a much better outcome than constantly dealing with disk bottlenecks. A quick look at any of the server specifications used in setting performance records in Transaction Processing Performance Council (tpc.org) tests will confirm a low utilization, high disk stripe approach such as the one presented above.
Finally, placing capacity as a secondary priority behind performance does not mean we can ignore it. Sufficient work should be carried out to estimate both the initial and future storage requirements. Running out of disk space at 3am is not something I recommend!
This article is an excerpt from Rod Colledge's forthcoming book SQL Server 2008 Administration, published by Manning Publications. All rights reserved.