
Monday, September 29, 2008

Analysis Services in High Data Volume Business Scenarios

Executive Summary

The growing importance of data warehouses, coupled with the ever-decreasing price of hardware and advanced computing options, has resulted in data warehouses of thousands of gigabytes over the last few years. This in turn has paved the way for OLAP solutions that store detail and aggregate data to answer complex business queries over multi-dimensional data. However, with increasing data volumes and complexity of business requirements, OLAP solutions are challenged to meet the processing and reporting SLAs defined by the business. Parallel processing of OLAP cubes is a very powerful option for meeting such performance and timeline demands. This white paper examines the benefits the Parallel Processing Utility provides in processing large Analysis Services 2000 cubes in high-volume business scenarios. The paper also explains how improvements in Analysis Services 2005 build parallel processing into the product, so that the Parallel Processing Utility is no longer needed in the 2005 release.

Introduction

This white paper explains the benefits derived from the use of the Parallel Processing Utility, particularly where large volumes of data need to be processed as part of an overnight batch process. It explains the underlying data processing and storage needs of large businesses across domains, and the impact these needs have on overnight batch loads. Complex data processing requirements, in terms of both data volumes and business logic, influence the OLAP and relational database design and the choice of processing methods, so that overnight batch processes can meet the SLA (service level agreement) while addressing the business need for large, complex data sets. This white paper describes how the Parallel Processing Utility in Analysis Services helps address such requirements.
This white paper applies to SQL Server 2000 SP4 and assumes that the reader has a basic understanding of online analytical processing (OLAP) and Analysis Services 2000 in particular. The paper is divided into three parts –
The first part describes the business problem, the data storage and batch process requirements, and the implications of these requirements on the OLAP design.
The second part deals with the performance problems faced with sequential processing, the performance benefits derived from the Parallel Processing Utility (PPU) and the optimal processing of partitions in parallel on a given set of hardware, along with some recommendations on the use of the Parallel Processing Utility.
The available hardware environment is also described to give readers an understanding of the test environment; the metrics mentioned in the paper are based on that environment. The metrics were collected against a common set of business requirements across domains such as retail and manufacturing, and represent real-life scenarios that can be addressed using the given technology suite.
The last part of the paper explains how these problems are overcome in Analysis Services 2005, where parallel processing is built into the product.

Business Requirements and their Implications on storage and processing

A common business requirement across domains is to store historical information that helps managers analyze business trends. For instance, sales volume information over the years is used by managers to understand the seasonal fluctuation of sales and to analyze the impact of promotions on sales. Waste-related information is key to understanding vendor performance trends over time and to ranking vendors. Similarly, raw material spend over time helps a manufacturer understand cost price trends and their impact on finished goods costs.

  • A typical requirement for a manufacturing, telecom or retail business user is to store current-quarter transaction data plus two years of historic transaction data. Hence at any given point in time the requirement is to store at least 9 quarters (9 * 3 = 27 months) of data.
  • Usually, for large businesses across domains, transaction data volumes touch 35-40 million rows per month. In data volume terms this means that storing 27 months of data puts the fact table volume at 945-1080 million rows.
  • The large data volumes are further complicated by complex requirements such as handling backdated updates. For instance, Account Payment Voucher data often has backdated updates extending back 2 weeks. This implies that the overnight load needs to handle 15 days of data instead of 1, and therefore a larger volume of records in the overnight batch process. It also means that incremental processing of cubes is not possible. The performance implication of this requirement is shown below (Incremental Processing) –


In the case of backdated updates the entire partition needs to be reprocessed, thereby increasing the batch load time.

  • Handling such large volumes means the use of partitioning in the relational database as well as the OLAP database. This is needed to address both batch load performance and query performance.
  • The reports need to be available early in the morning, at the start of the business day, which effectively limits the batch load window to between 2 and 3 hours.

The obvious benefits of partitioning are the following:

  • Improved efficiency of data backups during downtime – for daily runs the backup can be limited to the current and previous partitions based on date ranges.
  • Better query response times when date ranges are clearly defined – having multiple partitions results in fewer rows per partition, so query response times improve because fewer rows need to be traversed.
  • Easier archiving of data – archiving becomes far more manageable with partitions, since data can be archived and purged partition by partition.

Considering the data volume of 900+ million rows in our case, the transaction-related facts need to be partitioned by month. The idea behind monthly partitions is to avoid a very large number of partitions while ensuring that the number of rows in a single partition is not too large. A rough sketch of this sizing arithmetic is shown below.
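As a back-of-the-envelope check, the sizing arithmetic can be sketched as follows. The figures (27 months of history, 35-40 million rows per month) come from the requirements above; the constants and function names are purely illustrative.

```python
# Back-of-the-envelope fact table and partition sizing, using the volumes
# quoted above (27 months of history, 35-40 million rows per month).
MONTHS_RETAINED = 27                      # current quarter + 2 years of history
ROWS_PER_MONTH = (35_000_000, 40_000_000)

def fact_table_rows(months=MONTHS_RETAINED, rows_per_month=ROWS_PER_MONTH):
    """Return (low, high) estimates of total fact table rows."""
    return tuple(months * r for r in rows_per_month)

low, high = fact_table_rows()
print(f"Total fact rows: {low:,} - {high:,}")        # 945,000,000 - 1,080,000,000
print(f"Monthly partitions: {MONTHS_RETAINED}; rows per partition: "
      f"{ROWS_PER_MONTH[0]:,} - {ROWS_PER_MONTH[1]:,}")
```

Partitioning by month keeps each partition in the 35-40 million row range, which is small enough for acceptable query and processing times while keeping the total number of partitions manageable.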

Cube Design Considerations for Handling Large Data Volumes

Dimension Storage and Processing –
The dimension design has a direct impact on processing performance. The design guidelines are as follows:

  • Use MOLAP as the storage mode for dimensions – MOLAP dimensions use a multi-dimensional storage area separate from the underlying dimension data, while ROLAP dimensions use the underlying dimension tables directly. MOLAP is the preferred storage mode here as the dimensions involved do not have a very large number of members.
  • Use unique member keys – the proper use of unique member keys, especially at the lowest level of the dimension, ensures that unnecessary joins can be avoided when processing a dimension or when processing a cube partition that uses the dimension.

Cube Design –
Cubes are designed based on shared dimensions and MOLAP storage. Large cubes related to sales data in manufacturing and retail, or call information in the telecom industry, are usually partitioned by month to ensure optimal batch load processing and query performance.

Some basic design guidelines that have to be followed while designing the cubes in Analysis Services are:

  • Each cube corresponds to a fact table in the underlying SQL Server relational database; hence each cube in turn corresponds to a specific subject area.
  • The naming convention used for the cubes is very similar to that used for the fact tables and reflects the subject area of the measures in a cube. For instance, the Sales cube stores all measures related to sales.
  • All cubes contain the measures stored in the underlying fact tables (classified as base measures) as well as calculated measures, which are derived from calculations on a set of base measures.
  • All cubes are designed with 20% aggregations using the Storage Design Wizard. It is also recommended that, 6 months after the application goes live, a usage pattern analysis of queries is performed to understand the effectiveness of the existing aggregations and whether any additional aggregations are needed.
  • Cubes reference shared dimensions, so there is no additional overhead of processing dimensions while processing the cubes, as the dimensions have already been processed. Moreover, the build time for the cubes is reduced by the use of shared dimensions.
  • The maximum number of dimensions referenced by a cube is restricted to 4.
  • Cubes based on large fact tables (like Receipts, Payments, Sales) are partitioned by month to ensure that query and processing performance of cubes is optimal.
  • The storage mode for the cubes is MOLAP as MOLAP gives the best performance in terms of query response times and storage requirements.
  • Use the Optimize Schema option during cube design to ensure that unnecessary joins between facts and dimensions are avoided, resulting in better cube processing performance.


Performance Issues with Sequential Processing

In spite of the use of partitions, it was found that the 2-hour time window for processing the dimensions and cubes was not sufficient, as the volume of data was fairly large (often 40 million rows for the Payments and Sales cubes) due to backdated updates to the Payments and Sales fact tables. With sequential processing the time taken to process the backdated transaction-related cubes exceeded two hours, so the cube processing SLA was not being met.
The average data volumes processed in the daily runs for the dimensions and cubes across different manufacturing projects, and the time taken to process them, are given in the table below –

Dimension/Cube   Daily Refresh Strategy   Object Type   Processing Time     Data Volume
--------------   ----------------------   -----------   -----------------   -----------
Item             Full                     Dimension     6 seconds           21,593
Time             Full                     Dimension     5 seconds           22,286
Factory          Full                     Dimension     2 seconds           450
Supplier         Full                     Dimension     2 seconds           780
Payments         Full                     Cube          1 hour 05 minutes   37,123,254
Receipts         Full                     Cube          1 hour 20 minutes   46,187,562

Total time taken for Cube Processing = 2 hours 25 minutes (sequential processing)

The average data volumes processed in the daily runs for the dimensions and cubes across different retail projects, and the time taken to process them, are given in the table below –

Dimension/Cube   Daily Refresh Strategy   Object Type   Processing Time     Data Volume
--------------   ----------------------   -----------   -----------------   -----------
Product          Full                     Dimension     7 seconds           22,978
Time             Full                     Dimension     4 seconds           22,286
Outlet           Full                     Dimension     2 seconds           566
Supplier         Full                     Dimension     2 seconds           963
Waste            Full                     Cube          1 hour 15 minutes   44,123,254
Sales            Full                     Cube          1 hour 25 minutes   52,187,562

Total time taken for Cube Processing = 2 hours 40 minutes (sequential processing)
The following section shows how the problems with sequential processing can be resolved by utilizing the Parallel Processing Utility in Analysis Services 2000.

Performance Gains from Parallel Processing

The Parallel Processing Utility (PPU) from Microsoft helps in processing cubes in parallel. It achieves this by using DSO calls and optimal utilization of the CPUs. The PPU can process cubes, partitions and dimensions in Analysis Services databases on local or remote servers. It includes two executable files, ParallelProcess.exe and ProcessPartition.exe. ParallelProcess.exe is a Microsoft Windows program that enables a user to specify and process multiple cubes, partitions and dimensions on a server; unlike Analysis Manager, it executes these requests in parallel. ParallelProcess.exe calls the auxiliary executable, ProcessPartition.exe, to perform the actual processing of the cubes, partitions and dimensions.

The basic rationale for parallel processing can be explained by looking at cube processing holistically. Processing a partition is a two-stage operation. In the first stage, data is read from the relational database into Analysis Services, grouped into segments and saved to disk. CPU usage in this stage is largely dependent on how fast data arrives from the RDBMS; generally less than one CPU is occupied at this time. The second stage reads the data back from disk and computes aggregations and indexes for the data. This operation is performed in multiple threads and can be very CPU intensive; multiple processors may be fully utilized.
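The following conceptual sketch illustrates why overlapping partitions helps: while one partition is still in its I/O-bound read stage, another can be using the CPUs for aggregation. This is not PPU code; the stage functions and partition names are hypothetical placeholders.

```python
# Conceptual illustration only: overlapping the I/O-bound read stage of one
# partition with the CPU-bound aggregation stage of another is what makes
# processing two partitions in parallel worthwhile.
from concurrent.futures import ThreadPoolExecutor

def read_from_rdbms(partition):
    """Stage 1 (hypothetical): read rows from the relational source, group them
    into segments and write them to disk. Mostly waits on the RDBMS, so it
    occupies less than one CPU."""
    ...

def build_aggregations(partition):
    """Stage 2 (hypothetical): read the segments back and compute aggregations
    and indexes on multiple threads; this stage is CPU intensive."""
    ...

def process_partition(partition):
    read_from_rdbms(partition)
    build_aggregations(partition)

partitions = ["Payments_2008_08", "Payments_2008_09"]   # illustrative partition names
with ThreadPoolExecutor(max_workers=2) as pool:          # two partitions at a time
    list(pool.map(process_partition, partitions))
```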

The PPU was integrated with the existing DTS jobs for the cubes, and based on benchmark testing in the given environment it was observed that 2 partitions could be run in parallel and the processing time came down by around 25%. This was a substantial gain and helped in meeting the cube processing SLA. The tables below show the data volumes and execution times for the dimensions and cubes with parallel processing of the cubes in the manufacturing and retail domains –

Dimension/Cube   Daily Refresh Strategy   Object Type   Processing Time     Data Volume
--------------   ----------------------   -----------   -----------------   -----------
Item             Full                     Dimension     6 seconds           21,593
Time             Full                     Dimension     5 seconds           22,286
Factory          Full                     Dimension     2 seconds           450
Supplier         Full                     Dimension     2 seconds           780
Payments         Full                     Cube          47 minutes          37,123,254
Receipts         Full                     Cube          57 minutes          46,187,562

Total Time to Process the cubes – 1 hour 44 minutes (parallel processing of 2 partitions).
Gain in Time from use of PPU – 41 minutes
Percentage Gain in Processing from Use of AS PPU – (41/145) * 100 = 28%

The corresponding figures for the retail domain are given in the table below –

Dimension/Cube   Daily Refresh Strategy   Object Type   Processing Time     Data Volume
--------------   ----------------------   -----------   -----------------   -----------
Product          Full                     Dimension     7 seconds           22,978
Time             Full                     Dimension     4 seconds           22,286
Outlet           Full                     Dimension     2 seconds           566
Supplier         Full                     Dimension     2 seconds           963
Waste            Full                     Cube          50 minutes          44,123,254
Sales            Full                     Cube          1 hour 10 minutes   52,187,562

Total Time to Process the cubes – 2 hours (parallel processing of 2 partitions).
Gain in Time from use of PPU – 40 minutes
Percentage Gain in Processing from Use of AS PPU – (40/160) * 100 = 25%
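The percentage gains quoted above are simple ratios of the time saved to the sequential baseline. The small sketch below reproduces the arithmetic from the manufacturing and retail totals reported in the tables.

```python
# Reproduce the gain figures quoted above from the sequential and parallel totals.
def processing_gain(sequential_minutes, parallel_minutes):
    """Return (minutes saved, percentage gain relative to the sequential run)."""
    saved = sequential_minutes - parallel_minutes
    return saved, round(100.0 * saved / sequential_minutes)

# Manufacturing: 2 h 25 min sequential vs 1 h 44 min with 2 partitions in parallel.
print(processing_gain(145, 104))    # (41, 28)

# Retail: 2 h 40 min sequential vs 2 h with 2 partitions in parallel.
print(processing_gain(160, 120))    # (40, 25)
```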

Hence a substantial performance benefit was found with the use of the Parallel Processing Utility (PPU). The gain percentage will, however, vary depending on the available hardware and data volumes. In the given environment it is recommended to process two partitions in parallel, which does not cause memory issues or excessive load on the CPUs.

Recommendations about the use of PPU

The recommendations for parallel processing stem from the basic tenet that “all partitions CANNOT be processed in parallel”. In other words, increasing parallelism does not by itself guarantee performance gains.

The recommendations based on CPU sizes are as follows:

# of CPUs   Optimum # of parallel executions
---------   --------------------------------
4           2-3
8           4-5
16          4-6

Considering the above it is recommended that:

  • Dimensions are processed serially. Since dimension processing in the current scenario takes a negligible amount of time, this does not impact the overall processing times.
  • Partitions of a SINGLE cube are processed in parallel. With the current design at most two monthly partitions need to be processed for the Sales and Waste cubes, so it is recommended to process these two monthly partitions in parallel; at any given time not more than 2 partitions should be processed. However, where additional memory and processors are available, more cube partitions could be processed in parallel.

Don’ts for the usage of the Parallel Processing Utility

  • Do not parallelize processing of dimensions and facts. For example, do not process the TIME dimension in parallel with the Sales cube.
  • Do not increase the degree of parallelism in the utility to a number greater than the number of CPUs in the machine plus one. Thus, if the server has two CPUs, the degree of parallelism should NOT exceed 3. A higher number will not give significant performance gains; moreover, a high degree of parallelism will consume all CPU resources and might create memory issues.
  • Remember that the Process Buffer Size reflects the buffer allocation for each individual thread. In other words, if the Process Buffer Size is set to 600 MB and the parallelism is set to 3, the server should have at least 1800 MB (600 * 3) of RAM available, as illustrated in the sketch after this list.
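The last two points above can be combined into a small sizing check: cap the degree of parallelism at the number of CPUs plus one, and make sure the server has at least the Process Buffer Size multiplied by that degree of RAM. The helper below is only an illustrative sketch of this arithmetic, not part of the PPU.

```python
# Illustrative sizing check based on the guidelines above (not part of the PPU).
def max_parallelism(cpu_count):
    """Rule of thumb from above: do not exceed the number of CPUs plus one."""
    return cpu_count + 1

def required_ram_mb(process_buffer_mb, parallelism):
    """The process buffer is allocated per thread, so available RAM must cover
    the buffer size multiplied by the degree of parallelism."""
    return process_buffer_mb * parallelism

print(max_parallelism(2))          # 3 for a two-CPU server
print(required_ram_mb(600, 3))     # 1800 MB for a 600 MB buffer and 3 parallel jobs
```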

Overview of Parallel Processing in Analysis Services 2005

Analysis Services 2005 embraces the XMLA standard and an enhanced object model that supports parallel processing without the use of external utilities. The basic drawback of the DSO-based object model in AS 2000 was that parallel processing was not available through the Analysis Services GUI; the DSO objects had to be invoked in a defined manner to simulate parallelism. Finer-grained control over the processing options in AS 2005 provides the flexibility to process objects in parallel using XMLA scripts, GUI-based processing or SSIS-based processing.
Parallel Processing – In Detail
When a processing request is submitted to AS 2005, an execution plan is generated that provides the blueprint for processing the given task. The execution plan is prepared based on the dependencies existing between the objects to be processed.

The following parameters provide opportunities for configuring the parallel processing capabilities of AS 2005:

  • CoordinatorExecutionMode: This is a server-wide configuration value (specified in the msmdsrv.ini file) that specifies the maximum number of concurrent jobs in a single processing request. The server administrator can use this to throttle parallelism. The default is zero, i.e., unlimited parallelism.
  • MaxParallel: This is an attribute in the processing command. It specifies the maximum number of concurrent jobs for the request. It does not override CoordinatorExecutionMode; it can only throttle the parallelism further (see the sketch after this list).
  • CPUs: Depending on the number of CPUs, the Analysis server can artificially throttle the parallelism to avoid unnecessary context switching.
  • Memory Quotas: Every job requests a memory quota, that is, the minimum and maximum amount of memory it needs. If the memory management system cannot grant the minimum requested memory, it blocks the job until the memory is available.
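As a rough mental model of how these settings interact, MaxParallel can only tighten whatever limit CoordinatorExecutionMode already imposes, and the server may throttle further based on the number of CPUs. The sketch below is a simplification for illustration only, not the actual Analysis Services 2005 scheduler.

```python
# Simplified model of how the parallelism settings described above combine.
# This is an illustration of the throttling rules, not the real AS 2005 scheduler.
def effective_parallelism(coordinator_execution_mode, max_parallel, cpu_count):
    limits = []
    if coordinator_execution_mode > 0:      # 0 means "unlimited" at the server level
        limits.append(coordinator_execution_mode)
    if max_parallel > 0:                    # per-request limit; can only throttle further
        limits.append(max_parallel)
    # Stand-in for the server's CPU-based throttle (the real heuristic is internal
    # to Analysis Services); here we simply avoid exceeding the CPU count by much.
    limits.append(cpu_count + 1)
    return min(limits)

print(effective_parallelism(coordinator_execution_mode=0, max_parallel=4, cpu_count=8))   # 4
print(effective_parallelism(coordinator_execution_mode=2, max_parallel=4, cpu_count=8))   # 2
```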

Conclusions

The first half of the paper discussed the business problem and its implications on the relational database and cube design. The business requirements were explained and the implications of these requirements on the design were illustrated. The second half of the paper covered the problems faced with sequential processing of partitions in large-volume cubes and how processing times were brought down by 25-28% through the use of the Parallel Processing Utility. The final part of the paper described parallel processing in Analysis Services 2005. The benefits expected from the PPU in Analysis Services 2000 can be matched by the parallel processing built into Analysis Services 2005, which also gives much better server-side control over the process than the PPU offered. Some general observations about optimum cube processing performance in large data warehouse/data mart implementations follow:

  • In a large data mart/data warehouse it is recommended to use a multiprocessor server to reduce processing time. Multiprocessor servers enable cubes to be processed in parallel and thereby reduce the time needed to process them.
  • Use the Parallel Processing Utility to parallelize the processing of cubes, partitions and dimensions as applicable. This makes better use of the existing hardware and reduces the processing time compared with sequential processing.
  • It is, however, important to understand that simply increasing the number of processors will not increase performance linearly; memory also plays a significant role during parallel processing. For instance, if the Process Buffer Size is set to 600 MB and 3 partitions are to be processed in parallel, the server needs at least 1800 MB (600 * 3) of RAM.
  • It is recommended that large data mart implementations use 64-bit SQL Server. The benefits are manifold – the memory allocation for Analysis Services can be increased beyond 3 GB, and the benefits are most pronounced where large cubes need to be processed in short time windows. With more memory available, 64-bit Analysis Services minimizes the use of temporary files while processing partitions, thereby reducing processing times. Temporary file usage can be monitored through the performance counters Temp file bytes written/sec and Temp file rows written/sec under the Analysis Services: Proc Aggs object.
  • It is recommended that aggregation levels do not exceed 20% for large data volume implementations. Increasing the aggregation percentage increases both the storage needs of the cubes and the processing time.
  • It is recommended that the MOLAP storage mode is used for optimum cube processing and query performance. However, for very large dimension scenarios ROLAP is the more appropriate storage mode.
  • Migrating to SQL Server 2005 ensures that parallel processing of cubes is built into Analysis Services, removing the need to process cubes in parallel using external utilities such as the Parallel Processing Utility.
