This is an old revision of the document!


Administration: Omnidex Features

Rollup Tables

Overview

When working with very large database, there are lots of different techniques that can be used to improve performance. Omnidex provides a variety of partitioning strategies that can be used to quickly reduce a large database to a small result set. Omnidex provides specialized bitmap indexes geared for large databases. A variety of other techniques, from multi-column indexing, to Hashed Data Caching, to reusing Omnidex ID lists, all help extract the best performance possible from queries against large databases.

An additional technique is being added to Omnidex to speed performance on large databases. Omnidex will now support the use of Rollup Tables. Rollup Tables will allow certain queries that involve counts and aggregations to be processed significantly faster.

What is a Rollup Table?

Simply described, a rollup table is a physical table that pre-aggregates another table. This idea originally had its origins in accounting, where specific general ledger categories were “rolled up” into broader categories, which were in turn “rolled up” into broader categories. Ultimately, all categories “rolled up” into two categories: Income and Expense. At each level of rollup, people could see the amounts summarized at that particular level. Sometimes people wanted to see transactions at the detail level; other times they wanted see subtotals at the project or category level; and other times they wanted to see totals at a summary level.

Any database table can be similarly rolled up. While an accounting system summarizes dollar amounts, databases can summarize any information contained in the database. An orders database can summarize order amounts, order quantities, discounts, taxes, credit limits, or any other information stored in the table. This information can be pre-aggregated by product numbers or customer numbers. Even tables without financial information can be rolled up. A simple table of customers can be rolled up by geographic regions or personal traits, aggregating nothing other than the number of people within each geographic region or sharing a personal trait. Rollups can contain any aggregation allowed in SQL, including COUNT, SUM, AVG, MIN and MAX.

Once a rollup table has been created, queries can be redirected to the rollup table rather than the primary table. The rollup table is much smaller than the primary table, so this can improve performance dramatically. For example, obtaining the number of customers for each state can take a long time on a 300 million row table. The 300 million rows must be processed, or if an index exists, then a 300 million row index must be processed. If a rollup table exists that pre-aggregates by geographic region, this table will be significantly smaller. In the United States, there are about 100,000 combinations of zip code, city, county, and state. If the request for the number of customers for each state is redirected to this rollup table, it only involves processing 100,000 rows, which is 3,000 times fewer. That directly translates to a query that performs 3,000 times faster.

Multiple Rollup Tables

It can be valuable to have multiple rollups for the same primary table. For example, a primary table of customers could be rolled up first by geography, and then again by personal traits such as gender, age and marital status. One might ask why have multiple rollups rather than one larger rollup. The more columns that are added to a rollup, generally the larger the table becomes. If a rollup only contained gender, it would have only 2 rows. If age were added, it would grow to around 200 rows. If marital status were added, it could easily grow to 1000 rows. By the time education, citizenship, call flags, bankruptcy flags and other demographics are added, our rollup table will have grown to nearly 1 million rows. In order for a rollup table to provide its performance advantage, it must be only a fraction of the size of the original table. A good rule of thumb would be to limit a rollup table to 1 million rows or 10% of the primary table, whichever is smaller. If we combined all of the geographic information with all the personal traits, we would exceed this recommendation.

One might then ask why not have a separate rollup for each column, such as a rollup for gender, a separate rollup for age and a separate rollup for marital status. There are important advantages to having multiple columns in a rollup. For a query to be redirected to a rollup table, all columns from the primary table that are referenced in the query must exist in the rollup table. For example, obtaining the number of customers that are married or single, broken out by gender and age requires that all three of these columns exist in the rollup table. If each column was in a separate rollup table, it would only be able to support the simplest of queries.

This means that rollup tables require a good design, combining columns that are likely to be referenced together while insuring that the size of the table does not get too big. Even with good design, rollup tables will not be able to service every query. Many queries will still be processed against the primary table. Rollup tables are designed to help with queries that regularly aggregate large amounts of data. Because of this, analyzing the queries is always the first step in designing rollup tables, as that will reveal where the greatest opportunities lie.

Sample Rollup Tables

The following example shows a sample primary table with two rollup tables. In this example, the rollup only pre-aggregates the counts of records, but they could also be expanded to pre-aggregate other number data.

PRIMARY TABLE

Name State Zip Gender Age
Alfred CA 94122 M 27
Catherine CA 94123 F 29
Elizabeth CO 80301 F 29
George CA 94123 M 26
Jane CO 80301 F 26
Jeff CO 80304 M 27
Jennifer CA 94122 F 27
Jim CO 80301 M 27
John CO 80304 M 26
Karen CA 94122 F 29
Linda CA 94123 F 26
Mark CA 94122 M 27
Mary CA 94122 F 26
Nancy CO 80301 F 29
Paul CO 80301 M 26
Peter CA 94123 M 29
Robert CO 80301 M 27
Sarah CO 80304 F 26
Susan CA 94123 F 29
William CO 80301 M 26

GEOGRAPHIC ROLLUP TABLE

State Zip Num_Individuals
CA 94122 5
CA 94123 5
CO 80301 7
CO 80304 3

PERSONAL TRAITS ROLLUP TABLE

Gender Age Num_Individuals
F 26 4
F 27 1
F 29 5
M 26 4
M 27 5
M 29 1

Additional Resources

 
Back to top
admin/features/rollups/home.1295882813.txt.gz · Last modified: 2016/06/28 22:38 (external edit)