Administration: Indexing Strategies

Basic Strategies

Overview | Criteria | Table Joins | Aggregations | Ordering

Optimizing Count Aggregations

Count aggregations are any SELECT statements that request “COUNT(*)”, “COUNT(column)”, or “COUNT(DISTINCT column)”, regardless of whether there is a GROUP BY clause. These queries are optimized using a collection of Omnidex indexes. Each column that is referenced in the GROUP BY clause and referenced in the COUNT clause must be indexed using a standard Omnidex index. Omnidex will use all of these indexes in combination to optimize the query.

The advantage of this optimization approach is that a small number of indexes can support nearly any combination of GROUP BY columns, providing high performance at little cost.

There are some situations in which specialized indexes are better for optimizing a query. When a GROUP BY column has especially high cardinality, such as over 64,000 distinct values, it is necessary to create a multi-column index as described below. Additionally, if the same GROUP BY columns are used frequently, a multi-column index will provide even faster performance.

Optimizing Other Aggregations

Aggregations that use the SUM, AVERAGE, MIN and MAX functions are optimized by creating an index containing all of the columns in the GROUP BY clause and all of the columns referenced in SUM, MIN, MAX and AVERAGE functions. It is preferable to have the columns in the GROUP BY clause precede the columns being aggregated. The order of the GROUP BY columns does not make much difference, though if the same GROUP BY clause is repeated used, then order the columns in the index the same way.

Some applications will use GROUP BY clauses that consist of columns from a child table and columns from parent or grandparent tables. In these cases, the Omnidex index in the child should include all columns from the child table as well as the foreign keys that point to the parent table.

It is fine for one index to service multiple aggregations. This occurs when one index contains a superset of all of the columns for several queries. For example, an index on columns A, B, C and D would allow sums of D grouped by A, and also averages of D and C grouped by B and A. The performance of these aggregations will begin to degrade as the index width increases, so indexes should not contain an excessive number of columns. A good rule of thumb is to keep the width of these indexes below 64 bytes, though they can be as large as 240 bytes.

Examples

Example 1. Ungrouped COUNT(*) aggregations

In this example, the COUNT(*) does not require any additional indexes.

  select     count(*)
    from     INDIVIDUALS
    where    GENDER = 'M' and NAME = 'John';

The GENDER and NAME indexes which were created to satisfy the criteria will also satisfy the requested count.

Example 2. Grouped COUNT(*) aggregations

In this example, the COUNT(*) does not require any additional indexes, but the GROUP BY clause requires an index.

  select     GENDER, count(*)
    from     INDIVIDUALS
    where    BIRTHDATE = 'Jan 10, 1986' and NAME = 'John'
    group by GENDER;

The BIRTHDATE and NAME columns must be indexed to satisfy the criteria and the GENDER column must be indexed to satisfy the GROUP BY clause. As discussed earlier, GENDER is a low-cardinality column and can be an Omnidex Bitmap index.

Example 3. Aggregations between a child and multiple parents

In this example, the GROUP BY clause contains columns from multiple parents.

  select     C.DESCRIPTION, S.DESCRIPTION, H.CITY, count(DISTINCT H.ZIP)
    from     HOUSEHOLDS H join COUNTRIES C on H.COUNTRY = C.COUNTRY 
                          join STATES S on H.STATE = S.STATE
    group by C.DESCRIPTION, S.DESCRIPTION, H.CITY;

This statement is indexed by creating a multi-column Omnidex index on the COUNTRY, STATE, CITY and ZIP columns in HOUSEHOLDS. The COUNTRY and STATE columns are included because they are foreign keys that correlate to the COUNTRIES.DESCRIPTION and STATES.DESCRIPTION columns. The CITY column is included because it is also in the GROUP BY clause. The ZIP column is included because it is referenced in an aggregation.

On relational databases, the primary keys for COUNTRIES and STATES do not need to be indexed, but on non-relational databases, they must be indexed as well.

Example 4. Multiple aggregations satisfied by one index

In this example, both SQL statements can be satisfied by one Omnidex index.

  select     GENDER, BIRTHDATE, count(DISTINCT HOUSEHOLD)
    from     INDIVIDUALS
    group by GENDER, BIRTHDATE;

  select     HOUSEHOLD, GENDER, count(*)
    from     INDIVIDUALS
    group by HOUSEHOLD, GENDER;

A multi-column Omnidex index containing GENDER, BIRTHDATE and HOUSEHOLD would satisfy both of these statements because all GROUP BY columns and all aggregated columns can be found in this index.

Sample Environment File

This sample environment file shows the Omnidex indexes that will optimize these queries.

create environment
 in                   "simple.xml"
 with                 delete;

create database       "SIMPLE"
 type                 FILE
 index_directory      "idx"
 in                   "simple.xml";

create table          "COUNTRIES"
 physical             "dat/cnt.dat"
 (
  "COUNTRY"           CHARACTER(2)      omnidex,
  "DESCRIPTION"       STRING(47),
  "LATITUDE"          FLOAT                          usage "LATITUDE",
  "LONGITUDE"         FLOAT                          usage "LONGITUDE",
  "CAPITAL"           STRING(31),
  "CAPITAL_LAT"       FLOAT                          usage "LATITUDE",
  "CAPITAL_LONG"      FLOAT                          usage "LONGITUDE",
  constraint COUNTRIES_COUNTRY_PK primary ("COUNTRY")
 )
 in                   "simple.xml";

create table          "STATES"
 physical             "dat/sta.dat"
 (
  "STATE"             CHARACTER(2)      omnidex,
  "DESCRIPTION"       STRING(31),
  "STATE_CODE"        CHARACTER(2),
  "REGION"            CHARACTER(2),
  "COUNTRY"           CHARACTER(2),
  "TAX_RATE"          FLOAT,
  constraint STATES_STATE_PK primary ("STATE"),
  constraint STATES_COUNTRY_FK foreign ("COUNTRY") references "COUNTRIES"
 )
 in                   "simple.xml";

create table          "HOUSEHOLDS"
 physical             "dat/households.dat"
 (
  "HOUSEHOLD"         CHARACTER(12),
  "ADDRESS"           CHARACTER(50),
  "CITY"              CHARACTER(28),
  "STATE"             CHARACTER(2),
  "ZIP"               CHARACTER(5),
  "COUNTRY"           CHARACTER(2),
  constraint HSHD_HOUSEHOLD_PK primary ("HOUSEHOLD"),
  constraint HSHD_STATE_FK foreign ("STATE") references "STATES",
  constraint HSHD_COUNTRY_FK foreign ("COUNTRY") references "COUNTRIES",
  omnidex "AGGREGATION_01" ("COUNTRY", "STATE", "CITY", "ZIP")
 )
 in                   "simple.xml";

create table          "INDIVIDUALS"
 physical             "dat/individuals.dat"
 (
  "INDIVIDUAL"        CHARACTER(12),
  "HOUSEHOLD"         CHARACTER(12),
  "NAME"              CHARACTER(50)     quicktext,
  "GENDER"            CHARACTER(1)      omnidex bitmap,
  "BIRTHDATE"         ANSI DATE         omnidex,
  "PHONE"             CHARACTER(14),
  "EMAIL"             CHARACTER(60),
  constraint IND_INDIVIDUAL_PK primary ("INDIVIDUAL"),
  constraint IND_HOUSEHOLD_FK foreign ("HOUSEHOLD") references "HOUSEHOLDS",
  omnidex "AGGREGATION_01" ("GENDER", "BIRTHDATE", "HOUSEHOLD"),
 )
 in                   "simple.xml";

Prev | Next