

Administration: Optimizing Queries

Query Plans

Optimizing Queries

The basic optimization of a query with Omnidex has six main steps. Most of the time, following these steps is enough to ensure that the query performs well.

Step 1. Understand the data model

Omnidex uses different indexing approaches depending on the data model. Be sure to understand the parent-child relationships between tables by following their primary key and foreign key constraints. For more complex databases, it is often useful to diagram the data model.
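One lightweight way to start such a diagram is to list each foreign key constraint and derive the parent-child edges from it. The Python sketch below assumes that INDIVIDUALS.HOUSEHOLD references HOUSEHOLDS.HOUSEHOLD, which is inferred from the join in the sample query on this page; the FOREIGN_KEYS mapping and the parent_child_edges function are hypothetical names for illustration, not part of Omnidex.

```python
# Hypothetical foreign key constraints: (child_table, child_column) -> (parent_table, parent_column).
# The INDIVIDUALS -> HOUSEHOLDS constraint is assumed from the sample query's join.
FOREIGN_KEYS = {
    ("INDIVIDUALS", "HOUSEHOLD"): ("HOUSEHOLDS", "HOUSEHOLD"),
}


def parent_child_edges(foreign_keys):
    """Return sorted (parent, child, join_condition) triples derived from FK constraints."""
    edges = []
    for (child, child_col), (parent, parent_col) in foreign_keys.items():
        condition = f"{parent}.{parent_col} = {child}.{child_col}"
        edges.append((parent, child, condition))
    return sorted(edges)


for parent, child, condition in parent_child_edges(FOREIGN_KEYS):
    # One line per edge of the data-model diagram.
    print(f"{parent} -> {child}  on {condition}")
```

Each printed line is one edge of the diagram; a larger schema simply adds entries to the mapping.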

Step 2. Understand the cardinalities

Omnidex performance is affected by the cardinality of each table, meaning the number of rows in the table. This is especially true when large tables need to be joined together. Omnidex performance is also affected by the cardinality of each column, meaning the number of unique values in the column. For example, columns with cardinalities of 32 or less are good candidates for Omnidex Bitmap indexes.
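Both kinds of cardinality are simple counts. The following Python sketch works on hypothetical in-memory rows rather than any Omnidex interface: it computes the table cardinality, the number of distinct values in each column, and flags columns at or below the 32-value threshold mentioned above as Bitmap index candidates. The INDIVIDUAL column and the sample data are invented for illustration.

```python
BITMAP_THRESHOLD = 32  # "cardinalities of 32 or less", from the text above


def column_cardinalities(rows):
    """Count the distinct values in each column of a list of dict rows."""
    distinct = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    return {column: len(values) for column, values in distinct.items()}


def bitmap_candidates(rows, threshold=BITMAP_THRESHOLD):
    """Columns whose number of distinct values is at or below the threshold."""
    return sorted(c for c, n in column_cardinalities(rows).items() if n <= threshold)


# Hypothetical sample of INDIVIDUALS rows; real data would come from the database.
individuals = [
    {"INDIVIDUAL": i, "HOUSEHOLD": i // 2, "GENDER": "MF"[i % 2]}
    for i in range(100)
]

print("table cardinality:", len(individuals))
print("column cardinalities:", column_cardinalities(individuals))
print("bitmap candidates:", bitmap_candidates(individuals))
```

Here only GENDER, with two distinct values, falls under the threshold; INDIVIDUAL and HOUSEHOLD are too selective for a Bitmap index.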

Step 3. Generate a query plan for the query

The query plan is the key to understanding the performance of the query. A quick look at the warnings and notes in the Summary section provides clues for optimizing the query, and the Details section shows exactly what happens when the query is processed.

This sample query has been run on a database with no Omnidex indexing at all. Moreover, this is a database of raw data files, so there is no indexing in the underlying database either.

----------------------------------- SUMMARY -----------------------------------
Select        I.GENDER,
              count(*)
  from        HOUSEHOLDS H
  join        INDIVIDUALS I on H.HOUSEHOLD = I.HOUSEHOLD
  where       ((H.STATE = 'CO' and
                H.CITY = 'DENVER') or
               (H.STATE = 'AZ' and
                H.CITY = 'PHOENIX')) and
              I.BIRTHDATE > 'January 1, 1980'
  group by    I.GENDER;

Version:      5.2.01  (Compiled Feb  2 2012 21:29:57)
Warnings:     UNOPTIMIZED_CRITERIA, UNOPTIMIZED_AGGREGATION, UNOPTIMIZED_SORT,
              SEQUENTIAL_SCAN, SEQUENTIAL_TABLE_JOIN
Notes:        Optimized aggregations are not possible because Omnidex indexes
                don't exist on all aggregated columns and GROUP BY columns and
                all links to dimension tables
              HDC optimization not used because table INDIVIDUALS has zero
                cardinality.
              SortMerge optimization not used because table HOUSEHOLDS has zero
                cardinality.
              Sequential Table Join on INDIVIDUALS with H.HOUSEHOLD =
                I.HOUSEHOLD
              Filter on column STATE will not be optimized because there is not
                an Omnidex index installed on the column.
              Filter on column CITY will not be optimized because there is not
                an Omnidex index installed on the column.
              Filter on column BIRTHDATE will not be optimized because there is
                not an Omnidex index installed on the column.
----------------------------------- DETAILS -----------------------------------
Retrieve HOUSEHOLDS H sequentially;
 Retrieve INDIVIDUALS I sequentially;
 Filter H.HOUSEHOLD = I.HOUSEHOLD;
 Filter I.BIRTHDATE > 'January 1, 1980';
 Filter H.STATE = 'CO';
 Filter H.CITY = 'Denver';
 Filter FILTER 0 AND FILTER 1;
 Filter H.STATE = 'AZ';
 Filter H.CITY = 'Phoenix';
 Filter FILTER 3 AND FILTER 4;
 Filter FILTER 2 OR FILTER 5;
 Pass to queue {1} [I.GENDER];
Sort {1} for GROUP BY [I.GENDER];
Retrieve {1} sequentially;
Return I.GENDER, COUNT('*');
-------------------------------------------------------------------------------
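When comparing many plans, it can help to pull the Warnings and Notes out of the SUMMARY section programmatically. The Python sketch below assumes only the layout visible in the plan above: a label such as Warnings: padded to a value column, each new entry starting at that column, and deeper-indented lines continuing the previous entry. The function names are hypothetical; this is not an Omnidex API, and other Omnidex versions may format plans differently.

```python
import re


def parse_summary_field(plan_text, label):
    """Return the entries of a SUMMARY field such as 'Warnings' or 'Notes'.

    Layout assumptions (taken from the sample plan): the field starts with
    'Label:' plus padding, each new entry starts at the same column as the
    first value, and deeper-indented lines continue the previous entry.
    """
    entries = []
    value_col = None
    for line in plan_text.splitlines():
        if value_col is None:
            match = re.match(rf"{label}:\s+", line)
            if match:
                value_col = match.end()
                entries.append(line[value_col:].strip())
            continue
        indent = len(line) - len(line.lstrip(" "))
        if not line.strip() or indent == 0:
            break  # blank or unindented line: the next field or a separator
        if indent <= value_col:
            entries.append(line.strip())  # a new entry in the same field
        else:
            entries[-1] += " " + line.strip()  # continuation of the previous entry
    return entries


def parse_warnings(plan_text):
    """Split the Warnings field into individual warning codes."""
    joined = " ".join(parse_summary_field(plan_text, "Warnings"))
    return [code.strip() for code in joined.split(",") if code.strip()]


SAMPLE = "\n".join([
    "Version:      5.2.01  (Compiled Feb  2 2012 21:29:57)",
    "Warnings:     UNOPTIMIZED_CRITERIA, UNOPTIMIZED_AGGREGATION, UNOPTIMIZED_SORT,",
    "              SEQUENTIAL_SCAN, SEQUENTIAL_TABLE_JOIN",
    "Notes:        Sequential Table Join on INDIVIDUALS with H.HOUSEHOLD =",
    "                I.HOUSEHOLD",
    "              Filter on column STATE will not be optimized because there is not",
    "                an Omnidex index installed on the column.",
    "----------------------------------- DETAILS -----------------------------------",
])

print(parse_warnings(SAMPLE))
for note in parse_summary_field(SAMPLE, "Notes"):
    print("-", note)
```

A plan with no Warnings line, like a fully optimized one, simply yields an empty list.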

Step 4. Review the Warnings, Notes and Details of the query plan

An analysis of the warnings and notes will provide many clues for optimizing the query:

Warnings

The Warnings section displays many warnings that show opportunities for improvement:

  • UNOPTIMIZED_CRITERIA. At least one criterion is not supported by an Omnidex index. The goal is always to have all criteria processed through the Omnidex indexes. This is especially true for queries that have table joins, aggregations or ordering, since Omnidex indexes cannot be used for these steps unless all criteria have been processed through Omnidex indexes.
  • UNOPTIMIZED_AGGREGATION. The aggregations in this query cannot be satisfied using Omnidex indexes. This means that the data must be retrieved from the database and sorted. It is preferable to have the aggregation processed through Omnidex indexes.
  • UNOPTIMIZED_SORT. Some aspect of this query required sorting at run time. This is not always an optimization issue, as sometimes it is required to sort intermediate results that were derived from Omnidex indexes. If sorting can be eliminated through the use of Omnidex indexes, it is an easy way to improve the query performance.
  • SEQUENTIAL_SCAN. At least one table is being sequentially scanned. This is not always an optimization issue, especially if more than 10% of the table is to be ultimately retrieved. Often it is an indication that Omnidex indexes were not present, either for processing criteria or table joins.
  • SEQUENTIAL_TABLE_JOIN. At least one table join required a sequential table join, where the second table is sequentially scanned for each and every row in the first table. This is the most inefficient approach to table joins and should always be avoided when possible. It is usually an indication that the UPDATE STATISTICS command has not been run to register the table cardinalities, and that indexes are not present on the join columns.
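The cost difference between a sequential table join and a hashed approach can be seen by counting work. In this Python sketch a plain dict stands in for a hash table; it illustrates the general hash-join technique and is not Omnidex's Hashed Data Cache implementation. The row counts are hypothetical.

```python
def sequential_table_join(outer, inner, key):
    """Nested-loop join: scan all of `inner` for every row of `outer`."""
    results, comparisons = [], 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                results.append((o, i))
    return results, comparisons


def hashed_join(outer, inner, key):
    """Hash join: build a dict on `inner` once, then probe it once per outer row."""
    table = {}
    for i in inner:
        table.setdefault(i[key], []).append(i)
    results, probes = [], 0
    for o in outer:
        probes += 1
        for i in table.get(o[key], []):
            results.append((o, i))
    return results, probes


# Hypothetical tables: 1,000 households with two individuals each.
households = [{"HOUSEHOLD": h} for h in range(1000)]
individuals = [{"HOUSEHOLD": n // 2, "ID": n} for n in range(2000)]

seq_rows, seq_work = sequential_table_join(households, individuals, "HOUSEHOLD")
hash_rows, hash_work = hashed_join(households, individuals, "HOUSEHOLD")
print(len(seq_rows), seq_work)    # prints: 2000 2000000
print(len(hash_rows), hash_work)  # prints: 2000 1000
```

Both methods return the same 2,000 joined rows, but the sequential join compares every pair of rows (1,000 × 2,000), while the hashed join makes one pass over INDIVIDUALS to build its table and then one lookup per household.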
Notes
  • Optimized aggregations are not possible because Omnidex indexes don't exist on all aggregated columns and GROUP BY columns and all links to dimension tables. This note indicates that this query performs aggregations that are not supported by Omnidex indexes. The technique for optimizing most aggregations is to add Omnidex indexes to all columns that are aggregated, that are GROUP BY columns, or that are foreign keys to dimension tables containing GROUP BY columns. In this query, the GENDER column must be indexed since it is a GROUP BY column. No other indexes are necessary for the aggregation, since no columns are being aggregated and the GROUP BY column is not in a dimension table within a Star Schema.
  • HDC optimization not used because table INDIVIDUALS has zero cardinality. This note shows that Omnidex is trying to find a way to optimize the table join between HOUSEHOLDS and INDIVIDUALS. Omnidex is evaluating whether it can use a Hashed Data Cache (HDC) to optimize the join, but cannot because the table cardinalities are not known. This is an indication that the UPDATE STATISTICS command has not been run.
  • SortMerge optimization not used because table HOUSEHOLDS has zero cardinality. This note also shows that Omnidex is trying to find a way to optimize the table join between HOUSEHOLDS and INDIVIDUALS. Omnidex is evaluating whether it can use a Sort/Merge to optimize the join, but cannot because the table cardinalities are not known. This is an indication that the UPDATE STATISTICS command has not been run.
  • Sequential Table Join on INDIVIDUALS with H.HOUSEHOLD = I.HOUSEHOLD. This note indicates that Omnidex has found no approach to processing this table join other than a Sequential Table Join, where the second table is sequentially scanned for each and every row of the first table. This is the worst-performing method of processing a table join. In this case, Omnidex indexes should be added to the join columns, HOUSEHOLDS.HOUSEHOLD and INDIVIDUALS.HOUSEHOLD.
  • Filter on column STATE will not be optimized because there is not an Omnidex index installed on the column. This note indicates that there is no Omnidex index on the STATE column, so Omnidex must process this filter by retrieving each row and comparing it at run time. It is important to process all criteria with Omnidex indexes, both for the speed of processing the criteria and for the speed of later steps, such as table joins, aggregations and ordering. In this case, an Omnidex index should be installed on the STATE column.
  • Filter on column CITY will not be optimized because there is not an Omnidex index installed on the column. Similar to the note above, an Omnidex index should be installed on the CITY column. Depending on whether or not the criteria are textual, the administrator should choose between an Omnidex index and a QuickText index.
  • Filter on column BIRTHDATE will not be optimized because there is not an Omnidex index installed on the column. Similar to the note above, an Omnidex index should be installed on the BIRTHDATE column.
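The filter and join notes follow a regular pattern, so candidate columns can be collected mechanically. This Python sketch assumes each note has already been joined onto a single line and that the caller supplies the alias-to-table mapping from the query (H for HOUSEHOLDS, I for INDIVIDUALS). Filter notes name only the column, so those entries are returned unqualified. The aggregation note does not name its column at all, so GENDER still has to be identified by reading the GROUP BY clause. The function name and regular expressions are illustrative, not an Omnidex feature.

```python
import re

# Patterns based on the wording of the notes in the sample plan.
FILTER_NOTE = re.compile(r"Filter on column (\w+) will not be optimized")
JOIN_NOTE = re.compile(r"Sequential Table Join on \w+ with (\w+)\.(\w+) = (\w+)\.(\w+)")


def index_recommendations(notes, aliases):
    """Columns the notes suggest indexing, in the order the notes appear."""
    recommended = []
    for note in notes:
        if m := FILTER_NOTE.search(note):
            recommended.append(m.group(1))  # filter notes give only the column name
        elif m := JOIN_NOTE.search(note):
            left_alias, left_col, right_alias, right_col = m.groups()
            recommended.append(f"{aliases[left_alias]}.{left_col}")
            recommended.append(f"{aliases[right_alias]}.{right_col}")
    return recommended


notes = [
    "Sequential Table Join on INDIVIDUALS with H.HOUSEHOLD = I.HOUSEHOLD",
    "Filter on column STATE will not be optimized because there is not an Omnidex index installed on the column.",
    "Filter on column CITY will not be optimized because there is not an Omnidex index installed on the column.",
    "Filter on column BIRTHDATE will not be optimized because there is not an Omnidex index installed on the column.",
]
print(index_recommendations(notes, {"H": "HOUSEHOLDS", "I": "INDIVIDUALS"}))
```

Notes that match neither pattern, such as the HDC and SortMerge notes, are skipped; they point to running UPDATE STATISTICS rather than to a specific column.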

Step 5. Apply any indexing changes or other recommendations

After considering all of these warnings and notes, we have determined that we need to add Omnidex indexes to the following columns:

  • HOUSEHOLDS.HOUSEHOLD
  • HOUSEHOLDS.STATE
  • HOUSEHOLDS.CITY
  • INDIVIDUALS.HOUSEHOLD
  • INDIVIDUALS.BIRTHDATE
  • INDIVIDUALS.GENDER

We have also determined that we need to run the UPDATE STATISTICS command after indexing the database.

Step 6. Produce a new query plan and re-analyze the query

After applying the indexing changes and other recommendations, generate and analyze a new query plan. The changes made in each pass affect the processing of the query, and may result in other warnings or suggestions for query optimization. Repeat these steps until the best optimization is obtained.
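Tracking which warnings each pass resolves, and which new ones appear, makes the iteration easier to follow. The Python sketch below compares the warning lists of two passes using set operations; compare_passes is a hypothetical helper, and the two lists are the Warnings of the first plan on this page and the empty Warnings of the re-planned query.

```python
def compare_passes(before, after):
    """Classify warnings as resolved, remaining, or newly introduced between two passes."""
    before_set, after_set = set(before), set(after)
    return {
        "resolved": sorted(before_set - after_set),
        "remaining": sorted(before_set & after_set),
        "introduced": sorted(after_set - before_set),
    }


first_pass = ["UNOPTIMIZED_CRITERIA", "UNOPTIMIZED_AGGREGATION", "UNOPTIMIZED_SORT",
              "SEQUENTIAL_SCAN", "SEQUENTIAL_TABLE_JOIN"]
final_pass = []  # the re-planned query on this page reports no warnings

changes = compare_passes(first_pass, final_pass)
print(changes["resolved"])
print(changes["introduced"])  # prints: []
```

A non-empty "introduced" list is the signal, described above, that a change has exposed a new optimization opportunity for the next pass.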

After applying the changes described above, the new query plan shows that the query is well optimized. There are no warnings or notes, and the details show that all processing is done through the Omnidex indexes, rather than the database.

----------------------------------- SUMMARY -----------------------------------
Select        I.GENDER,
              count(*)
  from        HOUSEHOLDS H
  join        INDIVIDUALS I on H.HOUSEHOLD = I.HOUSEHOLD
  where       ((H.STATE = 'CO' and
                H.CITY = 'DENVER') or
               (H.STATE = 'AZ' and
                H.CITY = 'PHOENIX')) and
              I.BIRTHDATE > 'January 1, 1980'
  group by    I.GENDER;

Version:      5.2.01  (Compiled Feb  2 2012 21:29:57)
----------------------------------- DETAILS -----------------------------------
Qualify (HOUSEHOLDS)HOUSEHOLDS where CITY = 'DENVER';
Qualify (HOUSEHOLDS)HOUSEHOLDS where and STATE = 'CO';
Create index segment O1 on 1;
Qualify (HOUSEHOLDS)HOUSEHOLDS where CITY = 'PHOENIX';
Qualify (HOUSEHOLDS)HOUSEHOLDS where and STATE = 'AZ';
Qualify (HOUSEHOLDS)HOUSEHOLDS where or $ODXID = 'segment(O1)';
Join HOUSEHOLDS using HOUSEHOLD to (INDIVIDUALS)INDIVIDUALS using HOUSEHOLD;
Qualify (INDIVIDUALS)INDIVIDUALS where and BIRTHDATE > '"January 1, 1980"';
Aggregate INDIVIDUALS using GENDER for GROUP(GENDER), COUNT(*);
Return I.GENDER, COUNT('*');
-------------------------------------------------------------------------------
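The Details above can be read as a sequence of set operations on row identifiers. The Python sketch below models each index as a dict from value to a set of row ids, with a sorted list for the BIRTHDATE range; this is a conceptual stand-in, not how Omnidex actually stores its indexes. Only the column names and constants come from the query; the rows are hypothetical. Once the indexes are built, no step reads a data row, which mirrors the plan doing all of its work in the indexes.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def build_index(rows, column):
    """Inverted index: column value -> set of row ids (positions in rows)."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


# Hypothetical rows; only the column names and constants come from the query.
households = [
    {"HOUSEHOLD": 1, "STATE": "CO", "CITY": "DENVER"},
    {"HOUSEHOLD": 2, "STATE": "AZ", "CITY": "PHOENIX"},
    {"HOUSEHOLD": 3, "STATE": "CO", "CITY": "BOULDER"},
]
individuals = [
    {"HOUSEHOLD": 1, "GENDER": "F", "BIRTHDATE": date(1985, 3, 1)},
    {"HOUSEHOLD": 1, "GENDER": "M", "BIRTHDATE": date(1975, 6, 9)},
    {"HOUSEHOLD": 2, "GENDER": "M", "BIRTHDATE": date(1990, 1, 15)},
    {"HOUSEHOLD": 3, "GENDER": "F", "BIRTHDATE": date(1992, 7, 4)},
    {"HOUSEHOLD": 2, "GENDER": "F", "BIRTHDATE": date(1988, 11, 30)},
]

# Build the indexes once; every step below uses only these structures.
h_city = build_index(households, "CITY")
h_state = build_index(households, "STATE")
h_household = build_index(households, "HOUSEHOLD")
i_household = build_index(individuals, "HOUSEHOLD")
i_gender = build_index(individuals, "GENDER")
i_birthdate = sorted((row["BIRTHDATE"], row_id) for row_id, row in enumerate(individuals))

# Qualify CITY = 'DENVER' and STATE = 'CO'; keep the result as segment O1.
segment_o1 = h_city["DENVER"] & h_state["CO"]
# Qualify CITY = 'PHOENIX' and STATE = 'AZ', then OR in segment O1.
qualified_h = (h_city["PHOENIX"] & h_state["AZ"]) | segment_o1

# Join HOUSEHOLDS to INDIVIDUALS using the HOUSEHOLD indexes on both tables.
qualified_i = set()
for key, h_ids in h_household.items():
    if h_ids & qualified_h:
        qualified_i |= i_household.get(key, set())

# Qualify BIRTHDATE > January 1, 1980 as a range over the sorted index.
cutoff = date(1980, 1, 1)
dates = [d for d, _ in i_birthdate]
qualified_i &= {row_id for _, row_id in i_birthdate[bisect_right(dates, cutoff):]}

# Aggregate using the GENDER index: count qualified rows per GENDER value.
counts = {g: len(ids & qualified_i) for g, ids in sorted(i_gender.items())}
print(counts)  # prints: {'F': 2, 'M': 1}
```

The GROUP BY is answered by intersecting each GENDER value's row set with the qualified rows, which is why the aggregation note asked for an index on GENDER.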

Further Optimization

A query is considered “fully optimized” when there are no unwanted warnings or notes in the query plan. Moreover, the goal is always to process as much of the query in the Omnidex indexes as possible, and to access the underlying database as little as possible. It is not uncommon for extremely complex queries to be processed entirely in the Omnidex indexes, with no access to the underlying database, even when there are many table joins, subqueries, complex criteria, aggregations and ordering.

Sometimes the techniques described above are not enough to achieve the desired query performance. Some queries will perform poorly because of the fundamental structure of the query. This can happen because the query is written in an unusual way, often using techniques that are recommended for an underlying relational database. It is sometimes best to study the query, understand what the query is fundamentally trying to do, and then simplify the query.

Other times, it is beneficial to restructure the query to take advantage of Omnidex indexing techniques, similar to what may have been done for the underlying relational database. It is usually preferable to maintain queries in their simplest form, but like all databases, certain approaches work better than others.

If you are not able to achieve the level of performance you desire with a query, contact Technical Support. We can sometimes offer suggestions as part of your support contract, and if needed, we can discuss having an engineer analyze your environment and your query plans to achieve improved performance.

admin/optimization/plans/optimization.1328290115.txt.gz · Last modified: 2016/06/28 22:38 (external edit)