Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
admin:indexing:strategies:aggregations [2011/03/17 22:03]
deb
admin:indexing:strategies:aggregations [2016/06/28 22:38] (current)
Line 15: Line 15:
 ---- ----
  
-==== Optimizing Aggregations ====+==== Optimizing ​Count Aggregations ====
  
-Aggregations ​are usually optimized by creating an index containing all of the columns in the GROUP BY clause ​and all of the columns referenced in COUNT, SUM, MIN, MAX and AVERAGE functions.  ​It is preferable to have the columns ​in the GROUP BY clause ​precede ​the columns being aggregated.  ​+Count aggregations ​are any SELECT statements that request "​COUNT(*)",​ "​COUNT(column)",​ or "​COUNT(DISTINCT column)",​ regardless ​of whether there is a GROUP BY clause.  These queries are optimized using a collection ​of Omnidex indexes.  ​Each column that is referenced ​in the GROUP BY clause ​and referenced in the COUNT clause must be indexed using a standard Omnidex index.  ​Omnidex will use all of these indexes in combination to optimize the query.
  
-The COUNT(*) aggregation ​is treated uniquely.  ​The use of the asterisk does not equate ​to an individual ​column and does not expand ​the side of the index.  ​In fact, Omnidex maintains counts at each step of the search, so simple COUNT(*) aggregations do not need specialized indexing.+The advantage of this optimization approach ​is that a small number of indexes can support nearly any combination of GROUP BY columns, providing high performance at little cost. 
 + 
 +There are some situations in which specialized indexes are better for optimizing a query.  ​When a GROUP BY column has especially high cardinality,​ such as over 64,000 distinct values, it is necessary ​to create a multi-column ​index as described below. ​ Additionally,​ if the same GROUP BY columns are used frequently, a multi-column index will provide even faster performance. 
 + 
 +==== Optimizing Other Aggregations ==== 
 + 
 +Aggregations that use the SUM, AVERAGE, MIN and MAX functions are optimized by creating an index containing all of the columns in the GROUP BY clause and all of the columns referenced in SUM, MIN, MAX and AVERAGE functions.  ​It is preferable to have the columns in the GROUP BY clause precede the columns being aggregated. ​ The order of the GROUP BY columns does not make much difference, though if the same GROUP BY clause is repeated used, then order the columns in the index the same way.
  
 Some applications will use GROUP BY clauses that consist of columns from a child table and columns from parent or grandparent tables. ​ In these cases, the Omnidex index in the child should include all columns from the child table as well as the foreign keys that point to the parent table. Some applications will use GROUP BY clauses that consist of columns from a child table and columns from parent or grandparent tables. ​ In these cases, the Omnidex index in the child should include all columns from the child table as well as the foreign keys that point to the parent table.
  
 It is fine for one index to service multiple aggregations. ​ This occurs when one index contains a superset of all of the columns for several queries. ​ For example, an index on columns A, B, C and D would allow sums of D grouped by A, and also averages of D and C grouped by B and A.  The performance of these aggregations will begin to degrade as the index width increases, so indexes should not contain an excessive number of columns. ​ A good rule of thumb is to keep the width of these indexes below 64 bytes, though they can be as large as 240 bytes. It is fine for one index to service multiple aggregations. ​ This occurs when one index contains a superset of all of the columns for several queries. ​ For example, an index on columns A, B, C and D would allow sums of D grouped by A, and also averages of D and C grouped by B and A.  The performance of these aggregations will begin to degrade as the index width increases, so indexes should not contain an excessive number of columns. ​ A good rule of thumb is to keep the width of these indexes below 64 bytes, though they can be as large as 240 bytes.
 +
 +==== Examples ====
  
 == Example 1.  Ungrouped COUNT(*) aggregations == == Example 1.  Ungrouped COUNT(*) aggregations ==
Line 29: Line 37:
 In this example, the COUNT(*) does not require any additional indexes. In this example, the COUNT(*) does not require any additional indexes.
  
-<​code ​sql>+<​code>​
  
   select ​    ​count(*)   select ​    ​count(*)
Line 43: Line 51:
 In this example, the COUNT(*) does not require any additional indexes, but the GROUP BY clause requires an index.  ​ In this example, the COUNT(*) does not require any additional indexes, but the GROUP BY clause requires an index.  ​
  
-<​code ​sql>+<​code>​
  
   select ​    ​GENDER,​ count(*)   select ​    ​GENDER,​ count(*)
Line 58: Line 66:
 In this example, the GROUP BY clause contains columns from multiple parents. In this example, the GROUP BY clause contains columns from multiple parents.
  
-<​code ​sql>+<​code>​
  
   select ​    ​C.DESCRIPTION,​ S.DESCRIPTION,​ H.CITY, count(DISTINCT H.ZIP)   select ​    ​C.DESCRIPTION,​ S.DESCRIPTION,​ H.CITY, count(DISTINCT H.ZIP)
Line 75: Line 83:
 In this example, both SQL statements can be satisfied by one Omnidex index. In this example, both SQL statements can be satisfied by one Omnidex index.
  
-<​code ​sql>+<​code>​
  
   select ​    ​GENDER,​ BIRTHDATE, count(DISTINCT HOUSEHOLD)   select ​    ​GENDER,​ BIRTHDATE, count(DISTINCT HOUSEHOLD)
Line 83: Line 91:
  
  
-<​code ​sql>+<​code>​
  
   select ​    ​HOUSEHOLD,​ GENDER, count(*)   select ​    ​HOUSEHOLD,​ GENDER, count(*)
Line 97: Line 105:
 This sample environment file shows the Omnidex indexes that will optimize these queries. This sample environment file shows the Omnidex indexes that will optimize these queries.
  
-<​code ​sql>+<​code>​
 create environment create environment
  ​in ​                  "​simple.xml"​  ​in ​                  "​simple.xml"​
Line 103: Line 111:
  
 create database ​      "​SIMPLE"​ create database ​      "​SIMPLE"​
- ​type ​                FLATFILE+ ​type ​                FILE
  ​index_directory ​     "​idx"​  ​index_directory ​     "​idx"​
  ​in ​                  "​simple.xml";​  ​in ​                  "​simple.xml";​
  
 create table          "​COUNTRIES"​ create table          "​COUNTRIES"​
- ​physical ​            "​dat\cnt.dat"​+ ​physical ​            "​dat/cnt.dat"​
  (  (
   "​COUNTRY" ​          ​CHARACTER(2) ​     omnidex,   "​COUNTRY" ​          ​CHARACTER(2) ​     omnidex,
Line 122: Line 130:
  
 create table          "​STATES"​ create table          "​STATES"​
- ​physical ​            "​dat\sta.dat"​+ ​physical ​            "​dat/sta.dat"​
  (  (
   "​STATE" ​            ​CHARACTER(2) ​     omnidex,   "​STATE" ​            ​CHARACTER(2) ​     omnidex,
Line 136: Line 144:
  
 create table          "​HOUSEHOLDS"​ create table          "​HOUSEHOLDS"​
- ​physical ​            "​dat\households.dat"​+ ​physical ​            "​dat/households.dat"​
  (  (
   "​HOUSEHOLD" ​        ​CHARACTER(12),​   "​HOUSEHOLD" ​        ​CHARACTER(12),​
Line 152: Line 160:
  
 create table          "​INDIVIDUALS"​ create table          "​INDIVIDUALS"​
- ​physical ​            "​dat\individuals.dat"​+ ​physical ​            "​dat/individuals.dat"​
  (  (
   "​INDIVIDUAL" ​       CHARACTER(12),​   "​INDIVIDUAL" ​       CHARACTER(12),​
 
Back to top
admin/indexing/strategies/aggregations.1300399385.txt.gz · Last modified: 2016/06/28 22:38 (external edit)