Differences

This shows you the differences between two versions of the page.

Link to this comparison view

admin:indexing:strategies:aggregations [2012/01/26 23:19]
doc
admin:indexing:strategies:aggregations [2016/06/28 22:38]
Line 1: Line 1:
-~~NOTOC~~ 
  
-{{page>:​top_add&​nofooter&​noeditbtn}} 
- 
-====== Administration:​ Indexing Strategies ====== 
- 
-===== Basic Strategies ===== 
- 
-[[admin:​indexing:​strategies:​home|Overview]] |  
-[[admin:​indexing:​strategies:​criteria|Criteria]] |  
-[[admin:​indexing:​strategies:​joins|Table Joins]] |  
-**[[admin:​indexing:​strategies:​aggregations|Aggregations]]** |  
-[[admin:​indexing:​strategies:​orderby|Ordering]] 
- 
----- 
- 
-==== Optimizing Count Aggregations ==== 
- 
-Count aggregations are grouped or ungrouped counts that match any of the following constructs: 
- 
-<code SQL> 
- 
-select count(*) from <​table>;​ 
-select count(distinct <​column>​) from <​table>;​ 
-select count(*) from <​table>​ where <​criteria>;​ 
-select count(distinct <​column>​) from <​table>​ where <​criteria>;​ 
-select <​column>,​ count(*) from <​table>​ where <​criteria>​ group by <​column>;​ 
-select <​column>,​ count(distinct <​column>​) from <​table>​ where <​criteria>​ group by <​column>;​ 
-</​code>​ 
- 
-==== Optimizing Other Aggregations ==== 
- 
- 
-by creating an index containing all of the columns in the GROUP BY clause and all of the columns referenced in COUNT, SUM, MIN, MAX and AVERAGE functions. ​ It is preferable to have the columns in the GROUP BY clause precede the columns being aggregated.  ​ 
- 
-The COUNT(*) aggregation is treated uniquely. ​ The use of the asterisk does not equate to an individual column and does not expand the side of the index. ​ In fact, Omnidex maintains counts at each step of the search, so simple COUNT(*) aggregations do not need specialized indexing. 
- 
-Some applications will use GROUP BY clauses that consist of columns from a child table and columns from parent or grandparent tables. ​ In these cases, the Omnidex index in the child should include all columns from the child table as well as the foreign keys that point to the parent table. 
- 
-It is fine for one index to service multiple aggregations. ​ This occurs when one index contains a superset of all of the columns for several queries. ​ For example, an index on columns A, B, C and D would allow sums of D grouped by A, and also averages of D and C grouped by B and A.  The performance of these aggregations will begin to degrade as the index width increases, so indexes should not contain an excessive number of columns. ​ A good rule of thumb is to keep the width of these indexes below 64 bytes, though they can be as large as 240 bytes. 
- 
-== Example 1.  Ungrouped COUNT(*) aggregations == 
- 
-In this example, the COUNT(*) does not require any additional indexes. 
- 
-<code sql> 
- 
-  select ​    ​count(*) 
-    from     ​INDIVIDUALS 
-    where    GENDER = '​M'​ and NAME = '​John';​ 
-</​code>​ 
- 
- 
-The GENDER and NAME indexes which were created to satisfy the criteria will also satisfy the requested count. 
- 
-== Example 2.  Grouped COUNT(*) aggregations == 
- 
-In this example, the COUNT(*) does not require any additional indexes, but the GROUP BY clause requires an index.  ​ 
- 
-<code sql> 
- 
-  select ​    ​GENDER,​ count(*) 
-    from     ​INDIVIDUALS 
-    where    BIRTHDATE = 'Jan 10, 1986' and NAME = '​John'​ 
-    group by GENDER; 
-</​code>​ 
- 
- 
-The BIRTHDATE and NAME columns must be indexed to satisfy the criteria and the GENDER column must be indexed to satisfy the GROUP BY clause. ​ As discussed earlier, GENDER is a low-cardinality column and can be an Omnidex Bitmap index. 
- 
-== Example 3.  Aggregations between a child and multiple parents == 
- 
-In this example, the GROUP BY clause contains columns from multiple parents. 
- 
-<code sql> 
- 
-  select ​    ​C.DESCRIPTION,​ S.DESCRIPTION,​ H.CITY, count(DISTINCT H.ZIP) 
-    from     ​HOUSEHOLDS H join COUNTRIES C on H.COUNTRY = C.COUNTRY ​ 
-                          join STATES S on H.STATE = S.STATE 
-    group by C.DESCRIPTION,​ S.DESCRIPTION,​ H.CITY; 
-</​code>​ 
- 
- 
-This statement is indexed by creating a multi-column Omnidex index on the COUNTRY, STATE, CITY and ZIP columns in HOUSEHOLDS. ​ The COUNTRY and STATE columns are included because they are foreign keys that correlate to the COUNTRIES.DESCRIPTION and STATES.DESCRIPTION columns. ​ The CITY column is included because it is also in the GROUP BY clause. ​ The ZIP column is included because it is referenced in an aggregation. 
- 
-On relational databases, the primary keys for COUNTRIES and STATES do not need to be indexed, but on non-relational databases, they must be indexed as well. 
- 
-== Example 4.  Multiple aggregations satisfied by one index == 
- 
-In this example, both SQL statements can be satisfied by one Omnidex index. 
- 
-<code sql> 
- 
-  select ​    ​GENDER,​ BIRTHDATE, count(DISTINCT HOUSEHOLD) 
-    from     ​INDIVIDUALS 
-    group by GENDER, BIRTHDATE; 
-</​code>​ 
- 
- 
-<code sql> 
- 
-  select ​    ​HOUSEHOLD,​ GENDER, count(*) 
-    from     ​INDIVIDUALS 
-    group by HOUSEHOLD, GENDER; 
-</​code>​ 
- 
- 
-A multi-column Omnidex index containing GENDER, BIRTHDATE and HOUSEHOLD would satisfy both of these statements because all GROUP BY columns and all aggregated columns can be found in this index. 
- 
-== Sample Environment File == 
- 
-This sample environment file shows the Omnidex indexes that will optimize these queries. 
- 
-<code sql> 
-create environment 
- ​in ​                  "​simple.xml"​ 
- ​with ​                ​delete;​ 
- 
-create database ​      "​SIMPLE"​ 
- ​type ​                FILE 
- ​index_directory ​     "​idx"​ 
- ​in ​                  "​simple.xml";​ 
- 
-create table          "​COUNTRIES"​ 
- ​physical ​            "​dat/​cnt.dat"​ 
- ( 
-  "​COUNTRY" ​          ​CHARACTER(2) ​     omnidex, 
-  "​DESCRIPTION" ​      ​STRING(47),​ 
-  "​LATITUDE" ​         FLOAT                          usage "​LATITUDE",​ 
-  "​LONGITUDE" ​        ​FLOAT ​                         usage "​LONGITUDE",​ 
-  "​CAPITAL" ​          ​STRING(31),​ 
-  "​CAPITAL_LAT" ​      ​FLOAT ​                         usage "​LATITUDE",​ 
-  "​CAPITAL_LONG" ​     FLOAT                          usage "​LONGITUDE",​ 
-  constraint COUNTRIES_COUNTRY_PK primary ("​COUNTRY"​) 
- ) 
- ​in ​                  "​simple.xml";​ 
- 
-create table          "​STATES"​ 
- ​physical ​            "​dat/​sta.dat"​ 
- ( 
-  "​STATE" ​            ​CHARACTER(2) ​     omnidex, 
-  "​DESCRIPTION" ​      ​STRING(31),​ 
-  "​STATE_CODE" ​       CHARACTER(2),​ 
-  "​REGION" ​           CHARACTER(2),​ 
-  "​COUNTRY" ​          ​CHARACTER(2),​ 
-  "​TAX_RATE" ​         FLOAT, 
-  constraint STATES_STATE_PK primary ("​STATE"​),​ 
-  constraint STATES_COUNTRY_FK foreign ("​COUNTRY"​) references "​COUNTRIES"​ 
- ) 
- ​in ​                  "​simple.xml";​ 
- 
-create table          "​HOUSEHOLDS"​ 
- ​physical ​            "​dat/​households.dat"​ 
- ( 
-  "​HOUSEHOLD" ​        ​CHARACTER(12),​ 
-  "​ADDRESS" ​          ​CHARACTER(50),​ 
-  "​CITY" ​             CHARACTER(28),​ 
-  "​STATE" ​            ​CHARACTER(2),​ 
-  "​ZIP" ​              ​CHARACTER(5),​ 
-  "​COUNTRY" ​          ​CHARACTER(2),​ 
-  constraint HSHD_HOUSEHOLD_PK primary ("​HOUSEHOLD"​),​ 
-  constraint HSHD_STATE_FK foreign ("​STATE"​) references "​STATES",​ 
-  constraint HSHD_COUNTRY_FK foreign ("​COUNTRY"​) references "​COUNTRIES",​ 
-  omnidex "​AGGREGATION_01"​ ("​COUNTRY",​ "​STATE",​ "​CITY",​ "​ZIP"​) 
- ) 
- ​in ​                  "​simple.xml";​ 
- 
-create table          "​INDIVIDUALS"​ 
- ​physical ​            "​dat/​individuals.dat"​ 
- ( 
-  "​INDIVIDUAL" ​       CHARACTER(12),​ 
-  "​HOUSEHOLD" ​        ​CHARACTER(12),​ 
-  "​NAME" ​             CHARACTER(50) ​    ​quicktext,​ 
-  "​GENDER" ​           CHARACTER(1) ​     omnidex bitmap, 
-  "​BIRTHDATE" ​        ANSI DATE         ​omnidex,​ 
-  "​PHONE" ​            ​CHARACTER(14),​ 
-  "​EMAIL" ​            ​CHARACTER(60),​ 
-  constraint IND_INDIVIDUAL_PK primary ("​INDIVIDUAL"​),​ 
-  constraint IND_HOUSEHOLD_FK foreign ("​HOUSEHOLD"​) references "​HOUSEHOLDS",​ 
-  omnidex "​AGGREGATION_01"​ ("​GENDER",​ "​BIRTHDATE",​ "​HOUSEHOLD"​),​ 
- ) 
- ​in ​                  "​simple.xml";​ 
-</​code>​ 
- 
-=====  ===== 
- 
-**[[admin:​indexing:​strategies:​joins|Prev]]** |  
-**[[admin:​indexing:​strategies:​orderby|Next]]** 
- 
-====== Additional Resources ====== 
- 
-See also:  
- 
-{{page>:​admin:​indexing:​see_also&​nofooter&​noeditbtn}} 
- 
-{{page>:​bottom_add&​nofooter&​noeditbtn}} 
 
Back to top
admin/indexing/strategies/aggregations.txt ยท Last modified: 2016/06/28 22:38 (external edit)