Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
admin:indexing:concepts:aggregations [2011/01/14 17:47]
els created
admin:indexing:concepts:aggregations [2016/06/28 22:38] (current)
Line 1: Line 1:
 +~~NOTOC~~
 +
 {{page>:​top_add&​nofooter&​noeditbtn}} {{page>:​top_add&​nofooter&​noeditbtn}}
  
-====== Administration:​ Indexing ​Strategies ​======+====== Administration: ​Omnidex ​Indexing ======
  
 ===== Indexing Concepts ===== ===== Indexing Concepts =====
  
-**[[admin:​indexing:​concepts:​home|Overview]]** |  +[[admin:​indexing:​concepts:​home|Overview]] | 
-[[admin:​indexing:​concepts:​basics|Basics]] | +
 [[admin:​indexing:​concepts:​criteria|Criteria]] |  [[admin:​indexing:​concepts:​criteria|Criteria]] | 
 [[admin:​indexing:​concepts:​joins|Joins]] |  [[admin:​indexing:​concepts:​joins|Joins]] | 
-[[admin:​indexing:​concepts:​aggregations|Aggregations]] | +**[[admin:​indexing:​concepts:​aggregations|Aggregations]]** 
 [[admin:​indexing:​concepts:​ordering|Ordering]] |  [[admin:​indexing:​concepts:​ordering|Ordering]] | 
-[[admin:​indexing:​concepts:​retrieval|Retrievals]] |  +[[admin:​indexing:​concepts:​retrieval|Retrievals]]
-[[admin:​indexing:​concepts:​alternatives|Alternatives]]+
  
 ---- ----
  
-==== Overview ​====+==== Aggregations ​==== 
 + 
 +Omnidex indexing is valuable for processing aggregations. ​ Aggregations involve the use of the COUNT, SUM, AVERAGE, MIN and MAX functions, and may or may not include a GROUP BY clause. ​ Omnidex first considers the type of aggregation. ​ Queries to request a COUNT(*) without a GROUP BY clause are the simplest queries. ​ These are optimized differently than queries that use a GROUP BY clause or use other aggregation functions. 
 + 
 +=== Simple COUNT(*) Queries === 
 + 
 +Omnidex always maintains a count when accessing its indexes. ​ Whether processing criteria or performing a table join, a count is always available. ​ Counts that simply request COUNT(*) are subsequently the simplest to optimize. 
 + 
 +The following query requests a simple count from a table. ​  
 + 
 +<​code>​ 
 + 
 +  select ​       count(*) 
 +    from        HOUSEHOLDS 
 +    where       STATE = '​CO'​ and 
 +                CITY = '​Boulder'​ and 
 +                ZIP = '​80301';​ 
 +</​code>​ 
 + 
 + 
 +Omnidex will search its indexes to process the criteria, and it will have the count ready when it is complete. ​ There is no additional work needed to return the count, and no access to the database is required. ​ Since these types of queries are very common, this makes Omnidex especially fast as a count engine. 
 + 
 +=== General Aggregations === 
 + 
 +Queries that aggregate data often use GROUP BY clauses or other aggregation functions. ​ These queries can be optimized with Omnidex indexing as well.  The following query shows counts from the HOUSEHOLDS table aggregated by STATE, CITY and ZIP. 
 + 
 +<​code>​ 
 + 
 +  select ​       STATE, CITY, ZIP, count(*) 
 +    from        HOUSEHOLDS 
 +    group by    STATE, CITY, ZIP 
 +</​code>​ 
 + 
 +An index has the advantage of already being in sorted order. ​ If the STATE, CITY and ZIP columns are all individually indexed with Omnidex, then these three indexes will be used in combination to perform the aggregation. ​ The STATE index will be scanned, followed by the CITY index, followed by the ZIP index. ​ The aggregations can be returned without accessing the database. ​  
 + 
 +Some aggregations benefit from multi-column indexes, such as a single index with the STATE, CITY and ZIP.  This is more efficient than using three separate indexes, and is especially important if any of the columns contain fairly unique values. ​ It is also required for aggregations that use the SUM, AVERAGE, MIN and MAX functions. ​  
 + 
 +These techniques can be used even after other indexes have been used to process criteria. ​ After the criteria has been processed, the isolated record pointers can be used as a filter when the aggregation index is scanned. ​ This allows for great flexibility,​ since the columns referenced in criteria do not need to be included in the aggregation index. ​ In the following query, the same aggregations are restricted to people in the United States. ​  
 + 
 +<​code>​ 
 + 
 +  select ​       STATE, CITY, ZIP, count(*) 
 +    from        HOUSEHOLDS 
 +    where       ​COUNTRY = '​US'​ 
 +    group by    STATE, CITY, ZIP 
 +</​code>​ 
 + 
 + 
 +Omnidex will first qualify the rows in the United States, and will then allow those record pointers to be used as a filter against the aggregation index.  ​
  
-  * Basics of indexing +=== When to Avoid Indexes ===
-  * Indexing for criteria  +
-    * Basic cost of qualifications +
-    * Use of multiple indexes +
-    * MDK_RETRIEVAL_THRESHOLD (rename ​to QUALIFICATION_THRESHOLD) +
-  * Indexing for joins +
-    * Basic costs of joins +
-    * Prejoined indexes +
-    * Joining across databases +
-  * Indexing for aggregations +
-    * Basic cost of index scan +
-    * Use of partition qualifiers +
-    * ASK_RETRIEVAL_THRESHOLD (rename to AGGREGATION_THRESHOLD) +
-  * Indexing for ordering +
-    * Similar to aggregations +
-  * Indexing for retrieval +
-    * Obtaining data from indexes +
-    * Base cost of reading the index +
-  * Alternatives to indexing +
-    * Full table scans +
-    * Sort-merge +
-    * Hashed data caching+
  
 +As was seen with criteria and table joins, sometimes an index is not the fastest approach. ​ Using an index for aggregations usually requires scanning the index from the beginning to the end, even when there is criteria. ​ If the criteria isolates a very small number of rows, it would be faster to simply retrieve the rows and aggregate them directly, than to scan millions of entries in an index.
  
 +As a general rule of thumb, it is valuable to use indexing for processing criteria when aggregating greater than 0.01% of the rows in a table. Omnidex automatically follows this rule of thumb, and you may find that some queries don't use indexes for this very reason. ​ Omnidex also allows this threshold to be configured. ​ A setting called the AGGREGATION_THRESHOLD is set to 0.01% by default, but can be [[admin:​optimization:​config:​thresholds|changed]] if needed.
  
 +====  ====
 +The [[admin:​indexing:​strategies:​aggregations|following article]] explains basic strategies for optimizing aggregations.
  
 =====  ===== =====  =====
  
-**[[admin:​indexing:​concepts:​criteria|Next]]**+**[[admin:​indexing:​concepts:​joins|Prev]]** 
 +**[[admin:​indexing:​concepts:​ordering|Next]]**
  
 ====== Additional Resources ====== ====== Additional Resources ======
 
Back to top
admin/indexing/concepts/aggregations.1295027269.txt.gz · Last modified: 2016/06/28 22:38 (external edit)