Differences

This shows you the differences between two versions of the page.

Link to this comparison view

admin:indexing:concepts:aggregations [2012/01/20 18:05]
doc
admin:indexing:concepts:aggregations [2016/06/28 22:38]
Line 1: Line 1:
-~~NOTOC~~ 
  
-{{page>:​top_add&​nofooter&​noeditbtn}} 
- 
-====== Administration:​ Indexing Strategies ====== 
- 
-===== Indexing Concepts ===== 
- 
-[[admin:​indexing:​concepts:​home|Overview]] |  
-[[admin:​indexing:​concepts:​criteria|Criteria]] |  
-[[admin:​indexing:​concepts:​joins|Joins]] |  
-**[[admin:​indexing:​concepts:​aggregations|Aggregations]]** |  
-[[admin:​indexing:​concepts:​ordering|Ordering]] |  
-[[admin:​indexing:​concepts:​retrieval|Retrievals]] 
- 
----- 
- 
-==== Aggregations ==== 
- 
-Omnidex indexing is valuable for processing aggregations. ​ Aggregations involve the use of the COUNT, SUM, AVERAGE, MIN and MAX functions, and may or may not include a GROUP BY clause. ​ Omnidex first considers the type of aggregation. ​ Queries to request a COUNT(*) without a GROUP BY clause are the simplest queries. ​ These are optimized differently than queries that use a GROUP BY clause or use other aggregation functions. 
- 
-=== Simple COUNT(*) Queries === 
- 
-Omnidex always maintains a count when accessing its indexes. ​ Whether processing criteria or performing a table join, a count is always available. ​ Counts that simply request COUNT(*) are subsequently the simplest to optimize. 
- 
-The following query requests a simple count from a table.  ​ 
- 
-<code sql> 
- 
-  select ​       count(*) 
-    from        HOUSEHOLDS 
-    where       STATE = '​CO'​ and 
-                CITY = '​Boulder'​ and 
-                ZIP = '​80301';​ 
-</​code>​ 
- 
- 
-Omnidex will search its indexes to process the criteria, and it will have the count ready when it is complete. ​ There is no additional work needed to return the count, and no access to the database is required. ​ Since these types of queries are very common, this makes Omnidex especially fast as a count engine. 
- 
-=== General Aggregations === 
- 
-Queries that aggregate data often use GROUP BY clauses or other aggregation functions. ​ These queries can be optimized with Omnidex indexing as well.  The following query shows counts from the HOUSEHOLDS table aggregated by STATE, CITY and ZIP. 
- 
-<code sql> 
- 
-  select ​       STATE, CITY, ZIP, count(*) 
-    from        HOUSEHOLDS 
-    group by    STATE, CITY, ZIP 
-</​code>​ 
- 
-An index has the advantage of already being in sorted order. ​ If the STATE, CITY and ZIP columns are all individually indexed with Omnidex, then these three indexes will be used in combination to perform the aggregation. ​ The STATE index will be scanned, followed by the CITY index, followed by the ZIP index. ​ The aggregations can be returned without accessing the database.  ​ 
- 
-Some aggregations benefit from multi-column indexes, such as a single index with the STATE, CITY and ZIP.  This is more efficient than using three separate indexes, and is especially important if any of the columns contain fairly unique values. ​ It is also required for aggregations that use the SUM, AVERAGE, MIN and MAX functions.  ​ 
- 
-These techniques can be used even after other indexes have been used to process criteria. ​ After the criteria has been processed, the isolated record pointers can be used as a filter when the aggregation index is scanned. ​ This allows for great flexibility,​ since the columns referenced in criteria do not need to be included in the aggregation index. ​ In the following query, the same aggregations are restricted to people in the United States.  ​ 
- 
-<code sql> 
- 
-  select ​       STATE, CITY, ZIP, count(*) 
-    from        HOUSEHOLDS 
-    where       ​COUNTRY = '​US'​ 
-    group by    STATE, CITY, ZIP 
-</​code>​ 
- 
- 
-Omnidex will first qualify the rows in the United States, and will then allow those record pointers to be used as a filter against the aggregation index.  ​ 
- 
-=== When to Avoid Indexes === 
- 
-As was seen with criteria and table joins, sometimes an index is not the fastest approach. ​ Using an index for aggregations usually requires scanning the index from the beginning to the end, even when there is criteria. ​ If the criteria isolates a very small number of rows, it would be faster to simply retrieve the rows and aggregate them directly, than to scan millions of entries in an index. 
- 
-As a general rule of thumb, it is valuable to use indexing for processing criteria when aggregating greater than 0.01% of the rows in a table. Omnidex automatically follows this rule of thumb, and you may find that some queries don't use indexes for this very reason. ​ Omnidex also allows this threshold to be configured. ​ A setting called the AGGREGATION_THRESHOLD is set to 0.01% by default, but can be [[admin:​optimization:​config:​thresholds|changed]] if needed. 
- 
-====  ==== 
-The [[admin:​indexing:​strategies:​aggregations|following article]] explains basic strategies for optimizing aggregations. 
- 
-=====  ===== 
- 
-**[[admin:​indexing:​concepts:​joins|Prev]]** 
-**[[admin:​indexing:​concepts:​ordering|Next]]** 
- 
-====== Additional Resources ====== 
- 
-See also:  
- 
-{{page>:​admin:​indexing:​see_also&​nofooter&​noeditbtn}} 
- 
-{{page>:​bottom_add&​nofooter&​noeditbtn}} 
 
Back to top
admin/indexing/concepts/aggregations.txt ยท Last modified: 2016/06/28 22:38 (external edit)