Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
admin:indexing:concepts:aggregations [2011/01/17 21:32]
els
admin:indexing:concepts:aggregations [2016/06/28 22:38] (current)
Line 1: Line 1:
 +~~NOTOC~~
 +
 {{page>:​top_add&​nofooter&​noeditbtn}} {{page>:​top_add&​nofooter&​noeditbtn}}
  
-====== Administration:​ Indexing ​Strategies ​======+====== Administration: ​Omnidex ​Indexing ======
  
 ===== Indexing Concepts ===== ===== Indexing Concepts =====
  
-**[[admin:​indexing:​concepts:​home|Overview]]** |  +[[admin:​indexing:​concepts:​home|Overview]] | 
-[[admin:​indexing:​concepts:​basics|Basics]] | +
 [[admin:​indexing:​concepts:​criteria|Criteria]] |  [[admin:​indexing:​concepts:​criteria|Criteria]] | 
 [[admin:​indexing:​concepts:​joins|Joins]] |  [[admin:​indexing:​concepts:​joins|Joins]] | 
-[[admin:​indexing:​concepts:​aggregations|Aggregations]] | +**[[admin:​indexing:​concepts:​aggregations|Aggregations]]** 
 [[admin:​indexing:​concepts:​ordering|Ordering]] |  [[admin:​indexing:​concepts:​ordering|Ordering]] | 
-[[admin:​indexing:​concepts:​retrieval|Retrievals]] |  +[[admin:​indexing:​concepts:​retrieval|Retrievals]]
-[[admin:​indexing:​concepts:​alternatives|Alternatives]]+
  
 ---- ----
Line 18: Line 18:
 ==== Aggregations ==== ==== Aggregations ====
  
-Indexes are most commonly used for processing criteria and joining tablesbut they can also be quite helpful for aggregations.  An index has the advantage of already being in sorted order. ​ If a multi-column index contains all of the columns in the GROUP BY clause, as well as all of the columns ​referenced in COUNT, ​SUM, AVERAGE, MIN and MAX functions, the the index can simply ​be scanned in sorted orderand the aggregations ​can be returned without accessing ​the disk drive.+Omnidex indexing is valuable ​for processing aggregations. ​ Aggregations involve the use of the COUNT, SUM, AVERAGE, MIN and MAX functions, and may or may not include a GROUP BY clause. ​ Omnidex first considers the type of aggregation. ​ Queries to request a COUNT(*) without a GROUP BY clause are the simplest queries. ​ These are optimized differently than queries that use a GROUP BY clause or use other aggregation functions. 
 + 
 +=== Simple COUNT(*) Queries === 
 + 
 +Omnidex always maintains a count when accessing its indexes. ​ Whether ​processing criteria ​or performing a table join, a count is always available. ​ Counts that simply request COUNT(*) are subsequently the simplest to optimize. 
 + 
 +The following query requests a simple count from a table. ​  
 + 
 +<​code>​ 
 + 
 +  select ​       count(*) 
 +    from        HOUSEHOLDS 
 +    where       STATE = '​CO' ​and 
 +                CITY = '​Boulder'​ and 
 +                ZIP = '​80301';​ 
 +</​code>​ 
 + 
 + 
 +Omnidex will search its indexes to process the criteriaand it will have the count ready when it is complete. ​ There is no additional work needed to return the count, and no access to the database is required. ​ Since these types of queries are very common, this makes Omnidex especially fast as a count engine. 
 + 
 +=== General Aggregations === 
 + 
 +Queries that aggregate data often use GROUP BY clauses or other aggregation functions. ​ These queries ​can be optimized with Omnidex indexing as well.  ​The following query shows counts from the HOUSEHOLDS table aggregated by STATE, CITY and ZIP. 
 + 
 +<​code>​ 
 + 
 +  select ​       STATE, CITY, ZIP, count(*) 
 +    from        HOUSEHOLDS 
 +    group by    STATE, CITY, ZIP 
 +</​code>​ 
 + 
 +An index has the advantage of already being in sorted order. ​ If the STATE, CITY and ZIP columns ​are all individually indexed with Omnidex, then these three indexes will be used in combination to perform ​the aggregation. ​ The STATE index will be scanned, followed by the CITY index, followed by the ZIP index. ​ The aggregations can be returned without accessing the database. ​  
 + 
 +Some aggregations benefit from multi-column indexessuch as a single index with the STATE, CITY and ZIP.  This is more efficient than using three separate indexes, and is especially important if any of the columns ​contain fairly unique values. ​ It is also required for aggregations that use the SUM, AVERAGE, MIN and MAX functions.   
 + 
 +These techniques can be used even after other indexes have been used to process criteria. ​ After the criteria has been processed, the isolated record pointers ​can be used as a filter when the aggregation index is scanned.  This allows for great flexibility,​ since the columns referenced ​in criteria do not need to be included in the aggregation index. ​ In the following query, the same aggregations ​are restricted to people in the United States  
 + 
 +<​code>​ 
 + 
 +  select ​       STATE, CITY, ZIP, count(*) 
 +    from        HOUSEHOLDS 
 +    where       ​COUNTRY = '​US'​ 
 +    group by    STATE, CITY, ZIP 
 +</​code>​ 
  
-This technique can be used even after other indexes have been used to process criteria. ​ After the criteria has been processedthe isolated ​record pointers ​can be used as a filter ​when the aggregation index is scanned.  ​This allows for great flexibility,​ since the columns referenced in criteria do not need to be included in the aggregation index.+Omnidex will first qualify the rows in the United Statesand will then allow those record pointers ​to be used as a filter ​against ​the aggregation index.  ​
  
-As was seen with criteria and table joins, sometimes an index is not the fastest approach ​to aggregating data.  Using an index for aggregations requires scanning the index from the beginning to the end.  If the criteria isolates only a small number of rows, it would be faster to simply retrieve the rows and aggregate them directly, than to scan millions of entries in an index.+=== When to Avoid Indexes ===
  
-As a general rule of thumbit is valuable to use indexing ​for processing criteria ​when aggregating greater than 0.01% of the rows in table. If aggregating less than 0.01% of the rows, the overhead of indexes exceeds its gains, and it is better ​to simply ​scan the data+As was seen with criteria and table joinssometimes an index is not the fastest approach. ​ Using an index for aggregations usually requires scanning the index from the beginning to the end, even when there is criteria ​If ​the criteria isolates ​very small number ​of rows, it would be faster ​to simply ​retrieve ​the rows and aggregate them directly, than to scan millions of entries in an index.
  
-Omnidex automatically ​considers ​this issue when optimizing SQL queries. Omnidex ​maintains a configuration ​setting called the AGGREGATION_THRESHOLD ​which controls whether to use an indexes to resolve aggregations. The AGGREGATION_THRESHOLD defaults ​to 0.01%, ​meaning that indexes will be used to aggregate more than 0.01% of the data.  This [[admin:​optimization:​config:​aggregation|setting]] can be overridden as needed.+As a general rule of thumb, it is valuable to use indexing for processing criteria when aggregating greater than 0.01% of the rows in a table. ​Omnidex automatically ​follows ​this rule of thumb, and you may find that some queries ​don't use indexes for this very reason Omnidex ​also allows this threshold to be configured. ​ A setting called the AGGREGATION_THRESHOLD ​is set to 0.01% by defaultbut can be [[admin:​optimization:​config:​thresholds|changed]] if needed.
  
-The [[admin:​indexing:​basic:​aggregations|following article]] explains basic strategies for optimizing aggregations.+====  ==== 
 +The [[admin:​indexing:​strategies:​aggregations|following article]] explains basic strategies for optimizing aggregations.
  
 =====  ===== =====  =====
  
-**[[admin:​indexing:​concepts:​criteria|Next]]**+**[[admin:​indexing:​concepts:​joins|Prev]]** 
 +**[[admin:​indexing:​concepts:​ordering|Next]]**
  
 ====== Additional Resources ====== ====== Additional Resources ======
 
Back to top
admin/indexing/concepts/aggregations.1295299976.txt.gz · Last modified: 2016/06/28 22:38 (external edit)