Differences

This shows you the differences between two versions of the page.

--- admin:optimization:caches:hdc [2012/02/06 22:03]
doc created
+++ admin:optimization:caches:hdc [2016/06/28 22:38] (current)
@@ Line 8: / Line 8: @@
-**[[admin:optimization:caches:home|Overview]]** |
+[[admin:optimization:caches:home|Overview]] |
-[[admin:optimization:caches:hdc|Data Caches]] |
+**[[admin:optimization:caches:hdc|Hashed Data Caching]]** |
-[[admin:optimization:caches:qualifications|Qualification Caches]] |
+[[admin:optimization:caches:cachequal|Qualification Caching]] |
-[[admin:optimization:caches:files|File Caches]]
+[[admin:optimization:caches:filesystem|Filesystem Caching]]
 ----
-==== Data Caches ====
+==== Hashed Data Caching ====
-Omnidex has several approaches to choose from when joining tables.  Standard Omnidex indexes can be used to join tables, and pre-joined indexes can be used to accelerate joins between parents tabled and child tables.  Omnidex can use the underlying database's indexes to perform table joins.  Omnidex can use a sort/merge technique to join especially tables without indexes.  Omnidex can also load smaller tables into a temporary, memory-resident, hashed data cache for extremely fast table joins.
+Omnidex has several approaches to choose from when joining tables.  Standard Omnidex indexes can be used to join tables, and pre-joined indexes can be used to accelerate joins between parents tabled and child tables.  Omnidex can use the underlying database's indexes to perform table joins.  Omnidex can use a sort/merge technique to join large tables that have no indexes.  Omnidex can also load smaller tables into a temporary, memory-resident, hashed data cache for extremely fast table joins.
-The Hashed Data Cache is used when a smaller table will be repeatedly accessed many times.  The classic example of this joining the fact table of a star schema to dimension or snowflake tables.  Other examples include any time a table is being joined to small reference tables.  In these situations, Omnidex indexes will be favored for resolving criteria; however, if the smaller table must be accessed to return data in the result set, then the Hashed Data Cache may be used.
+The Hashed Data Cache is used when a smaller table will be repeatedly accessed in a table join.  A classic example is joining the fact table of a star schema to dimension or snowflake tables.  Other examples include any time a table is being joined to small reference or code-lookup tables.  In these situations, Omnidex indexes will be favored for resolving criteria; however, the smaller table will frequently be loaded into a Hashed Data Cache.
-A classic example occurs when joining an INDIVIDUALS table to several reference tables, one for GENDERS, one for MARITAL_STATUSES, and one for EDUCATION.  Each of these reference tables has less then 20 rows, meaning that the same rows will be repeated accessed throughout the scan of the INDIVIDUALS table.  Assuming two GENDERS --- Male and Female --- a scan of one million INDIVIDUALS would result in each GENDERS row being accessed approximately a half-million times.
+The Hashed Data Cache temporarily caches these smaller tables into memory and create hash indexes for the join column.  Once the cache is loaded, all subsequent access is memory-resident and does not require access to the filesystem.  This yields a tremendous gain in performance.
-The Hashed Data Cache will temporarily cache these smaller tables into memory and create hash indexes for the join column.  This means that the underlying data will only be accessed once, and all other access will be memory-resident.  This yields a tremendous gain in performance.
 This temporary cache is unique to a connection, and only lasts the duration of the query.  This means that the Hashed Data Cache will work fine in environments that use online updates.
-The Hashed Data Cache is limited to 32MB of memory per table by default.  This can be adjusted using the
+=== Hashed Data Caches in Query Plans ===
+If a Hashed Data Cache is used in a query, it can be seen using [[admin:optimization:plans:home|Omnidex query plans]].  The following query plan shows a "Build cache" step that builds the Hashed Data Cache, and then a subsequent "Retrieve" step that retrieves from the Hashed Data Cache.
+<code>
+----------------------------------- SUMMARY -----------------------------------
+Select        INDIVIDUALS.NAME,
+              GENDERS.DESCRIPTION
+  from        INDIVIDUALS
+  join        GENDERS on INDIVIDUALS.GENDER = GENDERS.GENDER;
-Caching is a common method of optimizing queries in Omnidex, as with other relational databases.  Omnidex works with caching in three areas:
+Version:      5.2.01  (Compiled Feb  2 2012 21:29:57)
+Warnings:     SEQUENTIAL_SCAN
+----------------------------------- DETAILS -----------------------------------
+Build cache {1} as (SELECT DESCRIPTION, GENDER FROM GENDERS) on 2;
+Retrieve INDIVIDUALS sequentially;
+ Retrieve {1} using PK_GENDER = INDIVIDUALS.GENDER;
+ Return INDIVIDUALS.NAME, GENDERS.DESCRIPTION;
+-------------------------------------------------------------------------------
+</code>
-  * **Data Caches** - Omnidex creates temporary caches of data during some query processing in order to minimize access to the underlying database.  Typically, this occurs when smaller tables are joined to from a much larger table, such as in a star schema data warehouse where a fact table joins to many dimension and snowflake tables.  In these cases, Omnidex will load the smaller tables into a hashed data cache, allowing the join to occur in memory rather than requiring repeated access to the disk.
+=== Adjusting the Cache Size ===
-  * **Qualification Caches** - Omnidex is able to recognize patterns in queries, and will try to reuse previous qualifications when appropriate.  A very common scenario involves a succession of SQL statements, each of which adds another piece of criteria.  Another common scenario involves queries that begin with requesting a count of rows, and then either asks for aggregations or detailed rows using the same criteria.  In both of these cases, Omnidex is able to reuse earlier qualifications, greatly improving performance.
+The Hashed Data Cache size defaults to 32MB of memory per query.  This can be adjusted using the HDC_THRESHOLD option of the SELECT statement, which sets the maximum cache size, measured in MB.
-  * **File Caches** - Omnidex relies on file caching from the operating system in order to achieve high performance.  This file caching is performed automatically by the operating system and benefits Omnidex, but only if certain memory requirements are met.
-All of these caches are designed to support standard Omnidex applications, and they are activated automatically.  These caches also reflect that Omnidex is most frequently used in read-only environments.  These caches can be deactivated in read-write environments as needed.
 =====  =====
 ----
 \\
-**[[admin:optimization:caches:hdc|Next]]**
+**[[admin:optimization:caches:home|Prev]]** |
+**[[admin:optimization:caches:cachequal|Next]]**
 ====== Additional Resources ======
 See also:
-{{page>:admin:indexing:see_also&nofooter&noeditbtn}}
+{{page>:admin:optimization:see_also&nofooter&noeditbtn}}
 {{page>:bottom_add&nofooter&noeditbtn}}

Back to top

admin/optimization/caches/hdc.1328565832.txt.gz · Last modified: 2016/06/28 22:38 (external edit)

Sitemap Recent Changes