Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
admin:indexing:concepts:criteria [2011/01/17 22:20]
els
admin:indexing:concepts:criteria [2012/01/20 18:07]
doc
Line 1: Line 1:
 +~~NOTOC~~
 +
 {{page>:​top_add&​nofooter&​noeditbtn}} {{page>:​top_add&​nofooter&​noeditbtn}}
  
-====== Administration:​ Indexing ​Strategies ​======+====== Administration: ​Omnidex ​Indexing ======
  
 ===== Indexing Concepts ===== ===== Indexing Concepts =====
Line 16: Line 18:
 ==== Criteria ==== ==== Criteria ====
  
-One of the most common ​purposes of indexing ​is to help process criteria.  ​Consider the index in the back of book; it allows ​the user to look up topic and quickly find the pages explaining that topic.  ​When looking ​for a specific pagean index is certainly faster than reading each individual row.  ​In databasesthe same rule applies.  ​An index is especially helpful ​to locate an individual row, especially when the index is on Customer ID or E-mail Address column.+=== Using a Single Index === 
 + 
 +The most common ​reason to use an index is to help identify rows based on a unique value, such as a Customer ID or an Email address.  ​All relational databases have this capability, and Omnidex does as well.  Omnidex also allows ​you to put these kinds of indexes on raw data files or delimited files - something they don't inherently have. 
 + 
 +=== Using Multiple Indexes === 
 + 
 +Relational databases also allow you to put an index on non-unique value, such as a State or Zip code.  Omnidex does as well, but Omnidex goes an important step further. ​ Omnidex allows multiple indexes to be used in the same query.  ​If your query searches ​for all people in particular State or Zip Codewith a particular Name and several other criteria, Omnidex will intersect the results of multiple ​index searches to identify the appropriate rows.  ​Many relational ​databases ​pick an index and then begin to retrieve rows from the database.  ​It is much more efficient ​to intersect ​the results of multiple ​index searches, and this is one of the keys of Omnidex'​s speed. 
 + 
 +Omnidex can also intersect the results of multiple indexes across multiple tables. ​ This is a very powerful feature of Omnidex because table joins can be one of the most time-consuming operations in databases ​Omnidex can qualify in one table, join within the indexes to the next table and then continue qualifying from that point on. 
 + 
 +Omnidex differs from relational databases in that you tend to index most or all of the columns in your tables. ​ They are highly compressed and can be built extremely quickly so that you can afford to have so many indexes. ​ The queries truly benefit from this approach.  ​
  
-Sometimes an index is less helpful.  ​When an index entry points ​to too many pages in the book, people get frustrated looking through all the pages. ​ First, they have to look at the index and then they have to find each page number. ​ That means a fair amount of looking around in the book, but if it only happens a few times, it is still faster. ​ But if an index were to point to 25% of the pages in book, it would get a little ridiculous. ​ The same principle applies to databases.  ​Indexes ​are valuable for retrieving a small percentage of rows in the databases, but they lose their advantage when retrieving a larger percentage of the rows.  ​+=== When to Avoid Indexes ​===
  
-Using multiple ​indexes ​at the same time makes this much more tolerable.  ​If Zip Code, Last Name and First name all retrieve a larger percentage individuallythey may only retrieve a small percentage collectively. ​ There may be a lot of people ​in the Zip Code 80301; there may be lot of people with last name of Smith; ​and there may be lot of people with first name of John.  But there are relatively few people named John Smith that live in 80301.  ​+One might think that indexes ​are always ​the best way to process all criteria.  ​Depending on the type of querythis may or may not be true.  When you qualify ​large number ​of rows in a table using an index, you may slow down the query. ​ Using an index to find single row in database takes a few disk reads to search the index and then disk read to find the row in the table. ​ This small handful ​of disk reads is lot better than reading the whole table without an index.  But when you have to read a large percentage of the rows in a table, the overhead of using the index isn't justified.  ​
  
-As general rule of thumb, it is valuable to use indexing for processing criteria when isolating less than 1% of the rows in a table.  ​If isolating more than 1% of the rows, the overhead of indexes exceeds its gains, and it is better to simply scan the data.  ​This is similar to reading a book.  If you want to understand the book as a whole, you read it from start to finish, or at least skim through the important sections. ​ If you want to look up something specific, you use the index to find it.+A sequential table scan can read multiple rows with each disk read.  The exact number varies, but if table scan reads 100 rows with each disk readthen it could read 1,​000,​000 ​rows in only 10,000 disk reads.  ​That's fairly efficient. ​ Accessing all of those rows with an index would take 1,000,000 disk reads, plus the overhead of searching ​the index.  ​That's fairly inefficient
  
-Omnidex automatically ​considers ​this issue when optimizing SQL queries. ​ Omnidex ​maintains a configuration ​setting called the QUALIFICATION_THRESHOLD ​which controls whether to use an indexes to resolve criteria, or to use a table scan.  The QUALIFICATION_THRESHOLD defaults ​to 1%, meaning that indexes will be used to isolate less than 1% of the data, and table scans will be use to isolate more than 1% of the data.  This [[admin:​optimization:​config:​qualification|setting]] can be overridden as needed.+The general rule of thumb is to use an index when you are retrieving less than 1% of the database.  ​Omnidex automatically ​follows ​this rule of thumb, and you may find that some queries ​don't use indexes for this very reason.  Omnidex ​also allows this threshold to be configured. ​ A setting called the QUALIFICATION_THRESHOLD ​is set to 1% by defaultbut can be [[admin:​optimization:​config:​thresholds|changed]] if needed.
  
-The [[admin:​indexing:​basic:​criteria|following article]] explains basic strategies for indexing criteria.+====   ​==== 
 +The [[admin:​indexing:​strategies:​criteria|following article]] explains basic strategies for indexing criteria.
  
 =====  ===== =====  =====
  
-**[[admin:​indexing:​concepts:​overview|Prev]]** |+**[[admin:​indexing:​concepts:​home|Prev]]** |
 **[[admin:​indexing:​concepts:​joins|Next]]** **[[admin:​indexing:​concepts:​joins|Next]]**
  
 
Back to top
admin/indexing/concepts/criteria.txt · Last modified: 2016/06/28 22:38 (external edit)