Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
admin:indexing:concepts:criteria [2011/01/17 22:20]
els
admin:indexing:concepts:criteria [2016/06/28 22:38] (current)
Line 1: Line 1:
 +~~NOTOC~~
 +
 {{page>:​top_add&​nofooter&​noeditbtn}} {{page>:​top_add&​nofooter&​noeditbtn}}
  
-====== Administration:​ Indexing ​Strategies ​======+====== Administration: ​Omnidex ​Indexing ======
  
 ===== Indexing Concepts ===== ===== Indexing Concepts =====
Line 16: Line 18:
 ==== Criteria ==== ==== Criteria ====
  
-One of the most common ​purposes of indexing ​is to help process criteria.  ​Consider the index in the back of book; it allows ​the user to look up topic and quickly find the pages explaining that topic.  ​When looking ​for a specific pagean index is certainly faster than reading each individual row.  ​In databasesthe same rule applies.  ​An index is especially helpful ​to locate an individual row, especially when the index is on Customer ID or E-mail Address column.+=== Using a Single Index === 
 + 
 +The most common ​reason to use an index is to help identify rows based on a unique value, such as a Customer ID or an Email address.  ​All relational databases have this capability, and Omnidex does as well.  Omnidex also allows ​you to put these kinds of indexes on raw data files or delimited files - something they don't inherently have. 
 + 
 +=== Using Multiple Indexes === 
 + 
 +Relational databases also allow you to put an index on non-unique value, such as a State or Zip code.  Omnidex does as well, but Omnidex goes an important step further. ​ Omnidex allows multiple indexes to be used in the same query.  ​If your query searches ​for all people in particular State or Zip Codewith a particular Name and several other criteria, Omnidex will intersect the results of multiple ​index searches to identify the appropriate rows.  ​Many relational ​databases ​pick an index and then begin to retrieve rows from the database.  ​It is much more efficient ​to intersect ​the results of multiple ​index searches, and this is one of the keys of Omnidex'​s speed. 
 + 
 +Omnidex can also intersect the results of multiple indexes across multiple tables. ​ This is a very powerful feature of Omnidex because table joins can be one of the most time-consuming operations in databases ​Omnidex can qualify in one table, join within the indexes to the next table and then continue qualifying from that point on. 
 + 
 +Omnidex differs from relational databases in that you tend to index most or all of the columns in your tables. ​ They are highly compressed and can be built extremely quickly so that you can afford to have so many indexes. ​ The queries truly benefit from this approach.  ​
  
-Sometimes an index is less helpful.  ​When an index entry points ​to too many pages in the book, people get frustrated looking through all the pages. ​ First, they have to look at the index and then they have to find each page number. ​ That means a fair amount of looking around in the book, but if it only happens a few times, it is still faster. ​ But if an index were to point to 25% of the pages in book, it would get a little ridiculous. ​ The same principle applies to databases.  ​Indexes ​are valuable for retrieving a small percentage of rows in the databases, but they lose their advantage when retrieving a larger percentage of the rows.  ​+=== When to Avoid Indexes ​===
  
-Using multiple ​indexes ​at the same time makes this much more tolerable.  ​If Zip Code, Last Name and First name all retrieve a larger percentage individuallythey may only retrieve a small percentage collectively. ​ There may be a lot of people ​in the Zip Code 80301; there may be lot of people with last name of Smith; ​and there may be lot of people with first name of John.  But there are relatively few people named John Smith that live in 80301.  ​+One might think that indexes ​are always ​the best way to process all criteria.  ​Depending on the type of querythis may or may not be true.  When you qualify ​large number ​of rows in a table using an index, you may slow down the query. ​ Using an index to find single row in database takes a few disk reads to search the index and then disk read to find the row in the table. ​ This small handful ​of disk reads is lot better than reading the whole table without an index.  But when you have to read a large percentage of the rows in a table, the overhead of using the index isn't justified.  ​
  
-As general rule of thumb, it is valuable to use indexing for processing criteria when isolating less than 1% of the rows in a table.  ​If isolating more than 1% of the rows, the overhead of indexes exceeds its gains, and it is better to simply scan the data.  ​This is similar to reading a book.  If you want to understand the book as a whole, you read it from start to finish, or at least skim through the important sections. ​ If you want to look up something specific, you use the index to find it.+A sequential table scan can read multiple rows with each disk read.  The exact number varies, but if table scan reads 100 rows with each disk readthen it could read 1,​000,​000 ​rows in only 10,000 disk reads.  ​That's fairly efficient. ​ Accessing all of those rows with an index would take 1,000,000 disk reads, plus the overhead of searching ​the index.  ​That's fairly inefficient
  
-Omnidex automatically ​considers ​this issue when optimizing SQL queries. ​ Omnidex ​maintains a configuration ​setting called the QUALIFICATION_THRESHOLD ​which controls whether to use an indexes to resolve criteria, or to use a table scan.  The QUALIFICATION_THRESHOLD defaults ​to 1%, meaning that indexes will be used to isolate less than 1% of the data, and table scans will be use to isolate more than 1% of the data.  This [[admin:​optimization:​config:​qualification|setting]] can be overridden as needed.+The general rule of thumb is to use an index when you are retrieving less than 1% of the database.  ​Omnidex automatically ​follows ​this rule of thumb, and you may find that some queries ​don't use indexes for this very reason.  Omnidex ​also allows this threshold to be configured. ​ A setting called the QUALIFICATION_THRESHOLD ​is set to 1% by defaultbut can be [[admin:​optimization:​config:​thresholds|changed]] if needed.
  
-The [[admin:​indexing:​basic:​criteria|following article]] explains basic strategies for indexing criteria.+====   ​==== 
 +The [[admin:​indexing:​strategies:​criteria|following article]] explains basic strategies for indexing criteria.
  
 =====  ===== =====  =====
  
-**[[admin:​indexing:​concepts:​overview|Prev]]** |+**[[admin:​indexing:​concepts:​home|Prev]]** |
 **[[admin:​indexing:​concepts:​joins|Next]]** **[[admin:​indexing:​concepts:​joins|Next]]**
  
 
Back to top
admin/indexing/concepts/criteria.1295302805.txt.gz · Last modified: 2016/06/28 22:38 (external edit)