Administration: Omnidex Indexing

Indexing Concepts


Using a Single Index

The most common reason to use an index is to help identify rows based on a unique value, such as a Customer ID or an Email address. All relational databases have this capability, and Omnidex does as well. Omnidex also allows you to put these kinds of indexes on raw data files or delimited files - something they don't inherently have.

Using Multiple Indexes

Relational databases also allow you to put an index on a non-unique value, such as a State or Zip code. Omnidex does as well, but Omnidex goes an important step further. Omnidex allows multiple indexes to be used in the same query. If your query searches for all people in a particular State or Zip Code, with a particular Name and several other criteria, Omnidex will intersect the results of multiple index searches to identify the appropriate rows. Many relational databases pick an index and then begin to retrieve rows from the database. It is much more efficient to intersect the results of multiple index searches, and this is one of the keys of Omnidex's speed.

Omnidex can also intersect the results of multiple indexes across multiple tables. This is a very powerful feature of Omnidex because table joins can be one of the most time-consuming operations in databases. Omnidex can qualify in one table, join within the indexes to the next table and then continue qualifying from that point on.

Omnidex differs from relational databases in that you tend to index most or all of the columns in your tables. They are highly compressed and can be built extremely quickly so that you can afford to have so many indexes. The queries truly benefit from this approach.

When to Avoid Indexes

One might think that indexes are always the best way to process all criteria. Depending on the type of query, this may or may not be true. When you qualify a large number of rows in a table using an index, you may slow down the query. Using an index to find a single row in a database takes a few disk reads to search the index and then a disk read to find the row in the table. This small handful of disk reads is a lot better than reading the whole table without an index. But when you have to read a large percentage of the rows in a table, the overhead of using the index isn't justified.

A sequential table scan can read multiple rows with each disk read. The exact number varies, but if a table scan reads 100 rows with each disk read, then it could read 1,000,000 rows in only 10,000 disk reads. That's fairly efficient. Accessing all of those rows with an index would take 1,000,000 disk reads, plus the overhead of searching the index. That's fairly inefficient.

The general rule of thumb is to use an index when you are retrieving less than 1% of the database. Omnidex automatically follows this rule of thumb, and you may find that some queries don't use indexes for this very reason. Omnidex also allows this threshold to be configured. A setting called the QUALIFICATION_THRESHOLD is set to 1% by default, but can be changed if needed.

The following article explains basic strategies for indexing criteria.

Additional Resources

See also:

Back to top
admin/indexing/concepts/criteria.txt ยท Last modified: 2016/06/28 22:38 (external edit)