This is an old revision of the document!


Administration: Indexing Strategies

Indexing Concepts

Criteria

One of the most common purposes of indexing is to help process criteria. Consider the index in the back of book; it allows the user to look up a topic and quickly find the pages explaining that topic. When looking for a specific page, an index is certainly faster than reading each individual row. In databases, the same rule applies. An index is especially helpful to locate an individual row, especially when the index is on a Customer ID or E-mail Address column.

Sometimes an index is less helpful. When an index entry points to too many pages in the book, people get frustrated looking through all the pages. First, they have to look at the index and then they have to find each page number. That means a fair amount of looking around in the book, but if it only happens a few times, it is still faster. But if an index were to point to 25% of the pages in book, it would get a little ridiculous. The same principle applies to databases. Indexes are valuable for retrieving a small percentage of rows in the databases, but they lose their advantage when retrieving a larger percentage of the rows.

Using multiple indexes at the same time makes this much more tolerable. If Zip Code, Last Name and First name all retrieve a larger percentage individually, they may only retrieve a small percentage collectively. There may be a lot of people in the Zip Code 80301; there may be a lot of people with a last name of Smith; and there may be a lot of people with a first name of John. But there are relatively few people named John Smith that live in 80301.

As a general rule of thumb, it is valuable to use indexing for processing criteria when isolating less than 1% of the rows in a table. If isolating more than 1% of the rows, the overhead of indexes exceeds its gains, and it is better to simply scan the data. This is similar to reading a book. If you want to understand the book as a whole, you read it from start to finish, or at least skim through the important sections. If you want to look up something specific, you use the index to find it.

Omnidex automatically considers this issue when optimizing SQL queries. Omnidex maintains a configuration setting called the QUALIFICATION_THRESHOLD which controls whether to use an indexes to resolve criteria, or to use a table scan. The QUALIFICATION_THRESHOLD defaults to 1%, meaning that indexes will be used to isolate less than 1% of the data, and table scans will be use to isolate more than 1% of the data. This setting can be overridden as needed.

The following article explains basic strategies for indexing criteria.

Additional Resources

See also:

 
Back to top
admin/indexing/concepts/criteria.1295295912.txt.gz ยท Last modified: 2016/06/28 22:38 (external edit)