This is an old revision of the document!


Administration: Indexing Strategies

Indexing Concepts

Table Joins

Indexes are also valuable for processing table joins. The best indexing approach varies depending on whether the join is required for retrieving rows, or simply to qualify rows based on criteria. If a query joins to multiple tables so that their values can be used in criteria; however, the data in that table is not returned to the user, then this is called a Qualification Join.If a query tries to retrieve data from multiple tables and return it to the user, then this is called a Retrieval Join.

Qualification Join

It is common to join to a table so that criteria can be issued against it. The following example shows a join that returns columns from the INDIVIDUALS table to the user, but also joins to the HOUSEHOLDS and STATES tables to process criteria. These joins are Qualification Joins.

select        I.NAME, I.PHONE
  from        INDIVIDUALS I
  join        HOUSEHOLDS H on I.HOUSEHOLD = H.HOUSEHOLD
  join        STATES S on H.STATE = S.STATE
  where       S.TAX_RATE > 5.0;

Omnidex will always attempt to process Qualification Joins in the indexes without ever going to the disk drive, as this dramatically improves performance. If the join columns are indexed with Omnidex, then Omnidex will process the criteria in the STATES table, use indexes to join to the HOUSEHOLDS table, and then use indexes to join to the INDIVIDUALS table.

Generally speaking, Omnidex can do this type of optimization when joining FROM a parent table INTO a child table. In this example, the STATES table is a parent of HOUSEHOLDS, and HOUSEHOLDS is a parent of INDIVIDUALS. It is easy for Omnidex to perform this kind of join since the primary keys are always available in the indexes. It is more difficult for Omnidex to do this type of optimization when joining INTO a parent table FROM a child table. Omnidex will still perform the optimization if it can, but it will not always do so.

Omnidex also provides an advanced featured called pre-joined indexes. Information about the table join is stored in a pre-joined index, making certain types of Qualification Joins even faster. Pre-joined indexes are more expensive to build, but can be faster to search, depending on the nature of the query. The following article explains pre-joined indexes.

Retrieval Join

Some tables are joined so that data can be returned to the user. The following example shows a join that returns columns from the INDIVIDUALS, HOUSEHOLDS and STATES tables. These joins are Retrieval Joins.

select        I.NAME, H.ADDRESS, H.CITY, H.STATE, I.PHONE, S.TAX_RATE
  from        INDIVIDUALS I
  join        HOUSEHOLDS H on I.HOUSEHOLD = H.HOUSEHOLD
  join        STATES S on H.STATE = S.STATE
  where       I.NAME = 'John Smith'

When tables share a fairly specific key, such as a HOUSEHOLD ID, an index makes it easy to retrieve common rows between the two tables. As was the case with criteria, this is only efficient when retrieving a small portion of each table. If two tables are to be joined in their entirety, indexing will actually slow things down. In the example above, the criteria against the NAME column probably isolates a small number of rows. Using indexes for this join makes sense.

If the query retrieved a large portion of each table, indexing would be less helpful. This query retrieves all rows in the United States, which probably comprises a large percentage of this database.

select        I.NAME, H.ADDRESS, H.CITY, H.STATE, I.PHONE, S.TAX_RATE
  from        INDIVIDUALS I
  join        HOUSEHOLDS H on I.HOUSEHOLD = H.HOUSEHOLD
  join        STATES S on H.STATE = S.STATE
  where       H.COUNTRY = 'US';

In this case, it may be faster to use a technique called a Sort-Merge Join. This technique first sorts the two tables and then scans both sorted tables, matching entries as it goes. This can provide a significant improvement in performance.

A third approach can provide even better performance improvements, but requires that the needed portion of one of the tables be fairly small. If a portion of a table can be read into memory with a hashed indexed, then joins to that table will be very fast. The disk drives are hit once to load it in memory, but thereafter, the disk drives are not impacted. This optimization is commonly used when joining a table to dimension tables or code lookup tables. In the example above, the STATES would be small enough to use this technique.

Omnidex automatically chooses between these approaches to table joins, and no intervention is required by the user.

The following article explains basic indexing strategies for optimizing table joins.

Additional Resources

See also:

 
Back to top
admin/indexing/concepts/joins.1295298286.txt.gz ยท Last modified: 2016/06/28 22:38 (external edit)