Administration: Omnidex Indexing

Indexing Types and Options

Index Options

Omnidex supports a variety of options that can be applied to indexes. Not every option can be used with every index type, so the tables below show the options and when they can be used.

Index Options For Each Index Type

This table shows the options that are available for each index type. Note that Custom Indexes can use all options, and so these indexes should be used to achieve specific combinations that are not allowed in the other types. In the table below, combinations that are shaded and contain a dash are disallowed.

Omnidex QuickText FullText Custom
KEYWORDING - Implied Implied
CASE_SENSITIVE Default
CASE_INSENSITIVE Default Default
PROXIMITY - - Implied
PHONETIC - - -
EXCLUDED_WORDS - Implied Implied
BITMAP - -
STANDALONE - - -
RECORD_SPECIFIC - - -
RECORD_COMPLEX - - -
PREJOIN table - - -

Index Option Compatibility

This table shows the compatibility of index options with each other. Some index options are mutually exclusive, and some options imply other options. In the table below, combinations that are marked with “Implied” mean that the option to the left implies the option on the top. Combinations that are shaded and contain a dash are disallowed.

KW CS CI PX PH EW BM SA RS RC PJ
KEYWORDING KW - Implied Implied - -
CASE_SENSITIVE CS - -
CASE_INSENSITIVE CI - -
PROXIMITY PX Implied Implied - Implied - - - - -
PHONETIC PH - -
EXCLUDED_WORDS EW -
BITMAP BM - - - - Implied - - -
STANDALONE SA - - - - - - -
RECORD_SPECIFIC RS - - - - -
RECORD_COMPLEX RC - - - - -
PREJOIN table PJ - - - -

Option Descriptions

KEYWORDING

The KEYWORDING option causes the contents of the index to be parsed into keywords based on spaces, punctuation marks and special characters. The individual keywords are indexed and can be qualified separately in queries. For example, if a column containing “John Q. Doe” is indexed with KEYWORDING, it can be qualified using any of the following criteria:

  • “John”
  • “Doe”
  • “John Doe”
  • “Doe, John

Unless overridden, the KEYWORDING option implies the CASE_INSENSITIVE and EXCLUDED_WORDS options

Special Characters During Indexing

The following characters are stripped if they are leading characters on parsed keywords:

<space> ! ” # $ % & ' ( ) * + , - / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

Also . (period) if not followed by digits or in the middle of text.

4.5 is indexed.
a.b is indexed.

The following characters are stripped if they are trailing characters on parsed keywords

<space> ! “ ( ) * + , - . / : ; < = > ? @ [ \ ] ^ ` { | } ~

The following characters are considered keyword delimiters on parsed indexes

<space> ! ” ( ) * + , : ; < = > ? @ [ \ ] ^ ` { | } ~

The following characters carry special parsing rules.

- / .

Hewlett-Packard is indexed as:

Hewlett
Packard
hewlett-packard
Special Characters on SELECT retrievals

The following characters are reserved low-level operator characters for Omnidex searches. These operators are not supported in the SQL SELECT interface, however, they are usable in the QUALIFY statement.

<space> ( ) ; * + , - . : < = > @ and leading $ or &

Due to the way the underlying Omnidex indexing routines treat special characters Omnidex's SELECT statement must take steps if it determines that these special characters in SELECT… WHERE clauses should be treated as literals, or should be removed from the criteria.

In the QUALIFY statement, these special characters cause special processing such as MultiFind from a file:

<space> ( ) ; * + , - . : < = > @ and leading $ or &

CASE_SENSITIVE

The CASE_SENSITIVE option causes the contents of the index to be case-sensitive. This will require that criteria use the same combination of upper-case and lower-case letters as found in the data.

The CASE_SENSITIVE and CASE_INSENSITIVE options are mutually exclusive.

CASE_INSENSITIVE

The CASE_INSENSITIVE option causes the contents of the index to be case-insensitive. This allows the criteria to match the data regardless of the combinations of upper-case and lower-case letters.

The CASE_SENSITIVE and CASE_INSENSITIVE options are mutually exclusive.

PROXIMITY

The PROXIMITY option is similar to the KEYWORDING option; however, it also records the relative location of the keywords within the index. This allows the use of extended operators that compare the location of one keyword against another, such as the NEAR, BEFORE, and AFTER operators. The PROXIMITY option also enables the use of the $CONTEXT and $SCORE functions in the SQL Engine. Consult the article on Omnidex Text for more information on these operators and functions.

The PROXIMITY option implies the KEYWORDING option, and unless overridden, implies the CASE_INSENSITIVE and EXCLUDED_WORDS options.

PHONETIC

The PHONETIC option causes the contents of the index to include the phonetic equivalent of the data, as well as the original data itself. The PHONETIC option is required to use the PHONETIC option in a $CONTAINS function. Consult the article on Omnidex Text for more information on these operators and functions.

EXCLUDED_WORDS

The EXCLUDED_WORDS option causes columns containing only spaces, or in the case of binary data, containing only zeros, to be omitted from the indexes.

The EXCLUDED_WORDS option is implied by the KEYWORDING and PROXIMITY options.

BITMAP

The BITMAP option causes the index to use a bitmap storage format. This storage format improves qualification speed and saves disk space for low-cardinality data. This option is only appropriate for columns with less than 32 distinct values.

The BITMAP option implies the STANDALONE option.

STANDALONE

The STANDALONE option causes the index to be stored in a separate file. Indexes with the STANDALONE option can be built individually using the UPDATE INDEXES statement. Non-textual indexes that are greater than 32 bytes will automatically use the STANDALONE option. Non-textual indexes include Omnidex Indexes, or any Custom Index that does not invoke the KEYWORDING or PROXIMITY option.

RECORD_SPECIFIC

The RECORD_SPECIFIC option is used in conjunction with the PREJOIN option to create a pre-joined index. The RECORD_SPECIFIC option indicates that the index in the pre-joined child table should record information about both the row and the parent. This provides a powerful ability to join between multiple children in a parent. At the same time, these indexes are more expensive to build and are slower in base qualifications. Consult the article on Prejoined Indexes for more information about this option.

The RECORD_SPECIFIC and RECORD_COMPLEX options are mutually exclusive.

RECORD_COMPLEX

The RECORD_COMPLEX option is used in conjunction with the PREJOIN option to create a pre-joined index. The RECORD_COMPLEX option indicates that the index in the pre-joined child table should record information about only the parent. This is valuable when a single column in a pre-joined child is used as criteria. This provides a powerful ability to join between a child and its parent. Consult the article on Prejoined Indexes for more information about this option.

The RECORD_SPECIFIC and RECORD_COMPLEX options are mutually exclusive.

PREJOIN table

The PREJOIN option is used to pre-join a RECORD_SPECIFIC or RECORD_COMPLEX index with a parent table. This is required for Prejoined Indexes. Consult the article on Prejoined Indexes for more information about this option.

Additional Resources

See also:

 
Back to top
admin/indexing/indexes/options.txt · Last modified: 2012/10/26 14:57 (external edit)