Administration: Omnidex Indexing

Omnidex Text

Proximity Searches

FullText Indexes are a specialized index designed for large blocks of text, such as abstracts, articles, and text documents. FullText Indexes parse the contents of the column so that each word is indexed separately. FullText Indexes also track the position of each word in the field to aid in providing relevancy scores. Queries can use special syntax to require that one word be a certain distance from another word, or adjacent as a phrase. FullText Indexes necessarily have more overhead than QuickText Indexes.

Searches that evaluate the distance between one word and another are called Proximity Searches. Proximity Searches have an extended syntax, with special operators. Note that Proximity Searches are only possible against columns that are indexed using FullText indexes.

There are three basic categories of Proximity Searches:

Phrase Searches

A phrase search finds occurrences of between two and eight words adjacent to each other in the column. To specify phrases in the search criteria, enclose the phrase in double quotation marks. The following example searches for the phrase “No place like home” in the sample BOOKS database, finding the Wizard of Oz.

> select title from books where content = '"no place like home"';

TITLE
-----------------------------------------------------------------------------
The Wonderful Wizard of Oz

BEFORE Searches

A BEFORE search is an expansion on a Phrase Search. The BEFORE operator used in qualification criteria allows more control over how many words are allowed between two keywords. In fact, a Phrase Search is simply a special implementation of a BEFORE search.

The BEFORE operator is used between two words, as in “word1 BEFORE(n) word2”. The BEFORE operator accepts an optional parameter containing a value between 1 and 999, representing the number of words by which word1 may precede word2. If no value is provided, the default value of 10 is used. The Phrase Search example above could also be submitted as: no BEFORE place BEFORE like BEFORE home, or no BEFORE(1) place BEFORE(1) like BEFORE(1) home.

The following example searches for the word place within 25 words before the word home. Note that the use of operators within SQL criteria requires that the criteria be enclosed in parentheses, indicating that the criteria is really an expression with special operators, rather than a literal string.

> select title from books where content = '(place before(25) home)';

TITLE
-----------------------------------------------------------------------------
Around the World in Eighty Days
The Wonderful Wizard of Oz

This example immediate raises the question of how to see the locations in the book where this criteria is met, and also how to sort the results based on relevancy. The next two pages of this section describe functions that can be used to meet these requirements.

NEAR Searches

A NEAR search is nearly the same as a BEFORE search, except that the words can be in any relative order to each other. The NEAR operator is used between two words, as in “word1 NEAR(n) word2”. The NEAR operator accepts an optional parameter containing a value between 1 and 999, representing the number of words by which word1 may precede or follow word2. If no value is provided, the default value of 10 is used.

The following example searches for the word home within 25 words near the word place. The words may be in any order relative to each other.

> select title from books where content = '(place near(25) home)';

TITLE
-----------------------------------------------------------------------------
Around the World in Eighty Days
The Adventures of Tom Sawyer
The Wonderful Wizard of Oz

Any search against a FullText index is inherently a Proximity Search. Specifically, the NEAR(999) operator is used whenever there is no other operator before words. This means that the following two examples are synonymous:

> select title from books where content = 'place home';

TITLE
-----------------------------------------------------------------------------
Alice's Adventures in Wonderland
Around the World in Eighty Days
Hamlet
The Adventures of Tom Sawyer
The Wonderful Wizard of Oz

> select title from books where content = '(place NEAR(999) home)';

TITLE
-----------------------------------------------------------------------------
Alice's Adventures in Wonderland
Around the World in Eighty Days
Hamlet
The Adventures of Tom Sawyer
The Wonderful Wizard of Oz

Proximity Search Limitations

There are some limitations to Proximity Searches. Columns indexed with the Proximity option are limited to 4 million keywords. The Proximity option cannot be used on pre-joined indexes. Lastly, Proximity Searches only pay attention to word proximity, and not to semantics, sentence structure or context.

Additional Resources

See also:

 
Back to top
admin/indexing/text/proximity.txt · Last modified: 2016/06/28 22:38 (external edit)