Administration: Omnidex Indexing

Omnidex Text

Proximity Searches

FullText Indexes are specialized indexes designed for large blocks of text, such as abstracts, articles, and text documents. They parse the contents of the column so that each word is indexed separately, and they track the position of each word in the field to aid in providing relevancy scores. Queries can use special syntax to require that one word be within a certain distance of another word, or adjacent to it as a phrase. FullText Indexes necessarily have more overhead than QuickText Indexes.

Searches that evaluate the distance between one word and another are called Proximity Searches. Proximity Searches have an extended syntax, with special operators. Note that Proximity Searches are only possible against columns that are indexed using FullText indexes.

There are three basic categories of Proximity Searches:

Phrase Searches

A phrase search finds occurrences of between two and eight words adjacent to each other in the column. To specify a phrase in the search criteria, enclose it in double quotation marks. For example, to look for the phrase cell phone, use the criteria “cell phone”.

BEFORE Searches

A BEFORE search is an expansion on a Phrase Search. The BEFORE operator used in qualification criteria allows more control over how many words are allowed between two keywords. In fact, a Phrase Search is simply a special implementation of a BEFORE search.

The BEFORE operator is used between two words, as in “word1 BEFORE(n) word2”. The BEFORE operator accepts an optional parameter containing a value between 1 and 999, representing the number of words by which word1 may precede word2. The Phrase Search example above could also be submitted as: cell BEFORE(1) phone.

A Phrase Search with “cell phone” would not find the sentence:

“Our company markets cell and mobile phones.”

To locate this sentence, the search criteria must be: cell BEFORE(3) phone. Any parameter of 3 or above would similarly find this sentence.
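
The distance rule described above can be sketched in a few lines of Python. This is only a toy model of the documented behavior, not Omnidex's implementation: the helper names `tokenize` and `matches_before` are hypothetical, and the model compares exact lowercase words, so the sample sentence uses the singular "phone" (how Omnidex treats plurals is not covered here).

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_before(text, word1, word2, n=10):
    """Return True if word1 appears 1..n word positions ahead of word2.

    n=10 mirrors the documented default for BEFORE without a parameter.
    """
    words = tokenize(text)
    first = [i for i, w in enumerate(words) if w == word1.lower()]
    second = [i for i, w in enumerate(words) if w == word2.lower()]
    # word2 must come after word1, at most n positions later.
    return any(1 <= j - i <= n for i in first for j in second)

sentence = "Our company markets cell and mobile phone accessories."
print(matches_before(sentence, "cell", "phone", 1))  # False: not adjacent
print(matches_before(sentence, "cell", "phone", 3))  # True: three positions apart
```

In this model, BEFORE(1) behaves like a two-word phrase search, and any parameter of 3 or more matches the sample sentence.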

If the BEFORE operator is used without a parameter, it defaults to 10 words. This provides the most relaxed and flexible search, and can dramatically increase the total number of rows found. In these situations, queries are often sorted by their relevancy score using the $SCORE function. This allows rows with close proximity to be presented first, while still including rows with distant proximity.

NEAR Searches

A NEAR Search is very similar to a BEFORE Search. The only difference is that NEAR allows words to be in any order.

Neither a Phrase Search with “cell phone”, nor a BEFORE Search with cell BEFORE phone would find the sentence:

“The call was dropped because we traveled outside of the mobile phone cell”

To locate this sentence, the search criteria must be: cell NEAR(1) phone, or cell NEAR phone.
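
The order-insensitive rule can be sketched the same way. Again this is an illustrative toy model rather than Omnidex's implementation; `tokenize` and `matches_near` are hypothetical helper names, and words are compared as exact lowercase tokens.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_near(text, word1, word2, n=10):
    """Return True if word1 and word2 occur within n positions, in either order.

    n=10 mirrors the documented default for NEAR without a parameter.
    """
    words = tokenize(text)
    first = [i for i, w in enumerate(words) if w == word1.lower()]
    second = [i for i, w in enumerate(words) if w == word2.lower()]
    # abs() removes the ordering requirement that BEFORE imposes.
    return any(1 <= abs(j - i) <= n for i in first for j in second)

sentence = ("The call was dropped because we traveled "
            "outside of the mobile phone cell")
print(matches_near(sentence, "cell", "phone", 1))  # True: adjacent, reversed order
```

The only change from an ordered check is the use of the absolute distance, which is why NEAR finds the sentence even though "phone" comes before "cell".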

If the NEAR operator is used without a parameter, it also defaults to 10 words. This provides the most relaxed and flexible search, and can dramatically increase the total number of rows found. In these situations, queries are often sorted by their relevancy score using the $SCORE function. This allows rows with close proximity to be presented first, while still including rows with distant proximity. Most search engines use NEAR Searches rather than BEFORE Searches to provide the greatest search flexibility. Results are then presented in order of the relevancy score.

There are some limitations to Proximity Searches. Columns indexed with the Proximity option are limited to 4 million keywords. The Proximity option cannot be used on pre-joined indexes. Lastly, Proximity Searches only pay attention to word proximity, and not to semantics, sentence structure or context.

Figure 8 - The BEFORE and NEAR Operators

BEFORE[(n)]
NEAR[(n)]

n	A number between 1 and 999 giving the maximum distance, in words, allowed between the two requested words; 1 means the words are adjacent. The default value is 10.

Figure 9 - Examples of Using the BEFORE and NEAR Operators

The following example shows a proximity search using the QUALIFY statement:

Qualify CATALOG where CONTENT = 'cell BEFORE(3) phone'

The following example shows a proximity search using the $CONTAINS function of a SELECT statement:

Select … from CATALOG 
	where $contains(CONTENT, 'cell BEFORE(3) phone')

Proximity searches are performed automatically when criteria are submitted against columns installed with the Proximity option. However, the default processing of those criteria can be overridden with the PROXIMITY function.

Figure 10 - The PROXIMITY Function

PROXIMITY('criteria'[,'options'])

criteria The qualification criteria to be converted using the passed options.

options An optional parameter that controls options for the function. If no option is supplied, then NEAR(999) is assumed as the default.

PHRASE Convert all unquoted spaces to BEFORE(1) operators, producing a phrase search.

BEFORE(n) Convert all unquoted spaces to BEFORE(n) operators.

NEAR(n) Convert all unquoted spaces to NEAR(n) operators.
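
The conversions listed above amount to a simple rewrite of the criteria string, which the following Python sketch imitates. It is an assumption-laden illustration, not the PROXIMITY function itself: `expand_proximity` is a hypothetical name, and the sketch handles only plain words and double-quoted phrases, ignoring any operators already present in the criteria.

```python
import re

def expand_proximity(criteria, option="NEAR(999)"):
    """Replace each unquoted space in criteria with a proximity operator.

    option may be "PHRASE", "BEFORE(n)" or "NEAR(n)"; the default of
    NEAR(999) follows the documented default for the options parameter.
    """
    op = "BEFORE(1)" if option.upper() == "PHRASE" else option.upper()
    # Keep double-quoted phrases intact; split everything else on whitespace.
    terms = re.findall(r'"[^"]*"|\S+', criteria)
    return f" {op} ".join(terms)

print(expand_proximity("cell phone"))
# cell NEAR(999) phone
print(expand_proximity("cell phone", "PHRASE"))
# cell BEFORE(1) phone
print(expand_proximity('"cell phone" battery', "BEFORE(5)"))
# "cell phone" BEFORE(5) battery
```

Spaces inside a quoted phrase are left untouched, which matches the description of converting only unquoted spaces.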

admin/indexing/text/proximity.1328032074.txt.gz · Last modified: 2016/06/28 22:38 (external edit)