

Proximity Searches

A proximity search finds keywords that occur within a specified distance of each other in a text field or document. The user can require that only records containing the keywords in a certain order, or within a certain number of words of each other, be qualified.

Proximity searches are enabled by installing the ;PX option on a column with a textual data type. This causes Omnidex to index each keyword together with information about its word positions within the column. On columns installed with proximity, any search whose criteria contain multiple keywords automatically becomes a proximity search, defaulting to near(256).
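Conceptually, a position-aware index records where each keyword occurs, not just which rows contain it. The following Python sketch builds a minimal positional index under our own assumptions (a lowercase tokenizer that splits on anything that is not a letter or digit, and a hypothetical row id as the dictionary key). It only illustrates the idea and does not describe Omnidex's actual index format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase, split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(rows):
    """Map keyword -> {row_id: [word positions]} for a dict of row_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for row_id, text in rows.items():
        for position, word in enumerate(tokenize(text)):
            index[word][row_id].append(position)
    # Convert to plain dicts so looking up a missing keyword does not create an entry.
    return {word: dict(postings) for word, postings in index.items()}


if __name__ == "__main__":
    rows = {1: "White House aides leave the White House", 2: "A white picket fence"}
    index = build_positional_index(rows)
    print(index["white"])  # {1: [0, 5], 2: [1]}
    print(index["house"])  # {1: [1, 6]}
```

With positions stored, BEFORE, NEAR and phrase checks reduce to comparing position numbers instead of rescanning the text.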

The position and proximity of the keywords can be specified with the before(n) and near(n) functions. These functions use QUALIFY syntax, so when they are used in a SELECT statement, the criteria must be enclosed in parentheses or preceded by the Omnidex Sentinel Character.
'(white before(1) house)'
Note that a similar search can be performed as a phrase search, by enclosing the phrase in double quotes:
'"white house"'
Note also that the single quotes in both examples are required. See the Example section below for how these criteria are used in a $CONTAINS or other criteria predicate.

Proximity searches are allowed as criteria in the oaqualify routine, the QUALIFY statement, and the $CONTAINS function of a SELECT statement, and may be performed on any column installed with the ;PX option. However, rank and relevancy scores and context excerpts are available only when a $CONTAINS criteria predicate is used.

There are three basic categories of proximity searches: phrase searches, BEFORE searches, and NEAR searches.

 

Phrase Searches

A phrase search finds occurrences of complete phrases of up to eight words. Normal keyword search criteria are enclosed in single quotes, as in company = 'systems'. Phrase search criteria are enclosed in double quotes inside the single quotes. The following qualify statements illustrate the difference.

This first qualify, a simple keyword search, finds that the keywords white and house occur 381 times in 372 rows.

> qualify news where headline='white house'
381 matches (372 NEWS records) qualify

This second qualify, a true phrase search, finds that the phrase white house occurs 370 times in 370 rows, once in each row.

> qualify news where headline='"white house"'
370 matches (370 NEWS records) qualify

This final qualify is again a simple keyword search, because the double quotes are not enclosed in single quotes. Exactly like the first qualify, it finds that the keywords white and house occur 381 times in 372 rows.

> qualify news where headline="white house"
381 matches (372 NEWS records) qualify

From the second qualify, we know that the phrase white house occurs in 370 rows. Comparing it with the first, we can infer that the keywords white and/or house occur 11 more times (381 - 370) but not as a phrase, qualifying two more rows (372 - 370).
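The difference between matches and rows can be reproduced with a small Python sketch. It finds phrase occurrences by looking for consecutive words and reports both the number of occurrences and the number of rows containing at least one, similar to the "matches (records)" output above. The tokenizer, the helper names and the sample headlines are assumptions for illustration; they are not the NEWS data and not how Omnidex computes its counts internally.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(words, phrase):
    """Return the start position of every occurrence of phrase (a list of words)."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]


def count_phrase(rows, phrase_text):
    """Return (matches, qualified_rows) for a phrase over a list of row texts."""
    phrase = tokenize(phrase_text)
    matches = 0
    qualified_rows = 0
    for text in rows:
        hits = len(phrase_positions(tokenize(text), phrase))
        matches += hits
        if hits:
            qualified_rows += 1
    return matches, qualified_rows


if __name__ == "__main__":
    headlines = [
        "White House denies report",
        "From the White House to the White House lawn",
        "House painted white",
    ]
    print(count_phrase(headlines, "white house"))  # (3, 2)
```

Here the second headline contributes two matches but only one row, and the third contains both keywords without forming the phrase, so it does not qualify.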

 

 

BEFORE Searches

The BEFORE proximity search finds rows where the first word comes before the second word, with 0 or more words between them, up to the number given in the distance parameter. Used in qualification criteria, the BEFORE operator gives more control over how many words may appear between two keywords.

The BEFORE operator is used between two or more words, as in word1 BEFORE(5) word2. It accepts an optional distance parameter, a value from 1 to 999 (default 256), representing the maximum number of words that may come between word1 and word2.

select headline from articles where headline = '("high school" before(10) football)'

This example combines a phrase search with a before proximity search. It finds rows where the phrase high school occurs before the word football anywhere in the headline, with no more than 10 words between them.

A Look at Friday's High School Football Action
Super 44 High School All-Star Football Game
Should Your Son Play High School Football?
Complete High School Baseball, Basketball and Football Coverage

Note that if the outer parentheses are omitted from the above example,
select headline from articles where headline = '"high school" before(10) football'
the criteria will be interpreted as high school near(256) before near(256) football
using the phrase high school and the keywords before and football.
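The BEFORE rule can be expressed in terms of word positions. In this Python sketch a term is either a single keyword or a phrase. The number of words between two terms is the start position of the second term minus the end position of the first, minus one, and it must fall between 0 and the distance. The helper names and tokenizer are hypothetical, and the sketch checks the headlines from the BEFORE example above plus one that should not match.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def occurrences(words, term):
    """Return (start, end) positions of every occurrence of term (a list of words)."""
    n = len(term)
    if n == 0:
        return []
    return [(i, i + n - 1) for i in range(len(words) - n + 1) if words[i:i + n] == term]


def before(text, first, second, distance=256):
    """True if first occurs before second with at most `distance` words between them."""
    words = tokenize(text)
    for _, first_end in occurrences(words, tokenize(first)):
        for second_start, _ in occurrences(words, tokenize(second)):
            gap = second_start - first_end - 1  # words strictly between the two terms
            if 0 <= gap <= distance:
                return True
    return False


if __name__ == "__main__":
    for headline in [
        "A Look at Friday's High School Football Action",
        "Complete High School Baseball, Basketball and Football Coverage",
        "Coaching Football in High School",
    ]:
        print(before(headline, "high school", "football", 10), headline)
```

The last headline prints False because football precedes high school, which a BEFORE search does not allow.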

 

 

NEAR Searches

A NEAR search is similar to a BEFORE search except that the second word can come before OR after the first word, separated by 0 or more words, up to the number set in the distance parameter. The distance parameter can be a number from 1 to 999 (default 256), representing the number of words that may come between word1 and word2.

A near(256) search is the default for all searches on a proximity column where two or more words are passed. If the words are not double-quoted (not a phrase search) and neither near nor before is specified, as in column = 'word1 word2', the criteria will be interpreted as column = '(word1 near(256) word2)'.
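The default rule can be illustrated with a Python sketch that rewrites plain multi-word criteria into explicit NEAR form while keeping double-quoted phrases intact. This is our own illustration of the rule described above, not Omnidex's criteria parser. The function name is hypothetical, and the sketch does not try to handle criteria that already contain before(n) or near(n).

```python
import re


def default_near(criteria, distance=256):
    """Join the keywords and "quoted phrases" in criteria with near(distance)."""
    # Each term is either a double-quoted phrase or a run of non-space characters.
    terms = re.findall(r'"[^"]*"|\S+', criteria)
    if len(terms) < 2:
        return criteria  # a single keyword or a single phrase needs no proximity operator
    return "(" + f" near({distance}) ".join(terms) + ")"


if __name__ == "__main__":
    print(default_near("white house"))              # (white near(256) house)
    print(default_near('"white house"'))            # "white house"
    print(default_near('"high school" football'))   # ("high school" near(256) football)
```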

select headline from articles where headline = '("high school" near(10) football)'

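This example combines a phrase search with a NEAR proximity search. It finds rows where the phrase high school occurs near the word football anywhere in the headline, in either order, with no more than 10 words between them. Unlike the BEFORE example, the results include the last headline, where football comes before high school.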

A Look at Friday's High School Football Action
Super 44 High School All-Star Football Game
Should Your Son Play High School Football?
Complete High School Baseball, Basketball and Football Coverage
Coaching Football in High School

Note that if the outer parentheses are omitted from the above example,
select headline from articles where headline = '"high school" near(10) football'
the criteria will be interpreted as high school near(256) near near(256) football
using the phrase high school and the keywords near and football.
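A NEAR check differs from the BEFORE check only in accepting either order. This Python sketch, using the same hypothetical helpers and assumed tokenizer as a BEFORE sketch would, counts the words between the two terms whichever one comes first, and tests all five headlines from the NEAR example.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def occurrences(words, term):
    """Return (start, end) positions of every occurrence of term (a list of words)."""
    n = len(term)
    if n == 0:
        return []
    return [(i, i + n - 1) for i in range(len(words) - n + 1) if words[i:i + n] == term]


def near(text, first, second, distance=256):
    """True if first and second occur in either order with at most `distance` words between."""
    words = tokenize(text)
    for a_start, a_end in occurrences(words, tokenize(first)):
        for b_start, b_end in occurrences(words, tokenize(second)):
            # Count the words between the terms, whichever term comes first.
            if a_start < b_start:
                gap = b_start - a_end - 1
            else:
                gap = a_start - b_end - 1
            if 0 <= gap <= distance:  # a negative gap means the terms overlap
                return True
    return False


if __name__ == "__main__":
    print(near("Coaching Football in High School", "high school", "football", 10))  # True
```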

 

Limitations

  • The ;PX (Proximity) option cannot be used on pre-joined indexes.
  • Proximity searches pay attention ONLY to word proximity, NOT semantics, sentence structure, or context.
  • Qualifying counts on a column installed with proximity reflect the number of occurrences of the criteria keyword, rather than the number of qualified rows.
  • Multiple criteria predicates on proximity fields cannot be OR'd together within a single select.
  • Columns indexed with the ;PX (Proximity) option are limited to 4 million unique keywords per row. Because many keywords appear multiple times in large textual documents, this is not a limit on the size of the column.
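Because the limit counts distinct keywords rather than total words, a long row can stay well under it. The Python sketch below contrasts the two counts for one row's text. The tokenizer is an assumption and may not match how Omnidex parses keywords, and the helper names are hypothetical.

```python
import re

MAX_UNIQUE_KEYWORDS = 4_000_000  # per-row limit stated for ;PX columns


def tokenize(text):
    """Assumed tokenizer: lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_stats(text):
    """Return (total words, unique keywords) for one row's text."""
    words = tokenize(text)
    return len(words), len(set(words))


def within_px_limit(text):
    """True if the row's unique keyword count is within the stated ;PX limit."""
    _, unique = keyword_stats(text)
    return unique <= MAX_UNIQUE_KEYWORDS


if __name__ == "__main__":
    print(keyword_stats("the cat and the hat and the bat"))  # (8, 5)
```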

 

Example

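The following statements find rows where cell occurs before phone with no more than 3 words between them, first with the QUALIFY statement and then with the $CONTAINS function of a SELECT statement.

QUALIFY CATALOG WHERE CONTENT = 'cell BEFORE(3) phone'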

SELECT * FROM CATALOG WHERE $CONTAINS(CONTENT,'cell BEFORE(3) phone')

 
