
Misspelling Searches

Omnidex can locate words that may be misspelled in the data. In these searches, Omnidex does not look up common misspellings of known words. Instead, it evaluates the data itself to identify possible misspellings. Because words can be misspelled in so many different ways, this is much more powerful than searching for a fixed set of common misspellings.

Misspellings may occur in English words, proper names, or other terms not found in a dictionary. Omnidex uses a combination of phonetic and typographical algorithms to compare a given word to the words that have been indexed for a column. These words are then scored according to how closely they match and how infrequently they occur, and the search is expanded to include them.
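The scoring idea can be illustrated with a small sketch. This is a hypothetical illustration only, not Omnidex's implementation: it assumes classic American Soundex as the phonetic algorithm and Levenshtein edit distance as the typographical one. The names `soundex`, `edit_distance`, and `misspelling_candidates`, the `max_distance` threshold, and the weighting formula are all invented for this example. A candidate must either sound like the target or be within a few edits of it, and closer, rarer words score higher.

```python
import math

# Hypothetical illustration only: NOT Omnidex's actual algorithm.
# Soundex stands in for "phonetic" matching; Levenshtein edit distance
# stands in for "typographical" matching.

_SOUNDEX = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """American Soundex code: first letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def misspelling_candidates(target, word_counts, max_distance=2):
    """Score indexed words that look or sound like `target`.

    word_counts maps each lowercase indexed word to its occurrence count.
    Closer matches score higher; rarer words score higher.
    """
    target = target.lower()
    target_code = soundex(target)
    scored = []
    for word, count in word_counts.items():
        if word == target:
            continue
        distance = edit_distance(target, word)
        phonetic = soundex(word) == target_code
        if distance > max_distance and not phonetic:
            continue
        closeness = 1.0 / (1 + distance) + (0.5 if phonetic else 0.0)
        rarity = 1.0 / math.log(max(count, 1) + 1)
        scored.append((word, closeness * rarity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, calling `misspelling_candidates("Omnidex", counts)`, where `counts` maps each lowercase indexed word to its occurrence count, returns candidate words ordered from highest to lowest score. Because this toy filter accepts phonetic matches at any edit distance, it can also surface correctly spelled words that merely sound similar.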

However, not every word that Omnidex identifies as a possible misspelling is actually misspelled. It is often impossible to distinguish between an unusually spelled word, a nearly identically spelled word, and a genuine misspelling. Omnidex does consider how frequently a word occurs, on the presumption that misspelled words occur less often than correctly spelled words. Nevertheless, a misspelling search may retrieve a handful of similar, correctly spelled words along with true misspellings.

The number of possible misspellings found varies greatly with the underlying data. If the data is relatively free of misspellings, the words returned are likely to be legitimate spelling variations of the original word. If the data contains many misspellings, a large number of words is likely to be returned, most of them possible misspellings.

As an example, we searched the DISC technical support database for misspellings of the word "Omnidex". Here are the results:

Omnnidex

Omnideex

Omnidex'

Omnidex2

Omnidexs

Omnindex

Omnmidex

Omnidx

Omidex

Onidex

Onmidex

Omnudex

Omnidex'D

Omnidexed

Omnidxe

Omndiex

Omindex

Omnidex'S

Omnidexes

Odmidex

Omnidev

Omnidex-Ed

Omnidex-Hp

Omnidexing

Omnidexkey

 

Setup

To enable misspelling searches, you MUST execute the UPDATE TEXT command in ODXSQL to generate spelling dictionaries for the words in a column. This step must be performed AFTER the Omnidex indexes have been built. All misspelling searches are derived from these dictionaries.

At present, these dictionaries are not updated as new rows are added to the database. However, new rows rarely introduce a word that has never appeared before, particularly in large databases. It is therefore only necessary to refresh the dictionaries with the UPDATE TEXT command periodically.
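The following sketch illustrates why a periodic refresh is usually sufficient. It assumes, for illustration only, that a spelling dictionary is simply a snapshot of word counts for a column, built by splitting values with a simple regular expression. Omnidex's actual dictionary format and tokenization are not described on this page, and `build_spelling_dictionary` and `unseen_words` are hypothetical names. Rows added after the snapshot do not change it; `unseen_words` reports which words in new rows the snapshot has never seen, which is one possible way to judge when running UPDATE TEXT again is worthwhile.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Hypothetical tokenizer: runs of letters, digits and apostrophes form a word.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def build_spelling_dictionary(column_values: Iterable[str]) -> Counter:
    """Snapshot of lowercase word counts for one column's values."""
    counts = Counter()
    for value in column_values:
        counts.update(word.lower() for word in WORD_RE.findall(value))
    return counts


def unseen_words(dictionary: Counter, new_values: Iterable[str]) -> list[str]:
    """Words in newly added rows that the snapshot has never seen."""
    fresh = build_spelling_dictionary(new_values)
    return sorted(set(fresh) - set(dictionary))
```

In a large, mature column, `unseen_words` would typically return a short list, which is the situation in which refreshing the dictionaries only occasionally loses little.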

The UPDATE TEXT command can be performed against an entire database, a table or just a single column.

Misspelling searches are performed by using the MISSPELLINGS option of the $CONTAINS function, or by using the MISSPELLINGS function directly in the qualification criteria of a QUALIFY or oaqualify statement. To retrieve the actual misspelled words in the select list, use the $LOOKUP function with the MISSPELLINGS option.

 

Options

There are currently no options available for use in a Misspellings search.

 

 

 

MISSPELLINGS Function

Syntax

MISSPELLINGS(criteria[, approach[, options]])

criteria
Required. The qualification criteria to be expanded to include misspellings. Each word will be expanded to include the possible misspellings that occur in the data.

approach
Optional. This parameter is currently ignored.

options
Optional. See Options (above) for a list of valid options.
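To make "each word will be expanded" concrete, here is a hypothetical sketch that splits the criteria into words and pairs each word with its possible misspellings. The `lookup` callable stands in for the search against the spelling dictionary and is an assumption, as is the name `expand_criteria`. The result is a plain Python structure, not Omnidex qualification syntax, and how the per-word groups are combined follows the normal rules for the criteria, which this sketch does not model.

```python
from collections.abc import Callable, Iterable


def expand_criteria(criteria: str,
                    lookup: Callable[[str], Iterable[str]]) -> list[list[str]]:
    """Return one group per word: the word itself, then its possible misspellings.

    Hypothetical: `lookup` stands in for a spelling-dictionary search.
    """
    groups = []
    for word in criteria.split():
        word = word.lower()
        variants = [word]
        for candidate in lookup(word):
            if candidate not in variants:  # keep each variant once, in order
                variants.append(candidate)
        groups.append(variants)
    return groups
```

A word with no known misspellings simply forms a group of one, so it is searched as written.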

 

 

Examples

The following two statements will produce the same results.

QUALIFY ARTICLES WHERE CONTENT = 'MISSPELLINGS(ACETAMINOPHEN)'

SELECT CONTENT FROM ARTICLES WHERE $CONTAINS(CONTENT,'ACETAMINOPHEN','MISSPELLINGS')

You can also specify the MISSPELLINGS option in the environment catalog. The following environment catalog column declaration uses the AUTOENABLE option to make Omnidex automatically add possible misspellings of each keyword to the qualification criteria.

COLUMN CONTENT DATATYPE C STRING(20K) MISSPELLINGS 'AUTOENABLE'
...

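With this declaration in place, a search on the column such as the following automatically includes possible misspellings:

SELECT CONTENT FROM ARTICLES WHERE CONTENT='ACETAMINOPHEN'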

 

 

 

 
