This is an old revision of the document!


$CONTAINS

Syntax

$CONTAINS(column, criteria, [, options [,label]])

column The column against which the criteria should be applied.

criteria Criteria to be applied against the column, consisting of either simple keywords, or an expression. Expressions may contain nested parentheses, qualification functions, Boolean operators and other textual operators described in this document. Expressions must be surrounded by outer parentheses.

options An optional parameter that controls options for the function. The primary options determine which features to apply to this search. If multiple options are submitted, they will be performed in the order shown below:

PROXIMITY[=approach] Convert all unquoted spaces in criteria to proximity searches based on approach. Valid approaches are:

	PHRASE
	BEFORE(n)
	AFTER(n)

SPELLCHECK[=list] Check each word in criteria to see if it is a commonly misspelled word, and automatically replace the word with the corrected spelling. If list is not provided, then use the list declared in the environment file for this column. If no list is provided in the environment file, then use the default list provided with Omnidex (recommended).

SYNONYMS[=list] Expand the search to include the synonyms of each word in criteria. If list is not provided, then use the list declared in the environment file for this column. If no list is provided in the environment file, then use the thesaurus.

FORMS[=approach] Expand the search to include the various forms of each word. Approach can be set to one or more of PLURALS, CONJUGATIONS, and DERIVATIONS. If approach is not provided, then use the approach declared in the environment file for this column. If no approach is provided in the environment file, then all approaches will be used by default.

STOPWORDS[=list] Restrict this search by excluding all words found in the stopwords list. If list is not provided, then use the list declared in the environment file for this column. If no list is provided in the environment file, then use the default list provided with Omnidex.

Stopwords are not eliminated if a single stopword is the only word in criteria.  

MISSPELLINGS[=list] Locate possible misspellings for each word in criteria. If list is not provided, then use the spelling dictionaries created by ODXSQL’s UPDATE TEXT command.

PHONETIC Expand the search to include the phonetic equivalents for each word. This option requires that the Phonetic indexing option (;SX) be used for this column.

The secondary options configure the way that the primary options are applied:

INCLUSIVE Search for the original word as well as any additional words. This option applies to the SYNONYMS option only, and is true by default.

EXCLUSIVE Search only for the additional words, but not for the original word. This option applies to the SYNONYMS only.

PARTOFSPEECH=pos When searching the thesaurus or dictionary, limit searches to the named part of speech. Parts of speech are:

  • NOUN
  • VERB
  • ADJECTIVE
  • ADVERB

SENSE=n When searching the thesaurus or dictionary, limit searches to the specified sense. Senses can be seen by looking up the definition with the $LOOKUP function or ODXSQL’s LOOKUP command.

MAX_SENSES=n | ALL When searching the thesaurus or dictionary, limit searches to the specified number of senses. By default, searches are limited to the most common sense.

label An optional label that describes this particular search. This label is only needed when there are multiple $CONTAINS functions in a single SELECT statement. The label is then used to associate a particular $CONTAINS function with a $SCORE or $CONTEXT function. The label is case-insensitive, and may be up to 32 characters long.

The $CONTAINS function returns a Boolean TRUE or FALSE.

Options

PROXIMITY=<PHRASE | BEFORE(n) | NEAR(n)>

Examples

This example shows how to enhance an existing SQL statement to use a $CONTAINS function:

Original Omnidex SQL Select:

select ACCT, COMPANY, CONTACT 
  from CUSTOMERS 
  where FIRST_NAME = ‘William’

Enhanced with $CONTAINS function:

select ACCT, COMPANY, CONTACT 
  from CUSTOMERS 
  where $CONTAINS(FIRST_NAME, ‘William’, 
                  ‘synonyms=all_given_names’)

This example shows how to enhance a LIKE search to use a $CONTAINS function:

select ACCT, COMPANY, CONTACT 
  from CUSTOMERS 
  where ADDRESS LIKE ‘%FIRST%STREET%’

Enhanced with $CONTAINS function:

select 	ACCT, COMPANY, CONTACT 
  from 	CUSTOMERS 
  where	$CONTAINS(ADDRESS, ‘FIRST STREET’, 
                    ‘synonyms=all_addresses’)

From Omnidex TEXT class

$CONTAINS Searches

Most SQL statements are written with criteria in simple predicates, such as: FIRST_NAME = ‘William’.  This predicate compares the entire contents of FIRST_NAME against the criteria ‘William’, providing no opportunities for more flexible searches.  If the criterion does not match the column exactly, then the row will not be included.  Given the likelihood of everything from nicknames to misspellings, this means that a lot of rows are “lost”, never to be seen by the user despite their importance and relevance.   

SQL does provide a LIKE operator that allows exact substring searches and pattern-match searches.  But this is very limited, and does not accommodate nicknames and misspellings.  Furthermore, the LIKE operator has no awareness of the parsed words in a field.  LIKE’s substring searches would find the word “LACE” in “WALLACE”, and the word “OLD” in “REYNOLDS” despite their lack of relevance.

There is no standardized SQL to support textual searches involving keyword searches, proximity searches, synonym searches, spellchecking and similar features.  Some database vendors have implemented their own independent approaches, but they vary between databases.

Omnidex specializes in textual searches and provides a rich suite of tools for locating rows containing these words and their variations.   Within SQL, the gateway to most of these features is the $CONTAINS function.  Each $CONTAINS function allows complex criteria  against a particular column.   These criteria can apply to parsed words within a column, and can also include parenthetical and Boolean relationships between words.  The $CONTAINS function also allows phrase and proximity searches, and a wide array of options to trigger features such as synonym searches, spellchecking, form searches, phonetic searches and stopwords.  A typical application that uses Omnidex Text involves writing standard SQL statements that are then enhanced to use one or more $CONTAINS functions.   

The examples below demonstrate a substantial improvement over the capabilities of traditional SQL.  In the first example, the use of the SYNONYMS option will trigger a synonym search using a list of first names.  This will expand the search to look for alternates of the name William, such as Will, Bill, Liam and others.  In the second example, the two criteria will be searched for as actual parsed words rather than as substrings.  Additionally, the search will be expanded to include alternative forms of those words, such as “1st St.”.  These specific capabilities are discussed in depth in the remainder of this document.  These examples simply demonstrate how the $CONTAINS function is the gateway to the textual features of Omnidex Text.  
 

Figure 6 - Basic Examples of the $CONTAINS Function

This example shows how to enhance an existing SQL statement to use a $CONTAINS function:

	select 	ACCT, COMPANY, CONTACT 
  from 	CUSTOMERS 
  where	FIRST_NAME = ‘William’

… becomes … 

	select 	ACCT, COMPANY, CONTACT 
  from 	CUSTOMERS 
  where	$CONTAINS(FIRST_NAME, ‘William’, 
                      ‘synonyms=all_given_names’)


This second example shows how to enhance a LIKE search to use a $CONTAINS function:

	select 	ACCT, COMPANY, CONTACT 
  from 	CUSTOMERS 
  where	ADDRESS LIKE ‘%FIRST%STREET%’

… becomes … 

	select 	ACCT, COMPANY, CONTACT 
  from 	CUSTOMERS 
  where	$CONTAINS(ADDRESS, ‘FIRST STREET’, 
                      ‘synonyms=all_addresses’)




The $CONTAINS function is also a prerequisite for using the $SCORE and $CONTEXT functions described later in this document.  Those functions provide relevancy scoring and context excerpt, and must be applied to a particular $CONTAINS search.  If multiple $CONTAINS functions are used in a single SQL statement, then labels must be used to identify which $CONTAINS function to use.

The $CONTAINS function is unusual in that it comprises an entire predicate rather than being used in a predicate.  Functions like CHAR_LENGTH and CASE can be as operands, such as 

SELECT 	… 
  WHERE  	CHAR_LENGTH(column) > 16 AND 
…

$CONTAINS, however, is used as an entire predicate, as in 

SELECT 	… 
  WHERE 	$CONTAINS(column, ‘ABCD’) AND
 …



	

Figure 7 - The $CONTAINS Function

$CONTAINS(column, ‘criteria’, [, options [,’label’]])

column	The column against which the criteria should be applied.

criteria	Criteria to be applied against the column, consisting of either simple keywords, or an expression.  Expressions may contain nested parentheses, qualification functions, Boolean operators and other textual operators described in this document.  Expressions must be surrounded by outer parentheses.

options	An optional parameter that controls options for the function.  The primary options determine which features to apply to this search.  If multiple options are submitted, they will be performed in the order shown below:

PROXIMITY[=approach]	Convert all unquoted spaces in criteria to proximity searches based on approach.  Valid approaches are: 
		PHRASE
		BEFORE(n)
		AFTER(n)

SPELLCHECK[=list]	Check each word in criteria to see if it is a commonly misspelled word, and automatically replace the word with the corrected spelling.  If list is not provided, then use the list declared in the environment file for this column.  If no list is provided in the environment file, then use the default list provided with Omnidex (recommended).

SYNONYMS[=list]	Expand the search to include the synonyms of each word in criteria.  If list is not provided, then use the list declared in the environment file for this column.  If no list is provided in the environment file, then use the thesaurus.

FORMS[=approach]	Expand the search to include the various forms of each word.  Approach can be set to one or more of PLURALS, CONJUGATIONS, and DERIVATIONS.  If approach is not provided, then use the approach declared in the environment file for this column.  If no approach is provided in the environment file, then all approaches will be used by default.

STOPWORDS[=list]	Restrict this search by excluding all words found in the stopwords list.  If list is not provided, then use the list declared in the environment file for this column.  If no list is provided in the environment file, then use the default list provided with Omnidex.

	Stopwords are not eliminated if a single stopword is the only word in criteria.  

MISSPELLINGS[=list]	Locate possible misspellings for each word in criteria.  If list is not provided, then use the spelling dictionaries created by ODXSQL’s UPDATE TEXT command.

PHONETIC	Expand the search to include the phonetic equivalents for each word.  This option requires that the Phonetic indexing option (;SX) be used for this column.

	The secondary options configure the way that the primary options are applied:

INCLUSIVE	Search for the original word as well as any additional words.  This option applies to the SYNONYMS option only, and is true by default.

EXCLUSIVE	Search only for the additional words, but not for the original word.  This option applies to the SYNONYMS only.
	
PARTOFSPEECH=pos	When searching the thesaurus or dictionary, limit searches to the named part of speech.  Parts of speech are:
			NOUN
			VERB
			ADJECTIVE
			ADVERB

SENSE=n	When searching the thesaurus or dictionary, limit searches to the specified sense.  Senses can be seen by looking up the definition with the $LOOKUP function or ODXSQL’s LOOKUP command.
	
MAX_SENSES=n | ALL	When searching the thesaurus or dictionary, limit searches to the specified number of senses.  By default, searches are limited to the most common sense.  

label	An optional label that describes this particular search.  This label is only needed when there are multiple $CONTAINS functions in a single SELECT statement.  The label is then used to associate a particular $CONTAINS function with a $SCORE or $CONTEXT function.  The label is case-insensitive, and may be up to 32 characters long.


The $CONTAINS function returns a Boolean TRUE or FALSE.


 
Back to top
dev/sql/functions/contains/home.1261155104.txt.gz · Last modified: 2016/06/28 22:38 (external edit)