Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
dev:sql:functions:contains:home [2009/11/16 22:25]
127.0.0.1 external edit
dev:sql:functions:contains:home [2012/10/26 14:28] (current)
Line 52: Line 52:
  
 The $CONTAINS function returns a Boolean TRUE or FALSE. The $CONTAINS function returns a Boolean TRUE or FALSE.
- 
 ===== Examples ===== ===== Examples =====
 This example shows how to enhance an existing SQL statement to use a $CONTAINS function: This example shows how to enhance an existing SQL statement to use a $CONTAINS function:
Line 78: Line 77:
     where $CONTAINS(ADDRESS,​ ‘FIRST STREET’, ​     where $CONTAINS(ADDRESS,​ ‘FIRST STREET’, ​
                       ‘synonyms=all_addresses’)                       ‘synonyms=all_addresses’)
 +
 +==== TO BE REVIEWED ====
 +
 +<​code>​
 +$CONTAINS Searches
 +
 +Most SQL statements are written with criteria in simple predicates, such as: FIRST_NAME = ‘William’. ​ This predicate compares the entire contents of FIRST_NAME against the criteria ‘William’,​ providing no opportunities for more flexible searches. ​ If the criterion does not match the column exactly, then the row will not be included. ​ Given the likelihood of everything from nicknames to misspellings,​ this means that a lot of rows are “lost”, never to be seen by the user despite their importance and relevance. ​  
 +
 +SQL does provide a LIKE operator that allows exact substring searches and pattern-match searches. ​ But this is very limited, and does not accommodate nicknames and misspellings. ​ Furthermore,​ the LIKE operator has no awareness of the parsed words in a field. ​ LIKE’s substring searches would find the word “LACE” in “WALLACE”,​ and the word “OLD” in “REYNOLDS” despite their lack of relevance.
 +
 +There is no standardized SQL to support textual searches involving keyword searches, proximity searches, synonym searches, spellchecking and similar features. ​ Some database vendors have implemented their own independent approaches, but they vary between databases.
 +
 +Omnidex specializes in textual searches and provides a rich suite of tools for locating rows containing these words and their variations. ​  ​Within SQL, the gateway to most of these features is the $CONTAINS function. ​ Each $CONTAINS function allows complex criteria ​ against a particular column. ​  These criteria can apply to parsed words within a column, and can also include parenthetical and Boolean relationships between words. ​ The $CONTAINS function also allows phrase and proximity searches, and a wide array of options to trigger features such as synonym searches, spellchecking,​ form searches, phonetic searches and stopwords. ​ A typical application that uses Omnidex Text involves writing standard SQL statements that are then enhanced to use one or more $CONTAINS functions. ​  
 +
 +The examples below demonstrate a substantial improvement over the capabilities of traditional SQL.  In the first example, the use of the SYNONYMS option will trigger a synonym search using a list of first names. ​ This will expand the search to look for alternates of the name William, such as Will, Bill, Liam and others. ​ In the second example, the two criteria will be searched for as actual parsed words rather than as substrings. ​ Additionally,​ the search will be expanded to include alternative forms of those words, such as “1st St.”. ​ These specific capabilities are discussed in depth in the remainder of this document. ​ These examples simply demonstrate how the $CONTAINS function is the gateway to the textual features of Omnidex Text.  ​
 + 
 +
 +Figure 6 - Basic Examples of the $CONTAINS Function
 +
 +This example shows how to enhance an existing SQL statement to use a $CONTAINS function:
 +
 + select ACCT, COMPANY, CONTACT ​
 +  from CUSTOMERS ​
 +  where FIRST_NAME = ‘William’
 +
 +… becomes … 
 +
 + select ACCT, COMPANY, CONTACT ​
 +  from CUSTOMERS ​
 +  where $CONTAINS(FIRST_NAME,​ ‘William’, ​
 +                      ‘synonyms=all_given_names’)
 +
 +
 +This second example shows how to enhance a LIKE search to use a $CONTAINS function:
 +
 + select ACCT, COMPANY, CONTACT ​
 +  from CUSTOMERS ​
 +  where ADDRESS LIKE ‘%FIRST%STREET%’
 +
 +… becomes … 
 +
 + select ACCT, COMPANY, CONTACT ​
 +  from CUSTOMERS ​
 +  where $CONTAINS(ADDRESS,​ ‘FIRST STREET’, ​
 +                      ‘synonyms=all_addresses’)
 +
 +
 +
 +
 +The $CONTAINS function is also a prerequisite for using the $SCORE and $CONTEXT functions described later in this document. ​ Those functions provide relevancy scoring and context excerpt, and must be applied to a particular $CONTAINS search. ​ If multiple $CONTAINS functions are used in a single SQL statement, then labels must be used to identify which $CONTAINS function to use.
 +
 +The $CONTAINS function is unusual in that it comprises an entire predicate rather than being used in a predicate. ​ Functions like CHAR_LENGTH and CASE can be as operands, such as 
 +
 +SELECT … 
 +  WHERE  CHAR_LENGTH(column) > 16 AND 
 +
 +
 +$CONTAINS, however, is used as an entire predicate, as in 
 +
 +SELECT … 
 +  WHERE $CONTAINS(column,​ ‘ABCD’) AND
 + …
 +
 +
 +
 +
 +
 +Figure 7 - The $CONTAINS Function
 +
 +$CONTAINS(column,​ ‘criteria’,​ [, options [,​’label’]])
 +
 +column The column against which the criteria should be applied.
 +
 +criteria Criteria to be applied against the column, consisting of either simple keywords, or an expression. ​ Expressions may contain nested parentheses,​ qualification functions, Boolean operators and other textual operators described in this document. ​ Expressions must be surrounded by outer parentheses.
 +
 +options An optional parameter that controls options for the function. ​ The primary options determine which features to apply to this search. ​ If multiple options are submitted, they will be performed in the order shown below:
 +
 +PROXIMITY[=approach] Convert all unquoted spaces in criteria to proximity searches based on approach. ​ Valid approaches are: 
 + PHRASE
 + BEFORE(n)
 + AFTER(n)
 +
 +SPELLCHECK[=list] Check each word in criteria to see if it is a commonly misspelled word, and automatically replace the word with the corrected spelling. ​ If list is not provided, then use the list declared in the environment file for this column. ​ If no list is provided in the environment file, then use the default list provided with Omnidex (recommended).
 +
 +SYNONYMS[=list] Expand the search to include the synonyms of each word in criteria. ​ If list is not provided, then use the list declared in the environment file for this column. ​ If no list is provided in the environment file, then use the thesaurus.
 +
 +FORMS[=approach] Expand the search to include the various forms of each word.  Approach can be set to one or more of PLURALS, CONJUGATIONS,​ and DERIVATIONS. ​ If approach is not provided, then use the approach declared in the environment file for this column. ​ If no approach is provided in the environment file, then all approaches will be used by default.
 +
 +STOPWORDS[=list] Restrict this search by excluding all words found in the stopwords list.  If list is not provided, then use the list declared in the environment file for this column. ​ If no list is provided in the environment file, then use the default list provided with Omnidex.
 +
 + Stopwords are not eliminated if a single stopword is the only word in criteria.  ​
 +
 +MISSPELLINGS[=list] Locate possible misspellings for each word in criteria. ​ If list is not provided, then use the spelling dictionaries created by ODXSQL’s UPDATE TEXT command.
 +
 +PHONETIC Expand the search to include the phonetic equivalents for each word.  This option requires that the Phonetic indexing option (;SX) be used for this column.
 +
 + The secondary options configure the way that the primary options are applied:
 +
 +INCLUSIVE Search for the original word as well as any additional words. ​ This option applies to the SYNONYMS option only, and is true by default.
 +
 +EXCLUSIVE Search only for the additional words, but not for the original word.  This option applies to the SYNONYMS only.
 +
 +PARTOFSPEECH=pos When searching the thesaurus or dictionary, limit searches to the named part of speech. ​ Parts of speech are:
 + NOUN
 + VERB
 + ADJECTIVE
 + ADVERB
 +
 +SENSE=n When searching the thesaurus or dictionary, limit searches to the specified sense. ​ Senses can be seen by looking up the definition with the $LOOKUP function or ODXSQL’s LOOKUP command.
 +
 +MAX_SENSES=n | ALL When searching the thesaurus or dictionary, limit searches to the specified number of senses. ​ By default, searches are limited to the most common sense.  ​
 +
 +label An optional label that describes this particular search. ​ This label is only needed when there are multiple $CONTAINS functions in a single SELECT statement. ​ The label is then used to associate a particular $CONTAINS function with a $SCORE or $CONTEXT function. ​ The label is case-insensitive,​ and may be up to 32 characters long.
 +
 +
 +The $CONTAINS function returns a Boolean TRUE or FALSE.
 +
 +</​code>​
  
  
  
  
 
Back to top
dev/sql/functions/contains/home.txt · Last modified: 2012/10/26 14:28 (external edit)