Differences

This shows you the differences between two versions of the page.

admin:indexing:powersearch:custom [2011/01/24 18:29]
els
admin:indexing:powersearch:custom [2016/06/28 22:38] (current)
Line 16: Line 16:
 ---- ----
  
-==== Creating Custom Synonym Lists ====
 +==== Custom Usages ====
  
-Synonym lists are an important tool for PowerSearch.  Synonym lists allow your search terms to be automatically expanded to make your search more flexibility.  The examples on the previous pages used many synonym lists, such as Given Names, Postal Abbreviations, City Names and Area Codes.  Omnidex provides a base set of synonym lists as part of the product, but it is also possible to create your own synonym lists.
 +Usages are essential to the customization of PowerSearch.  The POWERSEARCH option instructs Omnidex to evaluate the columns referenced in the WHERE clause of the SELECT statement, and to match their USAGE clauses with the preconfigured instructions for each USAGE.  These instructions describe actions such as using synonyms, checking for misspellings, or even doing geographic searches.
  
-The synonym lists that are provided with Omnidex are: 
  
 +=== The Usages File Layout ===
  
-^LIST^DESCRIPTION^
-|  \\  **Geography and Addresses**\\  \\  ||
-|COUNTRY_CODES|ISO Standard country codes|
-|STATE_CODES|USPS state codes|
-|CANADIAN_PROVINCES|Canadian province codes|
-|CITY_ABBR|Common abbreviations used in city names|
-|STREET_SUFFIXES|USPS standard abbreviations used in street addresses|
-|SECONDARY_UNIT_ABBR|USPS standard abbreviations used for apartments, suites, etc.|
-|DIRECTIONS|Common abbreviations used for directions of the compass.|
-|NUMERALS|Correlation of numerical and textual numbers, such as 10 and ten.|
-|ORDINALS|Correlation of numerical and textual ordinals, such as 1st and first.|
-|SALUTATION_ABBR|Standard abbreviations used in salutations|
 +These preconfigured instructions can be customized or expanded to meet the needs of each application.  The usage instructions exist in a tab-delimited file with six columns.  It can be modified using a text editor such as 'Notepad' on Windows or 'vi' on UNIX.  It can also be created in a spreadsheet program such as Microsoft Excel, which allows a file to be saved as a Tab-Delimited File.
 +
 +The record layout of the usages table consists of six columns:
 +
 +^ Column Name        ^ Datatype       ^
 +| $LIST              | CHARACTER(32)  |
 +| $USAGE             | CHARACTER(32)  |
 +| $PARSING           | CHARACTER(32)  |
 +| $FORMAT            | CHARACTER(255) |
 +| $POWERSEARCH       | STRING(4094)   |
 +| $COMMENTS          | STRING(255)    |
 +
-|MILTARY_RANK_ABBR|Standard abbreviations used for military ranks|
-|ORGANIZATION_ABBR|Standard abbreviations used in the names of organizations|
-|AIRPORT_CODES_US|Standard airport codes within the United States|
-|AIRPORT_CODES_INTL|Standard airport codes outside of the United States|
-|ALL_INDIVIDUAL_ADDRESSEES*|Synonyms appropriate for addressees that are individual people, as opposed to organizations.  This list is a composite of SALUTATION_ABBR and MILITARY_RANK_ABBR.|
-|ALL_ADDRESSEES*|Synonyms appropriate for addressees, including people and organizations.  This list is a composite of SALUTATIONS_ABBR, MILITARY_RANK_ABBR and ORGANIZATION_ABBR.|
-|ALL_ADDRESS_LINES*|Synonyms appropriate for address lines containing street address, but not city, state, zip and country.  This list is a composite of STREET_SUFFIXES, SECONDARY_UNIT_ABBR, DIRECTIONS, NUMERALS and ORDINALS|
-|ALL_FREEFORM_ADDRESSES*|Synonyms appropriate for address lines containing street address, city, state and zip, but not country.  This list is a composite of STATE_CODES, CANADIAN_PROVINCES, CITY_ABBR, STREET_SUFFIXES, SECONDARY_UNIT_ABBR, DIRECTIONS, NUMERALS and ORDINALS.|
-|ALL_AIRPORTS*|All standard airport codes, both within and outside the United States.  This list is a composite of AIRPORT_CODES_US and AIRPORT_CODES_INTL.|
-|ALL_GEOGRAPHY_ADDRESSES*|All geographical and address synonyms.  This list is a composite of all lists in this section.|
-|  \\  **Proper Names**\\  \\  ||
-|FEMALE_GIVEN_NAMES|Variations for female-only given names.|
 +== $LIST ==
 +
 +The name of the list, repeated for each row in the list.  This name can optionally be referenced in the POWERSEARCH option to select a particular list of usages.  This allows different applications to use different sets of PowerSearch instructions.  Omnidex is shipped with only one list, named DEFAULT, which is used if no list is provided.
 +
 +== $USAGE ==
 +
 +The name of the usage as declared for the column in the CREATE TABLE statement.
 +
 +== $PARSING ==
 +
 +This column is reserved for future use and is not used in this version of Omnidex.
 +
-|MALE_GIVEN_NAMES|Variations for male-only given names.|
-|EXPANDED_FEMALE_GIVEN_NAMES|Expanded variations for male or female given names.  This list is a superset of FEMALE_GIVEN_NAMES, and also includes more unusual variations, as well as correlated nicknames sharing the same base name (eg. Elizabeth = Liz and Beth, therefore Liz = Beth).|
-|EXPANDED_MALE_GIVEN_NAMES|Expanded variations for male or female given names.  This list is a superset of MALE_GIVEN_NAMES, and also includes more unusual variations, as well as correlated nicknames sharing the same base name (eg. William = Bill and Will, therefore Bill = Will).|
-|SURNAMES|Variations for surnames.  Generally combined with use of phonetic functions.|
-|ALL_GIVEN_NAMES*|All variations of given names.  This list is a composite of FEMALE_GIVEN_NAMES and MALE_GIVEN_NAMES.|
-|EXPANDED_GIVEN_NAMES*|All variations of expanded given names.  This list is a composite of EXPANDED_FEMALE_GIVEN_NAMES and EXPANDED_MALE_GIVEN_NAMES.|
-|ALL_PROPER_NAMES*|All variations of proper names.  This list is a composite of all lists in this section.|
-|  \\  **Acronyms**\\  \\  ||
-|GENERAL_ACRONYMS|General acronyms found in general use.|
-|ORGANIZATION_ACRONYMS|Acronyms for companies and organizations.|
-|GOV_ABBR_GPO|Government acronyms from the Government Printing Office.|
-|GOV_ABBR_IUPUI|Government acronyms from Indiana University/Perdue University.|
 +== $FORMAT ==
 +
 +This column is reserved for future use and is not used in this version of Omnidex.
 +
 +== $POWERSEARCH ==
 +
 +The replacement string to be substituted in the WHERE clause when the POWERSEARCH option is used.  There are special tokens that can be referenced in this field:
 +
 +  * %COLUMN% - This token is replaced with the name of the column applying this usage.
 +  * %CRITERIA% - This token is replaced with the criteria passed in the WHERE clause for this column.
 +
 +Typically, the replacement string uses a [[dev:sql:functions:contains|$CONTAINS]] clause that applies synonyms, misspellings or other Omnidex features.  The use of a $CONTAINS clause is not required, though.  While not every SQL construct is supported, administrators can use a wide variety of SQL clauses in this section, including complex, parenthesized, Boolean predicates, nested queries, etc.
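
The token substitution described for $POWERSEARCH can be pictured with a short Python sketch. This is only an illustration of the idea, not how Omnidex implements it: the function name `expand_usage` is hypothetical, the template text is an invented example that follows the `'SYNONYMS=list'` syntax shown elsewhere on this page, and placing %CRITERIA% inside single quotes is an assumption.

```python
def expand_usage(template: str, column: str, criteria: str) -> str:
    """Substitute the %COLUMN% and %CRITERIA% tokens in a usage template."""
    # Assumption: the template wraps %CRITERIA% in single quotes, so any
    # single quote inside the criteria is doubled, SQL-style.
    safe_criteria = criteria.replace("'", "''")
    return template.replace("%COLUMN%", column).replace("%CRITERIA%", safe_criteria)


# Hypothetical $POWERSEARCH entry, for illustration only.
template = "$CONTAINS(%COLUMN%, '%CRITERIA%', 'SYNONYMS=ALL_GIVEN_NAMES')"
print(expand_usage(template, "INDIVIDUALS.NAME", "William"))
```

With these inputs the sketch prints the template with the column name and the criteria filled in, which is the kind of predicate the WHERE clause would then carry.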
-|EMAIL_CHAT_ACRONYMS|Acronyms commonly used in emails and chat rooms.|
-|ALL_ACRONYMS*|All acronyms.  This list is a composite of all lists in this section.|
-|  \\  **Abbreviations**\\  \\  ||
-|MEASURE_BASIC_ABBR|Abbreviations for basic measures|
-|MEASURE_LENGTH_ABBR|Abbreviations for measurements of lengths.|
-|MEASURE_AREA_ABBR|Abbreviations for measurements of area.|
-|MEASURE_LIQUID_VOLUME_ABBR|Abbreviations for measurements of liquid volume.|
 +
 +== $COMMENTS ==
 +
 +Comments that are useful to the administrator to document this usage.
 +
 +
 +
-|MEASURE_DRY_VOLUME_ABBR|Abbreviations for measurements of dry volume.|
-|MEASURE_WEIGHT_ABBR|Abbreviations for measurements of weight.|
-|MEASURE_ENERGY_ABBR|Abbreviations for measurements of energy.|
-|MEASURE_TIME_ABBR|Abbreviations for measurements of time.  Note that this list does not include the standard abbreviations for the days of the week and the months.  Those abbreviations are found in their respective lists.|
-|TIME_PERIODS|Mnemonics for common time periods, correlated to their appropriate date range.|
-|MONTH_ABBR|Standard abbreviations for the months of the year.|
-|DAYS_OF_WEEK_ABBR|Standard abbreviations for the days of the week.|
-|TIME_ZONE_ABBR|Standard abbreviations for the worldwide time zones.|
-|ALL_MEASURES*|All abbreviations of measurements.  This list is a composite of MEASURE_BASIC_ABBR, MEASURE_LENGTH_ABBR, MEASURE_AREA_ABBR, MEASURE_LIQUID_VOLUME_ABBR, MEASURE_DRY_VOLUME_ABBR, MEASURE_WEIGHT_ABBR, MEASURE_ENERGY_ABBR and MEASURE_TIME_ABBR.|
-|ALL_CALENDAR*|All calendar-related abbreviations.  This list is a composite of MONTH_ABBR and DAYS_OF_WEEK_ABBR.|
-|ALL_TIME*|All time-related abbreviations.  This list is a composite of MEASURE_TIME_ABBR, MONTH_ABBR, DAYS_OF_WEEK_ABBR and TIME_ZONE_ABBR.|
-|ALL_ABBREVIATIONS*|All abbreviations.  This list is a composite of all lists in this section.|
-|  \\  **Medical**\\  \\  ||
 +==== Custom Synonym Lists ====
 +
 +Synonym lists are an important tool for PowerSearch.  Synonym lists allow your search terms to be automatically expanded to make your search more flexible.  The examples on the previous pages used many synonym lists, such as Given Names, Postal Abbreviations, City Names and Area Codes.  Omnidex provides a base set of synonym lists as part of the product, but it is also possible to create your own synonym lists.
 +
 +Synonym lists are usually limited to a specific topic, such as Given Names or City Abbreviations.  It is important to watch for cross-overs between synonyms.  For example, it would be prudent to keep city, state and country abbreviations in separate lists.  LA as a city code is an abbreviation for Los Angeles; however, as a state code, it is an abbreviation for Louisiana.  In these situations, each column should use its own managed synonym list so that overlap does not occur.
 +
 +=== The Synonym List Library ===
 +
 +Before creating your own synonym lists, be sure to check the [[appendix:synonyms|synonym lists]] that are provided with Omnidex.  The list you need may have already been created, or there may be a similar list that you can use as a starting point.  If you do create your own synonym list, consider whether it would benefit the broader Omnidex community.  If you would like to submit a synonym list for inclusion in the product, simply send it to [[appendix:contactus|Technical Support]] with a note saying that you are contributing it to the product.
 +
 +=== The Synonym List File Layout ===
 +
 +Creating a new synonym list is straightforward.  It is simply a tab-delimited file with four columns.  It can be created using a text editor such as 'Notepad' on Windows or 'vi' on UNIX.  It can also be created in a spreadsheet program such as Microsoft Excel, which allows a file to be saved as a Tab-Delimited File.
-|MEDICAL_ABBR|Common medical abbreviations.|
-|DRUG_BRANDS_DISCN|Correlation of discontinued drug brands and their ingredients from the FDA's Orange Book.|
-|DRUG_BRANDS_OTC|Correlation of over-the-counter drug brands and their ingredients from the FDA's Orange Book.|
-|DRUG_BRANDS_RX|Correlation of prescription drug brands and their ingredients from the FDA's Orange Book.|
-|DRUG_INGR_DISCN|Correlation of the ingredients of discontinued drugs and their brands from the FDA's Orange Book.|
-|DRUG_INGR_OTC|Correlation of the ingredients of over-the-counter drugs and their brands from the FDA's Orange Book.|
-|DRUG_INGR_RX|Correlation of the ingredients of prescription drugs and their brands from the FDA's Orange Book.|
-|COMMON_DRUGS|Correlation of common drug names and their generic equivalents.|
-|ALL_DRUG_BRANDS*|All drugs from FDA Orange Book, by brand.  This list is a composite of DRUG_BRANDS_DISCN, DRUG_BRANDS_OTC and DRUG_BRANDS_RX.|
-|ALL_DRUG_INGR*|All drugs from FDA Orange Book, by ingredient.  This list is a composite of DRUG_INGR_DISCN, DRUG_INGR_OTC and DRUG_INGR_RX.|
-|FDA_ORANGE_BOOK*|All drugs from FDA Orange Book, by brand and ingredient.  This list is a composite of DRUG_BRANDS_DISCN, DRUG_BRANDS_OTC, DRUG_BRANDS_RX, DRUG_INGR_DISCN, DRUG_INGR_OTC and DRUG_INGR_RX.|
-|ALL_DRUGS*|All drugs from FDA Orange Book plus other lists of drugs.  This list is a composite of DRUG_BRANDS_DISCN, DRUG_BRANDS_OTC, DRUG_BRANDS_RX, DRUG_INGR_DISCN, DRUG_INGR_OTC, DRUG_INGR_RX and COMMON_DRUGS.|
-|ALL_MEDICAL*|All medical synonyms.  This list is a composite of all lists in this section.|
-|  \\  **Science**\\  \\  ||
-|CERN_ABBR|Acronyms and abbreviations appropriate for the CERN environment.|
-|BASIC_ELEMENTS|Correlation of abbreviations and elements from the Periodic Table of Elements|
-|COMMON_CHEMICAL_COMPOUNDS|Correlation of composition and names for common chemical compounds.|
-|ALL_SCIENCE*|All science synonyms.  This list is a composite of all lists in this section.|
-|  \\  **General**\\  \\  ||
-|ALL_SYNONYMS*|This list is a composite of all lists in the sections above.|
 +
 +The record layout of the synonym table consists of four columns:
 +
 +^ Column Name        ^ Datatype       ^
 +| $LIST              | CHARACTER(32)  |
 +| $WORD              | STRING(127)    |
 +| $REPLACEMENT       | STRING(4094)   |
 +| $COMMENTS          | STRING(255)    |
 +
 +
 +== $LIST ==
 +
 +The name of the list, repeated for each row in the list.  This name will be referenced in the $CONTAINS clause using the syntax, 'SYNONYMS=list'.  Be sure to choose a list name that does not conflict with an existing list in the library.
 +
 +== $WORD ==
 +
 +The word or phrase for which synonyms are being created.  Phrases should be enclosed in double quotes.
 +
 +== $REPLACEMENT ==
 +
 +The replacement string to be used as a synonym list.  Normally, this is a comma-delimited list of synonyms; however, it can be any of the following choices, intermingled and in any order.
 +
 +** Words and Phrases **
 +
 +Replacements can be a word or phrase to be used in place of the current word.  Note that for the synonyms to include the current word itself, it must be included in the replacement text.  Words may be separated by commas.  Phrases must be enclosed in double-quotation marks.  For example:
 +
 +^ $LIST    ^ $WORD ^ $REPLACEMENT                       ^ $COMMENTS ^
 +| CITIES   | LA    | LA, "Los Angeles"                  |           |
 +| CITIES   | SF    | SF, "San Francisco", "Santa Fe"    |           |
 +| STATES   | LA    | LA, Louisiana                      |           |
 +| STATES   | NM    | NM, "New Mexico"                   |           |
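
To make the comma and double-quote rules concrete, here is a minimal Python sketch that splits a $REPLACEMENT value of this kind into its individual words and phrases. It only covers the words-and-phrases form shown in the table; the function name `parse_replacement` is hypothetical, and the sketch says nothing about how Omnidex itself parses the field.

```python
import csv


def parse_replacement(replacement: str) -> list[str]:
    """Split a $REPLACEMENT value into individual words and phrases."""
    # skipinitialspace lets a quoted phrase follow ", " and still be treated
    # as quoted, so spaces inside a phrase are preserved.
    row = next(csv.reader([replacement], skipinitialspace=True))
    return [term.strip() for term in row if term.strip()]


print(parse_replacement('SF, "San Francisco", "Santa Fe"'))
```

The quoted phrases come back as single items, while the unquoted word `SF` stays on its own.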
 + 
 + 
 +** Qualification Criteria **
 +
 +Replacements can be qualification criteria, indicated by enclosing the entire replacement string in parentheses.  These criteria may include Boolean operators and nested parentheses.  For example:
 +
 +^ $LIST        ^ $WORD       ^ $REPLACEMENT                       ^ $COMMENTS ^
 +| DATE_RANGES  | FISCAL_2009 | (Between 7/1/2008 and 6/30/2009)   |           |
 +| DATE_RANGES  | FISCAL_2010 | (Between 7/1/2009 and 6/30/2010)   |           |
 +| DATE_RANGES  | FISCAL_2011 | (Between 7/1/2010 and 6/30/2011)   |           |
 + 
 +** Pointers to Other Entries **
 +
 +Replacements can be pointers to other entries within the same list.  Pointers are indicated by prefixing the word with a greater-than sign (>).  Pointers are allowed to be nested.  For example:
 +
 +^ $LIST        ^ $WORD       ^ $REPLACEMENT                          ^ $COMMENTS ^
 +| FIRST_NAMES  | FRED        | >Fredrick                             |           |
 +| FIRST_NAMES  | FREDRICK    | Fredrick, Fred, Rick                  |           |
 +| FIRST_NAMES  | RICK        | Rick, Ricky, Richard, Dick, >Fredrick |           |
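
One way to picture nested pointers is as a recursive lookup within the same list. The Python sketch below is an assumption-laden illustration, not Omnidex's actual algorithm: the function name `resolve`, the case-insensitive matching of $WORD, the removal of duplicates, and the guard against circular pointers are all choices made for the example. The entries are the FIRST_NAMES rows from the table above, read as comma-separated terms.

```python
def resolve(word: str, entries: dict[str, list[str]],
            _seen: frozenset = frozenset()) -> list[str]:
    """Expand one entry, following >pointers to other entries in the same list."""
    key = word.upper()            # assumption: $WORD is matched case-insensitively
    if key in _seen:              # guard against circular pointers
        return []
    result: list[str] = []
    for term in entries.get(key, []):
        if term.startswith(">"):
            expanded = resolve(term[1:], entries, _seen | {key})
        else:
            expanded = [term]
        for item in expanded:
            if item not in result:    # keep the first occurrence only
                result.append(item)
    return result


first_names = {
    "FRED": [">Fredrick"],
    "FREDRICK": ["Fredrick", "Fred", "Rick"],
    "RICK": ["Rick", "Ricky", "Richard", "Dick", ">Fredrick"],
}
print(resolve("Fred", first_names))
print(resolve("Rick", first_names))
```

Under these assumptions, FRED expands to everything listed under FREDRICK, and RICK expands to its own terms followed by the FREDRICK terms it did not already contain.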
 + 
 +== $COMMENTS == 
 + 
 +Comments that are useful to the administrator in documenting this synonym. 
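
As an alternative to a text editor or spreadsheet, a list in this four-column layout could also be generated by a script. The following Python sketch writes rows as tab-delimited text. The function name `write_synonym_list`, the file name `cities_example.txt`, the UTF-8 encoding and the absence of a header row are all assumptions made for illustration; check an existing list in the library for the exact conventions before relying on them.

```python
import tempfile
from pathlib import Path


def write_synonym_list(path: Path, rows: list[tuple[str, str, str, str]]) -> None:
    """Write ($LIST, $WORD, $REPLACEMENT, $COMMENTS) rows as tab-delimited text."""
    lines = []
    for row in rows:
        # A tab or newline inside a field would break the column layout.
        if any("\t" in field or "\n" in field for field in row):
            raise ValueError(f"field contains a tab or newline: {row!r}")
        lines.append("\t".join(row))
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


rows = [
    ("CITIES", "LA", 'LA, "Los Angeles"', "city abbreviation"),
    ("CITIES", "SF", 'SF, "San Francisco", "Santa Fe"', ""),
]
# Hypothetical output location, used only for this example.
target = Path(tempfile.gettempdir()) / "cities_example.txt"
write_synonym_list(target, rows)
```

Writing the fields directly, rather than through a CSV writer, keeps the double quotes around phrases exactly as typed.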
 + 
 +=== Installing a New Synonym List === 
 + 
 +Installing a new synonym list is as simple as saving the file in the synonym directory.  The synonym directory is as follows:
 + 
 +Windows:
 +
 +  %OMNIDEX_HOME%\config\english\synonyms
 +
 +UNIX:
 +
 +  $OMNIDEX_HOME/config/english/synonyms
 + 
 +After the file has been installed, it must be indexed.  This is done using the following command: 
 + 
 +Windows: 
 + 
 +  cd %OMNIDEX_HOME%\config 
 +  build_config.bat 
 + 
 +UNIX:
 +
 +  cd $OMNIDEX_HOME/config
 +  build_config.ksh
 + 
 +=== Testing a New Synonym List ===
 +
 +Once a synonym list has been installed, it can be tested using the following commands in OdxSQL.  In this example, replace the values in angle brackets with values appropriate to your database.
 +
 +  connect <environment>
 +  lookup $contains(<table>.<column>, 'criteria', 'SYNONYMS=<list>')
 +
 +As an example, this same command is shown running against the ALL_GIVEN_NAMES synonym list:
 +
 +  > connect simple
 +  Connected to D:\class\lab2\simple.xml
 +  > lookup $contains(individuals.name, 'William', 'synonyms=ALL_GIVEN_NAMES')
 +    IN (William, Bill, Billy, Will, Williams, Willie, Willis, Wilson)
  
-$LIST                                CHARACTER(32) 
-$WORD                                C STRING(127) 
-$REPLACEMENT ​                        C STRING(4094) 
-$COMMENTS ​                           C STRING(255) 
  
  
 
admin/indexing/powersearch/custom.1295893759.txt.gz · Last modified: 2016/06/28 22:38 (external edit)