This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
admin:indexing:powersearch:sql [2012/01/30 18:43] doc |
admin:indexing:powersearch:sql [2016/06/28 22:38] (current) |
||
---|---|---|---|
Line 23: | Line 23: | ||
In most scenarios, the key to implementing PowerSearch is to simply use the POWERSEARCH option on a SELECT statement. Alternatively, the SET POWERSEARCH ON command can be used to automatically activate PowerSearch for all queries on that connection. | In most scenarios, the key to implementing PowerSearch is to simply use the POWERSEARCH option on a SELECT statement. Alternatively, the SET POWERSEARCH ON command can be used to automatically activate PowerSearch for all queries on that connection. | ||
- | |||
- | The POWERSEARCH option works hand-in-hand with the data stored in the Omnidex Environment File. To use POWERSEARCH, administrators add a context to columns telling Omnidex what they contain. For example, a column named FIRST_NAME may have a character datatype, but that alone is not enough to tell Omnidex what the column contains. Moreover, Omnidex cannot look to the column name, as there are many different conventions for naming columns. Administrators add a USAGE clause that identifies the contents as given names, as shown in this example: | ||
<code> | <code> | ||
- | create table "HOUSEHOLDS" | + | > select name from individuals where name = 'William Cooke' with powersearch; |
- | physical "dat\households.dat" | + | |
- | ( | + | |
- | "HOUSEHOLD" CHARACTER(12) omnidex, | + | |
- | "ADDRESS" CHARACTER(50) quicktext usage "STREET_ADDRESS", | + | |
- | "CITY" CHARACTER(28) quicktext usage "CITY", | + | |
- | "STATE" CHARACTER(2) omnidex usage "STATE", | + | |
- | "ZIP" CHARACTER(5) omnidex usage "ZIPCODE", | + | |
- | "COUNTRY" CHARACTER(2) omnidex usage "COUNTRY", | + | |
- | constraint HOUSEHOLDS_HOUSEHOLD_PK primary ("HOUSEHOLD"), | + | |
- | constraint HOUSEHOLDS_STATE_FK foreign ("STATE") references "STATES", | + | |
- | constraint HOUSEHOLDS_COUNTRY_FK foreign ("COUNTRY") references "COUNTRIES" | + | |
- | ) | + | |
- | in "simple.xml"; | + | |
+ | NAME | ||
+ | -------------------------------------------------- | ||
+ | BILLIE E COOK | ||
+ | MR. WILL COOK | ||
+ | </code> | ||
+ | |||
+ | === The USAGE Clause === | ||
+ | The POWERSEARCH option works hand-in-hand with the data stored in the Omnidex Environment File. To use POWERSEARCH, administrators add a context to columns telling Omnidex what they contain. For example, a column named NAME may have a character datatype, but that alone is not enough to tell Omnidex what the column contains. Moreover, Omnidex cannot look to the column name, as there are many different conventions for naming columns. Administrators add a USAGE clause that identifies the contents as given names, as shown in this example: | ||
+ | |||
+ | <code> | ||
create table "INDIVIDUALS" | create table "INDIVIDUALS" | ||
physical "dat\individuals.dat" | physical "dat\individuals.dat" | ||
Line 48: | Line 43: | ||
"INDIVIDUAL" CHARACTER(12) omnidex, | "INDIVIDUAL" CHARACTER(12) omnidex, | ||
"HOUSEHOLD" CHARACTER(12) omnidex , | "HOUSEHOLD" CHARACTER(12) omnidex , | ||
- | "NAME" CHARACTER(50) quicktext usage "NAME", | + | "NAME" CHARACTER(50) quicktext usage "NAME", |
"GENDER" CHARACTER(1) omnidex bitmap, | "GENDER" CHARACTER(1) omnidex bitmap, | ||
"BIRTHDATE" ANSI DATE omnidex, | "BIRTHDATE" ANSI DATE omnidex, | ||
- | "PHONE" CHARACTER(14) omnidex usage "PHONE", | + | "PHONE" CHARACTER(14) omnidex usage "PHONE", |
- | "EMAIL" CHARACTER(60) quicktext usage "EMAIL", | + | "EMAIL" CHARACTER(60) quicktext usage "EMAIL", |
constraint INDIVIDUALS_INDIVIDUAL_PK primary ("INDIVIDUAL"), | constraint INDIVIDUALS_INDIVIDUAL_PK primary ("INDIVIDUAL"), | ||
constraint INDIVIDUALS_HOUSEHOLD_FK foreign ("HOUSEHOLD") references "HOUSEHOLDS", | constraint INDIVIDUALS_HOUSEHOLD_FK foreign ("HOUSEHOLD") references "HOUSEHOLDS", | ||
Line 60: | Line 55: | ||
</code> | </code> | ||
- | Once these usages are applied, the POWERSEARCH option automatically expands the search: | + | Omnidex provides many different usages to meet the general needs of most applications; however, administrators can add custom usages to meet the specific needs of their application. The general usages provided by Omnidex include: |
- | <code> | + | * SALUTATION |
- | Without the PowerSearch option: | + | * FIRST_NAME |
+ | * MIDDLE_NAME | ||
+ | * LAST_NAME | ||
+ | * NAME | ||
+ | * COMPANY_NAME | ||
+ | * STREET_NUMBER | ||
+ | * STREET_DIRECTION | ||
+ | * STREET_NAME | ||
+ | * STREET_TYPE | ||
+ | * UNIT_TYPE | ||
+ | * UNIT_NUMBER | ||
+ | * CITY | ||
+ | * STATE | ||
+ | * COUNTRY | ||
+ | * ZIPCODE | ||
+ | * STREET_ADDRESS | ||
+ | * CITY_STATE_POSTAL | ||
+ | * ADDRESS | ||
+ | * PHONE_AREACODE | ||
+ | * PHONE_PREFIX | ||
+ | * PHONE_SUFFIX | ||
+ | * PHONE | ||
+ | |||
+ | * ... plus, you can add any custom usages needed for your application. | ||
- | |||
- | |||
- | WITH the POWERSEARCH option: | ||
- | |||
- | |||
- | </code> | ||
- | SALUTATION | ||
- | FIRST_NAME | ||
- | MIDDLE_NAME | ||
- | LAST_NAME | ||
- | NAME | ||
- | COMPANY_NAME | ||
- | STREET_NUMBER | ||
- | STREET_DIRECTION | ||
- | STREET_NAME | ||
- | STREET_TYPE | ||
- | UNIT_TYPE | ||
- | UNIT_NUMBER | ||
- | CITY | ||
- | STATE | ||
- | COUNTRY | ||
- | ZIPCODE | ||
- | STREET_ADDRESS | ||
- | CITY_STATE_POSTAL | ||
- | ADDRESS | ||
- | PHONE_AREACODE | ||
- | PHONE_PREFIX | ||
- | PHONE_SUFFIX | ||
- | PHONE | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | the $CONTAINS function in Omnidex SQL. This function provides access to most of the search capabilities shown on the previous page. | ||
- | |||
- | === The $CONTAINS Function === | ||
- | |||
- | The $CONTAINS function is used in the WHERE clause of a SELECT statement to provide criteria accompanied by special options. In traditional SQL statements, criteria is specified using a syntax of: | ||
- | |||
- | <code sql> | ||
- | select count(*) | ||
- | from LIST | ||
- | where FNAME = 'Bill'; | ||
- | </code> | ||
- | |||
- | The $CONTAINS function allows this same statement to be reworded as follows: | ||
- | |||
- | <code sql> | ||
- | select count(*) | ||
- | from LIST | ||
- | where $contains(FNAME, 'Bill'); | ||
- | </code> | ||
- | |||
- | The real advantage of the $CONTAINS function lies in the options that can be added. The $CONTAINS function is discussed in depth in the [[dev:sql:functions:contains:home|documentation]] on the function. Meanwhile, the following examples show how the options were used to provide each of the capabilities shown on the previous page. | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === First Name === | ||
- | |||
- | PowerSearch uses a given names synonym list to search for both the formal given name and the most common variations. For example: | ||
- | |||
- | |||
- | |< 85% 30% 70% >| | ||
- | ^ Given Name ^ Synonyms ^ | ||
- | |William |Bill, Billy, Will, Williams, Willie, Willis, Wilson | | ||
- | |James |Jim, Jimmy, Jimmie, Jay, Jaime, Jacob | | ||
- | |Elizabeth |Beth, Betty, Eliza, Elsa, Lisa, Liz, Liza | | ||
- | |Ann |Ana, Anita, Anna, Anne, Annette, Annie, Annmarie | | ||
- | |||
- | The SQL syntax for given name synonyms searches is shown below. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(FNAME, 'Bill', 'SYNONYMS=ALL_GIVEN_NAMES') ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Last Name === | ||
- | |||
- | PowerSearch uses phonetic algorithms to search for last names that sound similar, but are spelled differently. | ||
- | |||
- | Here are some examples: | ||
- | |||
- | |||
- | |< 85% 30% 70% >| | ||
- | ^ Last Name ^ Phonetic Equivalents ^ | ||
- | |Smith |Schmit, Schmidt, Schmitt, Smidt, Smit, Smith, Smithey, Smyth, Smythe | | ||
- | |Meyers |Meyers, Myers, Myres, Meiers, Moyers, Meers, Mayers, Maris | | ||
- | |Nelson |Nelsen, Nielsen, Nielson, Neilsen, Neilson, Nilsen, Nilson, Nilsson | | ||
- | |Williams |Willaims, Wiliams, Willimas, Wilmes, Willams, Willems | | ||
- | |||
- | The SQL syntax for phonetic last name searches is shown below. Note that phonetic searches require an index that uses the [[admin:indexing:types:options#phonetic|PHONETIC]] option. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(LNAME, 'Meyers', 'PHONETIC') ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Address === | ||
- | |||
- | PowerSearch uses a postal abbreviations synonym list, combined with misspelling and transposition algorithms, to search street addresses. | ||
- | |||
- | Here are some examples: | ||
- | |||
- | |||
- | |< 85% 30% 30% 40% >| | ||
- | ^ Category ^ Search Term ^ Equivalents ^ | ||
- | |Street Types |Ave, St, Blvd |Avenue, Street, Boulevard | | ||
- | |Street Directions |N, S, E, W |North, South, East, West | | ||
- | |Street Names (abbreviated) |1st, 2nd, 3rd, 4th, 5th |First, Second, Third, Fourth, Fifth | | ||
- | |Street Names (misspelled) |Canterbury |Cantaberry, Canterbury, Cantabury | | ||
- | |Unit Types |Apt, Ste, Rm |Apartment, Suite, Room | | ||
- | |Street Numbers |12345 |13245, 15324, 1234, 2345 | | ||
- | |||
- | |||
- | The SQL syntax for postal abbrevation synonym searches and misspelling searches is shown below. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(ADDRESS1, '825 Fifth Avenue', | ||
- | 'SYNONYMS=ALL_ADDRESS_LINES,MISSPELLINGS') ... | ||
- | </code> | ||
- | \\ | ||
- | ---- | ||
- | === City === | ||
- | |||
- | PowerSearch uses a cities synonym list, combined with misspelling algorithms, to search city names. For example: | ||
- | |||
- | |||
- | |||
- | |< 85% 30% 30% 40% >| | ||
- | ^ Category ^ Search Term ^ Equivalents ^ | ||
- | |Abbreviations |SF, LA, NY |San Francisco, Los Angeles, New York | | ||
- | |Misspellings |Milwaukee |Milwalkee, Milwauki, Milwalke | | ||
- | |||
- | The SQL syntax for city synonym searches and misspelling searches is shown below. | ||
- | |||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(CITY, 'NY', 'SYNONYMS=CITY_ABBR,MISSPELLINGS') ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Zip Code === | ||
- | |||
- | PowerSearch uses a geographic radius search to locate zip codes that are within a certain distance from the named zip code. This is an example of a PowerSearch search that does not use the $CONTAINS function. Instead, it uses the [[admin:features:geo:home|$DISTANCE function]] to perform the radius search. This search requires that the database have a table of postal codes that includes their geographic coordinates. | ||
- | |||
- | Here are some examples: | ||
- | |||
- | |< 85% 30% 30% 40% >| | ||
- | ^ Zip Code ^ Adjacent Zip Code ^ Distance ^ | ||
- | |10021\\ \\ \\ \\ \\ \\ |10162\\ 10155\\ 10028\\ 10044\\ 10052\\ 10022 |.48 miles\\ .64 miles\\ .67 miles\\ .73 miles\\ .80 miles\\ .82 miles | | ||
- | |||
- | The SQL syntax for zip code geographic radius searches is shown below. | ||
- | |||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where ZIP in | ||
- | (select ZIP | ||
- | from ZIPCODES | ||
- | where $distance((select LATITUDE, | ||
- | LONGITUDE | ||
- | from ZIPCODES | ||
- | where ZIP = '10022'), | ||
- | LATITUDE, | ||
- | LONGITUDE) <= 5) ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Phone Area Code === | ||
- | |||
- | PowerSearch uses an area code synonym list to locate area codes that are shared within the same metropolitan area. | ||
- | |||
- | Here are some examples: | ||
- | |||
- | |< 85% 30% 70% >| | ||
- | ^ Metropolitan Area ^ Area Codes ^ | ||
- | |New York, NY |212, 646, 917 | | ||
- | |Denver, CO |303, 720 | | ||
- | |Seattle, WA |206, 360, 253, 425, 564 | | ||
- | |||
- | The SQL syntax for telephone area code synonym searches is shown below. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(PHONE1_AREA, '917', 'SYNONYMS=AREACODES') ... | ||
- | </code> | ||
- | \\ | ||
- | ---- | ||
- | === Phone Prefix and Suffix === | ||
- | |||
- | PowerSearch uses a transposition search to locate telephone numbers that have been transposed or mistyped. | ||
- | |||
- | Here are some examples: | ||
- | |||
- | |< 85% 30% 70% >| | ||
- | ^ Phone Number ^ Transposition ^ | ||
- | |775-4866 | 755-4846 | | ||
- | |831-6584 | 381-8564 | | ||
- | |||
- | The SQL Syntax for telephone number transposition searches is shown below. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(PHONE1_PREFIX, '755', 'MISSPELLINGS MIN_SCORE=70') and | ||
- | $contains(PHONE1_SUFFIX, '4686', 'MISSPELLINGS MIN_SCORE=70') ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Email === | ||
- | |||
- | |||
- | PowerSearch uses a standard misspelling search to locate email addresses that have similar spellings. The SQL syntax for email misspelling searches is shown below. | ||
- | |||
- | <code sql> | ||
- | select ... | ||
- | from ... | ||
- | where $contains(EMAIL, 'wmeyers@med.cornell.edu', 'MISSPELLINGS MIN_SCORE=80') ... | ||
- | </code> | ||
- | |||
- | \\ | ||
- | ---- | ||
- | === Final SQL Statement === | ||
- | |||
- | When combined, the final SQL statement has many $CONTAINS functions which combine to provide the full power of PowerSearch: | ||
- | |||
- | <code sql> | ||
- | select count(*) | ||
- | from LIST | ||
- | where $contains(FNAME, 'Bill', 'SYNONYMS=ALL_GIVEN_NAMES') and | ||
- | $contains(LNAME, 'Meyers', 'PHONETIC') and | ||
- | $contains(ADDRESS1, '825 Fifth Avenue', | ||
- | 'SYNONYMS=ALL_ADDRESS_LINES,MISSPELLINGS') and | ||
- | $contains(CITY, 'NY', 'SYNONYMS=CITY_ABBR,MISSPELLINGS') and | ||
- | LIST.STATE = 'NY' and | ||
- | ZIP in | ||
- | (select ZIP | ||
- | from ZIPCODES | ||
- | where $distance((select LATITUDE, | ||
- | LONGITUDE | ||
- | from ZIPCODES | ||
- | where ZIP = '10022'), | ||
- | LATITUDE, | ||
- | LONGITUDE) <= 5) and | ||
- | $contains(PHONE1_AREA, '917', 'SYNONYMS=AREACODES') and | ||
- | $contains(PHONE1_PREFIX, '755', 'MISSPELLINGS MIN_SCORE=70') and | ||
- | $contains(PHONE1_SUFFIX, '4686', 'MISSPELLINGS MIN_SCORE=70') and | ||
- | $contains(EMAIL, 'wmeyers@med.cornell.edu', 'MISSPELLINGS MIN_SCORE=80'); | ||
- | </code> | ||