This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
admin:indexing:text:relevancy [2012/01/31 23:03] doc created |
admin:indexing:text:relevancy [2016/06/28 22:38] (current) |
||
---|---|---|---|
Line 10: | Line 10: | ||
[[admin:indexing:text:retrieve|External Files]] | | [[admin:indexing:text:retrieve|External Files]] | | ||
[[admin:indexing:text:proximity|Proximity Searches]] | | [[admin:indexing:text:proximity|Proximity Searches]] | | ||
- | [[admin:indexing:text:contains|Advanced Searches]] | | + | [[admin:indexing:text:advanced|Advanced Searches]] | |
[[admin:indexing:text:results|Displaying Results]] | | [[admin:indexing:text:results|Displaying Results]] | | ||
**[[admin:indexing:text:relevancy|Relevancy]]** | **[[admin:indexing:text:relevancy|Relevancy]]** | ||
Line 18: | Line 18: | ||
==== Relevancy ==== | ==== Relevancy ==== | ||
- | Proximity searches can qualify rows with large blocks of text using Phrase Searches, BEFORE Searches and NEAR Searches. Once the rows are qualified, the obvious next step is to display the results. This can be more difficult if the text is as long as an entire book, or some other large block of text. | + | When searching large blocks of text, relevancy becomes more important. If the search criteria occur many times in a large block of text, that text can be considered more relevant than if the criteria occur only once. When multiple words are nearer to each other, or occur nearer to the beginning of the block of text, the text can also be considered more relevant. Omnidex provides a $SCORE function that returns relevancy scores based on these considerations. Rows can be filtered or ordered based on relevancy scores so that the most valuable blocks of text are shown first. |
- | + | ||
- | Omnidex allows excerpts to be retrieved from large blocks of text to make viewing easier. These excerpts show the portions of the text that qualified the row, with the search terms highlighted. | + | |
- | + | ||
- | Excerpts are retrieved using the $CONTEXT function. The $CONTEXT function works hand-in-hand with the $CONTAINS function. The $CONTAINS function is used to label a particular search, and the $CONTEXT function retrieves excerpts for that same label. This is needed because there may be other criteria in the SQL statement, and even multiple Proximity Searches against multiple columns and tables. Only one Proximity Search can feed these excerpts, necessitating the pairing of the $CONTAINS function and the $CONTEXT function. | + |
=== $SCORE Function === | === $SCORE Function === | ||
- | The [[dev:sql:functions:context|$CONTEXT]] function retrieves excerpts of a text field based on a paired $CONTAINS function. By default, a simple excerpt is displayed; however, options exist to allow embedding HTML tags to highlight the search terms for easy display in a web environment. | + | The [[dev:sql:functions:score|$SCORE]] function returns a relevancy score for a text field based on a paired $CONTAINS function. The score is a number between 1 and 100, with 100 representing the highest relevancy. Note that the $SCORE function only returns a meaningful relevancy score when paired with a $CONTAINS function; otherwise, it always returns a score of 100. |
<code> | <code> | ||
- | > select TITLE, | + | > select $score, |
+ | >> TITLE, | ||
>> $context | >> $context | ||
>> from BOOKS | >> from BOOKS | ||
- | >> where $contains(CONTENT, 'missisipi', 'misspellings'); | + | >> where $contains(CONTENT, '(place near(25) home)'); |
+ | |||
+ | $SCORE | ||
+ | -------------------------------- | ||
TITLE | TITLE | ||
----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ||
$CONTEXT(BOOKS.CONTENT) | $CONTEXT(BOOKS.CONTENT) | ||
----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ||
- | Around the World in Eighty Days | + | 49.410000 |
- | --- at Nauvoo, on the *Mississippi*, numbering twenty-five thousand --- | + | The Wonderful Wizard of Oz |
- | >> night it crossed the *Mississippi* at Davenport, and by --- | + | --- There is no *place* like *home*." --- |
+ | 37.210000 | ||
The Adventures of Tom Sawyer | The Adventures of Tom Sawyer | ||
- | --- a point where the *Mississippi* River was a trifle --- and saw the broad | + | --- I want to go *home*." "But, Joe, there ain't such another |
- | >> *Mississippi* rolling by! --- | + | >> swimming-*place* anywhere." --- |
- | 2 rows returned | + | |
- | </code> | + | |
- | Excerpts can be easily formatted for display using HTML, including assigning CSS classes as needed: | + | 36.400000 |
- | + | ||
- | <code> | + | |
- | > select TITLE, | + | |
- | >> $context(255, 'STYLE=HTML CLASSES') | + | |
- | >> from BOOKS | + | |
- | >> where $contains(CONTENT, 'missisipi', 'misspellings'); | + | |
- | TITLE | + | |
- | ----------------------------------------------------------------------------- | + | |
- | $CONTEXT(BOOKS.CONTENT) | + | |
- | ----------------------------------------------------------------------------- | + | |
Around the World in Eighty Days | Around the World in Eighty Days | ||
- | --- at Nauvoo, on the <span class="odx_word">Mississippi</span>, numbering | + | --- travelled nor stayed from *home* overnight, he felt...this would be the |
- | >> twenty-five thousand --- night it crossed the <span | + | >> *place* he was after. --- which was to take *place* the next...found him |
- | >> class="odx_word">Mississippi</span> at Davenport, and by --- | + | >> not at *home*. --- |
- | The Adventures of Tom Sawyer | + | 3 rows returned |
- | --- a point where the <span class="odx_word">Mississippi</span> River was a | + | |
- | >> trifle --- and saw the broad <span class="odx_word">Mississippi</span> | + | |
- | >> rolling by! --- | + | |
- | 2 rows returned | + | |
</code> | </code> | ||
- | If the statement contains multiple $CONTAINS functions, they should be labelled with distinct names, and the $CONTEXT should reference the appropriate $CONTAINS clause. The excerpts will be created based on that column's criteria. | + | If the statement contains multiple $CONTAINS functions, they should be labelled with distinct names, and the $SCORE function should reference the appropriate $CONTAINS label. The relevancy score will be calculated based on that column's criteria. |
<code> | <code> | ||
- | select TITLE, | + | > select $score(, 'CONTENT') RELEVANCY, |
- | $CONTEXT(255, 'STYLE=TEXT', 'CONTENT') | + | >> TITLE, |
- | from BOOKS | + | >> $context(255, 'STYLE=TEXT', 'CONTENT') |
- | where $contains(LANGUAGE, 'English',, 'LANGUAGE') and | + | >> from BOOKS |
- | $contains(CONTENT, 'missisipi', 'misspellings', 'CONTENT'); | + | >> where $contains(LANGUAGE, 'English',, 'LANGUAGE') and |
+ | >> $contains(CONTENT, 'magic',, 'CONTENT') | ||
+ | >> order by RELEVANCY desc; | ||
+ | RELEVANCY | ||
+ | -------------------------------- | ||
TITLE | TITLE | ||
----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ||
$CONTEXT(BOOKS.CONTENT) | $CONTEXT(BOOKS.CONTENT) | ||
----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ||
+ | 83.350000 | ||
+ | The Wonderful Wizard of Oz | ||
+ | --- The *Magic* Art of the Great --- will use all the *magic* arts I know of | ||
+ | >> --- me to use my *magic* power to send you --- and then by her *magic* | ||
+ | >> arts made the iron --- | ||
+ | |||
+ | 56.380000 | ||
Around the World in Eighty Days | Around the World in Eighty Days | ||
- | --- at Nauvoo, on the *Mississippi*, numbering twenty-five thousand --- | + | --- transferred by some strange *magic* to the antipodes. --- interest, as |
- | >> night it crossed the *Mississippi* at Davenport, and by --- | + | >> if by *magic*; --- |
- | The Adventures of Tom Sawyer | + | |
- | --- a point where the *Mississippi* River was a trifle --- and saw the broad | + | 52.360000 |
- | >> *Mississippi* rolling by! --- | + | Alice's Adventures in Wonderland |
- | 2 rows returned | + | --- for Alice, the little *magic* bottle had now had --- |
- | > | + | 52.360000 |
- | </code> | + | Hamlet |
+ | --- thrice infected, Thy natural *magic* and dire property, On --- | ||
+ | 4 rows returned</code> | ||
===== ===== | ===== ===== | ||
- | **[[admin:indexing:text:contains|Prev]]** | | + | **[[admin:indexing:text:results|Prev]]** |
- | **[[admin:indexing:text:relevancy|Next]]** | + | |
====== Additional Resources ====== | ====== Additional Resources ====== |
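The Relevancy section notes that rows can be filtered or ordered by relevancy score. As a rough client-side illustration, the Python sketch below applies both steps to the scores and titles shown in the 'magic' example output. The function name ''rank_by_relevancy'' and the cut-off value of 55.0 are hypothetical, chosen only for illustration; they are not part of Omnidex, and in practice the ordering would normally be done in SQL with ''order by'', as in the example.

```python
from typing import List, Tuple

# (relevancy score, title) pairs copied from the 'magic' example output.
# They are deliberately listed out of order so the sort is visible.
ROWS: List[Tuple[float, str]] = [
    (52.36, "Hamlet"),
    (83.35, "The Wonderful Wizard of Oz"),
    (52.36, "Alice's Adventures in Wonderland"),
    (56.38, "Around the World in Eighty Days"),
]


def rank_by_relevancy(
    rows: List[Tuple[float, str]], min_score: float = 0.0
) -> List[Tuple[float, str]]:
    """Drop rows scoring below min_score, then sort highest score first.

    min_score is a hypothetical cut-off, not an Omnidex setting.
    sorted() is stable, so rows with equal scores keep their input order.
    """
    kept = [row for row in rows if row[0] >= min_score]
    return sorted(kept, key=lambda row: row[0], reverse=True)


if __name__ == "__main__":
    for score, title in rank_by_relevancy(ROWS, min_score=55.0):
        print(f"{score:6.2f}  {title}")
```

With a cut-off of 55.0, only the two highest-scoring books remain, listed with the most relevant first.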