Omnidex Text

 

$RETRIEVE_FILE

Some databases may be used to catalog a collection of external files. In these situations, the database contains a series of columns such as title, authorship and filename while the actual file is stored outside the database.

$RETRIEVE_FILE allows text files external to the database to be indexed in the same way as any other textual column. This function is used as part of a pseudocolumn in the Omnidex environment catalog, with the data type commonly declared as a CLOB or C STRING.

The $RETRIEVE_FILE function returns a buffer containing the contents of an external file, using the data type and length specified in the parameters. If no parameters are specified, the default data type and length are used.

The pseudocolumn can also be used in the WHERE clause or as a select-item of a SELECT statement, although the latter is less common. Usually, the existing application already has an established way of retrieving the file by its filename, so there is no need for Omnidex to pass the content through a SQL statement.

 

Syntax

$RETRIEVE_FILE(filename[, datatype[, length[, options]]])

$RETRIEVE_FILE
Required.

filename
Required. Can be a string literal, a column or an expression, and must contain the filename to retrieve.

datatype
Optional. The data type to be used for retrieving the file's content. Typically a CLOB or C STRING is used to retrieve ASCII data such as text and HTML, and a BLOB is used to retrieve binary data such as Microsoft Word and Adobe PDF documents. Alternatively, a CLOB can be used to retrieve the text from Microsoft Word and Adobe PDF documents if the EXTRACT_TEXT option is used.
Data types are specified in textual form and may be given with or without a length. If no data type is specified, CLOB is presumed.

length
Optional. The length to be used when retrieving the file's content. A length may also be specified in the datatype parameter using the standard Omnidex syntax, for example C STRING(50KB). If no length is provided in either place, the length defaults to 64KB.

options
Optional. Options to be applied to retrieving this file.
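
The defaulting rules for datatype and length can be illustrated with a short Python sketch. This is not Omnidex code, only a model of the rules stated above. The function names are hypothetical, and two details are assumptions not confirmed by this page: that 1KB means 1024 bytes, and that an explicit length parameter takes precedence over a length embedded in the datatype.

```python
import re
from typing import Optional, Tuple

DEFAULT_DATATYPE = "CLOB"    # default data type named on this page
DEFAULT_LENGTH = 64 * 1024   # 64KB default, assuming 1KB = 1024 bytes
_UNITS = {"": 1, "KB": 1024, "MB": 1024 ** 2}  # assumed multipliers

_SIZE = re.compile(r"^\s*(\d+)\s*(KB|MB)?\s*$", re.IGNORECASE)
_TYPE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*(?:\(([^)]*)\))?\s*$")


def parse_size(text: str) -> int:
    """Convert a size such as '50KB' or '255' to a number of bytes."""
    match = _SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[(match.group(2) or "").upper()]


def effective_type(datatype: Optional[str] = None,
                   length: Optional[str] = None) -> Tuple[str, int]:
    """Return the (data type, length in bytes) that would apply."""
    name, embedded = DEFAULT_DATATYPE, None
    if datatype is not None:
        match = _TYPE.match(datatype)
        if match is None:
            raise ValueError(f"unrecognized data type: {datatype!r}")
        name, embedded = match.group(1).upper(), match.group(2)
    if length is not None:
        # Assumption: an explicit length parameter wins over an embedded one.
        return name, parse_size(length)
    if embedded is not None:
        return name, parse_size(embedded)
    return name, DEFAULT_LENGTH
```

Under these assumptions, effective_type("C STRING(50KB)") yields a C STRING of 51200 bytes, and calling effective_type() with no arguments yields a CLOB of 65536 bytes.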

 

 

Options

EXTRACT_TEXT
Extract the text from the file, rather than returning the exact contents of the file. Strips formatting and other non-printable characters.

AUTO_EXTENSION
If the passed filename does not exist, AND the passed filename does not contain an extension (suffix), AND exactly one file exists in the specified directory with the same name plus an extension, use that file. This option allows filenames to be stored without an extension as long as only one matching file exists.

STOPWORDS=
Use the STOPWORDS list identified by this option.

INCLUDED_HTML_TAGS=
Use the INCLUDED_HTML_TAGS list identified by this option.

EXCLUDED_HTML_TAGS=
Use the EXCLUDED_HTML_TAGS list identified by this option.

INCLUDED_XML_TAGS=
Use the INCLUDED_XML_TAGS list identified by this option.

EXCLUDED_XML_TAGS=
Use the EXCLUDED_XML_TAGS list identified by this option.

PARSE
Parse the keywords from the text and discard all white space and punctuation.
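
The three AUTO_EXTENSION conditions described above can be read as a short decision procedure. The Python sketch below is an illustration only, not the Omnidex implementation. The function name resolve_auto_extension is hypothetical, and the sketch assumes that names are matched exactly (case-sensitively) and that "the specified directory" is the directory portion of the passed filename.

```python
from pathlib import Path
from typing import Optional


def resolve_auto_extension(filename: str) -> Optional[Path]:
    """Return the file to read under the AUTO_EXTENSION rules, or None."""
    path = Path(filename)
    # An existing file is always used as-is.
    if path.is_file():
        return path
    # The fallback applies only to filenames without an extension.
    if path.suffix:
        return None
    directory = path.parent
    if not directory.is_dir():
        return None
    # Collect files named "<name>.<extension>" in the same directory.
    candidates = [
        entry for entry in directory.iterdir()
        if entry.is_file() and entry.stem == path.name and entry.suffix
    ]
    # Only an unambiguous single match qualifies.
    return candidates[0] if len(candidates) == 1 else None
```

In this sketch, a stored filename of "report" resolves to "report.txt" when that is the only candidate, but resolves to nothing if both "report.txt" and "report.html" are present.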

 

Example

table "CATALOG"

  column "FILENAME" datatype C STRING(255)
  column "CONTENT"  datatype CLOB(16MB)
      as "$retrieve_file(FILENAME)"
