External Document Indexing

 

External Document Indexing parses and indexes the contents of documents that are stored outside a database. HTML files, ASCII files, Word documents, Excel spreadsheets, and many other types of documents can be indexed and then retrieved using Omnidex keyword retrievals. For example, a document containing the following paragraph:

At Dynamic Information Systems Corporation (DISC), our mission is to change and enhance the way organizations access, interact with, and analyze their data, to improve efficiency, and to help achieve organizational goals through advanced indexing technology.

would be indexed as follows:

("ACCESS", "ACHIEVE", "ADVANCED", "ANALYZE", "AND", "AT", "CHANGE", "CORPORATION", "DATA", "DISC", "DYNAMIC", "EFFICIENCY", "ENHANCE", "GOALS", "HELP", "IMPROVE", "INDEXING", "INFORMATION", "INTERACT", "IS", "MISSION", "ORGANIZATIONAL", "ORGANIZATIONS", "OUR", "SYSTEMS", "TECHNOLOGY", "TO", "THE", "THEIR", "THROUGH", "WAY", "WITH")

A keyword search using any of these words will qualify this document.

keywords: ANALYZE and TECH@
qualify count: 1
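The keyword list above can be approximated with a short C sketch. This is only an illustration and not the Omnidex parser: it assumes that a keyword is any run of letters or digits, that keywords are uppercased, and that duplicates are stored once. The function name extract_keywords and the MAX_KEYWORD_LEN limit are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYWORD_LEN 64  /* hypothetical limit for this sketch */

/* Collect the distinct, uppercased words of `text` into `out`.
 * Returns the number of keywords stored (at most `max_out`). */
size_t extract_keywords(const char *text,
                        char out[][MAX_KEYWORD_LEN],
                        size_t max_out)
{
    size_t count = 0;
    const char *p = text;

    while (*p != '\0') {
        char word[MAX_KEYWORD_LEN];
        size_t len = 0;
        size_t i;
        int seen = 0;

        /* Skip separators: anything that is not a letter or digit. */
        while (*p != '\0' && !isalnum((unsigned char)*p))
            p++;

        /* Copy one word, uppercasing it; overlong words are truncated. */
        while (*p != '\0' && isalnum((unsigned char)*p)) {
            if (len < MAX_KEYWORD_LEN - 1)
                word[len++] = (char)toupper((unsigned char)*p);
            p++;
        }
        if (len == 0)
            break;  /* only separators remained before the end of text */
        word[len] = '\0';

        /* Store the word only if it has not been seen before. */
        for (i = 0; i < count; i++) {
            if (strcmp(out[i], word) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && count < max_out)
            strcpy(out[count++], word);
    }
    return count;
}
```

Keywords are stored in order of first appearance. Sorting them, or matching a pattern such as TECH@ against them, is left out of the sketch.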

 

Limitations

There is currently a 64 Kbyte file size limitation.

To index a larger document, split it across multiple records in the external document table and set the START_OFFSET and STOP_OFFSET fields in each record so that each record covers a different part of the file.
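As a rough sketch of this workaround, the following C function computes one START_OFFSET and STOP_OFFSET pair per record for a file of a given size. It assumes zero-based byte offsets and an inclusive STOP_OFFSET; this page does not state either convention, so check them against your installation. The struct doc_chunk type and the plan_chunks function are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define MAX_CHUNK_BYTES (64L * 1024L)  /* the 64 Kbyte limit */

/* One record of the external document table, reduced to its offset fields. */
struct doc_chunk {
    long start_offset;  /* assumed: zero-based byte offset */
    long stop_offset;   /* assumed: inclusive */
};

/* Fill `chunks` with consecutive ranges of at most MAX_CHUNK_BYTES each.
 * Returns the number of ranges written (at most `max_chunks`). */
size_t plan_chunks(long file_size, struct doc_chunk *chunks, size_t max_chunks)
{
    size_t n = 0;
    long start = 0;

    while (start < file_size && n < max_chunks) {
        long stop = start + MAX_CHUNK_BYTES - 1;

        if (stop > file_size - 1)
            stop = file_size - 1;  /* the last range ends with the file */

        chunks[n].start_offset = start;
        chunks[n].stop_offset = stop;
        n++;
        start = stop + 1;
    }
    return n;
}
```

A fixed byte boundary can fall in the middle of a word, so in practice you may want to move each boundary back to the nearest whitespace before storing the offsets.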

 

 

Supported File Types

ASCII and HTML are supported on all platforms. Other supported file types are determined by the operating system's file system. For example, Microsoft Word documents (.DOC), Microsoft Excel spreadsheets (.XLS), and text files (.TXT) are all supported on a Windows NT machine.

Microsoft Word must be installed on the machine if Word documents are going to be indexed.

 

 

Maintenance

You can maintain the external document table in the same manner used to add new records. Any methods that call DBIPUT or OAINSERT will automatically update the indexes. However, it is important that you use the following guidelines when making changes to the documents:

If you are going to make changes to the contents of a document, you must delete the document record/row from the external document table prior to making the changes. This will remove the existing indexes. After making the changes, you can re-add the record to update the indexes with the changes.

If you are deleting a document, be sure to remove the indexes before deleting the document from the operating system. The document must still exist for the indexes to be removed.

Failure to remove the indexes before changing or deleting a document will leave the indexes out of sync with the documents and will require a reindex using DBINSTAL.

 

 

ODXEXTERNAL

ODXEXTERNAL is the Omnidex routine that is called for each document file from DBIPUT or OAINSERT during indexing operations. When ODXEXTERNAL is called, it calls the appropriate indexing routine for the specific type of file as defined in the external configuration file. If no custom routine is defined, the default ASCII text file parsing routine will be called.

char *odxexternal(short *dbname,
                  char *tablename,
                  char *column_name,
                  char *column_data_ptr,
                  int *column_length)

ODXEXTERNAL returns a pointer to the buffer containing space delimited keywords to be indexed. The number of bytes contained within the buffer is returned in the column_length parameter. If no keywords are returned, a NULL pointer is returned.
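The returned buffer can be walked as in the following C sketch, which relies only on what is stated above: the keywords are space delimited, the byte count comes from column_length, and a NULL pointer means there are no keywords. The for_each_keyword function and the keyword_fn callback type are hypothetical helpers, not part of the Omnidex API, and the sketch does not assume the buffer is NUL-terminated.

```c
#include <stddef.h>

/* Callback invoked once per keyword; `word` is NOT NUL-terminated. */
typedef void (*keyword_fn)(const char *word, size_t len, void *ctx);

/* Visit each space-delimited keyword in `buf`, which holds `length` bytes.
 * `fn` may be NULL if only the count is wanted.
 * Returns the number of keywords found. */
size_t for_each_keyword(const char *buf, int length, keyword_fn fn, void *ctx)
{
    size_t count = 0;
    int i = 0;

    if (buf == NULL || length <= 0)
        return 0;  /* NULL pointer: no keywords were returned */

    while (i < length) {
        int start;

        /* Skip any run of delimiters. */
        while (i < length && buf[i] == ' ')
            i++;

        /* Scan to the end of the current keyword. */
        start = i;
        while (i < length && buf[i] != ' ')
            i++;

        if (i > start) {
            if (fn != NULL)
                fn(buf + start, (size_t)(i - start), ctx);
            count++;
        }
    }
    return count;
}
```

A caller would pass the pointer returned by ODXEXTERNAL as buf and the value stored in column_length as length.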

The column_data_ptr parameter is an array of char pointers, char**. The first element point

When a BuildFast uses column pointers instead of full record images, only the column designated as the external link is required. In that case, OmniAccess may return a null pointer for the first column, causing the second element in column_data_ptr to be null. To prevent this, DBINSTAL may need to make sure that the Required Column Mask passed to OmniAccess before the BuildFast's serial read always flags column 1 if any column in the table uses the (;EX) option.


Dynamic Information Systems Corporation - Omnidex Version 3.8 Build 6 J15.03-Copyright © 2003
