Action disabled: source

DRAFT

Dbinstal: Internal Index Specification and Build Utility

Overview

DBINSTAL is an internal server based console utility that is used to install and build the Omnidex indexes on Omnidex Environments.

For Omnidex XML Catalogs, the Dbinstal functions are called directly by the Omnidex Administrator (OdxAdmin). Dbinstal can not be used directly to specify or populate indexes on an Omnidex XML Catalog.

Indexing options are specified during the DBINSTAL installation process and then built with the DBINSTAL Build command.

It is important to note that the two processes, installing the indexes and building the indexes, are two completely separate processes that do not necessarily have to be performed at the same time. This means that there are times when the indexes must be reinstalled but not necessarily rebuilt after upgrading to a more recent version of Omnidex. Please refer to the Release Notes to determine if indexes need to be re-installed or re-indexed after an version upgrade.

Running Dbinstal

DBINSTAL is an Omnidex executable that can be run by entering Dbinstal at a command window prompt.

os> dbinstal

It can optionally take two different command line arguments:

  1. An Omnidex Environment Catalog filespec
  2. Dbinstal index installation or index build script

Running Dbinstal with an Omnidex Environment Catalog

os> dbinstal c:\omnidex\demo\orders.env

Running Dbinstal with an installation and/or build script

os> dbinstal < orders.in

Commands

Operational Commands

The following is a list of commands that can be entered at the DBINSTAL Cmd: prompt during program operation.

System Commands - Execute a system command from within DBINSTAL. See below for more details.

Command Description Options Option Explanation
/ Return to higher level prompt
B Build Omnidex indexes [;buf=n][;nomr][;status] ;buf=n sets the buffer size where n is the number of MegaBytes, 2 - 512.
;nomr No Multi-Record Reads
;status [=file] Outputs indexing status information to the display or a file.
E Exit to the Environment prompt.
H or ? Display help information.
I Specify Indexes Specifies or replaces an existing Omnidex index.
INDEX Build Omnidex indexes - same as the B-Build command.
L Display a range of keys to the screen or a file.
M Enable or disable bitmap merges for an MDK table
MI Display a map of installed indexes.
P=parentname Link a child table to its parent at the Link column #: prompt.
QUIT Quit DBINSTAL. Can be issued at any prompt.
Set UTIL Enable or disable diagnostic info display {ON/OFF}
T=file Load a translation table.
V Display version and existing installation information.
X Loads an excluded words list. [!] Clears the Exclude Word List.
Z Erase and optionally delete the existing index files. [!] Deletes the physical index files from the operating system.

System Commands

System commands can be executed from within DBINSTAL by preceding them with one of the following characters:

! (exclamation point) executes the command as typed

: (colon) upshifts the command before executing

$ (dollar sign) upshifts the command before executing

Indexing Options

Database Level Indexing

Indexes installed at the database-level will apply to all indexes installed on this database.

T - TRANSLATE - Use a custom Translation Table for translating 8-bit extended ASCII characters. The translation table will be applied to all Omnidex indexes that are NOT installed with the ;NT (No Translate) option. To disable translation of 8-bit characters for any column, install the column with the ;NT - No Translate option. The translation table will be applied to all columns except those installed with the ;NT (No Translate) option. More Info

X - XCLUDE - to prevent words contained in an Excluded Words List from being indexed. The excluded words list will be applied to all Omnidex indexes that are NOT installed with the ;NE (No Exclude) option. To disable the excluded words list for any index, install the index with the ;NE - No Exclude option. The excluded words list will be applied to all indexes except those installed with the No Exclude option. More Info

Effective in version 4.0, No Translate and No Exclude are the default MDK indexing options.

Table-Level Indexing

As of Omnidex version 4, there is no need for any table-level indexing options. By default, all tables are installed with ;TRR (see below). The ;TRR option is still supported for backward compatibility but is not necessary.

TRR - Transparency with Rowid - The ;TRR option creates a 32-bit, well-behaved ODXID for each row in the table, and a cross-reference file to link these ODXIDs with the data.

Index-Level Indexing

Option Description
BI Batch Indexing
BM Bitmap Indexing
CI Case Insensitive
DI Distributed Index
EX External Document Index
Gn Grouping to group number
KW Keyword indexing
NE No Exclude - overrides use of Excluded Word for this index
NP No Parse
NT No Translate
RC Record Complex
SX Soundex
Type=t Typecasting
UK Unique Key
BI - Batch Indexing

The Batch Indexing (;BI) option disables the real-time updating of indexes as rows are added, deleted or updated. Batch Indexed indexes are updated only during a B or an INDEX operation.

The Batch Indexing option eliminates the overhead associated with updating indexes in real time when adding, deleting, or updating rows. For example, if data is entered during business hours, when online response is critical, it might be beneficial to install Batch Indexing on indexes that need not reflect the most current data, like a YTD (year to date) index in an ACCOUNTING table. Then a batch process could be run at night or on a weekend, when online response time is less critical, to update Batch Indexed indexes.

Although there are other ways to prevent real-time indexing, the Batch Indexing option gives the added benefit of affecting only selected indexes.

Install Batch Indexing only on indexes that need not reflect the most current data.

BM - Bitmap Indexing

Bitmap indexing stores the Omnidex MDK index as a bitmap. Used on low cardinality data of less than 30 unique values with well behaved Omnidex IDs. Ideal for Data Warehouse applications that do not need real time indexing.

A bitmap index will be created for each unique value in the column installed with the ;BM option, and each bitmap index will contain a bit for each row in the table. Therefore, the table must have a well-behaved rowid to efficiently utilize bitmap indexing.

  • Using the ;BM option will automatically invoke the DI, BI.
  • Used only for MDK indexing.
  • Most beneficial on tables with more than 100,000 rows

Because the Omnidex IDs must be a 32-bit integer, bitmap indexing can only be used on parent tables and DR child tables. A prejoined child table has a 64-bit integer ID and therefore cannot use bitmap indexing.

CI - Case Insensitivity

The CI option causes text searches on the column to be case insensitive. This is most useful for non-parsed MDK fields like a two-digit state field and ASK indexed fields for sorting.

It is not necessary to install ;CI on an MDK indexed field installed with ;KW because case insensitivity is inherent with keyword indexed fields.

DI - Distributed Indexes

Distributed Indexing causes each MDK index to be stored in a separate physical file instead of a single physical file.

  • DI indexing implies the ;NP option and no grouping; that is, ;DI cannot be used in conjunction with ;Gn.
  • Distributed Indexes can be up to 255 bytes where normally a non-parsed MDK Index can be only 32 bytes long.
  • DI keys work in parents and DR children, but not in RS children.
  • Distributed Indexing applies to only MDK Indexing.
  • Most useful on very large databases where the Omnidex indexes hit an operating system limit.

Use the Distributed indexing (;DI) option for the ability to exceed the 2 (or 4) Gbyte file size limit on operating systems with this file size limitation.

Use the Distributed indexing (;DI) option to utilize the efficiencies of multiple CPUs to reduce indexing time.

When indexing Distributed Indexes (;DI), use the Build @@[n] or @[n] option where n ranges from 1 to 16 and represents the number of Distributed Indexes to process per pass through the data. For example, if eight Distributed Indexes are specified and the @@ option is used (where n is not specified), then the index Build process would require eight passes through the internal extract file used for indexing. If the Build option is specified as @@n (where n is 4), then only two passes through the extract file are needed.

EX - External Document Indexing

External Document Indexing is used to parse and index the contents of text-based documents that are external to a database. HTML files, ASCII files, Word documents, Excel Spreadsheets, and many other types of documents can be indexed and easily retrieved using Omnidex keyword retrievals.

Gn - Grouping

Grouping is a very powerful feature that lets you combine several MDK indexes into one logical index. This saves programming effort by requiring only one call to oaqualify to search the entire group of indexes. For example, if you grouped the keyed columns COMPANY, CONTACT and TITLE together, you could search all three of these indexes by referencing any one of them in an oaqualify procedure call.

To group different fields together, append ;Gn(where n represents a group number you assigned) to their names when specifying them at a Keyword field: prompt. For example, to group the COMPANY, CONTACT and TITLE fields from the CUSTOMERS parent table together into a group numbered “3”, you would enter:

Column name: COMPANY;G3
Column name: CONTACT;G3
Column name: TITLE;G3

You can group indexes across tables in the same domain. For example, if the CUST_NOTES child table belonged to the CUSTOMERS domain, you could group CUST_NOTES MDK index ENTERED_BY with CUSTOMERS MDK indexes COMPANY, CONTACT and TITLE:

Table will be linked in the domain of CUSTOMERS . .

Column name: ENTERED_BY;G3

A row will qualify if the search criteria is found in any of the columns in the group.

When grouping indexes together, the following restrictions apply:

  • Any given index can only belong to one group.
  • You can have up to 200 groups per environment catalog.
  • You cannot group a column that you might reference as the to_column in an oajoin operation.
  • You cannot group binary and ASCII (CHARACTER) columns together.
  • Grouped indexes should share similar indexing options. For example, group No Parse indexes only with other No Parse indexes.
  • You should not group columns that contain similar data. For example, grouping an ORDER_DATE column and a SHIP_DATE column may produce confusing results when the index is searched.
KW - Keywording

Keyword indexing parses a text field, allowing keyword searches. This option applies to MDK indexes only.

By default, ;KW fields are

parsed, meaning each individual word is stored separately in the indexes and punctuation and white space are stripped out. case insensitive and are translated if a translation table is specified if an excluded words list is provided, “noise” or “stop” words are stripped out.

Example:

ID Company Name
-----------------------------------------------------

21 Dynamic Information Systems Corporation

When installed with the ;KW option, the following is stored in the indexes for this record:

21 CORPORATION
21 DYNAMIC
21 INFORMATION
21 SYSTEMS

This record can then be retrieved by including any of the 4 keywords as criteria.

NE - No Exclude

More Info

This option is the default as of Omnidex version 4.0 but is supported for backward compatibility. It causes DBINSTAL to ignore the excluded word list for this index. It also causes zeros to be indexed for binary MDK Indexes and blanks and nulls to be indexed for ASK Indexes.

The NE option applies to MDK and ASK Indexes.

  • When installed on ASCII MDK indexes, the No Exclude option (;NE) disables the excluded word list, and causes the indexing of every word in a keyed field (except for blanks, zeros or null entries).
  • Normally, excluded words are not indexed, but the No Exclude option causes them to be indexed for any index on which it is installed. For example, an indexed STATE column that could contain “OR” and “ME”, the state abbreviations for Oregon and Maine, would benefit from the NE option if the excluded word list contained the words “OR” and “ME”.
  • To index blanks, zeros and nulls for an ASCII MDK index, combine the No Exclude option with the No Parse option.
  • When installed on binary MDK indexes, the No Exclude option causes all values to be indexed, including binary zeros. By default, binary zero values in numeric columns are excluded.
  • When installed on ASK indexes, the No Exclude option causes the indexing of blanks, zeros and nulls. ASK indexes do not use the excluded word list.
NP - No Parse

This option is the default as of Omnidex version 4.0 but is supported for backward compatibility. It causes DBINSTAL to store data contained in a column as one entire keyword, including special characters (! ? * : ; { } [ ] . ,) and blanks. This means that data in the column is not parsed into separate keywords. When using No Parse on a keyword column, the first 32 bytes of the corresponding field are indexed as one keyword, including embedded blanks, special characters, and control characters.

No Parse is useful for columns holding nine-digit zip codes containing hyphens (for example, “80033-4030”) or phone numbers containing hyphens or parentheses (for example, “(303) 444-4000”).

When searching columns that use the NP option, enclose in quotation marks argument strings that contain special characters or blanks. For example, the search string:

“ ABC”*

qualifies rows with keywords that start with a blank followed by “ABC”.

The search string

“ BC”:“ DA”

qualifies rows with keywords between those starting with one blank and the letters “BC” through those starting with a blank and the letters “DA”.

The wildcard or pattern-matching character occurs outside the quotes. If you place them within the quotes, they are included in the keyword search.

When an ASCII column is used in oajoin operations as the to_column, install the No Parse option on it. This prevents the column from being parsed, and ensures the correct correspondence between rows in the joined tables.

NT - No Translate

More Info

This option is the default as of Omnidex version 4.0 but is supported for backward compatibility. For both MDK and ASK indexes, the No Translate (;NT) option disables:

  • translation of 8-bit extended ASCII characters to their 7-bit equivalents.
  • custom translations established through a translation table.
  • upshifting of all lower case characters to uppercase

In a column with the ;NT option, “George Peña”, for example, would be indexed as “George” and “Peña”, not as “GEORGE” and “PENA”. Consequently, when you search for “Peña” on such a column, you must enter Peña; pena, PENA, peña, and PEÑA will not locate the row.

Use No Translate if you want to store keyword values for a particular field in case sensitive form, or if you want to prevent Omnidex from translating 8-bit extended character sets.

If you group ASK indexes installed with the No Translate option, group them with other No Translate indexes.

RC - Record Complex

The Record Complex option can only be installed on child tables. When installed on an MDK index in a child table, it stores that index’s values as though they belong to the parent table.

By default, you can use MDK indexes installed on a child table to find and ultimately retrieve rows from either the child or its parent. When you install the Record Complex option, you restrict the use of child indexes to the qualification of parent rows. However, Record Complex indexes require less overhead to maintain. Therefore you might install the Record Complex option on any MDK indexes in a child that you would only use to find records in its parent.

For example, if you only use the STATUS column of an ORDERS child table to see which CUSTOMERS are waiting for back-ordered merchandise, you would install the Record Complex option on the STATUS index.

Applies to MDK Indexes only.

SX - Soundex

The Soundex option, when installed on an MDK index, lets you find rows by using phonetic arguments. This feature is useful for MDK index columns containing data that may be difficult to spell (for example, Galeziowski) or that can be spelled several ways (for example, Allen). Installing Soundex on these types of column could make names easier to find.

To search using a phonetic argument, spell the argument the way it sounds and end it with an exclamation point ( ! ). For example, to find all records with the keywords “ALLEN”, “ALAN” OR “ALAINE” indexed for a given key, search that key using the argument ALAN!.

The restrictions for Soundex are:

  • You can install Soundex only on parsed character columns. This means that you cannot install Soundex and No Parse on the same column.
  • Soundex indexes can require twice the disk space of a non-Soundex index because a keyword is indexed in both its actual and Soundex spelling. Install Soundex only where you need it.
  • Soundex indexes should be grouped with other Soundex indexes. Grouping Soundex indexes with non-Soundex indexes can interfere with Soundex searches or yield inaccurate results.
Type=t - Typecasting

Typecasting informs Omnidex when data is stored in a format that does not match the data type declared in the Omnidex environment catalog for a keyed column. To typecast a keyed column, composite index, or composite index component, append ;TYPE= t to its name, where t equals one of the following values:

D - D float values (OpenVMS only) E - IEEE floating point value F - F float values (OpenVMS only) G - G float values (OpenVMS only) I - Signed binary integer values on 2-byte boundaries J - Signed binary integer values on 2-byte boundaries K - Unsigned binary integer values on 2-byte boundaries L - Unsigned 1-byte binary integer values P - Packed decimal values R - Floating point values (MPE/iX only) U - Character values, all uppercase V - C-String X - Character values Z - Zoned

UK - Unique Key

The Unique Key option applies exclusively to ASK indexes.

When installed on a sorted key, it imposes a constraint on that key that prevents duplicate values from being indexed for that key. If an application tries to post a duplicate value to the ASK index, it incurs the error:

9143 - A record with this primary (unique) key value already exists.

Program Operation

DBINSTAL uses an interactive prompting sequence for initial prototyping of the Omnidex indexes. For production operations, the input is generally saved in separate script files for the indexing Install and the indexing B (or Index) operations. DBINSTAL continues to prompt for additional Tables and Columns until a '/' is specified as the table or column name. Installation batch files must follow the prompting sequence exactly to avoid errors during execution.

First run DBINSTAL:

osf> [/users/test]$ dbinstal orders.env

The DBINSTAL header is displayed followed by the Cmd: prompt.

Cmd:

Enter any database-level indexing options prior to beginning the installation.

To install indexes use the Install command followed by table options and index options (on columns). The following illustrates the DBINSTAL prompts:

- Enter a compiled Environment filespec. If the environment file suffix is omitted, .env is assumed.

Database or Environment: /usr/disctest/orders.env

Cmd: Install

At the Cmd: prompt, enter any desired database-level options, Install or I to begin the installation or ? or H for a list of valid commands.

Multi-Dimensional Keyword (MDK) Indexes are specified first.

- Enter a table name followed by the desired table options.

Table name? ORDERS

If the table entered is a child table, you will be prompted for a column linking it to its parent. To index this table in its parent's domain, enter the link column name or P=parentname.

Link Column 1: ACCT

or

Link Column 1: P=PROSPECTS

To index this table as both a child and a parent, enter $PK at the first link column prompt and then the link column at the second prompt.

  Link Column 1: $PK
  Table will be indexed in its own domain

  Link Column 2: P=PROSPECTS
  Table will be indexed in the domain of PROSPECTS

  Link Column 3: /

To index this table in its own domain and in its parent's domain, enter the link column and then $DR.

  Link Column 1: P=PROSPECTS
  Table will be indexed in the domain of PROSPECTS

  Link Column 2: $DR
  Table will be indexed in its own domain

  Link Column 3: /

To index this table in its own domain without also indexing it in its parent's domain, enter a forward slash.

Link Column 1? /

- Enter the column name to be indexed followed by the desired indexing options.

Column name: mycolumn;NP

To specify a composite index of multiple columns, specify a name with the exclamation point:

  Column name: CITY_STATE!
  Component 1 (Column [, start, length]): City
  Component 2 (Column [, start, length]): State
  Component 3 (Column [, start, length]): /

- Enter an additional column or a slash to exit the current table when all columns are specified.

Column name: /

- Enter an additional table name or a slash to exit the MDK installation.

Table name? /

Enter Aggregation/Sorted Keyword (ASK) Indexes next.

  Table name? mytable

  Column name: agg1!
  Component 1 (Column [,start,length]): Account
  Component 2 (Column [,start,length]): Trans_amount
  Component 3 (Column [,start,length]): /

  Column name: /

  Table name: /

  Proceed with installation [Y]?

  Press return or enter 'Y' to install the indexes.

  DBINSTAL will now create the empty index files.

Check the Status of Indexing Operations

When indexing in real time, DBINSTAL issues a status report of the tables that have been processed, the number of keywords unloaded, and the number of keywords loaded. When indexing in batch, the progress of the operation is not automatically listed to the terminal.

To check the status of an indexing operation running in batch, or to get a more detailed account than the online display, use the STATUS option. The STATUS option causes DBINSTAL to list the progress of an indexing operation to a file named ODXSTAT. DBINSTAL builds this file in the login directory (or account). This file can be examined to check the status of any INDEX or B operation being performed.

To use the STATUS option, append ;STATUS to the B command.

Cmd: B;STATUS

When the process is executing, examine ODXSTAT by using TYPE in DCL, PRINT in MPE or more in UNIX.

Index Specification versus Builds

It is important to note that the two processes, installing the indexes and building the indexes, are two completely separate processes that do not necessarily have to be performed at the same time. This means that there are times when the indexes must be reinstalled but not necessarily rebuilt, for instance when upgrading a minor version of Omnidex, with the exception of upgrading from 4.0.7 to 4.0.8. See Index File Limits for more information about this upgrade.

There are also times when the indexes must be rebuilt but not necessarily reinstalled, like when deferring indexing to a batch process in an update application. The existing index installation is still valid but the indexes are built and rebuilt during periodic batch processes.

The actual process of the Build command is to serially read the specified table and unload the appropriate columns of data into a special Omnidex unload file. This file is then read and the physical Omnidex indexes are populated with the internal Omnidex index structure from the unload file. Omnidex has options to generate multiple indexes and uses very high-speed processing techniques to minimize the actual indexing time.

For the initial index installation and build, DBINSTAL should be run interactively. Manually define the MDK and ASK indexes for each table and then run the Build command to populate the index files. After this is complete, run OAHELPER to generate an installation batch script. This script allows the indexes to be re-installed and re-built in the future by simply executing the script. This installation script can be easily modified using any text editor.

Settings

DBINSTAL Environment Variables The following variables can be set before running DBINSTAL.

The required variables (above) must also be set prior to running DBINSTAL.

DBINSTAL uses the temporary directory while building the indexes.

OA_MAX_ROWS OA_MAX_ROWS resticts the number of rows to be indexed. This setting should be used for testing purposes only.

Setting OA_MAX_ROWS to 100 for example, will cause the first 100 rows of each table to be indexed. However, there is no guarantee, in a parent/child relationship, that the 100 rows in the child table will belong to any of the 100 rows in the parent table.

Unix export OA_MAX_ROWS=100

Windows set OA_MAX_ROWS=100

OA_SKIP_ROWS OA_SKIP_ROWS skips indexing the specified number of rows.

Unix export OA_SKIP_ROWS=100

Windows set OA_SKIP_ROWS=500

OA_OPTIONS OA_OPTIONS sets the connect options that will be used to connect to the Environment Catalog. Options can be any valid options for a CONNECT.

Unix export OA_OPTIONS=“WRITE”

Windows set OA_OPTIONS=“READ”

Buffer Size

Use the DBINSTAL Build BUF=n option to allocate an appropriate amount of memory for DBINSTAL indexing operations.

Determining the optimal BUF=n size is best done by trial and error on a subset of the database. Monitor the number of sort segments required (displayed by DBINSTAL). The maximum is 2048 segments. Vary the BUF=n size on the database subset, and use the BUF=n value that provides a sort segment size closest to 1024.

Use 8 Mbytes of BUF=n space for each Distributed Index (;DI) processed. If a Build option of @9 is used, then the BUF=n size should be set to 72 (8 * 9).

Indexing Child Tables

A child table can be indexed as a stand-alone table in its own domain, in its parent's domain, as a parent and child, or any combination of these. With all of these options, Omnidex can always ensure the best query performance.

Child tables are defined with foreign key constraints in the Omnidex environment catalog. This causes DBINSTAL to issue a Link Column prompt when defining indexes for a child table.

To index a child table in its own domain, without indexing it in its parent's domain, simply enter a forward slash at the Link Column prompt:

Table name? ORDERS
Link Column 1: /

To index a child only in its parent's domain, enter the column name that links it to its parent or enter P=parentname, at the Link Column prompt:

Table name ? ORDERS
Link Column 1: ACCT

or

Link Column 1: P=CUSTOMERS
Table will be indexed in the domain of CUSTOMERS
Link Column 2: /

To index a child in both its parent's domain and its own domain, link the child table to the parent table and then enter $DR at the next Link Column prompt:

Link Column 1: P=CUSTOMERS
Table will be indexed in the domain of CUSTOMERS
Link Column 2: $DR
Table will be indexed in its own domain

To index a table as both a parent and a child table, enter $PK at the first Link Column prompt, then link the table to its parent at the next Link Column prompt:

Link Column 1: $PK
Table will be indexed in its own domain
Link Column 2: P=CUSTOMERS
Table will be indexed in the domain of CUSTOMERS
 
Back to top
programs/dbinstal/home.txt · Last modified: 2016/06/28 22:38 (external edit)