Integration: Raw Data Files

Fixed-length Files

Environments

The Omnidex Environment File is the key to setting up Omnidex on raw data files. Fixed-length files do not contain any metadata, meaning there is no available information about things like columns and datatypes. The Omnidex Environment File fulfills this role.

Declaring Databases for Raw Data Files

Databases in Omnidex are of a particular type, such as Oracle, SQL Server, MySQL or ODBC. A database that accesses raw data files has its own type: FILE. All tables within this database must be raw data files, though they can be a mixture of the types of raw data files supported by Omnidex.

create database       "SIMPLE"
 type                 FILE
 index_directory      "idx"
 in                   "simple.xml";

Declaring Tables for Fixed-length Files

Tables will point to one or more fixed-length files, and the files must match the record layout specified in the CREATE TABLE statement. In the CREATE TABLE statement, the PHYSICAL clause is used to point to the physical location of the raw data file. This PHYSICAL clause can contain a relative path name (relative to the location of the Omnidex Environment File) such as “dat/states.dat”, or an absolute path name such as “d:/class/lab1/dat/states.dat”. This PHYSICAL clause can also contain wildcards as described below, and can also reference environment variables using special syntax.

create table          "STATES"
 physical             "dat/states.dat"
 ...

In the example above, a single file was used as the source of data for a table. It is also possible for multiple files to be used as the source of data for a table. As long as the files all share the same record layout, they will all be processed when reading the table. Filesets can be declared as a comma-separated list of files, or they can be declared using wildcards in the filename. In this example, three files are declared individually in the PHYSICAL clause:

create table          "STATES"
 physical             "c:/class/lab1/dat/states1.dat, 
                       c:/class/lab1/dat/states2.dat, 
                       d:/class/lab1/dat/states3.dat"
 ...

In this second example, wildcards are used to declare these files:

create table          "STATES"
 physical             "c:/class/lab1/dat/states*.dat, 
                       d:/class/lab1/dat/states*.dat"
 ...

Files can reside in multiple directories and in multiple drives or volumes to allow a table to span multiple storage devices. Files may also reside on a Storage Area Network (SAN); in fact, this is one of the most common implementations of Omnidex. Files may also reside on Microsoft Windows mapped drives; however, users should expect performance degradation to occur in this scenario.

Wildcards provide great flexibility since they allow more data to be added to a table by simply moving another file into place. Care should be taken with this approach, though, since the table must be reindexed as soon as new data is in place.

Additional Resources

See also:

 
Back to top
integration/rawdata/fixed/environments.txt · Last modified: 2016/06/28 22:38 (external edit)