Show pagesource

Old revisions

Omnidex SQL Quicklinks
Overview	Statements	Functions	Examples	Quick Reference

Omnidex SQL: CREATE TABLE

Overview | Syntax | Examples

Syntax

CREATE [ < GLOBAL | LOCAL > ] TEMPORARY ] TABLE table
    [ NODE node]
        [ OPTIONS "options" ] 
        [ PHYSICAL < "filespec" | rdbms_table > ]
        [ DATA_CACHING < DYNAMIC | NONE > ]
        [ PARTITION_BY "criteria" ]
        [ INDEX_MAINTENANCE < API | DBMS > ]
    ( 
        column_specification,
        [ constraint_specification, ]
        [ index_specification, ]
        [ index_group_specification ]
    )
    [ AS statement ]
    [ IN filename ]
    [ WITH options ]

TABLE table_name

The table_name uniquely identifies the table within this database. It may be up to 32 characters long, and can contain letters, digits and the following special characters: ! @ # $ % ^ _. It must begin with a letter and is case-insensitive. The table_name must be unique within the database. If table_name is a SQL reserved word, enclose it in double quotation marks.

If you will have more than one database within this environment, consider having table names be unique across all databases. This prevents developers from having to qualify the table with the database name, using the syntax “MYDATABASE”.“MYTABLE”.

create table                 "LIST"
  ...
  in                         "list.xml"

NODE grid_node

The optional NODE clause is used to configure an Omnidex Grid. Nodes that were previously declared in the CREATE ENVIRONMENT statement can be referenced here, and the table configuration can be named for each node. The NODE clause is only required for tables within an Omnidex Grid that are partitioned and distributed across multiple nodes. If not using an Omnidex Grid, the NODE clause can be skipped and the underlying configuration can be specified for the entire table.

For simple environments, the table configuration does not require NODE clauses:

create table                 "LIST"
 physical                    "dat/list.dat"
 (
   column "NAME"             CHAR(50),
   column "ADDRESS1"         CHAR(50),
   column "ADDRESS2"         CHAR(50),
 )
 in                          "list.xml";

For Omnidex Grid environments, the table requires NODE clauses:

create table                 "LIST"
 node "GRID01"
  physical                    "dat/list1.dat"
 node "GRID02"
  physical                    "dat/list2.dat"
 node "GRID03"
  physical                    "dat/list3.dat"
 node "GRID04"
  physical                    "dat/list4.dat"
 node "GRID05"
  physical                    "dat/list*.dat"
 (
   column "NAME"             CHAR(50),
   column "ADDRESS1"         CHAR(50),
   column "ADDRESS2"         CHAR(50),
 )
 in                          "list.xml";

OPTIONS options

The OPTIONS clause provides options specific to the table. The primary use of the OPTIONS clause is to provide information about a delimited file or an Omnidex Standalone Table. The following article on delimited files gives specific instructions for use of this OPTIONS clause.

PHYSICAL <"filespec" | rdbms_table>

The PHYSICAL clause specifies the location of the data that comprises this table. The PHYSICAL clause varies according to the database type as follows:

Database Type	Description
FILE	For fixed-length raw data files and delimited files, this clause contains one or more filenames or wildcard filesets, separated by commas. All of the files listed here will be sequentially combined to comprise the table. For Omnidex Standalone Tables (OST), this contains the name of OST file. Example: “dat/list1.dat,dat/list2.dat,dat/list3*.dat”.
ORACLE	The PHYSICAL clause contains the name of the Oracle table or view. The table can be qualified by the owner as necessary. Example: “OWNER.TABLE”.
SQLSERVER	The PHYSICAL clause contains the SQL Server table or view. The table can be qualified by the database as necessary. Example: “DB.TABLE”.
ODBC	The PHYSICAL clause contains the SQL Server table or view. The table can be qualified by the database as necessary. Example: “DB.TABLE”.

DATA_CACHING < DYNAMIC | NONE >

The DATA_CACHING clause specifies whether this table is eligible for using Omnidex's internal hash tables for high-speed joins. A value of DYNAMIC makes this table eligible and a value of NONE makes this table ineligible. Tables are eligible by default.

PARTITION_BY "criteria"

The PARTITION_BY clause is used in Omnidex Grids to declare the criteria used to partition this table into each node. Each node uses a different PARTITION_BY clause, with each node specifying the criteria for that node.

Partitioning criteria is limited to simple criteria consisting of equalities, IN clauses and BETWEEN clauses. Multiple criteria can be used; however they must be connected using AND operators.

create table          "HOUSEHOLDS"
 node "NODE01"
  physical            "dat/households01.dat"
  partition_by        "STATE in ('NY','PR','MA','RI','ME','NH','VT','CT','NJ',
                                 'PA','DE','DC','MD','VA','WV','NC','SC','GA',
                                 'FL','AL','TN','MS','KY','OH') and 
                       ZIP between '01000' and '45999'"
 node "NODE02"
  physical            "dat/households02.dat"
  partition_by        "STATE in ('IN','MI','IA','WI','MN','SD','ND','MT','IL',
                                 'MO','KS','NE','LA','AR','OK','TX','TX','CO',
                                 'WY','ID','UT','AZ','NM','NV','CA','HI','OR',
                                 'WA','AK') and 
                       ZIP between '46000' and '99999'"

The following article on Omnidex Grids gives specific instructions for partitioning tables.

INDEX_MAINTENANCE < API | DBMS >

The INDEX_MAINTENANCE clause is an optional clause that specifies how Omnidex indexes will be updated. The index_maintenance_type can be either API or DBMS.

API - The indexes will be updated by Omnidex automatically as the database data is updated.
DBMS - The indexes will be updated by relational database triggers in conjunction with the OdxAIM services.

This setting applies to this table, and overrides any setting made at the database level. If no setting is made, API is used as the default.

column_specification

    column_name  column_datatype 
        [ PHYSICAL "rdbms_name" ]
        [ USAGE usage ] 
        [ FORMAT format ]
        [ constraint_specification ] 
        [ index_specification ]
        [ AS expression ],

column_name

The column_name uniquely identifies the column within this table. It may be up to 32 characters long, and can contain letters, digits and the following special characters: ! @ # $ % ^ _. It must begin with a letter and is case-insensitive. The column_name must be unique within the table. If column_name is a SQL reserved word, enclose it in double quotation marks.

The column_name does not need to be the same as in the relational database. Use the PHYSICAL clause to have Omnidex recognize a different name than the underlying database.

column_datatype

The column_datatype declares the datatype for the column. This datatype is one listed in the datatypes section of the Appendix.

For relational databases, this datatype does not need to be identical to the datatype in the relational database. For instance, a NUMBER datatype in Oracle can be represented as an INTEGER datatype in Omnidex. An INTEGER datatype in SQL Server can be represented as a CHARACTER(20) in Omnidex.

Similarly, the lengths of character and string fields do not need to be identical to the length in the relational database. For instance, a VARCHAR(200) datatype in Oracle can be represented as a CHARACTER(50) in Omnidex, and all values greater than 50 will be appropriately truncated.

These same rules are also true for delimited files. Since the delimited files store data in character format, the datatype declared here tells Omnidex how to convert the data for use in SQL statements.

For fixed-length raw data files and for Omnidex Standalone Tables (OST), the datatype and length must match the data in the data file.

PHYSICAL "rdbms_name"

The PHYSICAL clause specifies the name of the column within the relational database. This clause is not used with raw data files, delimited files or Omnidex Standalone Tables.

USAGE usage

The USAGE clause declares the content of the column, as contrasted with the datatype of the column. For example, a column may be a CHARACTER(7) or a DOUBLE datatype; however, its usage may be LATITUDE or LONGITUDE.

At present, the only usages are LATITUDE and LONGITUDE. In future versions of Omnidex, more usages are anticipated.

FORMAT format

The FORMAT clause determines the extent to which a date-class datatype should be indexed. The FORMAT clause may contain contiguous pairs of characters within the string: YYYYMMDDHHNNSSFF. The characters within this string are defined as follows:

YYYY is the full year MM is the month of the year DD is the day of the month HH is the hour of the day NN is the minute of the hour SS is the second of the minute FF is the fraction of the second

By default, Omnidex indexes date and datetime datatypes as YYYYMMDD, and time datatypes as HHNNSSFF. These can be overriden as desired. For example, to index a datetime to its full extent, choose a format of YYYYMMDDHHNNSSFF.

AS expression

The AS clause declares an expression-based column in the table. The column does not exist in the underlying relational table or in the raw data files, but rather is derived from the SQL expression. In many ways, this is similar to what can be achieved in a relational view.

The SQL expression can reference any of the other columns in this table, and can perform functions, mathematical operations, string operations or other similar functions. With few exceptions, the SQL expression can be any expression used as a select item in a SQL SELECT statement.

Expression-based columns are the basis for creating expression-based indexes. If an Omnidex index is installed on the expression-based column, then this equates to an expression-based index. If the same expression is recognized in the SQL SELECT statement, it will recognize that this equates to the expression-based index and will optimize the query accordingly.

The following article on expression-based columns and indexes describes these optimizations in more detail.

constraint_specification

Normal constraint definition:

  [ CONSTRAINT constraint_name ]
      < PRIMARY [ KEY ] ( column [, column ... ] )   |
        UNIQUE  [ KEY ] ( column [, column ... ] )   |
        DISTINCT [ KEY ] ( column [, column ... ] )  |
        FOREIGN [ KEY ] ( column [, column ... ] ) 
            REFERENCES table [ ( column [, column ... ] ) ] [ PREJOIN ]  >

Inlined constraint definition:

  [ CONSTRAINT constraint ]
      < PRIMARY [ KEY ]  |
        UNIQUE  [ KEY ]  |
        DISTINCT [ KEY ] |
        FOREIGN [ KEY ] REFERENCES table [ ( column [, column ... ] ) ] [ PREJOIN ]  >

CONSTRAINT constraint_name

The constraint_name is optional and uniquely identifies the constraint within this table. It may be up to 32 characters long, and can contain letters, digits and the following special characters: ! @ # $ % ^ _. It must begin with a letter and is case-insensitive. The constraint_name must be unique within the table. If constraint_name is a SQL reserved word, enclose it in double quotation marks.

The constraint_name does not need to be the same as in the relational database.

PRIMARY KEY

The PRIMARY KEY clause declares a primary constraint in the table. A primary constraint identifies a column, or set of columns, that uniquely identify each row in the table. A primary constraint is also paired with foreign constraints to establish parent-child relationships between tables.

A primary constraint may reference more than one column if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have a presumed primary constraint unless one is declared.

A table may only have one primary key constraint.

UNIQUE KEY

The UNIQUE KEY clause declares a unique constraint in the table. A unique constraint identifies a column, or set of columns, that uniquely identify each row in the table. Furthermore, the unique constraint is used within the Omnidex indexes as the reference pointer for each row.

A unique constraint may reference more than one column if declared after the columns in the table, or may reference a single column if declared inline with the columns. By default, the UNIQUE KEY is assumed to be the table's rowid unless the RDBMS does not support rowids. When no rowid exists, the UNIQUE KEY is assumed to be the table's primary key. If no primary key exists, then the UNIQUE KEY must be specifically declared in order to be indexed with Omnidex.

A table may only have one unique key constraint.

DISTINCT KEY

The DISTINCT KEY clause declares a distinct constraint in the table. A distinct constraint is used in Omnidex Grids to identify columns, or a set of columns, that are distinct on each node. Columns are distinct on each node when specific values are not duplicated across nodes. For example, if an Omnidex Grid is partitioned so that there is a node for each geographic region, then the REGION code will be distinct on each node. Conversely, the LAST_NAME column will not be distinct as it likely repeats on all nodes.

A distinct constraint may reference more than one column if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed distinct constraints unless they are declared.

A table may have multiple distinct key constraints.

The following article on Omnidex Grids describes the use of distinct constraints in more detail.

FOREIGN KEY

The FOREIGN KEY clause declares a foreign constraint in the table. A foreign constraint identifies a column, or set of columns, that are reference the primary key of another table to establish a parent-child relationship.

A foreign constraint may reference more than one column if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed foreign constraints unless they are declared.

A table may have multiple foreign key constraints; however, a unique column or set of columns can only reference one parent table. For the purposes of Omnidex prejoined indexing, only the first 31 foreign key constraints are considered.

PREJOIN

The PREJOIN keyword indicates that this table should be pre-joined within the Omnidex Indexes based on this foreign constraint.

index_specification

Normal index specification:

    [ < OMNIDEX | QUICKTEXT | FULLTEXT | CUSTOM | NATIVE > [INDEX] index_name
        ( < column | substring > [ , < column | substring > ... ] ) 
        indexing_options

Inlined index specification:

    [ < OMNIDEX | QUICKTEXT | FULLTEXT | CUSTOM | NATIVE > [INDEX] 
        indexing_options

indexing_options:

    Options available for OMNIDEX index:

        [ BITMAP ]
        [ STANDALONE ]
        [ < CASE_INSENSITIVE | CASE_SENSITIVE > ]

    Options available for QUICKTEXT indexes:

        [ < CASE_INSENSITIVE | CASE_SENSITIVE > ]

    Options available for FULLTEXT indexes:

        [ < CASE_INSENSITIVE | CASE_SENSITIVE > ]

    Options available for CUSTOM indexes:

        [ BITMAP ]
        [ STANDALONE ]
        [ KEYWORDING ]
        [ PROXIMITY ]
        [ < CASE_INSENSITIVE | CASE_SENSITIVE > ]
        [ PHONETIC ]
        [ PREJOIN table_name ]
        [ < RECORD_SPECIFIC | RECORD_COMPLEX > ]

OMNIDEX INDEX

The OMNIDEX INDEX clause declares an Omnidex Index. An Omnidex Index is the basic index used in most scenarios. An Omnidex Index can optimize criteria, table joins, aggregations and ordering.

An Omnidex Index may reference more than one column or substring expression if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed Omnidex Indexes unless they are declared.

An Omnidex Index is limited to a maximum of 240 bytes of data.

The following article on Omnidex Index Types describes Omnidex Indexes in more detail.

QUICKTEXT INDEX

The QUICKTEXT INDEX clause declares an QuickText Index. A QuickText Index is the basic index used on columns containing textual data. A QuickText Index indexes the individual keywords in a column and allows them the be qualified individually in searches. As a parsed index, QuickText Indexes can only be used to optimize criteria.

A QuickText Index may reference more than one column or substring expression if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed QuickText Indexes unless they are declared.

A QuickText Index is limited to a 16 MB of data if applied to the last column in the table, and 64 KB if applied to a middle column in the table.

The following article on Omnidex Index Types describes QuickText Indexes in more detail.

FULLTEXT INDEX

The FULLTEXT INDEX clause declares a FullText Index. A FullText Index is an advanced indexed used with large amounts of textual. A FullText Index indexes the individual keywords in a column along with information about their location within the column. This additional location information allows proximity and phrase searches, such as search for one word near another word. This additional information also allows relevancy scoring, similar to that used in Internet search engines. While FullText Indexes are quite advanced and powerful, they also require additional overhead. Because of this overhead, QuickText indexes should be used for textual data unless the features of FullText indexes are needed.

A FullText Index may reference more than one column or substring expression if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed FullText Indexes unless they are declared. FullText Indexes cannot be declared on pre-joined child tables.

A FullText Index is limited to a 16 MB of data if applied to the last column in the table, and 64 KB if applied to a middle column in the table.

The following article on Omnidex Index Types describes FullText Indexes in more detail.

CUSTOM INDEX

The CUSTOM INDEX clause declares a Custom Index. A Custom Index allows the administrator to specify many different advanced options for the index. These options usually enable optimizations for specific situations, or are used to achieve backward compatibility with older installations.

A Custom Index may reference more than one column or substring expression if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed Custom Indexes unless they are declared. Custom Indexes cannot be declared on pre-joined child tables.

A Custom Index is limited to 240 bytes unless the Keywording option is used. If the Keywording option is used, the index is limited to a 16 MB of data if applied to the last column in the table, and 64 KB if applied to a middle column in the table.

The following article on Omnidex Index Types describes Custom Indexes in more detail.

NATIVE INDEX

The NATIVE INDEX clause declares a native index in a relational database. Declaring the native indexes in an Omnidex Environment File is optional, but doing so will enable additional optimizations within Omnidex.

A Native Index may reference more than one column if declared after the columns in the table, or may reference a single column if declared inline with the columns. A table does not have any presumed Native Indexes unless they are declared.

INDEX index_name

The index_name uniquely identifies the index within this table. It may be up to 32 characters long, and can contain letters, digits and the following special characters: ! @ # $ % ^ _. It must begin with a letter and is case-insensitive. The index_name must be unique within the table. If index_name is a SQL reserved word, enclose it in double quotation marks.

For single-column indexes, it is not required to name the index. In these cases, the index will be named the same as the column.

Index Options

BITMAP

The BITMAP option causes an Omnidex or Custom index to be stored in a bitmap format rather than a B-tree format. The bitmap format is more efficient for indexes that have fewer than 32 distinct values. The B-tree format is more efficient for indexes that have greater than 32 distinct values.

The BITMAP option is frequently used on low-cardinality columns such as Yes/No flags, or columns like GENDER or MARITAL_STATUS. Using a Bitmap Index on these types of columns can significantly improve query performance.

Bitmap Indexes are not maintained when INSERT, DELETE and UPDATE statements are processed. Bitmap Indexes are rebuilt when an UPDATE INDEXES command is issued.

STANDALONE

The STANDALONE option causes a non-keyworded Custom index to reside independently from the other indexes in the table. Standalone indexes can be built individually, avoiding the costs of updating all of the indexes for a given table.

KEYWORDING

The KEYWORDING option causes the data in the index to be parsed based on spaces and punctuation marks. This allows each word to be used as independent criteria. For example, the criteria “INFORMATION” would locate a row with the value of “DYNAMIC INFORMATION SYSTEMS CORPORATION”.

The KEYWORDING option automatically implies the CASE_INSENSITIVE option, unless specifically overridden using the CASE_SENSITIVE option. This means that the criteria “INFORMATION” would locate a row with the value of “Dynamic Information Systems Corporation”.

The KEYWORDING option does not track the location of each word in each column. The PROXIMITY option is required to track the location of each word. The KEYWORDING option simply records that the keyword was present for that row. The criteria “John Doe” would locate both a row containing “John Q. Doe” and a row containing “Doe, John Q.”.

The KEYWORDING option has substantially less overhead than the PROXIMITY option and is recommended when performance is important. The KEYWORDING option is commonly used for smaller text fields such as name, address, descriptions, and brief comments. The PROXIMITY option is more commonly used on large text fields containing entire articles.

PROXIMITY

The PROXIMITY option causes the data in the index to be parsed based on spaces and punctuation marks, and also causes location information about each keyword to be recorded. This allows more sophisticated queries that compare the location of one keyword with another. The PROXIMITY option enables the use of the NEAR, BEFORE and AFTER operators, as well as the phrase searches. The PROXIMITY option also enables the use of the $SCORE and $CONTEXT functions.

The PROXIMITY option has substantially more overhead than the KEYWORDING option. The KEYWORDING option is commonly used for smaller text fields such as name, address, descriptions, and brief comments. The PROXIMITY option is more commonly used on large text fields containing entire articles.

The following article on Omnidex Proximity Searches provides more details on performing proximity searches.

< CASE_INSENSITIVE | CASE_SENSITIVE >

The CASE_INSENSITIVE and CASE_SENSITIVE options cause the data in the index to be treated in either a case sensitive or a case insensitive manner.

PHONETIC

The PHONETIC option causes index values to be indexed along with their phonetic equivalents. This option is required for performing phonetic searches using the $CONTAINS function in SELECT statements.

PREJOIN table_name

The PREJOIN option causes this index to be pre-joined to table_name. This option is used in conjunction with the RECORD_SPECIFIC and RECORD_COMPLEX options to determine how the index should be pre-joined. If neither the RECORD_SPECIFIC or RECORD_COMPLEX option is used, the RECORD_SPECIFIC option is used by default.

< RECORD_SPECIFIC | RECORD_COMPLEX >

The RECORD_SPECIFIC and RECORD_COMPLEX options are used to determine how the index should be pre-joined to the table named in the PREJOIN option.

index_group_specification

  INDEX GROUP [ ( owner_table_name ) ] group_name ( index_name [, index_name ... ] )

INDEX GROUP ( owner_table_name ) group_name

The INDEX GROUP clause is used to cause multiple indexes to be processed as a single indexes. When multiple indexes are grouped together, a search against any member of the group will automatically search the other members of the group. For example, if the columns ADDRESS1 and ADDRESS2 are combined into an Index Group, then a search for “MAIN STREET” will locate rows that have those two columns in either or both columns.

Index Groups can be created across multiple pre-joined tables; however, doing so requires using the CREATE INDEX GROUP statement.