Administration: Omnidex Indexing

Omnidex Text

Textual Datatypes

Databases generally store text in character columns or in variable character columns (often called VARCHAR columns). These datatypes work well for smaller amounts of text, such as names, addresses and short descriptions, but they are often limited in size. For larger amounts of text, a database may provide special datatypes designed for the purpose. For example, Oracle provides a CLOB (Character Large Object) datatype and SQL Server provides a TEXT datatype. These datatypes can often store up to 2-4 gigabytes of text.

Omnidex recommends storing text in either CHARACTER or STRING datatypes. These are the simplest datatypes to use and provide great flexibility. Omnidex also supports VARCHAR and CLOB datatypes, but these are more difficult to use and are more restricted. For most applications, the STRING datatype is preferred since it allows null-terminated data up to 16 MB.

Datatype: Description

CHARACTER: Space-padded data up to 4,095 bytes.

STRING: Null-terminated data up to 64 MB. If indexed with Omnidex, the extracted text from this column may be up to 16 MB.

VARCHAR: Non-terminated and non-padded data up to 4,095 bytes. This datatype may contain embedded null characters since it is not null-terminated; however, it should not be used to store binary data. When using APIs to access this datatype, data-length variables are required since no terminator indicates the end of the text. This datatype is also not appropriate for fixed-length raw data files since the data length cannot be stored in raw data files. Raw data files should use CHARACTER or STRING datatypes.

CLOB: Non-terminated and non-padded data up to 64 MB. If indexed with Omnidex, the extracted text from this column may be up to 16 MB. This datatype may contain embedded null characters since it is not null-terminated; however, it should not be used to store binary data. When using APIs to access this datatype, data-length variables are required since no terminator indicates the end of the text. This datatype is also not appropriate for fixed-length raw data files since the data length cannot be stored in raw data files. Raw data files should use CHARACTER or STRING datatypes.

The handling of CLOB data may be more expensive than the handling of CHARACTER, STRING and VARCHAR data. It is better to use those datatypes if their size limitations will not be exceeded.
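The difference between space-padded, null-terminated and length-based text can be illustrated with a short Python sketch. This is not Omnidex code and does not call any Omnidex API; the function names are hypothetical and the byte layouts are simplified assumptions, intended only to show why VARCHAR-style and CLOB-style data needs a separate length while CHARACTER and STRING data does not.

```python
from typing import Tuple


def encode_character(text: str, width: int) -> bytes:
    """CHARACTER-style: pad with trailing spaces to a fixed width."""
    raw = text.encode("ascii")
    if len(raw) > width:
        raise ValueError("text does not fit in the column")
    return raw.ljust(width, b" ")


def decode_character(buf: bytes) -> str:
    """Strip the padding. Trailing spaces in the original text are lost."""
    return buf.rstrip(b" ").decode("ascii")


def encode_string(text: str) -> bytes:
    """STRING-style: append a null terminator; embedded nulls are not allowed."""
    raw = text.encode("ascii")
    if b"\x00" in raw:
        raise ValueError("null-terminated text cannot contain embedded nulls")
    return raw + b"\x00"


def decode_string(buf: bytes) -> str:
    """Read up to the first null byte; anything after it is ignored."""
    end = buf.index(b"\x00")
    return buf[:end].decode("ascii")


def encode_varchar(text: str) -> Tuple[bytes, int]:
    """VARCHAR/CLOB-style: no padding, no terminator, so return the length too."""
    raw = text.encode("ascii")
    return raw, len(raw)


def decode_varchar(buf: bytes, length: int) -> str:
    """Without the length there is no way to tell where the text ends."""
    return buf[:length].decode("ascii")
```

Because the length-based form carries no terminator, it can hold an embedded null character, but every caller has to pass the length alongside the buffer. A fixed-length raw data file has no place to keep that length, which is consistent with the recommendation to use CHARACTER or STRING for raw data files.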



Comparing Textual Datatypes

The textual datatypes differ in their characteristics and in the restrictions that apply to them within Omnidex SQL. The following table shows the capabilities of each datatype.

Characteristics                 CHARACTER    STRING       VARCHAR      CLOB

Datatype Characteristics
Character data allowed          Yes          Yes          Yes          Yes
Binary data allowed             No           No           No           No
Embedded nulls allowed          No           No           Yes          Yes
Null-terminated                 No           Yes          No           No
Data_lengths required           No           No           Yes          Yes
Max size                        4,095 bytes  16 MB        4,095 bytes  16 MB

Usage Characteristics
Select item of simple query     Yes          Yes          Yes          Yes
Select item of outer query      Yes          Yes          Yes          Yes
Select item of nested query     Yes          Yes          Yes          No
Select item of INSERT           Yes          Yes          Yes          No
Select item of set operation    Yes          Yes          Yes          Yes
Table joins                     Yes          Yes          Yes          No
WHERE clause                    Yes          Yes          Yes          Limited (1)
GROUP BY clause                 Yes          Yes          Yes          No
ORDER BY clause                 Yes          Yes          Yes          No
HAVING clause                   Yes          Yes          Yes          No
SELECT INTO clause              Yes          Yes          Yes          No
Aggregate functions             Yes          Yes          Yes          No
SQL functions                   Yes          Yes          Yes          Limited (2)

Update Characteristics
Inserts, updates and deletes    Yes          Yes          Yes          No

(1) Limited to use of the LIKE, $CONTAINS and IS NULL operators.
(2) See individual functions for compatibility with the CLOB datatype.
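When designing a schema, it can help to check a planned use of a column against the table above before choosing CLOB. The following Python sketch simply transcribes the CLOB column of that table into a lookup; the names `CLOB_SUPPORT` and `support` are hypothetical and are not part of Omnidex.

```python
DATATYPES = {"CHARACTER", "STRING", "VARCHAR", "CLOB"}

# CLOB column of the comparison table. The other three datatypes
# are listed as "Yes" for every usage and update row.
CLOB_SUPPORT = {
    "select item of simple query": "yes",
    "select item of outer query": "yes",
    "select item of nested query": "no",
    "select item of INSERT": "no",
    "select item of set operation": "yes",
    "table joins": "no",
    "WHERE clause": "limited",      # LIKE, $CONTAINS and IS NULL only
    "GROUP BY clause": "no",
    "ORDER BY clause": "no",
    "HAVING clause": "no",
    "SELECT INTO clause": "no",
    "aggregate functions": "no",
    "SQL functions": "limited",     # depends on the individual function
    "inserts, updates and deletes": "no",
}


def support(datatype: str, usage: str) -> str:
    """Return "yes", "no" or "limited" for a datatype and a table row."""
    datatype = datatype.upper()
    if datatype not in DATATYPES:
        raise ValueError(f"unknown datatype: {datatype}")
    if usage not in CLOB_SUPPORT:
        raise KeyError(f"unknown usage: {usage}")
    return CLOB_SUPPORT[usage] if datatype == "CLOB" else "yes"
```

For example, `support("CLOB", "ORDER BY clause")` returns `"no"`, while the same call with `"STRING"` returns `"yes"`, which reflects why STRING is usually the more convenient choice when its size limit is sufficient.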

admin/indexing/text/clob.txt · Last modified: 2016/06/28 22:38 (external edit)