Differences

This shows you the differences between two versions of the page.

Link to this comparison view

integration:rawdata:delimited:environments [2012/10/26 15:34]
127.0.0.1 external edit
integration:rawdata:delimited:environments [2016/06/28 22:38]
Line 1: Line 1:
-~~NOTOC~~ 
  
-{{page>:​top_add&​nofooter&​noeditbtn}} 
- 
-====== Adminstration:​ Raw Data Files ====== 
- 
-===== Delimited Files ===== 
- 
-[[integration:​rawdata:​delimited:​home|Overview]] | 
-**[[integration:​rawdata:​delimited:​environments|Environments]]** | 
-[[integration:​rawdata:​delimited:​options|Options]] | 
-[[integration:​rawdata:​delimited:​datatypes|Datatypes]] | 
-[[integration:​rawdata:​delimited:​ofx|Offset Indexes]] |  
-[[integration:​rawdata:​delimited:​export|Exporting]] | 
-[[integration:​rawdata:​delimited:​limitations|Limitations]] | 
-[[integration:​rawdata:​delimited:​example|Example]] 
- 
- 
----- 
- 
-==== Environments ==== 
- 
-The Omnidex Environment File is the key to setting up Omnidex on raw data files. ​ Delimited files do not contain metadata about columns and datatypes. ​ The Omnidex Environment File fulfills this role.  
- 
-=== Declaring Databases for Raw Data Files === 
- 
-Databases in Omnidex are of a particular type, such as Oracle, SQL Server, MySQL or ODBC.  A database that accesses a raw data file has its own type: FILE.  All tables within this database must be raw data files, though they can be a mixture of the types of raw data files supported by Omnidex.  ​ 
- 
-<​code>​ 
-create database ​      "​SIMPLE"​ 
- ​type ​                FILE 
- ​index_directory ​     "​idx"​ 
- ​in ​                  "​simple.xml";​ 
-</​code>​ 
- 
- 
-=== Declaring Tables for Delimited Files === 
- 
-Tables will point to one or more delimited files, and the files must match the record layout specified in the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement. ​ In the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement, the [[integration:​rawdata:​delimited:​options|OPTIONS clause]] is used to identify this as a delimited file and indicate which delimiters to use.  The DELIMITED option indicates that this is a delimited file.  The remaining options govern the specific delimiters and behavior to use when parsing the file.  ​ 
- 
-In the CREATE TABLE statement, the PHYSICAL clause is used to point to the physical location of the raw data file.  This PHYSICAL clause can contain a relative path name (relative to the location of the Omnidex Environment File) such as "​dat/​states.csv",​ or an absolute path name such as "​d:/​class/​lab1/​dat/​states.csv"​. ​ This PHYSICAL clause can also contain wildcards as described below, and can also reference [[appendix:​reference:​envvar:​home|environment variables]] using special syntax. 
- 
-<​code>​ 
-create table          "​STATES"​ 
- ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​ 
- ​physical ​            "​dat/​states.csv"​ 
- ... 
-</​code>​ 
- 
-In the example above, a single file was used as the source of data for a table. ​ It is also possible for multiple files to be used as the source of data for a table. ​ As long as the files all share the same record layout, they will all be processed when reading the table. Filesets can be declared as a comma-separated list of files, or they can be declared using wildcards in the filename. ​ In this example, three files are declared individually in the PHYSICAL clause: 
- 
-<​code>​ 
-create table          "​STATES"​ 
- ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​ 
- ​physical ​            "​c:/​class/​lab1/​dat/​states1.csv, ​ 
-                       ​c:/​class/​lab1/​dat/​states2.csv, ​ 
-                       ​d:/​class/​lab1/​dat/​states3.csv"​ 
- ... 
-</​code>​ 
- 
-In this second example, wildcards are used to declare these files: 
- 
-<​code>​ 
-create table          "​STATES"​ 
- ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​ 
- ​physical ​            "​c:/​class/​lab1/​dat/​states*.csv, ​ 
-                       ​d:/​class/​lab1/​dat/​states*.csv"​ 
- ... 
-</​code>​ 
- 
-Files can reside in multiple directories and in multiple drives or volumes to allow a table to span multiple storage devices. ​ Files may also reside on a Storage Area Network (SAN); in fact, this is one of the most common implementations of Omnidex. ​ Files may also reside on Microsoft Windows mapped drives; however, users should expect performance degradation to occur in this scenario. ​ 
- 
-Wildcards provide greater flexibility since they allow more data to be added to a table by simply moving another file into place. ​ Care should be taken with this approach, though, since the table must be reindexed as soon as new data is in place. 
- 
-=====  ===== 
- 
-**[[integration:​rawdata:​delimited:​home|Prev]]** | 
-**[[integration:​rawdata:​delimited:​options|Next]]** 
- 
-====== Additional Resources ====== 
- 
-See also:  
- 
-{{page>:​integration:​rawdata:​see_also&​nofooter&​noeditbtn}} 
- 
-{{page>:​bottom_add&​nofooter&​noeditbtn}} 
 
Back to top
integration/rawdata/delimited/environments.txt ยท Last modified: 2016/06/28 22:38 (external edit)