Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
integration:rawdata:delimited:environments [2011/03/14 14:21]
127.0.0.1 external edit
integration:rawdata:delimited:environments [2016/06/28 22:38] (current)
Line 3: Line 3:
 {{page>:​top_add&​nofooter&​noeditbtn}} {{page>:​top_add&​nofooter&​noeditbtn}}
  
-====== ​Adminstration: Raw Data Files ======+====== ​Integration: Raw Data Files ======
  
 ===== Delimited Files ===== ===== Delimited Files =====
Line 25: Line 25:
 === Declaring Databases for Raw Data Files === === Declaring Databases for Raw Data Files ===
  
-Databases in Omnidex are of a particular type, such as Oracle, SQL Server, MySQL or ODBC.  A database that accesses raw data file has its own type: FILE.  All tables within this database must be raw data files, though they can be a mixture of the types of raw data files supported by Omnidex.  ​+Databases in Omnidex are of a particular type, such as Oracle, SQL Server, MySQL or ODBC.  A database that accesses ​raw data file has its own type: FILE.  All tables within this database must be raw data files, though they can be a mixture of the types of raw data files supported by Omnidex.  ​
  
 <​code>​ <​code>​
Line 39: Line 39:
 Tables will point to one or more delimited files, and the files must match the record layout specified in the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement. ​ In the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement, the [[integration:​rawdata:​delimited:​options|OPTIONS clause]] is used to identify this as a delimited file and indicate which delimiters to use.  The DELIMITED option indicates that this is a delimited file.  The remaining options govern the specific delimiters and behavior to use when parsing the file.  ​ Tables will point to one or more delimited files, and the files must match the record layout specified in the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement. ​ In the [[dev:​sql:​statements:​create_table:​home|CREATE TABLE]] statement, the [[integration:​rawdata:​delimited:​options|OPTIONS clause]] is used to identify this as a delimited file and indicate which delimiters to use.  The DELIMITED option indicates that this is a delimited file.  The remaining options govern the specific delimiters and behavior to use when parsing the file.  ​
  
-In the CREATE TABLE statement, the PHYSICAL clause is used to point to the physical location of the raw data file.  This PHYSICAL clause can contain a relative path name (relative to the location of the Omnidex Environment File) such as "​dat/​states.csv",​ or an absolute path name such as "​d:/​class/​lab1/​dat/​states.csv"​. ​ This PHYSICAL clause can also contain wildcards as described below, and can also reference [[appendix:​reference:​envvar|environment variables]] using special syntax.+In the CREATE TABLE statement, the PHYSICAL clause is used to point to the physical location of the raw data file.  This PHYSICAL clause can contain a relative path name (relative to the location of the Omnidex Environment File) such as "​dat/​states.csv",​ or an absolute path name such as "​d:/​class/​lab1/​dat/​states.csv"​. ​ This PHYSICAL clause can also contain wildcards as described below, and can also reference [[appendix:​reference:​envvar:home|environment variables]] using special syntax.
  
-<​code ​sql>+<​code>​
 create table          "​STATES"​ create table          "​STATES"​
  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​
Line 50: Line 50:
 In the example above, a single file was used as the source of data for a table. ​ It is also possible for multiple files to be used as the source of data for a table. ​ As long as the files all share the same record layout, they will all be processed when reading the table. Filesets can be declared as a comma-separated list of files, or they can be declared using wildcards in the filename. ​ In this example, three files are declared individually in the PHYSICAL clause: In the example above, a single file was used as the source of data for a table. ​ It is also possible for multiple files to be used as the source of data for a table. ​ As long as the files all share the same record layout, they will all be processed when reading the table. Filesets can be declared as a comma-separated list of files, or they can be declared using wildcards in the filename. ​ In this example, three files are declared individually in the PHYSICAL clause:
  
-<​code ​sql>+<​code>​
 create table          "​STATES"​ create table          "​STATES"​
  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​
Line 61: Line 61:
 In this second example, wildcards are used to declare these files: In this second example, wildcards are used to declare these files:
  
-<​code ​sql>+<​code>​
 create table          "​STATES"​ create table          "​STATES"​
  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​  ​options ​             "​DELIMITED COLUMN=','​ RECORD='​\r\n'​ QUOTES"​
Line 71: Line 71:
 Files can reside in multiple directories and in multiple drives or volumes to allow a table to span multiple storage devices. ​ Files may also reside on a Storage Area Network (SAN); in fact, this is one of the most common implementations of Omnidex. ​ Files may also reside on Microsoft Windows mapped drives; however, users should expect performance degradation to occur in this scenario. ​ Files can reside in multiple directories and in multiple drives or volumes to allow a table to span multiple storage devices. ​ Files may also reside on a Storage Area Network (SAN); in fact, this is one of the most common implementations of Omnidex. ​ Files may also reside on Microsoft Windows mapped drives; however, users should expect performance degradation to occur in this scenario. ​
  
-Wildcards provide ​a great flexibility since they allow more data to be added to a table by simply moving another file into place. ​ Care should be taken with this approach, though, since the table must be reindexed as soon as new data is in place.+Wildcards provide ​greater ​flexibility since they allow more data to be added to a table by simply moving another file into place. ​ Care should be taken with this approach, though, since the table must be reindexed as soon as new data is in place.
  
 =====  ===== =====  =====
 
Back to top
integration/rawdata/delimited/environments.1300112479.txt.gz ยท Last modified: 2016/06/28 22:38 (external edit)