Integration: Raw Data Files

Delimited Files

Datatypes

The record layout described in the CREATE TABLE statement must exactly match the order of columns in the data file. Care must also be taken with binary columns to insure that the character content in the data file can be legitimately converted to the binary datatype declared in the table. If these record layouts do not match, Omnidex will have problems reading the data from the delimited file.

In the CREATE TABLE statement, the record layout is specified using the COLUMN clause. The datatypes and lengths for these columns can be nearly any datatype supported by Omnidex. The only exceptions are the VARCHAR and CLOB datatypes, as these require that the underlying data also include stored data lengths. With raw data files, use STRING and CHARACTER datatypes instead.

create table          "STATES"
 options              "DELIMITED COLUMN=',' RECORD='\r\n' QUOTES"
 physical             "dat/states.dat"
 (
  "STATE"             CHARACTER(2),                
  "DESCRIPTION"       STRING(31),                 
  "STATE_CODE"        CHARACTER(2),                
  "REGION"            CHARACTER(2),                
  "COUNTRY"           CHARACTER(2),                
  "TAX_RATE"          FLOAT                       
 )
 in                   "simple.xml";

Most raw data files are accompanied by some kind of record layout that can be used to construct the CREATE TABLE statement. For those that do not, the DUMP DELIMITED command in OdxSQL can be helpful. This command will dump each line of the file in hex, honoring the line delimiters specified.

> dump delimited states.csv

Delimited file dump for states.csv

Record delimiter: \r\n

states.csv is 1.6K

Next, First or Quit:
Filename: states.csv  Record: 2  Offset: 0
00000-00015:   414b2c41 6c61736b 612c3032 2c50432c     AK,Alaska,02,PC,
00016-00028:   55532c30 2e303030 3030300d 0a           US,0.000000..
Next, First or Quit:
Filename: states.csv  Record: 4  Offset: 29
00000-00015:   414c2c41 6c616261 6d612c30 312c4553     AL,Alabama,01,ES
00016-00029:   2c55532c 342e3030 30303030 0d0a         ,US,4.000000..
Next, First or Quit:
Filename: states.csv  Record: 6  Offset: 59
00000-00015:   41522c41 726b616e 7361732c 30352c57     AR,Arkansas,05,W
00016-00030:   532c5553 2c342e36 32353030 300d0a       S,US,4.625000..
Next, First or Quit:
Filename: states.csv  Record: 8  Offset: 90
00000-00015:   415a2c41 72697a6f 6e612c30 342c4d54     AZ,Arizona,04,MT
00016-00029:   2c55532c 352e3030 30303030 0d0a         ,US,5.000000..
Next, First or Quit:
Filename: states.csv  Record: 10  Offset: 120
00000-00015:   43412c43 616c6966 6f726e69 612c3036     CA,California,06
00016-00031:   2c50432c 55532c36 2e303030 3030300d     ,PC,US,6.000000.
00032-00032:   0a                                      .
Next, First or Quit:
Filename: states.csv  Record: 12  Offset: 153
00000-00015:   434f2c43 6f6c6f72 61646f2c 30382c4d     CO,Colorado,08,M
00016-00030:   542c5553 2c332e30 30303030 300d0a       T,US,3.000000..


Note that if an Offset Index exists for this file, the options recorded in the Offset Index will be used rather than prompting for options, and the information from the Offset Index is displayed.

> dump delimited states.csv

Delimited file dump for states.csv

OFX file for states.csv opened with:
 Options:
   Column delimiter: ,
   Record delimiter: \r\n
   Nulls:
   Quote character:  None
   Quote datatypes:  Unknown
   Force quotes:     No
   Escape character: \\
   Trail empty:      No
   Header row:       No
   Skip Bytes:       0

 Header:
   Version: 20000  Timestamp: Wed Mar 02 14:56:39 2011

   Storage: Inc Subp  Offset Len:     2  Incrementg:     Y  Locked:         N
   Min Reclen:     7  Max Reclen:    84
   Page Sz:    32768  Entr/Page:  16284
   Subpg Sz:     712  Entr/Subpg:   354  Subpg/Page:    46
   Num Pages:      1  Num cols:       6  Num rows:      51  Highwater:     51
   Tbl Offset:   352  Tbl Length:  2216  Own Offset:  2568  Own Length:    28
   Data Offst:  2608

 Table STATES:
   Endian:         L  Encryption:     N  Deletes:        N  Compression: NONE
   Reclen Len:     0  Rowid Len:      0
   Null Ind:       N  Null Off:       0  Null Len:       0
   Hdr Rec:        0  Data Rec:      44  Total Rec:     44  Card:           51
   Num Cols:       6  Num Keys:       0  Num Idxs:       0

   Column Name                      Datatype           Usage        Options
   ---------------------------------------------------------------------------
   STATE                            CHAR(2)                         TRUN VALUE EXP
   DESCRIPTION                      STRING(31)                      TRUN VALUE EXP
   STATE_CODE                       CHAR(2)                         TRUN VALUE EXP
   REGION                           CHAR(2)                         TRUN VALUE EXP
   COUNTRY                          CHAR(2)                         TRUN VALUE EXP
   TAX_RATE                         FLOAT                           VALUE EXP OVRFL


states.csv is 1.6K and has 51 rows

Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 1  Offset: 0
00000-00015:   414b2c41 6c61736b 612c3032 2c50432c     AK,Alaska,02,PC,
00016-00028:   55532c30 2e303030 3030300d 0a           US,0.000000..
Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 2  Offset: 29
00000-00015:   414c2c41 6c616261 6d612c30 312c4553     AL,Alabama,01,ES
00016-00029:   2c55532c 342e3030 30303030 0d0a         ,US,4.000000..
Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 3  Offset: 59
00000-00015:   41522c41 726b616e 7361732c 30352c57     AR,Arkansas,05,W
00016-00030:   532c5553 2c342e36 32353030 300d0a       S,US,4.625000..
Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 4  Offset: 90
00000-00015:   415a2c41 72697a6f 6e612c30 342c4d54     AZ,Arizona,04,MT
00016-00029:   2c55532c 352e3030 30303030 0d0a         ,US,5.000000..
Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 5  Offset: 120
00000-00015:   43412c43 616c6966 6f726e69 612c3036     CA,California,06
00016-00031:   2c50432c 55532c36 2e303030 3030300d     ,PC,US,6.000000.
00032-00032:   0a                                      .
Rec #, Previous, Next, First, Last or Quit:
Filename: states.csv  Record: 6  Offset: 153
00000-00015:   434f2c43 6f6c6f72 61646f2c 30382c4d     CO,Colorado,08,M
00016-00030:   542c5553 2c332e30 30303030 300d0a       T,US,3.000000..

Omnidex cannot verify that the record layouts exactly match, so it is always wise to retrieve a few rows and visually inspect that the column definition and the data file line up properly.

Additional Resources

See also:

 
Back to top
integration/rawdata/delimited/datatypes.txt ยท Last modified: 2016/06/28 22:38 (external edit)
 
 
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0