Importing CSV data

Yeray edited this page Jun 28, 2017 · 3 revisions

The BI.Data.CSV unit contains the TBICSV class, which imports data in CSV (Comma-Separated Values) format from files, text strings or streams.

TBICSV attempts to detect several details of the CSV content automatically: the field delimiter, the quote character, the presence of a header and the decimal separator.

Properties

  • Delimiter

    The separator character or text between fields.

    The "," delimiter is tested first by default. The tab and space characters are also tested as possible delimiters.

    A custom delimiter can also be specified:

var CSV : TBICSV;  CSV.Delimiter:= '|';
  • Quote

    The character used to surround text values.

    The single and double quote characters are tested automatically.

    A custom quote character can also be specified:

var CSV : TBICSV; CSV.Quote:= '"';
  • Header

    TBICSV automatically tests whether the first lines of the CSV content can be treated as the header, that is, the text containing the names of the CSV fields. The Headers setting overrides this detection:

CSV.Header.Headers:= TTextHeaders.Yes; // Auto, Yes, No
  • Decimal separator

    TBICSV attempts to parse floating-point values using either "." or "," as the separator between the integral and fractional parts of the number.
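The automatic delimiter test described above can be pictured with a small standalone sketch. This is not TeeBI's implementation: the `CountOutsideQuotes` and `GuessDelimiter` names and the "most frequent candidate wins" rule are assumptions made only for illustration. It counts each candidate delimiter on a sample line, skips text inside double quotes, and prefers "," when there is a tie.

```pascal
program DelimiterSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Candidate delimiters named in the text: comma, tab and space
  Candidates: array[0..2] of Char = (',', #9, ' ');

// Counts occurrences of Delim in Line, skipping text enclosed in Quote.
// Hypothetical helper, not part of TeeBI.
function CountOutsideQuotes(const Line: string; Delim, Quote: Char): Integer;
var
  I: Integer;
  InQuotes: Boolean;
begin
  Result := 0;
  InQuotes := False;
  for I := 1 to Length(Line) do
    if Line[I] = Quote then
      InQuotes := not InQuotes
    else if (Line[I] = Delim) and not InQuotes then
      Inc(Result);
end;

// Returns the candidate that occurs most often in the sample line.
// Ties keep the earlier candidate, so ',' is the default answer.
function GuessDelimiter(const Line: string): Char;
var
  I, N, Best: Integer;
begin
  Result := Candidates[0];
  Best := 0;
  for I := Low(Candidates) to High(Candidates) do
  begin
    N := CountOutsideQuotes(Line, Candidates[I], '"');
    if N > Best then
    begin
      Best := N;
      Result := Candidates[I];
    end;
  end;
end;

begin
  WriteLn(GuessDelimiter('id,name,price') = ',');              // TRUE
  WriteLn(GuessDelimiter('id'#9'full name'#9'price') = #9);    // TRUE
  WriteLn(GuessDelimiter('"Smith, John" 42 true') = ' ');      // TRUE
end.
```

A real importer would normally sample several lines rather than one. When detection is not reliable for a given file, setting the Delimiter property explicitly avoids the guess altogether.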

TDataItem conversions

When importing CSV data, each "column" (or "field") is automatically created using the most appropriate TDataKind (integer, floating point, text, boolean, etc.).
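One way to picture this kind selection is the sketch below. It is an assumption-laden illustration, not TeeBI's code: `TGuessedKind`, `GuessKind`, `Merge` and `GuessColumnKind` are hypothetical names, and the enumeration is deliberately not TDataKind. Each value is tested from the narrowest kind to the widest, floating-point values are accepted with either "." or "," as the decimal separator, and a column falls back to text as soon as its values disagree.

```pascal
program KindSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical enumeration for this sketch; not TeeBI's TDataKind
  TGuessedKind = (gkInteger, gkFloat, gkBoolean, gkText);

const
  KindNames: array[TGuessedKind] of string =
    ('Integer', 'Float', 'Boolean', 'Text');

var
  // Global records start zeroed; only the two separators are set below
  DotFormat, CommaFormat: TFormatSettings;

// Guesses the narrowest kind able to represent a single text value
function GuessKind(const S: string): TGuessedKind;
var
  I: Int64;
  D: Double;
  B: Boolean;
begin
  if TryStrToInt64(S, I) then
    Result := gkInteger
  else if TryStrToFloat(S, D, DotFormat) or
          TryStrToFloat(S, D, CommaFormat) then
    Result := gkFloat
  else if TryStrToBool(S, B) then
    Result := gkBoolean
  else
    Result := gkText;
end;

// Combines two kinds: integers widen to float, any other mix becomes text
function Merge(A, B: TGuessedKind): TGuessedKind;
begin
  if A = B then
    Result := A
  else if (A in [gkInteger, gkFloat]) and (B in [gkInteger, gkFloat]) then
    Result := gkFloat
  else
    Result := gkText;
end;

// Guesses one kind for a whole column of text values
function GuessColumnKind(const Values: array of string): TGuessedKind;
var
  I: Integer;
begin
  if Length(Values) = 0 then
  begin
    Result := gkText;
    Exit;
  end;
  Result := GuessKind(Values[0]);
  for I := 1 to High(Values) do
    Result := Merge(Result, GuessKind(Values[I]));
end;

begin
  DotFormat.DecimalSeparator := '.';
  DotFormat.ThousandSeparator := #0;
  CommaFormat.DecimalSeparator := ',';
  CommaFormat.ThousandSeparator := #0;

  WriteLn(KindNames[GuessColumnKind(['1', '2', '3'])]);      // Integer
  WriteLn(KindNames[GuessColumnKind(['1', '2,5', '3.75'])]); // Float
  WriteLn(KindNames[GuessColumnKind(['true', 'false'])]);    // Boolean
  WriteLn(KindNames[GuessColumnKind(['1', 'abc'])]);         // Text
end.
```

The integer test runs before the boolean test because `TryStrToBool` also accepts numeric strings, which would otherwise turn a column of 0 and 1 values into booleans.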

Speed

Several optimizations are used to achieve high speed when importing CSV text.

For example, a 3 GB CSV file containing about 1 billion cells (150k rows by 7k columns) takes 120 seconds to import on a typical i7 desktop CPU with a hard disk.

However, once the data has been imported and saved in the TeeBI native binary format, loading the same data takes only 9 seconds.

Note:

Parallelization is not currently used in the import phase.
