Module ecsv

Erlang NIF CSV parser and writer.

Data Types


callback_fun(CallBackStateType) = fun((Message::callback_message(), CallBackState0::CallBackStateType) -> CallBackState::CallBackStateType)

Callback function used for processing parsed data in parse_stream/4 and parse_stream/5.

Note callback_message() contains Rows in reverse order.
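For illustration, a minimal callback that just counts rows could look like this (a sketch, not part of the module); it must handle both message tags:

```erlang
%% Hypothetical callback for parse_stream/4,5: counts parsed rows.
%% It must accept both {rows, RevRows} and the final {eof, RevRows};
%% row order does not matter for counting, so the reversal is harmless.
Count = fun({rows, Rs}, N) -> N + length(Rs);
           ({eof, Rs}, N) -> N + length(Rs)
        end.
```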


callback_message() = {eof | rows, RevRows::rows()}

callback_state() = any()

input() = eof | binary()

line() = [atom() | number() | iolist()]

option() = strict | null | all_lines | strict_finish | {delimiter, byte()} | {quote, byte()}

options() = [option()]

See parser_init/1 for details about option().


reader_fun(ReaderStateType) = fun((ReaderState0::ReaderStateType) -> {Input::input(), ReaderState::ReaderStateType})

Reader function which feeds data to parse_stream/4 and parse_stream/5.

Note function has to return eof as the last Input value or parse_stream/4 and parse_stream/5 will never finish otherwise.
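For illustration, a reader that feeds binaries from a list could be sketched as follows (hypothetical, not part of the module); it returns eof once the list is exhausted, as required:

```erlang
%% Hypothetical reader: the reader state is a list of binaries.
%% Once the list is empty, eof is returned so parse_stream/4,5 finish.
ListReader = fun([]) -> {eof, []};
                ([B | Bs]) -> {B, Bs}
             end.
```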


reader_state() = any()

row() = tuple()

rows() = [row()]

state() = any()

Internal parser state, which is an NIF resource type. Note its value is mutable, so strictly speaking it doesn't have to be passed from call to call. Nevertheless, the whole API is designed and used internally as if the state were immutable, which allows writing a pure Erlang implementation with exactly the same API. So please do not abuse this feature, because it could become incompatible in the future. On the other hand, the current implementation doesn't allow restarting parsing from the last correct state after an error.

After parsing has finished, all functions return the state in a condition which allows starting another parse with the same settings. That means parse_step/2 and parse_raw/3 after Input = eof, and parse_stream/4 and parse_stream/5 always.
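Assuming parse_step/2 returns {ok, Rows, State} in the same shape as parse_raw/3, the state reuse described above can be sketched as:

```erlang
%% Sketch: after Input = eof the returned state can start a fresh
%% parse with the same settings (assumes the {ok, Rows, State} shape).
S0 = ecsv:parser_init([]),
{ok, _Rows1, S1} = ecsv:parse_step(<<"a,b">>, S0),
{ok, _Rows2, S2} = ecsv:parse_step(eof, S1),
%% S2 can now be used for another document with the same options:
{ok, _Rows3, _S3} = ecsv:parse_step(<<"c,d\n">>, S2).
```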

Function Index

accumulator/0Return simple accumulator callback function.
block_chopper/1Return simple binary reader function.
default_block_size/0Default block size used by parser API functions.
file_reader/0Return file reader function.
parse/1Equivalent to parse(Bin, []).
parse/2Parse CSV data in binary and return rows in order.
parse_raw/3Parse Input and accumulate result.
parse_step/2Parse Input and return rows in order.
parse_stream/4Equivalent to parse_stream(Reader, ReaderState0, CallBack, CallbackState0, []).
parse_stream/5Parse stream produced by Reader and process by CallBack.
parser_init/1Initialise parser state.
write/1Equivalent to write_lines([Line]).
write_lines/1Experimental CSV writer.

Function Details

accumulator/0


accumulator() -> CallBack

Return simple accumulator callback function.

The callback function (see callback_fun()) reverses the rows in reaction to the {eof, _} callback message, so the returned final state is in order when used with parse_stream/4 and parse_stream/5.

Returned callback is equivalent to

  Accumulator = fun({eof, Rs}, Acc) -> lists:reverse(Rs ++ Acc);
                   ({rows, Rs}, Acc) -> Rs ++ Acc
                end

If efficiency is a concern (even though only new rows are appended to the accumulator), consider using parse_raw/3 directly.

See also: parse_raw/3, parse_stream/4, parse_stream/5.

block_chopper/1


block_chopper(BlockSize) -> Reader
  • BlockSize = pos_integer()
  • Reader = reader_fun(State)
  • State = binary()

Return simple binary reader function.

The function comes in handy when you already have the whole CSV data but would like to use parse_stream/4 or parse_stream/5 with a custom callback function working on chunks of the size defined by BlockSize.

See also: parse_stream/4, parse_stream/5.

default_block_size/0


default_block_size() -> 20480

Default block size used by parser API functions.

See also: file_reader/0, parse_raw/3.

file_reader/0


file_reader() -> Reader

Return file reader function.

Returned reader function reads file:io_device() using file:read/2 calls with default block size.

See also: default_block_size/0, parse_stream/4, parse_stream/5.

parse/1


parse(Bin) -> Rows

Equivalent to parse(Bin, []).

parse/2


parse(Bin, Opts) -> Rows

Parse CSV data in binary and return rows in order.
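An illustrative call (a sketch of the expected shape, with rows returned as tuples of binaries):

```erlang
1> ecsv:parse(<<"a,b\nc,d\n">>).
[{<<"a">>,<<"b">>},{<<"c">>,<<"d">>}]
```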

parse_raw/3


parse_raw(Input, State0, Acc) -> Result

Parse Input and accumulate the result.

This is a low-level parsing function which allows writing your own iterative parsing functions like parse_stream/5. Note it returns newly parsed rows in reverse order with the Acc content appended. All other parser functions use this function internally.

  1> {ok, R1, S1} = ecsv:parse_raw(<<"foo\nbar">>, ecsv:parser_init([]), []).
  {ok,[{<<"foo">>}],<<>>}
  2> {ok, R2, S2} = ecsv:parse_raw(<<"\nbaz\nquux">>, S1, R1).
  {ok,[{<<"baz">>},{<<"bar">>},{<<"foo">>}],<<>>}
  3> ecsv:parse_raw(eof, S2, R2).
  {ok,[{<<"quux">>},{<<"baz">>},{<<"bar">>},{<<"foo">>}],<<>>}

The function chops the Input binary into blocks of default_block_size/0; parsing one block should take 10-15% of a scheduler timeslice on a decent 2.6 GHz CPU, which keeps the VM responsive. You should not call the underlying NIF function directly.

parse_step/2


parse_step(Input, State0) -> Result

Parse Input and return rows in order.

This function allows writing a simple parsing loop over chunked data. It requires an initialised parser state and returns rows in order. The call with eof is necessary if the line terminator after the last row is missing. Use parse_raw/3 if the order of rows is not important to you, or if you want to accumulate rows and call lists:reverse/1 afterwards.

See also: parser_init/1, parse_raw/3.
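Such a loop can be sketched as follows (assumes parse_step/2 returns {ok, Rows, State} in the same shape as parse_raw/3; parse_chunks/1 is a hypothetical helper):

```erlang
%% Hypothetical helper: parse a list of binary chunks, keeping rows
%% in order. The trailing eof flushes a possibly unterminated last row.
parse_chunks(Chunks) ->
    parse_chunks(Chunks ++ [eof], ecsv:parser_init([]), []).

parse_chunks([], _State, Acc) ->
    Acc;
parse_chunks([Chunk | Rest], State0, Acc) ->
    {ok, Rows, State} = ecsv:parse_step(Chunk, State0),
    parse_chunks(Rest, State, Acc ++ Rows).
```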

parse_stream/4


parse_stream(Reader, ReaderState0, CallBack, CallbackState0) -> Result
  • Reader = reader_fun(ReaderStateType)
  • ReaderState0 = ReaderStateType
  • CallBack = callback_fun(CallBackStateType)
  • CallbackState0 = CallBackStateType
  • Result = {ReaderState, CallbackState, State}
  • ReaderState = ReaderStateType
  • CallbackState = CallBackStateType
  • State = state()

Equivalent to parse_stream(Reader, ReaderState0, CallBack, CallbackState0, []).

parse_stream/5


parse_stream(Reader, ReaderState0, CallBack, CallbackState0, StateOrOpts) -> Result
  • Reader = reader_fun(ReaderStateType)
  • ReaderState0 = ReaderStateType
  • CallBack = callback_fun(CallBackStateType)
  • CallbackState0 = CallBackStateType
  • StateOrOpts = state() | options()
  • Result = {ReaderState, CallbackState, State}
  • ReaderState = ReaderStateType
  • CallbackState = CallBackStateType
  • State = state()

Parse stream produced by Reader and process by CallBack.

The function parses Input from Reader (see reader_fun()) and feeds the result into CallBack (see callback_fun()).

The code

  {ok, Bin} = file:read_file("test/FL_insurance_sample.csv"),
  Rows = ecsv:parse(Bin).

produces the same result as

  {ok, FH} = file:open("test/FL_insurance_sample.csv", [read, raw, binary]),
  try ecsv:parse_stream(ecsv:file_reader(), FH, ecsv:accumulator(), []) of
      {_, Rows, _} -> Rows
  after file:close(FH)
  end.

or

  {ok, Bin} = file:read_file("test/FL_insurance_sample.csv"),
  BC = ecsv:block_chopper(ecsv:default_block_size()),
  {_, Rows, _} = ecsv:parse_stream(BC, Bin, ecsv:accumulator(), []).

But using parse_stream/4 or parse_stream/5 allows stream processing. For example

  Counter = fun({_, Rs}, {Fs, Ls}) ->
                {Fs + lists:sum([tuple_size(X) || X <- Rs]),
                 Ls + length(Rs)}
            end,
  {ok, FH2} = file:open("test/FL_insurance_sample.csv", [read, raw, binary]),
  try ecsv:parse_stream(ecsv:file_reader(), FH2, Counter, {0, 0}) of
      {_, {NumberOfFields, NumberOfRows} = Result, _} -> Result
  after file:close(FH2)
  end.

will be far more efficient than reading all rows into memory for big data files.

parser_init/1


parser_init(Opts) -> State

Initialise parser state.

Return a State for parsing CSV using the given Opts (see options()). See state() for more details about State behaviour.

strict
Force strict quoting rules.
null
An unquoted empty field is returned as the atom null. Compare
  1> ecsv:parse(<<"a,,b,\"\",c">>, []).
  [{<<"a">>,<<>>,<<"b">>,<<>>,<<"c">>}]
  2> ecsv:parse(<<"a,,b,\"\",c">>, [null]).
  [{<<"a">>,null,<<"b">>,<<>>,<<"c">>}]
all_lines
Return all rows, even empty ones. Compare
  1> ecsv:parse(<<"a\n\nb\n\n">>,[]).
  [{<<"a">>},{<<"b">>}]
  2> ecsv:parse(<<"a\n\nb\n\n">>,[all_lines]).
  [{<<"a">>},{},{<<"b">>},{}]
strict_finish
Force strict quoting rules only for last field of last row with missing line terminator.
{delimiter, D}
Define an alternative delimiter character. See {quote, Q} for an example.
{quote, Q}
Define an alternative quotation character. Compare
  1> ecsv:parse(<<"'a,\",';'b,\"\",''c';\"">>, [strict]).
  [{<<"'a">>,<<",';'b,\",''c';">>}]
  2> ecsv:parse(<<"'a,\",';'b,\"\",''c';\"">>, [strict, {delimiter, $;}, {quote, $'}]).
  [{<<"a,\",">>,<<"b,\"\",'c">>,<<"\"">>}]

write/1


write(Line) -> Result
  • Line = line()
  • Result = iolist()

Equivalent to write_lines([Line]).

write_lines/1


write_lines(Lines) -> Result
  • Lines = [line()]
  • Result = iolist()

Experimental CSV writer.

The function writes binaries, iolists, atoms, integers and floats. Fields are quoted, and quotes escaped, as needed.
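An illustrative use (a sketch; the exact quoting of the output follows the writer's rules):

```erlang
%% The embedded comma forces quoting of the third field; the result
%% is an iolist suitable for file:write_file/2 or a socket.
IO = ecsv:write_lines([[foo, 42, <<"a,b">>],
                       [3.14, <<"plain">>]]),
ok = file:write_file("out.csv", IO).
```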