Changelog

3.2.1 (2023-11-26)

Correct typespec for decode, reported in #125 by @AntoineAugusti

3.2.0 (2023-09-24)

Strict mode: Exception messages of thrown exceptions are now redacted by default to avoid data unintentionally leaking into logs. This behaviour change is not considered to be breaking backwards compatibility since source data presented in exception messages is not considered part of the CSV public API.
Strict mode: Exception messages can be unredacted using the unredact_exceptions option
Normal mode: Error messages can be redacted using the redact_errors option
Option to (un)redact exception messages contributed in #122 by @taylor-redden-papa
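
A minimal sketch of how the two redaction options above might be used, written as a standalone script. The sample file, its contents, and the `~> 3.2` version requirement are assumptions for illustration; only the option names `redact_errors` and `unredact_exceptions` come from the entries above.

```elixir
# Standalone script sketch; Mix.install needs network access to fetch the package.
Mix.install([{:csv, "~> 3.2"}])

# Hypothetical sample file: the second row has an unterminated escape sequence.
path = Path.join(System.tmp_dir!(), "redaction_example.csv")
File.write!(path, "id,secret\n1,\"hunter2\n")

# Normal mode: errors are returned in the stream.
# redact_errors asks the library to keep source data out of the error messages.
path
|> File.stream!()
|> CSV.decode(redact_errors: true)
|> Enum.each(&IO.inspect/1)

# Strict mode: exception messages are redacted by default.
# unredact_exceptions opts back in to seeing the source data, e.g. while debugging.
try do
  path
  |> File.stream!()
  |> CSV.decode!(unredact_exceptions: true)
  |> Enum.to_list()
rescue
  error -> IO.puts(Exception.message(error))
end
```

Leaving the defaults in place is probably the safer choice wherever logs are shipped to third parties.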

3.0.5 (2022-12-03)

Exclude dialyzer files from library package, contributed in #121 by @milmazz

3.0.4 (2022-11-19)

Add missing escape_max_lines to decode options typespec, closes #120

3.0.3 (2022-11-04)

Ensure that reparsing of lines with stray escape characters does not produce duplicate error output, closes #119
Deduplication of type specs in #118 contributed by @joseph-lozano
Documentation fixes and improvements contributed by @jamesvl in #115

3.0.2 (2022-11-03)

Ensure that escaped fields as the last field on the last line without a newline are included in the results - fixes #117 raised by @superhawk610

3.0.1 (2022-10-25)

Ensure that stray escape quotes and unterminated escape sequences on a last line without a newline produce errors

3.0.0 (2022-10-25)

Replace the parallel parser/lexer with a binary matching parser for better performance.
A new :field_transform option allows specifying a function that is applied to every field during decoding
Escape characters can now be specified using the :escape_character option. Closes #59
The library will now reparse lines that follow e.g. an unterminated escape sequence. This ensures that all possible valid rows will be returned in normal mode
Encoding checks have been removed because they can either be done using :field_transform or outside the library
Better docs

Upgrading from 2.x

Parallelism has been removed, alongside its options :num_workers and :worker_work_ratio. You can safely remove them.
StrayQuoteError is now StrayEscapeCharacterError. If you catch this error in your code, you need to rename it.

The :strip_fields option needs to be replaced with the :field_transform option:

File.stream!("data.csv") |> CSV.decode(field_transform: &String.trim/1)

:validate_row_length now defaults to false. This option produces an error for rows of differing length. Set it to true to get the same behaviour as in 2.x.
:escape_formulas is now :unescape_formulas for decode and decode!. It is still :escape_formulas for encode. Change :escape_formulas to :unescape_formulas in decode calls to get the same behaviour as in 2.x
:escape_max_lines now defaults to 10 instead of 1000. To get the same behaviour as in 2.x, use:
```
File.stream!("data.csv") |> CSV.decode(escape_max_lines: 1000)
```
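
As a rough sketch, the three option changes above could be combined in a single decode call to approximate the 2.x defaults. The sample file and the `~> 3.2` version requirement are assumptions for illustration; the option names are the ones listed above.

```elixir
# Standalone script sketch; Mix.install needs network access to fetch the package.
Mix.install([{:csv, "~> 3.2"}])

# Hypothetical sample file whose second row is longer than the first.
path = Path.join(System.tmp_dir!(), "upgrade_example.csv")
File.write!(path, "a,b\nc,d,e\n")

path
|> File.stream!()
|> CSV.decode(
  # 2.x validated row length by default
  validate_row_length: true,
  # formerly :escape_formulas in decode calls
  unescape_formulas: true,
  # 2.x default for escape sequences spanning multiple lines
  escape_max_lines: 1000
)
|> Enum.each(&IO.inspect/1)
```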

:replace has been removed. CSV will now return fields with incorrect encoding as-is. You can use the new :field_transform option to provide a function that transforms fields while they are being parsed. This makes it possible, for example, to replace incorrectly encoded characters:

defp replace_bad_encoding(field) do
  if String.valid?(field) do
    field
  else
    field
    |> String.codepoints()
    |> Enum.map(fn codepoint -> if String.valid?(codepoint), do: codepoint, else: "?" end)
    |> Enum.join()
  end
end
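
The function above is private, so it has to live inside a module of your own. One way to wire it up, as a sketch: the module name `EncodingFix` is hypothetical, and the function is made public here so it can be captured from outside the module.

```elixir
defmodule EncodingFix do
  # Same logic as the snippet above, made public so it can be
  # captured as &EncodingFix.replace_bad_encoding/1.
  def replace_bad_encoding(field) do
    if String.valid?(field) do
      field
    else
      field
      |> String.codepoints()
      |> Enum.map(fn codepoint -> if String.valid?(codepoint), do: codepoint, else: "?" end)
      |> Enum.join()
    end
  end
end

# The byte 0xFF never occurs in valid UTF-8, so it is replaced with "?".
broken = "caf" <> <<0xFF>> <> "!"
IO.inspect(EncodingFix.replace_bad_encoding(broken))

# With the library loaded, the function would be passed to decode like this:
# File.stream!("data.csv") |> CSV.decode(field_transform: &EncodingFix.replace_bad_encoding/1)
```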

2.5.0 (2022-09-17)

Optional parameter escape_formulas to prevent CSV injection. Fixes #103 reported by @maennchen. Contributed by @maennchen in PR #104.
Optional parameter force_quotes to force quotes when encoding contributed by @stuart
Bugfix to pass non UTF-8 lines through in normal mode so other lines can be processed. Fixes #107. Contributed by @al2o3cr.
Allow encoding keyword lists specifying headers as values, contributed by @michaelchu
Better docs thanks to @kianmeng

2.4.1 (2020-09-12)

Fix unnecessary escaping of delimiters when encoding. Fixes #70 reported by @karmajunkie

2.4.0 (2020-09-12)

Fix StrayQuoteError not getting passed the correct arguments in strict mode. Fixes #96.
When headers are present multiple times and the :headers option is set to true, parse the values into a list. Contributed by @MrAlexLau in PR #97.

2.3.1 (2019-03-30)

Fix StrayQuoteError incorrectly getting raised when escape sequences end in new lines. Fixes #89. Raised by @rockwood in Issue #96.

2.3.0 (2019-03-17)

Add StrayQuoteError which gets raised when a row has stray quotes rather than EscapeSequenceError to help with common encoding errors.

2.2.0 (2019-03-03)

Make syntax compatible with latest Elixir releases
Add validate_row_length: option defaulting to true to allow disabling validation of row length.

2.0.0 (2017-05-29)

Make decode return row and error tuples instead of raising errors directly
Make old behaviour of raising errors directly available via decode!
Improve error messages for escape sequences
Rewrite parts of the pipeline to be more modular
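
A small sketch of what consuming the tuple-based results might look like. The list below is a hand-written stand-in for the output of decode, assuming the tuples take the shape `{:ok, row}` and `{:error, message}`; the error message text is made up for illustration.

```elixir
# Stand-in for the enumerable returned by decode (assumed tuple shapes).
results = [
  {:ok, ["a", "b"]},
  {:error, "row 2 is malformed"},
  {:ok, ["c", "d"]}
]

# Separate valid rows from errors instead of aborting on the first failure.
{oks, errors} = Enum.split_with(results, &match?({:ok, _}, &1))

rows = Enum.map(oks, fn {:ok, row} -> row end)
messages = Enum.map(errors, fn {:error, message} -> message end)

IO.inspect(rows)
IO.inspect(messages)
```

Code that prefers the old fail-fast behaviour can call decode! instead and rescue the raised exception.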

1.4.4 (2016-11-12)

Load parallel_stream as an app dependency to avoid load level errors. See issue #56 reported by @luk3thomas

1.4.3 (2016-08-27)

Fix a case where lines would not be aggregated correctly, see #52 reported by @yury-dimov

1.4.2 (2016-06-20)

Update dependency on parallel_stream

1.4.1 (2016-05-21)

Fix condition where rows would be dropped when decoding from stateful streams. See #39 reported by @moxley

1.4.0 (2016-04-03)

Add option to specify headers in encode - added in #34 by @barruumrex

1.3.3 (2016-03-25)

Fix empty streams raising a lexer error - raised in #28 by @kiliancs

1.3.2 (2016-03-08)

Cleanup, removing some unused defaults in function headers to remove compile time warnings

1.3.1 (2016-03-08)

Fix :strip_cells not stripping cells when multiple options are specified - #29 by @tomjoro

1.3.0 (2016-03-01)

Now supports linebreaks inside escaped fields (#13)
Raises an error when row length mismatches across rows
Uses parallel_stream for parallelism

1.2.4 (2016-02-06)

Fix encoding of double quotes

1.2.3 (2016-01-19)

Fix a condition where headers: true would enumerate the whole file once before parsing

1.2.2 (2016-01-02)

Fix default num_pipes argument to evaluate num_pipes dependent on scheduler at runtime
Test utf-8 files with BOM
Syntax and mix updates for elixir 1.2

1.2.1 (2015-10-17)

Decoder performance optimisations

1.2.0 (2015-10-11)

Use Stream.transform/4 - incompatible with Elixir < 1.1.0

1.1.5 (2015-10-11)

Decoder refactor from Stream.resource/3 to Stream.transform/3 in order to get more predictable stream behaviour
Rows now get processed in order
Fix a bug where stream would get evaluated before being decoded

1.1.4 (2015-09-13)

Fix a bug where headers could be out of order

1.1.3 (2015-09-12)

Fix a bug where headers could get parsed as the first row

1.1.2 (2015-09-05)

Fix a bug where calls to decode with num_pipes: 1 would yield varying results due to leftover state in decoder message queue

1.1.1 (2015-07-14)

Rescue from errors in stream producer to get more predictable behaviour in case of failure

1.1.0 (2015-07-12)

Better error messages when encountering invalid encodings

1.0.1 (2015-07-11)

Indicate consolidate_protocols for better encoding performance

1.0.0 (2015-05-24)

Use bytes as separators

0.2.3 (2015-05-24)

Add benchmarking

0.2.2 (2015-05-20)

Use utf-8 bytes instead of codepoints for multi-byte parsing

0.2.1 (2015-05-20)

Fix handling of multi-byte utf-8 characters

0.2.0 (2015-03-25)

Implement encoder protocol
