- Correct typespec for decode, reported in #125 by @AntoineAugusti
- Strict mode: Exception messages of thrown exceptions are now redacted by default to avoid data unintentionally leaking into logs. This behaviour change is not considered to break backwards compatibility, since source data presented in exception messages is not considered part of the `CSV` public API.
- Strict mode: Exception messages can be unredacted using the `unredact_exceptions` option
- Normal mode: Error messages can be redacted using the `redact_errors` option
- Option to (un)redact exception messages contributed in #122 by @taylor-redden-papa
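The two redaction options above can be sketched as follows. This is a minimal illustration, not taken from the library's documentation: it assumes Elixir 1.12 or later (for `Mix.install/1`), a `~> 3.2` version requirement, and that a quote in the middle of an unescaped field is reported as an error. The sample lines and the `strict_message` helper are hypothetical.

```elixir
# Assumption: the options below are available in csv ~> 3.2
Mix.install([{:csv, "~> 3.2"}])

# Sample input; the second line contains a stray quote
lines = ["a,b\n", "c,d\"e\n"]

# Hypothetical helper: run strict mode and return the exception message,
# or nil if nothing was raised
strict_message = fn opts ->
  try do
    lines |> Stream.map(& &1) |> CSV.decode!(opts) |> Enum.to_list()
    nil
  rescue
    e -> Exception.message(e)
  end
end

# Strict mode: redacted by default, source data shown only on request
IO.inspect(strict_message.([]), label: "strict, default")
IO.inspect(strict_message.(unredact_exceptions: true), label: "strict, unredacted")

# Normal mode: error tuples contain source data unless redact_errors is set
normal = lines |> Stream.map(& &1) |> CSV.decode() |> Enum.to_list()
normal_redacted = lines |> Stream.map(& &1) |> CSV.decode(redact_errors: true) |> Enum.to_list()

IO.inspect(normal, label: "normal, default")
IO.inspect(normal_redacted, label: "normal, redacted")
```

Comparing the four outputs side by side shows which messages carry source data.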
- Add missing `escape_max_lines` to decode options typespec, closes #120
- Ensure that reparsing of lines with stray escape characters does not produce duplicate error output, closes #119
- Deduplication of type specs in #118 contributed by @joseph-lozano
- Documentation fixes and improvements contributed by @jamesvl in #115
- Ensure that escaped fields as the last field on the last line without a newline are included in the results - fixes #117 raised by @superhawk610
- Ensure that stray escape quotes and unterminated escape sequences on a last line without a newline produce errors
- Replace the parallel parser/lexer with a binary matching parser with better performance.
- A new `:field_transform` option allows specifying a function that is applied to every field as it is decoded
- Escape characters can now be specified using the `:escape_character` option, closes #59
- The library will now reparse lines that follow e.g. an unterminated escape sequence. This ensures that all possible valid rows will be returned in normal mode
- Encoding checks have been removed because they can either be done using `:field_transform` or outside the library
- Better docs
- Parallelism has been removed, alongside its options `:num_workers` and `:worker_work_ratio`. You can safely remove them.
- `StrayQuoteError` is now `StrayEscapeCharacterError`. If you catch this error in your code, you need to rename it.
- The `:strip_fields` option needs to be replaced with the `:field_transform` option:

  ```elixir
  File.stream!("data.csv") |> CSV.decode(field_transform: &String.trim/1)
  ```
- `:validate_row_length` now defaults to `false`. This option produces an error for rows with different length. Set it to `true` to get the same behaviour as in 2.x.
- `:escape_formulas` is now `:unescape_formulas` for `decode` and `decode!`. It is still `:escape_formulas` for `encode`. Change `:escape_formulas` to `:unescape_formulas` in `decode` calls to get the same behaviour as in 2.x.
- `:escape_max_lines` now defaults to `10` instead of `1000`. To get the same behaviour as in 2.x, use:

  ```elixir
  File.stream!("data.csv") |> CSV.decode(escape_max_lines: 1000)
  ```
- `:replace` has been removed. `CSV` will now return fields with incorrect encoding as-is. You can use the new `:field_transform` option to provide a function transforming fields while they are being parsed. This makes it possible, for example, to replace incorrectly encoded characters:

  ```elixir
  # Replaces every invalid byte sequence in a field with "?"
  defp replace_bad_encoding(field) do
    if String.valid?(field) do
      field
    else
      field
      |> String.codepoints()
      |> Enum.map(fn codepoint -> if String.valid?(codepoint), do: codepoint, else: "?" end)
      |> Enum.join()
    end
  end
  ```
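To show how such a function might be wired into a decode call, here is a minimal sketch. The `EncodingFix` module name and the sample input are hypothetical, and the script assumes Elixir 1.12 or later for `Mix.install/1`. The helper is public here so it can be captured from outside the module.

```elixir
# Assumption: :field_transform is available in csv ~> 3.0
Mix.install([{:csv, "~> 3.0"}])

defmodule EncodingFix do
  # Hypothetical helper module, not part of the CSV library.
  # Replaces every invalid byte sequence in a field with "?".
  def replace_bad_encoding(field) do
    if String.valid?(field) do
      field
    else
      field
      |> String.codepoints()
      |> Enum.map_join(fn cp -> if String.valid?(cp), do: cp, else: "?" end)
    end
  end
end

# The helper on its own: 0xE9 is not valid UTF-8 by itself
fixed = EncodingFix.replace_bad_encoding(<<"caf", 0xE9>>)
IO.inspect(fixed)

# The helper passed to the decoder; sample line with one badly encoded field
rows =
  [<<"a,caf", 0xE9, "\n">>]
  |> Stream.map(& &1)
  |> CSV.decode(field_transform: &EncodingFix.replace_bad_encoding/1)
  |> Enum.to_list()

IO.inspect(rows)
```

Keeping the transform in its own module makes it easy to test separately from any CSV input.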
- Optional parameter `escape_formulas` to prevent CSV injection. Fixes #103 reported by @maennchen. Contributed by @maennchen in PR #104.
- Optional parameter `force_quotes` to force quotes when encoding, contributed by @stuart
- Bugfix to pass non UTF-8 lines through in normal mode so other lines can be processed. Fixes #107. Contributed by @al2o3cr.
- Allow encoding keyword lists specifying headers as values, contributed by @michaelchu
- Better docs thanks to @kianmeng
- Fix unnecessary escaping of delimiters when encoding. Fixes #70 reported by @karmajunkie
- Fix StrayQuoteError not getting passed the correct arguments in strict mode. Fixes #96.
- When headers are present multiple times and the `:headers` option is set to `true`, parse the values into a list. Contributed by @MrAlexLau in PR #97.
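A minimal sketch of the duplicate header behaviour described above, assuming it still applies in a current `~> 3.0` release and that Elixir 1.12 or later is available for `Mix.install/1`. The sample lines are hypothetical. The value under the repeated header `"a"` is expected to be a list holding both fields.

```elixir
# Assumption: any csv 3.x release
Mix.install([{:csv, "~> 3.0"}])

# Header "a" appears twice, header "b" once
rows =
  ["a,a,b\n", "1,2,3\n"]
  |> Stream.map(& &1)
  |> CSV.decode!(headers: true)
  |> Enum.to_list()

# Each row is a map keyed by header name
IO.inspect(rows)
```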
- Fix StrayQuoteError incorrectly getting raised when escape sequences end in new lines. Fixes #89. Raised by @rockwood in Issue #96.
- Add StrayQuoteError which gets raised when a row has stray quotes rather than EscapeSequenceError to help with common encoding errors.
- Make syntax compatible with latest Elixir releases
- Add `validate_row_length:` option defaulting to true to allow disabling validation of row length.
- Make `decode` return row and error tuples instead of raising errors directly
- Make old behaviour of raising errors directly available via `decode!`
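The difference between the two functions can be sketched as follows. This is an illustration under assumptions: it uses a `~> 3.0` release rather than the version this entry belongs to, requires Elixir 1.12 or later for `Mix.install/1`, and the sample lines are hypothetical.

```elixir
# Assumption: any csv 3.x release
Mix.install([{:csv, "~> 3.0"}])

lines = ["a,b\n", "c,d\n"]

# decode wraps every row in an {:ok, row} or {:error, message} tuple
results = lines |> Stream.map(& &1) |> CSV.decode() |> Enum.to_list()

# decode! returns bare rows and raises on the first error instead
rows = lines |> Stream.map(& &1) |> CSV.decode!() |> Enum.to_list()

# With tuples, valid rows and errors can be handled separately
{oks, errors} = Enum.split_with(results, &match?({:ok, _}, &1))

IO.inspect(results, label: "decode")
IO.inspect(rows, label: "decode!")
IO.inspect({length(oks), length(errors)}, label: "ok/error counts")
```

Splitting the tuples lets a caller keep processing valid rows while collecting errors for later reporting.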
- Improve error messages for escape sequences
- Rewrite parts of the pipeline to be more modular
- Load `parallel_stream` as an app dependency to avoid load level errors. See issue #56 reported by @luk3thomas
- Fix a case where lines would not be aggregated correctly, see #52 reported by @yury-dimov
- Update dependency on `parallel_stream`
- Fix condition where rows would be dropped when decoding from stateful streams. See #39 reported by @moxley
- Add option to specify headers in encode, added in #34 by @barruumrex
- Cleanup, removing some unused defaults in function headers to remove compile time warnings
- Fix `:strip_cells` not stripping cells when multiple options are specified - #29 by @tomjoro
- Now supports linebreaks inside escaped fields (#13)
- Raises an error when row length mismatches across rows
- Uses parallel_stream for parallelism
- Fix encoding of double quotes
- Fix a condition where headers: true would enumerate the whole file once before parsing
- Fix default num_pipes argument to evaluate num_pipes dependent on scheduler at runtime
- Test utf-8 files with BOM
- Syntax and mix updates for elixir 1.2
- Decoder performance optimisations
- Use `Stream.transform/4` - incompatible with Elixir <1.1.0
- Decoder refactor from `Stream.resource/3` to `Stream.transform/3` in order to get more predictable stream behaviour
- Rows now get processed in order
- Fix a bug where stream would get evaluated before being decoded
- Fix a bug where headers could be out of order
- Fix a bug where headers could get parsed as the first row
- Fix a bug where calls to decode with num_pipes: 1 would yield varying results due to leftover state in decoder message queue
- Rescue from errors in stream producer to get more predictable behaviour in case of failure
- Better error messages when encountering invalid encodings
- Indicate `consolidate_protocols` for better encoding performance
- Use bytes as separators
- Add benchmarking
- Use utf-8 bytes instead of codepoints for multi-byte parsing
- Fix handling of multi-byte utf-8 characters
- Implement encoder protocol