Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: set line terminator chars to correspond with csv writer #35

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

srpwnd
Copy link
Contributor

@srpwnd srpwnd commented Mar 12, 2024

Problem

Data with certain new line characters throws duckdb.duckdb.InvalidInputException: Invalid Input Error: Wrong NewLine Identifier. Expecting \r\n as described in #32.

Proposed changes

The Pythons csv.writer by default sets it's parameter of lineterminator to '\r\n' when writing the CSV files. This is not correctly reflected in DuckDB when trying to load these CSV files as it automatically assumes the new line character by contents of the file which is incorrect under certain circumstances. Setting the new_line argument in DuckDBs CSV COPY manually prevents this wrong assumption.

Types of changes

What types of changes does your code introduce to PipelineWise?

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

@srpwnd
Copy link
Contributor Author

srpwnd commented Mar 12, 2024

I can also edit this to make the line terminator characters configurable by user in both csv.writer and COPY so that everyone can chose between '\r\n', '\n', and '\r' based on their use case.

@jwills
Copy link
Owner

jwills commented Mar 12, 2024

Ah this is great Pavel, thank you! Tests are failing b/c MD doesn't yet support 0.10.0, taking a look

@jwills jwills merged commit c96527c into jwills:main Mar 12, 2024
1 check failed
@jwills
Copy link
Owner

jwills commented Mar 12, 2024

tested this locally and verified it worked-- thank you @srpwnd !

@srpwnd
Copy link
Contributor Author

srpwnd commented Mar 12, 2024

@jwills Thanks for quickly testing and merging this! 🙏

Will this be reflected in the PyPi/Meltano catalog straight away or do you need to make a release first?

@jwills
Copy link
Owner

jwills commented Mar 12, 2024

I think I need to make a release first, though of course you can use it right away by pointing at the Github main branch from Meltano

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants