[FIX] Cannot parse CSV with newlines #642

TungstnBallon · 2025-01-27T14:24:29Z

Closes #608.

Problem: The TextFile IOType contains a list of lines, so CSVInterpreter treats a string value containing a newline as two different rows.
Solution: Change TextFile to contain a single string instead of a list of lines. Existing blocks that relied on the lines already being split have been adapted accordingly. For example, the TextLineDeleter now splits the string into lines, deletes the required lines, and joins the remaining lines back into a single string.
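
To illustrate the problem and the split/delete/join approach, here is a minimal sketch. It is not the code from this PR: `deleteLines` is a hypothetical helper, and splitting on `\n` only is an assumption made to keep the example short.

```typescript
// Hypothetical helper, not the PR's actual implementation: the text file is
// kept as one string and only split into lines where a block needs lines.
function deleteLines(content: string, lineNumbers: number[]): string {
  const toDelete = new Set(lineNumbers); // 1-based line numbers
  return content
    .split('\n')
    .filter((_, index) => !toDelete.has(index + 1))
    .join('\n');
}

// A quoted CSV field that spans two physical lines.
const csv = 'id,comment\n1,"first line\nsecond line"\n2,plain';

// Splitting up front yields 4 physical lines for 3 CSV records,
// which is how a quoted newline ends up looking like an extra row.
const physicalLines = csv.split('\n');

// Deleting the header line keeps the quoted newline intact, because the
// result is joined back into a single string before any CSV parsing.
const withoutHeader = deleteLines(csv, [1]);
```

With a single string as the interchange format, only the CSV parser decides where a record ends, so quoted newlines survive the line-based blocks.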

joluj

Awesome work 🎉

libs/execution/src/lib/util/file-util.ts

libs/extensions/tabular/exec/src/lib/csv-interpreter-executor.spec.ts

joluj · 2025-01-27T17:52:59Z

libs/extensions/tabular/exec/src/lib/csv-interpreter-executor.ts

@@ -2,7 +2,10 @@
 //
 // SPDX-License-Identifier: AGPL-3.0-only

-import { parseString as parseStringAsCsv } from '@fast-csv/parse';
+// eslint-disable-next-line unicorn/prefer-node-protocol


I think I asked this before, why not just add the node protocol?

TungstnBallon · 2025-01-28

Importing node:assert gives me this linter error:

'node:assert' import is restricted from being used. Please use the library `assert` instead to keep browser compatibility. You might need to disable rule `unicorn/prefer-node-protocol` to do so (no-restricted-imports)

joluj · 2025-01-27T18:01:03Z

libs/extensions/tabular/exec/src/lib/csv-interpreter-executor.ts

+      (reason) => {
+        assert(typeof reason === 'string');
+        return R.err({
+          message: reason,
+          diagnostic: {
+            node: context.getCurrentNode(),
+            property: 'name',
+          },
+        });
+      },


Wow, I didn't know that you could pass a second parameter to .then :D I still prefer using .catch because I think it has more semantic value, but I don't mind this approach.
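
For readers comparing the two styles, here is a small sketch of the difference. `parse`, `Result`, `withThen`, and `withCatch` are hypothetical names used only for illustration; they are not part of the code under review.

```typescript
// Hypothetical stand-in for an async parse step that rejects with a string.
function parse(input: string): Promise<string[]> {
  return input.length > 0
    ? Promise.resolve(input.split(','))
    : Promise.reject('empty input');
}

type Result = { ok: true; rows: string[] } | { ok: false; message: string };

// Two-argument form: the rejection handler only sees rejections of `parse`,
// not errors thrown inside the success handler next to it.
function withThen(input: string): Promise<Result> {
  return parse(input).then(
    (rows): Result => ({ ok: true, rows }),
    (reason): Result => ({ ok: false, message: String(reason) }),
  );
}

// .catch form: also handles errors thrown by the preceding success handler.
function withCatch(input: string): Promise<Result> {
  return parse(input)
    .then((rows): Result => ({ ok: true, rows }))
    .catch((reason): Result => ({ ok: false, message: String(reason) }));
}
```

Both forms map a rejection of `parse` to an error result. They differ only when the success handler itself throws: the two-argument form lets that error propagate, while `.catch` converts it as well.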

joluj · 2025-01-27T18:08:27Z

I was wondering if we should add improvements for selecting just the first couple of rows.

I think it's a very common use case, and removing the first couple of rows of a large text file could lead to performance issues (if I understood the algorithm correctly).
Not something we need to tackle, but maybe keep in mind.
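
One possible direction, purely as a sketch: scan for line breaks instead of splitting the whole string. `firstLines` is a hypothetical helper that does not exist in the codebase, and it assumes `\n` line endings only.

```typescript
// Hypothetical helper: returns the first `count` lines of `content` by
// scanning for line breaks, so the rest of a large file is never split.
function firstLines(content: string, count: number): string {
  let end = -1;
  for (let i = 0; i < count; i++) {
    end = content.indexOf('\n', end + 1);
    if (end === -1) {
      return content; // fewer than `count` line breaks: keep everything
    }
  }
  // `end` now points at the line break that terminates line `count`.
  return content.slice(0, Math.max(end, 0));
}
```

The scan stops at the end of the requested lines, so the work is proportional to the selected prefix rather than to the full file size.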

TungstnBallon added 7 commits January 24, 2025 12:36

Add test to verify CSV with newlines

2e0bb8a

Change TextFile to not split lines prematurely

c849987

Add util transformTextFileLines()

347e663

Adapt TextFileInterpreter

f101247

Adapt TextLineDeleter

097b0e4

Adapt TextRangeSelector

62645ee

Adapt CSVInterpreter

cbf5a77

TungstnBallon self-assigned this Jan 27, 2025

TungstnBallon requested a review from joluj January 27, 2025 14:30

TungstnBallon mentioned this pull request Jan 27, 2025

[BUG] Cannot read properties of undefined (reading 'length') #641

Open

joluj approved these changes Jan 27, 2025

Add tests for transformTextFileLines

2663484

joluj approved these changes Jan 27, 2025

TungstnBallon merged commit 902ba93 into main Jan 28, 2025
3 checks passed

TungstnBallon deleted the 608-csv-newlines branch January 28, 2025 09:56

github-actions bot locked and limited conversation to collaborators Jan 28, 2025
