Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When bom and skipRecordsWithError skip event is not raised for skipped records #411

Closed
omerlh opened this issue Dec 7, 2023 · 7 comments

Comments

@omerlh
Copy link

omerlh commented Dec 7, 2023

Describe the bug

Setting both options results in csv file parsed correctly but no event is fired for skipped lines.

To Reproduce

import { createReadStream } from 'node:fs';
import { finished } from 'node:stream/promises';
import { parse } from 'csv-parse';

const parser = parse({
  bom: true,
  skipRecordsWithError: true,
});

const outStream = createReadStream('tests/closed-box/fixtures/data-with-bom.csv').pipe(parser);

outStream.on('skip', (e) => {
  console.error(e);
});


await finished(outStream);

I would expect the console error to be called but it doesn't. When removing the bom option and using something like strip-bom-stream the event is fired correctly.

Additional context
data-with-bom.csv

@alexkorsun
Copy link

Any update on this issue?

@wdavidw
Copy link
Member

wdavidw commented May 13, 2024

Please look at this commit. I have tried to reproduce your error but it seems to run fine, catch the error passed in the "skip" event.

@alexkorsun
Copy link

alexkorsun commented May 13, 2024

There is a code that reproduces the issue on my side.

import path from 'path';
import { pipeline } from 'stream/promises';
import { parse as parseCSV } from 'csv-parse';
import { Writable } from 'stream';
import { createReadStream } from 'fs';

async function testRecordsSkip() {
  const errors: any = [];
  const records: any = [];

  const sink = new Writable({
    objectMode: true,
    write: (_, __, callback) => {
      records.push(_);
      callback();
    },
  });

  const csvSource = createReadStream(path.join(__dirname, 'test_with_bom.csv'));
  const parser = parseCSV({
    skip_records_with_error: true,
    bom: true,
  });
  parser.on('skip', function (err) {
    errors.push(err);
  });

  await pipeline(csvSource, parser, sink);

  console.log({
    records,
    errors,
  });
}

testRecordsSkip().catch(console.error);

The output is

{
  records: [
    [ 'id', 'first_name', 'last_name', 'email', 'modified_at' ],
    [ '1', 'Ring', 'Grinyov', '[email protected]', '2022-02-14' ],
    [ '3', 'Cammi', 'Bendix', '[email protected]', '2022-02-14' ]
  ],
  errors: []
}

Test file is there
test_with_bom.csv

@wdavidw
Copy link
Member

wdavidw commented May 13, 2024

I tried again with your code and the output is still what we expect:

{
  records: [
    [ 'id', 'first_name', 'last_name', 'email', 'modified_at' ],
    [ '1', 'Ring', 'Grinyov', '[email protected]', '2022-02-14' ],
    [ '3', 'Cammi', 'Bendix', '[email protected]', '2022-02-14' ]
  ],
  errors: [
    CsvError: Invalid Record Length: expect 5, got 6 on line 3
        at Object.__onRecord (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:364:11)
        at Object.parse (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:247:40)
        at Parser._transform (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/index.js:31:26)
        at Transform._write (node:internal/streams/transform:175:8)
        at writeOrBuffer (node:internal/streams/writable:392:12)
        at _write (node:internal/streams/writable:333:10)
        at Writable.write (node:internal/streams/writable:337:10)
        at ReadStream.ondata (node:internal/streams/readable:766:22)
        at ReadStream.emit (node:events:514:28)
        at addChunk (node:internal/streams/readable:324:12) {
      code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
      bytes: 141,
      comment_lines: 0,
      empty_lines: 0,
      invalid_field_length: 0,
      lines: 3,
      records: 2,
      columns: false,
      error: undefined,
      header: false,
      index: 6,
      raw: undefined,
      column: 6,
      quoting: false,
      record: [Array]
    }
  ]
}

Note, tested with Node.js versions "v16.13.0", "v18.17.1", "v20.5.1" and "v21.5.0".

@alexkorsun
Copy link

alexkorsun commented May 13, 2024

I tried again with your code and the output is still what we expect:

{
  records: [
    [ 'id', 'first_name', 'last_name', 'email', 'modified_at' ],
    [ '1', 'Ring', 'Grinyov', '[email protected]', '2022-02-14' ],
    [ '3', 'Cammi', 'Bendix', '[email protected]', '2022-02-14' ]
  ],
  errors: [
    CsvError: Invalid Record Length: expect 5, got 6 on line 3
        at Object.__onRecord (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:364:11)
        at Object.parse (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/api/index.js:247:40)
        at Parser._transform (file:///Users/david/projects/open-source/csv/csv/packages/csv-parse/lib/index.js:31:26)
        at Transform._write (node:internal/streams/transform:175:8)
        at writeOrBuffer (node:internal/streams/writable:392:12)
        at _write (node:internal/streams/writable:333:10)
        at Writable.write (node:internal/streams/writable:337:10)
        at ReadStream.ondata (node:internal/streams/readable:766:22)
        at ReadStream.emit (node:events:514:28)
        at addChunk (node:internal/streams/readable:324:12) {
      code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
      bytes: 141,
      comment_lines: 0,
      empty_lines: 0,
      invalid_field_length: 0,
      lines: 3,
      records: 2,
      columns: false,
      error: undefined,
      header: false,
      index: 6,
      raw: undefined,
      column: 6,
      quoting: false,
      record: [Array]
    }
  ]
}

Note, tested with Node.js versions "v16.13.0", "v18.17.1", "v20.5.1" and "v21.5.0".

I replaced your file with the one I shared and it doesn't work, but when I test it with the one from the link you shared, it works.

When I check the files, I see this, meaning that your file doesn't contain a BOM?

alex@undefined:~/WebstormProjects/test$ cat src/411.csv | file -
/dev/stdin: ASCII text
alex@undefined:~/WebstormProjects/test$ cat src/test_with_bom.csv | file -
/dev/stdin: Unicode text, UTF-8 (with BOM) text

@alexkorsun
Copy link

So there is a script that should reproduce an issue on your side

import assert from 'node:assert';
import { Writable } from 'node:stream';
import { parse } from 'csv-parse';
import { finished } from 'stream/promises';
import { Readable } from 'stream';

async function a() {
  const errors: any = [];
  const parser = parse({
    bom: true,
    skipRecordsWithError: true,
  });
  // Create a stream and consume its source
  const sink = new Writable({
    objectMode: true,
    write: (_, __, callback) => callback(),
  });

  const csv = `\ufeffkey,value
1,value-1
2,value-2
3,"value-3
one more line
and one more line"`;

  const csvSource = Readable.from([csv]);
  const outStream = csvSource.pipe(parser).pipe(sink);
  // Catch records with errors
  parser.on('skip', (e) => {
    errors.push(e);
  });
  // Wait for stream to be consumed
  await finished(outStream);
  // Catch error from skip event
  assert.equal(errors.length, 1);
}

a().catch(console.error);

@wdavidw
Copy link
Member

wdavidw commented May 13, 2024

My bad, I forgot that the issue was associated with the bom. Version "5.5.6" of csv-parse fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants