
Parse a large uploaded gzipped csv file in the browser #1074

Open
psaffrey-biomodal opened this issue Nov 13, 2024 · 0 comments

The file size I'm using is about 200MB. I've already made this work by dumping everything into memory, but I want to use streams and put some custom logic into the parsing step to speed it up.

Doing this with streams is easy enough in Node:

import fs from 'fs'
import zlib from 'zlib'
import papaparse from 'papaparse'

const filePath = '/path/to/file.csv.gz'
const data = [];

const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT)

parser.on('data', (chunk) => {
  data.push(chunk);
});

parser.on('end', () => {
  console.log(data)
});

fs.createReadStream(filePath)
  .pipe(zlib.createGunzip())
  .pipe(parser);

To do the same in the browser (as far as I can tell), you need to turn an &lt;input&gt; file into a stream, push it through a DecompressionStream('gzip').writable, and then pipe that into a stream-capable papaparse parser. So I have this:

const fileInput = document.getElementById('selectFileBtn');

async function parseGzippedCsv(file) {
  const data = [];
  const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT);

  parser.on('data', (chunk) => {
    data.push(chunk);
  });

  parser.on('end', () => {
    // I actually need to wrap this in a Promise to make it work, but it fails before it gets here
    resolve(data);
  });

  file.stream()
    .pipeTo(new DecompressionStream('gzip').writable)
    .pipeTo(parser);
}

fileInput.addEventListener('change', async function (e) {
  const file = e.target.files[0];
  const data = await parseGzippedCsv(file);
});

The error is: TypeError: Cannot read properties of null (reading 'stream') on the line that creates the papaparse parser, so maybe I can't use papaparse.NODE_STREAM_INPUT in the browser...?

I've also tried to do something similar with csv-parse without success. I'm a bit surprised nobody else wants to do this in the browser 🤔
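For reference, the decompression half works fine with plain Web Streams, so one possible workaround is to drop the CSV library entirely and split lines by hand. A minimal sketch, assuming the CSV has no quoted fields containing commas or newlines (`parseGzippedCsvStream` is a made-up helper name, and the naive `split(',')` is not a full CSV parser):

```javascript
// Stream-decompress a gzipped CSV and invoke onRow for each parsed line,
// using only Web Streams APIs (DecompressionStream, TextDecoderStream).
async function parseGzippedCsvStream(gzippedStream, onRow) {
  const textStream = gzippedStream
    .pipeThrough(new DecompressionStream('gzip')) // bytes -> decompressed bytes
    .pipeThrough(new TextDecoderStream());        // bytes -> UTF-8 text chunks

  const reader = textStream.getReader();
  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    // Emit every complete line in the buffer; keep the partial tail.
    let nl;
    while ((nl = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, nl).replace(/\r$/, '');
      buffer = buffer.slice(nl + 1);
      if (line) onRow(line.split(',')); // naive split: no quoted-field handling
    }
  }
  if (buffer) onRow(buffer.split(',')); // trailing line without a newline
}
```

In the change handler this would be called as `await parseGzippedCsvStream(file.stream(), row => data.push(row))`. The same function also runs in Node 18+, where `DecompressionStream` and `TextDecoderStream` are global.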
