Add chunkHandler callback for use inside worker thread. #265

Open
SamMousa opened this issue Oct 1, 2015 · 2 comments

Comments

@SamMousa

SamMousa commented Oct 1, 2015

I'm trying to create an interactive csv mapper / parser / uploader.

Assumptions:

  • CSV file is too large to fit in memory.
  • The file is uploaded in chunks.
  • The uploading / server-side processing is slower than the client-side parsing / processing.
  • I use web workers to keep the page responsive.
  • I'm queuing ajax requests and firing them one after another.

Problem:

  • Each time a chunk has been read the data gets copied to the main thread.
  • Data gets parsed faster than it gets consumed, but I cannot pause the worker.
  • So memory usage increases and browser (theoretically) runs out of memory.
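
The queueing described above, and the backlog it produces, can be sketched roughly as follows. This is only an illustration: `createUploadQueue`, `upload` and `highWaterMark` are made-up names, and `upload` stands in for whatever ajax call sends one chunk to the server.

```javascript
// Hypothetical sequential upload queue; not part of Papa Parse.
// `upload` is an assumed function that sends one chunk and returns a Promise.
function createUploadQueue(upload, highWaterMark) {
  const pending = [];
  let busy = false;

  // Sends queued chunks one after another (error handling omitted).
  async function drain() {
    busy = true;
    while (pending.length > 0) {
      await upload(pending.shift()); // only one request in flight at a time
    }
    busy = false;
  }

  return {
    // Returns false once the backlog exceeds the limit. That is the moment
    // the producer would need to pause, which a worker-based parse cannot do.
    push(chunk) {
      pending.push(chunk);
      if (!busy) drain();
      return pending.length <= highWaterMark;
    },
    size: () => pending.length,
  };
}
```

With a slow `upload`, `push` starts returning `false` after a few chunks, and without a way to pause the worker the `pending` array simply keeps growing.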

Possible solutions (also based on other issues mentioning pausing for web workers in general).

  • Add callback that gets copied to worker thread. This callback runs synchronously.
  • The callback can massage data before sending it to the main thread callback. (Or just compute some statistics and send those)
  • The worker callback can pause and resume parsing.
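
A minimal sketch of what such a worker-side callback could look like, assuming a hypothetical `chunkHandler` option. None of the names below (`makeControl`, `handleChunkInWorker`, `makeStatsHandler`) exist in Papa Parse; they only illustrate the three points above.

```javascript
// Hypothetical worker-side helpers; none of these names exist in Papa Parse.

// Control object handed to the handler so it can pause or resume parsing.
function makeControl() {
  const state = { paused: false };
  return {
    state,
    pause() { state.paused = true; },
    resume() { state.paused = false; },
  };
}

// Runs the handler synchronously on one parsed chunk and forwards only the
// handler's return value to the main thread. Returns true if parsing should
// continue with the next chunk.
function handleChunkInWorker(rows, chunkHandler, control, postToMain) {
  const result = chunkHandler(rows, control);
  if (result !== undefined) {
    postToMain(result);
  }
  return !control.state.paused;
}

// Example handler: post row counts instead of the rows themselves, and pause
// after every `batchSize` chunks so the consumer can catch up.
function makeStatsHandler(batchSize) {
  let chunksSeen = 0;
  return function statsHandler(rows, control) {
    chunksSeen += 1;
    if (chunksSeen % batchSize === 0) {
      control.pause();
    }
    return { rowCount: rows.length };
  };
}
```

Inside a real worker, `postToMain` would be `self.postMessage`. Note that functions cannot be sent through `postMessage`, so the handler would have to reach the worker some other way, for example as source text or as a script URL.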

Note that while #130 notes the downsides of allowing the asynchronous callback to pause workers, it does not explore other solutions.

For example, the parse object (Papa.parse(File, config)) could expose the pause, abort and resume functions and post messages to the worker. This would allow for asynchronous pausing (i.e. pausing would happen the next time the worker thread checks its messages), which is acceptable for most use cases.
For me it's unclear why resuming would be an issue. The only issue I currently see is that we cannot get a reference to the worker from outside the parse object, but that is something that can easily be fixed.
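
To illustrate what "asynchronous pausing" would mean on the worker side, here is a rough sketch. It assumes the worker yields to the event loop between chunks so that queued messages can be delivered. `createChunkLoop` and the `{ command }` message shape are made up for this example and are not how Papa Parse works internally.

```javascript
// Hypothetical worker-side loop; the names and message shape are assumptions.
function createChunkLoop(chunks, emit) {
  let paused = false;
  let aborted = false;
  let wake = null; // resolver that wakes the loop after a pause

  // In a real worker: self.onmessage = (event) => loop.onMessage(event.data);
  function onMessage(message) {
    if (message.command === "pause") paused = true;
    if (message.command === "resume") paused = false;
    if (message.command === "abort") aborted = true;
    if (message.command !== "pause" && wake) {
      wake();
      wake = null;
    }
  }

  async function run() {
    for (const chunk of chunks) {
      // Yield so that pending messages are handled before the next chunk.
      await new Promise((resolve) => setTimeout(resolve, 0));
      if (paused && !aborted) {
        // Sleep until a resume or abort message arrives.
        await new Promise((resolve) => { wake = resolve; });
      }
      if (aborted) return;
      emit(chunk);
    }
  }

  return { onMessage, run };
}
```

A pause request only takes effect at the next chunk boundary, so one more chunk may still arrive after calling pause. Resuming just wakes the loop up again.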

@SamMousa
Author

SamMousa commented Oct 1, 2015

Just noticed that when using workers, Papa.parse has no return value at all. It would be easy to create a management object that holds a reference to the worker and has pause and resume functions that send messages to the worker, right?
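
Something along these lines, as a sketch only. `createWorkerHandle` and the `{ command: ... }` message shape are assumptions for illustration; the only real API used is the worker's `postMessage`.

```javascript
// Hypothetical management object; Papa Parse does not return this.
// `worker` is anything with a postMessage method, such as a Web Worker.
function createWorkerHandle(worker) {
  const send = (command) => worker.postMessage({ command });
  return {
    pause: () => send("pause"),
    resume: () => send("resume"),
    abort: () => send("abort"),
  };
}
```

The worker would still need to listen for these messages and act on them between chunks.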

@adamreisnz

Since this is an issue open since 2015, I don't have high hopes of it being addressed.

But I too have exactly the same scenario as described above. I think it's a pretty significant shortcoming not to be able to pause processing in workers.

Yes, it will slow down processing as we're being warned of in the FAQ, but that should not be a reason to not have this feature at all in my opinion. The user will have to wait for server side processing to complete anyway, so it doesn't matter if processing is slower.

Currently we have no way of staggering that as the parser just dumps chunk after chunk which overloads the server.

The only way to solve this as far as I can see, would be to store the output chunks in memory, but that would defeat the purpose of streaming the file in the first place.
