Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Methods #35

Closed
KyleGobel opened this issue Jan 29, 2020 · 7 comments
Closed

Async Methods #35

KyleGobel opened this issue Jan 29, 2020 · 7 comments

Comments

@KyleGobel
Copy link
Contributor

Kinda picking up off of https://github.com/elastacloud/parquet-dotnet/issues/20

I see now that the thrift lib ONLY has async methods, and the older clients are being removed next release.

I'm not sure how often parquet updates it's thrift spec, or how often thrift adds new features or w/e though, so this might not be a big deal.

Curious if In the past have you done any testing to see what kind of negative performance making some of the reading/writing async has?

Was considering helping and trying to update to the latest netstd thrift generation, but I'm more interested in making this faster....not slower ;)

@aloneguid
Copy link
Owner

This can be a good thing if you're writing a very busy service, I agree. We could prototype this in some way and compare the performance. It will be slightly lower on a single thread for sure, but for multiple transforms could be a big win.

@felipepessoto
Copy link
Contributor

Async won't make it faster, but will make it more scalable. For I/O, async is definitely the way to go.

We had a lot of async enhancements in the recent releases of .NET Core, like performance and memory allocation improvements and new API's. Do you have plans to drop support to old .NET versions and migrate to .NET 3.1/5?

Span can also bring performance improvements, many new Async APIs uses Span to avoid copying memory

@felipepessoto
Copy link
Contributor

Thrift for netstd supports only .NET 4.6.1+ : https://github.com/apache/thrift/blob/master/lib/netstd/README.md
It seems these isn't an NuGet package with the new library

@aloneguid
Copy link
Owner

The issue is Thrift part plays a miniscule role on performance, and will actually slow down a bit. You want async in the core where data pages are read from disk.

@felipepessoto
Copy link
Contributor

@aloneguid are you talking about APIs using BinaryReader/BinaryWriter?

I sent I PR where I changed the ParquetActor.ReadMetadata and ParquetRowGroupReader.ReadColumn to async for example, but yes, I started it based on Thrift APIs.

BinaryReader/BinaryWriter still don't have async overload, we may need to read directly from Stream, or use an Open Source alternative: https://github.com/ronnieoverby/AsyncBinaryReaderWriter

@felipepessoto
Copy link
Contributor

felipepessoto commented Aug 8, 2020

Partially blocked by dotnet/runtime#17229

@aloneguid
Copy link
Owner

closing as very old issue unlikely to be worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants