Support compiling remaining DataFusion crates (datafusion-core) to WASM #7652

Open
Tracked by #13815 ...
alamb opened this issue Sep 25, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Sep 25, 2023

Is your feature request related to a problem or challenge?

As shown by @jonmmease in #7633, some of the datafusion crates can be compiled to WASM:

datafusion-common
datafusion-expr
datafusion-optimizer
datafusion-physical-expr
datafusion-sql

The difficulty with getting the remaining DataFusion crates compiled to WASM is that they have non-optional dependencies on the parquet crate with its default features enabled. Several of the default parquet crate features require native dependencies that are not compatible with WASM, in particular the lz4 and zstd features. If we can arrange our feature flags to make it possible to depend on parquet with these features disabled, then it should be possible to compile the core datafusion crate to WASM as well.

Describe the solution you'd like

One approach might be to disable the relevant parquet features that could not be compiled as described below.

From the discussion at https://github.com/apache/arrow-datafusion/pull/7633/files#r1335824930 between @jonmmease and @tustvold:

@tustvold do you have any thoughts about finagling the parquet crate's dependencies so it can compile, by default, on wasm? Should we perhaps change datafusion to disable the parquet default features?

 tustvold 

IIRC it is the compression codecs that have issues with WASM; disabling these by default would, I think, be surprising for users. Further, I'm not sure how useful parquet support would be, given that only the InMemory object_store is supported on WASM, although I may have some time to look into this over the next couple of days.

 jonmmease 

Yeah, I don't think we'd want DataFusion's default build to disable the default parquet features, but if we could arrange things so that depending on the datafusion core crate with default-features=false would either remove the parquet dependency altogether or disable the default parquet features, then I think we could get things at least compiling for wasm.

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Contributor Author

alamb commented Sep 26, 2023

A good first step might be to simply make parquet optional in DataFusion -- aka #7653

That would allow us to validate and explore what dependencies are blocking wasm compilation
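To illustrate what "making parquet optional" can look like on the Rust side, here is a minimal sketch that assumes a Cargo feature named `parquet`. The feature name, the `parquet_support` module, and `supported_formats` are all hypothetical names chosen for this example and are not taken from the DataFusion source. The idea is that any code touching the optional dependency lives behind `#[cfg(feature = "parquet")]`, so a build with that feature disabled never compiles it.

```rust
// Sketch only: `parquet` here is an assumed Cargo feature name, and all
// items below are hypothetical illustrations, not DataFusion APIs.

// This module is compiled only when the `parquet` feature is enabled.
// In a real crate it would be the only place that imports the optional
// dependency, so disabling the feature removes the dependency entirely.
#[cfg(feature = "parquet")]
mod parquet_support {
    pub const FORMAT: &str = "parquet";
}

/// Lists the file formats available in this particular build.
#[allow(unused_mut)] // `formats` is only mutated when the feature is on
pub fn supported_formats() -> Vec<&'static str> {
    let mut formats = vec!["csv", "json"];
    // The statement below disappears from the build when the feature is off.
    #[cfg(feature = "parquet")]
    formats.push(parquet_support::FORMAT);
    formats
}
```

With a layout like this, a WASM build could turn the feature off and then see which remaining dependencies still fail to compile, which is the exploration step suggested above.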

@tustvold
Contributor

tustvold commented Oct 1, 2023

apache/arrow-rs#4884 makes parquet compile for WASM

@alamb
Contributor Author

alamb commented Oct 25, 2023

Also, #7745 makes parquet support optional in DataFusion

@fudini

fudini commented Jan 20, 2024

I managed to compile for wasm, but I encountered a couple of problems:

  1. Stack overflow at SessionContext::new
  2. Use of std::time::Instant - this won't compile and probably needs to be hidden behind cfg
    main...fudini:arrow-datafusion:wasm

After these changes I was able to create SessionContext and run a simple query
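One possible way to hide `std::time::Instant` behind `cfg`, as suggested in the second point above, is sketched here. It is not the change made in the linked branch. `Stopwatch` and the `clock` module are hypothetical names, and the WASM fallback shown simply reports a zero duration, which is an assumption made to keep the example self-contained. A real port would more likely read a browser or host clock.

```rust
// Sketch only: `Stopwatch` and `clock` are hypothetical names for this example.

// Native targets: delegate to std::time::Instant.
#[cfg(not(target_arch = "wasm32"))]
mod clock {
    use std::time::{Duration, Instant};

    #[derive(Clone, Copy, Debug)]
    pub struct Stopwatch(Instant);

    impl Stopwatch {
        pub fn start() -> Self {
            Stopwatch(Instant::now())
        }

        pub fn elapsed(&self) -> Duration {
            self.0.elapsed()
        }
    }
}

// wasm32 targets: same interface, but Instant is never touched.
// Assumption: timing is simply reported as zero in this fallback.
#[cfg(target_arch = "wasm32")]
mod clock {
    use std::time::Duration;

    #[derive(Clone, Copy, Debug)]
    pub struct Stopwatch;

    impl Stopwatch {
        pub fn start() -> Self {
            Stopwatch
        }

        pub fn elapsed(&self) -> Duration {
            Duration::ZERO
        }
    }
}

// Callers use one type and never see the target-specific split.
pub use clock::Stopwatch;
```

Call sites would then use `Stopwatch::start()` and `elapsed()` instead of `Instant::now()` directly, so the target-specific choice stays in one place.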

@alamb
Contributor Author

alamb commented Dec 17, 2024

I have filed an epic to track additional WASM work here:
