
Optimize translation when Schema information is available in Spark Structured Streaming runner #19989

Closed
damccorm opened this issue Jun 4, 2022 · 1 comment



damccorm commented Jun 4, 2022

The Spark Structured Streaming runner supports Datasets that already carry Schema information, which Spark uses to optimize jobs (via Catalyst). This issue is to implement schema-aware translations of the transforms in the runner so that we benefit from the performance improvements Spark performs internally.

Note that we may also need to map Beam's core internal representations, such as WindowedValue, so that intermediate steps of the pipeline can be optimized as well.
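As an illustrative sketch of what such a mapping means (plain Java, deliberately not the actual Beam or Spark APIs — the `WindowedLike` class and its `toRow`/`fromRow` helpers are hypothetical), the point is to flatten a WindowedValue-style wrapper into a row with a fixed, known schema, the shape a schema-aware engine like Catalyst can exploit instead of treating the element as opaque serialized bytes:

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical stand-in for Beam's WindowedValue<T>: an element
// plus its windowing metadata (timestamp and window).
final class WindowedLike<T> {
    final T value;
    final Instant timestamp;
    final String windowId;

    WindowedLike(T value, Instant timestamp, String windowId) {
        this.value = value;
        this.timestamp = timestamp;
        this.windowId = windowId;
    }

    // Flatten into a positional row with a fixed schema
    // (value, timestamp_millis, window_id) -- the kind of shape a
    // schema-aware engine could use to prune columns or push down
    // predicates instead of deserializing the whole element.
    Object[] toRow() {
        return new Object[] { value, timestamp.toEpochMilli(), windowId };
    }

    @SuppressWarnings("unchecked")
    static <T> WindowedLike<T> fromRow(Object[] row) {
        return new WindowedLike<>((T) row[0],
                Instant.ofEpochMilli((Long) row[1]),
                (String) row[2]);
    }
}

public class SchemaMappingSketch {
    public static void main(String[] args) {
        WindowedLike<String> wv =
                new WindowedLike<>("hello", Instant.ofEpochMilli(1000L), "w0");
        Object[] row = wv.toRow();
        WindowedLike<String> back = WindowedLike.fromRow(row);
        // The mapping is lossless: value and metadata survive the round trip.
        System.out.println(Objects.equals(wv.value, back.value)
                && wv.timestamp.equals(back.timestamp)
                && wv.windowId.equals(back.windowId));
    }
}
```

In real runner code this role is played by a Spark `Encoder` (or `ExpressionEncoder`) for `WindowedValue<T>`; the sketch only shows why exposing the fields as a schema, rather than a serialized blob, is what makes Catalyst's optimizations applicable.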

Imported from Jira BEAM-9451. Original Jira may contain additional context.
Reported by: iemejia.


mosche commented Oct 21, 2022

Fixed with #22445

@mosche closed this as not planned on Oct 21, 2022.