Deprecate and remove export to parquet feature #3156

Closed
lmsurpre opened this issue Jan 5, 2022 · 2 comments
Labels
bulk-data, deprecation, removal (this change involves removal of a component, class, method, etc.)

Comments

@lmsurpre
Member

lmsurpre commented Jan 5, 2022

IBM FHIR Server currently has an "export to parquet" feature that is disabled by default.
The feature has seen limited usage, and the current implementation brings in much of Apache Spark, which leads to the following issues when enabled:

  1. greatly increases the size of the ibm-fhir-server image
  2. increases the attack surface area (e.g. the recent Log4j kerfuffle)
  3. introduces jar hell, leading to issues like #3070 (Odd logging behavior when parquet export is enabled)

Unless we can come up with a much better implementation (no small feat), I think we should remove this feature and replace it with documentation that clearly shows how to convert the exported NDJSON to Parquet using spark (maybe a blog post?)

lmsurpre added the deprecation and removal labels on Jan 5, 2022
prb112 added a commit that referenced this issue Jan 21, 2022
lmsurpre pushed a commit that referenced this issue Jan 26, 2022
* Deprecate and remove export to parquet feature #3156

Signed-off-by: Paul Bastide <[email protected]>

* Update Parquet to Deprecated

Signed-off-by: Paul Bastide <[email protected]>
@lmsurpre
Member Author

> I think we should remove this feature and replace it with documentation that clearly shows how to convert the exported NDJSON to Parquet using spark (maybe a blog post?)

This piece of it is not done yet. I think we should do it either as part of this issue or in a new related task.

@lmsurpre
Member Author

I split the doc task into its own issue. The Spark and Stocator dependencies have been removed, and references to "export to parquet" have been removed from the documentation.
