Deprecate and remove export to parquet feature #3156 (#3228)
* Deprecate and remove export to parquet feature #3156

Signed-off-by: Paul Bastide <[email protected]>

* Update Parquet to Deprecated

Signed-off-by: Paul Bastide <[email protected]>
prb112 authored Jan 26, 2022
1 parent 07a077b commit 8d5cc0f
Showing 11 changed files with 22 additions and 458 deletions.
4 changes: 2 additions & 2 deletions docs/src/pages/guides/FHIRBulkOperations.md
@@ -2,7 +2,7 @@
layout: post
title: IBM FHIR Server Bulk Data Guide
description: IBM FHIR Server Bulk Data Guide
- date: 2021-03-10
+ date: 2022-01-20
permalink: /FHIRBulkOperations/
---

@@ -23,7 +23,7 @@ The `$export` operation uses three OperationDefinitions:
- [Patient](http://hl7.org/fhir/uv/bulkdata/STU1/OperationDefinition-patient-export.html) - Obtain a set of resources pertaining to all patients. Exports to an S3-compatible data store.
- [Group](http://hl7.org/fhir/uv/bulkdata/STU1/OperationDefinition-group-export.html) - Obtain a set of resources pertaining to patients in a specific Group. Only supports static membership; does not resolve inclusion/exclusion criteria.

- The export may be to the ndjson or parquet format.
+ The export is in the ndjson format.

### **$export: Create a Bulk Data Request**
To create an export request, the IBM FHIR Server requires the body fields of the request object to be a FHIR Resource `Parameters` JSON Object. The request must be posted to the server using `POST`. Each request is limited to a single resource type in each imported or referenced file.
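
A minimal sketch of such a request body, assuming the standard bulk data parameters `_type` and `_since` (values are placeholders):

```json
{
    "resourceType": "Parameters",
    "parameter": [
        {
            "name": "_type",
            "valueString": "Patient"
        },
        {
            "name": "_since",
            "valueInstant": "2022-01-01T00:00:00Z"
        }
    ]
}
```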
20 changes: 2 additions & 18 deletions docs/src/pages/guides/FHIRServerUsersGuide.md
@@ -2,8 +2,8 @@
layout: post
title: IBM FHIR Server User's Guide
description: IBM FHIR Server User's Guide
- Copyright: years 2017, 2021
- lastupdated: "2021-12-03"
+ Copyright: years 2017, 2022
+ lastupdated: "2022-01-20"
permalink: /FHIRServerUsersGuide/
---

@@ -1337,7 +1337,6 @@ The Bulk Data web application writes the exported FHIR resources to an IBM Cloud
"accessKeyId": "example",
"secretAccessKey": "example-password"
},
"enableParquet": false,
"disableBaseUrlValidation": true,
"disableOperationOutcomes": true,
"duplicationCheck": false,
@@ -1512,7 +1511,6 @@ Example of `path` based access:
"accessKeyId": "example",
"secretAccessKey": "example-password"
},
"enableParquet": false,
"disableBaseUrlValidation": true,
"disableOperationOutcomes": true,
"duplicationCheck": false,
@@ -1554,7 +1552,6 @@ Example of `host` based access:
"accessKeyId": "example",
"secretAccessKey": "example-password"
},
"enableParquet": false,
"disableBaseUrlValidation": true,
"disableOperationOutcomes": true,
"duplicationCheck": false,
@@ -1611,15 +1608,6 @@ This feature is useful for imports which follow a prefix pattern:
### 4.10.3 Integration Testing
For integration testing, the `fhir-server-test` module includes `ExportOperationTest.java`, with server integration test cases for system, patient, and group export, and `ImportOperationTest.java` for import. These tests rely on `fhir-server-config-db2.json`, which specifies two storageProviders.

- ### 4.10.4 Export to Parquet
- Version 4.4 of the IBM FHIR Server introduced experimental support for exporting to Parquet format (as an alternative to the default NDJSON export). However, due to the size of the dependencies needed to make this work, this feature is disabled by default.
-
- To enable export to parquet, an administrator must:
- 1. make Apache Spark (version 3.0) and the IBM Stocator adapter (version 1.1) available to the fhir-bulkdata-webapp by dropping the necessary jar files under the `fhir-server/userlib` directory; and
- 2. set the `/fhirServer/bulkdata/storageProviders/(source)/enableParquet` config property to `true`
-
- An alternative way to accomplish the first part of this is to change the scope of these dependencies in the fhir-bulkdata-webapp pom.xml and rebuild the webapp to include them.
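
For reference, the removed toggle sat under each storageProvider entry; a minimal sketch, assuming a provider named `default`:

```json
{
    "fhirServer": {
        "bulkdata": {
            "storageProviders": {
                "default": {
                    "enableParquet": true
                }
            }
        }
    }
}
```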

### 4.10.5 Job Logs
Because the bulk import and export operations are built on Liberty's java batch implementation, users may need to check the [Liberty batch job logs](https://www.ibm.com/support/knowledgecenter/SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_batch_view_joblog.html) for detailed step information / troubleshooting.

@@ -2232,7 +2220,6 @@ This section contains reference information about each of the configuration prop
|`fhirServer/bulkdata/storageProviders/<source>/fileBase`|string|The absolute path of the output directory. It is recommended that this path not be the mount point of a volume. For instance, if a volume is mounted at /output/bulkdata, use /output/bulkdata/data to ensure a failed mount does not result in writing to the root file system.|
|`fhirServer/bulkdata/storageProviders/<source>/validBaseUrls`|list|The list of supported URLs which are approved for the FHIR server to access|
|`fhirServer/bulkdata/storageProviders/<source>/disableBaseUrlValidation`|boolean|Disables the URL checking feature, allowing all URLs to be imported|
- |`fhirServer/bulkdata/storageProviders/<source>/enableParquet`|boolean|Whether or not the server is configured to support export to parquet; to properly enable it the administrator must first make spark and stocator available to the fhir-bulkdata-webapp (e.g through the shared lib at `wlp/user/shared/resources/lib`)|
|`fhirServer/bulkdata/storageProviders/<source>/disableOperationOutcomes`|boolean|Disables the creation of OperationOutcome resources during import|
|`fhirServer/bulkdata/storageProviders/<source>/duplicationCheck`|boolean|Enables duplication check on import|
|`fhirServer/bulkdata/storageProviders/<source>/validateResources`|boolean|Enables the validation of imported resources|
@@ -2363,7 +2350,6 @@ This section contains reference information about each of the configuration prop
|`fhirServer/bulkdata/cosFileMaxSize`|209715200|
|`fhirServer/bulkdata/patientExportPageSize`|200|
|`fhirServer/bulkdata/useFhirServerTrustStore`|false|
- |`fhirServer/bulkdata/enableParquet`|false|
|`fhirServer/bulkdata/ignoreImportOutcomes`|false|
|`fhirServer/bulkdata/enabled`|true |
|`fhirServer/bulkdata/core/api/trustAll`|false|
@@ -2386,7 +2372,6 @@ This section contains reference information about each of the configuration prop
|`fhirServer/bulkdata/core/defaultOutcomeProvider`|default|
|`fhirServer/bulkdata/core/enableSkippableUpdates`|true|
|`fhirServer/bulkdata/storageProviders/<source>/disableBaseUrlValidation`|false|
- |`fhirServer/bulkdata/storageProviders/<source>/enableParquet`|false|
|`fhirServer/bulkdata/storageProviders/<source>/disableOperationOutcomes`|false|
|`fhirServer/bulkdata/storageProviders/<source>/duplicationCheck`|false|
|`fhirServer/bulkdata/storageProviders/<source>/validateResources`|false|
@@ -2554,7 +2539,6 @@ must restart the server for that change to take effect.
|`fhirServer/bulkdata/storageProviders/<source>/fileBase`|Y|Y|
|`fhirServer/bulkdata/storageProviders/<source>/validBaseUrls`|Y|Y|
|`fhirServer/bulkdata/storageProviders/<source>/disableBaseUrlValidation`|Y|Y|
- |`fhirServer/bulkdata/storageProviders/<source>/enableParquet`|Y|Y|
|`fhirServer/bulkdata/storageProviders/<source>/disableOperationOutcomes`|Y|Y|
|`fhirServer/bulkdata/storageProviders/<source>/duplicationCheck`|Y|Y|
|`fhirServer/bulkdata/storageProviders/<source>/validateResources`|Y|Y|
65 changes: 5 additions & 60 deletions fhir-bulkdata-webapp/pom.xml
@@ -107,6 +107,11 @@
<artifactId>fhir-provider</artifactId>
<version>${project.version}</version>
</dependency>
+ <dependency>
+     <groupId>jakarta.servlet</groupId>
+     <artifactId>jakarta.servlet-api</artifactId>
+     <scope>provided</scope>
+ </dependency>
<dependency>
<!-- azure needs to come before spark/stocator -->
<groupId>com.azure</groupId>
@@ -126,66 +131,6 @@
<groupId>com.azure</groupId>
<artifactId>azure-core</artifactId>
</dependency>
- <dependency>
-     <groupId>org.apache.spark</groupId>
-     <artifactId>spark-sql_2.12</artifactId>
-     <scope>provided</scope>
-     <exclusions>
-         <exclusion>
-             <groupId>org.apache.avro</groupId>
-             <artifactId>avro</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.apache.avro</groupId>
-             <artifactId>avro-mapred</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.apache.zookeeper</groupId>
-             <artifactId>zookeeper</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.glassfish.jersey.core</groupId>
-             <artifactId>jersey-client</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.glassfish.jersey.core</groupId>
-             <artifactId>jersey-common</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.glassfish.jersey.containers</groupId>
-             <artifactId>jersey-container-servlet</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.glassfish.jersey.core</groupId>
-             <artifactId>jersey-server</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.glassfish.jersey.inject</groupId>
-             <artifactId>jersey-hk2</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>io.netty</groupId>
-             <artifactId>netty-all</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.apache.hive</groupId>
-             <artifactId>hive-storage-api</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.apache.orc</groupId>
-             <artifactId>orc-mapreduce</artifactId>
-         </exclusion>
-         <exclusion>
-             <groupId>org.apache.arrow</groupId>
-             <artifactId>arrow-vector</artifactId>
-         </exclusion>
-     </exclusions>
- </dependency>
- <dependency>
-     <groupId>com.ibm.stocator</groupId>
-     <artifactId>stocator</artifactId>
-     <scope>provided</scope>
- </dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>

This file was deleted.

@@ -1,5 +1,5 @@
/*
- * (C) Copyright IBM Corp. 2020, 2021
+ * (C) Copyright IBM Corp. 2020, 2022
*
* SPDX-License-Identifier: Apache-2.0
*/
@@ -21,8 +21,6 @@
import javax.enterprise.context.Dependent;
import javax.inject.Inject;

- import org.apache.spark.sql.SparkSession;
-
import com.ibm.fhir.bulkdata.jbatch.context.BatchContextAdapter;
import com.ibm.fhir.bulkdata.jbatch.export.data.ExportCheckpointUserData;
import com.ibm.fhir.exception.FHIRException;
@@ -62,24 +60,6 @@ public void beforeJob() throws Exception {
// Register the context to get the right configuration.
ConfigurationAdapter adapter = ConfigurationFactory.getInstance();
adapter.registerRequestContext(ctx.getTenantId(), ctx.getDatastoreId(), ctx.getIncomingUrl());

-         if (adapter.isStorageProviderParquetEnabled(ctx.getSource())) {
-             try {
-                 Class.forName("org.apache.spark.sql.SparkSession");
-
-                 // Create the global spark session
-                 SparkSession.builder().appName("parquetWriter")
-                     // local : Run Spark locally with one worker thread (i.e. no parallelism at all).
-                     // local[*] : Run Spark locally with as many worker threads as logical cores on your machine.
-                     .master("local[*]")
-                     // This undocumented feature allows us to avoid a bunch of unnecessary dependencies and avoid
-                     // launching the unnecessary SparkUI stuff, but there is some risk in using it as it's
-                     // undocumented.
-                     .config("spark.ui.enabled", false).getOrCreate();
-             } catch (ClassNotFoundException e) {
-                 logger.warning("No SparkSession in classpath; skipping spark session initialization");
-             }
-         }
} catch (Exception e) {
logger.log(Level.SEVERE, "ExportJobListener: beforeJob failed job[" + executionId + "]", e);
throw e;