Releases: GoogleCloudDataproc/hadoop-connectors
2019-02-13 (GCS 1.9.14, BQ 0.13.14)
This version has a bug that causes a spike in GCS list requests; please use version 1.9.15 instead.
Changelog
Cloud Storage connector:
- Implement the Hadoop File System concat method using the GCS compose API.
- Add Hadoop File System extended attributes support.
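Hadoop's concat appends a list of source files to a target file. A GCS compose request accepts at most 32 source objects, and its destination may also be one of its sources. The sketch below is a hypothetical planner, not the connector's actual code: it splits an arbitrary number of sources into compose calls that each fold the current target plus up to 31 sources back into the target. The name plan_concat and the chunking strategy are illustrative assumptions.

```python
MAX_COMPOSE_SOURCES = 32  # GCS limit on source objects per compose request


def plan_concat(target: str, sources: list[str]) -> list[list[str]]:
    """Hypothetical plan for appending `sources` to `target` via compose.

    Each inner list is the ordered source list of one compose request whose
    destination is `target`; the target is always the first component so
    that earlier data stays at the front.
    """
    if not sources:
        return []
    per_call = MAX_COMPOSE_SOURCES - 1  # one slot is taken by the target itself
    return [
        [target] + sources[i:i + per_call]
        for i in range(0, len(sources), per_call)
    ]
```

For 100 source objects this yields four sequential compose calls (31 + 31 + 31 + 7 sources), each of which must finish before the next one starts because every call rewrites the target.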
BigQuery connector:
- POM updates for GCS connector 1.9.14.
2019-02-04 (GCS 1.9.13, BQ 0.13.13)
This version has a bug that causes a spike in GCS list requests; please use version 1.9.15 instead.
Changelog
Cloud Storage connector:
- Fix implicit directory inference.
BigQuery connector:
- POM updates for GCS connector 1.9.13.
2019-01-30 (GCS 1.9.12, BQ 0.13.12)
This version has a bug in the implicit directory inference feature and a bug that causes a spike in GCS list requests; please use version 1.9.15 instead.
Changelog
Cloud Storage connector:
- Update all dependencies to latest versions.
- Improve GCS IO exception messages.
- Reduce latency of GCS IO operations.
- Fix a bug that could lead to data duplication when reading files with GZIP content encoding (HTTP header Content-Encoding: gzip) that have an uncompressed size of more than 2.14 GiB.
BigQuery connector:
- POM updates for GCS connector 1.9.12.
- Improve exception message for BigQuery job execution errors.
- Update all dependencies to latest versions.
2018-12-20 (GCS 1.9.11, BQ 0.13.11)
Changelog
Cloud Storage connector:
- Change the default value of fs.gs.path.encoding to 'uri-path', the new codec introduced in 1.4.5. The old behavior can be restored by setting fs.gs.path.encoding to 'legacy'.
- Update all dependencies to latest versions.
- Don't use the fs.gs.performance.cache.dir.metadata.prefetch.limit property to prefetch metadata in PerformanceCachingGoogleCloudStorage; always use a single objects list request instead, because prefetching metadata with multiple list requests (when a directory contains many files) could introduce performance penalties when using the performance cache.
- Add an option to lazily initialize GoogleHadoopFileSystem instances: fs.gs.lazy.init.enable (default: false)
- Add the ability to unset fs.gs.system.bucket with an empty string value: fs.gs.system.bucket=
- Set the default value of the fs.gs.working.dir property to /.
BigQuery connector:
- POM updates for GCS connector 1.9.11.
- Update all dependencies to latest versions.
2018-11-01 (GCS 1.9.10, BQ 0.13.10)
Changelog
Cloud Storage connector:
- Use the Hadoop CredentialProvider API to retrieve proxy credentials.
- Remove the 1024 compose components limit from the SYNCABLE_COMPOSITE output stream type.
BigQuery connector:
- POM updates for GCS connector 1.9.10.
2018-10-19 (GCS 1.9.9, BQ 0.13.9)
Changelog
Cloud Storage connector:
- Add an option for running flat and regular glob search algorithms in parallel: fs.gs.glob.concurrent.enable (default: true)
  The connector returns the result of whichever algorithm finishes first and cancels the other one.
- Add an option to provide a path to a configuration override file: fs.gs.config.override.file (default: null)
  The connector overrides its configuration with the values provided in this file. The file should be in XML format and follow the same schema as Hadoop configuration files.
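Hadoop configuration files use a <configuration> root element containing <property> entries, each with a <name> and a <value>. As a minimal sketch of the override semantics described above, assuming a simple "later value wins" merge, the snippet below parses such a file and layers its values over a base configuration. The helper name apply_overrides and the sample property value are illustrative, not the connector's code.

```python
import xml.etree.ElementTree as ET

# Illustrative override file in Hadoop configuration XML schema.
OVERRIDE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.gs.glob.concurrent.enable</name>
    <value>false</value>
  </property>
</configuration>
"""


def apply_overrides(base: dict[str, str], xml_text: str) -> dict[str, str]:
    """Return a copy of `base` with every <property> in `xml_text` applied."""
    root = ET.fromstring(xml_text)
    merged = dict(base)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:  # skip malformed entries without a name
            merged[name.strip()] = (value or "").strip()
    return merged
```

Properties absent from the override file keep their base values, so the file only needs to list the settings that should change.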
BigQuery connector:
- POM updates for GCS connector 1.9.9.
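The fs.gs.glob.concurrent.enable option in this release races two glob algorithms and keeps the first answer. The sketch below shows that "first result wins, cancel the rest" pattern with Python's concurrent.futures. It is an illustration under stated assumptions, not the connector's Java implementation: flat_glob and regular_glob are hypothetical stand-ins, and because Python cannot forcibly stop a running thread, cancellation is cooperative through a shared threading.Event.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def race(algorithms, pattern):
    """Run every algorithm on `pattern` in parallel and return the first result.

    Each algorithm receives a threading.Event; slower algorithms are expected
    to check it and return early once a winner has been found.
    """
    cancel = threading.Event()
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = [pool.submit(alg, pattern, cancel) for alg in algorithms]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        cancel.set()  # signal the slower algorithms to stop
        for future in pending:
            future.cancel()  # only prevents tasks that have not started yet
        # If several finish at once, `done` may hold more than one; take any.
        return next(iter(done)).result()


# Hypothetical stand-ins for the two search strategies.
def flat_glob(pattern, cancel):
    return ["gs://bucket/dir/a.txt"]  # pretend this strategy is fast


def regular_glob(pattern, cancel):
    if cancel.wait(timeout=5.0):  # True as soon as another algorithm wins
        return None  # abandoned: its result is no longer needed
    return ["gs://bucket/dir/a.txt"]
```

Leaving the with block waits for the losing worker to exit, which is why the losing algorithm must actually honor the cancel event.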
2018-10-03 (GCS 1.9.8, BQ 0.13.8)
Changelog
Cloud Storage connector:
- Expose FileChecksum in GoogleHadoopFileSystem via a property (valid values: NONE, CRC32C, MD5): fs.gs.checksum.type (default: NONE)
  The CRC32C checksum is compatible with HDFS-13056.
- Add support for proxy authentication for both the APACHE and JAVA_NET HttpTransport options. Proxy authentication is configurable with the properties:
  fs.gs.proxy.username (default: null)
  fs.gs.proxy.password (default: null)
- Update Apache HttpClient to the latest version.
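Setting fs.gs.checksum.type to CRC32C exposes checksums based on CRC-32C (Castagnoli), which GCS stores per object as a base64-encoded, big-endian 32-bit value. To illustrate the algorithm itself, the pure-Python sketch below computes CRC-32C with a lookup table and formats it the way GCS object metadata presents it. It does not reproduce how the connector wraps the value in a Hadoop FileChecksum; the function names are illustrative.

```python
import base64
import struct


def _make_crc32c_table():
    """Build the 256-entry lookup table for reflected CRC-32C."""
    poly = 0x82F63B78  # Castagnoli polynomial, bit-reversed form
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Compute CRC-32C (initial value and final XOR both 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c_b64(data: bytes) -> str:
    """Encode the checksum as GCS metadata does: big-endian bytes, base64."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")
```

The standard check input b"123456789" gives 0xE3069283, which GCS would display as "4waSgw==".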
BigQuery connector:
- POM updates for GCS connector 1.9.8.
2018-09-20 (GCS 1.9.7, BQ 0.13.7)
Changelog
Cloud Storage connector:
- Add an option to provide credentials directly in the Hadoop Configuration, without having to place a key file on every node or associate service accounts with GCE VMs:
  fs.gs.auth.service.account.private.key.id
  fs.gs.auth.service.account.private.key
- Add an option to specify the maximum number of bytes rewritten per rewrite request when fs.gs.copy.with.rewrite.enable is set to true: fs.gs.rewrite.max.bytes.per.call (default: 536870912)
  Even though GCS does not require this parameter for rewrite requests, rewrite requests are flaky without it.
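A JSON service account key file contains, among other fields, "private_key_id" and "private_key". As a minimal sketch, assuming those are the values the two new properties expect, the snippet below maps a key file onto them so they can be placed in the Hadoop Configuration. The helper name and the sample key are fake placeholders, and other authentication properties (for example, the service account identity) may also be required; check the connector documentation for the complete set.

```python
import json

# Fake, truncated key material for illustration only.
SAMPLE_KEY_JSON = json.dumps({
    "type": "service_account",
    "private_key_id": "0123abcd",
    "private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n",
    "client_email": "example@example-project.iam.gserviceaccount.com",
})


def hadoop_props_from_key_json(key_json: str) -> dict[str, str]:
    """Map a JSON service account key onto the connector's key properties."""
    key = json.loads(key_json)
    return {
        "fs.gs.auth.service.account.private.key.id": key["private_key_id"],
        "fs.gs.auth.service.account.private.key": key["private_key"],
    }
```

Because these values are secrets, storing them in the cluster configuration makes them visible to anyone who can read that configuration.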
BigQuery connector:
- POM updates for GCS connector 1.9.7.
2018-09-20 (GCS 1.6.10, BQ 0.10.11)
Changelog
Cloud Storage connector:
- Add an option to specify the maximum number of bytes rewritten per rewrite request when fs.gs.copy.with.rewrite.enable is set to true: fs.gs.rewrite.max.bytes.per.call (default: 536870912)
  Even though GCS does not require this parameter for rewrite requests, rewrite requests are flaky without it.
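The default of 536870912 bytes is 512 MiB (2^29 bytes). A rewrite that copies data in chunks needs repeated calls until it completes, so the limit roughly determines how many calls a large copy takes. The sketch below estimates that count, assuming each call copies exactly the configured maximum; actual counts depend on how much GCS chooses to copy per call, and the function name is illustrative.

```python
DEFAULT_MAX_BYTES_PER_CALL = 536_870_912  # 512 MiB, the documented default


def estimated_rewrite_calls(object_size: int,
                            max_bytes_per_call: int = DEFAULT_MAX_BYTES_PER_CALL) -> int:
    """Estimate rewrite calls, assuming each call copies the full maximum."""
    if max_bytes_per_call <= 0:
        raise ValueError("max_bytes_per_call must be positive")
    # Ceiling division in integers; at least one call is always issued.
    return max(1, -(-object_size // max_bytes_per_call))
```

Under this assumption a 5 GiB object needs 10 calls at the default, and one byte over 512 MiB already needs 2.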
BigQuery connector:
- POM updates for GCS connector 1.6.10.
2018-09-04 (GCS 1.6.9, BQ 0.10.10)
Changelog
Cloud Storage connector:
- Change the default values of the GCS batch/directory operation properties to improve performance:
  fs.gs.copy.max.requests.per.batch (default: 1 -> 15)
  fs.gs.copy.batch.threads (default: 50 -> 15)
  fs.gs.max.requests.per.batch (default: 25 -> 15)
  fs.gs.batch.threads (default: 25 -> 15)
- Update all dependencies to latest versions.
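To get a feel for the fs.gs.copy.max.requests.per.batch change, the sketch below counts how many batch HTTP calls a copy of many objects would need, assuming one copy request per object and full batches. This is back-of-the-envelope arithmetic, not a model of the connector's scheduling, which also depends on the thread-pool settings listed above.

```python
def batches_needed(num_requests: int, max_requests_per_batch: int) -> int:
    """Number of batch calls needed to send `num_requests` individual requests."""
    if max_requests_per_batch <= 0:
        raise ValueError("max_requests_per_batch must be positive")
    return -(-num_requests // max_requests_per_batch)  # ceiling division


OLD_COPY_BATCH_SIZE = 1   # previous default of fs.gs.copy.max.requests.per.batch
NEW_COPY_BATCH_SIZE = 15  # new default in this release
```

Copying 1000 objects goes from 1000 batch calls with the old default to 67 with the new one, which is why the thread count could drop from 50 to 15 without reducing throughput in the common case.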
BigQuery connector:
- POM updates for GCS connector 1.6.9.
- Poll BQ jobs in their correct locations.
- Update all dependencies to latest versions.