Releases: GoogleCloudDataproc/hadoop-connectors

2019-02-13 (GCS 1.9.14, BQ 0.13.14)

14 Feb 05:13

This version has a bug that causes a spike in GCS list requests; use version 1.9.15 instead.

Changelog

Cloud Storage connector:

  1. Implement Hadoop File System concat method using GCS compose API.
  2. Add Hadoop File System extended attributes support.

BigQuery connector:

  1. POM updates for GCS connector 1.9.14.

2019-02-04 (GCS 1.9.13, BQ 0.13.13)

04 Feb 20:44

This version has a bug that causes a spike in GCS list requests; use version 1.9.15 instead.

Changelog

Cloud Storage connector:

  1. Fix implicit directory inference.

BigQuery connector:

  1. POM updates for GCS connector 1.9.13.

2019-01-30 (GCS 1.9.12, BQ 0.13.12)

30 Jan 22:51

This version has a bug in the implicit directory inference feature and a bug that causes a spike in GCS list requests; use version 1.9.15 instead.

Changelog

Cloud Storage connector:

  1. Update all dependencies to latest versions.
  2. Improve GCS IO exception messages.
  3. Reduce latency of GCS IO operations.
  4. Fix bug that could lead to data duplication when reading files with GZIP content encoding (HTTP header Content-Encoding: gzip) that have uncompressed size of more than 2.14 GiB.

BigQuery connector:

  1. POM updates for GCS connector 1.9.12.
  2. Improve exception message for BigQuery job execution errors.
  3. Update all dependencies to latest versions.

2018-12-20 (GCS 1.9.11, BQ 0.13.11)

20 Dec 21:50

Changelog

Cloud Storage connector:

  1. Changed the default value of fs.gs.path.encoding to 'uri-path', the new codec introduced in 1.4.5. The old behavior can be restored by setting fs.gs.path.encoding to 'legacy'.

  2. Update all dependencies to latest versions.

  3. Stop using the fs.gs.performance.cache.dir.metadata.prefetch.limit property to prefetch metadata in PerformanceCachingGoogleCloudStorage. The connector now always issues a single object list request, because prefetching metadata with multiple list requests (when a directory contains many files) could degrade performance when the performance cache is in use.

  4. Add an option to lazily initialize GoogleHadoopFileSystem instances:

    fs.gs.lazy.init.enable (default: false)
    
  5. Add ability to unset fs.gs.system.bucket with an empty string value:

    fs.gs.system.bucket=
    
  6. Set default value for fs.gs.working.dir property to /.

BigQuery connector:

  1. POM updates for GCS connector 1.9.11.
  2. Update all dependencies to latest versions.

2018-11-01 (GCS 1.9.10, BQ 0.13.10)

01 Nov 21:58

Changelog

Cloud Storage connector:

  1. Use Hadoop CredentialProvider API to retrieve proxy credentials.
  2. Remove 1024 compose components limit from SYNCABLE_COMPOSITE output stream type.

BigQuery connector:

  1. POM updates for GCS connector 1.9.10.

2018-10-19 (GCS 1.9.9, BQ 0.13.9)

20 Oct 00:00

Changelog

Cloud Storage connector:

  1. Add an option for running flat and regular glob search algorithms in parallel:

    fs.gs.glob.concurrent.enable (default: true)
    

    The connector returns the result of whichever algorithm finishes first and cancels the other.

  2. Add an option to provide path for configuration override file:

    fs.gs.config.override.file (default: null)
    

    The connector overrides its configuration with the values provided in this file. The file must be in XML format and follow the same schema as Hadoop configuration files.
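
A minimal sketch of how an override file in the Hadoop XML schema (`<configuration>` containing `<property>` elements with `<name>` and `<value>`) could be layered on top of existing settings. The class and method names (`ConfigOverrideSketch`, `parseHadoopXml`, `applyOverrides`) are hypothetical, and the code uses only the JDK rather than the connector's own loading logic, which is not shown in these notes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: illustrates the override semantics only, not the connector's code.
class ConfigOverrideSketch {

  /** Parses Hadoop-style configuration XML into an ordered name -> value map. */
  static Map<String, String> parseHadoopXml(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Map<String, String> result = new LinkedHashMap<>();
      NodeList properties = doc.getElementsByTagName("property");
      for (int i = 0; i < properties.getLength(); i++) {
        Element property = (Element) properties.item(i);
        NodeList names = property.getElementsByTagName("name");
        NodeList values = property.getElementsByTagName("value");
        if (names.getLength() == 0 || values.getLength() == 0) {
          continue; // skip incomplete <property> entries
        }
        result.put(names.item(0).getTextContent().trim(),
            values.item(0).getTextContent().trim());
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Invalid configuration XML", e);
    }
  }

  /** Returns a copy of base in which every key present in overrides takes the override value. */
  static Map<String, String> applyOverrides(Map<String, String> base,
      Map<String, String> overrides) {
    Map<String, String> merged = new LinkedHashMap<>(base);
    merged.putAll(overrides);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("fs.gs.glob.concurrent.enable", "true");
    String overrideXml = "<configuration><property>"
        + "<name>fs.gs.glob.concurrent.enable</name><value>false</value>"
        + "</property></configuration>";
    System.out.println(applyOverrides(base, parseHadoopXml(overrideXml)));
  }
}
```

In a real deployment the XML would be read from the path given in fs.gs.config.override.file; the inline string here only keeps the sketch self-contained.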

BigQuery connector:

  1. POM updates for GCS connector 1.9.9.
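
The "first to finish wins" behavior described for fs.gs.glob.concurrent.enable can be sketched with the JDK's `ExecutorService.invokeAny`, which returns the result of one task that completes successfully and cancels the rest. This is an illustration under assumptions, not the connector's implementation: `ConcurrentGlobSketch` and `firstToFinish` are hypothetical names, and the two callables stand in for the flat and regular glob algorithms.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: races two strategies and keeps the first successful result.
class ConcurrentGlobSketch {

  static <T> T firstToFinish(Callable<T> flatSearch, Callable<T> regularSearch) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // invokeAny returns one successfully completed result; unfinished tasks are cancelled.
      return pool.invokeAny(List.of(flatSearch, regularSearch));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for glob search", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Both glob searches failed", e);
    } finally {
      pool.shutdownNow(); // interrupts any task that is still running
    }
  }

  public static void main(String[] args) {
    String winner = firstToFinish(
        () -> { Thread.sleep(500); return "flat"; }, // simulated slow algorithm
        () -> "regular");                            // simulated fast algorithm
    System.out.println("Result came from: " + winner);
  }
}
```

Note that `invokeAny` skips a task that fails first and waits for the other one, which may or may not match how the connector treats errors.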

2018-10-03 (GCS 1.9.8, BQ 0.13.8)

03 Oct 21:06

Changelog

Cloud Storage connector:

  1. Expose FileChecksum in GoogleHadoopFileSystem via property (valid values: NONE, CRC32C, MD5):

    fs.gs.checksum.type (default: NONE)
    

    The CRC32C checksum is compatible with HDFS-13056.

  2. Add support for proxy authentication for both APACHE and JAVA_NET HttpTransport options.

    Proxy authentication is configurable with properties:

    fs.gs.proxy.username (default: null)
    fs.gs.proxy.password (default: null)
    
  3. Update Apache HttpClient to the latest version.
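
For readers unfamiliar with the CRC32C value that fs.gs.checksum.type can expose, here is a small sketch that computes it with the JDK's `java.util.zip.CRC32C` (Java 9+). The class name `Crc32cSketch` and its methods are hypothetical; the base64 big-endian form is how GCS reports an object's `crc32c` field, and how the connector wraps the value into a Hadoop FileChecksum is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32C;

// Hypothetical helper: computes a CRC32C checksum with JDK classes only.
class Crc32cSketch {

  /** Returns the CRC32C (Castagnoli) checksum of data as an unsigned 32-bit value. */
  static long crc32c(byte[] data) {
    CRC32C crc = new CRC32C();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Encodes the checksum as base64 of its 4 big-endian bytes. */
  static String toBase64BigEndian(long checksum) {
    byte[] bytes = ByteBuffer.allocate(4).putInt((int) checksum).array();
    return Base64.getEncoder().encodeToString(bytes);
  }

  public static void main(String[] args) {
    long value = crc32c("123456789".getBytes(StandardCharsets.US_ASCII));
    System.out.println(Long.toHexString(value)); // prints e3069283, the standard CRC-32C check value
    System.out.println(toBase64BigEndian(value));
  }
}
```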

BigQuery connector:

  1. POM updates for GCS connector 1.9.8.
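
As background for the fs.gs.proxy.username and fs.gs.proxy.password properties in the Cloud Storage connector notes above, the sketch below shows what a username and password pair becomes on the wire under HTTP Basic proxy authentication (RFC 7617). It assumes the Basic scheme, which these notes do not confirm; `ProxyAuthSketch` and `basicProxyAuthorization` are hypothetical names, and the connector's transports may negotiate authentication differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds a Proxy-Authorization header value, assuming the Basic scheme.
class ProxyAuthSketch {

  /** Returns "Basic " followed by base64("username:password"). */
  static String basicProxyAuthorization(String username, String password) {
    String credentials = username + ":" + password;
    return "Basic " + Base64.getEncoder()
        .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Example credentials taken from RFC 7617, not real values.
    System.out.println("Proxy-Authorization: "
        + basicProxyAuthorization("Aladdin", "open sesame"));
  }
}
```

Because Basic credentials are only base64-encoded, not encrypted, they are best kept out of plain-text configuration files where possible.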

2018-09-20 (GCS 1.9.7, BQ 0.13.7)

20 Sep 19:36

Changelog

Cloud Storage connector:

  1. Add an option to provide credentials directly in Hadoop Configuration, without having to place a file on every node or associate service accounts with GCE VMs:

    fs.gs.auth.service.account.private.key.id
    fs.gs.auth.service.account.private.key
    
  2. Add an option to specify max bytes rewritten per rewrite request when fs.gs.copy.with.rewrite.enable is set to true:

    fs.gs.rewrite.max.bytes.per.call (default: 536870912)
    

    Even though GCS does not require this parameter for rewrite requests, rewrite requests are flaky without it.
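
To give a feel for the default of 536870912 bytes (512 MiB), the sketch below computes the minimum number of rewrite requests for an object of a given size, assuming each request rewrites exactly the configured maximum. That is an idealization: GCS may rewrite fewer bytes per call or finish in fewer calls. `RewriteChunkingSketch` and `minRewriteCalls` are hypothetical names.

```java
// Hypothetical helper: arithmetic only, no GCS calls are made.
class RewriteChunkingSketch {

  /** Default of fs.gs.rewrite.max.bytes.per.call: 512 MiB. */
  static final long DEFAULT_MAX_BYTES_PER_CALL = 536_870_912L;

  /**
   * Returns the smallest number of rewrite requests needed if every request
   * rewrote exactly maxBytesPerCall bytes (the last one possibly fewer).
   */
  static long minRewriteCalls(long objectSize, long maxBytesPerCall) {
    if (objectSize < 0 || maxBytesPerCall <= 0) {
      throw new IllegalArgumentException("Sizes must be non-negative and the cap positive");
    }
    if (objectSize == 0) {
      return 1; // an empty object still takes one request
    }
    return (objectSize + maxBytesPerCall - 1) / maxBytesPerCall; // ceiling division
  }

  public static void main(String[] args) {
    long fiveGiB = 5L * 1024 * 1024 * 1024;
    System.out.println(minRewriteCalls(fiveGiB, DEFAULT_MAX_BYTES_PER_CALL));
  }
}
```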

BigQuery connector:

  1. POM updates for GCS connector 1.9.7.

2018-09-20 (GCS 1.6.10, BQ 0.10.11)

20 Sep 17:19

Changelog

Cloud Storage connector:

  1. Add an option to specify max bytes rewritten per rewrite request when fs.gs.copy.with.rewrite.enable is set to true:

    fs.gs.rewrite.max.bytes.per.call (default: 536870912)
    

    Even though GCS does not require this parameter for rewrite requests, rewrite requests are flaky without it.

BigQuery connector:

  1. POM updates for GCS connector 1.6.10.

2018-09-04 (GCS 1.6.9, BQ 0.10.10)

04 Sep 22:40

Changelog

Cloud Storage connector:

  1. Change default values for GCS batch/directory operations properties to improve performance:

    fs.gs.copy.max.requests.per.batch (default: 1 -> 15)
    fs.gs.copy.batch.threads (default: 50 -> 15)
    fs.gs.max.requests.per.batch (default: 25 -> 15)
    fs.gs.batch.threads (default: 25 -> 15)
    
  2. Update all dependencies to latest versions.
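
The effect of raising fs.gs.copy.max.requests.per.batch from 1 to 15 can be illustrated by grouping pending requests into batches. This is a rough sketch under the assumption that the property caps how many requests go into one batch call; `BatchPartitionSketch` and `partition` are hypothetical names, and how the connector schedules batches across its threads is not covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of requests into fixed-size batches.
class BatchPartitionSketch {

  /** Splits requests into consecutive batches of at most maxRequestsPerBatch items. */
  static <T> List<List<T>> partition(List<T> requests, int maxRequestsPerBatch) {
    if (maxRequestsPerBatch <= 0) {
      throw new IllegalArgumentException("Batch size must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < requests.size(); start += maxRequestsPerBatch) {
      int end = Math.min(start + maxRequestsPerBatch, requests.size());
      batches.add(new ArrayList<>(requests.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> copies = new ArrayList<>();
    for (int i = 0; i < 40; i++) {
      copies.add(i); // stand-ins for 40 pending copy requests
    }
    System.out.println("Batches with old default 1: " + partition(copies, 1).size());
    System.out.println("Batches with new default 15: " + partition(copies, 15).size());
  }
}
```

Under this assumption, 40 copy requests need 40 batch calls with the old default and 3 with the new one.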

BigQuery connector:

  1. POM updates for GCS connector 1.6.9.
  2. Poll BQ jobs in their correct locations.
  3. Update all dependencies to latest versions.