[GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg #3560
Codecov Report
@@              Coverage Diff              @@
##              master    #3560      +/-   ##
=============================================
+ Coverage      46.77%   46.80%   +0.03%
- Complexity     10512    10544      +32
=============================================
  Files           2099     2105       +6
  Lines          81988    82136     +148
  Branches        9132     9144      +12
=============================================
+ Hits           38349    38444      +95
- Misses         40096    40148      +52
- Partials        3543     3544       +1
Good start here, Meeth.
Do perhaps consider retitling the commit, since this is about IcebergDataset, IcebergFileSet and CopyEntities.
...-management/src/test/java/org/apache/gobblin/runtime/embedded/EmbeddedGobblinDistcpTest.java
gobblin-runtime/src/main/resources/templates/icebergDistcp.template
...ement/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTableFileSet.java
...management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergFileSet.java
...ment/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java
...gement/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java
Nice revision... very close now.
Again, I still suggest retitling, maybe to "Add IcebergDatasetsFinder to generate CopyEntities for Iceberg Distcp"?
...ment/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java
...ement/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTableFileSet.java
...gement/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java
c77f229 to a792af1
It almost looks like some changes you recently made have been backed out... I'll let you guide me before I continue re-reading.
...management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java
...ment/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java
...gement/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java
The test is looking much better! Now it's mostly just some tips on the late-breaking logging you added.
...management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java
...ment/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java
...ement/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTableFileSet.java
...-management/src/test/java/org/apache/gobblin/runtime/embedded/EmbeddedGobblinDistcpTest.java
...gement/src/test/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetTest.java
Great, Meeth, nice work!
...management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDataset.java
...ment/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergDatasetFinder.java
Looks great. I believe we're finally there, nice work!
…leSet to generate Copy Entities to support Distcp for Iceberg (apache#3560)
* initial commit for iceberg distcp.
* adding copy entity helper and iceberg distcp template and test case.
* Adding unit tests and refactoring method definitions for an Iceberg dataset.
* resolve conflicts after cleaning history
* update iceberg dataset and finder to include javadoc
* addressed comments on PR and aligned code check style
* renamed vars, added logging and updated javadoc
* update dataset descriptor with ternary operation and rename fs to sourceFs
* added source and target fs and update iceberg dataset finder constructor
* Update source and dest dataset methods as protected and add req args constructor
* change the order of attributes for iceberg dataset finder ctor
* update iceberg dataset methods with correct source and target fs

Co-authored-by: Meeth Gala <[email protected]>
…can_icebergs_incrementally
* upstream/master:
  [GOBBLIN-1704] Purge offline helix instances during startup (apache#3561)
  [GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders that match range (apache#3563)
  [GOBBLIN-1707] Add `IcebergTableTest` unit test (apache#3564)
  [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg (apache#3560)
  [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding (apache#3549)
  [GOBBLIN-1711] Replace Jcenter with maven central (apache#3566)
…one flow execution (#3558)
* address comments
* use connectionmanager when httpclient is not closeable
* [GOBBLIN-1706]Add DagActionStore to store the action to kill/resume one flow execution
* add new flow execution handler which use DagactionStore to persist dag actions and let other host get the info
* Make dag manager integrate with the dag action store
* address comments
* address comments
* fix typo and add comments
* [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552
* before starting reduce
* after first record is reduced
* after reducing every 1000 records
Co-authored-by: Urmi Mustafi <[email protected]>
* [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)
* [GOBBLIN-1673] Schema for dynamic work unit message
* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions
* [GOBBLIN-1698] Fast fail during work unit generation based on config. (#3542)
* fast fail during work unit generation based on config.
* [GOBBLIN-1690] Added logging to ORC writer Closes #3543 from rdsr/master
* [GOBBLIN-1678] Refactor git flowgraph component to be extensible (#3536)
* Refactor git flowgraph component to be extensible
* Move files to appropriate modules
* Cleanup and add javadocs
* Cleanup, add missing javadocs
* Address review and import order
* Fix findbugs
* Use java sort instead of collections
* Add GMCE topic explicitly to hive commit event (#3547)
* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (#3544)
* address comments
* use connectionmanager when httpclient is not closeable
* [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode
* add orchestrator as listener before service start
* fix code style
* address comments
* fix test case to test orchestrator as one listener of flow spec
* remove unintentional change
* remove unused import
* address comments
* fix typo
Co-authored-by: Zihan Li <[email protected]>
* fast fail during work unit generation based on config.
Co-authored-by: Meeth Gala <[email protected]>
Co-authored-by: Ratandeep <[email protected]>
Co-authored-by: William Lo <[email protected]>
Co-authored-by: Jack Moseley <[email protected]>
Co-authored-by: Zihan Li <[email protected]>
Co-authored-by: Zihan Li <[email protected]>
* Define basics for collecting Iceberg metadata for the current snapshot (#3559)
* [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal (#3554)
* remove jcentral
* Use gradle plugin portal for shadow
* Use maven central in all other cases
* [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (#3551)
* Allow first time failure to authenticate with Azkaban to fail silently
* Fix findbugs report
* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow
* Add handling for fetchSession throwing an exception
* Add logging when fails on constructor and initialization, but continue to local deploy
* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor
* Fixed vars
* Revert changes on azkabanSpecProducer
* clean up error throwing
* revert function checking changes
* Reformat file
* Clean up function
* Format file for try/catch
* Allow first time failure to authenticate with Azkaban to fail silently
* Fix findbugs report
* Refactor azkaban authentication into function. Call on init and if session_id is null when adding a flow
* Fixed rebase
* Fixed rebase
* Revert changes for azkabanSpecProducer, but quiet log instead of throw in constructor
* Add whitespace back
* fix helix job wait completion bug when job goes to STOPPING state (#3556) address comments update stoppingStateEndTime with currentTime update test cases
* [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs #3552
* before starting reduce
* after first record is reduced
* after reducing every 1000 records
Co-authored-by: Urmi Mustafi <[email protected]>
* Define basics for collecting Iceberg metadata for the current snapshot
* [GOBBLIN-1673][GOBBLIN-1683] Skeleton code for handling messages between task runner / application master for Dynamic work unit allocation (#3539)
* [GOBBLIN-1673] Schema for dynamic work unit message
* [GOBBLIN-1683] Dynamic Work Unit messaging abstractions
* Address review comments
* Correct import order
Co-authored-by: Matthew Ho <[email protected]>
Co-authored-by: Andy Jiang <[email protected]>
Co-authored-by: Hanghang Nate Liu <[email protected]>
Co-authored-by: umustafi <[email protected]>
Co-authored-by: Urmi Mustafi <[email protected]>
Co-authored-by: William Lo <[email protected]>
* [GOBBLIN-1710] Codecov should be optional in CI and not fail Github Actions (#3562)
* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)
* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding (#3549)
* address comments
* use connectionmanager when httpclient is not closeable
* fix test case to test orchestrator as one listener of flow spec
* remove unintentional change
* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding
* fix compilation error
* address comments
* address comments
* address comments
* update outdated javadoc
Co-authored-by: Zihan Li <[email protected]>
* [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg (#3560)
* initial commit for iceberg distcp.
* adding copy entity helper and iceberg distcp template and test case.
* Adding unit tests and refactoring method definitions for an Iceberg dataset.
* resolve conflicts after cleaning history
* update iceberg dataset and finder to include javadoc
* addressed comments on PR and aligned code check style
* renamed vars, added logging and updated javadoc
* update dataset descriptor with ternary operation and rename fs to sourceFs
* added source and target fs and update iceberg dataset finder constructor
* Update source and dest dataset methods as protected and add req args constructor
* change the order of attributes for iceberg dataset finder ctor
* update iceberg dataset methods with correct source and target fs
Co-authored-by: Meeth Gala <[email protected]>
* [GOBBLIN-1707] Add `IcebergTableTest` unit test (#3564)
* Add `IcebergTableTest` unit test
* Fixup comment and indentation
* Minor correction of `Long` => `Integer`
* Correct comment
* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)
* Minor rename of local var
Co-authored-by: Matthew Ho <[email protected]>
* [GOBBLIN-1708] Improve TimeAwareRecursiveCopyableDataset to lookback only into datefolders that match range (#3563)
* Check datetime range validity prior to recursing
* Remove unused packages
* Remove extra line
* Reformat function
* Check string prior to parsing
* removed unused import
* Change checkpathdatetimevalidity to use available localdatetime library parsing functions
* Change to isempty
* Modify check path to be flexible
* Update javadoc
* Add unit tests and refactor
* change bind class as GOBBLIN-1697 get merged

Co-authored-by: Zihan Li <[email protected]>
Co-authored-by: umustafi <[email protected]>
Co-authored-by: Urmi Mustafi <[email protected]>
Co-authored-by: Matthew Ho <[email protected]>
Co-authored-by: meethngala <[email protected]>
Co-authored-by: Meeth Gala <[email protected]>
Co-authored-by: Ratandeep <[email protected]>
Co-authored-by: William Lo <[email protected]>
Co-authored-by: Jack Moseley <[email protected]>
Co-authored-by: Kip Kohn <[email protected]>
Co-authored-by: Andy Jiang <[email protected]>
Co-authored-by: Hanghang Nate Liu <[email protected]>
…not only the current one (#3569)
* Add `IcebergTableTest` unit test
* Fixup comment and indentation
* Minor correction of `Long` => `Integer`
* Correct comment
* [GOBBLIN-1711] Replace Jcenter with maven central (#3566)
* Minor rename of local var
* Extend `IcebergTable` to collect Iceberg metadata across all snapshots
* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding (#3549)
* address comments
* use connectionmanager when httpclient is not closeable
* fix test case to test orchestrator as one listener of flow spec
* remove unintentional change
* [GOBBLIN-1697]Have a separate resource handler to rely on CDC stream to do message forwarding
* fix compilation error
* address comments
* address comments
* address comments
* update outdated javadoc
Co-authored-by: Zihan Li <[email protected]>
* [GOBBLIN-1709] Create Iceberg Datasets Finder, Iceberg Dataset and FileSet to generate Copy Entities to support Distcp for Iceberg (#3560)
* initial commit for iceberg distcp.
* adding copy entity helper and iceberg distcp template and test case.
* Adding unit tests and refactoring method definitions for an Iceberg dataset.
* resolve conflicts after cleaning history
* update iceberg dataset and finder to include javadoc
* addressed comments on PR and aligned code check style
* renamed vars, added logging and updated javadoc
* update dataset descriptor with ternary operation and rename fs to sourceFs
* added source and target fs and update iceberg dataset finder constructor
* Update source and dest dataset methods as protected and add req args constructor
* change the order of attributes for iceberg dataset finder ctor
* update iceberg dataset methods with correct source and target fs
Co-authored-by: Meeth Gala <[email protected]>
* Update `IcebergDataset` to use `IcebergTable.getIncrementalSnapshotInfosIterator` rather than `.getCurrentSnapshotInfo`
* Augment `IcebergDatasetTest` unit test to exercise multi-snapshot icebergs
* Minor javadoc Update
* Throw `IcebergTable.TableNotFoundException` when no such table found

Co-authored-by: Matthew Ho <[email protected]>
Co-authored-by: Zihan Li <[email protected]>
Co-authored-by: Zihan Li <[email protected]>
Co-authored-by: meethngala <[email protected]>
Co-authored-by: Meeth Gala <[email protected]>
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Tests
Commits