[GOBBLIN-1620]Make yarn container allocation group by helix tag #3487

hanghangliu
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
  • GobblinHelixJobLauncher can now pick up a Helix tag and a resource config for each Gobblin job run. YarnAutoScalingManager and YarnService read these configs, so each allocated container hosts a Helix instance with the required tag and the desired resources. Because Helix instances can carry distinct tags, the tasks of each Gobblin job run are assigned to containers with the matching tag and resources.
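As a rough illustration of the routing described above, the plain-Java sketch below models each Helix instance as a name plus a set of tags and selects the instances that a tagged job run could be scheduled on. It has no Helix dependency; `TagRouting`, `eligibleInstances`, and the tag names are hypothetical. In Helix itself, this kind of restriction is expressed declaratively (we assume via an instance-group tag on the job), not with code like this.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration: each Helix instance (one per Yarn container) carries a set of tags,
// and a job run tagged with jobTag may only be scheduled on instances that carry that tag.
final class TagRouting {
  private TagRouting() {}

  /** Returns the names (sorted) of instances whose tag set contains jobTag. */
  static List<String> eligibleInstances(Map<String, Set<String>> instanceTags, String jobTag) {
    return new TreeMap<>(instanceTags).entrySet().stream()
        .filter(e -> e.getValue().contains(jobTag))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

With instances `worker_1 {default, highmem}`, `worker_2 {default}`, and `worker_3 {default, highmem}`, a job tagged `highmem` can only land on `worker_1` and `worker_3`, while a job tagged `default` can use all three.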

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Tested in the testing pipeline.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

… workflow so that containers will be assigned to correct task

update test cases

update helix instance tag during task runner initiation

update logs

update test case
codecov-commenter commented Mar 25, 2022

Codecov Report

Merging #3487 (ea22d96) into master (8c9c8a8) will increase coverage by 0.08%.
The diff coverage is 39.02%.

@@             Coverage Diff              @@
##             master    #3487      +/-   ##
============================================
+ Coverage     46.62%   46.71%   +0.08%     
- Complexity    10360    10408      +48     
============================================
  Files          2076     2078       +2     
  Lines         81071    81295     +224     
  Branches       9049     9082      +33     
============================================
+ Hits          37801    37975     +174     
- Misses        39789    39827      +38     
- Partials       3481     3493      +12     
Impacted Files Coverage Δ
...bblin/cluster/GobblinClusterConfigurationKeys.java 0.00% <ø> (ø)
...n/java/org/apache/gobblin/yarn/YarnHelixUtils.java 23.52% <0.00%> (-3.14%) ⬇️
...apache/gobblin/yarn/event/NewContainerRequest.java 0.00% <0.00%> (ø)
...main/java/org/apache/gobblin/yarn/YarnService.java 15.15% <8.53%> (-0.32%) ⬇️
...pache/gobblin/cluster/GobblinHelixJobLauncher.java 64.84% <16.66%> (-2.37%) ⬇️
.../org/apache/gobblin/cluster/GobblinTaskRunner.java 63.63% <42.85%> (+0.17%) ⬆️
...rg/apache/gobblin/yarn/YarnAutoScalingManager.java 57.41% <80.00%> (+6.13%) ⬆️
...pache/gobblin/yarn/YarnContainerRequestBundle.java 80.64% <80.64%> (ø)
.../java/org/apache/gobblin/cluster/SleepingTask.java 39.39% <0.00%> (-6.07%) ⬇️
...ce/modules/flowgraph/pathfinder/BFSPathFinder.java 75.00% <0.00%> (-5.00%) ⬇️
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8c9c8a8...ea22d96.

logger.info("Actual tags binding " + receiverManager.getClusterManagmentTool()
.getInstanceConfig(this.clusterName, this.helixInstanceName).getTags());
// The helix instance associated with this container should be consistent on helix tag
List<String> existedTags = receiverManager.getClusterManagmentTool()
Contributor

I don't understand why we need to query Helix for existing tags. Won't Yarn start containers and assign both job-specific and Helix-wide tags to each container?

Contributor Author

hanghangliu Apr 21, 2022

If a Helix instance already had a tag assigned during a previous run, that tag is not removed automatically, so we need to remove unwanted tags here. Added a comment explaining this.
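The stale-tag problem can be sketched as a pair of set differences: tags the instance already carries but no longer wants are removed, and desired tags it lacks are added. This is a minimal sketch assuming tags are plain strings; `TagReconciler` is hypothetical, and in a live cluster the two sets would be applied through Helix's `HelixAdmin.addInstanceTag` and `HelixAdmin.removeInstanceTag`.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: computes which Helix tags to add and which stale tags to remove so that an
// instance ends up with exactly the desired tag set. Applying the plan (e.g. via HelixAdmin
// addInstanceTag / removeInstanceTag) is left to the caller.
final class TagReconciler {
  final Set<String> toAdd;
  final Set<String> toRemove;

  private TagReconciler(Set<String> toAdd, Set<String> toRemove) {
    this.toAdd = toAdd;
    this.toRemove = toRemove;
  }

  static TagReconciler plan(Collection<String> existingTags, Collection<String> desiredTags) {
    // Tags left over from a previous run are not cleared automatically, so drop anything undesired.
    Set<String> remove = new LinkedHashSet<>(existingTags);
    remove.removeAll(desiredTags);
    // Desired tags that the instance does not carry yet.
    Set<String> add = new LinkedHashSet<>(desiredTags);
    add.removeAll(existingTags);
    return new TagReconciler(add, remove);
  }
}
```

For an instance tagged `{default, highmem}` whose new run wants `{default, gpu}`, the plan removes `highmem` and adds `gpu`, leaving `default` untouched.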

}
int containerCount = (int) Math.ceil(((double)partitions / this.partitionsPerContainer) * this.overProvisionFactor);
YarnHelixUtils.ensureResourceFitMaxCapacity(this.yarnService.getMaxResourceCapacity(), resource);
Contributor

Why do we need to do the max resource validation here? Isn't this already being done inside YarnService?

Contributor Author

Updated the logic. The max resource validation is now done only in YarnService, which fails fast if Yarn cannot meet our resource request.
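A minimal sketch of the two steps discussed in this thread, assuming a resource is just memory in MB plus vcores: the per-tag container count from the snippet above, and a fail-fast check that rejects a request exceeding Yarn's maximum container capacity instead of shrinking it. `ContainerSizing` and its methods are hypothetical and do not reproduce `YarnHelixUtils.ensureResourceFitMaxCapacity`.

```java
// Hypothetical sketch: per-tag container count plus a fail-fast capacity check. The Resource class
// and method names are illustrative only, not Gobblin or Yarn APIs.
final class ContainerSizing {
  static final class Resource {
    final int memoryMb;
    final int vcores;

    Resource(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  private ContainerSizing() {}

  /** ceil((partitions / partitionsPerContainer) * overProvisionFactor). */
  static int containerCount(int partitions, int partitionsPerContainer, double overProvisionFactor) {
    return (int) Math.ceil(((double) partitions / partitionsPerContainer) * overProvisionFactor);
  }

  /** Throws instead of silently adjusting a request that Yarn could never satisfy. */
  static void checkFitsMaxCapacity(Resource max, Resource requested) {
    if (requested.memoryMb > max.memoryMb || requested.vcores > max.vcores) {
      throw new IllegalArgumentException(String.format(
          "Requested %d MB / %d vcores exceeds Yarn max of %d MB / %d vcores",
          requested.memoryMb, requested.vcores, max.memoryMb, max.vcores));
    }
  }
}
```

For example, 25 partitions at 10 partitions per container with an over-provision factor of 1.0 yields ceil(2.5) = 3 containers, and a 16384 MB request against an 8192 MB maximum is rejected immediately rather than trimmed.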

@@ -260,12 +302,14 @@ void runInternal() {

// adjust the number of target containers based on the configured min and max container values.
numTargetContainers = Math.max(this.minContainers, Math.min(this.maxContainers, numTargetContainers));
trimContainerSize(numTargetContainers, yarnContainerRequestBundle);
Contributor

I am not sure we should dynamically trim the container size; it will lead to unpredictable behavior at runtime. I think we should fail fast if Yarn cannot meet our container request requirements. That is the behavior in YarnService.

Contributor Author

This is only to ensure that the number of containers stays within the min and max container values read from the config. Originally the resource requirement was the same for all containers. Since different Helix tags now have different resource requirements, I'm not sure there is a better way.

Contributor

Shouldn't we have this issue even without the changes in this PR? How is it handled currently? IIRC, the resource request just fails if it exceeds the max specified by Yarn.

Set<String> tagSet = resourceHelixTagMap.getOrDefault(resource.toString(), new HashSet<>());
tagSet.add(helixTag);
resourceHelixTagMap.put(resource.toString(), tagSet);
totalContainers += containerCount;
Contributor

What if add() is called multiple times with the same arguments? Should we avoid incrementing the totalContainers multiple times?

Contributor Author

I think that is the expected behavior. If multiple Gobblin job runs request the same resource requirement and Helix tag, add() is called multiple times and totalContainers needs to be incremented each time.
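The accumulation semantics described here can be sketched with a simplified bundle in which each `add()` call records one job run's demand. This is a hypothetical stand-in for `YarnContainerRequestBundle`, assuming one resource per Helix tag and a string key for the resource; it uses `Map.merge` and `computeIfAbsent` in place of the `getOrDefault`/`put` pair from the snippet above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for YarnContainerRequestBundle. A resource is reduced to a
// string key such as "memory:8192, vCores:2", and each tag is assumed to map to one resource.
final class ContainerRequestBundle {
  final Map<String, Integer> helixTagContainerCount = new HashMap<>();
  final Map<String, String> helixTagResource = new HashMap<>();
  final Map<String, Set<String>> resourceHelixTags = new HashMap<>();
  int totalContainers = 0;

  void add(String helixTag, int containerCount, String resource) {
    // Each call represents one job run's demand, so repeated calls accumulate rather than dedupe.
    helixTagContainerCount.merge(helixTag, containerCount, Integer::sum);
    helixTagResource.put(helixTag, resource);
    // Single-step equivalent of getOrDefault(...) followed by put(...).
    resourceHelixTags.computeIfAbsent(resource, k -> new HashSet<>()).add(helixTag);
    totalContainers += containerCount;
  }
}
```

Two job runs that each ask for `highmem` containers (3 and 2) therefore yield 5 containers for that tag, and both count toward `totalContainers`.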

@@ -781,6 +818,8 @@ public void onContainersAllocated(List<Container> containers) {
instanceName = null;
}
}
helixTagAllocatedContainerCount.put(containerHelixTag,
Contributor

Is this operation thread-safe? What happens if onContainersAllocated() is called from multiple callback threads?

Contributor Author

Yes, it needs to be thread-safe, so I put it inside the synchronized block.
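A minimal sketch of guarding the per-tag allocation count with a single lock, in the spirit of the synchronized block mentioned above. `AllocatedContainerCounter` and its methods are hypothetical; only the map name mirrors the snippet in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: container-allocation callbacks may run on multiple threads, so the
// read-modify-write on the per-tag count is done while holding one lock.
final class AllocatedContainerCounter {
  private final Object lock = new Object();
  private final Map<String, Integer> helixTagAllocatedContainerCount = new HashMap<>();

  void onContainerAllocated(String containerHelixTag) {
    synchronized (lock) {
      helixTagAllocatedContainerCount.put(containerHelixTag,
          helixTagAllocatedContainerCount.getOrDefault(containerHelixTag, 0) + 1);
    }
  }

  int count(String helixTag) {
    synchronized (lock) {
      return helixTagAllocatedContainerCount.getOrDefault(helixTag, 0);
    }
  }
}
```

A `ConcurrentHashMap` with `merge(tag, 1, Integer::sum)` would make this single update atomic on its own; a shared lock is the simpler choice when the count has to stay consistent with other fields updated in the same callback.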

hanghangliu requested a review from sv2000 April 21, 2022 17:32
@@ -128,6 +138,11 @@ public YarnAutoScalingManager(GobblinApplicationMaster appMaster) {

this.autoScalingExecutor = Executors.newSingleThreadScheduledExecutor(
ExecutorsUtils.newThreadFactory(Optional.of(log), Optional.of("AutoScalingExecutor")));

this.helixInstanceTags = ConfigUtils.getString(config,
Contributor

I guess these are common tags for all Helix instances. If so, can we add a comment here for clarity?

Contributor Author

Done. Renamed the member variable to defaultHelixInstanceTags and added a comment.

for (; requestedContainerCount < desiredContainerCount; requestedContainerCount++) {
  requestContainer(Optional.absent(), yarnContainerRequestBundle.getHelixTagResourceMap().get(currentHelixTag));
}
requestedContainerCountMap.put(currentHelixTag, requestedContainerCount);
Contributor

At this point, isn't the requestedContainerCount == desiredContainerCount? If so, the value added to the map looks incorrect.

Contributor Author

Yes, in most cases they should be equal. There is a corner case: we initially spin up 10 containers, so if desiredContainerCount is smaller than 10, no new containers are requested, and we still need to put requestedContainerCount into the map.
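The corner case can be reproduced with a stripped-down version of the loop quoted above, assuming the previously requested count for a tag is seeded from the initial containers. `PerTagRequester` is hypothetical and only counts would-be requests instead of calling Yarn.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-tag request loop; requestContainer(...) is replaced by a counter.
final class PerTagRequester {
  final Map<String, Integer> requestedContainerCountMap = new HashMap<>();
  int newRequests = 0;

  void requestUpTo(String helixTag, int desiredContainerCount) {
    int requestedContainerCount = requestedContainerCountMap.getOrDefault(helixTag, 0);
    for (; requestedContainerCount < desiredContainerCount; requestedContainerCount++) {
      newRequests++; // stands in for requestContainer(...)
    }
    // After the loop the value is max(previous count, desired), not always desired:
    // with 10 containers already requested and only 6 desired, 10 is recorded.
    requestedContainerCountMap.put(helixTag, requestedContainerCount);
  }
}
```

Seeding the map with 10 and asking for 6 issues no new requests and keeps 10; asking for 12 afterwards issues exactly 2 new requests and records 12.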

@@ -479,32 +495,48 @@ public synchronized void requestTargetNumberOfContainers(int numTargetContainers

this.eventBus.post(new ContainerReleaseRequest(containersToRelease));
}

this.yarnContainerRequest = yarnContainerRequestBundle;
Contributor

Where is this member field being accessed? Do we need it?

Contributor Author

It is used by the private class AMRMClientCallbackHandler to determine the Helix tag that needs to be assigned to an allocated container.
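One way a callback could pick a tag, sketched under the assumption that it only knows the allocated container's resource: look up the Helix tags that requested that resource (the `resourceHelixTagMap` shown earlier) and take the first tag that is still short of containers. This is a hypothetical illustration, not the AMRMClientCallbackHandler code; locking is omitted for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: map an allocated container's resource back to a Helix tag that still
// needs containers. Not thread-safe as written.
final class AllocatedContainerTagger {
  final Map<String, Set<String>> resourceHelixTags = new HashMap<>();
  final Map<String, Integer> requestedCount = new HashMap<>();
  final Map<String, Integer> allocatedCount = new HashMap<>();

  Optional<String> assignTag(String containerResource) {
    // Iterate tags in sorted order so the choice is deterministic.
    for (String tag : new TreeSet<>(resourceHelixTags.getOrDefault(containerResource, Collections.emptySet()))) {
      int allocated = allocatedCount.getOrDefault(tag, 0);
      if (allocated < requestedCount.getOrDefault(tag, 0)) {
        allocatedCount.put(tag, allocated + 1);
        return Optional.of(tag);
      }
    }
    return Optional.empty(); // no tag with this resource is waiting for a container
  }
}
```

If tags `a` and `b` each asked for one container with the same resource, the first matching allocation goes to `a`, the second to `b`, and a third is left untagged.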

hanghangliu
Contributor Author

@sv2000 As we discussed, I removed the trimContainerSize logic and completely removed the buggy config:
private final int minContainers;
private final int maxContainers;

hanghangliu requested a review from sv2000 April 28, 2022 16:28
Contributor

sv2000 left a comment

+1. LGTM.

sv2000 merged commit 3e87795 into apache:master Apr 28, 2022
homatthew added a commit to homatthew/gobblin that referenced this pull request May 11, 2022
Revert "Fix bug when shrinking the container in Yarn service (apache#3504)"
This reverts commit dd6d910.

Revert "[GOBBLIN-1620]Make yarn container allocation group by helix tag (apache#3487)"
This reverts commit 3e87795.
ZihanLi58 pushed a commit that referenced this pull request May 12, 2022
Revert "Fix bug when shrinking the container in Yarn service (#3504)"
This reverts commit dd6d910.

Revert "[GOBBLIN-1620]Make yarn container allocation group by helix tag (#3487)"
This reverts commit 3e87795.
phet added a commit to phet/gobblin that referenced this pull request May 17, 2022
* upstream/master:
  [GOBBLIN-1633] Fix compaction actions on job failure not retried if compaction succeeds (apache#3494)
  [GOBBLIN-1646] Revert yarn container / helix tag group changes (apache#3507)
  [GOBBLIN-1641] Add meter for sla exceeded flows (apache#3502)
  GOBBLIN-1644 (apache#3506)
  [GOBBLIN-1645]Change the prefix of dagManager heartbeat to make it consistent with other metrics (apache#3505)
  Fix bug when shrinking the container in Yarn service (apache#3504)
  [GOBBLIN-1637] Add writer, operation, and partition info to failed metadata writer events (apache#3498)
  [GOBBLIN-1638] Fix unbalanced running count metrics due to Azkaban failures (apache#3499)
  [GOBBLIN-1634] Add retries on flow sla kills (apache#3495)
  [GOBBLIN-1620]Make yarn container allocation group by helix tag (apache#3487)
  [GOBBLIN-1636] Close DatasetCleaner after clean task (apache#3497)
  [GOBBLIN-1635] Avoid loading env configuration when using config store to improve the performance (apache#3496)
  use user supplied props to create FileSystem in DatasetCleanerTask (apache#3483)
  [GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race condition and puts unnecessary load on filesystem (apache#3477)
  use data node aliases to figure out data node names before using DMAS (apache#3493)
jack-moseley pushed a commit to jack-moseley/gobblin that referenced this pull request Aug 24, 2022
…he#3487)

* make yarn service aware of helix tag and resource requirment for each workflow so that containers will be assigned to correct task

update test cases

update helix instance tag during task runner initiation

update logs

update test case

* remove lib not used, add test case

address comments

* update test cases

* remove container min and max config
jack-moseley pushed a commit to jack-moseley/gobblin that referenced this pull request Aug 24, 2022
…e#3507)

Revert "Fix bug when shrinking the container in Yarn service (apache#3504)"
This reverts commit dd6d910.

Revert "[GOBBLIN-1620]Make yarn container allocation group by helix tag (apache#3487)"
This reverts commit 3e87795.
phet added a commit to phet/gobblin that referenced this pull request Sep 12, 2022
* upstream/master: (124 commits)
  [GOBBLIN-1699] Log progress of reducer task for visibility with slow compaction jobs apache#3552
  fix helix job wait completion bug when job goes to STOPPING state (apache#3556)
  [GOBBLIN-1695] Fix: Failure to add spec executors doesn't block deployment (apache#3551)
  [GOBBLIN-1701] Replace jcenter with either maven central or gradle plugin portal (apache#3554)
  [GOBBLIN-1700] Remove unused coveralls-gradle-plugin dependency
  add MysqlUserQuotaManager (apache#3545)
  [GOBBLIN-1689] Decouple compiler from scheduler in warm standby mode (apache#3544)
  Add GMCE topic explicitly to hive commit event (apache#3547)
  [GOBBLIN-1678] Refactor git flowgraph component to be extensible (apache#3536)
  [GOBBLIN-1690] Added logging to ORC writer
  Allow all iceberg exceptions to be fault tolerant (apache#3541)
  Guard against exists fs call as well (apache#3538)
  Add error handling for timeaware finder to handle scenarios where fil… (apache#3537)
  [GOBBLIN-1675] Add pagination for GaaS on server side (apache#3533)
  [GOBBLIN-1672] Refactor metrics from DagManager into its own class, add metrics per … (apache#3532)
  [GOBBLIN-1677] Fix timezone property to read from key correctly (apache#3535)
  [Gobblin-931] Fix typo in gobblin CLI usage (apache#3530)
  [GOBBLIN-1671] : Fix gobblin.sh script to add external jars as colon separated to HADOOP_CLASSPATH (apache#3531)
  [GOBBLIN-1656] Return a http status 503 on GaaS when quota is exceeded for user or flowgroup (apache#3516)
  [GOBBLIN-1669] Clean up TimeAwareRecursiveCopyableDataset to support seconds in time… (apache#3528)
  [GOBBLIN-1670] Remove rat tasks and unneeded checkstyles blocking build pipeline (apache#3529)
  [GOBBLIN-1668] Add audit counts for iceberg registration (apache#3527)
  [GOBBLIN-1667] Create new predicate - ExistingPartitionSkipPredicate (apache#3526)
  Calculate requested container count based on adding allocated count and outstanding ContainerRequests in Yarn (apache#3524)
  make the requestedContainerCountMap correctly update the container count (apache#3523)
  Fix running counts for retried flows (apache#3520)
  Allow table to flush after write failure (apache#3522)
  [GOBBLIN-1652]Add more log in the KafkaJobStatusMonitor in case it fails to process one GobblinTrackingEvent (apache#3513)
  Make Yarn container and helix instance allocation group by tag (apache#3519)
  [GOBBLIN-1657] Update completion watermark on change_property in IcebergMetadataWriter (apache#3517)
  [GOBBLIN-1654] Add capacity floor to avoid aggressively requesting resource and small files. (apache#3515)
  [GOBBLIN-1653] Shorten job name length if it exceeds 255 characters (apache#3514)
  [GOBBLIN-1650] Implement flowGroup quotas for the DagManager (apache#3511)
  [GOBBLIN-1648] Complete use of JDBC `DataSource` 'read-only' validation query by incorporating where previously omitted (apache#3509)
  Add config to set close timeout in HiveRegister (apache#3512)
  add an API in AbstractBaseKafkaConsumerClient to list selected topics (apache#3501)
  [GOBBLIN-1649] Revert gobblin-1633 (apache#3510)
  [GOBBLIN-1639] Prevent metrics reporting if configured, clean up workunit count metric (apache#3500)
  [GOBBLIN-1647] Add hive commit GTE to HiveMetadataWriter (apache#3508)
  [GOBBLIN-1633] Fix compaction actions on job failure not retried if compaction succeeds (apache#3494)
  [GOBBLIN-1646] Revert yarn container / helix tag group changes (apache#3507)
  [GOBBLIN-1641] Add meter for sla exceeded flows (apache#3502)
  GOBBLIN-1644 (apache#3506)
  [GOBBLIN-1645]Change the prefix of dagManager heartbeat to make it consistent with other metrics (apache#3505)
  Fix bug when shrinking the container in Yarn service (apache#3504)
  [GOBBLIN-1637] Add writer, operation, and partition info to failed metadata writer events (apache#3498)
  [GOBBLIN-1638] Fix unbalanced running count metrics due to Azkaban failures (apache#3499)
  [GOBBLIN-1634] Add retries on flow sla kills (apache#3495)
  [GOBBLIN-1620]Make yarn container allocation group by helix tag (apache#3487)
  [GOBBLIN-1636] Close DatasetCleaner after clean task (apache#3497)
  [GOBBLIN-1635] Avoid loading env configuration when using config store to improve the performance (apache#3496)
  use user supplied props to create FileSystem in DatasetCleanerTask (apache#3483)
  [GOBBLIN-1619] WriterUtils.mkdirsWithRecursivePermission contains race condition and puts unnecessary load on filesystem (apache#3477)
  use data node aliases to figure out data node names before using DMAS (apache#3493)
  [GOBBLIN-1630] Remove flow level metrics for adhoc flows (apache#3491)
  [GOBBLIN-1631]Emit heartbeat for dagManagerThread (apache#3492)
  [GOBBLIN-1624] Refactor quota management, fix various bugs in accounting of running … (apache#3481)
  [GOBBLIN-1613] Add metadata writers field to GMCE schema (apache#3490)
  Update README.md
  [GOBBLIN-1629] Make GobblinMCEWriter be able to catch error when calculating hive specs (apache#3489)
  Add/fix some fields of MetadataWriterFailureEvent (apache#3485)
  [GOBBLIN-1627] provide option to convert datanodes names (apache#3484)
  Add coverage for edge cases when table paths do not exist, check parents (apache#3482)
  [GOBBLIN-1616] Add close connection logic in salseforceSource (apache#3486)
  [GOBBLIN-1621] Make HelixRetriggeringJobCallable emit job skip event when job is dropped due to previous job is running (apache#3478)
  [GOBBLIN-1623] Fix NPE when try to close RestApiConnector (apache#3480)
  Clear bad mysql packages from cache in CI/CD machines (apache#3479)
  [GOBBLIN-1617] pass configurations to some HadoopUtils APIs (apache#3475)
  [GOBBLIN-1616] Make RestApiConnector be able to close the connection finally (apache#3474)
  add config to set log level for any class (apache#3473)
  Fix bug where partitioned tables would always return the wrong equality in paths (apache#3472)
  [GOBBLIN-1602] Change hive table location and partition check to validate using FS r… (apache#3459)
  Don't flush on change_property operation (apache#3467)
  Fix case where error GTE is incorrectly sent from MCE writer (apache#3466)
  partial rollback of PR 3464 (apache#3465)
  [GOBBLIN-1604] Throw exception if there are no allocated requests due to lack of res… (apache#3461)
  [GOBBLIN-1603] Throws error if configured when encountering an IO exception while co… (apache#3460)
  [GOBBLIN-1606] change DEFAULT_GOBBLIN_COPY_CHECK_FILESIZE value (apache#3464)
  Upgraded dropwizard metrics library version from 3.2.3 -> 4.1.2 and added a new wrapper class on dropwizard Timer.Context class to handle the code compatibility as the newer version of this class implements AutoClosable instead of Closable. (apache#3463)
  [GOBBLIN-1605] Fix mysql ubuntu download 404 not found for Github Actions CI/CD (apache#3462)
  [GOBBLIN-1601] implement ChangePermissionCommitStep (apache#3457)
  [GOBBLIN-1598]Fix metrics already exist issue in dag manager (apache#3454)
  [GOBBLIN-1597] Add error handling in dagmanager to continue if dag fails to process,… (apache#3452)
  GOBBLIN-1579 Fail job on hive existing target table location mismatch (apache#3433)
  [GOBBLIN-1596] Ignore already exists exception if the table has already been created… (apache#3451)
  [GOBBLIn-1595]Fix the dead lock during hive registration (apache#3450)
  Add guard in DagManager for improperly formed SLA (apache#3449)
  [GOBBLIN-1588] Send failure events for write failures when watermark is advanced in MCE writer (apache#3441)
  [GOBBLIN-1593] Fix bugs in dag manager about metric reporting and job status monitor (apache#3448)
  Fix bug in `JobSpecSerializer` of inadequately preventing access errors (within `MysqlJobCatalog`) (apache#3447)
  [GOBBLIN-1583] Add System level job start SLA (apache#3437)
  [GOBBLIN-1592] Make hive copy be able to apply filter on directory (apache#3446)
  [GOBBLIN-1585]GaaS (DagManager) keep retrying a failed job beyond max attempt number (apache#3439)
  [GOBBLIN-1590] Add low/high watermark information in event emitted by Gobblin cluster (apache#3443)
  [HotFix]Try to fix the mysql dependency issue in Github action (apache#3445)
  Lazily initialize FileContext and do not store a handle of it so it can be GC'ed when required (apache#3444)
  [GOBBLIN-1584] Add replace record logic for Mysql writer (apache#3438)
  Bump up code cov version (apache#3440)
  [GOBBLIN-1581] Iterate over Sql ResultSet in Only the Forward Direction (apache#3435)
  [GOBBLIN-1575] use reference count in helix manager, so that connect/disconnect are called once and at the right time (apache#3427)
  ...
phet added a commit to phet/gobblin that referenced this pull request Sep 19, 2022

3 participants