[GOBBLIN-1703] avoid double quota increase for adhoc flows #3550

arjun4084346 · 2022-09-07T01:40:51Z

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

My PR addresses the following Gobblin JIRA issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
- https://issues.apache.org/jira/browse/GOBBLIN-1703

Description

Here are some details about my PR, including screenshots (if applicable):
When a gaas flow request comes, resource handler checks the quota right there.
However, if the flow has runImmediately=true, the quota will be checked again when the first job starts. This should be avoided.

Tests

My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

codecov-commenter · 2022-09-07T01:54:01Z

Codecov Report

Merging #3550 (30049f6) into master (ae2ea19) will decrease coverage by 0.89%.
The diff coverage is 46.50%.

@@             Coverage Diff              @@
##             master    #3550      +/-   ##
============================================
- Coverage     47.76%   46.86%   -0.90%     
- Complexity     8557    10617    +2060     
============================================
  Files          1705     2111     +406     
  Lines         64976    82667   +17691     
  Branches       7036     9202    +2166     
============================================
+ Hits          31035    38742    +7707     
- Misses        31240    40363    +9123     
- Partials       2701     3562     +861

Impacted Files	Coverage Δ
.../org/apache/gobblin/service/ServiceConfigKeys.java	`0.00% <ø> (ø)`
...gobblin/runtime/spec_store/MysqlBaseSpecStore.java	`89.16% <ø> (ø)`
...lin/service/modules/flow/MultiHopFlowCompiler.java	`67.25% <0.00%> (-1.62%)`	⬇️
...odules/orchestration/AbstractUserQuotaManager.java	`100.00% <ø> (+22.85%)`	⬆️
...ervice/modules/flow/BaseFlowToJobSpecCompiler.java	`60.68% <12.50%> (-3.54%)`	⬇️
...e/modules/orchestration/MysqlUserQuotaManager.java	`47.34% <31.38%> (-26.53%)`	⬇️
...blin/service/modules/orchestration/DagManager.java	`82.09% <50.00%> (-0.14%)`	⬇️
...odules/orchestration/InMemoryUserQuotaManager.java	`70.07% <73.03%> (+5.97%)`	⬆️
.../modules/scheduler/GobblinServiceJobScheduler.java	`64.62% <100.00%> (ø)`
...e/gobblin/service/monitoring/GitConfigMonitor.java	`95.23% <0.00%> (-4.77%)`	⬇️
... and 412 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Will-Lo · 2022-09-07T23:56:46Z

...e/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java

          quotaManager.get().checkQuota(dag.getNodes().get(0));
+          ((FlowSpec) addedSpec).getConfigAsProperties().setProperty(ServiceConfigKeys.GOBBLIN_SERVICE_ADHOC_FLOW, "true");


Would it make sense if we added this in createFlowSpecForConfig() under FlowConfigLocalResourceHandler.java? Seems like this property can be used in a lot of areas outside of just this feature

that function notifies the addSpec immediately after so I'm not sure we will use the value directly anywhere else. However, if we do use the in memory value of the Spec in other places, this property will be useful to add there. Btw shouldn't this value be set to false after the first iteration of an adhoc flow in the DagManager or Scheduler?

Actually, the FlowSpec created by createFlowSpecForConfig will be persisted in the spec store. If we add the config there, it will be persisted permanently in the spec store. And then we will not be able to distinguish if it is a scheduled run or the adhoc run of a "schedule with runImmediately=true" kind of flows.
cc @ZihanLi58

@umustafi the if block inside which this statement is present, will be false for the second iteration.
!jobConfig.containsKey(ConfigurationKeys.JOB_SCHEDULE_KEY) || PropertiesUtils.getPropAsBoolean(jobConfig, ConfigurationKeys.FLOW_RUN_IMMEDIATELY, "false"))
run immediately is set to false on service restart and if job schedule key is absent , the flow would be deleted from the spec store after the 1st iteration.

I will add a comment there explaining this.

gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java

umustafi · 2022-09-08T20:04:09Z

...e/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java

@@ -335,7 +335,9 @@ public AddSpecResponse onAddSpec(Spec addedSpec) {
      if (quotaManager.isPresent()) {
        // QuotaManager has idempotent checks for a dagNode, so this check won't double add quotas for a flow in the DagManager
        try {
+          // todo : we should probably check quota for all of the start nodes


do u mean this for all hops in multi-hop flow?

I mean for all the start nodes, not for all the nodes.
So, a Dag has n nodes, some of which can be startNodes, that means startNodes can start concurrently. So It makes more sense to check quota on all the start nodes when the ad hoc flow request comes, instead of just checking for it for the node with 0th index.

I think quota is at flow level but not job level? If that's the case, checking once should be enough?

umustafi · 2022-09-08T20:08:00Z

...e/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java

          quotaManager.get().checkQuota(dag.getNodes().get(0));
+          ((FlowSpec) addedSpec).getConfigAsProperties().setProperty(ServiceConfigKeys.GOBBLIN_SERVICE_ADHOC_FLOW, "true");


that function notifies the addSpec immediately after so I'm not sure we will use the value directly anywhere else. However, if we do use the in memory value of the Spec in other places, this property will be useful to add there. Btw shouldn't this value be set to false after the first iteration of an adhoc flow in the DagManager or Scheduler?

ZihanLi58 · 2022-09-09T00:03:14Z

gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java

-        quotaManager.checkQuota(dagNode);
+        if (!Boolean.parseBoolean(dagNode.getValue().getJobSpec().getConfigAsProperties().getProperty(
+            ServiceConfigKeys.GOBBLIN_SERVICE_ADHOC_FLOW, "false"))) {
+          quotaManager.checkQuota(dagNode);


Wondering do you want to put the flow id into running map of quota manager without increasing quota even it's adhoc flow? I feel it's necessary or for the next job in the dag, when you call checkQuota, you will increase the quota again? Am I missing anything here?

ZihanLi58 · 2022-09-09T00:07:41Z

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java

-    String stateStoreTableName = ConfigUtils.getString(config, ConfigurationKeys.STATE_STORE_DB_TABLE_KEY,
-        ConfigurationKeys.DEFAULT_STATE_STORE_DB_TABLE);
+    String quotaStoreTableName = ConfigUtils.getString(config, ServiceConfigKeys.QUOTA_STORE_DB_TABLE_KEY,
+        ServiceConfigKeys.DEFAULT_QUOTA_STORE_DB_TABLE);

    BasicDataSource basicDataSource = MysqlStateStore.newDataSource(config);


In this way, you will use "state.store.db.url" as the key to mysql db, which is too general. I will suggest to check how mysqlSpecStore handle that, we should have a config prefix to get the config related to this mysql quota manager.

ZihanLi58 · 2022-09-09T00:14:25Z

...e/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java

@@ -332,10 +332,13 @@ public AddSpecResponse onAddSpec(Spec addedSpec) {
    // Check quota limits against run immediately flows or adhoc flows before saving the schedule
    // In warm standby mode, this quota check will happen on restli API layer when we accept the flow
    if (!this.warmStandbyEnabled && (!jobConfig.containsKey(ConfigurationKeys.JOB_SCHEDULE_KEY) || PropertiesUtils.getPropAsBoolean(jobConfig, ConfigurationKeys.FLOW_RUN_IMMEDIATELY, "false"))) {
+      // This block should be reachable only for the first execution for the adhoc flows (flows that either do not have a schedule or have runImmediately=true.


Just a reminder: As on line 334 we checked whether this.warmStandbyEnabled should set to be false, this will only be reached in the old mode, where we didn't enable message forwarding.
You should check quota on the compiler onAddSpec method when we enable message forwarding

ZihanLi58 · 2022-09-09T00:15:29Z

...e/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java

@@ -335,7 +335,9 @@ public AddSpecResponse onAddSpec(Spec addedSpec) {
      if (quotaManager.isPresent()) {
        // QuotaManager has idempotent checks for a dagNode, so this check won't double add quotas for a flow in the DagManager
        try {
+          // todo : we should probably check quota for all of the start nodes


I think quota is at flow level but not job level? If that's the case, checking once should be enough?

arjun4084346 · 2022-09-24T05:14:24Z

Functionality of checking quota for every job is left unchanged that can be done in the other PR.
Handling parallel flow functionality is also left unchanged for the other PR.

ZihanLi58

Left some comments, main suggestion is we should come up with a way to make sure running dag map and quota table are consistent at any given time. We can even just have one table of (dagnodeId, proxyUser, group, requester) and do count to calculate the quota every time.

ZihanLi58 · 2022-09-27T00:01:59Z

gobblin-service/src/main/java/org/apache/gobblin/service/modules/flow/MultiHopFlowCompiler.java

@@ -281,6 +281,13 @@ public Dag<JobExecutionPlan> compileFlow(Spec spec) {
    Instrumented.markMeter(flowCompilationSuccessFulMeter);
    Instrumented.updateTimer(flowCompilationTimer, System.nanoTime() - startTime, TimeUnit.NANOSECONDS);

+    if (Boolean.parseBoolean(flowSpec.getConfigAsProperties().getProperty(ServiceConfigKeys.GOBBLIN_SERVICE_ADHOC_FLOW))) {
+      // todo : we should probably set it on all of the start nodes
+      for (Dag.DagNode<JobExecutionPlan> dagNode : jobExecutionPlanDag.getStartNodes()) {


If we already change it to start nodes, should we remove the todo log on line 285 also change the logic in compiler onAddFlowSpec to check quota for all start nodes as well?

Also if we have running dag Ids, do we still need this? anyway we should be able to avoid double count using that map?

ZihanLi58 · 2022-09-27T00:08:02Z

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java

@@ -62,7 +80,7 @@ public void init(Collection<Dag<JobExecutionPlan>> dags) {
  @Override
  int incrementJobCount(String user, CountType countType) throws IOException {
    try {
-      return this.mysqlStore.increaseCount(user, countType);
+      return this.quotaStore.increaseCount(user, countType);


Question here, shouldn't we change the increaseCount and addDagId to be one mysql transaction? Otherwise we will see discrepancy between these two table?

Probably yes, but it was not being done together in InMemory version also.
Do you want me to change both?

...service/src/main/java/org/apache/gobblin/service/modules/flow/BaseFlowToJobSpecCompiler.java

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java

ZihanLi58 · 2022-09-28T17:53:22Z

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java

+          DagManagerUtils.getFlowGroupQuotaKey(flowGroup, dagNode), getQuotaForFlowGroup(flowGroup), CountType.FLOWGROUP_COUNT);
+      flowGroupCheck = flowGroupQuotaIncrement >= 0;
+      quotaCheck.setFlowGroupCheck(flowGroupCheck);
+      if (!flowGroupCheck) {


For mysql quota, if we fail to check any of the quota type, shouldn't we directly rollback the change in this method?

Do you mean if user check fails, should we immediately stop without trying to check proxy/requesterService quota? I think we can, but the existing code would check all the quotas anyway to be able to form a complete error message (in requesterMessage) so I chose not to disturb that functionality

I don't mean that, we still process to the end, but before commit the change, we check whether all three succeed, if not, instead of commit, we should revert the change at this point. But we still return all check result to user.

...service/src/main/java/org/apache/gobblin/service/modules/flow/BaseFlowToJobSpecCompiler.java

ZihanLi58

Thanks for this change, looks good to me overall, just one minor comment, please also fix the compile/test error

ZihanLi58 · 2022-09-28T22:45:31Z

...src/main/java/org/apache/gobblin/service/modules/orchestration/InMemoryUserQuotaManager.java

+  public void checkQuota(Collection<Dag.DagNode<JobExecutionPlan>> dagNodes) throws IOException {
+    for (Dag.DagNode<JobExecutionPlan> dagNode : dagNodes) {
+      QuotaCheck quotaCheck = increaseAndCheckQuota(dagNode);
+      if ((!quotaCheck.proxyUserCheck || !quotaCheck.requesterCheck || !quotaCheck.flowGroupCheck)) {


you might want to roll back for previous dag node as well in this case.

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java

ZihanLi58

+1, thanks for addressing all my comments

* avoid double quota increase for adhoc flows * corrected some config names * address review comments, made runningDagIds map implementation specific * address review comments * address review comments * fix checkstyle * make checking quota for multiple dag nodes atomic * fix unit test * remove unused code * remove unused imports * address review comments

* upstream/master: move dataset handler code before cleaning up staging data (apache#3594) [GOBBLIN-1730] Include flow execution id when try to cancel/submit job using SimpleKafkaSpecProducer (apache#3588) [GOBBLIN-1734] make DestinationDatasetHandler work on streaming sources (apache#3592) give option to cancel helix workflow through Delete API (apache#3580) [GOBBLIN-1728] Fix YarnService incorrect container allocation behavior (apache#3586) Support multiple node types in shared flowgraph, fix logs (apache#3590) Search for dummy file in writer directory (apache#3589) Use root cause for checking if exception is transient (apache#3585) [GOBBLIN-1724] Support a shared flowgraph layout in GaaS (apache#3583) [GOBBLIN-1731] Enable HiveMetadataWriter to override table schema lit… (apache#3587) [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS (apache#3581) [GOBBLIN-1725] Fix bugs in gaas warm standby mode (apache#3582) [GOBBLIN-1718] Define DagActionStoreMonitor to listen for kill/resume… (apache#3572) Add log line for committing/retrieving watermarks in streaming (apache#3578) [GOBBLIN-1707] Enhance `IcebergDataset` to detect when files already at dest then proceed with only delta (apache#3575) Ignore AlreadyExistsException in hive writer (apache#3579) Fail GMIP container for known transient exceptions to avoid data loss (apache#3576) GOBBLIN-1715: Support vectorized row batch pooling (apache#3574) [GOBBLIN-1696] Implement file based flowgraph that detects changes to the underlying… (apache#3548) GOBBLIN-1719 Replace moveToTrash with moveToAppropriateTrash for hadoop trash (apache#3573) [GOBBLIN-1703] avoid double quota increase for adhoc flows (apache#3550)

Will-Lo reviewed Sep 7, 2022

View reviewed changes

umustafi reviewed Sep 8, 2022

View reviewed changes

arjun4084346 changed the title ~~avoid double quota increase for adhoc flows~~ [GOBBLIN-1703] avoid double quota increase for adhoc flows Sep 8, 2022

ZihanLi58 requested changes Sep 9, 2022

View reviewed changes

arjun4084346 force-pushed the useMysqlQuotaManager branch from fb35d30 to 22b2a74 Compare September 24, 2022 04:07

ZihanLi58 requested changes Sep 27, 2022

View reviewed changes

umustafi reviewed Sep 28, 2022

View reviewed changes

...service/src/main/java/org/apache/gobblin/service/modules/flow/BaseFlowToJobSpecCompiler.java Outdated Show resolved Hide resolved

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java Show resolved Hide resolved

arjun4084346 added 5 commits September 28, 2022 10:26

avoid double quota increase for adhoc flows

7b6239c

corrected some config names

7f4b025

address review comments, made runningDagIds map implementation specific

bf9e1d6

address review comments

f86cdde

address review comments

93c1a32

arjun4084346 force-pushed the useMysqlQuotaManager branch from 6c71b02 to 93c1a32 Compare September 28, 2022 15:26

fix checkstyle

07d9ac9

umustafi approved these changes Sep 28, 2022

View reviewed changes

ZihanLi58 requested changes Sep 28, 2022

View reviewed changes

make checking quota for multiple dag nodes atomic

373158a

ZihanLi58 reviewed Sep 28, 2022

View reviewed changes

fix unit test

b33ea65

ZihanLi58 reviewed Sep 30, 2022

View reviewed changes

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java Outdated Show resolved Hide resolved

remove unused code

42ba8de

ZihanLi58 requested changes Sep 30, 2022

View reviewed changes

...ce/src/main/java/org/apache/gobblin/service/modules/orchestration/MysqlUserQuotaManager.java Show resolved Hide resolved

arjun4084346 added 2 commits September 30, 2022 13:18

remove unused imports

3c06d3a

address review comments

30049f6

ZihanLi58 approved these changes Sep 30, 2022

View reviewed changes

ZihanLi58 merged commit 0c3f6d1 into apache:master Oct 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GOBBLIN-1703] avoid double quota increase for adhoc flows #3550

[GOBBLIN-1703] avoid double quota increase for adhoc flows #3550

arjun4084346 commented Sep 7, 2022 •

edited

Loading

codecov-commenter commented Sep 7, 2022 •

edited

Loading

Will-Lo Sep 7, 2022

umustafi Sep 8, 2022

arjun4084346 Sep 8, 2022

arjun4084346 Sep 8, 2022

umustafi Sep 8, 2022

arjun4084346 Sep 8, 2022

ZihanLi58 Sep 9, 2022

umustafi Sep 8, 2022

ZihanLi58 Sep 9, 2022

ZihanLi58 Sep 9, 2022

ZihanLi58 Sep 9, 2022

ZihanLi58 Sep 9, 2022

arjun4084346 commented Sep 24, 2022

ZihanLi58 left a comment

ZihanLi58 Sep 27, 2022

ZihanLi58 Sep 27, 2022

ZihanLi58 Sep 27, 2022

arjun4084346 Sep 27, 2022

ZihanLi58 Sep 28, 2022

arjun4084346 Sep 28, 2022

ZihanLi58 Sep 28, 2022

ZihanLi58 left a comment

ZihanLi58 Sep 28, 2022

ZihanLi58 left a comment

		quotaManager.get().checkQuota(dag.getNodes().get(0));
		((FlowSpec) addedSpec).getConfigAsProperties().setProperty(ServiceConfigKeys.GOBBLIN_SERVICE_ADHOC_FLOW, "true");

[GOBBLIN-1703] avoid double quota increase for adhoc flows #3550

[GOBBLIN-1703] avoid double quota increase for adhoc flows #3550

Conversation

arjun4084346 commented Sep 7, 2022 • edited Loading

JIRA

Description

Tests

Commits

codecov-commenter commented Sep 7, 2022 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arjun4084346 commented Sep 24, 2022

ZihanLi58 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZihanLi58 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZihanLi58 left a comment

Choose a reason for hiding this comment

arjun4084346 commented Sep 7, 2022 •

edited

Loading

codecov-commenter commented Sep 7, 2022 •

edited

Loading