Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-1766] Define metric to measure lag from producing to consume… #3625

Merged
merged 5 commits into from
Jan 24, 2023

Conversation

umustafi
Copy link
Contributor

… change stream events

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Modify event for change stream events to add the transaction identifier and produceTimestamp. Use the transaction identifier instead of timestamp to dedup events as "at least once" delivery may result in multiple change stream events with different timestamps for the same row update in mysql. Instead use the timestamp to emit a metric measuring the time between producing the event and consuming on our monitor.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@umustafi
Copy link
Contributor Author

@ZihanLi58 @phet can you review?

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2023

Codecov Report

Merging #3625 (53cad73) into master (85cd1f9) will increase coverage by 0.15%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master    #3625      +/-   ##
============================================
+ Coverage     46.44%   46.60%   +0.15%     
- Complexity    10621    10667      +46     
============================================
  Files          2130     2133       +3     
  Lines         83303    83503     +200     
  Branches       9282     9281       -1     
============================================
+ Hits          38692    38914     +222     
+ Misses        41039    41030       -9     
+ Partials       3572     3559      -13     
Impacted Files Coverage Δ
...pache/gobblin/runtime/kafka/HighLevelConsumer.java 0.00% <0.00%> (ø)
...apache/gobblin/runtime/metrics/RuntimeMetrics.java 0.00% <ø> (ø)
...ervice/monitoring/DagActionStoreChangeMonitor.java 0.00% <0.00%> (ø)
...lin/service/monitoring/SpecStoreChangeMonitor.java 0.00% <0.00%> (ø)
...dules/flowgraph/pathfinder/AbstractPathFinder.java 83.70% <0.00%> (-1.89%) ⬇️
...lin/elasticsearch/writer/FutureCallbackHolder.java 61.42% <0.00%> (-1.43%) ⬇️
.../java/org/apache/gobblin/runtime/api/FlowSpec.java 38.91% <0.00%> (-1.19%) ⬇️
...n/service/modules/template/StaticFlowTemplate.java 86.74% <0.00%> (-0.93%) ⬇️
...odules/orchestration/InMemoryUserQuotaManager.java 70.07% <0.00%> (-0.79%) ⬇️
...service/modules/dataset/HiveDatasetDescriptor.java 54.76% <0.00%> (ø)
... and 34 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree to have you confirm whether the txId is in fact what we want to key/cache on... otherwise leave a clarifying comment

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clearly took a closer read this time and have a few more Qs... sorry to have not noticed them earlier

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems very close!

do let me know if you find deeper rationale to prove uniform partition delay. intuitively I don't believe that can be assumed, but possibly there's a critical factor I'm not aware of

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good... will let you decide how to proceed on the measurement of non-uniform delay and whether anticipated to be an issue

Copy link
Contributor

@ZihanLi58 ZihanLi58 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@ZihanLi58 ZihanLi58 merged commit 5db1621 into apache:master Jan 24, 2023
phet added a commit to phet/gobblin that referenced this pull request Feb 13, 2023
* upstream/master:
  [GOBBLIN-1771] Clean up logs for dataset commit and file cleanup (apache#3631)
  [GOBBLIN-1770] Allow null values for fields in GaaSObservabilityEvent.Issue fields which are optional
  [GOBBLIN-1769] Change a noisy log that indicates that the queue capacity is almost full to debug (apache#3629)
  [GOBBLIN-1766] Define metric to measure lag from producing to consume… (apache#3625)
  [GOBBLIN-1765] Add support to sync metadata for dir in manifest based copy (apache#3624)
  [GOBBLIN-1768] Fix constructor in KafkaJobStatusMonitorFactory so that it can be injected (apache#3628)
  Specifically name each Hikari connection pool created, for traceability (apache#3627)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants