-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a BoundedTrie metric which is used to efficiently store and… #33385
base: master
Are you sure you want to change the base?
Conversation
… aggregate a collection of string sequences (FQNs) with a limited size.
R: @robertwb |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieCell.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
} else if (other.root().isPresent() && other.singleton().isPresent()) { | ||
return this; | ||
} else { | ||
BoundedTrieNode combined = new BoundedTrieNode(asTrie()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/BoundedTrieResult.java
Outdated
Show resolved
Hide resolved
c9f106b
to
01ffe0f
Compare
5a5ddae
to
ebbe4f9
Compare
… mutable BoundedTrieData
ebbe4f9
to
35e0a8c
Compare
@robertwb : Please have another look. Thank you very much. |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/BoundedTrieResult.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Outdated
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
} | ||
|
||
/** Returns a new {@link BoundedTrieData} instance that is a deep copy of this instance. */ | ||
public synchronized BoundedTrieData getCumulative() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a BoundedTrieData over a Set to be consistent with other MetricData in java SDK
* | ||
* @param other The other {@link BoundedTrieData} to combine with. | ||
*/ | ||
public synchronized void combine(@Nonnull BoundedTrieData other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return a void rather than a reference to updated object because the object is mutable and returning a reference from here will then require unnecessary overhead of using AtomicReference in BoundedTrieCell to safely be able to handle combine call from mutli-threads.
The 2 failing checks |
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/BoundedTrieData.java
Show resolved
Hide resolved
if (metricUpdate.get(BOUNDED_TRIE) == null) { | ||
return BoundedTrieResult.empty(); | ||
} | ||
BoundedTrie bTrie = (BoundedTrie) metricUpdate.get(BOUNDED_TRIE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How has this code been tested?
private static final Logger LOG = LoggerFactory.getLogger(DataflowMetrics.class); | ||
// TODO (rosinha): Remove this once bounded_trie is available in metrics proto Dataflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more comfortable excluding these changes from this CL until the dataflow client protos are updated. We can do this in a follow-up PR (this one is already huge enough) rather than have a dummy implementation here (especially given the issues we had last time with class cast exceptions that suddenly manifest when the service is updated). The most important is that such metrics are reported and the round trip works on direct runners.
return new CounterUpdate() | ||
.setStructuredNameAndMetadata(name) | ||
.setCumulative(false) | ||
.set("bounded_trie", boundedTrieData.toProto()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the dataflow worker even accept this yet? Again, let's pull the legacy worker changes out into their own PR.
runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableMetrics.java
Show resolved
Hide resolved
@@ -277,7 +286,8 @@ public static class CommittedMetricTests extends SharedTestBase { | |||
UsesCounterMetrics.class, | |||
UsesDistributionMetrics.class, | |||
UsesGaugeMetrics.class, | |||
UsesStringSetMetrics.class | |||
UsesStringSetMetrics.class, | |||
UsesBoundedTrieMetrics.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we lose (ordinary) counter coverage for runners not yet supporting bounded trie metrics?
… aggregate a collection of string sequences (FQNs) with a limited size.
It is recommended to review this PR by commits.
BoundedTrie is a space-saving way to store many string sequences (like FQN/file paths). It acts like a tree with branches, holding sequences within a size limit. It can efficiently add, combine, and search and perform trimming of children when the size increases beyond defined max.
Let's say we want to store these sequences, with a size limit of 3:
"folder1/file1.txt"
"folder1/file2.txt"
"folder2/file3.txt"
Here's how the BoundedTrie might look:
If we try to add "folder1/file4.txt", the trie might trim to "folder1", dropping all children to stay within the size limit.
This will be used to replace the StringSet metric for lineage tracking for very large lineage graphs to overcome the size limits.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.