-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of Owners
rule
#16563
Improve performance of Owners
rule
#16563
Conversation
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-build-wheels] # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust]
0356b15
to
aa03cf3
Compare
fn parse_patterns_from_include( | ||
include: &[PathGlobIncludeEntry], | ||
) -> Result<Vec<glob::Pattern>, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined below
/// to check if it may optimize the snapshot subset operation on this tree, | ||
/// or need to check for excluded files/directories. | ||
/// | ||
pub fn maybe_is_parent_of_ignored_path(&self, path: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
impl PyPathGlobs { | ||
fn parse(self) -> PyResult<PreparedPathGlobs> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now unused
src/rust/engine/src/externs/fs.rs
Outdated
.0 | ||
.include_globs() | ||
.iter() | ||
.map(|pattern| format!("{pattern}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuhood is there a better way to call Display
on the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern.to_string()
calls Display
.
} | ||
|
||
fn matches(&self, paths: Vec<String>, py: Python) -> PyResult<Vec<String>> { | ||
py.allow_threads(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stuhood has suggested it may be more efficient to remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would need to do some microbenchmarking, yea.
The two cases are:
- taking
paths: Vec<&str>
, which would avoid copying the inputs, but would require us to hold the GIL, or - keeping as is
You could maybe temporarily add a loop around the match
calls in src/python/pants/source/filespec_test.py
assert_rule_matches
, and then compare that way. It's not a completely realistic use case, but it would be straightforward. I expect that it's not worth doing anything more involved though.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
docs/markdown/Writing Plugins/common-plugin-tasks/plugin-upgrade-guide.md
Show resolved
Hide resolved
/// owners detection of deleted files (see #6790 and #5636 for more info). The lazy filesystem | ||
/// traversal in expand is (currently) too expensive to use for that in-memory matching (such as | ||
/// via MemFS). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserving the explanation of why we have two implementations of matching (because the lazy impl is too slow for this usecase) would be good. It may not be true forever.
src/rust/engine/src/externs/fs.rs
Outdated
.0 | ||
.include_globs() | ||
.iter() | ||
.map(|pattern| format!("{pattern}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern.to_string()
calls Display
.
} | ||
|
||
fn matches(&self, paths: Vec<String>, py: Python) -> PyResult<Vec<String>> { | ||
py.allow_threads(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would need to do some microbenchmarking, yea.
The two cases are:
- taking
paths: Vec<&str>
, which would avoid copying the inputs, but would require us to hold the GIL, or - keeping as is
You could maybe temporarily add a loop around the match
calls in src/python/pants/source/filespec_test.py
assert_rule_matches
, and then compare that way. It's not a completely realistic use case, but it would be straightforward. I expect that it's not worth doing anything more involved though.
let includes = includes | ||
.iter() | ||
.map(|glob| { | ||
PathGlob::normalize_pattern(glob).and_then(|components| { | ||
let normalized_pattern: PathBuf = components.into_iter().collect(); | ||
Pattern::new(normalized_pattern.to_str().unwrap()) | ||
.map_err(|e| format!("Could not parse {:?} as a glob: {:?}", glob, e)) | ||
}) | ||
}) | ||
.collect::<Result<Vec<_>, String>>()?; | ||
let excludes = GitignoreStyleExcludes::create(excludes)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these should be executed with the GIL released I think.
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Possibly will close pantsbuild#16122. @stuhood found in pantsbuild#16122 (comment) that we were spending substantial time in Rust parsing the same globs in the stateless function `matches_filespec`. This improves the situation by introducing `FilespecsMatcher`, which allows us to persist the globs parsing over time. Crucially, `FilespecsMatcher` eagerly parses the globs in its constructor, unlike `PathGlobs` currently being lazy. Then, we use a `@memoized_method` on each `SourcesField` instance to avoid ever creating the matcher more than necessary. Before: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.562 s ± 0.153 s [User: 7.354 s, System: 1.516 s] Range (min … max): 6.420 s … 6.878 s 10 runs ``` After: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.460 s ± 0.065 s [User: 7.301 s, System: 1.447 s] Range (min … max): 6.340 s … 6.576 s 10 runs ``` (Note, we expect the performance to be better in repos with the shape described at pantsbuild#16122 (comment)) ## Relationship to `PathGlobs` We also considered changing `PathGlobs` to eagerly parse, and then defining `PathGlobs.matches()`, rather than introducing the new type `FilespecsMatcher`. I believe `FilespecsMatcher` is a clearer API: it is only used for in-memory checks if globs match paths, whereas `PathGlobs` is used for on-disk. Notably, the Rust code only overlaps in how the exclude patterns are ignored. So, making this split allows us to simplify `PathGlobs`. A downside of having separate types is it's harder to share the duplicate parsing of the exclude patterns. This seems fine, given that the include patterns are separate code paths. (See pantsbuild#16564 for improving `PathGlobs` to also eagerly parse.) # Conflicts: # docs/markdown/Writing Plugins/common-plugin-tasks/plugin-upgrade-guide.md
Possibly will close pantsbuild#16122. @stuhood found in pantsbuild#16122 (comment) that we were spending substantial time in Rust parsing the same globs in the stateless function `matches_filespec`. This improves the situation by introducing `FilespecsMatcher`, which allows us to persist the globs parsing over time. Crucially, `FilespecsMatcher` eagerly parses the globs in its constructor, unlike `PathGlobs` currently being lazy. Then, we use a `@memoized_method` on each `SourcesField` instance to avoid ever creating the matcher more than necessary. Before: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.562 s ± 0.153 s [User: 7.354 s, System: 1.516 s] Range (min … max): 6.420 s … 6.878 s 10 runs ``` After: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.460 s ± 0.065 s [User: 7.301 s, System: 1.447 s] Range (min … max): 6.340 s … 6.576 s 10 runs ``` (Note, we expect the performance to be better in repos with the shape described at pantsbuild#16122 (comment)) We also considered changing `PathGlobs` to eagerly parse, and then defining `PathGlobs.matches()`, rather than introducing the new type `FilespecsMatcher`. I believe `FilespecsMatcher` is a clearer API: it is only used for in-memory checks if globs match paths, whereas `PathGlobs` is used for on-disk. Notably, the Rust code only overlaps in how the exclude patterns are ignored. So, making this split allows us to simplify `PathGlobs`. A downside of having separate types is it's harder to share the duplicate parsing of the exclude patterns. This seems fine, given that the include patterns are separate code paths. (See pantsbuild#16564 for improving `PathGlobs` to also eagerly parse.) [ci skip-build-wheels] [ci skip-rust] [ci skip-build-wheels]
Possibly will close #16122. @stuhood found in #16122 (comment) that we were spending substantial time in Rust parsing the same globs in the stateless function `matches_filespec`. This improves the situation by introducing `FilespecsMatcher`, which allows us to persist the globs parsing over time. Crucially, `FilespecsMatcher` eagerly parses the globs in its constructor, unlike `PathGlobs` currently being lazy. Then, we use a `@memoized_method` on each `SourcesField` instance to avoid ever creating the matcher more than necessary. Before: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.562 s ± 0.153 s [User: 7.354 s, System: 1.516 s] Range (min … max): 6.420 s … 6.878 s 10 runs ``` After: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.460 s ± 0.065 s [User: 7.301 s, System: 1.447 s] Range (min … max): 6.340 s … 6.576 s 10 runs ``` (Note, we expect the performance to be better in repos with the shape described at #16122 (comment)) ## Relationship to `PathGlobs` We also considered changing `PathGlobs` to eagerly parse, and then defining `PathGlobs.matches()`, rather than introducing the new type `FilespecsMatcher`. I believe `FilespecsMatcher` is a clearer API: it is only used for in-memory checks if globs match paths, whereas `PathGlobs` is used for on-disk. Notably, the Rust code only overlaps in how the exclude patterns are ignored. So, making this split allows us to simplify `PathGlobs`. A downside of having separate types is it's harder to share the duplicate parsing of the exclude patterns. This seems fine, given that the include patterns are separate code paths. (See #16564 for improving `PathGlobs` to also eagerly parse.)
Possibly will close pantsbuild#16122. @stuhood found in pantsbuild#16122 (comment) that we were spending substantial time in Rust parsing the same globs in the stateless function `matches_filespec`. This improves the situation by introducing `FilespecsMatcher`, which allows us to persist the globs parsing over time. Crucially, `FilespecsMatcher` eagerly parses the globs in its constructor, unlike `PathGlobs` currently being lazy. Then, we use a `@memoized_method` on each `SourcesField` instance to avoid ever creating the matcher more than necessary. Before: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.562 s ± 0.153 s [User: 7.354 s, System: 1.516 s] Range (min … max): 6.420 s … 6.878 s 10 runs ``` After: ``` ❯ hyperfine -w 2 -r 10 './pants --no-pantsd --no-buildsense-enable --no-anonymous-telemetry-enabled dependencies ::' Time (mean ± σ): 6.460 s ± 0.065 s [User: 7.301 s, System: 1.447 s] Range (min … max): 6.340 s … 6.576 s 10 runs ``` (Note, we expect the performance to be better in repos with the shape described at pantsbuild#16122 (comment)) ## Relationship to `PathGlobs` We also considered changing `PathGlobs` to eagerly parse, and then defining `PathGlobs.matches()`, rather than introducing the new type `FilespecsMatcher`. I believe `FilespecsMatcher` is a clearer API: it is only used for in-memory checks if globs match paths, whereas `PathGlobs` is used for on-disk. Notably, the Rust code only overlaps in how the exclude patterns are ignored. So, making this split allows us to simplify `PathGlobs`. A downside of having separate types is it's harder to share the duplicate parsing of the exclude patterns. This seems fine, given that the include patterns are separate code paths. (See pantsbuild#16564 for improving `PathGlobs` to also eagerly parse.)
Possibly will close #16122.
@stuhood found in #16122 (comment) that we were spending substantial time in Rust parsing the same globs in the stateless function
matches_filespec
.This improves the situation by introducing
FilespecsMatcher
, which allows us to persist the globs parsing over time. Crucially,FilespecsMatcher
eagerly parses the globs in its constructor, unlikePathGlobs
currently being lazy. Then, we use a@memoized_method
on eachSourcesField
instance to avoid ever creating the matcher more than necessary.Before:
After:
(Note, we expect the performance to be better in repos with the shape described at #16122 (comment))
Relationship to
PathGlobs
We also considered changing
PathGlobs
to eagerly parse, and then definingPathGlobs.matches()
, rather than introducing the new typeFilespecsMatcher
.I believe
FilespecsMatcher
is a clearer API: it is only used for in-memory checks if globs match paths, whereasPathGlobs
is used for on-disk.Notably, the Rust code only overlaps in how the exclude patterns are ignored. So, making this split allows us to simplify
PathGlobs
.A downside of having separate types is it's harder to share the duplicate parsing of the exclude patterns. This seems fine, given that the include patterns are separate code paths.
(See #16564 for improving
PathGlobs
to also eagerly parse.)