-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Validating config in docs #11394
Conversation
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really exciting to see this coming along!
docs/_ext/validating_code_block.py
Outdated
process = subprocess.Popen(['bazel', 'run', '//tools/config_validation:validate_fragment', '--', self.options.get('type-name'), "-s", "\n".join(self.content)], | ||
stdout=subprocess.PIPE, | ||
stderr=subprocess.PIPE) | ||
stdout, stderr = process.communicate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice and clean. You mention on Slack that this is pretty slow (around 10s), which won't scale. You mentioned a "customer builder", curious to learn more. If you want to continue with your existing PR, I'd suggest that you build a .par
with all dependencies in docs/build.sh
before invoking Sphinx. Then you can just reference the path of this .par
here. That will allow you to skip all the Bazel overhead on each YAML processing.
That might still end up being too slow, e.g. if it takes ~1s and you have 100, that's over a minute and a half. When I would do then is add an env var to control validation. In CI we would always enable, but for local iterations on docs builds, we could disable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned a "customer builder", curious to learn more
My thinking is that we'd use a dedicated pass to verify example configs by invoking a dedicated builder. All it would do is to validate configs and generate a report. When a "normal" builder is used, examples would still be rendered, but without validation.
Basically we'd be doing config validation in one batch and it would be explicitly invoked. Current implementation combines doc generation and validation, and I'm not sure I can/should continue on validation errors: I think putting a global state into a directive (which is what ValidatingCodeBlock
is) is a way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to failing hard on validation errors.
How about this for an idea..
- We have a Sphinx plugin that just writes out the (YAML fragment, type) tuples to some directory and build all the docs.
- We then run them all through the
config_validator
in a singlebazel run
at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you are suggesting is quite close to a dedicated builder, it would qork quite similar to what you have described. This is the way to go If we want to validate all config examples and then report the ones that failed (as opposed to stopping at the first failure, like it's currently implemented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the other advantage is pure speed. Right now, you are invoking Bazel and Python multiple times to be able to do the validation. I'm guessing only a small fraction of CPU cycles are spent actually in the validation (could be worth measuring).
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Can you provide timing for docs build before this change, with the skip and without it? Thanks.
docs/_ext/validating_code_block.py
Outdated
|
||
if ValidatingCodeBlock.skip_validation.lower() != 'true': | ||
args = [ | ||
arg for arg in ['bazel', 'run'] + ValidatingCodeBlock.bazel_build_options.split() + [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shlex.split()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I'm jumping through the hoops here is that ValidatingCodeBlock.bazel_build_options
is put in quotes otherwise (by subprocess.Popen()
), which breaks bazel command line parser. Not sure how shlex.split()
would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shlex.split()
knows how to take a string of distinct CLI options and split them back into their args, which it looks like is what is going on here. It handles things like shell escapes.
Processing a single |
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
@dmitri-d looking at CI, it seems docs only took ~4 mins to do the work in https://app.circleci.com/pipelines/github/envoyproxy/envoy/26065/workflows/3b37b45e-45cb-49ae-ab83-463248faeaf0/jobs/350645/steps, whereas in other recent jobs, it's almost ~4-6mins, e.g. https://app.circleci.com/pipelines/github/envoyproxy/envoy/26021/workflows/86f49169-f140-4296-b7c4-1f415c368d34/jobs/350461/steps. I guess this is because you haven't turned this on for more than one example? I count 197 possible places we could use this in So, that would be a slowdown of 689s, or a build time increase from 4m to 15.5m on CI. @lizan @mattklein123 is this going to be acceptable? Or should we keep trying to find ways to speed up the validation. We definitely want this functionality, it's a question of how best to express it. |
I think it's OK if we have a slowdown given how long other CI takes and how important this is, though it would be nice to see if there is any low hanging fruit to speed it up. I wonder if it would be faster to shell out to a C++ binary? |
Rather than rewriting in C++, I think #11394 (comment) provides a path to be able to speed things up by avoiding having to spin up tons of Python processes. |
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
|
ping @htuch |
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo two nits, thanks!
/wait
docs/_ext/validating_code_block.py
Outdated
self.options.get('type-name'), '-s', '\n'.join(self.content) | ||
] | ||
process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE) | ||
stdout, stderr = process.communicate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is there a reason not to just use subprocess.check_call()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_call/check_output seem to lose at least some of the output (and using PIPE with stderr isn't recommended). The process output is helpful as it points to the error in yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried subprocess.run
(new in Python 3) at https://docs.python.org/3/library/subprocess.html? We're not a Python 3 only shop.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
|
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
ping @htuch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just the nits remaining.
/wait
docs/_ext/validating_code_block.py
Outdated
self.options.get('type-name'), '-s', '\n'.join(self.content) | ||
] | ||
process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE) | ||
stdout, stderr = process.communicate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried subprocess.run
(new in Python 3) at https://docs.python.org/3/library/subprocess.html? We're not a Python 3 only shop.
Signed-off-by: Dmitri Dolguikh <[email protected]>
ping @htuch |
Signed-off-by: Dmitri Dolguikh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Can't wait to see this in action when we upgrade the examples. Such a huge improvement in maintainability.
Thanks for the reviews and feedback! |
Amazing!!! |
Signed-off-by: Dmitri Dolguikh <[email protected]> Signed-off-by: scheler <[email protected]>
No description provided.