Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Validating config in docs #11394

Merged
merged 19 commits into from
Jul 9, 2020
Merged

WIP: Validating config in docs #11394

merged 19 commits into from
Jul 9, 2020

Conversation

dmitri-d
Copy link
Contributor

@dmitri-d dmitri-d commented Jun 1, 2020

No description provided.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really exciting to see this coming along!

tools/config_validation/validate_fragment.py Outdated Show resolved Hide resolved
docs/_ext/validating_code_block.py Show resolved Hide resolved
docs/_ext/validating_code_block.py Outdated Show resolved Hide resolved
docs/_ext/validating_code_block.py Outdated Show resolved Hide resolved
docs/_ext/validating_code_block.py Outdated Show resolved Hide resolved
process = subprocess.Popen(['bazel', 'run', '//tools/config_validation:validate_fragment', '--', self.options.get('type-name'), "-s", "\n".join(self.content)],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice and clean. You mention on Slack that this is pretty slow (around 10s), which won't scale. You mentioned a "customer builder", curious to learn more. If you want to continue with your existing PR, I'd suggest that you build a .par with all dependencies in docs/build.sh before invoking Sphinx. Then you can just reference the path of this .par here. That will allow you to skip all the Bazel overhead on each YAML processing.

That might still end up being too slow, e.g. if it takes ~1s and you have 100, that's over a minute and a half. When I would do then is add an env var to control validation. In CI we would always enable, but for local iterations on docs builds, we could disable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned a "customer builder", curious to learn more

My thinking is that we'd use a dedicated pass to verify example configs by invoking a dedicated builder. All it would do is to validate configs and generate a report. When a "normal" builder is used, examples would still be rendered, but without validation.

Basically we'd be doing config validation in one batch and it would be explicitly invoked. Current implementation combines doc generation and validation, and I'm not sure I can/should continue on validation errors: I think putting a global state into a directive (which is what ValidatingCodeBlock is) is a way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to failing hard on validation errors.

How about this for an idea..

  1. We have a Sphinx plugin that just writes out the (YAML fragment, type) tuples to some directory and build all the docs.
  2. We then run them all through the config_validator in a single bazel run at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you are suggesting is quite close to a dedicated builder, it would qork quite similar to what you have described. This is the way to go If we want to validate all config examples and then report the ones that failed (as opposed to stopping at the first failure, like it's currently implemented).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the other advantage is pure speed. Right now, you are invoking Bazel and Python multiple times to be able to do the validation. I'm guessing only a small fraction of CPU cycles are spent actually in the validation (could be worth measuring).

Dmitri Dolguikh added 3 commits June 2, 2020 16:01
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Can you provide timing for docs build before this change, with the skip and without it? Thanks.

docs/_ext/validating_code_block.py Outdated Show resolved Hide resolved

if ValidatingCodeBlock.skip_validation.lower() != 'true':
args = [
arg for arg in ['bazel', 'run'] + ValidatingCodeBlock.bazel_build_options.split() + [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shlex.split()

Copy link
Contributor Author

@dmitri-d dmitri-d Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm jumping through the hoops here is that ValidatingCodeBlock.bazel_build_options is put in quotes otherwise (by subprocess.Popen()), which breaks bazel command line parser. Not sure how shlex.split() would help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shlex.split() knows how to take a string of distinct CLI options and split them back into their args, which it looks like is what is going on here. It handles things like shell escapes.

docs/build.sh Show resolved Hide resolved
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jun 4, 2020

Can you provide timing for docs build before this change, with the skip and without it?

Processing a single validated-code-block directive
with SPHINX_SKIP_CONFIG_VALIDATION=true: ~.02 ms (there's little difference between validated-code-block and plain code-block directives in this case -- by ~1 micro second)
with validation turned on: ~3.5s

Dmitri Dolguikh added 2 commits June 4, 2020 11:23
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
@htuch
Copy link
Member

htuch commented Jun 4, 2020

@dmitri-d looking at CI, it seems docs only took ~4 mins to do the work in https://app.circleci.com/pipelines/github/envoyproxy/envoy/26065/workflows/3b37b45e-45cb-49ae-ab83-463248faeaf0/jobs/350645/steps, whereas in other recent jobs, it's almost ~4-6mins, e.g. https://app.circleci.com/pipelines/github/envoyproxy/envoy/26021/workflows/86f49169-f140-4296-b7c4-1f415c368d34/jobs/350461/steps.

I guess this is because you haven't turned this on for more than one example?

I count 197 possible places we could use this in ag ":: yaml" generated/rst | wc -l.

So, that would be a slowdown of 689s, or a build time increase from 4m to 15.5m on CI.

@lizan @mattklein123 is this going to be acceptable? Or should we keep trying to find ways to speed up the validation.

We definitely want this functionality, it's a question of how best to express it.

@mattklein123
Copy link
Member

I think it's OK if we have a slowdown given how long other CI takes and how important this is, though it would be nice to see if there is any low hanging fruit to speed it up. I wonder if it would be faster to shell out to a C++ binary?

@htuch
Copy link
Member

htuch commented Jun 5, 2020

Rather than rewriting in C++, I think #11394 (comment) provides a path to be able to speed things up by avoiding having to spin up tons of Python processes.

@stale
Copy link

stale bot commented Jun 12, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jun 12, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jun 17, 2020
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/.

🐱

Caused by: #11394 was synchronize by dmitri-d.

see: more, trace.

@dmitri-d
Copy link
Contributor Author

  • documented SPHINX_SKIP_CONFIG_VALIDATION env var

@dmitri-d
Copy link
Contributor Author

ping @htuch

Dmitri Dolguikh added 4 commits June 17, 2020 14:33
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: Dmitri Dolguikh <[email protected]>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo two nits, thanks!
/wait

self.options.get('type-name'), '-s', '\n'.join(self.content)
]
process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is there a reason not to just use subprocess.check_call() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_call/check_output seem to lose at least some of the output (and using PIPE with stderr isn't recommended). The process output is helpful as it points to the error in yaml.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried subprocess.run (new in Python 3) at https://docs.python.org/3/library/subprocess.html? We're not a Python 3 only shop.

tools/config_validation/validate_fragment.py Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Jun 26, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jun 26, 2020
@stale
Copy link

stale bot commented Jul 3, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this Jul 3, 2020
@htuch htuch reopened this Jul 6, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jul 6, 2020
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jul 6, 2020

  • switched to argparser in validate_fragment.py

Dmitri Dolguikh added 2 commits July 6, 2020 15:55
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jul 7, 2020

ping @htuch

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just the nits remaining.
/wait

tools/config_validation/validate_fragment.py Outdated Show resolved Hide resolved
tools/config_validation/validate_fragment.py Outdated Show resolved Hide resolved
self.options.get('type-name'), '-s', '\n'.join(self.content)
]
process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried subprocess.run (new in Python 3) at https://docs.python.org/3/library/subprocess.html? We're not a Python 3 only shop.

Signed-off-by: Dmitri Dolguikh <[email protected]>
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jul 8, 2020

  • switched to subprocess.run()
  • renamed cli arguments to message_type and fragment_path

ping @htuch

Signed-off-by: Dmitri Dolguikh <[email protected]>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Can't wait to see this in action when we upgrade the examples. Such a huge improvement in maintainability.

@htuch htuch merged commit a1d3f4b into envoyproxy:master Jul 9, 2020
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jul 9, 2020

Thanks for the reviews and feedback!

@mattklein123
Copy link
Member

Amazing!!!

scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
Signed-off-by: Dmitri Dolguikh <[email protected]>
Signed-off-by: scheler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants