Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infra: Fix non-PEP txt files being included and erroring out the build #2405

Closed
wants to merge 4 commits into from

Conversation

CAM-Gerlach
Copy link
Member

@CAM-Gerlach CAM-Gerlach commented Mar 10, 2022

Presently, as described in #2404 , Sphinx attempts to render any .txt files not explicitly excluded in exclude_patterns of conf.py, in the root or any subdirectories. This results in build failure if any are found, particularly in untracked files which we don't control (e.g. vendor.txt in venvs).

Per the Sphinx docs we can use full Unix shell-style glob syntax, so adding the glob **/*.txt to exclude_patterns neatly fixes most instance of this problem by ensuring that .txt files are not matched unless they are in the root directory (as all the valid PEPs are).

To handle any remaining cases, i.e. non-PEP .txt files that are in the root directory and not explicitly excluded, adding *[!0-9].txt,*[!0-9]?.txt, *[!0-9]??.txt, *[!0-9]???.txt, *[!p]?????.txt, *[!e]??????.txt, and *[!p]???????.txt ensures that to be included, .txt files must match the equivalent of the regex ^pep?[0-9]{4}\.txt$, which is only used for PEPs.

This solution was tested on some sample files in the root and subdirectories, including the reported venv-spam/vendor.txt, and several with varying degrees of names somewhat similar to PEPs, and it excluded all of them, while not excluding any valid PEPs.

It also obviates the need for some of our other explicit excludes, but I opted to leave them for now

Fix #2404

@CAM-Gerlach CAM-Gerlach added bug infra Core infrastructure for building and rendering PEPs labels Mar 10, 2022
@CAM-Gerlach CAM-Gerlach requested a review from AA-Turner as a code owner March 10, 2022 06:09
@CAM-Gerlach CAM-Gerlach self-assigned this Mar 10, 2022
@AA-Turner
Copy link
Member

I am -1 on this; whilst a nice idea I think it overcomplicates this list massively for a edge-case.

If glob syntax is supported, why not just add *env* to the blacklist?

A

@CAM-Gerlach CAM-Gerlach requested a review from hugovk March 11, 2022 06:34
@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Mar 11, 2022

The cited example and the stopgap fix is one specific case, but the actual problem @ericsnowcurrently describes in #2404 is general: any .txt file added anywhere in this repo, and any .txt file added by any tool at a user- or repo-level .gitignored directory, file or path will be interpreted as a reST PEP file and attempted to be built/rendered accordingly (which will likely hard fail the build), unless explicitly deny-listed.

Adding *env* is merely a stopgap fix to one very specific case (venvs/virtualenvs that happen to be named *env*) of a much more general problem, leaving any other instance of the above not explicitly special-cased into the exclude_patterns unsolved. Instead, this PR implements the "allow-list" approach for legacy .txt PEPs that @ericsnowcurrently suggests using the existing supported exclude_pattern syntax.

I'm not sure I see how adding a handful of globs to an exclude list, while eliminating many of the existing special cases in the process (and avoid having to add an unbounded number more in the future), "massively" increases complexity. Since it as confirmed per my testing, it effectively eliminates any chance of a false negative while producing no false positives, and no more txt PEPs will be added, I don't see a practical downside that would motivate us to not simply fix this the right way from the start.

@AA-Turner
Copy link
Member

I'm not sure I see how adding a handful of globs to an exclude list ... "massively" increases complexity.

The list does not get much longer, but it is really hard (at least for me) to mentally parse what the patterns are meant to be matching or not matching -- the approach I took in 2415 has more code "behind the veil", but keeps conf.py very simple / understandable.

So not complexity in terms of what is actually written, but complexity in terms of having to spend several minutes understanding what is going on with the patterns.

A

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Mar 12, 2022

Yeah, I can certainly see how it could be confusing, but I could just explain it more clearly in the code comments. However, I'd be okay with your revised approach in #2415 provided it is actively planned to be upstreamed in Sphinx, because otherwise it adds way more complexity we have to maintain indefinitely than a few exclude patterns here.

@CAM-Gerlach CAM-Gerlach force-pushed the infra-fix-exclude-txt branch from 4e93e21 to 09f4ed3 Compare March 12, 2022 18:16
@CAM-Gerlach
Copy link
Member Author

Looks like closing PRs with other PRs is working again; it wasn't for a time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug CLA signed infra Core infrastructure for building and rendering PEPs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make sphinx Breaks when Certain Untracked Files are Present
3 participants