Verify dependencies before uploading a package #1
Conversation
@@ -28,13 +29,21 @@ def main(typeshed_dir: str, commit: str, dry_run: bool = False) -> None:
        if dry_run:
            print(f"Would upload: {distribution}, increment {increment}")
            continue
        for dependency in build_wheel.read_matadata(
Share this code? It seems repeated.
Yeah, I tried, but because of one line in the middle that is different it doesn't look worth it.
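One way the nearly duplicated loops could still be shared is to pass the differing middle step in as a callable. This is only a sketch of that idea (the helper name and signature are hypothetical, not from the PR):

```python
from typing import Callable, Iterable

def process_requires(
    requires: Iterable[str],
    verify: Callable[[str], None],
) -> list[str]:
    """Shared loop over a distribution's `requires` list.

    The one line that differs between the two scripts is passed in
    as the `verify` callable, so both call sites can reuse this loop.
    """
    seen: list[str] = []
    for dependency in requires:
        verify(dependency)  # the step that differs per script
        seen.append(dependency)
    return seen

# Usage: each script supplies its own verification step.
checked: list[str] = []
process_requires(["types-six>=0.1.4", "types-attrs"], checked.append)
```

Whether this is worth it is a judgment call, as the reply notes: for a loop this small, a callback can obscure more than it saves.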
@@ -25,13 +26,21 @@ def main(typeshed_dir: str, pattern: str) -> None:
        # Setting base version to None, so it will be read from current METADATA.toml.
        increment = get_version.main(typeshed_dir, distribution, version=None)
        increment += 1
        for dependency in build_wheel.read_matadata(
Here's the other copy which looks the same.
scripts/upload_changed.py
Outdated
        ).get("requires", []):
            if get_version.check_exists(get_version.strip_dep_version(dependency)):
                # If this dependency is already present, check it was uploaded by us.
                build_wheel.verify_dependency(typeshed_dir, dependency, uploaded)
Should we always check that the dependency is on PyPI? Otherwise it may be missing, and somebody could upload a bad distribution afterwards?
Now that I sort by dependency order it should be OK.
What is the `check_exists` check for? Can we verify the dependency unconditionally? If a dependency doesn't exist, we should reject the dependency, since somebody could later upload some bad stuff, I think.
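The unconditional rejection the reviewer suggests could look something like the following. This is a hedged sketch, not the PR's actual `verify_dependency` (the real one also takes `typeshed_dir`); the point is only that an unknown name raises instead of being silently skipped:

```python
def verify_dependency(dependency: str, uploaded: set[str]) -> None:
    """Reject any dependency we did not upload ourselves.

    Verifying unconditionally (rather than only when the package
    already exists on PyPI) closes the window where a missing
    dependency name could later be claimed by a bad actor.
    """
    if dependency not in uploaded:
        raise ValueError(f"Unsupported dependency: {dependency!r}")

# Usage: assumes `uploaded` is the set of names we previously published.
uploaded = {"types-six", "types-attrs"}
verify_dependency("types-six", uploaded)  # passes silently
```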
Thanks for the updates! I'm still not sure the dependency logic is quite right. Can you double-check it?
scripts/build_wheel.py
Outdated
@@ -20,8 +22,12 @@
import shutil
import tempfile
import subprocess
from  collections import defaultdict
Nit: two spaces after 'from'
scripts/build_wheel.py
Outdated
    for dependency in data.get("requires", []):
        dependency = get_version.strip_dep_version(dependency)
        assert dependency.startswith("types-"), "Currently only dependencies on stub packages are supported"
        if dependency[len("types-"):] in distributions:
Could be worth adding a helper that does `s[len("types-"):]`, since we do it several times.
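Such a helper might look like this (the name `strip_types_prefix` is hypothetical; a later commit in this PR mentions switching to `str.removeprefix`, which requires Python 3.9+):

```python
def strip_types_prefix(dependency: str) -> str:
    """Return the upstream distribution name for a types- stub package,
    e.g. 'types-six' -> 'six'. Raises if the prefix is missing, since
    only stub-package dependencies are supported at this point."""
    assert dependency.startswith("types-"), dependency
    return dependency.removeprefix("types-")  # Python 3.9+
```

Centralizing this also means the `startswith` assertion lives in one place instead of next to every slice.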
@@ -34,6 +37,30 @@ def read_base_version(typeshed_dir: str, distribution: str) -> str:
    return data["version"]


def strip_dep_version(dependency: str) -> str:
    """Strip a possible version suffix, e.g. types-six>=0.1.4 -> types-six."""
What if there is a space before `>=`? There could also be a semicolon. This example is from test-requirements.txt for mypy: `flake8-bugbear; python_version >= '3.5'`
There should not be semicolons; `requires` is already a list.
Sorry, I didn't notice what was in the second part. I think we may not support this initially.
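A version-stripping helper that tolerates whitespace before the operator, and explicitly rejects environment markers (per the thread's conclusion that they won't be supported initially), could be sketched like this. This is an illustration, not the PR's actual implementation:

```python
import re

def strip_dep_version(dependency: str) -> str:
    """Strip a possible version suffix, tolerating spaces,
    e.g. 'types-six >= 0.1.4' -> 'types-six'.

    Environment markers (anything after ';') are deliberately
    rejected rather than silently dropped.
    """
    match = re.match(r"[A-Za-z0-9_.-]+", dependency)
    if match is None:
        raise ValueError(f"Invalid requirement: {dependency!r}")
    rest = dependency[match.end():]
    if ";" in rest:
        raise ValueError(f"Environment markers are not supported: {dependency!r}")
    return match.group(0)
```

Matching the name with a regex instead of splitting on `>=` handles both `types-six>=0.1.4` and `types-six >= 0.1.4`, the case raised above.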
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("typeshed_dir", help="Path to typeshed checkout directory")
    parser.add_argument("previous_commit", help="Previous typeshed commit for which we performed upload")
    parser.add_argument("uploaded", help="Previously uploaded packages to validate dependencies")
This is a path to a file, not the names of packages? Make the help text more explicit.
Resolves python/typeshed#5768

In python/typeshed#5768 (comment) consensus was reached on Akuli's idea that external dependencies for stub packages are fine, as long as we validate that the external dependency is a dependency of the upstream package.

The most important part of the implementation is the validation we perform. This happens in `verify_typeshed_req` and `verify_external_req`. To help lock things down, Metadata does not expose access to elements of `requires` without validation.

We rely on PyPI's API to find the dependencies of the upstream. I believe this might not work if our stub has external dependencies and the upstream does not publish wheels. This is not the case currently (as proven by test) and does not seem too likely, so I will leave it as a TODO for the future.

We use uploaded_packages.txt as the source of truth for what is a typeshed dependency. This is important to avoid potential badness around addition and deletion, for instance, if something is added to typeshed, but the distribution is created by someone else before stub_uploader uploads it.

The other set of changes is that I delete most of the graph code that existed previously. The graph code was added in typeshed-internal#1 and was previously load-bearing for security, the idea being to ensure that the graph of transitive dependencies was fully contained within typeshed. This is no longer the case, so we can remove most of it. I still have some graph code in here, but it's no longer load-bearing for security. I keep it around to better preserve uploading semantics, since it seems like it could matter in some edge-case scenarios (such as multiple packages being uploaded for the first time that depend on each other). Since we don't have custom needs, we can get away with using the new-ish graphlib from the stdlib.

While the graph code has been removed, note that we do still run validation on transitive dependencies for each package. This is accomplished by `recursive_verify`. I think the non-transitive validation is sufficient, but running this before uploading each package doesn't hurt.

I added some special-casing for types-gdb. As Akuli pointed out, this avoids us accidentally trusting a gdb package on PyPI if ever someone attempts to add external dependencies to types-gdb.

New code paths have tests. I audited test coverage to make sure of this.
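The stdlib `graphlib` approach mentioned above can be sketched in a few lines. The dependency map here is made up for illustration; the real input would be built from each distribution's `requires`:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical map: distribution -> set of typeshed distributions it depends on.
graph = {
    "six": set(),
    "requests": {"six"},
    "boto": {"requests"},
}

# static_order() yields each node only after all of its predecessors,
# so uploading in this order guarantees dependencies are published first.
order = list(TopologicalSorter(graph).static_order())
```

This covers the edge case described above (multiple first-time packages that depend on each other) without any custom graph code, which is why the hand-rolled implementation could be dropped.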
* Allow external dependencies
* error if distribution is not on pypi
* use removeprefix
* hoist packages out of loop
* add comment about sdists
* add allowlist
* add comment about requires_dist
* Update stub_uploader/metadata.py
This is required for security reasons. cc @JukkaL