Verify dependencies before uploading a package #1

Merged
ilevkivskyi merged 13 commits into main from verify-deps on Feb 1, 2021

Conversation

ilevkivskyi
Contributor

This is required for security reasons. cc @JukkaL

scripts/build_wheel.py (outdated)
@@ -28,13 +29,21 @@ def main(typeshed_dir: str, commit: str, dry_run: bool = False) -> None:
if dry_run:
print(f"Would upload: {distribution}, increment {increment}")
continue
for dependency in build_wheel.read_matadata(

Share this code? It seems repeated.

Contributor Author

Yeah, I tried, but because of one line in the middle that is different it doesn't look worth it.

@@ -25,13 +26,21 @@ def main(typeshed_dir: str, pattern: str) -> None:
# Setting base version to None, so it will be read from current METADATA.toml.
increment = get_version.main(typeshed_dir, distribution, version=None)
increment += 1
for dependency in build_wheel.read_matadata(

Here's the other copy which looks the same.

).get("requires", []):
if get_version.check_exists(get_version.strip_dep_version(dependency)):
# If this dependency is already present, check it was uploaded by us.
build_wheel.verify_dependency(typeshed_dir, dependency, uploaded)

Should we always check that the dependency is on PyPI? Otherwise it may be missing, and somebody could upload a bad distribution afterwards?

Contributor Author

Now that I sort by dependency order it should be OK.


What is the check_exists check for? Can we verify the dependency unconditionally? If a dependency doesn't exist, we should reject the dependency, since somebody could later upload some bad stuff, I think.
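
A rough sketch of the checks being discussed (illustrative only; the real verify_dependency also takes typeshed_dir, as the diff above shows):

import urllib.error
import urllib.request


def check_exists(distribution: str) -> bool:
    """Return True if a distribution with this name is already published on PyPI."""
    url = f"https://pypi.org/pypi/{distribution}/json"
    try:
        with urllib.request.urlopen(url) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


def verify_dependency(dependency: str, uploaded: list[str]) -> None:
    """Reject a stub dependency that exists on PyPI but was not uploaded by us."""
    if dependency not in uploaded:
        raise ValueError(f"Expected {dependency} to be one of our previous uploads")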

@ilevkivskyi ilevkivskyi requested a review from JukkaL January 30, 2021 16:17
@JukkaL left a comment

Thanks for the updates! I'm still not sure the dependency logic is quite right. Can you double-check it?

@@ -20,8 +22,12 @@
import shutil
import tempfile
import subprocess
from  collections import defaultdict

Nit: two spaces after 'from'

for dependency in data.get("requires", []):
dependency = get_version.strip_dep_version(dependency)
assert dependency.startswith("types-"), "Currently only dependencies on stub packages are supported"
if dependency[len("types-"):] in distributions:

Could be worth adding a helper that does s[len("types-"):], since we do it several times.
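
Such a helper could be a one-liner along these lines (the name strip_types_prefix is illustrative, not necessarily what the PR added):

def strip_types_prefix(distribution: str) -> str:
    """Return the upstream name for a stub distribution, e.g. "types-six" -> "six"."""
    assert distribution.startswith("types-"), "Expected a types- stub distribution"
    return distribution[len("types-"):]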

@@ -34,6 +37,30 @@ def read_base_version(typeshed_dir: str, distribution: str) -> str:
return data["version"]


def strip_dep_version(dependency: str) -> str:
"""Strip a possible version suffix, e.g. types-six>=0.1.4 -> types-six."""

What if there is a space before >=? There could also be a semicolon. This example is from test-requirements.txt for mypy:

flake8-bugbear; python_version >= '3.5'

Contributor Author

There should not be semicolons, requires is a list already.

Contributor Author

Sorry, I didn't notice what was in the second part; I think we may not support this initially.
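
If spaces or environment markers ever do need to be supported, one option (a sketch only, not what this PR implements; packaging is a third-party library) is to let a PEP 508 parser extract the name:

from packaging.requirements import Requirement

def strip_dep_version(dependency: str) -> str:
    """Return just the distribution name, e.g. "types-six >= 0.1.4" -> "types-six"."""
    # Requirement also tolerates environment markers such as
    # "flake8-bugbear; python_version >= '3.5'".
    return Requirement(dependency).name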



if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("typeshed_dir", help="Path to typeshed checkout directory")
parser.add_argument("previous_commit", help="Previous typeshed commit for which we performed upload")
parser.add_argument("uploaded", help="Previously uploaded packages to validate dependencies")

This is a path to a file, not the name of packages? Make the help text more explicit.
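
For example, the help text could spell out that a file path is expected (wording here is illustrative only):

parser.add_argument(
    "uploaded",
    help="Path to a file listing packages previously uploaded by us, used to validate dependencies",
)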

).get("requires", []):
if get_version.check_exists(get_version.strip_dep_version(dependency)):
# If this dependency is already present, check it was uploaded by us.
build_wheel.verify_dependency(typeshed_dir, dependency, uploaded)

What is the check_exists check for? Can we verify the dependency unconditionally? If a dependency doesn't exist, we should reject the dependency, since somebody could later upload some bad stuff, I think.

Ivan Levkivskyi added 2 commits February 1, 2021 15:02
@ilevkivskyi ilevkivskyi merged commit ee3a48e into main Feb 1, 2021
@ilevkivskyi ilevkivskyi deleted the verify-deps branch February 1, 2021 15:30
hauntsaninja added a commit to hauntsaninja/stub_uploader that referenced this pull request Sep 22, 2022
Resolves python/typeshed#5768

In python/typeshed#5768 (comment)
consensus was reached on Akuli's idea, that external dependencies for
stub packages are fine, as long as we validate that the external
dependency is a dependency of the upstream package.

The most important part of the implementation is the validation we
perform. This happens in `verify_typeshed_req` and
`verify_external_req`. To help lock things down, Metadata does not
expose access to elements of `requires` without validation.

We rely on PyPI's API to find the dependencies of the upstream. I
believe this might not work if our stub has external dependencies and
the upstream does not publish wheels. This is not the case currently (as
proven by test) and does not seem too likely, so will leave as a TODO
for the future.

We use uploaded_packages.txt as the source of truth for what is a
typeshed dependency. This is important to avoid potential badness around
addition and deletion, for instance, if something is added to typeshed,
but the distribution is created by someone else before stub_uploader
uploads it.

The other set of changes is that I delete most of the graph code that
existed previously. The graph code was added in
typeshed-internal#1 and was
previously load bearing for security. The idea being to ensure that the
graph of transitive dependencies was fully contained within typeshed.
This is no longer the case, so we can remove most of it.

I still have some graph code in here, but it's no longer load bearing
for security. I keep it around to better preserve uploading semantics,
since it seems like it could matter in some edge case scenarios (such as
multiple packages being uploaded for the first time that depend on each
other). Since we don't have custom needs, we can get away with using the
new-ish graphlib from stdlib.

While the graph code has been removed, note that we do still run
validation on transitive dependencies for each package. This is
accomplished by `recursive_verify`. I think the non-transitive
validation is sufficient, but running this before uploading each package
doesn't hurt.

I added some special-casing for types-gdb. As Akuli pointed out, this
avoids us accidentally trusting a gdb package on PyPI if ever someone
attempts to add external dependencies to types-gdb.

New code paths have tests. I audited test coverage to make sure of this.
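
As a rough illustration of the stdlib graphlib approach mentioned in the commit message (the dependency graph below is hypothetical, not taken from the repository):

from graphlib import TopologicalSorter

# Map each stub distribution to the stub distributions it requires.
dependency_graph = {
    "types-requests": {"types-urllib3"},
    "types-urllib3": set(),
}

# static_order() yields dependencies before their dependents, so each package
# is uploaded only after everything it requires has already been uploaded.
upload_order = list(TopologicalSorter(dependency_graph).static_order())
# -> ["types-urllib3", "types-requests"]
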
ilevkivskyi pushed a commit that referenced this pull request Oct 11, 2022
* Allow external dependencies

Resolves python/typeshed#5768

* error if distribution is not on pypi

* use removeprefix

* hoist packages out of loop

* add comment about sdists

* add allowlist

* add comment about requires_dist

* Update stub_uploader/metadata.py
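
For reference, the PyPI JSON API lookup described in the commit message ("We rely on PyPI's API to find the dependencies of the upstream") could be sketched as below; this is an illustration, not the repository's verify_external_req:

import json
import urllib.request

def upstream_requires(distribution: str) -> list[str]:
    """Fetch the upstream package's declared dependencies (requires_dist) from PyPI."""
    url = f"https://pypi.org/pypi/{distribution}/json"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # requires_dist is null when PyPI has no dependency metadata for the release,
    # e.g. when the upstream publishes only sdists.
    return data["info"]["requires_dist"] or []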