Allow external dependencies #61
Conversation
Resolves python/typeshed#5768 (on the stub_uploader side).

In python/typeshed#5768 (comment), consensus was reached on Akuli's idea that external dependencies for stub packages are fine, as long as we validate that the external dependency is a dependency of the upstream package.

The most important part of the implementation is the validation we perform. This happens in `verify_typeshed_req` and `verify_external_req`. To help lock things down, Metadata does not expose access to elements of `requires` without validation.

We rely on PyPI's API to find the dependencies of the upstream. I believe this might not work if our stub has external dependencies and the upstream does not publish wheels. This is not the case currently (as proven by test), does not seem too likely, and it's unclear how to handle safely, so I will leave it as a TODO for the future.

We use uploaded_packages.txt as the source of truth for what counts as a typeshed dependency. This is important to avoid potential badness around addition and deletion, for instance, if something is added to typeshed but the distribution is created by someone else before stub_uploader uploads it.

The other set of changes is that I delete most of the graph code that existed previously. The graph code was added in typeshed-internal#1 and was previously load bearing for security: the idea was to ensure that the graph of transitive dependencies was fully contained within typeshed. This is no longer the case, so we can remove most of it.

I still have some graph code in here, but it's no longer load bearing for security. I keep it around to better preserve uploading semantics, since it seems like it could matter in some edge cases (such as multiple packages that depend on each other being uploaded for the first time). Since we don't have custom needs, we can get away with using the new-ish graphlib from the stdlib.

While most of the graph code has been removed, note that we do still run validation on the transitive dependencies of each package. This is accomplished by `recursive_verify`. I think the non-transitive validation is sufficient, but running this before uploading each package doesn't hurt.

I added some special-casing for types-gdb. As Akuli pointed out, this avoids us accidentally trusting a gdb package on PyPI if someone ever attempts to add external dependencies to types-gdb.

New code paths have tests. I audited test coverage to make sure of this.
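To make the core idea concrete, here is a rough sketch of the "external dependency must also be an upstream dependency" check. The helper names, the simplified requirement parsing, and the in-memory `upstream_requires` list are assumptions for illustration; they are not stub_uploader's actual implementation, and in practice the upstream's dependencies would come from PyPI's JSON API (`https://pypi.org/pypi/<project>/json`, `info.requires_dist`).

```python
import re


def canonical_name(name: str) -> str:
    # PEP 503 normalization: collapse runs of -, _, . to a single dash, lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def base_distribution(requirement: str) -> str:
    # Crudely strip extras, version specifiers, and environment markers from
    # a requirement string like "Foo_Bar[extra]>=1.0; python_version>='3.8'".
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"invalid requirement: {requirement!r}")
    return match.group(1)


def verify_external_req(requirement: str, upstream_requires: list[str]) -> None:
    # The external dependency is allowed only if the same distribution
    # appears among the upstream package's own dependencies.
    upstream = {canonical_name(base_distribution(r)) for r in upstream_requires}
    if canonical_name(base_distribution(requirement)) not in upstream:
        raise ValueError(
            f"{requirement!r} is not a dependency of the upstream package"
        )
```

Comparing canonicalized names matters here: PyPI treats `Foo_Bar`, `foo-bar`, and `foo.bar` as the same distribution, so a naive string comparison would be bypassable.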
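For the remaining ordering concern (packages uploaded for the first time that depend on each other), the stdlib graphlib mentioned above is enough. A minimal illustration with made-up package names:

```python
from graphlib import TopologicalSorter

# Order stub packages so dependencies upload before their dependents.
ts = TopologicalSorter()
ts.add("types-requests", "types-urllib3")  # types-requests depends on types-urllib3
ts.add("types-urllib3")                    # no typeshed dependencies
order = list(ts.static_order())            # dependencies come first
```

`graphlib` has been in the stdlib since Python 3.10, which is presumably why custom graph code is no longer needed for this.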
    def read(self) -> set[str]:
        if self._cached is not None:
            return set(map(canonical_name, self._cached))
Why not cache the canonicalized set?
Currently uploaded_packages.txt uses pretty case and I figured we might want to keep it that way
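For what it's worth, both goals are compatible: the file can stay pretty-cased on disk while the canonicalized set is computed once and cached. A hypothetical sketch (class and helper names are illustrative, not stub_uploader's actual code):

```python
import re


def canonical_name(name: str) -> str:
    # PEP 503 name normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


class UploadedPackages:
    """Keeps uploaded_packages.txt pretty-cased, but caches the
    canonicalized set so read() doesn't re-normalize on every call."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines          # pretty-cased contents, preserved as-is
        self._canonical = None       # lazily computed canonical set

    def read(self) -> set[str]:
        if self._canonical is None:
            pretty = [line.strip() for line in self._lines if line.strip()]
            self._canonical = set(map(canonical_name, pretty))
        return self._canonical
```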
    def upstream_distribution(self) -> Optional[str]:
        # TODO: add a field to METADATA.toml if the stubs are for a package
        # that does not exist on PyPI
        if self._alleged_upstream_distribution == "gdb":
This makes me think about another possible attack. What if someone creates a package `foo` and adds `types-foo` to typeshed? It can be a totally benign package. Then later they make a PR that makes `types-foo` a dependency of something commonly used, like `types-requests` (or maybe something less suspicious), e.g. because the library has some custom type that works as an argument for some `requests` function. Much later they add some malicious package `bad` as a dependency of `foo`, and submit a "trivial" PR adding it also as a dependency of `types-foo`. Now this malicious package `bad` will be installed for a lot of users (executing arbitrary code during install), despite the fact that no one may actually use `foo`.
Of course this is a tricky scenario, but someone may still try it, taking into account that the potential impact would be very high. You may say that something similar is also possible for a regular runtime package, but the difference is that maintainers of runtime packages usually deal with only a few dependencies, and adding another one is a really big deal, while typeshed maintainers will (hopefully) deal with hundreds or thousands of packages, each with their own deps.
So, I would say we will still need an explicit allowlist in this repo, in addition to the logic proposed in this PR.
I think that can be dealt with in a similar way to the current logic for external dependencies: in order to add `types-xyz` as a dependency of `types-requests`, `xyz` needs to be a dependency of `requests`.
That's probably more robust too, because whatever process we have for adding things to the allowlist will be used to add things to the allowlist. To spell things out:
1. Package `foo` exists, and depends on `bar`
2. We add `types-foo`, with a dependency on `bar`
3. In order to make `types-foo` useful, we need `bar`, so we do some security review of `bar` / the `bar` maintainer and add it to the allowlist
4. `bar` (gets hacked and) adds a dependency on `evil`
5. `types-foo` users are affected; while this isn't good, we're okay with it, since those are assumed to be a subset of `foo` users
6. Someone sneaks a dependency on `types-foo` into `types-requests` past review
7. `types-requests` users are affected; this is really bad

So adding a step 3 can help in practice, but we should prevent step 6. I can write this check; it should be pretty easy to extend the existing logic.
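The check proposed above could be sketched roughly as follows. The function name and the in-memory `upstream_deps` set are hypothetical, used here only to make the rule concrete (in practice the upstream's dependencies would be fetched from PyPI):

```python
def verify_stub_dependency(stub_dep: str, upstream_deps: set[str]) -> None:
    # types-xyz may only be a dependency of types-foo if xyz is itself
    # a dependency of the upstream foo.
    if not stub_dep.startswith("types-"):
        raise ValueError(f"expected a types- package, got {stub_dep!r}")
    runtime_dep = stub_dep[len("types-"):]
    if runtime_dep not in upstream_deps:
        raise ValueError(
            f"cannot depend on {stub_dep}: "
            f"{runtime_dep} is not an upstream dependency"
        )
```

This mirrors the existing external-dependency rule, so a `types-foo` dependency on `types-requests` would be rejected unless `foo` actually depends on `requests`.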
Note that I'm not against adding an allowlist, but I'd like to think of it more as a fallback. I'd also want us to clearly document the criteria for additions to the allowlist.
Okay, I wrote the logic for preventing step 6 locally, but it doesn't pass cleanly against existing packages. There are four failures, two of which are addressed by python/typeshed#8792 and python/typeshed#8793.
The other two are trickier, so I'll need to make that change in a later PR (it might need new METADATA.toml fields). In the meantime I've added the allowlist as proposed, which should unblock this PR.
> but it doesn't pass cleanly against existing packages

Yeah, I felt that might happen. Anyway, the PR looks good now.
Bump :-)
@hauntsaninja Hm, it looks like one of the added tests is broken on master, could you please check?