Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow external dependencies #61

Merged
merged 8 commits into from
Oct 11, 2022

Conversation

hauntsaninja
Copy link
Contributor

@hauntsaninja hauntsaninja commented Sep 22, 2022

Resolves python/typeshed#5768 (on stub_uploader side)

In python/typeshed#5768 (comment) consensus was seemingly reached on Akuli's idea, that external dependencies for stub packages are fine, as long as we validate that the external dependency is a dependency of the upstream package.

The most important part of the implementation is the validation we perform. This happens in verify_typeshed_req and verify_external_req. To help lock things down, Metadata does not expose access to elements of requires without validation.

We rely on PyPI's API to find the dependencies of the upstream. I believe this might not work if our stub has external dependencies and the upstream does not publish wheels. This is not the case currently (as proven by test), does not seem too likely, and it's unclear how to do safely, so will leave as a TODO for the future.

We use uploaded_packages.txt as the source of truth for what is a typeshed dependency. This is important to avoid potential badness around addition and deletion, for instance, if something is added to typeshed, but the distribution is created by someone else before stub_uploader uploads it.

The other set of changes is that I delete most of the graph code that existed previously. The graph code was added in
#1 and was previously load bearing for security. The idea being to ensure that the graph of transitive dependencies was fully contained within typeshed. This is no longer the case, so we can remove most of it.

I still have some graph code in here, but it's no longer load bearing for security. I keep it around to better preserve uploading semantics, since it seems like it could matter in some edge case scenarios (such as multiple packages being uploaded for the first time that depend on each other). Since we don't have custom needs, we can get away with using the new-ish graphlib from stdlib.

While the graph code has been removed, note that we do still run validation on transitive dependencies for each package. This is accomplished by recursive_verify. (I think the non-transitive validation is probably sufficient, but running this before uploading each package can't hurt)

I added some special-casing for types-gdb. As Akuli pointed out, this avoids us accidentally trusting a gdb package on PyPI if ever someone attempts to add external dependencies to types-gdb.

New code paths have tests. I audited test coverage to make sure of this.

Resolves python/typeshed#5768

In python/typeshed#5768 (comment)
consensus was reached on Akuli's idea, that external dependencies for
stub packages are fine, as long as we validate that the external
dependency is a dependency of the upstream package.

The most important part of the implementation is the validation we
perform. This happens in `verify_typeshed_req` and
`verify_external_req`. To help lock things down, Metadata does not
expose access to elements of `requires` without validation.

We rely on PyPI's API to find the dependencies of the upstream. I
believe this might not work if our stub has external dependencies and
the upstream does not publish wheels. This is not the case currently (as
proven by test) and does not seem too likely, so will leave as a TODO
for the future.

We use uploaded_packages.txt as the source of truth for what is a
typeshed dependency. This is important to avoid potential badness around
addition and deletion, for instance, if something is added to typeshed,
but the distribution is created by someone else before stub_uploader
uploads it.

The other set of changes is that I delete most of the graph code that
existed previously. The graph code was added in
typeshed-internal#1 and was
previously load bearing for security. The idea being to ensure that the
graph of transitive dependencies was fully contained within typeshed.
This is no longer the case, so we can remove most of it.

I still have some graph code in here, but it's no longer load bearing
for security. I keep it around to better preserve uploading semantics,
since it seems like it could matter in some edge case scenarios (such as
multiple packages being uploaded for the first time that depend on each
other). Since we don't have custom needs, we can get away with using the
new-ish graphlib from stdlib.

While the graph code has been removed, note that we do still run
validation on transitive dependencies for each package. This is
accomplished by `recursive_verify`. I think the non-transitive
validation is sufficient, but running this before uploading each package
doesn't hurt.

I added some special-casing for types-gdb. As Akuli pointed out, this
avoids us accidentally trusting a gdb package on PyPI if ever someone
attempts to add external dependencies to types-gdb.

New code paths have tests. I audited test coverage to make sure of this.
stub_uploader/metadata.py Outdated Show resolved Hide resolved

def read(self) -> set[str]:
if self._cached is not None:
return set(map(canonical_name, self._cached))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not cache the canonicalized set?

Copy link
Contributor Author

@hauntsaninja hauntsaninja Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently uploaded_packages.txt uses pretty case and I figured we might want to keep it that way

stub_uploader/metadata.py Outdated Show resolved Hide resolved
stub_uploader/metadata.py Outdated Show resolved Hide resolved
def upstream_distribution(self) -> Optional[str]:
# TODO: add a field to METADATA.toml if the stubs are for a package
# that does not exist on PyPI
if self._alleged_upstream_distribution == "gdb":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think about another possible attack. What if someone creates a package foo and adds types-foo to typeshed. It can be a totally benign package. Then later they make a PR that makes types-foo a dependency of something commonly used, like types-requests (or maybe something less suspicious), e.g. because the library has some custom type that works as an argument for some requests function. Much later they add some malicious package bad as a dependency of foo, and submit a "trivial" PR adding it also as a dependency of types-foo. Now this malicious package bad will be installed for a lot of users (executing arbitrary code during install), despite the fact that no-one may actually use foo.

Of course this is a tricky scenario, but someone may still try it, taking into account that potential impact will be very high. You may say that something similar is also possible for a regular runtime package, but the difference is that maintainers of runtime packages usually deal with only few dependencies, and adding another one is a really big deal, while typeshed maintainers will (hopefully) deal with hundreds or thousands packages each with their own deps.

So, I would say we will still need an explicit allowlist in this repo, in addition to the logic proposed in this PR.

Copy link
Contributor Author

@hauntsaninja hauntsaninja Sep 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that can be dealt with in a similar way to the current logic for external dependencies, that is, in order to add types-xyz as a dependency of types-requests, xyz needs to be a dependency of requests.

That's probably more robust too, because whatever process we have for adding things to the allowlist will be used to add things to the allowlist. To spell things out:

  1. Package foo exists, that depends on bar
  2. Add types-foo, add dependency on bar
  3. In order to make types-foo useful, we need bar, so we do some security review of bar / the bar maintainer and add it to the allowlist
  4. bar (gets hacked and) adds a dependency on evil
  5. types-foo users are affected, while this isn't good we're okay with it, since those are assumed to be a subset of foo users
  6. Someone sneaks past adding a dependency on types-foo to types-requests
  7. types-requests users are affected, this is really bad

So adding a step 3 can help in practice, but we should prevent step 6. I can write this check, should be pretty easy to extend the existing logic.

Note that I'm not against adding an allowlist, but I'd like to think of it more as a fallback. I'd also want us to clearly document the criteria for additions to the allowlist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I wrote the logic for preventing step 6 locally, but it doesn't pass cleanly against existing packages. There are four failures, two of which are addressed by python/typeshed#8792 and python/typeshed#8793.

The other two are trickier, so would need to make that change later in another PR (might need new METADATA.toml fields). In the meantime I've added the allowlist as proposed, which should unblock this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it doesn't pass cleanly against existing packages

Yeah, I felt it may happen. Anyway, the PR looks good now.

@hauntsaninja
Copy link
Contributor Author

Bump :-)

@ilevkivskyi ilevkivskyi merged commit 5aae098 into typeshed-internal:main Oct 11, 2022
@ilevkivskyi
Copy link
Contributor

@hauntsaninja Hm, it looks like one of added tests is broken on master, could you please check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow non-types dependencies
5 participants