-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring pr tuf initialization #6
base: main
Are you sure you want to change the base?
Refactoring pr tuf initialization #6
Conversation
50843f6
to
3576806
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 ❤️ 🚀 Amazing work, @kairoaraujo! On a first pass it all looks sound.
But I'd like to give it some more scrutiny to see if if it really is fully spec/pep compliant.
Before that I have a couple of high-level questions:
IIUC the call paths to create and modify the TUF metadata repository are as follows:
cli.tuf
->tuf.repository
cli.tuf
->tuf.utils.repository_*
->tuf.repository
tuf.tasks
->tuf.utils.repository_*
->tuf.repository
I must admit that the boundaries between those levels of abstraction are not completely obvious to me. Would you mind sharing your thoughts about each module and why you implemented any given functionality in a given module?
Let me try to be more concrete, you added two new data structures RolesPayload
and TargetsPayload
, which resemble the data structures from the python-tuf
metadata API quite a bit. Now if you look e.g. at add_targets
or delegate_targets_roles
in the cli.tuf
,
there you iterate over some input data (db.query, hash bin generator) to construct a payload container, which you send over to the your tuf.repository
, where you again iterate over the data, i.e. to container, to generate and write actual TUF metadata. Wouldn't it be possible to skip this intermediate wrapping and unwrapping of data? I think that would make it easier to see if the resulting TUF metadata is correct, at least for me.
That said, I am sure you have good reasons for your design, and I'd love to understand them. :)
I also left some comments inline, including some that are not very important...
warehouse/tuf/repository.py
Outdated
|
||
Args | ||
delegator_metadata: Delegator Metadata | ||
snapshot_metadata: Snapshot Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pass snapshot_metadata
only to return it unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You found a 🐛 here.
It is changed to Optional once it depends on the scenario.
The snapshot_metadata
is important if you create multiple delegations and update the Snapshot Meta many times to avoid bumping multiple Snapshot versions.
For example, when we create the hashed bin roles.
warehouse/cli/tuf.py
Outdated
storage_service = _storage_service(config) | ||
metadata_repository = MetadataRepository(storage_service, key_service) | ||
|
||
if metadata_repository._is_initialized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: Why use the private _is_initialized
when there is also a public is_initialized
? Also, in initialize()
, which you call below the bool is checked again and a FileExistsError
is raised on True, which you do handle handle.
On a more general note, I wonder if you really need the initialization state variable? It looks like a lot of code and effort to keep it up-to-date, and easy to get wrong. Maybe it's easier to handle the existence / non-existence of the files on per-case basis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fix it yet.
warehouse/tuf/repository.py
Outdated
# Sign all top level roles metadata | ||
signers = dict() | ||
for role in TOP_LEVEL_ROLE_NAMES: | ||
if payload[role].threshold < len(payload[role].keys): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in production the ultimately desired signing threshold is higher than the available signing keys at the time of bootstrapping the top-level metadata, at least for the root role (and I think also for top-level targets and bins roles, which according to the PEP should also be signed with offline keys).
If this initialize
method is for testing/development purposes (currently it is only used by the tuf dev new-repo
command) maybe it does not have to be part of the MetadataRepository
API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention here is also to have an admin command, giving the local keys as parameters.
The idea is that during the creation of the initial Metadata Repository, we could use it in an offline environment avoiding a manual bootstrap and possible mistakes.
We could do the process as a command line for the Metadata Repository bootstrap and BIN-S delegations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current development, I increased the number of keys/threshold to 2 for the Targets role, in a sense to test multiple keys.
As you can see, I tried to improve it in the last commit.
As we discussed, the idea is to send to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting better and better! With your recent changes I can focus on tuf.services
and tuf.repository
to review spec/pep compliance. ❤️
I'm not fully finished yet, but have a couple of comments I'd like to share with you in the meanwhile...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good morning, @kairoaraujo! I have two questions about tuf.interfaces (full disclosure: I am not familiar with zope
)
I made some improvements to the CLI descriptions.
|
warehouse/tuf/services.py
Outdated
@@ -370,6 +423,9 @@ def add_hashed_targets(self, targets): | |||
self._storage_backend, self._key_storage_backend | |||
) | |||
|
|||
# TODO: Should we renew expiration date of 'timestamp' *and* 'snapshot' here? | |||
# PEP 458 'Producing Consistent Snapshots' only mentions 'timestamp'. | |||
# https://www.python.org/dev/peps/pep-0458/#producing-consistent-snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is something we should run by the other PEP authors.
7de24c1
to
b56f43a
Compare
One more thing I noticed wrt interface boundary between All service methods that update targets metadata, i.e. I think this makes the 'tuf.repository' interface slightly inconsistent. Is performing the chain of inter-dependent metadata updates a repository task, or is it a task for the application code (service)? I don't say we should change this, but I wanted to take a note about it somewhere. |
warehouse/tuf/repository.py
Outdated
role_metadata.sign(SSlibSigner(key), append=True) | ||
|
||
self._store(rolename, role_metadata) | ||
role_metadata = self.bump_role_version( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing I noticed wrt interface boundary between
tuf.services
andtuf.repository
:All service methods that update targets metadata, i.e.
init_targets_delegation
,add_hashed_targets
andbump_bin_n_roles
transitively bump 'snapshot' and 'timestamp'. But while the latter does this explicitly, the former two do it implicitly by calling methods intuf.repository
that do it for them.bump_snapshot
follows the same principle asbump_bin_n_roles
, as it transitively but explicitly bumps 'timestamp'.I think this makes the 'tuf.repository' interface slightly inconsistent. Is performing the chain of inter-dependent metadata updates a repository task, or is it a task for the application code (service)? I don't say we should change this, but I wanted to take a note about it somewhere.
That was an excellent catch @lukpueh ❤️
I removed the implicitly calls for bumping snapshot and timestamp version in the repository (repository.tuf
).
The only remaining call is for bump_role_version
when adding new targets. For this one, I need to think of a way to avoid degrading the performance.
|
||
snapshot_metadata = metadata_repository.delegate_targets_roles( | ||
delegate_roles_payload, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a TUF perspective it is enough to bump snapshot only once after having updated all targets metadata (targets, bins, bin-n). And from a performance perspective it would be a good idea too.
I guess the reason why you bump snapshot here is to write the new snapshot to storage, so that delegate_targets_roles
below, which loads snapshot from storage doesn't have the old version. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. One of the first implementations I did, the delegated_target_roles
, has the metadata_snapshot
as optional (to load or not). Further, I decided to have it more explicit and registered in the snapshots versions for each set of delegations (targets v1 -> bins v2 -> bins-n v3). Should I revert it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of multiple versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. I just did it to test it and see if my steps were correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to revert it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that delegate_targets_roles
takes a dictionary of {"<delegator>": ["<delegatee>", ...]}
. So you could actually build your delegate_roles_payload
with "targets" and "bins" and call delegate_targets_roles
only once (and bump_snapshot) afterwards.
warehouse/tuf/repository.py
Outdated
``Dict[str, Metadata]`` | ||
""" | ||
|
||
self._check_is_initialized() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of a state property self.is_initialized
, if we don't tust it to be up-to-date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't very clear, I agree.
What do you think about merging all to self.is_initialized
and giving the status in the runtime? Or any suggestion?
example:
@property
def is_initialized(self) -> bool:
"""State of repository
Returns:
bool
"""
try:
if any(
role
for role in TOP_LEVEL_ROLE_NAMES
if isinstance(self.load_role(role), Metadata)
):
return True
except StorageError as err:
if "Can't open" in str(err):
return False
return False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a lot better, if we want to stick to LBYL. Generally I wonder if we can't just go EAFP here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we go EAFP here? 🤔I try to load any TOP_LEVEL_ROLE_NAMES without checking if the files exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. But I'm thinking about places where you will use this property.
LYBL:
if is_initialized:
use_metadata()
else:
handle_unitialized_repo()
EAFP:
try:
use_metadata()
except StorageError:
handle_unitialized_repo() # Omit try/except if no need to handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought about the suggestion I shared.
For example, it wouldn't protect the repository initialization (initialize()
).
warehouse/tuf/repository.py
Outdated
|
||
return timestamp_metadata | ||
|
||
def snapshot_bump_version( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this one could just call bump_role_version
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I did it to be more explicit. But I'm open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's different than in timestamp_bump_version
, because here you don't assign snapshot's meta
field, so it's pretty much the same functionality as bump_role_version
. But we don't have to change this now.
warehouse/cli/tuf.py
Outdated
@click.option("--path", "path_", help="The basename of the Ed25519 keypair to generate") | ||
def keypair(config, name_, path_): | ||
""" | ||
Generate a new TUF keypair for top-level roles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any keypair really, also for non-top-level roles, bins and bin-n, right?
I can quickly polish these docstrings, if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It can generate any key keypair
Summarizing some architectural discussions I had with @kairoaraujo on slack: The current design implements Warehouse-specific TUF app code in The intention is to hide TUF metadata handling from Warehouse. And, in the future when better TUF repository tooling is available replace the repo abstraction by such tooling (see theupdateframework/python-tuf#1136). But it seems hard to draw a clear line between application and repository responsibilities:
The added complexity of the repository abstraction together with an unclear distribution of responsibilities makes it harder to review the correctness of the implementation in terms of PEP and TUF spec compliance. Maybe it would be easier to remove the abstraction and implement all in app code? The two relevant classes All that said, we decided that we will stick to the current design, and wait for feedback from upstream. (cc @jku, who has spent a lot of thoughts on TUF repository abstractions.) |
warehouse/tuf/tasks.py
Outdated
def bump_snapshot(task, request): | ||
""" | ||
Re-signs the TUF snapshot role, incrementing its version and renewing its | ||
expiration period. | ||
|
||
Bumping the snapshot transitively bumps the timestamp role. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: This is the only docstring in this module, and the tasks all call into service methods, which are well documented. Can we just remove this docstring here?
c842368
to
c2a8ce2
Compare
4d10b79
to
9eebff6
Compare
Co-authored-by: Abdulrahman Alfawal <[email protected]> Co-authored-by: Adolfo Jayme Barrientos <[email protected]> Co-authored-by: Alan Jacob Mathew <[email protected]> Co-authored-by: Anton Rosin <[email protected]> Co-authored-by: Eric <[email protected]> Co-authored-by: Hosted Weblate <[email protected]> Co-authored-by: Monzer Ghannam <[email protected]> Co-authored-by: Vik <[email protected]> Co-authored-by: Xu ZhuoHan <[email protected]> Co-authored-by: Yang Yulin <[email protected]> Translate-URL: https://hosted.weblate.org/projects/pypa/warehouse/ Translation: pypa/warehouse Co-authored-by: Abdulrahman Alfawal <[email protected]> Co-authored-by: Adolfo Jayme Barrientos <[email protected]> Co-authored-by: Alan Jacob Mathew <[email protected]> Co-authored-by: Anton Rosin <[email protected]> Co-authored-by: Eric <[email protected]> Co-authored-by: Monzer Ghannam <[email protected]> Co-authored-by: Vik <[email protected]> Co-authored-by: Xu ZhuoHan <[email protected]>
Bumps [terser](https://github.com/terser/terser) from 4.6.10 to 4.8.1. - [Release notes](https://github.com/terser/terser/releases) - [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md) - [Commits](https://github.com/terser/terser/commits) --- updated-dependencies: - dependency-name: terser dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dustin Ingram <[email protected]>
We didn't maintain parity with Dockerfile* when upgrading last time. Signed-off-by: Mike Fiedler <[email protected]> Signed-off-by: Mike Fiedler <[email protected]>
* Update help.html * Update translations Co-authored-by: Dustin Ingram <[email protected]>
* chore(deps): upgrade google-cloud-bigquery Remove version constraint and include the new dependencies of numpy and pyarrow. Corrects current broken state where the combination of dependencies prevents resolution. Increases overall image size by ~100MB to include these dependencies Signed-off-by: Mike Fiedler <[email protected]> * lint: import Client directly to avoid mypy issue Refs: python/mypy#10360 Signed-off-by: Mike Fiedler <[email protected]> * test: update tests to reflect import change Signed-off-by: Mike Fiedler <[email protected]> * lint: rename variable, formatting Signed-off-by: Mike Fiedler <[email protected]> * Bump pyjwt[crypto] from 2.4.0 to 2.5.0 Bumps [pyjwt[crypto]](https://github.com/jpadilla/pyjwt) from 2.4.0 to 2.5.0. - [Release notes](https://github.com/jpadilla/pyjwt/releases) - [Changelog](https://github.com/jpadilla/pyjwt/blob/master/CHANGELOG.rst) - [Commits](jpadilla/pyjwt@2.4.0...2.5.0) --- updated-dependencies: - dependency-name: pyjwt[crypto] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Mike Fiedler <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dustin Ingram <[email protected]>
Adds CLI and task support for initializing a TUF repository. Adds a service interface (IKeyService) with a filesystem-based implementation (LocalKeyService).
This work refactors the [Draft PR](pypi#7488) by @ woodruffw, to build a new repository tool on top of the Python-TUF Metadata API, and use it instead of the Python-TUF repository tool that was deprecated in v1.0.0. Part of pypi#10672 Signed-off-by: Kairo de Araujo <[email protected]>
This commit implements the unit tests for the TUF CLI. Signed-off-by: Kairo de Araujo <[email protected]>
Simplify the IKeyService that doesn't import public keys, only private keys are used during the flows. The repository state check is simplified. Signed-off-by: Kairo de Araujo <[email protected]>
Adjust the unit tests according to changes made in the codebase for adding the TUF implementation to the Warehouse. Signed-off-by: Kairo de Araujo <[email protected]>
Adds to ``warehouse/tuf/tasks`` the unit tests. Signed-off-by: Kairo de Araujo <[email protected]>
Added unit tests for tuf.services.LocalStorageService Signed-off-by: Kairo de Araujo <[email protected]>
Added unit tests for tuf.services.RepositoryService Signed-off-by: Kairo de Araujo <[email protected]>
Added unit tests for tuf.hash_bins Signed-off-by: Kairo de Araujo <[email protected]>
Added unit tests for warehouse.tuf.repository Signed-off-by: Kairo de Araujo <[email protected]>
Fix general linting for tests added and tuf services Signed-off-by: Kairo de Araujo <[email protected]>
As ``warehouse.tuf.repository`` is using typing, the mypy found some issues and it was fixed. Some tests improvements, added some monkeypatch and new asserts for call recorded using pretend. Signed-off-by: Kairo de Araujo <[email protected]>
Fix some required paramenters for running the development environment. Fix bug on LocalKeyStorage Signed-off-by: Kairo de Araujo <[email protected]>
Remove unnecessary TargetsPayload data structure and use the TargetFile from the Python TUF (python-tuf) Metadata API. The TargetsPayload was used to add hashed targets. However, a similar data structure is provided by python-tuf. Signed-off-by: Kairo de Araujo <[email protected]>
Remove function _make_fileinfo RepositoryService that is not used anywhere. Signed-off-by: Kairo de Araujo <[email protected]>
The RolesPayload was used by the TUF initialization (for development purposes) and during the Roles Delegations. The RolesPayload is no longer necessary during the TUF development initialization once all the configuration is available on request settings. During the Roles Delegations, it was replaced by the python-TUF data structure DelegatedRole, reusing it from ``tuf.repository``. Signed-off-by: Kairo de Araujo <[email protected]>
This commit adds a refactoring on the key signature used. Instead of using from Key Storage Service keys as a dictionary, uses that as a ``securesystemslib.signer.Signer``. It gives more flexibility and uses the same data structure across the services, repository and TUF. Signed-off-by: Kairo de Araujo <[email protected]>
Reduce complexity and lines of code by implementing all current TUF management code inside `RepositoryService`, and thus removing one level of abstraction, previously implemented by the `MetadataRepository` class. NOTE: This patch is marked WIP, as it has not removed all references to `MetadataRepository`, nor adopted the tests. Moreover, it still needs review in terms of correctness wrt PEP458. But the reduced complexity should make this easier. NOTE: (2) There is more potential for DRY code, see reoccurring `_bump_version; _bump_expiration; _sign; _persist;` and `_update_snapshot; _update_timestamp;` call chains. For this iteration of the patch, I chose verbosity/explicitness over saving a few more lines. But maybe both can be achieved. Signed-off-by: Lukas Puehringer <[email protected]>
This commit fixes some bug implementation from the last commit. This improves the internal functions that require the role names to work correctly once the role object has no explicit type (name), adds the SPEC_VERSION in the service, and the JSONSerializer for persisting the files with a better appearance. Signed-off-by: Kairo de Araujo <[email protected]>
The introduction of python-tuf 2.0.0 adds the feature of Succinct Delegation Roles as part of TAP15 (https://github.com/theupdateframework/taps/blob/master/tap15.md) This feature reduces the number of lines as the Hash Bins become built-in on python-tuf. All unit tests updated. Signed-off-by: Kairo de Araujo <[email protected]>
Missing tuf.url setting in the conftest for the app_config. Signed-off-by: Kairo de Araujo <[email protected]>
8058e54
to
f420476
Compare
Fixed typos in conftest and tuf/interfaces. Signed-off-by: Kairo de Araujo <[email protected]>
The current work on Warehouse is a refactoring from the existing PR Draft using the old
python-tuf
repository tools.The mentioned repository tools are no longer available on the latest version.
The idea is to start implementing small PRs to evaluate and progress.
Current status
Implemented the new repository tool using the new python-tuf Metadata API.
Repository tool Design overview:
The repository
warehouse.tuf.repository.MetadataRepository
managementtool was created standalone, not as
zope.interface.Interface
.The reason is that if a better Metadata Repository management implementation
is available in the future, it could be replaced easily.
Also, the
MetadataRepository
does not need to support differentimplementations such as SaaS once the
StorageService
is enough.The
IStorageService
was used in thewarehouse.tuf.repository.MetadataRepository
as backend storage.The
warehouse.cli.tuf
calls goes throughwarehouse.cli.tasks
, meaningthat all calls goes trough Celery.
Implemented Services/Interfaces for Storage and Storage Keys
Implemented TUF CLI
Tasks
It's is possible use the new
python-tuf
client (ngclient) to download thetarget.
Next steps:
Polish the new Warehouse metadata repository tool
Publish the new Draft PR
Goals:
PR to implement the simple detail index to the Metadata
PR to implement the TUF in the Warehouse request flow
Using the Warehouse development environment for TUF
Follow the official Warehouse until
make initdb
The metadata is available at http://localhost:9001/metadata/
You can also upload a file using the Warehouse and add the targets using CLI
twine