CT Checkpoint updater #198

Open
phbnf opened this issue Sep 3, 2024 · 4 comments

Comments

@phbnf
Contributor

phbnf commented Sep 3, 2024

Towards #88

CT checkpoints contain a timestamp which is used, among other things, to measure a CT log's MMD (Maximum Merge Delay). That means a checkpoint needs to be updated at a regular interval, regardless of whether new entries have been integrated.

Today, Tessera only updates checkpoints (with a new timestamp) when new entries are integrated. We need a new mechanism to allow CT logs to update their checkpoints, even if no new entries have been integrated.

Option 1: Support this in personalities only, with a fully decoupled checkpoint writer.

  • That means there will be at least two checkpoint writers: one from Tessera, one from the personality. We need a mechanism to ensure that Tessera's and the personality's checkpoint writers don't step on each other. Otherwise, a personality checkpoint writer might roll back a Tessera checkpoint update.
    • For buckety storage systems supporting preconditions, we could rely on preconditions.
    • For non-buckety storage systems, that might be more of a problem, and might require the personality to get a lock on Tessera's internal storage systems.
  • Another downside is that a personality would have to fiddle with Tessera's internal storage, which, even if it's just the checkpoint, isn't great.
  • If there are multiple servers, there will be multiple Tessera checkpoint writers as well, which is fine since Tessera makes sure that they don't step on one another.
  • There would also be multiple personality checkpoint writers, but it's less of a problem if these step on each other. They could be staggered over time, for instance. Otherwise, preconditions, where they are available, would help.
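
To make the precondition idea concrete, here is a minimal in-memory sketch of a compare-and-swap on the checkpoint object. Everything in it is an assumption for illustration: `store`, `read`, `writeIfGeneration` and the `size=… ts=…` strings are hypothetical stand-ins for a bucket's conditional write, not Tessera or cloud SDK APIs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPrecondition is returned when the stored generation no longer matches
// the generation the writer observed when it read the checkpoint.
var errPrecondition = errors.New("precondition failed")

// store is a hypothetical in-memory stand-in for a bucket object that
// supports conditional writes. It is not a Tessera or cloud SDK type.
type store struct {
	mu         sync.Mutex
	checkpoint string
	generation int64
}

// read returns the current checkpoint and the generation it was written at.
func (s *store) read() (string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.checkpoint, s.generation
}

// writeIfGeneration replaces the checkpoint only if nobody has written
// since the caller observed generation gen.
func (s *store) writeIfGeneration(cp string, gen int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.generation != gen {
		return errPrecondition
	}
	s.checkpoint = cp
	s.generation++
	return nil
}

func main() {
	s := &store{checkpoint: "size=10 ts=100"}

	// The personality reads the checkpoint, intending to refresh its timestamp.
	_, personalityGen := s.read()

	// Meanwhile, Tessera integrates new entries and publishes a larger tree.
	_, tesseraGen := s.read()
	if err := s.writeIfGeneration("size=12 ts=105", tesseraGen); err != nil {
		fmt.Println("tessera write failed:", err)
	}

	// The personality's stale write is rejected instead of rolling the tree back.
	err := s.writeIfGeneration("size=10 ts=110", personalityGen)
	fmt.Println("stale write rejected:", errors.Is(err, errPrecondition))

	cp, _ := s.read()
	fmt.Println("published checkpoint:", cp)
}
```

On rejection, the personality would re-read the checkpoint and decide whether a refresh is still needed. Storage systems without preconditions have no equivalent of the generation check, which is where the locking concern comes from.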

Option 2: Support this as a Tessera option

  • This is really a CT requirement (other checkpoints don't include timestamps), so it should probably be a ct_only option.
  • Every Tessera storage implementation would implement its own updater and handle locks, should it want to support this option. We might be able to provide a mixin, but it's not 100% straightforward.
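
One way to picture the mixin is as an embeddable helper that only decides when a checkpoint is due for a fresh timestamp, leaving locking and writing to each storage implementation. This is a rough sketch under that assumption; `republishMixin`, `memoryStorage` and the use of Unix seconds are all hypothetical and not part of Tessera.

```go
package main

import "fmt"

// republishMixin is a hypothetical helper that storage implementations could
// embed to decide when a checkpoint must be rewritten with a fresh timestamp.
type republishMixin struct {
	maxAgeSec     int64 // 0 disables periodic republishing (e.g. for non-CT logs)
	lastPublished int64 // Unix seconds at which the latest checkpoint was written
}

// due reports whether the checkpoint is old enough to be republished at now.
func (m *republishMixin) due(now int64) bool {
	return m.maxAgeSec > 0 && now-m.lastPublished >= m.maxAgeSec
}

// published records that a checkpoint was written at now.
func (m *republishMixin) published(now int64) { m.lastPublished = now }

// memoryStorage is a toy backend embedding the mixin. A real backend would
// take its own lock (or use a precondition) before republishing.
type memoryStorage struct {
	republishMixin
}

// maybeRepublish refreshes the checkpoint if it is due and reports whether it did.
func (s *memoryStorage) maybeRepublish(now int64) bool {
	if !s.due(now) {
		return false
	}
	s.published(now)
	return true
}

func main() {
	ct := &memoryStorage{republishMixin{maxAgeSec: 60, lastPublished: 1000}}
	fmt.Println("after 30s:", ct.maybeRepublish(1030))
	fmt.Println("after 60s:", ct.maybeRepublish(1060))

	// With the option left unset, the checkpoint is never refreshed on a timer.
	plain := &memoryStorage{}
	fmt.Println("option disabled:", plain.maybeRepublish(1060))
}
```

The part this sketch does not solve is the one flagged as not straightforward: coordinating the refresh with each backend's own integration and locking.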

Tessera's Alpha release will support CT on AWS and GCP, both of which use buckety storage systems and allow for preconditions (as of a few days ago for AWS! https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/). So option 1 should be enough to start with. But given the known limitations, we might as well go for option 2 directly?

I'm happy to work on either implementation, but I'm interested to know what others think about this.

@phbnf phbnf added the question Further information is requested label Sep 3, 2024
@mhutchinson
Contributor

I'll just float an in-between option:

Option 3

  • Tessera has a function CreateCheckpoint()
  • Personalities can call this at whatever interval their policy dictates that a new checkpoint needs to be created
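
As a rough sketch of what Option 3 could look like from the personality's side: the interval policy lives in the personality, which only calls `CreateCheckpoint()` on each tick. The `checkpointCreator` interface, its `ctx` parameter and error return, `fakeLog`, `refresh` and `simulate` are all assumptions made for illustration; none of them is an existing Tessera API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// checkpointCreator models the CreateCheckpoint() function floated above.
// Hypothetical interface, not an existing Tessera API.
type checkpointCreator interface {
	CreateCheckpoint(ctx context.Context) error
}

// fakeLog counts how many checkpoints were requested.
type fakeLog struct{ created int }

func (f *fakeLog) CreateCheckpoint(ctx context.Context) error {
	f.created++
	return nil
}

// refresh calls CreateCheckpoint once per tick until ctx is cancelled or the
// ticks channel is closed. The personality owns the tick source, so the
// interval policy stays out of Tessera.
func refresh(ctx context.Context, c checkpointCreator, ticks <-chan time.Time) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case _, ok := <-ticks:
			if !ok {
				return nil
			}
			if err := c.CreateCheckpoint(ctx); err != nil {
				return err
			}
		}
	}
}

// simulate drives refresh with n synthetic ticks and reports how many
// checkpoints were created.
func simulate(n int) int {
	ticks := make(chan time.Time, n)
	for i := 0; i < n; i++ {
		ticks <- time.Unix(int64(i), 0)
	}
	close(ticks)

	log := &fakeLog{}
	if err := refresh(context.Background(), log, ticks); err != nil {
		fmt.Println("refresh failed:", err)
	}
	return log.created
}

func main() {
	fmt.Println("checkpoints created:", simulate(3))
}
```

In a real personality the tick source would be something like `time.NewTicker(interval).C`, with `ctx` cancelled on shutdown.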

I'm not in love with this option, but it avoids the personality needing to own the write logic for checkpoints (option 1), and it avoids business policy logic needing to find its way into Tessera (option 2).

@AlCutter
Collaborator

AlCutter commented Sep 3, 2024

I'm not a fan of these options as they stand:

  1. Breaks the division of responsibilities between Tessera and the personality (Tessera owns the checkpoint resource) and is a layering violation (personalities should not have to know details of how writes are accomplished in the relevant Tessera storage backend).

  2. As you say, this pushes complexity into Tessera which shouldn't be there as it's a CT thing.

Another option could be to expose a method which personalities can use to request that a new checkpoint be generated, e.g. Integrate(alwaysWriteCP bool), which triggers an immediate integration (rather than waiting for the next scheduled one); passing true requests that a new checkpoint is written even if the tree doesn't grow as a result of the integration.
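
The intended semantics of `Integrate(alwaysWriteCP bool)` can be sketched with a toy log that only tracks sizes and timestamps. Only the method name and its parameter come from the proposal; `toyLog`, `checkpoint`, the boolean return value and the injected clock are hypothetical details added for illustration.

```go
package main

import "fmt"

// checkpoint holds the two fields that matter for this discussion.
type checkpoint struct {
	size      uint64
	timestamp int64
}

// toyLog is a hypothetical stand-in for a log; it tracks sizes only.
type toyLog struct {
	pending uint64       // entries queued but not yet integrated
	cp      checkpoint   // latest published checkpoint
	now     func() int64 // injected clock, so the example is deterministic
}

// Add queues n entries for the next integration.
func (l *toyLog) Add(n uint64) { l.pending += n }

// Integrate integrates pending entries immediately. It publishes a new
// checkpoint if the tree grew, or if alwaysWriteCP is true, and reports
// whether a checkpoint was written.
func (l *toyLog) Integrate(alwaysWriteCP bool) bool {
	grew := l.pending > 0
	newSize := l.cp.size + l.pending
	l.pending = 0
	if !grew && !alwaysWriteCP {
		return false
	}
	l.cp = checkpoint{size: newSize, timestamp: l.now()}
	return true
}

func main() {
	clock := int64(100)
	l := &toyLog{now: func() int64 { clock++; return clock }}

	l.Add(2)
	wrote := l.Integrate(false) // tree grows, so a checkpoint is written
	fmt.Println(wrote, l.cp.size, l.cp.timestamp)

	wrote = l.Integrate(false) // nothing new, checkpoint left untouched
	fmt.Println(wrote, l.cp.size, l.cp.timestamp)

	wrote = l.Integrate(true) // nothing new, but the timestamp is refreshed
	fmt.Println(wrote, l.cp.size, l.cp.timestamp)
}
```

A CT personality would call `Integrate(true)` on its own schedule, while other personalities would never need to pass `true`.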

@phbnf
Contributor Author

phbnf commented Sep 3, 2024

@mhutchinson : Do you mean exposing this as a Tessera API on top of Add()? Why would we want to export this method for all clients, if the only use case is CT?

@roger2hk
Contributor

roger2hk commented Sep 3, 2024

Option 1 makes the checkpoint update logic very complicated, because it has to handle the race condition. A wild idea is to separate the personality checkpoint from the Tessera checkpoint, so you don't have to worry about the race condition, but that is very likely to introduce some delay and consistency issues.

Option 2 may work if there are more personality use cases.

Options 3 and 4 look pretty similar: both expose a new method (with or without integration). They're reasonable to me.

I don't have a better suggestion at the moment. Reusing storage.Add() to pass an empty entry is possible if we don't want to introduce or expose a new method. However, it overloads the method with too much logic.


5 participants