Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clarification on the safety requirement for the Locker interface #196

Closed
zllovesuki opened this issue Aug 7, 2022 · 7 comments
Closed
Labels
question Further information is requested

Comments

@zllovesuki
Copy link

zllovesuki commented Aug 7, 2022

What is your question?

re: https://github.com/caddyserver/certmagic/blob/master/storage.go#L75
How is the Lock used for the purpose of certmagic? Is it used to save on redundant work (e.g. it is ok to try to renew a certificate a second time), or to guarantee correctness (e.g. absolutely no one else is able to update the certificate other than me otherwise the world catches on fire).

While the comment hints to me that it is the former, it is not at all clear to me that if certmagic relies on the correctness of the locking property.

For context, I'm trying to integrate certmagic in a personal project with an abstract KV interface. While the rest of the Storage interface is straightforward, I'm unsure about the guarantee that Locker provides (hence, expectations of the underlying implementation).

@zllovesuki zllovesuki added the question Further information is requested label Aug 7, 2022
@zllovesuki
Copy link
Author

This is potentially related to #58

@mholt
Copy link
Member

mholt commented Aug 7, 2022

Locker is not an optional interface, if that's what you're asking 🙂

An improper implementation may cause ACME challenges to fail because two instances are presenting different credentials for the same domain at the same time. Or two instances will successfully fail to share or coordinate the certificate they obtained, resulting in rate limit overages. It's also using for distributed STEK coordination, so if locking is not properly implemented, clients' TLS sessions will end and performance will decrease.

The underlying storage mechanism must support atomic operations (for example, S3 does not).

Does that help?

@zllovesuki
Copy link
Author

zllovesuki commented Aug 7, 2022

I understand that Locker is not optional 😂

Let me just give an example for better explaining. Suppose that the Storage interface is implemented using Cloudflare's Workers storage offerings, KV and/or Durable Objects, the requirement by certmagic of its Locker implementation may change which storage solution one chooses.

KV (https://developers.cloudflare.com/workers/learning/how-kv-works/):

  1. Write a fresh key in SFO
  2. Reading of the said same key will have the value immediately (no matter if you are reading from SFO, DFW, or FRA)

However,

  1. Write a fresh key in SFO
  2. Read the said key in another region
  3. Write to the same key again with a different value in SFO
  4. Now all bets are off. You can read old value, or new value, but eventually you see the new value

Durable Objects (https://developers.cloudflare.com/workers/learning/using-durable-objects/):
Provides serializable transaction on the said key, which is "atomic" in that sense

What is not clear to me is that,

  1. once certmagic invokes Lock(key), does the storage implementation need to guarantee Get(key) by non-lock holders will not see the Store(key) by the lock holder, until Unlock(key) is called (like BeginTX and CommitTx)?
  2. The underlying storage mechanism must support atomic operations (for example, S3 does not) I assume that it applies to Locker, but what about the surrounding Storage operations? Do you need to immediately Read-Your-Writes (Get/Store)? Or you can get more relaxed with Get/Store while providing stronger guarantee to Lock/Unlock?

You can imagine the situation where one would implement the Locker interface using Durable Objects, but other operations in the Storage interface using KV.

@mholt
Copy link
Member

mholt commented Aug 7, 2022

Ah I see, good questions.

once certmagic invokes Lock(key), does the storage implementation need to guarantee Get(key) by non-lock holders will not see the Store(key) by the lock holder, until Unlock(key) is called (like BeginTX and CommitTx)?

If I understand you right, it sounds like you are asking about Lock/Unlock being used to guard or synchronize Load/Store/etc.

Load/Store/etc should be blocking and synchronous on their own; meaning that when Store() returns, all future calls to Load() for a key should return the last value stored for that key.

Based on what you described about Cloudflare KV, it sounds like this would require waiting for propagation or something like that, before returning from an implementation of Store(). Because yeah, you wouldn't want future calls to Load() to return a previous value.

Lock and Unlock, though, are a little different. Think of them just like sync.Mutex. If CertMagic (or another importer of the code) wanted to sync a critical section, it would call Lock() to start, then Unlock() when it's done.

The underlying storage mechanism must support atomic operations (for example, S3 does not) I assume that it applies to Locker, but what about the surrounding Storage operations? Do you need to immediately Read-Your-Writes (Get/Store)? Or you can get more relaxed with Get/Store while providing stronger guarantee to Lock/Unlock?

So yes, callers should be able to immediately read their writes. Lock/Unlock are not expected to be used around synchronous calls to Load/Store. The caller doesn't know or care whether the underlying storage system is synchronous or asynchronous. If it's asynchronous, the Storage implementation should take care of that so it acts synchronous.

I'll try to clarify this in the godocs.

@zllovesuki
Copy link
Author

Thank you, just gotta make sure about the requirement of the Storage interface, because you know, as if implementing a distributed counter is easy 😂

related: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

@zllovesuki
Copy link
Author

Can we assume that if CertMagic calls Lock(abc), it is guarding mutual exclusion against Store(abc)? Asking because I want the implementation protect again expired lock because of arbitrary delay in network.

You can imagine that once it calls Lock(abc), the Store must also provides a token to ensure that it is still holding an up-to-date lock.

@mholt
Copy link
Member

mholt commented Aug 8, 2022

Can we assume that if CertMagic calls Lock(abc), it is guarding mutual exclusion against Store(abc)?

No; locks are basically named mutexes, they have nothing to do with individual files/keys. Usually a lock name represents a specific/unique operation or job that is being performed.

The storage implementation should do its own sync for file reads/writes. For example, it should block calls to Load() while Store() is happening for the same key/filename. (I believe standard system disk file systems do this sync for us, like those we use in calls to os.Open(). Databases also commonly do this as well.)

Lock/Unlock have as much to do with Store/Load as os.ReadFile()/os.WriteFile() have to do with sync.Mutex.Lock()/sync.Mutex.Unlock() (that is, virtually nothing).

Think of Lock/Unlock as on a different level from the sync needed to do Store/Load. With Lock/Unlock, you are providing a mutex as a service to the caller, regardless of whatever other methods are happening. With Store/Load, the caller doesn't know or care about the mutexing, they just expect that to immediately be able to Load() what was Store()'ed. They should be able to call Load/Store synchronously without corruption.

(I pushed a commit below that clarifies the docs, hopefully that is better. Feel free to continue with discussion or ask more questions if needed.)

@mholt mholt closed this as completed in 46a4436 Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants