-
Notifications
You must be signed in to change notification settings - Fork 3.8k
callback support for checktime timer expiry #7803
Conversation
result in an exception. Setting to nullptr is allowed as a "no callback" hint */ | ||
void set_expiry_callback(void(*func)(void*), void* user) { | ||
EOS_ASSERT(!(func && _expiration_callback), misc_exception, "Setting a platform_timer callback when one already exists"); | ||
_expiration_callback = func; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that there is more than just the one "Race" that you mentioned in the PR here. Given FuncT1
DataT1
FuncT2
DataT2
(where one pair may be nullptr
) Depending on how the compiler orders these instructions on both set_expiry_callback
and call_expiration_callback
you can at least get any of the following cases:
if (FuncT1) FuncT1(DataT1);
if (FuncT1) FuncT2(DataT1)
if (FuncT1) FuncT1(DataT2)
if (FuncT1) FuncT2(DataT2)
if (FuncT2) FuncT2(DataT1)
if (FuncT2) FuncT2(DataT2)
I think I understand in that in the current use case where T2 is practically going to be nullptr that boils down to:
if (FuncT1) FuncT1(DataT1);
if (FuncT1) nullptr(DataT1)
if (FuncT1) FuncT1(nullptr)
if (FuncT1) nullptr(nullptr)
if (nullptr)...
if (nullptr)...
so, 2 out of 6 of those are crashes, 1 out of 6 calls FuncT1
in an unexpected way. 1 out of 6 calls FuncT1
"spuriously" (the noted race in the PR) and the remaining 2 operate as expected.
We can improve this dramatically if we can guarantee that the pair is atomically set and queried. Does the signal handling context allow us to use atomic primitives like CAS?
Or can we double buffer the function+data pair with some sort of data barrier and then "swap" the active buffer with a single index update?
Good points. We could probably mitigate much of the reordering by simply asking the compiler not to. But, since we're dealing with two pointers here there will still be the possibility of mixing something up. CAS should be safe here but I'm curious what construct you have in mind. Anything that spins, like a mutex, could result in a deadlock if the signal is delivered to the same thread that is trying to modify the pointers. This is potentially solvable by delivering the timer signal to a dedicated thread. But doing that is an abstraction nightmare -- in POSIX threads inherit the signal mask of the creating thread. This means we'd have to mask off the timer signal early in main() before net threads, chain threads, etc get created. Another possibility here is to do an atomic read/write of 16 bytes probably via a CAS of 16 bytes. It's not clear to be how easily this is done in gcc/clang. For example, on x86_64 this is the CMPXCHG16B instruction. But on ARM8?? I tried compiling the following std::cout << std::atomic<__int128>::is_always_lock_free << std::endl; This returns |
yeah, I'm not sure what the CAS construct would be that is safe. I think the safe "baseline" implementation may be double buffering as it should be pretty easy (maybe slow) to guarantee atomic updates of the pair and to put a memory barrier in on platforms that need it (so that the updates to the func and data precede the update to the index). If we want to get fancy on platforms that have (pointer-bits * 2) bit atomic writes then, I'm okay with it. Having a dirt simple fallback is worthwhile. |
… callback pointers
I took a stab at "double buffering"; which can be simplified because the code never lets you overwrite the callback with a different callback. This code still probably gets an F in a concurrency textbook but I believe it no longer has the defect where only 1 of the 2 variables may be set or weirdness due to reordering. I had to use __sync constructs because std::atomic<> was always resolving to a function call for me 😠 |
@@ -32,11 +33,17 @@ struct platform_timer { | |||
fc::fwd<impl,fwd_size> my; | |||
|
|||
void call_expiration_callback() { | |||
if(_expiration_callback) | |||
_expiration_callback(_expiration_callback_data); | |||
if(__atomic_load_n(&_callback_enabled, __ATOMIC_CONSUME)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt we'd ever see this as problematic given our use case but, as these are fences, not locks there is no guarantee that between line 36 and 37 and 38, we don't see different states of both _expiration_callback
and _expiration_callback_data
. Consider that if a thread "pauses" after each line and another calls the appropriate permutations of set_expiry_callback
we could get the same set of problem cases as before and still have _callback_enabled
reset by line 39.
I think something like a 4 stage atomic enumerated int (unset, writing, set, reading) with:
set_expiry_callback
spinlock until it gets a definitive "set" and fails or can CAS an "unset" for a "writing" and upates the values with a memory fence atomic write to "set" at the end.- a specialization
clear_expiry_callback
spinlock until it gets a definitive "unset" and fails or can CAS a "set" for an "unset" - and a
call_expiry_callback
that doesnt spinlock it opportunistically tries once to CAS a "set" for a "reading" and then pull out the values, atomically write back to "set" with a fence and call the callback. Any other result from the CAS is treated as "unset".
reduces our possible damage to
- the race between "releasing" the spinlock in
call_expiry_callback
and the consistent but perhaps stale callback being called. We could alleviate this by calling the callback holding the spin lock but that may very well deadlock. - the race between two
call_expiry_callback
invocations that cause one to erroneously fail as it treats a conflicting "reading" value as an "unset" and therefore does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no guarantee that between line 36 and 37 and 38 ..
Yeah it's a variation of an ABA problem; but it would require someone to do set_callback(nullptr), set_callback(0x...) back to back right? Something that is pretty far fetched for our use case (it would effectively be an action ending and starting without anything in between).
I have no qualms about hardening this further; as the complexity increases it starts to smell more like something that should be improved with a 128-bit CAS for at least x86_64 though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something that should be improved with a 128-bit CAS for at least x86_64 though
As long as we have a base implementation that is safe, I support platforms dabbling with faster versions (which should also be safe) where available. In this case we just need 128bit atomic load/store right? It would effectively become last-write-wins but with the concession that we are only trying to guarantee consistency of callback/data pointers thats "safe"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, just need an atomic 128bit load/store to meet the same safety criteria it seems. Which surprisingly (to me) on x86_64 only means CMPXCHG16B -- from what I can from docs even wide SSE/AVX load/stores are not guaranteed to be atomic.
anyways, it's worth benchmarking the impact of the base impl before optimizing this immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once again, I've failed at PR review, you were right to need a 128 bit CAS as the result of the comparison allows you to maintain the expectation that you cannot overwrite an existing callback (that throws)
@@ -17,11 +17,12 @@ struct platform_timer { | |||
|
|||
/* Sets a callback for when timer expires. Be aware this could might fire from a signal handling context and/or | |||
on any particular thread. Only a single callback can be registered at once; trying to register more will | |||
result in an exception. Setting to nullptr is allowed as a "no callback" hint */ | |||
result in an exception. Setting to nullptr disables any current set callback */ | |||
void set_expiry_callback(void(*func)(void*), void* user) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to remain consistent with the name of the member variable and the call_expiration_callback
method this should probably be renamed to set_expiration_callback
bool _callback_enabled = false; | ||
static_assert(__atomic_always_lock_free(sizeof(_callback_enabled), 0), "eosio requires an always lock free bool type"); | ||
void(*_expiration_callback)(void*); | ||
void* _expiration_callback_data; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a "carries a dependency" relationship here between _callback_enabled
and _expiration_callback
so __ATOMIC_CONSUME
will not work correctly. Did you mean __ATOMIC_ACQUIRE
? With atomic acquire this should work just fine and the second __atomic_load_n
is not needed.
This https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html indicates __ATOMIC_CONSUME
is implemented as __ATOMIC_ACQUIRE
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/cpp/atomic/memory_order
The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged. (since C++17)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I sent you down a rabbit hole and overcomplicated things with the 4-state suggestion. I apologize. It seems like a simple 2-state lock would be simpler to reason about and likely safer.
I like that the "read" in call_expiration_callback
is optimistic and non-blocking/spinning.
the "write" in set_expiration_callback
should probably just spin until it can CAS the state from "unlocked" to "locked", do all of the checks and possibly manipulation, store the state back to "unlocked" and then throw
any errors it would have thrown in the middle. We get no benefit from the read/write paradigm I suggested and its muddied the waters significantly.
I plead guilty to the crimes of over-engineering and solutioning in PRs
callback_state current_state; | ||
while((current_state = _callback_state.load(std::memory_order_acquire)) == READING) {} //wait for call_expiration_callback() to be done with it | ||
if(func) | ||
EOS_ASSERT(current_state == UNSET, misc_exception, "Setting a platform_timer callback when one already exists"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assert will sometimes fire and sometimes not, such that you cannot make assumptions in the remaining code that _callback_state
is eitherSET
or func
is nullptr
as the _callback_state
can change between this line and later lines where it is modified.
return; | ||
|
||
//prepare for loading in new values by moving from UNSET->WRITING | ||
while(!atomic_compare_exchange_strong(&_callback_state, ¤t_state, WRITING)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble working out the control flow in here. There are ideally two expectations here: we are "clearing" the callback and the _callback_state
is SET
or we are "setting" the callback and _callback_state is
UNSET`
After the first iteration of the loop, we change our expected _callback_state
to UNSET
even if we are going to clear. The code after is safe but wont this spinlock spin until something else clears the callback if it misses its first time through the loop (say if a callback fired at a bad time)?
At this point it's probably easier & safer to just use the threaded approach. Since we could safely use a real lock it would allow solving all problems (even prevent late firing of the callback). |
Unfortunately using a dedicated thread for signal handling causes high variation in timer quality. Probably due to a dormant thread needing to be completely woke up and scheduled when the timer expires. No go here. |
Change Description
When a transaction's checktimer timer expires, call a user specified function. This can allow potential future wasm runtimes to be notified of expiry in a very low overhead manner (they don't need to poll expiry).
This PR tries to KISS and just adds support for a single callback by storing two
void*
s in the platform_timer. Since the callback could be called within a signal handling context I didn't want to complicate manners withstd::function
s or anything that required resizing to support more than one.Usage could be something like this:
You will notice what technically amounts to a very slight race condition here. If the timer expires at the very same time that the callback is attempting to be disabled (by calling
set_expiry_callback(nullptr, nullptr)
) it's conceivable that the callback will still be called even though it has been disabled at that point.It's not clear what the best way is to solve that problem. The expiry code being run from a signal handling context ties my hands on many typical solutions. Any thoughts on this are welcome.
However, this issue isn't a problem for potential new wasm runtime given the current state of single threaded execution -- slight late execution of the callback even after disabling it doesn't cause a problem (as long it isn't so late that it is called after the next transaction's wasm starts executing, which would be absurd in practice). In fact, the only reason potential new wasm runtime needs to disable the callback at all is just to make platform_timer's assert() happy that no one is trying to pile up more than one callback.
Consensus Changes
API Changes
Documentation Additions