This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

callback support for checktime timer expiry #7803

Merged
merged 4 commits into from
Aug 30, 2019
17 changes: 12 additions & 5 deletions libraries/chain/include/eosio/chain/platform_timer.hpp
@@ -17,11 +17,12 @@ struct platform_timer {

 /* Sets a callback for when the timer expires. Be aware this might fire from a signal handling context and/or
    on any particular thread. Only a single callback can be registered at once; trying to register more will
-   result in an exception. Setting to nullptr is allowed as a "no callback" hint */
+   result in an exception. Setting to nullptr disables any currently set callback */
 void set_expiry_callback(void(*func)(void*), void* user) {
Contributor
To remain consistent with the name of the member variable and the call_expiration_callback method, this should probably be renamed to set_expiration_callback.

-EOS_ASSERT(!(func && _expiration_callback), misc_exception, "Setting a platform_timer callback when one already exists");
+EOS_ASSERT(!(func && __atomic_load_n(&_callback_enabled, __ATOMIC_CONSUME)), misc_exception, "Setting a platform_timer callback when one already exists");
 _expiration_callback = func;
Contributor
@b1bart b1bart Aug 27, 2019
I'm worried that there is more than just the one "race" that you mentioned in the PR here. Given FuncT1/DataT1 and FuncT2/DataT2 (where one pair may be nullptr), and depending on how the compiler orders these instructions in both set_expiry_callback and call_expiration_callback, you can get at least any of the following cases:

if (FuncT1) FuncT1(DataT1);
if (FuncT1) FuncT2(DataT1)
if (FuncT1) FuncT1(DataT2)
if (FuncT1) FuncT2(DataT2)
if (FuncT2) FuncT2(DataT1)
if (FuncT2) FuncT2(DataT2)

I think I understand that in the current use case, where T2 is practically going to be nullptr, this boils down to:
if (FuncT1) FuncT1(DataT1);
if (FuncT1) nullptr(DataT1)
if (FuncT1) FuncT1(nullptr)
if (FuncT1) nullptr(nullptr)
if (nullptr)...
if (nullptr)...

So 2 out of 6 of those are crashes, 1 out of 6 calls FuncT1 in an unexpected way, 1 out of 6 calls FuncT1 "spuriously" (the noted race in the PR), and the remaining 2 operate as expected.

We can improve this dramatically if we can guarantee that the pair is atomically set and queried. Does the signal handling context allow us to use atomic primitives like CAS?

Or can we double buffer the function+data pair with some sort of data barrier and then "swap" the active buffer with a single index update?
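The double-buffering idea could be sketched roughly as follows (all names here are illustrative, not from the PR): a single atomic index selects the active callback/data slot, so a reader always observes a matched pair. Note the remaining caveat that a rapid second set() can reuse the slot a concurrent call() is still reading.

```cpp
#include <atomic>

// One callback/data pair; two copies are kept and one is "active".
struct callback_slot {
    void (*func)(void*) = nullptr;
    void* data = nullptr;
};

struct double_buffered_callback {
    callback_slot slots[2];
    std::atomic<unsigned> active{0};

    void set(void (*f)(void*), void* d) {
        unsigned next = 1 - active.load(std::memory_order_relaxed);
        slots[next] = {f, d};                          // fill the inactive slot
        active.store(next, std::memory_order_release); // publish with one index update
    }

    void call() {
        unsigned idx = active.load(std::memory_order_acquire);
        callback_slot s = slots[idx]; // func and data come from the same slot
        if (s.func)
            s.func(s.data);
    }
};
```

Because func and data are always read from the same slot, the mixed FuncT1(DataT2)-style cases above cannot occur; only the "stale callback" race remains.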

+_expiration_callback_data = user;
+__atomic_store_n(&_callback_enabled, !!_expiration_callback, __ATOMIC_RELEASE);
 }

volatile sig_atomic_t expired = 1;
@@ -32,11 +33,17 @@ struct platform_timer {
fc::fwd<impl,fwd_size> my;

void call_expiration_callback() {
-if(_expiration_callback)
-   _expiration_callback(_expiration_callback_data);
+if(__atomic_load_n(&_callback_enabled, __ATOMIC_CONSUME)) {
Contributor
I doubt we'd ever see this as problematic given our use case, but as these are fences, not locks, there is no guarantee that between lines 36, 37, and 38 we don't see different states of both _expiration_callback and _expiration_callback_data. Consider that if a thread "pauses" after each line and another calls the appropriate permutations of set_expiry_callback, we could get the same set of problem cases as before and still have _callback_enabled reset by line 39.

I think something like a 4-stage atomic enumerated int (unset, writing, set, reading) with:

  • set_expiry_callback spinlocks until it either gets a definitive "set" and fails, or can CAS an "unset" for a "writing", updates the values, and finishes with a memory-fenced atomic write to "set".
  • a specialization clear_expiry_callback spinlocks until it either gets a definitive "unset" and fails, or can CAS a "set" for an "unset".
  • a call_expiry_callback that doesn't spinlock: it opportunistically tries once to CAS a "set" for a "reading", then pulls out the values, atomically writes back to "set" with a fence, and calls the callback. Any other result from the CAS is treated as "unset".

reduces our possible damage to:

  • the race between "releasing" the spinlock in call_expiry_callback and a consistent but perhaps stale callback being called. We could alleviate this by calling the callback while holding the spinlock, but that may very well deadlock.
  • the race between two call_expiry_callback invocations that causes one to erroneously fail, as it treats a conflicting "reading" value as an "unset" and therefore does nothing.
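A rough sketch of that four-state scheme, assuming std::atomic CAS is usable from the contexts involved (names and structure are illustrative, not the merged code):

```cpp
#include <atomic>

struct guarded_callback {
    enum state_t : int { cb_unset, cb_writing, cb_set, cb_reading };
    std::atomic<int> state{cb_unset};
    void (*func)(void*) = nullptr;
    void* data = nullptr;

    // Spin until we either see a definitive "set" (and fail) or win the CAS
    // of "unset" -> "writing"; then publish the pair with a release store.
    bool set(void (*f)(void*), void* d) {
        for (;;) {
            int expected = cb_unset;
            if (state.compare_exchange_weak(expected, cb_writing,
                                            std::memory_order_acquire))
                break;
            if (expected == cb_set)
                return false; // a callback is already registered
            // "writing"/"reading": another thread is mid-transition; retry
        }
        func = f;
        data = d;
        state.store(cb_set, std::memory_order_release);
        return true;
    }

    // No spinlock: one opportunistic CAS of "set" -> "reading"; any other
    // observed state is treated as "unset" and nothing is called.
    void call() {
        int expected = cb_set;
        if (!state.compare_exchange_strong(expected, cb_reading,
                                           std::memory_order_acquire))
            return;
        void (*f)(void*) = func;
        void* d = data;
        state.store(cb_set, std::memory_order_release);
        if (f) f(d); // runs after "releasing", so it may be stale (race noted above)
    }
};
```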

Contributor Author

there is no guarantee that between line 36 and 37 and 38 ..

Yeah, it's a variation of an ABA problem; but it would require someone to do set_callback(nullptr), set_callback(0x...) back to back, right? That is pretty far-fetched for our use case (it would effectively be an action ending and starting without anything in between).

I have no qualms about hardening this further; as the complexity increases it starts to smell more like something that should be improved with a 128-bit CAS for at least x86_64 though

Contributor
something that should be improved with a 128-bit CAS for at least x86_64 though

As long as we have a base implementation that is safe, I support platforms dabbling with faster versions (which should also be safe) where available. In this case we just need a 128-bit atomic load/store, right? It would effectively become last-write-wins, but with the concession that we are only trying to guarantee consistency of the callback/data pointers, that's "safe".

Contributor Author

Right, it seems we just need an atomic 128-bit load/store to meet the same safety criteria. Which, surprisingly (to me), on x86_64 only means CMPXCHG16B -- from what I can tell from the docs, even wide SSE/AVX loads/stores are not guaranteed to be atomic.

Anyway, it's worth benchmarking the impact of the base implementation before optimizing this immediately.

Contributor
Once again I've failed at PR review: you were right to need a 128-bit CAS, as the result of the comparison allows you to maintain the expectation that you cannot overwrite an existing callback (which throws).

+void(*cb)(void*) = _expiration_callback;
+void* cb_data = _expiration_callback_data;
+if(__atomic_load_n(&_callback_enabled, __ATOMIC_CONSUME))
+   cb(cb_data);
+}
 }

-void(*_expiration_callback)(void*) = nullptr;
+bool _callback_enabled = false;
+static_assert(__atomic_always_lock_free(sizeof(_callback_enabled), 0), "eosio requires an always lock free bool type");
+void(*_expiration_callback)(void*);
+void* _expiration_callback_data;
 };
Contributor
@heifner heifner Aug 28, 2019
I don't see a "carries a dependency" relationship here between _callback_enabled and _expiration_callback, so __ATOMIC_CONSUME will not work correctly. Did you mean __ATOMIC_ACQUIRE? With an acquire load this should work just fine, and the second __atomic_load_n is not needed.

This https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html indicates __ATOMIC_CONSUME is implemented as __ATOMIC_ACQUIRE anyway.

Contributor
https://en.cppreference.com/w/cpp/atomic/memory_order

The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged. (since C++17)
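The suggested change could look roughly like this (a sketch mirroring the diff's member names, not the merged code): the acquire load of _callback_enabled pairs with the release store in set_expiry_callback, so the callback/data writes made before the store are visible, and the second __atomic_load_n can be dropped.

```cpp
// GCC/Clang __atomic builtins, as in the diff.
struct platform_timer_sketch {
    bool _callback_enabled = false;
    void (*_expiration_callback)(void*) = nullptr;
    void* _expiration_callback_data = nullptr;

    void set_expiry_callback(void (*func)(void*), void* user) {
        _expiration_callback = func;
        _expiration_callback_data = user;
        // Release: publishes the two writes above along with the flag.
        __atomic_store_n(&_callback_enabled, func != nullptr, __ATOMIC_RELEASE);
    }

    void call_expiration_callback() {
        // Acquire pairs with the release store; no second flag load needed.
        if (__atomic_load_n(&_callback_enabled, __ATOMIC_ACQUIRE)) {
            void (*cb)(void*) = _expiration_callback;
            void* cb_data = _expiration_callback_data;
            cb(cb_data);
        }
    }
};
```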
