An attempt was made on a cluster to run a very large aae_fold erase_keys query. The fold ran ok in count mode, indicating 360M keys were available to be erased. However, when run with a change_mode of local, multiple nodes crashed.
The queue within the loop state of the riak_kv_eraser process is unbounded. It is expected that it might have to grow to a large value, as erase_keys folds that push to the queue may be fast, but the deletion process that consumes from the queue is slow. The references on the queue are small - but in this case 60M references were enough to cause memory allocation problems.
The issue is made worse as there is no format_status/2 function to restrict the logging of loop state when the process crashes - so any attempt to record the process crashing would itself have caused significant memory issues.
The riak_kv_reaper process has a similar issue - both an unbounded queue and a missing format_status/2 function.
The riak_kv_replrtq_src process has a bounded queue - but no format_status/2 function.
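As an illustration of the missing callback, here is a minimal sketch of a format_status/2 function that replaces the queue with its length before the state is logged. It assumes the process is a gen_server; the module name, the `state` record and its single `queue` field are hypothetical and do not reflect the real riak_kv_eraser loop state.

```erlang
-module(status_sketch).
-export([format_status/2]).

%% Hypothetical loop state; the real record in riak_kv_eraser differs.
-record(state, {queue = queue:new()}).

%% Optional gen_server callback. Opt is `terminate` when the process is
%% terminating abnormally and `normal` when sys:get_status/1 is called.
format_status(normal, [_PDict, State]) ->
    [{data, [{"State", summarise(State)}]}];
format_status(terminate, [_PDict, State]) ->
    summarise(State).

%% Swap the queue for its length so the logged state stays small.
%% queue:len/1 is O(N); a state that already tracks a count could use
%% that instead.
summarise(#state{queue = Q} = State) ->
    State#state{queue = {queue_length, queue:len(Q)}}.
```

With this in place a crash report would carry `{queue_length, N}` rather than every reference on the queue.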
Perhaps for eraser/reaper, rather than simply having a limit and discarding at the limit, disk_log could be used to persist references once the limit has been reached. Should the in-memory queue then become empty, a batch of logged erases could be read back from the disk_log.
This allows very large jobs to be worked through slowly, without running into memory risks. The disk_log folder should be cleaned at startup (rather than potentially re-reading very old logged erases). The disk_log folder is intended strictly to conserve memory, not to survive process restarts.
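The overflow idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed patch: the module name, the `oq` record, the log name and the limit of 1000 are all hypothetical. References go to memory until the limit is hit, then to a disk_log; once the in-memory queue drains, a batch is read back, and the log is truncated when everything on it has been consumed.

```erlang
-module(overflow_sketch).
-export([open/1, push/2, pop/1, close/1]).

-define(LIMIT, 1000).  % hypothetical bound on the in-memory queue

%% mem/len: in-memory queue and its size; logged: items still on disk;
%% cont: disk_log continuation marking how far we have read back.
-record(oq, {mem = queue:new(), len = 0, log, cont = start, logged = 0}).

%% Delete any file left by a previous run, then open a fresh log.
open(File) ->
    _ = file:delete(File),
    {ok, Log} = disk_log:open([{name, overflow_sketch}, {file, File}]),
    #oq{log = Log}.

%% Push to memory while under the limit and nothing is on disk;
%% otherwise append to the log so FIFO order is preserved.
push(Ref, #oq{len = Len, logged = 0, mem = M} = Q) when Len < ?LIMIT ->
    Q#oq{mem = queue:in(Ref, M), len = Len + 1};
push(Ref, #oq{log = Log, logged = N} = Q) ->
    ok = disk_log:log(Log, Ref),
    Q#oq{logged = N + 1}.

pop(#oq{len = 0, logged = 0} = Q) ->
    {empty, Q};
pop(#oq{len = 0} = Q) ->
    pop(refill(Q));
pop(#oq{mem = M, len = Len} = Q) ->
    {{value, Ref}, M1} = queue:out(M),
    {{value, Ref}, Q#oq{mem = M1, len = Len - 1}}.

%% Read back at most ?LIMIT logged references into memory.
refill(#oq{log = Log, cont = Cont, logged = N} = Q) ->
    ok = disk_log:sync(Log),
    case disk_log:chunk(Log, Cont, ?LIMIT) of
        {Cont1, Refs} ->
            K = length(Refs),
            reset_if_drained(Q#oq{mem = queue:from_list(Refs), len = K,
                                  cont = Cont1, logged = N - K});
        eof ->
            reset_if_drained(Q#oq{logged = 0})
    end.

%% Once nothing is left on disk, truncate so the file does not grow.
reset_if_drained(#oq{log = Log, logged = 0} = Q) ->
    ok = disk_log:truncate(Log),
    Q#oq{cont = start};
reset_if_drained(Q) ->
    Q.

close(#oq{log = Log}) ->
    disk_log:close(Log).
```

Deleting the file in `open/1` reflects the intent that the log exists only to relieve memory pressure, so nothing is replayed after a restart.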