[leo_watchdog][leo_storage] Continuous warnings caused by disk usage make leo_mq stop #776

mocchira · 2017-06-22T07:59:42Z

Found through the report on #725 (comment).
When leo_watchdog raises a warning or error, leo_storage currently executes this code block: https://github.com/leo-project/leofs/blob/1.3.4/apps/leo_storage/src/leo_storage_watchdog_sub.erl#L83-L99. As a result, leo_mq stops consuming items if warnings/errors keep occurring for a few minutes. This behavior is problematic when the warning/error is caused by disk usage, because it can take time to add disk capacity or erase unnecessary files.

What happens

The batch of msgs drops to 0 and does not go up again until the warning/error is gone.

              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_delete_dir_queue           |   idling    | 0              | 0              | 2800           | remove directories
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 0              | 2800           | compare metadata w/remote-node
 leo_sync_obj_with_dc_queue     |   idling    | 0              | 0              | 2750           | sync objs w/remote-node
 leo_recovery_node_queue        |   idling    | 0              | 0              | 2850           | recovery objs of node
 leo_async_deletion_queue       |   idling    | 0              | 0              | 2850           | async deletion of objs
 leo_rebalance_queue            |   idling    | 0              | 0              | 2850           | rebalance objs
 leo_sync_by_vnode_id_queue     |   idling    | 0              | 0              | 2850           | sync objs by vnode-id
 leo_per_object_queue           |   idling    | 0              | 0              | 2850           | recover inconsistent objs
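
To illustrate why the batch of msgs ends up pinned at 0, here is a minimal sketch in Python (not the project's Erlang code). It assumes each watchdog notification lowers the batch size by a fixed step and clamps it at zero; the names `ConsumerState`, `on_watchdog_alarm`, `simulate`, and the numbers 1000 and 100 are hypothetical and do not come from leo_mq.

```python
from dataclasses import dataclass


@dataclass
class ConsumerState:
    """Hypothetical stand-in for a queue consumer; not the leo_mq API."""
    batch_of_msgs: int  # messages consumed per batch
    step: int           # assumed reduction applied per notification

    def on_watchdog_alarm(self) -> None:
        # Every warning/error slows consumption; never go below zero.
        self.batch_of_msgs = max(0, self.batch_of_msgs - self.step)


def simulate(alarms: int, start: int = 1000, step: int = 100) -> int:
    """Return the batch size after `alarms` consecutive notifications."""
    state = ConsumerState(batch_of_msgs=start, step=step)
    for _ in range(alarms):
        state.on_watchdog_alarm()
    return state.batch_of_msgs


if __name__ == "__main__":
    # A short burst only slows the consumer down.
    print(simulate(alarms=3))
    # A disk-usage alarm that repeats on every check drives it to zero,
    # and nothing raises it again while the alarm persists.
    print(simulate(alarms=50))
```

Under these assumptions, a condition that cannot clear quickly (such as a full disk) keeps the consumer at zero for as long as the alarm repeats.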

Solution

We may have to divide leo_watchdog_disk into two modules:

leo_watchdog_disk_util for checking the disk utilization via iostat
leo_watchdog_disk_usage for checking the disk usage via df

Then define a handle_notify callback function for each of them, and make the one for leo_watchdog_disk_usage not control the rate at which leo_mq consumes items.
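
The proposed split can be sketched as follows, in Python rather than Erlang and only as an illustration of the idea. Only the name `handle_notify` comes from the issue; `MessageQueue`, `DiskUtilSubscriber`, `DiskUsageSubscriber`, and the numeric defaults are hypothetical, and the actual change is in leo-project/leo_watchdog#8.

```python
class MessageQueue:
    """Hypothetical queue whose consumption rate can be throttled."""

    def __init__(self, batch_of_msgs: int = 1000, step: int = 100) -> None:
        self.batch_of_msgs = batch_of_msgs
        self.step = step

    def decrease(self) -> None:
        # Slow consumption down, clamping at zero.
        self.batch_of_msgs = max(0, self.batch_of_msgs - self.step)


class DiskUtilSubscriber:
    """Reacts to disk utilization (iostat) alarms by throttling the queue."""

    def __init__(self, queue: MessageQueue) -> None:
        self.queue = queue

    def handle_notify(self, level: str) -> None:
        # High I/O utilization is usually transient, so backing off helps.
        self.queue.decrease()


class DiskUsageSubscriber:
    """Reacts to disk usage (df) alarms without touching the queue rate."""

    def __init__(self, queue: MessageQueue) -> None:
        self.queue = queue
        self.alerts: list[str] = []

    def handle_notify(self, level: str) -> None:
        # Record the alarm only; a full disk does not clear by waiting,
        # so throttling here would stall the queue indefinitely.
        self.alerts.append(level)


if __name__ == "__main__":
    queue = MessageQueue()
    usage = DiskUsageSubscriber(queue)
    for _ in range(20):
        usage.handle_notify("warning")
    print(queue.batch_of_msgs)  # unchanged by disk-usage alarms
```

Keeping the two callbacks separate means a long-lived disk-usage alarm is still reported, while only disk-utilization alarms affect how fast items are consumed.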

mocchira · 2017-08-25T08:01:02Z

leo-project/leo_watchdog#8 should fix this issue.

yosukehara · 2017-08-29T08:04:06Z

DONE

mocchira added Bug Priority-MIDDLE survey _leo_storage _leo_watchdog labels Jun 22, 2017

mocchira added this to the 1.4.0 milestone Jun 22, 2017

mocchira assigned yosukehara and mocchira Jun 22, 2017

mocchira mentioned this issue Jun 22, 2017

Deleting bucket eventually fails and makes delete queues stuck #725

Open

yosukehara modified the milestones: 1.3.6, 1.4.0 Aug 25, 2017

yosukehara closed this as completed Aug 29, 2017

yosukehara added the v1.3 label Aug 30, 2017
