
[leo_watchdog][leo_storage] Continuous warnings caused by disk usage make leo_mq stop #776

Closed
mocchira opened this issue Jun 22, 2017 · 2 comments

Comments

@mocchira
Member

mocchira commented Jun 22, 2017

Found through the report on #725 (comment).
Currently, when a warning or error is raised by leo_watchdog, leo_storage executes this code block https://github.com/leo-project/leofs/blob/1.3.4/apps/leo_storage/src/leo_storage_watchdog_sub.erl#L83-L99. As a result, leo_mq stops consuming items if the warnings/errors keep occurring for around a few minutes. This behavior can be problematic when the warning/error is caused by disk usage, since it may take time to add disk capacity or delete unnecessary files.

What happens

The batch of msgs drops to 0 and does not go up again until the warning/error is gone.

              id                |    state    | number of msgs | batch of msgs  |    interval    |                 description
--------------------------------+-------------+----------------|----------------|----------------|---------------------------------------------
 leo_delete_dir_queue           |   idling    | 0              | 0              | 2800           | remove directories
 leo_comp_meta_with_dc_queue    |   idling    | 0              | 0              | 2800           | compare metadata w/remote-node
 leo_sync_obj_with_dc_queue     |   idling    | 0              | 0              | 2750           | sync objs w/remote-node
 leo_recovery_node_queue        |   idling    | 0              | 0              | 2850           | recovery objs of node
 leo_async_deletion_queue       |   idling    | 0              | 0              | 2850           | async deletion of objs
 leo_rebalance_queue            |   idling    | 0              | 0              | 2850           | rebalance objs
 leo_sync_by_vnode_id_queue     |   idling    | 0              | 0              | 2850           | sync objs by vnode-id
 leo_per_object_queue           |   idling    | 0              | 0              | 2850           | recover inconsistent objs

Solution

We may have to split leo_watchdog_disk into two:

  • leo_watchdog_disk_util for checking the disk utilization via iostat
  • leo_watchdog_disk_usage for checking the disk usage via df

and define a handle_notify callback function for each of them, making the one for leo_watchdog_disk_usage not control the rate at which leo_mq consumes items.
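The proposed split can be modeled as a minimal sketch, written in Python rather than Erlang purely to illustrate the dispatch logic. Everything in it is hypothetical: `Level`, `MqState`, the step size of 100, the gradual recovery on `SAFE`, and the two handler functions only mimic the roles of the per-watchdog handle_notify callbacks described above. They are not the actual leo_watchdog, leo_storage, or leo_mq API.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    # Hypothetical notification levels emitted by a watchdog.
    SAFE = 0
    WARN = 1
    ERROR = 2


@dataclass
class MqState:
    # Hypothetical stand-in for a leo_mq consumer's "batch of msgs".
    batch_of_msgs: int
    default_batch: int
    step: int = 100  # hypothetical amount to lower/raise per notification


def handle_notify_disk_util(level: Level, mq: MqState) -> None:
    """Disk *utilization* (iostat): I/O pressure, so throttle consumption."""
    if level is Level.SAFE:
        # Assumed policy: recover gradually toward the default batch size.
        mq.batch_of_msgs = min(mq.default_batch, mq.batch_of_msgs + mq.step)
    else:
        # Lower the batch size, never below 0.
        mq.batch_of_msgs = max(0, mq.batch_of_msgs - mq.step)


def handle_notify_disk_usage(level: Level, mq: MqState) -> list[str]:
    """Disk *usage* (df): capacity problem, so alert only and leave mq alone."""
    if level is Level.SAFE:
        return []
    return [f"disk usage {level.name}: free up space or add capacity"]


if __name__ == "__main__":
    # Continuous disk-usage warnings: the consumer keeps its batch size.
    usage_mq = MqState(batch_of_msgs=1000, default_batch=1000)
    for _ in range(20):
        handle_notify_disk_usage(Level.WARN, usage_mq)
    print(usage_mq.batch_of_msgs)  # 1000

    # Continuous disk-utilization warnings: the batch size drains to 0.
    util_mq = MqState(batch_of_msgs=1000, default_batch=1000)
    for _ in range(20):
        handle_notify_disk_util(Level.WARN, util_mq)
    print(util_mq.batch_of_msgs)  # 0
```

The point of the separation is that only the utilization handler touches the consumer's rate, so a long-lasting disk-usage warning no longer drives the batch of msgs to 0 the way the table above shows.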

@mocchira
Member Author

leo-project/leo_watchdog#8 should fix this issue.

@yosukehara
Member

DONE
