[leo_storage] Improve the performance of the recover-node #474

Closed
yosukehara opened this issue Apr 15, 2016 · 5 comments

Comments

@yosukehara
Member

We've found that recover-node performs poorly, so we need to speed it up in order to fix inconsistent objects promptly.

In the current LeoFS version, v1.2.21, after recover-node is executed, leo_storage sends recovery messages to all storage nodes to avoid missing any inconsistent objects. This means that recovery of an object is duplicated, to a degree that depends on the routing table (ring).

I'll change leo_storage so that it sends fewer recovery messages and so that storage nodes can reach consensus with each other, which ensures that inconsistencies are resolved.

[Screenshot: LeoFS dashboard]

@yosukehara yosukehara self-assigned this Apr 15, 2016
@yosukehara yosukehara added this to the 1.2.22 milestone Apr 15, 2016
yosukehara added a commit to leo-project/leo_storage that referenced this issue Apr 16, 2016
yosukehara added a commit to leo-project/leo_storage that referenced this issue Apr 16, 2016
@yosukehara
Member Author

I've improved the performance, but there is still room for improvement.
Tomorrow, I'll improve leo_mq's consumer so that it can consume more messages per minute.

[Screenshot: LeoFS dashboard]

yosukehara added a commit to leo-project/leo_mq that referenced this issue Apr 21, 2016
@windkit
Contributor

windkit commented Apr 21, 2016

[Image: grafana_recover6]
Now the recovery process is more or less bound by disk I/O.

@yosukehara
Member Author

yosukehara commented Apr 21, 2016

The performance has increased dramatically. The time to recover 1M objects went from 12 hours to 7.5 hours, and finally to 20 minutes.
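
As a rough back-of-the-envelope check of these figures, the sketch below converts the three reported recovery times into average throughput and speedup. It assumes "1M" means exactly 1,000,000 objects and that the rate was constant over each run; the function names are illustrative only and are not part of LeoFS.

```python
# Recovery times for 1M objects as reported in this thread, in minutes.
# Assumption: "1M" is taken as exactly 1,000,000 objects.
OBJECTS = 1_000_000
durations_min = {"before": 12 * 60, "intermediate": 7.5 * 60, "final": 20}


def throughput_per_sec(objects, minutes):
    """Average objects recovered per second over the whole run."""
    return objects / (minutes * 60)


def speedup(baseline_min, new_min):
    """How many times faster the new run is than the baseline."""
    return baseline_min / new_min


for label, minutes in durations_min.items():
    rate = throughput_per_sec(OBJECTS, minutes)
    factor = speedup(durations_min["before"], minutes)
    print(f"{label}: {rate:.0f} objects/s, {factor:.1f}x vs. before")
```

Under those assumptions, the final run averages roughly 833 objects per second, about 36 times faster than the original 12-hour run.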

@yosukehara
Member Author

This morning, we tested recover-node again. There was no issue, so I've closed this.

@yosukehara
Member Author

We've found an issue in leo_mq, as shown below:

[E] storage_1@127.0.0.1 2016-05-10 12:49:08.421060 +0900    1462852148  
        leo_mq_server:handle_call/3 203 
        {noproc,{gen_server,call,[mq_persistent_node_message_0,first,30000]}}
