Add fetch_by_key_in_parallel to fix #764 #22

Merged

merged 4 commits into leo-project:develop on Jan 16, 2018

Conversation

mocchira
Member

No description provided.

@mocchira
Member Author

mocchira commented Jan 10, 2018

@yosukehara Also please check the comment here: leo-project/leofs#764 (comment)

@windkit Can you benchmark the delete-bucket performance on the RIT env with the patch below applied to leo_storage?

```diff
diff --git a/apps/leo_storage/src/leo_storage_handler_object.erl b/apps/leo_storage/src/leo_storage_handler_object.erl
index bad8275..a94aee9 100644
--- a/apps/leo_storage/src/leo_storage_handler_object.erl
+++ b/apps/leo_storage/src/leo_storage_handler_object.erl
@@ -1076,7 +1076,7 @@ prefix_search_and_remove_objects(MQId, ParentDir, EnqueuedAt) ->
                           end
                   end
           end,
-    case catch leo_object_storage_api:fetch_by_key(ParentDir, Fun) of
+    case catch leo_object_storage_api:fetch_by_key_in_parallel(ParentDir, Fun, undefined) of
         {'EXIT', Cause} ->
             {error, Cause};
         {ok,_} ->
```

It should behave almost the same as without the PR; however, I have some concern, especially in case the number of AVS files is huge (256 - 512).
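For readers following along, here is a minimal sketch of the idea behind a parallel fetch across containers. This is illustrative only, not the actual implementation in this PR: the module name, the fetch_for_container/4 helper, and the fixed container list are all assumptions.

```erlang
%% Illustrative sketch only -- NOT the actual implementation in this PR.
%% Assumption: one worker process per object-storage container, each
%% folding over its own AVS file; the caller gathers all results.
-module(parallel_fetch_sketch).
-export([fetch_by_key_in_parallel/3]).

fetch_by_key_in_parallel(ParentDir, Fun, MaxKeys) ->
    Parent = self(),
    ContainerIds = lists:seq(0, 7), %% hypothetical: 8 containers
    Pids = [spawn_link(
              fun() ->
                      Parent ! {self(), fetch_for_container(Id, ParentDir, Fun, MaxKeys)}
              end) || Id <- ContainerIds],
    %% Gather one result per worker, in spawn order
    Results = [receive {Pid, Ret} -> Ret end || Pid <- Pids],
    {ok, lists:append([Acc || {ok, Acc} <- Results])}.

%% Stand-in for a per-container fold over one AVS file (hypothetical).
fetch_for_container(_Id, _ParentDir, _Fun, _MaxKeys) ->
    {ok, []}.
```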

@yosukehara
Member

@mocchira WIP

@windkit
Contributor

windkit commented Jan 11, 2018

@mocchira When I build with the following steps:

  1. Get the current develop branch of leofs
  2. Update apps/leo_storage/src/leo_storage_handler_object.erl with the patch above
  3. Get this pull request into deps/leo_object_storage with git pull origin pull/22/head

I cannot put any object to LeoFS. Is there anything I am missing here?

```
[W]  [email protected]  2018-01-11 09:22:49.343422 +0900  1515630169  leo_storage_replicator:replicate_fun/2  249  [{key,<<"test/test">>},{node,local},{req_id,127006226},{cause,{noproc,{gen_server,call,[undefined,{get,<<"test/test">>},30000]}}}]
[E]  [email protected]  2018-01-11 09:22:49.352349 +0900  1515630169  leo_storage_handler_object:put/2  388  [{from,gateway},{method,put},{key,<<"test/test">>},{req_id,127006226},{cause,"Replicate failure"}]
```

@mocchira
Member Author

@windkit How about running ./git_checkout develop between steps 1 and 2?

@windkit
Contributor

windkit commented Jan 11, 2018

@mocchira Thank you, now it is working.

@windkit
Contributor

windkit commented Jan 11, 2018

@mocchira From my testing, the initial enqueuing phase is quicker, but afterwards the consumption phase seems to run at the same speed. Does the above commit change that?

@windkit
Contributor

windkit commented Jan 11, 2018

Version 1.3.8:

[benchmark graph: delete_bucket_138]

PR (before d17c49e):

[benchmark graph: delete_bucket_180111_1200]

@mocchira
Member Author

@windkit

> From my testing, the initial enqueuing phase is quicker, but afterwards the consumption phase seems to run at the same speed. Does the above commit change that?

It should keep the same behavior. (The initial enqueuing phase may slow down a bit due to the additional CPU cost of making the distribution as uniform as possible, although it is still quicker than 1.3.8.)

Anyway, there is still room to make the distribution more uniform by tuning some hard-coded values, so I'll ask you to run the benchmark again once the final optimization is finished.
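To illustrate the kind of hard-coded value involved, here is a hedged sketch of counter-based round-robin dispatch. Only the counter-mod pattern and the value 5 come from the hunk reviewed later in this thread; the module, the next_worker/1 helper, and the worker list are assumptions for illustration.

```erlang
%% Hedged sketch: spread enqueued messages evenly over workers by
%% rotating the target every ?DIST_INTERVAL keys. The value 5 mirrors
%% the hard-coded value discussed in this thread; everything else here
%% is hypothetical.
-module(round_robin_sketch).
-export([next_worker/1]).

-define(DIST_INTERVAL, 5).

next_worker(Workers) when Workers =/= [] ->
    %% Per-process counter kept in the process dictionary
    Counter = case erlang:get(counter) of
                  undefined -> 0;
                  C -> C
              end,
    erlang:put(counter, Counter + 1),
    %% Advance to the next worker once per ?DIST_INTERVAL keys
    Idx = (Counter div ?DIST_INTERVAL) rem length(Workers),
    lists:nth(Idx + 1, Workers).
```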

@mocchira
Member Author

@windkit Please benchmark again with the latest PR. Also, just in case, let me confirm: what is num_of_containers set to? (256 or 512 would be desirable, as I mentioned above.)
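For reference, the container count is set in leo_storage.conf; a sketch assuming the stock LeoFS configuration keys (the values shown are only an example):

```
## leo_storage.conf -- example values; keys assumed from stock LeoFS
obj_containers.path = [./avs]
obj_containers.num_of_containers = [256]
```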

@windkit
Contributor

windkit commented Jan 12, 2018

PR (avs = 256):

[benchmark graph: delete_bucket_180112_1400]

@mocchira I am not seeing much difference from the avs = 8 case or from 1.3.8.

@mocchira
Member Author

@windkit Thanks. It looks good to me.
@yosukehara Please start your review.

@windkit
Contributor

windkit commented Jan 15, 2018

@mocchira It makes me wonder where the bottleneck is here; neither CPU nor I/O seems to be fully utilized.

@mocchira
Member Author

@windkit The original problem is described here: leo-project/leofs#725 (comment).
In short, this PR tries not only to utilize CPU/IO more fully, but also to make timeouts and slow processing of requests from the gateway happen as rarely as possible, by distributing delete-bucket messages evenly across each leo_object_storage_server (and its corresponding leo_backend_db_server).
Let me know if you have any questions.

@yosukehara left a comment

I've reviewed this PR. Can you check the comment below?

```erlang
end,
NewCounter = Counter + 1,
erlang:put(counter, NewCounter),
case NewCounter rem 5 of
```
Regarding the 5: as a trivial point, I think its value should be a named constant.
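Something along these lines, as a sketch (the macro and function names are hypothetical; only the value 5 comes from the hunk above):

```erlang
%% Hypothetical names; only the value 5 is taken from the reviewed hunk.
-define(DEF_DISPATCH_INTERVAL, 5).

%% Replaces the bare `NewCounter rem 5` test in the reviewed code.
is_dispatch_point(NewCounter) ->
    (NewCounter rem ?DEF_DISPATCH_INTERVAL) =:= 0.
```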

@mocchira
Member Author

@yosukehara Fixed. PTAL.

@yosukehara yosukehara merged commit f2a565c into leo-project:develop Jan 16, 2018
@yosukehara
Member

@mocchira Merged, thanks

@windkit
Contributor

windkit commented Jan 17, 2018

@mocchira But the result still shows that neither CPU nor I/O is fully utilized. Are there other bottlenecks?

@mocchira
Member Author

@windkit Yes, probably software locks in the eleveldb thread-pool management, the Erlang runtime, and/or OS thread scheduling, especially for the threads spawned by the eleveldb thread pool. I think we can improve CPU/IO utilization when dealing with many AVS files by tuning https://github.com/basho/eleveldb/blob/develop/c_src/eleveldb.cc#L223, which LeoFS uses at its default (71); changing it to the number of AVS files could improve CPU/IO utilization.
I will file this as another issue, thanks.

@mocchira
Member Author

Filed as leo-project/leofs#970.
