Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[leo_object_storage] More strictly checking the header, less file:pread(s) for reading a body #527

Closed
mocchira opened this issue Nov 19, 2016 · 7 comments

Comments

@mocchira
Copy link
Member

As there are only two checks (timestamp, dsize) for converting a binary to a header,
In many case unnecessary file:pread to retrieve its body happen during skipping a garbage.
These can be relaxed by checking the header more strictly ( like checking every fields ) .

@yosukehara
Copy link
Member

I've recognized this issue, every item should be checked to avoid remaining a broken object in a container I think.

@mocchira
Copy link
Member Author

mocchira commented Nov 22, 2016

UPDATE: Fixed clock to 4633552912011362 as it's not unix timestamp/microsec but the erlang specific one
UPDATE: Fixed msize to 4096 as it may include MDC related information
UPDATE: Fixed a part of spec based on AVS spec https://github.com/leo-project/leo_object_storage/blob/1.3.0/include/leo_object_storage.hrl#L226-L242

check specification in details

@yosukehara could you check the above spec?
I'll go ahead with this spec if no problem.

@yosukehara
Copy link
Member

@mocchira Thanks for summarizing the check items.

We actually use custom-metadata as two kind of uses, a metadata of the multi data center replication and user defined metadata(s). So we cannot fix length of the msize which is 2048 bytes but we can define the length, ( udf-metadata:2048 bytes + mdc-metadata.max_length)

@mocchira
Copy link
Member Author

@yosukehara thanks.

https://github.com/leo-project/leo_storage/blob/develop/src/leo_sync_remote_cluster.erl#L349 with length(CMeta) == 2048 should be the upper limit.
I'd measure the actual value and use it as the upper limit for msize.

@mocchira
Copy link
Member Author

(storage_0@127.0.0.1)3> R = leo_object_storage_transformer:list_to_cmeta_bin([
(storage_0@127.0.0.1)3> {cluster_id, hoge},
(storage_0@127.0.0.1)3> {num_of_replicas, 3},
(storage_0@127.0.0.1)3> {udm, crypto:rand_bytes(2048)}
(storage_0@127.0.0.1)3> ]).
<<131,108,0,0,0,4,104,2,100,0,10,99,108,117,115,116,101,
  114,95,105,100,100,0,4,104,111,103,101,104,...>>
(storage_0@127.0.0.1)4> byte_size(R).
2122
(storage_0@127.0.0.1)5>

so I'd like to choose 4096 for safe.

@mocchira
Copy link
Member Author

@yosukehara
as it turned out that os:timestamp/0 used in leo_date:clock/1 is NOT unix timestamp/microsecs ( around 10^3 times bigger than that ),
I'd adjust a bit.

@mocchira
Copy link
Member Author

mocchira commented Nov 24, 2016

Skipping time has gotten around 2x faster than the 1.3.0 by this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants