Errors about multipart object parts on storages during upload #845
WIP
@vstax Thanks for reporting this problem. The sequence of a multipart upload (MU) on leo_gateway starts with a PUT of a temporary object suffixed by the UploadID (step 1) and ends with a DELETE of that temporary object once the upload completes (step 3).
The point is that step 3 is guaranteed to happen after step 1 on leo_gateway; however, the order can be inverted chronologically on leo_storage (especially on the secondary/third replica) because those requests are processed asynchronously. This behavior first causes the "not found" error, when step 3 arrives before step 1 has reached leo_storage, and then leaves the temporary object suffixed by the UploadID behind, because step 1 arrives after step 3 has already been processed. A permanent fix is difficult (ensuring the causality between steps 1 and 3 would require implementing some consensus algorithm), so for now we are considering a fix that decreases the odds by removing the temporary object only after confirming the checksum.
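To make the reordering concrete, here is a toy sketch (not LeoFS code; the in-memory store is invented for illustration, and the "<key>\n<UploadID>" key layout is taken from the recover-file example later in this thread) of a replica applying step 3 before step 1: the DELETE logs "not found", and the late PUT then leaves the temporary object behind.

```python
def apply_op(store, errors, op, key):
    """Apply one replicated operation to a replica's local store."""
    if op == "PUT":
        store[key] = "temporary multipart marker"
    elif op == "DELETE":
        if key in store:
            del store[key]
        else:
            errors.append(f"not found: {key!r}")   # the error seen on the replicas

gateway_order = [("PUT", "bucket/object\nUploadID"),     # step 1 (initiate MU)
                 ("DELETE", "bucket/object\nUploadID")]  # step 3 (complete MU)
replica_order = list(reversed(gateway_order))            # async delivery inverts the order

store, errors = {}, []
for op, key in replica_order:
    apply_op(store, errors, op, key)

print(errors)  # ['not found: ...']  -> the logged error
print(store)   # the temporary object is left behind
```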
Yes, 5 extra ones are
@mocchira Thank you for the analysis. Glad to know this won't affect the real data. A few questions:
Also, running recover-file against the temporary object doesn't do anything.
Maybe (though IMO the cost outweighs the benefit).
Once the above fix lands, the odds of seeing inconsistencies should decrease dramatically, because confirming the checksum takes N round trips (N = the number of chunks of a large object) between leo_gateway and the leo_storage node(s). So I'd recommend waiting for the fix rather than adopting PTP.
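A minimal sketch of that mitigation, under stated assumptions (head_chunk and delete_temp_object are hypothetical helpers, and the "<key>\n<index>" chunk naming is an assumption, not LeoFS internals), just to show why it costs one round trip per chunk before the temporary object is removed:

```python
def confirm_then_cleanup(key, upload_id, chunk_count, head_chunk, delete_temp_object):
    """Confirm all N chunks (e.g. their checksums) before deleting the temp object."""
    for index in range(1, chunk_count + 1):          # one round trip per chunk
        if head_chunk(f"{key}\n{index}") is None:    # chunk missing or unreadable
            return False                             # keep the temporary object
    delete_temp_object(f"{key}\n{upload_id}")        # only now remove it
    return True

# Example with stand-in callables:
ok = confirm_then_cleanup("bucket/object", "UploadID", 2,
                          head_chunk=lambda k: {"checksum": "..."},
                          delete_temp_object=lambda k: None)
```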
Hmm, recover-file should work even if the target is a temporary object, so I will investigate further.
@vstax As you may know, this has been fixed (more precisely, the odds of seeing inconsistencies have been decreased), so give it a try if you have time.
@mocchira Thank you, I will (I need to finish the recover-node experiments before wiping the data, so this will have to wait a bit). We also have PTP now, so it won't be exactly the same experiment, but eventually I will be uploading much more data, so it should get plenty of testing. I've got a question about recover-file, though: is it supposed to work, or am I doing it the wrong way? If needed, I can provide the results of get/head API calls directly, like in the other ticket.
Got it.
recover-file should work against temporary objects. Since the temporary object's key is the original key suffixed with "\n" and the UploadID, it needs to be passed quoted:
leofs-adm recover-file "body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8"
@mocchira Nope, it doesn't work; nothing happens. EDIT: it works just fine now with the latest patches.
I'm uploading data to a production cluster (6 storage servers, N=3, W=2, R=1). It's the latest develop version (with the latest leo_object_storage as well). The code and logic are exactly the same as in #722: Python code walks through the filesystem and, for each object it finds, executes a HEAD to see whether it's already on storage; if it isn't, it executes a PUT and uploads the object. It uses boto3. In this experiment there were no existing objects, so it's always PUT after HEAD. There is no load on the cluster other than the uploading. The upload is performed in parallel with 6 processes, but boto3's threads for multipart uploads are disabled, so each upload runs in a single thread.
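For context, a minimal sketch of that walk-HEAD-PUT loop (not the actual script; the endpoint URL, bucket name, and local path are assumptions), with boto3 retries disabled and multipart threads turned off:

```python
import os

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config
from botocore.exceptions import ClientError

ENDPOINT = "http://leofs-gateway:8080"  # assumption: gateway address
BUCKET = "body"                         # assumption: bucket name
ROOT = "/data/to/upload"                # assumption: local tree being walked

# No retries (legacy retry mode); multipart uploads still happen for large
# files, but in a single thread.
s3 = boto3.client("s3", endpoint_url=ENDPOINT,
                  config=Config(retries={"max_attempts": 0}))
transfer_config = TransferConfig(use_threads=False)

for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        path = os.path.join(dirpath, name)
        key = os.path.relpath(path, ROOT)
        try:
            s3.head_object(Bucket=BUCKET, Key=key)   # already on storage?
            continue
        except ClientError as err:
            if err.response["ResponseMetadata"]["HTTPStatusCode"] != 404:
                raise
        s3.upload_file(path, BUCKET, key, Config=transfer_config)
```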
(The uploaded data can be scrapped and I can upload it again; it's not a problem at this point. I can repeat the experiment after changing some settings, if needed.)
I'm getting errors on the storage nodes and the object state isn't consistent; however, the alarming part is that there are no errors on the client side. I'm getting a 200 result for everything, so the client assumes these objects were safely uploaded. Retries are disabled in boto3, so apparently it really doesn't get any error at all.
Here are the errors on the storage nodes. Log on bodies03:
Object status:
Log on bodies05:
Object status:
The main part of these (multipart) objects is always fine:
(same for any other)
There are a lot more objects like that on all nodes, actually. For some reason, all the examples I've checked have bodies02 as the primary node. There are lots of similar errors in the bodies02 log (it looks like every error on the other nodes has a corresponding error in the bodies02 log), so I grepped just the ones related to these objects. The remaining errors (about other objects) look exactly the same. All errors are about parts of multipart objects, by the way.
There are no errors of any other kind on any node. There are occasional
messages on all nodes (the beam.smp never uses more than ~1.2-1.3 GB, though).
There are no hardware problems (disk/CPU/memory) on bodies02 or any other node; all servers are identical as well. I can't vouch for the network hardware, though: problems there aren't too likely, but they are possible in theory. That is, I can't rule out the possibility that bodies02 is connected to a different switch or differs from the other nodes in some other way, network-wise. At least there are no errors from the kernel and no errors of any kind in "ethtool -S" on any node.
Access goes through a single gateway right now. There are no errors or info messages on the gateway at all; however, the CPU watchdog triggers since it's running on a server with some CPU load:
I suppose I'll just disable it. Not sure if it affects this or not.
Status of all nodes is fine, all queues are empty.
EDIT: In case this is useful, I executed diagnose-start on each node; here are the mentions of the first object referenced above (89369fc...) in the logs of all nodes:
On bodies01:
On bodies02:
On bodies03:
On bodies04:
On bodies05:
On bodies06:
This object's size is 5255972 bytes:
So I'm supposed to get two parts, one of 5 MB and another of around 13 KB (plus an object for the multipart header?), but there seem to be 5 extra ones, two of which are deleted?
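(A quick check of that expectation, assuming the default 5 MiB chunk size: 5,255,972 − 5,242,880 = 13,092 bytes, i.e. one full 5 MiB part plus a ~13 KB remainder, so two data parts would indeed be expected.)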