Errors about multipart object parts on storages during upload #845

Closed
vstax opened this issue Sep 23, 2017 · 8 comments

vstax (Contributor) commented Sep 23, 2017

I'm uploading data to a production cluster (6 storage servers, N=3, W=2, R=1). It's the latest develop version (with the latest leo_object_storage as well). The code and logic are exactly the same as in #722: a Python script walks through the filesystem and, for each object it finds, executes a HEAD to check whether it's already on storage; if it's not, it executes a PUT and uploads the object. It uses boto3. In this experiment there were no pre-existing objects, so it's always PUT after HEAD. There is no other load on the cluster besides the uploading. Uploads run in parallel across 6 processes, but boto3's threads for multipart uploads are disabled, so each upload works in a single thread.
(The uploaded data can be scrapped and I can upload it again; that's not a problem at this point. I can repeat the experiment after changing some settings, if needed.)
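
For reference, a minimal sketch of the upload logic described above (HEAD, then PUT on a miss), with retries turned off and boto3's multipart threads disabled via TransferConfig. The endpoint, bucket, and local path are hypothetical (they don't appear in this issue), and the 6-process parallelism is omitted:

```python
import os
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError
from boto3.s3.transfer import TransferConfig

# Hypothetical endpoint and bucket; adjust to the real gateway/bucket names.
s3 = boto3.client('s3',
                  endpoint_url='http://leofs-gateway:8080',
                  config=Config(retries={'max_attempts': 0}))   # retries disabled, as in the report
transfer_cfg = TransferConfig(use_threads=False)                # multipart uploads run single-threaded

def upload_if_missing(bucket, key, path):
    try:
        s3.head_object(Bucket=bucket, Key=key)                  # HEAD: already on storage?
        return False
    except ClientError:                                         # 404 -> not there yet, upload it
        pass
    s3.upload_file(path, bucket, key, Config=transfer_cfg)      # PUT (multipart when large)
    return True

root_dir = '/data/body'                                         # hypothetical local tree
for root, _, files in os.walk(root_dir):
    for name in files:
        path = os.path.join(root, name)
        upload_if_missing('body', os.path.relpath(path, root_dir), path)
```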

I'm getting errors on the storage nodes and the object state isn't consistent. The alarming part is that there are no errors on the client side: I get a "200" result for everything, so the client assumes these objects were safely uploaded. Retries are disabled in boto3, so apparently it really doesn't get any error at all.

Log on bodies03:

[E]	[email protected]	2017-09-22 22:34:20.566502 +0300	1506108860	leo_storage_handler_object:put/4	423	[{from,storage},{method,delete},{key,<<"body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz\n3817fb1c3269fc06779418276cefe0aa">>},{req_id,55429545},{cause,not_found}]
[E]	[email protected]	2017-09-23 03:31:13.906581 +0300	1506126673	leo_storage_handler_object:put/4	423	[{from,storage},{method,delete},{key,<<"body/3f/33/ca/3f33cae6299a61aa70f83885e820edea90ad59ba5dd742ee02a43ed54dce8c84ba9ea172a603f834ed1979e0cf2fc9b0e01cee0000000000.xz\nf2889495d9a623e56c18856a90f35dfd">>},{req_id,66506851},{cause,not_found}]

Object status:

[vm@bodies-master ~]$ leofs-adm whereis 'body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz\n3817fb1c3269fc06779418276cefe0aa'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | f317474ed5353027c02e4a6f9be8f7fa     |         0B |   d41d8cd98f | false          |              0 | 559cc4b9e58da  | 2017-09-22 22:34:20 +0300
  *    | [email protected]      | f317474ed5353027c02e4a6f9be8f7fa     |         0B |   d41d8cd98f | false          |              0 | 559cc4b9e58da  | 2017-09-22 22:34:20 +0300
       | [email protected]      | f317474ed5353027c02e4a6f9be8f7fa     |         0B |   d41d8cd98f | false          |              0 | 559cc4b983676  | 2017-09-22 22:34:20 +0300

[vm@bodies-master ~]$ leofs-adm whereis 'body/3f/33/ca/3f33cae6299a61aa70f83885e820edea90ad59ba5dd742ee02a43ed54dce8c84ba9ea172a603f834ed1979e0cf2fc9b0e01cee0000000000.xz\nf2889495d9a623e56c18856a90f35dfd'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | bc2a17ec483f7447523623dadf48d0fa     |         0B |   d41d8cd98f | false          |              0 | 559d07161f511  | 2017-09-23 03:31:13 +0300
       | [email protected]      | bc2a17ec483f7447523623dadf48d0fa     |         0B |   d41d8cd98f | false          |              0 | 559d0715a2dbc  | 2017-09-23 03:31:13 +0300
  *    | [email protected]      | bc2a17ec483f7447523623dadf48d0fa     |         0B |   d41d8cd98f | false          |              0 | 559d07161f511  | 2017-09-23 03:31:13 +0300

Log on bodies05:

[E]	[email protected]	2017-09-23 01:08:31.141043 +0300	1506118111	leo_storage_handler_object:put/4	423	[{from,storage},{method,delete},{key,<<"body/32/74/05/3274056a92a181acdcd5dfada2ee743a6bee6dc919ed9b5273c9ca7b5c1a1f9d3d5573fb11b82bea92d2703d004c03990014d40000000000.xz\nea161c4968feea289646687897c43564">>},{req_id,86068850},{cause,not_found}]
[E]	[email protected]	2017-09-23 01:13:20.691197 +0300	1506118400	leo_storage_handler_object:put/4	423	[{from,storage},{method,delete},{key,<<"body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8">>},{req_id,115093938},{cause,not_found}]

Objects status:

[vm@bodies-master ~]$ leofs-adm whereis 'body/32/74/05/3274056a92a181acdcd5dfada2ee743a6bee6dc919ed9b5273c9ca7b5c1a1f9d3d5573fb11b82bea92d2703d004c03990014d40000000000.xz\nea161c4968feea289646687897c43564'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | cf142e51b924ba5fb4165b5bb7387afe     |         0B |   d41d8cd98f | false          |              0 | 559ce72f74d6a  | 2017-09-23 01:08:30 +0300
  *    | [email protected]      | cf142e51b924ba5fb4165b5bb7387afe     |         0B |   d41d8cd98f | false          |              0 | 559ce72f74d6a  | 2017-09-23 01:08:30 +0300
       | [email protected]      | cf142e51b924ba5fb4165b5bb7387afe     |         0B |   d41d8cd98f | false          |              0 | 559ce72f0c762  | 2017-09-23 01:08:30 +0300

[vm@bodies-master ~]$ leofs-adm whereis 'body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
       | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce8438b58c  | 2017-09-23 01:13:19 +0300

The main part of these (multipart) objects is always fine:

[vm@bodies-master ~]$ leofs-adm whereis 'body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | [email protected]      | a1e97f692fa5954d89b76d350b6dffd5     |      7884K |   4a7cf0ab59 | false          |              2 | 559ce84419147  | 2017-09-23 01:13:20 +0300
       | [email protected]      | a1e97f692fa5954d89b76d350b6dffd5     |      7884K |   4a7cf0ab59 | false          |              2 | 559ce84419147  | 2017-09-23 01:13:20 +0300
       | [email protected]      | a1e97f692fa5954d89b76d350b6dffd5     |      7884K |   4a7cf0ab59 | false          |              2 | 559ce84419147  | 2017-09-23 01:13:20 +0300

(the same holds for all the others)

There are actually a lot more objects like that on all nodes. For some reason, all the examples I've checked have bodies02 as the primary node. There are lots of similar errors in the bodies02 log (it looks like every error on each of the other nodes has a corresponding error in the bodies02 log), so I grepped just the ones related to these objects. The rest of the errors (about other objects) look exactly the same. All the errors are about parts of multipart objects, by the way.

[W]	[email protected]	2017-09-22 22:34:20.566426 +0300	1506108860	leo_storage_read_repairer:compare/4	167	[{node,'[email protected]'},{addr_id,323123272100344001099548108964674205690},{key,<<"body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz\n3817fb1c3269fc06779418276cefe0aa">>},{clock,1506108860020342},{cause,not_found}]
[W]	[email protected]	2017-09-23 01:08:31.141937 +0300	1506118111	leo_storage_read_repairer:compare/4	167	[{node,'[email protected]'},{addr_id,275254980530270342147947643001814678270},{key,<<"body/32/74/05/3274056a92a181acdcd5dfada2ee743a6bee6dc919ed9b5273c9ca7b5c1a1f9d3d5573fb11b82bea92d2703d004c03990014d40000000000.xz\nea161c4968feea289646687897c43564">>},{clock,1506118110070626},{cause,not_found}]
[W]	[email protected]	2017-09-23 01:13:20.692804 +0300	1506118400	leo_storage_read_repairer:compare/4	167	[{node,'[email protected]'},{addr_id,49961723055780689973179696651582759558},{key,<<"body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8">>},{clock,1506118399997324},{cause,not_found}]
[W]	[email protected]	2017-09-23 03:31:13.908487 +0300	1506126673	leo_storage_read_repairer:compare/4	167	[{node,'[email protected]'},{addr_id,250113424891249516365044801522905960698},{key,<<"body/3f/33/ca/3f33cae6299a61aa70f83885e820edea90ad59ba5dd742ee02a43ed54dce8c84ba9ea172a603f834ed1979e0cf2fc9b0e01cee0000000000.xz\nf2889495d9a623e56c18856a90f35dfd">>},{clock,1506126673358268},{cause,not_found}]

There are no errors of any other kind on any node. There are occasional

[I]	[email protected]	2017-09-22 22:22:39.58070 +0300	1506108159	null:null	0	["alarm_handler",58,32,"{set,{system_memory_high_watermark,[]}}"]

messages on all nodes (the beam.smp never uses more than ~1.2-1.3 GB, though).

There are no hardware problems (disk/CPU/memory) on bodies02 or any other node; all the servers are identical as well. I can't vouch for the network hardware, though; problems there aren't too likely, but they're possible in theory, i.e. I can't rule out that bodies02 is connected to a different switch or that there is some other network-level difference compared to the other nodes. There are no errors from the kernel and no errors of any kind in "ethtool -S" on any node, at least.

All access goes through a single gateway right now. There are no errors or info messages on the gateway at all; however, the CPU watchdog triggers since it's running on a server with some CPU load:

[W]	[email protected]	2017-09-23 14:12:38.943948 +0300	1506165158	leo_watchdog_cpu:handle_call/2	224	[{triggered_watchdog,cpu_load},{load_avg_1m,6.4}]
[W]	[email protected]	2017-09-23 14:12:43.946443 +0300	1506165163	leo_watchdog_cpu:handle_call/2	224	[{triggered_watchdog,cpu_load},{load_avg_1m,5.97}]

I suppose I'll just disable it; I'm not sure whether it affects this or not.
The status of all nodes is fine and all the queues are empty.

EDIT: In case this might be useful, I executed diagnose-start on each node; here are the mentions of the first object above (8369fc...) in the logs of all nodes:

On bodies01:

127440176	292797815120921720132443653970268988165	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	2	13092	1506108860377918	2017-09-22 22:34:20 +0300	0
101014476	144613024842166612775662523989968230703	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	5255972	1506108860446322	2017-09-22 22:34:20 +0300	0

On bodies02:

130793378	323123272100344001099548108964674205690	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	0	1506108860020342	2017-09-22 22:34:20 +0300	0
130793676	323123272100344001099548108964674205690	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	0	1506108860422362	2017-09-22 22:34:20 +0300	1
91426957	292797815120921720132443653970268988165	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	2	13092	1506108860377918	2017-09-22 22:34:20 +0300	0
103831649	266566964550193539543455955030711265242	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	1	5242880	1506108860272834	2017-09-22 22:34:20 +0300	0

On bodies03:

87343807	323123272100344001099548108964674205690	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	0	1506108860020342	2017-09-22 22:34:20 +0300	0
111514107	144613024842166612775662523989968230703	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	5255972	1506108860446322	2017-09-22 22:34:20 +0300	0

On bodies04:

81859373	323123272100344001099548108964674205690	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	0	1506108860020342	2017-09-22 22:34:20 +0300	0
81859671	323123272100344001099548108964674205690	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	0	1506108860422362	2017-09-22 22:34:20 +0300	1

On bodies05:

103072807	266566964550193539543455955030711265242	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	1	5242880	1506108860272834	2017-09-22 22:34:20 +0300	0

On bodies06:

121201919	292797815120921720132443653970268988165	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	2	13092	1506108860377918	2017-09-22 22:34:20 +0300	0
113091666	266566964550193539543455955030711265242	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	1	5242880	1506108860272834	2017-09-22 22:34:20 +0300	0
74541902	144613024842166612775662523989968230703	body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz	0	5255972	1506108860446322	2017-09-22 22:34:20 +0300	0

The size of this object is 5255972 bytes:

[vm@bodies-master ~]$ leofs-adm whereis 'body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | [email protected]      | 6ccb749af0999648aafbb9487177192f     |      5133K |   f5c57976b1 | false          |              2 | 559cc4b9eb672  | 2017-09-22 22:34:20 +0300
       | [email protected]      | 6ccb749af0999648aafbb9487177192f     |      5133K |   f5c57976b1 | false          |              2 | 559cc4b9eb672  | 2017-09-22 22:34:20 +0300
       | [email protected]      | 6ccb749af0999648aafbb9487177192f     |      5133K |   f5c57976b1 | false          |              2 | 559cc4b9eb672  | 2017-09-22 22:34:20 +0300

So I'm supposed to get two parts, one of about 5 MB and another of around 13 KB (plus an object for the multipart header?), but there seem to be 5 extra ones, two of which are deleted?

vstax changed the title from "Inconsistent state of (multipart) objects on storages during upload" to "Errors about multipart object parts on storages during upload" on Sep 24, 2017
mocchira self-assigned this on Sep 26, 2017
mocchira added this to the 1.4.0 milestone on Sep 26, 2017
mocchira (Member) commented:

WIP

mocchira (Member) commented:

@vstax Thanks for reporting this problem.
It seems to be a kind of race condition that can happen in a distributed system adopting eventual consistency.
We fixed part of the problem in #722; however, it's not a perfect fix (the odds that inconsistencies can happen decrease, but are not zero).
Let me explain below how this can happen.

The sequence of a multipart upload (MU) on leo_gateway is as follows (a client-side boto3 sketch of the calls that drive these steps is shown after the list):

  1. Receive an MU initiate request and put a temporary object whose key is suffixed with the UploadId (in your case this temporary object can remain undeleted, which is why you see the inconsistencies through leofs-adm).
  2. Receive the MU part requests and put the part objects.
  3. Receive the MU complete request and remove the temporary object created in step 1.
  4. Check that all the parts were uploaded, calculate the checksum, and if there is no problem put the parent object for those parts (the object becomes available to clients at this moment).
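
For orientation, here is a minimal client-side boto3 sketch of the calls that drive this sequence. The endpoint, bucket, key, and file path are hypothetical; the comments map each call onto the gateway-side steps above:

```python
import boto3

# Hypothetical endpoint/bucket/key/file; only the calls themselves matter here.
s3 = boto3.client('s3', endpoint_url='http://leofs-gateway:8080')
bucket, key = 'body', 'example0000000000.xz'
part_size = 5 * 1024 * 1024       # 5 MB parts, matching the chunk sizes seen above

# Step 1: MU initiate -- the gateway puts the temporary "<key>\n<UploadId>" object
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']

parts, part_no = [], 1
with open('/data/example0000000000.xz', 'rb') as f:
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        # Step 2: part uploads -- the gateway puts the part objects
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_no, Body=chunk)
        parts.append({'PartNumber': part_no, 'ETag': resp['ETag']})
        part_no += 1

# Steps 3-4: MU complete -- the gateway removes the temporary object, checks the
# parts/checksum, and puts the parent object that clients then see
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={'Parts': parts})
```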

The point is that step 3 is guaranteed to happen after step 1 on leo_gateway; however, this order can be inverted chronologically on leo_storage (especially on the secondary/tertiary replicas), as those requests are processed asynchronously. This behavior causes

[E]	[email protected]	2017-09-22 22:34:20.566502 +0300	1506108860	leo_storage_handler_object:put/4	423	[{from,storage},{method,delete},{key,<<"body/83/69/fc/8369fcebed231a6246410cfa9a1436758770ad4e5acb00f30a94ebaa31ceb5b65719e2a3578bbdd2f1cce53fdafd6cad00d0fd0000000000.xz\n3817fb1c3269fc06779418276cefe0aa">>},{req_id,55429545},{cause,not_found}]

a not_found error to happen first, because step 1 had not yet been processed on leo_storage, and then results in

[vm@bodies-master ~]$ leofs-adm whereis 'body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8'
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+--------------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
       | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce8438b58c  | 2017-09-23 01:13:19 +0300

because step 1 reached leo_storage only after step 3 had been processed, so the temporary object suffixed with the UploadId has remained.
That's it.

As a permanent fix is difficult (it requires ensuring the causality between steps 1 and 3, which means we would need to implement some consensus algorithm), we are now considering a fix that decreases the odds by removing the temporary object only after confirming the checksum.

So I'm supposed to get two parts, one of about 5 MB and another of around 13 KB (plus an object for the multipart header?), but there seem to be 5 extra ones, two of which are deleted?

Yes, the 5 extra ones are:

  • 3 temporary objects
  • 2 tombstone objects for the deletes (this number should be 3 when everything works fine)

vstax (Contributor, Author) commented Sep 28, 2017

@mocchira Thank you for the analysis. I'm glad to know this won't affect the real data. A few questions:

  1. Do you think switching to PTP (microsecond-class time synchronization) on the storage node cluster instead of NTP would reduce the problem? PTP is a bit annoying to set up compared to just running NTP, but maybe it's worth it? Unfortunately, we probably can't run PTP between the storages and the gateways as they are in different datacenters (they aren't too far apart, RTT is less than a millisecond, but still).
  2. Let's suppose I ignore these errors. What is the easiest way to make the state between nodes consistent (i.e. mark that temporary object as deleted on bodies05 in this example)? Running
leofs-adm recover-file body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz
leofs-adm recover-file body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8

doesn't do anything.

mocchira (Member) commented:

@vstax

Do you think switching to PTP (microsecond-class time synchronization) on the storage node cluster instead of NTP would reduce the problem? PTP is a bit annoying to set up compared to just running NTP, but maybe it's worth it? Unfortunately, we probably can't run PTP between the storages and the gateways as they are in different datacenters (they aren't too far apart, RTT is less than a millisecond, but still).

Maybe (though IMO the cost outweighs the benefit).

we are now considering a fix that decreases the odds by removing the temporary object only after confirming the checksum.

Once the above fix has landed, the odds of seeing inconsistencies should decrease dramatically, because confirming the checksum takes N round trips (N = the number of chunks of the large object) between leo_gateway and the leo_storage nodes, so I'd recommend waiting for the fix rather than adopting PTP.

Let's suppose I ignore these errors. What is the easiest way to make the state between nodes consistent (i.e. mark that temporary object as deleted on bodies05 in this example)? Running

Hmm, recover-file should work even if the target is a temporary object, so I will vet this further.

mocchira (Member) commented Oct 6, 2017

@vstax As you may know, this has been fixed (more precisely, the odds of seeing inconsistencies have been decreased), so give it a try if you have time.

vstax (Contributor, Author) commented Oct 6, 2017

@mocchira Thank you, I will (I need to finish the recover-node experiments before wiping the data, so this will have to wait a bit). We also have PTP now, so it won't be exactly the same experiment, but eventually I will be uploading much more data, so it should get plenty of testing.

I've got a question about recover-file though: is it supposed to work, or am I doing it the wrong way? If needed, I can provide the results of get/head API calls directly, like in the other ticket.

mocchira (Member) commented Oct 6, 2017

@vstax

Thank you, I will (I need to finish the recover-node experiments before wiping the data, so this will have to wait a bit). We also have PTP now, so it won't be exactly the same experiment, but eventually I will be uploading much more data, so it should get plenty of testing.

Got it.

I've got a question about recover-file though: is it supposed to work, or am I doing it the wrong way? If needed, I can provide the results of get/head API calls directly, like in the other ticket.

recover-file should work against temporary objects.
How about the following (quoting the path argument, as it includes a meta character)?

leofs-adm recover-file "body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8"

vstax (Contributor, Author) commented Oct 6, 2017

@mocchira Nope, it doesn't work; nothing happens.
I somehow omitted the quoting in my comment, but I was actually using it before (passing exactly the same name as for the "whereis" command).

EDIT: it works just fine now with the latest patches.
