After recover-node, tons of "not found" errors are generated in logs #859

Closed
vstax opened this issue Sep 29, 2017 · 28 comments

@vstax
Contributor

vstax commented Sep 29, 2017

I was trying to perform a data loss / recovery experiment on a (soon to become) production cluster: 6 nodes, N=3, currently holding around 10M objects (~5M on each node). For the first experiment I simply executed "leofs-adm recover-node [email protected]" - every node was in perfect condition before that command, and I did nothing else but wait for the recovery to finish. There was no other load either.

Recovery performance was bound by the 1Gb network - i.e. I saw >950 Mb/s of incoming traffic on stor01 the whole time. There was no write IO on stor01 and low-to-moderate read IO on every other node. The queues on stor01 were empty, while "leo_per_object_queue" on all other nodes was slowly growing, then slowly draining. I expected the recovery to finish when all queues dropped to 0, and eventually that queue reached zero on all nodes but the last one (stor02). It took around 11 hours, which is expected (~3300 GB of data to recover).

[vm@bodies-master ~]$ leofs-adm du [email protected]
 active number of objects: 5529192
  total number of objects: 5642781
   active size of objects: 3537148259064
    total size of objects: 3537182125400
     ratio of active size: 100.0%
    last compaction start: 2017-09-24 19:17:31 +0300
      last compaction end: 2017-09-24 19:18:24 +0300

However, around the point when the last queue - stor02's "leo_per_object_queue" - was getting close to 0, something seriously broke. All nodes except stor01 started to show high CPU load again and to generate an insane amount of errors in their logs (8-9M lines per hour). All the errors look like this:

[E]	[email protected]	2017-09-29 22:08:06.575204 +0300	1506712086	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>},{cause,not_found}]
[E]	[email protected]	2017-09-29 22:08:06.575592 +0300	1506712086	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c">>},{cause,not_found}]
[E]	[email protected]	2017-09-29 22:08:06.575954 +0300	1506712086	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e464148">>},{cause,not_found}]

and on another node:

[E]	[email protected]	2017-09-29 21:10:17.109946 +0300	1506708617	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/16/16/001616ae075e99d7f2357029543de075be4f47cf5609ff28abd47a73407c8e733cd598fde3b8f1a3fd7b3eb4d597349320ddbf0000000000.xz\nb7dfc9f1da52d42e338921742b0909c1">>},{cause,not_found}]
[E]	[email protected]	2017-09-29 21:10:17.110392 +0300	1506708617	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/82/53/00825344618ee3940c757ebb5866950f11428adea97c907991f9d74b83485806b47222f1c070a2cf448c369f7fb75aba0040fa0000000000.xz\n6c3380c7d7a67dd96c389538eae590b4">>},{cause,not_found}]
[E]	[email protected]	2017-09-29 21:10:17.110737 +0300	1506708617	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/b5/82/00b582b09e6a375beff5f836495932b16f59c59c38bd4ceb983a5d9dad2002d641fb331248b2d40dff9a025ffce16af2b06e090100000000.xz\naff1e6d3640c6fce31f36e19b2db3eb9">>},{cause,not_found}]

That is, nodes stor02-stor06 are generating errors about parts of large objects. There aren't even that many large objects on the nodes - each object name is repeated 4-5K times per hour in these logs. The queues on stor01 are empty; all other nodes have a bunch of messages in "leo_per_object_queue". The queue state is "running" but the numbers aren't changing:

[vm@bodies-master ~]$ leofs-adm mq-stats [email protected]|grep leo_per_object_queue
 leo_per_object_queue           |   running   | 17797          | 1600           | 500            | recover inconsistent objs                   
[vm@bodies-master ~]$ leofs-adm mq-stats [email protected]|grep leo_per_object_queue
 leo_per_object_queue           |   running   | 13982          | 1600           | 500            | recover inconsistent objs                   

The problem didn't start at the same moment on every node: on stor06 it started at 20:40 and on stor02 at 22:08, for example. This could very well be around the time the queue on each node reached 0 for the first time.

There are no other errors on the storage nodes and no errors on the managers.
"whereis" output for some of these objects doesn't show anything strange. E.g. these 3 objects from stor02's log:

[vm@bodies-master ~]$ leofs-adm whereis "bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | faec83844b09ed275009713d6993dcfa     |         0B |   d41d8cd98f | false          |              0 | 559cdc01b5958  | 2017-09-23 00:18:29 +0300
  *    | [email protected]      | faec83844b09ed275009713d6993dcfa     |         0B |   d41d8cd98f | false          |              0 | 559cdc01b5958  | 2017-09-23 00:18:29 +0300
  *    | [email protected]      | faec83844b09ed275009713d6993dcfa     |         0B |   d41d8cd98f | false          |              0 | 559cdc01b5958  | 2017-09-23 00:18:29 +0300

[vm@bodies-master ~]$ leofs-adm whereis "bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 17bf224d05d09b881a3cea6bdc13c39c     |         0B |   d41d8cd98f | false          |              0 | 559cdc3bd3a9c  | 2017-09-23 00:19:30 +0300
  *    | [email protected]      | 17bf224d05d09b881a3cea6bdc13c39c     |         0B |   d41d8cd98f | false          |              0 | 559cdc3bd3a9c  | 2017-09-23 00:19:30 +0300
  *    | [email protected]      | 17bf224d05d09b881a3cea6bdc13c39c     |         0B |   d41d8cd98f | false          |              0 | 559cdc3bd3a9c  | 2017-09-23 00:19:30 +0300

[vm@bodies-master ~]$ leofs-adm whereis "bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e464148"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 908d42b7777bb4044e71236dd7188ff6     |         0B |   d41d8cd98f | false          |              0 | 559cdde56ad14  | 2017-09-23 00:26:56 +0300
  *    | [email protected]      | 908d42b7777bb4044e71236dd7188ff6     |         0B |   d41d8cd98f | false          |              0 | 559cdde56ad14  | 2017-09-23 00:26:56 +0300
  *    | [email protected]      | 908d42b7777bb4044e71236dd7188ff6     |         0B |   d41d8cd98f | false          |              0 | 559cdde56ad14  | 2017-09-23 00:26:56 +0300

I had to stop all storage nodes except stor01 for now, as each of them was generating 3GB of logs per hour (which is OK for the moment since the cluster isn't under real load yet).

@vstax
Contributor Author

vstax commented Oct 4, 2017

Maybe this will be helpful: a grep for (all parts of) these three objects

bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984
bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c
bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414

in the diagnosis logs on each node that I took a few days ago, before this problem started (the objects are older than that anyway):
stor01:

/mnt/avs1/bodies/log/leo_object_storage_61.20170924.16.1:501605509	339557497538056389764364852147920426947	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	6387568	1506115109746005	2017-09-23 00:18:29 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.16.1:475512242	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616397447	2017-09-23 00:26:56 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.16.1:475512540	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616943380	2017-09-23 00:26:56 +0300	1
/mnt/avs2/bodies/log/leo_object_storage_10024.20170924.16.1:500634062	137701108157391764395984471119953336430	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	2	1083496	1506115170639999	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.16.1:501314494	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170200781	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.16.1:501314792	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170679452	2017-09-23 00:19:30 +0300	1
/mnt/avs2/bodies/log/leo_object_storage_10017.20170924.16.1:552883145	65704074110356625176109288712339985316	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	2	612808	1506115616854880	2017-09-23 00:26:56 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20016.20170924.17.1:493080779	422488725232705560218867916672879673	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	1	5242880	1506115170455574	2017-09-23 00:19:30 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:541508463	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109271868	2017-09-23 00:18:29 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:541508761	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109738840	2017-09-23 00:18:29 +0300	1
/mnt/avs4/bodies/log/leo_object_storage_30059.20170924.19.1:511609231	205001574379103612268490633464020270328	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	2	1144688	1506115109706369	2017-09-23 00:18:29 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30033.20170924.18.1:507144434	163197871683914518314889057090751548844	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	5855688	1506115616965780	2017-09-23 00:26:56 +0300	0

stor02:

/mnt/avs1/bodies/log/leo_object_storage_61.20170924.16.1:554137048	339557497538056389764364852147920426947	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	6387568	1506115109746005	2017-09-23 00:18:29 +0300	0
/mnt/avs1/bodies/log/leo_object_storage_7.20170924.15.1:500838327	174594718691070219487381420503214074450	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	1	5242880	1506115109538325	2017-09-23 00:18:29 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.17.1:469571644	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616397447	2017-09-23 00:26:56 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.17.1:469571942	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616943380	2017-09-23 00:26:56 +0300	1
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.17.1:482700393	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170200781	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.17.1:483007176	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170679452	2017-09-23 00:19:30 +0300	1
/mnt/avs2/bodies/log/leo_object_storage_10024.20170924.17.1:503000144	137701108157391764395984471119953336430	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	2	1083496	1506115170639999	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10017.20170924.16.1:513196885	65704074110356625176109288712339985316	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	2	612808	1506115616854880	2017-09-23 00:26:56 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:522206954	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109271868	2017-09-23 00:18:29 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:522207252	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109738840	2017-09-23 00:18:29 +0300	1
/mnt/avs3/bodies/log/leo_object_storage_20016.20170924.17.1:474669209	422488725232705560218867916672879673	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	1	5242880	1506115170455574	2017-09-23 00:19:30 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30059.20170924.19.1:561084056	205001574379103612268490633464020270328	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	2	1144688	1506115109706369	2017-09-23 00:18:29 +0300	0

stor03:

/mnt/avs2/bodies/log/leo_object_storage_10050.20170924.17.1:408397380	136206740878441485192241887097818729671	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	6326376	1506115170686849	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10017.20170924.16.1:493137320	65704074110356625176109288712339985316	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	2	612808	1506115616854880	2017-09-23 00:26:56 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30033.20170924.18.1:497087959	163197871683914518314889057090751548844	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	5855688	1506115616965780	2017-09-23 00:26:56 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30011.20170924.18.1:406934592	29002966790248415467176985531292634474	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	1	5242880	1506115616703767	2017-09-23 00:26:56 +0300	0

stor04:

/mnt/avs1/bodies/log/leo_object_storage_61.20170924.16.1:495666511	339557497538056389764364852147920426947	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	6387568	1506115109746005	2017-09-23 00:18:29 +0300	0
/mnt/avs1/bodies/log/leo_object_storage_7.20170924.15.1:486632130	174594718691070219487381420503214074450	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	1	5242880	1506115109538325	2017-09-23 00:18:29 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10050.20170924.17.1:550762494	136206740878441485192241887097818729671	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	6326376	1506115170686849	2017-09-23 00:19:30 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30059.20170924.19.1:496207614	205001574379103612268490633464020270328	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	2	1144688	1506115109706369	2017-09-23 00:18:29 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30011.20170924.18.1:505647543	29002966790248415467176985531292634474	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	1	5242880	1506115616703767	2017-09-23 00:26:56 +0300	0

stor05:

/mnt/avs1/bodies/log/leo_object_storage_7.20170924.15.1:545448031	174594718691070219487381420503214074450	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	1	5242880	1506115109538325	2017-09-23 00:18:29 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.17.1:533281330	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616397447	2017-09-23 00:26:56 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10031.20170924.17.1:533281628	192142298424847141664351670159625719798	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	0	1506115616943380	2017-09-23 00:26:56 +0300	1
/mnt/avs2/bodies/log/leo_object_storage_10024.20170924.17.1:544691227	137701108157391764395984471119953336430	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	2	1083496	1506115170639999	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.17.1:524918353	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170200781	2017-09-23 00:19:30 +0300	0
/mnt/avs2/bodies/log/leo_object_storage_10025.20170924.17.1:524918651	31564668307327871006024546818790245276	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	0	1506115170679452	2017-09-23 00:19:30 +0300	1
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:576529518	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109271868	2017-09-23 00:18:29 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20000.20170924.17.1:576529816	333535048481842093793027197745423965434	bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz	0	0	1506115109738840	2017-09-23 00:18:29 +0300	1

stor06:

/mnt/avs2/bodies/log/leo_object_storage_10050.20170924.17.1:430583800	136206740878441485192241887097818729671	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	0	6326376	1506115170686849	2017-09-23 00:19:30 +0300	0
/mnt/avs3/bodies/log/leo_object_storage_20016.20170924.17.1:428529494	422488725232705560218867916672879673	bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz	1	5242880	1506115170455574	2017-09-23 00:19:30 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30033.20170924.18.1:489311203	163197871683914518314889057090751548844	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	0	5855688	1506115616965780	2017-09-23 00:26:56 +0300	0
/mnt/avs4/bodies/log/leo_object_storage_30011.20170924.18.1:465686320	29002966790248415467176985531292634474	bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz	1	5242880	1506115616703767	2017-09-23 00:26:56 +0300	0

@mocchira
Member

mocchira commented Oct 5, 2017

WIP

@mocchira
Member

mocchira commented Oct 5, 2017

@vstax Just in case, can you share the result of leofs-adm dump-ring ${every_node} so we can check whether there are any inconsistencies in the RING between nodes?

@mocchira
Member

mocchira commented Oct 5, 2017

@vstax Still not reproduced on our env yet.
From my understanding, the only way this could happen is if there are some differences between the RING on each node.
I think the following sequence happened repeatedly on your env.

My assumption above will be verified by the result of dump-ring.
If my assumption is correct, then the root cause is why the differences between the RING on each node have not been corrected (if no split brain happened); we will vet further in this direction.

@vstax
Contributor Author

vstax commented Oct 5, 2017

@mocchira Thank you for looking into this.

The files ring_cur.dump and ring_prv.dump are identical on the master, stor01 and the gateway used to load these objects; I had to start the other nodes, and a new RING is pushed to them on start, I think? Either way, after I started them and made the dumps, they all came out identical as well. The files ring_cur and ring_prv are identical too (as are the corresponding log files - identical for cur/prv and between all nodes). The full list of timestamps in the RING is:

1505498458378826
1505745681705743
1505745941278270
1505746206399864
1505746473731856
1505746601120237

The first one is 'Fri Sep 15 21:00:58 2017' here and the rest are from 'Mon Sep 18 17:41:21 2017' to 'Mon Sep 18 17:56:41 2017'. The first date is when stor01 was first started (the cluster wasn't initialized yet); the other dates are when stor02 to stor06 were first started. The "start" command to start the cluster happened at TS 1505746641881341 (2017-09-18 17:57:21.881193). Since then no rebalance command has happened - the cluster has lived with these 6 nodes and that's it. Apparently the RING never changed since its initial creation. Either way, each date is way before these objects were uploaded.
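
(For reference, these RING timestamps look like microseconds since the Unix epoch - the first one matches the Sep 15 date above. A quick way to convert one on remote_console, assuming exactly that interpretation:)

%% 719528 * 86400 is the offset in seconds from year 0 (used by the calendar
%% module) to the Unix epoch.
Ts = 1505498458378826.
calendar:gregorian_seconds_to_datetime(Ts div 1000000 + 719528 * 86400).
%% => {{2017,9,15},{18,0,58}} UTC, i.e. Fri Sep 15 21:00:58 2017 MSK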

On Sep 19 and 20, one more node (during an experiment with multiple nodes) was connected to the manager, but I never attached it - it was connected for some time, then disconnected and gone, and the RING wasn't recalculated. Its name was [email protected] (it was running on the same stor01, with different queue/log/avs directories and rpc.server.listen_port, of course). There are no traces of that in the manager logs.

Split brain - you mean during the upload of these objects? Well, there were no errors on the gateway or the client, and W=2. There were no known network problems and nothing in the logs apart from the errors listed in #845.

Is it possible to identify the node that was the source of the trouble (if any), given how it all happened? I.e. recover-node was running for stor01, each node seems to have reached 0 messages in the queue before the problem started, and then the numbers in that queue became non-zero again, accompanied by high load.

Would a repeat of the experiment be of any use? (I.e. now that we are sure the RING is identical: stop stor02-06, delete the contents of that queue, start them, run recover-node again and wait.)

@vstax
Contributor Author

vstax commented Oct 5, 2017

Contents of members_cur.dump with all nodes running:

{member,'[email protected]',"node_9143ed04",
        "stor01.selectel.cloud.lan",13077,ipv4,1505498458378826,
        running,168,[],[]}.
{member,'[email protected]',"node_f94badb4",
        "stor02.selectel.cloud.lan",13077,ipv4,1505745681705743,
        running,168,[],[]}.
{member,'[email protected]',"node_11bd0a28",
        "stor03.selectel.cloud.lan",13077,ipv4,1505745941278270,
        running,168,[],[]}.
{member,'[email protected]',"node_1edfd78e",
        "stor04.selectel.cloud.lan",13077,ipv4,1505746206399864,
        running,168,[],[]}.
{member,'[email protected]',"node_e01ba289",
        "stor05.selectel.cloud.lan",13077,ipv4,1505746473731856,
        running,168,[],[]}.
{member,'[email protected]',"node_42845b8a",
        "stor06.selectel.cloud.lan",13077,ipv4,1505746601120237,
        running,168,[],[]}.

RING dump attached as well.
ring_cur.dump.63674418994.txt

Assuming that leo_object_storage_api:head or some other method actually returns wrong data on one of the nodes, is it possible to check it directly with remote_console? (that is, what other methods to call first to get these AddrId and Key arguments, knowing object and node name?)

@mocchira
Member

mocchira commented Oct 6, 2017

@vstax Thanks for sharing. It seems either that no inconsistencies happened at all, or that inconsistencies were already recovered at some point; however, in the latter scenario you would see something in the log files notifying you that LeoFS detected inconsistencies and force-updated the RING, so now I think no inconsistencies happened. We are now vetting further in this direction (how it happened without RING inconsistencies).

@mocchira
Member

mocchira commented Oct 6, 2017

@vstax

Assuming that leo_object_storage_api:head or some other method actually returns wrong data on one of the nodes, is it possible to check it directly with remote_console? (that is, what other methods to call first to get these AddrId and Key arguments, knowing object and node name?)

Yes. Assuming we have an object leofs-adm/CHANGELOG.md like below,

leofs@cat2neat:leofs.1.3.5$ ./leofs-adm whereis leofs-adm/CHANGELOG.md
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | [email protected]      | 6ae5ce88432a88e6ea2610059db0de9      |       204K |   16c159389a | false          |              0 | 55ad92a35b816  | 2017-10-06 13:20:46 +0900
       | [email protected]      | 6ae5ce88432a88e6ea2610059db0de9      |       204K |   16c159389a | false          |              0 | 55ad92a35b816  | 2017-10-06 13:20:46 +0900

then

%% get an AddrId from a Key through leo_redundant_manager_api:get_redundancies_by_key
(storage_3@127.0.0.1)2> leo_redundant_manager_api:get_redundancies_by_key(<<"leofs-adm/CHANGELOG.md">>).
{ok,{redundancies,8880712031625058080169575614374415849,
                  8713827541981604648542427598869967970,
                  8909797327562015671913302618427450100,[],[],
                  [{redundant_node,'[email protected]',true,true,'L'},
                   {redundant_node,'[email protected]',true,true,'FL'}],
                  2,1,1,1,0,0,958648327}}
%% the first element of the record redundancies stores AddrId
(storage_3@127.0.0.1)4> {ok, Ret} = leo_object_storage_api:head({8880712031625058080169575614374415849, <<"leofs-adm/CHANGELOG.md">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,22,108,101,111,102,115,45,...>>}
%% As leo_object_storage_api:head returns a raw binary, it needs to be deserialized to be readable for us.
(storage_3@127.0.0.1)5> binary_to_term(Ret).
%% Now we can see the metadata for leofs-adm/CHANGELOG.md 
{metadata_3,<<"leofs-adm/CHANGELOG.md">>,
            8880712031625058080169575614374415849,22,209043,
            <<131,108,0,0,0,1,104,2,109,0,0,0,22,120,45,97,109,122,45,
              109,101,116,97,...>>,
            181,0,0,0,5243396,1507263646119958,63674482846,
            30246938820081418282392782782160660966,958648327,undefined,
            0,0,0,0,0,0}
%% then let's try to get it through leo_storage_handler_object:get with the make_ref built-in function
(storage_3@127.0.0.1)7> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"leofs-adm/CHANGELOG.md">>}).
%% Now we can see the metadata along with its Body
{ok,#Ref<0.2250545746.352059393.107826>,
    {metadata_3,<<"leofs-adm/CHANGELOG.md">>,
                8880712031625058080169575614374415849,22,209043,
                <<131,108,0,0,0,1,104,2,109,0,0,0,22,120,45,97,109,122,45,
                  109,...>>,
                181,0,0,0,5243396,1507263646119958,63674482846,
                30246938820081418282392782782160660966,958648327,undefined,
                0,0,0,0,0,0},
    <<"# CHANGELOG\n## 1.3.7 (Sep 12, 2017)\n\n### Fixed Bugs\n\n* [#592](https://github.com/leo-project/leofs/i"...>>}

Let me know if you have any questions.

@mocchira
Member

mocchira commented Oct 6, 2017

@vstax Let me ask you this:

bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984
bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c
bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414

Could you try leo_object_storage_api:head and leo_storage_handler_object:get on remote_console against the above objects that caused leo_storage(s) to dump logs massively, and share the result?

Now we can only say this:

So if you get an inconsistent result (one says deleted, the other says present) between leo_object_storage_api:head and leo_storage_handler_object:get, then I'd not recommend repeating the same experiment, as it will probably reach the same end. If not, then things might go well.
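
(For reference, the whole check can be wrapped into a small helper fun on remote_console. This is only a sketch that chains the calls already shown above - the name Check is arbitrary; the AddrId is the first field of the redundancies record, i.e. the 2nd element of the tuple returned by get_redundancies_by_key:)

%% Call Check(Key) with one of the full keys from the error log
%% (as a binary, including the "\n" part).
Check = fun(Key) ->
            {ok, Red} = leo_redundant_manager_api:get_redundancies_by_key(Key),
            AddrId = element(2, Red),   %% first field of the redundancies record
            Head   = leo_object_storage_api:head({AddrId, Key}),
            Get    = leo_storage_handler_object:get({make_ref(), Key}),
            {Head, Get}
        end.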

@vstax
Contributor Author

vstax commented Oct 6, 2017

@mocchira Well, here is some output, but it doesn't work as well as it did for you... From stor01:

[root@stor01 ~]# /usr/local/leofs/current/leo_storage/bin/leo_storage remote_console
NAME_ARG: -name [email protected]
Erlang/OTP 20 [erts-9.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.0  (abort with ^G)
([email protected])1> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>).
{ok,{redundancies,333535048481842093793027197745423965434,
                  333336867892509251216686562825411144350,
                  333651228914897360571389480935392916771,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])2> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])3> binary_to_term(Ret).
{metadata_3,<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5"...>>,
            333535048481842093793027197745423965434,162,0,<<>>,0,0,0,0,
            541508761,1506115109738840,63673334309,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}
([email protected])4> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.3093727938.2882535425.4860>,not_found}
([email protected])5> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c">>).       {ok,{redundancies,31564668307327871006024546818790245276,
                  30134353396075751234097884368929358631,
                  32497452806801139283842932177160698277,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])8> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])10> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.3093727938.2882535425.32165>,not_found}
([email protected])12> leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{error,#Ref<0.3093727938.2882535425.40273>,not_found}
([email protected])15> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414">>).       {ok,{redundancies,4529297360425744296764133758171795573,
                  4279128009316051424741391834839533365,
                  4593245109081467175416293916813886545,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])18> {ok, Ret3} = leo_object_storage_api:head({4529297360425744296764133758171795573, <<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414">>}).  
** exception error: no match of right hand side value not_found
([email protected])19> leo_object_storage_api:head({4529297360425744296764133758171795573, <<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414">>}).             
not_found

From stor02:

[vm@stor02 ~]$ /usr/local/leofs/current/leo_storage/bin/leo_storage remote_console
Erlang/OTP 20 [erts-9.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.0  (abort with ^G)
([email protected])1> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>).
{ok,{redundancies,333535048481842093793027197745423965434,
                  333336867892509251216686562825411144350,
                  333651228914897360571389480935392916771,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])2> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])3> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.3815312026.4219207682.241394>,not_found}
([email protected])4> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c">>).
{ok,{redundancies,31564668307327871006024546818790245276,
                  30134353396075751234097884368929358631,
                  32497452806801139283842932177160698277,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])5> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])6> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.3815312026.4225761281.201527>,not_found}

From stor05:

[vm@stor05 ~]$ /usr/local/leofs/current/leo_storage/bin/leo_storage remote_console
Erlang/OTP 20 [erts-9.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.0  (abort with ^G)
([email protected])1> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>).
{ok,{redundancies,333535048481842093793027197745423965434,
                  333336867892509251216686562825411144350,
                  333651228914897360571389480935392916771,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])2> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])3> binary_to_term(Ret).
{metadata_3,<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5"...>>,
            333535048481842093793027197745423965434,162,0,<<>>,0,0,0,0,
            576529816,1506115109738840,63673334309,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}
([email protected])4> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.412206262.3164078081.130865>,not_found}
([email protected])5> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/c2/47/00c24766bd56e4fff68f8de1773231802ebcd30557d2afadb981a3ba0a723ee2bcba5a5267c4e0d444d45f516358f4cfe01cee0000000000.xz\nde4cfe14c9e4ba8bfd003b21c198297c">>).
{ok,{redundancies,31564668307327871006024546818790245276,
                  30134353396075751234097884368929358631,
                  32497452806801139283842932177160698277,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])6> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])7> {ok, _, Meta, Body} = leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
** exception error: no match of right hand side value {error,#Ref<0.412206262.3165388801.85123>,not_found}
([email protected])8> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414">>).
{ok,{redundancies,4529297360425744296764133758171795573,
                  4279128009316051424741391834839533365,
                  4593245109081467175416293916813886545,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'},
                   {redundant_node,'[email protected]',
                                   false,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])9> leo_object_storage_api:head({4529297360425744296764133758171795573, <<"bod1/01/5c/8e/015c8e7ff954828859b63a3bb5d0d3cb1786ec10e8dd920e10f2bd03e9aed013881c1e037da63679f22e73969f567141005e0b0100000000.xz\nce2c2e91bbee81237302fb6b3e46414">>}).             
not_found

Is leo_storage_handler_object:get() supposed to work with these remains of multipart uploads anyway? It works for me for real objects but not for these parts with "\n".

@mocchira
Member

mocchira commented Oct 6, 2017

@vstax

** exception error: no match of right hand side value {error,#Ref<0.3093727938.2882535425.32165>,not_found}

Variables in Erlang are immutable, so you can't reuse the same variable to store a different value; for instance, you have to use {ok, _, Meta2, Body2} = leo_storage_handler_object:get for the second try. There is also a command f() to clear all bound variables, so the example below also works if you want to store a value using the same variable name.

Eshell V9.0  (abort with ^G)
2> B = 1.
1
3> B = 2.
** exception error: no match of right hand side value 2
4> f().
ok
5> B = 2.
2
6>

@vstax
Contributor Author

vstax commented Oct 6, 2017

@mocchira Yeah, I tried that as well (I removed a few lines from the output as the result was the same).
You can see me making calls without saving the result, and the answer was still "not found".
Here is one thing I missed: "binary_to_term" for the head result on stor02:

([email protected])1> {ok, Ret} = leo_object_storage_api:head({333535048481842093793027197745423965434, <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])2> binary_to_term(Ret).
{metadata_3,<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5"...>>,
            333535048481842093793027197745423965434,162,0,<<>>,0,0,0,0,
            522207252,1506115109738840,63673334309,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}
([email protected])3> leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{error,#Ref<0.2544486004.3686793218.99269>,not_found}

@mocchira
Member

mocchira commented Oct 6, 2017

@vstax

Is leo_storage_handler_object:get() supposed to work with these remains of multipart uploads anyway? It works for me for real objects but not for these parts with "\n".

Yes.

And the result you pasted looks fine now.

{metadata_3,<<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5"...>>,
            333535048481842093793027197745423965434,162,0,<<>>,0,0,0,0,
            522207252,1506115109738840,63673334309,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}

The last field is the delete flag, and here it is set to 1 (meaning true), and

([email protected])3> leo_storage_handler_object:get({make_ref(), <<"bod1/00/ac/66/00ac66deef727a4538c57e1415d1f24a006c614c8f66e185561cc0d0580f27a61af8c36d5dc9d192876dbfa1192ee5be8ae6610000000000.xz\nb5c590202c7fd0878b4854c063bfb984">>}).
{error,#Ref<0.2544486004.3686793218.99269>,not_found}

leo_storage_handler_object:get returns not_found, so the two results are now consistent, and I believe the same error never happens for objects in such a consistent state.
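
(For reference, you can check that flag on remote_console without counting fields by hand - a minimal sketch, relying only on the fact noted above that the delete flag is the last field of the metadata record:)

%% Meta is the term obtained from binary_to_term(Ret) above;
%% its last element is the delete flag: 1 = deleted, 0 = present.
Meta = binary_to_term(Ret).
element(tuple_size(Meta), Meta).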

Would a repeat of the experiment be of any use? (I.e. now that we are sure the RING is identical: stop stor02-06, delete the contents of that queue, start them, run recover-node again and wait.)

Now repeating the experiment would be useful to me.

@vstax
Contributor Author

vstax commented Oct 6, 2017

@mocchira
I did recover-node and the same problem happened again. This time I was watching more closely how it starts, and the queue on the first node where it happened had not reached 0 at that moment:

Fri Oct  6 20:21:42 MSK 2017
 leo_per_object_queue           |   running   | 16629          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:21:48 MSK 2017
 leo_per_object_queue           |   running   | 16456          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:21:53 MSK 2017
 leo_per_object_queue           |   running   | 16456          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:21:58 MSK 2017
 leo_per_object_queue           |   running   | 16368          | 1600           | 500            | recover inconsistent objs

[...skipped...]

Fri Oct  6 20:24:34 MSK 2017
 leo_per_object_queue           |   running   | 16368          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:24:39 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:24:45 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:24:50 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:24:55 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:25:00 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:25:05 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:25:10 MSK 2017
 leo_per_object_queue           |   running   | 15296          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:25:15 MSK 2017
 leo_per_object_queue           |   running   | 15056          | 1600           | 500            | recover inconsistent objs
Fri Oct  6 20:25:20 MSK 2017
 leo_per_object_queue           |   running   | 15028          | 1600           | 500            | recover inconsistent objs

Errors start to spam the log just like that:

[E]	[email protected]	2017-10-06 20:24:42.812784 +0300	1507310682	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>},{cause,not_found}]
[E]	[email protected]	2017-10-06 20:24:42.813149 +0300	1507310682	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/00/29/3a/00293a9bff33a1a3633762d411310ec0af7a9fb3baa7856d5a70fee2dbfc709b9c2e4f8d01ccf136d608cbf5d6216bcc60637a0000000000.xz\ncd2ee77154853ba0eef97e82e3945d4a">>},{cause,not_found}]
[E]	[email protected]	2017-10-06 20:24:42.813464 +0300	1507310682	leo_storage_handler_object:get/1	89	[{from,storage},{method,get},{key,<<"bod1/02/ff/4c/02ff4c3d138e9d3725099ff18bd4f6de8b691a477572ae2625c714869c9d1342aa3b79709e33ef0b7df52db5598c6fa6c034980000000000.xz\n8ff29d1ff7476b29c94443351818689f">>},{cause,not_found}]

but the queue was still shrinking at that point. Also, these errors are exactly the same as last time, i.e. the names and order of objects in the log match the previous attempt.

A few minutes later, the queue gets to

 leo_per_object_queue           |   running   | 12609          | 1600           | 500            | recover inconsistent objs

and stops shrinking. The numbers in the other queues have settled at exactly the same values as last time (e.g. on stor05 it's 13982).

Let me share some thoughts about the problem - maybe you'll notice something that will give you a hint about what can be wrong here.
First, the differences between the cluster with the problem and the cluster where recover-node works: this new (problematic) cluster uses multiple AVS directories, which I hadn't tried before. Another difference is that I'm using the "develop" version of leo_object_storage with the fixes leo-project/leo_object_storage@2425da2 and leo-project/leo_object_storage@d89d6d2 - but they seem to be unrelated. Another is that the new cluster has N=3, but that shouldn't really affect anything? Also, I'm using DNS names.. no way that makes any difference, right?

Also, there are exactly 1600 distinct objects in the error logs (each is a multipart-upload temporary object); each object name was repeated 16-18K times during the few hours the node was working like this. The exact objects are somewhat different on each node, though. 1600 here is obviously the MQ batch size.

The fact that all objects in the logs have either the "bod1" or "bod2" prefix is just because the queue never moved past the first batch; looking at the queues directly, I can see objects from other buckets like "body", "body9" or "body12". I was uploading objects to various buckets in several sessions - e.g. to bucket "body" on Sep 22 and to bucket "body12" on Sep 28, with others in between. All of them are affected by this problem.

In fact, looking at the raw queues directly, I can see ~500-800K strings with object names, but considering that each object is mentioned 3-10 times, the real amount is probably closer to 70K per node, which roughly matches the total number of these multipart headers: ~11K per bucket, 13 buckets (~140K total, or ~70K per node). I have no idea why each node only shows 12-18K messages in the queue, though (I've tried crashing the node to make it count the messages again, but it was the same number). Anyhow, looking at the raw queue datafiles, it seems like every multipart object that I've uploaded is in there and no other objects at all. I also manually checked around 10 large and normal objects, and every large object is mentioned in these queues while no normal object is present there. (Unfortunately, I still don't know how to open this DB properly, as the LevelDB tools that I've tried don't like the format.)
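
(In case it's useful, a rough way to peek at that queue from a plain Erlang shell instead of external LevelDB tools - only a sketch, under the assumptions that the node is stopped (LevelDB holds an exclusive lock), that the queue directory really is the LevelDB used by leo_mq, and that eleveldb from the leo_storage release is on the code path; the paths below are placeholders:)

%% e.g. erl -pa /usr/local/leofs/current/leo_storage/lib/eleveldb-*/ebin   (assumed layout)
{ok, DB} = eleveldb:open("/path/to/leo_per_object_queue", []).
%% Print each raw key (object names show up inside the binaries) and count them.
eleveldb:fold_keys(DB,
                   fun(Key, N) ->
                       io:format("~p~n", [Key]),
                       N + 1
                   end,
                   0, []).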

Objects in at least half of these buckets were uploaded through a single gateway, so the LB + multiple gateways setup isn't a factor either. Still, for every upload, no matter how it was done, the multipart object header seems to cause problems during recover-node. I think that if I upload more big objects they will show this problem as well, but it's hard to prove with the queues stuck on the first batch.
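
If it ever helps to reproduce this from scratch, something along these lines should create more of those temporary multipart headers (the file name, bucket and chunk size are just examples, not what I actually used):

$ dd if=/dev/urandom of=/tmp/big-object.bin bs=1M count=64
$ s3cmd put --multipart-chunk-size-mb=15 /tmp/big-object.bin s3://body/test/big-object.bin
$ leofs-adm recover-node [email protected]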

By the way, how come the queue wasn't throttling / reducing the batch processing size? When diagnosing other problems before, I've seen that happen, but this time the same batch of 1600 messages is processed again and again without pause. Unrelated to the problem itself, though.

I checked head() output for the first object from this log and it's deleted on all 3 nodes:

([email protected])5> leo_redundant_manager_api:get_redundancies_by_key(<<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>).
{ok,{redundancies,134875347933076857520658934770079078693,
                  134313893413423072045144857445749942809,
                  135443863945767657054348638898835091489,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}
([email protected])7> {ok, Ret} = leo_object_storage_api:head({134875347933076857520658934770079078693, <<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])8> binary_to_term(Ret).
{metadata_3,<<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f773786"...>>,
            134875347933076857520658934770079078693,162,0,<<>>,0,0,0,0,
            431888618,1506114817134481,63673334017,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}


([email protected])3> {ok, Ret} = leo_object_storage_api:head({134875347933076857520658934770079078693, <<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])4> binary_to_term(Ret).
{metadata_3,<<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f773786"...>>,
            134875347933076857520658934770079078693,162,0,<<>>,0,0,0,0,
            487002745,1506114817134481,63673334017,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}


([email protected])1> {ok, Ret} = leo_object_storage_api:head({134875347933076857520658934770079078693, <<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>}).
{ok,<<131,104,22,100,0,10,109,101,116,97,100,97,116,97,
      95,51,109,0,0,0,162,98,111,100,49,47,48,...>>}
([email protected])2> binary_to_term(Ret).
{metadata_3,<<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f773786"...>>,
            134875347933076857520658934770079078693,162,0,<<>>,0,0,0,0,
            518346021,1506114817134481,63673334017,
            281949768489412648962353822266799178366,2263375396,
            undefined,0,0,0,0,0,1}

So, what else can be done to get more information? Add some logging? Try recover-node on a different node, like stor02?

@vstax
Contributor Author

vstax commented Oct 6, 2017

@mocchira
I was trying to manually go through the stack of functions that you mentioned, and I got here. I think this isn't supposed to work.. but it does?

([email protected])42> leo_storage_handler_object:replicate([list_to_atom("[email protected]")], 134875347933076857520658934770079078693, <<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>).
ok
([email protected])43> leo_storage_handler_object:replicate([list_to_atom("[email protected]")], 134875347933076857520658934770079078693, <<"bod1/00/27/22/002722c8b4595cd3008507b15bf41d36f5045a58ea74d619aa5e541968992b0bb21b0f5b7d289232f62eb38f7737869fa8c8bc0000000000.xz\nbf95a664b68a1aa7c5ba2fd29b7d2a6b">>).
ok

@mocchira
Member

@vstax Thanks for further digging. I may have found the root cause; however, it is difficult to find time for it today, so I will work on it tomorrow and ask you to try the patch once I send the PR.

mocchira added a commit to mocchira/leofs that referenced this issue Oct 11, 2017
@mocchira
Member

@vstax #876 should fix the recover-node problem (and also the recover-file problem), so please give it a try.

@vstax
Contributor Author

vstax commented Oct 11, 2017

@mocchira Thank you, it (almost) works. The queue gets processed; however, the "not_found" errors still appear in the logs while the queue is being processed. I think that since everything is actually all right (there is nothing wrong with encountering temporary multipart headers during recover-node), there should be no errors.

Recover-file still doesn't work, nothing happens. I tried enabling debug logs, nothing in them. Queues are empty.

Also, after I launched the node - stor02 here - to process the old queue (that was stuck before) with this patch, this happened:

[E]	[email protected]	2017-10-11 18:19:16.201291 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

[398 same "badarg" lines skipped]

[E]	[email protected]	2017-10-11 18:19:16.247038 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[W]	[email protected]	2017-10-11 18:19:16.331064 +0300	1507735156	leo_storage_mq:correct_redundancies/1	736	[{key,<<"hcheck/empty">>},{cause,not_found}]
[E]	[email protected]	2017-10-11 18:19:16.331203 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[E]	[email protected]	2017-10-11 18:19:16.331316 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

[1597 same "badarg" lines skipped]

[E]	[email protected]	2017-10-11 18:19:16.829376 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[W]	[email protected]	2017-10-11 18:19:16.928684 +0300	1507735156	leo_storage_mq:correct_redundancies/1	736	[{key,<<"hcheck/empty">>},{cause,not_found}]
[E]	[email protected]	2017-10-11 18:19:16.928825 +0300	1507735156	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

[1597 same "badarg" lines skipped]

[E]	[email protected]	2017-10-11 18:19:17.423433 +0300	1507735157	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[W]	[email protected]	2017-10-11 18:19:17.523715 +0300	1507735157	leo_storage_mq:correct_redundancies/1	736	[{key,<<"hcheck/empty">>},{cause,not_found}]
[E]	[email protected]	2017-10-11 18:19:17.523851 +0300	1507735157	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

[1597 same "badarg" lines skipped]

[E]	[email protected]	2017-10-11 18:19:18.18311 +0300	1507735158	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[W]	[email protected]	2017-10-11 18:19:18.119651 +0300	1507735158	leo_storage_mq:correct_redundancies/1	736	[{key,<<"hcheck/empty">>},{cause,not_found}]
[E]	[email protected]	2017-10-11 18:19:18.120257 +0300	1507735158	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

[3596 same "badarg" lines skipped]

[E]	[email protected]	2017-10-11 18:19:19.367329 +0300	1507735159	leo_mq_consumer:consume/4	526	[{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]
[E]	[email protected]	2017-10-11 18:19:19.430913 +0300	1507735159	leo_storage_handler_object:get/1	90	[{from,storage},{method,get},{key,<<"bod1/00/11/16/0011163ac036c71e03883dd10c626f81cd1f55c6bc20817f69e76a2b2ab4ba30d175821a3a4ea5a34e285182584518b7d83ed00000000000.xz\nf07a6c4f005a5cdfb3c752ce2f258de9">>},{cause,not_found}]

[17796 "not_found" lines just like above skipped]

The amount of "not_found" lines match amount of messages in queue. What's bothering me is "badarg" errors that happened before that queue started to process and errors about "hcheck/empty" object. This is the (zero-byte) object that I was using for health checking from LB. It was never deleted, whereis output for it:

-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | [email protected]      | 9a023d3ce093d3f691f7f339bab7a27c     |         0B |   d41d8cd98f | false          |              0 | 55a54f0fec211  | 2017-09-29 17:35:50 +0300
       | [email protected]      | 9a023d3ce093d3f691f7f339bab7a27c     |         0B |   d41d8cd98f | false          |              0 | 55a54f0fec211  | 2017-09-29 17:35:50 +0300
       | [email protected]      | 9a023d3ce093d3f691f7f339bab7a27c     |         0B |   d41d8cd98f | false          |              0 | 55a54f0fec211  | 2017-09-29 17:35:50 +0300

Why would logs mention it at all?

All other nodes were running and weren't doing anything when I launched stor02.
On nodes stor03-stor06, it was almost the same: 8-11K "badarg" errors, then "not_found" with these temporary multipart objects; no mention of "hcheck" (even on bodies04).

That said, these problems seem minor and don't prevent me from finishing recover-node testing.

@mocchira
Member

@vstax Thanks for checking.

That said, these problems seem minor and don't prevent me from finishing recover-node testing.

Good to hear that; I will keep working on your blocking issues here and #846.

Why would logs mention it at all?

It seems that calling https://github.com/mocchira/leofs/blob/fix/issue859/apps/leo_storage/src/leo_storage_mq.erl#L731 with "hcheck/empty" returns not_found, which means there might be inconsistent RING records on stor02. Can you check the result of leo_redundant_manager_api:get_redundancies_by_key with "hcheck/empty" on stor01 or stor04? The result should look like the one below.

(storage_3@127.0.0.1)3> leo_redundant_manager_api:get_redundancies_by_key(<<"hcheck/empty">>).
{ok,{redundancies,204712737994773042953409181065537888892,
                  204330855779912657295634281112907108201,
                  205479344241174860687721462239176707334,[],[],
                  [{redundant_node,'[email protected]',true,true,'L'},
                   {redundant_node,'[email protected]',true,true,'FL'}],
                  2,1,1,1,0,0,958648327}}

Then do leofs-adm dump-ring [email protected], open the file named ring_cur_worker.log.xxxx, and search for the RING record that "hcheck/empty" belongs to; with the case pasted above, search the file for 204330855779912657295634281112907108201 or 205479344241174860687721462239176707334. If it is not found, the on-memory RING records on stor02 are broken, so try leofs-adm recover-ring [email protected] and confirm whether the RING got recovered via leo_redundant_manager_api:get_redundancies_by_key on stor02.
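
In other words, roughly like this (the ring_cur_worker.log path below is only illustrative; use wherever dump-ring writes it on your node):

$ leofs-adm dump-ring [email protected]
$ grep -c 204330855779912657295634281112907108201 /path/to/leo_storage/log/ring/ring_cur_worker.log.*
### 0 matches would mean the on-memory RING on stor02 is inconsistent
$ leofs-adm recover-ring [email protected]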

[E] [email protected] 2017-10-11 18:19:19.430913 +0300 1507735159 leo_storage_handler_object:get/1 90 [{from,storage},{method,get},{key,<<"bod1/00/11/16/0011163ac036c71e03883dd10c626f81cd1f55c6bc20817f69e76a2b2ab4ba30d175821a3a4ea5a34e285182584518b7d83ed00000000000.xz\nf07a6c4f005a5cdfb3c752ce2f258de9">>},{cause,not_found}]

As it turned out, the sort of not_found error pasted above can be reduced; I will update the PR later, so please check it out when you have time.

Recover-file still doesn't work, nothing happens. I tried enabling debug logs, nothing in them. Queues are empty.

Can you share the status of the target of recover-file with whereis?
I can't reproduce this with #876.

The amount of "not_found" lines match amount of messages in queue. What's bothering me is "badarg" errors that happened before that queue started to process and errors about "hcheck/empty" object
[E] [email protected] 2017-10-11 18:19:18.18311 +0300 1507735158 leo_mq_consumer:consume/4 526 [{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

This badarg error is an unexpected one (in theory it should never happen), so I'd like to dig deeper with debug logs. If possible, once I update the PR with additional debug logging, can you run the same test that will probably reproduce those badarg errors again with debug enabled, and share the debug logs afterwards?

@mocchira
Member

mocchira commented Oct 12, 2017

Updated #876 for

As it turned out, the sort of not_found error pasted above can be reduced; I will update the PR later, so please check it out when you have time.

and

This badarg error is an unexpected one (in theory it should never happen), so I'd like to dig deeper with debug logs. If possible, once I update the PR with additional debug logging, can you run the same test that will probably reproduce those badarg errors again with debug enabled, and share the debug logs afterwards?

@vstax
Contributor Author

vstax commented Oct 12, 2017

@mocchira Thank you for the support.

Then do leofs-adm dump-ring [email protected], open the file named ring_cur_worker.log.xxxx, and search for the RING record that "hcheck/empty" belongs to; with the case pasted above, search the file for 204330855779912657295634281112907108201 or 205479344241174860687721462239176707334. If it is not found, the on-memory RING records on stor02 are broken, so try leofs-adm recover-ring [email protected] and confirm whether the RING got recovered via leo_redundant_manager_api:get_redundancies_by_key on stor02.

The output of get_redundancies_by_key is identical for all 3 nodes; the RING dumps and logs are identical as well, and identical to the RING dumps that I made 7 days ago for #859 (comment). The second and third numbers are present in the log file (but not the first one).

([email protected])1> leo_redundant_manager_api:get_redundancies_by_key(<<"hcheck/empty">>).
{ok,{redundancies,204712737994773042953409181065537888892,
                  204609913176439658070474165434891082665,
                  205151254216934795150413748967400838572,[],[],
                  [{redundant_node,'[email protected]',
                                   true,true,'L'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'},
                   {redundant_node,'[email protected]',
                                   true,true,'FL'}],
                  3,1,2,1,0,0,2263375396}}

Log from RING dump on stor02:

     {vnodeid_nodes,619,204609913176439658070474165434891082665,
         205151254216934795150413748967400838572,
         [{redundant_node,'[email protected]',true,
              true,undefined},
          {redundant_node,'[email protected]',true,
              true,undefined},
          {redundant_node,'[email protected]',true,
              true,undefined}]},

Unfortunately, I lost (moved instead of copied) the copy of the queues on stor02, so I can't try to process them again; I will repeat the original experiment ("recover-node stor01") which created these queues in the first place, but running with the latest patches. Maybe I will see this error again.

Can you share the status of the target of recover-file with whereis?

Recover-file fails to work on any objects that are in an inconsistent state because of #845, for example:

[vm@bodies-master ~]$ leofs-adm whereis "body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when                     
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
       | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce8438b58c  | 2017-09-23 01:13:19 +0300

[vm@bodies-master ~]$ leofs-adm recover-file "body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8"
OK
[vm@bodies-master ~]$ leofs-adm whereis "body/12/bf/0d/12bf0db7d8bcdf91a3a42ba97867bf6b785b0113754238dce5a4952524cb13182a7a2e64cfe7b3614a4c4551f0db4adf0002250100000000.xz\na7c6516a2e15f44a0e037043c88736d8"
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                  |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when                           
-------+-----------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
  *    | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce84414f6a  | 2017-09-23 01:13:20 +0300
       | [email protected]      | 25964721380d6594f0d193cf949ee286     |         0B |   d41d8cd98f | false          |              0 | 559ce8438b58c  | 2017-09-23 01:13:19 +0300

It's only for these deleted temporary multipart header objects that recover-file doesn't work for me (it doesn't set the "delete" flag on stor05 here). It works correctly for existing normal objects (including large objects). I can't say whether it works for deleted normal objects, as I don't have any that are in an inconsistent state.

This badarg error is an unexpected one (in theory it should never happen), so I'd like to dig deeper with debug logs. If possible, once I update the PR with additional debug logging, can you run the same test that will probably reproduce those badarg errors again with debug enabled, and share the debug logs afterwards?

Unfortunately, it doesn't quite work. I do get badargs without this change, but with it I get neither not_found errors, nor badargs, nor the extra debug output. I tried the older version again with the copy of the queue directory that I had saved, and both the thousands of badargs and the errors were back.

Actually, I did get a single badarg once on one of the nodes even with the latest patch version:

[E]     [email protected]     2017-10-12 20:32:50.11129 +0300 1507829570      leo_mq_consumer:consume/4        526     [{module,leo_storage_mq},{id,leo_per_object_queue},{cause,badarg}]

but it's just one line and only on this node; unfortunately, precisely on this node and at that launch I didn't have debug logs enabled. I've restarted this node multiple times after that (replacing the empty queues with the copy before each start) but wasn't able to repeat this badarg with debug logs enabled. "send_object_to_remote_node" never appears in the debug logs.

Does this make any sense?

EDIT: I have an object named "body//s/to/ra//storage/nfsshares/storage/00/ff/00ff10f0536f07ccc38af567bf2fb3edd8cefc7d5efd0fc66c7910c4da2ccd69b8d457bd44ee6b1be8b9230c130fbcb7c200000000000000.xz" on stor03 (that is, stor03 is the primary node for that one); it was created due to an incorrect launch of the upload script. These double slashes in the name don't seem to create any problems during PUT/GET, but I wonder if it is really OK to have such an object (given that slashes are used as the separator when building the index for prefix search and such). The reason I remembered this object (which exists and was never deleted) is that it always appears in the logs when launching stor03 with these (formerly problematic) queues:

[D]   [email protected]     2017-10-12 20:51:03.826934 +0300        1507830663     leo_storage_handler_object:get/1 78      [{from,storage},{method,get},{key,<<"body//s/to/ra//storage/nfsshares/storage/00/ff/00ff10f0536f07ccc38af567bf2fb3edd8cefc7d5efd0fc66c7910c4da2ccd69b8d457bd44ee6b1be8b9230c130fbcb7c200000000000000.xz">>}]

I think it's the only object in these queues that doesn't fall into the "deleted MU header" category; it's small as well (184 bytes). I'm not sure why it's there (maybe it just happened to be the last object in the queues), but I doubt it's the cause of any problems with the queues, as it only exists on 3 nodes while the problems are on each of them. I'm mentioning it just in case, because I had never tried working with objects with double slashes in their names before.

@mocchira
Member

@vstax Thanks for trying the new patch.

The output of get_redundancies_by_key is identical for all 3 nodes; the RING dumps and logs are identical as well, and identical to the RING dumps that I made 7 days ago for #859 (comment). The second and third numbers are present in the log file (but not the first one).

Looks fine now, so the error that happened at https://github.com/mocchira/leofs/blob/fix/issue859/apps/leo_storage/src/leo_storage_mq.erl#L761-L762 will no longer happen with the current stable RING.

Unfortunately, I lost (moved instead of copied) the copy of the queues on stor02, so I can't try to process them again; I will repeat the original experiment ("recover-node stor01") which created these queues in the first place, but running with the latest patches. Maybe I will see this error again.

OK, please let me know if you see it again.

but it's just one line and only on this node; unfortunately, precisely on this node and at that launch I didn't have debug logs enabled. I've restarted this node multiple times after that (replacing the empty queues with the copy before each start) but wasn't able to repeat this badarg with debug logs enabled. "send_object_to_remote_node" never appears in the debug logs.
Does this make any sense?

Maybe. Other than the debug logging, the previous patch and the current one are logically doing the same thing; however, there is a physical difference:

  • the previous patch
    • GET first
    • if GET returns not_found (delete flag: true), then try HEAD
  • the current one
    • HEAD first
    • if HEAD returns metadata with the delete flag set to false, then try GET

In order to avoid the error log getting filled with not_found, I changed to the current logic, and this may affect the odds that badarg can happen for some reason (reduced I/O load on the partition where the log files live, or a reduced GET rate, or something else, may decrease the odds of some unknown racy problem?).
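
Roughly, the shape of the current logic is like the sketch below (a sketch only, not the actual leo_storage_mq code; is_deleted/1 and do_replicate/2 are placeholders for the real metadata check and replication call):

%% Sketch of the HEAD-first flow: inspect the metadata before fetching the body,
%% so deleted multipart headers are skipped without logging not_found.
-module(head_first_sketch).
-export([recover_if_needed/2]).

recover_if_needed(AddrId, Key) ->
    case leo_object_storage_api:head({AddrId, Key}) of
        {ok, MetaBin} ->
            Meta = binary_to_term(MetaBin),
            case is_deleted(Meta) of
                true  -> ok;                        %% nothing to recover, nothing to log
                false -> do_replicate(AddrId, Key)  %% only now read the object body
            end;
        _NotFoundOrError ->
            ok
    end.

%% placeholder: the real code reads the del flag out of the metadata record
is_deleted(_Meta) -> false.

%% placeholder for the replication step (cf. the replicate/3 call used above)
do_replicate(_AddrId, _Key) -> ok.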

Given that the badarg rate has drastically decreased with the current patch, I'd like to change the log level for tracking these badargs to ERROR, so please let me know if you face badarg again.

I think it's the only object in these queues that doesn't fall into the "deleted MU header" category; it's small as well (184 bytes). I'm not sure why it's there (maybe it just happened to be the last object in the queues), but I doubt it's the cause of any problems with the queues, as it only exists on 3 nodes while the problems are on each of them. I'm mentioning it just in case, because I had never tried working with objects with double slashes in their names before.

Thanks for sharing.
Basic operations (GET/PUT/DELETE/HEAD) against objects with double slashes seem to work fine; however, listing does not work properly, as shown below.

### there is test//readme
leofs@cat2neat:leofs.1.3.5$ s3cmd ls s3://test/
                       DIR   s3://test//
### expect readme to be returned but the result was empty
leofs@cat2neat:leofs.1.3.5$ s3cmd ls s3://test//

So I would not recommend using keys containing double slashes.

@mocchira
Member

It's only for these deleted temporary multipart header objects that recover-file doesn't work for me (it doesn't set the "delete" flag on stor05 here). It works correctly for existing normal objects (including large objects). I can't say whether it works for deleted normal objects, as I don't have any that are in an inconsistent state.

WIP

@mocchira
Member

Updated #876 for

Given that the badarg rate has drastically decreased with the current patch, I'd like to change the log level for tracking these badargs to ERROR, so please let me know if you face badarg again.

@mocchira
Member

Updated #876 for

It's only for these deleted temporary multipart header objects that recover-file doesn't work for me

@vstax
Contributor Author

vstax commented Oct 13, 2017

@mocchira

Looks fine now, so the error that happened at https://github.com/mocchira/leofs/blob/fix/issue859/apps/leo_storage/src/leo_storage_mq.erl#L761-L762 will no longer happen with the current stable RING.

Hmm, I don't think the RING was broken at any point, though, so it's strange. Well, it doesn't matter that much now, I guess. It didn't happen in the latest recover-node experiment either.

this may affect the odds that badarg can happen for some reason (reduced I/O load on the partition where the log files live, or a reduced GET rate, or something else, may decrease the odds of some unknown racy problem?).

Let's see.. The queue processed very fast and was probably IO bound. The logs (if you mean the text logs) - I don't think they should affect anything, as there is no fsync(), not enough writes to fill the dirty buffers, and the OS flushes buffers once every 5 seconds; besides, badargs (with the old patch version) happen all the time and on each node, so it's not about the text logs. The queue files themselves: LevelDB should issue fsync() or at least fdatasync() under some conditions during writes, I think? The queue is stored in /var, which is located on a soft RAID10 made from small partitions at the beginning of each drive used for AVS files. So intense seeking around the AVS files or their indexes will hurt IO for the queues if they try to sync. That said, the thousands of badargs always happen within 4-5 seconds, which is fast enough not to suggest serious IO pressure, I think? I mean, there might be IO load, but it's fast enough that it shouldn't hit any internal timeout or anything. At the very least, it seems unlikely to me that IO load would directly cause any errors when something completes that fast.

The interesting part is that the number of badarg messages on each node was some round number like 11200, 8400, 9200. The exact amount on each node isn't stable; it can vary from launch to launch. Plus, there is that case with a single badarg message - which I wasn't able to reproduce afterwards - another argument for the "some race" theory, I'd say. Unfortunately :(

It should be better now with the message being "error", since I can't really run recover-node with debug enabled (due to the logger problems). Then again, I can't seem to get badargs anymore anyway.

So I would not recommend using keys containing double slashes.

Oh, I did not plan to - it happened by mistake. Yes, it seems to be hidden from "ls", but once you get past the double slashes it works again:

$ s3cmd ls s3://body//s/to
                       DIR   s3://body//s/to/ra/
$ s3cmd ls s3://body//s/to/ra
                       DIR   s3://body//s/to/ra//
$ s3cmd ls s3://body//s/to/ra//
$ s3cmd ls s3://body//s/to/ra//storage
                       DIR   s3://body//s/to/ra//storage/nfsshares/

I can't test it now, but googling suggests that real AWS would have shown "storage" here when doing "ls s3://body//s/to/ra//". Well, it's an obscure and minor problem.

The latest #876 allows me to run recover-file on a MU header correctly, thanks.

@vstax
Contributor Author

vstax commented Oct 23, 2017

I don't have this issue anymore. Even though the reason for the badargs wasn't found, they don't happen right now, so maybe it's OK to close this.

@mocchira
Member

@vstax OK, please close this one. Just in case, I will file another issue to track how badarg could happen.
