Checksum of a large-object parent does not match after executing the rebalance command and the recover-node command #223

Closed
yosukehara opened this issue Aug 21, 2014 · 0 comments

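I've found this problem with LeoFS v1.1.1-dev. After the data of "storage_3" is restored, either with the "recover-node" command (Case-1) or with the "rebalance" command onto a new node (Case-2), one replica of the large object's parent reports the checksum d41d8cd98f, while the other replica keeps the original checksum 47a6cef684. The steps are as follows: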

Case-1: Recover objects of "storage_3"

Summary

  • Discard/Collapse data
  • Suspend "storage_3"
  • Restart "storage_3"
  • Execute the "recover-node" command

Log

Launch LeoFS

$ sh bootstrap.sh start
:::
::: Starting the storages :::
:::
Generating RING...
Generated RING
OK  25% - [email protected]
OK  50% - [email protected]
OK  75% - [email protected]
OK 100% - [email protected]
OK
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : b50b0e91
                Prev ring hash : b50b0e91
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900

*** leofs - Finished :) ***

Create a bucket and Put an object

$ s3cmd mb s3://photo
Bucket 's3://photo/' created

$ s3cmd put ~/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz s3://photo
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 1 of 11, 15MB]
 15728640 of 15728640   100% in    0s    26.88 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 2 of 11, 15MB]
 15728640 of 15728640   100% in    0s    26.11 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 3 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.71 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 4 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.74 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 5 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.56 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 6 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.17 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 7 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.42 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 8 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.04 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 9 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.33 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 10 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.51 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 11 of 11, 2028kB]
 2077187 of 2077187   100% in    0s    24.63 MB/s  done

$ ./leofs-adm whereis photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when            
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501190bc61a26  | 2014-08-21 09:46:06 +0900
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501190bc61a26  | 2014-08-21 09:46:06 +0900
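The size and chunk count in the `whereis` output can be cross-checked against the s3cmd upload above: s3cmd sent ten parts of 15728640 bytes (15 MiB, s3cmd's default multipart chunk size) plus a final part of 2077187 bytes. The sketch below does that arithmetic, assuming the `size` column is the byte count in KiB rounded up (an assumption; the output only shows `155629K`).

```python
# Part sizes reported by s3cmd during the multipart upload above.
PART_SIZE = 15_728_640   # 15 MiB, used for parts 1-10
LAST_PART = 2_077_187    # part 11 of 11
parts = [PART_SIZE] * 10 + [LAST_PART]

total_bytes = sum(parts)
# Assumption: leofs-adm prints the size in KiB, rounded up.
size_kib = -(-total_bytes // 1024)

print(f"chunks={len(parts)} bytes={total_bytes} size={size_kib}K")
```

Both replicas agree on size (155629K) and chunk count (11) in every `whereis` listing in this report, including after recovery, so only the checksum column differs.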

Discard/Collapse "storage_3" data

$ cd package/leo_storage_3/avs/
$ rm -rf metadata/ object/
$ cd -

Suspend and restart "storage_3"

$ ./leofs-adm suspend [email protected]
$ ./package/leo_storage_3/bin/leo_storage stop
$ ./package/leo_storage_3/bin/leo_storage start

Resume and Recover "storage_3"

$ ./leofs-adm status
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : b50b0e91
                Prev ring hash : b50b0e91
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | restarted    | b50b0e91       | b50b0e91       | 2014-08-21 09:48:45 +0900
  G    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:47 +0900

$ ./leofs-adm resume [email protected]      
OK

$ ./leofs-adm status                    
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : b50b0e91
                Prev ring hash : b50b0e91
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:45 +0900
  S    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:48:45 +0900
  G    | [email protected]      | running      | b50b0e91       | b50b0e91       | 2014-08-21 09:43:47 +0900
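Before checking the object, it helps to confirm that every node is back to `running` on the same ring (above, "storage_3" shows `restarted` until it is resumed). A minimal sketch that parses the `[Node(s) state]` table of `leofs-adm status`, assuming the column layout shown above (type | node | state | current ring | prev ring | updated at). The function names and the placeholder node names `node-a`/`node-b` are hypothetical.

```python
def parse_node_rows(status_text: str) -> list:
    """Extract the rows of the [Node(s) state] table from `leofs-adm status` output."""
    rows = []
    for line in status_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows have 6 columns and a one-letter type (S = storage, G = gateway).
        if len(cols) == 6 and cols[0] in ("S", "G"):
            rows.append({"type": cols[0], "node": cols[1], "state": cols[2],
                         "current": cols[3], "prev": cols[4]})
    return rows


def check_nodes(status_text: str) -> list:
    """Return a list of problems; an empty list means every node looks healthy."""
    rows = parse_node_rows(status_text)
    problems = []
    rings = {r["current"] for r in rows}
    if len(rings) > 1:
        problems.append(f"nodes disagree on current ring: {sorted(rings)}")
    for r in rows:
        if r["state"] != "running":
            problems.append(f"{r['node']} is {r['state']}")
    return problems


sample = """
  S    | node-a | running   | b50b0e91 | b50b0e91 | 2014-08-21 09:43:45 +0900
  S    | node-b | restarted | b50b0e91 | b50b0e91 | 2014-08-21 09:48:45 +0900
"""
print(check_nodes(sample))  # ['node-b is restarted']
```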

Check the state of the object in the cluster

$ ./leofs-adm whereis photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when            
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   d41d8cd98f |             11 | 501190bc61a26  | 2014-08-21 09:46:06 +0900
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501190bc61a26  | 2014-08-21 09:46:06 +0900
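The mismatch above can be detected mechanically by grouping the replicas in the `whereis` output by checksum. This is a sketch that assumes the 8-column layout shown above (del? | node | ring address | size | checksum | # of chunks | clock | when). The real node names are not visible in the captured output, so the sample uses the hypothetical placeholders `node-a` and `node-b`.

```python
import re


def replica_checksums(whereis_text: str) -> dict:
    """Map each checksum seen in `leofs-adm whereis` output to the nodes reporting it."""
    seen = {}
    for line in whereis_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows have 8 columns and a 32-hex-digit ring address in column 3.
        if len(cols) == 8 and re.fullmatch(r"[0-9a-f]{32}", cols[2]):
            seen.setdefault(cols[4], []).append(cols[1])
    return seen


def replicas_consistent(whereis_text: str) -> bool:
    """True when every replica reports the same checksum."""
    return len(replica_checksums(whereis_text)) <= 1


sample = """
       | node-a | 9cb468a0db43ae5f2c7ee05ef50288c5 | 155629K | d41d8cd98f | 11 | 501190bc61a26 | 2014-08-21 09:46:06 +0900
       | node-b | 9cb468a0db43ae5f2c7ee05ef50288c5 | 155629K | 47a6cef684 | 11 | 501190bc61a26 | 2014-08-21 09:46:06 +0900
"""
print(replicas_consistent(sample))  # False
```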

Case-2: Take over the objects of "storage_3" with a new node, "storage_4"

Summary

  • Discard/Collapse data
  • Detach "storage_3"
  • Start "storage_4" (Attach)
  • Execute the "rebalance" command

Log

Launch LeoFS

:::
::: Starting the storages :::
:::
Generating RING...
Generated RING
OK  25% - [email protected]
OK  50% - [email protected]
OK  75% - [email protected]
OK 100% - [email protected]
OK
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : 74d698d4
                Prev ring hash : 74d698d4
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900

*** leofs - Finished :) ***

Create a bucket and Put an object

$ s3cmd mb s3://photo
Bucket 's3://photo/' created

$ s3cmd put ~/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz s3://photo
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 1 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.63 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 2 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.44 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 3 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.70 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 4 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.69 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 5 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.57 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 6 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.15 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 7 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.07 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 8 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.05 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 9 of 11, 15MB]
 15728640 of 15728640   100% in    0s    24.89 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 10 of 11, 15MB]
 15728640 of 15728640   100% in    0s    25.66 MB/s  done
/home/yosuke/Downloads/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz -> s3://photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz  [part 11 of 11, 2028kB]
 2077187 of 2077187   100% in    0s    20.68 MB/s  done

$ ./leofs-adm whereis photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when            
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501195fcd8a4d  | 2014-08-21 10:09:36 +0900
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501195fcd8a4d  | 2014-08-21 10:09:36 +0900

Detach "storage_3", attach a new node, then rebalance

### STORAGE-3
$ ./leofs-adm detach [email protected]
OK
$ ./package/leo_storage_3/bin/leo_storage stop
ok

### NEW-NODE
$ ./package/leo_storage_4/bin/leo_storage start

$ ./leofs-adm status
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : 74d698d4
                Prev ring hash : 74d698d4
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | detached     | 74d698d4       | 74d698d4       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | attached     |                |                | 2014-08-21 10:13:24 +0900
  G    | [email protected]      | running      | 74d698d4       | 74d698d4       | 2014-08-21 10:07:47 +0900


$ ./leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  25% - [email protected]
OK  50% - [email protected]
OK  75% - [email protected]
OK 100% - [email protected]
OK
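`rebalance` generates a rebalance-list and distributes it to the storage nodes. As a rough illustration of the idea only, and not LeoFS's actual implementation, the sketch below places virtual nodes and keys on a 2^128 ring with MD5, takes the next two distinct nodes clockwise as the replicas (matching `Total replicas : 2`), and lists the keys whose replica set changes when "storage_3" is replaced by "storage_4". `VNODES`, the function names, and the object keys are hypothetical.

```python
import bisect
import hashlib

N_REPLICAS = 2   # matches "Total replicas : 2"
VNODES = 64      # hypothetical number of virtual nodes per physical node


def ring_pos(label: str) -> int:
    """Map a string onto a 2^128 ring using its MD5 digest."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")


def build_ring(nodes):
    """Return a sorted list of (position, node) virtual-node entries."""
    return sorted((ring_pos(f"{n}#{i}"), n) for n in nodes for i in range(VNODES))


def owners(ring, key):
    """Walk clockwise from the key's position, collecting N_REPLICAS distinct nodes."""
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, ring_pos(key))
    found = []
    for step in range(len(ring)):
        node = ring[(i + step) % len(ring)][1]
        if node not in found:
            found.append(node)
            if len(found) == N_REPLICAS:
                break
    return found


def rebalance_list(old_nodes, new_nodes, keys):
    """Return {key: (old replicas, new replicas)} for keys whose replica set changes."""
    old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
    moves = {}
    for key in keys:
        before, after = set(owners(old_ring, key)), set(owners(new_ring, key))
        if before != after:
            moves[key] = (sorted(before), sorted(after))
    return moves


old = ["storage_0", "storage_1", "storage_2", "storage_3"]
new = ["storage_0", "storage_1", "storage_2", "storage_4"]
keys = [f"photo/object-{i}" for i in range(1000)]
moves = rebalance_list(old, new, keys)
print(len(moves), "of", len(keys), "keys need to move")
```

With consistent hashing, only keys that had "storage_3" as a replica, or that gain "storage_4", appear in the list; the other keys keep their placement.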


$ ./leofs-adm status
[System config]
                System version : 1.1.1
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
                     ring size : 2^128
             Current ring hash : e6e09570
                Prev ring hash : e6e09570
[Multi DC replication settings]
         max # of joinable DCs : 2
            # of replicas a DC : 1

[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at         
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | e6e09570       | e6e09570       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | e6e09570       | e6e09570       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | e6e09570       | e6e09570       | 2014-08-21 10:07:45 +0900
  S    | [email protected]      | running      | e6e09570       | e6e09570       | 2014-08-21 10:14:39 +0900
  G    | [email protected]      | running      | e6e09570       | e6e09570       | 2014-08-21 10:07:47 +0900

Check the state of the object in the cluster

$ ./leofs-adm whereis photo/eclipse-java-kepler-SR2-linux-gtk-x86_64.tar.gz
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when            
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   d41d8cd98f |             11 | 501195fcd8a4d  | 2014-08-21 10:09:36 +0900
       | [email protected]      | 9cb468a0db43ae5f2c7ee05ef50288c5     |    155629K |   47a6cef684 |             11 | 501195fcd8a4d  | 2014-08-21 10:09:36 +0900
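In both cases, the bad checksum `d41d8cd98f` matches the first ten hex digits of the MD5 digest of zero bytes (`d41d8cd98f00b204e9800998ecf8427e`). This hints that the parent's checksum was computed over an empty body rather than carried over from the healthy replica. That is an inference from the output, not a confirmed root cause. The check below assumes the `checksum` column of `whereis` is a truncated MD5 hex digest.

```python
import hashlib

# Checksum reported for the broken replica (10 hex digits, as printed by whereis).
BAD_CHECKSUM = "d41d8cd98f"

# Assumption: the whereis checksum column is a truncated MD5 hex digest.
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)                           # d41d8cd98f00b204e9800998ecf8427e
print(empty_md5.startswith(BAD_CHECKSUM))  # True
```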