Fail to start nameserver when registering zk leader node after calling init function #3867

Closed
tobegit3hub opened this issue Apr 12, 2024 · 2 comments · Fixed by #3869
Labels
bug Something isn't working

Collaborator

tobegit3hub commented Apr 12, 2024

Bug Description

The nameserver may now fail to start after a sleep call is added.


The issue may lie between Init() and RegisterName(). Init() is asynchronous and is used to register the zk watch. RegisterName() is used to write the endpoint to zk, but it has to be executed before Init() finishes.

This is a weird design, and it causes the nameserver to fail to start.

Expected Behavior

The nameserver starts successfully no matter when it sleeps.

Steps To Reproduce

  1. Pull https://github.com/oh2024/OpenMLDB/tree/496a2679a8473a76e76c2cecf3e9375080a53d0e
  2. Run new_server_env_test
@tobegit3hub tobegit3hub added the bug Something isn't working label Apr 12, 2024
Collaborator

oh2024 commented Apr 12, 2024

Error logs:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20240412 03:12:52.201548 39639 util.cc:58] setting temp path for test in "/tmp/openmldb/new_server_env_test598400"
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from NewServerEnvTest
[ RUN ] NewServerEnvTest.ShowRealEndpoint
I20240412 03:12:52.203043 39639 name_server_impl.cc:1414] zone name ns1/rtidb45250674
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.14
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@757: Client environment:host.name=4b60f8b6dfcb
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@764: Client environment:os.name=Linux
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@765: Client environment:os.arch=3.10.0-862.el7.x86_64
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@766: Client environment:os.version=#1 SMP Fri Apr 20 16:44:24 UTC 2018
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@774: Client environment:user.name=(null)
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@782: Client environment:user.home=/root
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@log_env@794: Client environment:user.dir=/workspaces/OpenMLDB
2024-04-12 03:12:52,203:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:6181 sessionTimeout=100000 watcher=0x7e8420 sessionId=0 sessionPasswd= context=0x6c52000 flags=0
2024-04-12 03:12:52,204:39639(0x7f5cdffba700):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:6181]
2024-04-12 03:12:52,217:39639(0x7f5cdffba700):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:6181], sessionId=0x103614baed8000c, negotiated timeout=40000
I20240412 03:12:52.218111 39651 zk_client.cc:581] zookeeper event with type -1, state 3, path
I20240412 03:12:52.218254 39651 zk_client.cc:601] connect success
W20240412 03:12:52.218854 39639 zk_client.cc:555] fail to get children from path /rtidb45250674/nodes with errno -101
I20240412 03:12:52.223469 39639 zk_client.cc:302] create node /rtidb45250674/nodes ok and real node name /rtidb45250674/nodes
I20240412 03:12:52.233489 39639 zk_client.cc:302] create node /rtidb45250674/data/function ok and real node name /rtidb45250674/data/function
I20240412 03:12:53.251349 39652 zk_client.cc:302] create node /rtidb45250674/leader/lock_request ok and real node name /rtidb45250674/leader/lock_request0000000000
I20240412 03:12:53.251428 39652 dist_lock.cc:68] create node ok with assigned path /rtidb45250674/leader/lock_request0000000000
I20240412 03:12:53.251775 39652 dist_lock.cc:90] first child /rtidb45250674/leader/lock_request0000000000
I20240412 03:12:53.251807 39652 dist_lock.cc:94] get lock with assigned_path /rtidb45250674/leader/lock_request0000000000
I20240412 03:12:53.251832 39652 name_server_impl.cc:5542] become the leader name server
I20240412 03:12:53.260620 39652 zk_client.cc:302] create node /rtidb45250674/table/table_index ok and real node name /rtidb45250674/table/table_index
I20240412 03:12:53.260674 39652 name_server_impl.cc:566] init table_index[1]
I20240412 03:12:53.271608 39652 zk_client.cc:302] create node /rtidb45250674/table/term ok and real node name /rtidb45250674/table/term
I20240412 03:12:53.271662 39652 name_server_impl.cc:578] init term[1]
I20240412 03:12:53.278609 39652 zk_client.cc:302] create node /rtidb45250674/op/op_index ok and real node name /rtidb45250674/op/op_index
I20240412 03:12:53.278663 39652 name_server_impl.cc:590] init op_index[0]
I20240412 03:12:53.285871 39652 zk_client.cc:302] create node /rtidb45250674/table/notify ok and real node name /rtidb45250674/table/notify
I20240412 03:12:53.293161 39652 zk_client.cc:302] create node /rtidb45250674/notify/global_variable ok and real node name /rtidb45250674/notify/global_variable
I20240412 03:12:53.300524 39652 zk_client.cc:302] create node /rtidb45250674/config/auto_failover ok and real node name /rtidb45250674/config/auto_failover
I20240412 03:12:53.300575 39652 name_server_impl.cc:615] set zk_auto_failover_node[false]
W20240412 03:12:53.300868 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/db with errno -101
W20240412 03:12:53.301151 39652 name_server_impl.cc:690] db node does not exist
W20240412 03:12:53.301376 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/table/table_data with errno -101
W20240412 03:12:53.301584 39652 name_server_impl.cc:757] table data node does not exist
I20240412 03:12:53.301627 39652 name_server_impl.cc:763] need to recover default table num[0]
W20240412 03:12:53.301795 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/table/db_table_data with errno -101
W20240412 03:12:53.301983 39652 name_server_impl.cc:784] db table data node does not exist
I20240412 03:12:53.302021 39652 name_server_impl.cc:790] need to recover db table num[0]
W20240412 03:12:53.302211 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/store_procedure/db_sp_data with errno -101
W20240412 03:12:53.302420 39652 name_server_impl.cc:9554] zk_db_sp_data_path node [/rtidb45250674/store_procedure/db_sp_data] does not exist
I20240412 03:12:53.302595 39652 name_server_impl.cc:9322] /map/sdkendpoints node /rtidb45250674/map/sdkendpoints not exist
W20240412 03:12:53.302784 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/cluster/replica with errno -101
W20240412 03:12:53.302976 39652 name_server_impl.cc:718] cluster info node does not exist
W20240412 03:12:53.303185 39652 zk_client.cc:529] fail to get children from path /rtidb45250674/op/op_data with errno -101
W20240412 03:12:53.303378 39652 name_server_impl.cc:823] op data node does not exist
I20240412 03:12:53.310165 39652 zk_client.cc:302] create node /rtidb45250674/db/__INTERNAL_DB ok and real node name /rtidb45250674/db/__INTERNAL_DB
I20240412 03:12:53.310261 39652 name_server_impl.cc:8981] create database __INTERNAL_DB success
I20240412 03:12:53.316862 39652 zk_client.cc:302] create node /rtidb45250674/db/__PRE_AGG_DB ok and real node name /rtidb45250674/db/__PRE_AGG_DB
I20240412 03:12:53.316949 39652 name_server_impl.cc:8981] create database __PRE_AGG_DB success
I20240412 03:12:53.323650 39652 zk_client.cc:302] create node /rtidb45250674/db/INFORMATION_SCHEMA ok and real node name /rtidb45250674/db/INFORMATION_SCHEMA
I20240412 03:12:53.323725 39652 name_server_impl.cc:8981] create database INFORMATION_SCHEMA success
I20240412 03:12:53.323999 39652 dist_lock.cc:119] my path /rtidb45250674/leader/lock_request0000000000 , first child /rtidb45250674/leader/lock_request0000000000 , lock value ns1
I20240412 03:12:53.324074 39652 dist_lock.cc:120] all child: lock_request0000000000
W20240412 03:12:57.252521 39639 zk_client.cc:237] server name:ns1 duplicate
/workspaces/OpenMLDB/src/nameserver/new_server_env_test.cc:69: Failure
Expected equality of these values:
true
nameserver->RegisterName()
Which is: false
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.14
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@757: Client environment:host.name=4b60f8b6dfcb
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@764: Client environment:os.name=Linux
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@765: Client environment:os.arch=3.10.0-862.el7.x86_64
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@766: Client environment:os.version=#1 SMP Fri Apr 20 16:44:24 UTC 2018
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@774: Client environment:user.name=(null)
2024-04-12 03:12:57,261:39639(0x7f5ce6603500):ZOO_INFO@log_env@782: Client environment:user.home=/root
2024-04-12 03:12:57,262:39639(0x7f5ce6603500):ZOO_INFO@log_env@794: Client environment:user.dir=/workspaces/OpenMLDB
2024-04-12 03:12:57,262:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:6181 sessionTimeout=100000 watcher=0x7e8420 sessionId=0 sessionPasswd= context=0x72b4000 flags=0
2024-04-12 03:12:57,262:39639(0x7f5cd4ea3700):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:6181]
2024-04-12 03:12:57,267:39639(0x7f5cd4ea3700):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:6181], sessionId=0x103614baed8000e, negotiated timeout=40000
I20240412 03:12:57.268085 39711 zk_client.cc:581] zookeeper event with type -1, state 3, path
I20240412 03:12:57.268218 39711 zk_client.cc:601] connect success
I20240412 03:12:57.268916 39639 client_manager.cc:562] add client. name tb1, endpoint 127.0.0.1:9831
I20240412 03:12:57.269496 39639 tablet_impl.cc:591] no external functions to recover
I20240412 03:12:57.293287 39639 server.cpp:1134] Server[openmldb::tablet::TabletImpl] is serving on port=9831.
I20240412 03:12:57.293370 39639 server.cpp:1137] Check out http://4b60f8b6dfcb:9831 in web browser.
I20240412 03:12:57.304081 39639 zk_client.cc:252] register with name tb1 value 127.0.0.1:9831 ok
I20240412 03:12:57.311094 39651 zk_client.cc:48] node watcher with event type 4, state 3
I20240412 03:12:57.311125 39639 zk_client.cc:194] register self with endpoint tb1 ok
I20240412 03:12:57.311265 39639 tablet_impl.cc:395] tablet with endpoint tb1 register to zk cluster 127.0.0.1:6181 ok
I20240412 03:12:57.311475 39651 zk_client.cc:170] handle node changed event with type 4, and state 3, endpoints size 1, callback size 1
I20240412 03:12:57.311808 39651 name_server_impl.cc:1146] add tablet client. endpoint[tb1]
I20240412 03:12:57.314203 39711 zk_client.cc:58] item watcher with event type 3, state 3
I20240412 03:12:57.314249 39651 name_server_impl.cc:7490] notify table changed ok
I20240412 03:12:57.314350 39651 name_server_impl.cc:1170] healthy tablet with endpoint[tb1]
I20240412 03:12:57.314932 39711 tablet_impl.cc:4436] no tables in db
I20240412 03:12:57.315212 39711 tablet_catalog.cc:562] refresh catalog. version 2
W20240412 03:12:57.315272 39711 tablet_impl.cc:4531] __INTERNAL_DB.PRE_AGG_META_INFO not found
I20240412 03:12:57.316304 39735 tablet_impl.cc:4756] set tablet mode normal
I20240412 03:12:57.316680 39735 tablet_impl.cc:5271] update real endpoint: ns1 : 127.0.0.1:9631
I20240412 03:12:57.316730 39735 tablet_impl.cc:5271] update real endpoint: tb1 : 127.0.0.1:9831
I20240412 03:12:57.316776 39735 client_manager.cc:562] add client. name ns1, endpoint 127.0.0.1:9631
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.14
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@757: Client environment:host.name=4b60f8b6dfcb
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@764: Client environment:os.name=Linux
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@765: Client environment:os.arch=3.10.0-862.el7.x86_64
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@766: Client environment:os.version=#1 SMP Fri Apr 20 16:44:24 UTC 2018
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@774: Client environment:user.name=(null)
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@782: Client environment:user.home=/root
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@log_env@794: Client environment:user.dir=/workspaces/OpenMLDB
2024-04-12 03:12:59,314:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:6181 sessionTimeout=100000 watcher=0x7e8420 sessionId=0 sessionPasswd= context=0x7e44000 flags=0
2024-04-12 03:12:59,315:39639(0x7f5cbed6f700):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:6181]
2024-04-12 03:12:59,328:39639(0x7f5cbed6f700):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:6181], sessionId=0x103614baed8000f, negotiated timeout=40000
I20240412 03:12:59.328866 39763 zk_client.cc:581] zookeeper event with type -1, state 3, path
I20240412 03:12:59.328984 39763 zk_client.cc:601] connect success
I20240412 03:12:59.329366 39639 client_manager.cc:562] add client. name tb2, endpoint 127.0.0.1:9931
I20240412 03:12:59.330317 39639 tablet_impl.cc:591] no external functions to recover
I20240412 03:12:59.335878 39639 server.cpp:1134] Server[openmldb::tablet::TabletImpl] is serving on port=9931.
I20240412 03:12:59.335954 39639 server.cpp:1137] Check out http://4b60f8b6dfcb:9931 in web browser.
I20240412 03:12:59.349725 39639 zk_client.cc:252] register with name tb2 value 127.0.0.1:9931 ok
I20240412 03:12:59.356878 39651 zk_client.cc:48] node watcher with event type 4, state 3
I20240412 03:12:59.356899 39639 zk_client.cc:194] register self with endpoint tb2 ok
I20240412 03:12:59.356974 39639 tablet_impl.cc:395] tablet with endpoint tb2 register to zk cluster 127.0.0.1:6181 ok
I20240412 03:12:59.357206 39651 zk_client.cc:170] handle node changed event with type 4, and state 3, endpoints size 2, callback size 1
I20240412 03:12:59.357595 39651 name_server_impl.cc:1170] healthy tablet with endpoint[tb1]
I20240412 03:12:59.357643 39651 name_server_impl.cc:1146] add tablet client. endpoint[tb2]
I20240412 03:12:59.359864 39711 zk_client.cc:58] item watcher with event type 3, state 3
I20240412 03:12:59.359903 39651 name_server_impl.cc:7490] notify table changed ok
I20240412 03:12:59.359972 39763 zk_client.cc:58] item watcher with event type 3, state 3
I20240412 03:12:59.359984 39651 name_server_impl.cc:1170] healthy tablet with endpoint[tb2]
I20240412 03:12:59.360270 39694 tablet_impl.cc:4756] set tablet mode normal
I20240412 03:12:59.360628 39711 tablet_impl.cc:4436] no tables in db
I20240412 03:12:59.360783 39763 tablet_impl.cc:4436] no tables in db
I20240412 03:12:59.360849 39711 tablet_catalog.cc:562] refresh catalog. version 3
W20240412 03:12:59.360890 39711 tablet_impl.cc:4531] __INTERNAL_DB.PRE_AGG_META_INFO not found
I20240412 03:12:59.360953 39714 tablet_impl.cc:4756] set tablet mode normal
I20240412 03:12:59.361013 39763 tablet_catalog.cc:562] refresh catalog. version 3
W20240412 03:12:59.361048 39763 tablet_impl.cc:4531] __INTERNAL_DB.PRE_AGG_META_INFO not found
I20240412 03:12:59.361294 39735 tablet_impl.cc:5271] update real endpoint: ns1 : 127.0.0.1:9631
I20240412 03:12:59.361366 39735 tablet_impl.cc:5271] update real endpoint: tb1 : 127.0.0.1:9831
I20240412 03:12:59.361398 39735 tablet_impl.cc:5271] update real endpoint: tb2 : 127.0.0.1:9931
I20240412 03:12:59.361440 39735 client_manager.cc:562] add client. name tb2, endpoint 127.0.0.1:9931
I20240412 03:12:59.361666 39735 tablet_impl.cc:5271] update real endpoint: ns1 : 127.0.0.1:9631
I20240412 03:12:59.361716 39735 tablet_impl.cc:5271] update real endpoint: tb1 : 127.0.0.1:9831
I20240412 03:12:59.361745 39735 tablet_impl.cc:5271] update real endpoint: tb2 : 127.0.0.1:9931
I20240412 03:12:59.361783 39735 client_manager.cc:562] add client. name ns1, endpoint 127.0.0.1:9631
I20240412 03:12:59.361826 39735 client_manager.cc:562] add client. name tb1, endpoint 127.0.0.1:9831
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.14
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@757: Client environment:host.name=4b60f8b6dfcb
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@764: Client environment:os.name=Linux
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@765: Client environment:os.arch=3.10.0-862.el7.x86_64
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@766: Client environment:os.version=#1 SMP Fri Apr 20 16:44:24 UTC 2018
2024-04-12 03:13:01,357:39639(0x7f5ce6603500):ZOO_INFO@log_env@774: Client environment:user.name=(null)
2024-04-12 03:13:01,358:39639(0x7f5ce6603500):ZOO_INFO@log_env@782: Client environment:user.home=/root
2024-04-12 03:13:01,358:39639(0x7f5ce6603500):ZOO_INFO@log_env@794: Client environment:user.dir=/workspaces/OpenMLDB
2024-04-12 03:13:01,358:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:6181 sessionTimeout=1000 watcher=0x7e8420 sessionId=0 sessionPasswd= context=0x83ba010 flags=0
2024-04-12 03:13:01,358:39639(0x7f5cbd868700):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:6181]
2024-04-12 03:13:01,364:39639(0x7f5cbd868700):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:6181], sessionId=0x103614baed80010, negotiated timeout=4000
I20240412 03:13:01.364224 39777 zk_client.cc:581] zookeeper event with type -1, state 3, path
I20240412 03:13:01.364337 39777 zk_client.cc:601] connect success
2024-04-12 03:13:01,365:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_close@2564: Closing zookeeper sessionId=0x103614baed80010 to [127.0.0.1:6181]

/workspaces/OpenMLDB/src/nameserver/new_server_env_test.cc:180: Failure
Expected equality of these values:
ns_real_ep
Which is: "127.0.0.1:9631"
it->second
Which is: ""
I20240412 03:13:01.365576 39639 server.cpp:1194] Server[openmldb::tablet::TabletImpl] is going to quit
2024-04-12 03:13:01,366:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_close@2564: Closing zookeeper sessionId=0x103614baed8000f to [127.0.0.1:6181]

I20240412 03:13:01.369341 39639 server.cpp:1194] Server[openmldb::tablet::TabletImpl] is going to quit
I20240412 03:13:01.369550 39651 zk_client.cc:48] node watcher with event type 4, state 3
I20240412 03:13:01.369913 39651 zk_client.cc:170] handle node changed event with type 4, and state 3, endpoints size 1, callback size 1
I20240412 03:13:01.370280 39651 name_server_impl.cc:1170] healthy tablet with endpoint[tb1]
I20240412 03:13:01.370349 39651 name_server_impl.cc:1176] offline tablet with endpoint[tb2]
W20240412 03:13:01.370465 39641 rpc_client.h:61] error_code is EHOSTDOWN, sleep [1000] ms
2024-04-12 03:13:01,370:39639(0x7f5ce6603500):ZOO_INFO@zookeeper_close@2564: Closing zookeeper sessionId=0x103614baed8000e to [127.0.0.1:6181]

[ FAILED ] NewServerEnvTest.ShowRealEndpoint (9171 ms)
[----------] 1 test from NewServerEnvTest (9171 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (9171 ms total)
I20240412 03:13:01.373283 39651 zk_client.cc:48] node watcher with event type 4, state 3
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] NewServerEnvTest.ShowRealEndpoint

1 FAILED TEST
I20240412 03:13:01.373620 39651 zk_client.cc:170] handle node changed event with type 4, and state 3, endpoints size 0, callback size 1
I20240412 03:13:01.373682 39651 name_server_impl.cc:1176] offline tablet with endpoint[tb1]
I20240412 03:13:01.373965 39639 util.cc:68] removing temp path: "/tmp/openmldb/new_server_env_test598400"

Collaborator

vagetablechicken commented Apr 12, 2024

The root cause is this warning: W20240412 03:12:57.252521 39639 zk_client.cc:237] server name:ns1 duplicate. Check the code below:

    std::string sname = endpoint_;
    // check server name duplicate
    std::vector<std::string> sname_vec;
    std::string leader_path = zk_root_path_ + "/leader";
    std::vector<std::string> children;
    if (GetChildren(leader_path, children)) {
        for (auto path : children) {
            std::string endpoint;
            std::string real_path = leader_path + "/" + path;
            if (GetNodeValue(real_path, endpoint)) {
                sname_vec.push_back(endpoint);
            }
        }
    }
    std::vector<std::string> endpoints;
    if (GetNodes(endpoints)) {
        std::vector<std::string>::const_iterator it = endpoints.begin();
        for (; it != endpoints.end(); ++it) {
            sname_vec.push_back(*it);
        }
    }
    if (std::find(sname_vec.begin(), sname_vec.end(), sname) != sname_vec.end()) {
        std::string ep;
        if (GetNodeValue(names_root_path_ + "/" + sname, ep) && ep == real_endpoint_) {
            LOG(INFO) << "node:" << sname << "value:" << ep << " exist";
            return true;
        }
        LOG(WARNING) << "server name:" << sname << " duplicate";
        return false;
    }

It gets the nodes (leader and tablets) in lines 212-230. If the leader node already exists in zk, sname_vec will contain ns1, so it goes on to check GetNodeValue(names_root_path_ + "/" + sname, ep) && ep == real_endpoint_. But there is no ns1 under names_root_path_ in zk because we haven't created it yet, so GetNodeValue returns false. RegisterName therefore reports a duplicate and returns false.
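
The failure path described above can be reproduced without a zk server. The following is a minimal sketch that models only the decision logic of the quoted snippet; FakeZk and RegisterNameCheck are hypothetical names invented for illustration, not part of OpenMLDB, and the final `return true` assumes the real function continues with registration when the name is not found in sname_vec (the quoted excerpt does not show that part).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the three zk locations the check reads.
struct FakeZk {
    std::vector<std::string> leader_values;    // values of children under <root>/leader
    std::vector<std::string> node_names;       // children under the nodes path
    std::map<std::string, std::string> names;  // server name -> real endpoint
};

// Models only the decision logic of the quoted snippet; makes no real zk calls.
bool RegisterNameCheck(const FakeZk& zk, const std::string& sname,
                       const std::string& real_endpoint) {
    assert(!sname.empty());
    // Collect every name already visible through the leader lock or the nodes path.
    std::vector<std::string> sname_vec = zk.leader_values;
    sname_vec.insert(sname_vec.end(), zk.node_names.begin(), zk.node_names.end());
    if (std::find(sname_vec.begin(), sname_vec.end(), sname) != sname_vec.end()) {
        auto it = zk.names.find(sname);
        if (it != zk.names.end() && it->second == real_endpoint) {
            return true;  // name already registered with the same real endpoint
        }
        // A missing name node ends up here too, which is the reported bug.
        return false;  // logged as "server name:<sname> duplicate"
    }
    return true;  // assumption: the real code continues with registration here
}
```

With an empty FakeZk the check passes. Once leader_values contains "ns1" (the asynchronous Init() has taken the leader lock) while names is still empty, the same call returns false, which matches the "server name:ns1 duplicate" warning in the log.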

I think a better way is to check whether the node exists under names_root_path_ in zk instead of just getting its value. With a plain get, we can't tell whether the node doesn't exist or the get failed (zk failure).
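
One way to read this suggestion is to make the lookup return three states instead of a bool, so that "not found" and "zk failure" lead to different decisions. The sketch below is only an illustration under that assumption: Lookup, LookupResult, LookupName, Decision and DecideRegistration are hypothetical names, the zk_healthy flag simulates a zk failure, and this is not necessarily how the linked fix was implemented. The errno -101 in the logs above corresponds to ZNONODE in the ZooKeeper C client, which is the kind of result a real lookup would probably map to kNotFound.

```cpp
#include <cassert>
#include <map>
#include <string>

// Three outcomes, so that "node absent" is not confused with "zk call failed".
enum class Lookup { kFound, kNotFound, kError };

struct LookupResult {
    Lookup status;
    std::string value;  // only meaningful when status == kFound
};

// Hypothetical lookup over an in-memory map; zk_healthy == false simulates a zk failure.
LookupResult LookupName(const std::map<std::string, std::string>& names,
                        const std::string& sname, bool zk_healthy) {
    if (!zk_healthy) return {Lookup::kError, ""};
    auto it = names.find(sname);
    if (it == names.end()) return {Lookup::kNotFound, ""};
    return {Lookup::kFound, it->second};
}

enum class Decision { kAlreadyRegistered, kRegisterNow, kDuplicate, kRetryLater };

// Maps the lookup outcome to a registration decision.
Decision DecideRegistration(const LookupResult& result, const std::string& real_endpoint) {
    assert(!real_endpoint.empty());
    switch (result.status) {
        case Lookup::kError:
            return Decision::kRetryLater;  // cannot tell anything; do not call it a duplicate
        case Lookup::kNotFound:
            return Decision::kRegisterNow;  // name node was never created, so it is free
        case Lookup::kFound:
            return result.value == real_endpoint ? Decision::kAlreadyRegistered
                                                 : Decision::kDuplicate;
    }
    return Decision::kRetryLater;  // not reached; keeps compilers quiet
}
```

In the scenario from the logs, the name node for ns1 does not exist yet, so this sketch would return kRegisterNow rather than reporting a duplicate.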

To inspect the zk data, use zkCli.sh -server xx:xx. Cheatsheet: https://zookeeper.apache.org/doc/r3.6.0/zookeeperCLI.html
