[BUG] yb-master node gets stuck in CrashLoopBackOff when tls is reconfigured #43

Open
srteam2020 opened this issue Nov 7, 2021 · 0 comments

Behavior

When spec.tls.enabled is reconfigured, one of the Yugabyte master pods sometimes restarts and then stays stuck in CrashLoopBackOff indefinitely:

kubectl get pods:
NAME                                 READY   STATUS             RESTARTS   AGE
yb-master-0                          1/1     Running            0          3m58s
yb-master-1                          1/1     Running            0          3m58s
yb-master-2                          0/1     CrashLoopBackOff   3          86s
yb-tserver-0                         1/1     Running            0          74s
yb-tserver-1                         1/1     Running            0          77s
yb-tserver-2                         1/1     Running            0          86s
yugabyte-operator-744c956b6d-m56j2   1/1     Running            0          4m8s
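
To spot the stuck pod without reading the table by eye, the output above can be filtered by its STATUS column. This is a minimal sketch, not part of the operator: `crashlooping_pods` is a hypothetical helper, and it assumes the default whitespace-separated `kubectl get pods` layout shown above.

```python
def crashlooping_pods(kubectl_output: str) -> list[str]:
    """Return names of pods whose STATUS column reads CrashLoopBackOff."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    stuck = []
    for ln in lines[1:]:
        cols = ln.split()
        # NAME, READY and STATUS contain no spaces, so a plain split is enough here.
        if len(cols) > status_idx and cols[status_idx] == "CrashLoopBackOff":
            stuck.append(cols[name_idx])
    return stuck


# Sample copied from the report above.
SAMPLE = """\
NAME                                 READY   STATUS             RESTARTS   AGE
yb-master-0                          1/1     Running            0          3m58s
yb-master-1                          1/1     Running            0          3m58s
yb-master-2                          0/1     CrashLoopBackOff   3          86s
yb-tserver-0                         1/1     Running            0          74s
yb-tserver-1                         1/1     Running            0          77s
yb-tserver-2                         1/1     Running            0          86s
yugabyte-operator-744c956b6d-m56j2   1/1     Running            0          4m8s
"""

if __name__ == "__main__":
    print(crashlooping_pods(SAMPLE))
```

On the sample above this reports only yb-master-2.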

The log messages from yb-master-2:

I1106 19:55:28.597328    29 catalog_manager.cc:1414] Did not find previous SysCatalogTable data on disk. Not found (yb/util/env_posix.cc:1514): Unable to load consensus metadata for tablet 00000000000000000000000000000000: /mnt/data0/yb-data/master/consensus-meta/00000000000000000000000000000000: No such file or directory (system error 2)
I1106 19:55:28.597637    29 sys_catalog.cc:288] Creating new SysCatalogTable data
E1106 19:55:28.597716    29 master.cc:276] [email protected]:7100: Unable to init master catalog manager: Already present (yb/tablet/tablet_metadata.cc:264): Unable to initialize catalog manager: Failed to initialize sys tables async: Encountered errors during system catalog initialization:
        Error on Load: Not found (yb/util/env_posix.cc:1514): Unable to load consensus metadata for tablet 00000000000000000000000000000000: /mnt/data0/yb-data/master/consensus-meta/00000000000000000000000000000000: No such file or directory (system error 2)
        Error on CreateNew: : Raft group already exists: 00000000000000000000000000000000
F1106 19:55:28.597748     1 master_main.cc:131] Already present (yb/tablet/tablet_metadata.cc:264): Unable to initialize catalog manager: Failed to initialize sys tables async: Encountered errors during system catalog initialization:
        Error on Load: Not found (yb/util/env_posix.cc:1514): Unable to load consensus metadata for tablet 00000000000000000000000000000000: /mnt/data0/yb-data/master/consensus-meta/00000000000000000000000000000000: No such file or directory (system error 2)
        Error on CreateNew: : Raft group already exists: 00000000000000000000000000000000
Fatal failure details written to /mnt/data0/yb-data/master/logs/yb-master.FATAL.details.2021-11-06T19_55_28.pid1.txt
F20211106 19:55:28 ../../src/yb/master/master_main.cc:131] Already present (yb/tablet/tablet_metadata.cc:264): Unable to initialize catalog manager: Failed to initialize sys tables async: Encountered errors during system catalog initialization:
        Error on Load: Not found (yb/util/env_posix.cc:1514): Unable to load consensus metadata for tablet 00000000000000000000000000000000: /mnt/data0/yb-data/master/consensus-meta/00000000000000000000000000000000: No such file or directory (system error 2)
        Error on CreateNew: : Raft group already exists: 00000000000000000000000000000000
    @     0x7f22c21a5a3c  yb::LogFatalHandlerSink::send()
    @     0x7f22c137e866  google::LogMessage::SendToLog()
    @     0x7f22c137be3a  google::LogMessage::Flush()
    @     0x7f22c137f529  google::LogMessageFatal::~LogMessageFatal()
    @           0x4099ac  yb::master::MasterMain()
    @     0x7f22bcfd0825  __libc_start_main
    @           0x4089c9  _start
    @              (nil)  (unknown)

*** Check failure stack trace: ***
    @     0x7f22c21a3e21  yb::(anonymous namespace)::DumpStackTraceAndExit()
    @     0x7f22c137c3dd  google::LogMessage::Fail()
    @     0x7f22c137e906  google::LogMessage::SendToLog()
    @     0x7f22c137be3a  google::LogMessage::Flush()
    @     0x7f22c137f529  google::LogMessageFatal::~LogMessageFatal()
    @           0x4099ac  yb::master::MasterMain()
    @     0x7f22bcfd0825  __libc_start_main
    @           0x4089c9  _start
    @              (nil)  (unknown)
*** Aborted at 1636228528 (unix time) try "date -d @1636228528" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 1 (TID 0x7f22cd6c52c0) from PID 0; stack trace: ***
    @     0x7f22bd961ba0 (unknown)
    @     0x7f22bcfe45a6 __GI_abort
    @     0x7f22c21a3e74  yb::(anonymous namespace)::DumpStackTraceAndExit()
    @     0x7f22c137c3dc  google::LogMessage::Fail()
    @     0x7f22c137e905  google::LogMessage::SendToLog()
    @     0x7f22c137be39  google::LogMessage::Flush()
    @     0x7f22c137f528  google::LogMessageFatal::~LogMessageFatal()
    @           0x4099ab  yb::master::MasterMain()
    @     0x7f22bcfd0825 __libc_start_main
    @           0x4089c9 _start
    @                0x0 (unknown)

It seems that yb-master-2 failed to find the previous SysCatalogTable data on disk and tried to create it from scratch. Creation then failed because a Raft group with the same tablet ID already exists, and the process aborted.
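
The failure signature in the log is a pair of errors for the same tablet ID: loading fails with "Not found", and creating fails with "Raft group already exists". The sketch below checks a log excerpt for that pair. `conflicting_tablets` and the two regular expressions are hypothetical, written only against the message wording quoted in this report; other YugabyteDB versions may phrase these messages differently.

```python
import re

# Patterns based only on the log lines quoted in this issue (an assumption,
# not a stable log format).
LOAD_NOT_FOUND = re.compile(
    r"Error on Load: Not found .*Unable to load consensus metadata for tablet (\w+)"
)
CREATE_EXISTS = re.compile(
    r"Error on CreateNew: .*Raft group already exists: (\w+)"
)


def conflicting_tablets(log_text: str) -> set[str]:
    """Return tablet IDs that both fail to load and fail to be created."""
    missing = set(LOAD_NOT_FOUND.findall(log_text))
    existing = set(CREATE_EXISTS.findall(log_text))
    return missing & existing


# Two lines copied from the yb-master-2 log above.
SAMPLE_LOG = """\
        Error on Load: Not found (yb/util/env_posix.cc:1514): Unable to load consensus metadata for tablet 00000000000000000000000000000000: /mnt/data0/yb-data/master/consensus-meta/00000000000000000000000000000000: No such file or directory (system error 2)
        Error on CreateNew: : Raft group already exists: 00000000000000000000000000000000
"""

if __name__ == "__main__":
    print(conflicting_tablets(SAMPLE_LOG))
```

A non-empty result means the master can neither load nor recreate the tablet, which matches the restart loop described here.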

The yugabyte-operator log shows a reconciliation error:

2021-11-07 00:14:13.638304 I | yugabyte-k8s-operator: running command 'yb-admin get_universe_config' in YB-Master pod: yb-master-0, command: ["bash" "-c" "/home/yugabyte/bin/yb-admin --master_addresses yb-master-0.yb-masters.default.svc.cluster.local:7100,yb-master-1.yb-masters.default.svc.cluster.local:7100,yb-master-2.yb-masters.default.svc.cluster.local:7100 get_universe_config"]
{"level":"error","ts":1636244084.8385901,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"ybcluster-controller","request":"default/example-ybcluster","error":"command terminated with exit code 137","stacktrace":"..."}