etcd pod crashloop - wal: max entry size limit exceeded #15090

Closed
vikasbudhwat opened this issue Jan 12, 2023 · 1 comment

vikasbudhwat commented Jan 12, 2023

What happened?

We have set up an etcd cluster of 3 pods, and only 1 of them is in a crash loop.

We restarted the pod multiple times and the same error appears each time. Surprisingly, the other 2 pods are running fine.
I am not sure whether a crash corrupted a memory segment; I am not aware of any crash having occurred.

We are running the same setup on another k8s EKS cluster, and the issue does not appear there.

What did you expect to happen?

The pod comes up.

How can we reproduce it (as minimally and precisely as possible)?

Not sure.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

Pod restart logs
`{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP=tcp://172.20.231.79:2379"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT=2379"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PORT=2379"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_PROTO=tcp"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT=tcp://172.20.231.79:2379"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_HOST=172.20.231.79"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_SERVICE_PORT_ETCD_CLIENT=2379"}
{"level":"warn","ts":"2023-01-12T12:20:53.494Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLIENT_PORT_2379_TCP_ADDR=172.20.231.79"}
{"level":"info","ts":"2023-01-12T12:20:53.494Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","etcd-2","--listen-peer-urls","http://0.0.0.0:2380","--listen-client-urls","http://0.0.0.0:2379","--advertise-client-urls","http://etcd-2.etcd:2379","--initial-advertise-peer-urls","http://etcd-2:2380","--initial-cluster-token","etcd-cluster-1","--initial-cluster","etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380","--initial-cluster-state","new","--data-dir","/var/run/etcd/default.etcd"]}
{"level":"info","ts":"2023-01-12T12:20:53.494Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/run/etcd/default.etcd","dir-type":"member"}
{"level":"info","ts":"2023-01-12T12:20:53.494Z","caller":"embed/etcd.go:124","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2023-01-12T12:20:53.494Z","caller":"embed/etcd.go:132","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2023-01-12T12:20:53.495Z","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.5","git-sha":"19002cfc6","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":32,"max-cpu-available":32,"member-initialized":true,"name":"etcd-2","data-dir":"/var/run/etcd/default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/run/etcd/default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://etcd-2:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://etcd-2.etcd:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"warn","ts":1673526053.4951336,"caller":"fileutil/fileutil.go:57","msg":"check file permission","error":"directory \"/var/run/etcd/default.etcd\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"warn","ts":1673526053.554111,"caller":"fileutil/fileutil.go:57","msg":"check file permission","error":"directory \"/var/run/etcd/default.etcd/member/snap\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"info","ts":"2023-01-12T12:20:55.765Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/run/etcd/default.etcd/member/snap/db","took":"2.211541434s"}
{"level":"warn","ts":"2023-01-12T12:20:55.766Z","caller":"wal/util.go:90","msg":"ignored file in WAL directory","path":"0000000000000003-00000000000f614d.wal.broken"}
{"level":"info","ts":"2023-01-12T12:20:58.765Z","caller":"embed/etcd.go:371","msg":"closing etcd server","name":"etcd-2","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-2:2380"],"advertise-client-urls":["http://etcd-2.etcd:2379"]}
{"level":"info","ts":"2023-01-12T12:20:58.765Z","caller":"embed/etcd.go:373","msg":"closed etcd server","name":"etcd-2","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-2:2380"],"advertise-client-urls":["http://etcd-2.etcd:2379"]}
{"level":"fatal","ts":"2023-01-12T12:20:58.765Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"wal: max entry size limit exceeded, recBytes: 143, fileSize(21032960) - offset(21032896) - padBytes(1) = entryLimit(63)","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/usr/local/google/home/siarkowicz/.gvm/gos/go1.16.15/src/runtime/proc.go:225"}`

ahrtr commented Jan 12, 2023

It's a known issue, and resolved in #15069. The fix will be included in etcd 3.5.7 (to be released soon).
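
For anyone decoding the fatal log line in the report: the numbers in the message describe a bounds check on the last WAL record. The record claims 143 bytes, but only 63 bytes remain in the file after the current offset and padding. The sketch below reproduces only the arithmetic printed in that message. The function names `entry_limit` and `check_record` are hypothetical, and this is not etcd's decoder code.

```shell
#!/usr/bin/env bash
# Sketch of the arithmetic shown in the "max entry size limit exceeded" message.
# entry_limit and check_record are illustrative names, not part of etcd.

# entry_limit <file_size> <offset> <pad_bytes>
# Bytes left in the WAL file for the record being read.
entry_limit() {
  echo $(( $1 - $2 - $3 ))
}

# check_record <rec_bytes> <file_size> <offset> <pad_bytes>
# Reports whether the record length fits in the remaining bytes.
check_record() {
  limit=$(entry_limit "$2" "$3" "$4")
  if [ "$1" -gt "$limit" ]; then
    echo "exceeded: recBytes=$1 > entryLimit=$limit"
  else
    echo "ok: recBytes=$1 <= entryLimit=$limit"
  fi
}
```

With the values from the log, `check_record 143 21032960 21032896 1` reports that the record exceeds the limit, which is the condition etcd refuses to start on.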

Please follow runtime-configuration/#replace-a-failed-machine to work around this issue for now.
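
A rough sketch of what that replacement can look like with `etcdctl`, assuming the two healthy members are reachable at the client URLs implied by `--initial-cluster` in the log. The `ENDPOINTS` default, the `run` helper, and the `replace_member` function are illustrative names, and the member ID has to be read from `etcdctl member list` first. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: remove the broken member and re-add it, following the
# "replace a failed machine" procedure. Names below are assumptions.
set -eu

# Healthy members to talk to (assumed from --initial-cluster in the log).
ENDPOINTS="${ENDPOINTS:-http://etcd-0.etcd:2379,http://etcd-1.etcd:2379}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_member <member-id> <name> <peer-url>
# The member ID comes from: etcdctl --endpoints="$ENDPOINTS" member list -w table
replace_member() {
  member_id="$1"
  name="$2"
  peer_url="$3"
  # 1. Remove the failed member from the cluster.
  run etcdctl --endpoints="$ENDPOINTS" member remove "$member_id"
  # 2. Register it again as a new member.
  run etcdctl --endpoints="$ENDPOINTS" member add "$name" --peer-urls="$peer_url"
  # 3. Not scripted here: empty the failed pod's data directory and start
  #    etcd with --initial-cluster-state existing so it joins and resyncs.
}
```

Usage would be along the lines of `replace_member <MEMBER_ID> etcd-2 http://etcd-2.etcd:2380`. Note that the pod in the report starts with `--initial-cluster-state new`, so that flag would need to change for the rejoining member.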
