Describe the bug
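In a k3s + cri-dockerd environment, ilogtail continuously emits a large volume of error logs after startup, and docker discover fails.
Checking the code shows that the following logic is involved:
When inspectOneContainer decides that a container has timed out, it returns the timeout as an err, but for fetchAll a timeout should not matter. In fetchAll this err is also overwritten on every iteration of the for loop, so the value returned is the last err. If the last container happens to be in the Exited state and its timeout check succeeds, fetchAll returns a timeout err, which in turn disables docker discover.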
iLogtail Running Environment
2024-11-27 05:30:01 [INF] [container_discover_controller.go:197] [Init] input:param docker discover:true cri discover:false static discover:false
2024-11-27 05:30:01 [INF] [container_discover_controller.go:223] [Init] init docker center, fetch all seconds:5m0s
2024-11-27 05:30:01 [INF] [container_discover_controller.go:233] [Init] init docker center, fecth all success timeout:1h40m0s
2024-11-27 05:30:01 [INF] [container_discover_controller.go:243] [Init] init docker center, client request timeout:30s
2024-11-27 05:30:01 [INF] [container_discover_controller.go:254] [Init] init docker center, max fetchOne count per second:200
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 2bd2284a387ff7af09f15a7134979f3f278c2e53ae73fd8c3f31bc075686dc9b error found:inspect time out container 2bd2284a387ff7af09f15a7134979f3f278c2e53ae73fd8c3f31bc075686dc9b
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 2a38c17886d91034f05f0354be7d0dfde8c6be4b7c3238987fa8d8f9a3468cbd error found:inspect time out container 2a38c17886d91034f05f0354be7d0dfde8c6be4b7c3238987fa8d8f9a3468cbd
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 2c97e747d6f43cbf39646af799e273466d81fc57ad0e88ae4d859af9b1a2f139 error found:inspect time out container 2c97e747d6f43cbf39646af799e273466d81fc57ad0e88ae4d859af9b1a2f139
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container b7eb5d7f850b2147b19a89a57be14d7e71b4a3e076913cdb334c35b7d7b082c5 error found:inspect time out container b7eb5d7f850b2147b19a89a57be14d7e71b4a3e076913cdb334c35b7d7b082c5
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container c2fbb8a3ba1c2e48ccc74eb3fd36f73659e4705dec0ce715037731bc8c4cf189 error found:inspect time out container c2fbb8a3ba1c2e48ccc74eb3fd36f73659e4705dec0ce715037731bc8c4cf189
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container bbee7fa2be2f69b7db13b1dbc6a98ee1b113f2f2e33d80d7cf3df4201c47cafc error found:inspect time out container bbee7fa2be2f69b7db13b1dbc6a98ee1b113f2f2e33d80d7cf3df4201c47cafc
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 6dc6e3a8eec21764666a1c2efcd37cdffdc7279025a248b66224b7bfc1e5ecca error found:inspect time out container 6dc6e3a8eec21764666a1c2efcd37cdffdc7279025a248b66224b7bfc1e5ecca
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 51a89d29fc21a0065bdb9b221862f44e3a2c364acfce2930580502e4414c6b87 error found:inspect time out container 51a89d29fc21a0065bdb9b221862f44e3a2c364acfce2930580502e4414c6b87
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 811f18cf987ab0e3055bf612197b52233c5a229b7212225f60142292b93c12d3 error found:inspect time out container 811f18cf987ab0e3055bf612197b52233c5a229b7212225f60142292b93c12d3
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 47e58c1a0992c3bf6d07d60eeda4dfec362bc6bebdf25d3a4e2ac5923d4e027c error found:inspect time out container 47e58c1a0992c3bf6d07d60eeda4dfec362bc6bebdf25d3a4e2ac5923d4e027c
2024-11-27 05:30:01 [WRN] [docker_center.go:672] [setLastError] AlarmType:DOCKER_CENTER_ALARM message:inspect time out container 42f4697f5c8c2de9fa2ad9166e8a6b38741e87d75474bdf8c3a4b5842ebdca53 error found:inspect time out container 42f4697f5c8c2de9fa2ad9166e8a6b38741e87d75474bdf8c3a4b5842ebdca53
2024-11-27 05:30:01 [ERR] [container_discover_controller.go:260] [Init] AlarmType:DOCKER_CENTER_ALARM fetch docker containers error, close docker discover, will retry
2024-11-27 05:30:01 [INF] [container_discover_controller.go:270] [Init] final:param docker discover:false cri discover:false static discover:false
fix: Handle timeout for exited containers in docker discover process
ecbfa1a
fix alibaba#1934