bug: 2.13.1 /v1/healthcheck cannot get upstream health status #7141

Closed
heming79 opened this issue May 26, 2022 · 11 comments · Fixed by #7184
Labels
checking check first if this issue occurred

Comments

@heming79

Current Behavior

curl 127.0.0.1:9090/v1/healthcheck
{}

Expected Behavior

https://github.com/apache/apisix/blob/master/docs/en/latest/control-api.md#get-v1healthcheck
[
  {
    "healthy_nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      }
    ],
    "name": "upstream#/upstreams/1",
    "nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      },
      {
        "host": "127.0.0.2",
        "port": 1988,
        "priority": 0,
        "weight": 1
      }
    ],
    "src_id": "1",
    "src_type": "upstreams"
  },
  {
    "healthy_nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      }
    ],
    "name": "upstream#/routes/1",
    "nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      },
      {
        "host": "127.0.0.1",
        "port": 1988,
        "priority": 0,
        "weight": 1
      }
    ],
    "src_id": "1",
    "src_type": "routes"
  }
]

Error Logs

No response

Steps to Reproduce

1. Start APISIX
2. Add an upstream node

Environment

  • APISIX version (run apisix version): 2.13.1
  • Operating system (run uname -a): centos7.9
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.19.9.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.4.0
  • APISIX Dashboard version, if relevant: 2.8.0
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version): 3.8.0
@tzssangglass
Member

Is this a steadily recurring problem, or an occasional one?

@spacewander
Member

Is it because there is no request sent to the upstream?

As https://github.com/apache/apisix/blob/master/docs/en/latest/health-check.md shows,

We only start the health check when the upstream is hit by a request. There won't be any health check if an upstream is configured but isn't in use.

@heming79
Author

Sometimes I can see values:
[
  {
    "nodes":[
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.93",
        "port":20452
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20450
      }
    ],
    "name":"upstream#/apisix/upstreams/409106632274886134",
    "src_type":"upstreams",
    "healthy_nodes":[
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20450
      }
    ],
    "src_id":"409106632274886134"
  },
  {
    "nodes":[
      {
        "priority":0,
        "weight":2,
        "host":"192.168.6.93",
        "port":20731
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.54",
        "port":20730
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20730
      }
    ],
    "name":"upstream#/apisix/upstreams/409108789875195382",
    "src_type":"upstreams",
    "healthy_nodes":[
      {
        "priority":0,
        "weight":2,
        "host":"192.168.6.93",
        "port":20731
      }
    ],
    "src_id":"409108789875195382"
  }
]

Sometimes it is empty.

/usr/local/apisix/logs # more 2022-05-26_07-00-00__error.log
2022/05/26 06:01:40 [error] 289#289: 27754543 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:31:27 [error] 297#297: 28435830 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:37:37 [error] 294#294: 28576762 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 07:00:00 [warn] 310#310: *29091039 [lua] log-rotate.lua:266: send USR1 signal to master process [1] for reopening log file, context: ngx.timer

Right now it is empty:
/usr/local/apisix/logs # curl 127.0.0.1:9090/v1/healthcheck
{}

@heming79
Author

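Also, could unhealthy_nodes be shown together in one place? What I care about most is that unhealthy_nodes trigger an alert promptly, so that the monitoring engineers are notified in time to fix the unhealthy nodes and restore the service as quickly as possible.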
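
The /v1/healthcheck output quoted earlier in this thread lists both `nodes` and `healthy_nodes` for each checker, so an unhealthy list can be derived on the client side as the difference between the two. The following is only a minimal sketch under that assumption: `unhealthy_nodes` is a hypothetical helper name, not part of APISIX, and the sample data is copied from the output above.

```python
import json


def unhealthy_nodes(healthcheck_body):
    """Return {checker name: [nodes that are absent from healthy_nodes]}.

    Hypothetical helper. Assumes the body is the JSON text returned by
    GET /v1/healthcheck, i.e. a list of entries that carry "name",
    "nodes" and "healthy_nodes" as in the output quoted in this thread.
    """
    entries = json.loads(healthcheck_body)
    # An empty result (the "{}" seen in this issue) carries no checker data.
    if not entries:
        return {}
    result = {}
    for entry in entries:
        # Identify a node by its (host, port) pair.
        healthy = {(n["host"], n["port"]) for n in entry.get("healthy_nodes", [])}
        bad = [n for n in entry.get("nodes", [])
               if (n["host"], n["port"]) not in healthy]
        if bad:
            result[entry["name"]] = bad
    return result


# Sample copied from the first entry of the response shown in this thread.
sample = json.dumps([
    {
        "nodes": [
            {"priority": 0, "weight": 1, "host": "192.168.6.93", "port": 20452},
            {"priority": 0, "weight": 1, "host": "192.168.6.91", "port": 20450},
        ],
        "name": "upstream#/apisix/upstreams/409106632274886134",
        "src_type": "upstreams",
        "healthy_nodes": [
            {"priority": 0, "weight": 1, "host": "192.168.6.91", "port": 20450},
        ],
        "src_id": "409106632274886134",
    }
])

print(unhealthy_nodes(sample))
```

A monitoring job could feed the real response body into such a helper and alert when the result is non-empty. Note that an empty result does not prove every node is healthy: as this issue shows, the endpoint can also return `{}` when no health check data is available.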

@heming79
Author

In /usr/local/apisix/apisix/control/v1.lua, I added this at line 110:
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))

2022/05/26 08:29:42 [error] 630#630: *30924328 [lua] v1.lua:110: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"

@heming79
Author

local upstream_mod = require("apisix.upstream")
local get_upstreams = upstream_mod.upstreams

routes is fine and stable, but upstreams is often empty:
local routes = get_routes()
core.log.error("routes: ", core.json.delay_encode(routes))
local upstreams = get_upstreams()
core.log.error("upstreams: ", core.json.delay_encode(upstreams))
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))

2022/05/26 08:49:29 [error] 803#803: 31319311 [lua] v1.lua:106: handler(): routes: [{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123071262202320","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386086,"create_time":1653386086,"id":"409123293879081424","uri":"/"},"createdIndex":26,"update_count":0,"has_domain":false,"orig_modifiedIndex":26,"key":"/apisix/routes/409123293879081424","modifiedIndex":26},{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123196135021008","host":"service-bmlt.wanzhuanmohe.cn","name":"service-bmlt.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386133,"create_time":1653386133,"id":"409123373017209296","uri":"/"},"createdIndex":28,"update_count":0,"has_domain":false,"orig_modifiedIndex":28,"key":"/apisix/routes/409123373017209296","modifiedIndex":28},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653524709,"host":"service-yyzyh.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"service-yyzyh.wanzhuanmohe.cn/app","methods":["GET","POST"],"update_time":1653525007,"uri":"/app/","id":"409355866207164880","desc":"游戏接口域名/app 转发到广告接口"},"createdIndex":2015,"update_count":0,"has_domain":false,"orig_modifiedIndex":2018,"key":"/apisix/routes/409355866207164880","modifiedIndex":2018},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653536211,"host":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653536211,"uri":"/","id":"409375163394562512","desc":"cbd 游戏广告接口"},"createdIndex":2068,"update_count":0,"has_domain":false,"orig_modifiedIndex":2068,"key":"/apisix/routes/409375163394562512","modifiedIndex":2068}], client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:112: handler(): upstreams: , client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:113: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"

@tokers
Contributor

tokers commented May 26, 2022

Only upstreams that have received requests will show their status.

@heming79
Author

heming79 commented May 26, 2022

/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams

<title>**500 Internal Server Error**</title> <style> body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style>

An error occurred.

You can report issue to APISIX

Faithfully yours, APISIX.

Is it a parameter problem?
2022/05/26 08:57:59 [error] 944#944: *31488271 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/control/v1.lua:226: bad argument #2 to 'error' (expected table to have __tostring metamethod)
stack traceback:
coroutine 0:
[C]: in function 'error'
/usr/local/apisix/apisix/control/v1.lua:226: in function 'handler'
/usr/local/apisix/apisix/control/router.lua:79: in function 'handler'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:722: in function 'fn'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:590: in function 'match_route_opts'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:612: in function '_match_from_routes'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:663: in function 'match_route'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:709: in function 'match'
/usr/local/apisix/apisix/init.lua:795: in function 'http_control'
content_by_lua(nginx.conf:158):2: in main chunk, client: 127.0.0.1, server: , request: "GET /v1/upstreams HTTP/1.1", host: "127.0.0.1:9090"

/usr/local/apisix # curl 127.0.0.1:9090/v1/routes
[{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123293879081424","modifiedIndex":26,"orig_modifiedIndex":26,"value":{"status":1,"update_time":1653386086,"methods":["GET","POST"],"priority":0,"upstream_id":"409123071262202320","id":"409123293879081424","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","uri":"/*","create_time":1653386086},"clean_handlers":{},"createdIndex":26},{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123373017209296","modifiedIndex":28,"orig_modifiedIndex":28,"value":{"status":1,"update_time":1653386133,"methods":

/v1/upstreams sometimes returns 500 and sometimes can be queried.

@heming79
Author

Another cluster:
/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams
[{"has_domain":false,"modifiedIndex":1655,"createdIndex":63,"clean_handlers":{},"key":"/apisix/upstreams/409106632274886134","value":{"type":"roundrobin","create_time":1653376154,"retries":3,"retry_timeout":2,"id":"409106632274886134","update_time":1653468030,"name":"service-yyzyh","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd养鱼专业户游戏接口","checks":{"passive":{"unhealthy":{"http_statuses":[429,500,503],"tcp_failures":0,"timeouts":0,"http_failures":0},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":0},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"timeout":1,"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"port":80}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.93","weight":1,"port":20452},{"host":"192.168.6.91","weight":1,"port":20450}],"pass_host":"pass","hash_on":"vars"}},{"has_domain":false,"modifiedIndex":1722,"createdIndex":88,"clean_handlers":{},"key":"/apisix/upstreams/409108789875195382","value":{"type":"roundrobin","create_time":1653377440,"retries":4,"retry_timeout":2,"id":"409108789875195382","update_time":1653556657,"name":"service-bmlt","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd 
百亩良田游戏api接口","checks":{"passive":{"unhealthy":{"http_failures":2,"tcp_failures":2,"timeouts":6,"http_statuses":[429,500,503,502,504,404]},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":5},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"timeout":1}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.54","weight":1,"port":20730},{"host":"192.168.6.91","weight":1,"port":20730},{"host":"192.168.6.93","weight":2,"port":20731},{"host":"192.168.6.51","weight":1,"port":20731}],"pass_host":"pass","hash_on":"vars"}}]
/usr/local/apisix # curl 127.0.0.1:9090/v1/healthcheck
{}

@spacewander added the "checking" label (check first if this issue occurred) on May 26, 2022
@soulbird
Contributor

In the case of multiple worker processes, if the process hit by the request and the process hit by the API call are not the same, the health check information cannot be obtained.

@tzssangglass
Member

In the case of multiple worker processes, if the process hit by the request and the process hit by the API call are not the same, the health check information cannot be obtained.

That's the truth.

But I don't think it's a problem. In actual use, it only happens for a very short period of time when APISIX starts up. It's just that this phenomenon is amplified in testing.

As the requests increase, each worker processes the request and returns healthcheck data.
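
The behaviour described in the last few comments can be illustrated with a toy model. This is a sketch under stated assumptions, not APISIX code: each worker keeps its own checker table, a checker is created only when that worker proxies a request to the upstream, and a control API call reports only what the worker that accepts it knows. The `Worker` class, the `simulate` function, and the random dispatch are hypothetical stand-ins for nginx worker processes and for the way connections are distributed among them.

```python
import random


class Worker:
    """Toy stand-in for one worker process (hypothetical, not APISIX code)."""

    def __init__(self):
        # upstream id -> checker state, private to this worker
        self.checkers = {}

    def proxy_request(self, upstream_id):
        # Assumption taken from the thread: the health checker is created
        # lazily, the first time this worker sends traffic to the upstream.
        self.checkers.setdefault(upstream_id, {"healthy_nodes": []})

    def control_healthcheck(self):
        # The control API only sees this worker's own checkers.
        return sorted(self.checkers)


def simulate(num_workers, num_requests, upstream_id="1", seed=0):
    """Return the fraction of workers whose healthcheck result is empty."""
    rng = random.Random(seed)
    workers = [Worker() for _ in range(num_workers)]
    for _ in range(num_requests):
        # Each proxied request lands on one randomly chosen worker.
        rng.choice(workers).proxy_request(upstream_id)
    empty = sum(1 for w in workers if not w.control_healthcheck())
    return empty / num_workers


for n in (0, 1, 8, 200):
    print(n, simulate(num_workers=8, num_requests=n))
```

In this model, with no traffic every worker returns an empty result, after a single request all but one still do, and the empty fraction shrinks as more requests arrive. That matches the observation that the empty `{}` response shows up mostly right after startup or under light test traffic.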
