readiness and liveness endpoints for service health monitoring #4443

Closed
gilescope opened this issue May 13, 2024 · 1 comment · Fixed by #4802
@gilescope
Contributor

Re-awakening: paritytech/substrate#1017

Stale PR here: paritytech/substrate#14314

What are the current thoughts on having a readiness endpoint? Without an 'official' endpoint, people will create ad-hoc checks rather than collaborating on one good check.

(If this is already implemented we can close this, but nothing came up on the search)

@bkchr
Member

bkchr commented May 13, 2024

CC @niklasad1

@niklasad1 niklasad1 self-assigned this May 13, 2024
github-merge-queue bot pushed a commit that referenced this issue Jun 19, 2024
Previous attempt paritytech/substrate#14314

Close #4443 

Ideally, we should move /health and /health/readiness to the prometheus
server, but it was quite easy to implement on the RPC server,
and that RPC server already exposes /health.

Manual tests on a polkadot node syncing:

```bash
➜ polkadot-sdk (na-fix-4443) ✗ curl -v localhost:9944/health
* Host localhost:9944 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:9944...
* connect to ::1 port 9944 from ::1 port 55024 failed: Connection refused
*   Trying 127.0.0.1:9944...
* Connected to localhost (127.0.0.1) port 9944
> GET /health HTTP/1.1
> Host: localhost:9944
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json; charset=utf-8
< content-length: 53
< date: Fri, 14 Jun 2024 16:12:23 GMT
<
* Connection #0 to host localhost left intact
{"peers":0,"isSyncing":false,"shouldHavePeers":false}%
➜ polkadot-sdk (na-fix-4443) ✗ curl -v localhost:9944/health/readiness
* Host localhost:9944 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:9944...
* connect to ::1 port 9944 from ::1 port 54328 failed: Connection refused
*   Trying 127.0.0.1:9944...
* Connected to localhost (127.0.0.1) port 9944
> GET /health/readiness HTTP/1.1
> Host: localhost:9944
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-type: application/json; charset=utf-8
< content-length: 0
< date: Fri, 14 Jun 2024 16:12:36 GMT
<
* Connection #0 to host localhost left intact
```
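
For anyone wiring these endpoints into a monitoring script, here is a minimal sketch in Python of how a client might consume them. It assumes only what the curl output above shows: `/health` answers 200 with a JSON body, and `/health/readiness` answers with a non-2xx status (500 here) and an empty body when the node is not ready. The helper names `probe`, `is_ready`, and `health_summary` are hypothetical, and treating any 2xx as "ready" is an assumption rather than documented behaviour.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 2.0) -> Tuple[Optional[int], str]:
    """Return (HTTP status, body); status is None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses such as the 500 above.
        return err.code, err.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None, ""


def is_ready(base_url: str) -> bool:
    """Assumption: any 2xx from /health/readiness means the node is ready."""
    status, _ = probe(base_url + "/health/readiness")
    return status is not None and 200 <= status < 300


def health_summary(base_url: str) -> Optional[dict]:
    """Parse the JSON body of /health, or return None if it is unavailable."""
    status, body = probe(base_url + "/health")
    if status != 200 or not body:
        return None
    return json.loads(body)
```

Calling `is_ready("http://localhost:9944")` against the syncing node from the manual test would be expected to return `False`, since that request got a 500.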

//cc @BulatSaif you may be interested in this..

---------

Co-authored-by: Bastian Köcher <[email protected]>
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024