extend Hasura service boot grace period from 0 to 180s #4051

freemvmt · 2024-12-06T15:25:23Z

This PR introduces a 180s grace period for the Hasura service in an attempt to deal with a boot loop during scale up.

Context

During load testing, even with relatively small numbers of users we immediately hit issues with Hasura.

When the service tries to scale up with a second task to deal with the load, the new task fails to boot to a healthy state (which the load balancer checks for before directing traffic to it), and AWS then deprovisions it (usually within 2 minutes or so of boot).
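To illustrate why a grace period might help here, below is a deliberately simplified model of a health-checked task, not a description of how ECS or the load balancer actually schedule checks. The function name `survives_boot`, the 30s check interval, the threshold of 3 consecutive failures, and the 150s boot time are all hypothetical values chosen for illustration; only the 0s and 180s grace periods come from this PR.

```python
def survives_boot(boot_seconds: int, grace_seconds: int,
                  interval: int = 30, unhealthy_threshold: int = 3) -> bool:
    """Return True if the task passes a health check before being marked unhealthy.

    Simplified model: a check runs every `interval` seconds after task start,
    fails until the container has been up for `boot_seconds`, and failures
    are ignored while the task is still inside the grace period.
    """
    consecutive_failures = 0
    t = interval  # time of the first health check after task start
    while True:
        if t >= boot_seconds:
            # The container is up, so this check passes and the task stays.
            return True
        if t > grace_seconds:
            # Outside the grace period, failed checks count against the task.
            consecutive_failures += 1
            if consecutive_failures >= unhealthy_threshold:
                return False
        t += interval


if __name__ == "__main__":
    # Hypothetical 150s boot time, compared under the old and new grace periods.
    for grace in (0, 180):
        outcome = "becomes healthy" if survives_boot(150, grace) else "is deprovisioned"
        print(f"grace={grace:>3}s: task {outcome}")
```

Under these assumed numbers, the task is deprovisioned with no grace period and becomes healthy with a 180s one. A container that needs longer than the grace period plus the failure window to boot would still be removed in this model.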

The issue seems to be with the Hasura container itself, since the HasuraProxy container reports the following in the logs for one of these failed tasks:

{"level":"error","ts":1733444599.5636377,"logger":"http.log.error","msg":"dial tcp 127.0.0.1:8080: connect: connection refused","request":{"remote_ip":"10.0.69.213","remote_port":"35948","proto":"HTTP/1.1","method":"GET","host":"10.0.1.180","uri":"/healthz","headers":{"Connection":["close"],"User-Agent":["ELB-HealthChecker/2.0"],"Accept-Encoding":["gzip, compressed"]}},"duration":0.000346896,"status":502,"err_id":"swhm20qqn","err_trace":"reverseproxy.statusError (reverseproxy.go:1299)"}

Note the 502 Bad Gateway status code, the reverseproxy.statusError (reverseproxy.go:1299) error trace (source), and the very short duration of the request.
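As a small aid for reading entries like the one above, the following sketch parses the log line with Python's standard `json` module and pulls out the fields highlighted here. The `summarize` helper and the keys of the dictionary it returns are names made up for this example; the log line itself is copied verbatim from this PR.

```python
import json

# The HasuraProxy log entry quoted in this PR, copied verbatim.
LOG_LINE = '{"level":"error","ts":1733444599.5636377,"logger":"http.log.error","msg":"dial tcp 127.0.0.1:8080: connect: connection refused","request":{"remote_ip":"10.0.69.213","remote_port":"35948","proto":"HTTP/1.1","method":"GET","host":"10.0.1.180","uri":"/healthz","headers":{"Connection":["close"],"User-Agent":["ELB-HealthChecker/2.0"],"Accept-Encoding":["gzip, compressed"]}},"duration":0.000346896,"status":502,"err_id":"swhm20qqn","err_trace":"reverseproxy.statusError (reverseproxy.go:1299)"}'


def summarize(line: str) -> dict:
    """Extract the fields that matter when diagnosing a failed health check."""
    entry = json.loads(line)
    request = entry["request"]
    return {
        "status": entry["status"],
        "uri": request["uri"],
        "user_agent": request["headers"]["User-Agent"][0],
        # The log reports duration in seconds; convert to milliseconds.
        "duration_ms": round(entry["duration"] * 1000, 3),
        # True when the proxy could not open a connection to its upstream.
        "upstream_refused": "connection refused" in entry["msg"],
        "err_trace": entry["err_trace"],
    }


summary = summarize(LOG_LINE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The sub-millisecond duration together with "connection refused" on 127.0.0.1:8080 suggests the proxy failed immediately because nothing was listening on that port yet, rather than waiting on a slow upstream.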

See Notion doc for more detail.

github-actions · 2024-12-06T15:37:00Z

Removed vultr server and associated DNS entries

extend Hasura service boot grace period from 0 to 180s

dd78d4e

freemvmt requested a review from DafyddLlyr December 6, 2024 15:29

jessicamcinchak approved these changes Dec 6, 2024


freemvmt merged commit 4fe7503 into main Dec 6, 2024
10 of 11 checks passed

freemvmt deleted the hasura-service-grace-period branch December 6, 2024 15:36

This was referenced Dec 12, 2024

boost hasura-proxy Fargate container CPU / memory (staging) #4076

Merged

extend timeout for Hasura migrations at boot #4080

Merged
