extend Hasura service boot grace period from 0 to 180s #4051

Merged 1 commit on Dec 6, 2024

freemvmt (Contributor) commented Dec 6, 2024

Related to this ticket.

This PR extends the boot grace period for the Hasura service from 0 to 180s, in an attempt to deal with a boot loop during scale-up.
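
For illustration only: assuming the grace period in question is the ECS service setting `healthCheckGracePeriodSeconds` (the PR text does not name the setting, and the actual change may live in infrastructure-as-code rather than a direct API call), the equivalent parameters for an ECS `UpdateService` request might look roughly like this. The cluster and service names are hypothetical.

```python
# Sketch of ECS UpdateService parameters expressing the change in this PR.
# "example-cluster" and "hasura" are hypothetical names, not taken from the PR.
update_service_params = {
    "cluster": "example-cluster",  # assumption
    "service": "hasura",           # assumption
    # Seconds after a task starts during which failing load balancer
    # health checks are ignored (previously 0, now 180 per this PR).
    "healthCheckGracePeriodSeconds": 180,
}

# With boto3 installed and AWS credentials configured, this could be applied as:
#   import boto3
#   boto3.client("ecs").update_service(**update_service_params)
print(update_service_params["healthCheckGracePeriodSeconds"])
```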

Context

During load testing, even with relatively small numbers of users we immediately hit issues with Hasura.

When the service tries to scale up by adding a second task to deal with the load, the new task fails to boot to a healthy state (which the load balancer checks for before directing traffic to it), and AWS then has to deprovision it (usually within 2 minutes or so of boot).
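
To make the failure mode concrete, here is a deliberately simplified model of how a grace period could change the outcome for a slow-booting task. It is a sketch under stated assumptions, not AWS's actual scheduling algorithm: the check interval, unhealthy threshold, and 90s boot time are all hypothetical values chosen for illustration.

```python
def task_survives(boot_seconds, grace_seconds, interval=30, unhealthy_threshold=2):
    """Return True if the task passes a health check before being deprovisioned.

    Simplified model (assumption): a check runs every `interval` seconds,
    fails until the task has finished booting, and failures that occur
    before `grace_seconds` have elapsed are ignored.
    """
    consecutive_failures = 0
    t = interval
    while True:
        if t >= boot_seconds:
            return True  # the task is up, so this health check passes
        if t >= grace_seconds:
            consecutive_failures += 1  # failure counts once the grace period is over
            if consecutive_failures >= unhealthy_threshold:
                return False  # the task is marked unhealthy and deprovisioned
        t += interval


# Hypothetical task that needs 90s to boot:
print(task_survives(boot_seconds=90, grace_seconds=0))    # no grace period
print(task_survives(boot_seconds=90, grace_seconds=180))  # 180s grace period
```

In this model, a task that is killed before it finishes booting is replaced by another task that hits the same problem, which is consistent with the boot loop described above.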

The issue seems to be with the Hasura container itself, since the HasuraProxy container reports the following in its logs for one of these failed tasks:

{"level":"error","ts":1733444599.5636377,"logger":"http.log.error","msg":"dial tcp 127.0.0.1:8080: connect: connection refused","request":{"remote_ip":"10.0.69.213","remote_port":"35948","proto":"HTTP/1.1","method":"GET","host":"10.0.1.180","uri":"/healthz","headers":{"Connection":["close"],"User-Agent":["ELB-HealthChecker/2.0"],"Accept-Encoding":["gzip, compressed"]}},"duration":0.000346896,"status":502,"err_id":"swhm20qqn","err_trace":"reverseproxy.statusError (reverseproxy.go:1299)"}

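Note the 502 Bad Gateway status code, the reverseproxy.statusError (reverseproxy.go:1299) error trace (source), and the very short duration of the request (about 0.35 ms). The proxy's connection to 127.0.0.1:8080 was refused almost immediately, which suggests nothing was listening on that port yet when the load balancer's health check arrived.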
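
The fields called out here can also be pulled out of the log entry programmatically. A minimal sketch using Python's standard `json` module, with the log line abbreviated to the relevant fields (the `summary` structure is just an illustrative choice):

```python
import json

# The HasuraProxy log entry quoted above, abbreviated to the fields of interest.
log_line = (
    '{"level":"error","ts":1733444599.5636377,'
    '"msg":"dial tcp 127.0.0.1:8080: connect: connection refused",'
    '"request":{"method":"GET","uri":"/healthz",'
    '"headers":{"User-Agent":["ELB-HealthChecker/2.0"]}},'
    '"duration":0.000346896,"status":502}'
)

entry = json.loads(log_line)

summary = {
    "status": entry["status"],                                 # 502 Bad Gateway
    "uri": entry["request"]["uri"],                            # health check path
    "caller": entry["request"]["headers"]["User-Agent"][0],    # who made the request
    "duration_ms": round(entry["duration"] * 1000, 3),         # seconds -> milliseconds
    "connection_refused": "connection refused" in entry["msg"],
}
print(summary)
```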

See Notion doc for more detail.

freemvmt requested a review from DafyddLlyr on December 6, 2024 15:29
freemvmt merged commit 4fe7503 into main on Dec 6, 2024
10 of 11 checks passed
freemvmt deleted the hasura-service-grace-period branch on December 6, 2024 15:36
github-actions bot commented Dec 6, 2024

Removed vultr server and associated DNS entries
