504 in the frontend facia app #10392

Closed
3 of 4 tasks
alinaboghiu opened this issue Jan 29, 2024 · 6 comments · Fixed by #10450

alinaboghiu (Member) commented Jan 29, 2024

Tasks

  1. cemms1
  2. cemms1
  3. cemms1
cemms1 (Contributor) commented Jan 31, 2024

The volume of ELB 5xx on the rendering app drops significantly when directing fronts-based traffic to the new facia-rendering app

Graph of the rendering app showing the requests vs errors and latency charts for the period before, during and after changing the app doing the fronts rendering:
[image]

cemms1 (Contributor) commented Jan 31, 2024

When we change from an ELB to an ALB, load balancer 504 responses turn into 502 responses

HTTP 504 errors on the facia app for DCR requests:
[screenshot]

HTTP 502 errors on the facia app for DCR requests:
[screenshot]

cemms1 (Contributor) commented Jan 31, 2024

The current theory is the following, from this AWS troubleshooting page:

The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target

The load balancer receives a request and forwards it to the target. The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the duration of the keep-alive timeout for the target is shorter than the idle timeout value of the load balancer. Make sure that the duration of the keep-alive timeout is greater than the idle timeout value.

Check the values for the request_processing_time, target_processing_time and response_processing_time fields.

See the following example access log entry:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 0.001 4.205 -1 502 - 94 326 "GET http://example.com:80 HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"

Note: In this access log entry, the request_processing_time is 0.001, the target_processing_time is 4.205, and the response_processing_time is -1.
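To see the mismatch being described here: Node's HTTP server closes idle keep-alive connections after 5 seconds by default, while an ALB's default idle timeout is 60 seconds. A minimal sketch (not from this app, just plain node:http) that prints the target-side default:

    // Sketch only: shows Node's default keep-alive timeout, which is shorter than
    // the ALB's default 60-second idle timeout, i.e. the mismatch described above.
    import { createServer } from 'node:http';

    const server = createServer((req, res) => {
      res.end('ok');
    });

    server.listen(3000, () => {
      // Prints 5000 (ms) unless overridden.
      console.log(`keepAliveTimeout: ${server.keepAliveTimeout}ms`);
    });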

Useful links:

cemms1 (Contributor) commented Feb 1, 2024

I am fairly sure the 502s, and previously the 504s, were being caused by the target server's keep-alive timeout being shorter than the load balancer's connection idle timeout.

The options here are:

  • increase the keep alive timeout on the node application
    e.g. in server.prod.ts:

    const server = app.listen(port);
    server.keepAliveTimeout = 90 * 1000; // ensure this is higher than the default LB idle timeout of 60 seconds
    
  • decrease the idle timeout on the load balancer
    e.g. in the screenshot of the LB settings below, ensure the timeout is lower than Node's default keep-alive timeout of 5 seconds (a rough CDK sketch of this change follows the list)

    [screenshot of the load balancer idle timeout setting]
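For option 2, a rough sketch of what the change could look like if the load balancer is defined with AWS CDK (construct names are hypothetical; the real infrastructure for this app may be defined differently):

    // Sketch only, assuming aws-cdk-lib and an existing ApplicationLoadBalancer construct.
    import { ApplicationLoadBalancer } from 'aws-cdk-lib/aws-elasticloadbalancingv2';

    declare const loadBalancer: ApplicationLoadBalancer; // hypothetical existing construct

    // Keep the load balancer's idle timeout below Node's default 5s keepAliveTimeout,
    // so the target never closes a connection the load balancer still considers open.
    loadBalancer.setAttribute('idle_timeout.timeout_seconds', '4');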

There's a blog post about this issue here, which is worth a quick read

alinaboghiu (Member, Author) commented:

This write-up is fantastic, thank you Charlotte, brilliant 🕵️ work.

cemms1 (Contributor) commented Feb 1, 2024

With guidance from DevX, we decided to go with option 2 and decrease the idle timeout on the load balancer. The PR is here.
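One way to sanity-check the change once deployed (a sketch using the AWS SDK for JavaScript v3; the region and load balancer ARN below are placeholders, not taken from the PR):

    // Sketch only: read back the ALB's idle timeout attribute after the change.
    import {
      ElasticLoadBalancingV2Client,
      DescribeLoadBalancerAttributesCommand,
    } from '@aws-sdk/client-elastic-load-balancing-v2';

    const client = new ElasticLoadBalancingV2Client({ region: 'eu-west-1' }); // placeholder region

    const { Attributes } = await client.send(
      new DescribeLoadBalancerAttributesCommand({
        LoadBalancerArn: 'arn:aws:elasticloadbalancing:...', // placeholder ARN
      }),
    );

    const idleTimeout = Attributes?.find((a) => a.Key === 'idle_timeout.timeout_seconds');
    console.log('idle_timeout.timeout_seconds =', idleTimeout?.Value);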
