Add stoplight for object storage failures, return HTTP 503 #13043

Merged
Gargron merged 1 commit into master from fix-object-storage-stoplights
Dec 15, 2020

Conversation

Gargron
Member

@Gargron Gargron commented Feb 4, 2020

Reduce traffic congestion due to timeouts

@Gargron Gargron added the performance Runtime performance label Feb 4, 2020
@Gargron Gargron force-pushed the fix-object-storage-stoplights branch 2 times, most recently from 7d4302c to 1108be0 Compare February 7, 2020 14:36
@Gargron Gargron changed the title Add stoplights for object-storage specific API endpoints Add stoplight for object storage failures, return HTTP 503 Feb 7, 2020
@Gargron Gargron force-pushed the fix-object-storage-stoplights branch from 1108be0 to 75c7185 Compare February 7, 2020 15:07
@nightpool
Member

can we make this opt in through an environment variable? it seems bad if people on reliable storage providers would see increased unavailability due to transient network errors.

i.e., if I'm using Digital Ocean Spaces, and there's a one off timeout, I don't want that to trigger a stoplight, because I know it's probably not indicative of a more serious failure.

@Gargron
Member Author

Gargron commented Feb 7, 2020

@nightpool Threshold is 3 and cooldown is 60 seconds by default. Can we find a combination that you would be happy with without requiring extra configuration? Even a cooldown of 10 seconds would greatly reduce the number of concurrent requests that time out.

Edit: Changed threshold to 10 failures, cooldown to 30 seconds. If object storage fails 10 times, the next 30 seconds it will fail instantly, then it can fail another 10 times. Is that OK with you?
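
For readers unfamiliar with the pattern, the behavior described above (10 failures, then 30 seconds of failing instantly, then another 10 attempts) can be sketched in plain Ruby. This is only an illustration under assumptions: `SimpleBreaker`, `OpenError`, and the injectable `clock` are hypothetical names, not the stoplight gem's API or the code in this PR, and resetting the count after a success is an assumption as well.

```ruby
# Minimal circuit breaker sketch (hypothetical class, not the stoplight gem).
class SimpleBreaker
  # Raised when the breaker is open and the call is rejected without running.
  class OpenError < StandardError; end

  def initialize(threshold: 10, cooldown: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold = threshold # failures allowed before the breaker opens
    @cooldown = cooldown   # seconds during which calls fail instantly
    @clock = clock         # injectable time source, handy for testing
    @failures = 0
    @opened_at = nil
  end

  def run
    if @opened_at
      # Still cooling down: fail instantly instead of waiting on a timeout.
      raise OpenError, 'object storage unavailable' if @clock.call - @opened_at < @cooldown

      # Cooldown elapsed: allow another full round of attempts.
      @opened_at = nil
      @failures = 0
    end

    result = yield
    @failures = 0 # assumption: a success clears the failure count
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = @clock.call if @failures >= @threshold
    raise
  end
end

# Usage with a fake clock: ten failing calls open the breaker.
now = 0.0
breaker = SimpleBreaker.new(clock: -> { now })
calls = 0

10.times do
  begin
    breaker.run do
      calls += 1
      raise IOError, 'timeout'
    end
  rescue IOError
    # each of these still paid the full cost of the failing call
  end
end

rejected = false
begin
  breaker.run { calls += 1 } # block is never executed while open
rescue SimpleBreaker::OpenError
  rejected = true # a controller could answer with HTTP 503 at this point
end
```

The eleventh call is rejected without touching storage, which is where the reduction in concurrent timeouts would come from; once the fake clock passes the 30-second cooldown, calls are attempted again.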

@Gargron Gargron force-pushed the fix-object-storage-stoplights branch 3 times, most recently from a820350 to 4d2a985 Compare February 8, 2020 16:15
@Gargron Gargron force-pushed the fix-object-storage-stoplights branch from 4d2a985 to d3f50d3 Compare March 8, 2020 23:04
@Gargron Gargron requested a review from ClearlyClaire March 10, 2020 11:11
@Gargron Gargron force-pushed the fix-object-storage-stoplights branch from d3f50d3 to 4f55364 Compare May 10, 2020 09:24
@Gargron Gargron requested a review from ClearlyClaire May 10, 2020 09:52
@ClearlyClaire
Contributor

One thing that this could have enabled is receiving and processing statuses with media attachments when the object storage is down. However, I think the error handling in ActivityPub::Activity::Create#process_attachments would actually not catch it, and abort the whole job?

@stale

stale bot commented Sep 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/wontfix This will not be worked on label Sep 7, 2020
@stale stale bot closed this Sep 14, 2020
@Gargron Gargron reopened this Sep 14, 2020
@stale stale bot removed the status/wontfix This will not be worked on label Sep 14, 2020
@Gargron Gargron force-pushed the fix-object-storage-stoplights branch from 4f55364 to 5d9c295 Compare December 15, 2020 03:40
Contributor

@ClearlyClaire ClearlyClaire left a comment

Seems fine to me, although I'm a bit worried about temporary storage server failures causing us to silently skip downloading media attachments/emoji. It's probably a bit more useful overall, but I'm afraid this would make the behavior slightly more surprising and slightly more difficult to debug.

@Gargron Gargron merged commit 1045549 into master Dec 15, 2020
@Gargron Gargron deleted the fix-object-storage-stoplights branch December 15, 2020 11:55
Labels
performance Runtime performance
3 participants