Prevent API service from becoming totally unresponsive/DoS #2085

Open
yarikoptic opened this issue Nov 26, 2024 · 1 comment
Labels
performance Improve performance of an existing feature

Comments

@yarikoptic
Member

Last Saturday (Nov 23, 2024) the main archive became unresponsive because it could not communicate with the API server. For a visitor, the page hung for a while and eventually showed:

image

More on the situation can be found in Slack: https://app.slack.com/client/E01044K0LBZ/GMRLT5RQ8

A sample of the logs from around that time, in case someone decides to check:

2024-11-23T01:25:01.383611+00:00 app[web.1]: 10.1.87.10 - - [23/Nov/2024:01:25:01 +0000] "GET /api/dandisets/000026/versions/draft/assets/?path=sub-I48%2Fses-SPIM%2Fmicr%2Fsub-I48_ses-SPIM_sample-BrocaAreaS08_stain-Calretinin_SPIM.ome.zarr&metadata=1&order=path HTTP/1.1" 200 1684 "-" "dandidav/0.5.0 (https://github.com/dandi/dandidav)"
2024-11-23T01:25:01.626650+00:00 app[web.1]: 10.1.61.240 - - [23/Nov/2024:01:25:01 +0000] "GET /api/dandisets/000026/versions/draft/assets/?path=sub-I48%2Fses-SPIM%2Fmicr%2Fsub-I48_ses-SPIM_sample-BrocaAreaS09_stain-Nuclei_SPIM.ome.zarr&metadata=1&order=path HTTP/1.1" 200 1676 "-" "dandidav/0.5.0 (https://github.com/dandi/dandidav)"
2024-11-23T01:25:01.466962+00:00 app[analytics-worker.1]: [2024-11-23 01:25:01,466: INFO/ForkPoolWorker-1] Task dandiapi.analytics.tasks.process_s3_log_file_task[44b248eb-084c-4fc0-b43b-30e00a4c4783] succeeded in 0.9786828100041021s: None

The specific trigger was the WebDAV service (dandidav, per the user agent in the logs above) needing to make per-asset requests on dandiset 000026, which has a very large number of assets. The particular issue to be addressed to allow for a more efficient API is

but the point of this issue is different. IMHO the API service should be made more robust against DoS situations where one client, or some specific set of IPs, hogs it and shuts everyone else out entirely. Possibly it could be done via limiting but I think it is worth looking into some "QoS" (quality of service) balancing/throttling.
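
To make the "QoS balancing" idea concrete, here is a minimal sketch of round-robin fair queueing across clients. It is only an illustration of the concept, not how the API server currently dispatches requests; `FairQueue`, the client keys, and the request labels are all hypothetical names introduced for this example.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Hypothetical round-robin queue: one pending-request queue per client."""

    def __init__(self):
        # client key (e.g. IP or user agent) -> deque of pending requests
        self._queues = OrderedDict()

    def put(self, client, request):
        # Append to the client's own queue, creating it on first use.
        self._queues.setdefault(client, deque()).append(request)

    def get(self):
        # Serve the client at the front, then move it to the back if it
        # still has pending requests, so no single client can starve others.
        if not self._queues:
            return None
        client, queue = self._queues.popitem(last=False)
        request = queue.popleft()
        if queue:
            self._queues[client] = queue
        return client, request


fq = FairQueue()
for name in ("a1", "a2", "a3"):
    fq.put("dandidav", name)   # one heavy client queues many requests
fq.put("browser", "b1")        # a second client arrives later

order = []
while (item := fq.get()) is not None:
    order.append(item)
```

With this scheme the single `browser` request is served second, right after the first `dandidav` request, instead of waiting behind the whole backlog.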

@yarikoptic yarikoptic added the performance Improve performance of an existing feature label Nov 26, 2024
@waxlamp
Member

waxlamp commented Dec 12, 2024

Possibly it could be done via limiting but I think it is worth looking into some "QoS" (quality of service) balancing/throttling.

The standard solution is rate limiting. QoS balancing/throttling sounds orders of magnitude more complex. If we want to look into it as a research-type solution, that's fine, but rate limiting is really the thing to do.
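
For reference, a minimal sketch of per-client rate limiting using a token bucket. This is an assumption-laden illustration, not the project's implementation: `PerClientRateLimiter`, the `rate`/`burst` parameters, and keying on a client string are hypothetical choices, and a real deployment would more likely rely on the web framework's or reverse proxy's built-in throttling.

```python
import time


class PerClientRateLimiter:
    """Hypothetical token-bucket limiter keyed by client (e.g. IP address)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens refilled per second, per client
        self.burst = burst    # maximum tokens a client can accumulate
        self._buckets = {}    # client -> (tokens, time of last update)

    def allow(self, client, now=None):
        if now is None:
            now = time.monotonic()
        # New clients start with a full bucket.
        tokens, last = self._buckets.get(client, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[client] = (tokens - 1.0, now)
            return True
        self._buckets[client] = (tokens, now)
        return False


limiter = PerClientRateLimiter(rate=1.0, burst=2)
# Explicit timestamps keep the example deterministic.
results = [
    limiter.allow("10.1.87.10", now=0.0),   # within burst
    limiter.allow("10.1.87.10", now=0.0),   # within burst
    limiter.allow("10.1.87.10", now=0.0),   # bucket empty, rejected
    limiter.allow("10.1.61.240", now=0.0),  # different client, unaffected
    limiter.allow("10.1.87.10", now=1.0),   # one token refilled after 1 s
]
```

A rejected request would typically be answered with HTTP 429 (Too Many Requests), so that a well-behaved client can back off and retry.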
