fix: Limit the size of http request bodies that we handle #1405
Conversation
I'm very much against a time-based limit for requests. Requests can absolutely take more than a second if there is a large dataset and a complex/inefficient query. The original problem was that our original implementation would arbitrarily read the entire incoming request payload, which according to the HTTP spec for POST is technically unlimited, meaning you could send a 1TB request and crash the server. There should be a simple, sane byte-size limit on the request, and nothing more.
I think the structure of this change is good but the limiting parameter should be size and not time.
api/http/handlerfuncs.go (Outdated)
@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
 		handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
 		return
 	}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
suggestion: I don't think we should have a timeout function like this. Timeouts for HTTP requests should be handled at the server level. We have a limited implementation of that at the moment, but when we update Go to 1.20 we'll have some added functionality to help us with this.
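For context, a minimal sketch of what server-level timeouts can look like with the standard library. The values, address, and handler are hypothetical and not defradb's actual configuration; Go 1.20 additionally introduces http.NewResponseController for per-request read/write deadlines.

package main

import (
	"net/http"
	"time"
)

func main() {
	// Hypothetical handler and address; the timeout fields are the point here.
	srv := &http.Server{
		Addr:              ":9181",
		Handler:           http.NotFoundHandler(),
		ReadHeaderTimeout: 5 * time.Second,  // max time to read the request headers
		ReadTimeout:       30 * time.Second, // max time to read the whole request, body included
		WriteTimeout:      30 * time.Second, // max time to write the response
		IdleTimeout:       2 * time.Minute,  // keep-alive idle connection timeout
	}
	_ = srv.ListenAndServe()
}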
api/http/handlerfuncs.go (Outdated)
@@ -322,3 +324,32 @@ func subscriptionHandler(pub *events.Publisher[events.Update], rw http.ResponseW
 		}
 	}
 }
+
+// readWithTimeout reads from the reader until either EoF or the given maxDuration has been reached.
+func readWithTimeout(ctx context.Context, reader io.Reader, maxDuration time.Duration) ([]byte, error) {
suggestion: Taking my above suggestion into account, this could be changed to a readWithLimit
function where the limit is the maximum payload size we will allow.
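A minimal sketch of what such a readWithLimit helper could look like, assuming imports of io and net/http. The use of the standard library's http.MaxBytesReader is my assumption here, not something the reviewer specified, and this is not the code that ultimately landed in the PR.

// readWithLimit reads at most limit bytes of the request body.
// Hypothetical sketch: http.MaxBytesReader wraps the body so that any read
// past the limit returns a non-EOF error instead of buffering unbounded data.
func readWithLimit(rw http.ResponseWriter, req *http.Request, limit int64) ([]byte, error) {
	limited := http.MaxBytesReader(rw, req.Body, limit)
	body, err := io.ReadAll(limited)
	if err != nil {
		// On Go 1.19+ an over-limit read surfaces as *http.MaxBytesError.
		return nil, err
	}
	return body, nil
}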
Codecov Report
@@ Coverage Diff @@
## develop #1405 +/- ##
===========================================
- Coverage 72.09% 72.05% -0.05%
===========================================
Files 185 185
Lines 18166 18175 +9
===========================================
- Hits 13097 13096 -1
- Misses 4037 4045 +8
- Partials 1032 1034 +2
Of course they can, but if we want to handle that then we can use a bigger timeout?
You disagree that the proposed solution solves this? And you don't think it scales better to the host hardware/context? A fixed size limit for all would mean that a defra server running on a highly constrained device would still be exposed to this, whereas a large device would be needlessly limited.
There are very few (none) instances where we need to support incoming request payloads larger than, for example, 4GB, which is the common limit for POST data. That is to accommodate binary uploads; I would say we don't need anything larger than a couple of MB. This can still be exposed from a configuration perspective, so large device/small device is somewhat irrelevant to this discussion. It's more about what is a better metric to protect against this DoS attack, and the primary concern is unbounded read buffers, i.e. a byte size limit. There's no benefit to a time limit in the context of the DoS attack, and I believe it is a poor metric to limit on, as there are just too many different ways someone might unknowingly/accidentally hit that limit without understanding why their requests are failing.
Time limits the number of bytes that can be read, in a way that allows the limit to roughly scale with the host capabilities, reducing the likelihood that users will need to (re)configure the value. A couple of MB seems quite small if you thought 1 second was too small; do you have a specific default size-value in mind? Note: if capping by byte count, the loop will be removed, as reading in chunks would serve no purpose in that case.
Above is incorrect, apologies 👍 More generally, it's about being specific, and not bound to the IO caps at that specific point in time. There's never going to be a request that takes 1 second just to read in the input data. A request byte size is simple, straightforward, well understood, and easy.
As the incoming request is only the GQL data in most cases, or the schema SDL, the "couple of MBs" was to anchor the conversation in the magnitude of data we're dealing with. Ultimately it doesn't matter what the exact limit is; it's more that we get the magnitude correct (couple of MBs, 100s of MBs, GBs, etc.).
api/http/handlerfuncs.go (Outdated)
@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
 		handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
 		return
 	}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
In terms of a concrete suggestion for my comments, this whole PR can be turned into a single-line change, which also avoids the context, the additional goroutine for timeouts, the select call, etc.
r := io.LimitReader(req.Body, 1 << (10 * 2)) // here 1 << (10 * 2) is a Megabyte, can be expressed however
body, err := io.ReadAll(r)
io.ReadAll is potentially very wasteful memory-wise, as it relies on append for resizing the result slice. It could theoretically mean that the capacity (allocated mem cost) of the result is actually (2n - 1).
I also really don't want any more magic numbers, and doing that calc inline like that is not very readable, with or without a comment.
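One hypothetical way to address both concerns, a named constant instead of the inline shift and a destination buffer sized up front so the read does not rely on append-driven growth. This is a sketch (assuming imports of bytes, fmt, and io), not the change that was merged; the constant name and 1 MiB value are purely illustrative.

// httpMaxBodyBytes is a hypothetical named limit; 1 MiB purely for illustration.
const httpMaxBodyBytes = 1 << 20

// readWithLimit reads at most limit bytes, pre-sizing the buffer so the result
// slice is not repeatedly reallocated the way io.ReadAll's append loop can be.
func readWithLimit(reader io.Reader, limit int64) ([]byte, error) {
	// Reserve limit+1 bytes so an over-limit body is distinguishable from one
	// that is exactly at the limit.
	buf := bytes.NewBuffer(make([]byte, 0, limit+1))
	n, err := io.Copy(buf, io.LimitReader(reader, limit+1))
	if err != nil {
		return nil, err
	}
	if n > limit {
		return nil, fmt.Errorf("request body exceeds the %d byte limit", limit)
	}
	return buf.Bytes(), nil
}

The trade-off is that the full limit is reserved per request even for small bodies; whether that is acceptable depends on the magnitude of the limit discussed above.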
Force-pushed from ed6fda2 to 9f9ec5d (compare)
LGTM
Yass! I like this version a lot :)
Requested by John
It relies on append, which can result in significant over-allocation of mem
Force-pushed from 28caaa7 to 4d727df (compare)
Relevant issue(s)
Resolves #1322
Description
Limits the size of http request bodies that we handle to make things slightly harder for anyone to DoS a defra node.
A timeout is preferred by me over a byte limit, as I find it much more descriptive, and it will scale nicely with the system/hardware context (more powerful machines can handle bigger request bodies). There is undoubtedly a runtime cost involved though, compared to just using an int.
I set all the limits to 1 second, as that seems like plenty for now. It should be added to the config at some point IMO though (not now).
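For reference, a rough reconstruction of the goroutine/select, chunked-read shape described above and in the review comments (an illustrative sketch assuming imports of context, io, and time, not the exact code from the PR):

// readWithTimeout reads from reader until EOF or until maxDuration elapses,
// whichever comes first. The deadline can only cut the read off between
// chunks; a single Read call that blocks indefinitely is not interrupted.
func readWithTimeout(ctx context.Context, reader io.Reader, maxDuration time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, maxDuration)
	defer cancel()

	type chunk struct {
		data []byte
		err  error
	}
	chunks := make(chan chunk)

	// Reader goroutine: stream the body in fixed-size pieces so the select
	// below can give up as soon as the deadline passes.
	go func() {
		defer close(chunks)
		buf := make([]byte, 32*1024)
		for {
			n, err := reader.Read(buf)
			piece := append([]byte(nil), buf[:n]...) // copy, buf is reused next iteration
			select {
			case chunks <- chunk{data: piece, err: err}:
			case <-ctx.Done():
				return
			}
			if err != nil {
				return
			}
		}
	}()

	var body []byte
	for {
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case c, ok := <-chunks:
			if !ok {
				// Reader goroutine bailed out on the deadline before sending.
				return nil, ctx.Err()
			}
			body = append(body, c.data...)
			switch c.err {
			case nil:
				// keep reading
			case io.EOF:
				return body, nil
			default:
				return nil, c.err
			}
		}
	}
}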