
fix: Limit the size of http request bodies that we handle #1405

Merged
merged 4 commits into sourcenetwork:develop on Apr 25, 2023

Conversation

AndrewSisley
Contributor

@AndrewSisley AndrewSisley commented Apr 25, 2023

Relevant issue(s)

Resolves #1322

Description

Limits the size of HTTP request bodies that we handle, to make things slightly harder for anyone to DoS a defra node.

I preferred a timeout over a byte limit as I find it much more descriptive, and it will scale nicely with the system/hardware context (more powerful machines can handle bigger request bodies). There is undoubtedly a runtime cost involved, though, compared to just using an int.

I set all the limits to 1 second, as that seems like plenty for now. It should be added to the config at some point IMO, though (not now).

@AndrewSisley AndrewSisley added the bug (Something isn't working), area/network (Related to internal/external network components), and action/no-benchmark (Skips the action that runs the benchmark) labels Apr 25, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.5.1 milestone Apr 25, 2023
@AndrewSisley AndrewSisley requested a review from a team April 25, 2023 16:28
@AndrewSisley AndrewSisley self-assigned this Apr 25, 2023
@jsimnz
Member

jsimnz commented Apr 25, 2023

I'm very much against a time-based limit for requests. Requests can absolutely take more than a second if there is a large dataset and a complex/inefficient query.

The original problem was the fact that our original implementation would arbitrarily read the entire incoming request payload, which according to the HTTP spec for POST is technically unlimited, meaning you could send a 1TB request and crash the server.

There should be a simple, sane byte size limit on the request, and nothing more.

Collaborator

@fredcarle fredcarle left a comment


I think the structure of this change is good but the limiting parameter should be size and not time.

@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
return
}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
Collaborator


suggestion: I don't think we should have a timeout function like this. Timeouts for http requests should be handled at the server level. We have a limited implementation of it at the moment but when we update Go to 1.20, we'll have some added functionalities to help us with this.

@@ -322,3 +324,32 @@ func subscriptionHandler(pub *events.Publisher[events.Update], rw http.ResponseW
}
}
}

// readWithTimeout reads from the reader until either EOF or the given maxDuration has been reached.
func readWithTimeout(ctx context.Context, reader io.Reader, maxDuration time.Duration) ([]byte, error) {
Collaborator


suggestion: Taking my above suggestion into account, this could be changed to a readWithLimit function where the limit is the maximum payload size we will allow.

@codecov

codecov bot commented Apr 25, 2023

Codecov Report

Merging #1405 (4d727df) into develop (df0d323) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1405      +/-   ##
===========================================
- Coverage    72.09%   72.05%   -0.05%     
===========================================
  Files          185      185              
  Lines        18166    18175       +9     
===========================================
- Hits         13097    13096       -1     
- Misses        4037     4045       +8     
- Partials      1032     1034       +2     
Impacted Files Coverage Δ
api/http/handlerfuncs.go 84.01% <100.00%> (+0.61%) ⬆️

... and 7 files with indirect coverage changes

@AndrewSisley
Contributor Author

> I'm very much against a time-based limit for requests. Requests can absolutely take more than a second if there is a large dataset and a complex/inefficient query.

Of course they can, but if we want to handle that then we can use a bigger timeout?

> The original problem was the fact that our original implementation would arbitrarily read the entire incoming request payload, which according to the HTTP spec for POST is technically unlimited, meaning you could send a 1TB request and crash the server.

You disagree that the proposed solution solves this? And you don't think it scales better to the host hardware/context? A fixed size limit for all would mean that a defra server running on a highly constrained device would still be exposed to this, whereas a large device would be needlessly limited.

@jsimnz
Member

jsimnz commented Apr 25, 2023

> You disagree that the proposed solution solves this? And you don't think it scales better to the host hardware/context? A fixed size limit for all would mean that a defra server running on a highly constrained device would still be exposed to this, whereas a large device would be needlessly limited.

There are very few (none) instances where we need to support incoming request payloads larger than, for example, 4 GB, which is the common limit for POST data. That figure is to accommodate binary uploads; I would say we don't need anything larger than a couple of MB.

This can still be exposed from a configuration perspective, so large device vs. small device is somewhat irrelevant to this discussion.

It's more about what is a better metric to protect against this DOS attack, and the primary concern is unbounded read buffers, i.e. a byte size limit.

There's no benefit to a time limit in the context of the DOS attack, and I believe it is a poor metric to limit on, as there are just too many different ways someone might unknowingly/accidentally hit that limit without understanding why their requests are failing.

@AndrewSisley
Contributor Author

AndrewSisley commented Apr 25, 2023

> ... There's no benefit to a time limit in the context of the DOS attack, and I believe it is a poor metric to limit on, as there are just too many different ways someone might unknowingly/accidentally hit that limit without understanding why their requests are failing.

Time limits the amount of bytes that can be read, in a way that allows the limit to roughly scale with the host capabilities, reducing the likelihood that users will need to (re)configure the value.

A couple of MB seems quite small if you thought 1 second was too small; do you have a specific default size value in mind?

Note: if capping by byte count, the loop will be removed; reading in chunks would serve no purpose in that case.

@jsimnz
Member

jsimnz commented Apr 25, 2023

> Time limits the amount of bytes that can be read

Yes, but it is not limited to the "bytes you can read", which is where my strong objection comes from. Limiting requests by time not only limits the bytes you can read; it also encompasses the time it takes to parse the request, compile the plan graph, execute the query, serialize the response, and write the bytes back to the client.

That is far more than just the amount of bytes. If the concern is the DOS from unbounded incoming byte buffers, then we need to limit the size of those buffers, and limit the max size of the incoming request payload.

Above is incorrect, apologies 👍

More generally, it's about being specific, and not bound to the I/O capacity at that specific point in time. There's never going to be a request that takes 1 second just to read in the input data. A request byte size limit is simple, straightforward, well understood, and easy.

> A couple of MB seems quite small if you thought 1 second was too small; do you have a specific default size value in mind?

As the incoming request is only the GQL data in most cases, or the schema SDL, the "couple of MBs" was to anchor the conversation in the magnitude of data we're dealing with. Ultimately it doesn't matter what the exact limit is; it's more that we get the magnitude correct (a couple of MBs, 100s of MBs, GBs, etc.).

@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
return
}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
Member


In terms of a concrete suggestion for my comments, this whole PR can be turned into a single-line change, which also avoids the context, the additional goroutine for timeouts, the select call, etc.

r := io.LimitReader(req.Body, 1<<(10*2)) // 1 << (10 * 2) is one megabyte; can be expressed however
body, err := io.ReadAll(r)

Contributor Author


io.ReadAll is potentially very wasteful memory-wise, as it relies on append for resizing the result slice. It could theoretically mean that the capacity (allocated memory cost) of the result is actually (2n - 1).

I also really don't want any more magic in the magic number, and doing that calc inline like that is not very readable, with or without a comment.

Member

@jsimnz jsimnz left a comment


LGTM

Collaborator

@fredcarle fredcarle left a comment


Yass! I like this version a lot :)

@AndrewSisley AndrewSisley merged commit f0d7fe0 into sourcenetwork:develop Apr 25, 2023
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Successfully merging this pull request may close these issues.

Use of io.ReadAll in http handler can result in denial of service attack