
fix: Limit the size of http request bodies that we handle #1405

Merged
merged 4 commits into sourcenetwork:develop on Apr 25, 2023

Conversation

AndrewSisley
Contributor

@AndrewSisley AndrewSisley commented Apr 25, 2023

Relevant issue(s)

Resolves #1322

Description

Limits the size of HTTP request bodies that we handle, to make things slightly harder for anyone to DoS a defra node.

I preferred a timeout over a byte limit as I find it much more descriptive, and it will scale nicely with the system/hardware context (more powerful machines can handle bigger request bodies). There is undoubtedly a runtime cost involved, though, compared to just using an int.

I set all the limits to 1 second, as that seems like plenty for now. It should be added to the config at some point IMO, though (not now).

@AndrewSisley AndrewSisley added the bug (Something isn't working), area/network (Related to internal/external network components), and action/no-benchmark (Skips the action that runs the benchmark) labels Apr 25, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.5.1 milestone Apr 25, 2023
@AndrewSisley AndrewSisley requested a review from a team April 25, 2023 16:28
@AndrewSisley AndrewSisley self-assigned this Apr 25, 2023
@jsimnz
Member

jsimnz commented Apr 25, 2023

I'm very much against a time-based limit for requests. Requests can absolutely take more than a second if there is a large dataset and a complex/inefficient query.

The original problem was the fact that our original implementation would arbitrarily read the entire incoming request payload, which according to the HTTP spec for POST is technically unlimited, meaning you could send a 1TB request and crash the server.

There should be a simple, sane byte size limit on the request, and nothing more.

Collaborator

@fredcarle fredcarle left a comment


I think the structure of this change is good but the limiting parameter should be size and not time.

@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
return
}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
Collaborator


suggestion: I don't think we should have a timeout function like this. Timeouts for http requests should be handled at the server level. We have a limited implementation of it at the moment but when we update Go to 1.20, we'll have some added functionalities to help us with this.

@@ -322,3 +324,32 @@ func subscriptionHandler(pub *events.Publisher[events.Update], rw http.ResponseW
}
}
}

// readWithTimeout reads from the reader until either EOF or the given maxDuration has been reached.
func readWithTimeout(ctx context.Context, reader io.Reader, maxDuration time.Duration) ([]byte, error) {
Collaborator


suggestion: Taking my above suggestion into account, this could be changed to a readWithLimit function where the limit is the maximum payload size we will allow.

@codecov

codecov bot commented Apr 25, 2023

Codecov Report

Merging #1405 (4d727df) into develop (df0d323) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1405      +/-   ##
===========================================
- Coverage    72.09%   72.05%   -0.05%     
===========================================
  Files          185      185              
  Lines        18166    18175       +9     
===========================================
- Hits         13097    13096       -1     
- Misses        4037     4045       +8     
- Partials      1032     1034       +2     
Impacted Files Coverage Δ
api/http/handlerfuncs.go 84.01% <100.00%> (+0.61%) ⬆️

... and 7 files with indirect coverage changes

@AndrewSisley
Contributor Author

> I'm very much against a time-based limit for requests. Requests can absolutely take more than a second if there is a large dataset and a complex/inefficient query.

Of course they can, but if we want to handle that then we can use a bigger timeout?

> The original problem was the fact that our original implementation would arbitrarily read the entire incoming request payload, which according to the HTTP spec for POST is technically unlimited, meaning you could send a 1TB request and crash the server.

You disagree that the proposed solution solves this? And you don't think it scales better to the host hardware/context? A fixed size limit for all would mean that a defra server running on a highly constrained device would still be exposed to this, whereas a large device would be needlessly limited.

@jsimnz
Member

jsimnz commented Apr 25, 2023

> You disagree that the proposed solution solves this? And you don't think it scales better to the host hardware/context? A fixed size limit for all would mean that a defra server running on a highly constrained device would still be exposed to this, whereas a large device would be needlessly limited.

There are very few (none) instances where we need to support incoming request payloads larger than, for example, 4 GB, which is the common limit for POST data. That figure is to accommodate binary uploads; I would say we don't need anything larger than a couple of MB.

This can still be exposed from a configuration perspective, so large device vs. small device is somewhat irrelevant to this discussion.

It's more about what is a better metric to protect against this DOS attack, and the primary concern is unbounded read buffers, i.e. a byte size limit.

There's no benefit to a time limit in the context of the DOS attack, and I believe it is a poor metric to limit on, as there are just too many different ways someone might unknowingly/accidentally hit that limit without understanding why their requests are failing.

@AndrewSisley
Contributor Author

AndrewSisley commented Apr 25, 2023

> ... There's no benefit to a time limit in the context of the DOS attack, and I believe it is a poor metric to limit on, as there are just too many different ways someone might unknowingly/accidentally hit that limit without understanding why their requests are failing.

Time limits the amount of bytes that can be read, in a way that allows the limit to roughly scale with the host capabilities, reducing the likelihood that users will need to (re)configure the value.

A couple of MB seems quite small if you thought 1 second was too small; do you have a specific default size value in mind?

Note: if capping by byte count, the loop will be removed; reading in chunks would serve no purpose in that case.

@jsimnz
Member

jsimnz commented Apr 25, 2023

> Time limits the amount of bytes that can be read

Yes, but it is not limited to the "bytes you can read", which is where my strong objection comes from. Limiting requests by time not only limits the bytes you can read; it also encompasses the time it takes to parse the request, compile the plan graph, execute the query, serialize the response, and write the bytes back to the client.

That is far more than just the amount of bytes. If the concern is the DOS from unbounded incoming byte buffers, then we need to limit the size of those buffers, and limit the max size of the incoming request payload.

Above is incorrect, apologies 👍

More generally, it's about being specific, and not bound to the I/O capacity at that specific point in time. There's never going to be a request that takes 1 second just to read in the input data. A request byte size limit is simple, straightforward, well understood, and easy.

> A couple of MB seems quite small if you thought 1 second was too small; do you have a specific default size value in mind?

As the incoming request is only the GQL data in most cases, or the schema SDL, the "couple of MBs" was to anchor the conversation in the magnitude of data we're dealing with. Ultimately it doesn't matter what the exact limit is; it's more that we get the magnitude correct (a couple of MBs, 100s of MBs, GBs, etc.).

@@ -124,7 +126,7 @@ func execGQLHandler(rw http.ResponseWriter, req *http.Request) {
handleErr(req.Context(), rw, ErrBodyEmpty, http.StatusBadRequest)
return
}
-	body, err := io.ReadAll(req.Body)
+	body, err := readWithTimeout(req.Context(), req.Body, time.Second)
Member


In terms of a concrete suggestion for my comments, this whole PR can be turned into a single-line change, which also avoids the context, the additional goroutine for timeouts, the select call, etc.

r := io.LimitReader(req.Body, 1<<(10*2)) // 1 << (10 * 2) is one megabyte; can be expressed however
body, err := io.ReadAll(r)

Contributor Author


io.ReadAll is potentially very wasteful memory-wise, as it relies on append for resizing the result slice. It could theoretically mean that the capacity (allocated memory cost) of the result is actually (2n - 1).

I also really don't want any more magic in the magic number, and doing that calc inline like that is not very readable, with or without a comment.

Member

@jsimnz jsimnz left a comment


LGTM

Collaborator

@fredcarle fredcarle left a comment


Yass! I like this version a lot :)

@AndrewSisley AndrewSisley merged commit f0d7fe0 into sourcenetwork:develop Apr 25, 2023
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Successfully merging this pull request may close these issues.

Use of io.ReadAll in http handler can result in denial of service attack