Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic when making request, could not parse Url as Uri #668

Closed
timvisee opened this issue Oct 9, 2019 · 14 comments · Fixed by #2040
Closed

Panic when making request, could not parse Url as Uri #668

timvisee opened this issue Oct 9, 2019 · 14 comments · Fixed by #2040

Comments

@timvisee
Copy link

timvisee commented Oct 9, 2019

I just came across a panic in the latest reqwest release (both 0.10 and master). This happens when making a request.

It seems that some input in Client.get(url) parses successfully as URL at first, and then panics when attempting to parse the URL as http::Uri (internally).

Example

Here's a very basic (abstracted) example that causes this panic:

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    reqwest::Client::new()
        .get("http://\"")
        .send()
        .await?;
    Ok(())
}

Which outputs:

thread 'main' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(InvalidUriChar)', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

I believe the Client.get(url) function should return an error in all cases where the URL is invalid. This actually happens when providing something like "http://a b" (InvalidDomainCharacter). However, "http://\"" succeeds and panics when making the request with .send().

The panic is located here.

Assumption

I have the following assumption on what is going on:

I think Url is used first to parse the provided URL (here). Then the assumption is made this URL is valid once parsing succeeds. Internally, when making a request through hyper, this URL is parsed to an Uri. Assuming the URL is valid this should always succeed, but it does not. That probably means that either implementation of Url or Uri is incorrect, or at least not compatible.

Al right. I'm not too familiar with the URL/URI specs, and don't want to go through it right now, to confirm either implementation is correct.

Fixing this

I see the following options:

  • Immediately parse to an Uri when creating the request (and check for validity).
  • Fix Url implementation (if invalid)
  • Fix Uri implementation (if invalid)
  • Use Uri throughout the crate.
  • Propagate this parsing issue up to the calling function, return as error in a Result.

I think the first is easiest to do on a short notice.

What would be the correct way of doing things? What are your thoughts, is the assumption I made right, and is this undesired behaviour?

I might be happy to implement a fix for this in code once I know what path I should take.

@Parth
Copy link

Parth commented Oct 10, 2019

Repro'd on the Url side of things

@timvisee
Copy link
Author

Here are a few other random ones that came up in my logs, parseable as Url but not as Uri:

"http://‘nvme@nip2"
"http://-1@"
"https://a@b"
"https://€@qq"

@Parth
Copy link

Parth commented Oct 14, 2019

As a non-ideal bandage, I'm using these in the short term: https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

@timvisee
Copy link
Author

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Thanks, as a workaround, I'm currently actually testing whether it will parse as Uri, to be 100% sure it works before providing it to reqwest.

@Parth
Copy link

Parth commented Oct 14, 2019

I believe URI is more forgiving in terms of the spec right?

@timvisee
Copy link
Author

timvisee commented Oct 14, 2019

I believe URI is more forgiving in terms of the spec right?

Not sure, it might be. But the panic came up because an Url (successfully parsed) couldn't be parsed as Uri (internally in reqwest). Testing this exact transformation before giving the URL to reqwest allows you to test whether it would panic or not, that's why I'm using this approach.

But in the end I think this issue should be fixed no matter what, of course.

@matthewrobertbell
Copy link

I've also hit this bug, on URLs that contain "<" and ">".

LIke @timvisee I am parsing URLs as http::Uri first, which does stop panics.

It seems to me that reqwest should parse as http::Uri as soon as possible, and return an error to the caller if it isn't valid.

@timvisee
Copy link
Author

timvisee commented Oct 15, 2019

It seems to me that reqwest should parse as http::Uri as soon as possible, and return an error to the caller if it isn't valid.

I don't think Uri can directly replace Url in the request struct, because Url is used to manage various URL parts. An explicit parse check to mitigate this panic upon creating the Url could be done, but this introduces undesirable overhead.

I believe #668 (comment) is correct, in that the best approach would be to fix the Url implementation.

@seanmonstar
Copy link
Owner

I don't believe this is necessarily a bug in url, as the URL spec allows for parsing a lot of non-sensical URLs.

As you've mentioned, reqwest should likely check sooner that the URL can also be a valid URI.

@gralpli
Copy link

gralpli commented Dec 13, 2020

My code panicked tonight and because the fix should be the same, I’m adding my example to this issue.

I am using a Telegram bot to inform me when something goes wrong. To send a message, you call a URL and pass the message in the query part of it. Unfortunately, because I included some logs in that message, the URL got too long and reqwest paniced at src/into_url.rs:44.

I’d expect that .send() returns a Result::Err instead.

Example to reproduce

use reqwest::blocking::Client; // reqwest 0.10.9
use url::Url; // url 2.2.0

fn main() {
    let text = "abcdefghijklmnopqrstuvwxyz".repeat(100_000);

    let mut url = Url::parse("https://www.example.com/").unwrap();
    url.query_pairs_mut().append_pair("text", &text);

    Client::new()
        .get(url)
        .send() // panics
        .expect("THIS IS NOT REACHED");
}
[package]
name = "bug"
version = "0.1.0"
edition = "2018"

[dependencies]
reqwest = { version = "0.10.9", features = [ "blocking" ] }
url = "2.2.0"

@Martichou
Copy link
Contributor

Martichou commented Dec 13, 2020

@gralpli I'm pretty sure your issue here is because you have way to much char in your url due to the repeat(100_000).
I've not looked at the underlying URL lib being used to parse the Url, but it should be normal that they have hardcoded a maximum length.

Can you try without the repeat and check if it works?

@gralpli
Copy link

gralpli commented Dec 13, 2020

@Martichou Yes, the intended issue is that the URL is too long. But it is reqwest‘s .send() that panics. It should return a Result::Err instead.

@Martichou
Copy link
Contributor

@gralpli Oh yeah I see what you mean.
Indeed it would be better, don't know if @seanmonstar have the same opinion.

mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be u32::MAX - 1

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be u32::MAX - 1

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be u32::MAX - 1

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 100_00

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 100_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 100_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 150_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 150_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 6, 2021
**This Commit**
Increases the max length of URIs to be 150_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 7, 2021
**This Commit**
Increases the max length of URIs to be 150_000

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 8, 2021
**This Commit**
Increases the max length of URIs

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 8, 2021
**This Commit**
Increases the max length of URIs

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mlodato517 pushed a commit to deepgram/http that referenced this issue Jan 8, 2021
**This Commit**
Increases the max length of URIs

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
jcdyer pushed a commit to deepgram/http that referenced this issue Dec 1, 2021
**This Commit**
Increases the max length of URIs

**Why?**
STEM can receive requests > u16::MAX. When it tries to forward these
requests to Impeller, `reqwest` crashes when parsing the URL into a URI
because `http` returns a `Result::Err`.

This is currently being tracked
[here](seanmonstar/reqwest#668).
mre added a commit to mre/reqwest that referenced this issue Dec 4, 2021
Instead propagate this parsing issue up to the
calling function as a Result.
See seanmonstar#668
@Smerity
Copy link

Smerity commented Jan 29, 2024

Hitting a panic instead of a err'ed Result was quite a surprise to me and unfortunately hit me in production.

This bit me as well recently in a similar way to @gralpli above:

a parsed Url should always be a valid Uri: InvalidUri(TooLong)

I now convert the String to a URL via reqwest::Url::parse before passing in to the .get(url) call.

In reading the get docs during writing the code I somehow parsed "This method fails whenever the supplied Url cannot be parsed." as thinking I'd get a bad Result later.

seanmonstar pushed a commit that referenced this issue Jan 29, 2024
Instead propagate this parsing issue up to the
calling function as a Result.
See #668
Original PR: #1399

Co-authored-by: Matthias <[email protected]>
convex-copybara bot pushed a commit to get-convex/convex-backend that referenced this issue Mar 8, 2024
Added a regression test.

Reqwest fixed the issue upstream.
seanmonstar/reqwest#668

GitOrigin-RevId: c58c994841934fc15530a0eea612aa9dea130b3d
mre added a commit to lycheeverse/lychee that referenced this issue Apr 25, 2024
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing
issue, when Urls could not be parsed to Uris. Previously, we would panic, but
we can now handle that situation gracefully and return an error instead.

I've also renamed `Status::is_failure` to `Status::is_error`, because the
notion of failures no longer exists in the codebase and we use the term "error"
consistently throughout the codebase instead. This is technically a breaking
change in the API, but it's fine since we have not released a stable version
yet.

More information about the URI parsing issue:
- #539
- seanmonstar/reqwest#668
mre added a commit to lycheeverse/lychee that referenced this issue Apr 25, 2024
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing
issue, when Urls could not be parsed to Uris. Previously, we would panic, but
we can now handle that situation gracefully and return an error instead.

I've also renamed `Status::is_failure` to `Status::is_error`, because the
notion of failures no longer exists in the codebase and we use the term "error"
consistently throughout the codebase instead. This is technically a breaking
change in the API, but it's fine since we have not released a stable version
yet.

More information about the URI parsing issue:
- #539
- seanmonstar/reqwest#668
Nutomic pushed a commit to Nutomic/reqwest that referenced this issue Nov 7, 2024
Instead propagate this parsing issue up to the
calling function as a Result.
See seanmonstar#668
Original PR: seanmonstar#1399

Co-authored-by: Matthias <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants