Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSP: Fixes URL decoding incoming file names. #13473

Closed
wants to merge 1 commit into from

Conversation

christianparpart
Copy link
Member

@christianparpart christianparpart commented Sep 2, 2022

This fixes #13035.

On Windows platform, it is much more likely to hit, as we did not URL decode the incoming URL, and thus, the files (nor the base path, if specified as rootUri) could not be opened nor traversed.

Checklist

  • fix URL decoding/encoding
  • Changelog entry
  • tests

@christianparpart christianparpart marked this pull request as ready for review September 2, 2022 13:38
Copy link
Contributor

@Marenz Marenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see at least one test with a full url that is also using /

@Marenz

This comment was marked as resolved.

@christianparpart christianparpart force-pushed the fix-lsp-uri-decoding branch 3 times, most recently from e9fc513 to b298001 Compare October 18, 2022 11:58
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but could use some small tweaks.

@@ -20,6 +21,7 @@ Important Bugfixes:

Compiler Features:
* Code Generator: More efficient overflow checks for multiplication.
* Yul Optimizer: Simplify the starting offset of zero-length operations to zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems out of place in this PR.

@@ -7,6 +7,7 @@ Compiler Features:
* Commandline Interface: Add `--no-cbor-metadata` that skips CBOR metadata from getting appended at the end of the bytecode.
* Standard JSON: Add a boolean field `settings.metadata.appendCBOR` that skips CBOR metadata from getting appended at the end of the bytecode.
* Yul Optimizer: Allow replacing the previously hard-coded cleanup sequence by specifying custom steps after a colon delimiter (``:``) in the sequence string.
* Language Server: Fixes URL decoding of incoming file names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially confused as to why file names would need URL-decoding. I think it would be better to say that we're decoding the file:// URLs.

Suggested change
* Language Server: Fixes URL decoding of incoming file names.
* Language Server: Properly URL-decode the `file://` URIs coming from the client and encode the URIs coming out of by the server.

I'd also say it's a bugfix rather than a feature :)

target_link_libraries(solutil PUBLIC jsoncpp Boost::boost Boost::filesystem Boost::system range-v3)
target_link_libraries(solutil PUBLIC jsoncpp Boost::boost Boost::filesystem Boost::system range-v3 fmt::fmt-header-only)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the URL-encoding helpers are things I'd put in a separate commit. They should stand out as its own thing in history.

Comment on lines +53 to +58
/// Decodes a URI with respect to %XX notation.
/// No URI-validity verification is performed but simply the URI decoded into non-escaping characters.
std::string decodeURI(std::string const& _uri);

/// Encodes a string into a URI conform notation.
std::string encodeURI(std::string const& _uri);
Copy link
Member

@cameel cameel Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that these don't have to be URIs (it works for arbitrary strings), I'd call it something like urlEncode() and urlDecode() (or even urlencode() and urldecode()). I associate the term "urlencoding" specifically with this kind of encoding while "encoding an URI" could mean any encoding. I would probably look for the former I wanted to find it.

Comment on lines +53 to +58
/// Decodes a URI with respect to %XX notation.
/// No URI-validity verification is performed but simply the URI decoded into non-escaping characters.
std::string decodeURI(std::string const& _uri);

/// Encodes a string into a URI conform notation.
std::string encodeURI(std::string const& _uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Decodes a URI with respect to %XX notation.
/// No URI-validity verification is performed but simply the URI decoded into non-escaping characters.
std::string decodeURI(std::string const& _uri);
/// Encodes a string into a URI conform notation.
std::string encodeURI(std::string const& _uri);
/// Decodes a percent-encoded string (i.e. the %XX notation).
/// No URI-validity verification is performed but simply the URI decoded into non-escaping characters.
std::string decodeURI(std::string const& _uri);
/// Percent-encodes a string.
/// Note that `:` and `/` are preserved, which means that the function will not mangle the `protocol://`
/// prefix in URLs.
std::string encodeURI(std::string const& _uri);

Comment on lines +227 to +228
BOOST_CHECK_EQUAL(util::decodeURI(""), "");
BOOST_CHECK_EQUAL(util::decodeURI(""), "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate test.

{
BOOST_CHECK_EQUAL(util::decodeURI(""), "");
BOOST_CHECK_EQUAL(util::decodeURI(""), "");
BOOST_CHECK_EQUAL(util::decodeURI(" ")," ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BOOST_CHECK_EQUAL(util::decodeURI(" ")," ");
BOOST_CHECK_EQUAL(util::decodeURI(" "), " ");

Comment on lines +237 to +242
// Decoding failure cases (there's not really a standard for that).
BOOST_CHECK_EQUAL(util::decodeURI("%"), "");
BOOST_CHECK_EQUAL(util::decodeURI("%ZZ"), "ZZ");
BOOST_CHECK_EQUAL(util::decodeURI("%7Ge"), "7Ge");
BOOST_CHECK_EQUAL(util::decodeURI("%2F%2%2F"), "/2/");
BOOST_CHECK_EQUAL(util::decodeURI("%1G/%7G/%FG"), "1G/7G/FG");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to handle these as errors? Having % silently disappear from the end of a string when you pass in something not encoded by accident does not sound like the nicest behavior.

Comment on lines +230 to +242
BOOST_CHECK_EQUAL(util::decodeURI("%25"), "%");
BOOST_CHECK_EQUAL(util::decodeURI("Hello%20World."), "Hello World.");
BOOST_CHECK_EQUAL(util::decodeURI("Hello World."), "Hello World.");
BOOST_CHECK_EQUAL(util::decodeURI("%01%02%7F"), "\x01\x02\x7F");
BOOST_CHECK_EQUAL(util::decodeURI("C%3A%5Creadme.md"), "C:\\readme.md");
BOOST_CHECK_EQUAL(util::decodeURI("C:/readme.md"), "C:/readme.md");

// Decoding failure cases (there's not really a standard for that).
BOOST_CHECK_EQUAL(util::decodeURI("%"), "");
BOOST_CHECK_EQUAL(util::decodeURI("%ZZ"), "ZZ");
BOOST_CHECK_EQUAL(util::decodeURI("%7Ge"), "7Ge");
BOOST_CHECK_EQUAL(util::decodeURI("%2F%2%2F"), "/2/");
BOOST_CHECK_EQUAL(util::decodeURI("%1G/%7G/%FG"), "1G/7G/FG");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a few more?

  • A Windows path with a file:// prefix.
  • %😃
  • %Z

@@ -747,7 +752,9 @@ def run_testcase(self, testcase: TestParser.RequestAndResponse):
if isinstance(actualResponseJson["result"], list):
for result in actualResponseJson["result"]:
if "uri" in result:
result["uri"] = result["uri"].replace(self.suite.project_root_uri + "/" + self.sub_dir + "/", "")
result["uri"] = decodeURI(result["uri"]).replace(
self.suite.project_root_uri + "/" + self.sub_dir + "/", ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.suite.project_root_uri + "/" + self.sub_dir + "/", ""
self.suite.project_root_uri + "/" + self.sub_dir + "/",
""

@github-actions
Copy link

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language server stale The issue/PR was marked as stale because it has been open for too long.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

LSP: textDocument/definition always returns "result":[]
6 participants