Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect protocol/hostname parsing #203

Closed
Przytua opened this issue May 4, 2021 · 4 comments · Fixed by #204
Closed

Incorrect protocol/hostname parsing #203

Przytua opened this issue May 4, 2021 · 4 comments · Fixed by #204

Comments

@Przytua
Copy link

Przytua commented May 4, 2021

According to URL and URI definitions, after the scheme (or protocol) followed by colon (:), there is an optional authority (or hostname) component preceded by two slashes (//).

The File URI describes exactly how many slashes can be a part of the authority as:

A valid file URI must therefore begin with either file:/path (no hostname), file:///path (empty hostname), or file://hostname/path.

When parsing an iOS file URL which looks like this:

file:///Users/username/Library/Developer/CoreSimulator/Devices/00000000-0000-0000-0000-000000000000/data/Containers/Data/Application/00000000-0000-0000-0000-000000000000/Documents/Inbox/dummy.pdf

the parser incorrectly takes all 3 slashes as a host predecessor (due to protocol regex looking for ([\\/]{1,})?, so 1 or more slashes), and takes the Users as a host, and the rest (starting with /username) as the path.
To properly parse this file URL, only 2 slashes should be used as a host predecessor, while 3rd slash should be the beginning of a path (thus resulting with an empty host).

Exact issue that this is causing is that the host is lowercased, and toString function always uses 2 slashes when building the string, so parsing above file URL, and using toString on it results with:

file://users/username/Library/Developer/CoreSimulator/Devices/00000000-0000-0000-0000-000000000000/data/Containers/Data/Application/00000000-0000-0000-0000-000000000000/Documents/Inbox/dummy.pdf

which is an incorrect file URL on iOS, and cannot be accessed.

Changing the protocolre from /^([a-z][a-z0-9.+-]*:)?([\\/]{1,})?([\S\s]*)/i to /^([a-z][a-z0-9.+-]*:)?([\\/]{2})?([\S\s]*)/i (so that ([\\/]{2})? will take 0 or 2 slashes) fixes this issue.

@lpinca
Copy link
Member

lpinca commented May 4, 2021

It actually depends on the protocol and url-parse is not smart enough to handle the file: protocol correctly.

> new URL('http:///Users/username/Library/dummy.pdf').toString()
'http://users/username/Library/dummy.pdf'
> new URL('file:///Users/username/Library/dummy.pdf').toString()
'file:///Users/username/Library/dummy.pdf'

It worked as expected before d1e7e88

> var parse = require('.');
undefined
> parse('file:///Users/username/Library/dummy.pdf').toString();
'file:///Users/username/Library/dummy.pdf'

@krassowski
Copy link

Hi @lpinca this issue seems to be causing some headache for us in https://github.com/krassowski/jupyterlab-lsp/issues/595 (if my understanding of things is correct). We would prefer to avoid reverting back to an earlier version as it was labelled as a security fix (and because we don't have a direct control over version being used as it is a peer dependency in a federated extensions system).

Would the regexp change suggested by @Przytua work for you? If yes, I would be happy to open a PR implementing the fix together with a unit test, and we would greatly appreciate if it could make it into 1.5.2 release soon.

@lpinca
Copy link
Member

lpinca commented May 13, 2021

I think it will invalidate some of the existing tests.

@lpinca
Copy link
Member

lpinca commented May 13, 2021

@Przytua @krassowski the following patch

diff --git a/index.js b/index.js
index 72b27c0..61ba3c1 100644
--- a/index.js
+++ b/index.js
@@ -224,7 +224,9 @@ function Url(address, location, parser) {
   // When the authority component is absent the URL starts with a path
   // component.
   //
-  if (!extracted.slashes) instructions[3] = [/(.*)/, 'pathname'];
+  if (!extracted.slashes || url.protocol === 'file:') {
+    instructions[3] = [/(.*)/, 'pathname'];
+  }
 
   for (; i < instructions.length; i++) {
     instruction = instructions[i];
@@ -288,7 +290,10 @@ function Url(address, location, parser) {
   // Default to a / for pathname if none exists. This normalizes the URL
   // to always have a /
   //
-  if (url.pathname.charAt(0) !== '/' && url.hostname) {
+  if (
+    url.pathname.charAt(0) !== '/' &&
+    (url.hostname || url.protocol === 'file:')
+  ) {
     url.pathname = '/' + url.pathname;
   }
 
@@ -430,7 +435,7 @@ function toString(stringify) {
 
   if (protocol && protocol.charAt(protocol.length - 1) !== ':') protocol += ':';
 
-  var result = protocol + (url.slashes ? '//' : '');
+  var result = protocol + (url.slashes || url.protocol === 'file:' ? '//' : '');
 
   if (url.username) {
     result += url.username;

fixes the isse, but I'm not sure if it has side effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants