-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor and improve URI implementation #6323
Refactor and improve URI implementation #6323
Conversation
1afc004
to
165ae8d
Compare
describe ".parse" do | ||
# scheme | ||
assert_uri("http:", scheme: "http") | ||
it { URI.parse("HttP:").should eq(URI.new(scheme: "http")) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are missing a spec of what the result of #scheme
on a missing scheme is. Some tests below lacks scheme but none tests what #scheme
is in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean. There are many specs for no-scheme URIs. For example assert_uri("/foo", path: "/foo")
. It ensures that the URI parsed from "/foo"
has a path of /foo
and all other properties at the default value. This means scheme
should be nil
.
The test failures make me wonder if |
Nah, the specs are only failing because of the previous incorrect behavior of ommitting the port if it was specified but equal to the default. Comparing two |
|
I'm not saying we should copy Ruby... but specifying a default port when it's the default one, and keeping it around... I don't know... there's no difference, really. |
@asterite Yeah, Ruby does that. But it also uses a scheme-specific type Besides, just a few counter examples: u, err := url.Parse("http://foo.org:80")
fmt.Println(u.Port()) // => 80 System.out.println(new URI("http://foo.com:80").getPort()); // => 80 urlparse('http://foo.com:80').port # => 80 |
For well-known schemes, probably not. But when you're using a custom scheme, it might not be given that the next consumer of a URI has the same default port configured.
|
Sounds good |
0a1513e
to
dd2d8bc
Compare
6cd19ee
to
ddda611
Compare
🏓 A second review would be great! |
I'd really like to get this merged for 0.28.0. It's so annoying to have to work around URI bugs I fixed six months ago. |
I'm with @straight-shoota please let's get this merged |
it { URI.new("http", "www.example.com", 80, "/hello", "a=1").to_s.should eq("http://www.example.com/hello?a=1") } | ||
it { URI.new("mailto", opaque: "[email protected]").to_s.should eq("mailto:[email protected]") } | ||
|
||
it "removes default port" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not normalize ports from default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no generic default ports in the URI specification. It depends solely on configuration which port is considered "default".
When no port is specified, the resolver is entrusted to choose whichever the port it seems fit. But when a port is explicitly specified, the resolver is asked to use that specific port, regardless of it's configuration for default ports.
Thus removing a port (even if it is considered default) mutates the URI. It looses information.
A well-formed URI should not be automatically stripped of anything unless explicitly asked (for example using #normalize
).
8f1be0b
to
a88776b
Compare
I think it makes sense to keep the port if the URI has it set explicitly, even if it's the default port for that scheme. |
a88776b
to
19be7a1
Compare
Rebased and ready to ship. |
@straight-shoota should all the |
I just realized that might have been out of scope for this refactor but thanks for moving the parse methods to private. |
@wontruefree Yeah I thought so at first. But then, what the heck... it's a simple thing and fits to the set of changes in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a rebase and fix some conflict and we can merge this.
Every URI has a path, but it may be empty. Making the ivar non-nilable avoids any ambiguity whether an empty path is represented by an empty string (`""`) or `nil`.
This macro was previously used to spec internals of URI::Parser, but it no longer works because the struct can't be inherited. It doesn't make much sense to spec the internal API anyway. Having unit tests for URI.parse is sufficient. The specs for URI::Parser integrated into the specs for URI.parse.
Previously, `#to_s` would omit the port if it matches the default port, even if it is explicitly specified. This ommitance changes the URI and must be avoided, unless explicitly asked to normalize. Changes `#normalize` to remove the port if matches the schema's default port.
It is not necessary to have an extra ivar for an opaque path, URI#path is sufficient and can hold the path of both opaque and hiearchicyl URIs. Adds URI#opaque? to tell whether the URI is considered opaque.
URI schemes are case-insensitive, this ensures that. This approach is recommended by RFC 3986.
URI.prase decodes escape sequences in host, URI#to_s applies them where necessary. This is already some implicity normalization but results in the same URI, although in a differen string representation.
An empty port was previously interpreted as port `0`. Now it's `nil`.
An URI starting with three slashes (`///`) has an empty authority but it's still a hierarchical URI. `host` must be the empty string, not `nil`.
`#to_s` inserts a slash if `path` does not start with one and follows a non-nil authority. Paths starting with double slash (`"//"`) are prefixed by ("/.`) if it does not follow an authority. Otherwise the URI would be considered relative to the schema and the first path component interpreted as authority.
05e186b
to
b9acfe7
Compare
This PR applies a ton of mostly small improvements to
URI
andURI::Parser
.URI#path
is changed toString
. It is no longer nilable. The semantic is: every URI has a path, but it may be empty. This avoids ambiguity between empty path and nil path.URI::Parser.step
is removed. It was previously used to spec internals ofURI::Parser
but it no longer works because the struct can't be inherited. It doesn't make much sense to spec the internal API anyway, having unit tests forURI.parse
is totally fine. Consequently, I have removed the specs forURI::Parser
and integrated them into the specs forURI.parse
.URI#opaque
. It is simply not necessary to have an extra ivar for this. It is replaced byURI#path
which contains both a path in the hierarchical part as well as a non-hierarchical, opaque path.URI#opaque?
answers if the URI is considered opaque.#to_s
no longer hides the port if it is the default port. A default port can now be stripped using#normalize
.#parse
downcases thescheme
to make it case-insensitive. This is recommended by RFC 3986.#parse
decodes escape sequences inhost
, and#to_s
applies them.#parse
correctly parses an empty port. Previously,scheme://host:/
would setport
to0
, now it'snil
.#parse
parses a string starting with three slashes as empty host. Previously,host
wasnil
.#to_s
inserts a slash ifpath
does not start with one and follows an authority. Paths starting with double slash//
are prefixed by/.
if it does not follow an authority. Otherwise the first path component would be interpreted as authority.