Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: add JoinPath, URL.JoinPath #47005

Closed
wanglong001 opened this issue Jul 1, 2021 · 21 comments
Closed

net/url: add JoinPath, URL.JoinPath #47005

wanglong001 opened this issue Jul 1, 2021 · 21 comments

Comments

@wanglong001
Copy link

wanglong001 commented Jul 1, 2021

Url Join is often used

Unsatisfactory using path.Join: http://www -> http:/www

I want to add Join method to url package

// Join  concatenates baseUrl and the elements
// - check baseUrl format
// - concatenates baseUrl and the elements
func Join(baseUrl string, elem ...string) (result string, err error) {
	url, err := Parse(baseUrl)
	if err != nil {
		return
	}
	if len(elem) > 0 {
		elem = append([]string{url.Path}, elem...)
		url.Path = path.Join(elem...)
	}
	result = url.String()
	return
}

unit test

func TestJoin(t *testing.T) {
	type args struct {
		baseUrl string
		elem    []string
	}
	tests := []struct {
		name       string
		args       args
		wantResult string
		wantErr    bool
	}{
		{
			name: "test normal url",
			args: args{
				baseUrl: "https://go.googlesource.com",
				elem:    []string{"go"},
			},
			wantResult: "https://go.googlesource.com/go",
			wantErr:    false,
		},
		{
			name: "test .. parent url",
			args: args{
				baseUrl: "https://go.googlesource.com/a/b/c",
				elem:    []string{"../../../go"},
			},
			wantResult: "https://go.googlesource.com/go",
			wantErr:    false,
		},
		{
			name: "test . cul path",
			args: args{
				baseUrl: "https://go.googlesource.com/",
				elem:    []string{"./go"},
			},
			wantResult: "https://go.googlesource.com/go",
			wantErr:    false,
		},
		{
			name: "test multiple Separator",
			args: args{
				baseUrl: "https://go.googlesource.com//",
				elem:    []string{"/go"},
			},
			wantResult: "https://go.googlesource.com/go",
			wantErr:    false,
		},
		{
			name: "test more elems",
			args: args{
				baseUrl: "https://go.googlesource.com//",
				elem:    []string{"/go", "a", "b", "c"},
			},
			wantResult: "https://go.googlesource.com/go/a/b/c",
			wantErr:    false,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			gotResult, err := Join(tt.args.baseUrl, tt.args.elem...)
			if (err != nil) != tt.wantErr {
				t.Errorf("Join() error = %v, wantErr %v", err, tt.wantErr)
				return
			}
			if gotResult != tt.wantResult {
				t.Errorf("Join() = %v, want %v", gotResult, tt.wantResult)
			}
		})
	}
}
@gopherbot gopherbot added this to the Proposal milestone Jul 1, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/332209 mentions this issue: net/url: add Join method

@earthboundkid
Copy link
Contributor

I actually just implemented something like this for my request client library: earthboundkid/requests@cbdbf91

I think a slightly better name is JoinPaths because it shouldn't do any host changing, just the path.

@earthboundkid
Copy link
Contributor

earthboundkid commented Jul 1, 2021

I also propose that it should be a method on URL, rather than package level function.

// JoinPaths joins the provided path to any existing Path 
// and cleans the result of any ./ or ../ elements.
func (u *URL) JoinPaths(paths ...string) {
	if len(paths) == 0 { 
		return
	}
	paths = append([]string{url.Path}, paths...)
	url.Path = path.Join(paths...)
	return
}

@earthboundkid
Copy link
Contributor

One complication: I think it's quite reasonable to expect that paths which begin with / will be interpreted as absolute (which is what URL.ResolveRef does), but that isn't what you implemented and might be surprising to users either way.

@wanglong001
Copy link
Author

Like this topic

image

It continues to be paid attention to. Path.Join cannot be used directly. Generally, the string type is converted to the url type, then the error is checked, and then ResolveReference or path.join, and finally the complete url path is output. For ordinary users, this process is very repetitive

@earthboundkid
Copy link
Contributor

I agree that this is probably worth doing on some level, but it's not totally clear what semantics users expect. For example, if we used ResolveReference as the standard, we get very different results:

func Join(u *url.URL, path string) {
	*u = *u.ResolveReference(&url.URL{Path: path})
}

func main() {
	u, _ := url.Parse("http://example.com")
	fmt.Println(u)
	for _, p := range []string{"a/", "b/", "c", "d", "../e", "/f"} {
		Join(u, p)
		fmt.Println(u)
	}
}

Results:

http://example.com
http://example.com/a/
http://example.com/a/b/
http://example.com/a/b/c
http://example.com/a/b/d
http://example.com/a/e
http://example.com/f

@wanglong001
Copy link
Author

Yes, using ResolveReference will indeed confuse,
In my opinion, the function of url.Join should be similar to path.Join.
The same is true for the literal understanding

@earthboundkid
Copy link
Contributor

Here is a note on surprising behavior with Python’s url.parse: https://utcc.utoronto.ca/~cks/space/blog/python/UrllibParsePartialURLs

Now suppose someone accidentally creates a URL for a web page of yours that looks like 'https://example.org//your/page/url' (with two slashes after the host instead of one) and visits it, and you attempt to decode the result of what Apache will hand you:

>>> urllib.parse.urlparse("//your/page/url")
ParseResult(scheme='', netloc='your', path='/page/url', params='', query='', fragment='')

The problem here is that '//ahost.org/some/path' is a perfectly legal protocol-relative URL, so that's what urllib.parse will produce when you give it something that looks like one, which is to say something that starts with '//'.

This could be a problem for url.Join as well. Either assuming that // is wrong and should be / or assuming it’s right could lead to surprising behavior depending on the context.

@rsc
Copy link
Contributor

rsc commented Oct 27, 2021

If we do add this, it seems like Join should be restricted to manipulating the path, as in the top StackOverflow answer:

u, err := url.Parse("http://foo")
u.Path = path.Join(u.Path, "bar.html")
s := u.String() // prints http://foo/bar.html

That is, we should not be doing anything with or like ResolveReference, which has to deal with full URL syntax as the second argument. Maybe we should make that clear by calling it url.JoinPath?

So the API would be:

func JoinPath(baseURL string, elem ...string) (string, error)
func (u *URL) JoinPath(elem ...string) (*URL, error) 

Do I have that right?

@rsc
Copy link
Contributor

rsc commented Oct 27, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@wanglong001
Copy link
Author

@rsc Thank you for your answer

Does this API need to return a new object URL? Or modify the Path directly on the original object, which of these two methods is more suitable?

func (u *URL) JoinPath(elem ...string) (*URL, error) 

1

func (u *URL) JoinPath(elem ...string)  *URL{
       url := URL{
             Scheme : u.Scheme
	     Opaque : u.Opaque
	     User   : u.User
	     Host   : u.Host
	     Path       : u.Path
	     RawPath     : u.RawPath
	     ForceQuery  : u.ForceQuery
	     RawQuery     : u.RawQuery
	     Fragment    : u.Fragment
	     RawFragment  : u.RawFragment
        }
	if len(elem) > 0 {
		elem = append([]string{u.Path}, elem...)
		url.Path = path.Join(elem...)
	}
      return  &url
}

func JoinPath(baseUrl string, elem ...string) (result string, err error) {
	url, err := Parse(baseUrl)
	if err != nil {
		return
	}
       urlAfterJoin := url.JoinPath(elem...)
	result = urlAfterJoin.String()
	return
}

2

func (u *URL) JoinPath(elem ...string) {
	if len(elem) > 0 {
		elem = append([]string{u.Path}, elem...)
		u.Path = path.Join(elem...)
	}
}

func JoinPath(baseUrl string, elem ...string) (result string, err error) {
	url, err := Parse(baseUrl)
	if err != nil {
		return
	}
	url.JoinPath(elem...)
	result = url.String()
	return
}

@rsc rsc changed the title proposal: net/url: add URL.Join method proposal: net/url: add JoinPath, URL.JoinPath Nov 3, 2021
@rsc
Copy link
Contributor

rsc commented Nov 3, 2021

I don't believe we have any methods on URL today that modify the receiver (except Unmarshal).
It seems like a mistake to start.
Especially since path.Join returns a new thing rather than modifying anything.

Any other objections to the API in #47005 (comment)?

@wanglong001
Copy link
Author

@rsc hi, I’m not sure if I really understand what you mean, I resubmitted the code, please help me review it again, thank you

@earthboundkid
Copy link
Contributor

This is a cleaner implementation, but I'm not sure if package url can import path without causing a dependency cycle somewhere:

func (u *URL) JoinPath(elem ...string)  *URL{
	u2 := *u
	if len(elem) > 0 {
		elem = append([]string{u.Path}, elem...)
		u2.Path = path.Join(elem...)
	}
      return  &u2
}

func JoinPath(baseUrl string, elem ...string) (result string, err error) {
	u, err := Parse(baseUrl)
	if err != nil {
		return
	}
    
	return u.JoinPath(elem...).String()
}

@rsc
Copy link
Contributor

rsc commented Nov 10, 2021

It should be OK for net/url to import path. path definitely doesn't import url.

@rsc
Copy link
Contributor

rsc commented Nov 10, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: net/url: add JoinPath, URL.JoinPath net/url: add JoinPath, URL.JoinPath Dec 1, 2021
@rsc rsc modified the milestones: Proposal, Backlog Dec 1, 2021
earthboundkid pushed a commit to earthboundkid/go that referenced this issue Dec 29, 2021
concatenates baseUrl and the elements

Fixes golang#47005

Change-Id: I71d638be9d451435dddf135091d8b1db54d174fd
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/374654 mentions this issue: net/url: add JoinPath, URL.JoinPath

earthboundkid pushed a commit to earthboundkid/go that referenced this issue Mar 4, 2022
concatenates baseUrl and the elements

Fixes golang#47005

Change-Id: I71d638be9d451435dddf135091d8b1db54d174fd
@dmitshur dmitshur modified the milestones: Backlog, Go1.19 Mar 10, 2022
@drigz
Copy link

drigz commented Mar 31, 2022

Thank you for adding this, looks useful! Is it intended that the new method strips trailing / characters? url.JoinPath("http://host/foo", "bar/") == "http://host/foo/bar", nil

This is not an important distinction for filesystem paths but sometimes makes an important difference for URLs, depending on the server, so I can imagine it surprising the user. The behavior isn't documented for JoinPath in the linked change.

@earthboundkid
Copy link
Contributor

Hmm, it does seem like it should preserve the trailing slash, IMO. File a new issue so we can decide this before 1.19 ships.

@drigz
Copy link

drigz commented Mar 31, 2022

Done, thanks! #52074

@rsc rsc moved this to Accepted in Proposals Aug 10, 2022
@rsc rsc added this to Proposals Aug 10, 2022
@rsc rsc removed this from Proposals Mar 15, 2023
@golang golang locked and limited conversation to collaborators Mar 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants