Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: go get fails on non-ASCII github packages #18660

Closed
mcandre opened this issue Jan 14, 2017 · 14 comments
Closed

cmd/go: go get fails on non-ASCII github packages #18660

mcandre opened this issue Jan 14, 2017 · 14 comments
Milestone

Comments

@mcandre
Copy link

mcandre commented Jan 14, 2017

$ go get -u github.com/mcandre/toys/go/испытание
package github.com/mcandre/toys/go/испытание: invalid github.com/ import path "github.com/mcandre/toys/go/испытание"

In vcs.go:

var vcsPaths = []*vcsPath{
        // Github                                                                                                                               
        {
                prefix: "github.com/",
                re:     `^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[A-Za-z0-9_.\-]+)*$`,
                vcs:    "git",
                repo:   "https://{root}",
                check:  noVCSSuffix,
        },

Could the regex be made more flexible?

@bradfitz bradfitz changed the title go get fails on Cyrillic package names cmd/go: go get fails on non-ASCII github packages Jan 14, 2017
@dmitshur
Copy link
Contributor

dmitshur commented Jan 18, 2017

I'll add some additional information about this bug below.

This affects golang.org/x/tools/go/vcs package in addition to cmd/go.

I've made a smaller reproduce repo (@rsc is welcome to fork it to his collection if desired; consider my repo temporary):

https://github.com/dmitshur-test/go-get-issue-unicode

Note that this issue affects only import paths that are statically known (GitHub, Bitbucket, etc.). It's not an issue for vanity import paths.

E.g., here is an alternative vanity import path that works without issues:

dmitri.shuralyov.com/test/go-get-issue-unicode/испытание

You can go get, go install, go test, go doc it without issues:

$ go get -u dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
$ go install dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
$ go test dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
ok  	dmitri.shuralyov.com/test/go-get-issue-unicode/испытание	0.014s
$ go doc dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
package испытание // import "dmitri.shuralyov.com/test/go-get-issue-unicode/испытание"

Package испытание demonstrates Unicode capabilities in Go source code.

type Эксперимент struct{ ... }
    func Испытание() Эксперимент

You can also call vcs.RepoRootForImportPath("dmitri.shuralyov.com/test/go-get-issue-unicode/испытание", false) successfully on the vanity import path:

$ goexec 'vcs.RepoRootForImportPath("dmitri.shuralyov.com/test/go-get-issue-unicode/испытание", false)'
(*vcs.RepoRoot)(&vcs.RepoRoot{
	VCS: (*vcs.Cmd)(&vcs.Cmd{
		Name:        (string)("Git"),
		Cmd:         (string)("git"),
		CreateCmd:   (string)("clone {repo} {dir}"),
		DownloadCmd: (string)("pull --ff-only"),
		TagCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
			(vcs.TagCmd)(vcs.TagCmd{
				Cmd:     (string)("show-ref"),
				Pattern: (string)("(?:tags|origin)/(\\S+)$"),
			}),
		}),
		TagLookupCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
			(vcs.TagCmd)(vcs.TagCmd{
				Cmd:     (string)("show-ref tags/{tag} origin/{tag}"),
				Pattern: (string)("((?:tags|origin)/\\S+)$"),
			}),
		}),
		TagSyncCmd:     (string)("checkout {tag}"),
		TagSyncDefault: (string)("checkout master"),
		LogCmd:         (string)(""),
		Scheme: ([]string)([]string{
			(string)("git"),
			(string)("https"),
			(string)("http"),
			(string)("git+ssh"),
		}),
		PingCmd: (string)("ls-remote {scheme}://{repo}"),
	}),
	Repo: (string)("https://github.com/dmitshur-test/go-get-issue-unicode"),
	Root: (string)("dmitri.shuralyov.com/test/go-get-issue-unicode"),
})
(interface{})(nil)

However, both go get and vcs.RepoRootForImportPath() fail on the "github.com/dmitshur-test/go-get-issue-unicode/испытание" import path, because the regexps in vcs.go only allow ASCII letters, not unicode ones.

@mattn
Copy link
Member

mattn commented Jan 18, 2017

It's not easy to expand character range of unicode. go's identifier doesn't accept some unicode classes. For example, IDEOGRAPHIC SPACE is.

@bradfitz bradfitz added this to the Go1.9 milestone Jan 19, 2017
@bradfitz
Copy link
Contributor

@mattn, that's not a blocker for this bug. We can match on [^/]+.

@mattn
Copy link
Member

mattn commented Jan 19, 2017

@bradfitz please try this.

https://github.com/mattn/misc/blob/master/foo.go

the first import github.com/mattn/misc/世界 is okay to import. But second one github.com/mattn/misc/世 界 occur invalid import path. And I guess [^/]+ should not accept .., :, ;, &, % or something characters which can make another problems for each OSs.

@mattn
Copy link
Member

mattn commented Jan 19, 2017

If you think go get doesn't have to matter the letters, and the error should be occured expectedly, i don't have strong opinion. So please fixes this.

dmitshur added a commit to shurcooL/home that referenced this issue Feb 5, 2017
For golang/go#18660 and golang/gddo#468.

Will be deleted after those issues are resolved.
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/41750 mentions this issue.

gopherbot pushed a commit to golang/gddo that referenced this issue Apr 25, 2017
Background

The following is a valid vanity import path that works without issues:

	dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание

You can go get, go install, go test, go doc it without issues:

	$ go get -u dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
	$ go install dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
	$ go test dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
	ok  	dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание	0.014s
	$ go doc dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
	package испытание // import "dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание"

	Package испытание demonstrates Unicode capabilities in Go source code.

	type Эксперимент struct{ ... }
	    func Испытание() Эксперимент

You can also call vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false)
(from golang.org/x/tools/go/vcs) successfully on the vanity import path:

	$ goexec 'vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false)'
	(*vcs.RepoRoot)(&vcs.RepoRoot{
		VCS: (*vcs.Cmd)(&vcs.Cmd{
			Name:        (string)("Git"),
			Cmd:         (string)("git"),
			CreateCmd:   (string)("clone {repo} {dir}"),
			DownloadCmd: (string)("pull --ff-only"),
			TagCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
				(vcs.TagCmd)(vcs.TagCmd{
					Cmd:     (string)("show-ref"),
					Pattern: (string)("(?:tags|origin)/(\\S+)$"),
				}),
			}),
			TagLookupCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
				(vcs.TagCmd)(vcs.TagCmd{
					Cmd:     (string)("show-ref tags/{tag} origin/{tag}"),
					Pattern: (string)("((?:tags|origin)/\\S+)$"),
				}),
			}),
			TagSyncCmd:     (string)("checkout {tag}"),
			TagSyncDefault: (string)("checkout master"),
			LogCmd:         (string)(""),
			Scheme: ([]string)([]string{
				(string)("git"),
				(string)("https"),
				(string)("http"),
				(string)("git+ssh"),
			}),
			PingCmd: (string)("ls-remote {scheme}://{repo}"),
		}),
		Repo: (string)("https://github.com/shurcooL-test/go-get-issue-unicode"),
		Root: (string)("dmitri.shuralyov.com/temp/go-get-issue-unicode"),
	})
	(interface{})(nil)

However, gosrc.IsValidRemotePath incorrectly reports false for the
"dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание" import path.

Fix

gosrc.IsValidRemotePath reports false for such import paths because
validPathElement regexp only allows ASCII letters A-Za-z, not Unicode
ones.

This change fixes that by using a predefined character class, the
Unicode character property class \p{L} that describes the Unicode
characters that are letters.

Additionally, fix an issue where a query parameter value was not
correctly escaped when constructing a URL.

Fixes #468.
Updates golang/go#18660.

References

-	https://stackoverflow.com/questions/3617797/regex-to-match-only-letters
-	https://stackoverflow.com/questions/6005459/is-there-a-way-to-match-any-unicode-non-alphabetic-character
-	https://www.regular-expressions.info/unicode.html#prop

Change-Id: I48680749d827cbc63fefca2c21e9790009f20746
Reviewed-on: https://go-review.googlesource.com/41750
Reviewed-by: Chris Broadfoot <[email protected]>
Reviewed-by: Tuo Shan <[email protected]>
Reviewed-by: Francesc Campoy Flores <[email protected]>
@dmitshur
Copy link
Contributor

dmitshur commented Apr 26, 2017

I think I can apply a similar solution to fix this as I did for golang/gddo#468:

gosrc.IsValidRemotePath reports false for such import paths because
validPathElement regexp only allows ASCII letters A-Za-z, not Unicode
ones.

This change fixes that by using a predefined character class, the
Unicode character property class \p{L} that describes the Unicode
characters that are letters.

References

See commit golang/gddo@cdd60fa for full details.

The fix for cmd/go then becomes:

diff --git a/src/cmd/go/internal/get/vcs.go b/src/cmd/go/internal/get/vcs.go
index 7439cc8649..c72d52bc1b 100644
--- a/src/cmd/go/internal/get/vcs.go
+++ b/src/cmd/go/internal/get/vcs.go
@@ -851,7 +851,7 @@ var vcsPaths = []*vcsPath{
 	// Github
 	{
 		prefix: "github.com/",
-		re:     `^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[A-Za-z0-9_.\-]+)*$`,
+		re:     `^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[\p{L}0-9_.\-]+)*$`,
 		vcs:    "git",
 		repo:   "https://{root}",
 		check:  noVCSSuffix,

I tested it, and it works on the original bug report. It doesn't seem neccessary to allow Unicode for GitHub username and repository name, because from what I can tell, GitHub doesn't allow those:

image

image

So a Unicode character can only come up within a directory name of a GitHub repository.

I'd rather make the smallest possible change that'll resolve this issue (unless I'm advised otherwise).

If that sounds reasonable, I can send a CL that'll resolve this.

@josharian
Copy link
Contributor

Possibly relevant: #20115

@bradfitz
Copy link
Contributor

@shurcooL, SGTM.

@dmitshur
Copy link
Contributor

dmitshur commented Apr 26, 2017

#20115 provides further support for the choice of not allowing Unicode for GitHub usernames and repository names needlessly. That way, only the directory names will be susceptible, but not the actual repository, which helps.

@bradfitz Ok, I'll send a CL. I'll work on adding/updating a test case for it.

Would it be possible/good idea for @rsc to create a copy of (or fork) my https://github.com/shurcooL-test/go-get-issue-unicode repository, so I could rely on it to be there and use it in my tests? I ask because most go get issue repos are already there. @rsc can rename the repository to go-get-issue-18660.

@dmitshur
Copy link
Contributor

dmitshur commented Apr 26, 2017

Although, I'm not sure if this really needs a live go get test, I think some offline test cases for vcsPath path regexp matching could be sufficient.

@bradfitz
Copy link
Contributor

Self-contained non-network test only please. :)

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/41822 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/42017 mentions this issue.

gopherbot pushed a commit to golang/tools that referenced this issue Apr 28, 2017
Manually apply same change as CL 41822 did for cmd/go/internal/get,
but for golang.org/x/tools/go/vcs, to help keep them in sync.

Updates golang/go#18660.
Helps golang/go#11490.

Change-Id: I6c7759c073583dea771bc438b70f8c2eb7b5ebfb
Reviewed-on: https://go-review.googlesource.com/42017
Reviewed-by: Brad Fitzpatrick <[email protected]>
Run-TryBot: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@golang golang locked and limited conversation to collaborators Apr 28, 2018
@golang golang unlocked this conversation Dec 4, 2018
@golang golang locked and limited conversation to collaborators Dec 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants