cmd/go: go get fails on non-ASCII github packages #18660
Comments
I'll add some additional information about this bug below. This affects I've made a smaller reproduction repo (@rsc is welcome to fork it to his collection if desired; consider my repo temporary): https://github.com/dmitshur-test/go-get-issue-unicode Note that this issue affects only import paths that are statically known (GitHub, Bitbucket, etc.). It's not an issue for vanity import paths. E.g., here is an alternative vanity import path that works without issues:
You can
$ go get -u dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
$ go install dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
$ go test dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
ok dmitri.shuralyov.com/test/go-get-issue-unicode/испытание 0.014s
$ go doc dmitri.shuralyov.com/test/go-get-issue-unicode/испытание
package испытание // import "dmitri.shuralyov.com/test/go-get-issue-unicode/испытание"
Package испытание demonstrates Unicode capabilities in Go source code.
type Эксперимент struct{ ... }
func Испытание() Эксперимент
You can also call
However, both
It's not easy to expand the character range to Unicode. Go's identifiers don't accept some Unicode classes. For example, IDEOGRAPHIC SPACE is.
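The point about character classes can be checked against Go's own Unicode tables. The sketch below is illustrative only and is not taken from cmd/go; the helper names `isLetter`, `isSpaceSeparator`, and `lettersOnly` are made up for this example. It shows that IDEOGRAPHIC SPACE (U+3000) belongs to the space separator category (Zs) rather than to a letter category, so a pattern restricted to Unicode letters would not accept it.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// ideographicSpace is U+3000 IDEOGRAPHIC SPACE.
const ideographicSpace = '\u3000'

// lettersOnly matches strings that consist solely of Unicode letters.
var lettersOnly = regexp.MustCompile(`^\p{L}+$`)

// isLetter reports whether r is in a Unicode letter category (L).
func isLetter(r rune) bool { return unicode.IsLetter(r) }

// isSpaceSeparator reports whether r is in the space separator category (Zs).
func isSpaceSeparator(r rune) bool { return unicode.Is(unicode.Zs, r) }

func main() {
	fmt.Println(isLetter('и'), isLetter(ideographicSpace))   // true false
	fmt.Println(isSpaceSeparator(ideographicSpace))          // true
	fmt.Println(lettersOnly.MatchString("испытание"))        // true
	fmt.Println(lettersOnly.MatchString("испы\u3000тание")) // false
}
```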
@mattn, that's not a blocker for this bug. We can match on
@bradfitz please try this. https://github.com/mattn/misc/blob/master/foo.go the first import
If you think
For golang/go#18660 and golang/gddo#468. Will be deleted after those issues are resolved.
CL https://golang.org/cl/41750 mentions this issue.
Background

The following is a valid vanity import path that works without issues:

dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание

You can go get, go install, go test, go doc it without issues:

$ go get -u dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
$ go install dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
$ go test dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
ok dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание 0.014s
$ go doc dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
package испытание // import "dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание"
Package испытание demonstrates Unicode capabilities in Go source code.
type Эксперимент struct{ ... }
func Испытание() Эксперимент

You can also call vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false) (from golang.org/x/tools/go/vcs) successfully on the vanity import path:

$ goexec 'vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false)'
(*vcs.RepoRoot)(&vcs.RepoRoot{
	VCS: (*vcs.Cmd)(&vcs.Cmd{
		Name: (string)("Git"),
		Cmd: (string)("git"),
		CreateCmd: (string)("clone {repo} {dir}"),
		DownloadCmd: (string)("pull --ff-only"),
		TagCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
			(vcs.TagCmd)(vcs.TagCmd{
				Cmd: (string)("show-ref"),
				Pattern: (string)("(?:tags|origin)/(\\S+)$"),
			}),
		}),
		TagLookupCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
			(vcs.TagCmd)(vcs.TagCmd{
				Cmd: (string)("show-ref tags/{tag} origin/{tag}"),
				Pattern: (string)("((?:tags|origin)/\\S+)$"),
			}),
		}),
		TagSyncCmd: (string)("checkout {tag}"),
		TagSyncDefault: (string)("checkout master"),
		LogCmd: (string)(""),
		Scheme: ([]string)([]string{
			(string)("git"),
			(string)("https"),
			(string)("http"),
			(string)("git+ssh"),
		}),
		PingCmd: (string)("ls-remote {scheme}://{repo}"),
	}),
	Repo: (string)("https://github.com/shurcooL-test/go-get-issue-unicode"),
	Root: (string)("dmitri.shuralyov.com/temp/go-get-issue-unicode"),
})
(interface{})(nil)

However, gosrc.IsValidRemotePath incorrectly reports false for the "dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание" import path.

Fix

gosrc.IsValidRemotePath reports false for such import paths because validPathElement regexp only allows ASCII letters A-Za-z, not Unicode ones. This change fixes that by using a predefined character class, the Unicode character property class \p{L} that describes the Unicode characters that are letters.

Additionally, fix an issue where a query parameter value was not correctly escaped when constructing a URL.

Fixes #468.
Updates golang/go#18660.

References

- https://stackoverflow.com/questions/3617797/regex-to-match-only-letters
- https://stackoverflow.com/questions/6005459/is-there-a-way-to-match-any-unicode-non-alphabetic-character
- https://www.regular-expressions.info/unicode.html#prop

Change-Id: I48680749d827cbc63fefca2c21e9790009f20746
Reviewed-on: https://go-review.googlesource.com/41750
Reviewed-by: Chris Broadfoot <[email protected]>
Reviewed-by: Tuo Shan <[email protected]>
Reviewed-by: Francesc Campoy Flores <[email protected]>
I think I can apply a similar solution to fix this as I did for golang/gddo#468:
See commit golang/gddo@cdd60fa for full details. The fix for
diff --git a/src/cmd/go/internal/get/vcs.go b/src/cmd/go/internal/get/vcs.go
index 7439cc8649..c72d52bc1b 100644
--- a/src/cmd/go/internal/get/vcs.go
+++ b/src/cmd/go/internal/get/vcs.go
@@ -851,7 +851,7 @@ var vcsPaths = []*vcsPath{
// Github
{
prefix: "github.com/",
- re: `^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[A-Za-z0-9_.\-]+)*$`,
+ re: `^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[\p{L}0-9_.\-]+)*$`,
vcs: "git",
repo: "https://{root}",
check: noVCSSuffix,
I tested it, and it works on the original bug report. It doesn't seem necessary to allow Unicode for GitHub username and repository name, because from what I can tell, GitHub doesn't allow those. So a Unicode character can only come up within a directory name of a GitHub repository. I'd rather make the smallest possible change that'll resolve this issue (unless I'm advised otherwise). If that sounds reasonable, I can send a CL that'll resolve this.
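The effect of the one-line change can be tried offline with the two patterns from the diff above. This is a minimal sketch, not the cmd/go implementation: `repoRoot` is a hypothetical helper, and the third import path is an invented example of a non-ASCII user name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the diff: before and after the change.
var (
	oldGitHub = regexp.MustCompile(`^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[A-Za-z0-9_.\-]+)*$`)
	newGitHub = regexp.MustCompile(`^(?P<root>github\.com/[A-Za-z0-9_.\-]+/[A-Za-z0-9_.\-]+)(/[\p{L}0-9_.\-]+)*$`)
)

// repoRoot is a hypothetical helper: it returns the "root" capture group
// and whether importPath matched re at all.
func repoRoot(re *regexp.Regexp, importPath string) (string, bool) {
	m := re.FindStringSubmatch(importPath)
	if m == nil {
		return "", false
	}
	return m[re.SubexpIndex("root")], true
}

func main() {
	paths := []string{
		"github.com/dmitshur-test/go-get-issue-unicode",
		"github.com/dmitshur-test/go-get-issue-unicode/испытание",
		"github.com/пользователь/repo", // invented: non-ASCII user name
	}
	for _, p := range paths {
		_, oldOK := repoRoot(oldGitHub, p)
		root, newOK := repoRoot(newGitHub, p)
		fmt.Printf("%s\n  old: %v  new: %v  root: %q\n", p, oldOK, newOK, root)
	}
}
```

Only the trailing directory group gains \p{L}, so a non-ASCII user or repository name is still rejected, which matches the intent of keeping the change as small as possible.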
Possibly relevant: #20115
@shurcooL, SGTM.
#20115 provides further support for the choice of not allowing Unicode for GitHub usernames and repository names needlessly. That way, only the directory names will be susceptible, but not the actual repository, which helps. @bradfitz Ok, I'll send a CL. I'll work on adding/updating a test case for it. Would it be possible/good idea for @rsc to create a copy of (or fork) my https://github.com/shurcooL-test/go-get-issue-unicode repository, so I could rely on it to be there and use it in my tests? I ask because most go get issue repos are already there. @rsc can rename the repository to
Although, I'm not sure if this really needs a live go get test, I think some offline test cases for
Self-contained non-network test only please. :)
CL https://golang.org/cl/41822 mentions this issue.
CL https://golang.org/cl/42017 mentions this issue.
Manually apply same change as CL 41822 did for cmd/go/internal/get, but for golang.org/x/tools/go/vcs, to help keep them in sync.

Updates golang/go#18660.
Helps golang/go#11490.

Change-Id: I6c7759c073583dea771bc438b70f8c2eb7b5ebfb
Reviewed-on: https://go-review.googlesource.com/42017
Reviewed-by: Brad Fitzpatrick <[email protected]>
Run-TryBot: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
In vcs.go:
Could the regex be made more flexible?