-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
repospec: support ssh urls with ssh certificates #4741
Conversation
|
Welcome @mightyguava! |
Hi @mightyguava. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mightyguava The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/ok-to-test |
/test all |
I can't figure out how to trigger tests... 😞 |
Thanks @natasha41575 for triggering the tests. I've fixed the lint error. I did actually try to lint locally before but it turns out that I had an old version of the linter. |
Hmm the new lint failure is at |
Ok looks like that's getting fixed in #4754 and is unrelated. |
Organizations that use a custom SSH Certificate Authority with GitHub Enterprise Cloud will have a host that starts with `org-12345@` instead of `git@`. This removes the hard-coded `git@` in the repo spec parsing for a regexp match on a valid username instead. E.g. now it will be able to parse specs like `[email protected]:someOrg/someRepo/README.md?ref=someBranch`. Docs [here](https://docs.github.com/en/enterprise-cloud@latest/organizations/managing-git-access-to-your-organizations-repositories/about-ssh-certificate-authorities#about-ssh-urls-with-ssh-certificates)
api/internal/git/repospec.go
Outdated
@@ -212,18 +213,26 @@ func peelQuery(arg string) (string, string, time.Duration, bool) { | |||
return parsed.Path, ref, duration, submodules | |||
} | |||
|
|||
var userRegex = regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9-]*@`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm for sure being pedantic, but:
- Should the
*
be a+
? I'd expect we wouldn't want to match an @ with nothing before it. - Should the
^
be\A
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The character class [a-zA-Z]
already ensures a character before @
. It is standalone from the second character class because the first character cannot be a number or -
.
I've never seen \A
before. Looking at the docs, its behavior is the same as ^
when multiline is not enabled. I think it's better to stick with ^
as it's more common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're totally right about the initial character class 🤦♀️
I think \A
is perfectly standard. I first learned about it in the context of writing safe regexes in a web app context (see the very first entry here for example). Testing some examples with Go indicates that it unexpectedly (to me) doesn't allow the newline already, and indeed it looks like it treats ^
as \A
by default: https://github.com/golang/go/blob/master/src/regexp/syntax/parse.go#L819-L825
api/internal/git/repospec.go
Outdated
// Start accumulating the host part. | ||
// Order matters here. | ||
consumeHostStrings([]string{"git::", "gh:", "ssh://", "https://", "http://", "file://"}) | ||
if p := userRegex.FindString(n); p != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we end up forever supporting everything that we make work, I'm trying to wrap my head around everything this makes possible in addition to what if anything it could break.
- Since we're relying on the
@
in effect, what other valid paths might contain it, and does our parsing of them change before/after? For example,https://username:[email protected]/username/repository.git
seems to parse currently, albeit discarding the username/password parts. - For the purposes of the feature you need, we only need to add support for arbitrary usernames when SSH in particular is being used with Github specifically, but will this make anything else new work? It seems to me arbitrary protocol prefixes will be accepted with arbitrary usernames, but only for Github. Other hosts will continue to only work with the Git user. Should we further restrict this? Any new combos that would work should get test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://username:[email protected]/username/repository.git seems to parse currently, albeit discarding the username/password parts.
I have added a test to preserve the behavior.
I've also updated the regexp to specifically match on SSH only for GitHub.com
n = n[len(p):] | ||
host += p | ||
} | ||
consumeHostStrings([]string{"github.com:", "github.com/"}) | ||
if host == "git@" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing devil's advocate with my own comment above: arguably this conditional here is our current username handling, and we could reduce special casing in the implementation by changing it to apply whenever we've found any username, and allow it to consume any domain, rather than just github.com
above. That would add risk in the form of additional supported permutations we can't specifically integration test... but in another way also reduce it by making use cases like the new one in this PR use the code paths we already have integration coverage for (not to mention reducing the difficulty of understanding this code). Wdyt?
cc @natasha41575 for another opinion on this tradeoff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledging I read this comment but don't have an opinion. I do like the idea! I'd prefer not to include that change in this PR though, since it would probably be much bigger.
@mightyguava: This PR has multiple commits, and the default merge method is: merge. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@KnVerey bump! |
api/internal/git/repospec.go
Outdated
if strings.Contains(s, "github.com") { | ||
if strings.Contains(s, "git@") || strings.Contains(s, "ssh:") { | ||
switch { | ||
case len(m) > 0: | ||
if strings.HasPrefix(s, "git::") && m[1] != "git@" { | ||
return "", fmt.Errorf("git protocol on github.com only allows git@ user") | ||
} | ||
host = m[1] + "github.com:" | ||
case strings.Contains(s, "ssh:"): | ||
host = "[email protected]:" | ||
} else { | ||
default: | ||
host = "https://github.com/" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not use "switch/case".
Reason: "switch/case" is read as the conditions are comparable or in parallel. They should be considered as one or the other (this is also why "fallthrough" is arguable). I don't think this is what your code is trying to deliver, so it's better to not use switch/case to make the code more readability. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand this statement. Go switch statements do not behave like that of other languages. In Go, a switch statement behaves like a series of if/else
statements, which this is replacing. Fallthrough only happens when the explicit fallthrough
keyword is used.
Are you saying that we should avoid switch
in general because it behaves differently in Go from other languages and can be confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. I'm reviewing the PR from the readability perspective (whether the code itself can be self-explained and easy to maintain). That's said, the switch/case here can be used as if/else and the logic is 100% correct, but it seems not very straightforward to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @yuwenma's comment from readability perspective here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easier to read and maintain the code if we do not use regex (or we can use regex group which might be a better option, but go regexp
does not have a good support for that 😞 ).
For example,
func normalizeGitHostSpec(host string) (string, error) {
s := strings.ToLower(host)
if !strings.Contains(s, "github.com") {
// shall we check the "ssh::" as well?
return strings.TrimPrefix(s, "git::"), nil
}
//either switch/case or if condition, defer to you.
switch true {
case s == githubSSH:
return githubSSH, nil
case strings.HasPrefix(s, "git::"):
s = strings.TrimPrefix(s, "git::")
user, _ := strings.Split(s, "@")
if user != "git" {
return "", fmt.Errorf("git protocol on github.com only allows git@ user")
}
return githubSSH
case strings.HasPrefix(s, "ssh://"):
user, _ := strings.Split(s, "ssh://")
return user + "github.com:", nil
case strings.HasPrefix(s, "ssh:"):
return githubSSH
default:
return githubHTTP
}
The main concern I have is that the current code shares the git protocal logic paths. Basically it treats git::
and ssh://
together, and handles git::
with an additional constraints ( only support git@ user) inside the shared logic, and then handles ssh:
and git::
again in the following if conditions. If future features need to change the git::
logic, it will affect the path of ssh://
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"then handles ssh: and git:: again in the following if conditions" --> I mean the second switch/case
(ssh:
) condition and the if statement after the switch/case
(ln291-293)
In my opinion, your suggested option has 5 if/else cases, has slightly more complexity in each case, and is thus less readable and more difficult to maintain.
Feel free to give other code structure solutions. Basically this ssh conditions are complicated, so I'm leaning towards listing each individual case rather than shorter lines of code. It's the similar debate about
return a & b & c & d & e
vs
if !a:
return false
if !b
return false
if !c:
return false
if !d:
return false
if !e:
return false
return true
the one-line code seems to be pretty, but it's always the latter that is easier to maintain and read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't intending to distract the PR review to this discussion that much.
the original code is quite easy to read
if strings.Contains(s, "git@") || strings.Contains(s, "ssh:") {
host = "[email protected]:"
} else {
host = "https://github.com/"
}
I saw that your request is to support one more case, like ssh://[email protected]
. If it's not convincing enough for you to change the code, can you make the minimum change like (psycho code)
if strings.Contains(s, "git@") || strings.Contains(s, "ssh:") {
org := allowOtherOrg(s)
if org != "" {
host = org + "@github.com:
} else {
host = "[email protected]:"
}
} else {
host = "https://github.com/"
}
func allowOtherOrg(s string) string {
// regex for the exact pattern your want to add "^ssh://[a-zA-Z][a-zA-Z0-9-]*@"
// return the org
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for suggesting the readability improvements. I agree that it would be best to make sure the code is as simple and maintainable for the maintainers and future contributors.
I do believe though, the current code is a pretty simple option. Any changes to be made should both make the code simpler and preserve the logic. I think the options suggested so far does one or the other, but not both.
The cases I would like to cover are:
- GOOD -
[email protected]
- GOOD -
ssh://[email protected]
- GOOD -
[email protected]
- GOOD -
ssh://[email protected]
- GOOD -
git::[email protected]
- BAD -
git::[email protected]
Your newest suggestion would support cases 2-5, but would not support case 1. I think would also produce a strange result for case 6 and will return https://github.com/
instead of failing (though I acknowledge that may be the current behavior, but it is incorrect behavior).
In your other suggestion with 5 case statements, there are also a couple things I think are lacking from the maintainability and correctness perspective. I mentioned above but didn't go into detail but the following statement
case strings.HasPrefix(s, "ssh://"):
user, _ := strings.Split(s, "@") // I changed this from "ssh://", assume this is what you intended
return user + "github.com:", nil
would handle the good case of ssh://[email protected]
, producing org-12345
as the user. However it would also handle the bad, but common case of ssh://github.com/sig-kubernetes/kustomize@master
, which would produce user github.com/sig-kubernetes/kustomize
and host github.com/sig-kubernetes/[email protected]
.
I personally think that additional branching is more cognitive load for the reader than a small regular expression like ^(?:ssh://|git::)?([a-zA-Z][a-zA-Z0-9-]*@
. This is actually why I went against @KnVerey's suggestion of using \A
above. Using \A
would be just as correct but I believe would reduce readability, just so ever slightly.
Regarding readability of the switch
statement and whether it should be used, using switch
in place of if/else
statements is idiomatic Go. I acknowledge this may not be the case in Java/C++, but it is a common pattern in Golang. It is in Effective Go. It is also commonly used within the Go standard library, like here and here. There are many other use cases. The Go standard library is widely considered a reference for Go style.
I think we both have the same goals and I appreciate your suggestions. I would also like to keep this codebase maintainable for future authors. I currently don’t see a good way to make the logic simpler while preserving the behavior. If you or maybe @natasha41575 suggest an alternative that is logically equivalent and also simpler, I would be happy to make the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made an update to reduce nested branching. I think it ends up being more complicated but maybe it's more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your newest suggestion would support cases 2-5, but would not support case 1. I think would also produce a strange result for case 6 and will return https://github.com/ instead of failing
Gotcha. My suggestion is from reading the previous and this PR's code intention. This is why I feel like the code logic is kind of twisted and need some readability improvements.
I think your newest code does very good job on the variable names. They are meaningful and explains the code very well. @KnVerey also has some nice suggestions about comments and regex.
Besides that, this PR looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OOO for a few days for the Canadian Thanksgiving holiday. Once @mightyguava has a chance to look at my latest comments, let's focus the next round of reviews on the correctness concerns please.
|
||
// Special treatment for github.com | ||
if strings.Contains(host, "github.com") { | ||
m := githubRegexp.FindStringSubmatch(host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is matching the various inputs we normalize to an SSH Github URL, right? Let's make the regex var name more specific.
I think it would also be helpful to have more comments in this section. For example, on L291, we are returning a normalize HTTPS URL when we've concluded that it is not SSH.
if strings.HasPrefix(s, "git::") { | ||
|
||
// The git:: syntax is meant to force the Git protocol (separate from SSH | ||
// and HTTPS), but we drop it here, to preserve past behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that Github actually does not support the git protocol? I just discovered this myself. https://blog.readthedocs.com/github-git-protocol-deprecation/
realHost = "github.com:" | ||
} | ||
|
||
const gitUser = "git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The git user is referenced several other times in this file, so I'm thinking this should be introduced external to this method (possibly as git@
instead).
|
||
switch { | ||
case isGitProtocol && !isGitUser: | ||
return "", fmt.Errorf("git protocol on github.com only allows git@ user") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I just discovered and mentioned above re: git protocol not being supported at all on Github changes my mind about this error. In light of that, at least at the current point in history, removal of git://
is essentially auto-correcting a removed protocol in favour of one that will still work. And when we're talking SSH, AFAIK we have no way to distinguish between the public Github case where only the git username is valid and the GHE case you are creating this PR for where anything can be valid.
I think this realization lets us simplify back much closer to the original implementation.
} | ||
userName, realHost := m[1], m[2] | ||
|
||
if realHost == "github.com/" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other possibility is that it already ends in a colon right? Can we force the colon directly in the return result and avoid needing to capture the host in the regex at all?
consumeHostStrings([]string{"git::", "gh:", "ssh://", "https://", "http://", "file://"}) | ||
if p := userRegexp.FindString(n); p != "" { | ||
n = n[len(p):] | ||
host += p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm right about the simplification from the git protocol not being valid with Github, then I think the remaining source of correctness risk is for any non-Github URLs that work if they contain usernames today. Notably, I found an interaction between this code and the non-Github handling at L261, which expects the protocol to have been the last thing parsed. For example, the following smoke test passes on master and fails (with poor host parsing) on this branch:
{
name: "t71",
input: "ssh://[email protected]/ourteamname/ourrepositoryname.git//path?ref=branch",
cloneSpec: "ssh://[email protected]/ourteamname/ourrepositoryname.git",
absPath: notCloned.Join("path"),
repoSpec: RepoSpec{
Host: "ssh://[email protected]/",
OrgRepo: "ourteamname/ourrepositoryname",
Path: "/path",
Ref: "branch",
GitSuffix: ".git",
},
},
I suspect we need to make more changes to this method to both make this work and leave the method in a state that makes some sense.
After reading @KnVerey comment , I feel like I might misevaluate the complexity of this PR. I was assuming @mightyguava has a very specific use case he wants to support ( the @mightyguava shall we update the description to cover all of those? (PR description is super helpful to record the original intent and specific use cases we want to support. It helps deciding the future maintenance effort as well) |
I think that your impression is correct--that is the goal of this PR. The additional cases come from the complexity of the existing code, which makes it extremely difficult to add this feature without affecting how other input URLs get parsed. It is important that anything that successfully built in the past will continue to build successfully. Ideally we would also not make anything new possible besides the intended case, but the ways to do that may have trade-offs in terms of increasing the complexity of the already complex code here. Related to that, I think most of us who have interacted with this PR agree that RepoSpec as it stands on master today needs a significant refactor, and that some of the things it supports today are unintentional / don't make sense. Given the risk of breaking changes there, the maintainers team agreed at our last meeting that this would be a good thing to do as part of the next major release, potentially with a pre-release period soliciting feedback on any formats we may have unintentionally removed support for. |
Thank you all for the reviews, @KnVerey @yuwenma @natasha41575. I really appreciate your time and attention. I wanted to give a heads up that unfortunately this PR has significantly exceeded the time I allotted myself to push it to completion, particularly with time spent on #4640. I'm going to have to put this on the backburner and focus more on my day job for a while. I'd still really like to get this change in though, so I plan to circle back after around a month. |
@mightyguava: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Closing in favour of #4986 |
Organizations that use a custom SSH Certificate Authority with GitHub
Enterprise Cloud will have a host that starts with
org-12345@
insteadof
git@
. This removes the hard-codedgit@
in the repo spec parsingfor a regexp match on a valid username instead.
E.g. now it will be able to parse specs like
[email protected]:someOrg/someRepo/README.md?ref=someBranch
.Docs here