-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match NO_PROXY
formats to libcurl behaviour
#501
Conversation
// For example: "example.com" should match "http://example.com" and | ||
// "http://www.example.com" but not "http://notanexample.com". | ||
regex.Append(@"(\.|\:\/\/)"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we care about things like "http://*.example.com" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered in the case of the \.
arm of the group. See this example: https://regex101.com/r/Q1kE2R/1
There is also a unit test for this case "www.example.com".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does curl strip off the "http" vs "https" so we don't need to worry about one being handled one way and the other differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This group at the start says either it must be .
and then the user value "example.com" OR it must form the start of the hostname. I'd like to have used ^
here, but the .NET code actually matches against the full URL (so it includes "scheme://hostname"). To match the "start" of the hostname, I use the URL scheme delimiter ://
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WRT "http://*.example.com" my concern was that we have explicit code to strip off leading "star dot", but an inline "star" here will pass thru (and maybe be escaped).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right.. so where the user has set NO_PROXY=http://*.example.com
? I.e., they've put both the protocol/scheme prefix and an inline *.
? That is not supported, at least AFAIK, with libcurl either.
Or did you mean NO_PROXY=example.*.com
? In that case we would escape the .*.
to \.\*\.
, which I would again consider "invalid". libcurl says it does not support any wildcards except just NO_PROXY=*
(which turns off the proxying all together).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing a leading *.
is a convenience here. Not sure what libcurl would do here. I know it does ignore a leading .
(for example .example.com == example.com
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about NO_PROXY=http://*.example.com
and was afraid that we'd get an escaped *
passed to the RegEx. I was not thinking about the example.*.com
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that libcurl does not match NO_PROXY
values that include the scheme (http[s]://
) to hosts.
$ https_proxy=http://localhost:8080/proxy no_proxy=https://google.com curl -v https://google.com
* Uses proxy env variable no_proxy == 'https://google.com'
* Uses proxy env variable https_proxy == 'http://localhost:8080/proxy'
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 8080 failed: Connection refused
* Failed to connect to localhost port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 8080: Connection refused
Above I set the HTTPS proxy to "http://localhost:8080/proxy", and a NO_PROXY
value of https://google.com
. Attempting to make the request to https://google.com
I am still attempting to proxy via the fake localhost
proxy. The NO_PROXY
value is not matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, libcurl doesn't match with no_proxy=*.google.com
either:
$ https_proxy=http://localhost:8080/proxy no_proxy=*.google.com curl -v https://google.com
* Uses proxy env variable no_proxy == '*.google.com'
* Uses proxy env variable https_proxy == 'http://localhost:8080/proxy'
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
[...]
.. so GCM is actually allowing more formats than libcurl does here. I don't think this is a major problem since it's a superset? We can also just drop the lead *.
trimming code?
@jeffhostetler what do you think?
Note that no_proxy=.google.com
and no_proxy=google.com
work as expected (the request to https://google.com
is not proxied).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just a small suggested addition to your test data.
src/shared/Microsoft.Git.CredentialManager.Tests/SettingsTests.cs
Outdated
Show resolved
Hide resolved
f466150
to
8b19495
Compare
We aim to be compatible with the behaviour of Git as much as possible when it comes to network settings. This enables users to setup Git proxy settings and get the same setup "for free" with GCM. Git uses libcurl to provide it's HTTP interactions. The NO_PROXY setting is used by libcurl to disable proxy settings for specific hosts. We previously attempted to plumb the value of NO_PROXY through to the .NET WebProxy class' list of "bypassed addresses" (the set of hosts that should not be proxied). However, the .NET class expects a set of _regular expressions_ which is unlike libcurl! As a result, libcurl permitted values for NO_PROXY were throwing errors inside of GCM since they are not valid regexs. In this commit we perform a transformation of the NO_PROXY list and construct a set of regular expressions that match addresses in the same way as libcurl does. The transformation is as follows: 1. strip any leading periods '.' or wildcards '*.' 2. escape the remaining input to match literally (e.g.: '.' becomes '\.') 3. prepend a group that matches either a period '.' or a URI scheme delimiter '://' - this prevents partial domain matching 4. append a end-of-string symbol '$' to ensure we only match to the specified TLD and port See the libcurl documentation on NO_PROXY behaviour: https://curl.se/libcurl/c/CURLOPT_NOPROXY.html
8b19495
to
8cd79a3
Compare
We aim to be compatible with the behaviour of Git as much as possible when it comes to network settings. This enables users to setup Git proxy settings and get the same setup "for free" with GCM.
Git uses libcurl to provide it's HTTP interactions. The
NO_PROXY
setting is used by libcurl to disable proxy settings for specific hosts.We previously attempted to plumb the value of NO_PROXY through to the .NET
WebProxy
class' list of "bypassed addresses" (the set of hosts that should not be proxied). However, the .NET class expects a set of regular expressions which is unlike libcurl!As a result, libcurl permitted values for
NO_PROXY
were throwing errors inside of GCM since they are not valid regexs.In this commit we perform a transformation of the
NO_PROXY
list and construct a set of regular expressions that match addresses in the same way as libcurl does.The transformation is as follows:
.
or wildcards*.
.
becomes\.
)://
- this prevents partial domain matching$
to ensure we only match to the specified TLD and portSee the libcurl documentation on
NO_PROXY
behaviour: https://curl.se/libcurl/c/CURLOPT_NOPROXY.htmlFixes #497