Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match NO_PROXY formats to libcurl behaviour #501

Merged
merged 1 commit into from
Nov 2, 2021

Conversation

mjcheetham
Copy link
Collaborator

We aim to be compatible with the behaviour of Git as much as possible when it comes to network settings. This enables users to setup Git proxy settings and get the same setup "for free" with GCM.

Git uses libcurl to provide it's HTTP interactions. The NO_PROXY setting is used by libcurl to disable proxy settings for specific hosts.

We previously attempted to plumb the value of NO_PROXY through to the .NET WebProxy class' list of "bypassed addresses" (the set of hosts that should not be proxied). However, the .NET class expects a set of regular expressions which is unlike libcurl!

As a result, libcurl permitted values for NO_PROXY were throwing errors inside of GCM since they are not valid regexs.

In this commit we perform a transformation of the NO_PROXY list and construct a set of regular expressions that match addresses in the same way as libcurl does.

The transformation is as follows:

  1. strip any leading periods . or wildcards *.
  2. escape the remaining input to match literally (e.g.: . becomes \.)
  3. prepend a group that matches either a period '.' or a URI scheme delimiter :// - this prevents partial domain matching
  4. append a end-of-string symbol $ to ensure we only match to the specified TLD and port

See the libcurl documentation on NO_PROXY behaviour: https://curl.se/libcurl/c/CURLOPT_NOPROXY.html

Fixes #497

@mjcheetham mjcheetham added bug A bug in Git Credential Manager network Related to networking (VPN, TCP/IP, etc) labels Oct 20, 2021
// For example: "example.com" should match "http://example.com" and
// "http://www.example.com" but not "http://notanexample.com".
regex.Append(@"(\.|\:\/\/)");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we care about things like "http://*.example.com" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered in the case of the \. arm of the group. See this example: https://regex101.com/r/Q1kE2R/1
There is also a unit test for this case "www.example.com".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does curl strip off the "http" vs "https" so we don't need to worry about one being handled one way and the other differently?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This group at the start says either it must be . and then the user value "example.com" OR it must form the start of the hostname. I'd like to have used ^ here, but the .NET code actually matches against the full URL (so it includes "scheme://hostname"). To match the "start" of the hostname, I use the URL scheme delimiter ://.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT "http://*.example.com" my concern was that we have explicit code to strip off leading "star dot", but an inline "star" here will pass thru (and maybe be escaped).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right.. so where the user has set NO_PROXY=http://*.example.com? I.e., they've put both the protocol/scheme prefix and an inline *.? That is not supported, at least AFAIK, with libcurl either.

Or did you mean NO_PROXY=example.*.com? In that case we would escape the .*. to \.\*\., which I would again consider "invalid". libcurl says it does not support any wildcards except just NO_PROXY=* (which turns off the proxying all together).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing a leading *. is a convenience here. Not sure what libcurl would do here. I know it does ignore a leading . (for example .example.com == example.com).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about NO_PROXY=http://*.example.com and was afraid that we'd get an escaped * passed to the RegEx. I was not thinking about the example.*.com case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that libcurl does not match NO_PROXY values that include the scheme (http[s]://) to hosts.

$ https_proxy=http://localhost:8080/proxy no_proxy=https://google.com curl -v https://google.com
* Uses proxy env variable no_proxy == 'https://google.com'
* Uses proxy env variable https_proxy == 'http://localhost:8080/proxy'
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 8080 failed: Connection refused
* Failed to connect to localhost port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 8080: Connection refused

Above I set the HTTPS proxy to "http://localhost:8080/proxy", and a NO_PROXY value of https://google.com. Attempting to make the request to https://google.com I am still attempting to proxy via the fake localhost proxy. The NO_PROXY value is not matched.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, libcurl doesn't match with no_proxy=*.google.com either:

$ https_proxy=http://localhost:8080/proxy no_proxy=*.google.com curl -v https://google.com
* Uses proxy env variable no_proxy == '*.google.com'
* Uses proxy env variable https_proxy == 'http://localhost:8080/proxy'
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
[...]

.. so GCM is actually allowing more formats than libcurl does here. I don't think this is a major problem since it's a superset? We can also just drop the lead *. trimming code?

@jeffhostetler what do you think?

Note that no_proxy=.google.com and no_proxy=google.com work as expected (the request to https://google.com is not proxied).

Copy link
Contributor

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just a small suggested addition to your test data.

We aim to be compatible with the behaviour of Git as much as possible
when it comes to network settings. This enables users to setup Git proxy
settings and get the same setup "for free" with GCM.

Git uses libcurl to provide it's HTTP interactions. The NO_PROXY setting
is used by libcurl to disable proxy settings for specific hosts.

We previously attempted to plumb the value of NO_PROXY through to the
.NET WebProxy class' list of "bypassed addresses" (the set of hosts that
should not be proxied). However, the .NET class expects a set of
_regular expressions_ which is unlike libcurl!

As a result, libcurl permitted values for NO_PROXY were throwing errors
inside of GCM since they are not valid regexs.

In this commit we perform a transformation of the NO_PROXY list and
construct a set of regular expressions that match addresses in the same
way as libcurl does.

The transformation is as follows:

1. strip any leading periods '.' or wildcards '*.'
2. escape the remaining input to match literally (e.g.: '.' becomes '\.')
3. prepend a group that matches either a period '.' or a URI scheme
   delimiter '://' - this prevents partial domain matching
4. append a end-of-string symbol '$' to ensure we only match to the
   specified TLD and port

See the libcurl documentation on NO_PROXY behaviour:
https://curl.se/libcurl/c/CURLOPT_NOPROXY.html
@mjcheetham mjcheetham merged commit a391785 into git-ecosystem:main Nov 2, 2021
@mjcheetham mjcheetham deleted the noproxy-globs branch November 2, 2021 10:32
This was referenced Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug in Git Credential Manager network Related to networking (VPN, TCP/IP, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GCM core fails when NO_PROXY contains a glob pattern
3 participants