DNS resolution issues when connected to a VPN #3536

Closed
devth opened this issue Oct 17, 2015 · 40 comments
Labels
bug, build, core, v0.7, v0.8, v0.9, v0.10, v0.11, v0.12

Comments

@devth

devth commented Oct 17, 2015

Using 0.6.3:

± tf --version
Terraform v0.6.3

± TF=INFO tf plan
Refreshing Terraform state prior to plan...

openstack_lb_pool_v1.clusters_preprod_pool: Refreshing state... (ID: a7b1aac5-7e24-4010-be81-c9f278729468)
openstack_lb_vip_v1.clusters-preprod: Refreshing state... (ID: f46a09a0-4666-4f98-bd30-2b8ce871f411)

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

I upgraded to 0.6.4, ran it with the same OpenStack environment variables, and got:

± tf --version
Terraform v0.6.4

± tf plan
Refreshing Terraform state prior to plan...

Error refreshing state: 1 error(s) occurred:

* Post https://os-identity.vip.foo.com:5443/v2.0/tokens: dial tcp: lookup os-identity.vip.foo.com: no such host
@jtopjian
Contributor

jtopjian commented Nov 1, 2015

Is https://os-identity.vip.foo.com a resolvable domain? Additionally, is Keystone running on port 5443? It usually runs on port 5000.

edit: err... I see. You simply swapped out versions of Terraform. Can you still confirm that the Keystone URL is resolvable? Is the domain an entry in /etc/hosts (or equivalent) and not necessarily a real domain name?

If Terraform isn't resolving it, that might be a problem with Terraform core and not specifically OpenStack.

Let me know 😄

@jtopjian
Contributor

jtopjian commented Nov 1, 2015

I was also reading your description of the issue in #3345. Can you check whether the problem still exists with the latest, unmodified 0.6.6 binaries?

@bluk

bluk commented Jan 30, 2016

I ran into this issue on a private OpenStack instance. I'm logging in over a VPN on OS X, and while I can reach the various endpoints in the browser (and they resolve correctly via ping and other utilities), Terraform does not seem to resolve their IP addresses correctly (I think the DNS lookups fail, but it's hard to tell).

I can get it to work by hardcoding the IP addresses of the OpenStack hostnames in my /etc/hosts. Notably, Packer does not have this issue when spinning up an instance and building an image. This happens on Terraform 0.6.8 through 0.6.10.

@jtopjian
Contributor

@bluk Thank you for the info!

To confirm: When you are logged in over a VPN, you are then using a VPN-specific DNS resolver in order to resolve hosts/domains that are only accessible over the VPN?

@bluk

bluk commented Jan 30, 2016

@jtopjian Yes, it's a VPN specific DNS resolver.

@jtopjian
Contributor

@bluk OK, thanks. Does the VPN software update your /etc/resolv.conf file so that all DNS requests now go through your VPN? Or are lookups done by some other means?

@jtopjian jtopjian added the waiting-response label Feb 28, 2016
@btyler97

I noticed there hasn't been activity on this issue in a while, but I am experiencing the same problem. @jtopjian, I can confirm that the VPN software does NOT update /etc/resolv.conf (at least not in my case). I'm using SonicWall Mobile Connect on OS X El Capitan. I understand that the older NetExtender client does update /etc/resolv.conf; however, it has issues on El Capitan, so we're stuck with the Mobile Connect client.

@jtopjian
Contributor

@btyler97 Thanks for the information. To confirm: this is only happening when you're connected to the VPN? Are you able to use the OpenStack command line tools while you're connected to the VPN?

@pryorda

pryorda commented Mar 29, 2016

@jtopjian Here is a link describing how DNS works with Mobile Connect; it might help with diagnosing the issue. https://support.software.dell.com/kb/sw11559

@jtopjian
Contributor

@pryorda Thanks!

At this point, I'm trying to make a confident determination that the issue everyone is seeing is only happening when they are connected to a VPN. If so, then I believe this issue isn't local to just the OpenStack provider, but possibly Terraform core and/or Golang.

I think the main reason this problem shows up in the OpenStack provider is that it's one of the few Terraform providers that communicates with a non-public cloud. DNS resolution behavior might differ depending on how the DNS infrastructure holding the OpenStack endpoint records is configured alongside the VPN. The link @pryorda gave seems to support that theory.

@btyler97

@jtopjian I can confirm that this is only an issue when connected via the VPN. After some late-night research I also believe the issue isn't local to the OpenStack provider. The Golang docs for the "net" package hint at a possible cause (https://golang.org/pkg/net/) under the "Name Resolution" heading. I tried setting the environment variables they suggest, but I'm probably doing something wrong, as I didn't notice any change. Unfortunately, I don't have enough familiarity with Go to know whether they don't apply here or I'm missing something.

@pryorda

pryorda commented Mar 30, 2016

@jtopjian Here is what we found: on Mac OS X, Go's native (pure-Go) DNS resolver reads /etc/resolv.conf directly, and our VPN client does not update resolv.conf because it split-tunnels DNS queries based on the DNS suffix. We fixed the issue by building Terraform with cgo enabled:

export CGO_ENABLED=1; XC_OS="darwin" XC_ARCH="amd64" make bin

A packet capture confirmed that queries then traversed the VPN rather than going directly to the servers in resolv.conf.

@jtopjian
Contributor

@pryorda @btyler97 Nice! Thank you for the investigation.

I'm going to label this as a Core bug to get some other eyes on it.

@jtopjian jtopjian added core and removed provider/openstack and waiting-response labels Mar 30, 2016
@jtopjian jtopjian changed the title Openstack identity broken for me in 0.6.4 DNS resolution issues when connected to a VPN Mar 31, 2016
@bacoboy

bacoboy commented Nov 21, 2016

It seems this needs an upstream change to Go's networking code to work as expected:
golang/go#12524

@pryorda

pryorda commented Dec 6, 2016

We tried that, and it doesn't work well with split-horizon DNS.

@apparentlymart
Contributor

I had mentioned this in passing in #14781, but want to put it here too for posterity:

Currently we use Go's native cross-compilation support to build the release binaries for all supported platforms, but that approach doesn't give us the OS-specific libraries and headers needed to use CGo on OS X, and thus we aren't able to use the libc resolver. In future we may be able to use xgo to work around this, but we won't have time to do this in the immediate term, unfortunately.

@richid

richid commented Jul 27, 2017

Just going to throw my $0.02 in here in case it helps someone else.

I currently have a Vault installation in AWS, in a VPC using a private Route53 hosted zone. This means the zone is not publicly distributed and can only be accessed from within the VPC it is associated with. To access resources in this VPC, I have EC2 instances in the VPC that act as VPN connectors. I'm running OS X, and the VPN software does not update /etc/hosts; instead it updates the OS-level DNS configuration, which can be inspected via scutil --dns.

When configuring Terraform's Vault provider I get the error dial tcp: lookup vault.internal.company.com on 192.168.130.1:53: no such host. The quick workaround for me was to run route get vault.internal.company.com (again, on OS X) and put the resulting IP into my /etc/hosts file. I may be way off, but it seems like if we just let the OS do the resolution (rather than doing it explicitly), it should work. But I'm sure it's not that simple.
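The route get plus /etc/hosts workaround above can be semi-automated. This sketch assumes that macOS `route -n get <host>` output contains a line of the form `route to: <address>` (the `-n` flag keeps it numeric). `hostsEntry` is a hypothetical helper, and the program only prints the entry for you to review and add to /etc/hosts; it does not edit the file.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os/exec"
	"strings"
)

// hostsEntry extracts the address from `route -n get <host>` output
// (assumed to contain a "route to: <addr>" line) and formats it as an
// /etc/hosts line: "<ip>\t<host>".
func hostsEntry(host, routeOutput string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(routeOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "route to:") {
			continue
		}
		addr := strings.TrimSpace(strings.TrimPrefix(line, "route to:"))
		ip := net.ParseIP(addr)
		if ip == nil {
			return "", fmt.Errorf("route to %q is not an IP address", addr)
		}
		return ip.String() + "\t" + host, nil
	}
	return "", fmt.Errorf("no \"route to:\" line found for %s", host)
}

func main() {
	host := "vault.internal.company.com" // hostname from the comment above
	// route(8) resolves the name through the system resolver, which does
	// honor the VPN's scoped DNS configuration.
	out, err := exec.Command("route", "-n", "get", host).Output()
	if err != nil {
		fmt.Println("route get failed:", err)
		return
	}
	entry, err := hostsEntry(host, string(out))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(entry)
}
```

As with any /etc/hosts pin, the entry goes stale if the internal address changes.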

@jason-riddle
Contributor

Not sure what happened, but it looks like this was resolved? I can't replicate this anymore.

@apparentlymart
Contributor

Nothing specific has changed within Terraform itself to support this, but we did switch to Go 1.9 for the latest two releases, so possibly there is some new behavior in Go 1.9 that is making this smoother.

I didn't see anything in the release notes specifically about this, but there were some DNS-related changes in the 1.9 timeframe that may have changed the situation here. Versions 0.10.3 and 0.10.4 were built with Go 1.9, while 0.10.2 was built with 1.8. If someone has the time to compare the behavior on 0.10.2 vs. 0.10.4, that could help confirm whether this got resolved by changes in Go 1.9.

@hashibot hashibot added the build label Aug 27, 2019
@hashibot hashibot added v0.7, v0.8, v0.9, v0.10, v0.11, and v0.12 labels Aug 29, 2019
@GJKrupa

GJKrupa commented Oct 11, 2019

Looks like the Go ticket isn't heading anywhere. We've suffered this same problem on Vault as well. Wouldn't it be possible to just build a CGO-enabled binary using the Travis macOS environment? https://docs.travis-ci.com/user/reference/osx/

@pmoust
Contributor

pmoust commented Feb 8, 2020

I understand that there are benefits to Go's resolver.
However, I don't see the technical reason why Terraform does not switch to cgo for macOS binaries to satisfy users affected by the behavior described in this issue.
Is this driven purely by licensing concerns?

@techdragon

techdragon commented Apr 6, 2020

Adding some more fuel to this fire: on macOS Mojave (10.14.6), with no VPN installed, I get this behavior when running a stock terraform init with only the AWS provider in main.tf. The /etc/resolv.conf file that the Go network stack expects does not exist. Other Go programs seem fine, and I can curl the well-known path without problems: curl -k https://registry.terraform.io/.well-known/terraform.json prints {"modules.v1":"/v1/modules/","providers.v1":"/v1/providers/"}. So it's not a network stack issue.

With these debug options...

export TF_LOG=TRACE
export GODEBUG=netdns=cgo+1

... in the shell environment, the terraform init output logs are as follows:

2020/04/06 15:38:04 [INFO] Terraform version: 0.12.24  
2020/04/06 15:38:04 [INFO] Go runtime version: go1.13.8
2020/04/06 15:38:04 [INFO] CLI args: []string{"/usr/local/bin/terraform", "init"}
2020/04/06 15:38:04 [DEBUG] Attempting to open CLI config file: /Users/sam/.terraformrc
2020/04/06 15:38:04 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2020/04/06 15:38:04 [INFO] CLI command args: []string{"init"}
2020/04/06 15:38:04 [TRACE] Meta.Backend: no config given or present on disk, so returning nil config
2020/04/06 15:38:04 [TRACE] Meta.Backend: backend has not previously been initialized in this working directory
2020/04/06 15:38:04 [DEBUG] New state was assigned lineage "41f50108-c09f-fdd9-5be7-05053b1380b3"
2020/04/06 15:38:04 [TRACE] Meta.Backend: using default local state only (no backend configuration, and no existing initialized backend)
2020/04/06 15:38:04 [TRACE] Meta.Backend: instantiated backend of type <nil>
2020/04/06 15:38:04 [DEBUG] checking for provider in "."
go package net: built with netgo build tag; using Go's DNS resolver

Initializing the backend...
2020/04/06 15:38:04 [ERR] Checkpoint error: Get https://checkpoint-api.hashicorp.com/v1/check/terraform?arch=amd64&os=darwin&signature=fbac2f67-67bf-ee92-f08d-ab394075ba45&version=0.12.24: dial tcp: lookup checkpoint-api.hashicorp.com on [::1]:53: read udp [::1]:61518->[::1]:53: read: connection refused
2020/04/06 15:38:04 [DEBUG] checking for provider in "/usr/local/bin"
2020/04/06 15:38:04 [DEBUG] checking for provisioner in "."
2020/04/06 15:38:04 [DEBUG] checking for provisioner in "/usr/local/bin"
2020/04/06 15:38:04 [INFO] Failed to read plugin lock file .terraform/plugins/darwin_amd64/lock.json: open .terraform/plugins/darwin_amd64/lock.json: no such file or directory
2020/04/06 15:38:04 [TRACE] Meta.Backend: backend <nil> does not support operations, so wrapping it in a local backend
2020/04/06 15:38:04 [TRACE] backend/local: state manager for workspace "default" will:
 - read initial snapshot from terraform.tfstate
 - write new snapshots to terraform.tfstate
 - create any backup at terraform.tfstate.backup
2020/04/06 15:38:04 [TRACE] statemgr.Filesystem: reading initial snapshot from terraform.tfstate
2020/04/06 15:38:04 [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2020/04/06 15:38:04 [TRACE] statemgr.Filesystem: read nil snapshot
2020/04/06 15:38:04 [DEBUG] checking for provider in "."
2020/04/06 15:38:04 [DEBUG] checking for provider in "/usr/local/bin"
2020/04/06 15:38:04 [DEBUG] plugin requirements: "aws"=""
2020/04/06 15:38:04 [DEBUG] Service discovery for registry.terraform.io at https://registry.terraform.io/.well-known/terraform.json
2020/04/06 15:38:04 [TRACE] HTTP client GET request to https://registry.terraform.io/.well-known/terraform.json

Initializing provider plugins...
- Checking for available provider plugins...

2020/04/06 15:38:04 [DEBUG] Failed to request discovery document: Get https://registry.terraform.io/.well-known/terraform.json: dial tcp: lookup registry.terraform.io on [::1]:53: read udp [::1]:54136->[::1]:53: read: connection refused
Registry service unreachable.

This issue should probably be renamed, as it's quite clear by now that this is not VPN-related. It is as simple as macOS + (Go without cgo) = DNS issues in some cases.

The linked upstream issues in the core Go project tracker do not indicate this is a priority for them, and I don't really blame them, as it appears relatively easy for affected projects to work around by using cgo in their macOS builds. Unfortunately (because an upstream fix would be the best outcome), it looks like HashiCorp will need to make this workaround part of their build process if we want a fix in a timely manner.
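The log line "go package net: built with netgo build tag; using Go's DNS resolver" explains why GODEBUG=netdns=cgo+1 made no difference here. Per the Go net package documentation, the netdns setting selects go or cgo plus an optional debug level joined with "+", but a binary built with the netgo tag uses Go's resolver regardless. The sketch below models only that precedence; `parseNetdns` and `effectiveResolver` are hypothetical helpers, not Go runtime functions, and the real selection logic has more cases.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNetdns extracts the resolver mode ("go", "cgo" or "") and the debug
// level from a GODEBUG value such as "netdns=cgo+1" or "x=1,netdns=2".
func parseNetdns(godebug string) (mode string, debug int) {
	for _, kv := range strings.Split(godebug, ",") {
		if !strings.HasPrefix(kv, "netdns=") {
			continue
		}
		for _, part := range strings.Split(strings.TrimPrefix(kv, "netdns="), "+") {
			switch part {
			case "go", "cgo":
				mode = part
			default:
				// Numeric parts set the debug level (1 or 2).
				if n, err := strconv.Atoi(part); err == nil {
					debug = n
				}
			}
		}
	}
	return mode, debug
}

// effectiveResolver models the precedence seen in the log: the netgo build
// tag wins over any GODEBUG request for the cgo resolver.
func effectiveResolver(builtWithNetgo bool, mode string) string {
	if builtWithNetgo || mode == "go" {
		return "go"
	}
	if mode == "cgo" {
		return "cgo"
	}
	return "default" // left to the runtime's own heuristics
}

func main() {
	mode, debug := parseNetdns(os.Getenv("GODEBUG"))
	fmt.Printf("netdns mode=%q debug=%d\n", mode, debug)
	// Terraform 0.12.24 reported being built with netgo in the log above.
	fmt.Println("effective:", effectiveResolver(true, mode))
}
```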

@rjhornsby

Any recent thoughts on this? I know it isn't a HashiCorp bug per se, but rather driven by what Go is doing behind the scenes; yet in TF v0.14.5 it's still causing pain. Perhaps I missed it in the thread above, but modifying /etc/resolv.conf on macOS is not a solution, as the file's own header says:

# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.

However, you can add resolv.conf-like files to /etc/resolver (see man 5 resolver). This seems to work just fine - except for Go, which is ignoring the OS configuration and doing its own thing because reasons.
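Roughly, per man 5 resolver, each file in /etc/resolver is named after a domain, and queries for names in that domain use that file's nameservers. The sketch below is a hypothetical helper that picks the most specific matching domain for a hostname, assuming longest-suffix matching on whole labels (real macOS ordering also involves options such as search_order). It illustrates the per-domain routing that Go's pure resolver, which takes its servers only from /etc/resolv.conf, does not perform.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// scopedDomain returns the most specific domain (longest suffix match) from
// domains that covers host, or "" if none does. Matching is on whole labels,
// so "company.com" covers "vault.company.com" but not "evilcompany.com".
func scopedDomain(host string, domains []string) string {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	best := ""
	for _, d := range domains {
		d = strings.ToLower(strings.TrimSuffix(d, "."))
		if d == "" {
			continue
		}
		if (host == d || strings.HasSuffix(host, "."+d)) && len(d) > len(best) {
			best = d
		}
	}
	return best
}

func main() {
	entries, err := os.ReadDir("/etc/resolver")
	if err != nil {
		fmt.Println("no /etc/resolver directory:", err)
		return
	}
	var domains []string
	for _, e := range entries {
		if !e.IsDir() {
			domains = append(domains, e.Name()) // file name = domain name
		}
	}
	host := "vault.internal.company.com" // example hostname from this thread
	if d := scopedDomain(host, domains); d != "" {
		fmt.Printf("%s -> /etc/resolver/%s\n", host, d)
	} else {
		fmt.Println(host, "-> default resolver")
	}
}
```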

Short of recompiling TF (which I don't entirely understand) the only work-arounds I can think of are modifying your local /etc/hosts or adding an explicit record to the DNS server TF (Go, really) is trying to use. In my case, that's PiHole. It works, but it does so by essentially short-circuiting DNS itself.
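For Go programs you control (not a prebuilt Terraform binary), another alternative to editing /etc/hosts is to point the pure-Go resolver at a specific DNS server, such as the one the VPN provides. A minimal sketch, assuming a hypothetical VPN DNS server at 10.0.0.2:53: the `Dial` hook on `net.Resolver` receives the server address the resolver would have used (from /etc/resolv.conf or the localhost fallback) and can ignore it. `newPinnedResolver` is an illustrative name, not an existing API.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newPinnedResolver returns a resolver that uses Go's own DNS client but
// sends its queries to server instead of the servers from resolv.conf.
func newPinnedResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// Keep the requested network ("udp" or "tcp") but replace
			// the address with the pinned server.
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	// 10.0.0.2:53 is a placeholder for the VPN's DNS server.
	r := newPinnedResolver("10.0.0.2:53")
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, "vault.internal.company.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

The trade-off is that every lookup goes to that one server, so it only suits names that the VPN's DNS can answer.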

@ramarnat

If you are using unbound with your Pi-hole, you can also set up the VPN zones there.

@rjhornsby

If you are using unbound with your pi-hole, you can also setup the vpn zones there as well.

That's a good thought, but in practice it didn't work. I think that's because the VPN connection is on my host, but not on the pihole/unbound server. This means that the unbound server has no access to the internal DNS servers which are behind the VPN.

@rsik

rsik commented Feb 26, 2021

On macOS, I have a script for my VPN; while it's not pretty, this portion of it works around DNS issues in any HashiCorp (Go) tools:

# you can replace "Wi-Fi" with "Ethernet" if using that
networksetup -setdnsservers Wi-Fi empty
networksetup -setdnsservers Wi-Fi [dns-server]
killall -HUP mDNSResponder

@mdeggies
Member

mdeggies commented Mar 8, 2021

Hi all- We're actively working to fix this issue across Terraform Core and the Providers. For those of you who are still facing issues and are willing to test out our solution and provide feedback, please email [email protected] with the version of the CLI you're using, and the names and versions of the providers you're using. Thanks!

@jbardin
Member

jbardin commented Jun 24, 2021

Hello,

Terraform v1.0.1 has been released, which includes a natively compiled build for MacOS darwin/amd64. This should avoid the above DNS issues on that platform. If there is still a reproducible issue with DNS resolution over a vpn with v1.0.1 on that platform, please feel free to reply here and we can reevaluate.

Thanks!

@github-actions
Contributor

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 25, 2021