"wait_for_lease = true" is broken in versions greater than 0.7.1 #1091

Open
SJFCS opened this issue Aug 14, 2024 · 12 comments
@SJFCS

SJFCS commented Aug 14, 2024

System Information

Linux distribution

Archlinux

Terraform version

terraform -v
Terraform v1.9.4
on linux_amd64

Provider and libvirt versions

This issue can be reproduced in versions greater than 0.7.1.
Tested versions:
0.7.1 works
0.7.4 broken
0.7.6 broken
0.8.1 broken

Description of Issue/Question

With provider versions greater than 0.7.1, the apply does not wait for the domain to obtain an IP address.
The "Still creating... [10s elapsed]" progress message never appears, and the apply fails immediately with:

│ Error: couldn't retrieve IP address of domain id: 3ac397de-13cd-485d-9772-872f7652de0d. Please check following: 
│ 1) is the domain running proplerly? 
│ 2) has the network interface an IP address? 
│ 3) Networking issues on your libvirt setup? 
│  4) is DHCP enabled on this Domain's network? 
│ 5) if you use bridge network, the domain should have the pkg qemu-agent installed 
│ IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup 
│  error retrieving interface addresses: error retrieving interface addresses: Virtual machine agent not responding: QEMU host agent not connected

I found that this happens regardless of whether the network mode is bridge or nat. To simplify reproduction and rule out cloud-init interference, I used the Talos ISO boot image below, which includes qemu-guest-agent and can be booted directly as a boot disk.

The metal-amd64.iso (MD5: ebd98e402606991700d8cb5545e72673) can be downloaded from: https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.8.2/metal-amd64.iso

You can also build it yourself here: https://factory.talos.dev/ -> Bare-metal Machine -> choose version -> amd64 -> choose System Extensions qemu-guest-agent

#=====================================================================================
# Providers
#=====================================================================================
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.7.4"
    }
    template = {
      source  = "hashicorp/template"
      version = "2.2.0"
    }
  }
}

provider "libvirt" {
  uri = "qemu:///system"
}

#=====================================================================================
# Libvirt Pool
#=====================================================================================
resource "libvirt_pool" "kubernetes" {
  name = "talos"
  type = "dir"
  path = "/opt/libvirt-pool/talos"
}

#=====================================================================================
# Network
#=====================================================================================
# resource "libvirt_network" "talos" {
#   name      = "talos"
#   mode      = "bridge"
#   bridge    = "br0" # Use the created bridge network card
#   autostart = true
# }
resource "libvirt_network" "talos" {
  name      = "talos"
  mode      = "nat"
  addresses = ["192.168.123.0/24"]
  autostart = true
}
#=====================================================================================
# Domain
#=====================================================================================
resource "libvirt_domain" "domain-talos" {
  name   = "talos"
  memory = "2048"
  vcpu   = 4
  cpu {
    mode = "host-passthrough"
  }

  qemu_agent = true

  boot_device {
    dev = ["cdrom", "hd", "network"]
  }
  network_interface {
    network_id     = libvirt_network.talos.id
    wait_for_lease = true
  }

  # cdrom
  disk {
    file = "/home/admin/Downloads/images/metal-amd64.iso"
  }
  #=====================================================================================
  # Console
  #=====================================================================================
  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }

  console {
    type        = "pty"
    target_type = "virtio"
    target_port = "1"
  }

  graphics {
    type        = "spice"
    listen_type = "address"
    autoport    = true
  }
  video {
    type = "virtio"
  }
}

# Output the IP addresses
output "ips" {
  value = {
    ip = libvirt_domain.domain-talos.network_interface[0].addresses
  }
}

Reproduction steps

# set version = "0.7.1"
terraform init
terraform apply -auto-approve
terraform destroy -auto-approve
# it works
# set version = "0.7.4"
terraform init -upgrade
terraform apply -auto-approve
# it fails

Debug log with 0.7.6:
TF_LOG=DEBUG terraform apply -auto-approve

          </graphics>
          <rng model="virtio">
              <backend model="random">/dev/urandom</backend>
          </rng>
      </devices>
  </domain>: timestamp="2024-08-14T23:28:10.086+0800"
2024-08-14T23:28:10.435+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:10 [INFO] Domain ID: 1e643687-5914-469d-b5c8-356c5dc65790: timestamp="2024-08-14T23:28:10.435+0800"
2024-08-14T23:28:10.435+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:10 [DEBUG] Waiting for state to become: [all-addresses-obtained]: timestamp="2024-08-14T23:28:10.435+0800"
2024-08-14T23:28:15.441+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:15 [DEBUG] waiting for network address for iface=52:54:00:16:93:28: timestamp="2024-08-14T23:28:15.440+0800"
2024-08-14T23:28:15.441+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.6: 2024/08/14 23:28:15 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:28:15.441+0800"
2024-08-14T23:28:15.443+0800 [ERROR] provider.terraform-provider-libvirt_v0.7.6: Response contains error diagnostic: diagnostic_severity=ERROR tf_proto_version=5.3 tf_provider_addr=provider @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto tf_req_id=3443beee-8402-aa9f-8e77-364a3bd03a5e tf_resource_type=libvirt_domain tf_rpc=ApplyResourceChange diagnostic_detail=""
  diagnostic_summary=
  | couldn't retrieve IP address of domain id: 1e643687-5914-469d-b5c8-356c5dc65790. Please check following: 
  | 1) is the domain running properly? 
  | 2) has the network interface an IP address? 
  | 3) Networking issues on your libvirt setup? 
  |  4) is DHCP enabled on this Domain's network? 
  | 5) if you use bridge network, the domain should have the pkg qemu-agent installed 
  | IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup 
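
To compare the two provider versions side by side, it may help to reduce each debug log to the lines about the address wait. The sketch below assumes the log was written to a file with Terraform's TF_LOG_PATH variable; the helper name lease_log_lines and the file names are hypothetical, and the patterns are simply taken from the log lines quoted in this issue.

```shell
# Sketch only: lease_log_lines is a hypothetical helper, not part of Terraform or the provider.
# It keeps the log lines that show how the provider waits for (or gives up on) an address.
# With no file arguments it reads from standard input.
lease_log_lines() {
  grep -E "waiting for network address|qemu-agent used to query|doesn't have IP address|Waiting [0-9]+s before next try|couldn't retrieve IP address" "$@"
}

# Example (file name is an assumption):
#   TF_LOG=DEBUG TF_LOG_PATH=./apply-0.7.6.log terraform apply -auto-approve
#   lease_log_lines ./apply-0.7.6.log
```

In a log from a version that retries, the "Waiting 10s before next try" line should show up repeatedly; in the failing log above it does not appear at all.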

Steps that work with 0.7.1:

  1. Change only the provider version:
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.7.1"
    }
  2. terraform init -upgrade
  3. TF_LOG=DEBUG terraform apply -auto-approve

Debug log with 0.7.1:

2024-08-14T23:26:07.000+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] waiting for network address for iface=52:54:00:7E:A5:63: timestamp="2024-08-14T23:26:07.000+0800"
2024-08-14T23:26:07.000+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:26:07.000+0800"
2024-08-14T23:26:07.001+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] Interfaces info obtained with libvirt API:
([]libvirt.DomainInterface) <nil>: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] ifaces with addresses: []: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] 52:54:00:7E:A5:63 doesn't have IP address(es) yet...: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [DEBUG] IP address not found for iface=52:54:00:7E:A5:63: will try in a while: timestamp="2024-08-14T23:26:07.001+0800"
2024-08-14T23:26:07.001+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:07 [TRACE] Waiting 10s before next try: timestamp="2024-08-14T23:26:07.001+0800"
libvirt_domain.domain-ubuntu: Still creating... [40s elapsed]
2024-08-14T23:26:17.010+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] waiting for network address for iface=52:54:00:7E:A5:63: timestamp="2024-08-14T23:26:17.010+0800"
2024-08-14T23:26:17.010+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] qemu-agent used to query interface info: timestamp="2024-08-14T23:26:17.010+0800"
2024-08-14T23:26:17.013+0800 [INFO]  provider.terraform-provider-libvirt_v0.7.1: 2024/08/14 23:26:17 [DEBUG] Interfaces info obtained with libvirt API:
([]libvirt.DomainInterface) (len=2 cap=2) {
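
The 0.7.1 log above shows the provider asking the guest agent, finding no address, and trying again 10 seconds later. That retry behaviour can be approximated outside Terraform to check whether the agent eventually answers. This is a minimal sketch, not the provider's implementation: the helper name wait_for_agent_ip and its default try count are assumptions, and it relies on `virsh domifaddr --source agent`.

```shell
# Sketch only: wait_for_agent_ip is a hypothetical helper, not part of the provider.
# Usage: wait_for_agent_ip DOMAIN [TRIES] [DELAY_SECONDS]
wait_for_agent_ip() {
  local domain tries delay i out addrs
  domain="$1"; tries="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # --source agent queries qemu-guest-agent; virsh fails while the agent is not connected.
    if out="$(virsh --connect qemu:///system domifaddr "$domain" --source agent 2>/dev/null)"; then
      # Keep non-loopback IPv4 addresses only (column 3 is the protocol, column 4 the address).
      addrs="$(printf '%s\n' "$out" | awk '$3 == "ipv4" && $4 !~ /^127\./ { print $4 }')"
      if [ -n "$addrs" ]; then
        printf '%s\n' "$addrs"
        return 0
      fi
    fi
    echo "attempt $i/$tries: no address yet, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example, using the domain name from the configuration above:
#   wait_for_agent_ip talos
```

If this loop succeeds after a few attempts while the newer provider fails on the first attempt, that would point at the missing retry rather than at the guest.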


Related issues: #1050, #1028, #1037


Additional information:

The environment is the same; only the provider version differs.

@scabala
Contributor

scabala commented Sep 2, 2024

Hello,
could you try specifying wait_for_lease with an image that already has qemu-guest-agent installed? I successfully got the IP address from the VM when doing so.

@SJFCS
Author

SJFCS commented Sep 3, 2024

Hello, could you try specifying wait_for_lease with an image that already has qemu-guest-agent installed? I successfully got the IP address from the VM when doing so.

Thank you for the suggestion.
I haven't tried an image with qemu-guest-agent already installed, because I want qemu-guest-agent to be installed automatically during the cloud-init phase. That worked in previous versions but no longer works in the new versions.

@scabala
Contributor

scabala commented Sep 3, 2024

I'll try to take a look and see if I can find anything changed that might cause it between those two versions.

@scabala
Contributor

scabala commented Sep 18, 2024

I couldn't find anything in particular that changed between those versions. Also, I don't have a bridged network in my setup and it's hard for me to create one, so I used a NAT-ed one and couldn't reproduce it.

@SJFCS could you check if you can reproduce it in different network types? NAT-ed and routed for example?

EDIT: forget what I wrote, I can reproduce it; I just used the wrong image before 🤦

I'll try to bisect and see where the problem lies.

@scabala
Contributor

scabala commented Sep 20, 2024

Okay, after more debugging: I cannot reproduce it. Previously I had problems with cloud-init. I think it might be related to cloud-init itself rather than to the provider.

Either way, I see consistent behavior between 0.7.6 and 0.7.1: it either fails if qemu-guest-agent is not installed and started, or runs fine otherwise.
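
Whether qemu-guest-agent is installed and started inside the guest can be checked from the host before blaming the provider. The following is a small sketch under the assumption that the domain is reachable via qemu:///system; the helper name agent_ready is hypothetical, and it relies on `virsh qemu-agent-command` with the agent's guest-ping command.

```shell
# Sketch only: agent_ready is a hypothetical helper, not part of the provider.
# Returns 0 when the guest agent inside DOMAIN answers a guest-ping, non-zero otherwise.
agent_ready() {
  virsh --connect qemu:///system qemu-agent-command "$1" '{"execute":"guest-ping"}' >/dev/null 2>&1
}

# Example, using the domain name from the configuration in this issue:
#   if agent_ready talos; then echo "agent connected"; else echo "agent not connected"; fi
```

Running this right after the domain starts and again a minute later would show whether the agent is merely slow to come up, which is the situation where a retry matters.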

@SJFCS
Author

SJFCS commented Sep 21, 2024

I couldn't find anything in particular that changed between those versions. Also, I don't have a bridged network in my setup and it's hard for me to create one, so I used a NAT-ed one and couldn't reproduce it.

@SJFCS could you check if you can reproduce it in different network types? NAT-ed and routed for example?

EDIT: forget what I wrote, I can reproduce it; I just used the wrong image before 🤦

I'll try to bisect and see where the problem lies.

The network configuration is the same in both cases, so I don't think the network type has anything to do with this.

@SJFCS
Author

SJFCS commented Sep 21, 2024

Okay, after more debugging: I cannot reproduce it. Previously I had problems with cloud-init. I think it might be related to cloud-init itself rather than to the provider.

Either way, I see consistent behavior between 0.7.6 and 0.7.1: it either fails if qemu-guest-agent is not installed and started, or runs fine otherwise.

Okay, thanks for the troubleshooting, but I did only change the provider version number while keeping the configuration unchanged.

@scabala
Contributor

scabala commented Sep 21, 2024

Do you have cloud-init logs for both scenarios?

@SJFCS
Author

SJFCS commented Oct 28, 2024

Do you have cloud-init logs for both scenarios?

I have checked the logs in both cases; they look normal and no errors are reported.

@SJFCS
Author

SJFCS commented Nov 2, 2024

This issue can be reproduced in versions greater than 0.7.1

│ Error: couldn't retrieve IP address of domain id: 3ac397de-13cd-485d-9772-872f7652de0d. Please check following: 
│ 1) is the domain running proplerly? 
│ 2) has the network interface an IP address? 
│ 3) Networking issues on your libvirt setup? 
│  4) is DHCP enabled on this Domain's network? 
│ 5) if you use bridge network, the domain should have the pkg qemu-agent installed 
│ IMPORTANT: This error is not a terraform libvirt-provider error, but an error caused by your KVM/libvirt infrastructure configuration/setup 
│  error retrieving interface addresses: error retrieving interface addresses: Virtual machine agent not responding: QEMU host agent not connected

I found that this happens regardless of whether the network mode is bridge or nat. To simplify reproduction and rule out cloud-init interference, I used the Talos ISO boot image below, which includes qemu-guest-agent and can be booted directly as a boot disk.

The metal-amd64.iso (MD5: ebd98e402606991700d8cb5545e72673) can be downloaded from: https://factory.talos.dev/image/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515/v1.8.2/metal-amd64.iso

You can also build it yourself here: https://factory.talos.dev -> Bare-metal Machine -> choose version -> amd64 -> choose System Extensions qemu-guest-agent

#=====================================================================================
# Providers
#=====================================================================================
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.7.4"
    }
    template = {
      source  = "hashicorp/template"
      version = "2.2.0"
    }
  }
}

provider "libvirt" {
  uri = "qemu:///system"
}

#=====================================================================================
# Libvirt Pool
#=====================================================================================
resource "libvirt_pool" "kubernetes" {
  name = "talos"
  type = "dir"
  path = "/opt/libvirt-pool/talos"
}

#=====================================================================================
# Network
#=====================================================================================
# resource "libvirt_network" "talos" {
#   name      = "talos"
#   mode      = "bridge"
#   bridge    = "br0" # Use the created bridge network card
#   autostart = true
# }
resource "libvirt_network" "talos" {
  name      = "talos"
  mode      = "nat"
  addresses = ["192.168.123.0/24"]
  autostart = true
}
#=====================================================================================
# Domain
#=====================================================================================
resource "libvirt_domain" "domain-talos" {
  name   = "talos"
  memory = "2048"
  vcpu   = 4
  cpu {
    mode = "host-passthrough"
  }

  qemu_agent = true

  boot_device {
    dev = ["cdrom", "hd", "network"]
  }
  network_interface {
    network_id     = libvirt_network.talos.id
    wait_for_lease = true
  }

  # cdrom
  disk {
    file = "/home/admin/Downloads/images/metal-amd64.iso"
  }
  #=====================================================================================
  # Console
  #=====================================================================================
  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }

  console {
    type        = "pty"
    target_type = "virtio"
    target_port = "1"
  }

  graphics {
    type        = "spice"
    listen_type = "address"
    autoport    = true
  }
  video {
    type = "virtio"
  }
}

# Output the IP addresses
output "ips" {
  value = {
    ip = libvirt_domain.domain-talos.network_interface[0].addresses
  }
}

Reproduction steps

# set version = "0.7.1"
terraform init
terraform apply -auto-approve
terraform destroy -auto-approve
# it works
# set version = "0.7.4"
terraform init -upgrade
terraform apply -auto-approve
# it fails

SJFCS changed the title from "wait_for_lease = true" does not take effect to "wait_for_lease = true" is broken in versions greater than 0.7.1 on Nov 2, 2024
@NamelessOne91

I wanted to have a look at this issue, but it seems I can reproduce it only with version 0.7.4

Versions 0.7.1, 0.7.6 and 0.8.1 are working fine for me.
I pretty much copy-pasted your tf file in the previous comment, minus the template provider.

@SJFCS
Author

SJFCS commented Nov 28, 2024

I wanted to have a look at this issue, but it seems I can reproduce it only with version 0.7.4

Versions 0.7.1, 0.7.6 and 0.8.1 are working fine for me. I pretty much copy-pasted your tf file in the previous comment, minus the template provider.

I tried it; 0.7.6 and 0.8.1 do not work for me either. ...
