
fix: avoid override of device's values on primary ip creation and update #610

Conversation

@thibaultbustarret-ovhcloud (Contributor) commented Jun 12, 2024

The netbox_device_primary_ip resource was poorly implemented: it performed a full device update, reassigning every parameter of the device one by one, even though it only needs to set two of them. As a result, whenever a new parameter is added to the netbox_device resource (e.g. #604), it must also be added to netbox_device_primary_ip, even though there is no logical link between the two.

To fix this, I simply changed the full update call to a partial update call, so that no other data is overridden.
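The difference between the two calls can be sketched with a toy model (illustrative only, not the NetBox API or the provider's code): a full update replaces the whole record, so any omitted field is cleared, while a partial update only touches the fields that are actually sent.

```go
package main

import "fmt"

// Toy model of a stored device record (not the real NetBox schema).
type device struct {
	Name           string
	ConfigTemplate string
	PrimaryIP4     string
}

// put mimics a full update (PUT): the request replaces the whole record,
// so any field the caller did not set is overwritten with its zero value.
func put(stored *device, req device) {
	*stored = req
}

// patch mimics a partial update (PATCH): only the fields present in the
// request are applied; everything else keeps its stored value.
func patch(stored *device, req map[string]string) {
	for field, value := range req {
		switch field {
		case "Name":
			stored.Name = value
		case "ConfigTemplate":
			stored.ConfigTemplate = value
		case "PrimaryIP4":
			stored.PrimaryIP4 = value
		}
	}
}

func main() {
	d := device{Name: "test", ConfigTemplate: "tmpl", PrimaryIP4: "1.1.1.12/32"}

	// PATCH that sets only the primary IP: the config template survives.
	patch(&d, map[string]string{"PrimaryIP4": "2.2.2.2/32"})
	fmt.Println(d.ConfigTemplate) // prints "tmpl"

	// PUT that resends only name and primary IP: the config template is cleared.
	put(&d, device{Name: "test", PrimaryIP4: "2.2.2.2/32"})
	fmt.Println(d.ConfigTemplate) // prints an empty string
}
```

This is exactly why the resource only needs to send the primary IP fields in a PATCH, instead of reconstructing every device attribute for a PUT.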

@thibaultbustarret-ovhcloud (Contributor, Author)

I'm not sure why the checks are failing, as my tests passed without errors.

@fbreckle (Collaborator) commented Jun 13, 2024

In my experience, the partial update does not behave differently from the normal (non-partial) update.

@thibaultbustarret-ovhcloud (Contributor, Author) commented Jun 13, 2024

Well, in this case it does seem to: I just tested it separately by creating a device, assigning it a config template, and then running a partial update followed by a normal update. You can see in the output that the config template is cleared after the normal update.

// Package clause and imports added for completeness; the go-netbox import
// paths vary by version (v3 shown here), so adjust them to the version you use.
package main

import (
	httptransport "github.com/go-openapi/runtime/client"
	"github.com/netbox-community/go-netbox/v3/netbox/client"
	"github.com/netbox-community/go-netbox/v3/netbox/client/dcim"
	"github.com/netbox-community/go-netbox/v3/netbox/client/extras"
	"github.com/netbox-community/go-netbox/v3/netbox/models"
	log "github.com/sirupsen/logrus"
)

func readDevice(c *client.NetBoxAPI, deviceID int64) *dcim.DcimDevicesReadOK {
	params := dcim.NewDcimDevicesReadParams().WithID(deviceID)

	resRead, errRead := c.Dcim.DcimDevicesRead(params, nil)
	if errRead != nil {
		if errresp, ok := errRead.(*dcim.DcimDevicesReadDefault); ok {
			errorcode := errresp.Code()
			if errorcode == 404 {
				log.Fatalf("Device not found: %v", errRead)
			}
		}
		log.Fatalf("Cannot read device: %v", errRead)
	}

	return resRead
}

func main() {
	token := "token123"

	netboxHost := "localhost:8000"
	transport := httptransport.New(netboxHost, client.DefaultBasePath, []string{"http"})
	transport.DefaultAuthentication = httptransport.APIKeyAuth("Authorization", "header", "Token "+token)

	c := client.New(transport, nil)

	testString := "test"
	testTags := []*models.NestedTag{}

	// Create a Manufacturer
	reqCreateManufacturer := dcim.NewDcimManufacturersCreateParams().WithData(&models.Manufacturer{
		Name: &testString,
		Slug: &testString,
		Tags: testTags,
	})
	resManufacturer, errManufacturer := c.Dcim.DcimManufacturersCreate(reqCreateManufacturer, nil)
	if errManufacturer != nil {
		log.Error(errManufacturer.Error())
		return
	}

	// Create a Device Type
	reqCreateDeviceType := dcim.NewDcimDeviceTypesCreateParams().WithData(&models.WritableDeviceType{
		Manufacturer: &resManufacturer.GetPayload().ID,
		Model:        &testString,
		Slug:         &testString,
		Tags:         testTags,
	})
	resDeviceType, errDeviceType := c.Dcim.DcimDeviceTypesCreate(reqCreateDeviceType, nil)
	if errDeviceType != nil {
		log.Error(errDeviceType.Error())
		return
	}

	// Create a Device Role
	reqCreateRole := dcim.NewDcimDeviceRolesCreateParams().WithData(&models.DeviceRole{
		Color: "ff00ff",
		Name:  &testString,
		Slug:  &testString,
		Tags:  testTags,
	})
	resRole, errRole := c.Dcim.DcimDeviceRolesCreate(reqCreateRole, nil)
	if errRole != nil {
		log.Error(errRole.Error())
		return
	}

	// Create a Site
	reqCreateSite := dcim.NewDcimSitesCreateParams().WithData(&models.WritableSite{
		Name: &testString,
		Slug: &testString,
		Tags: testTags,
		Asns: []int64{},
	})
	resSite, errSite := c.Dcim.DcimSitesCreate(reqCreateSite, nil)
	if errSite != nil {
		log.Error(errSite.Error())
		return
	}

	// Create a Config Template
	reqCreateConfigTemplate := extras.NewExtrasConfigTemplatesCreateParams().WithData(&models.WritableConfigTemplate{
		Name:         &testString,
		TemplateCode: &testString,
		Tags:         testTags,
	})
	resConfigTemplate, errConfigTemplate := c.Extras.ExtrasConfigTemplatesCreate(reqCreateConfigTemplate, nil)
	if errConfigTemplate != nil {
		log.Error(errConfigTemplate.Error())
		return
	}

	// Create a Device
	reqCreateDevice := dcim.NewDcimDevicesCreateParams().WithData(&models.WritableDeviceWithConfigContext{
		Name:           &testString,
		Description:    "Watch Me",
		DeviceType:     &resDeviceType.GetPayload().ID,
		Role:           &resRole.GetPayload().ID,
		Site:           &resSite.GetPayload().ID,
		Tags:           testTags,
		ConfigTemplate: &resConfigTemplate.GetPayload().ID,
	})
	res, err := c.Dcim.DcimDevicesCreate(reqCreateDevice, nil)
	if err != nil {
		log.Error(err.Error())
		return
	}

	id := res.GetPayload().ID

	// Define changes to apply
	data := models.WritableDeviceWithConfigContext{
		Name:       reqCreateDevice.Data.Name,
		DeviceType: reqCreateDevice.Data.DeviceType,
		Role:       reqCreateDevice.Data.Role,
		Site:       reqCreateDevice.Data.Site,
		Comments:   "A simple comment",
		Tags:       testTags,
	}

	// (Partially) Update the Device
	paramsPartialUpdate := dcim.NewDcimDevicesPartialUpdateParams().WithID(id).WithData(&data)
	_, errPartialUpdate := c.Dcim.DcimDevicesPartialUpdate(paramsPartialUpdate, nil)
	if errPartialUpdate != nil {
		log.Error(errPartialUpdate.Error())
		return
	}

	log.Infof("id: %d", id)
	// Read the (partially) updated Device
	resRead := readDevice(c, id)

	// Print the updated Device
	log.Infof("name: %v", *resRead.GetPayload().Name)
	log.Infof("description: %v", resRead.GetPayload().Description)
	log.Infof("comments: %v", resRead.GetPayload().Comments)
	log.Infof("config template: %v", resRead.GetPayload().ConfigTemplate)

	data.Comments = "An updated comment"

	// Update the Device
	paramsUpdate := dcim.NewDcimDevicesUpdateParams().WithID(id).WithData(&data)
	_, errUpdate := c.Dcim.DcimDevicesUpdate(paramsUpdate, nil)
	if errUpdate != nil {
		log.Error(errUpdate.Error())
		return
	}

	// Read the updated Device
	resRead2 := readDevice(c, id)

	// Print the updated Device
	log.Infof("name: %v", *resRead2.GetPayload().Name)
	log.Infof("description: %v", resRead2.GetPayload().Description)
	log.Infof("comments: %v", resRead2.GetPayload().Comments)
	log.Infof("config template: %v", resRead2.GetPayload().ConfigTemplate)
}

Here is the output; note that the INFO[0001] lines occur after the normal update.

INFO[0000] id: 17                                       
INFO[0000] name: test                                   
INFO[0000] description: Watch Me                        
INFO[0000] comments: A simple comment                   
INFO[0000] config template: &{test 14 0xc0004329e0 http://localhost:8000/api/extras/config-templates/14/} 
INFO[0001] name: test                                   
INFO[0001] description: Watch Me                        
INFO[0001] comments: An updated comment                 
INFO[0001] config template: <nil>

@ad8lmondy (Contributor)

I added some tests in #609 to test this exact problem - though my PR does not really fix the problem either :(

@thibaultbustarret-ovhcloud (Contributor, Author)

@fbreckle Is this PR good to go, or are there any changes or tests you'd like me to do?

@ad8lmondy (Contributor)

Hey,

This implementation suffers from the same issue my PR does. If you add this test (sorry it's so long):

func TestAccNetboxDevicePrimaryIP4_removePrimary(t *testing.T) {
	testSlug := "pr_ip_removePrimary"
	testName := testAccGetTestName(testSlug)
	resource.ParallelTest(t, resource.TestCase{
		PreCheck:  func() { testAccPreCheck(t) },
		Providers: testAccProviders,
		Steps: []resource.TestStep{
			{
				Config: testAccNetboxDevicePrimaryIPFullDependencies(testName) + fmt.Sprintf(`
resource "netbox_device" "test2" {
  name = "%[1]s"
  role_id = netbox_device_role.test.id
  site_id = netbox_site.test.id
  device_type_id = netbox_device_type.test.id
  cluster_id = netbox_cluster.test.id
  platform_id = netbox_platform.test.id
  location_id = netbox_location.test.id
  comments = "thisisacomment"
  status = "planned"
  rack_id = netbox_rack.test.id
  rack_face = "front"
  rack_position = 11

  tags = [netbox_tag.test.name]
}

resource "netbox_device_interface" "test2" {
  device_id = netbox_device.test2.id
  name = "%[1]s"
  type = "1000base-t"
}

resource "netbox_ip_address" "test_v4_2" {
  ip_address = "1.1.1.12/32"
  status = "active"
  interface_id = netbox_device_interface.test2.id
  object_type = "dcim.interface"
}

resource "netbox_device_primary_ip" "test_v4_2" {
  device_id = netbox_device.test2.id
  ip_address_id = netbox_ip_address.test_v4_2.id
}`, testName),
			},
			// A repeated second step is required, so that the resource "netbox_device" "test2" goes through a resourceNetboxDeviceRead cycle
			// This is needed because adding a netbox_device_primary_ip updates the netbox_device
			{
				Config: testAccNetboxDevicePrimaryIPFullDependencies(testName) + fmt.Sprintf(`
resource "netbox_device" "test2" {
  name = "%[1]s"
  role_id = netbox_device_role.test.id
  site_id = netbox_site.test.id
  device_type_id = netbox_device_type.test.id
  cluster_id = netbox_cluster.test.id
  platform_id = netbox_platform.test.id
  location_id = netbox_location.test.id
  comments = "thisisacomment"
  status = "planned"
  rack_id = netbox_rack.test.id
  rack_face = "front"
  rack_position = 11

  tags = [netbox_tag.test.name]
}

resource "netbox_device_interface" "test2" {
  device_id = netbox_device.test2.id
  name = "%[1]s"
  type = "1000base-t"
}

resource "netbox_ip_address" "test_v4_2" {
  ip_address = "1.1.1.12/32"
  status = "active"
  interface_id = netbox_device_interface.test2.id
  object_type = "dcim.interface"
}

resource "netbox_device_primary_ip" "test_v4_2" {
  device_id = netbox_device.test2.id
  ip_address_id = netbox_ip_address.test_v4_2.id
}`, testName),
				Check: resource.ComposeTestCheckFunc(
					resource.TestCheckResourceAttrPair("netbox_device_primary_ip.test_v4_2", "device_id", "netbox_device.test2", "id"),
					resource.TestCheckResourceAttrPair("netbox_device_primary_ip.test_v4_2", "ip_address_id", "netbox_ip_address.test_v4_2", "id"),
				),
			},
			// Now we do 2 things: modify netbox_device.test2 (changing the comment value), AND we remove the IP and primary IP
			// This fails with:
			//        Error: [PUT /dcim/devices/{id}/][400] dcim_devices_update default {"primary_ip4":["Related object not found using the provided numeric ID: 14"]}
			// because (I think) the device does 1) a read of the current state, 2) the primary IP deletion then modifies the device, 3) the device then tries to write its changes, but it's now out of date
			{
				Config: testAccNetboxDevicePrimaryIPFullDependencies(testName) + fmt.Sprintf(`
resource "netbox_device" "test2" {
  name = "%[1]s"
  role_id = netbox_device_role.test.id
  site_id = netbox_site.test.id
  device_type_id = netbox_device_type.test.id
  cluster_id = netbox_cluster.test.id
  platform_id = netbox_platform.test.id
  location_id = netbox_location.test.id
  comments = "thisisacomment with changes"
  status = "planned"
  rack_id = netbox_rack.test.id
  rack_face = "front"
  rack_position = 11

  tags = [netbox_tag.test.name]
}

resource "netbox_device_interface" "test2" {
  device_id = netbox_device.test2.id
  name = "%[1]s"
  type = "1000base-t"
}
`, testName),
				Check: resource.ComposeTestCheckFunc(
					resource.TestCheckResourceAttr("netbox_device.test2", "name", testName),
					resource.TestCheckResourceAttr("netbox_device.test2", "primary_ipv4", "0"),
					resource.TestCheckResourceAttr("netbox_device.test2", "tags.#", "1"),
					resource.TestCheckResourceAttr("netbox_device.test2", "tags.0", testName),
					resource.TestCheckResourceAttr("netbox_device.test2", "status", "planned"),
				),
			},
		},
	})
}

it fails.

Essentially, all the test does is, in a single terraform apply, remove the primary_ip and the IP while also updating the device itself.

When you try to do this, it says:

        Error: [PUT /dcim/devices/{id}/][400] dcim_devices_update default {"primary_ip4":["Related object not found using the provided numeric ID: 14"]}

I think this is a race condition: the device resource does a Read, then the primary IP is deleted (which modifies the device), and when the device resource then does its Write, it has out-of-date info for the primary_ip.

I believe this is because the resourceNetboxDeviceUpdate function in resource_netbox_device.go also does not use PATCH, but instead rebuilds the entire data structure.

I'm having the same issue with primary IPs for virtual machines, and now also with virtual disks: the total disk size is calculated on the VM itself, so if you add/remove a disk and modify the VM at the same time, the same race happens :(

I think having all Update functions use PATCH would likely solve the issue, but that seems like a big task.
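The suspected race can be reproduced in a toy model (again purely illustrative: this is neither the real provider nor NetBox server code): a PUT that resends a stale primary-IP id fails validation once that IP has been deleted, while a PATCH that omits the field goes through.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy server state: one device with an optional primary IP, and the set of
// IP-address ids that currently exist.
type server struct {
	primaryIP4 *int // id of the device's primary IP, nil if unset
	ips        map[int]bool
	comments   string
}

// putDevice mimics a full (PUT) update: the caller must resend every field,
// and the server validates the primary IP id it receives.
func (s *server) putDevice(comments string, primaryIP4 *int) error {
	if primaryIP4 != nil && !s.ips[*primaryIP4] {
		return errors.New("Related object not found using the provided numeric ID")
	}
	s.comments = comments
	s.primaryIP4 = primaryIP4
	return nil
}

// patchDevice mimics a partial (PATCH) update: unspecified fields are untouched.
func (s *server) patchDevice(comments string) {
	s.comments = comments
}

func main() {
	ip := 14
	s := &server{primaryIP4: &ip, ips: map[int]bool{14: true}, comments: "old"}

	// 1. The device resource reads the current state (primary IP id 14).
	staleID := *s.primaryIP4
	// 2. The primary-IP resource is destroyed, deleting IP 14.
	delete(s.ips, 14)
	s.primaryIP4 = nil
	// 3. A PUT that resends the stale id now fails validation ...
	fmt.Println(s.putDevice("new", &staleID))
	// ... while a PATCH that omits the field succeeds.
	s.patchDevice("new")
	fmt.Println(s.comments) // prints "new"
}
```

The sequence in main mirrors steps 1-3 of the race described above: the stale id is only a problem because the full update has to resend it at all.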

@thibaultbustarret-ovhcloud (Contributor, Author)

Wow, thank you for the detailed explanation!

I investigated the behavior of the Update method for the Device resource and observed an interesting phenomenon: when I updated a device that has a primary IP address without providing the primary IP ID as a parameter, the primary IP address remained unchanged, even though the update uses PUT instead of PATCH.

Here's the code snippet that demonstrates this (full source code here: https://gist.github.com/thibaultbustarret-ovhcloud/9c2b41362edc89ccf227ce42f9d11154).

	primaryIpUpdate := dcim.NewDcimDevicesUpdateParams().WithID(id).WithData(&models.WritableDeviceWithConfigContext{
		Name:           &testString,
		Description:    "Watch Me",
		DeviceType:     &resDeviceType.GetPayload().ID,
		Role:           &resRole.GetPayload().ID,
		Site:           &resSite.GetPayload().ID,
		Tags:           testTags,
		PrimaryIp4:     &resIPAddress.GetPayload().ID,
		ConfigTemplate: &resConfigTemplate.GetPayload().ID,
	})
	_, primaryIpUpdateErr := c.Dcim.DcimDevicesUpdate(primaryIpUpdate, nil)
	// [...]

	// Define changes to apply
	data := models.WritableDeviceWithConfigContext{
		Name:       reqCreateDevice.Data.Name,
		DeviceType: reqCreateDevice.Data.DeviceType,
		Role:       reqCreateDevice.Data.Role,
		Site:       reqCreateDevice.Data.Site,
		Comments:   "A simple comment",
		Tags:       testTags,
		CustomFields: map[string]int{
			testString: 100,
		},
	}

	// (Partially) Update the Device
	paramsPartialUpdate := dcim.NewDcimDevicesPartialUpdateParams().WithID(id).WithData(&data)
	_, errPartialUpdate := c.Dcim.DcimDevicesPartialUpdate(paramsPartialUpdate, nil)
	// [...]

	log.Infof("id: %d", id)
	// Read the (partially) updated Device
	resRead := readDevice(c, id)

	// Print the updated Device
	log.Infof("name: %v", *resRead.GetPayload().Name)
	log.Infof("description: %v", resRead.GetPayload().Description)
	log.Infof("comments: %v", resRead.GetPayload().Comments)
	log.Infof("config template: %v", resRead.GetPayload().ConfigTemplate)
	log.Infof("primary ip4: %v", resRead.GetPayload().PrimaryIp4)

	data.Comments = "An updated comment"

	// Update the Device
	paramsUpdate := dcim.NewDcimDevicesUpdateParams().WithID(id).WithData(&data)
	_, errUpdate := c.Dcim.DcimDevicesUpdate(paramsUpdate, nil)
	if errUpdate != nil {
		log.Error(errUpdate.Error())
		return
	}

	// Read the updated Device
	resRead2 := readDevice(c, id)

	// Print the updated Device
	log.Infof("name: %v", *resRead2.GetPayload().Name)
	log.Infof("description: %v", resRead2.GetPayload().Description)
	log.Infof("comments: %v", resRead2.GetPayload().Comments)
	log.Infof("config template: %v", resRead2.GetPayload().ConfigTemplate)
	log.Infof("primary ip4: %v", resRead2.GetPayload().PrimaryIp4)
}

The results show that the primary_ip4 field is not affected by the update:

INFO[0001] id: 43                                       
INFO[0001] name: test                                   
INFO[0001] description: Watch Me                        
INFO[0001] comments: A simple comment                   
INFO[0001] config template: &{test 46 0xc00042e430 http://localhost:8085/api/extras/config-templates/46/} 
INFO[0001] primary ip4: &{0xc00042e3d0 1.1.1.12/32 4 138 http://localhost:8085/api/ipam/ip-addresses/138/} 

INFO[0002] name: test                                   
INFO[0002] description: Watch Me                        
INFO[0002] comments: An updated comment                 
INFO[0002] config template: <nil>                       
INFO[0002] primary ip4: &{0xc00042e3d0 1.1.1.12/32 4 138 http://localhost:8085/api/ipam/ip-addresses/138/}
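The asymmetry in this output can be summarized with a toy model of the observed behavior (a model of these responses only, not NetBox's actual server logic): on a full update, an omitted config_template is reset, while an omitted primary_ip4 keeps its stored value.

```go
package main

import "fmt"

// update models a full-update request body; nil means "field not sent".
type update struct {
	configTemplate *int
	primaryIP4     *int
}

// device models the stored record.
type device struct {
	configTemplate *int
	primaryIP4     *int
}

func (d *device) put(u update) {
	// config_template behaves like a normal writable field: omitting it
	// on a full update resets it.
	d.configTemplate = u.configTemplate
	// primary_ip4 was observed to keep its stored value when omitted.
	if u.primaryIP4 != nil {
		d.primaryIP4 = u.primaryIP4
	}
}

func main() {
	tmpl, ip := 46, 138
	d := device{configTemplate: &tmpl, primaryIP4: &ip}
	d.put(update{}) // a full update that sends neither field
	fmt.Println(d.configTemplate == nil, *d.primaryIP4) // prints "true 138"
}
```

Under this model, the device resource can safely stop sending primary IP fields on update, which is exactly the fix proposed below.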

The root cause of our issue is that primary IPs are managed by both the netbox_device and netbox_device_primary_ip resources. To address this, we can simply change the resourceNetboxDeviceUpdate function so that it no longer sends the primary IP fields; they are not needed there, and dropping them eliminates the race condition.

I'm not sure if that solution would also work for virtual disks, but it works well for this specific case at least.

Additionally, I've included your test, along with an extra test verifying that updating a device doesn't inadvertently clear its primary IP.

@ad8lmondy (Contributor) commented Jun 28, 2024

Alas, I tried the same thing, but it didn't work :(
#609 (comment)

I found that over time my VMs would slowly lose their primary IPs without that code in place. I can't fully explain why, but I assume it happened when an update was applied to the VM without an update to the primary IP, so the primary IP got pulled out.

But if this works, maybe I can try and adapt it to my case too!

@thibaultbustarret-ovhcloud (Contributor, Author)

Oops, sorry - I read your PR two weeks ago and forgot that you had already tried this approach.

I didn't manage to reproduce this issue on devices. What changes did you make to the devices before the primary IPs disappeared? Perhaps there's a specific scenario I haven't tested thoroughly enough, or maybe devices simply aren't affected.

@ad8lmondy (Contributor) commented Jul 2, 2024

Hmm, I'm not sure, sorry - I've had to move on a bit.

I was trying to set up a module where I would pass in info about a VM and create a NetBox VM (plus the various accoutrements, like disks, network interfaces, IPs, primary IPs, etc.).

So during testing, I was moving/removing the VM between different prefixes all the time. That meant the VM itself, the interfaces, and the IP objects were all changing constantly - sometimes all together, sometimes just a few pieces.
That's when I noticed VMs losing their primary IPs.

Anyway, that's not particularly helpful, sorry - but I'm sure that testing something like:

  • modifying the device
  • modifying the device and the interface
  • modifying the device, the interface, and the IP
  • modifying the device, the interface, the IP, and the primary IP

and maybe doing the same with deleting some pieces, would be more than enough to know if devices are OK.

@thibaultbustarret-ovhcloud (Contributor, Author)

Thank you for the answer!
No worries - I've tested all of these modifications on my infra and didn't notice any problem.

@fbreckle do you want me to add all of these to the primary_ip tests, or is the existing one good enough?

@fbreckle (Collaborator) commented Jul 3, 2024

I feel that we should test this as vigorously as possible, so yeah, please.

@thibaultbustarret-ovhcloud (Contributor, Author)

I feel that we should test this as vigorously as possible, so yeah, please.

I didn't have much time in the last few days, but here are the tests!

Btw, these tests highlight the fact that having primary_ipv4 and primary_ipv6 fields on a device, even though they are managed by the device_primary_ip resource, may lead to state data that is not always up to date. That's probably out of the scope of this PR, though.

@thibaultbustarret-ovhcloud (Contributor, Author)

@fbreckle Is this PR good to go?

@fbreckle fbreckle merged commit 9e7cbea into e-breuninger:master Jul 30, 2024
14 checks passed