Terraform crashed while creating load balancers with protocols #295

sshingarapu · 2018-06-26T11:30:37Z

Hi there,

Affected Resource(s)

ibm_lbaas

Panic Output

https://gist.github.com/sshingarapu/a9af5f01222d3ecda466e33a946c932e

We are seeing a Terraform crash (see the panic output above) when passing protocols from modules to the LB resource and attaching instances to the LB. This was tested with the latest Terraform IBM provider (v0.10.0).

We also tested without the instance attachment and still see the crash.

module "instance" {
  // module for instance creation
}

module "infranodes01ragsr01-AppExt" {
  source = "lbaas"

  protocols = [
    {
      frontend_protocol     = "HTTPS"
      frontend_port         = 443
      backend_protocol      = "HTTP"
      backend_port          = 443
      load_balancing_method = "round_robin"
      tls_certificate_id    = 182195
    },
    {
      frontend_protocol     = "HTTP"
      frontend_port         = 80
      backend_protocol      = "HTTP"
      backend_port          = 80
      load_balancing_method = "round_robin"
    },
  ]
}

module "infranodes01ragsr01-AppInt" {
  source = "lbaas"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 80
      backend_protocol      = "TCP"
      backend_port          = 80
      load_balancing_method = "round_robin"
    },
  ]
}

lbaas/main.tf

resource "ibm_lbaas" "lbaas" {
  name        = "terraformLBExample"
  description = "lbaas example"
  subnets     = ["1511875"]
  protocols   = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}" // number of instances to attach to LB
  private_ip_address = "${element(var.private_ip_address, count.index)}" // IP addresses of instances
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}
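
The variable declarations for the lbaas module are not shown in this report. A minimal sketch of what they might look like, assuming Terraform 0.11 syntax and the variable names referenced in main.tf above; the file name and the default values are assumptions, chosen so that a module call that passes only protocols (as in the examples above) creates no attachments:

```hcl
# lbaas/variables.tf (hypothetical file; not part of the original report)

# List of protocol maps passed in from the calling module.
variable "protocols" {
  type = "list"
}

# Number of server instances to attach to the load balancer.
# The default of 0 is an assumption: no attachments unless requested.
variable "count" {
  default = 0
}

# Private IP addresses of the instances to attach.
# The empty default is an assumption matching count = 0.
variable "private_ip_address" {
  type    = "list"
  default = []
}
```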

Praveengostu · 2018-06-26T11:38:23Z

The issue is fixed as part of #286 and will be available in the next release. Until then, you can build the provider plugin from the latest code in the public repository to get the fix.

sshingarapu · 2018-06-28T13:13:38Z

@Praveengostu, I have built the provider plugin from the public git repository and tried the LB creation with protocols as above. I no longer see the crash, but the load balancers are not getting created. It takes more than 1 hr and finally throws the error below, and I also cannot destroy them.

ibm_lbaas.lbaas: Error during creation of Load balancer: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATE_PENDING', timeout: 1h30m0s)
module.infranodes01ragsr01-AppInt.ibm_lbaas.lbaas: 1 error(s) occurred:
ibm_lbaas.lbaas: Error during creation of Load balancer: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATE_PENDING', timeout: 1h30m0s)

Praveengostu · 2018-06-28T14:38:57Z

@sshingarapu The default timeout is 90m, so it must have taken more than 90m. Usually creation takes less than 30m. Please let us know if this occurs consistently, so that we can take it up with the API team.

sshingarapu · 2018-06-28T14:44:45Z

@Praveengostu Yes, it is happening consistently. Ideally it should not take more than 3-4 mins, but it is taking the maximum default timeout.

Praveengostu · 2018-06-28T17:01:27Z

@sshingarapu Could you please share the load balancer name and your account number so we can check this further?

sshingarapu · 2018-06-29T10:31:00Z

@Praveengostu I have tried load balancers named Test3 and Test4. They now appear as online, but I got the error when running terraform apply, and they were offline for a long time.
Account number: [email protected]

sshingarapu · 2018-06-29T10:36:11Z

@Praveengostu And also health checks are not defined in the load balancers.

Praveengostu · 2018-06-29T11:31:55Z

Thanks @sshingarapu. For health checks, please see the ibm_lbaas_health_monitor resource: https://ibm-cloud.github.io/tf-ibm-docs/v0.10.0/r/lbaas_health_monitor.html

sshingarapu · 2018-06-29T12:38:11Z

@Praveengostu I have configured health monitors in the Terraform file, but they did not get created. This has happened only once, and I think it is because the load balancers are taking a long time to create.

Praveengostu · 2018-07-02T11:11:23Z

@sshingarapu Shared the details with the API team. We will get back to you once we get an update from them.

Praveengostu · 2018-07-03T16:30:35Z

@sshingarapu The issue is fixed by the API team. Please let us know if you see the issue again.

sshingarapu · 2018-07-03T17:17:39Z

@Praveengostu Thanks for the fix. I will build the provider again and let you know the status.

sshingarapu · 2018-07-03T18:38:59Z

@Praveengostu I have tested it with the new build and observed a couple of issues. The load balancers are getting created, but the health monitors cannot be added to them. I am getting the error below.

ibm_lbaas_health_monitor.lbaas_hm: Error adding health monitors: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=d5437d68-d7ac-4208-93e8-ef2232cc73ee cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}
module.Test3.ibm_lbaas_health_monitor.lbaas_hm: 1 error(s) occurred:
ibm_lbaas_health_monitor.lbaas_hm: Error adding health monitors: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=e07b1da6-63cc-4c55-ace2-11f9ebe46de8 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Below is my load balancer main.tf

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address, count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
}

Praveengostu · 2018-07-04T04:30:22Z

The server_attach and lbaas_hm resources cannot run in parallel, because each of them changes the status of the LB. You can add depends_on to handle this:

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address, count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

sshingarapu · 2018-07-04T12:54:28Z

@Praveengostu It worked. But the url_path given above ("/") is not applied to the health checks in the LB, and I am unable to set it manually in the console either. Currently there is no value for PATH.

sshingarapu · 2018-07-05T08:54:14Z

@Praveengostu One observation: the url_path in health checks is not updated when the protocol is TCP.

Praveengostu · 2018-07-05T09:14:35Z

@sshingarapu There is no url_path in health checks when the protocol is TCP. Could you please help us understand your use case and how you are using the cloud components?

sshingarapu · 2018-07-06T07:17:10Z

@Praveengostu We are building an OpenShift environment on Bluemix virtual machines, creating external and internal load balancers with health checks for monitoring purposes. The load balancers currently use the TCP protocol. Basically, we would like to monitor the health of our environment by defining health monitors. If TCP has no URL path, what default path is used for health checks?

sakshiag · 2018-07-06T08:26:04Z

The health checks against HTTP and TCP ports are conducted as follows:

HTTP: An HTTP GET request against a pre-specified URL is sent to the back-end server port. The server port is marked healthy upon receiving a 200 OK response. The default GET URL is “/” via the GUI, and it can be customized.
TCP: The Load Balancer attempts to open a TCP connection with the back-end server on a specified TCP port. The server port is marked healthy if the connection attempt is successful, and the connection is then closed.

You can refer to this doc for more details: https://console.bluemix.net/docs/infrastructure/loadbalancer-service/health-checks.html#health-checks
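
In Terraform terms, the difference shows up in whether url_path is set on ibm_lbaas_health_monitor. A minimal sketch of a monitor for a TCP listener, assuming the same ibm_lbaas.lbaas resource used earlier in this thread; the literal protocol value is illustrative and the remaining arguments are copied from the configuration posted above:

```hcl
# Health monitor for a TCP listener: the check is a plain TCP connect,
# so no url_path is set.
resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "TCP"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"

  # For an HTTP monitor, set protocol = "HTTP" and add a path, for example:
  # url_path = "/"
}
```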

From your previous comment, I see you are trying to set up OpenShift on Bluemix VMs. We have also tried doing the same with the basic architecture (https://github.com/IBM-Cloud/terraform-ibm-openshift/) using virtual machines and security groups. Can you help us understand the approach you are following and the architecture you are trying to deploy?

sshingarapu · 2018-07-10T10:21:13Z

Thanks @sakshiag, @Praveengostu for your help on this.
We still have an issue with load balancers, where we see the following error when destroying. Upon issuing another destroy the resource eventually gets deleted.
* ibm_lbaas_server_instance_attachment.server_attach.0: Error removing server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=3d658488-12c1-43da-89f4-dfdf700a6697 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Also there is an intermittent issue with LB creation and server attachment with the following error
* module.masternodes01ragsr01-MasterInt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred: * ibm_lbaas_server_instance_attachment.server_attach.0: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=13d28131-8176-426a-bf41-30f26a7e2660 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Our architecture is pretty much the same except for the fact that we have multiple masters which are behind a load balancer for obvious reasons. Also, we have more SG's to support our various teams here.

Praveengostu · 2018-07-10T11:24:46Z

@sshingarapu Thanks for reporting the issue. Will check on this. I assume the issue is recreatable with the same config

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address, count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

sshingarapu · 2018-07-10T12:37:29Z

@Praveengostu Yes. In our case we have 6 load balancers and attaching 2-3 servers in each load balancer. I hope you can reproduce the issue with this configuration.

sshingarapu · 2018-07-13T13:20:57Z

@Praveengostu Could you please share an update on this? I am getting this issue frequently now.

Praveengostu · 2018-07-13T13:25:24Z

@sshingarapu Currently looking into this. Could you please share your configuration file? Please enable debug logging with export TF_LOG=debug and share the log the next time you encounter the issue.

Praveengostu · 2018-07-13T18:21:49Z

@sshingarapu I could not recreate the issue after adding the depends_on between server_attachment and health monitors. Here is the tf configuration, main.tf (https://gist.github.com/Praveengostu/bd732a547b251c120e41dff9ef00366a), which creates 6 lbaas with 2 server attachments each. Here is the terraform_output (https://gist.github.com/Praveengostu/6634847498fde1dae7e5a6ba82541364), which contains the output of apply, show and destroy. Could you please share your configuration and log so we can understand the issue?

sshingarapu · 2018-07-19T13:19:06Z

@Praveengostu I have included depends_on in the server_attach resource, and since then I don't see this error as frequently. I got it again today, but I had not enabled debug logging. I will enable it next time and give you the logs if it fails.

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address, count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
  depends_on         = ["ibm_lbaas.lbaas"]
}

This is how we configure in our case.

main.tf:

We have a module definition for each server (10 servers in total) and for each load balancer (6 load balancers in total), with different parameters, like below:
module "instance1" {
  source               = "./modules/casaas-terraform-modules/casaas-bmx-instance"
  private_network_only = "false"
  hostname             = "instance1"
}

module "instance2" {
  source               = "./modules/casaas-terraform-modules/casaas-bmx-instance"
  private_network_only = "true"
  hostname             = "instance2"
}

// Load Balancers
module "infranodes01ragsr01-AppExt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "infranodes01ragsr01-AppExt"
  type                  = "PUBLIC"
  count                 = "2"
  health_check_interval = "10"
  health_check_path     = "/healthz"
  health_check_port     = "1936"
  health_check_timeout  = "5"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

module "infranodes01ragsr01-AppInt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "infranodes01ragsr01-AppInt"
  type                  = "PRIVATE"
  count                 = "2"
  health_check_interval = "10"
  health_check_path     = "/healthz"
  health_check_port     = "1936"
  health_check_timeout  = "5"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

Could you please try it this way and check whether you can reproduce the issue?

sshingarapu · 2018-07-24T09:23:45Z

@Praveengostu We are getting the issue we mentioned earlier while destroying. Here is the Terraform output with DEBUG enabled: https://gist.github.com/sshingarapu/e927f9b2882779e9c94781a8584db719
I will let you know in case I get the issue while applying.

Praveengostu · 2018-07-24T09:30:04Z

@sshingarapu I see that module.masternodes01ragsr01-MasterExt.ibm_lbaas_server_instance_attachment.server_attach[2] fails when it attempts to destroy the server attachment because the load balancer state is pending. This mostly occurs when a dependency is missing. Could you please share your configuration so that we can help you with a permanent resolution? We are not able to recreate this.

Praveengostu · 2018-07-24T09:33:15Z

@sshingarapu One more point: if there are multiple ibm_lbaas_server_instance_attachment resources, there should be a dependency declared between them, as each of them changes the state of the load balancer.
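
One way to express that dependency is to declare the attachments as separate resources and chain them with depends_on, so that Terraform handles them one after another. A minimal sketch in the Terraform 0.11 syntax used in this thread; the resource names server_attach_a and server_attach_b are hypothetical, and var.private_ip_address is assumed to hold at least two addresses:

```hcl
resource "ibm_lbaas_server_instance_attachment" "server_attach_a" {
  private_ip_address = "${element(var.private_ip_address, 0)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_server_instance_attachment" "server_attach_b" {
  private_ip_address = "${element(var.private_ip_address, 1)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"

  # Wait for the first attachment so the load balancer is not
  # modified by both resources at the same time.
  depends_on = ["ibm_lbaas_server_instance_attachment.server_attach_a"]
}
```

This chaining only applies to separately declared resources. Instances created from a single resource with count cannot declare a dependency on each other this way.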

sshingarapu · 2018-07-24T09:50:16Z

@Praveengostu I have given the configuration details in my earlier comments. Please let me know if that is not enough to debug the issue.

All load balancer module definitions are like the below but with different values for name, type, count etc..

// Load Balancers
module "masternodes01ragsr01-MasterExt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "ragsr01-MasterExt"
  lbaas_subnet          = "${module.masternodes01.lbaas_subnet}"
  type                  = "PUBLIC"
  count                 = "3" // number of servers to attach
  private_ip_address    = "${module.masternodes01.private_ips}" // list of server IPs to attach to the load balancer
  health_check_interval = "30"
  health_check_path     = "/healthz"
  health_check_port     = "8443"
  health_check_timeout  = "10"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 8443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

source:
resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}" // this value is provided in the load balancer module definition to attach servers
  private_ip_address = "${element(var.private_ip_address, count.index)}" // this value is provided in the load balancer module definition
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
  depends_on         = ["ibm_lbaas.lbaas"]
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

Praveengostu · 2018-07-24T10:10:15Z

@sshingarapu Sure, Will check and get back to you.

hkantare · 2018-07-25T11:36:02Z

@sshingarapu Since count is used in ibm_lbaas_server_instance_attachment, the instances run in parallel, and sometimes two or more of them call the delete API at the same time and fail with "UPDATE_PENDING". One way to solve this is to limit parallelism:
terraform destroy -parallelism=1
This destroys the resources one by one.

sshingarapu · 2018-07-26T09:16:49Z

@Praveengostu Can we use parallelism while applying as well? We are seeing the intermittent issue below during terraform apply. I have not seen it recently, but I am sure it will come up again.

Also, when we use parallelism, will all the resources be created one by one? If so, it may take up to 1 hr when spinning up more VMs (10-15).

module.masternodes01ragsr01-MasterInt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred: * ibm_lbaas_server_instance_attachment.server_attach.0: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=13d28131-8176-426a-bf41-30f26a7e2660 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

hkantare · 2018-07-26T09:27:18Z

Yes, parallelism can be applied to plan and apply as well. With -parallelism=1, all resources are created one by one.
Another approach is to break terraform apply down into multiple steps:

1) terraform apply -target=module.vms -target=module.xxx (creates all resources that do not depend on lbaas, without limiting parallelism, so they run in parallel)
2) terraform apply -target=module.lbaas -parallelism=1 (creates the lbaas resources one by one)
3) terraform apply

sshingarapu · 2018-07-27T19:38:28Z

I have tried destroy with parallelism and no longer see the load balancer issue, but below is the error we always get while destroying. It says no rule with ID 1733885 exists, but it exists in the Terraform tfstate file.

Debug output updated at https://gist.github.com/sshingarapu/41baad2302d8a21a4d586424e59d6992

module.security_groups.ibm_security_group_rule.outbound_AlertLogicSecurityGroup_80[0] (destroy): 1 error(s) occurred:

ibm_security_group_rule.outbound_AlertLogicSecurityGroup_80.0: Error deleting Security Group Rule: SoftLayer_Exception_NotFound: No rule with ID of 1733885 exists for this security group (HTTP 500)

hkantare · 2018-07-31T08:32:06Z

@sshingarapu Thanks for testing with parallelism. Could you please close this issue and open a new one to track the security groups and rules? Please provide the sample configuration you are using for the security group and rules.

hkantare · 2018-12-20T07:40:32Z

Closing the issue. If the issue still exists, please reopen it.

ramba07 · 2019-02-11T12:57:17Z

I got an issue similar to the one posted above, in Bluemix cloud creating an LB and attaching the instances to it through Terraform:

ibm_lbaas_server_instance_attachment.server_attach.1: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=44ffd0b1-2c89-40ad-ad10-ab9b593058ea cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}
module.edgenodes01bmxtestrk-AppExt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred:

Also the below error:

ibm_compute_vm_instance.instance: Error ordering virtual guest: SoftLayer_Exception_Public: A price (1639) for First Disk was submitted. This preset configuration does not allow modifications to First Disk. (HTTP 500).

Request any insight into this.

Thanks in advance.

hkantare closed this as completed Dec 20, 2018

faseyiks mentioned this issue Jan 16, 2021

terraform-provider-ibm breaks and cannot handle LB creation with count greater than 1 #2168

Closed
