Terraform crashed while creating load balancers with protocols #295

Closed
sshingarapu opened this issue Jun 26, 2018 · 38 comments


@sshingarapu

Hi there,

Affected Resource(s)

  • ibm_lbaas

Panic Output

https://gist.github.com/sshingarapu/a9af5f01222d3ecda466e33a946c932e

We are seeing a Terraform crash (see the panic output above) when passing protocols from modules to the LB resource and attaching instances to the LB. This was tested with the latest Terraform IBM provider (v0.10.0).

We also tested without the instance attachment and still see the crash.

module "instance" {
  // module for instance creation
}

module "infranodes01ragsr01-AppExt" {
  source = "lbaas"

  protocols = [
    {
      frontend_protocol     = "HTTPS"
      frontend_port         = 443
      backend_protocol      = "HTTP"
      backend_port          = 443
      load_balancing_method = "round_robin"
      tls_certificate_id    = 182195
    },
    {
      frontend_protocol     = "HTTP"
      frontend_port         = 80
      backend_protocol      = "HTTP"
      backend_port          = 80
      load_balancing_method = "round_robin"
    },
  ]
}

module "infranodes01ragsr01-AppInt" {
  source = "lbaas"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 80
      backend_protocol      = "TCP"
      backend_port          = 80
      load_balancing_method = "round_robin"
    },
  ]
}

lbaas/main.tf

resource "ibm_lbaas" "lbaas" {
  name        = "terraformLBExample"
  description = "lbaas example"
  subnets     = ["1511875"]
  protocols   = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}" // number of instances to attach to the LB
  private_ip_address = "${element(var.private_ip_address,count.index)}" // IP addresses of the instances
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}
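
For context, the lbaas module above reads var.protocols, var.count, and var.private_ip_address, so it presumably declares matching input variables. Below is a minimal sketch of what such a file might look like in the Terraform 0.11-era syntax used in this thread. The file name, descriptions, and defaults are assumptions for illustration and are not taken from the reporter's module.

```hcl
# lbaas/variables.tf (hypothetical; this file is not shown in the original report)

# List of protocol maps, passed straight through to ibm_lbaas.protocols.
variable "protocols" {
  type        = "list"
  description = "Front-end/back-end protocol definitions for the load balancer"
}

# Number of server instances to attach. The default of 0 is an assumption
# so that module calls that attach no servers still work.
variable "count" {
  default     = 0
  description = "Number of server instances to attach to the load balancer"
}

# Private IP addresses of the instances to attach, indexed by count.index.
variable "private_ip_address" {
  type        = "list"
  default     = []
  description = "Private IP addresses of the instances to attach"
}
```

With defaults like these, the two module calls above (which pass only protocols) would create the load balancer without any server attachments.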

@Praveengostu
Collaborator

The issue is fixed as part of #286, and the fix will be available in the next release. In the meantime, you can get the latest code from the public repository and build the provider plugin with the fix.

@sshingarapu
Author

@Praveengostu, I built the provider plugin from the public Git repository and tried creating the LBs with protocols as above. The crash no longer occurs, but the load balancers are not getting created. The apply runs for more than an hour and finally fails with the error below, and I also cannot destroy them.

  • ibm_lbaas.lbaas: Error during creation of Load balancer: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATE_PENDING', timeout: 1h30m0s)

  • module.infranodes01ragsr01-AppInt.ibm_lbaas.lbaas: 1 error(s) occurred:

  • ibm_lbaas.lbaas: Error during creation of Load balancer: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATE_PENDING', timeout: 1h30m0s)

@Praveengostu
Collaborator

@sshingarapu The default timeout is 90m, so the apply ran for the full 90 minutes before failing. Creation usually completes in less than 30 minutes. Please let us know if this occurs consistently so that we can take it up with the API team.

@sshingarapu
Author

@Praveengostu Yes, it is happening consistently. Ideally it should not take more than 3-4 minutes, but it runs until the default timeout is reached.

@Praveengostu
Collaborator

@sshingarapu Could you please share the load balancer name and your account number so that we can check this further?

@sshingarapu
Author

@Praveengostu I tried load balancers named Test3 and Test4. They now appear as online, but I got the error when running terraform apply, and they were offline for a long time.
Account number: [email protected]

@sshingarapu
Author

@Praveengostu Also, health checks are not defined on the load balancers.

@Praveengostu
Collaborator

Thanks @sshingarapu. For health checks, please see the resource ibm_lbaas_health_monitor: https://ibm-cloud.github.io/tf-ibm-docs/v0.10.0/r/lbaas_health_monitor.html

@sshingarapu
Author

sshingarapu commented Jun 29, 2018

@Praveengostu I configured health monitors in the Terraform file, but they were not created. This has happened only once, and I think it is because the load balancers take so long to create.

@Praveengostu
Collaborator

@sshingarapu We shared the details with the API team. We will get back to you once we get an update from them.

@Praveengostu
Collaborator

@sshingarapu The issue has been fixed by the API team. Please let us know if you see it again.

@sshingarapu
Author

@Praveengostu Thanks for the fix. I will build the provider again and let you know the status.

@sshingarapu
Author

@Praveengostu I tested with the new build and observed a couple of issues. The load balancers are getting created, but the health monitors cannot be added to them. I am getting the error below.

  • ibm_lbaas_health_monitor.lbaas_hm: Error adding health monitors: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=d5437d68-d7ac-4208-93e8-ef2232cc73ee cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

  • module.Test3.ibm_lbaas_health_monitor.lbaas_hm: 1 error(s) occurred:

  • ibm_lbaas_health_monitor.lbaas_hm: Error adding health monitors: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=e07b1da6-63cc-4c55-ace2-11f9ebe46de8 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Below is my load balancer main.tf:

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address,count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
}

@Praveengostu
Collaborator

server_attach and lbaas_hm cannot run in parallel because each of them changes the status of the LB. You can add depends_on to handle this:

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address,count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

@sshingarapu
Author

sshingarapu commented Jul 4, 2018

@Praveengostu It worked. However, the url_path given above ("/") is not applied to the health checks on the LB, and I am unable to set it manually in the console either. Currently there is no value for PATH.

@sshingarapu
Author

@Praveengostu One observation: the url_path in health checks is not updated when the protocol is TCP.

@Praveengostu
Collaborator

@sshingarapu There is no url_path in health checks when the protocol is TCP. Could you please help us understand your use case and how you are using the cloud components?

@sshingarapu
Author

@Praveengostu We are building an OpenShift environment on Bluemix virtual machines, creating external and internal load balancers with health checks for monitoring. The load balancers currently use the TCP protocol. We would like to monitor the health of our environment by defining health monitors. If TCP has no URL path, what default path is used for the health checks?

@sakshiag
Collaborator

sakshiag commented Jul 6, 2018

The health checks against HTTP and TCP ports are conducted as follows:

HTTP: An HTTP GET request against a pre-specified URL is sent to the back-end server port. The server port is marked healthy upon receiving a 200 OK response. The default GET URL is “/” via the GUI, and it can be customized.
TCP: The Load Balancer attempts to open a TCP connection with the back-end server on a specified TCP port. The server port is marked healthy if the connection attempt is successful, and the connection is then closed.

You can refer to this doc for more details: https://console.bluemix.net/docs/infrastructure/loadbalancer-service/health-checks.html#health-checks
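
To illustrate the difference described above, here is a sketch of a health monitor for a TCP back end, reusing the resource and attribute names from the configuration earlier in this thread. The resource name lbaas_hm_tcp is made up for illustration, and leaving out url_path is an assumption based on the statement that TCP checks only open a connection and have no URL path.

```hcl
# Hypothetical health monitor for a load balancer whose back-end protocol is TCP.
resource "ibm_lbaas_health_monitor" "lbaas_hm_tcp" {
  # Taken from the monitor the load balancer created; expected to be "TCP" here.
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6

  # No url_path: a TCP check only tries to open a connection to the port.

  lbaas_id   = "${ibm_lbaas.lbaas.id}"
  monitor_id = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"

  # Serialize against the attachments, as suggested earlier in the thread.
  depends_on = ["ibm_lbaas_server_instance_attachment.server_attach"]
}
```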

From your previous comment, I see you are trying to set up OpenShift on Bluemix VMs. We have also tried doing the same with the basic architecture (https://github.com/IBM-Cloud/terraform-ibm-openshift/) using virtual machines and security groups. Can you help us understand the approach you are following and the architecture you are trying to deploy?

@sshingarapu
Author

sshingarapu commented Jul 10, 2018

Thanks @sakshiag, @Praveengostu for your help on this.
We still have an issue with load balancers: we see the following error when destroying. After issuing another destroy, the resource eventually gets deleted.
* ibm_lbaas_server_instance_attachment.server_attach.0: Error removing server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=3d658488-12c1-43da-89f4-dfdf700a6697 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Also, there is an intermittent issue with LB creation and server attachment, with the following error:
* module.masternodes01ragsr01-MasterInt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred: * ibm_lbaas_server_instance_attachment.server_attach.0: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=13d28131-8176-426a-bf41-30f26a7e2660 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

Our architecture is pretty much the same, except that we have multiple masters behind a load balancer for obvious reasons. We also have more SGs to support our various teams here.

@Praveengostu
Collaborator

Praveengostu commented Jul 10, 2018

@sshingarapu Thanks for reporting the issue. We will check on this. I assume the issue can be reproduced with the same config:

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address,count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

@sshingarapu
Author

@Praveengostu Yes. In our case we have 6 load balancers and attach 2-3 servers to each. I hope you can reproduce the issue with this configuration.

@sshingarapu
Author

@Praveengostu Could you please share an update on this? I am getting this issue frequently now.

@Praveengostu
Collaborator

@sshingarapu We are currently looking into this. Could you please share your configuration file? Please also enable debug logging with export TF_LOG=debug and share the log the next time you encounter the issue.

@Praveengostu
Collaborator

Praveengostu commented Jul 13, 2018

@sshingarapu I could not recreate the issue after adding the depends_on between the server attachment and the health monitor. Here is the Terraform configuration main.tf (https://gist.github.com/Praveengostu/bd732a547b251c120e41dff9ef00366a), which creates 6 lbaas instances with 2 server attachments each. Here is the terraform_output (https://gist.github.com/Praveengostu/6634847498fde1dae7e5a6ba82541364), which contains the output of apply, show, and destroy. Could you please share your configuration and log so that we can understand the issue?

@sshingarapu
Author

@Praveengostu I added depends_on to the server_attach resource, and since then I see this error less frequently. It happened again today, but I forgot to enable debug logging. I will enable it next time and send you the logs if it fails.

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}"
  private_ip_address = "${element(var.private_ip_address,count.index)}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
  depends_on         = ["ibm_lbaas.lbaas"]
}

This is how we configure it in our case.

main.tf:

  1. We have a module definition for each server (10 servers in total) and for each load balancer (6 in total), each with different parameters, as shown below.

module "instance1" {
  source               = "./modules/casaas-terraform-modules/casaas-bmx-instance"
  private_network_only = "false"
  hostname             = "instance1"
}

module "instance2" {
  source               = "./modules/casaas-terraform-modules/casaas-bmx-instance"
  private_network_only = "true"
  hostname             = "instance2"
}

// Load Balancers
module "infranodes01ragsr01-AppExt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "infranodes01ragsr01-AppExt"
  type                  = "PUBLIC"
  count                 = "2"
  health_check_interval = "10"
  health_check_path     = "/healthz"
  health_check_port     = "1936"
  health_check_timeout  = "5"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

module "infranodes01ragsr01-AppInt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "infranodes01ragsr01-AppInt"
  type                  = "PRIVATE"
  count                 = "2"
  health_check_interval = "10"
  health_check_path     = "/healthz"
  health_check_port     = "1936"
  health_check_timeout  = "5"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

Could you please try it this way and check whether the issue can be reproduced?

@sshingarapu
Author

@Praveengostu We are getting the issue we mentioned earlier while destroying. Here is the Terraform output with DEBUG enabled: https://gist.github.com/sshingarapu/e927f9b2882779e9c94781a8584db719
I will let you know if I hit the issue while applying.

@Praveengostu
Collaborator

@sshingarapu I see that module.masternodes01ragsr01-MasterExt.ibm_lbaas_server_instance_attachment.server_attach[2] fails when it attempts to destroy the server attachment because the load balancer is in a pending state. This mostly occurs when a dependency is missing. Could you please share your configuration so that we can help you with a permanent resolution? We are not able to recreate this.

@Praveengostu
Collaborator

@sshingarapu One more point: if there are multiple ibm_lbaas_server_instance_attachment resources, there should be a dependency declared between them, as each of them changes the state of the load balancer.
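
A sketch of what such a dependency could look like when the attachments are declared as separate resources rather than with count. The resource names server_attach_a and server_attach_b are hypothetical, and the index expressions assume var.private_ip_address holds at least two addresses. As far as I know, depends_on cannot be used to chain the individual instances of a single resource that uses count, so this pattern only applies to separately declared resources.

```hcl
# First attachment: only waits for the load balancer itself
# (implicit dependency through the lbaas_id reference).
resource "ibm_lbaas_server_instance_attachment" "server_attach_a" {
  private_ip_address = "${var.private_ip_address[0]}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
}

# Second attachment: explicitly waits for the first one, so the two
# never try to update the load balancer at the same time.
resource "ibm_lbaas_server_instance_attachment" "server_attach_b" {
  private_ip_address = "${var.private_ip_address[1]}"
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
  depends_on         = ["ibm_lbaas_server_instance_attachment.server_attach_a"]
}
```

The trade-off is that the number of attachments is fixed in the configuration instead of being driven by a count variable.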

@sshingarapu
Author

sshingarapu commented Jul 24, 2018

@Praveengostu I have given the configuration details in my earlier comments. Please let me know if that is not enough to debug the issue.

All load balancer module definitions look like the one below, but with different values for name, type, count, etc.

// Load Balancers
module "masternodes01ragsr01-MasterExt" {
  source                = "./modules/casaas-terraform-modules/casaas-bmx-lb"
  name                  = "ragsr01-MasterExt"
  lbaas_subnet          = "${module.masternodes01.lbaas_subnet}"
  type                  = "PUBLIC"
  count                 = "3" // number of servers to attach
  private_ip_address    = "${module.masternodes01.private_ips}" // list of server IPs to attach to the load balancer
  health_check_interval = "30"
  health_check_path     = "/healthz"
  health_check_port     = "8443"
  health_check_timeout  = "10"
  health_check_protocol = "HTTP"

  protocols = [
    {
      frontend_protocol     = "TCP"
      frontend_port         = 443
      backend_protocol      = "TCP"
      backend_port          = 8443
      session_stickiness    = "SOURCE_IP"
      load_balancing_method = "round_robin"
    },
  ]
}

source:

resource "ibm_lbaas" "lbaas" {
  name      = "${var.name}"
  subnets   = ["1629415"]
  type      = "${var.type}"
  protocols = ["${var.protocols}"]
}

resource "ibm_lbaas_server_instance_attachment" "server_attach" {
  count              = "${var.count}" // this value is provided in the load balancer module definition to attach servers
  private_ip_address = "${element(var.private_ip_address,count.index)}" // this value is provided in the load balancer module definition
  lbaas_id           = "${ibm_lbaas.lbaas.id}"
  depends_on         = ["ibm_lbaas.lbaas"]
}

resource "ibm_lbaas_health_monitor" "lbaas_hm" {
  protocol    = "${ibm_lbaas.lbaas.health_monitors.0.protocol}"
  port        = "${ibm_lbaas.lbaas.health_monitors.0.port}"
  timeout     = 3
  interval    = 5
  max_retries = 6
  url_path    = "/"
  lbaas_id    = "${ibm_lbaas.lbaas.id}"
  monitor_id  = "${ibm_lbaas.lbaas.health_monitors.0.monitor_id}"
  depends_on  = ["ibm_lbaas_server_instance_attachment.server_attach"]
}

@Praveengostu
Collaborator

@sshingarapu Sure, we will check and get back to you.

@hkantare
Collaborator

@sshingarapu Since count is used in ibm_lbaas_server_instance_attachment, its instances run in parallel, and sometimes two or more of them call the delete API at the same time and fail with "UPDATE_PENDING". One way to solve the issue is to limit parallelism:
terraform destroy -parallelism=1
This destroys the resources one by one.

@sshingarapu
Author

@Praveengostu Can we use parallelism while applying as well? We are seeing the intermittent issue below during terraform apply. I have not seen it recently, but I am sure I will get this error again.

Also, when we use parallelism, will all the resources be created one by one? If so, it may take up to 1 hour when spinning up more VMs (10-15).

  • module.masternodes01ragsr01-MasterInt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred: * ibm_lbaas_server_instance_attachment.server_attach.0: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=13d28131-8176-426a-bf41-30f26a7e2660 cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}

@hkantare
Collaborator

Yes, parallelism can be applied to plan and apply as well. When you set -parallelism=1, all resources are created one by one.
Another approach is to break terraform apply down into multiple steps:

  1. terraform apply -target=module.vms -target=module.xxx (creates all resources that do not depend on the lbaas; no parallelism limit is set, so they run in parallel)
  2. terraform apply -target=module.lbass -parallelism=1 (creates the lbaas resources one at a time)
  3. terraform apply

@sshingarapu
Author

I tried destroy with parallelism and no longer see the load balancer issue, but below is the error we always get while destroying. It says no rule with ID 1733885 exists, but it exists in the Terraform tfstate file.

Debug output updated at https://gist.github.com/sshingarapu/41baad2302d8a21a4d586424e59d6992

module.security_groups.ibm_security_group_rule.outbound_AlertLogicSecurityGroup_80[0] (destroy): 1 error(s) occurred:

  • ibm_security_group_rule.outbound_AlertLogicSecurityGroup_80.0: Error deleting Security Group Rule: SoftLayer_Exception_NotFound: No rule with ID of 1733885 exists for this security group (HTTP 500)

@hkantare
Collaborator

@sshingarapu Thanks for testing with parallelism. Can you please close this issue and open a new one to track the security groups and rules? Please provide the sample configuration you are using for the security group and rules.

@hkantare
Collaborator

Closing the issue. If the issue still exists, please reopen it.

@ramba07

ramba07 commented Feb 11, 2019

I got an issue similar to the one posted above when creating an LB in Bluemix and attaching instances to it through Terraform:

  • ibm_lbaas_server_instance_attachment.server_attach.1: Error adding server instances: sl.Error{StatusCode:500, Exception:"SoftLayer_Exception_Network_LBaaS_ObjectInInvalidState", Message:"Load balancer uuid=44ffd0b1-2c89-40ad-ad10-ab9b593058ea cannot be updated. The object is in state UPDATE_PENDING.", Wrapped:error(nil)}
  • module.edgenodes01bmxtestrk-AppExt.ibm_lbaas_server_instance_attachment.server_attach[0]: 1 error(s) occurred:

Also the below error:

  • ibm_compute_vm_instance.instance: Error ordering virtual guest: SoftLayer_Exception_Public: A price (1639) for First Disk was submitted. This preset configuration does not allow modifications to First Disk. (HTTP 500).

Any insight into this would be appreciated.

Thanks in advance.
