DRS failed for cluster if have another cluster in zone #8629

yashi4engg · 2024-02-08T16:17:25Z

ISSUE TYPE

Bug Report

COMPONENT NAME

CLOUDSTACK VERSION

CONFIGURATION

OS / ENVIRONMENT

SUMMARY

We have zone1 with two clusters, cluster01 and cluster02.
cluster01 has 3 hypervisors with the same h/w model, and cluster02 has one hypervisor with a different h/w model.
I enabled the DRS settings in global settings, then disabled DRS for cluster02 in the cluster settings ("drs.automatic.enable -- false") but left it enabled for cluster01.

In this scenario, DRS plan generation failed with the logs below:

2024-02-08 10:26:33,532 DEBUG [c.c.s.ManagementServerImpl] (VMSchedulerPollTask:ctx-0bba1590) (logid:f0f7966b) Hosts having capacity and suitable for migration: [Host {"id":25,"name":"node-cluster01","type":"Routing","uuid":"5d145861-e4ad-4f94-a805-266711321d59"}, Host {"id":40,"name":"node-cluster02","type":"Routing","uuid":"66b47d2a-b047-452c-b5d3-65c160666b50"}, Host {"id":48,"name":"node-cluster01","type":"Routing","uuid":"da1be373-cd08-4ae1-948f-7592daabb3fc"}]
2024-02-08 10:26:33,535 ERROR [o.a.c.c.ClusterDrsServiceImpl] (VMSchedulerPollTask:ctx-0bba1590) (logid:f0f7966b) Unable to generate DRS plans for cluster Cluster-Z01 [id=5366c5fb-0ed0-4caf-b2c7-93ebea15a717]

If I disable the host in cluster02, DRS works as expected and migrates VMs based on load. But when both clusters and all of their nodes are enabled, it fails to generate a plan.

Below are the cluster-level settings from the DB, for reference.
| 621 | 1 | drs.automatic.enable | true |
| 622 | 1 | drs.automatic.interval | 10 |
| 623 | 1 | drs.imbalance | 0.4 |
| 624 | 16 | drs.automatic.enable | false |
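
The rows above suggest a two-level lookup in which a cluster-scoped value, when present, takes precedence over the global one. Below is a minimal sketch of that resolution. It assumes the second column is the cluster id (the column names are not shown in the report), that the global value of `drs.automatic.enable` is `true` as described in the summary, and it uses a hypothetical `effective_setting` helper that is not part of CloudStack.

```python
from typing import Dict, Optional, Tuple

# Cluster-scoped overrides as listed in the report: (cluster_id, name) -> value.
# Reading the second column as the cluster id is an assumption.
CLUSTER_SETTINGS: Dict[Tuple[int, str], str] = {
    (1, "drs.automatic.enable"): "true",
    (1, "drs.automatic.interval"): "10",
    (1, "drs.imbalance"): "0.4",
    (16, "drs.automatic.enable"): "false",
}

# Assumed global value; the summary says DRS was enabled in global settings.
GLOBAL_SETTINGS: Dict[str, str] = {"drs.automatic.enable": "true"}


def effective_setting(cluster_id: int, name: str) -> Optional[str]:
    """Return the cluster-level value if one is set, else the global value."""
    override = CLUSTER_SETTINGS.get((cluster_id, name))
    if override is not None:
        return override
    return GLOBAL_SETTINGS.get(name)


if __name__ == "__main__":
    for cluster_id in (1, 16):
        value = effective_setting(cluster_id, "drs.automatic.enable")
        print(f"cluster {cluster_id}: drs.automatic.enable = {value}")
```

Under these assumptions, automatic DRS would be expected to run only for the cluster with id 1, which matches the intent described in the summary.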

I also tried keeping it disabled in global settings and enabling it for just one cluster in the cluster settings.

Our use case is multiple clusters in the same zone with different types of h/w.

STEPS TO REPRODUCE

EXPECTED RESULTS


rohityadavcloud · 2024-02-08T17:43:57Z

cc @vishesh92

vishesh92 · 2024-02-08T20:30:59Z

@yashi4engg I am not able to reproduce the issue. DRS runs independently for each cluster and this shouldn't be happening.

Can you share the complete logs including the error as well as more details about your clusters?

yashi4engg · 2024-02-08T21:16:42Z

@vishesh92 -- While I try to reproduce the above error, here is our configuration:

Primary Storage - We are using zone-wide primary storage for both clusters.

Cluster01 - has 3 hypervisors and is able to run DRS if I disable the hosts in cluster02.

Cluster02 - has 1 hypervisor, which is different hardware from the hosts in cluster01. DRS is disabled in its cluster settings.

vishesh92 · 2024-02-12T19:26:40Z

@yashi4engg The error log you have shared seems to be truncated. Can you share the complete error log?
Also, a cloudstack cluster supports only one hypervisor. It's possible the error is happening because of your current setup. The complete error logs might help in identifying the issue.

yashi4engg · 2024-02-14T14:33:07Z

We were able to fix the above errors, so this is good to close.

DRS is now working as expected for us. The problem was due to the zone-level storage pool.

yashi4engg · 2024-04-16T16:16:05Z

@vishesh92 -- It works fine where the primary storage is NFS, but it is not working where we use an OCFS2 filesystem as the primary storage pool in the cluster. Even when generating a DRS plan manually, I don't see any error; it shows success, but I don't get any plan.

Below are the logs:
[root@test01 ~]# cat /var/log/cloudstack/management/management-server.log |grep -i edcf30ae
2024-04-16 10:43:57,568 DEBUG [o.a.c.c.ClusterDrsServiceImpl] (VMSchedulerPollTask:ctx-cf16f762) (logid:edcf30ae) ClusterDRS.poll is being called at 2024-04-16 14:44:00 GMT
2024-04-16 10:43:57,579 DEBUG [o.a.c.c.ClusterDrsServiceImpl] (VMSchedulerPollTask:ctx-cf16f762) (logid:edcf30ae) Removed 0 old drs migration plans

Whereas the VM count in my cluster is as follows (I also set drs.imbalance to 1.0):
node1 - 150 VMs
node2 - 150 VMs
node3 - 1 VM
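
To put a number on how uneven this distribution is, one common measure is the coefficient of variation (standard deviation divided by mean) of the per-host load. The sketch below applies it to the VM counts above. Whether CloudStack's DRS algorithms use exactly this formula is an assumption here, and VM count is only a stand-in for whatever per-host metric (for example CPU or memory) DRS actually evaluates. The `imbalance` function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def imbalance(loads: Sequence[float]) -> float:
    """Coefficient of variation: population std dev divided by the mean.

    Illustrative only; not taken from the CloudStack code base.
    """
    avg = mean(loads)
    if avg == 0:
        return 0.0  # no load anywhere, so nothing to balance
    return pstdev(loads) / avg


# VM counts per node, as reported above (a proxy for real host load).
vm_counts = [150, 150, 1]
score = imbalance(vm_counts)

if __name__ == "__main__":
    print(f"imbalance score: {score:.2f}")
```

A perfectly even cluster scores 0.0 with this measure, so a distribution like the one above would be expected to stand out clearly if the planner considered it at all.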

vishesh92 · 2024-06-08T09:08:35Z

@yashi4engg Are you able to manually migrate a VM to another host in your cluster?

vishesh92 · 2024-06-13T09:49:54Z

@yashi4engg It's good to know that you are interested in DRS. But please open a different issue or a discussion if the problem is different. It makes it easier for the community to manage the issues.

I have created this PR to ensure that the destination host is part of the same cluster here: #9245 . This should fix the issue you previously described.
There were some changes in #8521 which will be released as part of 4.19.1.0 release. I am not sure if this will fix your issue or not. You can try out the current 4.19 branch to check.
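
The idea behind restricting destinations to the same cluster can be sketched as a simple filter over candidate hosts. This is not the code from #9245: the `Host` class, its `cluster` field, and `same_cluster_hosts` are hypothetical, and the cluster membership of each host is inferred from the host names in the log shared earlier in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    id: int
    name: str
    cluster: str  # hypothetical field; inferred from the host name


def same_cluster_hosts(candidates: List[Host], cluster: str) -> List[Host]:
    """Keep only the candidate hosts that belong to the given cluster."""
    return [host for host in candidates if host.cluster == cluster]


# Candidates taken from the "Hosts having capacity and suitable for
# migration" log line in the original report.
candidates = [
    Host(25, "node-cluster01", "cluster01"),
    Host(40, "node-cluster02", "cluster02"),
    Host(48, "node-cluster01", "cluster01"),
]

if __name__ == "__main__":
    for host in same_cluster_hosts(candidates, "cluster01"):
        print(host.id, host.name)
```

With a filter like this, the host from cluster02 would no longer be considered as a destination when a plan is generated for cluster01.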

Can you please create another issue with the following details:

Which drs.algorithm are you using?
Are all the hosts in the cluster homogeneous? Details about the host & VMs CPU/memory deployed in the cluster.
Is the storage pool cluster wide or zone wide?

DaanHoogland · 2024-06-18T06:30:37Z

@yashi4engg do you have the possibility to test PR #9243?

sureshanaparti · 2024-06-25T20:42:29Z

Closing this, fixed in #9245. @yashi4engg you can create another issue with any other DRS issues/improvements.

yashi4engg · 2024-07-01T14:07:14Z

@DaanHoogland - Sorry, we don't have any VMware cluster imported or added in CloudStack at the moment to test this.

vishesh92 self-assigned this Feb 8, 2024

DaanHoogland added the component:drs label Feb 13, 2024

yashi4engg closed this as completed Feb 14, 2024

yashi4engg reopened this Apr 16, 2024

rohityadavcloud added this to the 4.19.1.0 milestone Apr 30, 2024

sureshanaparti added this to Apache CloudStack 4.19.1 May 23, 2024

sureshanaparti moved this to Todo in Apache CloudStack 4.19.1 May 23, 2024

sureshanaparti added this to Apache CloudStack BugFest - Issues Jun 3, 2024

sureshanaparti moved this to Todo in Apache CloudStack BugFest - Issues Jun 3, 2024

vishesh92 moved this from Todo to Discuss in Apache CloudStack BugFest - Issues Jun 13, 2024

vishesh92 moved this from Discuss to ready for Review in Apache CloudStack BugFest - Issues Jun 13, 2024

vishesh92 mentioned this issue Jun 13, 2024

DRS: Ensure the destination host is part of the same cluster #9245

Merged

13 tasks

vishesh92 linked a pull request Jun 13, 2024 that will close this issue

DRS: Ensure the destination host is part of the same cluster #9245

Merged

13 tasks

DaanHoogland moved this from ready for Review to ready for Testing in Apache CloudStack BugFest - Issues Jun 18, 2024

sureshanaparti moved this from Todo to In Progress in Apache CloudStack 4.19.1 Jun 19, 2024

sureshanaparti closed this as completed Jun 25, 2024

github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.19.1 Jun 25, 2024

github-project-automation bot moved this from ready for Testing to Done in Apache CloudStack BugFest - Issues Jun 25, 2024

DaanHoogland removed this from Apache CloudStack BugFest - Issues Dec 10, 2024
