Changed VPC defaults to be multiple blocks to account for larger testing clusters #175
Conversation
v1 "k8s.io/api/batch/v1" | ||
"k8s.io/apimachinery/pkg/api/errors" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/client-go/util/retry" |
I would like to separate the standard library imports from the third-party imports. Did we run ./hack/fmt.sh?
Will do.
Your editor should automate this. Talk to me if you need help setting this up.
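A minimal sketch of the import grouping being asked for, assuming gofmt/goimports-style conventions: standard library imports in one block, third-party imports in a separate block. Whether ./hack/fmt.sh enforces exactly this layout is an assumption, and the standard library entries shown are illustrative.

import (
	"context"
	"time"

	v1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)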
PublicSubnetCIDR* and VPCBlock* are the same by default. Can we just use the VPC block values for the subnet CIDR ranges?
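For illustration only, a sketch of what this suggestion could look like in Go, assuming a stand-in Parameters struct whose field names follow the discussion above; this is a suggestion, not the code in this PR. The idea is to derive each subnet CIDR from the corresponding VPC block when it is not set explicitly, so the two defaults cannot drift apart.

// Parameters stands in for the eksconfig parameters struct; the field
// names are illustrative and follow the discussion above.
type Parameters struct {
	VPCBlock1, VPCBlock2                 string
	PublicSubnetCIDR1, PublicSubnetCIDR2 string
}

// defaultSubnetCIDRs falls back to the VPC blocks for unset subnet CIDRs.
func defaultSubnetCIDRs(p *Parameters) {
	if p.PublicSubnetCIDR1 == "" {
		p.PublicSubnetCIDR1 = p.VPCBlock1
	}
	if p.PublicSubnetCIDR2 == "" {
		p.PublicSubnetCIDR2 = p.VPCBlock2
	}
}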
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Hour)
defer cancel()
job := &v1.Job{}
for job.Status.Succeeded < 1 {
Can you explain why we are getting rid of this? It just waits until at least one job is successful...
We were going to use it for when we put certain tests into Prow, but found that this didn't properly handle the timeouts. Rather than trying to figure out the timeouts, I decided to get rid of it as we didn't need it, and may not need it later.
didn't properly handle the timeouts
Then, can we decrease the timeouts and just log at warning level? Do you run clusterloader2 for more than 2 hours?
Depending on the size of the test, yes it can run for up to two hours. But when scale tests get too large, API calls time out, and the testing for that takes a very long time, so rather than having a partially working feature, I just wanted to remove it.
Depending on the size of the test, yes it can run for up to two hours. But when scale tests get too large, API calls time out, and the testing for that takes a very long time, so rather than having a partially working feature, I just wanted to remove it.
I understand that. Still, this breaks other small-scale tests where a 2-hour timeout is enough. Our bug bash job relies on this check to make sure 5 rounds of clusterloader runs complete successfully.
We can add another field to eksconfig to opt out of this.
Actually, let's skip this (delete). I was confused with another wait func.
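For reference, a minimal sketch of a wait that honors the 2-hour deadline instead of looping past it; this is not the removed code. It assumes a recent client-go (context-aware Get), and the namespace and job name are placeholders.

package clusterloader

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForJob polls until the job reports at least one successful completion,
// or gives up when the 2-hour context deadline expires.
func waitForJob(cs kubernetes.Interface) error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Hour)
	defer cancel()

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		job, err := cs.BatchV1().Jobs("kube-system").Get(ctx, "cluster-loader", metav1.GetOptions{})
		if err == nil && job.Status.Succeeded >= 1 {
			return nil // at least one pod of the job completed successfully
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("gave up waiting for job: %v", ctx.Err())
		case <-ticker.C:
		}
	}
}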
eks/cluster/vpc.go
Outdated
VPCBlock1:
  Value: !Ref VPCBlock1
  Description: Primary VPC Block

Remove this?
Are you referring to the newline or the VPCBlock1?
Empty new line
eks/cluster/vpc.go
Outdated
VPCBlock2:
  Value: !Ref VPCBlock2
  Description: Primary VPC Block
s/Primary/Secondary/?
Good catch.
Let's document in the CHANGELOG that we now add default VPC blocks with the same ranges as the subnet CIDRs, and highlight that the default blocks are expanded to support large-scale workloads.
Thanks for the cleanup. A couple of comments:
v1 "k8s.io/api/batch/v1" | ||
"k8s.io/apimachinery/pkg/api/errors" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/client-go/util/retry" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your editor should automate ths. Talk to me if you need help setting this up.
}{
	ClusterLoaderSpec: c.Config.Spec.ClusterLoader,
	ConfigMapData:     configMapData,
	TestArgs:          c.buildArgs(),
	InstanceTypes:     c.Config.Spec.ClusterLoader.InstanceTypes,
IMO instance types shouldn't be used for selection here. Perhaps we can use a node selector?
If you look at https://github.com/njtran/aws-k8s-tester/blob/vpcsubnets/eks/cluster-loader/clusterloader2/cluster-loader.gotmpl#L67 it's using a Node Selector so that it can specify which instances the clusterloader2 pod (not the test pods) can be scheduled onto. I will add a comment to make that clearer.
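To make the intent concrete, here is a sketch in client-go terms of how a node selector pins only the clusterloader2 runner pod (not the test pods) to a particular node group; the label key/value and image are illustrative, not what the template actually renders.

package clusterloader

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// runnerJob builds a Job whose pod is restricted via NodeSelector, so the
// runner lands on the chosen instance type while test pods are unaffected.
func runnerJob() *batchv1.Job {
	return &batchv1.Job{
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					NodeSelector: map[string]string{
						"node.kubernetes.io/instance-type": "c5.xlarge", // illustrative value
					},
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "clusterloader2", Image: "clusterloader2:latest"}, // placeholder image
					},
				},
			},
		},
	}
}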
eks/cluster/vpc.go
Outdated
ts.cfg.EKSConfig.Parameters.VPCCIDR = v
ts.cfg.EKSConfig.Parameters.VPCBlock1 = v
case "VPCBlock2":
	// to avoid additional CFN describe stack call
Don't copy this comment 4 times.
+1
…ing clusters and adjusted other defaults and functionalities for automated cl2 scale testing
@@ -135,7 +117,6 @@ func (c *ClusterLoader) buildArgs() string {
	"--provider=eks",
	"--testconfig=/etc/config/config.yaml",
	"--run-from-cluster",
-	"--report-dir=/var/reports/cluster-loader",
Why do we remove this?
Most of what I removed is what I had added in the past. This, along with the other removals, was part of the automated CL2 tests; we don't need it because it adds extra functionality that isn't being used. We aren't using report-dir because we aren't keeping track of the CL2 test results, just making sure the run completes.
Issue #, if available:
Description of changes:
Testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.