feat: update to Talos 1.9.0 final
Update to the final release, update CAPI to v1.9.0.

Signed-off-by: Andrey Smirnov <[email protected]>
smira committed Dec 19, 2024
1 parent a2ac6f6 commit 09f7338
Showing 13 changed files with 277 additions and 228 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -26,21 +26,21 @@ This provider's versions are compatible with the following versions of Cluster A

This provider's versions are able to install and manage the following versions of Kubernetes:

| | v1.16 | v1.17 | v1.18 | v1.19 | v1.20 | v1.21 | v1.22 | v1.23 | v1.24 | v1.25 | v1.26 | v1.27 | v1.28 | v1.29 | v1.30 | v1.31 |
| ------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| Control Plane Provider Talos v1alpha3 (v0.2) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.4) |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.5) |  |  |  |  |  |  |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | v1.16 | v1.17 | v1.18 | v1.19 | v1.20 | v1.21 | v1.22 | v1.23 | v1.24 | v1.25 | v1.26 | v1.27 | v1.28 | v1.29 | v1.30 | v1.31 | v1.32 |
| ------------------------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| Control Plane Provider Talos v1alpha3 (v0.2) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.4) |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.5) |  |  |  |  |  |  |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

This provider's versions are compatible with the following versions of Talos:

| | v0.11 | v0.12 | v0.13 | v0.14 | v1.0 | v1.1 | v1.2 | v1.3 | v1.4 | v1.5 | v1.6 | v1.7 | v1.8 |
| -------------------------------------------- | ----- | ----- | ----- | ----- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.4) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.5) |  |  |  |  |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | v0.11 | v0.12 | v0.13 | v0.14 | v1.0 | v1.1 | v1.2 | v1.3 | v1.4 | v1.5 | v1.6 | v1.7 | v1.8 | v1.9 |
| -------------------------------------------- | ----- | ----- | ----- | ----- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.3) | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.4) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |
| Control Plane Provider Talos v1alpha3 (v0.5) |  |  |  |  |  |  |  | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

## Building and Installing

5 changes: 5 additions & 0 deletions api/v1alpha3/taloscontrolplane_types.go
@@ -163,6 +163,11 @@ type TalosControlPlaneStatus struct {
// Conditions defines current service state of the KubeadmControlPlane.
// +optional
Conditions clusterv1.Conditions `json:"conditions,omitempty"`

// version represents the minimum Kubernetes version for the control plane machines
// in the cluster.
// +optional
Version *string `json:"version,omitempty"`
}

// +kubebuilder:object:root=true
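The new `Version` status field follows the convention of reporting the lowest Kubernetes version currently run by the control plane machines. A minimal sketch of how such a value could be derived from the owned CAPI machines (hypothetical helper, not code from this commit; it assumes `golang.org/x/mod/semver` for version comparison):

```go
package example

import (
	"golang.org/x/mod/semver"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// minControlPlaneVersion returns the lowest valid semver (e.g. "v1.32.0")
// among the machines' spec versions, or nil if none is set.
func minControlPlaneVersion(machines []clusterv1.Machine) *string {
	var lowest *string

	for _, m := range machines {
		v := m.Spec.Version
		if v == nil || !semver.IsValid(*v) {
			continue
		}

		if lowest == nil || semver.Compare(*v, *lowest) < 0 {
			lowest = v
		}
	}

	return lowest
}
```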
5 changes: 5 additions & 0 deletions api/v1alpha3/zz_generated.deepcopy.go

Some generated files are not rendered by default.

@@ -291,20 +291,20 @@ spec:
description: |-
The reason for the condition's last transition in CamelCase.
The specific API may choose whether or not this field is considered a guaranteed API.
This field may not be empty.
This field may be empty.
type: string
severity:
description: |-
Severity provides an explicit classification of Reason code, so the users or machines can immediately
severity provides an explicit classification of Reason code, so the users or machines can immediately
understand the current situation and act accordingly.
The Severity field MUST be set only when Status=False.
type: string
status:
description: Status of the condition, one of True, False, Unknown.
description: status of the condition, one of True, False, Unknown.
type: string
type:
description: |-
Type of condition in CamelCase or in foo.example.com/CamelCase.
type of condition in CamelCase or in foo.example.com/CamelCase.
Many .condition.type values are consistent across resources like Available, but because arbitrary conditions
can be useful (see .node.status.conditions), the ability to deconflict is important.
type: string
@@ -368,6 +368,11 @@ spec:
that still have not been created.
format: int32
type: integer
version:
description: |-
version represents the minimum Kubernetes version for the control plane machines
in the cluster.
type: string
type: object
type: object
served: true
3 changes: 3 additions & 0 deletions controllers/configs.go
@@ -23,6 +23,9 @@ import (
)

// talosconfigForMachine will generate a talosconfig that uses *all* found addresses as the endpoints.
//
// NOTE: There is no client.WithNodes(...) here, so no multiplexing is done. The request will hit any
// of the controlplane nodes in machines list.
func (r *TalosControlPlaneReconciler) talosconfigForMachines(ctx context.Context, tcp *controlplanev1.TalosControlPlane, machines ...clusterv1.Machine) (*talosclient.Client, error) {
if len(machines) == 0 {
return nil, fmt.Errorf("at least one machine should be provided")
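For contrast with the NOTE above, the multiplexed variant would attach a list of node addresses to the context with `client.WithNodes`, so a single request fans out to every node and each response message identifies the node it came from. An illustrative sketch only (not part of this commit):

```go
package example

import (
	"context"

	talosclient "github.com/siderolabs/talos/pkg/machinery/client"
)

// etcdServiceInfoFromAllNodes queries the etcd service state on every listed
// node in one call; without WithNodes the request would land on whichever
// endpoint the client happens to hit.
func etcdServiceInfoFromAllNodes(ctx context.Context, c *talosclient.Client, nodeAddresses []string) ([]talosclient.ServiceInfo, error) {
	return c.ServiceInfo(talosclient.WithNodes(ctx, nodeAddresses...), "etcd")
}
```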
2 changes: 1 addition & 1 deletion controllers/controlplane.go
@@ -96,7 +96,7 @@ func (c *ControlPlane) MachinesNeedingRollout() collections.Machines {
func getInfraResources(ctx context.Context, cl client.Client, machines collections.Machines) (map[string]*unstructured.Unstructured, error) {
result := map[string]*unstructured.Unstructured{}
for _, m := range machines {
infraObj, err := external.Get(ctx, cl, &m.Spec.InfrastructureRef, m.Namespace)
infraObj, err := external.Get(ctx, cl, &m.Spec.InfrastructureRef)
if err != nil {
if apierrors.IsNotFound(errors.Cause(err)) {
continue
115 changes: 62 additions & 53 deletions controllers/etcd.go
@@ -11,15 +11,14 @@ import (
"time"

controlplanev1 "github.com/siderolabs/cluster-api-control-plane-provider-talos/api/v1alpha3"
"github.com/siderolabs/talos/pkg/machinery/api/machine"
machineapi "github.com/siderolabs/talos/pkg/machinery/api/machine"
talosclient "github.com/siderolabs/talos/pkg/machinery/client"
clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
"sigs.k8s.io/controller-runtime/pkg/client"
)

func (r *TalosControlPlaneReconciler) etcdHealthcheck(ctx context.Context, tcp *controlplanev1.TalosControlPlane, ownedMachines []clusterv1.Machine) error {
ctx, cancel := context.WithTimeout(ctx, time.Second*5)

defer cancel()

machines := []clusterv1.Machine{}
@@ -30,70 +29,80 @@ func (r *TalosControlPlaneReconciler) etcdHealthcheck(ctx context.Context, tcp *
}
}

c, err := r.talosconfigForMachines(ctx, tcp, machines...)
if err != nil {
return err
}

defer c.Close() //nolint:errcheck

service := "etcd"

params := make([]interface{}, 0, len(machines)*2)
params := make([]any, 0, len(machines)*2)
for _, machine := range machines {
params = append(params, "node", machine.Name)
}

r.Log.Info("verifying etcd health on all nodes", params...)

svcs, err := c.ServiceInfo(ctx, service)
if err != nil {
return err
}
const service = "etcd"

// check that etcd service is healthy on all nodes
for _, svc := range svcs {
node := svc.Metadata.GetHostname()
// list of discovered etcd members, updated on each iteration
members := map[string]struct{}{}

if len(svc.Service.Events.Events) == 0 {
return fmt.Errorf("%s: no events recorded yet for service %q", node, service)
}
for i, machine := range machines {
// loop for each machine, the client created has endpoints which point to a single machine
if err := func() error {
c, err := r.talosconfigForMachines(ctx, tcp, machine)
if err != nil {
return err
}

lastEvent := svc.Service.Events.Events[len(svc.Service.Events.Events)-1]
if lastEvent.State != "Running" {
return fmt.Errorf("%s: service %q not in expected state %q: current state [%s] %s", node, service, "Running", lastEvent.State, lastEvent.Msg)
}
defer c.Close() //nolint:errcheck

if !svc.Service.GetHealth().GetHealthy() {
return fmt.Errorf("%s: service is not healthy: %s", node, service)
}
}
svcs, err := c.ServiceInfo(ctx, service)
if err != nil {
return err
}

resp, err := c.EtcdMemberList(ctx, &machine.EtcdMemberListRequest{})
if err != nil {
return err
}
// check that etcd service is healthy on the node
for _, svc := range svcs {
node := svc.Metadata.GetHostname()

members := map[string]struct{}{}
if len(svc.Service.Events.Events) == 0 {
return fmt.Errorf("%s: no events recorded yet for service %q", node, service)
}

lastEvent := svc.Service.Events.Events[len(svc.Service.Events.Events)-1]
if lastEvent.State != "Running" {
return fmt.Errorf("%s: service %q not in expected state %q: current state [%s] %s", node, service, "Running", lastEvent.State, lastEvent.Msg)
}

if !svc.Service.GetHealth().GetHealthy() {
return fmt.Errorf("%s: service is not healthy: %s", node, service)
}
}

resp, err := c.EtcdMemberList(ctx, &machineapi.EtcdMemberListRequest{})
if err != nil {
return err
}

for i, message := range resp.Messages {
actualMembers := len(message.Members)
expectedMembers := len(machines)
for _, message := range resp.Messages {
actualMembers := len(message.Members)
expectedMembers := len(machines)

node := message.Metadata.GetHostname()
node := message.Metadata.GetHostname()

// check that the count of members is the same on all nodes
if actualMembers != expectedMembers {
return fmt.Errorf("%s: expected to have %d members, got %d", node, expectedMembers, actualMembers)
}
// check that the count of members is the same on all nodes
if actualMembers != expectedMembers {
return fmt.Errorf("%s: expected to have %d members, got %d", node, expectedMembers, actualMembers)
}

// check that member list is the same on all nodes
for _, member := range message.Members {
if _, found := members[member.Hostname]; i > 0 && !found {
return fmt.Errorf("%s: found extra etcd member %s", node, member.Hostname)
// check that member list is the same on all nodes
for _, member := range message.Members {
if _, found := members[member.Hostname]; i > 0 && !found {
return fmt.Errorf("%s: found extra etcd member %s", node, member.Hostname)
}

members[member.Hostname] = struct{}{}
}
}

members[member.Hostname] = struct{}{}
return nil
}(); err != nil {
return fmt.Errorf("error checking etcd health on machine %q: %w", machines[i].Name, err)
}
}
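The refactor wraps each machine's check in an immediately invoked closure so the deferred `c.Close()` runs at the end of that iteration instead of piling up until the whole health check returns. A stripped-down sketch of the pattern (names and helper signatures are illustrative, not the provider's actual code):

```go
package example

import "fmt"

type conn interface{ Close() error }

// checkAll opens a connection per machine, runs the check, and guarantees the
// connection is closed before the next iteration, because the defer lives
// inside the per-iteration closure rather than the enclosing function.
func checkAll(machines []string, open func(string) (conn, error), check func(conn) error) error {
	for _, name := range machines {
		if err := func() error {
			c, err := open(name)
			if err != nil {
				return err
			}
			defer c.Close() //nolint:errcheck

			return check(c)
		}(); err != nil {
			return fmt.Errorf("error checking etcd health on machine %q: %w", name, err)
		}
	}

	return nil
}
```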

@@ -118,14 +127,14 @@ func (r *TalosControlPlaneReconciler) gracefulEtcdLeave(ctx context.Context, c *
if svc.Service.State != "Finished" {
r.Log.Info("forfeiting leadership", "machine", machineToLeave.Status.NodeRef.Name)

_, err = c.EtcdForfeitLeadership(ctx, &machine.EtcdForfeitLeadershipRequest{})
_, err = c.EtcdForfeitLeadership(ctx, &machineapi.EtcdForfeitLeadershipRequest{})
if err != nil {
return err
}

r.Log.Info("leaving etcd", "machine", machineToLeave.Name, "node", machineToLeave.Status.NodeRef.Name)

err = c.EtcdLeaveCluster(ctx, &machine.EtcdLeaveClusterRequest{})
err = c.EtcdLeaveCluster(ctx, &machineapi.EtcdLeaveClusterRequest{})
if err != nil {
return err
}
Expand All @@ -137,7 +146,7 @@ func (r *TalosControlPlaneReconciler) gracefulEtcdLeave(ctx context.Context, c *

// forceEtcdLeave removes a given machine from the etcd cluster by telling another CP node to remove the member.
// This is used in times when the machine was deleted out from under us.
func (r *TalosControlPlaneReconciler) forceEtcdLeave(ctx context.Context, c *talosclient.Client, member *machine.EtcdMember) error {
func (r *TalosControlPlaneReconciler) forceEtcdLeave(ctx context.Context, c *talosclient.Client, member *machineapi.EtcdMember) error {
ctx, cancel := context.WithTimeout(ctx, time.Second*5)

defer cancel()
Expand All @@ -146,7 +155,7 @@ func (r *TalosControlPlaneReconciler) forceEtcdLeave(ctx context.Context, c *tal

return c.EtcdRemoveMemberByID(
ctx,
&machine.EtcdRemoveMemberByIDRequest{
&machineapi.EtcdRemoveMemberByIDRequest{
MemberId: member.Id,
},
)
@@ -199,7 +208,7 @@ func (r *TalosControlPlaneReconciler) auditEtcd(ctx context.Context, tcp *contro

defer c.Close() //nolint:errcheck

response, err := c.EtcdMemberList(ctx, &machine.EtcdMemberListRequest{})
response, err := c.EtcdMemberList(ctx, &machineapi.EtcdMemberListRequest{})
if err != nil {
return fmt.Errorf("error getting etcd members via %q (endpoints %v): %w", designatedCPMachine.Name, c.GetConfigContext().Endpoints, err)
}