Refactor endpoint handling and reconciliation #21

Merged

Conversation

deverton-godaddy
Contributor

Feature or Problem

With the release of Nomad 1.6 it's possible to get the network address of the allocation from Nomad. The change to enable this is only in the client library and does not require updating the Nomad server. Older Nomad versions already sent the IP back; it just wasn't available in the client.

This enables refactoring the endpoint reconciliation to use the IP address to identify the endpoint within Cilium. There is no longer a dependency on Consul for policy reconciliation. Additionally, endpoints are now labelled with the task group and task information, as services can be created at those levels.

On startup of netreap the reconciliation process is as follows:

  1. List all the endpoints in Cilium
  2. Get the container ID associated with the endpoint. As part of the CNI call, Nomad passes the allocation ID to Cilium as the container ID.
  3. Use the container ID to look up the allocation in Nomad.
  4. Label the endpoints as needed.
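
The startup pass above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Endpoint`, `Allocation`, and `Clients` types, the `ErrAllocNotFound` sentinel, and the label keys are all hypothetical stand-ins for whatever the real Cilium and Nomad clients expose.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a hypothetical, trimmed-down view of a Cilium endpoint.
type Endpoint struct {
	ID          int64
	ContainerID string // per the description, Nomad passes the allocation ID here
}

// Allocation is a hypothetical, trimmed-down view of a Nomad allocation.
type Allocation struct {
	ID        string
	JobID     string
	TaskGroup string
}

// ErrAllocNotFound is a hypothetical sentinel for an unknown allocation ID.
var ErrAllocNotFound = errors.New("allocation not found")

// Clients bundles the three calls the startup pass needs. The function
// fields are placeholders, not the real Cilium or Nomad client APIs.
type Clients struct {
	ListEndpoints func() ([]Endpoint, error)
	GetAllocation func(allocID string) (*Allocation, error)
	SetLabels     func(endpointID int64, labels []string) error
}

// ReconcileOnStartup walks every endpoint once and labels those that map to
// a Nomad allocation. It returns the number of endpoints it labelled.
func ReconcileOnStartup(c Clients) (int, error) {
	endpoints, err := c.ListEndpoints() // step 1
	if err != nil {
		return 0, fmt.Errorf("listing endpoints: %w", err)
	}
	labelled := 0
	for _, ep := range endpoints {
		if ep.ContainerID == "" { // step 2: nothing to look up
			continue
		}
		alloc, err := c.GetAllocation(ep.ContainerID) // step 3
		if errors.Is(err, ErrAllocNotFound) {
			continue // skip it; this pass never deletes endpoints
		}
		if err != nil {
			return labelled, fmt.Errorf("allocation %s: %w", ep.ContainerID, err)
		}
		labels := []string{ // step 4: illustrative label keys only
			"job_id=" + alloc.JobID,
			"task_group=" + alloc.TaskGroup,
		}
		if err := c.SetLabels(ep.ID, labels); err != nil {
			return labelled, fmt.Errorf("labelling endpoint %d: %w", ep.ID, err)
		}
		labelled++
	}
	return labelled, nil
}
```

Passing the clients in as function values keeps the sketch independent of any particular client library and makes the pass easy to exercise with fakes.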

There's no need to delete endpoints (as far as I can tell) as Nomad should be deleting the endpoints as part of its integration with the CNI plugin.

For event processing netreap will now monitor the Allocation topic as the AllocationUpdated event from that topic contains the network details associated with the allocation. The process is now:

  1. Receive an AllocationUpdated event.
  2. If it does not have an IP address associated (and tasks without network configuration won't), ignore it.
  3. Use the IP address to look up the endpoint in the local Cilium agent. You can query endpoints by IP address and Cilium will only return details for endpoints that are local to the node. We could also query by container ID (i.e. allocation ID) instead.
  4. If the allocation event doesn't include the Job data, fetch it, then apply the labels combining information from both.
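
The event path above might look something like this. It is a sketch under stated assumptions rather than the PR's implementation: `AllocEvent`, `Job`, `Deps`, and the label keys are hypothetical, and the real event payload and client calls will differ.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down view of a Nomad job.
type Job struct {
	ID   string
	Name string
}

// AllocEvent is a hypothetical, flattened view of an allocation event.
type AllocEvent struct {
	Type      string // for example "AllocationUpdated"
	AllocID   string
	JobID     string
	TaskGroup string
	IP        string // empty when the allocation has no network configuration
	Job       *Job   // nil when the event does not carry the job
}

// Deps holds placeholder calls, not the real Cilium or Nomad client APIs.
type Deps struct {
	// EndpointByIP reports ok=false when no endpoint local to this node has that IP.
	EndpointByIP func(ip string) (endpointID int64, ok bool, err error)
	FetchJob     func(jobID string) (*Job, error)
	SetLabels    func(endpointID int64, labels []string) error
}

// HandleAllocEvent applies labels for one event and reports whether it did so.
func HandleAllocEvent(ev AllocEvent, d Deps) (bool, error) {
	if ev.Type != "AllocationUpdated" { // step 1: only this event type matters
		return false, nil
	}
	if ev.IP == "" { // step 2: no network details, ignore
		return false, nil
	}
	id, ok, err := d.EndpointByIP(ev.IP) // step 3: local lookup by IP
	if err != nil {
		return false, fmt.Errorf("looking up endpoint for %s: %w", ev.IP, err)
	}
	if !ok {
		return false, nil // the endpoint lives on another node
	}
	job := ev.Job
	if job == nil { // step 4: fetch the job only when the event lacks it
		job, err = d.FetchJob(ev.JobID)
		if err != nil {
			return false, fmt.Errorf("fetching job %s: %w", ev.JobID, err)
		}
		if job == nil {
			return false, fmt.Errorf("job %s: empty response", ev.JobID)
		}
	}
	labels := []string{ // illustrative label keys only
		"job_id=" + job.ID,
		"job_name=" + job.Name,
		"task_group=" + ev.TaskGroup,
	}
	if err := d.SetLabels(id, labels); err != nil {
		return false, fmt.Errorf("labelling endpoint %d: %w", id, err)
	}
	return true, nil
}
```

The early returns mirror the ordering of the steps: cheap checks on the event come first, so the Nomad API is only called for allocations that have a local endpoint and no embedded job.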

This should be fairly scalable, as the number of calls to the Nomad API will be one per allocation that has an endpoint.

One other change is to patch the endpoint with an endpoint change request event. This triggers Cilium to remove the reserved:init label from the endpoints, as it considers them properly labelled at that point. This change is based on how the Cilium Docker network plugin works.

Note that it is no longer possible to filter out services by tags. I'm not sure that feature makes sense with the refactored code as you would always want to apply labels to Cilium endpoints.

Related Issues

Potentially fixes #20 and #9

As part of the Go 1.20 release it seems like the default for `CGO_ENABLED` is no longer carried over from the tools. This leads to linking issues on systems that use different versions of glibc from what the base image uses. See golang/go#58550 for more details.

This change should fix cosmonic-labs#16
Use `scratch` as a base image since we're generating a static binary
anyway. Also be more explicit about the platform and target OS during
the build.
DRAFT

With the release of Nomad 1.6 it's possible to get the network address
of the allocation from Nomad. The change to enable this is only in the
client library and does not require updating the Nomad server. Older
Nomad versions already sent the IP back; it just wasn't available in
the client.

This enables refactoring the endpoint reconciliation to make use of the
IP address to identify the endpoint within Cilium.

There is no longer a dependency on Consul for policies.

Additionally, endpoints are now labelled with the task group and task
information, as services can be created at those levels.
Remove unused flags from the readme and command line and refactor the
code to allow for testing.

netlify bot commented Jul 21, 2023

Deploy Preview for netreap canceled.

Name Link
🔨 Latest commit a21fe9b
🔍 Latest deploy log https://app.netlify.com/sites/netreap/deploys/64c97bee3387290008ed29f2

Contributor

@protochron protochron left a comment

This only needs a few small changes related to the Dockerfile, but other than that it looks good to go to me!

Thanks a ton for the contribution and helping us remove the dependency on Consul!


@thomastaylor312 thomastaylor312 left a comment

@deverton-godaddy Great work here! This is almost ready to merge, I just had one more follow up question around some code that was removed


@thomastaylor312 thomastaylor312 left a comment

All good to go! Thank you

@thomastaylor312 thomastaylor312 merged commit db0802d into cosmonic-labs:main Aug 2, 2023
@deverton-godaddy deverton-godaddy deleted the deverton/faster-reconcile branch August 2, 2023 21:20
Development

Successfully merging this pull request may close these issues.

Netreap dont be reapplying the labels