Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network lifecycle management #5626

Merged
merged 9 commits into from
May 14, 2019
Merged

Conversation

nickethier
Copy link
Member

@nickethier nickethier commented Apr 29, 2019

This PR adds a few notable changes to support lifecycle management of network namespaces.

Alloc Runner Hooks

The first big change here was adding a call to initialize platform dependent hooks since the network hook is currently only supported on linux. There is new dependency here on a small packaged names ns that only builds on linux due to references to the syscall package that are platform dependent.

The second is adding the actual network hook that includes checking if a driver needs to initialize the network.

plugins/drivers Changes

New proto messages and rpcs were added to support network namespaces. These aren't implemented by any driver in this PR but I did add a helper struct that can be embedded when driver network management is not needed (aka everything but docker). The typical proto to go conversion funcs are there as well.

@nickethier nickethier force-pushed the f-network-lifecycle branch from feb9dff to 82333ac Compare May 3, 2019 18:14
@nickethier nickethier marked this pull request as ready for review May 8, 2019 15:26
@nickethier nickethier requested review from schmichael and notnoop May 8, 2019 17:01
@nickethier nickethier changed the base branch from f-network-group-stanza to f-network-namespaces May 8, 2019 17:46
@nickethier nickethier force-pushed the f-network-lifecycle branch from 3944cf5 to ddf125f Compare May 9, 2019 01:33
Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did one round of code comments but I'll read other networking PRs for more context before reviewing it again.

client/allocrunner/alloc_runner_hooks.go Outdated Show resolved Hide resolved
client/allocrunner/network_hook_linux.go Outdated Show resolved Hide resolved
client/allocrunner/network_hook_linux.go Show resolved Hide resolved
plugins/drivers/driver.go Show resolved Hide resolved
plugins/drivers/driver.go Outdated Show resolved Hide resolved
client/allocrunner/alloc_runner_linux.go Outdated Show resolved Hide resolved
client/allocrunner/alloc_runner_linux.go Show resolved Hide resolved
return
}

// Put this thread back to the orig ns, since it might get reused (pre go1.10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious how you noticed this issue? I assume you aren't running golang 1.10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - might be nice to include a link; also may need to copy the whole file along with the copyright and license notice on top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @notnoop said. This code is 💥 and we shouldn't copy it without proper attribution.

@nickethier nickethier merged commit 0587751 into f-network-namespaces May 14, 2019
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth refactoring the default device manager out of the network hook and seeing if we can ship the general network support as fully crossplatform even if there's no non-linux network manager.

NewAllocRunner errors

I also have mixed opinions on adding another error return from NewAR. We were hoping to someday make NewAR/TR never return an error, but I have a feeling we're never going to get there. The current way client.go handles errors from NewAR seems fine, so it's probably not worth worrying about.

Adding a new AR/TR Validation method or hook might be useful? Or it might just complicate things more.

I think it's safe to accept your approach as-is and treat my ideas for out-of-scope of your networking efforts.


// noop for non linux clients
func (ar *allocRunner) initPlatformRunnerHooks(hookLogger hclog.Logger) ([]interfaces.RunnerHook, error) {
return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we detect if they're trying to use an unsupported network mode and return an error?

@@ -0,0 +1,243 @@
// +build linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker on Windows being able to run Linux images really complicates this. #2633

Not something we need to worry about right now, but it is interesting Windows introduces new complexity where a task's OS doesn't match the host OS!


// manager is used when creating the network namespace. This defaults to
// bind mounting a network namespace descritor under /var/run/netns but
// can be created by a driver if nessicary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// can be created by a driver if nessicary
// can be created by a driver if necessary

// alloc should only be read from
alloc *structs.Allocation

// spec described the network namespace and is syncronized by specLock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// spec described the network namespace and is syncronized by specLock
// spec described the network namespace and is syncronized by specLock
Suggested change
// spec described the network namespace and is syncronized by specLock
// spec described the network namespace and is synchronized by specLock

// spec described the network namespace and is syncronized by specLock
spec *drivers.NetworkIsolationSpec
specLock sync.Mutex
logger hclog.Logger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger hclog.Logger
logger hclog.Logger

NetIsolationModeTask = NetIsolationMode("task")

// NetIsolationModeNone indicates that there is no network to isolate and is
// inteded to be used for tasks that the client manages remotely
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// inteded to be used for tasks that the client manages remotely
// intended to be used for tasks that the client manages remotely

Although maybe just remove that part as (a) it doesn't exist yet and (b) none may be useful for other things like system batch jobs that also don't exist yet but can be faked.

@@ -280,6 +288,27 @@ message ExecTaskResponse {
ExitResult result = 3;
}

message CreateNetworkRequest {

// AllodID of the allocation the network is associated with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// AllodID of the allocation the network is associated with
// AllocID of the allocation the network is associated with


message DestroyNetworkRequest {

// AllodID of the allocation the network is associated with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// AllodID of the allocation the network is associated with
// AllocID of the allocation the network is associated with

@@ -350,6 +397,10 @@ message TaskConfig {

// AllocId is the ID of the associated allocation
string alloc_id = 15;

// NetworkIsolationSpec specifies the configuration for the network namespace
// to use for the task. *Only supported on Linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// to use for the task. *Only supported on Linux
// to use for the task. Only supported on Linux

...although that may not be true if you rearrange the hook code above? Couldn't somebody make a Windows driver that provided Create/Destroy Network?

@@ -414,6 +406,26 @@ func tasksUpdated(jobA, jobB *structs.Job, taskGroup string) bool {
return false
}

func networkUpdated(netA, netB []*structs.NetworkResource) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants