Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for atomic snapshot backups #8047

Merged
merged 8 commits into from
Jun 1, 2020
Merged

API for atomic snapshot backups #8047

merged 8 commits into from
Jun 1, 2020

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented May 26, 2020

This introduces a generate snapshot API to ease taking backups of the Nomad state, along with a CLI command.

The PR introduce an endpoint (i.e. /v1/operator/snapshot) and new CLI subcommands (i.e. nomad operator snapshot save for generating the snapshot and nomad operator snapshot inspect for inspecting a snapshot file). These generate atomic consistent snapshots of the nomad state, representing latest view the servers have.

The snapshots include all cluster data: job definitions, ACL policies, etc. It includes sensitive information (e.g. ACL Tokens, Vault Tokens), so they need to maintained with care. Also, it may contain ephemeral information about external world that may no longer be relevant - allocations/evaluations about nodes from two days ago may no longer relevant.

I'm mulling over some design consideration for restore capability to accommodate few use cases (disaster recovery vs provisioning a new cluster with recovered jobs but targeting new nodes), and to properly handle ephemeral state. I'll follow up with a new PR.

@notnoop notnoop requested review from schmichael and drewbailey May 26, 2020 13:05
@notnoop notnoop self-assigned this May 26, 2020
err = r.forwardLeader(remoteServer, method, args, reply)
return true, err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though it isn't exported a comment here explaining that nil means we are the leader would be helpful

op.srv.setQueryMeta(&reply.QueryMeta)

// Take the snapshot and capture the index.
snap, err := snapshot.New(op.logger, op.srv.raft)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be helpful to rename the logger that's passed in to differentiate from the server logger


// SnapshotSaveResponse is the header for the streaming snapshot endpoint,
// and followed by the snapshot file content.
// It is written to the
Copy link
Contributor

@drewbailey drewbailey May 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfinished comment?

Copy link
Contributor

@drewbailey drewbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me just a few small things

Mahmood Ali and others added 3 commits May 31, 2020 21:29
The callers for `forward` and old implementation expect failures to be
accompanied with a true value!  This fixes the issue and have tests
passing!
@notnoop notnoop merged commit 3b04afe into master Jun 1, 2020
@notnoop notnoop deleted the f-snapshot-save branch June 1, 2020 11:55
@github-actions
Copy link

github-actions bot commented Jan 4, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants