client: enable configuring enable_tag_override for services #7106
Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition.

Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable.

Because Nomad itself acts as a state machine around the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why.

Consider the basic case where there is no Nomad, and your service is provided to Consul as a boring JSON file. The ultimate source of truth for the definition of that service is the file, and that definition is stored in the agent. Later, Consul performs "anti-entropy", which synchronizes the Catalog (stored only on the leaders). Then with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file, or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed & config reloaded OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth.

In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-syncs with Consul will no longer try to restore the externally modified tag values (as long as enable_tag_override=true).

Fixes #2057
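The rule for the periodic re-sync described above can be sketched as a small shell function. This is only a hypothetical model for illustration, not Nomad's actual update detector: the function name tagDriftTriggersSync and the comma-separated tag strings are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical model of the periodic sync decision described above.
# It is not Nomad's actual update detector.

# tagDriftTriggersSync ETO DESIRED CURRENT
#   ETO      enable_tag_override value from the job spec ("true" or "false")
#   DESIRED  tags from the job spec
#   CURRENT  tags currently registered in Consul
# Exits 0 when a tag difference alone would cause a re-registration.
tagDriftTriggersSync() {
  local eto="${1}" desired="${2}" current="${3}"
  # With the override enabled, externally modified tags are left alone.
  if [ "${eto}" = "true" ]; then
    return 1
  fi
  [ "${desired}" != "${current}" ]
}
```

For example, tagDriftTriggersSync true "original,tags" "some,new,tags" fails (no re-sync), while the same call with false succeeds. A lifecycle event such as an updated job spec sits outside this check and replaces the tags regardless.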
This comes with a little helper script for manual checking:

#!/usr/bin/env bash
set -euo pipefail
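# Render the arguments as a JSON array, e.g. slice some new tags
# prints ["some", "new", "tags"]. Needs bash 4.4+ for the @Q expansion;
# tags containing spaces or quotes are not handled.
function slice {
  args=("${@}")
  quotes=$(echo "${args[@]@Q}" | tr "'" '"' | sed -e 's/ /, /g')
  echo "[${quotes}]"
}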
job="eto-example"
service="sleep" # just the one for now
testcase="${1:-}" # empty when no argument is given; the case statement rejects it
host=$(hostname)
nomadV=$(nomad version)
consulV=$(consul version | xargs | cut -d' ' -f1,2)
# Look up the catalog ID of the given service (defaults to ${service}).
function serviceID {
  curl -s "localhost:8500/v1/catalog/service/${1:-${service}}" | jq -r '.[0].ServiceID'
}
function setTags {
  payload=$(cat <<EOM
{
  "Node": "${host}",
  "Address": "127.0.0.1",
  "DC": "dc1",
  "Service": {
    "ID": "$(serviceID "${service}")",
    "Service": "${service}",
    "EnableTagOverride": true,
    "Tags": $(slice "$@")
  }
}
EOM
)
  tmp=$(mktemp)
  echo "${payload}" > "${tmp}"
  curl -XPUT localhost:8500/v1/catalog/register -d "@${tmp}"
}
function startExample {
  echo "[will start ${job} nomad job with enable_tag_override=${1}]"
  payload=$(cat <<EOM
job "${job}" {
  datacenters = ["dc1"]
  type = "service"
  group "group" {
    task "${service}" {
      driver = "raw_exec"
      config {
        command = "/bin/sleep"
        args = ["10000"]
      }
      service {
        name = "${service}"
        tags = ["original", "tags"]
        enable_tag_override = ${1}
      }
    }
  }
}
EOM
)
  tmp=$(mktemp)
  echo "${payload}" > "${tmp}"
  nomad job run "${tmp}"
}
function stopExample {
  nomad job stop "${job}"
}
function watchTags {
  watch "curl -s localhost:8500/v1/catalog/service/${service} | jq '.[0] | .ServiceID, .ServiceName, .ServiceTags'"
}
function showService {
  curl "localhost:8500/v1/catalog/service/${service}"
}
###################
### entry point ###
###################
echo "[setup] host: ${host}"
echo "[setup] nomad version: ${nomadV}"
echo "[setup] consul version: ${consulV}"
echo "[setup] action: ${testcase}"
case "${testcase}" in
  "set-tags")
    echo "--- set-tags ---"
    setTags some new tags
    ;;
  "watch-tags")
    echo "--- watch-tags ---"
    watchTags
    ;;
  "show-service")
    echo "--- show-service ---"
    showService
    ;;
  "start-example")
    echo "--- start-example ---"
    # fail with a usage hint instead of an unbound-variable error
    startExample "${2:?usage: start-example <true|false>}"
    ;;
  "stop-example")
    echo "--- stop-example ---"
    stopExample
    ;;
  *)
    echo "not a valid test case"
    exit 1
    ;;
esac

Usage outline

# compile nomad
# $ go install
#
# run nomad
# $ nomad agent -dev -log-level=INFO
#
# run consul
# $ consul agent -dev
#
# keep a tab watching the tags of our service
# ./demo.sh watch-tags
#
# create example job, with enable_tag_override=true
# ./demo.sh start-example true
#
# do a manual update on the tags via consul catalog
# ./demo.sh set-tags
#
# (the watch-tags tab should show the change)
# can also double check the entire service output
# ./demo.sh show-service
#
# now wait ~60 seconds for Consul anti-entropy to take place
# something like: [DEBUG] agent: Node info in sync
#
# the tags should not be changing (indicating Consul is
# respecting the ETO field in the service definition)
#
# now wait another ~30 seconds for Nomad's periodic resync
# to take place
# (there does not seem to be a nice log line indicating the
# periodic resync if nothing happens)
# just note that the tags never get reset to their original values
#
# to test the ETO=false behavior is unchanged, do all of the above
# but with: start-example false
#
# note that the consul anti-entropy kicks in and restores the tags
# to their original values from the nomad jobspec (about 1 minute
# for a tiny cluster size)
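The wait-and-watch steps in the usage outline could also be scripted. The following is a rough sketch under stated assumptions: currentTags and tagsStayStable are hypothetical helpers that are not part of the demo script, and they assume the same local Consul agent on port 8500 and the same jq binary that the script already relies on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original demo script.

# currentTags SERVICE
# Print the service's catalog tags as compact JSON.
# Assumes a local Consul agent on port 8500 and jq, as the demo script does.
currentTags() {
  curl -s "localhost:8500/v1/catalog/service/${1}" | jq -c '.[0].ServiceTags'
}

# tagsStayStable FETCH EXPECTED ROUNDS INTERVAL
# Run FETCH (a command name) ROUNDS times, INTERVAL seconds apart, and
# fail as soon as its output differs from EXPECTED.
tagsStayStable() {
  local fetch="${1}" expected="${2}" rounds="${3}" interval="${4}"
  local i=0 observed
  while [ "${i}" -lt "${rounds}" ]; do
    observed=$("${fetch}")
    if [ "${observed}" != "${expected}" ]; then
      echo "tags changed: ${observed}" >&2
      return 1
    fi
    sleep "${interval}"
    i=$((i + 1))
  done
  return 0
}
```

After running set-tags with enable_tag_override=true, one might define fetchSleep() { currentTags sleep; } and call tagsStayStable fetchSleep '["some","new","tags"]' 10 10 to cover both the roughly 60 second anti-entropy window and the roughly 30 second Nomad re-sync. Passing the fetch command as a parameter keeps the loop usable with a stub when no Consul agent is running.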
This looks good to me - I like the fairly neat changes to the update detector.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.