Tighten storage-operator permissions against Salt #2635

Merged (4 commits) on Jun 23, 2020
eve/create-volumes.sh (49 changes: 27 additions & 22 deletions)
@@ -59,28 +59,33 @@
 sed "s/NODE_NAME/${NODE_NAME}/" \
     "${PRODUCT_MOUNT}/examples/prometheus-sparse.yaml" | \
     kubectl apply -f -

echo "Waiting for PV '$NODE_NAME-alertmanager' to be provisioned"
if ! retry "$MAX_TRIES" check_pv_exists "$NODE_NAME-alertmanager"; then
echo "PV not created"
exit 1
fi
wait_for_pv() {
local -r pv="$1"

echo "Waiting for PV '$pv' to be provisioned"
if ! retry "$MAX_TRIES" check_pv_exists "$pv"; then
echo "PV '$pv' not created"
kubectl logs -n kube-system deploy/storage-operator
exit 1
fi
}

echo "Waiting for PV '$NODE_NAME-prometheus' to be provisioned"
if ! retry "$MAX_TRIES" check_pv_exists "$NODE_NAME-prometheus"; then
echo "PV not created"
exit 1
fi
wait_for_pv "$NODE_NAME-alertmanager"
wait_for_pv "$NODE_NAME-prometheus"

echo 'Waiting for AlertManager to be running'
if ! retry "$MAX_TRIES" check_pod_is_in_phase metalk8s-monitoring \
alertmanager-prometheus-operator-alertmanager-0 Running; then
echo "AlertManager is not Running"
exit 1
fi
wait_for_pod() {
local -r name="$1" namespace="$2" pod="$3"

echo 'Waiting for Prometheus to be running'
if ! retry "$MAX_TRIES" check_pod_is_in_phase metalk8s-monitoring \
prometheus-prometheus-operator-prometheus-0 Running; then
echo "Prometheus is not Running"
exit 1
fi
echo "Waiting for $name to be running"
if ! retry "$MAX_TRIES" check_pod_is_in_phase "$namespace" "$pod" Running
then
echo "$name is not Running"
kubectl describe pod -n "$namespace" "$pod"
exit 1
fi
}

wait_for_pod "AlertManager" \
metalk8s-monitoring alertmanager-prometheus-operator-alertmanager-0
wait_for_pod "Prometheus" \
metalk8s-monitoring prometheus-prometheus-operator-prometheus-0
@@ -45,7 +45,9 @@ external_auth:
   storage-operator:
     - '*':
       - 'disk.dump'
-      - 'state.sls'
+      - 'state.sls':
Contributor: Ho, this was simpler than I thought ^^

Contributor Author: That's the theory, not working atm 😅

Contributor Author: OK it looks like it's working actually!

+          kwargs:
+            mods: 'metalk8s\.volumes.*'
     - '@jobs'
 
 # `kubeconfig` file and `context` used by salt to interact with apiserver
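As an illustration (a sketch, not part of this change), the effect of the new `kwargs` restriction is that the storage-operator credentials may still call `disk.dump` and `state.sls`, but `state.sls` is only accepted when its `mods` argument matches `metalk8s\.volumes.*`; anything else is rejected by salt-api with a 401 "No permission" response, as the tests further down verify. The endpoint URL and token below are placeholders.

import requests

SALT_API_URL = "https://salt-api.example:4507"  # placeholder address
TOKEN = "<token obtained by logging in as the storage-operator ServiceAccount>"

def salt_call(fun, tgt="*", kwarg=None):
    """Send a single 'local' client call to salt-api."""
    payload = {"client": "local", "tgt": tgt, "fun": fun}
    if kwarg:
        payload["kwarg"] = kwarg
    return requests.post(
        SALT_API_URL,
        json=[payload],
        headers={"X-Auth-Token": TOKEN},
        verify=False,
    )

# Allowed: 'mods' matches the 'metalk8s\.volumes.*' pattern
allowed = salt_call("state.sls", kwarg={"mods": "metalk8s.volumes"})

# Rejected (HTTP 401, "No permission"): state.sls with any other mods...
denied_state = salt_call("state.sls", kwarg={"mods": "test.nop"})
# ...and functions that are not on the allow-list at all, e.g. test.ping
denied_ping = salt_call("test.ping")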
storage-operator/deploy/role.yaml (67 changes: 51 additions & 16 deletions)

@@ -4,36 +4,63 @@ metadata:
   creationTimestamp: null
   name: storage-operator
 rules:
+# For recording transition events
Contributor: Good idea to add comments in this file 👍

Contributor: FWIW, if we update operator-sdk/controller-runtime (or move to kubebuilder), the RBAC rules are to be defined in the Go code using annotations, and this file is to be generated by the tooling.

Contributor Author (@gdemonet, Jun 24, 2020): Oh yeah... Debt ticket #2644

 - apiGroups:
   - ""
   resources:
-  - pods
-  - nodes
-  - services
-  - endpoints
-  - persistentvolumes
-  - persistentvolumeclaims
   - events
-  - configmaps
-  - secrets
   verbs:
-  - '*'
+  - create
+  - patch
+# For setting up monitoring for itself
+- apiGroups:
+  - ""
+  resources:
+  - services
+  verbs:
+  - get
+  - create
+  - update
 - apiGroups:
   - apps
   resources:
   - deployments
-  - daemonsets
   - replicasets
-  - statefulsets
   verbs:
-  - '*'
+  - get
Contributor: Why does the storage-operator need to read deployments and replicasets?

Contributor Author: It comes from kubemetrics: when creating the Service, it reads its Pod's owner controller (a ReplicaSet) and this controller's controller (a Deployment) to determine the right selector to use, I think. The information is scattered across several issues and pull requests in operator-sdk, and I found most of them when investigating the "Failed to list *unstructured.Unstructured" error logs. I could do some digging in my history if you want, but I can confirm that there were unauthorized errors if I removed those permissions.
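For context, here is a rough sketch of that owner-reference walk, translated to the Python Kubernetes client for illustration only (the operator itself does this through kube-metrics, in Go); the pod name and namespace are placeholders.

from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() outside a cluster
core = client.CoreV1Api()
apps = client.AppsV1Api()

NAMESPACE = "kube-system"                # placeholder
POD_NAME = "storage-operator-xyz-12345"  # placeholder

def controller_of(owner_refs):
    """Return the ownerReference flagged as controller, if any."""
    return next((ref for ref in owner_refs or [] if ref.controller), None)

# Pod -> ReplicaSet: needs 'get' on pods and replicasets
pod = core.read_namespaced_pod(POD_NAME, NAMESPACE)
rs_ref = controller_of(pod.metadata.owner_references)
replica_set = apps.read_namespaced_replica_set(rs_ref.name, NAMESPACE)

# ReplicaSet -> Deployment: needs 'get' on deployments
deploy_ref = controller_of(replica_set.metadata.owner_references)
deployment = apps.read_namespaced_deployment(deploy_ref.name, NAMESPACE)

# The Deployment's selector can then be reused for the metrics Service
print(deployment.spec.selector.match_labels)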

 - apiGroups:
   - monitoring.coreos.com
   resources:
   - servicemonitors
   verbs:
   - get
   - create
+# For reading its own name and namespace
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  verbs:
+  - get
+# For managing its lock
+- apiGroups:
+  - ""
+  resources:
+  - configmaps
+  resourceNames:
+  - storage-operator-lock
+  verbs:
+  - get
+  - update
+- apiGroups:
+  - ""
+  resources:
+  - configmaps
+  verbs:
+  # NOTE: cannot scope "create" to a resourceName, see
+  # https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources
+  - create
+# For managing its own graceful termination
 - apiGroups:
   - apps
   resourceNames:
@@ -42,24 +69,32 @@ rules:
   - deployments/finalizers
   verbs:
   - update
+# For managing owned PVs
 - apiGroups:
   - ""
   resources:
-  - pods
+  - persistentvolumes
   verbs:
-  - get
+  - '*'
+# For reading a Node's MetalK8s version
 - apiGroups:
-  - apps
+  - ""
   resources:
-  - replicasets
+  - nodes
   verbs:
   - get
+  # NOTE: we only use "get" in code, but the controller-runtime tooling uses
+  # "list" and "watch" to manage a cache
+  - list
+  - watch
+# For every custom resource from this Operator
 - apiGroups:
   - storage.metalk8s.scality.com
   resources:
   - '*'
   verbs:
   - '*'
+# For reading device preparation details (formatting and mounting options)
 - apiGroups:
   - storage.k8s.io
   resources:
tests/post/features/salt_api.feature (4 changes: 3 additions & 1 deletion)

@@ -12,8 +12,10 @@ Feature: SaltAPI
   Scenario: Login to SaltAPI using a ServiceAccount
     Given the Kubernetes API is available
     When we login to SaltAPI with the ServiceAccount 'storage-operator'
-    Then we can invoke '["disk.dump", "state.sls"]' on '*'
+    Then we can invoke '["disk.dump", {"state.sls": {"kwargs": {"mods": r"metalk8s\.volumes.*"}}}]' on '*'
     And we have '@jobs' perms
+    And we can not ping all minions
+    And we can not run state 'test.nop' on '*'
 
   Scenario: Login to SaltAPI using an incorrect password
     Given the Kubernetes API is available
tests/post/steps/test_salt_api.py (76 changes: 58 additions & 18 deletions)

@@ -8,6 +8,17 @@
 from pytest_bdd import parsers, scenario, then, when
 
 
+def _negation(value):
+    """Parse an optional negation after a verb (in a Gherkin feature spec)."""
+    if value == "":
+        return False
+    elif value in [" not", "not"]:
+        return True
+    else:
+        raise ValueError(
+            "Cannot parse '{}' as an optional negation".format(value)
+        )

# Scenario {{{


@@ -71,26 +82,34 @@ def login_salt_api_token(host, k8s_client, account_name, version, context):
# Then {{{


-@then('we can ping all minions')
-def ping_all_minions(host, context):
-    result = requests.post(
-        context['salt-api']['url'],
-        json=[
-            {
-                'client': 'local',
-                'tgt': '*',
-                'fun': 'test.ping',
-            },
-        ],
-        headers={
-            'X-Auth-Token': context['salt-api']['token'],
-        },
-        verify=False,
-    )
+@then(parsers.cfparse(
+    'we can{negated:Negation?} ping all minions',
+    extra_types={'Negation': _negation}
+))
+def ping_all_minions(host, context, negated):
+    result = _salt_call(context, 'test.ping', tgt='*')
+
+    if negated:
+        assert result.status_code == 401
+        assert 'No permission' in result.text
+    else:
+        result_data = result.json()
+        assert result_data['return'][0] != []
 
-    result_data = result.json()
 
-    assert result_data['return'][0] != []
+@then(parsers.cfparse(
+    "we can{negated:Negation?} run state '{module}' on '{targets}'",
+    extra_types={'Negation': _negation}
+))
+def run_state_on_targets(host, context, negated, module, targets):
+    result = _salt_call(context, 'state.sls', tgt=targets,
+                        kwarg={'mods': module})
+
+    if negated:
+        assert result.status_code == 401
+        assert 'No permission' in result.text
+    else:
+        assert result.status_code == 200


@then('authentication fails')
@@ -164,4 +183,25 @@ def _salt_api_login(address, username=None, password=None, token=None):
     return result


+def _salt_call(context, fun, tgt='*', arg=None, kwarg=None):
+    action = {
+        'client': 'local',
+        'tgt': tgt,
+        'fun': fun,
+    }
+    if arg is not None:
+        action['arg'] = arg
+    if kwarg is not None:
+        action['kwarg'] = kwarg
+
+    return requests.post(
+        context['salt-api']['url'],
+        json=[action],
+        headers={
+            'X-Auth-Token': context['salt-api']['token'],
+        },
+        verify=False,
+    )


# }}}