-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify memory and disk requirements before install #4161
Verify memory and disk requirements before install #4161
Conversation
I expect this to make the installation CI jobs to fail because the hosts probably don't meet the requirements of memory and disk... let's see. |
aos-ci-test |
playbooks/byo/config.yml
Outdated
@@ -1,2 +1,14 @@ | |||
--- | |||
- name: Verify Requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we use playbooks/byo
only as entry points. The verify code should go under playbooks/common
. @mtnbikenc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @mtnbikenc was working on some graphical representation of what calls what... that would be useful to look at now. Any pointers to this dependency graph somewhere?
@@ -46,18 +46,25 @@ def run(self, tmp=None, task_vars=None): | |||
|
|||
result["checks"] = check_results = {} | |||
|
|||
user_disabled_checks = [ | |||
check.strip() | |||
for check in task_vars.get("openshift_disable_check", "").split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the list of all checks to disable listed somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Presumably there will eventually be docs for this.
The checks that did fail are mentioned in the output. We could make that more obvious with a summary line so that after a failure the user can review and decide if they don't care about the failures or want to reconfigure thresholds etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I make a brief mention of this var in openshift/openshift-docs#4452
The error logs show what I wanted to see:
I'll send a patch to include a sentence after the error explaining how to disable checks. |
Now the message should look like this, only where there are failed checks:
|
playbooks/byo/config.yml
Outdated
@@ -1,2 +1,14 @@ | |||
--- | |||
- name: Verify Requirements | |||
# REVIEW: what's the proper group to use: OSEv3, g_all_hosts or something else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSEv3 at this location; however these changes might go better in playbooks/byo/openshift-cluster/config.yml
which is included only by this file. They could go after
- include: initialize_groups.yml
... and actually use those group names.
Or if we put this in a playbook in common we can rely on the groups too.
try: | ||
r = check.run(tmp, task_vars) | ||
except OpenShiftCheckException as e: | ||
r = {} | ||
r["failed"] = True | ||
r["msg"] = str(e) | ||
else: | ||
# TODO(rhcarvalho): we may want to provide some distinctive | ||
# complementary message to know why a check was skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to distinguish why a check was skipped, not necessarily to be implemented in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually should be easy enough to record that it's either explicitly skipped or not active.
if result.get('failed', False) | ||
] | ||
# FIXME: get name of currently running playbook, if possible. | ||
NAME_OF_PLAYBOOK = 'playbooks/byo/config.yml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be tricky to figure out, if possible. Didn't have much time to keep digging. We may also just drop this information from the output.
'Check names are shown in the failure summary above.\n' | ||
'The variable can be set in the inventory or passed in the ' | ||
'command line using the -e flag to ansible-playbook.' | ||
).format(NAME_OF_PLAYBOOK, ','.join(sorted(set(failed_checks)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: we need to remove duplicates. Sorting provides consistent output.
playbooks/byo/config.yml
Outdated
args: | ||
checks: | ||
- disk_availability | ||
- memory_availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: this block might become a playbooks/openshift-check/pre-install.yml
, and then we simply include it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checks we want to run automatically on every install may not coincide with the checks we want to run if they're specifically running pre-install checks. I'll think about this a bit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting review comments I forgot to on Friday :-(
playbooks/byo/config.yml
Outdated
@@ -1,2 +1,14 @@ | |||
--- | |||
- name: Verify Requirements | |||
# REVIEW: what's the proper group to use: OSEv3, g_all_hosts or something else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSEv3 at this location; however these changes might go better in playbooks/byo/openshift-cluster/config.yml
which is included only by this file. They could go after
- include: initialize_groups.yml
... and actually use those group names.
Or if we put this in a playbook in common we can rely on the groups too.
playbooks/byo/config.yml
Outdated
args: | ||
checks: | ||
- disk_availability | ||
- memory_availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checks we want to run automatically on every install may not coincide with the checks we want to run if they're specifically running pre-install checks. I'll think about this a bit...
try: | ||
r = check.run(tmp, task_vars) | ||
except OpenShiftCheckException as e: | ||
r = {} | ||
r["failed"] = True | ||
r["msg"] = str(e) | ||
else: | ||
# TODO(rhcarvalho): we may want to provide some distinctive | ||
# complementary message to know why a check was skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually should be easy enough to record that it's either explicitly skipped or not active.
aos-ci-test |
NAME_OF_PLAYBOOK = 'playbooks/byo/config.yml' | ||
msg = ( | ||
"\nThe execution of the playbook '{}' includes checks designed " | ||
'to ensure it can complete successfully. One or more of these ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this message is directed at the checks being run as part of the install playbook. I would like it to be different for running the pre-install playbook explicitly, and for running health checks. I think it would make sense to add a role variable that the playbook sets to give some context for which of those it is, so this plugin can adjust output accordingly.
b0bd18e
to
06e9eaa
Compare
aos-ci-test |
New example output when there are failed checks:
If nothing fails, there's no summary. If something other than the checks fails, then failures are still nicely displayed by host but the summary about configuring checks is not displayed, e.g.:
|
Example usage: $ ansible-playbook -i hosts playbooks/byo/config.yml -e openshift_disable_check=memory_availability,disk_availability Or add the variable to the inventory / hosts file.
@@ -3,6 +3,19 @@ | |||
tags: | |||
- always | |||
|
|||
- name: Verify Requirements | |||
hosts: OSEv3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will include other host types like [nfs] and [lb] which we don't want to apply our checks to. We should probably move this into playbooks/common/openshift-cluster/config.yml and use the oo_masters_to_config:oo_nodes_to_config:oo_etcd_to_config there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I see that you're selecting which checks to apply based on byo group names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be best to use the right groups in the checks (the way checks are structured, they all run under one task which runs against all hosts, so they filter themselves out by various criteria). It's just not obvious to me what the groups would ideally be, since I haven't grokked what the various groups are for and where they come from.
aos-ci-test |
[merge] |
Jenkins tests failing because of the new checks. I guess I have to update those as well. |
[test] with CI changes |
[merge] |
[test] again with openshift-eng/aos-cd-jobs#288 in |
[test] |
Evaluated for openshift ansible test up to da04b57 |
[merge][severity: bug] |
If that fails we'll manually merge. |
Evaluated for openshift ansible merge up to da04b57 |
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/155/) (Base Commit: a353d4d) |
continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/459/) (Base Commit: a353d4d) (Extended Tests: bug) |
Thx :) |
free_bytes = self.openshift_available_disk(ansible_mounts) | ||
|
||
recommended_min = max(self.recommended_disk_space_bytes.get(name, 0) for name in group_names) | ||
configured_min = int(get_var(task_vars, "openshift_check_min_host_disk_gb", default=0)) * 10**9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sosiouxme this variable seems to be too generic, how was this supposed to work?
I'm updating this check now to check multiple paths / mount points (/var, /usr/local/bin, /tmp or equivalent), and in that case this variable becomes ambiguous, which filesystem should I apply it to?
Is it used somewhere we know of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used anywhere except here... I made it up
Points to discuss:
Where exactly to call the checks
I have considered placing the verification down to the
openshift-cluster
role, though I am afraid it might end up affecting runs other than an initial install. Open to suggestions.How to disable the checks
We need a mechanism for opting out of the checks. The mechanism so far is a comma-separated list of check names defined in the inventory (or passed in the command line) as
openshift_disable_check
. Open to alternatives and naming suggestions.Example call:
$ ansible-playbook -i hosts playbooks/byo/config.yml -e openshift_disable_check=memory_availability,disk_availability
Maybe we could also have a way to disable all (
openshift_disable_check=all
).