Support "nf_conntrack", check for 90% full #369

Merged
merged 1 commit into from
Nov 29, 2019

arekkusu
Contributor

@arekkusu arekkusu commented Oct 3, 2019

  • The script was checking for "ip_conntrack_...", which was replaced by "nf_conntrack_..." on newer systems. It now supports both.

  • Return failure ("not ok") when the table is more than 90% full. Not sure what value is best here, but I think that is better than failing only when the table is full. Otherwise we might end up with a value close to the max or bouncing around it.

  • Replaced cat with "$(< file )" to avoid calling an external command.
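
A minimal sketch of the fallback and the "$(< file )" read described above. It is not the script's actual code: the function name read_conntrack_value is an assumption for illustration, and the two paths in the usage comment are the usual nf_conntrack and legacy ip_conntrack locations under /proc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the contents of the first candidate file that
# exists. Arguments are file paths in order of preference.
read_conntrack_value() {
  local path
  for path in "$@"; do
    if [[ -f "${path}" ]]; then
      # $(< file) reads the file without running an external command
      # such as cat.
      printf '%s\n' "$(< "${path}")"
      return 0
    fi
  done
  return 1  # none of the candidate files exists
}

# Illustrative usage, preferring the newer path over the legacy one:
# read_conntrack_value \
#   /proc/sys/net/netfilter/nf_conntrack_count \
#   /proc/sys/net/ipv4/netfilter/ip_conntrack_count
```

Passing the candidates as arguments keeps the order of preference in one place and makes the helper easy to exercise against temporary files.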

@k8s-ci-robot
Contributor

Welcome @arekkusu!

It looks like this is your first PR to kubernetes/node-problem-detector 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/node-problem-detector has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 3, 2019
@k8s-ci-robot
Contributor

Hi @arekkusu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 3, 2019
@arekkusu
Contributor Author

CLA should be ok now (reply should trigger re-check)

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 10, 2019
@Random-Liu
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 25, 2019
Contributor

@xueweiz xueweiz left a comment

Thanks for the PR @arekkusu !
Overall it looks good to me :) However there are a few small stylistic suggestions, following this Bash style guide.
If you can help addressing them, that'd be great, and I'll happily approve the PR :) Thank you!

config/plugin/network_problem.sh (outdated)
config/plugin/network_problem.sh (outdated)

conntrack_max=$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max)
conntrack_count=$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count)
if [ -f $NF_CONNTRACK_COUNT ] && [ -f $NF_CONNTRACK_MAX ]; then
Contributor

nit: Could you change the [ -f xxx ]s here and below into [[ -f xxx ]]? Thanks! :)
This is because [[ ]] is preferred over [ ]. If you could use this chance to improve the code quality, I'm sure everyone would appreciate it :)

Contributor Author

Updated to use [[ ]]. I also switched to 2-space indent to follow the Google Shell Style Guide.

I think the same should be done for check_ntp.sh.

Contributor

Thanks! Yes, I agree, we should update check_ntp.sh as well, as a small improvement.
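
One concrete reason for the [[ ]] preference, as a small sketch. The variable name path is illustrative and not taken from the script; the point is only how the two forms treat an empty, unquoted variable.

```shell
#!/usr/bin/env bash
# With an empty, unquoted variable, [ -f $path ] collapses to [ -f ], which
# only checks that the string "-f" is non-empty and therefore succeeds.
# [[ ]] does not word-split its operands, so the same test correctly fails.
path=""

if [ -f $path ]; then single="true"; else single="false"; fi
if [[ -f $path ]]; then double="true"; else double="false"; fi

echo "[ -f \$path ]   -> ${single}"
echo "[[ -f \$path ]] -> ${double}"
```

Quoting the variable avoids the problem with [ ] as well, but [[ ]] stays correct even when the quotes are forgotten.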

config/plugin/network_problem.sh (outdated)
exit $UNKNOWN
fi

conntrack_count=$(< $CONNTRACK_COUNT) || exit $UNKNOWN
Contributor

@xueweiz xueweiz Nov 28, 2019

nit: We have conntrack_count and CONNTRACK_COUNT, one for the value and the other for the path. Do you mind appending _PATH to the path constants (CONNTRACK_*, NF_CONNTRACK_*, IP_CONNTRACK_*)? Thanks so much!

Contributor Author

Not a nit, you're right about the naming.
I added PATH and shortened CONNTRACK to CT.

Contributor

Sounds good to me! Thanks! :)

if (( conntrack_count >= conntrack_max )); then
echo "Conntrack table full"
if (( conntrack_count > conntrack_max /10 * 9)); then
echo "Conntrack table is more than 90% full"
Contributor

Maybe
echo "Conntrack table is more than 90% used: ${conntrack_count} out of ${conntrack_max}"

What do you think?

Contributor Author

Updated to include this additional information for both the OK and NONOK exits.
In addition, I chose to output to STDERR for NONOK.

Contributor

Sorry for the trouble. Could you help change the print back to STDOUT? :P

This is a small detail in custom-plugin-monitor's interface, where only STDOUT is collected, and STDERR is ignored. See the design and the implementation for this detail.
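
To make the threshold arithmetic and the STDOUT point in this thread concrete, here is a rough sketch under stated assumptions. The function name check_conntrack_usage and the message wording are hypothetical; only the expression conntrack_max / 10 * 9 and the OK=0 / NONOK=1 convention come from the discussion.

```shell
#!/usr/bin/env bash
# Exit codes as discussed in this PR; the helper below is hypothetical.
readonly OK=0
readonly NONOK=1

# Print a usage message to STDOUT and return NONOK above the 90% threshold.
check_conntrack_usage() {
  local count=$1 max=$2
  # Integer arithmetic: max / 10 is truncated before multiplying by 9,
  # so for max=65536 the threshold is 6553 * 9 = 58977.
  if (( count > max / 10 * 9 )); then
    echo "Conntrack table usage over 90%: ${count} out of ${max}"
    return "${NONOK}"
  fi
  echo "Conntrack table usage: ${count} out of ${max}"
  return "${OK}"
}

# Command substitution captures STDOUT only, which mirrors why a message
# written to STDERR would not reach a collector that reads STDOUT.
captured=$(check_conntrack_usage 59000 65536)
status=$?
echo "status=${status} message=${captured}"
```

Dividing before multiplying makes the threshold slightly lower than an exact 90% (58977 rather than 58982 for a maximum of 65536), which is harmless for an alerting threshold and avoids any risk of overflow on very large maxima.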

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 29, 2019
@arekkusu
Contributor Author

Really appreciate your review and input. Have a look at the update and let me know what you think.

@xueweiz
Contributor

xueweiz commented Nov 29, 2019

Hi @arekkusu , thanks a lot for making the changes! Just one more small item for the STDOUT thing :) once that's fixed I'll approve and merge the PR.
Thanks!

@xueweiz
Contributor

xueweiz commented Nov 29, 2019

And by the way, you can (optionally) squash all the 2-3 commits into one. We typically do this to have a simpler/cleaner commit history. But I understand that a lot of people choose to not do that to keep a history for the code review.
Your choice :) I'm fine with either way.

- Script was checking for "ip_conntrack_..." which was replaced by "nf_conntrack_..." on newer system. Now support both.

-  Return failure ("not ok") when table is more than 90% full.
  - Not sure what value is best here but I think that is better than when the table is full.
    Otherwise we might end up with a value close to the max or bouncing around.

- Replaced cat by "$(< file )" to avoid calling external command
- Follow Google "Shell Style Guide": 2 space indent, use preferred "[[ test ]]", add "readonly"
- Include current connection usage in output message
@arekkusu
Contributor Author

Ok regarding outputting to STDOUT. Updated and squashed the commits.

BTW, do you see any changes needed to network-problem-monitor.json? I guess ConntrackFull is no longer fully accurate (ConntrackPressure or ConntrackHighUsage). I am not sure it's something we want / need to change.

{
  "plugin": "custom",
  "pluginConfig": {
    "invoke_interval": "30s",
    "timeout": "5s",
    "max_output_length": 80,
    "concurrency": 3
  },
  "source": "network-custom-plugin-monitor",
  "metricsReporting": true,
  "conditions": [],
  "rules": [
    {
      "type": "temporary",
      "reason": "ConntrackFull",
      "path": "./config/plugin/network_problem.sh",
      "timeout": "3s"
    }
  ]
}
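
A side note on "max_output_length": 80 in this config, as a hedged sketch. It assumes the setting caps how many characters of the plugin's output are kept, so a longer message would be cut off. The helper name fits_output_limit and the sample message are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a status message fits within a given limit.
fits_output_limit() {
  local message=$1 limit=$2
  (( ${#message} <= limit ))
}

# Sample message of the kind discussed in this PR (wording is illustrative).
message="Conntrack table usage over 90%: 59000 out of 65536"
if fits_output_limit "${message}" 80; then
  echo "message fits (${#message} characters)"
else
  echo "message would be cut off (${#message} characters)"
fi
```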

@arekkusu
Contributor Author

I think the (draft?) design doc doesn't clearly indicate the UNKNOWN return status.

The implementation treats any exit code other than 0 or 1 as unknown:

switch exitCode {
case 0:
	return cpmtypes.OK, output
case 1:
	return cpmtypes.NonOK, output
default:
	return cpmtypes.Unknown, output
}
}

This test, however, uses exit code 3 for unknown:

#!/usr/bin/env bash
echo "UNKNOWN"
exit 3

Let me know if you think it's better to use 3 as exit code for UNKNOWN ...
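
For illustration only, the same mapping written as a small shell sketch. The function name status_from_exit_code is hypothetical; the behaviour follows the switch quoted above, where both 2 and 3 end up as Unknown.

```shell
#!/usr/bin/env bash
# Mirrors the quoted switch: 0 is OK, 1 is NonOK, anything else is Unknown.
status_from_exit_code() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "NonOK" ;;
    *) echo "Unknown" ;;
  esac
}

# Show how a few exit codes would be classified.
for code in 0 1 2 3; do
  echo "exit ${code} -> $(status_from_exit_code "${code}")"
done
```

Under that mapping, choosing 2 or 3 for UNKNOWN makes no difference to how the result is classified.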

@xueweiz
Contributor

xueweiz commented Nov 29, 2019

I guess ConntrackFull is no longer fully accurate (ConntrackPressure or ConntrackHighUsage). I am not sure it's something we want / need to change.

I agree with you; I think ConntrackPressure seems like a good option. However, I don't know whether it's worth changing it, since users may already be doing detection and remediation based on the ConntrackFull event that NPD reports. Changing it is more visible to the users. I don't have a strong opinion either way. I think your PR is fine as is.
+cc @andyxning , who reviewed the original PR #152 .

Let me know if you think it's better to use 3 as exit code for UNKNOWN ...

I think keeping it as is (which is 2) is OK :)
(I'm quite curious why this was not clearly defined, lol.)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arekkusu, xueweiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2019
@k8s-ci-robot k8s-ci-robot merged commit f535592 into kubernetes:master Nov 29, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants