Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: Remove trace mode and trace type #2353

Merged
merged 1 commit into from
Oct 15, 2021

Conversation

jodh-intel
Copy link
Contributor

Remove the trace_mode and trace_type agent tracing options as
decided in the Architecture Committee meeting.

See:

Fixes: #2352.

Signed-off-by: James O. D. Hunt [email protected]

@jodh-intel jodh-intel added the do-not-merge PR has problems or depends on another label Jul 29, 2021
@jodh-intel
Copy link
Contributor Author

Added dnm label as this still needs to be manually tested...

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from b075d8d to 090cd8e Compare July 30, 2021 10:06
@jodh-intel jodh-intel removed the do-not-merge PR has problems or depends on another label Jul 30, 2021
@jodh-intel
Copy link
Contributor Author

Manual tests worked so please review so we can land y'all!

Related: #2363.

@jodh-intel
Copy link
Contributor Author

/test

@jodh-intel
Copy link
Contributor Author

/cc @liubin.

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from 090cd8e to 2a09dd0 Compare August 9, 2021 13:44
@jodh-intel
Copy link
Contributor Author

/test

if let Ok(result) = value.parse::<tracer::TraceType>() {
self.tracing = result;
if let Ok(value) = env::var(TRACING_ENV_VAR) {
match value.as_str() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use get_bool_value to parse the config value like command-line way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @liubin! Updated.

@liubin
Copy link
Member

liubin commented Aug 10, 2021

And just a reminder that don't forget to update docs/how-to/how-to-set-sandbox-config-kata.md to remove these trace options. @jodh-intel

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from 2a09dd0 to 781aa92 Compare August 10, 2021 08:31
@jodh-intel jodh-intel requested a review from a team as a code owner August 10, 2021 08:31
@jodh-intel
Copy link
Contributor Author

Thanks @liubin - doc also updated.

@jodh-intel
Copy link
Contributor Author

/test

Copy link

@cmaf cmaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel, I tested this with tracing enabled for runtime and agent in the config file and see an error consistently:

ctr: Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed out connecting to vsock 2900665692:1024: unknown

With only runtime tracing enabled, I am able run and trace with ctr. I also tested tracing with the latest from main and don't see this error with agent tracing configured. I verified that the host kernel was built with the CONFIG_VHOST_VSOCK option and that the module was loaded.

I don't see anything in your PR that explicitly modifies anything related to vsock...thoughts?

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from 781aa92 to 07104ec Compare August 24, 2021 14:26
@jodh-intel
Copy link
Contributor Author

/test

@jodh-intel
Copy link
Contributor Author

Hi @cmaf - I've just re-tested on a fully up-to-date Ubuntu 20.04.3 system using Kata 2.2.0-rc0 + this PR with containerd v1.5.5 (72cec4be58a9eb6b2910f5d10f1c01ca47d231c0), and it wfm.

If you can still repeat this issue, can you revert to the main version of the code to definitively rule out the host kernel? It might be worth checking the behaviour on another system and checking the journal for vsock error messages (or stray processes?)

@jodh-intel
Copy link
Contributor Author

@cmaf - wait a sec - are you using an image= or initrd=? If the latter, you may have hit #2443.

@cmaf
Copy link

cmaf commented Aug 31, 2021

@jodh-intel I tried a completely new install of 20.04 with the latest kata containers built from main, both containerd v1.5.2 and v1.5.5, and I can get tracing output for the agent. Both on bare metal and a VM. However, when I switch to this PR, I get the same error as above. Increasing the timeout did not fix it. I also verified that I am using image= in the configuration file.

I used the setup.sh script from the ci directory to install and configure, and further updated containerd to v1.5.5. Maybe I am missing a configuration step that you took?

Log output:

Aug 28 02:02:47 system-tracing-setup kata[108121]: time="2021-08-28T02:02:47.323236925Z" level=warning msg="Unable to connect to unix socket (/run/vc/vm/foo/qmp.sock): dial unix /run/vc/vm/foo/qmp.sock: connect: no such file or directory" name=containerd-shim-v2 pid=108121 sandbox=foo source=virtcontainers subsystem=qmp
Aug 28 02:02:47 system-tracing-setup kata[108121]: time="2021-08-28T02:02:47.323445025Z" level=error msg="Failed to connect to QEMU instance" error="dial unix /run/vc/vm/foo/qmp.sock: connect: no such file or directory" name=containerd-shim-v2 pid=108121 sandbox=foo source=virtcontainers subsystem=qemu
Aug 28 02:03:15 system-tracing-setup kata[108121]: time="2021-08-28T02:03:15.516250547Z" level=warning msg="sandbox cgroups path is empty" name=containerd-shim-v2 pid=108121 sandbox=foo source=virtcontainers subsystem=sandbox
Aug 28 02:03:15 system-tracing-setup kata[108121]: time="2021-08-28T02:03:15.519950348Z" level=warning msg="failed to cleanup netns" error="failed to get netns /var/run/netns/cnitest-c0ca549b-0de0-b343-426b-5c267956e6aa: failed to Statfs \"/var/run/netns/cnitest-c0ca549b-0de0-b343-426b-5c267956e6aa\": no such file or directory" name=containerd-shim-v2 path=/var/run/netns/cnitest-c0ca549b-0de0-b343-426b-5c267956e6aa pid=108121 sandbox=foo source=katautils

@cmaf
Copy link

cmaf commented Aug 31, 2021

I tested this with some changes to configuration.toml network settings, but cannot get output. When agent tracing is enabled, it looks the vm isn't being created.

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from 07104ec to d092cc2 Compare September 3, 2021 15:42
Copy link

@cmaf cmaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel I tested with agent debug enabled and saw an "invalid trace type" error followed by a shutdown of the agent:

Sep 15 19:33:18 test-system kata[175979]: time="2021-09-15T19:33:18.580312355Z" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/foo/console.sock name=containerd-shim-v2 pid=175979 sandbox=foo source=virtcontainers subsystem=sandbox vmconsole="Error: invalid trace type"
Sep 15 19:33:18 test-system kata[175979]: time="2021-09-15T19:33:18.583842736Z" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/foo/console.sock name=containerd-shim-v2 pid=175979 sandbox=foo source=virtcontainers subsystem=sandbox vmconsole="[\x1b[0;32m  OK  \x1b[0m] Stopped target \x1b[0;1;39mKata Containers Agent Target\x1b[0m."

The VM eventually shuts down as well. Please see code comments.

@@ -56,7 +55,7 @@ impl FromStr for TraceType {
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L51-53:

"isolated" => Ok(TraceType::Isolated),
"disabled" => Ok(TraceType::Disabled),
_ => Err(TraceTypeError::new("invalid trace type")),

Shoud TraceType be removed and replaced with a boolean like it is elsewhere in the code?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @jodh-intel on a second round of testing, I got the PR working with no error. However, the code referring to TraceType should probably still be removed.

@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch 2 times, most recently from c648901 to 8023d7e Compare October 14, 2021 14:02
@jodh-intel
Copy link
Contributor Author

@cmaf - I've removed the TraceType from the tracer and updated the branch. Please tal so we can try to get this merged soon.

@jodh-intel
Copy link
Contributor Author

/test

Copy link

@cmaf cmaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel I tested it and it is working, but there is a static check failure:

[static-checks.sh:360] INFO: Running golangci-lint on /home/runner/work/kata-containers/kata-containers/src/github.com/kata-containers/kata-containers/src/runtime/cmd/kata-runtime
kata-env.go:153:14: fieldalignment: struct with 520 pointer bytes could be 512 (govet)
type EnvInfo struct {
             ^
make: *** [Makefile:35: static-checks] Error 1

Remove the `trace_mode` and `trace_type` agent tracing options as
decided in the Architecture Committee meeting.

See:

- kata-containers#2062

Fixes: kata-containers#2352.

Signed-off-by: James O. D. Hunt <[email protected]>
@jodh-intel jodh-intel force-pushed the rm-trace-type-and-mode branch from 8023d7e to 321be0f Compare October 15, 2021 09:12
@jodh-intel
Copy link
Contributor Author

@cmaf - fixed.


golangci-lint still doesn't show the suggested linter fix when it bleats. Until it does, the following are good places to look:

@jodh-intel
Copy link
Contributor Author

/test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove trace_mode and trace_type config options
3 participants