
metrics CI: Failing bounds checks, since v1.5 release? #102

Closed

grahamwhaley opened this issue Jan 25, 2019 · 5 comments

Comments

@grahamwhaley
Contributor

The metrics CI started failing its checks just about the time we did the v1.5 release (which could be coincidental).

Looking at the json from the boot-times results, it seems that the qemu and kernel versions may have changed...

(note, I have pinned those builds as 'keep forever' in Jenkins for now so we don't lose the refs)

If we look at the qemu and kernel versions in those files:

  • Job 388
    • "Hypervisor": "/usr/bin/qemu-lite-system-x86_64"
    • "HypervisorVersion": " QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers",
    • Kernel: "Path": "/usr/share/kata-containers/vmlinuz-4.14.67-22"
  • Job 392
    • "Hypervisor": "/usr/bin/qemu-system-x86_64"
    • "HypervisorVersion": " QEMU emulator version 2.11.0 (v2.11.0-rc0-310-gf886228-dirty)\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers",
    • Kernel: "Path": "/usr/share/kata-containers/vmlinuz-4.14.67.22-143.container"

I'm reasonably concerned that we:

  • have 'lite' appearing in one qemu binary path but not the other
  • have quite different kernel version formats

My current best guess is that those differing versions came about because we did a release: the upstream Kata packaged versions of those items now match the versions pinned in our versions.yaml, so the CI pulls the pre-built binaries to test with rather than building from source. (For those who don't know, that is how the CI gets some of its components - if it cannot find the correct component pre-built in the packaging repos, it builds it from source.)
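
To make that decision concrete, here is a minimal sketch of the logic as I understand it - the helper names (`get_pinned_commit`, `get_packaged_commit`, `install_prebuilt_qemu`, `build_qemu_from_source`) are hypothetical, not the real CI functions:

```bash
#!/bin/bash
# Sketch only: choose between the packaged binary and a source build, based
# on whether the packaged version matches the commit pinned in versions.yaml.
set -e

pinned_commit=$(get_pinned_commit "qemu-lite")       # hypothetical: read versions.yaml
packaged_commit=$(get_packaged_commit "qemu-lite")   # hypothetical: query the OBS repo

if [ "${packaged_commit}" = "${pinned_commit}" ]; then
    echo "Packaged qemu matches versions.yaml (${pinned_commit}); installing pre-built binary"
    install_prebuilt_qemu                            # hypothetical
else
    echo "Package is ${packaged_commit}, we want ${pinned_commit}; building from source"
    build_qemu_from_source "${pinned_commit}"        # hypothetical
fi
```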

I'll see if I can re-create this locally to confirm. If it is true, we should try to understand why we get a different result between build-from-source and use-prebuilt-binary.

If this persists over the weekend, on Monday I can go tweak the metrics CI bounds check so things fall back into the 'passing zone' - but I'm only happy doing that on the premise that we chase down what changed and why...

/cc @chavafg @jcvenegas
/cc @nitkon who first highlighted the metrics was failing (thanks!)

@chavafg
Contributor

chavafg commented Jan 25, 2019

I am taking a look at the logs of the jobs above and can see that on Job 392 the qemu is being built, while on Job 388 the qemu is being downloaded from OBS. This is because the newly packaged qemu is a different version:
On OBS, the packaged qemu is commit 87517afd72,
but our versions.yaml has a different qemu commit: f88622805677163b04498dcba35ceca0183b1318

Not sure if 87517afd72 is the one that we want...
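
For anyone checking this by hand, a rough sketch of how to compare the two - it assumes a checkout of kata-containers/runtime, that the installed qemu embeds a git-describe string in its version output (as Job 392's does), and the grep pattern is only a guess at the versions.yaml layout:

```bash
# What commit does versions.yaml pin for qemu-lite?
# (assumes the commit sits under a "qemu-lite:" block - adjust as needed)
grep -A 20 "qemu-lite:" versions.yaml | grep -m1 "commit:"

# What do the installed qemu binaries actually report?
/usr/bin/qemu-system-x86_64 --version | head -1
[ -x /usr/bin/qemu-lite-system-x86_64 ] && /usr/bin/qemu-lite-system-x86_64 --version | head -1
```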

@grahamwhaley
Contributor Author

thx @chavafg, that's interesting. Hmm, both those commits are from the 2.11.0 branch over at:
https://github.com/kata-containers/qemu/commits/qemu-lite-2.11.0
There is only one commit difference - a PR to add a reset callback for no firmware - and I would not have thought that one commit would make the difference; maybe it is the build differences...
I'm also wondering why we are not using the 2.11.2 branch? (unless I've misread something...)
https://github.com/kata-containers/qemu/commits/qemu-lite-2.11.2
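
For the record, a quick way to confirm where those two commits sit and what separates them (a sketch, assuming a local clone of the kata-containers qemu fork):

```bash
git clone https://github.com/kata-containers/qemu.git && cd qemu

# Which remote branches contain each commit?
git branch -r --contains 87517afd72
git branch -r --contains f88622805677163b04498dcba35ceca0183b1318

# Commits between the two (one of these lists should be empty,
# the other should show the single differing commit):
git log --oneline 87517afd72..f88622805677163b04498dcba35ceca0183b1318
git log --oneline f88622805677163b04498dcba35ceca0183b1318..87517afd72
```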

ah, indeed, our versions.yaml points at 2.11.0 ... https://github.com/kata-containers/runtime/blob/master/versions.yaml#L78
@jcvenegas - any thoughts?

@grahamwhaley
Contributor Author

OK, I think this change in metrics is tightly bound to the OBS repo updates/changes related to kata-containers/runtime#1184.
Whilst that rattles on and sorts itself out, I'm going to re-educate the metrics boundary checks so the metrics CI starts working again within the bounds of the current setup...

@grahamwhaley
Contributor Author

I've reset the metrics CI bounds according to recent build history, and kicked off a rebuild of the last failed runtime PR - let's see how it flies.
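
For context, this is roughly what the bounds check amounts to - the real CI has its own checker and config, so treat the JSON field names and numbers below purely as an illustration:

```bash
#!/bin/bash
# Illustration only: compute the mean boot time from a results file and check
# it sits within +/- tolerance of an expected baseline. Field names and
# values are made up for the example.
results_file="boot-times.json"
midval=0.60        # expected mean (s) - hypothetical baseline
tolerance_pct=20   # allowed deviation either side, in percent

mean=$(jq '[.Results[]."to-workload".Result] | add / length' "${results_file}")
min=$(echo "${midval} * (100 - ${tolerance_pct}) / 100" | bc -l)
max=$(echo "${midval} * (100 + ${tolerance_pct}) / 100" | bc -l)

if (( $(echo "${mean} >= ${min} && ${mean} <= ${max}" | bc -l) )); then
    echo "PASS: mean ${mean}s within [${min}, ${max}]"
else
    echo "FAIL: mean ${mean}s outside [${min}, ${max}]"
fi
```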

@grahamwhaley
Contributor Author

afaict, this came about as part of the release/packaging change for v1.5. I've not managed to re-create it or track down when/where the actual size difference came in, and I'm going to write it off to 'something caught up when the packaging changed'.
I've adjusted the metrics CI bounds to work with the new metrics results, and that is now passing again. Whilst there I tightened up the bounds check ranges a little.
Closing this then.

GabyCT pushed a commit to GabyCT/ci that referenced this issue Feb 12, 2019
Ensure Jenkins jobs call the static checks before running setup and the
tests.

Fixes kata-containers#102.

Signed-off-by: James O. D. Hunt <[email protected]>