This repository has been archived by the owner on May 12, 2021. It is now read-only.

kernel: config: add config fragment support #314

Closed

Conversation

grahamwhaley
Contributor

Add support for building kernel configs from a set of 'fragment' files, rather than just copying fully pre-built config files.

@grahamwhaley grahamwhaley added do-not-merge rfc Requires input from the team labels Jan 22, 2019
@grahamwhaley grahamwhaley requested review from jcvenegas and devimc and removed request for jcvenegas January 22, 2019 16:34
@grahamwhaley
Contributor Author

grahamwhaley commented Jan 22, 2019

This is WIP. It does currently construct an x86_64 4.19.x kernel config file that boots well enough to run the metrics report tests - but it needs a lot more review and testing before we think of merging it.

Notes then:

  • This really needs reviewing
  • Currently only supports x86_64 setup
  • It also needs a lot of testing!
  • There are some notable items missing - for instance, seccomp is not yet enabled
  • The networking settings are the 'lion's share' - we need to review them. Do we need/use netfilter inside the guest VM, for instance?
  • We currently set 422 =y items in the fragments, but end up with 755 =y in the .config once 'oldconfig' has done its work (setting defaults) - which is less than the 1039 items we have set in our 4.19 'whole config' file.
  • I can provide a diff between the frag generated 4.19 and the current 4.19 we have if we want - that will help figure out what is missing that we currently have (and also if there is anything new we might then want to try and turn off)
  • we probably want to look at adding more x86 specific optimisation settings
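The `=y` counting above can be reproduced with a quick shell sketch (the fragment files and paths here are made up for illustration, not the actual tree):

```shell
# Illustrative only: count unique '=y' settings across a set of
# fragment files, the same way you would against a generated .config.
mkdir -p /tmp/frag-count
printf 'CONFIG_NET=y\nCONFIG_VSOCKETS=y\n' > /tmp/frag-count/base.conf
printf 'CONFIG_VIRTIO_PCI=y\n' > /tmp/frag-count/virtio.conf

# De-duplicate, since the same symbol may appear in more than one fragment.
grep -h '=y$' /tmp/frag-count/*.conf | sort -u | wc -l
```

Running the same `grep -h '=y$' ... | sort -u | wc -l` over the real fragments directory and over the generated `.config` gives the two numbers being compared.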

and then

  • security reviewing kernel version changes and config file updates will still be hard - but, we should discuss if the fragments will make it any easier?
  • I will compare the dmesg logs from a pre/post frag boot to see if I can spot any obvious issues

and then finally

  • sure, you all like numbers, eh? In my initial local tests:
    • KSM memory footprint down from 65MB to 55MB
    • non-KSM memory footprint down from 168MB to 143MB
    • boot-to-workload time down from 1.58s to 1.22s
    • "numbers may vary depending on your setup - and this is still WIP"

So, really looking for feedback on: is this the right path forwards? Do the fragments look sensible? Any volunteers to help test and update/define the fragments?

@amshinde @mcastelino - maybe you can cast an eye over the network fragment for me?

@grahamwhaley
Contributor Author

@nitkon @alicefr @Pennyzct - you may want to cast an eye over this and consider if you would want to go this path for your respective architectures as well.

@grahamwhaley
Contributor Author

I'll attach a diff between the current and the frag generated 4.19 kernel configs, as that probably will help review.
diff.txt

Notable things from a quick browse then:

  • we have dropped the ability to coredump
  • I think big/huge pages are not yet on
  • There is a whole bunch of crypto not turned on
  • memory balloon/compaction is not on (which may not be the same as the balloon driver)
  • a load of DRM/DM/FB display stuff is turned off
  • I think our HZ is down to 250 from 1000 - easy to put back up if we need
  • IOMMU stuff not in
  • a few sched things and iosched things not in
  • dropped a bunch of IR driver stuff
  • dropped kallsyms
  • no swap
  • dropped a few network vendors
  • some PCI bits missing
  • turned off firmware updating I think
  • I think we moved off of SLAB - we should decide which allocator we want
  • probably some thermal and power stuff missing
  • no virtualisation (as host) support - coz, err, you were thinking of running kata-in-kata? :-)
  • there are a few x86 things we should evaluate
  • no ZONE_DMA - do we do DMA in the guest kernel? not sure how that fits with any SRIOV for instance

Contributor

@jodh-intel jodh-intel left a comment


Hi @grahamwhaley - this is great! I'd still prefer we create the fragments in YAML and then "build" the Kconfig fragments from that, but I think this is a really good first step.

Aside from my inline comments, I think the following scenarios need documenting:

  • What if two config fragments have conflicting config values? In other words, is there / should there be ordering of fragments?

    If so, we should require them to have a numeric prefix like the ol' SysV init scripts in days of yore.

  • What if a full kernel config and a set of fragments exists? Which takes priority, or is that an error?
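For the ordering question, one hedged sketch: if fragments carried numeric prefixes, a plain shell glob would already give a deterministic, SysV-style merge order (the file names here are hypothetical):

```shell
# Illustrative only: numeric prefixes give a stable merge order,
# because pathname globs expand in lexical (sorted) order.
mkdir -p /tmp/frag-order
touch /tmp/frag-order/10-base.conf /tmp/frag-order/50-net.conf /tmp/frag-order/90-debug.conf

# Higher-numbered fragments expand last, so they would be merged last
# and could deliberately override earlier settings.
for f in /tmp/frag-order/*.conf; do echo "$f"; done
```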

Inline review threads (since resolved) were left on:

  • kernel/configs/README.md (several threads)
  • kernel/configs/fragments/4.19.x/common/9p.conf
@jodh-intel
Contributor

This PR has reminded me about the idea of providing /proc/config.gz in all the guest kernels (and/or packaging a file alongside the kernels in /usr/share/kata-containers/ with information about kernel config and/or kernel modules):

@grahamwhaley
Contributor Author

> This PR has reminded me about the idea of providing /proc/config.gz in all the guest kernels (and/or packaging a file alongside the kernels in /usr/share/kata-containers/ with information about kernel config and/or kernel modules):

We can trivially enable the kernel config options - but it will add a bit of size to the kernel (very small increase I'd guess). Adding them to the packaging would make them easier to get hold of, and cost no runtime space/time.
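For reference, exposing /proc/config.gz needs only the standard upstream kernel options, shown here as a config fragment (this is background, not part of the PR's fragments):

```
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
```

CONFIG_IKCONFIG embeds the config in the kernel image (the small size cost mentioned above); CONFIG_IKCONFIG_PROC additionally exposes it at /proc/config.gz.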

@jodh-intel
Contributor

Yes, the packaging option may be the best approach initially. The general premise is that "the workload should not care about the guest kernel". But now that we have multiple architectures with differing config options, there may be scenarios where a workload finds itself running in an environment that it cannot tolerate.

Depending on what we think our chance of success is wrt making all arch configs entirely consistent from the userspace perspective (I'd say unlikely), we should consider providing a way for either the runtime or the workload itself to determine if the environment provided by the guest is suitable. If we provide kernel config details at the packaging level, that implies the runtime would need to offer a way for "something" related to the workload to query those config details. If the env is not suitable, it would abort the docker run, for example. Alternatively, if we provided /proc/config.gz, the workload could start to run but be able to decide itself whether the environment was sane, and exit if not.

Checking the packaging config details could be done by an admin of course. A manual step, but I'm wondering if we need the information in both places to support the automatic approach.

Having said all this, maybe the workload should just check the guest env and error if it finds something it doesn't like. As long as we provide "a way" for the admin to determine easily what environment the workload is being provided with, I guess that's fine at this stage.

@grahamwhaley
Contributor Author

grahamwhaley commented Jan 23, 2019

Before I forget, @jodh-intel asked:

> What if two config fragments have conflicting config values? In other words, is there / should there be ordering of fragments?
> If so, we should require them to have a numeric prefix like the ol' SysV init scripts in days of yore.

I've enabled 'checking' on the magic merge script - it will now complain, with debug, and stop if you get two settings with different values.
It will also stop if:

  • you have a setting that does not make it into the config file (generally means you typed in an invalid CONFIG name)
  • you have the same setting in more than one fragment - which, whilst valid, is probably going to force us to keep the fragments fairly 'clean' - it made me do a couple of cleanups already...

> What if a full kernel config and a set of fragments exists? Which takes priority, or is that an error?

Fragments take priority - well, the code looks for a fragment first, and uses it if found. I'm going to say 'it is the new modern way, so takes priority' :-)
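The duplicate/conflict detection described above could be approximated with a one-liner like this (a rough sketch only, not the actual merge script - the fragment files are throwaway examples):

```shell
# Create two throwaway fragments that disagree on one symbol.
mkdir -p /tmp/frag-check
printf 'CONFIG_SECCOMP=y\n' > /tmp/frag-check/a.conf
printf 'CONFIG_SECCOMP=n\n' > /tmp/frag-check/b.conf

# Gather all CONFIG_ lines, sort them so identical symbols are adjacent,
# then report any symbol that appears with two different values.
grep -h '^CONFIG_' /tmp/frag-check/*.conf | sort -u | awk -F= '
  $1 == prev && $2 != prevval { print "Conflict: " $1 }
  { prev = $1; prevval = $2 }'
```

A real checker would also want to diff the merged fragments against the final `.config` to catch settings that silently failed to apply (the first bullet above).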

@jodh-intel
Contributor

@grahamwhaley - ta. I'd consider adding those details to the README.

@grahamwhaley grahamwhaley force-pushed the 20190122_config_frags branch from 5017b37 to c89c366 Compare January 23, 2019 18:03
@grahamwhaley
Contributor Author

Feedback fed in - updates pushed.
Notably, I:

  • enabled seccomp
  • enabled huge page support
  • added AIO
  • turned on kernel VSTACK support
  • dropped kernel maps from user space
  • moved netfilter out to its own file

And, with additions comes growth.
Our KSM footprint is now climbing back towards the 4.19 whole-config-file level, and our boot time just went up again (I think that might be seccomp).

Inline review threads (since resolved) were left on:

  • kernel/build-kernel.sh
@egernst
Member

egernst commented Feb 6, 2019

While dividing into fragments will make things a bit clearer, I'm wondering if it's absolutely necessary to have the fragments be kernel-version specific. What happens if we apply these fragments to 4.14, for example? I don't want to manage per-kernel fragments if that's feasible to avoid.

@grahamwhaley
Contributor Author

That's a fair point @egernst - let me see if there is a sane way for us to keep a generic set of fragments whilst still allowing kernel-version-specific bits. That should be easy if the CONFIG elements don't clash, but harder if we want to be able to override on a per-version basis.
I suspect we'll start with a simple 'all generic' layout, and later we can add the ability to do version-specific overrides if we need to.

I'll scrape the feedback, re-arrange and update real soon now.
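One possible on-disk layout for generic fragments plus optional per-version overrides (directory and file names here are illustrative only, not what the PR ended up implementing):

```
kernel/configs/fragments/
├── common/            # generic fragments, merged for every kernel version
│   ├── base.conf
│   └── net.conf
└── 4.19.x/            # hypothetical per-version overrides, merged last
    └── overrides.conf
```

Merging the per-version directory after `common/` would let version-specific fragments win any clash, at the cost of relaxing the "same symbol in two fragments is an error" check for that case.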

@grahamwhaley grahamwhaley force-pushed the 20190122_config_frags branch from c89c366 to 74b6980 Compare February 8, 2019 14:39
@grahamwhaley
Contributor Author

@egernst reworked to remove the kernel version from the fragment path. In its current state it will use the fragments for all x86 kernel builds, and the existing premade configs for all other arches.
Maybe the best way to test this is to run the CIs... @jcvenegas @chavafg - I think we fixed it now so that PRs here will get used in the CI, eh? Well, one way to find out (there is a new info() call or two in the patch, so we can see from the logs).

@grahamwhaley
Contributor Author

/test

@egernst
Member

egernst commented Feb 11, 2019

@grahamwhaley looks like you need to update the config version:

ERROR: Please bump version in kernel/kata_config_version
Makefile:24: recipe for target 'test' failed

@ganeshmaharaj
Contributor

I tried all I could to improve the entropy value of the system, but nothing in my local tests has improved it.

+CONFIG_CRYPTO_DEV_VIRTIO=y
+CONFIG_CRYPTO_JITTERENTROPY=y
+CONFIG_BLK_DEV_CRYPTOLOOP=y
+CONFIG_CRYPTO_HW=y
+CONFIG_CRYPTO_AEAD=y
+CONFIG_CRYPTO_AEAD2=y
+CONFIG_CRYPTO_BLKCIPHER=y
+CONFIG_CRYPTO_BLKCIPHER2=y
+CONFIG_CRYPTO_RNG=y
+CONFIG_CRYPTO_RNG2=y
+CONFIG_CRYPTO_NULL2=y
+CONFIG_CRYPTO_WORKQUEUE=y
+CONFIG_CRYPTO_ENGINE=y
+CONFIG_CRYPTO_HMAC=y
+CONFIG_CRYPTO_DRBG_MENU=y
+CONFIG_CRYPTO_DRBG_HMAC=y
+CONFIG_CRYPTO_DRBG=y

@grahamwhaley
Contributor Author

Heh, @ganeshmaharaj I'm pretty sure the entropy issue has arisen as I added in CONFIG_RANDOM_TRUST_CPU=y from #482 posted by @mcastelino , which I think is suffering the same issue.
We'll have to see if we can work out that entropy issue quickly, or if I should revert that addition so we can land the fragments... probably the latter, I'm going to guess.

@grahamwhaley grahamwhaley force-pushed the 20190122_config_frags branch 2 times, most recently from ae1975b to 2f7f573 Compare May 8, 2019 10:43
@grahamwhaley
Contributor Author

/test
Disabled RANDOM_TRUST_CPU for now until we figure out what to do with the entropy CI tests - as we'd rather like to try and land this.
Rebased to pick up the kernel version update etc.

@grahamwhaley grahamwhaley force-pushed the 20190122_config_frags branch from 2f7f573 to 308462d Compare May 9, 2019 14:09
@grahamwhaley
Contributor Author

/test

Graham Whaley and others added 6 commits May 13, 2019 12:00
Add the framework to build kernel config files from trees
of kernel fragments.

If no fragment directory is found for the requested kernel
version and architecture then revert to looking for a whole
prebuilt kernel config file instead.

Fixes: kata-containers#234

Signed-off-by: Graham Whaley <[email protected]>
Embellish the README a bit, and add some details about the
new fragment method.

Signed-off-by: Graham Whaley <[email protected]>
Add the base common fragments and x86_64 specific fragments
for the 4.19.x kernel.

Signed-off-by: Graham Whaley <[email protected]>
Now we are using the fragments, drop the x86_64 4.19 config file
so we default to fragment mode.

Signed-off-by: Graham Whaley <[email protected]>
In order to support kernel config fragments in the snap, the kernel should
be built using the build_kernel.sh script.

fixes kata-containers#438

Signed-off-by: Julio Montes <[email protected]>
Modifying configs, need to bump config version file.

Signed-off-by: Graham Whaley <[email protected]>
@grahamwhaley grahamwhaley force-pushed the 20190122_config_frags branch from 308462d to f6eb061 Compare May 13, 2019 11:03
@grahamwhaley
Contributor Author

/test
Was looking good on the CIs here and on the twin runtime test PR, but needed a rebase and a config file version bump.

ganeshmaharaj pushed a commit to ganeshmaharaj/kata-runtime that referenced this pull request May 13, 2019
Depends-on: github.com/kata-containers/packaging#314
Fixes: #123141234

Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
@amshinde
Member

amshinde commented Jun 3, 2019

@grahamwhaley @ganeshmaharaj What's the status of this PR? What needs to be done to land it?

@grahamwhaley
Contributor Author

The PR is failing the firecracker CIs (see over on the parallel test PR at kata-containers/runtime#1588).
I failed to get a stable CI setup locally to try to reproduce/diagnose, and then some other items pushed this down my stack for the minute...
If we can work out why the FC CI is failing (and, annoyingly, it seemed to fail in a different place each run, which is never helpful), we can then work out if we need to add more CONFIG items.

Also, we really should test this against some SRIOV setup, as I'm almost sure it will not support that either at present.

@egernst egernst added the needs-help Request for extra help (technical, resource, etc) label Jun 12, 2019
@caoruidong
Member

ping @grahamwhaley

@jodh-intel
Contributor

@grahamwhaley - branch conflicted. Any updates on this? Maybe now kata-containers/runtime#1813 has landed this may magically work? 😄

jcvenegas pushed a commit that referenced this pull request Jul 2, 2019
tests: reduce the amount of log displayed
@caoruidong
Member

ping @grahamwhaley Will #646 replace this PR?

@grahamwhaley
Contributor Author

@caoruidong - yep, absolutely - #646 is a respin of this PR to continue the work. Closing this one now.
