
Improve entropy collection in VMs #673

Closed
marmarek opened this issue Mar 8, 2015 · 85 comments
Labels: C: other, cryptography, P: major, T: enhancement, ux

Comments

@marmarek
Member

marmarek commented Mar 8, 2015

Reported by joanna on 15 Nov 2012 19:03 UTC
While this is only my feeling, I suspect that the entropy collection daemon in our VMs needs some improvements. This is because each VM has only limited interaction with the physical world (e.g. mouse events go via vchan instead of via a kernel module in the VM).

This is easy to notice when one tries to generate a new GPG key in a VM: gpg complains about insufficient available entropy and hangs until more is produced. One can produce more entropy through various disk activities (e.g. grepping through the filesystem), but this:

  1. isn't very convenient, and
  2. leaves it questionable whether such entropy is of "first-class freshness", or whether it is somehow inferior to the entropy that could be collected with the help of mouse movements, etc.
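When gpg stalls like this, the kernel's own entropy estimate shows whether the VM really is short of credited entropy. A minimal Python sketch, assuming a Linux guest that exposes the standard `/proc/sys/kernel/random/entropy_avail` and `poolsize` files; the helper names are hypothetical. The figure is mostly meaningful on older kernels, where reads from `/dev/random` could block once the estimate ran low.

```python
from pathlib import Path

# Assumes the standard Linux procfs layout; helper names are illustrative only.
RANDOM_PROC = Path("/proc/sys/kernel/random")


def read_counter(name: str, base: Path = RANDOM_PROC) -> int:
    """Read one integer counter (e.g. 'entropy_avail') from the random sysctl directory."""
    return int((base / name).read_text().strip())


def entropy_report(base: Path = RANDOM_PROC) -> str:
    """Summarise the credited entropy estimate against the pool size, both in bits."""
    avail = read_counter("entropy_avail", base)
    size = read_counter("poolsize", base)
    return f"{avail}/{size} bits credited"
```

Calling `print(entropy_report())` in the VM while gpg is waiting would show whether the estimate is near zero.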

It would probably be desirable to create some entropy producing device that would run in each of the VMs, and to feed this device from Dom0 or other domains exposed to physical hardware (netvm, usbvm?). One should be careful, however, not to distribute the same "entropy bits" to more than one domain, as this would likely compromise domain isolation.

Migrated-From: https://wiki.qubes-os.org/ticket/673

@marmarek marmarek added this to the Release 2 Beta 2 milestone Mar 8, 2015
@marmarek marmarek added T: enhancement, C: other, P: major labels Mar 8, 2015
@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by joanna on 15 Nov 2012 23:25 UTC
Ok, I see two simple solutions:

  1. We run a set of daemons in Dom0 (one for each VM) that essentially do this in a loop:
read_a_chunk_of_bytes (/dev/random);
send_bytes_to_VM(); // via qrexec
sleep (...) // let others read some of Dom0's entropy too

Then, in the VM, some code reads the transmitted bytes and feeds them into the kernel's RNG using the RNDADDENTROPY ioctl on /dev/random:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=drivers/char/random.c;h=b86eae9b77dfaeb04dd2d4efefd6ebc01b9e0a93;hb=HEAD#l1265

  2. We just enable haveged in each VM (it gathers entropy by measuring internal CPU state):

http://www.issihosts.com/haveged/index.html
http://www.irisa.fr/caps/projects/hipsor/

Note 1: haveged is incredibly fast! It just seems a bit TOO fast to me... So I think I would feel better with option #1.

Note 2: Dom0 entropy seems pretty reasonable (thanks to the mouse and keyboard!), so it's not unreasonable to share it among all the VMs. But perhaps we could allow users to manually exclude some VMs from getting entropy from Dom0 (e.g. those that are not very sensitive). For example, I have almost 30 domains on my laptop, but maybe only 4 of them are used for key generation, and those are the only ones that need fresh entropy from Dom0.
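Option #1 combined with Note 2 can be sketched in Python. This is a minimal sketch under stated assumptions, not Qubes code: `send` stands in for a hypothetical qrexec call into the target VM, the VM names in the usage are made up, and it draws from the kernel CRNG via `secrets.token_bytes` instead of reading `/dev/random` directly. Every non-excluded VM gets its own fresh draw, which respects the requirement from the original report that the same bytes must never reach two domains.

```python
import secrets
import time
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical transport: in Qubes this would be a qrexec call into the VM.
SendFn = Callable[[str, bytes], None]


def distribute_once(vms: Iterable[str], excluded: Set[str], send: SendFn,
                    chunk_size: int = 64) -> Dict[str, bytes]:
    """Send one freshly drawn chunk to every VM that is not excluded.

    Each VM gets an independent draw, so no two domains ever receive the same bytes.
    """
    sent: Dict[str, bytes] = {}
    for vm in vms:
        if vm in excluded:
            continue  # e.g. low-sensitivity VMs the user opted out (Note 2)
        chunk = secrets.token_bytes(chunk_size)  # fresh per VM, never reused
        send(vm, chunk)
        sent[vm] = chunk
    return sent


def run_forever(vms: List[str], excluded: Set[str], send: SendFn,
                interval_s: float = 60.0) -> None:
    """The loop from the pseudocode: draw, send, then sleep so Dom0's other consumers get their share."""
    while True:
        distribute_once(vms, excluded, send)
        time.sleep(interval_s)
```

A single loop over all VMs replaces the one-daemon-per-VM design of the pseudocode; either way, the property that matters is that chunks are never shared between domains.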

@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by joanna on 21 Nov 2012 10:25 UTC
Some comments are in this thread:

https://groups.google.com/group/qubes-devel/browse_thread/thread/e7023cca06daa219

@marmarek
Member Author

marmarek commented Mar 8, 2015

Modified by joanna on 8 Feb 2013 12:59 UTC

@marmarek
Member Author

marmarek commented Mar 8, 2015

Modified by joanna on 30 Aug 2013 17:21 UTC

@marmarek marmarek modified the milestones: Release 3, Release 2 Beta 3 Mar 8, 2015
@marmarek
Member Author

marmarek commented Mar 8, 2015

Modified by joanna on 20 Apr 2014 17:04 UTC

@marmarek marmarek modified the milestones: Release 2.1 (post R2), Release 3 Mar 8, 2015
@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by joanna on 3 Jul 2014 12:07 UTC
For now (R2 release) we should just ensure that haveged is installed in the default template, I think.

@marmarek marmarek modified the milestones: Release 2, Release 2.1 (post R2) Mar 8, 2015
@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by marmarek on 4 Jul 2014 02:45 UTC
Why can't this be the final solution? I don't believe we will ever implement any other solution for this...

Also, should this fix cover updates to existing templates too (which would mean a hard dependency on haveged in qubes-core-vm)? Or would installing it in new templates (i.e. on the R2 ISO) be enough?

@marmarek
Member Author

marmarek commented Mar 8, 2015

Comment by joanna on 4 Jul 2014 09:37 UTC
I think just the new template; no need to issue updates. This is not a security problem but rather a usability one: if read() on /dev/random hangs, it's an annoyance to the user.

I agree to closing this ticket with haveged.

@marmarek
Member Author

marmarek commented Mar 8, 2015

@marmarek marmarek closed this as completed Mar 8, 2015
@adrelanos
Member

From https://wiki.archlinux.org/index.php/Haveged#Virtual_Machines

As discussed at "Is it appropriate to use haveged as a source of entropy on virtual machines?", it can be contested whether haveged provides quality entropy within a virtual environment.


It's not as simple as writing to /dev/random. From man random(4):

This differs from writing to /dev/random or /dev/urandom, which only adds some data but does not increment the entropy count. The following structure is used:


Looks like this could be implemented by reading from /dev/random and forwarding that entropy to VMs via qrexec, where it is added with ioctl(2) RNDADDENTROPY.
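The VM-side half (crediting received bytes with `RNDADDENTROPY`, which fills in the structure the man page refers to, `struct rand_pool_info`) might look roughly like the Python sketch below. Assumptions: the ioctl number 0x40085203 is the generic `_IOW('R', 0x03, int[2])` encoding used on x86 and ARM (other architectures encode it differently, so check `<linux/random.h>`); the helper names are hypothetical; and the caller needs `CAP_SYS_ADMIN`. How many bits to credit per chunk is a policy decision left to the caller.

```python
import fcntl
import os
import struct

# ASSUMPTION: _IOW('R', 0x03, int[2]) with the asm-generic ioctl encoding
# (x86, ARM, ...). Some architectures (e.g. PowerPC, MIPS, SPARC) differ.
RNDADDENTROPY = 0x40085203

# fcntl.ioctl copies immutable (bytes) arguments into a 1024-byte buffer, so keep chunks small.
MAX_CHUNK = 512


def pack_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Build struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }."""
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the buffer holds")
    if len(data) > MAX_CHUNK:
        raise ValueError("chunk too large")
    # entropy_count is in bits, buf_size in bytes; native int layout matches the C struct.
    return struct.pack("ii", entropy_bits, len(data)) + data


def add_entropy(data: bytes, entropy_bits: int, device: str = "/dev/random") -> None:
    """Mix data into the kernel pool and credit entropy_bits (requires CAP_SYS_ADMIN)."""
    fd = os.open(device, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, RNDADDENTROPY, pack_rand_pool_info(data, entropy_bits))
    finally:
        os.close(fd)
```

A hypothetical qrexec service handler in the VM could read a chunk from stdin and call `add_entropy(chunk, 8 * len(chunk))` to credit it fully, or pass a smaller count to be more conservative. Plain `write()` to `/dev/random` would mix the same bytes in without crediting any entropy.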

@marmarek
Member Author

marmarek commented Jun 7, 2015

The program attached there returns numbers between 10 and 100 (30-50 on my
system), which theoretically means that VMs have access to the real rdtsc.
This is on R3rc1 (Xen 4.4.2); I'm not sure about R2 (Xen 4.1.2). I'll
check the Xen documentation later; I think I've seen an option for this.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

@adrelanos
Member

@mfc mfc added the cryptography label May 22, 2016
@jpouellet
Contributor

@adrelanos
Member

haveged is discouraged in VMs by Andre Seznec, one of haveged's main authors. Source:
BetterCrypto/Applied-Crypto-Hardening@cf7cef7#commitcomment-23006392

Entropy is needed by the kernel (and possibly by others; this has not been researched) before systemd / systemd-random-seed.service / haveged start. (I wrote a bit about that here: #2704 (comment).)

Please reopen. @andrewdavidwong

@smuellerDD

smuellerDD commented Oct 10, 2021 via email

@smuellerDD

smuellerDD commented Oct 10, 2021 via email

@qua3k

qua3k commented Oct 10, 2021

There is no way that the kernel tells you that data has been pulled. Therefore this strategy of injecting data once in a while is considered appropriate. Besides, this operation once every 10 minutes is cheap and is not considered to cause concerns.

If the pool has been initialized, you don't need to "inject data"; there is no reason to do so.

@3hhh

3hhh commented Oct 10, 2021

In short: /dev/random gives random output directly from hardware. /dev/urandom gives random output from a crypto algorithm initialized and sometimes refreshed with output from /dev/random.

This is wholly inaccurate and not how the Linux kernel CSPRNG works.

Thanks for the clarification (nice pictures!); looks like I fell for the myth.

Then let me retry my "in short" statement (feel free to correct me again):
Currently (kernel 4.8 - 5.5):

  • /dev/random: Always seeded CSPRNG output, blocking when it believes it doesn't have enough entropy anymore to reseed.
  • /dev/urandom: Direct CSPRNG output, even when not seeded.

Future (kernel 5.6+):

  • /dev/random: Always seeded CSPRNG output, never blocks after seeded once.
  • /dev/urandom: Direct CSPRNG output, even when not seeded.

Ironically, that leads me to conclude that from 5.6 onwards one should use only /dev/random, for everything, quite contrary to the linked article's conclusion.

For 4.8 - 5.5 it's reasonable to use /dev/urandom after having checked that the CSPRNG was initially seeded e.g. via getrandom.
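That check can be done with `getrandom(2)` and `GRND_NONBLOCK`, which fails with `EAGAIN` (surfaced in Python as `BlockingIOError`) until the CRNG has been seeded for the first time. A minimal sketch; the function names are hypothetical, and it assumes Linux 3.17+ and Python 3.6+.

```python
import os


def crng_is_seeded() -> bool:
    """True once the kernel CRNG has been initially seeded (non-blocking probe)."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:  # EAGAIN: not seeded yet
        return False
    return True


def urandom_if_seeded(n: int) -> bytes:
    """Read n bytes from /dev/urandom, but only after confirming the initial seeding."""
    if not crng_is_seeded():
        raise RuntimeError("kernel CRNG not seeded yet")
    with open("/dev/urandom", "rb") as f:
        return f.read(n)
```

Once `crng_is_seeded()` has returned `True` it stays true for the rest of the boot, so a tool that must read a device node (for example one that only accepts a key-file path) can do this check once up front.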

Btw, reseeding may make sense for cloned VM snapshots etc., or if you don't want to trust the initial seed 100%. If reseeding were not needed at all, the whole entropy collection machinery would be pointless beyond obtaining the initial seed.

@brendanhoar

brendanhoar commented Oct 10, 2021

Stephan Müller's recent work on LRNG (code not yet(?)* accepted in the kernel tree ... replacing almost everything I thought I knew about randomness in Linux, heh) includes a good write-up of his jitter-based entropy source.

That write-up can be found here: https://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.pdf .

In particular it discusses why this jitter implementation is a valid source in virtual environments.
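As a toy illustration of the raw measurement behind jitter-based sources (this is not Stephan Müller's CPU Jitter RNG, which adds health tests, conditioning and a careful entropy assessment), the sketch below times a small fixed workload repeatedly and returns the nanosecond deltas. The function name and workload are made up for illustration. In Python, interpreter overhead dominates the deltas, so the output only shows that timing variation exists; it must not be used as an entropy source.

```python
import time
from typing import List


def timing_deltas(n: int, work: int = 64) -> List[int]:
    """Collect n nanosecond durations of a tiny fixed workload.

    The variation between the durations is the raw 'jitter'. No health testing,
    conditioning or entropy estimation is done here.
    """
    deltas = []
    for _ in range(n):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(work):
            acc ^= i * 0x9E3779B1  # fixed busy work; its result is deliberately unused
        deltas.append(time.perf_counter_ns() - start)
    return deltas
```

Counting distinct values, e.g. `len(set(timing_deltas(1000)))`, inside a VM and on bare metal gives a rough feel for whether the guest sees a fine-grained, varying timer at all.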

I'd propose using it as a credited source of boot-time entropy in VMs, especially if the CPU HWRNG (via RDRAND) is no longer credited due to a compile-time flag (a Qubes maintainer decision) and/or many non-RDRAND systems are experiencing long boot times due to a lack of credited entropy.

The credit rate for jitter can be assigned at boot time as per: https://www.chronox.de/lrng/doc/lrng.pdf .

There are also power-up noise source tests which are configurable at kernel compile time, which might be useful for Whonix templates, for example.

In addition to the new LRNG code availability on recent kernels, backports to many older kernels are available up through the current patchset (v42).

B

* Corrected from "now accepted", due to my misreading a Phoronix post.

@qua3k

qua3k commented Oct 10, 2021

For 4.8 - 5.5 it's reasonable to use /dev/urandom after having checked that the CSPRNG was initially seeded e.g. via getrandom.

It's not. Developers should be using getrandom(...), not /dev/(u)random. Older kernels without getrandom(...) don't have any properly designed APIs for obtaining random numbers, and using /dev/random means that it will continue to block unnecessarily.
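Calling `getrandom(2)` directly is short. With no flags it blocks only until the CRNG is first seeded and never afterwards, but the call may return fewer bytes than requested, so a small loop is needed. A minimal Python sketch (hypothetical helper name; assumes Linux 3.17+ and Python 3.6+):

```python
import os


def getrandom_exact(n: int) -> bytes:
    """Return exactly n bytes from getrandom(2) with flags=0.

    flags=0 means: block until the CRNG is initially seeded, then never block.
    getrandom may return a short read, so keep calling until n bytes are gathered.
    """
    buf = bytearray()
    while len(buf) < n:
        buf += os.getrandom(n - len(buf))
    return bytes(buf)
```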

@DemiMarie

For 4.8 - 5.5 it's reasonable to use /dev/urandom after having checked that the CSPRNG was initially seeded e.g. via getrandom.

It's not. Developers should be using getrandom(...), not /dev/(u)random. Older kernels without getrandom(...) don't have any properly designed APIs for obtaining random numbers, and using /dev/random means that it will continue to block unnecessarily.

It’s okay to use /dev/urandom after checking that the CSPRNG has been seeded. That is necessary when using cryptsetup, for example.

@jirka-h

jirka-h commented Oct 10, 2021

But upstream does not seem to listen - dead silence for years now. The person allegedly "responsible" for it just seems to ignore what I am doing. Yet ignoring it does not make things better.

Oh, that's bad. I wish I could help you. Keeping my fingers crossed!

I am looking for obscure systems: very small scale CPUs for IoTs, SPARC or other rare systems. I have coverage for Intel x86 (old and contemporary), AMD, ARM 32/64 bit, POWER LE/BE, IBM Z, RISC-V, MIPS.

I have access to some not-so-common systems like ARM 64 bit and IBM Power LE. For x86_64, I have access to large (multi-socket, max CPU count) systems. If you happen to need help with testing on any of these, please let me know.

Viele Grüße!
Jirka

@smuellerDD

smuellerDD commented Oct 11, 2021 via email

@jirka-h

jirka-h commented Oct 12, 2021

Hi Stephan,

I have successfully built a kernel with your patch and stored it as rpm. The tests on the single-socket server have passed. I will run the tests on a multi-socket server when there are some spare HW cycles later this week and send it to you via email.

Cheers,
Jirka

@smuellerDD

smuellerDD commented Oct 12, 2021 via email

@jirka-h

jirka-h commented Oct 14, 2021

Hi Stephan,

what is the expected runtime for

./getrawentropy -s 1000000 > /dev/shm/4x_Xeon-E5-4610_v2_lrng_raw_noise.data

It has been running on an old 4-socket (4x_Xeon-E5-4610_v2) server for 6 hours now, and so far only 200 KiB of data has been collected. That's around 10 bytes per second.

I want to make sure this is expected. I plan to run it on more servers, but first I want to make sure everything is all right.

Thanks!

Jirka

$ cat /proc/lrng_type 
DRNG name: ChaCha20 DRNG
Hash for reading entropy pool: SHA-256
Hash for operating aux entropy pool: SHA-256
LRNG security strength in bits: 256
per-CPU interrupt collection size: 1024
number of DRNG instances: 4
Standards compliance: 
High-resolution timer: true
LRNG minimally seeded: true
LRNG fully seeded: true
Continuous compression: true
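The quoted rate follows from the numbers in the message, taking 200 KiB as 204,800 bytes:

$$
\frac{200 \times 1024\ \text{B}}{6 \times 3600\ \text{s}} = \frac{204\,800\ \text{B}}{21\,600\ \text{s}} \approx 9.5\ \text{B/s}
$$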

@smuellerDD

smuellerDD commented Oct 14, 2021 via email

@jpouellet
Contributor

jpouellet commented Oct 14, 2021 via email

@marmarek
Member Author

marmarek commented Oct 14, 2021 via email

@jirka-h

jirka-h commented Oct 16, 2021

That is the amount of events trickling in. If you send, say, a ping flood to it or do another way of causing interrupts like dd if=$DISK of=file count=10000 oflag=direct then you get your results in a few minutes. Thanks a lot Stephan

This has helped, especially on systems with NVMe storage.

I have processed the results with the ent utility (git clone https://github.com/ProhtMeyhet/ent-random-entropy-test.git); see the results below. The entropy estimate is 3.122 bits per byte regardless of the test system used. Is this expected?

intel-E5-4627-v2-4s/4x_Xeon_E5-4627_v2_lrng_raw_noise.data.ent:Entropy = 3.122261 bits per byte.
intel-gold-6126-4s/4x_Xeon_Gold_6126_lrng_raw_noise.data.ent:Entropy = 3.122186 bits per byte.
intel-E5-4607-4s/4x_Xeon_E5-4607_lrng_raw_noise.data.ent:Entropy = 3.122283 bits per byte.
amd-epyc3-milan-7713-2s/2x_EPYC_7713__lrng_raw_noise.data.ent:Entropy = 3.122571 bits per byte.
intel-E5-4610-v2-4s/4x_Xeon-E5-4610_v2_lrng_raw_noise.data.ent:Entropy = 3.122024 bits per byte.
intel-gold-6126-2s/2x_Xeon_Gold_6126_lrng_raw_noise.data.ent:Entropy = 3.122105 bits per byte.
amd-epyc3-milan-7313-2s/2x_EPYC_7313_lrng_raw_noise.data.ent:Entropy = 3.122709 bits per byte.
intel-E7-4870-8s/8x_Xeon-E7-4870_lrng_raw_noise.data.ent:Entropy = 3.122159 bits per byte.

I will send you a link to download the full results via email.

Thanks
Jirka

@smuellerDD

smuellerDD commented Oct 19, 2021 via email

@jirka-h

jirka-h commented Oct 19, 2021

Hallo Stephan,

thanks for checking the results!

the data you obtained is ASCII numbers

Oh, my bad! I was assuming the data are binary without actually checking the values.
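That accounts for the suspiciously uniform 3.12 figure. The Shannon entropy of a byte histogram can never exceed log2 of the number of distinct byte values present, and text-encoded numbers use only the ten digits plus a few separators: about 3.32 bits per byte for the digits alone, only slightly more once separators are counted, however random the underlying samples are. A minimal Python sketch of the per-byte figure, assuming the tool computes it the way the classic `ent` utility does (Shannon entropy of the byte histogram); the function name is hypothetical:

```python
import math
from collections import Counter


def shannon_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per byte (0 to 8)."""
    if not data:
        raise ValueError("empty input")
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())
```

For example, `shannon_bits_per_byte(b"0123456789" * 100)` gives log2(10), about 3.3219, whereas raw bytes covering all 256 values evenly give 8.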

I have now rerun the analysis based on the above instructions and the results are indeed excellent!

intel-E5-4607-4s/4x_Xeon_E5-4607_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.468554
intel-gold-6126-4s/4x_Xeon_Gold_6126_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 6.904956
intel-gold-6126-2s/2x_Xeon_Gold_6126_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 6.915690
intel-E5-4627-v2-4s/4x_Xeon_E5-4627_v2_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.268956
amd-epyc3-milan-7313-2s/2x_EPYC_7313_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.207621
intel-E7-4870-8s/8x_Xeon-E7-4870_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.287790
intel-E5-4610-v2-4s/4x_Xeon-E5-4610_v2_lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.264641
amd-epyc3-milan-7713-2s/2x_EPYC_7713__lrng_raw_noise.data.minentropy_FF_8bits.txt:min(H_original, 8 X H_bitstring): 7.007947
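The `min(H_original, 8 X H_bitstring)` lines come from NIST SP 800-90B style min-entropy assessment, where H_original is the minimum over several estimators run on the samples. The simplest of them is the Most Common Value estimate (SP 800-90B, section 6.3.1), sketched below in Python with a hypothetical function name. Because it is only one of the estimators, on the same data it can only report a value at least as high as the tool's final figure.

```python
import math
from collections import Counter
from typing import Sequence


def mcv_min_entropy(samples: Sequence[int]) -> float:
    """Most Common Value min-entropy estimate, in bits per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    p_hat = max(Counter(samples).values()) / n
    # Upper bound of the 99% confidence interval on the most common value's probability.
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_u)
```

A constant input scores 0 bits, while 25,600 samples spread evenly over 256 byte values score about 7.67 bits rather than 8, because the confidence bound penalises finite sample sizes.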

Thanks a lot!
Jirka

@3hhh

3hhh commented Mar 23, 2022

IIRC this one can be closed with 4.1 as the VM kernel now does the job itself?

@andrewdavidwong
Member

IIRC this one can be closed with 4.1 as the VM kernel now does the job itself?

Closing as done. If anyone believes this issue is not yet done, please leave a comment, and we'll be happy to reopen it. Thank you.
