Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: RtlGenRandom vs BCryptGenRandom #65

Closed
newpavlov opened this issue Jul 26, 2019 · 11 comments · Fixed by #177
Closed

Windows: RtlGenRandom vs BCryptGenRandom #65

newpavlov opened this issue Jul 26, 2019 · 11 comments · Fixed by #177

Comments

@newpavlov
Copy link
Member

newpavlov commented Jul 26, 2019

IIUC both functions use the same algorithm under the hood (also see this). The main difference is that BCryptGenRandom will not work on Windows XP, but otherwise it looks like BCryptGenRandom should be preferred. IIRC Rust explicitly does not support Windows XP anymore, so it should not matter.

@chouquette
Can you help here? In rust-lang/rust#60260 you have used BCryptGenRandom for UWP targets and left RtlGenRandom for non-UWP targets. Is there a reason for that?

cc @GabrielMajeri

@chouquette
Copy link

Hi,

Mostly because I was asked to :) rust-lang/rust#60260 (comment)

My understanding is that Microsoft advises to use CryptGenRandom over RtlGenRandom, but CryptGenRandom is deprecated and the "next generation cryptography API" should be used instead, and BCryptGenRandom is part of that API, so if you have the choice, BcryptGenRandom is probably to be favored, however it seems in the context of Rust, there are some historical reasons for keeping RtlGenRandom

In the case of UWP the choice is easy, you can't use RtlGenRandom

@ollie27
Copy link

ollie27 commented Jul 26, 2019

I believe XP is still technically supported by std (https://forge.rust-lang.org/platform-support.html#tier-3) and that's why it needs both implementations. When XP support is officially dropped then the RtlGenRandom version can be removed and BCryptGenRandom can be used for all Windows targets. I suggest that getrandom just copies std for now.

@newpavlov
Copy link
Member Author

newpavlov commented Jul 26, 2019

Ah, indeed. In that case I think we can use RtlGenRandom fallback only for XP targets, and BCryptGenRandom for all other Windows targets.

UPD: Never mind, we can't distinguish between XP and later versions by using only target triplet.

Relevant discussion: https://internals.rust-lang.org/t/8745

@newpavlov
Copy link
Member Author

Relevant issue: openssl/openssl#8644

So we will have to wait for dropping of Windows XP (and hopefully Vista) from the list of supported Rust targets, until then we will have to keep using RtlGenRandom.

@alexcrichton
Copy link
Collaborator

I'd recommend using git blame and doing some digging to learn about the rationale for the usage in Rust's standard library. Doing that, for example, leads to rust-lang/rust#45370 which has a lot more information about why things are the way they are.

@newpavlov
Copy link
Member Author

The compiler team considers dropping Windows XP support, assuming it will be done, I think we can drop its support in getrandom v0.2 as well.

@josephlr
Copy link
Member

SGTM, how long after they drop support should we update the library? Given that rand 0.8 isn't out yet, I'd be fine doing it sooner.

@newpavlov
Copy link
Member Author

Right after the final decision will be made? IIRC Rust Windows XP support was arguably broken for a long time, so it should not be a problem. I also think that ideally we should do it before releasing rand v0.8.

@bluetech
Copy link

In case you're interested, Go had some discussion on this and decided to stick with RtlGenRandom for non-UWP: golang/go#33542

@josephlr
Copy link
Member

josephlr commented Jan 3, 2021

So it looks like Go's final decision was based on three main sources:

  • What we (Rust) did
  • What BoringSSL does (this then determines what Chrome does)
  • What Firefox does

They didn't really provide their own independent reasoning for preferring one over another. We're changing what Rust does, so the main arguments in favor of RtlGenRandom come from this BoringSSL thread and this BoringSSL CL where the BoringSSL folks decided to not adopt BCryptGenRandom as the universal way to get randomness on Windows.

The main downside is that BCryptGenRandom (on its very first call with BCRYPT_USE_SYSTEM_PREFERRED_RNG) might read some preferences from the Windows registry, which is not allowed inside the Chromium sandbox.

Both RtlGenRandom and BCryptGenRandom will be slow on their first call:

  • RtlGenRandom (part of advapi32.dll) has to load bcrypt.dll which contains the Windows cryptographic code
  • BCryptGenRandom is part of bcrypt.dll, so no additional DLL loading is necessary, but it has to find the "system preferred" RNG, which might require reading values from the windows registry

Windows 10 introduced CNG Algorithm Pseudo-handles to avoid these issues, which would allow us to invoke BCryptGenRandom like:

BCryptGenRandom(
    BCRYPT_RNG_ALG_HANDLE,
    chunk.as_mut_ptr(),
    chunk.len() as u32,
    0,
)

but that (maybe) only works on Windows 10. In all cases, performance was basically the same (as they all call the same bcrypt-internal crypto code).

My decision would be to switch to BCryptGenRandom. The advantages of avoiding advapi32.dll and having only one implementation outweigh the very Chromium-specific sandboxing concerns.

If either sandboxing or performance ever become an issue (unlikely), I think we should just try to invoke BCryptGenRandom with BCRYPT_RNG_ALG_HANDLE first, before falling back to invoking it with BCRYPT_USE_SYSTEM_PREFERRED_RNG.

@neolit123
Copy link

saw this thread linked from the golang issue.

My decision would be to switch to BCryptGenRandom

that is a very wise decision, because if one day a user decides to use the rust "crypto" wrappers and asks you "what is the underlying algorithm implementation behind RtlGenRandom and is it compliant with standard X?", only a Microsoft employee would be able to answer for sure and there is no documented guarantee whether such an implementation detail can change for RtlGenRandom in Windows-next without any notice.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue May 17, 2022
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails

With PR rust-lang#84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places).

Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue rust-lang#94098)

Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work.

However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally.

It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it.

I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work.

This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks:

1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers
2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue May 18, 2022
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails

With PR rust-lang#84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places).

Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue rust-lang#94098)

Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work.

However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally.

It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it.

I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work.

This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks:

1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers
2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails

With PR #84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places).

Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue #94098)

Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work.

However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally.

It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it.

I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work.

This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks:

1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers
2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants