-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
macOS: Cannot add to X509Store [regression from 3.1] #39603
Comments
Some other background on this, removing from the keychain does work, so it seems not related to the keychain being locked or in an inconsistent state. I did note that Big Sur is getting changes to how importing certificates as root is handled, but saw nothing describing this exact use case. Other thing to note: I can't reproduce this in netcoreapp3.1. I wonder if this has anything to do with how the binaries are signed and/or if 5 previews are signed differently than 3.1 official builds. Or, something broke between 3.1 and 5.0. |
Thank you, @vcsjones! Are you planning to investigate this further? If not, we'll get someone from the team assigned to it so we can get it into 5.0 (probably RC2). |
Yeah, but I haven't made a ton of progress. I think a good first step would be to see if someone other than me can repro it, and do the unit tests actually run (not just skipped). Maybe it just doesn't like me. I can still repro it on Big Sur beta 4, I'm installing 5 right now. |
@vcsjones I got some guidance on how we could determine if this is related to signing/notarization or if there's a potential functional regression here. I'm trying to get my machine set up to be able to check this, but it looks like it'll take me some time, so I wanted to share with you in case this is quick for you to verify. We would need to compare when running with a signed apphost to running with an unsigned apphost. |
Hm, I think I can pull down 3.1 and build an unsigned version and see if it starts failing. |
Thanks, @vcsjones. Here's what I understand:
|
I set this repro up on my MacBook while still on Catalina, and I was able to repro to get the same error as reported.
Repro Steps:
I'm going to try this against .NET Core 3.1 on this setup. |
It runs fine against .NET Core 3.1 on my Catalina machine but fails with .NET 5.0 on Catalina. Combinations / Results
@jeffschwMSFT do you have suggestions on where to go next from here in the investigation? |
If you think it is a functional change that causes this, I can run a |
https://github.com/dotnet/runtime/commits/master/src/libraries/Native/Unix/System.Security.Cryptography.Native.Apple currently goes back to 3.1 on page 1, nothing (by title) looks suspicious, except possibly renaming the shims (maybe there's a manifest somewhere that needs to be updated?). |
@jeffhandley Also, I think when you ran |
runtime/src/libraries/Native/Unix/System.Security.Cryptography.Native.Apple/pal_keychain.c Line 354 in ec59f65
The resulting OS Status is kPOSIXErrorBase. That isn't a very descriptive error code. Native stack: debug 15:13:29.064492-0400 0x00007fff3614fa2d Security::CommonError::LogBacktrace() + 181 debug 15:13:29.064522-0400 0x00007fff3614fd51 Security::UnixError::UnixError(int, bool) + 277 debug 15:13:29.064546-0400 0x00007fff3614fdae Security::UnixError::throwMe(int) + 36 debug 15:13:29.064568-0400 0x00007fff35eb7394 Security::AtomicBufferedFile::open() + 696 debug 15:13:29.064593-0400 0x00007fff35eb5e21 Security::DbModifier::getDbVersion(bool) + 331 debug 15:13:29.064619-0400 0x00007fff35eb5ca5 Security::AppleDatabase::dbOpen(Security::DbContext&) + 57 debug 15:13:29.064645-0400 0x00007fff35eb55bf Security::Database::_dbOpen(Security::DatabaseSession&, unsigned int, Security::AccessCredentials const*, void const*) + 113 debug 15:13:29.064670-0400 0x00007fff35eb4932 Security::DatabaseManager::dbOpen(Security::DatabaseSession&, Security::DbName const&, unsigned int, Security::AccessCredentials const*, void const*) + 64 debug 15:13:29.064693-0400 0x00007fff35eb445d Security::DatabaseSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 189 debug 15:13:29.064717-0400 0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118 debug 15:13:29.064764-0400 0x00007fff35ec5f7c CSSM_DL_DbOpen + 152 debug 15:13:29.064862-0400 0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248 debug 15:13:29.064965-0400 0x00007fff35ffb0a9 SSDatabaseImpl::load(Security::DLDbIdentifier const&) + 53 debug 15:13:29.065066-0400 0x00007fff35ec636d SSDLSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 273 debug 15:13:29.065133-0400 0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118 debug 15:13:29.065244-0400 0x00007fff35ec5f7c CSSM_DL_DbOpen + 152 debug 15:13:29.065339-0400 0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248 debug 15:13:29.065439-0400 0x00007fff35ec5162 Security::CssmClient::DbImpl::activate() + 80 debug 15:13:29.065521-0400 0x00007fff35ff3e5c Security::CssmClient::DbImpl::handle() + 56 debug 15:13:29.065588-0400 0x00007fff35ed2a17 Security::CssmClient::DbUniqueRecordImpl::get(Security::CssmClient::DbAttributes*, Security::CssmDataContainer*) + 155 debug 15:13:29.065705-0400 0x00007fff360b5263 Security::KeychainCore::KeyItem::copyTo(Security::KeychainCore::Keychain const&, Security::KeychainCore::Access*) + 2611 debug 15:13:29.065787-0400 0x00007fff360f7096 SecKeychainItemCreateCopy + 257 It looks like it is having trouble opening the keychain in the temporary directory. |
Perhaps related log output:
|
Good catch. I reran with |
I'm wondering if some of this may depend on system integrity setting. I have 11.0 system up and I can take a look. Running |
Yeah, that's where my stack in #39603 (comment) came from.
Maybe? I'm just thrown by the fact that 3.1 works. |
I have repro using latest daily 5.0 build:
I will take a look and share findings. |
It seems like this is somehow related to deleting temporary keychain before finishing the import @bartonjs. If I comment out that code, (turning AppleCryptoNative_SecKeychainDelete to NOP) import works as expected. I also verified that 3.1.6 works. I don't have 3.1 build on my 11.0 machine so I did not check if the sequence is the same. |
Interesting. Thanks for getting to the bottom of that. |
Not particularly... but that must mean we changed the managed code for it... possibly as part of the PFX rewrite? But I'm surprised an order of operations would have changed. |
The problem is basically here: Lines 106 to 113 in afa4b2f
We import the X509Certificate2 in to a temporary keychain (because |
Did that part changed? I thought I saw similar code in 3.1 but I may be mistaken. |
As a test, if that code is changed to: SafeKeychainHandle keychain = persist
? Interop.AppleCrypto.SecKeychainCopyDefault()
: Interop.AppleCrypto.CreateTemporaryKeychain();
System.GC.SuppressFinalize(keychain);
return ImportPkcs12(rawData, password, exportable, keychain); Then the problem goes away, too. |
It seems like holding ref to the keychain is no longer good enough. |
It doesn't. I tried. (I also tried changing the keychain temp file extension to
I don't think so. I still haven't been able to nail down why 3.1 is working, but I thought I might be better off looking in to 5.0. It's entirely possible 3.1 is working by luck. |
Is there a possibility of a GC change during 5.0 that could have affected this? |
I asked @JulieLeeMSFT and @mangod9 to take a look. Sometimes things that have have lifetime issues can 'work' for many years before a slight change in allocations that yield the failure - we often see this with Timers, etc. |
Thanks, @jeffschwMSFT! Even if it was introduced by outside factors, I think the fix would likely be along the lines of what @vcsjones illustrated. Are there any objections to moving forward with that as the fix? |
That wasn't a complete fix unfortunately, all it did was stop the temp .keychain file from being deleted. That change as-is will leave the disk cluttered with temporary keychains. I think a correct fix would be to transfer ownership of the I can give that a shot. |
I think we need to take a closer look. Based on the code pattern (embedded within a using), I am surprised that this is failing out of scope. |
I don't think this is a GC issue - I threw in the SuppressFinalize so the GC wouldn't clean up the handle (which deletes the temp keychain) and force it to leak. That snippet was solely there to at least confirm that is the source of some object-lifetime issue. |
Got it; thanks for the clarification, @vcsjones. |
I'll take a look at the codegen side to see what we're doing for GC tracking. |
Note that this is all new code in 5.0. ImportPkcs12 call at #39603 (comment) did not exist in 3.1. I think it is most likely a bug in the new code, it is unlikely to be a bug in the core runtime. |
Since this fails with debug codegen it's also unlikely to be a jit GC tracking issue. GC info leading up to the call to 000122 E829FBFFFF call AppleCrypto:CreateTemporaryKeychain():System.Security.Cryptography.Apple.SafeTemporaryKeychainHandle
000127 48894590 mov gword ptr [rbp-70H], rax
00012B EB09 jmp SHORT G_M61524_IG08
;; bbWeight=1 PerfScore 13.50
G_M61524_IG07:
00012D E8CEFAFFFF call AppleCrypto:SecKeychainCopyDefault():System.Security.Cryptography.Apple.SafeKeychainHandle
000132 48894590 mov gword ptr [rbp-70H], rax
;; bbWeight=1 PerfScore 2.00
G_M61524_IG08:
000136 488B7D90 mov rdi, gword ptr [rbp-70H]
00013A 48897DC0 mov gword ptr [rbp-40H], rdi
00013E 488B7DC0 mov rdi, gword ptr [rbp-40H]
000142 48897DB8 mov gword ptr [rbp-48H], rdi
;; bbWeight=1 PerfScore 4.00
G_M61524_IG09:
000146 488B7DF0 mov rdi, bword ptr [rbp-10H]
00014A 488B75F8 mov rsi, qword ptr [rbp-08H]
00014E 488B55E8 mov rdx, gword ptr [rbp-18H]
000152 8B4DCC mov ecx, dword ptr [rbp-34H]
000155 4C8B45C0 mov r8, gword ptr [rbp-40H]
000159 E85AC0FFFF call Internal.Cryptography.Pal.AppleCertificatePal:ImportPkcs12(System.ReadOnlySpan`1[Byte],Microsoft.Win32.SafeHandles.SafePasswordHandle,bool,System.Security.Cryptography.Apple.SafeKeychainHandle):Internal.Cryptography.ICertificatePal
00015E 488945B0 mov gword ptr [rbp-50H], rax
;; stack slots rbp-70, rbp-40, rbp-48 are untracked
Register slot id for reg rax = 17.
Register slot id for reg rsi = 18.
Register slot id for reg rdi = 19.
Register slot id for reg rdx = 20.
Register slot id for reg r8 = 21.
Register slot id for reg rcx = 22.
Set state of slot 17 at instr offset 0x127 to Live.
Set state of slot 17 at instr offset 0x12d to Dead.
Set state of slot 17 at instr offset 0x132 to Live.
Set state of slot 17 at instr offset 0x136 to Dead.
Set state of slot 19 at instr offset 0x13a to Live.
Set state of slot 19 at instr offset 0x146 to Dead.
Set state of slot 16 at instr offset 0x14a to Live.
Set state of slot 20 at instr offset 0x152 to Live.
Set state of slot 21 at instr offset 0x159 to Live.
Set state of slot 17 at instr offset 0x15e to Live.
|
I'm editing the title to remove "Big Sur" since this repro'd for me on Catalina as well. |
Thanks a ton. Sorry I've just been underwater with other things lately. |
reopening to 5.0. Master is now for 6.0 |
I was wondering what regressed it, and I guess it was probably the big PFX reader/writer change: af6ab5f dotnet/corefx#42226 |
Fixed in 5.0.0 as well with #41882. |
I cannot seem to write to an
X509Store
in macOS. Code to reproduce:This seems to always fail on
store.Add(cert)
. Exception:The unit tests for this appear to be in
X509StoreMutableTests_OSX
. When I attempt to run them, all of them are skipped because the "can I write to the store?" conditional fails:This is not an SSH session, or a VM, or anything out of the ordinary, it's a login zsh shell.
The text was updated successfully, but these errors were encountered: