Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: Cannot add to X509Store [regression from 3.1] #39603

Closed
vcsjones opened this issue Jul 18, 2020 · 45 comments · Fixed by #41787
Closed

macOS: Cannot add to X509Store [regression from 3.1] #39603

vcsjones opened this issue Jul 18, 2020 · 45 comments · Fixed by #41787

Comments

@vcsjones
Copy link
Member

vcsjones commented Jul 18, 2020

I cannot seem to write to an X509Store in macOS. Code to reproduce:

using System.Security.Cryptography.X509Certificates;

namespace scratch {
    class Program {
        static void Main(string[] args) {
            using X509Certificate2 cert = new X509Certificate2("cert.pfx", "test", X509KeyStorageFlags.Exportable);
            using X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
            store.Open(OpenFlags.ReadWrite);
            store.Add(cert);
        }
    }
}

This seems to always fail on store.Add(cert). Exception:

Unhandled exception. Interop+AppleCrypto+AppleCommonCryptoCryptographicException: UNIX[Undefined error: 0]
   at Interop.AppleCrypto.X509StoreAddCertificate(SafeKeychainItemHandle certOrIdentity, SafeKeychainHandle keychain)
   at Internal.Cryptography.Pal.StorePal.AppleKeychainStore.Add(ICertificatePal cert)
   at System.Security.Cryptography.X509Certificates.X509Store.Add(X509Certificate2 certificate)
   at scratch.Program.Main(String[] args) in /Users/kjones/Projects/scratch/Program.cs:line 9

The unit tests for this appear to be in X509StoreMutableTests_OSX. When I attempt to run them, all of them are skipped because the "can I write to the store?" conditional fails:

Writing precondition failed with kPOSIXErrorBase, skipping tests.

This is not an SSH session, or a VM, or anything out of the ordinary, it's a login zsh shell.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Security untriaged New issue has not been triaged by the area owner labels Jul 18, 2020
@ghost
Copy link

ghost commented Jul 18, 2020

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq
Notify danmosemsft if you want to be subscribed.

@jkotas jkotas added the os-mac-os-x macOS aka OSX label Jul 18, 2020
@vcsjones
Copy link
Member Author

Some other background on this, removing from the keychain does work, so it seems not related to the keychain being locked or in an inconsistent state. I did note that Big Sur is getting changes to how importing certificates as root is handled, but saw nothing describing this exact use case.

Other thing to note: I can't reproduce this in netcoreapp3.1. I wonder if this has anything to do with how the binaries are signed and/or if 5 previews are signed differently than 3.1 official builds. Or, something broke between 3.1 and 5.0.

@jeffhandley
Copy link
Member

Thank you, @vcsjones! Are you planning to investigate this further? If not, we'll get someone from the team assigned to it so we can get it into 5.0 (probably RC2).

@vcsjones
Copy link
Member Author

@jeffhandley

Are you planning to investigate this further?

Yeah, but I haven't made a ton of progress. I think a good first step would be to see if someone other than me can repro it, and do the unit tests actually run (not just skipped). Maybe it just doesn't like me.

I can still repro it on Big Sur beta 4, I'm installing 5 right now.

@jeffhandley
Copy link
Member

@vcsjones I got some guidance on how we could determine if this is related to signing/notarization or if there's a potential functional regression here. I'm trying to get my machine set up to be able to check this, but it looks like it'll take me some time, so I wanted to share with you in case this is quick for you to verify.

We would need to compare when running with a signed apphost to running with an unsigned apphost.

@vcsjones
Copy link
Member Author

Hm, I think I can pull down 3.1 and build an unsigned version and see if it starts failing.

@jeffhandley
Copy link
Member

Thanks, @vcsjones. Here's what I understand:

  • If it fails with an unsigned apphost:
    • Entitlements are not enforced, so there's a product regression
  • If it fails only using a signed apphost:
    • There might be a new entitlement that needs to be added

@jeffhandley
Copy link
Member

I set this repro up on my MacBook while still on Catalina, and I was able to repro to get the same error as reported.

  • OS: macOS Catalina
  • SDK: dotnet-sdk-5.0.100.rc.1.20422.1-osx-x64.pkg

Repro Steps:

  1. dotnet new console
  2. Update Program.cs to have the same code as is shown in the issue description
  3. openssl req -x509 -days 365 -newkey rsa:4096 -keyout cert.key -out cert.crt (using "test" as the PEM pass phrase)
  4. openssl pkcs12 -export -in cert.crt -inkey cert.key -out cert.pfx (using "test" as the export password)
  5. dotnet run

I'm going to try this against .NET Core 3.1 on this setup.

@jeffhandley
Copy link
Member

It runs fine against .NET Core 3.1 on my Catalina machine but fails with .NET 5.0 on Catalina.

Combinations / Results

  • <TargetFramework>netcoreapp3.1</TargetFramework>
    • Without <UseAppHost>true</UseAppHost>
      • bin/Debug/netcoreapp3.1/x509.dll: code object is not signed at all (codesign -d bin/Debug/netcoreapp3.1/x509.dll)
      • dotnet run succeeds
      • dotnet run bin/Debug/netcoreapp3.1/x509.dll succeeds
    • With <UseAppHost>true</UseAppHost>
      • bin/Debug/netcoreapp3.1/x509.dll: code object is not signed at all (codesign -d bin/Debug/netcoreapp3.1/x509.dll)
      • dotnet run succeeds
      • dotnet run bin/Debug/netcoreapp3.1/x509.dll succeeds
  • <TargetFramework>net5.0</TargetFramework>
    • Without <UseAppHost>true</UseAppHost>
      • bin/Debug/net5.0/x509.dll: code object is not signed at all (codesign -d bin/Debug/net5.0/x509.dll)
      • dotnet run fails
      • dotnet run bin/Debug/net5.0/x509.dll fails
    • With <UseAppHost>true</UseAppHost>
      • bin/Debug/net5.0/x509.dll: code object is not signed at all (codesign -d bin/Debug/net5.0/x509.dll)
      • dotnet run fails
      • dotnet run bin/Debug/net5.0/x509.dll fails

@jeffschwMSFT do you have suggestions on where to go next from here in the investigation?

@vcsjones
Copy link
Member Author

If you think it is a functional change that causes this, I can run a git-bisect overnight to try and identify the commit that regressed the behavior.

@bartonjs
Copy link
Member

https://github.com/dotnet/runtime/commits/master/src/libraries/Native/Unix/System.Security.Cryptography.Native.Apple currently goes back to 3.1 on page 1, nothing (by title) looks suspicious, except possibly renaming the shims (maybe there's a manifest somewhere that needs to be updated?).

@jeffschwMSFT
Copy link
Member

@wfurt did a lot of deeper investigations into certs on OSX. I think our next step is to understand what OSX API is failing and why.

cc @javiercn

@vcsjones
Copy link
Member Author

@jeffhandley Also, I think when you ran dotnet run bin/Debug/net5.0/x509.dll, it should have been dotnet bin/Debug/net5.0/x509.dll (no "run"). The former is still doing dotnet run and just passing the path as the first args to Main. I don't think it changes the results when I tried your steps.

@vcsjones
Copy link
Member Author

what OSX API is failing

SecKeychainItemCreateCopy

status = SecKeychainItemCreateCopy((SecKeychainItemRef)privateKey, keychain, NULL, &itemCopy);

and why

The resulting OS Status is kPOSIXErrorBase. That isn't a very descriptive error code.

Native stack:

debug	15:13:29.064492-0400	0x00007fff3614fa2d Security::CommonError::LogBacktrace() + 181
debug	15:13:29.064522-0400	0x00007fff3614fd51 Security::UnixError::UnixError(int, bool) + 277
debug	15:13:29.064546-0400	0x00007fff3614fdae Security::UnixError::throwMe(int) + 36
debug	15:13:29.064568-0400	0x00007fff35eb7394 Security::AtomicBufferedFile::open() + 696
debug	15:13:29.064593-0400	0x00007fff35eb5e21 Security::DbModifier::getDbVersion(bool) + 331
debug	15:13:29.064619-0400	0x00007fff35eb5ca5 Security::AppleDatabase::dbOpen(Security::DbContext&) + 57
debug	15:13:29.064645-0400	0x00007fff35eb55bf Security::Database::_dbOpen(Security::DatabaseSession&, unsigned int, Security::AccessCredentials const*, void const*) + 113
debug	15:13:29.064670-0400	0x00007fff35eb4932 Security::DatabaseManager::dbOpen(Security::DatabaseSession&, Security::DbName const&, unsigned int, Security::AccessCredentials const*, void const*) + 64
debug	15:13:29.064693-0400	0x00007fff35eb445d Security::DatabaseSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 189
debug	15:13:29.064717-0400	0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118
debug	15:13:29.064764-0400	0x00007fff35ec5f7c CSSM_DL_DbOpen + 152
debug	15:13:29.064862-0400	0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248
debug	15:13:29.064965-0400	0x00007fff35ffb0a9 SSDatabaseImpl::load(Security::DLDbIdentifier const&) + 53
debug	15:13:29.065066-0400	0x00007fff35ec636d SSDLSession::DbOpen(char const*, cssm_net_address const*, unsigned int, Security::AccessCredentials const*, void const*, long&) + 273
debug	15:13:29.065133-0400	0x00007fff35ec61b3 cssm_DbOpen(long, char const*, cssm_net_address const*, unsigned int, cssm_access_credentials const*, void const*, long*) + 118
debug	15:13:29.065244-0400	0x00007fff35ec5f7c CSSM_DL_DbOpen + 152
debug	15:13:29.065339-0400	0x00007fff35ec5298 Security::CssmClient::DbImpl::open() + 248
debug	15:13:29.065439-0400	0x00007fff35ec5162 Security::CssmClient::DbImpl::activate() + 80
debug	15:13:29.065521-0400	0x00007fff35ff3e5c Security::CssmClient::DbImpl::handle() + 56
debug	15:13:29.065588-0400	0x00007fff35ed2a17 Security::CssmClient::DbUniqueRecordImpl::get(Security::CssmClient::DbAttributes*, Security::CssmDataContainer*) + 155
debug	15:13:29.065705-0400	0x00007fff360b5263 Security::KeychainCore::KeyItem::copyTo(Security::KeychainCore::Keychain const&, Security::KeychainCore::Access*) + 2611
debug	15:13:29.065787-0400	0x00007fff360f7096 SecKeychainItemCreateCopy + 257

It looks like it is having trouble opening the keychain in the temporary directory.

@vcsjones
Copy link
Member Author

Perhaps related log output:

open /private/var/folders/1s/8zmwdp4555j61y_lwd9vvm2w0000gn/T/e792f9d4e6db4762987911c6f22970c8.keychain: Undefined error: 0

@jeffhandley
Copy link
Member

@jeffhandley Also, I think when you ran dotnet run bin/Debug/net5.0/x509.dll, it should have been dotnet bin/Debug/net5.0/x509.dll (no "run"). The former is still doing dotnet run and just passing the path as the first args to Main. I don't think it changes the results when I tried your steps.

Good catch. I reran with dotnet .../x509.dll in all 4 scenarios and it did not change the results.

@wfurt
Copy link
Member

wfurt commented Aug 24, 2020

I'm wondering if some of this may depend on system integrity setting. I have 11.0 system up and I can take a look.
There was at least one other case where 11.0 broke binary compatibility with 10.x. That could explain kPOSIXErrorBase.

Running log stream --process <PID_OF_SECURITYD> may give some more insight @vcsjones.

@vcsjones
Copy link
Member Author

Running log stream --process <PID_OF_SECURITYD> may give some more insight @vcsjones.

Yeah, that's where my stack in #39603 (comment) came from.

I'm wondering if some of this may depend on system integrity setting

Maybe? I'm just thrown by the fact that 3.1 works.

@wfurt
Copy link
Member

wfurt commented Aug 24, 2020

I have repro using latest daily 5.0 build:

bash-3.2$ ~/dotnet/dotnet run
Unhandled exception. Interop+AppleCrypto+AppleCommonCryptoCryptographicException: UNIX[Undefined error: 0]
   at Interop.AppleCrypto.X509StoreAddCertificate(SafeKeychainItemHandle certOrIdentity, SafeKeychainHandle keychain)
   at Internal.Cryptography.Pal.StorePal.AppleKeychainStore.Add(ICertificatePal cert)
   at System.Security.Cryptography.X509Certificates.X509Store.Add(X509Certificate2 certificate)
   at scratch.Program.Main(String[] args) in /Users/furt/certs/Program.cs:line 9

I will take a look and share findings.

@wfurt
Copy link
Member

wfurt commented Aug 24, 2020

It seems like this is somehow related to deleting temporary keychain before finishing the import @bartonjs. If I comment out that code, (turning AppleCryptoNative_SecKeychainDelete to NOP) import works as expected.
Does it ring any bell?

I also verified that 3.1.6 works. I don't have 3.1 build on my 11.0 machine so I did not check if the sequence is the same.

@vcsjones
Copy link
Member Author

deleting temporary keychain before finishing the import

Interesting. Thanks for getting to the bottom of that.

@bartonjs
Copy link
Member

Does it ring any bell?

Not particularly... but that must mean we changed the managed code for it... possibly as part of the PFX rewrite? But I'm surprised an order of operations would have changed.

@vcsjones
Copy link
Member Author

The problem is basically here:

SafeKeychainHandle keychain = persist
? Interop.AppleCrypto.SecKeychainCopyDefault()
: Interop.AppleCrypto.CreateTemporaryKeychain();
using (keychain)
{
return ImportPkcs12(rawData, password, exportable, keychain);
}

We import the X509Certificate2 in to a temporary keychain (because PersistKeySet is not set). I think the temporary keychain that the key gets imported in to needs to live as long as the returned ICertificatePal.

@wfurt
Copy link
Member

wfurt commented Aug 26, 2020

Did that part changed? I thought I saw similar code in 3.1 but I may be mistaken.
Also, log seems to complain about old-style keychain outside of ~/Library/Keychains.
I was wondering if the location also may make difference of there is some other way how to avoid that warning.
(But I did not have time (yet?) for experiments)

@vcsjones
Copy link
Member Author

vcsjones commented Aug 26, 2020

As a test, if that code is changed to:

SafeKeychainHandle keychain = persist
    ? Interop.AppleCrypto.SecKeychainCopyDefault()
    : Interop.AppleCrypto.CreateTemporaryKeychain();

System.GC.SuppressFinalize(keychain);
return ImportPkcs12(rawData, password, exportable, keychain);

Then the problem goes away, too.

@wfurt
Copy link
Member

wfurt commented Aug 26, 2020

It seems like holding ref to the keychain is no longer good enough.

@vcsjones
Copy link
Member Author

vcsjones commented Aug 26, 2020

if the location also may make difference

It doesn't. I tried. (I also tried changing the keychain temp file extension to keychain-db in case for some reason that mattered)

Did that part changed?

I don't think so. I still haven't been able to nail down why 3.1 is working, but I thought I might be better off looking in to 5.0. It's entirely possible 3.1 is working by luck.

@jeffhandley
Copy link
Member

Is there a possibility of a GC change during 5.0 that could have affected this?

@jeffschwMSFT
Copy link
Member

I asked @JulieLeeMSFT and @mangod9 to take a look. Sometimes things that have have lifetime issues can 'work' for many years before a slight change in allocations that yield the failure - we often see this with Timers, etc.

@jeffhandley
Copy link
Member

Thanks, @jeffschwMSFT!

Even if it was introduced by outside factors, I think the fix would likely be along the lines of what @vcsjones illustrated. Are there any objections to moving forward with that as the fix?

@vcsjones
Copy link
Member Author

Are there any objections to moving forward with that as the fix?

That wasn't a complete fix unfortunately, all it did was stop the temp .keychain file from being deleted. That change as-is will leave the disk cluttered with temporary keychains.

I think a correct fix would be to transfer ownership of the SafeTemporaryKeychainHandle to the AppleCertificatePal iif the PKCS12 file is loaded as non-persisted and the PAL is constructed with a SafeSecIdentityHandle. When the PAL is disposed, then we can clean up the temp keychain.

I can give that a shot.

@jeffschwMSFT
Copy link
Member

I think we need to take a closer look. Based on the code pattern (embedded within a using), I am surprised that this is failing out of scope.

@jkotas @davidwrighton

@vcsjones
Copy link
Member Author

I don't think this is a GC issue - I threw in the SuppressFinalize so the GC wouldn't clean up the handle (which deletes the temp keychain) and force it to leak. That snippet was solely there to at least confirm that is the source of some object-lifetime issue.

@jeffhandley
Copy link
Member

Got it; thanks for the clarification, @vcsjones.

@AndyAyersMS
Copy link
Member

I'll take a look at the codegen side to see what we're doing for GC tracking.

@jkotas
Copy link
Member

jkotas commented Aug 26, 2020

, I am surprised that this is failing out of scope.

Note that this is all new code in 5.0. ImportPkcs12 call at #39603 (comment) did not exist in 3.1. I think it is most likely a bug in the new code, it is unlikely to be a bug in the core runtime.

@AndyAyersMS
Copy link
Member

Since this fails with debug codegen it's also unlikely to be a jit GC tracking issue.

GC info leading up to the call to ImportPkcs12 looks reasonable:

000122 E829FBFFFF           call     AppleCrypto:CreateTemporaryKeychain():System.Security.Cryptography.Apple.SafeTemporaryKeychainHandle
000127 48894590             mov      gword ptr [rbp-70H], rax
00012B EB09                 jmp      SHORT G_M61524_IG08
						;; bbWeight=1    PerfScore 13.50
G_M61524_IG07:
00012D E8CEFAFFFF           call     AppleCrypto:SecKeychainCopyDefault():System.Security.Cryptography.Apple.SafeKeychainHandle
000132 48894590             mov      gword ptr [rbp-70H], rax
						;; bbWeight=1    PerfScore 2.00
G_M61524_IG08:
000136 488B7D90             mov      rdi, gword ptr [rbp-70H]
00013A 48897DC0             mov      gword ptr [rbp-40H], rdi
00013E 488B7DC0             mov      rdi, gword ptr [rbp-40H]
000142 48897DB8             mov      gword ptr [rbp-48H], rdi
						;; bbWeight=1    PerfScore 4.00
G_M61524_IG09:
000146 488B7DF0             mov      rdi, bword ptr [rbp-10H]
00014A 488B75F8             mov      rsi, qword ptr [rbp-08H]
00014E 488B55E8             mov      rdx, gword ptr [rbp-18H]
000152 8B4DCC               mov      ecx, dword ptr [rbp-34H]
000155 4C8B45C0             mov      r8, gword ptr [rbp-40H]
000159 E85AC0FFFF           call     Internal.Cryptography.Pal.AppleCertificatePal:ImportPkcs12(System.ReadOnlySpan`1[Byte],Microsoft.Win32.SafeHandles.SafePasswordHandle,bool,System.Security.Cryptography.Apple.SafeKeychainHandle):Internal.Cryptography.ICertificatePal
00015E 488945B0             mov      gword ptr [rbp-50H], rax

;; stack slots rbp-70, rbp-40, rbp-48 are untracked

Register slot id for reg rax = 17.
Register slot id for reg rsi = 18.
Register slot id for reg rdi = 19.
Register slot id for reg rdx = 20.
Register slot id for reg r8 = 21.
Register slot id for reg rcx = 22.

Set state of slot 17 at instr offset 0x127 to Live.
Set state of slot 17 at instr offset 0x12d to Dead.
Set state of slot 17 at instr offset 0x132 to Live.
Set state of slot 17 at instr offset 0x136 to Dead.
Set state of slot 19 at instr offset 0x13a to Live.
Set state of slot 19 at instr offset 0x146 to Dead.
Set state of slot 16 at instr offset 0x14a to Live.
Set state of slot 20 at instr offset 0x152 to Live.
Set state of slot 21 at instr offset 0x159 to Live.
Set state of slot 17 at instr offset 0x15e to Live.

@jeffhandley
Copy link
Member

Hey @vcsjones -- @wfurt is going to take a look into this this week. Thanks for your help on the investigation; you've given us a great place to continue from. 💯

@jeffhandley
Copy link
Member

I'm editing the title to remove "Big Sur" since this repro'd for me on Catalina as well.

@jeffhandley jeffhandley changed the title macOS Big Sur: Cannot add to X509Store macOS: Cannot add to X509Store [regression from 3.1] Sep 1, 2020
@vcsjones
Copy link
Member Author

vcsjones commented Sep 1, 2020

@wfurt is going to take a look into this this week.

Thanks a ton. Sorry I've just been underwater with other things lately.

@wfurt
Copy link
Member

wfurt commented Sep 4, 2020

reopening to 5.0. Master is now for 6.0

@danmoseley
Copy link
Member

danmoseley commented Sep 4, 2020

I was wondering what regressed it, and I guess it was probably the big PFX reader/writer change: af6ab5f dotnet/corefx#42226

@jeffhandley
Copy link
Member

Fixed in 5.0.0 as well with #41882.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants