This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Move dlopen and dlsym to PAL #25134

Closed
wants to merge 5 commits into from

qmfrederik

System.Drawing P/Invokes into dlopen and dlsym to load the various libgdiplus/GDI+ functions.

It currently assumes that libdl.so exists and contains dlopen and dlsym. This is not always the case - for example on CentOS you have libdl.so.2 but not libdl.so and on FreeBSD dlopen does not live in libdl at all.

Instead of trying to figure out where dlopen lives at runtime, add dlopen and dlsym to the PAL and resolve them at compile time.

@qmfrederik
Author

This should help fix #25102 and #24538, @safern let me know if you agree with the approach.


#include <dlfcn.h>

extern "C" void* SystemNative_DlOpen(const char *file, int mode)
Member

Generally we try to avoid just assuming that these values are the same on all platforms, e.g. that RTLD_GLOBAL is 4 everywhere. Two options:

  1. Switch on mode, accepting the values we use in .NET, and use that to determine the actual value to pass to dlopen.
  2. Use static_asserts to validate the values we're using for mode match the platform at compile time.
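A minimal sketch of option 1, assuming the managed side passes its own platform-neutral values. The enum and the helper name `ConvertDlOpenMode` are illustrative, not taken from the PR; the translated value is what would then be handed to dlopen.

```cpp
#include <dlfcn.h>

// Hypothetical platform-neutral values passed in from managed code.
enum PalDlOpenMode : int
{
    PAL_RTLD_LAZY = 1,
    PAL_RTLD_NOW = 2,
};

// Translates a platform-neutral mode into the native dlopen flag.
// Returns -1 for any value the shim does not recognize, so the caller
// can fail early instead of passing an unknown flag to dlopen.
static int ConvertDlOpenMode(int palMode)
{
    switch (palMode)
    {
        case PAL_RTLD_LAZY:
            return RTLD_LAZY;
        case PAL_RTLD_NOW:
            return RTLD_NOW;
        default:
            return -1;
    }
}
```

With this shape the numeric values on the managed side never need to match the platform headers; only the switch has to be kept up to date.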

Author

Thanks, I limited the values to RTLD_LAZY and RTLD_NOW, added PAL_ equivalents and added static_asserts to validate the values at compile time.
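A rough sketch of what that could look like, assuming the PAL_ values 1 and 2 shown in the header under review. The helper `IsSupportedDlOpenMode` is hypothetical and only illustrates restricting the accepted values; it is not code from the PR.

```cpp
#include <dlfcn.h>

// PAL-level constants that the managed DlOpenFlags enum would mirror.
enum
{
    PAL_RTLD_LAZY = 1,
    PAL_RTLD_NOW = 2
};

// Compilation fails on any platform whose headers disagree with the
// PAL values, instead of silently passing a wrong flag at runtime.
static_assert(PAL_RTLD_LAZY == RTLD_LAZY, "PAL_RTLD_LAZY must match RTLD_LAZY");
static_assert(PAL_RTLD_NOW == RTLD_NOW, "PAL_RTLD_NOW must match RTLD_NOW");

// Hypothetical guard: accept only the two modes the shim supports.
static bool IsSupportedDlOpenMode(int mode)
{
    return mode == PAL_RTLD_LAZY || mode == PAL_RTLD_NOW;
}
```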

}

[DllImport(Libraries.SystemNative, EntryPoint = "SystemNative_DlOpen")]
internal static extern IntPtr DlOpen(string fileName, DlOpenFlags flag);
Member

@stephentoub stephentoub Nov 8, 2017

It looks like dlopen is also being used in tests, e.g.

nativeLib = dlopen("libgdiplus.so", RTLD_NOW);

Interop.Libdl.dlopen((

Do those need to be fixed?

Author

I updated both usages of dlopen. I couldn't find any other places that use dlopen so I think we're good now.

Member

@stephentoub stephentoub left a comment

A few comments, otherwise looks good.

PAL_RTLD_LAZY = 1,
PAL_RTLD_NOW = 2
};
extern "C" void* SystemNative_DlOpen(const char *file, int mode);
Member

@safern safern left a comment

Looks good. Thanks @qmfrederik

Member

@stephentoub stephentoub left a comment

LGTM. Thanks for fixing these up.

@stephentoub
Member

Actually, wait... does System.Drawing.Common need to continue working on .NET Core 2.0 / .NET Standard 2.0? If so, it can't take additional dependencies on the shim.
cc: @weshaggard, @Petermarcu

@stephentoub stephentoub added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 9, 2017
@safern
Member

safern commented Nov 9, 2017

Actually, wait... does System.Drawing.Common need to continue working on .NET Core 2.0 / .NET Standard 2.0? If so, it can't take additional dependencies on the shim.

Yes it needs to continue working in 2.0, good catch.

We could potentially include files when TargetGroup == netcoreapp2.0 and behave the old way, but it will be a little bit messy.

@qmfrederik
Author

Alternatively, I guess we could keep the direct P/Invoke into libdl on netcoreapp2.0 and use the shim on netcoreapp2.1 & beyond?
System.Drawing.Common would still not work out of the box on some Linux distros on 2.0 but that may be better than no support at all.

Let me know which way you want to go and I'll update the PR.

@qmfrederik qmfrederik changed the title Move dlopen and dlsym to PAL [WIP] Move dlopen and dlsym to PAL Nov 9, 2017
@qmfrederik
Author

I updated the PR so that there are now two versions of DlOpen and DlSym: on .NET Core 2.0, they directly P/Invoke libdl (with known limitations); on .NET Core 2.1, they P/Invoke the PAL.

Net, there would be no change for netcoreapp2.0 and the CentOS & BSD issues would be fixed in netcoreapp2.1.

I'm not really up to speed with the different build configurations in this repo, so if I can improve this, let me know.

@qmfrederik
Author

@dotnet-bot test Tizen armel Debug Build

@@ -16,6 +16,8 @@
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.0-Unix-Debug|AnyCPU'" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.0-Unix-Release|AnyCPU'" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.1-Unix-Debug|AnyCPU'" />
Member

netcoreapp (versionless) is equivalent to netcoreapp2.1, so you can leave it as netcoreapp. I will go ahead and update your branch with the right configurations, as we need to keep netcoreapp2.0, netfx and netstandard as PackageConfigurations and netcoreapp as a build configuration.

Member

Also, the build configurations in the csproj are only there so that Visual Studio understands the different configurations this project can target; the real configurations we use to build are in the configurations.props file.

Author

Sounds good, thanks!

Member

I just pushed, you can take a look at what I did in commit: 2aafcc4

@safern
Member

safern commented Nov 10, 2017

@weshaggard do you mind reviewing: 2aafcc4 please?

@qmfrederik qmfrederik changed the title [WIP] Move dlopen and dlsym to PAL Move dlopen and dlsym to PAL Nov 10, 2017
@weshaggard
Member

@weshaggard do you mind reviewing: 2aafcc4 please?

The configurations look correct to me.

@safern
Member

safern commented Nov 10, 2017

@weshaggard do you mind reviewing: 2aafcc4 please?

Thanks.

@safern
Member

safern commented Nov 10, 2017

@qmfrederik could you please validate that this PR on the current state actually fixes the problem?

@qmfrederik
Author

@safern I'm trying to, but I must be doing something wrong. I'm basically trying to follow the steps in the "Using your local CoreFX build" dogfooding instructions on Linux.

From what I gather there, to be able to dogfood from a local build, I need to add corefx/bin/packages/Debug as a NuGet source and restore from there. The docs mention you need ./build -allconfigurations to get the NuGet packages building.

At first, I hit #25185 which is fixable, but now the builds fail with this message:

error : File System.Numerics.Vectors.WindowsRuntime.dll is marked as inbox for framework uap10.0.15138 but was missing from framework package Microsoft.Private.CoreFx.UAP/4.6.0-preview1-26009-0. Either add the file or update InboxOn entry in /home/fcarlier/git/corefx/pkg/Microsoft.Private.PackageBaseline/packageIndex.json. This may be suppressed with with PermitMissingInbox suppression [/home/fcarlier/git/corefx/pkg/Microsoft.Private.CoreFx.UAP/Microsoft.Private.CoreFx.UAP.pkgproj]

I guess I've missed something. Is there an easier way to build a NuGet package source from my corefx repo?

@safern
Member

safern commented Nov 11, 2017

If what you’re trying to achieve is to build the System.Drawing.Common package, you just need to do build and then do msbuild src/System.Drawing.Common/pkg/System.Drawing.Common.pkgproj

That will create the System.Drawing.Common Package in the Package drop. What OS are you in?

@qmfrederik
Author

@safern

Building src/System.Drawing.Common/pkg/System.Drawing.Common.pkgproj directly got me the System.Drawing.Common NuGet package, thanks. Adding myget as a NuGet package source also made dotnet restore succeed for my test project:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="System.Drawing.Common" Version="4.5.0-preview1-26009-0"/>
  </ItemGroup>
</Project>

<configuration>
  <packageSources>
    <add key="local coreclr" value="/home/fcarlier/git/corefx/bin/packages/Debug/" />
    <add key="dotnet myget" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" />
  </packageSources>
</configuration>

However, running dotnet publish -f netcoreapp2.1 -r linux-x64 seems to copy the netcoreapp2.0 version of System.Drawing.Common.dll to my output directory:

5820a81b1226dab24785b7dddbad7646cd08bb6b  bin/Debug/netcoreapp2.1/linux-x64/publish/System.Drawing.Common.dll
5820a81b1226dab24785b7dddbad7646cd08bb6b  /home/fcarlier/git/corefx/bin/Unix.AnyCPU.Debug/System.Drawing.Common/netcoreapp2.0/System.Drawing.Common.dll
71d098430087bb458af41c8c936abbf3dd1ffb89  /home/fcarlier/git/corefx/bin/Unix.AnyCPU.Debug/System.Drawing.Common/netcoreapp/System.Drawing.Common.dll

Copying the output (I'm building on Ubuntu 16.04 and testing on CentOS 7) confirms that:

./drawing
Unhandled Exception: System.TypeInitializationException: The type initializer for 'Gdip' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'libdl': The specified module or one of its dependencies could not be found.
 (Exception from HRESULT: 0x8007007E)
   at Interop.Sys.DlOpen(String fileName, DlOpenFlags flag)
   at System.Drawing.SafeNativeMethods.Gdip.LoadNativeLibrary()
   at System.Drawing.SafeNativeMethods.Gdip..cctor()
   --- End of inner exception stack trace ---
   at System.Drawing.SafeNativeMethods.Gdip.GdipCreateBitmapFromFile(String filename, IntPtr& bitmap)
   at System.Drawing.Bitmap..ctor(String filename, Boolean useIcm)
   at System.Drawing.Bitmap..ctor(String filename)
   at drawing.Program.Main(String[] args) in /home/fcarlier/scratch/drawing/Program.cs:line 11
Aborted

When I manually copy over System.Native.so and System.Drawing.Common.dll for the netcoreapp2.1 profile, it does work:

Unhandled Exception: System.ArgumentException: Parameter is not valid.
   at System.Drawing.SafeNativeMethods.Gdip.CheckStatus(Int32 status)
   at System.Drawing.Bitmap..ctor(String filename, Boolean useIcm)
   at System.Drawing.Bitmap..ctor(String filename)
   at drawing.Program.Main(String[] args) in /home/fcarlier/scratch/drawing/Program.cs:line 11
Aborted

So the fix in itself works on CentOS, but somehow my dotnet restore/dotnet build process got me the netcoreapp2.0 instead of the netcoreapp version of System.Drawing.Common.

I'm using the latest SDK:

fcarlier@ubuntu:~/scratch/drawing$ ~/scratch/dotnet --version
2.2.0-preview1-007525

@safern
Member

safern commented Nov 11, 2017

I believe that's because the pkgproj has netcoreapp2.0 as a target framework, since we haven’t shipped the stable versions. Once we ship netcoreapp2.1, I think we will have to ship a stable version of this package with a netcoreapp2.1 asset. @weshaggard correct me if I’m wrong.

Your validation confirms that for netcoreapp2.0 we still depend on the direct P/Invoke to libdl and we don’t depend on the System.Native shim, which is what we want.

And for netcoreapp2.1 we are getting the expected behavior.

@stephentoub stephentoub removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 11, 2017
netcoreapp2.0-Windows_NT;
netcoreapp2.0-Unix;
netfx;
netstandard;
</PackageConfigurations>
<BuildConfigurations>
Member

This is turning System.Drawing.Common into yet another partial OOB, with an unmanaged dependency. Partial OOBs like that have very high maintenance costs. If we want to do this, it should be approved by the .NET Core distro owners.
cc @Petermarcu @karelz

The right way to solve the problem with P/Invoking to DlOpen is to add the imperative dll import APIs to .NET Core 2.1: https://github.com/dotnet/coreclr/issues/14968. It will avoid the need for the dependency on .NET Core shims, and for putting System.Drawing.Common inbox.

Member

@jkotas, won't it be a partial OOB either way? A build for 2.0 on top of standard, and then a build for 2.1 using netcoreapp-specific APIs. How does a dependency on a System.Native.so/dylib function cause any more issues than a dependency on netcoreapp-specific method?

Member

@jkotas jkotas Nov 11, 2017

The difference is whether it is part of .NET Runtime download or whether the app carries it with itself.

Everything that depends on System.Native.so/dylib shim should be part of .NET Runtime download. The System.Native.so/dylib shim is not part of our public API surface.

Author

I don't think dotnet/coreclr#14968 will solve the problem this PR is trying to address.

Yes, there are a lot of cases where managed code wants to have a lot of control over which library is being loaded. Managed libraries which wrap around native libraries, for example, will know where their native library is located, what name(s) it may have, and will try to load that directly.
For example, I think it's normal System.Drawing.Common has a very strong opinion about where libgdiplus.so is located, because it has domain knowledge about libgdiplus. The same for, say, FFmpeg.AutoGen, which is a managed wrapper around ffmpeg.

On the other hand, System.Drawing.Common doesn't care at all about dlopen and has no domain knowledge about it. The way I understand it, dotnet/coreclr#14968 would allow System.Drawing.Common to say something along the lines of "dlopen may be in libdl.so on most Linux distros, libdl.so.2 on CentOS and something else on FreeBSD".

I don't think it's the business of System.Drawing.Common to care about that.

Anyone writing a managed wrapper around a native library and using function pointers will need dlopen and dlsym. Everyone doing that will need to replicate the same search logic.
And most importantly, that logic will eventually fail: when corefx is ported to a new platform, when the libdl version changes, and so on.
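To make the duplicated "search logic" concrete, here is a minimal sketch in C++ rather than managed code. The function name `LoadFirstAvailable` and the injected `load` callback are hypothetical; in real use the callback would wrap dlopen, and the candidate list would be something like libdl.so followed by libdl.so.2.

```cpp
#include <string>
#include <vector>

// Tries each candidate library name in order and returns the first
// non-null handle produced by the supplied loader, or nullptr if no
// candidate could be loaded.
template <typename Loader>
void* LoadFirstAvailable(const std::vector<std::string>& candidates, Loader load)
{
    for (const std::string& name : candidates)
    {
        if (void* handle = load(name.c_str()))
        {
            return handle;
        }
    }
    return nullptr;
}
```

Every wrapper that hardcodes such a candidate list has to be updated whenever a platform names the library differently, which is the failure mode described above.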

Long story short, for this use case, I believe a better option would be to make dlopen/dlsym in one shape or another part of the CoreFX API, as suggested in #17135.

"The least that could possibly work" may be to expose a public LoadNativeLibrary and LoadNativeSymbol API in CoreFX on which System.Drawing.Common and others can depend.

Member

I believe a better option would be to make dlopen/dlsym in one shape or another part of the CoreFX API, as suggested in #17135.

Agree. I did not know that #17135 existed. The APIs proposed by #17135 are the kind of APIs that I thought we will add as part of dotnet/coreclr#14968.

Member

If a NuGet package declares a dependency on netcoreapp2.1, it is expected to work on netcoreapp2.2 or even netcoreapp3.0 as well. How do we make sure that it works there when it depends on netcoreapp2.1 shims?

Member

@stephentoub stephentoub Nov 12, 2017

How do we make sure that it works there when it depends on netcoreapp2.1 shims?

By taking such a dependency we're signing up to keep those functions exposed, as part of the internal surface area of the product, little different than signing up to continue keeping public surface area exposed in subsequent versions.

Like I said earlier, I'm fine if we decide against that and want to impose the restrictions you highlight. I'm simply calling out that I don't think it's the only avenue available to us.

Member

Agree.

The difference is that we have a lot of infrastructure to maintain compatibility of the public surface. Such infrastructure does not exist for the internal surface. It would need to be built.

We have tried to play similar internal surface games in .NET Framework in the past (e.g. with System.SizedReference). It tends to turn into poorly designed, tested and documented part that everybody is afraid to touch. I would love to keep things simple and stick to public surface when it comes to dependencies between independent packages.

Member

The difference is that we have a lot of infrastructure to maintain compatibility of the public surface.

Fair enough.

I would love to keep things simple and stick to public surface when it comes to dependencies between independent packages.

Ok.

Member

FWIW I agree with @jkotas that we should avoid taking these types of dependencies in independently shipping packages. While the rule likely isn't written down anywhere, we do try to follow the same strategy for anything that depends on System.Private.CoreLib directly as well.

@jkotas jkotas added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 11, 2017
@qmfrederik
Author

@jkotas @stephentoub @karelz Although I don't fully understand why this PR carries a high maintenance cost, I do think it's worth getting generic load library/load symbol functionality in corefx.

Looking at #17135 and dotnet/coreclr#14968, what would the next steps be to implement this? Do we need to go through an API review process for this (and how do we start one)?

@jkotas
Member

jkotas commented Nov 12, 2017

Do we need to go through an API review process for this

Yes. cc @russellhadley

@karelz karelz added this to the 2.1.0 milestone Nov 18, 2017
@stephentoub
Member

Seems like we're going to pursue another direction, so I'm going to close out this PR for now. Thanks for the efforts here, @qmfrederik.

@stephentoub stephentoub closed this Dec 5, 2017
@qmfrederik
Author

@stephentoub I'm fine with pursuing another direction; yet it seems like we're a bit stuck identifying what that direction is and what the next steps are.

The API proposal of #17135 looks good to me. It would solve the System.Drawing.Common problem, and also be very valuable to all other native-code wrappers I know of.

What are the next steps we can take to move this forward?

@stephentoub
Member

What are the next steps we can take to move this forward?

When the proposal in that issue has had the appropriate discussion and is ready for review, its tag can be changed from api-needs-work to api-ready-for-review, and then it'll be discussed at a subsequent API review meeting.

@qmfrederik
Author

When the proposal in that issue has had the appropriate discussion and is ready for review

That seems to be the tricky part 😄 . Who defines when it's ready/more work is needed?
The proposal comes from mellinoe, we had a 'looks good' from miguel, who else needs to (pre)-review the proposal?

@stephentoub
Member

These probably are better questions to ask on that issue rather than here. @karelz, can you help push that forward or find someone to do so?

@karelz
Member

karelz commented Dec 5, 2017

Area owners need to push it further -- @russellhadley @luqun @shrah can you please look at #17135?

Labels
area-System.Drawing * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons)
6 participants