-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Conversation
This should help fix #25102 and #24538, @safern let me know if you agree with the approach. |
|
||
#include <dlfcn.h> | ||
|
||
extern "C" void* SystemNative_DlOpen(const char *file, int mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we try to avoid just assuming that these values are the same on all platforms, e.g. that RTLD_GLOBAL is 4 everywhere. Two options:
- Switch on mode, accepting the values we use in .NET, and use that to determine the actual value to pass to dlopen.
- Use static_asserts to validate the values we're using for mode match the platform at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I limited the values to RTLD_LAZY
and RTLD_NOW
, added PAL_
equivalents and added static_asserts
to validate the values at compile time.
} | ||
|
||
[DllImport(Libraries.SystemNative, EntryPoint = "SystemNative_DlOpen")] | ||
internal static extern IntPtr DlOpen(string fileName, DlOpenFlags flag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like dlopen is also being used in tests, e.g.
nativeLib = dlopen("libgdiplus.so", RTLD_NOW); |
Interop.Libdl.dlopen(( |
Do those need to be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated both usages of dlopen
. I couldn't find any other places that use dlopen
so I think we're good now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, otherwise looks good.
PAL_RTLD_LAZY = 1, | ||
PAL_RTLD_NOW = 2 | ||
}; | ||
extern "C" void* SystemNative_DlOpen(const char *file, int mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra line and add comments like in: https://github.com/dotnet/corefx/blob/master/src/Native/Unix/System.Native/pal_io.h#L289-L294?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @qmfrederik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for fixing these up.
Actually, wait... does System.Drawing.Common need to continue working on .NET Core 2.0 / .NET Standard 2.0? If so, it can't take additional dependencies on the shim. |
Yes it needs to continue working in 2.0, good catch. We could potentially include files when TargetGroup == netcoreapp2.0 and behave the old way, but it will be a little bit messy. |
Alternatively, I guess we could keep the direct the P/Invoke into Let me know which way you want to go and I'll update the PR. |
I updated the PR so that there are now to versions of DlOpen and DlSym: On .NET Core 2.0, they will directly P/Invoke libdl (with known limitations); on .NET Core 2.1, they would P/Invoke the PAL. Net, there would be no change for netcoreapp2.0 and the CentOS & BSD issues would be fixed in netcoreapp2.1. I'm not really up to speed with the different build configuration in this repo, so if I can improve this, let me know. |
@dotnet-bot test Tizen armel Debug Build |
@@ -16,6 +16,8 @@ | |||
</PropertyGroup> | |||
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.0-Unix-Debug|AnyCPU'" /> | |||
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.0-Unix-Release|AnyCPU'" /> | |||
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'netcoreapp2.1-Unix-Debug|AnyCPU'" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netcoreapp
(versionless) is equivalent to netcoreapp2.1 so you can leave it as netcoreapp
. I will go ahead and update your branch with the right confiurations as we need to keep netcoreapp2.0, netfx and netstandard as a PackageConfiguration
and netcoreapp
as a build configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the build configurations in the csproj are only for Visual Studio to understand the different configurations that this project can target, but the real configurations we use to build are in configurations.props file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed, you can take a look at what I did in commit: 2aafcc4
@weshaggard do you mind reviewing: 2aafcc4 please? |
The configurations look correct to me. |
Thanks. |
@qmfrederik could you please validate that this PR on the current state actually fixes the problem? |
@safern I'm trying to but must be doing something wrong. I'm basically trying to follow the steps in Using your local CoreFX build dogfooding instructions on Linux. From what I gather there, to build be able to dogfood from a local build, I need to add At first, I hit #25185 which is fixable, but now the builds fail with this message:
I guess I've missed something. Is there an easier way to build a NuGet package source from my corefx repo? |
If what you’re trying to achieve is tu build System.Drawing.Common Package, you just need to do build and then do msbuild src/System.Drawing.Common/pkg/System.Drawing.Common.pkgproj That will create the System.Drawing.Common Package in the Package drop. What OS are you in? |
Building <Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp2.1</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="System.Drawing.Common" Version="4.5.0-preview1-26009-0"/>
</ItemGroup>
</Project> <configuration>
<packageSources>
<add key="local coreclr" value="/home/fcarlier/git/corefx/bin/packages/Debug/" />
<add key="dotnet myget" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" />
</packageSources>
</configuration> However, running
Copying the output (I'm building on Ubuntu 16.04 and testing on CentOS 7) confirms that:
When I manually copy over
So the fix in itself works on CentOS, but somehow my I'm using the latest SDK:
|
I believe because the pkgproj has netcoreapp2.0 as a target framework because we haven’t shipped the stable versions. Once we ship netcoreapp2.1 I think we will have to ship a stable version of this packages with a netcoreapp2.1 asset. @weshaggard correct me if I’m wrong. Your validation confirms that for netcoreapp2.0 we still depend on the direct P/invoke to libl and we don’t depend on the System.Native shim which is what we want. And for netcoreapp2.1 we are getting the expected behavior. |
netcoreapp2.0-Windows_NT; | ||
netcoreapp2.0-Unix; | ||
netfx; | ||
netstandard; | ||
</PackageConfigurations> | ||
<BuildConfigurations> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is turning System.Drawing.Common into yet another partial OOB, with unmanaged dependency. Partial OOBs like that have very high maintenance costs. If we want to do this, it should be approved by the .NET Core disro owners.
cc @Petermarcu @karelz
The right way to solve the problem with PInvoking to DlOpen is to add the imperative dll import APIs to .NET Core 2.1: https://github.com/dotnet/coreclr/issues/14968. It will avoid the need for the dependency on .NE T Core shims, and putting System.Drawing.Common inbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas, won't it be a partial OOB either way? A build for 2.0 on top of standard, and then a build for 2.1 using netcoreapp-specific APIs. How does a dependency on a System.Native.so/dylib function cause any more issues than a dependency on netcoreapp-specific method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is whether it is part of .NET Runtime download or whether the app carries it with itself.
Everything that depends on System.Native.so/dylib shim should be part of .NET Runtime download. The System.Native.so/dylib shim is not part of our public API surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think dotnet/coreclr#14968 will solve the problem this PR is trying to address.
Yes, there are a lot of cases where managed code wants to have a lot of control over which library is being loaded. Managed libraries which wrap around native libraries, for example, will know where their native library is located, what name(s) it may have, and will try to load that directly.
For example, I think it's normal System.Drawing.Common has a very strong opinion about where libgdiplus.so is located, because it has domain knowledge about libgdiplus. The same for, say, FFmpeg.AutoGen, which is a managed wrapper around ffmpeg.
On the other hand, System.Drawing.Common doesn't care at all about dlopen and has no domain knowledge about it. The way I understand it, dotnet/coreclr#14968 would allow System.Drawing.Common to say something along the lines of "dlopen may be in libdl.so on most Linux distro's, libdl.so.2 on CentOS and something else on FreeBSD".
I don't think it's the business of System.Drawing.Common to care about that.
Anyone writing a managed wrapper around a native library trying and using function pointers will need dlopen and dlsym. Everyone doing that will need to replicate the same search logic.
And most importantly, that logic will eventually fail: when corefx is ported to a new platform, when libdl versions,... .
Long story short, for this use case, I believe a better option would be to make dlopen/dlsym in one shape or another part of the CoreFX API, as suggested in #17135.
"The least that could possible work" may be to expose a public LoadNativeLibrary
and LoadNativeSymbol
API in CoreFX on which System.Drawing.Common and others can depend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe a better option would be to make dlopen/dlsym in one shape or another part of the CoreFX API, as suggested in #17135.
Agree. I did not know that #17135 existed. The APIs proposed by #17135 are the kind of APIs that I thought we will add as part of dotnet/coreclr#14968.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nuget package declares dependency on netcore2.1, it is expected to work on netcoreapp2.2 or even netcoreapp3.0 as well. How do we make sure that it works there when it depends on netcoreapp2.1 shims?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we make sure that it works there when it depends on netcoreapp2.1 shims?
By taking such a dependency we're signing up to keep those functions exposed, as part of the internal surface area of the product, little different than signing up to continue keeping public surface area exposed in subsequent versions.
Like I said earlier, I'm fine if we decide against that and want to impose the restrictions you highlight. I'm simply calling out that I don't think it's the only avenue available to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
The difference is that we have a lot of infrastructure to maintain compatibility of the public surface. Such infrastructure does not exist for the internal surface. It would need to be built.
We have tried to play similar internal surface games in .NET Framework in the past (e.g. with System.SizedReference
). It tends to turn into poorly designed, tested and documented part that everybody is afraid to touch. I would love to keep things simple and stick to public surface when it comes to dependencies between independent packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that we have a lot of infrastructure to maintain compatibility of the public surface.
Fair enough.
I would love to keep things simple and stick to public surface when it comes to dependencies between independent packages.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I agree with @jkotas that we should avoid taking these type of dependencies in independent shipping packages. While the rule likely isn't written down anywhere we do try to follow the same strategy for anything the depends on System.Private.CoreLib directly as well.
@jkotas @stephentoub @karelz Although I'm don't fully understand why this PR carries a high maintenance cost, I do think it's worth getting generic load library/load symbol functionality in corefx. Look at #17135 and dotnet/coreclr#14968, what would the next steps be to implement this? Do we need to go through an API review process for this (and how do we start one)? |
Yes. cc @russellhadley |
Seems like we're going to pursue another direction, so I'm going to close out this PR for now. Thanks for the efforts here, @qmfrederik. |
@stephentoub I'm fine with pursuing another direction; yet it seems like we're a stuck in identifying what that direction is/what the next steps are. The API proposal of #17135 looks good to me. It would solve the System.Drawing.Common problem, and also be very valuable to all other native-code wrappers I know of. What are the next steps we can take to move this forward? |
When the proposal in that issue has had the appropriate discussion and is ready for review, its tag can be changed from api-needs-work to api-ready-for-review, and then it'll be discussed at a subsequent API review meeting. |
That seems to be the tricky part 😄 . Who defines when it's ready/more work is needed? |
These probably are better questions to ask on that issue rather than here. @karelz, can you help push that forward or find someone to do so? |
Area owners need to push it further -- @russellhadley @luqun @shrah can you please look at #17135? |
System.Drawing P/Invokes into
dlopen
anddlsym
to load the various libgdiplus/GDI+ functions.It currently assumes that
libdl.so
exists and containsdlopen
anddlsym
. This is not always the case - for example on CentOS you havelibdl.so.2
but notlibdl.so
and on FreeBSDdlopen
does not live inlibdl
at all.Instead of trying figure out where
dlopen
lives at runtime, adddlopen
anddlsym
to the PAL and resolve it at compile time.