Wasting 100MB by Standard.Image: Implement polyglot/lib #11483

Open
6 tasks
JaroslavTulach opened this issue Nov 4, 2024 · 9 comments · May be fixed by #11874
Assignees
Labels
-compiler -libs Libraries: New libraries to be implemented

Comments

@JaroslavTulach
Member

JaroslavTulach commented Nov 4, 2024

I've just checked sizes of various libraries inside of Enso distribution:

enso/built-distribution/enso-engine-*/enso-*/lib/Standard$ du -s * | sort -n
164     Geo
196     Searcher
324     Examples
2616    Visualization
4052    Microsoft
4464    Test
16732   Tableau
25272   AWS
44120   Google_Api
46848   Database
48392   Table
63616   Base
72004   Snowflake
118976  Image

e.g. the Standard.Image library occupies 118MB! Why? Because its opencv.jar internally contains various ~10MB huge binaries for each supported platform.

Standard$ unzip -l Image/0.0.0-dev/polyglot/java/opencv-4.7.0-0.jar | grep nu.*opencv.*470 
 20426616  2023-03-28 02:41   nu/pattern/opencv/linux/ARMv7/libopencv_java470.so
 28689512  2023-03-28 02:41   nu/pattern/opencv/linux/ARMv8/libopencv_java470.so
 64297552  2023-03-28 02:41   nu/pattern/opencv/linux/x86_64/libopencv_java470.so
 22088596  2023-03-28 02:41   nu/pattern/opencv/osx/ARMv8/libopencv_java470.dylib
 54792848  2023-03-28 02:41   nu/pattern/opencv/osx/x86_64/libopencv_java470.dylib
 69675008  2023-03-28 02:41   nu/pattern/opencv/windows/x86_32/opencv_java470.dll
108318208  2023-03-28 02:41   nu/pattern/opencv/windows/x86_64/opencv_java470.dll

If we remove these native libraries and keep just the one for the platform we build the distribution for, we:

  • save a bunch of disk space
  • save CPU time while starting (wasted by extracting these libs to temporary directory)
  • extract at build time rather than during runtime
  • make it easier for native image to link to such extracted library
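
A minimal sketch of the "keep just a single one" idea, assuming the native binaries are identified by the nu/pattern/opencv/ path prefix and file extensions seen in the listing above. The class and method names (NativeEntryFilter, isForeignNative, entriesToKeep) are hypothetical and are not part of the Enso build:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Enso build: decides which entries of
// opencv.jar survive when only one platform's native library is kept.
final class NativeEntryFilter {

  // True if the entry is an OpenCV native binary that lives outside keepPrefix.
  static boolean isForeignNative(String entryName, String keepPrefix) {
    boolean isNative =
        entryName.startsWith("nu/pattern/opencv/")
            && (entryName.endsWith(".so")
                || entryName.endsWith(".dylib")
                || entryName.endsWith(".dll"));
    return isNative && !entryName.startsWith(keepPrefix);
  }

  // Returns the entries that remain after dropping foreign native binaries.
  static List<String> entriesToKeep(List<String> entries, String keepPrefix) {
    return entries.stream()
        .filter(e -> !isForeignNative(e, keepPrefix))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> entries =
        List.of(
            "nu/pattern/OpenCV.class",
            "nu/pattern/opencv/linux/x86_64/libopencv_java470.so",
            "nu/pattern/opencv/osx/ARMv8/libopencv_java470.dylib",
            "nu/pattern/opencv/windows/x86_64/opencv_java470.dll");
    // Keep only the Linux x86_64 binary plus all non-native entries.
    entriesToKeep(entries, "nu/pattern/opencv/linux/x86_64/").forEach(System.out::println);
  }
}
```

The same predicate could drive either a jar rewrite or an extraction step at build time; which of the two the build should do is left open here.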

Specification

The solution is to enhance HostClassLoader and override its findLibrary method to perform the native library search. We will adhere to the NetBeans specification, which we already use when building enso4igv, where the library layout is:

$ unzip -l enso4igv*nbm | grep /lib/ | cut -d / -f3- | grep -v /$
lib/aarch64/libenso_parser.dylib
lib/enso_parser.dll
lib/libenso_parser.so
lib/x86_64/libenso_parser.dylib

You may place the native libraries (DLLs or shared objects) beneath the polyglot/lib directory of an Enso library that has polyglot Java libraries. If your native library file names for different architectures or operating systems clash, you may create subdirectories under polyglot/lib for each supported architecture and nested subdirectories for each supported operating system. The directory names must match System.getProperty("os.arch") and System.getProperty("os.name").replaceAll(" .*$", "").toLowerCase(Locale.ENGLISH) respectively. A System.loadLibrary call originating from the library code will try to locate the library file in the following order of directories:

    polyglot/lib/
    polyglot/lib/<arch>/
    polyglot/lib/<arch>/<os>/

so you may place e.g. the 64-bit Linux version of a foo library in the file polyglot/lib/amd64/linux/libfoo.so and the 64-bit Windows version in polyglot/lib/amd64/windows/foo.dll (or also in polyglot/lib/amd64/foo.dll and, of course, in polyglot/lib/foo.dll).
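
The directory naming and search order described above can be sketched as follows. The class and method names (PolyglotLibDirs, osDir, searchOrder) are hypothetical; only the System.getProperty expressions and the three-directory order come from the specification text:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the polyglot/lib lookup rules.
final class PolyglotLibDirs {

  // Directory name for an os.name value: drop everything from the first
  // space onwards and lower-case the rest.
  static String osDir(String osName) {
    return osName.replaceAll(" .*$", "").toLowerCase(Locale.ENGLISH);
  }

  // Directories to search, in order: lib/, lib/<arch>/, lib/<arch>/<os>/.
  static List<Path> searchOrder(Path polyglotLib, String osArch, String osName) {
    Path archDir = polyglotLib.resolve(osArch);
    return List.of(polyglotLib, archDir, archDir.resolve(osDir(osName)));
  }

  public static void main(String[] args) {
    List<Path> dirs =
        searchOrder(
            Path.of("polyglot", "lib"),
            System.getProperty("os.arch"),
            System.getProperty("os.name"));
    // The file name that System.loadLibrary("foo") maps to on this platform.
    String fileName = System.mapLibraryName("foo");
    dirs.forEach(d -> System.out.println(d.resolve(fileName)));
  }
}
```

Note that under this rule an os.name of "Mac OS X" yields the directory name "mac", and any "Windows ..." value yields "windows".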

Justification

The OpenCV SharedLoader seems to be using System.loadLibrary and thus, if our HostClassLoader supports findLibrary, it should load the library from the polyglot/lib directory rather than extracting it to a temporary location on each startup.
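
For illustration, a minimal sketch of a class loader that overrides findLibrary. This is not the actual HostClassLoader; LibDirClassLoader and its constructor are hypothetical. It relies only on the standard ClassLoader.findLibrary contract: return the absolute path of the library, or null to let the VM fall back to java.library.path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for HostClassLoader: resolves native libraries
// from a fixed, ordered list of directories.
final class LibDirClassLoader extends ClassLoader {
  private final List<Path> dirs;

  LibDirClassLoader(ClassLoader parent, List<Path> dirs) {
    super(parent);
    this.dirs = List.copyOf(dirs);
  }

  // Invoked by the JVM when a class defined by this loader calls
  // System.loadLibrary(libname).
  @Override
  protected String findLibrary(String libname) {
    // Platform-specific file name, e.g. libfoo.so on Linux or foo.dll on Windows.
    String fileName = System.mapLibraryName(libname);
    for (Path dir : dirs) {
      Path candidate = dir.resolve(fileName);
      if (Files.isRegularFile(candidate)) {
        return candidate.toAbsolutePath().toString();
      }
    }
    return null; // not found here: the VM searches java.library.path instead
  }
}
```

The hook is only consulted for classes defined by this loader, so the classes that call System.loadLibrary would need to be loaded through it.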

Tasks

@JaroslavTulach JaroslavTulach added -compiler -libs Libraries: New libraries to be implemented labels Nov 4, 2024
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Nov 4, 2024
@JaroslavTulach JaroslavTulach self-assigned this Nov 5, 2024
@Akirathan
Member

Extracting anything from a jar archive at build time can be (easily) managed by setting various assembly sbt settings. I believe I would be able to define build settings/tasks to do any kind of transformation of the jar. Once you know what the resulting jar should look like and which files should be removed from it, just let me know and I will create a patch of build.sbt that achieves that.

@JaroslavTulach JaroslavTulach removed their assignment Dec 10, 2024
@JaroslavTulach JaroslavTulach changed the title Wasting 100MB by Standard.Image library Wasting 100MB by Standard.Image: Implement polyglot/lib Dec 10, 2024
@Akirathan Akirathan linked a pull request Dec 16, 2024 that will close this issue
@enso-bot

enso-bot bot commented Dec 16, 2024

Pavel Marek reports a new STANDUP for today (2024-12-16):

Progress: - Started with creating a separate HostClassLoader per package. It should be finished by 2024-12-25.

@enso-bot

enso-bot bot commented Dec 17, 2024

Pavel Marek reports a new STANDUP for today (2024-12-17):

Progress: - Reverting separate HostClassLoader per project - it was a mistake - #11874 (comment).

  • HostClassLoader will just search native libraries in all the projects. It should be finished by 2024-12-25.

@JaroslavTulach
Member Author

  • HostClassLoader will just search native libraries in all the projects

+1, there already is work on class loader/layer per library in #10714

@enso-bot

enso-bot bot commented Dec 18, 2024

Pavel Marek reports a new STANDUP for today (2024-12-18):

Progress: - Convincing sbt build script to extract native libraries from opencv.jar.

@enso-bot

enso-bot bot commented Dec 20, 2024

Pavel Marek reports a new STANDUP for today (2024-12-20):

Progress: - Fixing problems with running Image_Tests on native image.

  • Playing with EnsoLibraryFeature to include dll
    • Works when setting a specific java.library.path system property for the runtime, but it needs to point to built-distribution. It should be finished by 2024-12-25.

@Akirathan Akirathan moved this from 📤 Backlog to 🔧 Implementation in Issues Board Dec 23, 2024
@enso-bot

enso-bot bot commented Dec 23, 2024

Pavel Marek reports a new STANDUP for today (2024-12-23):

Progress: - sbt correctly extracts native libs from opencv.jar.

  • Small DevX improvement: The task to extract native libs is not re-executed every time.
  • Fixing os name and arch for MacOS - waiting for tests on CI. It should be finished by 2024-12-25.

@enso-bot

enso-bot bot commented Dec 27, 2024

Pavel Marek reports a new STANDUP for today (2024-12-27):

Progress: - Trying other suggested alternatives to find and load opencv native library.

  • How do we force HostClassLoader.findLibrary to load the opencv native lib at NI runtime?
  • Initialize org.enso.image and nu.pattern during build time?
    • Does not work because of OpenCV$SharedLoader$HOLDER which is statically initialized to a different object every time.
  • Force loading of all classes in EnsoLibraryFeature via HostClassLoader.
    • Having problems with fixing the NI build this way. It should be finished by 2025-01-02.

@enso-bot

enso-bot bot commented Dec 31, 2024

Pavel Marek reports a new STANDUP for today (2024-12-31):

Progress: - Many failed attempts to include HostClassLoader.findLibrary in the NI runtime - #11874 (comment)
