
question: slinc is about 3 times slower than jni (when using OpenJDK 17). Is this expected performance? #81

Open
i10416 opened this issue Feb 26, 2023 · 13 comments

Comments

@i10416
Contributor

i10416 commented Feb 26, 2023

Hello.
I ran a small comparative benchmark between slinc and JNI, and the result shows slinc is about 3 times slower than JNI.
Is this expected performance? I guess the slinc (or Panama) abstraction is not free, and I have heard there is some performance overhead for struct allocation in Panama, so I assume this overhead is expected, but I would like to hear the author's opinion to confirm.

context:

  • Scala 3.2.2
  • JVM: JDK 17.0.3, OpenJDK 64-Bit Server VM, 17.0.3+7-LTS
  • slinc: 0.1.1-110-7863cb
  • Apple clang version 13.1.6 (clang-1316.0.21.2.5)

src:

Benchmark               Mode  Cnt      Score      Error  Units
NativeBenchmarks.jni    avgt    5   5064.292 ±  593.829  ns/op
NativeBenchmarks.slinc  avgt    5  16882.792 ± 1172.054  ns/op
@markehammons
Collaborator

I haven't had a good comparison with JNI, so I can't say for sure. However, one thing I note is that your code in the JNI implementation doesn't seem to handle deallocation at all, while the Slinc code does on account of the confined Scope. Scope.global would give a similar effect to what's going on in the JNI version.
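
Roughly speaking, the difference looks like this (illustrative only; Ptr.blank, `useNative`, and the exact method names here are assumptions, not verified against the current API):

    // Illustrative sketch only: Ptr.blank and useNative are assumed/hypothetical
    // names, not verified Slinc 0.1.x signatures.

    // Confined scope: native allocations made inside the block are freed when
    // the block exits, so every benchmark iteration pays allocation *and*
    // deallocation.
    Scope.confined {
      val p = Ptr.blank[Int]   // hypothetical native allocation
      useNative(p)
    }

    // Global scope: nothing is ever freed, which is effectively what a JNI
    // benchmark that skips deallocation is measuring.
    Scope.global {
      val p = Ptr.blank[Int]
      useNative(p)
    }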

That being said, it's possible there are more effective ways to implement the Slinc code to get closer to JNI performance. If you'd like to contribute some JNI benchmarks to the project, I'd appreciate it!

@i10416
Contributor Author

i10416 commented Feb 26, 2023

Thank you for the feedback.

your code in the JNI implementation doesn't seem to handle deallocation at all

Ah, that's a good point. I was lazy about deallocation 😰 I will look into it.

If you'd like to contribute some JNI benchmarks to the project, I'd appreciate it!

I'm happy to contribute JNI benchmarks, but I'm concerned that the benchmark workflow will get messy since JNI requires building a native library. In addition, I usually use sbt for my builds, so it will take a while to translate the sbt build into mill and make a PR.

@i10416
Contributor Author

i10416 commented Feb 26, 2023

Panama competes with JNI, or even outperforms it in some situations, as shown in this talk (https://www.youtube.com/watch?v=4xFV-A7JToY), so I think (and hope) it is possible to improve performance.

@markehammons
Collaborator

I'm happy to contribute JNI benchmarks, but I'm concerned that the benchmark workflow will get messy since JNI requires building a native library. In addition, I usually use sbt for my builds, so it will take a while to translate the sbt build into mill and make a PR.

I'm already doing this in some capacity for my tests, so it's not a huge issue. I'm not too worried about it overcomplicating things. If you want, we can meet on google meet and I can show you how we can extend mill to do the build of the C++ part.
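
Roughly, the kind of mill extension I have in mind looks like this (a sketch only, not this project's actual build; module names, paths, and flags are illustrative assumptions):

    // build.sc — teach mill to compile a C source into a shared library
    // before the benchmarks run.
    import mill._, scalalib._

    object bench extends ScalaModule {
      def scalaVersion = "3.2.2"

      // Compile native/bench.c into a shared library under this target's dest
      // directory and expose it as a cacheable build artifact.
      def nativeLibrary = T {
        val out = T.dest / "libbench.dylib"   // ".so" on Linux
        os.proc(
          "clang", "-shared", "-O2", "-fPIC",
          "-o", out.toString,
          (millSourcePath / "native" / "bench.c").toString
        ).call()
        PathRef(out)
      }

      // Point the forked benchmark JVM at the freshly built library.
      def forkArgs = T {
        super.forkArgs() ++ Seq(
          s"-Djava.library.path=${(nativeLibrary().path / os.up).toString}"
        )
      }
    }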

@markehammons
Collaborator

Panama competes with JNI, or even outperforms it in some situations, as shown in this talk (https://www.youtube.com/watch?v=4xFV-A7JToY), so I think (and hope) it is possible to improve performance.

It should be possible, and one way will be to drop the usage of MethodHandleFacade, a shim I put in place while Scala 3 didn't officially support MethodHandle.invoke. Now that Scala 3 does support these methods, I should be able to get better performance by using them directly.
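
For illustration, direct invocation with the standard java.lang.invoke API looks something like this from Scala 3 (not Slinc's internal code; the ascribed result type is what determines the call descriptor for the signature-polymorphic method):

    import java.lang.invoke.{MethodHandles, MethodType}

    // Plain java.lang.invoke usage from Scala 3. Scala 3 now compiles calls to
    // the signature-polymorphic MethodHandle methods directly.
    val lookup = MethodHandles.lookup()

    val lengthHandle = lookup.findVirtual(
      classOf[String],
      "length",
      MethodType.methodType(classOf[Int])
    )

    val n = (lengthHandle.invokeExact("hello"): Int) // no boxing, no facade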

There are other things to do too, but right now the current version of Slinc is probably going to be slower. I'm currently reworking it to be better designed, less complex, and more suitable for building libraries that can be loaded by users on Java 17, 18, 19, or whatever comes next. Part of that process is giving up on trying to do compile-time optimization. Where I'm hoping to gain performance back is JIT-style code generation powered by runtime multi-stage compilation.
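
As a rough idea of the mechanism, runtime multi-stage compilation in Scala 3 looks like this (a minimal sketch using the scala3-staging module, not Slinc code):

    import scala.quoted.*
    import scala.quoted.staging.*

    // Assemble code as a quoted expression at runtime, splice in values known
    // only at runtime, and compile it with `run`.
    object StagingDemo:
      given Compiler = Compiler.make(getClass.getClassLoader)

      // Returns a function specialized to `n`, compiled at runtime.
      def adder(n: Int): Int => Int =
        run { '{ (x: Int) => x + ${ Expr(n) } } }

    @main def stagingDemo(): Unit =
      println(StagingDemo.adder(3)(4)) // prints 7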

@i10416
Contributor Author

i10416 commented Feb 28, 2023

we can meet on google meet and I can show you how we can extend mill to do the build of the C++ part.

That's great. I live in Japan now, but I plan to travel to the EU next week, so meeting next week or later would be convenient in terms of time zones. (I guess you are in the EU from your GitHub profile and .fr domain.) Thanks a lot.

By the way, https://github.com/scala-cli/libsodiumjni seems like a good example of using JNI with mill, so I'll take a look at it to learn the mill setup.

@i10416
Contributor Author

i10416 commented Mar 5, 2023

With Java 19, SlinC is nearly as fast as JNI 😉!

  • JVM: OpenJDK Runtime Environment Zulu19.30+11-CA (build 19.0.1+10)

Benchmark               Mode  Cnt     Score    Error  Units
NativeBenchmarks.jni    avgt    5  4872.056 ±  57.582  ns/op
NativeBenchmarks.slinc  avgt    5  5607.126 ± 115.210  ns/op

i10416 changed the title from "question: slinc is about 3 times slower than jni. Is this expected performance?" to "question: slinc is about 3 times slower than jni (when using OpenJDK 17). Is this expected performance?" on Mar 5, 2023
@i10416
Contributor Author

i10416 commented Mar 7, 2023

I added a simpler benchmark, sorting 1,000,000 elements with qsort, which upcalls a JVM method from native code. It seems the upcall has a large overhead even when we use JNI.
I couldn't figure out why SlinC (or the foreign API) takes 5 times longer than JNI.

JVM: OpenJDK Runtime Environment Zulu19.30+11-CA (build 19.0.1+10)

Benchmark                                             Description                                                        Mode  Cnt        Score        Error  Units
SimpleNativeCallBenchmarks.jniNativeQSort             native comparator                                                  avgt    5     4113.280 ±    184.594  ns/op
SimpleNativeCallBenchmarks.jniQSort                   upcall comparator, destructively mutates original array           avgt    5   281968.369 ±   4070.398  ns/op
SimpleNativeCallBenchmarks.slincQSortWithCopyBack     upcall comparator, copies and transfers array                      avgt    5  1609949.152 ± 429499.499  ns/op
SimpleNativeCallBenchmarks.slincQSortWithoutCopyBack  upcall comparator, copies and transfers array, discarding result  avgt    5  1574451.526 ± 378398.468  ns/op

https://github.com/i10416/bench#qsort-benchmark

@markehammons
Collaborator

One thing we can try, which I don't have available at the moment, is creating the upcall from a method rather than a lambda. The way the foreign API suggests creating an upcall is by targeting a method, but I used lambdas instead for ease of use.

@markehammons
Collaborator

markehammons commented Mar 7, 2023

Another thing is that I think your bench is doing a lot of extra work in Slinc. I notice that for each call you recreate the upcall, use it, then toss it away. Upcall creation is expensive, and I don't think the JNI version is recreating its upcall binding for each iteration.

Can you try allocating the upcall in a static location (not in the benchmark loop) using Scope.global?
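
Roughly the shape I have in mind, in JMH terms (placeholder helpers, not your actual bench):

    import org.openjdk.jmh.annotations.*

    // Create the upcall once per trial in @Setup instead of inside the
    // measured method, so each iteration only times the downcall itself.
    @State(Scope.Benchmark)
    class SlincQSortBench:
      var comparator: AnyRef = null

      // Stand-ins so the sketch compiles on its own; in the real benchmark
      // these would be the globally scoped upcall and the slinc qsort binding.
      def makeComparatorUpcall(): AnyRef = new AnyRef
      def nativeQSortWith(cmp: AnyRef): Unit = ()

      @Setup(Level.Trial)
      def allocateUpcall(): Unit =
        comparator = makeComparatorUpcall()   // expensive, done once per trial

      @Benchmark
      def qsortPreallocatedUpcall(): Unit =
        nativeQSortWith(comparator)           // only the call itself is measured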

@markehammons
Collaborator

markehammons commented Mar 7, 2023

Having cloned your bench and allocated the callback once (rather than once per benchmark iteration), I see an improvement in performance of Slinc's upcall code to just 2x slower than JNI, rather than 5x slower. I think there may be more performance improvements to be found, but first I should make it possible to generate an upcall from a method rather than a lambda and see what the performance from that looks like.

@i10416
Contributor Author

i10416 commented Mar 7, 2023

Thank you for the feedback!

I see an improvement in performance of Slinc's upcall code to just 2x slower than JNI, rather than 5x slower.

Oh, that's significant!

@i10416
Contributor Author

i10416 commented Mar 11, 2023

i10416/bench@22323c9

JFYI:

Hi, I can reproduce your performance improvement by pre-allocating the upcall on my local machine! Thanks.
