Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] NoClassDefFoundError: io/grpc/netty/shaded/io/grpc/netty/shaded/io/netty/handler/logging/ByteBufFormat #23513

Closed
2 of 3 tasks
suninuni opened this issue Oct 24, 2024 · 7 comments · Fixed by #23604
Closed
2 of 3 tasks
Assignees
Labels
release/4.0.1 triage/lhotari/important lhotari's triaging label for important issues or PRs type/bug The PR fixed a bug or issue reported a bug
Milestone

Comments

@suninuni
Copy link

suninuni commented Oct 24, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

4.0.0

Minimal reproduce step

  • Using official 4.0.0 images and helm repo.
  • Use etcd as metadataStore.
  • Found error in pulsar-bookie-init job

What did you expect to see?

no error.

What did you see instead?

The whole output for bookie-init job

 [0.007s][info][gc,init] CardTable entry size: 512
 [0.007s][debug][gc,heap] Minimum heap 268435456  Initial heap 268435456  Maximum heap 268435456
 [0.007s][info ][gc     ] Using G1
 [0.009s][info ][gc,init] Version: 21.0.4+7-LTS (release)
 [0.009s][info ][gc,init] CPUs: 16 total, 16 available
 [0.009s][info ][gc,init] Memory: 123G
 [0.009s][info ][gc,init] Large Page Support: Disabled
 [0.009s][info ][gc,init] NUMA Support: Disabled
 [0.009s][info ][gc,init] Compressed Oops: Enabled (32-bit)
 [0.009s][info ][gc,init] Heap Region Size: 1M
 [0.009s][info ][gc,init] Heap Min Capacity: 256M
 [0.009s][info ][gc,init] Heap Initial Capacity: 256M
 [0.009s][info ][gc,init] Heap Max Capacity: 256M
 [0.009s][info ][gc,init] Pre-touch: Disabled
 [0.009s][info ][gc,init] Parallel Workers: 4
 [0.009s][info ][gc,init] Concurrent Workers: 4
 [0.009s][info ][gc,init] Concurrent Refinement Workers: 4
 [0.009s][info ][gc,init] Periodic GC: Disabled
 [0.009s][info ][gc,metaspace] CDS archive(s) not mapped
 [0.009s][info ][gc,metaspace] Compressed class space mapped at: 0x0000000100000000-0x0000000140000000, reserved size: 1073741824
 [0.009s][info ][gc,metaspace] Narrow klass base: 0x0000000000000000, Narrow klass shift: 3, Narrow klass range: 0x140000000
 [0.613s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 574895074 ns, Reaching safepoint: 72595 ns, Cleanup: 126407 ns, At safepoint: 13755 ns, Total: 212757 ns
 [1.000s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 387005131 ns, Reaching safepoint: 5410 ns, Cleanup: 139630 ns, At safepoint: 4609 ns, Total: 149649 ns
 [1.103s][info ][gc,start    ] GC(0) Pause Young (Concurrent Start) (Metadata GC Threshold)
 [1.103s][debug][gc,heap     ] GC(0) Heap before GC invocations=0 (full 0):
 [1.103s][debug][gc,heap     ] GC(0)  garbage-first heap   total 262144K, used 59392K [0x00000000f0000000, 0x0000000100000000)
 [1.103s][debug][gc,heap     ] GC(0)   region size 1024K, 54 young (55296K), 0 survivors (0K)
 [1.103s][debug][gc,heap     ] GC(0)  Metaspace       used 21241K, committed 21504K, reserved 1114112K
 [1.103s][debug][gc,heap     ] GC(0)   class space    used 2234K, committed 2368K, reserved 1048576K
 [1.103s][info ][gc,task     ] GC(0) Using 4 workers of 4 for evacuation
 [1.109s][info ][gc,phases   ] GC(0)   Pre Evacuate Collection Set: 0.1ms
 [1.109s][info ][gc,phases   ] GC(0)   Merge Heap Roots: 0.0ms
 [1.109s][info ][gc,phases   ] GC(0)   Evacuate Collection Set: 5.3ms
 [1.109s][info ][gc,phases   ] GC(0)   Post Evacuate Collection Set: 0.6ms
 [1.109s][info ][gc,phases   ] GC(0)   Other: 0.2ms
 [1.109s][info ][gc,heap     ] GC(0) Eden regions: 54->0(114)
 [1.109s][trace][gc,heap     ] GC(0)  Used: 0K, Waste: 0K
 [1.109s][info ][gc,heap     ] GC(0) Survivor regions: 0->14(16)
 [1.109s][trace][gc,heap     ] GC(0)  Used: 13508K, Waste: 827K
 [1.109s][info ][gc,heap     ] GC(0) Old regions: 0->0
 [1.109s][trace][gc,heap     ] GC(0)  Used: 0K, Waste: 0K
 [1.109s][info ][gc,heap     ] GC(0) Humongous regions: 5->4
 [1.109s][trace][gc,heap     ] GC(0)  Used: 4096K, Waste: 0K
 [1.109s][info ][gc,metaspace] GC(0) Metaspace: 21241K(21504K)->21241K(21504K) NonClass: 19006K(19136K)->19006K(19136K) Class: 2234K(2368K)->2234K(2368K)
 [1.109s][debug][gc,heap     ] GC(0) Heap after GC invocations=1 (full 0):
 [1.109s][debug][gc,heap     ] GC(0)  garbage-first heap   total 262144K, used 17604K [0x00000000f0000000, 0x0000000100000000)
 [1.109s][debug][gc,heap     ] GC(0)   region size 1024K, 14 young (14336K), 14 survivors (14336K)
 [1.109s][debug][gc,heap     ] GC(0)  Metaspace       used 21241K, committed 21504K, reserved 1114112K
 [1.109s][debug][gc,heap     ] GC(0)   class space    used 2234K, committed 2368K, reserved 1048576K
 [1.109s][info ][gc          ] GC(0) Pause Young (Concurrent Start) (Metadata GC Threshold) 58M->17M(256M) 6.374ms
 [1.110s][info ][gc,cpu      ] GC(0) User=0.02s Sys=0.00s Real=0.01s
 [1.110s][info ][gc          ] GC(1) Concurrent Mark Cycle
 [1.110s][info ][safepoint   ] Safepoint "CollectForMetadataAllocation", Time since last: 103199506 ns, Reaching safepoint: 3106 ns, Cleanup: 42239 ns, At safepoint: 6427757 ns, Total: 6473102 ns
 [1.110s][info ][gc,marking  ] GC(1) Concurrent Scan Root Regions
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Scan Root Regions 0.962ms
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Mark
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Mark From Roots
 [1.111s][info ][gc,task     ] GC(1) Using 4 workers of 4 for marking
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Mark From Roots 0.090ms
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Preclean
 [1.111s][info ][gc,marking  ] GC(1) Concurrent Preclean 0.004ms
 [1.111s][info ][gc,start    ] GC(1) Pause Remark
 [1.112s][info ][gc          ] GC(1) Pause Remark 18M->18M(256M) 1.329ms
 [1.112s][info ][gc,cpu      ] GC(1) User=0.01s Sys=0.00s Real=0.00s
 [1.112s][info ][safepoint   ] Safepoint "G1PauseRemark", Time since last: 1088192 ns, Reaching safepoint: 63218 ns, Cleanup: 3446 ns, At safepoint: 1349659 ns, Total: 1416323 ns
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Mark 1.539ms
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Rebuild Remembered Sets and Scrub Regions
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Rebuild Remembered Sets and Scrub Regions 0.014ms
 [1.112s][info ][gc,start    ] GC(1) Pause Cleanup
 [1.112s][info ][gc          ] GC(1) Pause Cleanup 18M->18M(256M) 0.013ms
 [1.112s][info ][gc,cpu      ] GC(1) User=0.00s Sys=0.00s Real=0.00s
 [1.112s][info ][safepoint   ] Safepoint "G1PauseCleanup", Time since last: 39364 ns, Reaching safepoint: 70952 ns, Cleanup: 2605 ns, At safepoint: 27000 ns, Total: 100557 ns
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Clear Claimed Marks
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Clear Claimed Marks 0.005ms
 [1.112s][info ][gc,marking  ] GC(1) Concurrent Cleanup for Next Mark
 [1.113s][info ][gc,marking  ] GC(1) Concurrent Cleanup for Next Mark 0.536ms
 [1.113s][info ][gc          ] GC(1) Concurrent Mark Cycle 3.228ms
 2024-10-24T08:45:07,361+0000 [main] INFO  org.apache.bookkeeper.meta.MetadataDrivers - BookKeeper metadata driver manager initialized
 2024-10-24T08:45:07,506+0000 [main] ERROR org.apache.bookkeeper.bookie.BookieShell - Got an exception
 java.lang.NoClassDefFoundError: io/grpc/netty/shaded/io/grpc/netty/shaded/io/netty/handler/logging/ByteBufFormat
 	at org.apache.pulsar.jetcd.shaded.io.vertx.core.net.NetworkOptions.<clinit>(NetworkOptions.java:62) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at org.apache.pulsar.jetcd.shaded.io.vertx.core.VertxOptions.<init>(VertxOptions.java:143) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientConnectionManager.vertx(ClientConnectionManager.java:223) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientConnectionManager.defaultChannelBuilder(ClientConnectionManager.java:158) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientConnectionManager.defaultChannelBuilder(ClientConnectionManager.java:149) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientConnectionManager.getChannel(ClientConnectionManager.java:70) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientConnectionManager.newStub(ClientConnectionManager.java:102) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.KVImpl.<init>(KVImpl.java:51) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientImpl.lambda$new$0(ClientImpl.java:48) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.support.MemorizingClientSupplier.get(MemorizingClientSupplier.java:37) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at io.etcd.jetcd.impl.ClientImpl.getKVClient(ClientImpl.java:65) ~[org.apache.pulsar-jetcd-core-shaded-4.0.0-shaded.jar:4.0.0]
 	at org.apache.pulsar.metadata.impl.EtcdMetadataStore.<init>(EtcdMetadataStore.java:111) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.impl.EtcdMetadataStoreProvider.create(EtcdMetadataStore.java:515) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.newInstance(MetadataStoreFactoryImpl.java:63) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.createExtended(MetadataStoreFactoryImpl.java:50) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.api.extended.MetadataStoreExtended.create(MetadataStoreExtended.java:42) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.bookkeeper.AbstractMetadataDriver.createMetadataStore(AbstractMetadataDriver.java:117) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.bookkeeper.AbstractMetadataDriver.initialize(AbstractMetadataDriver.java:59) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.pulsar.metadata.bookkeeper.PulsarMetadataBookieDriver.initialize(PulsarMetadataBookieDriver.java:45) ~[org.apache.pulsar-pulsar-metadata-4.0.0.jar:4.0.0]
 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithMetadataBookieDriver(MetadataDrivers.java:356) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.meta.MetadataDrivers.runFunctionWithRegistrationManager(MetadataDrivers.java:383) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.client.BookKeeperAdmin.initNewCluster(BookKeeperAdmin.java:1310) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.tools.cli.commands.bookies.InitCommand.apply(InitCommand.java:56) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.bookie.BookieShell$InitNewCluster.runCmd(BookieShell.java:364) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:254) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2674) ~[org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 	at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2771) [org.apache.bookkeeper-bookkeeper-server-4.17.1.jar:4.17.1]
 Caused by: java.lang.ClassNotFoundException: io.grpc.netty.shaded.io.grpc.netty.shaded.io.netty.handler.logging.ByteBufFormat
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source) ~[?:?]
 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source) ~[?:?]
 	at java.base/java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
 	... 27 more
 [1.496s][info ][gc,heap,exit] Heap
 [1.496s][info ][gc,heap,exit]  garbage-first heap   total 262144K, used 39108K [0x00000000f0000000, 0x0000000100000000)
 [1.496s][info ][gc,heap,exit]   region size 1024K, 35 young (35840K), 14 survivors (14336K)
 [1.496s][info ][gc,heap,exit]  Metaspace       used 26566K, committed 26880K, reserved 1114112K
 [1.496s][info ][gc,heap,exit]   class space    used 2884K, committed 3008K, reserved 1048576K

Anything else?

Related #22892 (comment)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@suninuni suninuni added the type/bug The PR fixed a bug or issue reported a bug label Oct 24, 2024
@suninuni
Copy link
Author

suninuni commented Oct 24, 2024

@lhotari please take a look

@lhotari
Copy link
Member

lhotari commented Oct 24, 2024

Minimal reproduce step

  • Using official 4.0.0 images and helm repo.
  • Use etcd as metadataStore.
  • Found error in pulsar-bookie-init job

@suninuni Thanks for reporting this. Please help fixing this issue by providing exact steps to reproduce the problem. Providing all commands that need to be run will be a helpful first step. Thanks!

Most likely this is caused by an issue in the shading configuration, but we need a good way to prevent future regressions. That could go in a integration / system test while the fix is made.

@lhotari lhotari added the triage/lhotari/important lhotari's triaging label for important issues or PRs label Oct 24, 2024
@lhotari lhotari added this to the 4.1.0 milestone Oct 24, 2024
@lhotari lhotari self-assigned this Oct 24, 2024
@suninuni
Copy link
Author

@lhotari

I think we need an integration test for Standalone's --metadata-url=etcd://xxx param.

Related https://github.com/apache/pulsar/pull/17077/files

This is a simple docker-compose.yml which can reproduce the problem

version: '2'
services:
  pulsar:
    image: dockerpull.com/apachepulsar/pulsar:4.0.0
    container_name: pulsar
    command: >
      sh -c 'bin/apply-config-from-env.py conf/standalone.conf &&
             bin/apply-config-from-env.py conf/client.conf &&
             exec bin/pulsar standalone --metadata-url=etcd:http://pulsar-etcd:2379'
    environment:
      PULSAR_MEM: -Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
      BOOKIE_MEM: -Xms256m -Xmx256m -XX:MaxDirectMemorySize=256m
      PULSAR_PREFIX_advertisedAddress: pulsar
      PULSAR_PREFIX_webServiceUrl: http://pulsar:8080/
      PULSAR_PREFIX_authenticationEnabled: 'false'
      PULSAR_PREFIX_defaultRetentionSizeInMB: '1024'
      PULSAR_PREFIX_defaultRetentionTimeInMinutes: '4320'
      PULSAR_PREFIX_forceDeleteNamespaceAllowed: 'true'
    stop_grace_period: 30s

@Shawyeok
Copy link
Contributor

Shawyeok commented Nov 12, 2024

This issue it also reproducible in pulsar 3.0.6:

docker run -it --rm apachepulsar/pulsar:3.0.6 bin/pulsar standalone --metadata-url etcd:http://a-etcd:2379

@lhotari
It seems this was introduced in PR #22892. It’s unusual to see shadedArtifactAttached enabled on a module specifically designed for jar shading, and it may not be necessary.

<!-- required for IntelliJ support -->
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>shaded</shadedClassifierName>

I’ve removed some configurations in my fork, and tests have passed with the import verified correctly in org.apache.pulsar.jetcd.shaded.io.vertx.core.net.NetworkOptions class.

This should address the issue, although the impact on IntelliJ support remains uncertain.

@Shawyeok
Copy link
Contributor

Shawyeok commented Nov 15, 2024

Removing the attach and unpack configurations caused compile errors for some etcd-related classes in IntelliJ IDEA.

IntelliJ expects the artifact JAR path to be target/{artifactId}-{version}.jar.
image

However, the Pulsar build system sets finalName to {artifactId} in the parent pom.xml, which doesn’t match this format. I’m not sure why this is done—perhaps @merlimat can clarify.

pulsar/pom.xml

Line 1762 in 81385c5

<finalName>${project.artifactId}</finalName>

Changing finalName to ${artifactId}-${version} (the default) fixed the compile errors.

@Shawyeok
Copy link
Contributor

I've raise a PR #23604 to address this problem, feel free to take a look, thanks.

@lhotari
Copy link
Member

lhotari commented Nov 15, 2024

Thanks for the great work on addressing this issue, @Shawyeok!

hezhangjian pushed a commit to apache/bookkeeper that referenced this issue Nov 18, 2024
### Motivation

There is a potential jar shading issue introduced in #4426 that causes `NoClassDefFoundError` when connecting to an etcd metadata store.

The `jetcd-core-shaded` module was introduced in #4426 to address the compatibility issues between jetcd-core’s grpc-java dependency and Netty. You can find more details [here][1] and in the [grpc-java documentation][2].

[1]: #4426 (comment)
[2]: https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty

Currently, we use `unpack-shaded-jar` execution unpacks the shaded jar produced by `maven-shade-plugin:shade` into the `jetcd-core-shaded/target/classes` directory. However, the classes in this directory conflict with its dependencies. If the `maven-shade-plugin:shade` runs again without cleaning this directory, it can produce an incorrect shaded jar. You can replicate and verify this issue with the following commands:
```shell
# Step 1: Clean the build directory
mvn clean

# Step 2: Perform an install and unpack the shaded jar into a directory.
# Verify the import statement for `io.netty.handler.logging.ByteBufFormat` in 
# `org/apache/pulsar/jetcd/shaded/io/vertx/core/net/NetClientOptions.class`. 
# The correct import should be: 
# `import io.grpc.netty.shaded.io.netty.handler.logging.ByteBufFormat;`.
mvn install
unzip $M2_REPO/org/apache/bookkeeper/metadata/drivers/jetcd-core-shaded/4.18.0-SNAPSHOT/jetcd-core-shaded-4.18.0-SNAPSHOT-shaded.jar \
  -d metadata-drivers/jetcd-core-shaded/target/first-classes

# Step 3: Run the install command again without cleaning.
# The unpacked jar from the previous step will persist in `target/classes`. 
# Unpack the shaded jar into a different directory (e.g., second-classes) and check the import.
# The incorrect import will be: 
# `import io.grpc.netty.shaded.io.grpc.netty.shaded.io.netty.handler.logging.ByteBufFormat;`.
mvn install
unzip $M2_REPO/org/apache/bookkeeper/metadata/drivers/jetcd-core-shaded/4.18.0-SNAPSHOT/jetcd-core-shaded-4.18.0-SNAPSHOT-shaded.jar \
  -d metadata-drivers/jetcd-core-shaded/target/second-classes

# Step 4: Use IntelliJ IDEA's "Compare Directories" tool to compare the `first-classes` 
# and `second-classes` directories. The differences in imports should become apparent.
```

We can remove the attach and unpack configurations, and it should work fine.

This issue has already been [reported][3] in apache/pulsar, and a similar [patch][patch] has addressed it.

[3]: apache/pulsar#23513
[patch]: apache/pulsar#23604
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/4.0.1 triage/lhotari/important lhotari's triaging label for important issues or PRs type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
3 participants