[Bug]: Beam YAML WriteToJson fails on Beam 2.55 #30776

Closed
2 of 16 tasks
Polber opened this issue Mar 27, 2024 · 2 comments · Fixed by #30779 or #30780

@Polber
Contributor

Polber commented Mar 27, 2024

What happened?

Since Beam 2.55 was released, the cross-language JsonWrite transform no longer works from Beam YAML (or from Beam Python when invoked through ExternalTransform).

A change to https://github.com/apache/beam/blob/master/sdks/java/io/json/build.gradle in PR #29924 removed this dependency on everit:

implementation library.java.everit_json_schema

This change also stopped the library from being packaged into beam-sdks-java-extensions-sql-expansion-service-2.55.0.jar (sdks:java:extensions:schemaio-expansion-service:shadowJar).

So when the cross-language JsonWrite transform (https://github.com/apache/beam/blob/master/sdks/java/io/json/src/main/java/org/apache/beam/sdk/io/json/providers/JsonWriteTransformProvider.java) is used, expansion fails with java.lang.ClassNotFoundException: org.everit.json.schema.Schema$Builder
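A quick way to check whether a given expansion service jar still bundles everit is to look for the missing class's entry inside the jar. The following is a minimal sketch using only java.util.jar; JarClassCheck is a hypothetical helper, not part of Beam, and it assumes Java 11+ (for Path.of).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of Beam): checks whether a jar bundles a given class. */
public final class JarClassCheck {
  private JarClassCheck() {}

  /** Converts a binary class name like "a.b.C$D" to its jar entry path "a/b/C$D.class". */
  static String entryName(String binaryClassName) {
    return binaryClassName.replace('.', '/') + ".class";
  }

  /** Returns true if the jar at jarPath contains the compiled class file. */
  public static boolean containsClass(Path jarPath, String binaryClassName) throws IOException {
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      return jar.getEntry(entryName(binaryClassName)) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: java JarClassCheck path/to/beam-sdks-java-extensions-sql-expansion-service-2.55.0.jar
    String cls = "org.everit.json.schema.Schema$Builder";
    boolean present = containsClass(Path.of(args[0]), cls);
    System.out.println(cls + (present ? " is bundled" : " is MISSING"));
  }
}
```

Running it against the 2.55.0 jar and an earlier release would show whether the class was dropped from the shaded jar.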

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@kennknowles
Member

It seems there's an issue in how the dependencies are specified. I searched for uses of it: https://github.com/search?q=repo%3Aapache%2Fbeam+org.everit+language%3AJava&type=code

It looks like the core SDK depends on it but requires users to add it as a dependency:

provided library.java.everit_json_schema

You were getting lucky that it was also declared as a firm dependency, even though sdks/java/io/json/ does not actually depend on it. I bet I removed it because I got an IWYU error. There are two good fixes: (1) add the dependency directly where the expansion service jar is bundled, or (2) just add the dependency to the core SDK. I guess there is also fix (3), a kludge: check whether the library is present and skip validation if it is not.
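Fix (3) could look roughly like the following minimal sketch. It assumes validation can simply be skipped when everit is absent; OptionalSchemaValidation and its methods are hypothetical names, not Beam's API. Class.forName is called with initialize=false so the check only loads the class. In practice the everit-dependent code should live in a separate class, so that it is loaded only after the check passes.

```java
/**
 * Hypothetical sketch of "fix (3)": run everit-based schema validation only when the
 * everit library is on the classpath. Names are illustrative, not Beam's API.
 */
public final class OptionalSchemaValidation {
  // A class that exists in the everit library; its presence stands in for "everit is available".
  private static final String EVERIT_CLASS = "org.everit.json.schema.Schema";

  private OptionalSchemaValidation() {}

  /** Returns true if the named class can be loaded, without initializing it. */
  static boolean isClassPresent(String className) {
    try {
      Class.forName(className, false, OptionalSchemaValidation.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Runs validation only if everit is available; returns whether it ran. */
  static boolean validateIfEveritPresent(Runnable validation) {
    return validateIfPresent(EVERIT_CLASS, validation);
  }

  /**
   * Runs validation only if requiredClass is loadable; returns whether it ran.
   * Callers should keep everit-dependent code in a separate class so it is
   * loaded only after this check passes.
   */
  static boolean validateIfPresent(String requiredClass, Runnable validation) {
    if (!isClassPresent(requiredClass)) {
      // Kludge: silently skip validation when the optional dependency is missing.
      return false;
    }
    validation.run();
    return true;
  }
}
```

The trade-off is the one implied above: the transform keeps working without everit, but schema problems go unreported instead of failing fast.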

@kennknowles
Member

I notice that sdks/java/extensions/schemaio-expansion-service/build.gradle has suppressed all dependency configuration warnings.

I presume this is because it does not directly depend on any of those things but wants them in the uber jar. I have to believe there is a more principled way to achieve that, for example a runtime scope or something in the shadow jar configuration?
