scala.MatchError: None (of class scala.None$)
during CaseClassSerializer
#148
Comments
Hi @chollinger93, thank you for the detailed investigation and for the spotted workaround. As I understand it, you can reproduce this issue consistently? I think it really needs to be tried on a Flink app/job cluster environment; it probably can't be spotted in Flink local mode, i.e. within a single JVM. One question: when you run on AWS Flink, do you remove Flink's Scala JAR somehow? Another option to suppress Flink's old Scala is to use the `classloader.parent-first-patterns.default` property. Works fine for me so far. Please check the idea here: https://ververica.zendesk.com/hc/en-us/articles/13583223554460-How-to-run-my-Flink-job-with-Scala-2-13-and-Scala-3-version-in-VVP
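In case it helps anyone reading along, a minimal sketch of what such an override can look like when set programmatically for a local/test environment. Assumptions: the pattern list is the documented default with the `scala.` entry dropped, as discussed later in this thread; on a managed platform you would normally need access to `flink-conf.yaml` instead, which is exactly what's in question here.

```scala
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

val conf = new Configuration()
// Flink's documented default for this key, minus the "scala." entry, so that the
// Scala classes bundled in the job jar win over anything on Flink's own classpath.
conf.setString(
  "classloader.parent-first-patterns.default",
  "java.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;" +
    "org.xml;javax.xml;org.apache.xerces;org.w3c;org.rocksdb.;org.slf4j;org.apache.log4j;" +
    "org.apache.logging;org.apache.commons.logging;ch.qos.logback"
)

// Local environments pick the override up directly; on a real cluster the
// flink-conf.yaml value governs class loading instead.
val env = StreamExecutionEnvironment.getExecutionEnvironment(conf)
```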
Yes, this is reproducible, provided you get a message that has a `None` field. I don't think you can remove jars from the AWS Flink runners - I think it's all managed K8s pods with no custom init actions. The link you provided is a good idea. Unfortunately, I don't think the AWS runner gives me access to that property (or anything in the cluster config, outside of raising a support ticket for some of them), and setting it from within the job doesn't appear to be an option.
Got it. I will try your code example on one of my session clusters (with and without Flink's Scala around). It would be great if you could verify at AWS whether changing the `classloader.parent-first-patterns.default` property makes any difference.
Hi @chollinger93, I've tried to reproduce your scenario with an in-memory data generator, but I could not get the error in the Scala `Option` serializer.
Alternatively, could you give me a complete example of a Flink job which I could build and deploy on any Flink-based platform?
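For reference, a rough sketch of the kind of in-memory reproduction attempt described above - my assumption of its shape, not the actual code from that attempt. It uses flink-scala-api's implicit `TypeInformation` derivation so the library's `OptionSerializer` is actually exercised:

```scala
import org.apache.flinkx.api._
import org.apache.flinkx.api.serializers._

// Made-up event type with an optional field, similar to what the issue describes.
case class Event(id: Long, note: Option[String])

@main def optionRepro(): Unit =
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env
    .fromElements(Event(1, Some("a")), Event(2, None), Event(3, None))
    .map(e => e.copy(note = e.note.map(_.toUpperCase))) // forces (de)serialization of the Option field
    .print()
  env.execute("option-serializer-repro")
```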
Hey, sorry for not responding earlier. I have it on my backlog to try Flink via K8s/locally, but haven't gotten around to it yet. Setting the classloader property still isn't something I can do on the AWS runner, as mentioned. Your example is pretty spot on, assuming case classes with `Option` fields are all it takes. My real code at work has the added complexity of using protobuf, but I can see if I can put together a minimal example soon. Sorry about the delays on my end - really appreciate you looking into this!
I spent the morning trying to get a minimal example to work (and by "work", I mean "get it to break"), but I'm afraid between us having multiple modules, Kafka, protobuf (and using protobuf to parse from raw bytes), I couldn't get a self-contained reproduction together.

That being said, I did try your suggestion. While poking around the jar, I saw a handful of things that shouldn't have been in there, which is where the exclusion list below comes from. On a related note, I don't think we can actually use the `_2.13` scalapb artifacts that come in transitively, hence the explicit excludes.

Here are the relevant sections:

```scala
def excludeJars(cp: Classpath) = cp filter { f =>
  Set(
    "scala-asm-.+-scala-1\\.jar",
    "interface-.+\\.jar",
    "jline-terminal-.+\\.jar",
    "jline-reader-.+\\.jar",
    "jline-.+\\.jar",
    "scala-compiler-.+\\.jar",
    "scala3-compiler_.+\\.jar",
    "flink-shaded-zookeeper-.+\\.jar",
    "flink-shaded-jackson-.+\\.jar",
    "annotations-.+\\.jar"
  ).exists(p => f.data.getName.matches(p))
}

lazy val commonSettings = Seq(
  // ...
  assembly / assemblyExcludedJars := {
    val cp = (assembly / fullClasspath).value
    excludeJars(cp)
  }
)
```
```scala
// dependencies.scala
"org.apache.flink" % "flink-core" % V.flink,
("org.flinkextended" % "flink-scala-api_3" % V.flinkScala)
  .excludeAll(
    ExclusionRule(organization = "org.apache.flink"),
    ExclusionRule(organization = "org.scalameta")
  )
  .exclude("com.thesamet.scalapb", "lenses_2.13")
  .exclude("com.thesamet.scalapb", "scalapb-runtime_2.13"),
```

If you have a docs page/repo anywhere, I'd be happy to contribute there to make this easier to discover for future AWS users.
I'm afraid this caused some premature celebration - after letting this job run in our dev environment for a while, I just got the ping for messages being dropped due to the same error from the original post. I'm going to have to table the investigation here. Looking at the fully rendered dependency graph, I don't see anything blatantly wrong with my jar, so I must assume that without control over the Flink classpath in the AWS runner, there's nothing else I can do.
If you have a chance to try your job with the Apache Flink Kubernetes Operator or the Ververica Community version, that would be another option to confirm whether classpath control is indeed the root cause of the problem.
I will try that in the next few weeks when I get a chance to set it up on a K8s cluster!
Hi @novakov-alexey, I'm able to reproduce this bug with a minimalist project: https://github.com/arnaud-daroussin/DuplicatedNoneBug

Somehow, there are 2 instances of `None` around. A simple `ProcessFunction` outputting `None` is enough to reproduce it. I didn't spot where the problem is, but with this minimal project, it should help you to pinpoint this bug!
Hi @arnaud-daroussin, thanks, I will have a look.
@arnaud-daroussin once again, thanks for creating a separate project with a test. I think the problem is not in the library code but in the user-side code. It can be fixed by: arnaud-daroussin/DuplicatedNoneBug#1
Of course, you are right, I should have seen that my simplified project doesn't use flink-scala-api at all! But thanks to your remark, I eventually found a missing serializer import for a ProcessFunction in my main project.
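For anyone hitting the same thing, the usual fix is to make sure flink-scala-api's implicit `TypeInformation` derivation is in scope where the stream operations are declared, rather than falling back to Flink's generic serialization. A sketch of what that typically looks like (the case class is made up; package names are the ones this library uses):

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flinkx.api.serializers._ // flink-scala-api's implicit TypeInformation derivation

case class Enriched(id: String, score: Option[Double]) // hypothetical output type of a ProcessFunction

// With the import in scope, this resolves to flink-scala-api's derived TypeInformation
// (whose OptionSerializer handles the Option field); without it, the pipeline may fall
// back to Flink's generic/Kryo handling of the same type.
val enrichedInfo: TypeInformation[Enriched] = summon[TypeInformation[Enriched]]
```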
I will then close this ticket, as we still do not have clear evidence that this issue is a bug and not a user-side misconfiguration.
@novakov-alexey, I owe you / anyone googling a response here.

**tl;dr**

I think managed Flink without control over the classpath (or at least class loading) is incompatible with using Scala 3 and flink-scala-api.

**Longer version**

I've finally found the time to install the K8s Flink operator CRD and set up the whole thing in application mode. Works very well.

I assume the root cause of the issue I reported truly lies with Flink's default classpath on AWS and that there's little one can do about that. It doesn't show you the classpath in the UI, and extracting it at runtime only yields the job's jar (both in JM and TM), but according to this: https://docs.aws.amazon.com/managed-flink/latest/java/best-practices.html, it looks functionally identical to a standard Flink installation + the Kinesis runtime.

If that's indeed the case, I guess the final state the application runs with is a wild mix of 2.12, 2.13, and 3, with the 2.13 & 3 dependencies coming from the fat jar, since you can't mark them as `Provided`.

**Reproduction**

You can reproduce a variant of this by not removing Flink's bundled `flink-scala_2.12` jar from a local distribution's `lib/`.

Just wget + tar a Flink release (I used 1.18.1) and grab all jars linked in the docs above. If we search those for a `scala/None.class`:

```bash
# local reproduction of what i _suspect_ flink on AWS' classpath looks like
for j in $(ls lib/*.jar); do echo $j; jar -tf $j | grep scala/None.class; done
lib/aws-kinesisanalytics-runtime-1.2.0.jar
lib/flink-cep-1.18.1.jar
lib/flink-clients-1.18.1.jar
lib/flink-connector-base-1.18.1.jar
lib/flink-connector-files-1.18.1.jar
lib/flink-core-1.18.1.jar
lib/flink-csv-1.18.1.jar
lib/flink-dist-1.18.1.jar
lib/flink-java-1.18.1.jar
lib/flink-json-1.18.1.jar
lib/flink-metrics-core-1.18.1.jar
lib/flink-runtime-web-1.18.1.jar
lib/flink-scala_2.12-1.18.1.jar # !!!
scala/None.class # !!!
lib/flink-streaming-java-1.18.1.jar
lib/flink-table-api-bridge-base-1.18.1.jar
lib/flink-table-api-java-1.18.1.jar
lib/flink-table-api-java-bridge-1.18.1.jar
lib/flink-table-api-java-uber-1.18.1.jar
lib/flink-table-planner-loader-1.18.1.jar
lib/flink-table-runtime-1.18.1.jar
lib/log4j-1.2-api-2.17.1.jar
lib/log4j-1.2-api-2.20.0.jar
lib/log4j-api-2.17.1.jar
lib/log4j-api-2.20.0.jar
lib/log4j-core-2.17.1.jar
lib/log4j-core-2.20.0.jar
lib/log4j-slf4j-impl-2.17.1.jar
lib/log4j-slf4j-impl-2.20.0.jar
```

I grabbed https://github.com/novakov-alexey/flink-scala-option.git and ran it against this setup, which fails with:

```
java.lang.NoClassDefFoundError: scala/collection/MapFactory
	at com.example.foosStream.main(foosStream.scala:55)
```

Which might tempt you to do this:

```scala
lazy val root = (project in file(".")).settings(
name := "flink-scala-option",
// removed exclude, set this to true
assemblyPackageScala / assembleArtifact := true,
libraryDependencies ++= Seq(
("org.flinkextended" %% "flink-scala-api" % "1.18.1_1.1.5").excludeAll(
ExclusionRule(organization = "org.apache.flink"),
ExclusionRule(organization = "org.scalameta"),
ExclusionRule(organization = "com.google.code.findbugs")
),
"org.apache.flink" % "flink-clients" % "1.18.1" % Provided,
"ch.qos.logback" % "logback-classic" % "1.4.14" % Provided
)
)
```

Which, of course, means:

```bash
jar -tf flink-scala-option-assembly-0.1.0-SNAPSHOT.jar | grep "scala/None"
scala/None$.class
scala/None.class
```

This only works if you remove `scala.` from the parent-first classloading patterns; with the stock settings you get:

```
java.lang.NoSuchMethodError: 'java.lang.Object scala.Predef$.refArrayOps(java.lang.Object[])'
```

But once you set it to:

```
classloader.parent-first-patterns.default: java.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.xml;javax.xml;org.apache.xerces;org.w3c;org.rocksdb.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback
```

(i.e., the default from the docs w/o the `scala.` entry), we now have an environment with 2 competing Scala standard libraries on the classpath.

**Solution**

For us, I self-hosted on K8s and took control of the classpath when deploying the charts. I've also not bundled any Flink or Scala dependencies and used `Provided` scope for them instead.
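For reference, a sketch of roughly what that kind of build setup looks like in sbt - my own rendering of the approach described above, not the poster's actual build file. Versions are the ones mentioned in this thread, and the self-managed cluster image is assumed to provide Flink and the Scala 3 standard library:

```scala
// build.sbt (sketch, requires sbt-assembly): keep Flink and Scala out of the fat jar
// and let the cluster image provide them instead.
lazy val root = (project in file(".")).settings(
  name         := "my-flink-job",
  scalaVersion := "3.4.0",
  // don't bundle the Scala 3 standard library into the assembly
  assemblyPackageScala / assembleArtifact := false,
  libraryDependencies ++= Seq(
    "org.flinkextended" %% "flink-scala-api"      % "1.18.1_1.1.5",
    "org.apache.flink"   % "flink-clients"        % "1.18.1" % Provided,
    "org.apache.flink"   % "flink-streaming-java" % "1.18.1" % Provided
  )
)
```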
@chollinger93 thanks for the excellent report and investigation. I think you are exactly right about the K8s-hosted Flink and AWS Flink ways of using Scala 2.13/3. To summarise once again: with a self-managed cluster you can control the classpath and run Scala 2.13/3 cleanly, while on managed Flink without classpath control you cannot.
P.S. If you try to use Scala 2.13 or 3 and still keep Scala 2 around in Flink, then the consequences are random classloading-related errors such as "NoSuchMethodError".
Direct follow-up from #106, but I figured a new issue would be helpful.

Using `1.18.1_1.1.5` w/ Scala `3.4.0` on AWS' managed Flink (i.e., little control over their setup) w/ Flink 1.18, I am seeing `scala.MatchError: None (of class scala.None$)` issues just like in #106. I'm also having a hard time replicating it locally at all, so I can't test against this, nor run a debugger through it. I think @novakov-alexey's classpath suspicion is spot on.

My flow is essentially `[]byte` => `A` (via Kafka) => `B` (via `ProcessFunction`), both `A` and `B` being case classes w/ `Option` values.

The Kafka deserializer works flawlessly.
It seems to fail on the custom mapping step, which is a `ProcessFunction[A, B]`. The stacktrace points me to a `ctx.output(tag, p)` w/in the `ProcessFunction`; the line in question is https://github.com/flink-extended/flink-scala-api/blob/v1.18.1_1.1.5/src/main/scala/org/apache/flinkx/api/serializer/OptionSerializer.scala#L50
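To make the shape of that step concrete, here is a hypothetical sketch of such a `ProcessFunction` with a side output. The names, fields, and tag are invented, and it uses `TypeInformation.of` just to stay self-contained rather than flink-scala-api's derived serializers:

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.{Collector, OutputTag}

case class A(id: String, note: Option[String])
case class B(id: String, note: Option[String])

// Side-output tag for records that can't be mapped; ctx.output(tag, ...) below is
// the kind of call the reported stack trace pointed at.
val rejected = new OutputTag[A]("rejected", TypeInformation.of(classOf[A]))

class AToB extends ProcessFunction[A, B] {
  override def processElement(a: A, ctx: ProcessFunction[A, B]#Context, out: Collector[B]): Unit =
    if (a.id.isEmpty) ctx.output(rejected, a) // serializing this A (with a None field) is where it failed
    else out.collect(B(a.id, a.note))
}
```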
In the past, my `A => B` map has failed on something seemingly simple, so I resorted to (essentially) rewriting every optional field by hand.
(Note the match on `_`, rather than matching on `None`.) That fixed the issue and provided the downstream processing with an actual `None` value.

Today, dealing with a new message type, I had to handle an even more nested case class that was partially `None`. I'll spare you the full example, but doing basically the same thing did work.
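Since the original snippets didn't survive, here is a hypothetical illustration of the kind of `_`-based rewrite being described (field and type names are made up):

```scala
// Hypothetical mapping step: copy an optional field over "by hand",
// matching on _ instead of None when the field is absent.
case class In(id: String, label: Option[String])
case class Out(id: String, label: Option[String])

def toOut(in: In): Out =
  in.label match {
    case Some(l) => Out(in.id, Some(l))
    case _       => Out(in.id, None) // deliberately `case _`, not `case None`
  }
```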
Which would support the theory that, unless you manually re-write every `None` value, something is making it so that `None` != `None`. The only way I can "explain" that (using that term liberally) is "something something classpath".

I haven't read through the `OptionSerializer` code in this repo in its entirety, but one could argue for pattern matching against `_` there too.

If I get the time I'll do more digging, but for now my super-cursed pattern matching against `_` works around this (I think?)