[SPARK-14222][BUILD] Remove jackson-module-scala dependency #12213
/cc @aarondav and @ahirreddy, do you know of any pitfalls that I should watch out for in removing this dependency?
By the way, I'm almost certain that this patch is going to fail its first round of tests provided that our coverage is good enough; I just wanted to get an early draft out for feedback to find out if there are any catastrophic blockers.
I don't know enough about Spark internals to know if this makes sense, but there is a smaller alternative databinding, Jackson jr: https://github.com/FasterXML/jackson-jr which does use Jackson's streaming parser+generator ( So using this might allow removal of |
Besides a few direct uses of Jackson, I think that Spark also uses it indirectly through |
@JoshRosen ah. Yes, good point on usage via json4s.
I think one place that would break without the Scala module, and where we don't have unit tests right now, is rebuilding the SQL UI in the history server. The events are read from the log file and processed with Jackson, and since they're case classes, that probably won't work without the Scala module.
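To make the concern above concrete, here is a minimal sketch using only the JDK. It assumes that plain Jackson databind finds properties mainly through JavaBean-style `getX`/`setX` methods, and it uses a hypothetical `ExampleEvent` case class (not a real Spark event) to show that a Scala case class exposes accessors named after its fields rather than bean-style getters:

```scala
import java.beans.Introspector
import java.lang.reflect.Modifier

// Hypothetical stand-in for a listener event; not an actual Spark class.
case class ExampleEvent(executionId: Long, description: String)

object BeanPropertyCheck {
  // JavaBean properties (getX/isX/setX) declared below java.lang.Object.
  def beanProperties(cls: Class[_]): Set[String] =
    Introspector.getBeanInfo(cls, classOf[Object])
      .getPropertyDescriptors
      .map(_.getName)
      .toSet

  // Public zero-argument methods declared directly on the class.
  def declaredAccessors(cls: Class[_]): Set[String] =
    cls.getDeclaredMethods
      .filter(m => m.getParameterCount == 0 && Modifier.isPublic(m.getModifiers))
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit = {
    // Bean-style introspection should find nothing usable on a case class.
    println(s"bean properties: ${beanProperties(classOf[ExampleEvent])}")
    // The Scala accessors exist, but under the plain field names.
    println(s"declared accessors: ${declaredAccessors(classOf[ExampleEvent])}")
  }
}
```

If that assumption holds, an `ObjectMapper` without a Scala-aware module would see no properties on such a class, which is consistent with the expectation that history server replay would break.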
@@ -56,7 +55,7 @@ private[spark] object JsonProtocol {

  private implicit val format = DefaultFormats

  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
This was added in #10061 in order to allow the new SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events to be written to the event log using Jackson. I'll see if there's an existing unit test for roundtrip serialization of these events.
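A round-trip check of the kind mentioned here could look roughly like the sketch below. It assumes jackson-databind and jackson-module-scala are on the classpath, and it uses a hypothetical `ExampleStartEvent` case class in place of the real SQL execution events, whose exact fields are not shown in this thread:

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical stand-in for an event such as SparkListenerSQLExecutionStart.
case class ExampleStartEvent(executionId: Long, description: String, time: Long)

object RoundTripCheck {
  // Same mapper construction as the line under review in JsonProtocol.
  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  // Serialize to a JSON string, then read it back as the same case class.
  def roundTrip(event: ExampleStartEvent): ExampleStartEvent = {
    val json = mapper.writeValueAsString(event)
    mapper.readValue(json, classOf[ExampleStartEvent])
  }

  def main(args: Array[String]): Unit = {
    val original = ExampleStartEvent(1L, "count at Example.scala:10", 1000L)
    // Case class equality compares field by field.
    assert(roundTrip(original) == original)
  }
}
```

A test in this shape would be expected to start failing once `DefaultScalaModule` is no longer registered, which is what makes it useful as a guard for this change.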
/cc @carsonwang
Test build #55145 has finished for PR 12213 at commit
Okay, so it turns out that removing this is going to be kind of tricky right now because we'd take on the burden of ensuring proper serialization of Scala Therefore, I'm going to fall back on an alternative proposal: I'll take on the burden of getting jackson-module-scala to run on 2.12 and then will work to move from a lot of imperative json4s code to Jackson databind (https://issues.apache.org/jira/browse/SPARK-12141) so that we can eventually remove json4s and consolidate on Jackson (https://issues.apache.org/jira/browse/SPARK-14439). Once we've done that, if it later turns out that we need to remove jackson-module-scala then we can just absorb / inline whatever portion of it we actually use. |
@JoshRosen probably makes sense, but just in case it needs to be revisited Jackson's core databind has much more support for "referential types" (which includes |
This patch removes our jackson-module-scala dependency in order to reduce the number of dependencies that we'll have to upgrade when adding experimental Scala 2.12 support. I think that our current use of this library is fairly minimal, and removing the dependency seems like less work than having to help test and publish it for every Scala 2.12 milestone plus the final 2.12 release (see FasterXML/jackson-module-scala#245).

/cc @vanzin @andrewor14