Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14222][BUILD] Remove jackson-module-scala dependency #12213

Closed

Conversation

JoshRosen
Copy link
Contributor

This patch removes our jackson-module-scala dependency in order to reduce the number of dependencies that we'll have to upgrade when adding experimental Scala 2.12 support. I think that our current use of this library is fairly minimal and removing the dependency seems like less work than having to help test and publish it for every Scala 2.12 milestone plus the final 2.12 release (see FasterXML/jackson-module-scala#245).

/cc @vanzin @andrewor14

@JoshRosen
Copy link
Contributor Author

/cc @aarondav and @ahirreddy, do you know of any pitfalls that I watch out for in removing this dependency?

@JoshRosen JoshRosen changed the title [SPARK-14222] Remove jackson-module-scala dependency [SPARK-14222][BUILD] Remove jackson-module-scala dependency Apr 6, 2016
@JoshRosen
Copy link
Contributor Author

By the way, I'm almost certain that this patch is going to fail its first round of tests provided that our coverage is good enough; just wanted to get an early draft out for feedback to find out if there are any catastrophic blockers.

@cowtowncoder
Copy link

I don't know enough about Spark internals to know if this makes sense, but there is a smaller alternative databinding, Jackson jr:

https://github.com/FasterXML/jackson-jr

which does use Jackson's streaming parser+generator (jackson-core), but not databinding. This makes it about 80% smaller to use. There is also uber-jar variant with shaded jackson-core, which would remove any conflicts with various jackson versions, with size of 350kB (jackson-jr-objects which does not bundle jackson-core is about 75kB).
Jackson-jr does support both "untyped" access (Maps, Lists, wrappers) and very basic Bean-binding, and there is even simple optional tree model (jackson-jr-stree, 24kB); and performance-wise is comparable to full Jackson databinding for most use cases.

So using this might allow removal of jackson-databind, and this in turn could help with some of version compatibility problems as many spark jobs use Jackson for their own needs.
But as I said, I don't know how heavy usage is and whether this is a practical possibility.

@JoshRosen
Copy link
Contributor Author

Besides a few direct uses of Jackson, I think that Spark also uses it indirectly through json4s-jackson so switching to jackson-jr might be a little tricky right now. Thanks for pointing out that library, though: it's going to be a huge help if we do wind up having to shade / hide the full version of Jackson from our classpath.

@cowtowncoder
Copy link

@JoshRosen ah. Yes, good point on usage via json4s.

@vanzin
Copy link
Contributor

vanzin commented Apr 6, 2016

I think one place that would break without the scala module, and where we don't have unit tests right now, is rebuilding the SQL UI in the history server. The events are read from the log file and processed with jackson, and since they're case classes, that probably won't work without the scala module.

@@ -56,7 +55,7 @@ private[spark] object JsonProtocol {

private implicit val format = DefaultFormats

private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in #10061 in order to allow the new SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events to be written to the event log using Jackson. I'll see if there's an existing unit test for roundtrip serialization of these events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Apr 7, 2016

Test build #55145 has finished for PR 12213 at commit 4104b1a.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Okay, so it turns out that removing this is going to be kind of tricky right now because we'd take on the burden of ensuring proper serialization of Scala Option types and, from experience, that's going to become a bit messy.

Therefore, I'm going to fall back on an alternative proposal: I'll take on the burden of getting jackson-module-scala to run on 2.12 and then will work to move from a lot of imperative json4s code to Jackson databind (https://issues.apache.org/jira/browse/SPARK-12141) so that we can eventually remove json4s and consolidate on Jackson (https://issues.apache.org/jira/browse/SPARK-14439). Once we've done that, if it later turns out that we need to remove jackson-module-scala then we can just absorb / inline whatever portion of it we actually use.

@JoshRosen JoshRosen closed this Apr 7, 2016
@JoshRosen JoshRosen deleted the remove-jackson-module-scala branch April 7, 2016 00:20
@cowtowncoder
Copy link

@JoshRosen probably makes sense, but just in case it needs to be revisited Jackson's core databind has much more support for "referential types" (which includes Option/Optional) as of 2.7, and supporting it should be much simpler with basic (de)serializer like one used for AtomicReference. I can help more if/when this is needed; for now I just mention it as forward-reference. I will actually need to work with @nbauernfeind (maintainer of Jackson Scala module) to ensure Scala module 2.7 is fully up-to-date with this part of type handling; earlier it required much more explicit work by module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants