[Scala] Scala Serialization optimization #682
Comments
I would recommend that specialized Scala support be put in a separate jar from the Java support. The Scala Library jar is big, so you don't want to have a dependency on it unless you need to. The fury-scala jar would need to be published for multiple Scala versions. Many Spark and Flink users still use Scala 2.11, so you would need to support 2.11, 2.12, 2.13 and 3. If you are not familiar with Scala, publishing the Scala version specific jars is pretty straightforward because build tools like Maven have specialized plugins for Scala.
Maybe we can use
I think it will be hard to support reasonably complicated Scala classes using Java Reflection only code.
Supporting Serialization/Deserialization of This FAQ shows some of the problems that Jackson-Module-Scala has in this area. https://github.com/FasterXML/jackson-module-scala/wiki/FAQ
I see, thanks. I didn't realize Scala erases primitive types in nested generic types; I haven't used Scala for several years. Thanks very much for sharing this information. This breaks our assumption about Java generics. For Java collection types, we can know the key and value types ahead of time, so the generated code can write them directly:
    // generated pseudo-code
    for (Entry e : map.entrySet()) {
      // null flag write
      stringSerializer.writeString(e.getKey());
      buffer.writeSliLong(e.getValue());
    }
But for Scala, since reflection can't tell us the actual value type, the generated code has to resolve it for every element:
    // generated pseudo-code
    for (Entry e : map.entrySet()) {
      // null flag write
      stringSerializer.writeString(e.getKey());
      Object v = e.getValue();
      writeType(v.getClass());
      Serializer s = getSerializer(v); // query from map
      s.write(buffer, v); // virtual method call
    }
It will be slower and introduce more space overhead.
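To make the assumption about Java generics concrete, here is a minimal sketch of how declared type arguments can be read with plain Java reflection. The class `GenericFieldTypes`, its nested `Sample` class, and the helper `typeArguments` are hypothetical names used only for illustration; they are not part of Fury. As described in this thread, a Scala field declared with a primitive type argument would be expected to show `Object` in that position, which is why the same lookup is not enough there.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;

final class GenericFieldTypes {
  // Hypothetical sample class, used only for illustration.
  static final class Sample {
    Map<String, Long> counts;
  }

  /** Returns the declared type arguments of a field, or an empty array if it is not parameterized. */
  static Type[] typeArguments(Class<?> owner, String fieldName) {
    try {
      Field field = owner.getDeclaredField(fieldName);
      Type generic = field.getGenericType();
      if (generic instanceof ParameterizedType) {
        return ((ParameterizedType) generic).getActualTypeArguments();
      }
      return new Type[0];
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("No such field: " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    Type[] typeArgs = typeArguments(Sample.class, "counts");
    System.out.println("key type:   " + typeArgs[0]);
    System.out.println("value type: " + typeArgs[1]);
  }
}
```

With the element types known up front like this, a code generator can pick a concrete serializer per type argument instead of looking one up for each element.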
It may only be necessary to write the inner type info for It may be possible to use Scala libs to find out what the erased types are, but it is pretty complicated and you would likely need a separate solution for Scala 2.x and Scala 3.
You could experiment with Unfortunately,
Our current protocol will write it only once, see #923, but without this information ahead of time, serializing the elements requires a virtual method call, which is slower. JIT optimization is how Fury gets such a speedup, so it would be better if we could support this for Scala, considering how widely it is used in Spark/Flink/Akka.
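The idea of writing element type information only once can be sketched as follows. This is not Fury's wire format and does not reflect the details of #923; `ElementTypeHeader`, `sharedElementType`, and `encode` are hypothetical names, and a list of string tokens stands in for the real binary buffer.

```java
import java.util.ArrayList;
import java.util.List;

final class ElementTypeHeader {
  /** Returns the class shared by all elements, or null if the list is empty, holds a null, or mixes types. */
  static Class<?> sharedElementType(List<?> elements) {
    Class<?> shared = null;
    for (Object e : elements) {
      if (e == null) {
        return null;
      }
      if (shared == null) {
        shared = e.getClass();
      } else if (shared != e.getClass()) {
        return null;
      }
    }
    return shared;
  }

  /** Encodes the list into tokens; the type is written once when every element shares it. */
  static List<String> encode(List<?> elements) {
    List<String> out = new ArrayList<>();
    Class<?> shared = sharedElementType(elements);
    if (shared != null) {
      // One type header for the whole collection.
      out.add("same-type:" + shared.getName());
      for (Object e : elements) {
        out.add(String.valueOf(e));
      }
    } else {
      // Fallback: a type tag in front of every element.
      out.add("mixed");
      for (Object e : elements) {
        out.add(e == null ? "null" : e.getClass().getName());
        out.add(String.valueOf(e));
      }
    }
    return out;
  }
}
```

This only saves space. The element type is still discovered at runtime, so each element is still dispatched through a serializer chosen dynamically, which is the cost the comment above points out.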
Yes, we shouldn't ignore this type information; otherwise the performance won't be the best.
Another thing I found is that Scala collections don't implement the Java collection interfaces. One approach I can see is to convert Scala collections to Java collections in the generated serializer (which comes with object creation overhead), or to implement separate JIT support for such collections.
Converting Scala collections to Java collections and vice versa will not be cheap.
Now with your new input, I totally agree that we should add Scala optimization support in a new library. This is complicated work, and Scala collections are much more complex than the Java collection framework. We must write the implementation using the Scala collection API; otherwise this work will be too much.
Currently, if a Scala class doesn't have fields of Scala collection types, performance is good in Fury. Better Scala serialization support should be done in a new jar. And perhaps we also need to add
Is your feature request related to a problem? Please describe.
Normal Scala classes and case classes are well supported by Fury.
But Scala still has some special serialization behaviour which needs optimization:
object serialization #764