-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Scala 2.12 #332
Comments
Starting from #184 first issue i ran into is gradle-scalastyle-plugin not found for scala 2.12. resolved by adding this maven repo:
|
To stop scala compiler from exiting with strange messages like |
Thanks for making the effort to upgrade and support both Scala 2.11/12 versions @koertkuipers !! |
after publishing my own in-house versions of xgboost and mleap it seems everything compiles and tests fine using scala 2.12. seemed a little too easy... wonder if i am not running tests correctly. this is before #274 |
xgboost not having scala version in package name doesnt help |
see also dmlc/xgboost#4574 |
i build xgboost 0.90 for scala 2.11/2.12 here and published it to our bintray i build mleap for scala 2.11/2.12 here and published it also to our bintray my branch for transmogrifai for scala 2.12 is feat-scala212. its currently set to build for only scala 2.12 since i don't yet know how to do both with gradle in one pass but it also builds fine for scala 2.11 if you simply change |
Nice! Please pull the latest from https://github.com/salesforce/TransmogrifAI/tree/mt/spark-2.4 since I just updated it. I would like to try adding a cross build for 2.11/12 |
i expected to see serialization issues after merging in https://github.com/salesforce/TransmogrifAI/tree/mt/spark-2.4 due to #274 but i don't 😕 i see you put a lot of functions in companion objects like:
do you expect this approach to work for scala 2.12? |
I think it works only because models are serialized and loaded within the same code. If you train a model, save it then recompile the codebase and try to load the model - it would fail. |
@koertkuipers I agree here with @tovbinm, the problem is that in Scala 2.12 all lambda functions are given random names for each time you compile your code. So model trained on your laptop will not load in your production environment due to missing references. We are currently working on the fix for that. |
The test should be constructed a follows:
This way we would have a set of models what would contain stages saved from previous JVM runs and would allow us testing if the models would load & produce scores correctly. |
I propose to follow the migration guide for 2.12 in #274 and replace all the lambda functions introduced with concrete classes. |
i spoke to soon about everything testing fine. i was using |
all the tests except for one pass before #274. the only failure before #274 is:
after #274 i have lots of failures. but this is good news as this was expected and we know how to fix them.
|
i tried to convert some lambdas to concrete classes as explained in #274 migration notes. for example in
i replaced by:
this did not work. i now get:
however it works if i add Serializable:
|
i always thought scala functions extend serializable. but its a little more complex than that:
|
@wsuchy @tovbinm is what i am seeing the same issue? or is this something else? i am seeing:
|
@koertkuipers the way I see this working is to explicitly wrap lambdas inside a specific object (one lambda per one object) e.g:
And then have this supported by serializer to check presence of that object and hardcode it to always point to |
I think it cleaner to have classes extend FunctionX base classes, i.e. class GenderFn extends Function1[Passenger, MultiPickList] with Serializable {
def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
} |
@tovbinm the problem will be supporting it from macros, as at the time we evaluate them there is type information. |
@tovbinm @wsuchy take a look at tresata-opensource@95095ed i simply replaced lambdas with concrete classes until all the tests passed. its not pretty but it works. i am unsure if i have to add more tests for workflow independent model loading, e.g. load model from json files. |
Found this out today: In our hello world example, the code like this:
that result in json containing:
Even though one can instantiate back the workflow, the names are still volatile and depend on the order of occurrence. I think we should throw an exception instead. |
ok we now have all tests pass in scala 2.12 and are fully caught up with master branch |
That’s outstanding!! It would be great to have it merged into our codebase together with a cross build changes for Scala 2.11/2.12. |
@koertkuipers are there any plans on your end to raise a PR with Scala 2.12 changes? ;) |
i don't know gradle well, so to make it cross-compile for 2.11 and 2.12 is
probably not something i will get to.
…On Sat, Aug 24, 2019, 12:14 Matthew Tovbin ***@***.***> wrote:
@koertkuipers <https://github.com/koertkuipers> are there any plans on
your end to raise a PR with Scala 2.12 changes? ;)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#332?email_source=notifications&email_token=AAGIQJEDYZYWRLIQT73OFF3QGFM47A5CNFSM4HYLISR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5CDBOQ#issuecomment-524562618>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAGIQJH6JH5RQIOBI3JPAQ3QGFM47ANCNFSM4HYLISRQ>
.
|
I can help to add it. Let’s start from what you already have. |
edit: Ah I just saw that you are reverting back to Spark 2.3 so I'll assume this is blocked until the overall repo is ready for 2.4! @tovbinm I'm not particularly well versed in Gradle, but cross compiled Transmogrifai is something I'm interested in having so I'm happy to spend some time this weekend taking a stab if you don't have free cycles. Have you had a chance to try from @koertkuipers branch or should I start? |
We are planning to keep Spark 2.3 with Scala 2.11 in our master branch. Additionally, we would maintain the Spark 2.4 with Scala 2.11/12 branch separately, until the team is ready to move to Spark 2.4. I havent had a chance to review @koertkuipers branch. Where is it? |
its here: |
@koertkuipers it seems that you had to recompile xgboost with scala 2.12. would they not accept the PR as well so that we could use the official release? |
see: |
@tovbinm @koertkuipers it looks like we've finally got unblocked: https://search.maven.org/artifact/ml.dmlc/xgboost-jvm_2.12 |
Sweet! For a start, we need to start on the cross version build, since I assume you folks are not ready for 2.12 yet. |
Baby steps: I've done some work cross-version building in a branch based on @koertkuipers fork: https://github.com/salesforce/TransmogrifAI/tree/ndv/scala212 I'm using https://github.com/ADTRAN/gradle-scala-multiversion-plugin, which seems to work well. EDIT: this has been resolved ----> Looks like we're still blocked by MLeap (and the XGBoost component of that) for 2.12: combust/mleap#697 |
An observation: the only version of XGBoost that is based on 2.12 (1.2.0) also requires Spark 3 (https://github.com/dmlc/xgboost/releases). So if we want to keep using XGBoost, there may be no way to separate the Scala 2.12 upgrade from the Spark 3 upgrade. |
Spark 2.4 has Scala 2.11 and 2.12 builds. Starting with Spark 3.0 Scala 2.12 will become default and Scala 2.11 will be dropped.
This depends on #184
This will require:
The text was updated successfully, but these errors were encountered: