What did you find confusing? Please describe.
I have been trying to include MLeap in my PySparkProcessor job so I can serialize a Spark Pipeline to use later in a serving container, as this is the expected format. The documentation surrounding how to do this appears to be outdated. I can't seem to get the dependencies correct and keep receiving different errors. What is the correct way to do this? Can we get an updated example with the correct base image, mleap_spark_assembly.jar, and MLeap version that will work?
Describe how documentation can be improved
I found some documentation that I tried to follow here:
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.html#Distributed-Data-Processing-using-Apache-Spark-and-SageMaker-Processing, but this did not have specific instructions for how to include MLeap. There was an example of how to include MLeap in a Glue Job, but following those same steps for a PySparkProcessor Job did not appear to work:
https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.html#Serializing-the-trained-Spark-ML-Model-with-MLeap.
An updated example showing which MLeap version and which base image to use would be very helpful.
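For reference, the serialization step I am trying to run inside the PySpark script looks roughly like the sketch below. The `SimpleSparkSerializer` import and `serializeToBundle` call are the MLeap PySpark API from the Glue example; the output path is hypothetical. Only the `jar:file:` URI construction is plain Python, so that part is shown executable:

```python
# Hypothetical sketch of the MLeap serialization step that would run inside
# the PySpark processing script, assuming the `mleap` Python package and the
# matching MLeap jars are available in the container:
#
#   from mleap.pyspark.spark_support import SimpleSparkSerializer
#   SimpleSparkSerializer()  # attaches serializeToBundle() to PipelineModel
#   model = pipeline.fit(df)
#   model.serializeToBundle(bundle_uri, model.transform(df))

# MLeap expects a "jar:file:" URI pointing at the output bundle zip.
# Building that URI is plain string work and needs no Spark.
def mleap_bundle_uri(local_path):
    """Return the jar:file URI MLeap expects for a local bundle zip."""
    return "jar:file:" + local_path

# e.g. writing the bundle to the processing job's output directory
bundle_uri = mleap_bundle_uri("/opt/ml/processing/output/model.zip")
```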
Additional context
After attempting to follow the documentation, I also tried a few different things to get it to work. I tried downloading the needed .jar files from Maven and including them via the `submit_jars` parameter. I also tried using the `configuration` option to specify MLeap as a dependency via the `spark.jars.packages` property. I also tried extending one of the base Spark processing containers to install the needed Python package and then include the .jars in the `run` command, but could not get it to work. The thread below describes some of the issues I was running into, and I referred to it often when troubleshooting the serialization: combust/mleap-docs#8
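For concreteness, the `configuration` approach I attempted looked like the sketch below. The Maven coordinates and version are assumptions (they would need to match the Spark/Scala version of the processing image), and the processor call requires the SageMaker SDK and AWS credentials, so it is shown commented out; the `configuration` structure itself is plain data:

```python
# Hypothetical sketch: passing MLeap to a PySparkProcessor via the
# `configuration` parameter. The MLeap coordinates/version below are
# illustrative, not verified working values.
mleap_configuration = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            # Spark resolves these from Maven at job startup; the artifact
            # must match the image's Spark and Scala versions.
            "spark.jars.packages": "ml.combust.mleap:mleap-spark_2.12:0.20.0",
        },
    }
]

# The processor call itself (requires the sagemaker SDK, an IAM role, and
# AWS credentials):
#
#   from sagemaker.spark.processing import PySparkProcessor
#   processor = PySparkProcessor(
#       base_job_name="mleap-serialize",
#       framework_version="3.1",   # assumed image version
#       role=role,
#       instance_count=1,
#       instance_type="ml.m5.xlarge",
#   )
#   processor.run(
#       submit_app="preprocess.py",
#       configuration=mleap_configuration,
#   )
```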