spark ml decision tree model convert to pmml #93

Closed
XScarlett opened this issue May 21, 2020 · 4 comments

Comments

@XScarlett

When converting a Spark ML PipelineModel to PMML, I want to set missingValueStrategy to lastPrediction, and to set ScoreDistribution and score on every node (not just leaf nodes). How can I do this in Java?

The following screenshots show my code and part of the resulting pmml.xml:
image

image

@vruusmann
Member

Right now you're invoking PMMLBuilder#buildFile(...) which saves the PMML class model object into a file in the local filesystem.

If you invoke PMMLBuilder#build(), then you'll obtain a live org.dmg.pmml.PMML object instance that you can modify as you see fit. I'd recommend using the Visitor API of the JPMML-Model library for implementing all the necessary transformations and rearrangements.

For example, changing the TreeModel@missingValueStrategy attribute value:

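import org.dmg.pmml.PMML;
import org.dmg.pmml.Visitor;
import org.dmg.pmml.VisitorAction;
import org.dmg.pmml.tree.TreeModel;
import org.jpmml.model.visitors.AbstractVisitor;
import org.jpmml.sparkml.PMMLBuilder;

PMMLBuilder pmmlBuilder = ...
// Build an in-memory PMML object instead of writing straight to a file
PMML pmml = pmmlBuilder.build();

// Visitor that sets missingValueStrategy="lastPrediction" on every TreeModel element
Visitor mvsCustomizer = new AbstractVisitor(){
    @Override
    public VisitorAction visit(TreeModel treeModel){
        treeModel.setMissingValueStrategy(TreeModel.MissingValueStrategy.LAST_PREDICTION);
        return super.visit(treeModel);
    }
};
mvsCustomizer.applyTo(pmml);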

@vruusmann
Member

It's possible to compute record counts for "parent" tree levels by summing the record counts of their "child" tree levels.

There's a Visitor API example available in another demo project:
https://github.com/vruusmann/rf_feature_impact/blob/master/src/main/java/feature_impact/visitors/ScoreDistributionGenerator.java
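The bottom-up summation itself can be illustrated without any PMML classes. The following is only a rough sketch under stated assumptions: `RecordCountSketch` and its `Node` class are hypothetical stand-ins for a PMML tree node (they are not part of the JPMML-Model API), `counts` plays the role of the ScoreDistribution record counts, and `score` is taken to be the label with the largest count. A real implementation would apply the same post-order logic inside a Visitor, as the linked ScoreDistributionGenerator does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PMML decision tree; not the JPMML-Model API.
class RecordCountSketch {

    static class Node {
        final List<Node> children = new ArrayList<>();
        // Class label -> record count (analogue of ScoreDistribution elements)
        Map<String, Double> counts = new LinkedHashMap<>();
        // Predicted class label (analogue of the Node@score attribute)
        String score;

        static Node leaf(Map<String, Double> counts) {
            Node node = new Node();
            node.counts.putAll(counts);
            return node;
        }

        static Node parent(Node... children) {
            Node node = new Node();
            node.children.addAll(List.of(children));
            return node;
        }
    }

    // Post-order traversal: process the children first, then sum their
    // record counts into the parent and derive the parent's score.
    static void aggregate(Node node) {
        if (!node.children.isEmpty()) {
            Map<String, Double> sums = new LinkedHashMap<>();
            for (Node child : node.children) {
                aggregate(child);
                child.counts.forEach((label, count) -> sums.merge(label, count, Double::sum));
            }
            node.counts = sums;
        }
        node.score = majority(node.counts);
    }

    // Returns the label with the largest record count (first one wins on ties).
    static String majority(Map<String, Double> counts) {
        String best = null;
        double bestCount = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

Calling `RecordCountSketch.aggregate(root)` on the root fills in `counts` and `score` for every intermediate node, so a scorer that stops early under lastPrediction would have a prediction available at each level.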

@vruusmann
Member

Leaving this issue open-ish - a reminder that perhaps there's a way to generalize and implement all this functionality in the form of JPMML-SparkML conversion options.

@XScarlett
Author

Thank you so much!!!
