How can I save only transformer pipeline to pmml ? #103

Closed
liuyonglang opened this issue Jan 1, 2021 · 11 comments

Comments

@liuyonglang

liuyonglang commented Jan 1, 2021

Hello,
I tried this, but I get the error "Expected a pipeline with one or more models, got a pipeline with zero models". Thanks!

@vruusmann
Member

Exact duplicate of #61

@liuyonglang
Author

Yes, I have seen that. But I use Spark to train the model. How can I do this there?

@vruusmann
Member

I have seen that. But I use Spark to train the model

This functionality was implemented in the JPMML-Converter library (the base layer for all JPMML family conversion libraries), as detailed in jpmml/jpmml-converter#11

So the technical capability is there, but it looks like it hasn't been fully integrated with org.jpmml.sparkml.PMMLBuilder yet.

Specifically, there needs to be an additional clause for if(models.size() == 0) in the following if-else statement:
https://github.com/jpmml/jpmml-sparkml/blob/1.6.2/src/main/java/org/jpmml/sparkml/PMMLBuilder.java#L154-L166

vruusmann reopened this Jan 1, 2021

@liuyonglang
Author

Oh. So what should go in the additional clause for if(models.size() == 0)? Thanks.

@liuyonglang
Author

Can I add an additional clause for if(models.size() == 0) that just does nothing?

@liuyonglang
Author

I checked the code's logic and found that this model cannot be null, so I don't know how to add that condition. Can I instantiate a new BaselineModel instead?

@vruusmann
Member

I found that this model cannot be null

This model can very well be null. Simply add some extra nullability checks here and there to make sure that you don't run into an accidental NPE.

For example, this check (postProcessorNames.size() > 0) should be changed to (model != null) && (postProcessorNames.size() > 0), etc.

@liuyonglang
Author

I see. So I just modify that class as you said and skip the modelEncoder logic, and it will work. I will try it tomorrow. Thanks, and happy new year.

@liuyonglang
Author

Hello. I tried it. I can now save a PMML file from a PipelineModel that contains only feature transformers, but the evaluator cannot load this PMML. So how can I use this PMML?

@liuyonglang
Author

I found that this PMML has no TransformationDictionary, so even when I use the Transformer entry point, I just get a result of size 0.

@liuyonglang
Author

@vruusmann Hello,
Here are the details of this problem.
The code that generates the pipelines is:
```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler
from pyspark2pmml import PMMLBuilder
# BatchStringIndexer, LightGBMClassifier and XGBoostClassifier are imported
# from their own packages (not shown). `spark` is the active SparkSession and
# `data` is the training DataFrame.

# The original read handleInvalid="keep" or "skip", which Python evaluates to "keep".
process_1 = Pipeline(stages=[StringIndexer(inputCol="C7", outputCol="C7_index", handleInvalid="keep")])
process_2 = Pipeline(stages=[BatchStringIndexer(inputCols=["C1", "C2"], outputCols=["C1_index", "C2_index"])])

onehot = OneHotEncoder(inputCol="C7_index", outputCol="C7_index_hot")
vec = VectorAssembler(
    inputCols=['C1', 'C3', 'C4', 'C5', 'C6', 'C7_index_hot', 'C8', 'C9', 'C10', 'C11'],
    outputCol='input_feature')
mymodel_tmp = LogisticRegression(featuresCol='input_feature', labelCol="C2")
mymodel_lgb = LightGBMClassifier(featuresCol='input_feature', labelCol="C2")
mymodel_xgb = XGBoostClassifier(
    featuresCol='input_feature', labelCol="C2",
    treeMethod='hist',
    maxDeltaStep=0.0,
    seed=12345,
    objective='binary:logistic',
    trainTestRatio=0.9,
    evalMetric='logloss',
    nthread=1,
    numWorkers=2,
    growPolicy='lossguide',
    missing=float(0))
process_3 = Pipeline(stages=[process_1, onehot, vec, mymodel_tmp])
process_4 = Pipeline(stages=[process_1, onehot, vec])
process_5 = Pipeline(stages=[mymodel_xgb])
process_6 = Pipeline(stages=[process_1, onehot, vec, mymodel_xgb])
process_7 = Pipeline(stages=[process_1, onehot, vec])
process_8 = Pipeline(stages=[process_1, onehot])
process_9 = Pipeline(stages=[process_1, process_2])
process_10 = Pipeline(stages=[process_1, onehot, process_2])

#model_1 = process_1.fit(data)
#model_2 = process_2.fit(data)
#model_3 = process_3.fit(data)
data_model = process_4.fit(data)
#data_input = data_model.transform(data)
#model_4 = mymodel_lgb.fit(data_input)
#model_5 = process_5.fit(data_input)
#model_6 = process_6.fit(data)
model_7 = process_7.fit(data)
feature_8 = process_8.fit(data)
feature_6 = process_1.fit(data)
feature_9 = process_9.fit(data)
feature_10 = process_10.fit(data)

#model_1.write().overwrite().save(path_1)
#model_2.write().overwrite().save(path_2)
#model_3.write().overwrite().save(path_3)
#model_4.write().overwrite().save(path_4)
#model_5.write().overwrite().save(path_5)

# The same converter option is passed to every builder below.
option_compact = spark.sparkContext._jvm.org.jpmml.sparkml.model.HasTreeOptions.OPTION_COMPACT

# Export the four transformer-only pipelines (no model stage) to PMML.
pmmlBuilder1 = PMMLBuilder(spark.sparkContext, data, feature_8).putOption(None, option_compact, True)
pmmlBuilder1.buildFile("feature_8.pmml")

pmmlBuilder2 = PMMLBuilder(spark.sparkContext, data, feature_6).putOption(None, option_compact, True)
pmmlBuilder2.buildFile("feature_6.pmml")

pmmlBuilder3 = PMMLBuilder(spark.sparkContext, data, feature_9).putOption(None, option_compact, True)
pmmlBuilder3.buildFile("feature_9.pmml")

pmmlBuilder4 = PMMLBuilder(spark.sparkContext, data, feature_10).putOption(None, option_compact, True)
pmmlBuilder4.buildFile("feature_10.pmml")
```

The only change I made to the jpmml-sparkml code is this:
```java
org.dmg.pmml.Model model;

if(models.size() == 1){
	model = Iterables.getOnlyElement(models);
} else

if(models.size() > 1){
	model = MiningModelUtil.createModelChain(models);
} else

// Added clause: a transformer-only pipeline has no model
if(models.size() == 0){
	model = null;
} else

// Never reached (size() is never negative), but keeps "model" definitely assigned for the compiler
{
	throw new IllegalArgumentException("Expected a pipeline with one or more models, got a pipeline with zero models");
} // End if

// Added null check: post-processors can only be attached to an existing model
if(postProcessorNames.size() > 0 && model != null){
	org.dmg.pmml.Model finalModel = MiningModelUtil.getFinalModel(model);
	Output output = ModelUtil.ensureOutput(finalModel);

	for(FieldName postProcessorName : postProcessorNames){
		DerivedField derivedField = derivedFields.get(postProcessorName);

		encoder.removeDerivedField(postProcessorName);

		OutputField outputField = new OutputField(derivedField.getName(), derivedField.getOpType(), derivedField.getDataType())
			.setResultFeature(ResultFeature.TRANSFORMED_VALUE)
			.setExpression(derivedField.getExpression());

		output.addOutputFields(outputField);
	}
}

PMML pmml = encoder.encodePMML(model);
```

But the resulting feature_6.pmml has no TransformationDictionary element. The end of the file looks like this:

```xml
<Value value="11868"/>
<Value value="8981"/>
<Value value="14766"/>
<Value value="11345"/>
<Value value="9571"/>
<Value value="9076"/>
<Value value="10804"/>
<Value value="13217"/>
<Value value="14623"/>
<Value value="8397"/>
<Value value="2141"/>
<Value value="23688"/>
<Value value="__unknown" property="invalid"/>
</DataField>
</DataDictionary>
</PMML>
```

So I cannot get a result using the Transformer API of jpmml-evaluator.
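
A quick way to confirm which sections the converter actually wrote, before involving the evaluator, is to list the direct children of the PMML root element. The following is a minimal sketch using only the Python standard library. The helper names `pmml_sections` and `has_transformation_dictionary` and the sample document are made up for illustration, and the PMML 4.4 namespace in the sample is an assumption about what the converter emits.

```python
import xml.etree.ElementTree as ET


def pmml_sections(pmml_text):
    """Return the local names of the direct children of the <PMML> root element."""
    root = ET.fromstring(pmml_text)
    # Namespaced tags look like "{http://www.dmg.org/PMML-4_4}DataDictionary";
    # keep only the part after the closing brace.
    return [child.tag.split("}", 1)[-1] for child in root]


def has_transformation_dictionary(pmml_text):
    """True if the document has a top-level TransformationDictionary element."""
    return "TransformationDictionary" in pmml_sections(pmml_text)


# Hypothetical sample shaped like the tail of feature_6.pmml shown above.
SAMPLE_WITHOUT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary>
    <DataField name="C7" optype="categorical" dataType="string">
      <Value value="11868"/>
      <Value value="__unknown" property="invalid"/>
    </DataField>
  </DataDictionary>
</PMML>"""

print(pmml_sections(SAMPLE_WITHOUT))
print(has_transformation_dictionary(SAMPLE_WITHOUT))
```

For a real export, the file contents can be passed as bytes, for example `pmml_sections(open("feature_6.pmml", "rb").read())`. A list that stops at `DataDictionary` matches the symptom described above: the derived fields never made it into the document.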
