How can I save only transformer pipeline to pmml ? #103

liuyonglang · 2021-01-01T09:32:36Z

Hello,
I tried this, but got the error "Expected a pipeline with one or more models, got a pipeline with zero models". Thanks!

vruusmann · 2021-01-01T09:46:44Z

Exact duplicate of #61

liuyonglang · 2021-01-01T10:03:30Z

Yeah, I have seen that. But I use Spark to train the model, so how can I do this there?

vruusmann · 2021-01-01T10:30:29Z

> I have seen that, but I use Spark to train the model

This functionality was implemented into the JPMML-Converter library (base layer for all JPMML family conversion libraries) as detailed in jpmml/jpmml-converter#11

So, the technical capability is there, but it looks like it hasn't been fully integrated with `org.jpmml.sparkml.PMMLBuilder` yet.

Specifically, there needs to be an additional clause for `if(models.size() == 0)` in the following if-else statement:
https://github.com/jpmml/jpmml-sparkml/blob/1.6.2/src/main/java/org/jpmml/sparkml/PMMLBuilder.java#L154-L166

liuyonglang · 2021-01-01T10:45:46Z

Oh. So what should go in the additional clause for `if(models.size() == 0)`? Thanks.

liuyonglang · 2021-01-01T10:48:59Z

Can I add an additional clause for `if(models.size() == 0)` that just does nothing?

liuyonglang · 2021-01-01T11:10:47Z

I checked the code's logic and found that this model cannot be null, so I don't know how to add that condition. Can I create a new BaselineModel?

vruusmann · 2021-01-01T11:23:51Z

> I found that this model cannot be null

This model can very well be null. Simply add some extra nullability checks here and there to make sure that you don't run into an accidental NPE.

For example, the check `(postProcessorNames.size() > 0)` should be changed to `(model != null) && (postProcessorNames.size() > 0)`, etc.

liuyonglang · 2021-01-01T11:32:56Z

I see. So I just modify that class as you said, so that it does not go into the model encoder logic, and it will work. I will try it tomorrow. Thanks, and happy new year.

liuyonglang · 2021-01-02T10:46:27Z

Hello. I tried it. I can now save a feature-only PipelineModel to PMML, but the evaluator cannot load this PMML. So how can I use this PMML?

liuyonglang · 2021-01-02T11:05:23Z

I found that this PMML has no TransformationDictionary, so even when I use the Transformer entry point, I get a result of size 0.
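One quick way to confirm this kind of observation is to count the relevant elements in the generated file. The following is only a minimal sketch using the Python standard library: the helper names `local_name` and `count_elements` are hypothetical, and the sample document (including the PMML 4.4 namespace and the `C7` field attributes) is a hand-written assumption modelled on the output described in this thread, not the actual generated file.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def count_elements(pmml_text, name):
    """Count elements with the given local name, regardless of PMML schema version."""
    root = ET.fromstring(pmml_text)
    return sum(1 for element in root.iter() if local_name(element.tag) == name)


# Hand-written stand-in for a feature-only PMML file (assumed content):
# it has a DataDictionary, but no TransformationDictionary and no model.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="C7" optype="categorical" dataType="string">
      <Value value="11868"/>
      <Value value="__unknown" property="invalid"/>
    </DataField>
  </DataDictionary>
</PMML>"""

has_transformation_dictionary = count_elements(sample, "TransformationDictionary") > 0
derived_field_count = count_elements(sample, "DerivedField")

print("TransformationDictionary present:", has_transformation_dictionary)
print("DerivedField count:", derived_field_count)
```

To inspect a real file, read its contents (for example with `open("feature_6.pmml").read()`) and pass the text to `count_elements`. A file with no `DerivedField` elements carries no transformations.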

liuyonglang · 2021-01-04T02:30:20Z

@vruusmann Hello,
Here are the details of this problem.
The code that builds the pipelines is:
```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler
from pyspark2pmml import PMMLBuilder

# `spark` (the SparkSession) and `data` (the training DataFrame) are assumed to exist.
# BatchStringIndexer, LightGBMClassifier and XGBoostClassifier come from the
# poster's environment; their imports were not part of the original post.

# The original passed handleInvalid="keep" or "skip", which evaluates to "keep"
process_1 = Pipeline(stages=[StringIndexer(inputCol="C7", outputCol="C7_index", handleInvalid="keep")])
process_2 = Pipeline(stages=[BatchStringIndexer(inputCols=["C1", "C2"], outputCols=["C1_index", "C2_index"])])

onehot = OneHotEncoder(inputCol="C7_index", outputCol="C7_index_hot")
vec = VectorAssembler(
    inputCols=["C1", "C3", "C4", "C5", "C6", "C7_index_hot", "C8", "C9", "C10", "C11"],
    outputCol="input_feature",
)
mymodel_tmp = LogisticRegression(featuresCol="input_feature", labelCol="C2")
mymodel_lgb = LightGBMClassifier(featuresCol="input_feature", labelCol="C2")
mymodel_xgb = XGBoostClassifier(
    featuresCol="input_feature",
    labelCol="C2",
    treeMethod="hist",
    maxDeltaStep=0.0,
    seed=12345,
    objective="binary:logistic",
    trainTestRatio=0.9,
    evalMetric="logloss",
    nthread=1,
    numWorkers=2,
    growPolicy="lossguide",
    missing=float(0),
)

process_3 = Pipeline(stages=[process_1, onehot, vec, mymodel_tmp])
process_4 = Pipeline(stages=[process_1, onehot, vec])
process_5 = Pipeline(stages=[mymodel_xgb])
process_6 = Pipeline(stages=[process_1, onehot, vec, mymodel_xgb])
process_7 = Pipeline(stages=[process_1, onehot, vec])
process_8 = Pipeline(stages=[process_1, onehot])
process_9 = Pipeline(stages=[process_1, process_2])
process_10 = Pipeline(stages=[process_1, onehot, process_2])

# model_1 = process_1.fit(data)
# model_2 = process_2.fit(data)
# model_3 = process_3.fit(data)
data_model = process_4.fit(data)
# data_input = data_model.transform(data)
# model_4 = mymodel_lgb.fit(data_input)
# model_5 = process_5.fit(data_input)
# model_6 = process_6.fit(data)
model_7 = process_7.fit(data)
feature_8 = process_8.fit(data)
feature_6 = process_1.fit(data)
feature_9 = process_9.fit(data)
feature_10 = process_10.fit(data)

# model_1.write().overwrite().save(path_1)
# model_2.write().overwrite().save(path_2)
# model_3.write().overwrite().save(path_3)
# model_4.write().overwrite().save(path_4)
# model_5.write().overwrite().save(path_5)

# Option constant looked up through the Py4J gateway, as in the original post
option_compact = spark.sparkContext._jvm.org.jpmml.sparkml.model.HasTreeOptions.OPTION_COMPACT

# Convert the feature-only (transformer-only) pipeline models to PMML files
pmmlBuilder1 = PMMLBuilder(spark.sparkContext, data, feature_8).putOption(None, option_compact, True)
pmmlBuilder1.buildFile("feature_8.pmml")

pmmlBuilder2 = PMMLBuilder(spark.sparkContext, data, feature_6).putOption(None, option_compact, True)
pmmlBuilder2.buildFile("feature_6.pmml")

pmmlBuilder3 = PMMLBuilder(spark.sparkContext, data, feature_9).putOption(None, option_compact, True)
pmmlBuilder3.buildFile("feature_9.pmml")

pmmlBuilder4 = PMMLBuilder(spark.sparkContext, data, feature_10).putOption(None, option_compact, True)
pmmlBuilder4.buildFile("feature_10.pmml")
```

I only modified the jpmml-sparkml code, like this:
```java
org.dmg.pmml.Model model;

if(models.size() == 1){
	model = Iterables.getOnlyElement(models);
} else if(models.size() > 1){
	model = MiningModelUtil.createModelChain(models);
} else if(models.size() == 0){
	// Added clause: a transformer-only pipeline has no model
	model = null;
} else {
	// Unreachable, because the three clauses above cover every possible size
	throw new IllegalArgumentException("Expected a pipeline with one or more models, got a pipeline with zero models");
} // End if

// Added null check: post-processors are attached only when a model exists
if(postProcessorNames.size() > 0 && model != null){
	org.dmg.pmml.Model finalModel = MiningModelUtil.getFinalModel(model);
	Output output = ModelUtil.ensureOutput(finalModel);

	for(FieldName postProcessorName : postProcessorNames){
		DerivedField derivedField = derivedFields.get(postProcessorName);

		encoder.removeDerivedField(postProcessorName);

		OutputField outputField = new OutputField(derivedField.getName(), derivedField.getOpType(), derivedField.getDataType())
			.setResultFeature(ResultFeature.TRANSFORMED_VALUE)
			.setExpression(derivedField.getExpression());

		output.addOutputFields(outputField);
	}
}

PMML pmml = encoder.encodePMML(model);
```

But the resulting feature_6.pmml does not have a TransformationDictionary element. The PMML ends like this: `<Value value="11868"/> <Value value="8981"/> <Value value="14766"/> <Value value="11345"/> <Value value="9571"/> <Value value="9076"/> <Value value="10804"/> <Value value="13217"/> <Value value="14623"/> <Value value="8397"/> <Value value="2141"/> <Value value="23688"/> <Value value="__unknown" property="invalid"/> </DataField> </DataDictionary> </PMML>`

So I cannot get any result using the JPMML-Evaluator Transformer API.

vruusmann closed this as completed Jan 1, 2021

vruusmann reopened this Jan 1, 2021

vruusmann closed this as completed in 33e650b Jan 25, 2021
