A REST server to quickly create test PMML models.
`pmml-zoo` allows you to quickly generate simulated data and create test PMML models by sending a JSON payload to a REST server and getting back the trained model.
What it is not
`pmml-zoo` does not aim to create production models. It is intended for creating models for smoke, integration, and unit tests.
The best way to get started is to use the `pmml-zoo` container image.
$ docker pull ruivieira/pmml-zoo:0.0.1
$ docker run -i --rm -p 5000:5000 ruivieira/pmml-zoo:0.0.1
Assuming the server is running locally, the full REST API will be available at http://0.0.0.0:5000/apidocs.
As an example, let's create a linear regression.
We can send the following JSON payload to `0.0.0.0:5000/model/linear-regression`:
curl --request POST \
  --url http://0.0.0.0:5000/model/linear-regression \
  --header 'content-type: application/json' \
  --data '
{"data": {
  "size": 1000,
  "inputs": [
    {"name": "feature-1",
     "type": "continuous",
     "points": [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]
    },
    {"name": "feature-2",
     "type": "discrete",
     "points": [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]
    },
    {"name": "feature-3",
     "type": "categorical",
     "points": [["low", 2.0], ["medium", 4.0], ["high", 1.0]]
    }
  ],
  "outputs": [
    {"name": "feature-4",
     "type": "continuous",
     "points": [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]
    }]
  }
}' \
  -o model.pmml
Data is simulated by first creating an empirical distribution by interpolating the provided `points`.
This empirical distribution is then sampled `size` times, and the resulting samples become the variable's data.
An important note is that all variables are independent (although spurious correlation may occur).
A complete explanation is provided in the documentation.
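The sampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code `pmml-zoo` runs: it handles numeric variables only, assumes the weights are linearly interpolated between the given values, and needs at least two points with distinct values and a positive total weight. The function name `simulate` is hypothetical.

```python
import bisect
import random


def simulate(points, size, seed=None):
    """Draw `size` samples from a density that linearly interpolates `points`.

    `points` holds (value, weight) pairs with distinct numeric values.
    Linear interpolation is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pts = sorted((float(x), float(w)) for x, w in points)
    xs = [x for x, _ in pts]
    ws = [w for _, w in pts]

    # Cumulative area under the interpolated weights (one trapezoid per segment).
    cum = [0.0]
    for i in range(len(xs) - 1):
        cum.append(cum[-1] + 0.5 * (ws[i] + ws[i + 1]) * (xs[i + 1] - xs[i]))

    def draw():
        # Inverse-transform sampling: pick an area, then find where it is reached.
        u = rng.random() * cum[-1]
        i = min(bisect.bisect_right(cum, u) - 1, len(xs) - 2)
        dx = xs[i + 1] - xs[i]
        slope = (ws[i + 1] - ws[i]) / dx
        area = u - cum[i]
        if abs(slope) < 1e-12:
            t = area / ws[i]  # flat segment: area grows linearly with t
        else:
            # Solve ws[i]*t + slope*t**2/2 == area for t.
            disc = max(ws[i] ** 2 + 2.0 * slope * area, 0.0)
            t = (disc ** 0.5 - ws[i]) / slope
        return xs[i] + min(max(t, 0.0), dx)

    return [draw() for _ in range(size)]


# Simulate the "feature-1" variable from the example payload.
feature_1 = simulate([[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]], 1000, seed=42)
```

Each variable would be simulated with its own call, which is why the resulting columns are independent of one another.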
- `size` is the size of the dataset.
- `points` is a list of data points used to construct the interpolation, in the format `(value, weight)`. For instance, a list of `[(1.0, 2.0), (2.0, 4.0)]` means that the value `2.0` will be more frequent.
- `name` is the feature name, which will be used in the PMML model.
- `type` can be one of `continuous`, `discrete` or `categorical`.
- `inputs` and `outputs` have the same format, with the obvious difference implied by the name.
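When payloads are generated from test code, a small helper can catch malformed entries before they reach the server. The sketch below only checks the fields described above. The helper names `variable` and `make_payload` are hypothetical, and the server may apply additional validation of its own.

```python
import json

ALLOWED_TYPES = {"continuous", "discrete", "categorical"}


def variable(name, type_, points):
    """Build one entry for `inputs` or `outputs`, checking the documented fields."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}, got {type_!r}")
    if not points or any(len(p) != 2 for p in points):
        raise ValueError("points must be a non-empty list of (value, weight) pairs")
    return {"name": name, "type": type_, "points": [list(p) for p in points]}


def make_payload(size, inputs, outputs):
    """Assemble the request body in the same shape as the curl example."""
    return {"data": {"size": size, "inputs": inputs, "outputs": outputs}}


body = make_payload(
    1000,
    inputs=[variable("feature-3", "categorical", [("low", 2.0), ("medium", 4.0), ("high", 1.0)])],
    outputs=[variable("feature-4", "continuous", [(1.0, 2.0), (4.0, 7.3), (7.0, 1.0), (100, 16.0)])],
)
print(json.dumps(body, indent=2))
```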
After sending the above payload, the server responds with the PMML model as XML, which is saved (in this example) to the `model.pmml` file.
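The same request can be made from a test suite without shelling out to curl. Here is a minimal sketch using only the Python standard library. It assumes the server from the container example is listening on port 5000 and reuses the URL from the curl command. The names `build_request` and `fetch_model` are hypothetical helpers, not part of `pmml-zoo`.

```python
import json
import urllib.request

# URL taken from the curl example; adjust it if the server runs elsewhere.
DEFAULT_URL = "http://0.0.0.0:5000/model/linear-regression"


def build_request(payload, url=DEFAULT_URL):
    """Create the POST request carrying the JSON payload (nothing is sent yet)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_model(payload, path="model.pmml", url=DEFAULT_URL):
    """Send the payload and write the returned PMML document to `path`."""
    with urllib.request.urlopen(build_request(payload, url)) as response:
        pmml = response.read()
    with open(path, "wb") as handle:
        handle.write(pmml)
    return path
```

Calling `fetch_model(payload)` with the dictionary form of the JSON shown earlier should leave the model in `model.pmml`, just as the `-o` flag does for curl.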
For now, these are the supported models:
- Linear regression (`/model/linearregression`)
- Random forest classification (`/model/randomforest`)
Please use the issues for any suggestions, feedback, PRs or bugs. Thank you!