Predictions for new data #39

MassimilianoGrassiDataScience · 2022-11-24T12:36:46Z

I was trying AutoPrognosis and was able to develop the model successfully. Now I want to apply it to new data. I loaded the model following the tutorial and then (naively?) called .predict_proba, but it did not work.

I tried several times, including re-developing the model with different data. Every attempt ended in an error, but the error was different each time.

For example, the latest error is: "This QuantileTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator." In this case, the model is:
{'models': [<autoprognosis.plugins.pipeline.nop_normal_transform_catboost at 0x16f4b31f0>], 'weights': [0.9999999900000002], 'explainer_plugins': [], 'explanations_nepoch': 10000, 'explainers': None}

What should I do between load_model_from_file(model_path) and .predict_proba?

Thanks!
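The QuantileTransformer message above is scikit-learn's standard NotFittedError text. It appears when a preprocessing step is asked to transform data before anything has fitted it (presumably the "normal_transform" step of the loaded pipeline, since that is where a QuantileTransformer would sit). A minimal sketch reproducing the same message with a bare scikit-learn QuantileTransformer, used here as a stand-in rather than the AutoPrognosis plugin itself:

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Stand-in for the "normal_transform" pipeline step (an assumption about what it wraps).
qt = QuantileTransformer(n_quantiles=50, output_distribution="normal")

try:
    qt.transform(X)  # transform before fit: the same failure as in the report
    error_message = None
except NotFittedError as exc:
    error_message = str(exc)

print(error_message)

# Once fit() has been called, the same transform succeeds.
qt.fit(X)
X_t = qt.transform(X)
print(X_t.shape)
```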


MassimilianoGrassiDataScience · 2022-11-24T12:56:43Z

Another example: I get the error "'SimpleClassifierAggregator' object has no attribute '_classes'" with the following model:
{'models': [<autoprognosis.plugins.pipeline.fast_ica_uniform_transform_lda at 0x173c63400>, <autoprognosis.plugins.pipeline.nop_scaler_random_forest at 0x173c316d0>, <autoprognosis.plugins.pipeline.variance_threshold_scaler_catboost at 0x173b1edf0>], 'method': 'average', 'explainer_plugins': [], 'explanations_nepoch': 10000, 'clf': <autoprognosis.plugins.ensemble.combos.SimpleClassifierAggregator at 0x173b1eee0>}
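The second error follows the same pattern in plain Python: predict_proba() reads an attribute that only fit() creates. The classes below are hypothetical and heavily simplified, written only for illustration. They are not AutoPrognosis's SimpleClassifierAggregator, and we are only assuming that the real aggregator sets _classes during fitting in a similar way.

```python
from typing import List, Sequence


class ConstantModel:
    """Hypothetical base model that always returns the same class probabilities."""

    def __init__(self, proba: Sequence[float]) -> None:
        self.proba = list(proba)

    def fit(self, X, y) -> "ConstantModel":
        return self

    def predict_proba(self, X) -> List[List[float]]:
        return [list(self.proba) for _ in X]


class AveragingAggregator:
    """Hypothetical ensemble that averages its members' class probabilities."""

    def __init__(self, models) -> None:
        self.models = models  # configuration only; nothing is learned here

    def fit(self, X, y) -> "AveragingAggregator":
        for m in self.models:
            m.fit(X, y)
        self._classes = sorted(set(y))  # this attribute only exists after fit()
        return self

    def predict_proba(self, X) -> List[List[float]]:
        n_classes = len(self._classes)  # AttributeError if fit() was never called
        outputs = [m.predict_proba(X) for m in self.models]
        return [
            [sum(out[i][c] for out in outputs) / len(outputs) for c in range(n_classes)]
            for i in range(len(X))
        ]


ensemble = AveragingAggregator([ConstantModel([0.2, 0.8]), ConstantModel([0.6, 0.4])])
X_new = [[1.0], [2.0]]

try:
    ensemble.predict_proba(X_new)
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)  # 'AveragingAggregator' object has no attribute '_classes'

ensemble.fit([[0.0], [1.0]], [0, 1])
proba = ensemble.predict_proba(X_new)
print(proba)
```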

bcebere · 2022-11-24T14:29:27Z

Hello @MassimilianoGrassiDataScience

AutoPrognosis selects a model architecture and saves it without training it. What you load is the untrained architecture, which you can then benchmark yourself on different folds.
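Because the saved object is an unfitted template, benchmarking it on folds means training a fresh copy per fold and scoring it on the held-out part. A minimal sketch of that idea with scikit-learn's StratifiedKFold and clone. The pipeline is a stand-in for the loaded AutoPrognosis model, and the AUC metric and three folds are assumptions for illustration. Inside AutoPrognosis, evaluate_estimator plays this role, and we do not assume its models support sklearn's clone.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

X, y = load_breast_cancer(return_X_y=True)

# Unfitted "architecture": a stand-in for the object returned by load_model_from_file.
template = make_pipeline(
    QuantileTransformer(n_quantiles=100, output_distribution="normal"),
    LogisticRegression(max_iter=1000),
)

aucs = []
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X, y):
    model = clone(template)  # fresh, untrained copy for every fold
    model.fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], proba))

print(f"AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```

The template itself is never fitted, so every fold starts from the same untrained architecture, which is the same property the saved model.p has.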

The main README contains an example:

```python
from pathlib import Path

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

workspace = Path("workspace")
study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"
model = load_model_from_file(output)

# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_estimator(model, X, Y)

print(f"model {model.name()} -> {metrics['clf']}")

# Train the model
model.fit(X, Y)

# Predict the probabilities of each class using the model
model.predict_proba(X)
```

As you can see, you need to call fit on your data before calling predict_proba, even if it is the same data you used for the model search.
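For the original question (scoring data the model has never seen), the order of calls is the same: fit on the data you have, then call predict_proba on the new rows. A minimal sketch with a scikit-learn pipeline standing in for the loaded model. The hold-out split used to simulate "new data", the pickle-based persistence of the fitted model and the file name fitted_model.p are all assumptions for illustration, not AutoPrognosis's own workflow.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
# Pretend 20% of the rows are "new data" arriving after model selection.
X_train, X_new, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# Stand-in for model = load_model_from_file(...): an unfitted pipeline.
model = make_pipeline(
    QuantileTransformer(n_quantiles=100, output_distribution="normal"),
    LogisticRegression(max_iter=1000),
)

model.fit(X_train, y_train)             # the step that was missing
new_proba = model.predict_proba(X_new)  # one row per new sample, one column per class

# Optionally persist the *fitted* model so later scoring runs can skip fit().
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fitted_model.p"  # hypothetical file name
    path.write_bytes(pickle.dumps(model))
    reloaded = pickle.loads(path.read_bytes())
    reloaded_proba = reloaded.predict_proba(X_new)

print(new_proba[:3])
```

Saving the fitted object under a separate name keeps the study's model.p (the untrained architecture) intact for further benchmarking.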

We will improve the error message in the future.

Let me know if this fixes your problem.

MassimilianoGrassiDataScience · 2022-11-24T15:47:09Z

Thank you! It works now!

bcebere closed this as completed Nov 24, 2022