Predictions for new data #39

Closed
MassimilianoGrassiDataScience opened this issue Nov 24, 2022 · 3 comments

MassimilianoGrassiDataScience commented Nov 24, 2022

I have been trying AutoPrognosis and was able to develop a model successfully. Now I want to apply it to new data. I loaded the model following the tutorial and then (naively?) called .predict_proba, but it did not work.

I made several attempts, including re-developing the model with different data, and each one failed with a different error.

E.g., the latest error is "This QuantileTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator." In this case, the model is:
{'models': [<autoprognosis.plugins.pipeline.nop_normal_transform_catboost at 0x16f4b31f0>], 'weights': [0.9999999900000002], 'explainer_plugins': [], 'explanations_nepoch': 10000, 'explainers': None}

What should I do between load_model_from_file(model_path) and .predict_proba?

Thanks!

@MassimilianoGrassiDataScience
Author

To give another example, I get the error "'SimpleClassifierAggregator' object has no attribute '_classes'" with the following model:
{'models': [<autoprognosis.plugins.pipeline.fast_ica_uniform_transform_lda at 0x173c63400>, <autoprognosis.plugins.pipeline.nop_scaler_random_forest at 0x173c316d0>, <autoprognosis.plugins.pipeline.variance_threshold_scaler_catboost at 0x173b1edf0>], 'method': 'average', 'explainer_plugins': [], 'explanations_nepoch': 10000, 'clf': <autoprognosis.plugins.ensemble.combos.SimpleClassifierAggregator at 0x173b1eee0>}

Contributor

bcebere commented Nov 24, 2022

Hello @MassimilianoGrassiDataScience

AutoPrognosis selects a model architecture and saves it without training the model. What you load is the architecture only, so you can run your own benchmarks on different folds.

The main README contains such an example:

from pathlib import Path

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

workspace = Path("workspace")
study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"
model = load_model_from_file(output)

# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_estimator(model, X, Y)

print(f"model {model.name()} -> {metrics['clf']}")

# Train the model
model.fit(X, Y)

# Predict the probabilities of each class using the model
model.predict_proba(X)

As you can see, before the predict_proba call, you need to call fit on your data, even if it is the same data you used for conducting the model search.
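For the original question (what goes between load_model_from_file and .predict_proba when scoring new data), the order of calls can be sketched as below. This is only an illustration: fit_then_predict_proba is a hypothetical helper, not part of AutoPrognosis, and X_new is a placeholder for the unseen rows, assumed to have the same feature columns as the training data.

```python
from typing import Any, Protocol


class SupportsFitPredictProba(Protocol):
    """Anything exposing fit() and predict_proba(), like the loaded model."""

    def fit(self, X: Any, y: Any) -> Any: ...

    def predict_proba(self, X: Any) -> Any: ...


def fit_then_predict_proba(
    model: SupportsFitPredictProba, X_train: Any, y_train: Any, X_new: Any
) -> Any:
    """Train the loaded (still unfitted) architecture, then score new rows.

    Hypothetical helper: it only enforces the call order, fit() first and
    predict_proba() second, that the loaded model requires.
    """
    model.fit(X_train, y_train)
    return model.predict_proba(X_new)


# Usage with the README example above (X_new is a placeholder name):
#   model = load_model_from_file(output)
#   probabilities = fit_then_predict_proba(model, X, Y, X_new)
```

The training data passed here can be the same data used for the model search; only X_new needs to be unseen.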

We will improve the error message in the future.
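Until then, a clearer message can be produced on the user side with a thin wrapper. The sketch below is not how AutoPrognosis handles this internally; FitGuard is a hypothetical name, and it only tracks whether fit was called through the wrapper itself.

```python
from typing import Any


class FitGuard:
    """Hypothetical wrapper that refuses to predict before fit() was called."""

    def __init__(self, model: Any) -> None:
        self._model = model
        self._fitted = False

    def fit(self, X: Any, y: Any) -> "FitGuard":
        # Delegate training to the wrapped model and remember that it happened.
        self._model.fit(X, y)
        self._fitted = True
        return self

    def predict_proba(self, X: Any) -> Any:
        # Fail early with one explicit message instead of a plugin-specific error.
        if not self._fitted:
            raise RuntimeError(
                "The loaded model is an untrained architecture. "
                "Call fit(X, y) before predict_proba(X)."
            )
        return self._model.predict_proba(X)


# Usage with the README example above:
#   guarded = FitGuard(load_model_from_file(output))
#   guarded.fit(X, Y)
#   guarded.predict_proba(X)
```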

Let me know if this fixes your problem.

@MassimilianoGrassiDataScience
Author

Thank you! It works now!

@bcebere closed this as completed Nov 24, 2022