Uncertainty: Conformal Prediction V1 #802
Conversation
Commits (messages truncated):
- … conformal_prediction_energy_hospital_load.ipynb example notebook.
- …on_energy_hospital_load.ipynb example notebook.
- …_energy_hospital_load.ipynb example notebook.
- …ction_energy_hospital_load.ipynb.
- …prediction_energy_hospital_load.ipynb.
- …l input params for m1 model except for quantiles.
- …hospital_load.ipynb hence 4 models m1-4.
- …hospital_load.ipynb hence 4 models m1-4.
- …_energy_hospital_load.ipynb.
- …al_prediction_energy_hospital_load.ipynb.
- …and plot_forecast.py files.
- …lude val_cov_pct and make fold_overlap_pct a dependent variable.
- …l_load_enbpi.ipynb and cross_validation_energy_hospital_load.ipynb.
- …load.ipynb and feature-use/conformal_prediction_energy_hospital_load_enbpi.ipynb.
- …_enbpi_agg.ipynb.
- …diction.ipynb and uncertainty_estimation.ipynb to uncertainty_quantile_regression.ipynb.
- …rmalize() method.
Great work Kevin!
As this exposes a lot of new UI, I focused on critically reviewing those areas.
Thus far, I have only reviewed the forecaster.py file. I will review the rest later, but am adding a review already so you can move forward with the changes.
neuralprophet/forecaster.py (Outdated)
```python
# Conformal prediction interval with q
if self.q_hats:
    if self.conformal_method == "naive":
        df["yhat1 - qhat1"] = df["yhat1"] - self.q_hats[0]
```
I think it would be more understandable to the general user if we keep uncertainty naming consistent and also just show one uncertainty value in the df. As we currently use `f"yhat1 {quantile_lo}%"` and `f"yhat1 {quantile_hi}%"`, I suggest we stick with those.
The CP method here is naive, so it does not take into account quantile low and quantile high. That is for CQR. Naive is only for yhat ± q, so saying `f"yhat1 {quantile_lo}%"` is incorrect here.
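For reference, a minimal sketch of the naive split-conformal interval being described here, assuming a calibration frame with `y` and `yhat1` columns (the helper name and the bare `np.quantile` call without a finite-sample correction are illustrative assumptions, not the PR's implementation):

```python
import numpy as np
import pandas as pd

def naive_conformal_interval(df_cal: pd.DataFrame, df_test: pd.DataFrame, alpha: float) -> pd.DataFrame:
    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.abs(df_cal["y"] - df_cal["yhat1"]).values
    # q_hat is the (1 - alpha) empirical quantile of the scores.
    q_hat = np.quantile(scores, 1 - alpha)
    out = df_test.copy()
    # Symmetric band around the point forecast; QR quantiles are never consulted.
    out["yhat1 - qhat1"] = out["yhat1"] - q_hat
    out["yhat1 + qhat1"] = out["yhat1"] + q_hat
    return out
```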
neuralprophet/forecaster.py (Outdated)
```python
else:  # self.conformal_method == "cqr"
    quantile_hi = str(max(self.config_train.quantiles) * 100)
    quantile_lo = str(min(self.config_train.quantiles) * 100)
    df[f"yhat1 {quantile_hi}% - qhat1"] = df[f"yhat1 {quantile_hi}%"] - self.q_hats[0]
```
In the case of CQR, I think my comment above also applies - let's simply overwrite `f"yhat1 {quantile_lo}%"` and `f"yhat1 {quantile_hi}%"`. IMO, when a user explicitly executes these conformalize steps, it should be evident enough that the outcome is the conformalized quantile.
I don't agree on this front. The only way it is evident is by stating `f"yhat1 {quantile_hi}% - qhat1"`.
df["yhat1 - qhat1"] = df["yhat1"] - self.q_hats[0] | ||
df["yhat1 + qhat1"] = df["yhat1"] + self.q_hats[0] | ||
else: # self.conformal_method == "cqr" | ||
quantile_hi = str(max(self.config_train.quantiles) * 100) |
This will skip all but the top and bottom quantiles - if a user specifies more than 2, they are not conformalized. We should either raise an error in this case in the conformalize step (and here), or adapt the conformalize method for more than 2 quantiles. Or am I misreading something here?
If the CP is "naive", then NP doesn't use the quantiles at all. If CP is "CQR", then it will take the largest and smallest values in the quantiles list, regardless of the list's length. I can add an assert check that the quantiles list needs to have at least 2 values, for high and low, but that can be done as a separate PR.
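A minimal sketch of the guard mentioned here, assuming it would run at the start of the conformalize step (hypothetical helper name, not part of this PR):

```python
def _check_cqr_quantiles(quantiles: list) -> None:
    # CQR needs at least a low and a high quantile to conformalize.
    if quantiles is None or len(quantiles) < 2:
        raise ValueError("CQR requires at least two quantiles (low and high).")
```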
```diff
@@ -3012,3 +3025,30 @@ def _reshape_raw_predictions_to_forecst_df(self, df, predicted, components):
     yhat_df = pd.Series(yhat, name=comp).set_axis(df_forecast.index)
     df_forecast = pd.concat([df_forecast, yhat_df], axis=1, ignore_index=False)
     return df_forecast
+
+    def conformalize(self, df_cal, alpha, method="naive", plotting_backend="default"):
```
If a user configures QR, then uses CP, I would expect them to receive CQR by default. It appears to me that the default `method="naive"` will, however, ignore the QR estimates. Maybe we could even remove the `method` arg, as QR + CP = CQR and no-QR + CP = naive CP. Does that make sense?
No, naive is a legitimate conformal prediction method. The user should be able to use it regardless of whether there is any QR or not. So my design philosophy is that the user can use whatever CP method regardless of the trained model. As for what the default method should be, for QR or not QR, that is debatable. I am leaning towards creating an "auto" option to encompass what you said: QR + CP = CQR and no-QR + CP = naive CP. But that can be added in a new PR.
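For illustration, the proposed "auto" option could reduce to a thin dispatch like the following sketch (hypothetical helper, not part of this PR):

```python
def _resolve_cp_method(method: str, quantiles: list) -> str:
    # "auto" rule from the discussion: QR + CP = CQR, no-QR + CP = naive CP.
    if method == "auto":
        return "cqr" if quantiles and len(quantiles) >= 2 else "naive"
    return method
```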
I would definitely love to see a more unified approach to our different uncertainty modelling approaches - it's getting rather confusing to me how different their APIs are, and how to use or combine them.
```python
        df_cal : pd.DataFrame
            calibration dataframe
        alpha : float
            user-specified significance level of the prediction interval
```
What happens if quantiles in QR were already set? Shouldn't it then use the same? We could even make it optional, automatically retrieving the QR quantiles and failing if they are not set, non-symmetric, or more than two?
Not necessarily; you can have an `alpha` that is different from QR's. It is possible, by default, to retrieve the QR quantiles, check for symmetry, and automatically set `alpha`, but that can be done as a separate PR.
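A sketch of that possible follow-up, assuming symmetric QR quantiles (hypothetical helper; e.g. quantiles `[0.05, 0.95]` would yield `alpha = 0.1`):

```python
def _alpha_from_qr_quantiles(quantiles: list) -> float:
    # Derive alpha from symmetric QR quantiles: alpha = lo + (1 - hi).
    lo, hi = min(quantiles), max(quantiles)
    if abs((1 - hi) - lo) > 1e-9:
        raise ValueError("QR quantiles must be symmetric to derive alpha.")
    return lo + (1 - hi)
```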
```python
                * ``cqr``: Conformalized Quantile Regression
        """
        df_cal = self.predict(df_cal)
        if isinstance(plotting_backend, str) and plotting_backend == "default":
```
Why do we need to check for `isinstance(plotting_backend, str)`?
To ensure that `plotting_backend` is not `None`, as it is one of the default values.
```python
q_hats = []
noncon_scores_list = _get_nonconformity_scores(df_cal, method, quantiles)

for noncon_scores in noncon_scores_list:
```
Maybe we can move this part to a separate method/function in a subsequent PR.
Good work Kevin!
We can merge this PR, and address the comments in a subsequent PR.
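If this loop were factored out as suggested, the helper might look roughly like the following sketch (name is an assumption; the finite-sample `(n + 1)` correction used by formal split CP is omitted for brevity):

```python
import numpy as np

def _get_q_hats(noncon_scores_list: list, alpha: float) -> list:
    # One q_hat per score array: the (1 - alpha) empirical quantile
    # of the nonconformity scores from the calibration set.
    return [float(np.quantile(scores, 1 - alpha)) for scores in noncon_scores_list]
```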
```python
    scores_list = [scores_df["scores"].values]
else:  # method == "naive"
    # Naive nonconformity scoring function
    scores_list = [abs(df["y"] - df["yhat1"]).values]
```
Seems like we are currently only doing this for one-step ahead? Maybe we can extend this to multiple forecast steps in a subsequent PR.
Yes, it is hardcoded so it only does one-step ahead. I agree we can create a subsequent PR to enable conformal prediction for multiple steps.
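A hedged sketch of what the multi-step extension could look like, assuming forecast columns `yhat1 ... yhat{n_forecasts}` and one score array per horizon (illustrative only, not the follow-up PR):

```python
import numpy as np
import pandas as pd

def naive_scores_multistep(df: pd.DataFrame, n_forecasts: int) -> list:
    # Absolute-residual nonconformity scores, one array per forecast step.
    return [
        np.abs(df["y"] - df[f"yhat{step}"]).dropna().values
        for step in range(1, n_forecasts + 1)
    ]
```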
```python
        nonconformity scores from the calibration datapoints

    """
    quantile_hi = None
```
Maybe we can extend this to an arbitrary number of quantiles in a subsequent PR.
I don't follow. CQR only looks at the lowest and highest quantiles, regardless of the number of quantiles in between (if any). Maybe we can extend this for more advanced versions of conformal prediction, although I don't know of any that need more than the high and low quantiles.
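For context, the standard CQR nonconformity score (Romano et al., 2019) indeed only uses the outermost quantile columns; a sketch using this PR's column naming, where `quantile_lo`/`quantile_hi` are percentage strings such as "5.0" and "95.0":

```python
import numpy as np
import pandas as pd

def cqr_scores(df: pd.DataFrame, quantile_lo: str, quantile_hi: str) -> np.ndarray:
    # Score = distance by which y falls outside the QR band [lo, hi];
    # negative when y lies inside the band.
    lo = df[f"yhat1 {quantile_lo}%"]
    hi = df[f"yhat1 {quantile_hi}%"]
    return np.maximum(lo - df["y"], df["y"] - hi).values
```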
```diff
@@ -812,6 +816,18 @@ def predict(self, df, decompose=True, raw=False):
     forecast = pd.concat((forecast, fcst), ignore_index=True)
     df = df_utils.return_df_in_original_format(forecast, received_ID_col, received_single_time_series)
     self.predict_steps = self.n_forecasts
+    # Conformal prediction interval with q
```
I think this code block can be isolated and moved to the conformalize method if I understand this correctly?
No, I have it so that when `.predict()` outputs the forecast_df, that df also contains the forecasted value plus/minus the conformal prediction interval. For example:

| yhat1 - qhat1 | yhat1 + qhat1 |
|---|---|

Just like what's been done with QR, which gives forecast columns like:

| yhat1 5.0% | yhat1 95.0% |
|---|---|
```diff
@@ -3000,3 +3015,46 @@ def _reshape_raw_predictions_to_forecst_df(self, df, predicted, components):
     yhat_df = pd.Series(yhat, name=comp).set_axis(df_forecast.index)
     df_forecast = pd.concat([df_forecast, yhat_df], axis=1, ignore_index=False)
     return df_forecast
+
+    def conformalize(self, df_cal, alpha, method="naive", plotting_backend="default"):
```
Currently, `predict` implicitly assumes that this method has been called beforehand. We could:
a) To improve encapsulation, move the code block from predict here, call this method `conformal_predict` or similar, and have it accept a prediction `df` which is passed on to `predict` internally.
b) Alternatively, split this into a three-step procedure: `compute_conformalization`, `predict`, `conformalize_prediction`.
Both would have the advantage of separating regular prediction and conformal prediction. There may be an even better approach though.
Please see PR #1044. This implements what you have for a). There is a `conformal_predict()` method that combines `conformalize()` and `predict()`, and it will need both the calibration and test sets as input for split CP. With this implemented, `conformalize()` will then be removed. Also, the `config_conformal` will no longer be necessary and will be removed. @noxan
Also, for future bootstrapped (e.g. Jackknife+ and CV+ based) CP methods, a calibration set will not be needed, which is why `calibration_df` is set to `None` as the default in `conformal_predict()`.
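A rough sketch of option a) as described in this reply, written as a free function for readability (PR #1044 may differ in names and defaults):

```python
def conformal_predict(model, df, calibration_df=None, alpha=0.1, method="naive"):
    # Combined interface: calibrate on calibration_df, then predict on df.
    # calibration_df defaults to None because future bootstrapped CP methods
    # (e.g. Jackknife+, CV+) need no held-out calibration set.
    if calibration_df is not None:
        model.conformalize(calibration_df, alpha, method=method)
    return model.predict(df)
```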
```python
            Options
                * (default) ``naive``: Naive or Absolute Residual
                * ``cqr``: Conformalized Quantile Regression
        plotting_backend : str
```
Maybe we can separate the plotting functionality into a separate method in a subsequent PR.
```python
    df_cal, alpha, method, self.config_train.quantiles, plotting_backend
)

# def conformalize_predict(self, df, df_cal, alpha, method="naive"):
```
Oh, I see, you already created this method.
Maybe we can make conformalize a util instead of a class method and use this method as the main interface?
Possible, but I intend to give the user the option to run either the `.conformalize()` and `.predict()` methods separately, or run them as one with `.conformalize_prediction()`, just like scikit-learn's `.fit_transform()`. However, the reason why I haven't yet is that `.conformalize_prediction()` will need two input datasets, one the calibration set and another the test set. Maybe that is fine, because `.train()` gives the option of adding a validation set alongside the train set.
```diff
@@ -131,6 +138,17 @@ def plot(
     alpha=0.2,
 )
+
+# Plot any conformal prediction intervals
```
For the default plotting method, I suggest we only plot one uncertainty type at a time, as it may be confusing to most users to see multiple uncertainty types at once.
```python
fig1.show()
fig2.show()
# With auto-regression enabled
# TO-DO: Fix Assertion error n_train >= 1
```
How can we best resolve this issue?
Will need to dig into it.
```python
q_hats: list


ConfigConformalPrediction = Conformal
```
There is no point in having `Conformal` and `ConfigConformalPrediction`, right? For other dataclasses we had this separation because they might contain multiple items, yet this is not the case here.
Not necessarily for `ConfigConformalPrediction`, but @ourownstory wants me to put `q_hats` and `method` into a conformal/uncertainty config dataclass instead of keeping them as primary class variables on `NeuralProphet`. So I did.
""" | ||
df_cal = self.predict(df_cal) | ||
if isinstance(plotting_backend, str) and plotting_backend == "default": | ||
plotting_backend = "matplotlib" |
Why is another selection of the plotting backend happening here? We should definitely centralize this to make the migration to plotly easier.
It's needed in order to print out the One-Sided Interval Width with q plot when the user is running `.conformalize()`. This plot is very helpful in visualizing the `q` for a given `alpha`, where `q` is the basis of the prediction interval. See the uncertainty_conformal_prediction.ipynb.
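For readers unfamiliar with that plot, a minimal standalone sketch of the idea, assuming an array of nonconformity scores (this is not the notebook's plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_interval_width_vs_alpha(scores: np.ndarray) -> None:
    # One-sided interval width q_hat as a function of the significance level alpha.
    alphas = np.linspace(0.01, 0.5, 50)
    q_hats = [np.quantile(scores, 1 - a) for a in alphas]
    plt.plot(alphas, q_hats)
    plt.xlabel("alpha")
    plt.ylabel("one-sided interval width (q_hat)")
    plt.title("One-Sided Interval Width with q")
    plt.show()
```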
```python
                * ``cqr``: Conformalized Quantile Regression

        quantiles : list
            list of quantiles for quantile regression uncertainty estimate
```
Do these conformal prediction methods rely on quantile regression, i.e. reuse its parameters, or are those separate? Just asking as I'm trying to better understand how the two methods are connected (or not).
Right now there are two available conformal prediction methods: naive and CQR. Naive does conformal prediction independently of the QR, while CQR applies conformal prediction to the QR quantiles themselves rather than to the point prediction.
df[f"yhat1 {quantile_hi}% - qhat1"] = df[f"yhat1 {quantile_hi}%"] - self.config_conformal.q_hats[0] | ||
df[f"yhat1 {quantile_hi}% + qhat1"] = df[f"yhat1 {quantile_hi}%"] + self.config_conformal.q_hats[0] | ||
df[f"yhat1 {quantile_lo}% - qhat1"] = df[f"yhat1 {quantile_lo}%"] - self.config_conformal.q_hats[0] | ||
df[f"yhat1 {quantile_lo}% + qhat1"] = df[f"yhat1 {quantile_lo}%"] + self.config_conformal.q_hats[0] |
I'm a bit confused with all the extra output values. What benefit do they bring me as a user and how would I interpret them? Maybe conformal prediction is an advanced feature overall, yet it would be great to have some guide or instructions on how to make use of this method or at least to better understand what I'm missing out on 😅
Incorporate the first two split conformal prediction (SCP) methods, naive and CQR, to create prediction intervals for NeuralProphet.
Example code: use the conformal_prediction_energy_hospital_load.ipynb example notebook as a test.
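A hedged end-to-end sketch of how the PR's API is exercised in that notebook, assuming prepared `df_train`, `df_cal`, and `df_test` dataframes (argument names follow the diff above and may differ in the merged version):

```python
from neuralprophet import NeuralProphet

# Quantiles are only required for the CQR method; naive ignores them.
m = NeuralProphet(quantiles=[0.05, 0.95])
m.fit(df_train, freq="H")

# Calibrate on a held-out set, then predict with conformal intervals.
m.conformalize(df_cal, alpha=0.1, method="cqr")
forecast = m.predict(df_test)
# naive adds "yhat1 - qhat1" / "yhat1 + qhat1" columns;
# cqr adds e.g. "yhat1 95.0% + qhat1" columns around the QR quantiles.
```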