Skip to content

yaozhang24/pcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Posterior Conformal Prediction (PCP)

This repository contains the Python code to implement PCP and reproduce the experiments and figures in our article Posterior Conformal Prediction.

Prerequisites

  • numpy
  • random
  • pandas
  • scipy
  • scikit-learn
  • statistics
  • statsmodels
  • tqdm
  • tensorflow
  • keras
  • conditionalconformal

Installing

The development version of our code is available on github:

git clone https://github.com/yaozhang24/pcp.git

Example

One simple way to start using PCP is to generate intervals with approximate conditional coverage in our synthetic data experiment:

import numpy as np
from utils import PCP, train_val_test_split, simulate_data, cross_val_residuals
from sklearn.ensemble import RandomForestRegressor

# Generate a synthetic dataset and split it into three folds
X, Y = simulate_data(num_samples=15000, setting=1)
X_train, X_val, X_test, Y_train, Y_val, Y_test, _ = train_val_test_split(X, Y, 1/3)

# Train the random forest model on the full training data
RF = RandomForestRegressor().fit(X_train, Y_train)

# Get predictions and residuals for the validation and test sets
predictions_val = RF.predict(X_val)
R_val = np.abs(Y_val - predictions_val)
predictions = RF.predict(X_test)
R_test = np.abs(Y_test - predictions)

# Cross-validation to generate a separate set of residuals for hyperparameter selection
RF_model = RandomForestRegressor()
X_train_cv, R_train_cv = cross_val_residuals(X_train, Y_train, model=RF_model)

# Run PCP
alpha = 0.1  # Level for PCP
PCP_model = PCP()
PCP_model.train(X_train_cv, R_train_cv) # Hyperparameter selection
pcp_quantiles = PCP_model.calibrate(X_val, R_val, X_test, R_test, alpha)[0] # Compute quantiles

# Compute intervals for all test samples
lower_bounds = predictions - np.array(pcp_quantiles)
upper_bounds = predictions + np.array(pcp_quantiles)

PCP can also be applied to achieve robust subgroup coverage, and level-adaptive coverage in classification. The implementation of PCP in these applications follows the same steps above. We refer to our real-data experiments (MEPS19 and HAM10000) for a demonstration.

Reproduction

To reproduce the experiments and figures in our article, please download the following datasets and run the corresponding notebooks in our repository.

Datasets

Communities and Crime (link)

Communities and Crime Unnormalized (link)

Online News Popularity (link)

Superconductivty (link)

Medical Expenditure Panel Survey (MEPS) 19 & 20 (link)

HAM10000 image dataset (link)

About

Posterior Conformal Prediction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published