<style>html {font-family: Arial; line-height: normal; }
h2 {margin-left: 1em;}
h3 {margin-left: 2em}
h4 {margin-left: 3em;}
p,ul {margin-left: 4em;}</style>
<h1>Notes</h1>
<p>This is a demo of Conformal Prediction (CP), a framework for prediction with a guaranteed error rate.
In this demo, CP is applied on top of a pre-trained ResNet50 model on the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) data set.<br>
Conformal Predictors output prediction sets (i.e. sets of labels, rather than the single label output by conventional classifiers and regressors),
with the property of <b>validity</b>: for any chosen target error rate \(\epsilon\), the prediction set fails to contain the
correct label (i.e. makes an error) with relative frequency \(\epsilon\), barring statistical fluctuation. <br>
Specifically, this demo uses Class-conditional Inductive Conformal Predictors (ICPs).
While the demo can be useful to illustrate some aspects of CP, these notes are not meant to be an introduction to CP.
The reader is referred to the monograph
<cite> <a href="http://www.alrw.net/">[Gammerman et al., Algorithmic Learning
in a Random World]</a> </cite> for a full discussion of Conformal Predictors.<br>
</p>
<h2>Details of this demo</h2>
<p> The user can choose an image out of a set of 2,000 using a slider.<br>
The ImageNet label for the image is shown below the image itself.
Another slider allows the user to choose the significance level \(\epsilon\) (i.e. the target error rate), which can vary from 0 to 1.</p>
<p>Below the sliders you can see two boxes containing the prediction sets, one for ResNet50 and one for CP.<br>
It is also possible to choose between two Non Conformity Measures (NCMs), referred to here as NegProb and Ratio.<br>
The plot shows the Empirical Cumulative Distribution Function (ECDF) of the Non Conformity Measure for the label
selected in the CP prediction set shown on the right (or, if no label is selected, the one with the largest p-value). The ECDF is calculated on
the calibration examples, plus the hypothetical completion, i.e. the test object paired with the hypothetical label. </p>
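<p>As an illustrative sketch (not the demo's actual code) of how this ECDF and the corresponding class-conditional p-value can be computed, assuming <code>cal_scores</code> holds the NCM scores of the calibration examples whose true label equals the hypothetical label:</p>
<pre style="margin-left:4em">
import numpy as np

def class_conditional_p_value(cal_scores, test_score):
    """p-value of a hypothetical label under a class-conditional ICP.

    cal_scores: NCM scores of the calibration examples whose true label
    equals the hypothetical label; test_score: NCM score of the test
    object under that label (the hypothetical completion).
    """
    scores = np.append(cal_scores, test_score)   # calibration + completion
    return np.mean(scores >= test_score)

def ecdf(scores):
    """Points of the Empirical CDF of a set of NCM scores."""
    x = np.sort(scores)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y
</pre>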
<h2>Predictions</h2>
<p>CP outputs sets of labels, whereas the ResNet50 model outputs a probability distribution over the 1,000 labels defined for the ImageNet data set.
In order to have a similar form of prediction for the two methods, we built a prediction set out of the ResNet50 probability distribution.
Specifically, we want to build a prediction set with a validity property, i.e. a set of labels such that, if the probability estimates are calibrated
(that is, if they correspond to long-term relative frequencies), the actual label is contained in the set with relative frequency at least the chosen
confidence level \( 1-\epsilon \). To do that, we output the smallest set of labels whose total probability
(as estimated by ResNet50) exceeds \(1-\epsilon\). <br>
For CP, we show the prediction set for the chosen significance level.<br>
One should note that the ResNet50 sets constructed above are conservative, as the probability of a hit equals or exceeds the target confidence level.</p>
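<p>A minimal sketch of both constructions (function and variable names are illustrative; <code>probs</code> stands for the ResNet50 probability vector and <code>p_values</code> for the vector of CP p-values over the 1,000 labels):</p>
<pre style="margin-left:4em">
import numpy as np

def resnet50_prediction_set(probs, epsilon):
    """Smallest set of labels whose total estimated probability exceeds 1 - epsilon."""
    order = np.argsort(probs)[::-1]         # labels by decreasing probability
    cum = np.cumsum(probs[order])
    # First position where the cumulative probability strictly exceeds 1 - epsilon.
    k = np.searchsorted(cum, 1.0 - epsilon, side='right') + 1
    return order[:k]

def cp_prediction_set(p_values, epsilon):
    """CP prediction set: all labels whose p-value exceeds the significance level."""
    return np.flatnonzero(p_values > epsilon)
</pre>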
<h2>Calibration set and test set</h2>
<p>The CP calibration set and the test set are a random partition of the ILSVRC2012 Validation Set.
The latter comprises 50,000 labelled images, evenly distributed over the 1,000 labels.<br>
For the purposes of this demo, the ILSVRC2012 Validation Set was partitioned into a calibration set of 48,000 images and a test set of 2,000 images.
The partitioning was done with shuffling and stratification, so that each category is equally represented in both sets (48 calibration images and 2 test images per category).</p>
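<p>One way to reproduce such a partition, sketched here with scikit-learn (the <code>labels</code> array below is a placeholder mimicking the balanced validation set, not the actual ILSVRC2012 labels):</p>
<pre style="margin-left:4em">
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labels: 50 images per class, as in the balanced validation set.
labels = np.repeat(np.arange(1000), 50)

indices = np.arange(len(labels))
cal_idx, test_idx = train_test_split(
    indices, test_size=2000, shuffle=True, stratify=labels, random_state=0)
# stratify=labels keeps the 1,000 classes balanced:
# 48 calibration and 2 test images per class.
</pre>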
<h2>NCMs</h2>
<p>Two NCMs are used here. Of course, many other choices of NCMs are possible.<br>
Some definitions:</p>
<ul>
<li>Let \( \ell\) denote the number of observations (images in this case) in the calibration set and let the index \( \ell+1 \) denote the test object. </li>
<li>Let \( \lbrace z_{1}, \ldots, z_{\ell} \rbrace\) be the calibration set, with observations \( z_i = (x_i,y_i) \), where \( x_i \) is
a 224-by-224 image and \( y_i \in \lbrace 1, 2, \ldots, 1000 \rbrace \)
</li>
<li>Let \( (p_1,p_2, \ldots, p_{1000}) \) be the vector of 1,000 real numbers representing the probability distribution over the 1,000 labels
estimated by ResNet50 for the test object \( x_{\ell+1}\) </li>
<li> Let \(\bar y\) be a hypothetical label for the test object </li>
</ul>
<h3>NegProb</h3>
<p>The NCM here referred to as NegProb is defined as:
$$ \mathcal{A}(x_{\ell+1},\bar y) = -p_{\bar y} $$
i.e. the negative of the probability estimated for the hypothetical label.</p>
<h3>Ratio</h3>
<p>The NCM here referred to as Ratio is defined as:
$$ \mathcal{A}(x_{\ell+1},\bar y) = \frac {\max_{y \neq \bar y}p_y} {p_{\bar y} } $$
i.e. the ratio of the largest probability estimated for any label other than the hypothetical one to the probability estimated for the hypothetical label.</p>