CARD Regression Toy Example Notebook #18
Hi @nilsleh, thank you for checking out our paper and repo. Our work focuses solely on modeling (recovering) the aleatoric uncertainty of the true data-generating distribution. We have written scripts that are pre-configured for the toy examples; for instance, you may run `bash training_scripts/run_toy_sinusoidal_regression_mdn.sh`.
Hi @XzwHan, thank you for your reply! I understand the benefit of your approach and like your proposed metric. I have run the pre-configured scripts for the toy examples, but I have the following question. In Table 2 of your paper you report the NLL on the UCI regression tasks, and to compute the NLL of a single data point one requires a predictive uncertainty. It appears in the code that, for the NLL computation, the predictive variance is the variance computed over all samples. However, I am not entirely clear which timestep is used for this computation. Or, more generally, given a new data point, what is the corresponding uncertainty that CARD would compute for it?
We would compute NLL at all timesteps (so that we can make plots like Figure 8 to see the change of NLL during the reverse diffusion process); the reported NLL in Table 2 is at the final timestep of the reverse process. For one new data point, we would generate multiple y samples to approximate its conditional distribution. In other words, both NLL and QICE are metrics for distribution matching, rather than "uncertainty". Aleatoric uncertainty is part of the data-generating true distribution, so when we observe that the learned distribution matches the true distribution well, we can say that the learned distribution has captured the aleatoric uncertainty of the true distribution.
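For concreteness, here is a minimal sketch of what an NLL computation from samples could look like under a Gaussian assumption on the predictive distribution; the function name and shapes are illustrative and not taken from the CARD codebase:

```python
import numpy as np

def gaussian_nll_from_samples(y_samples: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Per-point NLL under a Gaussian fitted to the drawn samples.

    y_samples: (n_samples, n_test) y draws for each test input.
    y_true:    (n_test,) observed targets.
    """
    mu = y_samples.mean(axis=0)              # predictive mean per test point
    var = y_samples.var(axis=0) + 1e-12      # predictive variance per test point
    return 0.5 * (np.log(2 * np.pi * var) + (y_true - mu) ** 2 / var)

# Hypothetical usage: 1000 draws for 5 test points
rng = np.random.default_rng(0)
y_samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))
y_true = np.zeros(5)
print(gaussian_nll_from_samples(y_samples, y_true).mean())
```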
@XzwHan Thank you again for your reply. I think what I am trying to ask is: given that you have fitted your CARD model on the regression training data and you are supposed to make a prediction for a new data point, what uncertainty accompanies that prediction?

In Section 4.2.1 of the paper you talk about instance-level confidence in the classification setting, about which you state: "We intend to provide an alternative sense of uncertainty, by introducing the idea of model confidence." Thus, if you interpret predictive uncertainty as "confidence" and are given a single regression test instance, what would the corresponding quantity be?

From the toy examples in Figure 1 of your paper, it is quite powerful to see what a variety of distributions CARD can recover. It is clear that, for instance, the "Full Circle" experiment could not be modeled by a standard BNN with a Gaussian assumption, and there it also wouldn't make sense to talk about a predictive uncertainty as a variance or std, since that would just be trying to match a Gaussian and yield unreasonable results. However, in cases like the "Log-Log-Linear" example, one could train a heteroscedastic model and get uncertainty bands at an instance level for each new data point.
@wrkhard I suppose if you use an ensemble, you also have to make some distributional assumption to define your predictive distribution. For example, Deep Ensembles make an approximate assumption of a Gaussian mixture model over the predictions from the individual ensemble members, so one could make a similar assumption with the samples from the CARD model to get uncertainty bands. A similar assumption is also made in other common approaches like MC-Dropout. The other important detail one could get into is aleatoric vs. epistemic uncertainty, where Ensembles and CARD differ, so I think there are more things to consider.
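As a rough sketch of what such an assumption could look like in practice (names are illustrative; the samples could come from CARD, an ensemble, or MC-Dropout), here are two ways to turn drawn samples into instance-level bands:

```python
import numpy as np

def gaussian_band(y_samples: np.ndarray, z: float = 1.96):
    """Central ~95% band under a Gaussian assumption on the samples.

    y_samples: (n_samples, n_test) y draws per test point.
    """
    mu, std = y_samples.mean(axis=0), y_samples.std(axis=0)
    return mu - z * std, mu + z * std

def quantile_band(y_samples: np.ndarray, alpha: float = 0.05):
    """Distribution-free band from empirical quantiles of the samples."""
    lower = np.quantile(y_samples, alpha / 2, axis=0)
    upper = np.quantile(y_samples, 1 - alpha / 2, axis=0)
    return lower, upper
```

The quantile version avoids the Gaussian assumption, which seems more in the spirit of the multimodal toy examples (e.g., the Full Circle dataset).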
Hi @nilsleh and @wrkhard, thank you very much for your questions and comments. In our work, the parameterization of the diffusion model is a deterministic function (the forward noise prediction network).

Meanwhile, to answer the question "what is your uncertainty about the prediction that your model generated for this test point", we would need epistemic uncertainty, which for regression tasks would be quite useful in an out-of-distribution setting, or in general for new data coming from regions where training data is sparse: as @wrkhard mentioned, you could train an ensemble of models and see whether or not the different models give drastically different predictions. A good reference is this review paper about aleatoric vs. epistemic uncertainty: their figures illustrate the concepts well. We also briefly discussed the differences between these two types of uncertainty in our paper; please check Sections A.2.1 and A.2.3.

For our construction of instance-level model confidence for classification, we are hesitant to frame it within the current taxonomy of uncertainty: conceptually it is very similar to epistemic uncertainty, but we only train one deterministic model, instead of training multiple "hypotheses" as is usually done to obtain epistemic uncertainty. Therefore, we worded it as "an alternative way of measuring model confidence". Hope this helps to clarify some parts of our paper!
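To illustrate the ensemble idea mentioned above, here is a minimal, hypothetical sketch (not part of the CARD code) of decomposing uncertainty from an ensemble of sampling-based models: disagreement between models as epistemic uncertainty, average within-model spread as aleatoric uncertainty.

```python
import numpy as np

def decompose_uncertainty(per_model_samples: np.ndarray):
    """per_model_samples: (n_models, n_samples, n_test) y draws from each model."""
    per_model_mean = per_model_samples.mean(axis=1)   # (n_models, n_test)
    per_model_var = per_model_samples.var(axis=1)     # (n_models, n_test)
    epistemic = per_model_mean.var(axis=0)            # disagreement between models
    aleatoric = per_model_var.mean(axis=0)            # average within-model spread
    return epistemic, aleatoric
```

This follows the usual law-of-total-variance decomposition; with a single deterministic model (as in CARD), only the within-model term is available.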
Hi @wrkhard, actually yes. We began writing a UQ library for PyTorch with Lightning, with the aim of laying the groundwork for an open-source effort to implement a variety of UQ methods that are accessible to practitioners, called Lightning-UQ-Box. We also tried to port over the CARD implementation and have a notebook on our documentation page, which can be run in Google Colab (little rocket icon at the top) and where we try to recreate the result for the Toy Donut dataset from Figure 1. Let me know if this is helpful for you; we are in the early stages of development, so we appreciate any feedback :)

@XzwHan I tried to distill the scripts into that notebook.

EDIT: updated link to notebook
Hi @wrkhard @nilsleh, thank you for your feedback and suggestions! Would you be willing to elaborate a bit more on the particular tasks with Earth-based data you are working on, for which CARD, or uncertainty-based methods in general, could potentially be helpful (e.g., are they regression or classification, which metrics would you check, which functions in our code could be most useful)? Making the code more modular is one of our next steps to improve CARD, and we are actively searching for suitable applications for our method, so your perspectives and use cases would be greatly appreciated in the next phase of development.
Hi @XzwHan, many tasks in Earth-based remote sensing and modeling are inverse problems, and there is interest in fast methods that can capture complex posteriors. Even for non-inverse problems, robust UQ is often a requirement for any operational science product. If there were a way to include easy-to-use UQ with CARD such that one may abstain when uncertainty is high, that would be a great feature, I think. Most problems in my work are primarily regression, and I am often interested in finding a robust prediction interval.
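For concreteness, a minimal sketch of the kind of abstention rule I have in mind, assuming a sampling-based predictor and an application-specific threshold (all names are illustrative):

```python
import numpy as np

def predict_or_abstain(y_samples: np.ndarray, std_threshold: float):
    """y_samples: (n_samples, n_test) draws per test point."""
    mean = y_samples.mean(axis=0)
    std = y_samples.std(axis=0)
    abstain = std > std_threshold     # flag points where uncertainty is too high
    return mean, std, abstain
```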
Could CARD be used in a similar fashion to this work (https://neurips.cc/virtual/2022/event/56948)? There seem to be many connections between flows and diffusion models.
Hi @wrkhard, we introduced the mechanism to obtain instance-level confidence in the context of classification (Section 4.2.1 of our paper), which might be most closely related to the goal of "abstaining when uncertainty is high". For regression tasks, CARD aims for fidelity to the true underlying distribution, where we don't assume the presence of outliers or noisy data; meanwhile, conformal prediction might also be helpful for the tasks you described. For inverse problems, diffusion models could be a reasonable modeling choice due to their ability to learn multimodal distributions, but they might be restricted by their sampling speed; this would no longer be an issue if the recent developments in diffusion distillation could be applied in this setting.
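As an illustration of the conformal prediction suggestion, here is a minimal sketch of split conformal prediction wrapped around an arbitrary point predictor (the predictor and data below are placeholders, not CARD itself):

```python
import numpy as np

def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction around any fitted point-prediction function."""
    residuals = np.abs(y_cal - predict(X_cal))               # calibration scores
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)     # finite-sample correction
    q = np.quantile(residuals, level)                        # conformal quantile
    pred = predict(X_test)
    return pred - q, pred + q                                # marginal (1 - alpha) coverage

# Hypothetical usage with a toy point predictor
rng = np.random.default_rng(0)
X_cal = rng.uniform(-3, 3, 500)
y_cal = np.sin(2 * X_cal) + 0.2 * rng.standard_normal(500)
predict = lambda x: np.sin(2 * x)                            # stand-in for a trained model
lower, upper = conformal_interval(predict, X_cal, y_cal, np.array([0.0, 1.0]))
```

One could also combine this with an abstention rule, e.g. abstain whenever the interval width exceeds a task-specific tolerance.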
Thank you for the interesting paper and for publishing your code. I am trying to create a small notebook that demonstrates the CARD training and evaluation procedure on a toy dataset, in order to better understand the mechanics and details of the CARD method. I am aware that you have runnable scripts for some toy examples; however, they all expect config files and the large regression/main.py script, which I found a bit difficult to follow since it is so large and contains lots of additional functionality. Therefore, I was hoping to create a small notebook that demonstrates the method on a small toy example, where I could add comments and descriptions to better understand the method.

Here is a Google Colab notebook in which I tried to extract the code pieces into a small reproducible example. The central question I have is how I would properly extract a form of predictive uncertainty (with or without a Normal distribution assumption like the one used for NLL) and show it for the toy example to compare its qualities to other methods. Thanks in advance. If you think such a notebook is useful, I would also be happy to contribute it in a PR.
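For concreteness, here is a minimal sketch (with a placeholder sampler in place of a trained CARD model) of the kind of uncertainty visualization I have in mind for the toy example: draw many y samples per input, then plot the median and an empirical 5-95% band.

```python
import numpy as np
import matplotlib.pyplot as plt

x_grid = np.linspace(-3, 3, 200)

def sample_y(x, n_samples=500, rng=np.random.default_rng(0)):
    """Placeholder for 'sample y from the trained model given x', repeated n_samples times."""
    return np.sin(2 * x) + 0.2 * rng.standard_normal((n_samples, x.size))

samples = sample_y(x_grid)                                   # shape (n_samples, 200)
lo, med, hi = np.quantile(samples, [0.05, 0.5, 0.95], axis=0)

plt.fill_between(x_grid, lo, hi, alpha=0.3, label="5-95% band")
plt.plot(x_grid, med, label="median prediction")
plt.legend()
plt.show()
```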