Change normalisation default, fix bug in normalise_by_negative, adapt citations, absolute imports #166

Merged Oct 18, 2022 (43 commits). The file changes shown below are from 10 of the 43 commits.

Commits
d1ebebe
Updated and Unified all Metric Citations. Wrote additional test cases…
leanderweber Oct 11, 2022
d190c86
Changed default normalisation function for all metrics to normalise_b…
leanderweber Oct 11, 2022
306976d
Changed all relative imports to absolute
leanderweber Oct 11, 2022
d12b912
Fixed undefined name in focus.py
leanderweber Oct 11, 2022
2e4c3a6
Merge branch 'main' into fix-normalisation-division
leanderweber Oct 11, 2022
14eb6b6
Edited Documentation
leanderweber Oct 12, 2022
a5f8e69
Updated docs
leanderweber Oct 12, 2022
a5f5faa
Updated docs
leanderweber Oct 13, 2022
7752c63
zennit test cases are not optional now. Resolved most * imports in qu…
leanderweber Oct 13, 2022
c507136
Updated tutorials with changes to helper hierarchy
leanderweber Oct 13, 2022
8f82724
Refactored installation options to allow for targeting single XAI pac…
leanderweber Oct 13, 2022
faef1fe
some fixes on documentation made
annahedstroem Oct 13, 2022
df80b16
tiny comments update
annahedstroem Oct 13, 2022
a918b41
minor change in docs
leanderweber Oct 14, 2022
82ef36e
Merge branch 'fix-normalisation-division' into installation-update
leanderweber Oct 14, 2022
dcca7c2
Merge pull request #172 from understandable-machine-intelligence-lab/…
leanderweber Oct 14, 2022
9b7e71c
Fixed Remaining blanket imports in tests (except fixtures)
leanderweber Oct 14, 2022
83ec610
Changed quantus/helpers/functions to quantus/functions
leanderweber Oct 14, 2022
4d8d6a6
updated docs and TODO
leanderweber Oct 14, 2022
665e81b
Updated Readme. Ran black.
leanderweber Oct 14, 2022
6281492
Fix error in tests
leanderweber Oct 14, 2022
9a821d9
Updated Getting Started
annahedstroem Oct 14, 2022
a14bddf
minor docs changes
leanderweber Oct 14, 2022
8f4f3b0
fix tests
leanderweber Oct 14, 2022
2972a63
Resolved merge conflicts
annahedstroem Oct 17, 2022
5477108
Fixed import metric file
annahedstroem Oct 17, 2022
de02a23
fixed import warn
annahedstroem Oct 17, 2022
508854d
Fixed import issues caused by merge
leanderweber Oct 18, 2022
81e4385
Small fixes to docs
annahedstroem Oct 18, 2022
e85a726
updated README.md
annahedstroem Oct 18, 2022
9b1c1bb
updated documentation
annahedstroem Oct 18, 2022
a0cef7b
Updated docs
annahedstroem Oct 18, 2022
5319b84
updated docs
annahedstroem Oct 18, 2022
6bca39c
updated docs
annahedstroem Oct 18, 2022
fbfb472
updated docs
annahedstroem Oct 18, 2022
fb12786
updated docs
annahedstroem Oct 18, 2022
8aeca45
updated docs
annahedstroem Oct 18, 2022
63e2214
updated docs
annahedstroem Oct 18, 2022
465e465
README.md update
annahedstroem Oct 18, 2022
be6c72b
README.md update
annahedstroem Oct 18, 2022
184bd98
README.md update and other docs
annahedstroem Oct 18, 2022
77e9ba0
README.md update and other docs
annahedstroem Oct 18, 2022
ce8bcec
README.md update and other docs
annahedstroem Oct 18, 2022
7 changes: 0 additions & 7 deletions __init__.py

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/docs_api/quantus.helpers.rst
@@ -28,4 +28,4 @@ Submodules
quantus.helpers.similarity_func
quantus.helpers.tf_model
quantus.helpers.utils
quantus.helpers.warn_func
quantus.helpers.warn
2 changes: 1 addition & 1 deletion docs/source/docs_api/quantus.helpers.warn_func.rst
@@ -1,7 +1,7 @@
quantus.helpers.warn\_func module
=================================

.. automodule:: quantus.helpers.warn_func
.. automodule:: quantus.helpers.warn
:members:
:undoc-members:
:show-inheritance:
281 changes: 212 additions & 69 deletions docs/source/getting_started/getting_started_example.md

Large diffs are not rendered by default.

53 changes: 43 additions & 10 deletions docs/source/getting_started/installation.md
@@ -1,33 +1,66 @@
## Quick Installation

Quantus can be installed from [PyPI](https://pypi.org/project/quantus/)
(this way assumes that you have either `torch` or `tensorflow` already installed on your machine).
### Installing from PyPI

If you already have [PyTorch](https://pytorch.org/) or [Tensorflow](https://www.tensorflow.org) installed on your machine,
Quantus can be obtained from [PyPI](https://pypi.org/project/quantus/) as follows:

```setup
pip install quantus
```

If you don't have `torch` or `tensorflow` installed, you can simply add the package you want and install it simultaneously.
Otherwise, you can simply add the desired framework in brackets, and it will be installed in addition to Quantus:

```setup
pip install "quantus[torch]"
pip install quantus[torch]
```
Or, alternatively for `tensorflow` you run:

OR

```setup
pip install "quantus[tensorflow]"
pip install quantus[tensorflow]
```

Review comment (Member): the " must be kept, else the pip command won't work! please add it to the others as well
Reply (Collaborator, author): done

Additionally, if you want to use the basic explainability functionality such as `quantus.explain` in your evaluations, you can run `pip install "quantus[extras]"` (this step requires that either `torch` or `tensorflow` is installed).
To use Quantus with `zennit` support, install in the following way: `pip install "quantus[zennit]"`.
### Installing from requirements.txt

Alternatively, simply install requirements.txt (again, this requires that either `torch` or `tensorflow` is installed and won't include the explainability functionality to the installation):
Alternatively, you can simply install from the requirements.txt found [here](https://github.com/understandable-machine-intelligence-lab/Quantus/blob/main/requirements.txt),
however, this only installs the default setup and requires either PyTorch or Tensorflow to be installed already:

```setup
pip install -r requirements.txt
```

**Package requirements**
### Installing XAI Library Support (PyPI only)

Most evaluation metrics in Quantus allow for a choice of either providing pre-computed explanations directly as an input,
or instead making use of the wrappers implemented in `quantus.explain` around common explainability libraries
(see the usage sketch at the end of this section). The following XAI libraries are currently supported:

**Captum**

To enable the use of wrappers around [Captum](https://captum.ai/), you need to have PyTorch already installed and can then run

```setup
pip install quantus[extras]
```

Review comment (Member): These should be captum, tensorflow? please also add quotation!
Reply (Collaborator, author): no, captum is based on pytorch, so I think it is correct as-is. However, with the new installation options, neither torch nor tensorflow will be required to be installed already, so I will just remove that part.

**tf-explain**

To enable the use of wrappers around [tf-explain](https://github.com/sicara/tf-explain), you need to have Tensorflow already installed and can then run

```setup
pip install quantus[extras]
```

**Zennit**

To use Quantus with support for the [Zennit](https://github.com/chr5tphr/zennit) library you need to have PyTorch already installed and can then run

```setup
pip install quantus[zennit]
```
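
To illustrate the two usage modes described at the top of this section (passing pre-computed attributions vs. letting the `quantus.explain` wrapper compute them), here is a minimal, self-contained sketch. It is not part of the installation instructions: the tiny untrained model, the random data, the choice of `PixelFlipping`, and keyword names such as `explain_func_kwargs` are illustrative assumptions based on the Quantus documentation and may differ slightly between versions.

```python
import numpy as np
import torch
import quantus

# A tiny, untrained stand-in classifier and random MNIST-shaped data, purely for illustration.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10)).eval()
x_batch = np.random.rand(8, 1, 28, 28).astype(np.float32)
y_batch = np.random.randint(0, 10, size=8)

metric = quantus.PixelFlipping()

# Option 1: pass pre-computed attributions directly (no explainability extras needed).
a_batch = np.random.rand(8, 1, 28, 28)
scores = metric(model=model, x_batch=x_batch, y_batch=y_batch, a_batch=a_batch, device="cpu")

# Option 2: let Quantus compute attributions via the quantus.explain wrapper
# (requires the [extras] install so that e.g. Captum is available for a torch model).
scores = metric(
    model=model,
    x_batch=x_batch,
    y_batch=y_batch,
    a_batch=None,
    explain_func=quantus.explain,
    explain_func_kwargs={"method": "Saliency"},
    device="cpu",
)
```

Pixel-Flipping is used here only because it accepts pre-computed attributions as-is; robustness metrics such as MaxSensitivity always need an `explain_func`, since they re-explain perturbed inputs.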

### Package Requirements

```
python>=3.7.0
83 changes: 65 additions & 18 deletions docs/source/guidelines/guidelines_and_disclaimers.md
@@ -1,40 +1,87 @@
## User guidelines

Just 'throwing' some metrics at your XAI explanations and consider the job done, is an approach not very productive.
Just 'throwing' some metrics at your explanations and considering the job done is not a very productive approach.
Before evaluating your explanations, make sure to:

* Always read the original publication to understand the context that the metric was introduced in - it may differ from your specific task and/ or data domain
* Spend time on understanding and investigate how the hyperparameters of the metrics influence the evaluation outcome; does changing the perturbation function fundamentally change scores?
* Establish evidence that your chosen metric is well-behaved in your specific setting e.g., include a random explanation (as a control variant) to verify the metric
* Reflect on the metric's underlying assumptions e.g., most perturbation-based metrics don't account for nonlinear interactions between features
* Ensure that your model is well-trained, a poor behaving model e.g., a non-robust model will have useless explanations
* Spend time on understanding and investigating how the hyperparameters of metrics can influence the evaluation outcome. Some parameters that usually influence results significantly include:
* the choice of perturbation function
* whether normalisation is applied and the choice of the normalisation function
* whether unsigned or signed attributions are considered
* Establish evidence that your chosen metric is well-behaved in your specific setting, e.g., include a random explanation (as a control variant) to verify the metric (see the sketch after this list)
* Reflect on the metric's underlying assumptions, e.g., most perturbation-based metrics don't account for nonlinear interactions between features
* Ensure that your model is well-trained, as a poorly behaving model, e.g., a non-robust model, will have useless explanations
* Each metric measures different properties of explanations, and especially the various categories (faithfulness, localisation, ...) can be viewed as different facets of evaluation,
but a single metric never suffices as a sole criterion for the quality of an explanation method
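
As a concrete illustration of the control-variant advice above, the sketch below compares a metric's scores for real attributions against random ones. This is a hedged sketch rather than Quantus documentation: it assumes that `model`, `x_batch`, `y_batch` and real attributions `a_batch` have already been prepared (e.g., as in the Getting Started example), and the metric choice and keyword names are placeholders that may differ between versions.

```python
import numpy as np
import quantus

# Assumes `model`, `x_batch`, `y_batch` and real attributions `a_batch` already exist,
# e.g. prepared as in the Getting Started example.
metric = quantus.FaithfulnessCorrelation(normalise=True)
# Hyperparameters such as the normalisation or perturbation function can likewise be
# varied via the constructor; see the API documentation for the exact keyword names.

scores_real = metric(model=model, x_batch=x_batch, y_batch=y_batch, a_batch=a_batch, device="cpu")

# Control variant: random attributions of the same shape should score clearly worse.
a_batch_random = np.random.uniform(size=np.asarray(a_batch).shape)
scores_random = metric(model=model, x_batch=x_batch, y_batch=y_batch, a_batch=a_batch_random, device="cpu")

print("real:", np.mean(scores_real), "random:", np.mean(scores_random))
```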


## Disclaimers

**1. Implementation may differ from the original author(s)**

Note that the implementations of metrics in this library have not been verified by the original authors. Thus any metric implementation in this library may differ from the original authors. It is moreover likely that differences exist since 1) the source code of original publication is most often not made publicly available, 2) sometimes the mathematical definition of the metric is missing and/ or 3) the description of hyperparameter choice was left out. This leaves room for (subjective) interpretations.
Note that the implementations of metrics in this library have not been verified by the original authors.
Thus, any metric implementation in this library may differ from the original authors' version.
It is moreover likely that differences exist, since
* the source code of the original publication is most often not made publicly available,
* sometimes the mathematical definition of the metric is missing, and
* the description of hyperparameter choices was left out.

This leaves room for (subjective) interpretations.

**2. Discrepancy in operationalisation is likely**

Metrics for XAI methods are often empirical interpretations (or translations) of qualities that researcher(s) stated were important for explanations to fulfil. Hence it may be a discrepancy between what the author claims to measure by the proposed metric and what is actually measured e.g., using entropy as an operationalisation of explanation complexity.
Metrics for XAI methods are often empirical interpretations (or translations) of qualities that researcher(s) stated
were important for explanations to fulfil. Hence there may be a discrepancy between what the author claims to measure by
the proposed metric and what is actually measured, e.g., using entropy as an operationalisation of explanation complexity.

**3. Hyperparameters may (and should) change depending on application/ task and dataset/ domain**
**3. Hyperparameters may (and should) change depending on the application/ task and dataset/ domain**

Metrics are often designed with a specific use case in mind e.g., in an image classification setting. Thus it is not always clear how to change the hyperparameters to make them suitable for another setting. Pay careful attention to how your hyperparameters should be tuned; what is a proper baseline value in your context i.e., that represents the notion of “missingness”?
Metrics are often designed with a specific use case in mind, most commonly for an image classification setting.
Thus it is not always clear how to change the hyperparameters to make them suitable for another setting.
Pay careful attention to how your hyperparameters should be tuned and what a proper baseline value could be in your context, i.e., one that represents the notion of “missingness”.

**4. Evaluation of explanations must be understood in its context; its application and of its kind**

What evaluation metric to use is completely dependent on: 1) the type of explanation (explanation by example cannot be evaluated the same way as attribution-based/ feature-importance methods), 2) the application/ task: we may not require the explanations to fulfil certain criteria in some context compared to others e.g., multi-label vs single label classification 3) the dataset/ domain: text vs images e.g, different dependency structures between features exist, and preprocessing of the data, leading to differences on what the model may perceive, and how attribution methods can react to that (prime example: MNIST in range [0,1] vs [-1,1] and any NN) and 4) the user (most evaluation metrics are founded from principles of what a user want from its explanation e.g., even in the seemingly objective measures we are enforcing our preferences e.g., in TCAV "explain in a language we can understand", object localisation "explain over objects we think are important", robustness "explain similarly over things we think looks similar" etc. Thus it is important to define what attribution quality means for each experimental setting.

**5. Evaluation (and explanations) will be unreliable if the model is not robust**

Evaluation will fail if you explain a poorly trained model. If the model is not robust, then explanations cannot be expected to be meaningful or interpretable [1, 2]. If the model achieves high predictive performance, but for the wrong reasons (e.g., Clever Hans, Backdoor issues) [3, 4], there is likely to be unexpected effects on the localisation metrics (which generally captures how well explanations are able to centre attributional evidence on the object of interest).

**6. Evaluation outcomes can be true to data or true to model**

Interpretation of evaluation outcome will differ depending on whether we prioritise that attributions are faithful to data or to the model [5, 6]. As explained in [5], imagine if a model is trained to use only one of two highly correlated features. The explanation might then rightly point out that this one feature is important (and that the other correlated feature is not). But if we were to re-train the model, the model might now pick the other feature as basis for prediction, for which the explanation will consequently tell another story --- that the other feature is important. Since the explanation function have returned conflicting information about what features are important --- we might now believe that the explanation function in itself is unstable. But this may not necessarily be true --- in this case, the explanation has remained faithful to the model but not the data. As such, in the context of evaluation, to avoid misinterpretation of results, it may therefore be important to articulate what you care most about explaining.
What evaluation metric to use can depend on the following factors:
* **The type of explanation:** e.g., an explanation by example cannot be evaluated
the same way as attribution-based or feature-importance methods
* **The application/ task:** we may not require the explanations to fulfil
certain criteria in some context compared to others, e.g., multi-label
vs. single label classification
* **The dataset/ domain:** e.g., text vs. images, or whether different dependency structures between features exist,
as well as the preprocessing of the data, leading to differences in what the model
may perceive and in how attribution methods can react to that
* **The user:** most evaluation metrics are founded on principles of what
a user may expect from explanations, even in the seemingly objective
measures. E.g., localisation asks for the explanation to be focused on objects expected to be important,
and may fail independently of the explanation if the model simply does not consider those objects,
while robustness asks for similar explanations over things we
think look similar, without considering how the model represents the data manifold, etc.
Thus it is important to define what attribution quality means for each experimental setting.

**5. Evaluation (and explanations) can be unreliable if the model is not robust**

Evaluation can fail (depending on the evaluation method) if you explain a poorly trained model.
If the model is not robust, then explanations cannot be expected to be meaningful or interpretable [1, 2].
If the model achieves high predictive performance, but for the wrong reasons (e.g., Clever Hans effects, Backdoor issues)
[3, 4], unexpected effects on localisation metrics are likely.

**6. Evaluation outcomes can be true to the data or true to the model**

Generally, explanations should depend on both the data and the model.
However, both are difficult to measure at the same time, and
the interpretation of evaluation outcomes will differ depending on whether we prioritise
that attributions are faithful to data or to the model [5, 6]. As explained in [5],
imagine if a model is trained to use only one of two highly correlated features.
The explanation might then rightly point out that this one feature is important
(and that the other correlated feature is not). But if we were to re-train the model,
the model might now pick the other feature as the basis for prediction, for which the explanation
will consequently tell another story --- that the other feature is important. Since the
explanation function has returned conflicting information about what features are important,
we might now believe that the explanation function itself is unstable. But this may
not necessarily be true --- in this case, the explanation has remained faithful to the model
but not to the data. As such, in the context of evaluation, to avoid misinterpretation of results,
it may be important to articulate what you care most about explaining.
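
The correlated-feature scenario above can be made concrete with a tiny, self-contained numerical sketch (plain NumPy, independent of Quantus; the two linear "models" and the gradient-times-input attribution are illustrative choices, not a prescribed method):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
X = np.stack([x1, x1], axis=1)   # two perfectly correlated features

# Two linear "models" with identical outputs but different internal reliance.
w_a = np.array([2.0, 0.0])       # relies only on feature 0
w_b = np.array([0.0, 2.0])       # relies only on feature 1
assert np.allclose(X @ w_a, X @ w_b)

# A simple gradient-times-input attribution tells two different stories,
# yet each explanation is faithful to its own model.
attr_a = w_a * X                 # all attribution mass on feature 0
attr_b = w_b * X                 # all attribution mass on feature 1
print(np.abs(attr_a).mean(axis=0), np.abs(attr_b).mean(axis=0))
```

Neither explanation is unstable here; each is faithful to its own model, and the two models simply happen to behave identically on this data.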

**References**

28 changes: 20 additions & 8 deletions docs/source/index.md
@@ -16,21 +16,32 @@ e.g. pixel replacement strategy of a faithfulness test influences the ranking of
[📑 Shortcut to paper!](https://arxiv.org/abs/2202.06861)


This documentation is complementary to Quantus repository's [README.md](https://github.com/understandable-machine-intelligence-lab/Quantus) and provides documentation
for how to install Quantus (**Installation**), how to contribute to the project (**Developer Documentation**) and on the interface (**API Documentation**).
For further guidance on what to think about when applying Quantus, please read the user guidelines (**Guidelines**).
This documentation is complementary to the [README.md](https://github.com/understandable-machine-intelligence-lab/Quantus) in the Quantus repository and provides documentation
for how to {doc}`install </getting_started/installation>` Quantus, how to {doc}`contribute </docs_dev/CONTRIBUTING>` to the project, and on the {doc}`interface </docs_api/modules>`.
For further guidance on what to think about when applying Quantus, please read the {doc}`user guidelines </guidelines/guidelines_and_disclaimers>`.

Review comment (Member): Once everything related to the Installation, Getting started etc is finished then we also need to update the README.md so that they match
Reply (Collaborator, author): updated README

Do you want to get started? Please have a look at our simple MNIST/torch/Saliency/IntGrad toy example (**Getting started**).
Do you want to get started? Please have a look at our simple {doc}`toy example </getting_started/getting_started_example>` with PyTorch using MNIST data.
For more examples, check the [tutorials](https://github.com/understandable-machine-intelligence-lab/Quantus/tree/main/tutorials) folder.

Quantus can be installed from [PyPI](https://pypi.org/project/quantus/)
(this way assumes that you have either `torch` or `tensorflow` already installed on your machine).
If you already have [PyTorch](https://pytorch.org/) or [Tensorflow](https://www.tensorflow.org) installed on your machine, Quantus can be obtained from [PyPI](https://pypi.org/project/quantus/) as follows:

```setup
pip install quantus
```

For alternative ways to install Quantus, read more under **Installation**.
Otherwise, you can simply add the desired framework in brackets, and it will be installed in addition to Quantus:

```setup
pip install quantus[torch]
```

OR

```setup
pip install quantus[tensorflow]
```

For a more in-depth guide on how to install Quantus, read more {doc}`here </getting_started/installation>`.

```{toctree}
:caption: Installation
@@ -72,7 +83,7 @@ guidelines/guidelines_and_disclaimers

If you find this toolkit or its companion paper
[**Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations**](https://arxiv.org/abs/2202.06861)
interesting or useful in your research, use following Bibtex annotation to cite us:
interesting or useful in your research, please use the following Bibtex annotation to cite us:

```bibtex
@article{hedstrom2022quantus,
@@ -92,3 +103,4 @@ interesting or useful in your research, use following Bibtex annotation to cite
```

When applying the individual metrics of Quantus, please make sure to also properly cite the work of the original authors.
You can find the relevant citations in the documentation of each respective metric {doc}`here </docs_api/modules>`.
14 changes: 11 additions & 3 deletions quantus/__init__.py
@@ -4,6 +4,14 @@
# You should have received a copy of the GNU Lesser General Public License along with Quantus. If not, see <https://www.gnu.org/licenses/>.
# Quantus project URL: <https://github.com/understandable-machine-intelligence-lab/Quantus>.

from .helpers import *
from .metrics import *
from .evaluation import *
# Enable quantus.evaluate call
from quantus.evaluation import evaluate

# Enable quantus.explain call
from quantus.helpers.functions.explanation_func import explain

# Enable quantus.<function-class>.<function-name> call
from quantus.helpers.functions import *

# Enable quantus.<metric> call
from quantus.metrics import *
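
For context, a minimal sketch of what these top-level imports enable after the refactor (assuming Quantus is installed with the `[extras]` option; `PixelFlipping` is just one example metric):

```python
import quantus

assert callable(quantus.evaluate)  # quantus.evaluate(...) for bulk evaluation
assert callable(quantus.explain)   # quantus.explain(...) wrapper; needs the extras install
metric = quantus.PixelFlipping()   # metrics are exposed directly, e.g. quantus.<Metric>()
```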
7 changes: 4 additions & 3 deletions quantus/evaluation.py
@@ -7,11 +7,12 @@
# Quantus project URL: <https://github.com/understandable-machine-intelligence-lab/Quantus>.

from typing import Union, Callable, Dict, Optional, List

import numpy as np

from .helpers import asserts
from .helpers import utils
from .helpers.model_interface import ModelInterface
from quantus.helpers import asserts
from quantus.helpers import utils
from quantus.helpers.model.model_interface import ModelInterface


def evaluate(
20 changes: 1 addition & 19 deletions quantus/helpers/__init__.py
@@ -6,24 +6,6 @@

from importlib import util

# Import files dependent on package installations.
__EXTRAS__ = util.find_spec("captum") or util.find_spec("tf_explain")
__MODELS__ = util.find_spec("torch") or util.find_spec("tensorflow")

from .asserts import *
from .constants import *
from .norm_func import *
from .normalise_func import *
from .mosaic_func import *
from .loss_func import *
from .discretise_func import *
from .perturb_func import *
from .plotting import *
from .similarity_func import *
from .utils import *
from .warn_func import *

# Import files dependent on package installations.
if __MODELS__:
from .models import *
if __EXTRAS__:
from .explanation_func import *
3 changes: 2 additions & 1 deletion quantus/helpers/asserts.py
@@ -6,9 +6,10 @@
# You should have received a copy of the GNU Lesser General Public License along with Quantus. If not, see <https://www.gnu.org/licenses/>.
# Quantus project URL: <https://github.com/understandable-machine-intelligence-lab/Quantus>.

import numpy as np
from typing import Callable, Tuple, Sequence

import numpy as np


def attributes_check(metric):
"""