# MI²RedTeam {.unnumbered #mi2redteam}
<img src="images/intro-redteam.png" style="width: 95%;">
**MI²<span style="color:red"><strong>Red</strong></span>Team** analyses machine and deep learning predictive models through the lens of AI explainability, fairness, security and human trust. We develop methods and tools for explanatory model analysis and apply them in practice.
**MI²<span style="color:red"><strong>Red</strong></span>Team** is a group of researchers experienced in [XAI](https://doi.org/10.1007/978-3-031-04083-2_2) who perform a rigorous evaluation of AI solutions in order to improve their transparency and security. We apply state-of-the-art methods and introduce new ones to tailor our analysis to the specific predictive task.
> We openly collaborate on various topics related to explainable and interpretable machine learning. Feel free to [reach out to us](mailto:[email protected]) with research ideas and development opportunities. **We help organizations to better understand the vulnerabilities of their AI systems, and take steps to mitigate them.**
#### Red-Teaming LLMs {-}
<div>
<img src="images/paper_ramai.png">
<a href="https://doi.org/10.48550/arXiv.2404.14230">Resistance Against Manipulative AI: key factors and possible actions</a>
<p>Piotr Wilczyński, Wiktoria Mieleszczenko-Kowszewicz, Przemysław Biecek</p>
<p><strong>ECAI (2024)</strong></p>
We describe the results of two experiments designed to determine which characteristics of humans are associated with their susceptibility to LLM manipulation, and which characteristics of LLMs are associated with their potential for manipulativeness. We explore human factors by conducting user studies in which participants answer general knowledge questions using LLM-generated hints, and LLM factors by provoking language models to create manipulative statements. In the long term, we put AI literacy at the forefront, arguing that educating society would minimize the risk of manipulation and its consequences.
</div>
#### GCD {-}
<div>
<img src="images/paper_gcd.png" align="left">
<a href="https://doi.org/10.48550/arXiv.2404.12488">Global Counterfactual Directions</a>
<p>Bartlomiej Sobieski, Przemysław Biecek</p>
<p><strong>ECCV (2024)</strong></p>
We discover that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions. We propose a novel proxy-based approach that discovers two types of these directions using only a single image, in an entirely black-box manner. Specifically, g-directions allow for flipping the decision of a given classifier on an entire dataset of images, while h-directions further increase the diversity of explanations.
</div>
#### Red-Teaming SAM {-}
<div>
<img src="images/redteaming_sam.png" align="left">
<a href="https://doi.org/10.48550/arXiv.2404.02067">Red-Teaming Segment Anything Model</a>
<p> Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek</p>
<p><strong>CVPR Workshops (2024)</strong></p>
The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis of SAM. We analyze the impact of style transfer on segmentation masks. We assess whether the model can be used for attacks on privacy, such as recognizing celebrities' faces. Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts.
</div>
#### Red-Teaming HSI {-}
<div>
<img src="images/redteaming_hsi.png" align="left">
<a href="https://doi.org/10.48550/arXiv.2403.08017">Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI</a>
<p> Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek</p>
<p><strong>ICLR Workshops (2024)</strong></p>
Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain.
</div>
#### Adversarial attacks and defenses for XAI {-}
<div>
<img src="images/advxai.png" align="left">
<a href="https://doi.org/10.1016/j.inffus.2024.102303">Adversarial attacks and defenses in explainable artificial intelligence: A survey</a>
<p> Hubert Baniecki, Przemysław Biecek</p>
<p><strong>Information Fusion (2024)</strong></p>
Explanations of machine learning models can be manipulated. We introduce a unified notation and taxonomy of adversarial attacks on explanations. Adversarial examples, data poisoning, and backdoor attacks are key safety issues in XAI. Defense methods like model regularization improve the robustness of explanations. We outline the emerging research directions in adversarial XAI.
</div>
#### Software: survex {-}
<div>
<img src="images/survex.png" align="left">
<a href="https://doi.org/10.1093/bioinformatics/btad723">survex: an R package for explaining machine learning survival models</a>
<p> Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N Wright, Przemysław Biecek</p>
<p><strong>Bioinformatics (2023)</strong></p>
This paper demonstrates the functionalities of the <a href="https://github.com/ModelOriented/survex">survex package</a>, which provides a comprehensive set of tools for explaining machine learning survival models. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases, thus promoting transparency and responsibility in sensitive areas. A minimal usage sketch is shown below.
</div>
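
A minimal usage sketch of the survex workflow, assuming the CRAN versions of survex, ranger, and survival and the classic veteran dataset; defaults and printed output may differ across package versions:

```r
library(survex)
library(survival)
library(ranger)

# fit a black-box survival model on the veteran lung cancer data
vet <- survival::veteran
model <- ranger(Surv(time, status) ~ ., data = vet)

# wrap the model in an explainer; time and status are columns 3 and 4
explainer <- explain(model,
                     data = vet[, -c(3, 4)],
                     y = Surv(vet$time, vet$status))

model_performance(explainer)  # time-dependent performance measures
model_parts(explainer)        # permutation-based variable importance
predict_parts(explainer, vet[1, -c(3, 4)])  # local explanation for one patient
```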
#### SurvSHAP(t) {-}
<div>
<img src="images/paper_survshap.png" align="left">
<a href="https://doi.org/10.1016/j.knosys.2022.110234 ">SurvSHAP(t): Time-dependent explanations of machine learning survival models</a>
<p>Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek</p>
<p><strong>Knowledge-Based Systems (2023)</strong></p>
In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at <a href="https://github.com/MI2DataLab/survshap">this URL</a>; a sketch of computing SurvSHAP(t) in R is shown below.
</div>
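
The reference implementation linked above is in Python; SurvSHAP(t) is also exposed in R through the survex interface. A minimal sketch, assuming that `predict_parts()` with `type = "survshap"` is supported by the installed survex version:

```r
library(survex)
library(survival)
library(ranger)

vet <- survival::veteran
explainer <- explain(ranger(Surv(time, status) ~ ., data = vet),
                     data = vet[, -c(3, 4)],
                     y = Surv(vet$time, vet$status))

# time-dependent SHAP attributions for a single prediction
survshap_exp <- predict_parts(explainer,
                              new_observation = vet[1, -c(3, 4)],
                              type = "survshap")
plot(survshap_exp)  # one attribution curve per variable, plotted over time
```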
#### IEMA {-}
<div>
<img src="images/paper_iema.png" align="left">
<a href="https://doi.org/10.1007/s10618-023-00924-w ">The grammar of interactive explanatory model analysis</a>
<p>Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek</p>
<p><strong>Data Mining and Knowledge Discovery (2023)</strong></p>
This paper shows how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in the cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive, sequential analysis of a model may increase the accuracy and confidence of human decision making. A sketch of such a sequence is shown below.
</div>
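
A sequential, complementary analysis of this kind can be sketched with DALEX. This is an illustrative sequence in the spirit of IEMA, not the exact protocol from the paper, and it assumes the `titanic_imputed` dataset shipped with DALEX:

```r
library(DALEX)
library(ranger)

# a black-box classifier on DALEX's titanic_imputed data
model <- ranger(survived ~ ., data = titanic_imputed,
                classification = TRUE, probability = TRUE)
explainer <- explain(model,
                     data = titanic_imputed[, -8],
                     y = titanic_imputed$survived)

model_parts(explainer)                       # 1. which variables matter overall?
model_profile(explainer, variables = "age")  # 2. how does a key variable act on average?
passenger <- titanic_imputed[1, ]
predict_parts(explainer, passenger)          # 3. what drives this single prediction?
predict_profile(explainer, passenger, variables = "age")  # 4. what-if analysis for the same case
```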
#### Software: fairmodels {-}
<div>
<img src="images/mini_fairmodels.png" align="left">
<a href="http://doi.org/10.32614/RJ-2022-019">fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models</a>
<p>Jakub Wiśniewski, Przemyslaw Biecek</p>
<p><strong>The R Journal (2022)</strong></p>
This article introduces the R package fairmodels, which helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. A minimal usage sketch is shown below.
</div>
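
A minimal usage sketch on the German Credit data shipped with fairmodels; the column names (`Risk`, `Sex`) and the `privileged` level follow the package's documented example and may need adjusting for other data:

```r
library(fairmodels)
library(DALEX)
library(ranger)

data("german")  # German Credit data bundled with fairmodels
y_numeric <- as.numeric(german$Risk) - 1

model <- ranger(Risk ~ ., data = german, probability = TRUE)
explainer <- explain(model, data = german[, -1], y = y_numeric)

# fairness check with Sex as the protected attribute
fobject <- fairness_check(explainer,
                          protected = german$Sex,
                          privileged = "male")
print(fobject)
plot(fobject)  # compare several fairness metrics against the privileged group
```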
#### Fooling PDP {-}
<div>
<img src="images/foolingpd.png" align="left">
<a href="https://doi.org/10.1007/978-3-031-26409-2_8">Fooling Partial Dependence via Data Poisoning</a>
<p>Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek</p>
<p><strong>ECML PKDD (2022)</strong></p>
We show that Partial Dependence (PD) can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed by poisoning the data to bend and shift the explanations in a desired direction using genetic and gradient algorithms.
</div>
#### Fooling SHAP {-}
<div>
<img src="images/manipulatingshap.png" align="left">
<a href="https://doi.org/10.1609/aaai.v36i11.21590">Manipulating SHAP via Adversarial Data Perturbations (Student Abstract)</a>
<p>Hubert Baniecki, Przemyslaw Biecek</p>
<p><strong>AAAI Conference on Artificial Intelligence (2022)</strong></p>
We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) via perturbations of tabular data. It is evaluated on predictive tasks from the healthcare and financial domains to illustrate how crucial the context of the data distribution is in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders in the domain of responsible AI; moreover, the results highlight a vulnerability of explanations that can be exploited by an adversary.
</div>
#### Models in the Wild {-}
<div>
<img src="images/paper_wildnlp.png" align="left">
<a href="https://link.springer.com/chapter/10.1007%2F978-3-030-36718-3_20">Models in the Wild: On Corruption Robustness of Neural NLP Systems</a>
<p>Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek</p>
<p><strong>International Conference on Neural Information Processing (2019)</strong></p>
In this paper we introduce WildNLP, a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspellings occur. We compare the robustness of deep learning models across four popular NLP tasks: Q&A, NLI, NER, and sentiment analysis, by testing their performance on corruption aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. To improve robustness, we perform adversarial training on selected aspects and check whether it transfers to improved robustness against other corruption types. We find that high performance does not ensure sufficient robustness, although modern embedding techniques help to improve it.
</div>
#### Software: auditor {-}
<div>
<img src="images/paper_auditor.png" align="left">
<a href="https://doi.org/10.32614/RJ-2019-036">auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics</a>
<p>Alicja Gosiewska, Przemyslaw Biecek</p>
<p><strong>The R Journal (2019)</strong></p>
This paper describes a methodology and tools for model-agnostic auditing. The package provides functions for assessing and comparing the goodness of fit and performance of models. In addition, it may be used to analyze the similarity of residuals and to identify outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package, whose flexible and consistent grammar facilitates the validation of a large class of models. A minimal usage sketch is shown below.
</div>
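
A minimal usage sketch of residual diagnostics with auditor, assuming the `dragons` example data from DALEX; the available plot types may vary between package versions:

```r
library(auditor)
library(DALEX)

# a simple model on DALEX's dragons example data
model <- lm(life_length ~ ., data = dragons)
explainer <- explain(model, data = dragons, y = dragons$life_length)

# residual-based audit
mr <- model_residual(explainer)
plot(mr, type = "residual")       # residuals vs. fitted values
plot(mr, type = "scalelocation")  # scale-location diagnostic
```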