ArgusEyes is a system that lets data scientists declaratively specify the pipeline issues they are concerned about. ArgusEyes then instruments and executes the pipeline and screens it for the configured issues as part of continuous integration processes. It detects complex issues by tracking record-level provenance and understanding the semantics of operations in ML pipelines. ArgusEyes was presented as an abstract at CIDR'22.
We provide three example scenarios (note that you must install ArgusEyes locally before you can execute them). Each command below runs ArgusEyes to execute a pipeline and screen it for a particular issue:
-
Detecting mislabeled images in a computer vision pipeline:
./eyes arguseyes/example_pipelines/mlinspect-computervision-sneakers-labelerrors.yaml
-
Spotting data leakage in a price prediction pipeline:
./eyes arguseyes/example_pipelines/mlflow-regression-nyctaxifare-dataleakage.yaml
-
Addressing fairness violations in a credit scoring pipeline:
./eyes arguseyes/example_pipelines/openml-classification-incomelevel-fairness.yaml
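Since ArgusEyes is meant to run as part of continuous integration, the three commands above could be wrapped in a small script that screens several configurations in one go. The following is only a sketch: the `run_examples` function and the `EYES_BIN` variable are hypothetical names, and it assumes that `./eyes` exits with a non-zero status when a run fails, which this README does not state.

```shell
#!/usr/bin/env bash
# Sketch: run the screening command for each given configuration file.
# Assumption: the runner exits non-zero when a run fails (not confirmed here).

# Hypothetical override for the runner path; defaults to ./eyes from this repository.
EYES_BIN="${EYES_BIN:-./eyes}"

run_examples() {
  local failures=0
  local config
  for config in "$@"; do
    echo "Screening pipeline: ${config}"
    if ! "${EYES_BIN}" "${config}"; then
      echo "Run failed for: ${config}" >&2
      failures=$((failures + 1))
    fi
  done
  # Succeed only if every configuration ran without a failure.
  [ "${failures}" -eq 0 ]
}
```

It would be invoked from the repository root with the example configurations, e.g. `run_examples arguseyes/example_pipelines/mlinspect-computervision-sneakers-labelerrors.yaml arguseyes/example_pipelines/mlflow-regression-nyctaxifare-dataleakage.yaml arguseyes/example_pipelines/openml-classification-incomelevel-fairness.yaml`.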
Prerequisite: Python 3.9
-
Clone this repository
-
Set up the environment
cd arguseyes
python3.9 -m venv venv
source venv/bin/activate
-
Install graphviz
Linux:
apt-get install graphviz
macOS:
brew install graphviz
-
Install pip dependencies
pip install -r requirements.txt
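To confirm that the prerequisites from the steps above are available on the `PATH`, a quick check along these lines may help. This is a sketch rather than part of ArgusEyes: `check_prereqs` is a hypothetical helper name, and it only verifies that the executables exist, not their versions.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given executables can be found on the PATH.
# check_prereqs is a hypothetical helper, not something shipped with ArgusEyes.

check_prereqs() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "${tool}" >/dev/null 2>&1; then
      echo "found: ${tool}"
    else
      echo "missing: ${tool}" >&2
      missing=1
    fi
  done
  # Non-zero return value if at least one executable was not found.
  return "${missing}"
}
```

For this setup, the call would be `check_prereqs python3.9 dot`, where `dot` is the layout executable installed by graphviz.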