Adding Global Actions in A Nutshell For Counterfactual Explainability (GLANCE) framework #196

Open
wants to merge 26 commits into base: master

26 commits
9a84a57
Glance Initial Commit
ntheol Oct 10, 2024
2a63966
deleted unnecessary code from node.py and iterative_merges.py
ntheol Oct 14, 2024
a20f1b0
Added compas_data folder and class for COMPASDataset
ntheol Oct 14, 2024
51f421b
added documentation
ntheol Oct 15, 2024
15a7426
tests initial commit under \tests\glance
ntheol Oct 15, 2024
e76774f
Updated setup.py with GLANCE requirements
ntheol Oct 15, 2024
03733a4
updated documentation on iterative_merges class
ntheol Oct 15, 2024
ec0b1d4
- Changed Iterative Merges to C_GLANCE
ntheol Oct 15, 2024
9d35838
Added Adult Dataset and preprocess code
ntheol Oct 15, 2024
edaa002
deleted Compas demo
ntheol Oct 15, 2024
99e550b
Dropped Fnlwgt feature in data prerpocessing
ntheol Oct 16, 2024
6eca838
deleted unused code
ntheol Oct 16, 2024
49065d8
changed fig height and width of output graph on the tree
ntheol Oct 16, 2024
755f9be
Updated demo notebook
ntheol Oct 16, 2024
0951a8e
Updated demo notebook
ntheol Oct 16, 2024
36b041b
Updated demo notebook
ntheol Oct 16, 2024
23843ce
Updated demo notebook
ntheol Oct 16, 2024
4026523
updated link to adult dataset
ntheol Oct 16, 2024
3f4c854
Minor Type Annotation Fix
ntheol Oct 16, 2024
ae6baf8
Added link to the repsective paper in notebook description
ntheol Oct 21, 2024
497de6f
Updated build.yml with glance tests
ntheol Jan 9, 2025
fae93ee
commented out other jobs --signoff
ntheol Jan 9, 2025
fef8aee
Updated Glance requirements
ntheol Jan 9, 2025
e109887
Updated setup.py
ntheol Jan 9, 2025
9bf4ec9
Updated Build.yml
ntheol Jan 9, 2025
3516dc2
Updated Build.yml
ntheol Jan 9, 2025
54 changes: 54 additions & 0 deletions .github/workflows/Build.yml
@@ -448,3 +448,57 @@ jobs:

      - name: Step 5 - Test GroupedCEExplainer
        run: python ./tests/gce/test_gce.py

  build-glance-on-py310:
    # The type of runner that the job will run on
    runs-on: "${{ matrix.os }}"
    strategy:
      matrix:
        #os: [ubuntu-18.04, ubuntu-latest, macos-latest, windows-latest]
        os: [ubuntu-20.04, macos-latest, windows-latest]
        python-version: ["3.10"]

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      - name: Step 1 - checkout aix360 repository
        uses: actions/checkout@v3

      - name: Step 2 - set up python version
        uses: actions/setup-python@v4
        with:
          python-version: "${{ matrix.python-version }}"

      - name: Step 3 - upgrade setuptools
        run: pip3 install pytest nbmake wheel --upgrade setuptools

      - name: Step 4 - Install aix360 with GLANCE algorithm related dependencies
        run: pip3 install .[glance]

      - name: Step 5 - Test Base
        run: pytest ./tests/glance/test_base.py

      - name: Step 6 - Test Counterfactual Costs
        run: pytest ./tests/glance/test_counterfactual_costs.py

      - name: Step 7 - Test Counterfactual Tree
        run: pytest ./tests/glance/test_counterfactual_tree.py

      - name: Step 8 - Test Iterative Merges
        run: pytest ./tests/glance/test_iterative_merges.py

      - name: Step 9 - Test KMeans
        run: pytest ./tests/glance/test_KMeans.py

      - name: Step 10 - Test Local Cfs
        run: pytest ./tests/glance/test_local_cfs.py

      - name: Step 11 - Test Node
        run: pytest ./tests/glance/test_node.py

      - name: Step 12 - Test Phase2
        run: pytest ./tests/glance/test_phase2.py

      - name: Step 13 - Test Utils
        run: pytest ./tests/glance/test_utils.py


Empty file.
115 changes: 115 additions & 0 deletions aix360/algorithms/glance/base.py
@@ -0,0 +1,115 @@
from abc import ABC, abstractmethod
import pandas as pd
import numpy as np


class ClusteringMethod(ABC):
    """
    Abstract base class for clustering methods.
    """

    def __init__(self):
        """
        Initialize the ClusteringMethod.
        """
        pass

    @abstractmethod
    def fit(self, data: pd.DataFrame):
        """
        Fit the clustering model on the given data.

        Parameters:
        - data (pd.DataFrame): DataFrame of input data to fit the model.
        """
        pass

    @abstractmethod
    def predict(self, instances: pd.DataFrame) -> np.ndarray:
        """
        Predict the cluster labels for the given instances.

        Parameters:
        - instances (pd.DataFrame): DataFrame of input instances.

        Returns:
        - cluster_labels (np.ndarray): Array of cluster labels for each instance.
        """
        pass


class LocalCounterfactualMethod(ABC):
    """
    Abstract base class for local counterfactual methods.
    """

    def __init__(self):
        """
        Initialize the LocalCounterfactualMethod.
        """
        pass

    @abstractmethod
    def fit(self, **kwargs):
        """
        Fit the counterfactual method.

        Parameters:
        - **kwargs: Additional keyword arguments for fitting.
        """
        pass

    @abstractmethod
    def explain_instances(
        self, instances: pd.DataFrame, num_counterfactuals: int
    ) -> pd.DataFrame:
        """
        Find the local counterfactuals for the given instances.

        Parameters:
        - instances (pd.DataFrame): DataFrame of input instances for which counterfactuals are desired.
        - num_counterfactuals (int): Number of counterfactuals to generate for each instance.

        Returns:
        - counterfactuals (pd.DataFrame): DataFrame of counterfactual instances.
        """
        pass


class GlobalCounterfactualMethod(ABC):
    """
    Abstract base class for global counterfactual methods.
    """

    def __init__(self, **kwargs):
        """
        Initialize the GlobalCounterfactualMethod.

        Parameters:
        - **kwargs: Additional keyword arguments for init.
        """
        pass

    @abstractmethod
    def fit(self, X, y, **kwargs):
        """
        Fit the counterfactual method.

        Parameters:
        - X: Input data used for fitting.
        - y: Targets corresponding to X.
        - **kwargs: Additional keyword arguments for fitting.
        """
        pass

    @abstractmethod
    def explain_group(self, instances: pd.DataFrame) -> pd.DataFrame:
        """
        Find the global counterfactuals for the given group of instances.

        Parameters:
        - instances (pd.DataFrame, optional): DataFrame of input instances for which global counterfactuals are desired.
            If None, explain the whole group of affected instances.

        Returns:
        - counterfactuals (pd.DataFrame): DataFrame of counterfactual instances.
        """
        pass
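
As an illustration of the contract these abstract bases impose, here is a minimal, stdlib-only sketch. `ClusteringSketch`, `ThresholdClustering`, and `can_instantiate` are hypothetical names that are not part of this PR. The sketch takes plain lists instead of DataFrames so that it runs without dependencies, and it only mirrors the `fit`/`predict` shape of `ClusteringMethod`.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ClusteringSketch(ABC):
    """Hypothetical stand-in mirroring the fit/predict contract of ClusteringMethod."""

    @abstractmethod
    def fit(self, data: List[float]) -> None:
        ...

    @abstractmethod
    def predict(self, instances: List[float]) -> List[int]:
        ...


class ThresholdClustering(ClusteringSketch):
    """Toy two-cluster method: splits 1-D values at the mean of the fitted data."""

    def __init__(self) -> None:
        self.threshold: Optional[float] = None

    def fit(self, data: List[float]) -> None:
        self.threshold = sum(data) / len(data)

    def predict(self, instances: List[float]) -> List[int]:
        if self.threshold is None:
            raise RuntimeError("call fit() before predict()")
        # Label 1 for values strictly above the threshold, 0 otherwise.
        return [int(value > self.threshold) for value in instances]


def can_instantiate(cls) -> bool:
    """Return False when cls still has unimplemented abstract methods."""
    try:
        cls()
    except TypeError:
        return False
    return True


method = ThresholdClustering()
method.fit([1.0, 2.0, 9.0, 10.0])  # mean of the fitted data is 5.5
labels = method.predict([0.0, 6.0, 5.5])
```

A subclass that leaves either abstract method unimplemented cannot be instantiated, which is the reason for declaring `fit` and `predict` with `@abstractmethod` rather than as plain methods.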
1 change: 1 addition & 0 deletions aix360/algorithms/glance/clustering/__init__.py
@@ -0,0 +1 @@
from .kmeans import KMeansMethod
60 changes: 60 additions & 0 deletions aix360/algorithms/glance/clustering/kmeans.py
@@ -0,0 +1,60 @@
from ..base import ClusteringMethod
from sklearn.cluster import KMeans


class KMeansMethod(ClusteringMethod):
    """
    Implementation of a clustering method using KMeans.

    This class provides an interface to apply KMeans clustering to a dataset.
    """

    def __init__(self, num_clusters, random_seed):
        """
        Initializes the KMeansMethod class.

        Parameters:
        ----------
        num_clusters : int
            The number of clusters to form as well as the number of centroids to generate.
        random_seed : int
            A seed for the random number generator to ensure reproducibility.
        """

        self.num_clusters = num_clusters
        self.random_seed = random_seed
        self.model = KMeans()

    def fit(self, data):
        """
        Fits the KMeans model on the provided dataset.

        Parameters:
        ----------
        data : array-like or sparse matrix, shape (n_samples, n_features)
            Training instances to cluster.

        Returns:
        -------
        None
        """
        self.model = KMeans(
            n_clusters=self.num_clusters, n_init=10, random_state=self.random_seed
        )
        self.model.fit(data)

    def predict(self, instances):
        """
        Predicts the nearest cluster each sample in the provided data belongs to.

        Parameters:
        ----------
        instances : array-like or sparse matrix, shape (n_samples, n_features)
            New data to predict.

        Returns:
        -------
        labels : array, shape (n_samples,)
            Index of the cluster each sample belongs to.
        """
        return self.model.predict(instances)
58 changes: 58 additions & 0 deletions aix360/algorithms/glance/counterfactual_costs.py
@@ -0,0 +1,58 @@
from typing import Callable, List, Dict
import numpy as np
import pandas as pd


def build_dist_func_dataframe(
    X: pd.DataFrame,
    numerical_columns: List[str],
    categorical_columns: List[str],
    n_bins: int = 10,
) -> Callable[[pd.DataFrame, pd.DataFrame], pd.Series]:
    """
    Builds and returns a custom distance function for computing distances between rows of two DataFrames based on specified numerical and categorical columns.

    For numerical columns, values are scaled by a per-column bin width, computed as the column's range in `X` divided by `n_bins`, so a scaled difference of 1.0 corresponds to one bin. Values are not rounded to discrete bins.
    The distance between numerical features is computed as the sum of the absolute differences between scaled values. For categorical columns, the distance is calculated as the number of mismatched categorical values.

    Parameters:
    ----------
    X : pd.DataFrame
        The reference DataFrame used to determine the bin widths for numerical columns.
    numerical_columns : List[str]
        List of column names in `X` that contain numerical features.
    categorical_columns : List[str]
        List of column names in `X` that contain categorical features.
    n_bins : int, optional
        The number of bins used to derive the bin width of each numerical column, by default 10.

    Returns:
    -------
    Callable[[pd.DataFrame, pd.DataFrame], pd.Series]
        A distance function that takes two DataFrames as input (`X1` and `X2`) and returns a Series of distances between corresponding rows in `X1` and `X2`.

    The distance function works as follows:
    - For numerical columns: the absolute differences between scaled values are summed.
    - For categorical columns: the number of mismatches between values is counted.
    """
    # Bin width per numerical column: (max - min) / n_bins.
    # Note: a constant column yields a zero width, so its scaled values become inf/NaN.
    feat_intervals = {
        col: ((max(X[col]) - min(X[col])) / n_bins) for col in numerical_columns
    }

    def bin_numericals(instances: pd.DataFrame):
        # Work on a copy so the caller's DataFrame is not modified.
        ret = instances.copy()
        for col in numerical_columns:
            ret[col] /= feat_intervals[col]
        return ret

    def dist_f(X1: pd.DataFrame, X2: pd.DataFrame) -> pd.Series:
        X1 = bin_numericals(X1)
        X2 = bin_numericals(X2)

        ret = (X1[numerical_columns] - X2[numerical_columns]).abs().sum(axis="columns")
        ret += (X1[categorical_columns] != X2[categorical_columns]).astype(int).sum(axis="columns")

        return ret

    return dist_f
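
To make the arithmetic of this distance concrete, the following is a small pure-Python restatement that works on dictionaries instead of DataFrames. It is only a sketch: `build_dist_func_rows`, the column names, and the sample rows are hypothetical and not part of this PR, and the function handles one pair of rows at a time rather than whole DataFrames.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, Union[float, str]]


def build_dist_func_rows(
    reference: List[Row],
    numerical_columns: List[str],
    categorical_columns: List[str],
    n_bins: int = 10,
) -> Callable[[Row, Row], float]:
    # Bin width per numerical column: (max - min) / n_bins over the reference rows.
    widths = {
        col: (max(r[col] for r in reference) - min(r[col] for r in reference)) / n_bins
        for col in numerical_columns
    }

    def dist(a: Row, b: Row) -> float:
        # Numerical part: absolute difference measured in bin widths.
        numerical = sum(abs(a[col] - b[col]) / widths[col] for col in numerical_columns)
        # Categorical part: one unit per mismatching column.
        categorical = sum(a[col] != b[col] for col in categorical_columns)
        return numerical + categorical

    return dist


# Hypothetical reference data: both numerical columns span a range of 50.
reference = [
    {"age": 20, "hours": 10, "job": "clerk"},
    {"age": 70, "hours": 60, "job": "nurse"},
]
dist = build_dist_func_rows(reference, ["age", "hours"], ["job"], n_bins=10)

factual = {"age": 30, "hours": 40, "job": "clerk"}
counterfactual = {"age": 40, "hours": 40, "job": "nurse"}
cost = dist(factual, counterfactual)
```

With a range of 50 and 10 bins, each bin is 5 wide, so the age change of 10 contributes 2.0, the unchanged hours contribute 0, and the job mismatch adds 1, for a total cost of 3.0.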

Empty file.