This repository contains the implementation of several experiments on predicting sensitive concepts (protected attributes such as ethnicity or gender) in language models, with the goal of enhancing the models' interpretability. It includes the code to reproduce the following papers:
- Sarah Schröder, Alexander Schulz and Barbara Hammer. "Evaluating Concept Discovery Methods for Sensitive Attributes in Language Models". Accepted at ESANN 2025.
TODO
- BIOS
- TwitterAAE
- Jigsaw Unintended Bias
- CrowS-Pairs
- Huggingface Models (using this Wrapper)
- OpenAI Embedding Models
- Concept Activation Vectors (CAV)
- Concept Bottleneck Models (CBM)
- Bias Subspaces (referring to semantic bias scores [1][2]; our implementation is based on [1])
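As a quick illustration of the first method above, here is a minimal sketch of computing a Concept Activation Vector (CAV): a linear classifier is trained to separate activations of concept examples from random examples, and its normalized weight vector is the CAV. This uses synthetic numpy arrays as stand-in activations and scikit-learn's logistic regression; the repository's actual pipeline and interfaces may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 16

# Synthetic "activations": concept examples are shifted along a hidden
# direction, random examples are plain Gaussian noise. In practice these
# would be hidden states extracted from a language model.
hidden_direction = rng.normal(size=dim)
hidden_direction /= np.linalg.norm(hidden_direction)
concept_acts = rng.normal(size=(100, dim)) + 2.0 * hidden_direction
random_acts = rng.normal(size=(100, dim))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

# Train a linear probe; the normalized weight vector is the CAV.
clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# On this synthetic data, the CAV should align with the hidden direction.
alignment = abs(float(cav @ hidden_direction))
print(alignment > 0.8)
```

Sensitivity of a model output to the concept can then be estimated by projecting activations (or their gradients) onto `cav`.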
TODO refer to branch/ other readme
TODO
[1] "The SAME score: Improved cosine based bias score for word embeddings" (arXiv preprint; IEEE IJCNN paper)
[2] "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (arXiv preprint; NIPS paper)