
PreSeCoLM (Predicting Sensitive Concepts in Language Models)

This repository contains implementations of experiments on predicting sensitive concepts (protected attributes such as ethnicity or gender) in language models, with the goal of improving model interpretability. It includes the code to reproduce the following papers:

  • Sarah Schröder, Alexander Schulz and Barbara Hammer. "Evaluating Concept Discovery Methods for Sensitive Attributes in Language Models". Accepted at ESANN 2025.

Installation

TODO

Experiment Details

Currently Used Datasets

  • BIOS
  • TwitterAAE
  • Jigsaw Unintended Bias
  • CrowSPairs

Currently Supported Language Models

  • Huggingface Models (using this Wrapper)
  • OpenAI Embedding Models

Concept Prediction Methods

  • Concept Activation Vectors (CAV)
  • Concept Bottleneck Models (CBM)
  • Bias Subspaces (referring to semantic bias scores [1][2]; our implementation is based on [1])
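To illustrate the first and last of these methods, here is a minimal sketch on synthetic embeddings (the repository's actual implementations may differ): a CAV is the weight vector of a linear probe trained to separate embeddings with and without the concept, and a simple bias direction can be obtained as the difference of class means. All data and dimensions below are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "embeddings": 200 samples, 16 dims; the concept shifts dim 0.
n, d = 200, 16
labels = rng.integers(0, 2, size=n)      # 1 = concept present (e.g. a protected attribute)
emb = rng.normal(size=(n, d))
emb[:, 0] += 2.0 * labels                # inject a concept direction

# --- Concept Activation Vector: normalized weights of a linear probe ---
probe = LogisticRegression().fit(emb, labels)
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Concept activation of a new embedding = projection onto the CAV.
x = rng.normal(size=d)
activation = float(x @ cav)

# --- Bias direction (difference-of-means variant, in the spirit of [1]) ---
bias_dir = emb[labels == 1].mean(axis=0) - emb[labels == 0].mean(axis=0)
bias_dir /= np.linalg.norm(bias_dir)

# On this toy data, the learned CAV and the mean-difference
# direction should point roughly the same way.
alignment = abs(float(cav @ bias_dir))
```

In practice the embeddings would come from one of the supported language models rather than a random generator, and the concept labels from one of the datasets above.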

ESANN 2025 Experiments

TODO refer to branch/ other readme

Cite this

TODO

References

[1] "The SAME score: Improved cosine based bias score for word embeddings" (arXiv; IEEE IJCNN).
[2] "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (arXiv; NIPS).
