This repository contains the experimental code to reproduce the results in Dataless Knowledge Fusion by Merging Weights of Language Models, a paper to be published during the Eleventh International Conference on Learning Representations (ICLR 2023), to be held May 1-5, 2023 in Kigali, Rwanda.
@inproceedings{
jin2023dataless,
title={Dataless Knowledge Fusion by Merging Weights of Language Models},
author={Xisen Jin and Xiang Ren and Daniel Preotiuc-Pietro and Pengxiang Cheng},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=FCnohuR6AnM}
}
We used PyTorch 1.13.1. See requirements.txt for other requirements.
If you are just interested in the Regression Mean (RegMean) algorithm, please check regmean_demo.ipynb.
This is a standalone Jupyter notebook that merges two Hugging Face transformer models fine-tuned on GLUE. This file does not import files under src/.
Please download the unified emotion dataset in this repo. The files should be placed under PROJECT_ROOT/resources/emotion_splits in the following structure.
.
├── crowdflower
│ ├── dev.jsonl
│ ├── full.jsonl
│ ├── test.jsonl
│ └── train.jsonl
├── dailydialog
│ ├── dev.jsonl
│ ├── full.jsonl
│ ├── test.jsonl
│ └── train.jsonl
├── electoraltweets
│ ├── dev.jsonl
│ ├── full.jsonl
│ ├── test.jsonl
│ └── train.jsonl
├── emobank
│ ├── dev.jsonl
│ ├── full.jsonl
│ ├── test.jsonl
│ └── train.jsonl
...
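Before launching experiments, it can help to confirm that the layout above is complete. The following is a minimal sketch and not part of the repository: the helper name `missing_emotion_files` is hypothetical, and it assumes every dataset directory holds the four split files shown in the tree.

```python
from pathlib import Path

# Split files shown in the tree above; each dataset folder is assumed to hold all four.
EXPECTED_SPLITS = ("dev.jsonl", "full.jsonl", "test.jsonl", "train.jsonl")


def missing_emotion_files(root):
    """Return the expected split files that are absent under `root`.

    `root` would be PROJECT_ROOT/resources/emotion_splits. Dataset names are
    discovered from the subdirectories, so no dataset list is hard-coded.
    """
    root = Path(root)
    missing = []
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for split in EXPECTED_SPLITS:
            path = dataset_dir / split
            if not path.is_file():
                # Report paths relative to the root, e.g. "emobank/test.jsonl"
                missing.append(path.relative_to(root).as_posix())
    return missing
```

Calling `missing_emotion_files("resources/emotion_splits")` from PROJECT_ROOT returns an empty list when every discovered dataset has all four files.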
Please prepare CoNLL2003, OntoNotes, and Twitter NER datasets and place them under PROJECT_ROOT/resources/ner.
.
├── conll2003
│ ├── dev.conll
│ ├── test.conll
│ └── train.conll
├── ontonotes
│ ├── onto.development.bc.ner
│ ├── onto.development.bn.ner
│ ├── onto.development.mz.ner
│ ├── onto.development.nw.ner
│ ├── onto.development.tc.ner
│ ├── onto.development.wb.ner
│ ├── onto.test.bc.ner
│ ├── onto.test.bn.ner
│ ├── onto.test.mz.ner
│ ├── onto.test.nw.ner
│ ├── onto.test.tc.ner
│ ├── onto.test.wb.ner
│ ├── onto.train.bc.ner
│ ├── onto.train.bn.ner
│ ├── onto.train.mz.ner
│ ├── onto.train.nw.ner
│ ├── onto.train.tc.ner
│ └── onto.train.wb.ner
└── twitter
  ├── annotated.twitter-ner-20-21-tweet-dev-withcleaned.json
  ├── annotated.twitter-ner-20-21-tweet-test-withcleaned.json
  └── annotated.twitter-ner-20-21-tweet-train-withcleaned.json
Here, CoNLL and OntoNotes datasets contain entries in the CoNLL format.
CRICKET O Conll
- O Conll
LEICESTERSHIRE B-ORG Conll
TAKE O Conll
OVER O Conll
AT O Conll
TOP O Conll
AFTER O Conll
INNINGS O Conll
VICTORY O Conll
. O Conll
LONDON B-LOC Conll
1996-08-30 O Conll
...
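Rows like the ones above can be read with a few lines of Python. This is a minimal sketch rather than the loader used in src/: the function name `read_conll` is hypothetical, and it assumes that columns are whitespace-separated and that blank lines separate sentences, as is usual for CoNLL-style files.

```python
def read_conll(lines):
    """Group `token tag source` rows into (tokens, tags) sentences.

    Assumption: a blank line closes the current sentence. Only the first
    two columns (token and tag) are used.
    """
    sentences, tokens, tags = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:  # blank line: sentence boundary
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        tokens.append(parts[0])
        tags.append(parts[1])
    if tokens:  # the input may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


sample = [
    "LEICESTERSHIRE B-ORG Conll",
    "TAKE O Conll",
    "",
    "LONDON B-LOC Conll",
    "1996-08-30 O Conll",
]
print(read_conll(sample))
```

For a real file, pass the open file handle: `read_conll(open(path, encoding="utf-8"))`.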
Twitter NER contains one JSON dict per line.
{"text": "Spectacular skies over #Clonmel tonight http://t.co/OxclQkuyTp /via @niallodonovan #lastdayofautumn", "id": "539106999980797952", "entities": [{"startCharOffset": 24, "endOffset": 31, "endCharOffset": 31, "surface": "Clonmel", "startOffset": 24, "type": "LOC"}, {"startCharOffset": 69, "endOffset": 82, "endCharOffset": 82, "surface": "niallodonovan", "startOffset": 69, "type": "PER"}], "labels": ["O", "O", "O", "O", "B-LOC", "O", "O", "O", "O", "B-PER", "O", "O"], "tokens": ["Spectacular", "skies", "over", "#", "Clonmel", "tonight", "http://t.co/OxclQkuyTp", "/", "via", "@niallodonovan", "#", "lastdayofautumn"], "domain": "TWT"}
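Records in this shape can be read line by line with the standard json module. The sketch below is illustrative only: `read_twitter_ner` is a hypothetical helper, and the sample is an abbreviated record with the same "tokens" and "labels" fields as the one above.

```python
import json


def read_twitter_ner(lines):
    """Yield (tokens, labels) pairs from JSON-lines records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate empty lines
        record = json.loads(line)
        tokens, labels = record["tokens"], record["labels"]
        if len(tokens) != len(labels):
            raise ValueError(f"token/label length mismatch in record {record.get('id')}")
        yield tokens, labels


# Abbreviated record for illustration; real records carry more fields.
sample = (
    '{"id": "1", "tokens": ["skies", "over", "#", "Clonmel"], '
    '"labels": ["O", "O", "O", "B-LOC"]}'
)
for tokens, labels in read_twitter_ner([sample]):
    print(list(zip(tokens, labels)))
```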
GLUE datasets will be downloaded and loaded with Hugging Face's datasets library.
Please download pretrained models (e.g., RoBERTa-base) from the Hugging Face models repository and place them under PROJECT_ROOT/resources (e.g., PROJECT_ROOT/resources/roberta-base).
- --config_files: See under src/configs. The training module (src.run_experiments) requires three config files: default arguments (src/configs/defaults.yaml), a data config (under src/configs/datasets), and an exp config (under src/configs/exps).
- --filter_model: Useful when merging only a subset of the individual models specified in the data config; e.g., --filter_model model0 model1 performs pairwise merging of model0 and model1 (aliases such as model0 and model1 are defined in the data config).
- --templates: Config files may contain templates like {seed}. Template values should be specified on the command line (e.g., --templates seed=1).
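To make the --templates mechanism concrete, here is a minimal sketch of how key=value pairs from the command line could be substituted into {name} placeholders. It is not the repository's implementation: `parse_templates` and `fill_templates` are hypothetical helpers, and the config keys in the example are made up for illustration.

```python
def parse_templates(pairs):
    """Turn ['seed=1'] into {'seed': '1'}; values stay strings."""
    values = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        values[key] = value
    return values


def fill_templates(config, values):
    """Recursively substitute {name} placeholders in every string of a config."""
    if isinstance(config, str):
        return config.format(**values)  # raises KeyError for an unset template
    if isinstance(config, dict):
        return {k: fill_templates(v, values) for k, v in config.items()}
    if isinstance(config, list):
        return [fill_templates(v, values) for v in config]
    return config  # numbers, booleans, and None pass through unchanged


# Hypothetical config fragment, as it might look after loading a YAML file.
config = {"seed": "{seed}", "output_dir": "runs/emotion/seed{seed}", "epochs": 3}
print(fill_templates(config, parse_templates(["seed=1"])))
# {'seed': '1', 'output_dir': 'runs/emotion/seed1', 'epochs': 3}
```

Failing loudly on an unset template (the KeyError from `str.format`) is a deliberate choice in this sketch, so that a forgotten --templates value does not silently produce a path containing a literal {seed}.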
Individual models (before merging) will be trained and stored under local_zoo_dir specified in the config. If none of the individual models in the zoo match the given model type and zoo_filter arguments in the config, then the program will automatically train new individual models and store them under local_zoo_dir. If individual models are found in local_zoo_dir, they will be loaded without re-training.
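The load-or-train behavior described above can be pictured as a simple cache lookup. The sketch below is an assumption-laden illustration, not the code in src/: the zoo is modeled as a list of metadata dicts, zoo_filter as a dict of key/value pairs that must all match, and `load_or_train` and `fake_train` are hypothetical names.

```python
def load_or_train(zoo, model_type, zoo_filter, train_fn):
    """Return (entry, was_trained) for the requested individual model.

    Assumption: an entry matches when its model type equals `model_type`
    and every key/value pair in `zoo_filter` is present in the entry.
    """
    for entry in zoo:
        same_type = entry.get("model_type") == model_type
        passes_filter = all(entry.get(k) == v for k, v in zoo_filter.items())
        if same_type and passes_filter:
            return entry, False  # found in the zoo: reuse without re-training
    new_entry = train_fn(model_type, zoo_filter)
    zoo.append(new_entry)  # store so later runs can reuse it
    return new_entry, True


def fake_train(model_type, zoo_filter):
    """Stand-in for real training; returns only the metadata of the new model."""
    return {"model_type": model_type, **zoo_filter}


zoo = [{"model_type": "roberta-base", "dataset": "dailydialog", "seed": 1}]
print(load_or_train(zoo, "roberta-base", {"dataset": "dailydialog", "seed": 1}, fake_train))
print(load_or_train(zoo, "roberta-base", {"dataset": "crowdflower", "seed": 1}, fake_train))
```

The first call reuses the existing entry; the second finds no match, so it trains a new model and adds it to the zoo.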
Example: RegMean, Emotion, Same Head Init, Merging Model0 (dailydialog) and Model1 (crowdflower)
HF_DATASETS_OFFLINE=1 CUDA_VISIBLE_DEVICES=0 python -m src.run_experiments --config src/configs/defaults.yaml src/configs/datasets/emotion.yaml src/configs/exps/roberta-base/roberta-base-emotion.yaml --templates seed=1 --filter_model model0 model1
Merging two emotion classification models trained on different datasets (domains).
- Emotion, RoBERTa-base: scripts/roberta/pairwise_emotion.py
- Emotion, T5-base: scripts/t5/pairwise_emotion.py
- Emotion, RoBERTa-base: scripts/t5/pairwise_emotion.py
Merging two models trained on different GLUE tasks. Task-specific classification heads are not merged.
- GLUE, DistilBERT-base: scripts/distilbert/pairwise_glue_difftask.py
- GLUE, RoBERTa-base: scripts/roberta/pairwise_glue_difftask.py
Merging two models trained on two non-IID partitions of the same GLUE task.
- GLUE, DistilBERT-base: scripts/distilbert/pairwise_glue_subset.py
- GLUE, RoBERTa-base: scripts/roberta/pairwise_glue_subset.py
Greedily merging multiple (two to all) models in the order of OOD performance of individual models.
- Emotion, RoBERTa-base: scripts/roberta/incremental_emotion.py
- Emotion, T5-base: scripts/t5/incremental_emotion.py
- Emotion, DeBERTa-large: scripts/deberta/incremental_emotion.py
- NER, RoBERTa-base: scripts/roberta/incremental_ner.py
- NER, DeBERTa-large: scripts/deberta/incremental_ner.py
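The greedy schedule can be sketched as ranking the individual models by OOD performance and then merging the top two, the top three, and so on up to all of them. This is an illustration only: `incremental_merge_sets` is a hypothetical helper, the scores are made-up numbers, and ordering from best to worst is an assumption about how the scripts rank models.

```python
def incremental_merge_sets(ood_scores):
    """Return the nested model subsets to merge, from two models up to all.

    `ood_scores` maps a model alias to its OOD score (higher is assumed
    to be better). Models are added in descending score order.
    """
    ranked = sorted(ood_scores, key=ood_scores.get, reverse=True)
    return [ranked[:k] for k in range(2, len(ranked) + 1)]


# Made-up scores, for illustration only.
scores = {"model0": 0.41, "model1": 0.47, "model2": 0.35}
print(incremental_merge_sets(scores))
# [['model1', 'model0'], ['model1', 'model0', 'model2']]
```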
Please note these scripts run inference on both in-domain and out-of-domain test sets.
Each script above runs Simple, Fisher, and RegMean averaging. For comparison, they also run Multi-Task Learning (MTL) and model ensembling, and report the performance of the individual models (before merging). To run only part of a script, comment out the corresponding lines inside it.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
This project has adopted a Code of Conduct. If you have any concerns about the Code, or behavior which you have experienced in the project, please contact us at [email protected].
If you believe you have identified a security vulnerability in this project, please send an email to the project team at [email protected] detailing the suspected issue and any methods you've found to reproduce it.
Please do NOT open an issue in the GitHub repository, as we'd prefer to keep vulnerability reports private until we've had an opportunity to review and address them.