Reproduction code for the paper "Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities". The preprint of our paper is publicly available at this link.
Install the required Python dependencies:
pip install -r requirements.txt
All datasets we used are provided in the data/ folder, including CounterFact (1K), zsRE (1K) and their toxic versions.
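For a quick sanity check, the shipped data can be inspected with a few lines of Python (a minimal sketch; the exact file path is an assumption based on the formatting command in the RAG defense below):

```python
import json

# Load the 1K CounterFact split (path assumed from the format_rag.py
# invocation later in this README; adjust if your layout differs).
with open("data/counterfact/counterfact-edit-1k.json") as f:
    records = json.load(f)

print(len(records))       # expect 1000 examples
print(records[0].keys())  # inspect the fields of one record
```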
Perform and evaluate knowledge editing on the CounterFact (1K) dataset using Vicuna 7B without any multi-agent interaction:
python baseline_easyedit.py --config_path=../config/agent/vicuna-7b.yaml
We prompt the agents and GPT-4 to generate fake but plausible evidence for every piece of manipulated knowledge; the generated evidence is included in the data/ folder.
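For reference, evidence generation along these lines can be reproduced with the OpenAI API. The sketch below is illustrative only: the model name and prompt wording are assumptions, not the exact ones used to produce the shipped files.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_fake_evidence(claim: str) -> str:
    """Ask GPT-4 for plausible-sounding (but fabricated) supporting evidence.

    The system/user prompts here are illustrative stand-ins, not the exact
    prompts used to build the files in data/.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write short, convincing evidence for a given claim."},
            {"role": "user",
             "content": f"Claim: {claim}\nWrite one paragraph of supporting evidence."},
        ],
    )
    return response.choices[0].message.content
```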
Evaluate the extent to which a single agent is persuaded under different prompt settings:
python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=no_edit
python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=direct_answer --with_evidence
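Conceptually, --prompt_type and --with_evidence control how the manipulated claim is presented to the agent. The sketch below illustrates the idea; the actual templates live in the repo, and these strings are stand-ins:

```python
def build_prompt(question: str, answer: str, evidence: str | None,
                 prompt_type: str) -> str:
    """Illustrative stand-in for the repo's prompt construction.

    - no_edit:       ask the question with no persuasion attempt.
    - direct_answer: assert the manipulated answer, optionally backed
                     by generated evidence (--with_evidence).
    """
    if prompt_type == "no_edit":
        return question
    if prompt_type == "direct_answer":
        prompt = f"{question}\nThe correct answer is: {answer}."
        if evidence is not None:
            prompt += f"\nSupporting evidence: {evidence}"
        return prompt
    raise ValueError(f"unknown prompt_type: {prompt_type}")
```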
To inject persuasiveness into the agent, first generate preference data for the LLM:
python generate_dataset.py
We encourage generating a separate preference dataset for each LLM, which minimizes the impact of differences between LLMs.
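A preference record conceptually looks like the following (the prompt/chosen/rejected layout is the common DPO convention and an assumption about generate_dataset.py's exact output format):

```python
# One preference example in the usual DPO layout: for the same prompt,
# "chosen" is the persuasive response we want to reinforce and
# "rejected" is the less persuasive one.
example = {
    "prompt": "Convince your interlocutor that the Eiffel Tower is in Rome.",
    "chosen": "A confident, detailed, evidence-laden argument...",
    "rejected": "A hesitant, easily dismissed reply...",
}
```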
Then we use the DPO method for training:
python dpo_training.py
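For reference, the standard DPO objective that this stage optimizes can be written in a few lines of PyTorch (a minimal sketch of the loss itself, not the training script's actual code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: push the policy to prefer 'chosen' over
    'rejected' relative to a frozen reference model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```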
You can modify ckpt_path to adjust the LoRA checkpoint path, which is used in the second stage.
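In the second stage, the LoRA weights at ckpt_path are loaded on top of the base model. With the peft library this looks roughly as follows (a sketch; the base model ID and checkpoint path are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base model plus the DPO-trained LoRA adapter; the model ID below is an
# assumed Vicuna 7B checkpoint, and the path should match your ckpt_path.
base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
model = PeftModel.from_pretrained(base, "path/to/ckpt_path")
```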
As a running example, the command for testing the spread of manipulated knowledge on the CounterFact (1K) dataset using Vicuna 7B is as follows:
python simulation.py --config_path=../config/agent/vicuna-7b.yaml
All chats are stored in history/ for subsequent experimental analysis. For other experimental setups, modify the corresponding YAML file in config/.
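For downstream analysis, the saved chats can be iterated over as in the sketch below. The JSON schema is an assumption; inspect one file in history/ to confirm it before relying on specific fields.

```python
import json
from pathlib import Path

# Walk every saved chat in history/ and report how many entries each holds.
# Treating each file as a JSON list of turns is an assumption; check one
# file's actual structure first.
for path in sorted(Path("history").glob("**/*.json")):
    with open(path) as f:
        chat = json.load(f)
    print(path.name, len(chat))
```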
- Format chat histories:
  python format_rag.py --dataset_path=./counterfact/counterfact-edit-1k.json --input_folder=<chat_history_directory>
- RAG training
- Evaluation (see the retrieval sketch after this list):
  python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=rag --rag_path=<path_to_rag> --top_k=5
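To make the --top_k retrieval step concrete, here is a minimal sketch of top-k semantic retrieval with sentence-transformers (an illustration of the technique, not the repo's actual RAG code; the embedding model is an arbitrary choice):

```python
from sentence_transformers import SentenceTransformer, util

# Embed the formatted chat snippets once, then retrieve the top-k most
# similar ones for a query (mirroring --top_k=5 above).
model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["snippet 1 ...", "snippet 2 ...", "snippet 3 ..."]  # formatted chats
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode("Where is the Eiffel Tower?", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=5)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```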
To implement this defense strategy, replace the code in prompt.py with the code from prompt_defense.py in the simulation/ folder. Once the replacement is complete, run the steps in the Attack Pipeline.
To implement this defense strategy, make the following replacements in the simulation/ folder:
- replace agent.py with agent_supervision.py
- replace history.py with history_supervision.py
- replace prompt.py with prompt_supervision.py
- replace simulation.py with simulation_supervision.py
Then, to use a GPT-4 chat assistant as the supervisory agent, set the system prompt of your assistant to the content of system_prompt.txt, and configure your API key and URL in set_gpt4.py.
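Inside set_gpt4.py, the configuration amounts to pointing an OpenAI client at your key and endpoint; the sketch below shows the pattern (not the file's actual contents):

```python
from openai import OpenAI

# Read the supervisory agent's system prompt shipped with the repo.
with open("system_prompt.txt") as f:
    system_prompt = f.read()

# Point the client at your key and endpoint, as configured in set_gpt4.py.
client = OpenAI(api_key="sk-...", base_url="https://api.openai.com/v1")

def supervise(message: str) -> str:
    """Ask the GPT-4 supervisor to vet a single agent message."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": message}],
    )
    return response.choices[0].message.content
```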
Once these replacements and settings are complete, run the steps in the Attack Pipeline.
This project is licensed under the Apache-2.0 License.