SiyuanWangw/ULogic

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [Paper]

Illustration of Logic Scaffolding

This repository hosts the code for our logic scaffolding framework, covering primitive rule generation and rule composition, along with the data of our generated inferential rule base, ULogic.

Running the rule generation and probing

You can directly run primitive_rule_pipeline.ipynb and rule_composition.ipynb for primitive rule generation and rule composition, respectively, in the object interaction domain (used for illustration). The function (or step number and name) of each module in our scripts is clearly commented before each section.

For rule probing with respect to the different questions in Section 3.2, please run limit_analysis.ipynb.

Before running, you need to set your OpenAI API key.
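For example, assuming the scripts read the key from the standard OPENAI_API_KEY environment variable used by the OpenAI client library (check the notebooks for the exact mechanism they use), you could set it before launching Jupyter:

```shell
# Export the key for the current shell session; replace the placeholder
# with your actual key. Do not commit real keys to version control.
export OPENAI_API_KEY="<your-api-key>"
```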

Dataset

Data/ulogic.json provides all inferential rules generated by our framework, while Data/probing_subset.json provides a high-quality, author-verified subset of ULogic for probing LLMs. Each rule is formatted as follows:

{
   "s_rule": "CanObserve(Person X, Animal Y):- MoveTo(Person X, Region Z), Born(Animal Y, Region Z);",
   "v_rule": "If Person X moves to Region Z and Animal Y is born in Region Z, then Person X can observe Animal Y.",
   "domain": "accessibility",
   "depth": 0,
   "length": 2,
   "positive": true,
   "structure": "disjunctive",
   "label": true,
   "original_human_prediction": "2",
   "flipped_human_prediction": "1"
}

"label": true denotes that this instance is expert verified (by paper authors).

"original_human_prediction" and "flipped_human_prediction" are the probing results of AMT human over the original rules and their flipped counterparts (The definition can be found in the second paragraph of Sec. 3.1.). "original_human_prediction": "2" and "flipped_human_prediction": "1" mean that the original rule is classified as False/Wrong/No by human while the flipped rule is classified as True/Right/Yes.

Rule Distillation as Inference Engine

The training and validation data for rule distillation are provided in the RuleDistillation/Data/general folder. (training_data_d0.json is the training data and training_data_5000.json is the validation data.)

  1. To train the Inference Engine, run the following script:
cd RuleDistillation
bash run_finetune.sh
  2. For the inference stage, run this script (the test data are provided in the RuleDistillation/Data/human/processed folder, for conclusion generation, premise completion and premise generation, respectively):
bash run_generate.sh

Demo

We provide a demo for our inference engine.

Due to security restrictions on the server's external access ports, our hosted demo is no longer accessible.
We have uploaded the Streamlit implementation of our web demo instead. You can train a model following the rule distillation steps above, and run

CUDA_VISIBLE_DEVICES=0 streamlit run web_demo.py --server.fileWatcherType none --server.port 8888

to launch the demo on your local server.

Authors and Citation

This study was authored by Siyuan Wang, Zhongyu Wei, Yejin Choi and Xiang Ren. We encourage the use of our code and data in your research and kindly request that you cite our paper as follows:

@article{wang2024can,
  title={Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs},
  author={Wang, Siyuan and Wei, Zhongyu and Choi, Yejin and Ren, Xiang},
  journal={arXiv preprint arXiv:2402.11442},
  year={2024}
}
