SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

Introduction

This repository contains the code and resources for SubgoalXL, our theorem-proving project that uses language models to generate complex formal proofs in Isabelle.

For more details, please refer to our paper (arXiv:2408.11172).

Setup

To get started with model deployment and experimentation, follow these setup instructions.

1. Isabelle Environment Setup (TODO)

We provide a setup for the Isabelle environment that integrates with the project's scripts. Please refer to the Isabelle pipeline for detailed instructions.

2. Conda Environment Setup

Ensure that you have a conda environment with PyTorch and CUDA installed. Then run the following commands to install the required dependencies:

cd project_directory
pip install -r requirements.txt
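To confirm that the active conda environment meets the PyTorch and CUDA prerequisites above, a quick check like the one below can help. This is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical. It reports whether PyTorch is importable and whether it can see a CUDA device, and it does not crash if PyTorch is missing.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA GPU.

    Hypothetical helper for illustration; not part of the SubgoalXL repository.
    """
    report = {"torch_installed": False, "torch_version": None, "cuda_available": False}
    # find_spec returns None when the package cannot be found, so this check
    # does not raise ImportError if PyTorch is absent from the environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch  # imported lazily, only once we know it is installed

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # True only if PyTorch was built with CUDA support and a GPU is visible.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_environment())
```

Run it inside the activated conda environment. On a GPU machine with a CUDA build of PyTorch, `cuda_available` should be `True`. If it is `False`, reinstall PyTorch with CUDA support before running `pip install -r requirements.txt`.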

Data Resources

We provide curated datasets used to initialize the expert learning phase. These datasets can be downloaded from our datasets hub. The full pipeline for generating them is located in the data_preparation directory.

Training

We provide a training pipeline for model development in the training directory. Please refer to it to get started.

Inference

Our inference pipeline, located in the inference directory, applies trained models to new proof tasks. Please refer to it for instructions.

Verification

To check the correctness of generated proofs, we provide a verification pipeline in the verification directory. Please refer to it for details.

Model Weights

We provide our model weights, which you can use directly for model deployment and experimentation. The weights can be downloaded from our model hub.

Citation

If you use our code or data, please cite our paper:

@article{zhao2024subgoalxl,
  title   = {SubgoalXL: Subgoal-based Expert Learning for Theorem Proving},
  author  = {Zhao, Xueliang and Zheng, Lin and Bo, Haige and Hu, Changran and Thakker, Urmish and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2408.11172},
  url     = {https://arxiv.org/abs/2408.11172},
  year    = {2024},
}

Acknowledgements

Our research was greatly accelerated by SambaNova Systems' technology: their SN20, SN30, and SN40 series chips enabled large language model inference at more than 1,000 tokens per second. This throughput was critical for generating complex formal statements and proofs quickly.

