
zhaoxlpku/SubgoalXL


SubgoalXL: Subgoal-based Expert Learning for Theorem Proving



Introduction

This repository contains the code and resources for SubgoalXL, an AI-powered theorem-proving project that uses subgoal-based expert learning to tackle complex formal proofs in Isabelle with language models.

For more details, please refer to our paper.

Theorem-Proving Pipeline

Setup

To get started with model deployment and experimentation, follow these setup instructions.

1. Isabelle Environment Setup (TODO)

We provide a complete setup for the Isabelle environment to ensure smooth integration with the project’s scripts. Please refer to the Isabelle pipeline for detailed instructions.

2. Conda Environment Setup

Ensure that you have a conda environment with PyTorch and CUDA installed, then run the following commands to install the remaining dependencies:

cd project_directory
pip install -r requirements.txt
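If you are starting from a fresh machine, the full environment can be created like this (a minimal sketch; the environment name and Python version are placeholders, not project requirements — check requirements.txt for pinned versions):

```shell
# Create and activate a fresh conda environment
# (the name "subgoalxl" and Python 3.10 are placeholders)
conda create -n subgoalxl python=3.10 -y
conda activate subgoalxl

# Install a CUDA-enabled PyTorch build appropriate for your system
# (pick the exact command for your CUDA version from pytorch.org),
# then install the project's remaining dependencies
pip install torch
cd project_directory
pip install -r requirements.txt
```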

Data Resources

We provide curated datasets essential for initializing the expert learning phase. These datasets can be downloaded from our datasets hub. The full data preparation pipeline for generating these datasets is located in data preparation.

Training

We offer a comprehensive training pipeline for model development. Please refer to the training pipeline to get started.

Inference

Our inference pipeline is designed for applying models to new proof tasks efficiently. Please refer to the inference pipeline for instructions.
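As a rough illustration of the kind of input such a pipeline consumes, the sketch below assembles a prompt pairing an informal math problem with its Isabelle formal statement. The function name, field labels, and prompt layout are assumptions for illustration only, not the project's actual prompt format:

```python
# Illustrative only: a hypothetical prompt format pairing an informal
# statement with its Isabelle formalization, as a proof-generation
# model might consume it.

def build_proof_prompt(informal_statement: str, formal_statement: str) -> str:
    """Assemble a prompt asking a model to complete an Isabelle proof."""
    return (
        "Informal statement:\n"
        f"{informal_statement}\n\n"
        "Formal statement (Isabelle):\n"
        f"{formal_statement}\n\n"
        "Formal proof:\n"
    )

prompt = build_proof_prompt(
    "Show that the sum of two even integers is even.",
    'theorem even_sum: "even (a::int) ==> even b ==> even (a + b)"',
)
print(prompt.splitlines()[0])  # → Informal statement:
```

The completed prompt would then be passed to a trained model, whose sampled continuation is a candidate formal proof to be checked by the verification pipeline.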

Verification

To ensure the correctness of generated proofs, we provide a verification pipeline. Details can be found in the verification pipeline.

Model Weights

We provide our model weights, which you can use directly for model deployment and experimentation. The weights can be downloaded from our model hub.

Citation

If you use our code or data, please cite our paper:

@article{zhao2024subgoalxl,
  title   = {SubgoalXL: Subgoal-based Expert Learning for Theorem Proving},
  author  = {Zhao, Xueliang and Zheng, Lin and Bo, Haige and Hu, Changran and Thakker, Urmish and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2408.11172},
  url     = {https://arxiv.org/abs/2408.11172},
  year    = {2024}
}

Acknowledgements

Our research was greatly accelerated by SambaNova Systems' advanced technology, which enabled us to surpass the 1000 tokens per second barrier for large language model inference. This capability was critical in generating complex formal statements and proofs with exceptional speed, leveraging the outstanding performance of the SN20, SN30, and SN40 series chips.
