Rango is a neural proof synthesis tool for the Coq theorem prover (see paper). Rango uses retrieval augmentation to adapt to its environment. This repository contains the code required for:
- Processing data to train Rango, proof retrievers and lemma retrievers
- Training Rango, proof retrievers and lemma retrievers
- Running Rango on proofs in CoqStoq
- Evaluating Rango on CoqStoq's testing set
CoqStoq is a benchmark for evaluating proof synthesis tools for Coq.
You can access the CoqStoq repository here. The CoqStoq repository simply enumerates the theorems in the CoqStoq benchmark and provides an environment for testing proof synthesis tools.
The easiest way to replicate our project is to use the replication package provided here.
Once you download the replication package, follow ARTIFACT.md to build a Docker image and run the replication commands in a Docker container.
You can find a high-level overview of the source code in MAP.md.
The following instructions apply if you want to set up this repository without Docker. Note that we ran most of our experiments using SLURM on a cluster. If you do not have SLURM, or you do not have access to GPUs, you will only be able to run a subset of the following commands.
- Install dependencies:
  - Install the repo:
    ```
    git clone --recurse-submodules https://github.com/rkthomps/coq-modeling
    cd coq-modeling
    pip3 install -e .
    cd coqpyt
    pip3 install .
    cd ../CoqStoq
    pip3 install -e .
    ```
  - Install opam and build the CoqStoq projects: refer to the CoqStoq README.
- Ensure you have the Rango model downloaded (TODO: Put Rango model on huggingface.)
- Ensure you have CoqStoq properly built: run `cd CoqStoq` and then `pytest`.
- Ensure the CoqStoq data is arranged as follows. Rango assumes that the data has the following directory structure during evaluation:
  ```
  <name-of-dataset>
    /data_points
    /repos
    <name-of-dataset>-sentences.db
  ```
  - The `data_points` folder contains DatasetFile objects that let Rango know what premises and proofs are available in the context when synthesizing a proof. There is one file in this folder for every `.v` file in the `repos` folder.
  - The `repos` folder contains all of the repositories that have the theorems on which Rango will be evaluated.
**Example**
Suppose I wanted to create a dataset called "coqstoq-test" consisting of the theorems from the testing split of CoqStoq. I would do the following:
1. `mkdir coqstoq-test`
2. `ln -s CoqStoq/test-repos coqstoq-test/repos`
3. ```
   rm -rf ~/.cache/coqpyt_cache
   python3 scripts/create_coqstoq_data_points CoqStoq test coqstoq-test/data_points coqstoq-test/coqstoq-test-sentences.db
   ```
- "CoqStoq" is the path to the CoqStoq repository
- "test" is the split of CoqStoq for which we want to create data
- "coqstoq-test/data_points" is where we want to save the data points
- "coqstoq-test-sentences.db" is where we want to save the sentencedb (Contains shared premises between files.)
You can run Rango on a dataset like the one above with either of the following commands:
```
python3 src/evaluation/eval.py \
  --conf_loc=example_confs/eval/coqstoq-test.yaml \
  --slurm_conf=example_confs/slurm/gpu8.yaml
```

```
python3 src/evaluation/eval.py \
  --conf_loc=example_confs/eval/coqstoq-test.yaml \
  --n_workers=1
```
The former requires access to a SLURM cluster.
The latter will run the evaluation with one worker.
Note that the configuration for the evaluation is in the file `example_confs/eval/coqstoq-test.yaml`. Depending on what you are evaluating, you will likely have to change paths in this configuration file.
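One way to handle this, sketched below, is to copy the example configuration and point the evaluation at your copy; the file name `my-eval.yaml` is only an illustration:

```
cp example_confs/eval/coqstoq-test.yaml my-eval.yaml
# edit the paths in my-eval.yaml to match your dataset and model locations
python3 src/evaluation/eval.py --conf_loc=my-eval.yaml --n_workers=1
```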
Make sure you have a copy of the CoqStoq data_points files in the `raw-data/coq-dataset/data_points` subdirectory of your project.
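For example, if your CoqStoq data_points files currently live in some other directory, a symlink is one way to put them where the preprocessing expects them (the symlink is just one option; copying also works):

```
mkdir -p raw-data/coq-dataset
# use an absolute path for the symlink target
ln -s /absolute/path/to/data_points raw-data/coq-dataset/data_points
```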
Then, with access to a SLURM cluster, you may preprocess your dataset by running `bash slurm/example-jobs/create_dataset.sh`. This command creates a dataset following a configuration file specified by a constant in the script.
Example configuration files can be found in `example-confs/data/lm.yaml`, `example-confs/data/premise.yaml`, and `example-confs/data/rerank.yaml` for tactic generation, dense retrieval, and reranking, respectively.
Before using your processed data to train models, you must "consolidate" it into SQLite databases.
You can consolidate a dataset as follows: `python3 src/data_management/consolidate.py <split location> <dataset location> <new dataset location>`.
The split location is likely `splits/final-split.json`, but you can also use an inter-file split: `splits/random-split.json`.
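For instance, assuming your processed tactic-generation dataset was written to `processed-data/lm-dataset` (both dataset paths below are illustrative), the invocation might look like:

```
python3 src/data_management/consolidate.py \
  splits/final-split.json \
  processed-data/lm-dataset \
  processed-data/lm-dataset-consolidated
```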
Consolidating will create a directory with `train.db`, `val.db`, and `test.db` files containing training, validation, and testing examples.
You can train a model by running `sbatch slurm/example-jobs/train_decoder.sh`.
This command will use the configuration file stored in `confs/train/decoder.yaml`. Example configuration files for training can be found in `example-confs/train/decoder.yaml`.
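Since the training script reads `confs/train/decoder.yaml`, one simple setup (a sketch; what you edit depends on your dataset and model paths) is to copy the example configuration into that location before submitting the job:

```
mkdir -p confs/train
cp example-confs/train/decoder.yaml confs/train/decoder.yaml
# edit confs/train/decoder.yaml to point at your consolidated dataset
sbatch slurm/example-jobs/train_decoder.sh
```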
You can also train dense retrieval models and rerankers with the `train_select.sh` and `train_rerank.sh` scripts in the `slurm/example-jobs` directory.
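For example, assuming the same SLURM setup as above:

```
sbatch slurm/example-jobs/train_select.sh
sbatch slurm/example-jobs/train_rerank.sh
```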