Complex Imperative Program Induction From Terminal Rewards (CIPITR)

This repository contains the implementation of the program induction model proposed in the TACL paper "Complex Program Induction for Querying Knowledge Bases in the Absence of Gold Programs", along with links to download the associated datasets.

Currently, this code handles only program induction in which the input variables to the program are gold. For example, if KBQA requires entity, relation, and type linking on the query before program induction, this code feeds the oracle entity, relation, and type linker's output to CIPITR.

Datasets

Datasets for complex question answering over knowledge bases, used for evaluating CIPITR:

  1. Complex Sequential Question Answering (CSQA) dataset: https://amritasaha1812.github.io/CSQA/
  2. WebQuestionsSP dataset: https://www.microsoft.com/en-us/download/details.aspx?id=52763

Experiments on CQA

  • Step 1: For running the experiments on CQA (or any subset of CQA) with gold entity, relation, and type linking, we recommend using the tensorflow version.

  • Step 2: To do so, go to the CSQA_TACL_FINAL/code/NPI/tensorflow/gold_WikiData_CSQA folder. If you have not yet installed the dependencies, first run pip install -r requirements.txt inside CSQA_TACL_FINAL/code/NPI/tensorflow (see also Step 2 of the WebQuestionsSP section).

  • Step 3: Each experiment is configured with a parameter file (in the parameters folder). There are seven question types (simple, logical, verify, quanti, quanti_count, comp, comp_count), and each can be run either on the smaller subset of the dataset (the CQA subset with 100 QA pairs per question type) or on the full dataset. For example, to run on the simple question type on the CQA-100 subset, use the parameter file parameters_simple_small.json; to run on the full CQA dataset, use parameters_simple_big.json (small denotes the 100-QA-pair subset, big the full dataset). The naming scheme is summarized after this list.

  • Step 4: Create a folder named model.

  • Step 5: To train on any of the question categories (simple/logical/verify/quanti/quanti_count/comp/comp_count), run python train.py <parameter_file> <time-stamp> (time-stamp is the ID of the current experiment run). An example invocation is in run.sh. This will start training, dump the trained model in the model folder, and run validation.

  • Step 6: To load the trained model and run the test, run python load.py <parameter_file> <time-stamp> (use the same ID as during training).

  • Step 7: To download pre-trained models and log files:

  • For example, to train and test the tensorflow code on the simple question type on the 100-QA-pair subset of CQA:

    • cd CSQA_TACL_FINAL/code/NPI/tensorflow/gold_WikiData_CSQA
    • python train.py parameters/parameters_simple_small.json small_Jan_7 # this will create a folder model/simple_small_Jan_7 to dump the trained model
    • python load.py parameters/parameters_simple_small.json small_Jan_7 # this will run the trained model on the test data, as specified in the parameter file
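
For reference, the CQA parameter files follow the naming scheme below (derived from the file names above; small = 100-QA-pair subset, big = full CQA):

    parameters/parameters_<question_type>_<size>.json
        question_type ∈ {simple, logical, verify, quanti, quanti_count, comp, comp_count}
        size          ∈ {small, big}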

Experiments on WebQuestionsSP

  • Step 1: For experiments on the WebQuestionsSP dataset, download the preprocessed version of the dataset and the corresponding subset of Freebase, i.e. freebase_webqsp.zip (https://drive.google.com/file/d/1CuV4QJxknTqDmAaLwBfO1kyNW7IXTd1Q/view?usp=sharing).

  • Step 2: For running any of the tensorflow scripts, go to CSQA_TACL_FINAL/code/NPI/tensorflow and install the dependencies by running pip install -r requirements.txt.

  • Step 3: Similarly, for running any of the pytorch scripts, go to CSQA_TACL_FINAL/code/NPI/pytorch and install the dependencies by running pip install -r requirements.txt.

  • Step 4: Go to the CSQA_TACL_FINAL/code/NPI/pytorch/gold_FB_webQuestionsSP folder.

  • Step 5: Each experiment is configured with a parameter file (in the parameters folder). Experiments on the gold entity, relation, type (ERT) linking data take their parameters from the parameters/gold folder, while experiments on the noisy ERT linking data take theirs from the parameters/noisy folder. There are five question categories: 1infc and 1inf (questions with an inference chain of length 1, with and without an additional non-temporal constraint), 2infc and 2inf (questions with an inference chain of length 2, with and without an additional non-temporal constraint), and c_date (questions with temporal constraints and an inference chain of any length). Accordingly, parameter files in the gold folder are named parameters_<category>.json. For example, to run an experiment on questions with a length-1 inference chain and no constraint using gold ERT linker data, use the parameter file parameters/gold/parameters_1inf.json. The naming scheme is summarized after this list.

  • Step 6: Create a folder named model.

  • Step 7: To train on any of the question categories (1inf/1infc/2inf/2infc/c_date), run python train.py <parameter_file> <time-stamp> (time-stamp is the ID of the current experiment run). An example invocation is in run.sh. This will start training, dump the trained model in the model folder, and run validation.

  • Step 8: To load the trained model and run the test, run python load.py <parameter_file> <time-stamp> (use the same ID as during training).

  • Step 9: To download pre-trained models and log files:

  • For example, to train and test the pytorch code on the 1inf question type on WebQuestionsSP:

    • cd CSQA_TACL_FINAL/code/NPI/pytorch/gold_FB_webQuestionsSP
    • python train.py parameters/gold/parameters_1inf.json Jan_7 # this will create a folder model/1inf_Jan_7 to dump the trained model
    • python load.py parameters/gold/parameters_1inf.json Jan_7 # this will run the trained model on the test data with gold ERT linking, as specified in the parameter file
    • python load.py parameters/noisy/parameters_1inf.json Jan_7 # this will run the trained model on the test data with noisy ERT linking, as specified in the parameter file
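
For reference, the WebQuestionsSP parameter files follow the naming scheme below (for both the gold and noisy variants, as the commands above illustrate):

    parameters/<linking>/parameters_<category>.json
        linking  ∈ {gold, noisy}
        category ∈ {1inf, 1infc, 2inf, 2infc, c_date}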

RL Environment for CQA Dataset

We also provide a simple RL environment for question answering over the CQA dataset using the Wikidata knowledge base.

  • Step 1: The RL environment is located in the NPI/RL_ENVIRONMENT_CSQA/code/ directory.
  • Step 2: To use the environment, simply import the environment.py file.
  • Step 3: To instantiate an environment, you will need to pass in a parameter file. Sample parameter files are located in the parameters folder (a minimal usage sketch follows this list).
  • Step 4: Detailed instructions on instantiating and using an environment object are provided in the sample_env_usage.ipynb notebook.
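
As a rough illustration only, instantiation might look like the sketch below; the class name Environment and the parameter-file name are assumptions made here for illustration, so consult sample_env_usage.ipynb for the authoritative usage.

    # A minimal usage sketch, assuming the environment is exposed as a class
    # whose constructor takes a parameter-file path. The names below are
    # illustrative, not verified against the repository.
    from environment import Environment  # environment.py lives in NPI/RL_ENVIRONMENT_CSQA/code/

    env = Environment('parameters/sample_params.json')  # any sample file from the parameters folder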
