This is the source code for our paper "Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL" (EMNLP 2019).
In this paper, we propose to leverage adjective-noun phrasing knowledge mined from the web to predict the comparison relations in text-to-SQL. Experimental results on both the original and the re-split Spider dataset show that our approach achieves significant improvement over syntaxSQL and SQLNet on comparison relation prediction.
- The baseline code uses Python 2.7 and PyTorch 0.2.0 (GPU). Install the Python dependencies:
pip install -r requirements.txt
Alternatively, use Docker:
docker pull buaa1156/py27torch0.2cuda8vim:latest
- The preprocessing scripts require Python >= 3.5.
- The dataset comes from the Spider task website. The singletable and resplitdata splits used in our paper are under data/singletable and data/resplitdata, respectively.
- The knowledge used in this paper is under the folder data/knowledge.
- Download the pretrained GloVe embeddings and put them under both the syntaxSQL and SQLNet folders as glove/glove.42B.300d.txt
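
As a quick sanity check that the embedding file is in place and readable, the sketch below parses a GloVe text file, where each line holds a token followed by its vector components separated by spaces. The function name `load_glove` and its `limit` parameter are illustrative and are not part of this repository; the baselines ship their own loading code.

```python
import io
from typing import Dict, List, Optional


def load_glove(path: str, dim: int = 300, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read a GloVe text file: one token per line followed by `dim` floats.

    Hypothetical helper for checking the download; not the repository's loader.
    """
    vectors = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < dim + 1:
                continue  # skip blank or malformed lines
            # The last `dim` fields are the vector; whatever precedes them is the token.
            token = " ".join(parts[:-dim])
            vectors[token] = [float(x) for x in parts[-dim:]]
            if limit is not None and len(vectors) >= limit:
                break
    return vectors
```

For example, `load_glove("syntaxSQL/glove/glove.42B.300d.txt", limit=1000)` reads only the first 1000 vectors, which avoids holding the full file in memory just to confirm the path and format.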
- Download evaluation.py and process_sql.py from the Spider GitHub page, and evaluate the results following their instructions.
- Generate the train and dev data by running:
python3 preprocess_syntaxSQL.py train|dev singletable|resplitdata
- Preprocess knowledge features by running:
python3 preprocess_direction_features.py syntaxSQL singletable|resplitdata weighted|direct
- Run run_train.sh and run_test.sh under the directory syntaxSQL after setting data_type, feats_format, and DATE in the first lines of each script.
  - data_type: singletable or resplitdata
  - feats_format: weighted or direct
  - DATE: set automatically to the local time when training; must be assigned manually when testing
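
The two syntaxSQL preprocessing commands above take one value from each `a|b` pair per invocation (this reading of `|` as "choose one" is an assumption). A minimal sketch that enumerates every combination as an argument list follows; the function name `syntaxsql_preprocessing_commands` is hypothetical and only the script names and argument values come from the steps above.

```python
import itertools

# Argument values taken from the preprocessing steps in this README.
SPLITS = ("train", "dev")
DATA_TYPES = ("singletable", "resplitdata")
FEATS_FORMATS = ("weighted", "direct")


def syntaxsql_preprocessing_commands():
    """Return every syntaxSQL preprocessing command as a list of arguments."""
    # Data generation: one command per (data_type, split) pair.
    commands = [
        ["python3", "preprocess_syntaxSQL.py", split, data_type]
        for data_type, split in itertools.product(DATA_TYPES, SPLITS)
    ]
    # Knowledge features: one command per (data_type, feats_format) pair.
    commands += [
        ["python3", "preprocess_direction_features.py", "syntaxSQL", data_type, feats_format]
        for data_type, feats_format in itertools.product(DATA_TYPES, FEATS_FORMATS)
    ]
    return commands


if __name__ == "__main__":
    for cmd in syntaxsql_preprocessing_commands():
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.check_call` to run it. In practice only the combinations matching the chosen data_type and feats_format are needed.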
- Copy the files in the data/ directory to SQLNet/data/
- Preprocess knowledge features by running:
python3 preprocess_direction_features.py SQLNet singletable|resplitdata weighted|direct
- Run run_train.sh and run_test.sh under the directory SQLNet after setting data_type, feats_format, and DATE in the first lines of each script.
  - data_type: singletable or resplitdata
  - feats_format: weighted or direct
  - DATE: set automatically to the local time when training; must be assigned manually when testing
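
The copy step for SQLNet can also be scripted. The sketch below assumes it is run from the repository root and that sub-directories of data/ should be copied along with the files (the step above does not say either way). The function name `copy_data_for_sqlnet` is illustrative and not part of the repository.

```python
import os
import shutil


def copy_data_for_sqlnet(src="data", dst=os.path.join("SQLNet", "data")):
    """Copy every file under `src` into `dst`, keeping the sub-directory layout.

    Returns the list of destination paths that were written.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)  # create missing folders, tolerate existing ones
        for name in files:
            target = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(root, name), target)  # overwrites an existing file
            copied.append(target)
    return copied
```

Calling `copy_data_for_sqlnet()` with the defaults mirrors data/ into SQLNet/data/; existing files with the same name are overwritten.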
If you have any questions, please open an issue.
@inproceedings{liu2019leveraging,
title={Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL},
author={Liu, Haoyan and Fang, Lei and Liu, Qian and Chen, Bei and Lou, Jian-Guang and Li, Zhoujun},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
pages={3506--3511},
year={2019}
}
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.