Update README.md
phosseini authored Apr 18, 2022
1 parent 20dfb79 commit e08b566
Showing 1 changed file, README.md, with 11 additions and 7 deletions.

<img src='overview.png' width='600' height='300' style="vertical-align:middle;margin:100px 50px">
</p>

## Converting Knowledge Graphs to Text
### ATOMIC-to-Text
Triples in ATOMIC are stored in the form `(subject, relation, target)`. We convert (verbalize) these triples into natural-language text so that we can later use it for training/fine-tuning Pretrained Language Models (PLMs); a minimal sketch of the idea follows the steps below:
1. Download ATOMIC 2020 [here](https://allenai.org/data/atomic-2020), put the zip file in the `/data` folder, and unzip it (we only need `dev.tsv` and `train.tsv`).
2. Run the following code: [`atomic_to_text.py`](https://github.com/phosseini/causal-reasoning/blob/main/atomic_to_text.py) (this may take a while if the grammar check is enabled).
3. Outputs will be stored as `.txt` and `.csv` files in the `/data` folder, following the naming patterns `atomic2020_dev.*` and `atomic2020_train.*`.
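
At its core, the conversion fills a relation-specific template with the head and tail of each triple. Below is a minimal, self-contained sketch of that idea; the templates and the assumption of header-less `head`/`relation`/`tail` columns are illustrative, not necessarily the exact mapping used in `atomic_to_text.py`.

```python
import pandas as pd

# Hypothetical templates for a few ATOMIC 2020 relations; the real script
# may word these differently and covers more relations.
TEMPLATES = {
    "xEffect": "{head}. As a result, PersonX {tail}.",
    "xIntent": "{head}. PersonX wanted {tail}.",
    "Causes": "{head} causes {tail}.",
}

def verbalize(head, relation, tail):
    """Turn one (subject, relation, target) triple into a sentence."""
    template = TEMPLATES.get(relation)
    # "none" marks an empty target in ATOMIC; skip it and unknown relations.
    if template is None or not isinstance(tail, str) or tail == "none":
        return None
    return template.format(head=head.strip(), tail=tail.strip())

# Assumes the TSV stores head/relation/tail columns with no header row.
triples = pd.read_csv("data/train.tsv", sep="\t", names=["head", "relation", "tail"])
sentences = [s for s in (verbalize(*t) for t in triples.itertuples(index=False)) if s]
```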

### GLUCOSE-to-Text
1. Download GLUCOSE [here](https://tinyurl.com/yyeo92pt), unzip the file, and put the `GLUCOSE_training_data_final.csv` file in the `/data` folder.
2. Run the following code: [`glucose_to_text.py`](https://github.com/phosseini/causal-reasoning/blob/main/glucose_to_text.py) (a minimal sketch of the conversion idea follows these steps).
3. Output will be stored in: `data/glucose_train.csv`
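
In the same spirit, each GLUCOSE annotation stores a cause and an effect joined by a connective that can be split and rewritten as a plain sentence. A rough sketch, assuming the `1_specificNL` column and the `>Causes/Enables>` connective of the public GLUCOSE release (verify both against your copy of the file):

```python
import pandas as pd

df = pd.read_csv("data/GLUCOSE_training_data_final.csv")

rows = []
for cell in df["1_specificNL"].dropna():
    if cell == "escaped":  # GLUCOSE's marker for a skipped annotation
        continue
    parts = cell.split(">Causes/Enables>")
    if len(parts) == 2:
        cause, effect = (p.strip() for p in parts)
        rows.append(f"{cause} causes {effect}.")

pd.DataFrame({"text": rows}).to_csv("data/glucose_train.csv", index=False)
```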

## Continual Pretraining
Once we have converted the ATOMIC triples to text, we can continually pretrain a Pretrained Language Model (PLM), BERT in our case, on the converted text. We call this step **continual pretraining** since we further train the PLM with Masked Language Modeling (MLM), one of the objectives originally used to pretrain BERT. There are two steps for running the pretraining:
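
As background for those steps, here is a minimal sketch of MLM continual pretraining with HuggingFace Transformers; file paths and hyperparameters are placeholders, not the repo's actual training configuration.

```python
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# One verbalized sentence per line, e.g. the .txt output from the steps above.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="data/atomic2020_train.txt", block_size=128
)

# Randomly masks 15% of tokens; the model is trained to reconstruct them.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm_checkpoints", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```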
