Update README.md
phosseini authored Apr 18, 2022
1 parent 20dfb79 commit e08b566
Showing 1 changed file, README.md, with 11 additions and 7 deletions.

<img src='overview.png' width='600' height='300' style="vertical-align:middle;margin:100px 50px">
</p>

## Converting Knowledge Graphs to Text
### ATOMIC-to-Text
Triples in ATOMIC are stored in the form `(subject, relation, target)`. We convert (verbalize) these triples into natural-language text so that we can later use it for training/fine-tuning Pretrained Language Models (PLMs); a minimal sketch of the idea follows the steps below:
1. Download ATOMIC 2020 [here](https://allenai.org/data/atomic-2020), put the zip file in the `/data` folder, and unzip it (we only need `dev.tsv` and `train.tsv`).
2. Run the following code: [`atomic_to_text.py`](https://github.com/phosseini/causal-reasoning/blob/main/atomic_to_text.py) (this may take a while if the grammar check is enabled).
3. Outputs will be stored as `.txt` and `.csv` files in the `/data` folder, following the naming patterns `atomic2020_dev.*` and `atomic2020_train.*`.
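
At its core, the conversion fills a relation-specific template with the head and tail of each triple. Below is a minimal, self-contained sketch of that idea; the templates and the assumption of header-less `head`/`relation`/`tail` columns are illustrative, not necessarily the exact mapping used in `atomic_to_text.py`.

```python
import pandas as pd

# Hypothetical templates for a few ATOMIC 2020 relations; the real script
# may word these differently and covers more relations.
TEMPLATES = {
    "xEffect": "{head}. As a result, PersonX {tail}.",
    "xIntent": "{head}. PersonX wanted {tail}.",
    "Causes": "{head} causes {tail}.",
}

def verbalize(head, relation, tail):
    """Turn one (subject, relation, target) triple into a sentence."""
    template = TEMPLATES.get(relation)
    # "none" marks an empty target in ATOMIC; skip it and unknown relations.
    if template is None or not isinstance(tail, str) or tail == "none":
        return None
    return template.format(head=head.strip(), tail=tail.strip())

# Assumes the TSV stores head/relation/tail columns with no header row.
triples = pd.read_csv("data/train.tsv", sep="\t", names=["head", "relation", "tail"])
sentences = [s for s in (verbalize(*t) for t in triples.itertuples(index=False)) if s]
```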

### GLUCOSE-to-Text
1. Download GLUCOSE [here](https://tinyurl.com/yyeo92pt), unzip the file, and put the `GLUCOSE_training_data_final.csv` file in the `/data` folder.
2. Run the following code: [`glucose_to_text.py`](https://github.com/phosseini/causal-reasoning/blob/main/glucose_to_text.py) (a minimal sketch of the conversion idea follows these steps).
3. Output will be stored in: `data/glucose_train.csv`
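
In the same spirit, each GLUCOSE annotation stores a cause and an effect joined by a connective that can be split and rewritten as a plain sentence. A rough sketch, assuming the `1_specificNL` column and the `>Causes/Enables>` connective of the public GLUCOSE release (verify both against your copy of the file):

```python
import pandas as pd

df = pd.read_csv("data/GLUCOSE_training_data_final.csv")

rows = []
for cell in df["1_specificNL"].dropna():
    if cell == "escaped":  # GLUCOSE's marker for a skipped annotation
        continue
    parts = cell.split(">Causes/Enables>")
    if len(parts) == 2:
        cause, effect = (p.strip() for p in parts)
        rows.append(f"{cause} causes {effect}.")

pd.DataFrame({"text": rows}).to_csv("data/glucose_train.csv", index=False)
```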

## Continual Pretraining
Once we have converted the ATOMIC triples to text, we can continually pretrain a Pretrained Language Model (PLM), BERT in our case, on the converted text. We call this step **continual pretraining** since we further train the PLM with Masked Language Modeling (MLM), one of the objectives originally used to pretrain BERT. There are two steps for running the pretraining:
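
As background for those steps, here is a minimal sketch of MLM continual pretraining with HuggingFace Transformers; file paths and hyperparameters are placeholders, not the repo's actual training configuration.

```python
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# One verbalized sentence per line, e.g. the .txt output from the steps above.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="data/atomic2020_train.txt", block_size=128
)

# Randomly masks 15% of tokens; the model is trained to reconstruct them.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm_checkpoints", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```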
