Efficient Event Extraction with a Few Keywords and Examples.

Generate Weakly Supervised Training Data

Run python generate_weakly_supervised_data.py with the appropriate arguments, explained below. A newer version with minor updates, generate_weakly_supervised_data_v2.py, accepts the same set of arguments; it differs from the version used in the paper.

Prepare a corpus of unlabeled sentences and pass it as --train-file. An example train file is shown in train_corpus_file.jsonl.
Prepare a preprocess function that takes --train-file as its input argument and returns a list of numerical sentence_ids and sentence strings. It can be stored in a separate Python script and passed by path, e.g., --preprocess-func myutils.utils.preprocess. The default preprocessing accepts jsonl files (each line being a JSON string) in which every entry has a 'sentence' key holding the sentence string; the sentence_id is automatically assigned as the line number.
Prepare label keywords as in label_info.json, and pass the file as the --label-json argument.
(Optional, but improves performance) Prepare a few annotated example sentences in a format similar to example.json, and pass the file as --example-json.
--encoding-save-dir and --corpus-jsonl are the directory and path used to save intermediate outputs.
--output-save-dir is the directory to save the generated outputs.
--threshold is the threshold for annotation (refer to the paper). A value of 0.65-0.75 usually works if you don't have enough example data to decide the value.
--evaluate (action='store_true'): instead of setting --threshold, use this flag to find the best threshold on the --example-json data.
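
As a rough illustration of the preprocess step described above, the following sketch mirrors the documented default behavior: it reads a jsonl file, takes the 'sentence' key of each entry, and uses the line number as the sentence_id. The exact return shape expected by the script is not spelled out here, so returning a pair of parallel lists (ids, sentences) and counting lines from zero are both assumptions; check generate_weakly_supervised_data.py before relying on them.

```python
import json
from typing import List, Tuple


def preprocess(train_file: str) -> Tuple[List[int], List[str]]:
    """Read a jsonl corpus and return (sentence_ids, sentences).

    Assumption: the caller wants two parallel lists, and sentence ids
    are zero-based line numbers. Neither detail is confirmed by the
    README, so adjust to match the actual script.
    """
    sentence_ids: List[int] = []
    sentences: List[str] = []
    with open(train_file, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            line = line.strip()
            if not line:
                # Skip blank lines; ids of later lines stay equal to
                # their position in the file.
                continue
            entry = json.loads(line)
            sentence_ids.append(line_number)
            sentences.append(entry["sentence"])
    return sentence_ids, sentences
```

If this function were saved as myutils/utils.py, it would be referenced as --preprocess-func myutils.utils.preprocess, following the example given in the argument list.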

Run Training

Run python run_train.py --example-regularization --weak-annotation kw --root <root_dir>, with the arguments explained below.

--root is a directory containing a ./data subdirectory, which holds data/label_info.json, data/example_json, and the generated weakly supervised data named weakly_supervised_data_kw.jsonl. A development file and a test file (named dev/test.char.jsonl respectively) are required, and the script will run evaluation on them. If no such files are available, you can simply duplicate and rename the weakly supervised training file.
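
The duplicate-and-rename fallback can be sketched as a small helper. This is a hypothetical utility, not part of the repository: the function name fill_missing_eval_files is made up for illustration, and the file names dev.char.jsonl and test.char.jsonl are an interpretation of "dev/test.char.jsonl" that should be checked against run_train.py.

```python
import shutil
from pathlib import Path
from typing import List

WEAK_FILE = "weakly_supervised_data_kw.jsonl"
# Assumed expansion of "dev/test.char.jsonl"; verify against run_train.py.
EVAL_FILES = ("dev.char.jsonl", "test.char.jsonl")


def fill_missing_eval_files(root: str) -> List[str]:
    """Copy the weakly supervised file to any missing dev/test file.

    Returns the names of the files that were created. Existing dev or
    test files are left untouched.
    """
    data_dir = Path(root) / "data"
    weak_path = data_dir / WEAK_FILE
    if not weak_path.is_file():
        raise FileNotFoundError(f"missing weakly supervised data: {weak_path}")

    created: List[str] = []
    for name in EVAL_FILES:
        target = data_dir / name
        if not target.exists():
            shutil.copyfile(weak_path, target)
            created.append(name)
    return created
```

Note that scores reported on files created this way only measure fit to the weak labels, since the evaluation data is a copy of the training data.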

The trained model will be saved at ./log.

Perfec-Yu/efficient-event-extraction