Official library to compute results for the BioCreative Task 1 competition. It is intended for BioCreative participants to test their systems before submitting them. The competition organizers will also use it to compute the official metrics.
Written in Python 3.8
Output is printed to the terminal.
- Python3
- pandas
- sklearn
You will need Python 3 (together with its standard library) as well as the pandas and scikit-learn (sklearn) packages.
To install all dependencies:
git clone https://github.com/tonifuc3m/drugprot-evaluation-library.git
cd drugprot-evaluation-library
pip install -r requirements.txt
To run the evaluation library, move to the src/ directory and execute the main.py script.
cd src
python main.py -g ../gs-data/gs_relations.tsv -p ../toy-data/pred_relations.tsv -e ../gs-data/gs_entities.tsv --pmids ../gs-data/pmids.txt
The relevant metrics are micro-averaged precision, recall and F1-score. The tool also lets you explore your results by relation type.
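As an illustration of what micro-averaging means here, the following is a minimal sketch that pools true positives over all relations before dividing. It is not the library's own implementation (which relies on sklearn); the function names `micro_prf` and `per_type_prf` and the tuple layout `(pmid, relation, arg1, arg2)` are assumptions made for this example.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over two collections of relation tuples."""
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)  # relations present in both sets
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def per_type_prf(gold, pred):
    """Same metrics, computed separately for each relation type (tuple index 1)."""
    gold, pred = set(gold), set(pred)
    relation_types = sorted({rel[1] for rel in gold | pred})
    return {
        rel_type: micro_prf(
            {rel for rel in gold if rel[1] == rel_type},
            {rel for rel in pred if rel[1] == rel_type},
        )
        for rel_type in relation_types
    }


# Hypothetical toy data: (pmid, relation, arg1, arg2)
gold = {
    ("12488248", "INHIBITOR", "Arg1:T1", "Arg2:T52"),
    ("12488248", "INHIBITOR", "Arg1:T2", "Arg2:T52"),
    ("23220562", "ACTIVATOR", "Arg1:T12", "Arg2:T42"),
}
pred = {
    ("12488248", "INHIBITOR", "Arg1:T1", "Arg2:T52"),
    ("23220562", "ACTIVATOR", "Arg1:T12", "Arg2:T42"),
    ("23220562", "ACTIVATOR", "Arg1:T12", "Arg2:T43"),
}

overall = micro_prf(gold, pred)
by_type = per_type_prf(gold, pred)
print("micro:", [round(value, 3) for value in overall])
for rel_type, scores in by_type.items():
    print(rel_type, [round(value, 3) for value in scores])
```

Because the counts are pooled before dividing, frequent relation types weigh more in the micro-average than rare ones.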
The Predictions TSV file contains the predicted relations. It must have these columns (separated by a \t):
- Article identifier (PMID)
- DrugProt relation
- Interactor argument 1 (of type CHEMICAL)
- Interactor argument 2 (of type GENE)
Example:
12488248 INHIBITOR Arg1:T1 Arg2:T52
12488248 INHIBITOR Arg1:T2 Arg2:T52
23220562 ACTIVATOR Arg1:T12 Arg2:T42
23220562 ACTIVATOR Arg1:T12 Arg2:T43
23220562 INDIRECT-DOWNREGULATOR Arg1:T1 Arg2:T14
For more in-depth information about the Data Format (Gold Standard and Predictions), have a look at the toy-data directory or at the Zenodo page.
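Before running the evaluation, it may help to check that a predictions file follows the four-column layout described above. The sketch below uses only the standard library; `read_predictions` is a hypothetical helper written for this example, not a function of the evaluation library, and the `Arg1:`/`Arg2:` prefix check is inferred from the example rows.

```python
import csv
import io


def read_predictions(handle):
    """Read tab-separated prediction rows and check their basic shape."""
    relations = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 tab-separated columns, got {len(row)}"
            )
        pmid, relation, arg1, arg2 = row
        # Assumption: arguments are written as in the example rows above.
        if not arg1.startswith("Arg1:") or not arg2.startswith("Arg2:"):
            raise ValueError(f"line {line_no}: arguments must look like Arg1:T1 and Arg2:T52")
        relations.append((pmid, relation, arg1, arg2))
    return relations


# In-memory sample; for a real file, pass open(path, newline="") instead.
sample = (
    "12488248\tINHIBITOR\tArg1:T1\tArg2:T52\n"
    "23220562\tACTIVATOR\tArg1:T12\tArg2:T42\n"
)
relations = read_predictions(io.StringIO(sample))
print(relations)
```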
- -g/--gs_path: path to Gold Standard relations TSV file
- -p/--pred_path: path to Prediction TSV file
- -e/--ent_path: path to Gold Standard entities TSV file
- --pmids: path to list of relevant PMIDs
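For readers who want to wrap the tool in their own script, the options listed here could be mirrored with `argparse` roughly as follows. This is a sketch only: it is not the parser in main.py, and marking every option as required is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Sketch of the evaluation CLI options")
    parser.add_argument("-g", "--gs_path", required=True,
                        help="path to Gold Standard relations TSV file")
    parser.add_argument("-p", "--pred_path", required=True,
                        help="path to Prediction TSV file")
    parser.add_argument("-e", "--ent_path", required=True,
                        help="path to Gold Standard entities TSV file")
    parser.add_argument("--pmids", required=True,
                        help="path to list of relevant PMIDs")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "-g", "../gs-data/gs_relations.tsv",
    "-p", "../toy-data/pred_relations.tsv",
    "-e", "../gs-data/gs_entities.tsv",
    "--pmids", "../gs-data/pmids.txt",
])
print(args.gs_path, args.pred_path, args.ent_path, args.pmids)
```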
$ cd src
$ python main.py -g ../gs-data/gs_relations.tsv -p ../toy-data/pred_relations.tsv -e ../gs-data/gs_entities.tsv --pmids ../gs-data/pmids.txt
Loading GS files...
Loading prediction files...
Checking GS files...
Checking Predictions files...
Formatting data...
Computing DrugProt (BioCreative VII) metrics ...
(p = Precision, r=Recall, f1 = F1 score)
By relation type
p_INDIRECT-DOWNREGULATOR=1.0
r_INDIRECT-DOWNREGULATOR=0.5
f1_INDIRECT-DOWNREGULATOR=0.667
p_DIRECT-REGULATOR=1.0
r_DIRECT-REGULATOR=1.0
f1_DIRECT-REGULATOR=1.0
p_ACTIVATOR=1.0
r_ACTIVATOR=1.0
f1_ACTIVATOR=1.0
p_INHIBITOR=1.0
r_INHIBITOR=0.8
f1_INHIBITOR=0.889
p_AGONIST=0.8
r_AGONIST=0.667
f1_AGONIST=0.727
p_ANTAGONIST=1.0
r_ANTAGONIST=1.0
f1_ANTAGONIST=1.0
p_PART-OF=1.0
r_PART-OF=0.5
f1_PART-OF=0.667
The following relations are not present in the Gold Standard: AGONIST-INHIBITOR,SUBSTRATE_PRODUCT-OF,AGONIST-ACTIVATOR,INDIRECT-UPREGULATOR,SUBSTRATE,PRODUCT-OF
The following relations are not present in the Predictions: SUBSTRATE,PRODUCT-OF,SUBSTRATE_PRODUCT-OF,AGONIST-ACTIVATOR
The following relations are not present in the Predictions: SUBSTRATE_PRODUCT-OF,PRODUCT-OF,AGONIST-ACTIVATOR,SUBSTRATE
Gobal results across all DrugProt relations (micro-average)
p_micro=0.783
r_micro=0.72
f1_micro=0.75
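The F1 values in this sample output can be sanity-checked from the reported precision and recall, since F1 is their harmonic mean. The snippet below is a small check written for this README, not part of the library; `f1_from` is a hypothetical helper, and the inputs are the rounded values printed above.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the sample output above
reported = {
    "micro": (0.783, 0.72),
    "INHIBITOR": (1.0, 0.8),
    "AGONIST": (0.8, 0.667),
    "PART-OF": (1.0, 0.5),
}
checked = {name: round(f1_from(p, r), 3) for name, (p, r) in reported.items()}
print(checked)
```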
If you find any bugs in the evaluation library, please contact me at antoniomiresc@gmail.com