This repository contains our implementation of concept length predictors in the ALC description logic.
- Clone this repository:
- Install Anaconda3, then install all required libraries by executing the following commands (Linux):
conda create -n clip python==3.11.5 && conda activate clip
pip install -r requirements.txt
git clone && cd Ontolearn && git checkout 0.5.4 && pip install -e .
Download DL-Learner-1.4.0 from GitHub and extract it into this repository (cloned above)
Clone DLFoil and DLFocl (dlfoil, dlfocl) and extract the two repositories into
Install Java (version 8+) and Apache Maven (only necessary for running DL-Learner and DL-Foil/DL-Focl)
- Download datasets and extract the zip file into
and rename the folder as Datasets
Open a terminal and navigate into /reproduce_results/: cd LearnALCLengths/reproduce_results/
- Reproduce CLIP concept learning results on all KBs
- Reproduce the training of concept length predictors
- Furthermore, one can train concept length predictors on a single knowledge base as follows
where K is one of carcinogenesis, mutagenesis, semantic_bible or vicodi. Use -h to see more training options (example: python -h)
Open a terminal and navigate into /other_learning_systems/scripts: cd LearnALCLengths/dllearner/scripts
- Reproduce concept learning results on knowledge base K for algorithm Algo
python --learning_systems Algo --knowledge_bases K
- To reproduce the results for multiple algorithms on multiple knowledge bases, use the schema
python --learning_systems Algo1 Algo2... --knowledge_bases K1 K2...
Note that Algo is one of celoe, ocel or eltl, and K is one of carcinogenesis, mutagenesis, semantic_bible or vicodi (all lowercase).
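The command schema above can be assembled and checked programmatically before launching a long run. The following is a minimal sketch: the allowed names come from the note above, while `build_command` and the script name passed to it (`run_script.py` in the usage note) are hypothetical placeholders, since the actual script name is not given here.

```python
# Allowed values, taken from the note above (all lowercase).
ALGORITHMS = {"celoe", "ocel", "eltl"}
KNOWLEDGE_BASES = {"carcinogenesis", "mutagenesis", "semantic_bible", "vicodi"}


def build_command(script, learning_systems, knowledge_bases):
    """Return the argument list for one run over several algorithms and KBs.

    `script` is a placeholder for the actual script name, which this sketch
    does not know; only the two flags come from the README.
    """
    unknown_algos = [a for a in learning_systems if a not in ALGORITHMS]
    unknown_kbs = [k for k in knowledge_bases if k not in KNOWLEDGE_BASES]
    if unknown_algos or unknown_kbs:
        raise ValueError(f"unsupported names: {unknown_algos + unknown_kbs}")
    return [
        "python", script,
        "--learning_systems", *learning_systems,
        "--knowledge_bases", *knowledge_bases,
    ]
```

For example, `build_command("run_script.py", ["celoe", "ocel"], ["vicodi"])` yields a list that can be handed to `subprocess.run`; a name such as `CELOE` is rejected because the names must be lowercase.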
For DLFoil, open a terminal and navigate into /dl-foil/DLFoil2: cd LearnALCLengths/dl-foil/DLFoil2
- Run
mvn clean install
- Open a different terminal and run the following
python LearnALCLengths/generators/
- Now execute the following in the first terminal (in LearnALCLengths/dl-foil/DLFoil2):
mvn -e exec:java -Dexec.args=K_config.xml >> ../dlfoil_out_K.txt
where K is one of carcinogenesis, mutagenesis, semantic_bible or vicodi.
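To avoid typing the Maven command once per knowledge base, the substitution of K can be scripted. This is a small sketch that only builds the command strings shown above; the helper name `dlfoil_command` is hypothetical, and it assumes K is replaced literally in both the config file name and the output file name.

```python
# Knowledge bases listed above.
KNOWLEDGE_BASES = ("carcinogenesis", "mutagenesis", "semantic_bible", "vicodi")


def dlfoil_command(kb):
    """Return the DLFoil Maven command for one knowledge base.

    Assumes K is substituted literally into K_config.xml and dlfoil_out_K.txt.
    """
    if kb not in KNOWLEDGE_BASES:
        raise ValueError(f"unknown knowledge base: {kb}")
    return (
        f"mvn -e exec:java -Dexec.args={kb}_config.xml "
        f">> ../dlfoil_out_{kb}.txt"
    )


if __name__ == "__main__":
    # Print one command per knowledge base; run them from dl-foil/DLFoil2.
    for kb in KNOWLEDGE_BASES:
        print(dlfoil_command(kb))
```

The printed lines are meant to be pasted into the first terminal (the one in LearnALCLengths/dl-foil/DLFoil2), because they rely on shell redirection.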
Note that DLFoil fails to solve our learning problems as it gets stuck on the refinement of certain partial descriptions.
We could not run DLFocl.
The authors did not provide sufficient documentation to run their algorithm; the documentation is here
Open a terminal and navigate into /reproduce_results/: cd LearnALCLengths/reproduce_results/
- Run Wilcoxon statistical test on concept learning results
All Algos vs CLIP
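For readers unfamiliar with the test, the sketch below shows what a comparison of one algorithm against CLIP could compute. It is not the repository's script: it assumes a paired, two-sided Wilcoxon signed-rank test on per-learning-problem scores, and it uses a normal approximation without tie or continuity correction, which is rough for small samples. The function name is hypothetical; a library routine such as `scipy.stats.wilcoxon` is likely the better choice for real analyses.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for a paired, two-sided Wilcoxon signed-rank test.

    W is the smaller of the positive and negative rank sums; p comes from a
    normal approximation (no tie or continuity correction).
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation of the null distribution of W.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p
```

Here `x` and `y` would hold the scores of CLIP and of one other algorithm on the same learning problems, in the same order.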
Add your data into Datasets: it should be a folder containing a file formatted as RDF/XML or OWL/XML, and the file should have the same name as the folder.
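The naming convention above is easy to get wrong, so a quick check before running the generators may help. This is a minimal sketch: `find_kb_file` is a hypothetical helper, and it only compares the file name (without extension) to the folder name; it does not validate that the file is RDF/XML or OWL/XML.

```python
from pathlib import Path


def find_kb_file(datasets_dir, kb_name):
    """Return the file in Datasets/<kb_name>/ whose name matches the folder.

    Only the name is checked; the file format is not validated.
    """
    folder = Path(datasets_dir) / kb_name
    if not folder.is_dir():
        raise FileNotFoundError(f"missing folder: {folder}")
    # Accept any extension, since only the matching name is required here.
    matches = sorted(
        p for p in folder.iterdir() if p.is_file() and p.stem == kb_name
    )
    if not matches:
        raise FileNotFoundError(f"no file named like the folder in {folder}")
    return matches[0]
```

Calling `find_kb_file("Datasets", "your_folder_name")` either returns the path of the knowledge base file or raises `FileNotFoundError` with the offending location.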
Navigate into /generators and run
python train_data/ --kb your_folder_name
Use -h to see more options. The generated file Data.json under your_folder_name/Train_data/ serves for training concept length predictors; see the example scripts in /reproduce_results/train_clp/.
Similarly, learning problems can be generated using one of the example files in generators/learning_problems/ (replace the folder names with your folder name).
Navigate into /Embeddings/Compute-Embeddings/ and run the following to embed your knowledge base:
python --path_dataset_folder your_folder_name
Train concept length predictors by preparing and running your own Python file, following the examples in /reproduce_results/train_clp/.
Finally, prepare a script (see examples in
) and run CLIP on your data.
We based our implementation on the open-source implementation of Ontolearn. We would like to thank the Ontolearn team for the readable codebase.