This repository contains code to classify individuals based on an AIM set into populations. The code was built with TabPFN, which has been published in Nature "[Accurate predictions on small data with a tabular foundation model] by Hollmann et al.
This repsository contains:
Application_TabPFN.py
: to use TabPFN for classification of individuals into populations that can be downloaded. You just have to change the input paths to your test and training data (Application_TabPFN.py).CrossValidation_Code/run_experiments.py
: a python script that was used to run our experiments.Streamlit_TabPFN.py
: a Grafical User Interface to use TabPFN for classification of individuals into populations, which runs in the Browser.data_test.csv
,data_train.csv
: Example test and training data. This is a subset of the data published as a Supplemental (1-s2.0-S1872497323000285-mmc5.xlsx
) of Ruiz-Ramirez et al.
prediction_prob.csv
: Example output prediction probabilites.
Extract_EUROPEAN.py
,Extract_continents.py
: the code that was used to prepare the data of our paper Enhancing BGA Prediction with machine learning.Cross_Validation_Code/Confusion_matrix.py
: the code to plot the confusion matrices.Cross_Validation_Code/run_plotting.py
: python code to plot the mean of ROC AUC, accuracy and logloss (or any other metric) and its confidence interval.Cross_Validation_Code/SNIPPER_CrossValidation.R
: Code for the Cross Validation with SNIPPERCross_Validation_Code/PLS_DA_CrossValidation.py
: Code for the Cross Validation with PLS-DACross_Validation_Code/Admixture_Model_CrossValidation.py
: Code for the Cross Validation with the Admixture Model
See here for an online version of the Grapical User Interface. Note that there are no data protection rules on the server where this interface runs.
To use the graphical user interface, download the repository and install the requirements using
pip install -r requirements.txt
Afterwards, type
streamlit run Streamlit_TabPFN.py
Then, your browser should open and you can use the example test and trainings data or you can upload your own data.
With
python Application_TabPFN.py
a TabPFN run is performed on train_data_eur.csv
and test_data_eur.csv
. Change the file according to your needs if it should run on a different dataset.