This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.
CARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.
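To make the star-graph idea concrete, here is a minimal sketch in plain Python. It is an illustration only and not CARTE's implementation: the helper `row_to_star_graph`, the example row, and the choice to skip missing cells are all assumptions made for this sketch.

```python
def row_to_star_graph(row):
    """Represent one table row as a star graph (hypothetical helper).

    The center node stands for the row itself, each non-missing cell
    becomes a leaf node, and the column name labels the edge that
    connects the center to that leaf.
    """
    center = "row"
    leaves = []
    edges = []
    for column, value in row.items():
        if value is None:  # assumption for this sketch: missing cells get no node
            continue
        leaf_id = len(leaves)
        leaves.append(value)
        edges.append((center, leaf_id, column))
    return {"center": center, "leaves": leaves, "edges": edges}


# Made-up example row, not taken from any CARTE dataset
example_row = {"name": "Chardonnay", "country": "Poland", "price": 42.0, "vintage": None}
graph = row_to_star_graph(example_row)
print(graph["edges"])
```

Every edge touches the center node, which is what makes the graph a star: the leaves are never connected to each other.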
- CARTERegressor on Wine Poland dataset
- CARTEClassifier on Spotify dataset
Other datasets are available for testing: datasets
Warning
This library is currently in a phase of active development. All features are subject to change without prior notice. If you are interested in collaborating, please feel free to reach out by opening an issue or starting a discussion.
The library has been tested on Linux, macOS, and Windows.
CARTE-AI can be installed from PyPI:
pip install carte-ai
pip install huggingface_hub
After a correct installation, you should be able to import the module without errors:
import carte_ai
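If you prefer a check that reports what is missing instead of raising an ImportError, the standard library can look the modules up without importing them. This is a small sketch; the helper name `missing_modules` is hypothetical and the module list simply mirrors the imports used in this README.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level module names as they are imported in this README
required = ["carte_ai", "huggingface_hub"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```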
import pandas as pd
from carte_ai.data.load_data import *
num_train = 128 # Example: set the number of training groups/entities
random_state = 1 # Set a random seed for reproducibility
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)
The basic preparations are:
- preprocess raw data
- load the prepared data and configs; set train/test split
- generate a graph for each table entry (row) using the Table2GraphTransformer
- create an estimator and run inference
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer
model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")
preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)
# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)
# Transform the test data
X_test = preprocessor.transform(X_test)
For learning, CARTE currently exposes a scikit-learn-style interface (fit/predict). The process is:
- Define parameters
- Set the estimator
- Run 'fit' to train the model and 'predict' to make predictions
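As a rough illustration of the fit/predict convention, and of what averaging several models can look like, here is a toy sketch in plain Python. It is not CARTE's code. The classes `MeanRegressor` and `BaggedRegressor` are hypothetical, and bootstrap resampling with prediction averaging is a generic bagging recipe assumed for this sketch; this README does not describe how CARTE combines its models.

```python
import random
from statistics import mean


class MeanRegressor:
    """Toy base model: predicts the mean of the targets it was fitted on."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class BaggedRegressor:
    """Toy ensemble: fits num_model base models on bootstrap resamples
    and averages their predictions."""

    def __init__(self, num_model=10, random_state=0):
        self.num_model = num_model
        self.random_state = random_state

    def fit(self, X, y):
        rng = random.Random(self.random_state)  # seeded for reproducibility
        n = len(y)
        self.models_ = []
        for _ in range(self.num_model):
            # Draw n row indices with replacement (a bootstrap resample)
            idx = [rng.randrange(n) for _ in range(n)]
            X_boot = [X[i] for i in idx]
            y_boot = [y[i] for i in idx]
            self.models_.append(MeanRegressor().fit(X_boot, y_boot))
        return self

    def predict(self, X):
        # Average the per-model predictions for each input row
        per_model = [m.predict(X) for m in self.models_]
        return [mean(column) for column in zip(*per_model)]


# Made-up toy data
X_toy = [[1.0], [2.0], [3.0], [4.0]]
y_toy = [10.0, 12.0, 14.0, 16.0]
model = BaggedRegressor(num_model=10, random_state=0).fit(X_toy, y_toy)
print(model.predict([[5.0], [6.0]]))
```

Returning `self` from `fit` follows the scikit-learn convention and allows the chained `fit(...).predict(...)` style used above.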
from sklearn.metrics import r2_score
from carte_ai import CARTERegressor, CARTEClassifier
# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # True if you want cleanness
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
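# config_directory is not defined in this snippet and must already be in scope;
# alternatively, replace the right-hand side with the path to your pretrained model file
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]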
# Define the estimator and run fit/predict
estimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)
# Obtain the r2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
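For readers unfamiliar with the metric, the R2 score (coefficient of determination) is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers. The helper `r2` is hypothetical and assumes the true targets are not all identical; for real use, prefer `sklearn.metrics.r2_score`.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up targets for illustration
y_true_toy = [3.0, 5.0, 7.0, 9.0]
perfect = r2(y_true_toy, y_true_toy)  # predictions equal to the targets
baseline = r2(y_true_toy, [6.0] * 4)  # always predicting the mean of the targets
print(f"{perfect:.4f} {baseline:.4f}")
```

A perfect prediction scores 1, and always predicting the mean of the targets scores 0, so the score reads as the improvement over that constant baseline.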
➡️ Installation instructions for setting up the paper's experiments
➡️ Read the contribution guidelines
@article{kim2024carte,
title={CARTE: pretraining and transfer for tabular learning},
author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
journal={arXiv preprint arXiv:2402.16785},
year={2024}
}