
Improve dependencies versioning and ci #141

Merged (16 commits, Jan 21, 2025)
15 changes: 15 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,15 @@
version: 2
updates:
  # Python dependencies
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 3

  # GitHub Actions
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 3
123 changes: 123 additions & 0 deletions .github/workflows/pull_request.yml
@@ -0,0 +1,123 @@
name: In pull request
on:
  pull_request:
    branches:
      - main

jobs:
  check_python_linting:
    name: Ruff Linting & Formatting
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: chartboost/ruff-action@v1
        with:
          src: "./src ./tests"
          version: 0.8.6
      - uses: chartboost/ruff-action@v1
        with:
          src: "./src ./tests"
          version: 0.8.6
          args: 'format --check'

  test_compatibility:
    name: Test Package Compatibility
    strategy:
      fail-fast: false
      matrix:
        include:
          - os: ubuntu-latest
            python-version: "3.9"
            dependency-set: minimum
          - os: macos-13  # macos-latest doesn't work with python 3.10
            # https://github.com/actions/setup-python/issues/855
            python-version: "3.9"
            dependency-set: minimum
          - os: ubuntu-latest
            python-version: "3.12"
            dependency-set: maximum
          - os: macos-latest
            python-version: "3.12"
            dependency-set: maximum
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64

      - name: Install pip-tools
        run: python -m pip install pip-tools

      - name: Generate requirements file for minimum dependencies
        if: matrix.dependency-set == 'minimum'
        run: |
          python << EOF
          import re

          with open('pyproject.toml', 'r') as f:
              content = f.read()

          # Find dependencies section using regex
          deps_match = re.search(r'dependencies\s*=\s*\[(.*?)\]', content, re.DOTALL)
          if deps_match:
              deps = [d.strip(' "\'') for d in deps_match.group(1).strip().split('\n') if d.strip()]
              min_reqs = []
              for dep in deps:
                  match = re.match(r'([^>=<\s]+)\s*>=\s*([^,]+)', dep)
                  if match:
                      package, min_ver = match.groups()
                      min_reqs.append(f"{package}=={min_ver}")

              with open('requirements.txt', 'w') as f:
                  f.write('\n'.join(min_reqs))
          EOF

      - name: Generate requirements file for maximum dependencies
        if: matrix.dependency-set == 'maximum'
        run: |
          python << EOF
          import re

          with open('pyproject.toml', 'r') as f:
              content = f.read()

          # Find dependencies section using regex
          deps_match = re.search(r'dependencies\s*=\s*\[(.*?)\]', content, re.DOTALL)
          if deps_match:
              deps = [d.strip(' "\'') for d in deps_match.group(1).strip().split('\n') if d.strip()]
              max_reqs = []
              for dep in deps:
                  # Extract the upper bound version if it exists
                  match = re.search(r'([^>=<\s]+)\s*.*<=\s*([^,\s]+)', dep)
                  if match:
                      package, max_ver = match.groups()
                      # Remove any remaining quotes from the version
                      max_ver = max_ver.strip('"\'')
                      max_reqs.append(f"{package}=={max_ver}")
                  else:
                      # If no upper bound, just use the package name
                      package = re.match(r'([^>=<\s]+)', dep).group(1)
                      max_reqs.append(package)

              with open('requirements.txt', 'w') as f:
                  f.write('\n'.join(max_reqs))
          EOF

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install --no-deps .
          pip install pytest
          pip install -r requirements.txt

      - name: Initialize submodules
        run: git submodule update --init --recursive

      - name: Run Tests
        run: |
          pytest tests/
16 changes: 8 additions & 8 deletions pyproject.toml
@@ -6,15 +6,15 @@ build-backend = "setuptools.build_meta"
 name = "tabpfn"
 version = "2.0.3"
 dependencies = [
-    "torch>=2.1",
-    "scikit-learn>=1.2.0",
-    "typing_extensions",
-    "scipy",
-    "pandas",
-    "einops",
-    "huggingface-hub",
+    "torch>=2.1,<=2.5.1",
+    "scikit-learn>=1.2.0,<=1.6.1",
+    "typing_extensions>=4.4.0,<=4.12.2",
+    "scipy>=1.7.3,<=1.15.1",
+    "pandas>=1.4.0,<=2.2.3",
+    "einops>=0.2.0,<=0.8.0",
+    "huggingface-hub>=0.0.1,<=0.27.1",
Collaborator
Are these the current max versions or the end of our support versions?
IMO, as long as we do not need an upper limit, we should not define one.

But I think this is up to the taste of the maintainer and boils down to one of the following choices:

  • Update this file every time any of the unlimited packages are updated, or
  • update this file only when something breaks

Collaborator (Author)
These are the current max versions. My thinking was that it would prevent failures for users, and dependabot would update these max versions weekly (and then we could merge in one click if it works, otherwise investigate).
But yeah, I'm not sure; not specifying the max version does reduce the maintainer work a little bit, and it could be combined with some other GitHub Action which checks that things are working with the latest version.

Collaborator (@eddiebergman, Jan 21, 2025)
Just to chime in:

Both options have tradeoffs and I want to be a bit more explicit about them:

  • No upper bound (max compatibility with other libs, more immediate issues):

You are effectively at the whim of the lib not to break their API. This rarely happens with well-maintained libraries except in X changes of a version (X.Y.Z, see semver). However, it will eventually happen, depending on how much functionality you use of a library. For example, with numpy, they're very unlikely to change how np.add works, and if that's all you use, then you shouldn't need to upper-bound it. However, if you were to rely on numpy's C bindings, then they may change them, as they did in the bump to 2.0.0. The upside of this is that if a user installs a higher version and it just works, then they don't have to come ask you to bump it.

  • Explicit upper bound (limited compatibility, fewer immediate issues):

You effectively specify a set of dependencies which guarantee your library works (with caveats such as pre-existing bugs in versions within the bounds). The downside here is that if the user uses a library that needs a dependency version above your upper bound (say they need 2.6.0 and you bound at 2.5.0), then they cannot use your library alongside their other one, and you will likely get a request to raise it at some point.

So the tradeoff is when you want to patch a version change. With no upper bound, it will happen immediately for a user when an issue occurs and they will raise an issue, at which point you recommend downgrading the conflicting package (if they can; the same problem you get with an explicit upper bound, but harder to debug).

With an explicit upper bound, you save the user from unknown dependency hell, replacing it with known dependency hell. You also save yourself immediate dependency management, but as time goes on, your upper bounds eventually become the lower bounds of the ecosystem, and you will have to raise them bit by bit. You also prevent perfectly compatible libraries from working together when they otherwise would.


Given that, my recommendation and strategy is to use your vibe checks to know which libraries to upper-bound and which not to. For example, I would very comfortably set numpy to have an upper bound on X, i.e. numpy<3, as they are unlikely to break anything anytime soon. Likewise with torch, but you need to consider that torch is usually a lot more unstable due to the hardware ecosystem it builds upon and the functionality you utilize from it, such as its attention mechanisms. There are some libraries like typing_extensions which are developed by Python themselves and (almost) never break backwards compatibility, as the library is pure Python and doesn't rely on CPython internals or other system-level libraries. I would never upper-bound this.

If you were to use something like SMAC (sorry SMAC) or other less production-grade libraries, then they are unlikely to follow semver strictly, and from version to version they could break things unknowingly, usually not intentionally. These are often the hardest to judge: if several popular libraries depend on such a package and each of them, distrusting it, sets a relatively strict upper bound, this will cause lots of conflicts.

Another consideration is the frequency of library updates. Things like pyyaml are basically complete, in that they are unlikely to update, meaning a stricter upper bound isn't going to cause as much dependency maintenance as a package which updates extremely frequently.

Recommendation: upper-bound libraries like numpy and torch to their next X. If you are unsure of the development practices of a dependency, upper-bound on their Y. Never upper-bound on a Z unless there's a history of issues.

Lock files: if you want to provide a "this setup works, assuming no other dependencies" guarantee, such as in a Docker environment for deployment, then you provide lock files which specify exact dependencies.
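
To make this concrete, here is a minimal sketch of how such a policy could look in a pyproject.toml dependencies table; the package names come from the discussion above, but the exact bounds are illustrative assumptions, not the pins chosen in this PR:

    [project]
    dependencies = [
        # Mature, semver-respecting libraries: bound on the next major version (X)
        "numpy>=1.21,<3",
        "torch>=2.1,<3",
        # Less certain release practices: bound on the next minor version (Y)
        "scikit-learn>=1.2.0,<1.7",
        # Pure Python and essentially never breaking: no upper bound
        "typing_extensions>=4.4.0",
    ]

A lock file (for example, one generated with pip-compile from pip-tools, which this workflow already installs) would then pin exact versions for a known-good environment, independently of these looser install-time bounds.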

Collaborator (Author)
Thanks a lot @eddiebergman, really interesting!! Based on your comments I made the max versions much laxer: no bound for typing-extensions, a bound on X for all the rest except scikit-learn and einops, for which there is a bound on Y. Tell me if you disagree! I would say that using these lax upper bounds + dependabot is a good balance between making life easy for maintainers and being robust for users.
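
For reference, a rough sketch of what that scheme could look like for the dependencies in this PR (the lower bounds are taken from the diff above; the upper bounds just illustrate the described policy and may not match the final merged versions):

    dependencies = [
        "torch>=2.1,<3",              # bound on X (next major)
        "scipy>=1.7.3,<2",            # bound on X
        "pandas>=1.4.0,<3",           # bound on X
        "huggingface-hub>=0.0.1,<1",  # bound on X
        "scikit-learn>=1.2.0,<1.7",   # bound on Y (next minor)
        "einops>=0.2.0,<0.9",         # bound on Y
        "typing_extensions>=4.4.0",   # no upper bound
    ]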

Collaborator
Very well put into words @eddiebergman!

I agree with going forward like this and like the reasoning.
One more comment: I think having no upper bound is also useful to push ourselves to support higher versions.

 ]
-requires-python = ">=3.9,<3.12"
+requires-python = ">=3.9,<3.13"
 authors = [
     { name = "Noah Hollmann", email = "noah.hollmann@charite.de" },
     { name = "Samuel Müller", email = "muellesa@cs.uni-freiburg.de" },
8 changes: 4 additions & 4 deletions src/tabpfn/classifier.py
@@ -181,9 +181,9 @@ def __init__(  # noqa: PLR0913
             Whether to balance the probabilities based on the class distribution
             in the training data. This can help to improve predictive performance
             when the classes are highly imbalanced and the metric of interest is
-            insensitive to class imbalance (e.g., balanced accuracy, balanced log loss,
-            roc-auc macro ovo, etc.). This is only applied when predicting during a
-            post-processing step.
+            insensitive to class imbalance (e.g., balanced accuracy, balanced log
+            loss, roc-auc macro ovo, etc.). This is only applied when predicting
+            during a post-processing step.

         average_before_softmax:
             Only used if `n_estimators > 1`. Whether to average the predictions of
@@ -443,7 +443,7 @@ def fit(self, X: XType, y: YType) -> Self:
                "classes supported by TabPFN. Consider using a strategy to reduce "
                "the number of classes. For code see "
                "https://github.com/PriorLabs/tabpfn-extensions/blob/main/src/"
-               "tabpfn_extensions/many_class/many_class_classifier.py"
+               "tabpfn_extensions/many_class/many_class_classifier.py",
            )

        # Will convert specified categorical indices to category dtype, as well
2 changes: 1 addition & 1 deletion src/tabpfn/model/loading.py
@@ -72,7 +72,7 @@ def get_regressor_v2(cls) -> ModelSource:
             "tabpfn-v2-regressor-09gpqh39.ckpt",
             "tabpfn-v2-regressor-2noar4o2.ckpt",
             "tabpfn-v2-regressor-5wof9ojf.ckpt",
-            "tabpfn-v2-regressor-wyl4o83o.ckpt"
+            "tabpfn-v2-regressor-wyl4o83o.ckpt",
         ]
         return cls(
             repo_id="Prior-Labs/TabPFN-v2-reg",
3 changes: 1 addition & 2 deletions src/tabpfn/utils.py
@@ -17,8 +17,7 @@
 import torch
 from sklearn.base import check_array, is_classifier
 from sklearn.compose import ColumnTransformer, make_column_selector
-from sklearn.preprocessing import OrdinalEncoder, FunctionTransformer
-
+from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder
 from sklearn.utils.multiclass import check_classification_targets
 from torch import nn

52 changes: 33 additions & 19 deletions tests/test_classifier_interface.py
@@ -8,9 +8,9 @@
 import sklearn.datasets
 import torch
 from sklearn.base import check_is_fitted
-from sklearn.utils.estimator_checks import parametrize_with_checks
 from sklearn.pipeline import Pipeline
 from sklearn.preprocessing import StandardScaler
+from sklearn.utils.estimator_checks import parametrize_with_checks

 from tabpfn import TabPFNClassifier

@@ -96,6 +96,7 @@ def test_fit(
     predictions = model.predict(X)
     assert predictions.shape == (X.shape[0],), "Predictions shape is incorrect!"


 # TODO(eddiebergman): Should probably run a larger suite with different configurations
 @parametrize_with_checks(
     [TabPFNClassifier(inference_config={"USE_SKLEARN_16_DECIMAL_PRECISION": True})],
@@ -112,46 +112,59 @@ def test_sklearn_compatible_estimator(

     check(estimator)


 def test_balanced_probabilities(X_y: tuple[np.ndarray, np.ndarray]) -> None:
     """Test that balance_probabilities=True works correctly."""
     X, y = X_y

     model = TabPFNClassifier(
         balance_probabilities=True,
     )

     model.fit(X, y)
     probabilities = model.predict_proba(X)

     # Check that probabilities sum to 1 for each prediction
     assert np.allclose(probabilities.sum(axis=1), 1.0)

     # Check that the mean probability for each class is roughly equal
     mean_probs = probabilities.mean(axis=0)
     expected_mean = 1.0 / len(np.unique(y))
-    assert np.allclose(mean_probs, expected_mean, rtol=0.1), \
-        "Class probabilities are not properly balanced"
+    assert np.allclose(
+        mean_probs,
+        expected_mean,
+        rtol=0.1,
+    ), "Class probabilities are not properly balanced"


 def test_classifier_in_pipeline(X_y: tuple[np.ndarray, np.ndarray]) -> None:
     """Test that TabPFNClassifier works correctly within a sklearn pipeline."""
     X, y = X_y

     # Create a simple preprocessing pipeline
-    pipeline = Pipeline([
-        ('scaler', StandardScaler()),
-        ('classifier', TabPFNClassifier(
-            n_estimators=2  # Fewer estimators for faster testing
-        ))
-    ])
+    pipeline = Pipeline(
+        [
+            ("scaler", StandardScaler()),
+            (
+                "classifier",
+                TabPFNClassifier(
+                    n_estimators=2,  # Fewer estimators for faster testing
+                ),
+            ),
+        ],
+    )

     pipeline.fit(X, y)
     probabilities = pipeline.predict_proba(X)

     # Check that probabilities sum to 1 for each prediction
     assert np.allclose(probabilities.sum(axis=1), 1.0)

     # Check that the mean probability for each class is roughly equal
     mean_probs = probabilities.mean(axis=0)
     expected_mean = 1.0 / len(np.unique(y))
-    assert np.allclose(mean_probs, expected_mean, rtol=0.1), \
-        "Class probabilities are not properly balanced in pipeline"
+    assert np.allclose(
+        mean_probs,
+        expected_mean,
+        rtol=0.1,
+    ), "Class probabilities are not properly balanced in pipeline"
48 changes: 29 additions & 19 deletions tests/test_regressor_interface.py
@@ -8,9 +8,9 @@
 import sklearn.datasets
 import torch
 from sklearn.base import check_is_fitted
-from sklearn.utils.estimator_checks import parametrize_with_checks
 from sklearn.pipeline import Pipeline
 from sklearn.preprocessing import StandardScaler
+from sklearn.utils.estimator_checks import parametrize_with_checks

 from tabpfn import TabPFNRegressor

@@ -110,38 +110,48 @@ def test_sklearn_compatible_estimator(
         "check_methods_sample_order_invariance",
     ):
         estimator.inference_precision = torch.float64

     if check.func.__name__ == "check_methods_sample_order_invariance":  # type: ignore
         pytest.xfail("We're not at 1e-7 difference yet")
     check(estimator)


 def test_regressor_in_pipeline(X_y: tuple[np.ndarray, np.ndarray]) -> None:
     """Test that TabPFNRegressor works correctly within a sklearn pipeline."""
     X, y = X_y

     # Create a simple preprocessing pipeline
-    pipeline = Pipeline([
-        ('scaler', StandardScaler()),
-        ('regressor', TabPFNRegressor(
-            n_estimators=2  # Fewer estimators for faster testing
-        ))
-    ])
+    pipeline = Pipeline(
+        [
+            ("scaler", StandardScaler()),
+            (
+                "regressor",
+                TabPFNRegressor(
+                    n_estimators=2,  # Fewer estimators for faster testing
+                ),
+            ),
+        ],
+    )

     pipeline.fit(X, y)
     predictions = pipeline.predict(X)

     # Check predictions shape
     assert predictions.shape == (X.shape[0],), "Predictions shape is incorrect"

     # Test different prediction modes through the pipeline
     predictions_median = pipeline.predict(X, output_type="median")
-    assert predictions_median.shape == (X.shape[0],), "Median predictions shape is incorrect"
+    assert predictions_median.shape == (
+        X.shape[0],
+    ), "Median predictions shape is incorrect"

     predictions_mode = pipeline.predict(X, output_type="mode")
-    assert predictions_mode.shape == (X.shape[0],), "Mode predictions shape is incorrect"
+    assert predictions_mode.shape == (
+        X.shape[0],
+    ), "Mode predictions shape is incorrect"

     quantiles = pipeline.predict(X, output_type="quantiles", quantiles=[0.1, 0.9])
     assert isinstance(quantiles, list)
     assert len(quantiles) == 2
-    assert quantiles[0].shape == (X.shape[0],), "Quantile predictions shape is incorrect"
+    assert quantiles[0].shape == (
+        X.shape[0],
+    ), "Quantile predictions shape is incorrect"