Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] design - 2nd party package integration, decentral estimator library #6639

Open
fkiraly opened this issue Jun 18, 2024 · 8 comments
Open
Labels
API design API design & software architecture enhancement Adding new functionality module:base-framework BaseObject, registry, base framework

Comments

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 18, 2024

Opening a discussion on 2nd party integration patterns and on how (or whether) to build a decentral estimator library.

More concretely, what are the base patterns in setting up packages like prophetverse or skchange that provide sktime compatible estimators to direct users, following the unified sktime API?

FYI @felipeangelimvieira, @Tveten

Some starter thoughts of mine about requirements and principles.

Operational, governance desiderata:

  • decentral - 2nd party maintainers can manage their package freely
  • gatekeeperless - not one person or small group blocking/approving
  • indexed - users can easily find and search, using types and tags
  • simple 2nd party developer setup - like publishing on pypi
  • maximizing credit - visible contributor and owner assignment

Technical desiderata:

  • unified API - all estimators follow strongly compatible interfaces and types
  • easy API conformance checking
  • queriable estimator index - there probably needs to be a central index
  • estimator level dependency management
  • minimizing integration complexity

Short-term, how would this look like?

  • unified API and tests defined in sktime
  • estimators are mini-packages with own dependencies - similar to current sktime
  • use of sktime proper as a package index - docstring plus tags
  • 2nd party package runs check_estimator

Questions I have:

  • pattern "use import from soft dependency if available, use docstring/tag identical copy if not" - any problems with using different classes depending on environment?
  • where should tests live? Certainly in the 2nd party package. Any testing in sktime? Does not scale if "test all"; no clear mechanism to trigger on change otherwise?
  • how to minimize complexity of the developer experience?
  • how to ensure that credit is assigned to the 2nd party package and its developers?

Interesting mid- and long-term questions - increasing decentrality even more, perhaps separate package index? Extensibility with custom APIs and types, not just estimators, like tsbootstrap?

@fkiraly fkiraly added API design API design & software architecture enhancement Adding new functionality module:base-framework BaseObject, registry, base framework labels Jun 18, 2024
@felipeangelimvieira
Copy link
Contributor

Franz, I have a few additional questions/points to consider:

  • 2nd party packages, such as Prophetverse, may have specific methods (such as predict_all_samples, which returns all posterior samples from the model). Would that create any compatibility issues with sktime?
  • Some arguments in Prophetverse models are custom objects, such as those passed in exogenous_effects (docs here). These objects have their own inner hyperparameters. Should they implement get_params() behavior so that their parameters can be set with set_params()? Or would that be ideal but not essential?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2024

Would that create any compatibility issues with sktime?

No issues I foresee, the contract allows you to add arbitrary public methods. If you want to test them, you can add separate tests, or inherit from the sktime test class.

Ofc that specific method relates to a general Bayesian interface that we may end up defining, but it would also not affect an existing public method unless the name is identical.

Should they implement get_params() behavior so that their parameters can be set with set_params()?

That is not required by the API but I would see it as a good practice suggestion, or users will not have access to tuning and automl related to these in a more granular way than the entire hyperparameter (not the nested ones)

If the objects have a fixed set of parameters, or is a list, you can inherit from various scikit-base classes to give the API a nice sklearn-like flavour with get/set-params.

Concretely, what you could do in this case:

  • have AbstractEffect inherit from skbase BaseObject - that will give it set_params, get_params and a couple other things. You can test conformance with sktime check_estimator, too.
    • idea: probably skbase should have a check_object too, which checks the minimal skbase API - but you can do via sktime atm.
  • For "list of AbstractEffect". you can inherit from BaseMetaObject. If you want lists of abstract effects to also be abstract effects, you can inherit BaseMetaObjectMixin and AbstractEffect.
    • would be keen to hear about the developer experience here, if you are going to try.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Jun 19, 2024

Linking discussion about dependency handling and update/deprecations workflows here:

felipeangelimvieira/prophetverse#64 (comment)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 19, 2024

On discord, @yarnabrina highlighted the following problem:

Suppose we have a 2nd party package like skchange, where skchange imports sktime for BaseEstimator or similar. That import is non-negotiable because the package uses sktime base classes.

What is the policy on dependencies in sktime? In this special case, we import skchange in the notebook, but should it also be imported, say, in the all_extras and detection dependency sets?

Circular package level dependencies could be problematic, and if it were not for the notebook, they could be avoided by using the placeholder pattern for indexing, as used for instance with prophetverse.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 19, 2024

FYI @Tveten, @felipeangelimvieira, as you are directly impacted by the above observation.

@felipeangelimvieira
Copy link
Contributor

Thank you Franz. In that case, Prophetverse is already safe with respect to circular dependencies, as you said right? One interesting pattern to look at is the one in Airflow (Airflow Providers)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 21, 2024

In that case, Prophetverse is already safe with respect to circular dependencies, as you said right?

Yes, sktime does not require prophetverse in its soft dependencies, it only checks if present in the placeholder import. This is "safe" as the tests are in prophetverse, which depends on sktime for the interface and the parametrize_with_checks tests.

But now suppose we wanted to use the "effects" base classes in another model, or do sth similar, that would create a problem.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 21, 2024

One interesting pattern to look at is the one in Airflow (Airflow Providers)

Do you know how this works under the hood, @felipeangelimvieira? Is it the same pattern that we have used for sktime and prophetverse, namely that the providers import the core, and the core checks for presence?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design API design & software architecture enhancement Adding new functionality module:base-framework BaseObject, registry, base framework
Projects
None yet
Development

No branches or pull requests

2 participants