Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix cloning of config for nested objects #275

Merged
merged 4 commits into from
Jan 31, 2024

Conversation

tpvasconcelos
Copy link
Contributor

Reference Issues/PRs

Fixes #276

What does this implement/fix? Explain your changes.

Please refer to: #276

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

The _clone() method has a # copied from sklearn comment above it. Any concerns with modifying this method? Any alternative solutions come to mind?

PR checklist

For all contributions
  • I've reviewed the project documentation on contributing
  • I've added myself to the list of contributors.
  • The PR title starts with either [ENH], [CI/CD], [MNT], [DOC], or [BUG] indicating whether
    the PR topic is related to enhancement, CI/CD, maintenance, documentation, or a bug.
For code contributions
  • Unit tests have been added covering code functionality
  • Appropriate docstrings have been added (see documentation standards)
  • New public functionality has been added to the API Reference

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Astutely observed.

However, question, given that there are two points of recursion

  • the first, this deals with nesting in lists etc.
  • the second, new_object_params[name] = self._clone(param, safe=False), this deals with direct nesting.

Naively, I would think we do not need to change the first occurrence, current line 228, because we go back to _clone and from there to line current 250.

Further, non-blocking question, I was wondering: should we make the contract less strict about inheritance, and more about interfaces, would it make sense to do hasattr(param, "clone") and hasattr(param, "_config")? Or perhaps it is safer to make the deviation from sklearn more strict, which is the current state. I tend towards the latter.

@fkiraly fkiraly added the bug Something isn't working label Jan 31, 2024
Copy link

codecov bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b754e93) 84.69% compared to head (c6cdd7f) 84.73%.

❗ Current head c6cdd7f differs from pull request most recent head 35ffcf8. Consider uploading reports for the commit 35ffcf8 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
+ Coverage   84.69%   84.73%   +0.03%     
==========================================
  Files          43       43              
  Lines        2928     2928              
==========================================
+ Hits         2480     2481       +1     
+ Misses        448      447       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tpvasconcelos
Copy link
Contributor Author

tpvasconcelos commented Jan 31, 2024

Naively, I would think we do not need to change the first occurrence, current line 228, because we go back to _clone and from there to line current 250.

I changed the implementation to just do the config cloning inside the _clone method instead of having it in the clone method. I think this is a cleaner implementation.

@tpvasconcelos
Copy link
Contributor Author

Further, non-blocking question, I was wondering: should we make the contract less strict about inheritance, and more about interfaces, would it make sense to do hasattr(param, "clone") and hasattr(param, "_config")? Or perhaps it is safer to make the deviation from sklearn more strict, which is the current state. I tend towards the latter.

With the current implementation (see latest changes) this is not relevant anymore. Could you take a look? 🙏

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very elegant!

Putting the set_config at the recursion nodes rather than having it just in the top node solves the problem conclusively and elegantly. Nice.

Cohesion is maintained since the total number of calls does not change, and _clone gets to handle set_params and set_config, so even that is improved.

@fkiraly
Copy link
Contributor

fkiraly commented Jan 31, 2024

You should also add yourself here: https://github.com/sktime/skbase/blob/main/.all-contributorsrc

bug, code, ideas.

@fkiraly
Copy link
Contributor

fkiraly commented Jan 31, 2024

The _clone() method has a # copied from sklearn comment above it. Any concerns with modifying this method?

Since you are asking: in-principle no concerns. The history behind _clone is that we originally wanted a clean re-implementation which turned out to be discrepant from sklearn in some important examples due to the magic after if estimator_type in (list, tuple, set, frozenset): and safe arg which defines a very specific contract.

Ultimately, one high-level compatibility/interoperability requirement for BaseObject is that BaseEstimator could replace sklearn BaseEstimator with at most minimal adjustments (all of which must be improvements), so the only quick way to get there, after multiple unsuccessful attempts at re-implementation, was to copy-paste sklearn clone.

Funnily, sklearn seem to be going in a similar direction by allowing estimators to dispatch to their internal clone (which seems a bit unsafe to me compared to base class inheritance). Either way, some tech discussions on uniformization of APIs are going to be scheduled for later this year between the two projects (sktime, sklearn) - in case you are interested.

@fkiraly fkiraly merged commit 6cd5155 into sktime:main Jan 31, 2024
23 of 26 checks passed
@tpvasconcelos
Copy link
Contributor Author

I see...

Just for reference, I'm attaching a visual diff between the current BaseObject._clone() implementation and sklearn's _clone_parametrized():

image

Summary of differences:

  1. skbase does not clone dict values (@fkiraly do you know why?)
  2. sklearn also clones the _metadata_request and _sklearn_output_config attributes
  3. skbase now also clones the config flags

I need to think more about this to give any meaningful input here.

Either way, some tech discussions on uniformization of APIs are going to be scheduled for later this year between the two projects (sktime, sklearn) - in case you are interested.

@fkiraly definitely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] clone does not clone config for nested objects
2 participants