Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow.assign.inplace attribute #4978

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

OfekShilon
Copy link
Contributor

@OfekShilon OfekShilon commented May 6, 2021

Closes #4783
Previous fix attempts were too general (global option) or too specific (per-column attribute). This one, I hope, hits the right balance: setDT adds a per-dt attribute that disables in-place assignment semantics. It was made from a fresh fork to ease merging.
I'll close the previous PRs.

@codecov
Copy link

codecov bot commented May 6, 2021

Codecov Report

Merging #4978 (3ec3020) into master (cb8aeff) will increase coverage by 1.97%.
The diff coverage is 100.00%.

❗ Current head 3ec3020 differs from pull request most recent head 6f70189. Consider uploading reports for the commit 6f70189 to get more accurate results

@@            Coverage Diff             @@
##           master    #4978      +/-   ##
==========================================
+ Coverage   97.49%   99.47%   +1.97%     
==========================================
  Files          80       75       -5     
  Lines       14810    14728      -82     
==========================================
+ Hits        14439    14650     +211     
+ Misses        371       78     -293     
Impacted Files Coverage Δ
R/as.data.table.R 100.00% <100.00%> (ø)
R/data.table.R 99.94% <100.00%> (+0.20%) ⬆️
R/setops.R 100.00% <100.00%> (ø)
src/assign.c 99.71% <100.00%> (-0.15%) ⬇️
src/init.c 100.00% <100.00%> (+2.04%) ⬆️
src/fastmean.c 96.82% <0.00%> (-3.18%) ⬇️
src/utils.c 94.77% <0.00%> (-2.64%) ⬇️
src/uniqlist.c 98.26% <0.00%> (-1.27%) ⬇️
src/frollR.c 99.48% <0.00%> (-0.52%) ⬇️
src/subset.c 99.50% <0.00%> (-0.50%) ⬇️
... and 58 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@OfekShilon
Copy link
Contributor Author

This suggested fix includes an addition of an internal-only argument to setDT. Is this acceptable?

@OfekShilon OfekShilon requested a review from mattdowle May 10, 2021 08:43
@mattdowle mattdowle changed the title Fix 4783 - by adding an allow.assign.inplace attribute allow.assign.inplace attribute May 10, 2021
@mattdowle mattdowle added the WIP label May 10, 2021
@mattdowle
Copy link
Member

I added WIP label so you can remove it when the changes requested are either done, or I misunderstood and you want to send the PR back to me.
One of my to-do filters is https://github.com/Rdatatable/data.table/pulls?q=is%3Apr+is%3Aopen+review%3Achanges-requested+-label%3AWIP+, so removing WIP will add it there.

@OfekShilon
Copy link
Contributor Author

OfekShilon commented May 11, 2021

@mattdowle Thanks! Before I fix it, I'd appreciate your opinion:
Is an additional setDT argument for internal-calls-only considered acceptable style?
One alternative I can think of is for internal calls to raise some option and setDT to check it, instead of a direct argument. More cumbersome, but doesn't involve an interface change.

@mattdowle
Copy link
Member

mattdowle commented May 12, 2021

Is an additional setDT argument for internal-calls-only considered acceptable style?

It would be unusual and I'm not sure we have any other examples of that. But don't worry about that yet. I suspect this PR will morph quite a bit. Just taking it one step at a time. If we need a new internal-only version of setDT(), then what springs to mind on first thought is to create a new function (e.g. setDT_internal()) and use that internally, rather than a user-visible argument or an internal option switch. But that would be a relatively easy and quick change to make, so lets leave that till later, if possible.

@OfekShilon
Copy link
Contributor Author

It would be unusual and I'm not sure we have any other examples of that. But don't worry about that yet. I suspect this PR will morph quite a bit. Just taking it one step at a time. If we need a new internal-only version of setDT(), then what springs to mind on first thought is to create a new function (e.g. setDT_internal()) and use that internally, rather than a user-visible argument or an internal option switch. But that would be a relatively easy and quick change to make, so lets leave that till later, if possible.

I completely understand why you suspect this PR would keep changing - and I apologize for changing it twice already - but conversation with you guys is the only way to make it mold. The internals (modifications to Cassign() ) are all in place, the discussion is only about the proper API to use it - and I'm not sure what is the proper "data.tableism" here.
Do you consider the Cassign change acceptable?
Since writing this, a colleague of mine asked to have a DT with full copy-on-modify semantics - so I now think to communicate with setDT via an option and not an argument, so advanced users (and not just internal functions) could choose so themselves. Sounds acceptable? If so, I'll fix this PR soon.

@mattdowle
Copy link
Member

mattdowle commented May 14, 2021

@OfekShilon Don't worry - there is no friction due to the PR changing. We don't mind changes in a PR at all.

Do you consider the Cassign change acceptable?

A change of that nature to Cassign is acceptable yes, although I haven't gone through the details of it yet. I can see it is looking at a new attribute and changing behavior depending on the attribute on the data. That's fine in principle.

Thanks for reducing the number of test changes but what I can see now is the new option(data.table.prevent.inplace=FALSE). We aren't allowing any new options like that which change code behavior because if users set that option, they would change the behavior of setDT inside packages too, likely inadvertently. In other words the options() mechanism in R is global: both user space is affected and inside packages using data.table too. So we only use option() to control display and output (e.g. see news item new feature 7 in this dev release), and temporarily (i.e. we remove the option after a few years) to make it easier for users to migrate their code when we make backwards incompatible changes (and even then it isn't perfect because of the global nature of the option). That's why I wrote above "rather than a user-visible argument or an internal option switch". What you've added is a user visible option which is worse than those two. Instead, I suggested above creating a new internal function and then using that internally.

Having said that, it is feasible that this PR ends up making a change in behavior that does involve a temporary option() to control the migration. If that's your intention then there's a chance that might be ok, but there is no 'POTENTIALLY BREAKING CHANGES' section at the top of the NEWS file in this PR yet, if that's the case.

But first, can you explain why you didn't do what I asked in #4978 (comment) please? I think I must have misunderstood, so I need you to explain that. That's the stage I'm at currently.

@OfekShilon OfekShilon removed the WIP label Sep 7, 2021
@OfekShilon
Copy link
Contributor Author

@mattdowle Sorry for the long leave. Some explanations on the order of changes:
(1) You requested that I remove the setDT_noattr workaround and replace it with internal test() fixes.
(2) I added global option to bypass in-place assignment. This made the setDT_noattr workaround redundant and I removed it entirely.
(3) You commented here that the global-option approach is not acceptable.
(4) I now reverted the global-option code, which made your setDT_noattr reservation relevant again.
(5) I also applied your review, deleted setDT_noattr and modified internal test() logic. Note that it wasn't as straightforward as originally thought (for the first time had to apply some logic to lists of data.tables), but all tests now pass.

Hopefully this can be merged now.

@OfekShilon
Copy link
Contributor Author

@mattdowle I resolved your review long ago, somehow I still see 'changes requested'. Am I missing something?

Do you consider this change satisfactory? Can it be included in 1.15?

@OfekShilon
Copy link
Contributor Author

@mattdowle Is there anything unclear, or otherwise missing, to merge this fix?

@MichaelChirico
Copy link
Member

@mattdowle I agree it's far too long since we've heard from you here. Marked as 'High' to make sure this doesn't escape notice as we close in on 1.14.6

@OfekShilon
Copy link
Contributor Author

This might have become hard to merge in the 1.5Y since suggesting it. Would it help to start over in a new PR? Perhaps with better explanations?

@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 21, 2022

@OfekShilon I merged to master locally, but I don't have push permissions to IstraResearch/data.table, could you grant it? thanks

@OfekShilon
Copy link
Contributor Author

@OfekShilon I merged to master locally, but I don't have push permissions to IstraResearch/data.table, could you grant it? thanks

Thanks! I'm afraid I'm not allowed to grand access to my organization - just pushed a merge of master to this branch.

@OfekShilon OfekShilon requested a review from mattdowle March 26, 2023 09:09
@jangorecki
Copy link
Member

I don't think comfortable for having such a significant change merged shortly before preparing major release. I am putting it for the next milestone, as currently the priority is to have master branch released on CRAN.

@OfekShilon
Copy link
Contributor Author

@jangorecki this is from 2.5Y ago, I'd be surprised if it's mergeable with reasonable effort.

@jangorecki
Copy link
Member

Like most of other PR we will have to merge/rebase to master and resolve conflicts. That's the cost of not actively maintaining the project. What's positive that at least it won't be as difficult as if project would be actively maintained for 2.5 years, because there were not that many changes over this time.

@MichaelChirico
Copy link
Member

Finally read the issue & comments carefully enough to agree that yes, there's something wanting with the current code.

However, I don't want prevent.assign.inplace to be TRUE by default. In my experience it's very common to do things like setDT(foo()) where foo() is some other I/O function, e.g. DBI::dbGetQuery().

Preventing assignment on objects like that will be extremely user-unfriendly IMO.

I wonder if there's some way to use R internals here -- we really care about the case of setDT(x) where x is already bound to some variable. I'm not sure this is possible/advisable.

A much simpler solution is to just add the argument and keep it back-compatible -- users with a codebase where the "leaky" behavior tends to bite should use a wrapper function, although I'm not totally opposed to adding an option (I know Matt pushed back on that idea in the earliest review).

That said, as Ofek is no longer working much on R, I wonder who might sponsor this work to get it over the finish line. For now I'll bump the milestone; please chime in if this PR is heavily affecting you as well.

@MichaelChirico MichaelChirico modified the milestones: 1.17.0, 1.18.0 Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changes inside a function sometimes leak outside, sometimes not Inconsistent semantics after setDT
4 participants