Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose operations from TaskChampion #373

Closed
djmitche opened this issue Aug 28, 2022 · 28 comments · Fixed by #464 or #474
Closed

Expose operations from TaskChampion #373

djmitche opened this issue Aug 28, 2022 · 28 comments · Fixed by #464 or #474
Assignees
Labels
topic:per-task-ops Issues related to exposing per-task operations
Milestone

Comments

@djmitche
Copy link
Collaborator

djmitche commented Aug 28, 2022

Background

Every change to a task is represented internally as an "Operation"
https://github.com/GothenburgBitFactory/taskwarrior/blob/41992d484909bd865bc252e99e588c5c3f37c71a/taskchampion/taskchampion/src/storage/op.rs#L10-L38
This is how we accomplish task sync -- different replicas share the set of operations they have applied, and resolve any conflicts in a reasonable, consistent way.

Note that a single command-line invocation may produce several operations. For example

task 123 start +inprogress prio:M

would create four Update operations: one to set the start property to the current time, one to add the tag_inprogress property, one to set the priority property to M and one to update the modified property to the current time.

There are also UndoPoint operations, which do not actually change anything but act as a signpost for how many operations to revert when running task undo -- that command basically reverts operations until it reaches an UndoPoint. The UndoPoint operations are not sent to the server.

Once operations are successfully sent to the server, they are no longer stored locally. This avoids the local task database growing forever.

Currently, none of this is visible in Taskchampion's public API. This bug would change that.

Motivation

We'll need this for a few reasons:

The task undo command previously showed a "diff" of what it would change, and asked the user for permission. This is not possible with the current Taskchampion API (see GothenburgBitFactory/taskwarrior@4b814bc). But if there was a way to query Taskchampion to say "hey, what are the operations back to the latest UndoPoint?" then it could generate a diff view from that.

The task info command has historically used the undo information to show the history for a particular task. This operated by scanning the entire undo history for records matching that task's uuid, and formatting those nicely for display. Again, that's currently impossible (support was removed in GothenburgBitFactory/taskwarrior#3060). If there was a way to query Taskchampion to say "hey, what operations have happened for this task?" then we could bring back that functionality.

Looking forward, one of the things we've considered in #372 is to allow a user of the Taskchampion API to provide a list of operations to apply, instead of calling methods like task.start(..) or task.set_description(..). A different possibility we've considered is that a user of the Taskchampion API could call methods like task.start(..) or task.set_description(..) and build up a list of operations, then "commit" those all at once. This would allow the task command to confirm changes with the user before committing them, for example. All of this paragraph is out of scope for this issue, but should be kept in mind when designing the API.

Details

Off the top of my head, I think this can be broken into two parts:

Operations for Undo

  • Making the ReplicaOp type public
  • Defining an API on the Replica type for fetching all operations back to the latest UndoPoint, and implementing that through the storage layer.
  • Adding a C interface in taskchampion-lib
  • Adding a C++ interface
  • Writing some C++ support for printing a list of operations as a "diff"
  • Using this for confirmation of task undo

Operations for Info

all of the above, and..

  • Updating the Taskchampion storage backend to store operations for each task for as long as that task exists. This is different from just storing all operations in the order they occurred, because once a task is deleted we want to also delete its associated operations. This will require changing the database schema.
  • Making that information available from the Replica type
  • Adding a C interface in taskchampion-lib
  • Adding a C+ interface
  • Using this to show a task's history in task info when the journal.info config is enabled.
@djmitche djmitche self-assigned this Aug 28, 2022
djmitche referenced this issue in djmitche/taskwarrior Sep 25, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Sep 25, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Oct 10, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Oct 10, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Nov 12, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Nov 12, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Nov 16, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Nov 16, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Dec 18, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Dec 18, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Dec 23, 2022
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Dec 23, 2022
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Jan 19, 2023
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Jan 19, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Feb 4, 2023
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Feb 4, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Feb 9, 2023
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Feb 9, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Apr 9, 2023
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Apr 9, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in GothenburgBitFactory/taskwarrior Apr 16, 2023
This support will require access to all of the operations ever performed
on a task, which is not currently exposed by TaskChampion (but see #2928)
djmitche referenced this issue in djmitche/taskwarrior Apr 19, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior Apr 30, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in djmitche/taskwarrior May 28, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
djmitche referenced this issue in GothenburgBitFactory/taskwarrior Jun 11, 2023
TaskChampion does not make the necessary information available to
accomplish this, but see #2928.
@djmitche
Copy link
Collaborator Author

@masaeedu is considering working on this, so I'll un-assign from myself.

@djmitche djmitche removed their assignment Jun 19, 2023
@djmitche
Copy link
Collaborator Author

djmitche commented Sep 9, 2024

I've called this the "WINNING INVARIANT":

applying the losing operation followed by the winning operation has the same effect as applying only the winning operation

However, given operations Create(uuid) and an Update(uuid, prop, value), neither "winner" would satisfy that invariant. Admittedly, a Update informally implies that the task had already been created when it was applied, but the operational transform needs to be able to deal with any operations in any state, so that informal implication feels like the kind of "cheat" that creates strange bugs years later.

I need to think about this some more. It's possible this whole idea doesn't work :(

@djmitche
Copy link
Collaborator Author

djmitche commented Sep 9, 2024

Maybe the trick is to redefine how the operations apply so that every combination can satisfy the WINNING INVARIANT.

I think the critical change would be that an Update should create the task if it does not exist. Then Update can win over both Create and Delete.

Create Update Delete
Create Redundant Update Wins Delete Wins
Update Latest Wins Delete Wins
Delete Redundant

Here are the WINNING INVARIANT statements for each conflict:

  • Create vs. Update: Create followed by Update has same effect as Update ✔️ (task exists with updated value)
  • Create vs. Delete: Create followed by Delete has same effect as Delete ✔️ (task does not exist)
  • Update vs. Delete: Update followed by Delete has same effect as Delete ✔️ (task does not exist)

This sort of suggests that we never needed the Create operation to begin with, but I don't think that's worth revisiting at this time.

@djmitche
Copy link
Collaborator Author

Preliminary work implementing this suggests it's going to work!

@djmitche
Copy link
Collaborator Author

Nope, this doesn't work. The analysis above works on an operation-by-operation basis, but sync is over whole sequences of operations, and those have all already been applied on a replica somewhere. So if we get a winning operation W from the server, then some other replica has already applied [W, O1, O2, O3, ..]. If a local operation L loses to W, and we try to push [L, W] to the server, then eventually that other replica will apply [L, W], for a total sequence of operations [W, O1, O2, O3, .., L, W]. The problem is, W may no longer make sense after those O1..: for example, O2 might delete the task that L and W refer to. In fact, taskdb::sync::test::test_sync_create_delete provides an example where this fails.

I'll think a bit more first, but it might be time to roll back some? all? of the PRs I've made here.

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 1, 2024

So, yes, on thinking further, the idea of trying to "hide" operations using the WINNING INVARIANT is not going to work. The problem is that when a local operation loses to an operation from the server, there's really no good way to send that operation to the server -- the other replica has already applied the winning operation and many more subsequent operations.

So, proposal: don't send that operation, but instead send an "FYI" operation, which is defined to contain the information about the change, but not affect taskdb state at all and never conflict with any other operations. Then task info would see mostly regular operations, but occasionally an FYI, and would just display that like it would any other Update (sorted by modification time). Applying an FYI is simple (do nothing). And the SyncOp::transform function actually becomes a little simpler: on conflict, the losing operation is just turned into the corresponding FYI operation.

There are two problems with this approach.

  1. It means doubling the number of operation variants -- adding FyiUpdate, FyiCreate, and FyiDelete to the existing Update, Create, and Delete. I think we could limit that doubling to just SyncOp (private to the crate) and not Operation (in the public API).
  2. It's not backward-compatible -- these new Fyi* operations would cause a 0.7.0 replica to fail. I'm not sure how to solve this.

I'm not sure how to solve the second problem, or if it's even worthwhile trying right now. Maybe the 0.8.0 version of this does not guarantee that task history matches across replicas -- is that good enough?

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 3, 2024

I think we should park this for the current release. Looking at what's landed, we are currently storing operations much as before, but with this handy virtual uuid column, and with a get_task_operations method on Replica.

I'd propose removing get_task_operations from the public Replica API (since currently it only returns un-synced operations, which isn't very useful), and making a release.

We could also think about defining these Fyi* operations now, and just ignoring them when they are found in versions from the server. That would mean that we could have a range of compatibliity with those operations. But, given the false starts on this issue, I'm not confident enough to plan that far ahead.

@ryneeverett
Copy link
Collaborator

How confident are you in the "handiness" of the virtual uuid column? I'm just thinking it seems questionable to update the schema in a release if doing so doesn't offer any new functionality and may end up being cruft from a "false start".

@djmitche
Copy link
Collaborator Author

djmitche commented Oct 8, 2024

That's a good question, although removing it will require a bunch more reverts. I don't think it actually hurts -- removing it later would be straightforward and have little overall impact. I'd prefer to leave it in just on the basis of difficulty of removing it.

@github-project-automation github-project-automation bot moved this from In progress to Done in Taskwarrior Development Oct 8, 2024
@ryneeverett
Copy link
Collaborator

Don't think you meant to close this though you might have wanted to remove it from the 0.8.0 milestone?

I'd prefer to leave it in just on the basis of difficulty of removing it.

This is sort of what I was driving at. In the future it might be better to avoid PR's to the main branch which (a) aren't justified by their own value-add and (b) are in pursuit of an uncertain roadmap with some big question marks along the way. I'd be happy to review PR's to a feature branch if you want your WIP reviewed. If the concern is difficult rebases...frankly I think it would be better to be in that situation than in the difficult reversion situation.

@ryneeverett ryneeverett reopened this Oct 11, 2024
@djmitche
Copy link
Collaborator Author

Don't think you meant to close this though you might have wanted to remove it from the 0.8.0 milestone?

Oops, I unintentionally set #464 to auto-close it.

This is sort of what I was driving at. In the future it might be better to avoid PR's to the main branch which (a) aren't justified by their own value-add and (b) are in pursuit of an uncertain roadmap with some big question marks along the way. I'd be happy to review PR's to a feature branch if you want your WIP reviewed. If the concern is difficult rebases...frankly I think it would be better to be in that situation than in the difficult reversion situation.

I understand and tend to agree, but on the other hand lots of projects operate on a flag-based model where just about every piece of code landed doesn't do anything, so it's not an unreasonable way of working. Anyway, I'll make a PR to back all of this out.

@ryneeverett
Copy link
Collaborator

on the other hand lots of projects operate on a flag-based model where just about every piece of code landed doesn't do anything, so it's not an unreasonable way of working

Fair point, though in order for the given feature to not do anything I would think it's code would need to be fairly isolated -- if not in separate modules then at least within if feature-enabled blocks. I'm not sure it's possible to do that with a feature like this which requires a fundamental change in the data model.

@djmitche
Copy link
Collaborator Author

Reflecting on the options we have for this:

  • Some kind of AddTaskHistory operation (space-prohibitive)
  • FYI operations (not backward-compatible)
  • Do nothing, don't support historical operations (makes lots of things in TaskWarrior not useful)

None of these seem great. However,

ryneverett on Aug 31:

At the end of the day though, I agree that the behavior issues you're describing are OK if they're not reasonable to solve. I'm not sure anyone expects task info to be perfect and to a certain extent it should be unsurprising that an offline-first decentralized system has an inconsistent historical perspective.

I think this is a good point. If we can get task info consistent except when operations conflicted and the losing one appears on only one replica, I think that's good enough.

And, I think we can do that without changing the sync model or violating any invariants.

Right now, when getting new operations from the server, we take them one-by-one and transpose each with the full list of un-synced local operations. If the server operation still exists at the end, we apply it locally and then discard it. Once all server operations are handled, we can upload the (transformed) local operations to the server.

A simple modification is to store the transformed server operation instead of discarding it, tagged as synced. The synced operations aren't part of any invariants and exist only for the task history, so this can't invalidate or break anything.

This can still end up with different results on different replicas, when the server operation "loses" on some replicas and does not get stored, while it does appear in the storage where the operation was created. But, such conflicts are rare and "make sense" to users. The example earlier in this thread was probably the most common, with a task marked status:completed on one replica and later status:deleted on another. One replica shows both operations, and the other shows only the later operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:per-task-ops Issues related to exposing per-task operations
Projects
Status: Done
2 participants