From 2ab47269afebb8a541c71b180bfb5d16857b29a8 Mon Sep 17 00:00:00 2001 From: "Dustin J. Mitchell" Date: Sat, 8 Jun 2024 23:05:38 +0000 Subject: [PATCH] Overhaul documentation after TW merge --- docs/src/SUMMARY.md | 26 ++++++++-------- docs/src/data-model.md | 5 ---- docs/src/http.md | 2 +- docs/src/installation.md | 3 -- docs/src/introduction.md | 11 +++++++ docs/src/plans.md | 35 ---------------------- docs/src/running-sync-server.md | 11 ------- docs/src/sync-model.md | 43 ++++++++++++++------------ docs/src/sync-protocol.md | 3 +- docs/src/tasks.md | 32 +++++++++----------- docs/src/usage.md | 53 +++++++++++++++++++++++++++++++++ 11 files changed, 117 insertions(+), 107 deletions(-) delete mode 100644 docs/src/data-model.md delete mode 100644 docs/src/installation.md create mode 100644 docs/src/introduction.md delete mode 100644 docs/src/plans.md delete mode 100644 docs/src/running-sync-server.md create mode 100644 docs/src/usage.md diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index b5bc3c6e5..398a59e79 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -1,17 +1,15 @@ # Summary -- [Installation](./installation.md) - * [Running the Sync Server](./running-sync-server.md) +- [Introduction](./introduction.md) +- [Usage](./usage.md) +- [Task Model](./tasks.md) +- [Synchronization and the Sync Server](./sync.md) + * [Synchronization Model](./sync-model.md) + * [Snapshots](./snapshots.md) + * [Server-Replica Protocol](./sync-protocol.md) + * [Encryption](./encryption.md) + * [HTTP Implementation](./http.md) + * [Object-Store Implementation](./object-store.md) - [Internal Details](./internals.md) - * [Data Model](./data-model.md) - * [Replica Storage](./storage.md) - * [Task Database](./taskdb.md) - * [Tasks](./tasks.md) - * [Synchronization and the Sync Server](./sync.md) - * [Synchronization Model](./sync-model.md) - * [Snapshots](./snapshots.md) - * [Server-Replica Protocol](./sync-protocol.md) - * [Encryption](./encryption.md) - * [HTTP
Implementation](./http.md) - * [Object-Store Implementation](./object-store.md) - * [Planned Functionality](./plans.md) + * [Replica Storage](./storage.md) + * [Task Database](./taskdb.md) diff --git a/docs/src/data-model.md b/docs/src/data-model.md deleted file mode 100644 index 2a43df62b..000000000 --- a/docs/src/data-model.md +++ /dev/null @@ -1,5 +0,0 @@ -# Data Model - -A client manages a single offline instance of a single user's task list, called a replica. -This section covers the structure of that data. -Note that this data model is visible only on the client; the server does not have access to client data. diff --git a/docs/src/http.md b/docs/src/http.md index f513f6eb6..d0cdb2ff2 100644 --- a/docs/src/http.md +++ b/docs/src/http.md @@ -1,7 +1,7 @@ # HTTP Representation The transactions in the sync protocol are realized for an HTTP server at `` using the HTTP requests and responses described here. -The `base_url` *should* be an HTTPS endpoint on general principle, but nothing in the functonality or security of the protocol depends on connection encryption. +The `base_url` *should* be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption. The replica identifies itself to the server using a `client_id` in the form of a UUID. This value is passed with every request in the `X-Client-Id` header, in its dashed-hex format. diff --git a/docs/src/installation.md b/docs/src/installation.md deleted file mode 100644 index a597a11da..000000000 --- a/docs/src/installation.md +++ /dev/null @@ -1,3 +0,0 @@ -# Installation - -As this is currently in development, installation is by cloning the repository and running "cargo build". 
diff --git a/docs/src/introduction.md b/docs/src/introduction.md new file mode 100644 index 000000000..2064a68c2 --- /dev/null +++ b/docs/src/introduction.md @@ -0,0 +1,11 @@ +# TaskChampion + +TaskChampion implements the task storage and synchronization behind Taskwarrior. +It includes an implementation with Rust and C APIs, allowing any application to maintain and manipulate its own replica. +It also includes a specification for tasks and how they are synchronized, inviting alternative implementations of replicas or task servers. + +## Relationship with Taskwarrior + +TaskChampion was originally developed as a project "inspired by" Taskwarrior, and later incorporated into Taskwarrior in its 3.0 release. +Taskwarrior embeds TaskChampion, but does not have any kind of privileged access to its implementation details. +Any other application can also embed TaskChampion and implement similar functionality, and even interoperate with Taskwarrior either in the same replica or via sync. diff --git a/docs/src/plans.md b/docs/src/plans.md deleted file mode 100644 index 4ee20c5d0..000000000 --- a/docs/src/plans.md +++ /dev/null @@ -1,35 +0,0 @@ -# Planned Functionality - -This section is a bit of a to-do list for additional functionality to add to the synchronzation system. -Each feature has some discussion of how it might be implemented. - -## Snapshots - -As designed, storage required on the server would grow with time, as would the time required for new clients to update to the latest version. -As an optimization, the server also stores "snapshots" containing a full copy of the task database at a given version. -Based on configurable heuristics, it may delete older operations and snapshots, as long as enough data remains for active clients to synchronize and for new clients to initialize. - -Since snapshots must be computed by clients, the server may "request" a snapshot when providing the latest version to a client. 
-This request comes with a number indicating how much it 'wants" the snapshot. -Clients which can easily generate and transmit a snapshot should be generous to the server, while clients with more limited resources can wait until the server's requests are more desperate. -The intent is, where possible, to request snapshots created on well-connected desktop clients over mobile and low-power clients. - -## Encryption and Signing - -From the server's perspective, all data except for version numbers are opaque binary blobs. -Clients encrypt and sign these blobs using a symmetric key known only to the clients. -This secures the data at-rest on the server. -Note that privacy is not complete, as the server still has some information about users, including source and frequency of synchronization transactions and size of those transactions. - -## Backups - -In this design, the server is little more than an authenticated storage for encrypted blobs provided by the client. -To allow for failure or data loss on the server, clients are expected to cache these blobs locally for a short time (a week), along with a server-provided HMAC signature. -When data loss is detected -- such as when a client expects the server to have a version N or higher, and the server only has N-1, the client can send those blobs to the server. -The server can validate the HMAC and, if successful, add the blobs to its datastore. - -## Expiration - -Deleted tasks remain in the task database, and are simply hidden in most views. -All tasks have an expiration time after which they may be flushed, preventing unbounded increase in task database size. -However, purging of a task does not satisfy the necessary OT guarantees, so some further formal design work is required before this is implemented. 
diff --git a/docs/src/running-sync-server.md b/docs/src/running-sync-server.md deleted file mode 100644 index e8b8c56ce..000000000 --- a/docs/src/running-sync-server.md +++ /dev/null @@ -1,11 +0,0 @@ -# Running the Sync Server - -> NOTE: TaskChampion is still in development and not yet feature-complete. -> The server is functional, but lacks any administrative features. - -Run `taskchampion-sync-server` to start the sync server. -Use `--port` to specify the port it should listen on, and `--data-dir` to specify the directory which it should store its data. -It only serves HTTP; the expectation is that a frontend proxy will be used for HTTPS support. - -The server has optional parameters `--snapshot-days` and `--snapshot-version`, giving the target number of days and versions, respectively, between snapshots of the client state. -The default values for these parameters are generally adequate. diff --git a/docs/src/sync-model.md b/docs/src/sync-model.md index f03626efb..ce33ed215 100644 --- a/docs/src/sync-model.md +++ b/docs/src/sync-model.md @@ -1,12 +1,9 @@ # Synchronization Model -The [task database](./taskdb.md) also implements synchronization. Synchronization occurs between disconnected replicas, mediated by a server. The replicas never communicate directly with one another. The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata. -The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations - ## Operational Transforms Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation). @@ -15,7 +12,7 @@ This section will assume some familiarity with the concept. ## State and Operations At a given time, the set of tasks in a replica's storage is the essential "state" of that replica. 
-All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md). +All modifications to that state occur via operations, as defined below. We can draw a network, or graph, with the nodes representing states and the edges representing operations. For example: @@ -34,18 +31,27 @@ Since the replicas are not connected, each may have additional operations that h The synchronization process uses operational transformation to "linearize" those operations. This process is analogous (vaguely) to rebasing a sequence of Git commits. -Critically, though, operations cannot merge; in effect, the only option is rebasing. +Critically, though, operations cannot merge; in effect, the operations must be rebased. Furthermore, once an operation has been sent to the server it cannot be changed; in effect, the server does not permit "force push". ### Sync Operations -The [Replica Storage](./storage.md) model contains additional information in its operations that is not included in operations synchronized to other replicas. -In this document, we will be discussing "sync operations" of the form +The operations are: * `Create(uuid)` * `Delete(uuid)` * `Update(uuid, property, value, timestamp)` +The Create form creates a new task. +Creating a task that already exists has no effect. + +Similarly, the Delete form deletes an existing task. +Deleting a task that does not exist has no effect. + +The Update form updates the given property of the given task, where the property and value are strings. +The `property` is the property being updated, and the `value` gives its new value (or None to delete a property). +The timestamp on updates serves as additional metadata and is used to resolve conflicts. +Updating a task that does not exist has no effect. ### Versions @@ -56,7 +62,7 @@ The version with the nil UUID is implicitly the empty database.
The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas. Replicas use this information to update their local task databases, and to generate new versions to send to the server. -Replicas generate a new version to transmit local changes to the server. +In order to transmit new changes to the server, replicas generate a new version. The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version. In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server. @@ -64,17 +70,16 @@ In the non-conflict case (such as with a single replica), then, a replica's sync ### Replica Invariant -The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`. - +A replica stores the current state of all tasks, a sequence of as-yet un-synchronized operations, and the last version at which synchronization occurred (the "base version"). The replica's un-synchronized operations are already reflected in its local `tasks`, so the following invariant holds: -> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical -> to `tasks`. +> Applying the replica's sequence of operations to the set of tasks at the base version gives a set of tasks identical +> to the replica's current state. ### Transformation When the latest version on the server contains operations that are not present in the replica, then the states have diverged. 
-For example: +For example, here the replica has local operations `a`-`b`, but the server has a new version with operations `w`-`z`: ```text o -- version N @@ -115,27 +120,27 @@ Continuing the example above, the resulting operations are shown with `'`: The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2. Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate *precisely* the same final state at version N+2. -Careful selection of the operations and the transformation function ensure this. +Careful selection of the operations and the transformation function ensures this. See the comments in the source code for the details of how this transformation process is implemented. ## Synchronization Process -To perform a synchronization, the replica first requests the child version of `base_version` from the server (GetChildVersion). -It applies that version to its local `tasks`, rebases its local `operations` as described above, and updates `base_version`. +To perform a synchronization, the replica first requests the child version of its stored base version from the server (GetChildVersion). +If such a version exists, it applies the transformation described above, resulting in an updated state and an updated list of local operations. The replica repeats this process until the server indicates no additional child versions exist. If there are no un-synchronized local operations, the process is complete. -Otherwise, the replica creates a new version containing its local operations, giving its `base_version` as the parent version, and transmits that to the server (AddVersion). +Otherwise, the replica creates a new version containing its local operations, giving its base version as the parent version, and transmits that to the server (AddVersion). 
In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version. -In this case, the process repeats. +In this case, the entire process repeats. If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong). ## Servers A replica depends on periodic synchronization for performant operation. Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired. -So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically. +So all replicas, even "singleton" replicas which do not replicate task data with any other replica, should synchronize periodically. TaskChampion provides a `LocalServer` for this purpose. It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally. diff --git a/docs/src/sync-protocol.md b/docs/src/sync-protocol.md index 98caf24e3..2f59b7911 100644 --- a/docs/src/sync-protocol.md +++ b/docs/src/sync-protocol.md @@ -42,7 +42,7 @@ Timestamps are in RFC3339 format with a `Z` suffix. ### Snapshot -The decrypted form of a snapshot is a JSON object mapping task IDs to task properties. +The decrypted form of a snapshot is a JSON object mapping task UUIDs to task properties. For example (pretty-printed for clarity): ```json @@ -93,6 +93,7 @@ If found, it returns the version's * encrypted version data. If not found, it returns an indication that no such version exists. +Note that this circumstance is not an error, and occurs during every successful sync process. 
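The following sketch is not part of the patch. It models the `Create`, `Delete`, and `Update` forms described in the sync-model.md changes above, along with the replica invariant, in a few lines of Rust. The names `SyncOp`, `apply`, and `invariant_holds` are hypothetical and are not TaskChampion's API; UUIDs and timestamps are kept as plain strings for brevity, and a `None` value deletes the property, as the text describes.

```rust
use std::collections::HashMap;

/// A task is a map of string keys to string values.
type Task = HashMap<String, String>;
/// Replica state: tasks indexed by UUID (a plain string in this sketch).
type Tasks = HashMap<String, Task>;

/// The three sync operations (hypothetical type, for illustration only).
#[derive(Clone, Debug, PartialEq)]
enum SyncOp {
    Create { uuid: String },
    Delete { uuid: String },
    Update {
        uuid: String,
        property: String,
        value: Option<String>, // None deletes the property
        timestamp: String,     // metadata; not interpreted in this sketch
    },
}

/// Apply one operation to the state, following the "no effect" rules.
fn apply(tasks: &mut Tasks, op: &SyncOp) {
    match op {
        SyncOp::Create { uuid } => {
            // Creating a task that already exists has no effect.
            tasks.entry(uuid.clone()).or_default();
        }
        SyncOp::Delete { uuid } => {
            // Deleting a task that does not exist has no effect.
            tasks.remove(uuid);
        }
        SyncOp::Update { uuid, property, value, .. } => {
            // Updating a task that does not exist has no effect.
            if let Some(task) = tasks.get_mut(uuid) {
                match value {
                    Some(v) => {
                        task.insert(property.clone(), v.clone());
                    }
                    None => {
                        task.remove(property);
                    }
                }
            }
        }
    }
}

/// Replica invariant: replaying the un-synchronized operations on the state
/// at the base version must reproduce the replica's current state.
fn invariant_holds(base: &Tasks, pending: &[SyncOp], current: &Tasks) -> bool {
    let mut replayed = base.clone();
    for op in pending {
        apply(&mut replayed, op);
    }
    &replayed == current
}
```

A caller would start from `Tasks::new()` (the state at the nil version), record each local operation in a pending list as it calls `apply`, and could use `invariant_holds` as a debugging check before a sync. The transformation step itself is deliberately left out of this sketch.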
### AddSnapshot diff --git a/docs/src/tasks.md b/docs/src/tasks.md index bac90cc9d..7dc904171 100644 --- a/docs/src/tasks.md +++ b/docs/src/tasks.md @@ -1,29 +1,25 @@ -# Tasks +# Task Model Tasks are stored internally as a key/value map with string keys and values. -All fields are optional: the `Create` operation creates an empty task. Display layers should apply appropriate defaults where necessary. -## Atomicity - -The synchronization process does not support read-modify-write operations. -For example, suppose tags are updated by reading a list of tags, adding a tag, and writing the result back. -This would be captured as an `Update` operation containing the amended list of tags. -Suppose two such `Update` operations are made in different replicas and must be reconciled: - * `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag1", "2020-11-23T14:21:22Z")` - * `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag2", "2020-11-23T15:08:57Z")` - -The result of this reconciliation will be `oldtag,newtag2`, while the user almost certainly intended `oldtag,newtag1,newtag2`. - -The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single `Update` operation. - ## Validity -_Any_ key/value map is a valid task. +_Any_ key/value map is a valid task, including an empty task. Consumers of task data must make a best effort to interpret any map, even if it contains apparently contradictory information. For example, a task with status "completed" but no "end" key present should be interpreted as completed at an unknown time. -## Representations +## Atomicity + +Replicas only synchronize with one another occasionally, so it is impossible to know the "current" state of a task with certainty. +This makes some kinds of modifications challenging.
+For example, suppose task tags were updated by reading a list of tags from a property of the key/value map, adding a tag, and writing the result back. +Suppose two such modifications are made in different replicas, one setting `tags` to "oldtag,newtag1" and one setting `tags` to "oldtag,newtag2". +Reconciling these two changes on a sync operation would result in one change winning, losing one of the new tags. + +The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single modification. + +## Value Representations Integers are stored in decimal notation. @@ -44,7 +40,7 @@ The following keys, and key formats, are defined: * `annotation_` - value is an annotation created at the given time; for example, `annotation_1693329505`. * `dep_` - indicates this task depends on another task identified by ``; the value is ignored; for example, `dep_8c4fed9c-c0d2-40c2-936d-36fc44e084a0` -Note that while TaskChampion recognizes "recurring" as a status, it does not implement recurrence directly. +Note that while TaskChampion recognizes "R" as a status, it does not implement recurrence directly. ### UDAs diff --git a/docs/src/usage.md b/docs/src/usage.md new file mode 100644 index 000000000..3d91fe138 --- /dev/null +++ b/docs/src/usage.md @@ -0,0 +1,53 @@ +# Usage + +## Conceptual Overview + +The following provide a brief overview of the core concepts in TaskChampion. +Subsequent chapters, and API documentation, provide more detail. + +### Replica + +A TaskChampion replica is a local copy of a user’s task data. +As the name suggests, several replicas of the same data can exist (such as on a user’s laptop and on their phone) and can synchronize with one another. + +A replica contains a collection of tasks, indexed by UUID. +It also stores a working set, and ancillary information to support synchronization. + +### Task + +A task is the unit of work that TaskChampion tracks. 
+A task is represented as a map of strings to strings. +The meaning of those strings is given in the [task model](./tasks.md). + +### Working Set + +A working set contains, roughly, the tasks that are currently pending. +It assigns a short, integer identifier to each such task, which is easier for users to remember and type. +The working set can be "rebuilt" as the task list changes, updating the identifiers for some tasks. + +### Storage + +Storage defines where and how tasks are stored. + +### Server + +A server supports synchronizing tasks among several replicas. +This may refer to an instance of `taskchampion-sync-server` or a number of other options. + +## APIs + +### Rust + +TaskChampion is implemented in Rust, and that implementation represents its primary public API. +It is documented at [docs.rs/taskchampion](https://docs.rs/taskchampion/latest/taskchampion/). + +### C + +The C API contains a rough mapping of Rust types to opaque C structures, and Rust methods to C functions. + +The `taskchampion-lib` crate generates libraries suitable for use from C (or any C-compatible language). +It is a "normal" Cargo crate that happens to export a number of `extern "C"` symbols, and also contains a [`taskchampion.h`](https://github.com/GothenburgBitFactory/taskchampion/blob/main/lib/taskchampion.h) defining those symbols. +The primary documentation for the C API is in the header file. + +*WARNING: the C API is not yet stable!* +Please consult with the TaskChampion developers before relying on this API.
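The following sketch is not part of the patch. It illustrates the atomicity concern from the tasks.md changes: when two replicas each rewrite a single `tags` property, one change wins and a tag is lost, whereas one key per tag lets both additions survive. It assumes, purely for illustration, that the update with the later timestamp wins for each property; the `reconcile` and `tags` helpers and the `tag_<name>` key form are assumptions of this sketch and do not appear in the diff.

```rust
use std::collections::BTreeMap;

/// One property update made on some replica (hypothetical type).
/// Timestamps are RFC 3339 strings with a `Z` suffix, which sort
/// chronologically when compared as text.
struct Update {
    property: &'static str,
    value: &'static str,
    timestamp: &'static str,
}

/// Reconcile updates from several replicas into one task map.
/// Simplifying assumption: for each property, the latest timestamp wins.
fn reconcile(mut updates: Vec<Update>) -> BTreeMap<String, String> {
    updates.sort_by_key(|u| u.timestamp);
    let mut task = BTreeMap::new();
    for u in updates {
        // Later updates overwrite earlier ones for the same property.
        task.insert(u.property.to_string(), u.value.to_string());
    }
    task
}

/// Collect tag names from keys of the assumed form `tag_<name>`.
fn tags(task: &BTreeMap<String, String>) -> Vec<&str> {
    task.keys()
        .filter_map(|k| k.strip_prefix("tag_"))
        .collect()
}
```

With a single `tags` property, reconciling the two replicas' updates keeps only the later value, so `newtag1` disappears. With per-tag keys, each replica's change touches a different property, so no update overwrites another and both new tags remain.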