Overhaul documentation after TW merge
djmitche committed Jun 8, 2024
1 parent 6713e4e commit 2ab4726
Showing 11 changed files with 117 additions and 107 deletions.
26 changes: 12 additions & 14 deletions docs/src/SUMMARY.md
@@ -1,17 +1,15 @@
# Summary

- [Installation](./installation.md)
* [Running the Sync Server](./running-sync-server.md)
- [Introduction](./introduction.md)
- [Usage](./usage.md)
- [Task Model](./tasks.md)
- [Synchronization and the Sync Server](./sync.md)
* [Synchronization Model](./sync-model.md)
* [Snapshots](./snapshots.md)
* [Server-Replica Protocol](./sync-protocol.md)
* [Encryption](./encryption.md)
* [HTTP Implementation](./http.md)
* [Object-Store Implementation](./object-store.md)
- [Internal Details](./internals.md)
* [Data Model](./data-model.md)
* [Replica Storage](./storage.md)
* [Task Database](./taskdb.md)
* [Tasks](./tasks.md)
* [Synchronization and the Sync Server](./sync.md)
* [Synchronization Model](./sync-model.md)
* [Snapshots](./snapshots.md)
* [Server-Replica Protocol](./sync-protocol.md)
* [Encryption](./encryption.md)
* [HTTP Implementation](./http.md)
* [Object-Store Implementation](./object-store.md)
* [Planned Functionality](./plans.md)
* [Replica Storage](./storage.md)
* [Task Database](./taskdb.md)
5 changes: 0 additions & 5 deletions docs/src/data-model.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/src/http.md
@@ -1,7 +1,7 @@
# HTTP Representation

The transactions in the sync protocol are realized for an HTTP server at `<base_url>` using the HTTP requests and responses described here.
The `base_url` *should* be an HTTPS endpoint on general principle, but nothing in the functonality or security of the protocol depends on connection encryption.
The `base_url` *should* be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption.

The replica identifies itself to the server using a `client_id` in the form of a UUID.
This value is passed with every request in the `X-Client-Id` header, in its dashed-hex format.
3 changes: 0 additions & 3 deletions docs/src/installation.md

This file was deleted.

11 changes: 11 additions & 0 deletions docs/src/introduction.md
@@ -0,0 +1,11 @@
# TaskChampion

TaskChampion implements the task storage and synchronization behind Taskwarrior.
It includes an implementation with Rust and C APIs, allowing any application to maintain and manipulate its own replica.
It also includes a specification for tasks and how they are synchronized, inviting alternative implementations of replicas or task servers.

## Relationship with Taskwarrior

TaskChampion was originally developed as a project "inspired by" Taskwarrior, and later incorporated into Taskwarrior in its 3.0 release.
Taskwarrior embeds TaskChampion, but does not have any kind of privileged access to its implementation details.
Any other application can also embed TaskChampion and implement similar functionality, and even interoperate with Taskwarrior either in the same replica or via sync.
35 changes: 0 additions & 35 deletions docs/src/plans.md

This file was deleted.

11 changes: 0 additions & 11 deletions docs/src/running-sync-server.md

This file was deleted.

43 changes: 24 additions & 19 deletions docs/src/sync-model.md
@@ -1,12 +1,9 @@
# Synchronization Model

The [task database](./taskdb.md) also implements synchronization.
Synchronization occurs between disconnected replicas, mediated by a server.
The replicas never communicate directly with one another.
The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata.

The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations

## Operational Transforms

Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation).
@@ -15,7 +12,7 @@ This section will assume some familiarity with the concept.
## State and Operations

At a given time, the set of tasks in a replica's storage is the essential "state" of that replica.
All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md).
All modifications to that state occur via operations, as defined below.
We can draw a network, or graph, with the nodes representing states and the edges representing operations.
For example:

@@ -34,18 +31,27 @@ Since the replicas are not connected, each may have additional operations that h
The synchronization process uses operational transformation to "linearize" those operations.

This process is analogous (vaguely) to rebasing a sequence of Git commits.
Critically, though, operations cannot merge; in effect, the only option is rebasing.
Critically, though, operations cannot merge; in effect, the operations must be rebased.
Furthermore, once an operation has been sent to the server it cannot be changed; in effect, the server does not permit "force push".

### Sync Operations

The [Replica Storage](./storage.md) model contains additional information in its operations that is not included in operations synchronized to other replicas.
In this document, we will be discussing "sync operations" of the form
The operations are:

* `Create(uuid)`
* `Delete(uuid)`
* `Update(uuid, property, value, timestamp)`

The Create form creates a new task.
Creating a task that already exists has no effect.

Similarly, the Delete form deletes an existing task.
Deleting a task that does not exist has no effect.

The Update form updates the given property of the given task, where the property and values are strings.
The `property` is the property being updated, and the `value` gives its new value (or None to delete a property).
The timestamp on updates serves as additional metadata and is used to resolve conflicts.
Updating a task that does not exist has no effect.
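
The "no effect" rules above can be sketched as a small interpreter over an in-memory state.
This is an illustration only, not TaskChampion's implementation: the names `SyncOp`, `State`, and `apply` are hypothetical, and UUIDs and timestamps are kept as plain strings to avoid external crates.

```rust
use std::collections::HashMap;

/// A task is a map of string properties to string values.
type Task = HashMap<String, String>;
/// Replica state: tasks indexed by UUID (a plain string in this sketch).
type State = HashMap<String, Task>;

/// The three sync operations. The timestamp is an opaque string here.
#[derive(Debug, Clone, PartialEq)]
enum SyncOp {
    Create { uuid: String },
    Delete { uuid: String },
    Update {
        uuid: String,
        property: String,
        value: Option<String>, // None deletes the property
        timestamp: String,
    },
}

/// Apply one operation to the state, following the "no effect" rules.
fn apply(state: &mut State, op: &SyncOp) {
    match op {
        SyncOp::Create { uuid } => {
            // Creating a task that already exists leaves it untouched.
            state.entry(uuid.clone()).or_default();
        }
        SyncOp::Delete { uuid } => {
            // Deleting a task that does not exist has no effect.
            state.remove(uuid);
        }
        SyncOp::Update { uuid, property, value, .. } => {
            // Updating a task that does not exist has no effect.
            if let Some(task) = state.get_mut(uuid) {
                match value {
                    Some(v) => {
                        task.insert(property.clone(), v.clone());
                    }
                    None => {
                        task.remove(property);
                    }
                }
            }
        }
    }
}
```

Because every operation is defined on every state, a replica can apply any sequence of operations without first checking whether a task exists.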

### Versions

@@ -56,25 +62,24 @@ The version with the nil UUID is implicitly the empty database.
The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas.
Replicas use this information to update their local task databases, and to generate new versions to send to the server.

Replicas generate a new version to transmit local changes to the server.
In order to transmit new changes to the server, replicas generate a new version.
The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version.
In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server.

In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into a version; and sending that version to the server.

### Replica Invariant

The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`.

A replica stores the current state of all tasks, a sequence of as-yet un-synchronized operations, and the last version at which synchronization occurred (the "base version").
The replica's un-synchronized operations are already reflected in its local `tasks`, so the following invariant holds:

> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical
> to `tasks`.
> Applying the replica's sequence of operations to the set of tasks at the base version gives a set of tasks identical
> to the replica's current state.
### Transformation

When the latest version on the server contains operations that are not present in the replica, then the states have diverged.
For example:
For example, here the replica has local operations `a`-`c`, but the server has a new version with operations `w`-`z`:

```text
o -- version N
@@ -115,27 +120,27 @@ Continuing the example above, the resulting operations are shown with `'`:

The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2.
Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate *precisely* the same final state at version N+2.
Careful selection of the operations and the transformation function ensure this.
Careful selection of the operations and the transformation function ensures this.

See the comments in the source code for the details of how this transformation process is implemented.
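
As a rough illustration of the convergence requirement, the sketch below transforms a pair of concurrent `Update` operations.
It is not the transformation in the source code: it covers only updates (Create and Delete are left out), it assumes that the later timestamp wins a conflict, and the names `Update`, `apply`, and `transform` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Update {
    uuid: String,
    property: String,
    value: Option<String>,
    timestamp: u64, // seconds since the epoch, for easy comparison
}

type State = HashMap<String, HashMap<String, String>>;

/// Apply an update; updating a task that does not exist has no effect.
fn apply(state: &mut State, op: &Update) {
    if let Some(task) = state.get_mut(&op.uuid) {
        match &op.value {
            Some(v) => {
                task.insert(op.property.clone(), v.clone());
            }
            None => {
                task.remove(&op.property);
            }
        }
    }
}

/// Given op1 and op2 made from the same state, return (op1', op2') such that
/// applying op1 then op2' yields the same state as applying op2 then op1'.
/// `None` means there is nothing further to apply on that path.
fn transform(op1: Update, op2: Update) -> (Option<Update>, Option<Update>) {
    let same_target = op1.uuid == op2.uuid && op1.property == op2.property;
    if !same_target {
        // Updates to different properties commute, so both pass through.
        return (Some(op1), Some(op2));
    }
    if op1.value == op2.value {
        // Both sides already agree.
        (None, None)
    } else if op1.timestamp < op2.timestamp {
        // op2 is later: the side that applied op1 must still apply op2.
        (None, Some(op2))
    } else {
        // op1 is later (or tied): the side that applied op2 must apply op1.
        (Some(op1), None)
    }
}
```

With a local update at time 10 and a server update at time 20 to the same property, both paths end with the server's value, which is the property the graph above requires.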

## Synchronization Process

To perform a synchronization, the replica first requests the child version of `base_version` from the server (GetChildVersion).
It applies that version to its local `tasks`, rebases its local `operations` as described above, and updates `base_version`.
To perform a synchronization, the replica first requests the child version of its stored base version from the server (GetChildVersion).
If such a version exists, it applies the transformation described above, resulting in an updated state and an updated list of local operations.
The replica repeats this process until the server indicates no additional child versions exist.
If there are no un-synchronized local operations, the process is complete.

Otherwise, the replica creates a new version containing its local operations, giving its `base_version` as the parent version, and transmits that to the server (AddVersion).
Otherwise, the replica creates a new version containing its local operations, giving its base version as the parent version, and transmits that to the server (AddVersion).
In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version.
In this case, the process repeats.
In this case, the entire process repeats.
If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong).
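
The control flow of this process can be sketched against an in-memory stand-in for the server.
Everything here is hypothetical (`ToyServer`, `ToyReplica`, integer version IDs with 0 in place of the nil UUID, the server choosing the new version ID), operations are opaque strings, and the transformation step is reduced to a comment.

```rust
use std::collections::HashMap;

type VersionId = u64; // stand-in for a version UUID; 0 plays the nil UUID
type Op = String; // operations are opaque in this sketch

/// Hypothetical in-memory server: maps a parent version to its single child.
#[derive(Default)]
struct ToyServer {
    latest: VersionId,
    children: HashMap<VersionId, (VersionId, Vec<Op>)>,
}

impl ToyServer {
    /// GetChildVersion: `None` simply means the replica is up to date.
    fn get_child_version(&self, parent: VersionId) -> Option<(VersionId, Vec<Op>)> {
        self.children.get(&parent).cloned()
    }

    /// AddVersion: accepted only when `parent` is the latest version;
    /// otherwise the expected parent version is returned as the error.
    fn add_version(&mut self, parent: VersionId, ops: Vec<Op>) -> Result<VersionId, VersionId> {
        if parent != self.latest {
            return Err(self.latest);
        }
        let id = self.latest + 1;
        self.children.insert(parent, (id, ops));
        self.latest = id;
        Ok(id)
    }
}

struct ToyReplica {
    base_version: VersionId,
    applied: Vec<Op>,   // operations received from the server and applied locally
    local_ops: Vec<Op>, // un-synchronized local operations
}

impl ToyReplica {
    fn sync(&mut self, server: &mut ToyServer) -> Result<(), String> {
        let mut last_expected = None;
        loop {
            // Pull child versions until the server has no more.
            while let Some((id, ops)) = server.get_child_version(self.base_version) {
                // A real replica would transform `ops` against `local_ops` here.
                self.applied.extend(ops);
                self.base_version = id;
            }
            if self.local_ops.is_empty() {
                return Ok(());
            }
            match server.add_version(self.base_version, self.local_ops.clone()) {
                Ok(id) => {
                    self.base_version = id;
                    self.local_ops.clear();
                    return Ok(());
                }
                Err(expected) => {
                    // The same expected parent twice in a row means divergence.
                    if last_expected == Some(expected) {
                        return Err("replica has diverged from the server".to_string());
                    }
                    last_expected = Some(expected);
                }
            }
        }
    }
}
```

A replica whose base version is unknown to the server never finds a child version and is rejected twice with the same expected parent, which is how this sketch detects divergence.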

## Servers

A replica depends on periodic synchronization for performant operation.
Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired.
So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically.
So all replicas, even "singleton" replicas which do not replicate task data with any other replica, should synchronize periodically.

TaskChampion provides a `LocalServer` for this purpose.
It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally.
3 changes: 2 additions & 1 deletion docs/src/sync-protocol.md
@@ -42,7 +42,7 @@ Timestamps are in RFC3339 format with a `Z` suffix.

### Snapshot

The decrypted form of a snapshot is a JSON object mapping task IDs to task properties.
The decrypted form of a snapshot is a JSON object mapping task UUIDs to task properties.
For example (pretty-printed for clarity):

```json
@@ -93,6 +93,7 @@ If found, it returns the version's
* encrypted version data.

If not found, it returns an indication that no such version exists.
Note that this circumstance is not an error, and occurs during every successful sync process.

### AddSnapshot

32 changes: 14 additions & 18 deletions docs/src/tasks.md
@@ -1,29 +1,25 @@
# Tasks
# Task Model

Tasks are stored internally as a key/value map with string keys and values.
All fields are optional: the `Create` operation creates an empty task.
Display layers should apply appropriate defaults where necessary.

## Atomicity

The synchronization process does not support read-modify-write operations.
For example, suppose tags are updated by reading a list of tags, adding a tag, and writing the result back.
This would be captured as an `Update` operation containing the amended list of tags.
Suppose two such `Update` operations are made in different replicas and must be reconciled:
* `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag1", "2020-11-23T14:21:22Z")`
* `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag2", "2020-11-23T15:08:57Z")`

The result of this reconciliation will be `oldtag,newtag2`, while the user almost certainly intended `oldtag,newtag1,newtag2`.

The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single `Update` operation.

## Validity

_Any_ key/value map is a valid task.
_Any_ key/value map is a valid task, including an empty task.
Consumers of task data must make a best effort to interpret any map, even if it contains apparently contradictory information.
For example, a task with status "completed" but no "end" key present should be interpreted as completed at an unknown time.

## Representations
## Atomicity

Replicas only synchronize with one another occasionally, so it is impossible to know the "current" state of a task with certainty.
This makes some kinds of modifications challenging.
For example, suppose task tags were updated by reading a list of tags from a property of the key/value map, adding a tag, and writing the result back.
Suppose two such modifications are made in different replicas, one setting `tags` to "oldtag,newtag1" and one setting `tags` to "oldtag,newtag2".
Reconciling these two changes on a sync operation would result in one change winning, losing one of the new tags.

The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single modification.
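
The difference can be sketched with a plain map, using the `dep_<uuid>` key format as the example.
The helper names `update` and `dependencies`, and the single `deps` list property used for contrast, are hypothetical and not part of TaskChampion.

```rust
use std::collections::HashMap;

type Task = HashMap<String, String>;

/// Apply a property update; `None` deletes the property.
fn update(task: &mut Task, property: &str, value: Option<&str>) {
    match value {
        Some(v) => {
            task.insert(property.to_string(), v.to_string());
        }
        None => {
            task.remove(property);
        }
    }
}

/// Collect dependency UUIDs from `dep_<uuid>` keys; the values are ignored.
fn dependencies(task: &Task) -> Vec<String> {
    let mut deps: Vec<String> = task
        .keys()
        .filter_map(|k| k.strip_prefix("dep_"))
        .map(str::to_string)
        .collect();
    deps.sort();
    deps
}
```

If two replicas each rewrite a single list-valued property, whichever update is applied last replaces the other.
If each instead sets its own `dep_<uuid>` key, the two updates touch different properties, so both survive reconciliation, and removing a dependency is a single update that deletes one key.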

## Value Representations

Integers are stored in decimal notation.

@@ -44,7 +40,7 @@ The following keys, and key formats, are defined:
* `annotation_<timestamp>` - value is an annotation created at the given time; for example, `annotation_1693329505`.
* `dep_<uuid>` - indicates this task depends on another task identified by `<uuid>`; the value is ignored; for example, `dep_8c4fed9c-c0d2-40c2-936d-36fc44e084a0`

Note that while TaskChampion recognizes "recurring" as a status, it does not implement recurrence directly.
Note that while TaskChampion recognizes "R" as a status, it does not implement recurrence directly.

### UDAs

53 changes: 53 additions & 0 deletions docs/src/usage.md
@@ -0,0 +1,53 @@
# Usage

## Conceptual Overview

The following sections provide a brief overview of the core concepts in TaskChampion.
Subsequent chapters, and API documentation, provide more detail.

### Replica

A TaskChampion replica is a local copy of a user’s task data.
As the name suggests, several replicas of the same data can exist (such as on a user’s laptop and on their phone) and can synchronize with one another.

A replica contains a collection of tasks, indexed by UUID.
It also stores a working set, and ancillary information to support synchronization.

### Task

A task is the unit of work that TaskChampion tracks.
A task is represented as a map of strings to strings.
The meaning of those strings is given in the [task model](./tasks.md).

### Working Set

A working set contains, roughly, the tasks that are currently pending.
It assigns a short, integer identifier to each such task, which is easier for users to remember and type.
The working set can be "rebuilt" as the task list changes, updating the identifiers for some tasks.
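
As a rough mental model, a working set behaves like a two-way index between short integers and task UUIDs.
The sketch below is an assumption for illustration, not the type in the Rust API: the name `WorkingSet` is reused loosely, the methods `rebuild`, `uuid_for`, and `id_for` are hypothetical, and the rebuild simply renumbers the pending tasks from 1 in the order given.

```rust
use std::collections::HashMap;

struct WorkingSet {
    by_index: Vec<Option<String>>, // index 0 is unused so identifiers start at 1
    by_uuid: HashMap<String, usize>,
}

impl WorkingSet {
    /// Build a working set from the UUIDs of the pending tasks, in order.
    fn rebuild(pending: &[&str]) -> Self {
        let mut by_index = vec![None];
        let mut by_uuid = HashMap::new();
        for uuid in pending {
            by_uuid.insert(uuid.to_string(), by_index.len());
            by_index.push(Some(uuid.to_string()));
        }
        WorkingSet { by_index, by_uuid }
    }

    /// Look up the task UUID for a short identifier.
    fn uuid_for(&self, id: usize) -> Option<&str> {
        self.by_index.get(id).and_then(|u| u.as_deref())
    }

    /// Look up the short identifier for a task UUID.
    fn id_for(&self, uuid: &str) -> Option<usize> {
        self.by_uuid.get(uuid).copied()
    }
}
```

In this sketch, rebuilding after a task leaves the pending set shifts the identifiers of the tasks that followed it, which is why identifiers should only be treated as stable between rebuilds.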

### Storage

Storage defines where and how tasks are stored.

### Server

A server supports synchronizing tasks among several replicas.
This may refer to an instance of `taskchampion-sync-server` or a number of other options.

## APIs

### Rust

TaskChampion is implemented in Rust, and that implementation represents its primary public API.
It is documented at [docs.rs/taskchampion](https://docs.rs/taskchampion/latest/taskchampion/).

### C

The C API contains a rough mapping of Rust types to opaque C structures, and Rust methods to C functions.

The `taskchampion-lib` crate generates libraries suitable for use from C (or any C-compatible language).
It is a "normal" Cargo crate that happens to export a number of `extern "C"` symbols, and also contains a [`taskchampion.h`](https://github.com/GothenburgBitFactory/taskchampion/blob/main/lib/taskchampion.h) defining those symbols.
The primary documentation for the C API is in the header file.

*WARNING: the C API is not yet stable!*
Please consult with the TaskChampion developers before relying on this API.
