From ed65350478c4e403de49e434b733969e2e81b77d Mon Sep 17 00:00:00 2001
From: Paul Frazee
Date: Mon, 5 Feb 2018 22:39:45 -0600
Subject: [PATCH 1/2] Add proposals/0000-hyperdrive.md (WIP)

---
 proposals/0000-hyperdrive.md | 170 +++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)
 create mode 100644 proposals/0000-hyperdrive.md

diff --git a/proposals/0000-hyperdrive.md b/proposals/0000-hyperdrive.md
new file mode 100644
index 0000000..ccb6743
--- /dev/null
+++ b/proposals/0000-hyperdrive.md
@@ -0,0 +1,170 @@
+
+Title: **DEP-0000: Hyperdrive**
+
+Short Name: `0000-hyperdrive`
+
+Type: Standard
+
+Status: Undefined (as of 2018-02-05)
+
+Github PR: (add HTTPS link here after PR is opened)
+
+Authors: [Paul Frazee](https://github.com/pfrazee)
+
+
+# Summary
+[summary]: #summary
+
+Hyperdrive archives are the top-level abstraction in Dat. They provide an interface for listing, reading, and writing files and folders. Each Hyperdrive is constructed using two Hypercore registers: one for representing the file metadata, and the other for representing the file content.
+
+As an example, consider a folder with two files:
+
+```
+bat.jpg
+cat.jpg
+```
+
+These files are added to a Hyperdrive archive by splitting them into chunks and constructing Hypercore registers representing the chunks and filesystem metadata.
+
+Let's assume `bat.jpg` and `cat.jpg` both produce three chunks, each around 64KB. Here we will show a pseudo-representation for the purposes of illustrating the replication process. The six chunks get sorted into a list like this:
+
+```
+bat-1
+bat-2
+bat-3
+cat-1
+cat-2
+cat-3
+```
+
+Via Hypercore, these chunks are hashed, and the hashes are arranged into a Merkle tree (the content register):
+
+```
+0 - hash(bat-1)
+  1 - hash(0 + 2)
+2 - hash(bat-2)
+    3 - hash(1 + 5)
+4 - hash(bat-3)
+  5 - hash(4 + 6)
+6 - hash(cat-1)
+8 - hash(cat-2)
+  9 - hash(8 + 10)
+10 - hash(cat-3)
+```
+
+This tree is for the hashes of the contents of the photos. There is also a second Hypercore register that Hyperdrive generates that represents the list of files and their metadata and looks something like this (the metadata register):
+
+```
+0 - hash({contentRegister: '9e29d624...'})
+  1 - hash(0 + 2)
+2 - hash({"bat.jpg", offset: 0, length: 3})
+4 - hash({"cat.jpg", offset: 3, length: 3})
+```
+
+The first entry in this feed is a special metadata entry that tells Dat the address of the second feed, the content register. The remaining two entries declare the existence of the two files, and indicate where to find their data in the content register.
+
+
+# File archives
+[file-archives]: #file-archives
+
+TODO
+
+
+## File metadata
+[file-metadata]: #file-metadata
+
+Dat tries as much as possible to act as a one-to-one mirror of the state of a folder and all its contents. When importing files, Dat uses a sorted, depth-first recursion to list all the files in the tree. For each file it finds, it grabs the filesystem metadata (filename, Stat object, etc) and appends the data to the metadata register.
+
+
+## Append-tree data structure
+[append-tree-data-structure]: #append-tree-data-structure
+
+"Append-tree" is a tree data structure modeled on top of an append-only log (the Hypercore register). The data structure stores a small index for every entry in the log so that no external indexing is required to model the tree. It also provides fast lookups on sparsely replicated logs.
+
+TODO- describe
+
+
+# Hypercore entry schemas
+[hypercore-entry-schemas]: #hypercore-entry-schemas
+
+Hyperdrive encodes its data to its Hypercore registers using Google's [protobuf](https://developers.google.com/protocol-buffers/) encoding.
+
+
+## Index
+[hypercore-entry-schema-index]: #hypercore-entry-schema-index
+
+The schema of the first entry written in every metadata Hypercore register. The `type` field will contain the string `"hyperdrive"`. The `content` field will contain the public key of the content Hypercore register.
+
+```
+message Index {
+  required string type = 1;
+  optional bytes content = 2;
+}
+```
+
+
+## Node
+[hypercore-entry-schema-node]: #hypercore-entry-schema-node
+
+This entry schema encodes a file in the metadata Hypercore register. The `name` field is a string providing the file-path. The `value` field is a Stat object, using the `Stat` encoding (below). The `paths` field is an index used to optimize lookups (TODO how?).
+
+```
+message Node {
+  required string name = 1;
+  optional bytes value = 2;
+  optional bytes paths = 3;
+}
+```
+
+
+## Stat
+[hypercore-entry-schema-stat]: #hypercore-entry-schema-stat
+
+This schema encodes the "Stat" of a file in the metadata Hypercore register. TODO- describe fields.
+
+```
+message Stat {
+  required uint32 mode = 1;
+  optional uint32 uid = 2;
+  optional uint32 gid = 3;
+  optional uint64 size = 4;
+  optional uint64 blocks = 5;
+  optional uint64 offset = 6;
+  optional uint64 byteOffset = 7;
+  optional uint64 mtime = 8;
+  optional uint64 ctime = 9;
+}
+```
+
+
+# Specifications and parameters
+[specifications-and-parameters]: #specifications-and-parameters
+
+All file paths are converted to Unix-style with a forward-slash separator.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+TODO
+
+
+# Rationale and alternatives
+[alternatives]: #alternatives
+
+TODO
+
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+TODO
+
+
+# Changelog
+[changelog]: #changelog
+
+A brief statement about current status can go here, followed by a list of dates
+when the status line of this DEP changed (in most-recent-last order).
+
+- YYYY-MM-DD: First complete draft submitted for review

From d1a179ba015edfb32ab228928101aeb39c7d5c92 Mon Sep 17 00:00:00 2001
From: Paul Frazee
Date: Thu, 15 Feb 2018 14:02:31 -0600
Subject: [PATCH 2/2] Rename hypercore 'register' to 'feed'

---
 proposals/0000-hyperdrive.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/proposals/0000-hyperdrive.md b/proposals/0000-hyperdrive.md
index ccb6743..56a254a 100644
--- a/proposals/0000-hyperdrive.md
+++ b/proposals/0000-hyperdrive.md
@@ -15,7 +15,7 @@ Authors: [Paul Frazee](https://github.com/pfrazee)
 # Summary
 [summary]: #summary
 
-Hyperdrive archives are the top-level abstraction in Dat. They provide an interface for listing, reading, and writing files and folders. Each Hyperdrive is constructed using two Hypercore registers: one for representing the file metadata, and the other for representing the file content.
+Hyperdrive archives are the top-level abstraction in Dat. They provide an interface for listing, reading, and writing files and folders. Each Hyperdrive is constructed using two Hypercore feeds: one for representing the file metadata, and the other for representing the file content.
 
 As an example, consider a folder with two files:
 
@@ -24,7 +24,7 @@ bat.jpg
 cat.jpg
 ```
 
-These files are added to a Hyperdrive archive by splitting them into chunks and constructing Hypercore registers representing the chunks and filesystem metadata.
+These files are added to a Hyperdrive archive by splitting them into chunks and constructing Hypercore feeds representing the chunks and filesystem metadata.
 
 Let's assume `bat.jpg` and `cat.jpg` both produce three chunks, each around 64KB. Here we will show a pseudo-representation for the purposes of illustrating the replication process. The six chunks get sorted into a list like this:
 
@@ -37,7 +37,7 @@ cat-2
 cat-3
 ```
 
-Via Hypercore, these chunks are hashed, and the hashes are arranged into a Merkle tree (the content register):
+Via Hypercore, these chunks are hashed, and the hashes are arranged into a Merkle tree (the content feed):
 
 ```
 0 - hash(bat-1)
@@ -52,16 +52,16 @@ Via Hypercore, these chunks are hashed, and the hashes are arranged into a Merkl
 10 - hash(cat-3)
 ```
 
-This tree is for the hashes of the contents of the photos. There is also a second Hypercore register that Hyperdrive generates that represents the list of files and their metadata and looks something like this (the metadata register):
+This tree is for the hashes of the contents of the photos. There is also a second Hypercore feed that Hyperdrive generates that represents the list of files and their metadata and looks something like this (the metadata feed):
 
 ```
-0 - hash({contentRegister: '9e29d624...'})
+0 - hash({contentFeed: '9e29d624...'})
   1 - hash(0 + 2)
 2 - hash({"bat.jpg", offset: 0, length: 3})
 4 - hash({"cat.jpg", offset: 3, length: 3})
 ```
 
-The first entry in this feed is a special metadata entry that tells Dat the address of the second feed, the content register. The remaining two entries declare the existence of the two files, and indicate where to find their data in the content register.
+The first entry in this feed is a special metadata entry that tells Dat the address of the second feed, the content feed. The remaining two entries declare the existence of the two files, and indicate where to find their data in the content feed.
 
 
 # File archives
@@ -73,13 +73,13 @@ TODO
 ## File metadata
 [file-metadata]: #file-metadata
 
-Dat tries as much as possible to act as a one-to-one mirror of the state of a folder and all its contents. When importing files, Dat uses a sorted, depth-first recursion to list all the files in the tree. For each file it finds, it grabs the filesystem metadata (filename, Stat object, etc) and appends the data to the metadata register.
+Dat tries as much as possible to act as a one-to-one mirror of the state of a folder and all its contents. When importing files, Dat uses a sorted, depth-first recursion to list all the files in the tree. For each file it finds, it grabs the filesystem metadata (filename, Stat object, etc) and appends the data to the metadata feed.
 
 
 ## Append-tree data structure
 [append-tree-data-structure]: #append-tree-data-structure
 
-"Append-tree" is a tree data structure modeled on top of an append-only log (the Hypercore register). The data structure stores a small index for every entry in the log so that no external indexing is required to model the tree. It also provides fast lookups on sparsely replicated logs.
+"Append-tree" is a tree data structure modeled on top of an append-only log (the Hypercore feed). The data structure stores a small index for every entry in the log so that no external indexing is required to model the tree. It also provides fast lookups on sparsely replicated logs.
 
 TODO- describe
 
@@ -87,13 +87,13 @@ TODO- describe
 # Hypercore entry schemas
 [hypercore-entry-schemas]: #hypercore-entry-schemas
 
-Hyperdrive encodes its data to its Hypercore registers using Google's [protobuf](https://developers.google.com/protocol-buffers/) encoding.
+Hyperdrive encodes its data to its Hypercore feeds using Google's [protobuf](https://developers.google.com/protocol-buffers/) encoding.
 
 
 ## Index
 [hypercore-entry-schema-index]: #hypercore-entry-schema-index
 
-The schema of the first entry written in every metadata Hypercore register. The `type` field will contain the string `"hyperdrive"`. The `content` field will contain the public key of the content Hypercore register.
+The schema of the first entry written in every metadata Hypercore feed. The `type` field will contain the string `"hyperdrive"`. The `content` field will contain the public key of the content Hypercore feed.
 
 ```
 message Index {
@@ -106,7 +106,7 @@ message Index {
 ## Node
 [hypercore-entry-schema-node]: #hypercore-entry-schema-node
 
-This entry schema encodes a file in the metadata Hypercore register. The `name` field is a string providing the file-path. The `value` field is a Stat object, using the `Stat` encoding (below). The `paths` field is an index used to optimize lookups (TODO how?).
+This entry schema encodes a file in the metadata Hypercore feed. The `name` field is a string providing the file-path. The `value` field is a Stat object, using the `Stat` encoding (below). The `paths` field is an index used to optimize lookups (TODO how?).
 
 ```
 message Node {
@@ -120,7 +120,7 @@ message Node {
 ## Stat
 [hypercore-entry-schema-stat]: #hypercore-entry-schema-stat
 
-This schema encodes the "Stat" of a file in the metadata Hypercore register. TODO- describe fields.
+This schema encodes the "Stat" of a file in the metadata Hypercore feed. TODO- describe fields.
 
 ```
 message Stat {