Merge pull request #64 from GraphStreamingProject/v1.0.0

V1.0.0
GraphStreamingProject · Sep 16, 2022 · 1a2c5ba
2 parents 2295eed + bdee4d7
commit 1a2c5ba
Showing 1 changed file with 12 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
-# GutterTree
-A fast external memory buffering system for graph streaming.
+# Guttering Systems
+Data structures for efficient external-memory and in-memory guttering systems.
 
-The following describes some of the basics of how this works. There are three key objects `GutterTree`, `BufferControlBlock`, and `WorkQueue`.
+The following describes some of the basics of how the GutterTree works.
 
 ## GutterTree
 This class specifies most of the meta-data and functions for handling a gutter tree. Specifically, the gutter tree maintains a list of nodes that compose the tree in addition to some other buffers used to enable efficient IO when either flushing or reading data from a node.
@@ -27,7 +27,7 @@ When flushing we utilize `flush_buffers` to achieve efficient file writing. The
 A flush of a leaf node is simply accomplished by moving the data in question into the `WorkQueue`. If the WorkQueue is full then this insertion is blocked until the queue is no longer full.
 
 
-## BufferControlBlock
+### BufferControlBlock
 Encodes the meta-data associated with a block including its `file_offset` and `storage_ptr`. These two attributes represent the location of a node within the large and physically contiguous `backing_store` file. The nodes of the tree are stored in the file following a breadth first search. Specifically, the data stored within the buffer at each node is what is held within the file. Therefore the data in each level is contiguous on disk.
 
 Example:  
@@ -57,12 +57,13 @@ The WorkQueue is designed with a limited size. This is because RAM usage needs t
 
 The structure of the WorkQueue is as follows
 ```
------------------------------------------------------------------------------------
-| node 4 updates | clean | clean | node 1 updates | node 2 updates| node 3 updates|
------------------------------------------------------------------------------------
-                   ^ Head                             ^ Tail
-```
+                ---------- -> ---------- -> ----------
+producer_queue: | empty  |    | empty  |    | empty  |  
+                ---------- <- ---------- <- ----------
 
-`Head` marks the where the next insertion to the CircularQueue should take place and `Tail` marks where the next read should happen from. Slots are marked `clean` if their data has been taken out of the queue and processed. The queue is full if the head ever points to a `dirty` slot and empty if `Head` and `Tail` point to the same slot (which is also clean).
+                ----------- -> ----------- -> ----------- -> -----------
+consumer_queue: | updates |    | updates |    | updates |    | updates |    
+                ----------- <- ----------- <- ----------- <- -----------
+```
 
-Note in this example that `Tail` does not point to the updates for node 1. This is because the data for node 1 is currently being processed by a query. Once this query finishes, the thread performing the query will mark the slot with `node 1 updates` as clean.
+The `producer_queue` contains empty gutters ready to be filled and placed into the `consumer_queue`. Calls to `get_data()` return the head of the `consumer_queue`. Callbacks are necessary to place gutters back into the `producer_queue`.
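
The two-queue design described in the new README text can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: only `producer_queue`, `consumer_queue`, and `get_data()` are named in the README, while `TwoQueueSketch`, `Gutter`, `push`, `get_data_callback`, `empty_gutters`, and `full_gutters` are hypothetical names chosen for this example. The sketch also assumes a single mutex with two condition variables, which the README does not specify.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical gutter: a batch of updates destined for one node.
struct Gutter {
  uint64_t node_id = 0;
  std::vector<uint64_t> updates;
};

// Illustrative two-queue work queue (assumed design, not the real class).
class TwoQueueSketch {
 public:
  explicit TwoQueueSketch(std::size_t num_gutters) : storage(num_gutters) {
    assert(num_gutters > 0);
    // Every gutter starts out empty, so it belongs to producer_queue.
    for (Gutter& g : storage) producer_queue.push_back(&g);
  }

  // Take an empty gutter, fill it, and move it to consumer_queue.
  // Blocks while no empty gutter is available, which bounds RAM usage.
  void push(uint64_t node_id, std::vector<uint64_t> updates) {
    std::unique_lock<std::mutex> lk(mtx);
    has_empty.wait(lk, [this] { return !producer_queue.empty(); });
    Gutter* g = producer_queue.front();
    producer_queue.pop_front();
    g->node_id = node_id;
    g->updates = std::move(updates);
    consumer_queue.push_back(g);
    has_data.notify_one();
  }

  // Return the head of consumer_queue, blocking until one exists.
  Gutter* get_data() {
    std::unique_lock<std::mutex> lk(mtx);
    has_data.wait(lk, [this] { return !consumer_queue.empty(); });
    Gutter* g = consumer_queue.front();
    consumer_queue.pop_front();
    return g;
  }

  // Callback: the consumer is finished, so recycle the gutter.
  void get_data_callback(Gutter* g) {
    std::lock_guard<std::mutex> lk(mtx);
    g->updates.clear();
    producer_queue.push_back(g);
    has_empty.notify_one();
  }

  std::size_t empty_gutters() {
    std::lock_guard<std::mutex> lk(mtx);
    return producer_queue.size();
  }

  std::size_t full_gutters() {
    std::lock_guard<std::mutex> lk(mtx);
    return consumer_queue.size();
  }

 private:
  std::vector<Gutter> storage;          // fixed size, so pointers stay valid
  std::deque<Gutter*> producer_queue;   // empty gutters
  std::deque<Gutter*> consumer_queue;   // gutters holding updates
  std::mutex mtx;
  std::condition_variable has_empty;
  std::condition_variable has_data;
};
```

In this sketch, a gutter that a consumer has taken with `get_data()` sits in neither queue until `get_data_callback` returns it, which is why the callback is needed to keep the pool of empty gutters from draining.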