Merge pull request #64 from GraphStreamingProject/v1.0.0
V1.0.0
etwest authored Sep 16, 2022
2 parents 2295eed + bdee4d7 commit 1a2c5ba
Showing 1 changed file with 12 additions and 11 deletions.
README.md: 23 changes (12 additions, 11 deletions)

````diff
@@ -1,7 +1,7 @@
-# GutterTree
-A fast external memory buffering system for graph streaming.
+# Guttering Systems
+Data-structures for efficient external-memory and in-memory guttering systems.
 
-The following describes some of the basics of how this works. There are three key objects `GutterTree`, `BufferControlBlock`, and `WorkQueue`.
+The following describes some of the basics of how the GutterTree works.
 
 ## GutterTree
 This class specifies most of the meta-data and functions for handling a gutter tree. Specifically, the gutter tree maintains a list of nodes that compose the tree in addition to some other buffers used to enable efficient IO when either flushing or reading data from a node.
````

````diff
@@ -27,7 +27,7 @@ When flushing we utilize `flush_buffers` to achieve efficient file writing. The
 A flush of a leaf node is simply accomplished by moving the data in question into the `WorkQueue`. If the WorkQueue is full then this insertion is blocked until the queue is no longer full.
 
 
-## BufferControlBlock
+### BufferControlBlock
 Encodes the meta-data associated with a block including its `file_offset` and `storage_ptr`. These two attributes represent the location of a node within the large and physically contiguous `backing_store` file. The nodes of the tree are stored in the file following a breadth first search. Specifically, the data stored within the buffer at each node is what is held within the file. Therefore the data in each level is contiguous on disk.
 
 Example:
````

````diff
@@ -57,12 +57,13 @@ The WorkQueue is designed with a limited size. This is because RAM usage needs t
 
 The structure of the WorkQueue is as follows
 ```
------------------------------------------------------------------------------------
-| node 4 updates | clean | clean | node 1 updates | node 2 updates| node 3 updates|
------------------------------------------------------------------------------------
-^ Head ^ Tail
-```
+---------- -> ---------- -> ----------
+producer_queue: | empty | | empty | | empty |
+---------- <- ---------- <- ----------
-`Head` marks the where the next insertion to the CircularQueue should take place and `Tail` marks where the next read should happen from. Slots are marked `clean` if their data has been taken out of the queue and processed. The queue is full if the head ever points to a `dirty` slot and empty if `Head` and `Tail` point to the same slot (which is also clean).
+----------- -> ----------- -> ----------- -> -----------
+consumer_queue: | updates | | updates | | updates | | updates |
+----------- <- ----------- <- ----------- <- -----------
+```
 
-Note in this example that `Tail` does not point to the updates for node 1. This is because the data for node 1 is currently being processed by a query. Once this query finishes, the thread performing the query will mark the slot with `node 1 updates` as clean.
+The `producer queue` contains empty gutters ready to be filled and placed into the `consumer_queue`. `get_data()` calls return the head of the `consumer_queue`. Callbacks are necessary to place gutters back into the producer queue.
 
````