Docs/Object Storage #1106

Merged on Mar 29, 2021 (18 commits)
176 changes: 176 additions & 0 deletions docs/implementation_design/object_storage.md
@@ -1 +1,177 @@
# Object storage
In GoShimmer, `ObjectStorage` is the base data structure for many data collections, such as `branchStorage`, `conflictStorage`, `messageStorage` and others.
It has the following characteristics:
- it is a manual cache that keeps objects in memory as long as consumers are using them
- it uses a key-value storage type
- it provides mutex options for guarding shared variables and preventing multiple goroutines from changing the object state at the same time
- it takes care of the dynamic creation of different object types, depending on the key and the serialized data it receives, through the utility `objectstorage.Factory`
- it helps with creating multiple `ObjectStorage` instances from the same package and with their automatic configuration.

In order to create an object storage we need to provide the underlying `kvstore.KVStore` structure backed by the database.



## Database
GoShimmer stores data in the form of an object storage system. The data is stored in one large repository with a flat structure. It is a scalable solution that allows for fast data retrieval because of its categorization structure.

Additionally, GoShimmer can keep data only in memory, which is controlled by the `CfgDatabaseInMemory` parameter. The in-memory storage is based purely on a Go map, provided by the `mapdb` package from hive.go.
For persistent storage in a database, it uses the `badger` package from hive.go. It is a simple and fast key-value database that performs well for concurrent reads and writes.
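
The reason both backends can be swapped freely is that they sit behind the same key-value abstraction. A minimal sketch of that idea follows. The `KVStore` interface, the `mapStore` type and `ErrKeyNotFound` are hypothetical stand-ins written for this example and are not the hive.go `kvstore` or `mapdb` API.

```Go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyNotFound is returned for missing keys (hypothetical, for this sketch only).
var ErrKeyNotFound = errors.New("key not found")

// KVStore is a simplified stand-in for a key-value store abstraction.
type KVStore interface {
	Set(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

// mapStore keeps everything in a Go map, similar in spirit to an in-memory backend.
type mapStore struct {
	entries map[string][]byte
}

func newMapStore() *mapStore {
	return &mapStore{entries: make(map[string][]byte)}
}

func (m *mapStore) Set(key, value []byte) error {
	// Copy the value so later changes by the caller do not leak into the store.
	m.entries[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *mapStore) Get(key []byte) ([]byte, error) {
	value, ok := m.entries[string(key)]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return value, nil
}

func main() {
	// Code that only depends on KVStore does not care which backend is used.
	var store KVStore = newMapStore()
	_ = store.Set([]byte("a"), []byte("1"))
	value, err := store.Get([]byte("a"))
	fmt.Println(string(value), err)
}
```

A persistent backend would implement the same two methods on top of a database, so the calling code would stay unchanged.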

Both solutions are implemented in the `database` package, along with the prefix definitions that can be used when creating new object storage elements.

The database plugin is responsible for creating a `store` instance of the chosen database under the directory specified with the `CfgDatabaseDir` parameter. It also manages the proper closing of the database upon receiving a shutdown signal. During startup, the database is marked as unhealthy, and it is marked as healthy again on shutdown. Then the garbage collector is run and the database can be closed.

## ObjectStorage


Assume we need to store data for some newly created object `A`. First, we define a new prefix for our package in the `database` package, as well as prefixes for the individual storage objects. They are used later during `ObjectStorage` creation: the package prefix is combined with the store-specific prefix to create a specific realm. A package can group its storage objects in a struct:
```Go
package example

type Storage struct {
	A *objectstorage.ObjectStorage
	// ...
	shutdownOnce sync.Once
}
```
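
To make the idea of a realm more concrete, here is a minimal sketch of how a package prefix and a store-specific prefix could be combined and prepended to object keys. The constants `prefixExample` and `prefixA`, the helper functions and the exact key layout are assumptions made for this illustration; the real prefixes live in the `database` package.

```Go
package main

import (
	"bytes"
	"fmt"
)

const (
	prefixExample byte = 42 // hypothetical package prefix
	prefixA       byte = 1  // hypothetical store-specific prefix for object A
)

// realm concatenates the package prefix and the store-specific prefix.
func realm(packagePrefix, storePrefix byte) []byte {
	return []byte{packagePrefix, storePrefix}
}

// prefixedKey prepends the realm to an object key, so that keys of
// different storages never collide inside the shared key-value store.
func prefixedKey(realmPrefix, key []byte) []byte {
	out := make([]byte, 0, len(realmPrefix)+len(key))
	out = append(out, realmPrefix...)
	return append(out, key...)
}

func main() {
	r := realm(prefixExample, prefixA)
	key := prefixedKey(r, []byte("id"))
	fmt.Println(bytes.HasPrefix(key, r), len(key))
}
```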
### ObjectStorage factory
The most convenient way to create multiple storage object instances for one package is to use the factory function.
```Go
osFactory := objectstorage.NewFactory(store, database.Prefix)
```
It needs two parameters:
- `store` - the key-value `kvstore` instance
- `database.Prefix` - the prefix defined in the `database` package for our new `example` package. It is used for the automatic configuration of the provided `KVStore` object.


After defining the storage factory for the group of objects, we can use it to create an `*objectstorage.ObjectStorage` instance:
```Go
AStorage = osFactory.New(objPrefix, FromObjectStorage)
AStorage = osFactory.New(objPrefix, FromObjectStorage, optionalOptions...)
```
The function takes the following parameters:
- `objPrefix` - the object-specific prefix mentioned before.
- `FromObjectStorage` - a function that allows the dynamic creation of different object types depending on the stored data.
- `optionalOptions` - an optional parameter provided as an array of options, `[]objectstorage.Option`. All possible options are defined in `objectstorage.Options`. If we do not specify them during creation, the default values are used, such as enabled persistence or a cache time of 0.
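
The way optional parameters fall back to defaults can be sketched with the common Go functional-options pattern. The types and functions below (`Options`, `Option`, `CacheTime`, `PersistenceEnabled`, `newOptions`) are local stand-ins defined for this example and are not imported from hive.go; only the two defaults mentioned above (persistence enabled, cache time 0) are taken from the text.

```Go
package main

import (
	"fmt"
	"time"
)

// Options holds the settings of a storage (hypothetical stand-in).
type Options struct {
	cacheTime          time.Duration
	persistenceEnabled bool
}

// Option modifies Options; this is the usual functional-options pattern.
type Option func(*Options)

// CacheTime sets how long an object stays cached.
func CacheTime(d time.Duration) Option {
	return func(o *Options) { o.cacheTime = d }
}

// PersistenceEnabled switches writing to the backing store on or off.
func PersistenceEnabled(enabled bool) Option {
	return func(o *Options) { o.persistenceEnabled = enabled }
}

// newOptions starts from the defaults and applies the given options in order.
func newOptions(opts ...Option) *Options {
	options := &Options{cacheTime: 0, persistenceEnabled: true}
	for _, opt := range opts {
		opt(options)
	}
	return options
}

func main() {
	fmt.Printf("%+v\n", *newOptions())
	fmt.Printf("%+v\n", *newOptions(CacheTime(5*time.Second), PersistenceEnabled(false)))
}
```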

### StorableObject
`StorableObject` is the interface that objects have to implement in order to be stored in an `ObjectStorage`. We need to make sure that all methods required by the interface are implemented before using the object storage factory.

- `SetModified` - marks the object as modified, so that it will be written to disk (if persistence is enabled)
- `IsModified` - returns true if the object is marked as modified
- `Delete` - marks the object to be deleted from the persistence layer
- `IsDeleted` - returns true if the object was marked as deleted
- `Persist` - enables or disables persistence for this object
- `ShouldPersist` - returns true if this object is going to be persisted
- `Update` - updates the object with the values of another object - requires an explicit implementation
- `ObjectStorageKey` - returns the key that is used to store the object in the database - requires an explicit implementation
- `ObjectStorageValue` - marshals the object data into a sequence of bytes that are used as the value part in the object storage - requires an explicit implementation

Most of these have a default implementation in the `objectstorage` library, except for `Update`, `ObjectStorageKey` and `ObjectStorageValue`, which need to be provided.
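
The split between default and explicit implementations can be illustrated with struct embedding. The sketch below uses a reduced, hypothetical `StorableObject` interface and a local `flags` type that provides the defaults; it only mirrors the idea and does not reproduce the real interface or the library's types.

```Go
package main

import "fmt"

// StorableObject is a reduced, hypothetical version of the interface described above.
type StorableObject interface {
	SetModified(modified bool)
	IsModified() bool
	ObjectStorageKey() []byte
	ObjectStorageValue() []byte
}

// flags provides default implementations that concrete types get by embedding it.
type flags struct {
	modified bool
}

func (f *flags) SetModified(modified bool) { f.modified = modified }
func (f *flags) IsModified() bool          { return f.modified }

// A is the example object; only the key and the value need explicit code.
type A struct {
	flags
	id   byte
	data string
}

func (a *A) ObjectStorageKey() []byte   { return []byte{a.id} }
func (a *A) ObjectStorageValue() []byte { return []byte(a.data) }

// Compile-time check that *A satisfies the interface.
var _ StorableObject = &A{}

func main() {
	a := &A{id: 7, data: "hello"}
	a.SetModified(true)
	fmt.Println(a.IsModified(), a.ObjectStorageKey(), string(a.ObjectStorageValue()))
}
```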

### StorableObjectFactory function
The function `ObjectFromObjectStorage` restores objects from the `ObjectStorage`. By convention, its implementation follows the schema: `ObjectFromObjectStorage` uses `ObjectFromBytes`, which in turn uses `ObjectFromMarshalUtil`.
```Go
func ObjectFromObjectStorage(key []byte, data []byte) (result objectstorage.StorableObject, err error) {
	// ObjectFromBytes also returns the number of consumed bytes, which is not needed here.
	result, _, err = ObjectFromBytes(data)
	// ...
	return
}
```

`ObjectFromBytes` unmarshals the object's sequence of bytes with the help of the `marshalutil` library. The returned `consumedBytes` can be used for testing purposes.
The created `marshalUtil` instance stores the stream of bytes and keeps track of what has already been read (`readOffset`).
```Go
func ObjectFromBytes(bytes []byte) (object *ObjectType, consumedBytes int, err error) {
	marshalUtil := marshalutil.New(bytes)
	if object, err = ObjectFromMarshalUtil(marshalUtil); err != nil {
		// ...
		return
	}
	// The read offset tells how many bytes were consumed while parsing.
	consumedBytes = marshalUtil.ReadOffset()
	return
}
```
The key logic is implemented in `ObjectFromMarshalUtil`, which takes the marshaled object and transforms it into an object of the specified type.
Because the data is stored as a sequence of bytes, it carries no information about the form of the object or the data types it had before it was written to the database.
Thus, we need to serialize any data into a stream of bytes in order to write it (marshaling), and deserialize the stream of bytes back into the correct data structures when reading it (unmarshaling).
As an example, let's consider the unmarshaling of the `Approver` object.
```Go
type Approver struct {
	approverType        ApproverType // 1 byte (8 bits)
	referencedMessageID MessageID    // 32 bytes
	approverMessageID   MessageID    // 32 bytes
}
```

The order in which we read the bytes has to match the order in which they were written during marshaling. In the example below, the order `referencedMessageID`, `approverType`, `approverMessageID` is the same in both marshaling and unmarshaling.

```Go
// Unmarshalling
func ApproverFromMarshalUtil(marshalUtil *marshalutil.MarshalUtil) (result *Approver) {
	result = &Approver{}
	result.referencedMessageID = MessageIDFromMarshalUtil(marshalUtil)
	result.approverType = ApproverTypeFromMarshalUtil(marshalUtil)
	result.approverMessageID = MessageIDFromMarshalUtil(marshalUtil)
	return
}

// Marshalling
func (a *Approver) ObjectStorageApprover() []byte {
	return marshalutil.New().
		Write(a.referencedMessageID).
		Write(a.approverType).
		Write(a.approverMessageID).
		Bytes()
}
```

We continue to decompose our object into smaller pieces with the help of the `MarshalUtil` struct, which keeps track of the bytes and a read offset.
Then we use the `marshalutil` built-in methods on the appropriate part of the byte stream, whose length is defined by the data type of the struct field. This way, we are able to parse the bytes into the correct Go data structure.
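
The role of the read offset can be shown with a small, self-contained round trip. The `reader` and `record` types below are hypothetical and written only for this sketch; they imitate the idea of a byte stream with a read offset and are not the `marshalutil` API.

```Go
package main

import (
	"errors"
	"fmt"
)

// reader is a tiny stand-in for a marshal helper: it holds bytes and a read offset.
type reader struct {
	data   []byte
	offset int
}

// readBytes returns the next n bytes and advances the offset.
func (r *reader) readBytes(n int) ([]byte, error) {
	if r.offset+n > len(r.data) {
		return nil, errors.New("not enough bytes")
	}
	out := r.data[r.offset : r.offset+n]
	r.offset += n
	return out, nil
}

// record is a hypothetical object with a 4-byte ID followed by a 1-byte kind.
type record struct {
	id   [4]byte
	kind byte
}

// marshal writes the fields in a fixed order: id, then kind.
func (rec record) marshal() []byte {
	return append(append([]byte{}, rec.id[:]...), rec.kind)
}

// unmarshal reads the fields in exactly the order marshal wrote them
// and reports how many bytes were consumed.
func unmarshal(data []byte) (record, int, error) {
	var rec record
	r := &reader{data: data}
	idBytes, err := r.readBytes(len(rec.id))
	if err != nil {
		return rec, r.offset, err
	}
	copy(rec.id[:], idBytes)
	kindBytes, err := r.readBytes(1)
	if err != nil {
		return rec, r.offset, err
	}
	rec.kind = kindBytes[0]
	return rec, r.offset, nil
}

func main() {
	original := record{id: [4]byte{1, 2, 3, 4}, kind: 9}
	restored, consumed, err := unmarshal(original.marshal())
	fmt.Println(restored == original, consumed, err)
}
```

Swapping the two reads in `unmarshal` would still consume five bytes but would fill the fields with the wrong data, which is why the read order has to mirror the write order.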

### ObjectStorage methods
After defining the marshalling and unmarshalling mechanism for the `objectStorage` bytes conversion,
we can start using it for its actual purpose: storing and reading particular elements of the project.

- `Load` retrieves the object that corresponds to the provided ID. In the example below, calling it on the message `objectStorage` returns the cached object.
- `Unwrap` converts an object retrieved in the form of a cached object to its own corresponding type. In the code below it returns the message wrapped by the cached object.
- `Exists` checks whether the object exists, i.e. has not been deleted. If it does not exist, it is released from memory with the `Release` method.
```Go
func (s *Storage) Message(messageID MessageID) *CachedMessage {
	return &CachedMessage{CachedObject: s.messageStorage.Load(messageID[:])}
}

cachedMessage := messagelayer.Tangle().Storage.Message(msgID)
if !cachedMessage.Exists() {
	// Nothing to unwrap: release the cached object and stop here.
	cachedMessage.Release()
	return
}
message := cachedMessage.Unwrap()
```
- `Consume` is useful when we want to apply a function to the cached object. `Consume` unwraps the `CachedObject` and passes a type-casted version to the consumer function. The object is released right after the callback has finished.

```Go
cachedMessage.Consume(func(message *tangle.Message) {
	doSomething(message)
})
```
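
The relation between consuming and releasing can be sketched with a simple consumer counter. The `cachedObject` type and its `retain`, `release` and `consume` methods are hypothetical and only model the behaviour described above; they are not the library's `CachedObject` implementation.

```Go
package main

import (
	"fmt"
	"sync/atomic"
)

// cachedObject is a hypothetical, simplified cache entry with a consumer counter.
type cachedObject struct {
	value     string
	consumers int32
}

// retain registers one more consumer of the entry.
func (c *cachedObject) retain() *cachedObject {
	atomic.AddInt32(&c.consumers, 1)
	return c
}

// release unregisters a consumer and reports whether the entry is now unused.
func (c *cachedObject) release() bool {
	return atomic.AddInt32(&c.consumers, -1) == 0
}

// consume passes the value to the callback and releases the entry afterwards,
// even if the callback panics.
func (c *cachedObject) consume(consumer func(value string)) {
	defer c.release()
	consumer(c.value)
}

func main() {
	entry := (&cachedObject{value: "message"}).retain()
	entry.consume(func(value string) {
		fmt.Println("consumed:", value)
	})
	fmt.Println("consumers left:", atomic.LoadInt32(&entry.consumers))
}
```

Once the counter drops to zero, a cache built this way is free to evict the entry, which matches the idea of keeping objects in memory only while consumers are using them.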
- `ForEach` applies a `Consumer` function to every object residing within the cache and the underlying persistence layer. For example, this is how we can count the number of messages.
```Go
messageCount := 0
messageStorage.ForEach(func(key []byte, cachedObject objectstorage.CachedObject) bool {
	cachedObject.Consume(func(object objectstorage.StorableObject) {
		messageCount++
	})
	// Return true to continue with the next object.
	return true
})
```
- `Store` stores an object in the `objectStorage`. An extended version is the `StoreIfAbsent` method, which stores an object only if it was not stored before and returns a boolean indicating whether the object was stored. `ComputeIfAbsent` works similarly but does not access the value log.
```Go
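// Store the object unconditionally.
cachedMessage := messageStorage.Store(newMessage)
// Store the object only if it is not stored yet.
cachedMessage, stored := messageStorage.StoreIfAbsent(newMessage)
// Create the object through remappingFunction if nothing is stored under the key.
cachedMessage = messageStorage.ComputeIfAbsent(newMessage.ObjectStorageKey(), remappingFunction)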
```
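
The difference between the three calls can be illustrated on a plain mutex-guarded map. The `store` type below is a hypothetical model written for this example: it ignores caching, persistence and the value log, and only shows which call overwrites, which call refuses, and which call computes a missing value.

```Go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical mutex-guarded map used only to illustrate the semantics.
type store struct {
	mu      sync.Mutex
	objects map[string]string
}

func newStore() *store {
	return &store{objects: make(map[string]string)}
}

// Store always writes the value, replacing any previous one.
func (s *store) Store(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = value
}

// StoreIfAbsent writes the value only if the key is unused and reports whether it did.
func (s *store) StoreIfAbsent(key, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.objects[key]; exists {
		return false
	}
	s.objects[key] = value
	return true
}

// ComputeIfAbsent returns the existing value or stores the one produced by compute.
func (s *store) ComputeIfAbsent(key string, compute func(key string) string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if value, exists := s.objects[key]; exists {
		return value
	}
	value := compute(key)
	s.objects[key] = value
	return value
}

func main() {
	s := newStore()
	fmt.Println(s.StoreIfAbsent("a", "first"))
	fmt.Println(s.StoreIfAbsent("a", "second"))
	fmt.Println(s.ComputeIfAbsent("b", func(key string) string { return "computed-" + key }))
}
```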