docs(readme): add usage+development guidelines
cmdoret authored Dec 15, 2023
1 parent 49c3bf1 commit 1ae6c94
Showing 1 changed file with 41 additions and 6 deletions.
47 changes: 41 additions & 6 deletions README.md
@@ -2,22 +2,22 @@

Initial system for creating and serving multi-omics digital objects.

-## Motivation
+## Context
+
+### Motivation

Provide a digital object and system to process, store and serve multi-omics data with their metadata such that:
* Traceability and reproducibility are ensured by rich metadata
* The different omics layers are processed and distributed together
* Common operations such as liftover can be automated easily while keeping the omics layers in sync

-## Architecture
+### Architecture

The digital object is composed of multiple files:
* CRAM files for alignment data, Zarr
-* HDF5 files for array data
+* Zarr for array data
* RDF for metadata (either separate, or embedded in the array file).
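
To illustrate the composition listed above, here is a minimal sketch that groups the files of a digital object by the layer they hold. The suffix mapping and the `group_by_layer` helper are hypothetical and are not part of the `modo` package; the real object layout may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to layer; these suffixes are
# assumptions for illustration, not the project's actual rules.
LAYER_BY_SUFFIX = {
    ".cram": "alignment",
    ".zarr": "array",
    ".ttl": "metadata",
    ".rdf": "metadata",
}


def group_by_layer(paths):
    """Group file paths by the kind of data they hold, based on their suffix."""
    layers = defaultdict(list)
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        # Anything with an unrecognized suffix is collected under "other".
        layers[LAYER_BY_SUFFIX.get(suffix, "other")].append(path)
    return dict(layers)


files = ["sample1.cram", "data.zarr", "metadata.ttl", "README.md"]
grouped = group_by_layer(files)
print(grouped)
```

Grouping by suffix is only a convenient approximation; when the RDF metadata is embedded in the array file, the metadata layer has no file of its own.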

A webserver is required to list available objects and serve them over the network.

The basic structure is as follows:

```mermaid
@@ -44,6 +44,41 @@ end;
OBJ -->|display metadata| INS
```

## Installation

The development version of the library can be installed from GitHub using pip:

```sh
pip install git+https://github.com/sdsc-ordes/smoc-poc.git@main#egg=modo
```

## Usage

The user-facing API is in `modo.api`. It allows interacting with existing digital objects:

```py
from modo.api import MODO

ex = MODO('./example-digital-object')
ex.list_files()
ex.list_samples()
```

Creating digital objects via the API is not yet supported.

## Development

The development environment can be set up as follows:

```sh
git clone https://github.com/sdsc-ordes/smoc-poc && cd smoc-poc
make install
```

This installs the dependencies into a Python virtual environment using [poetry](https://python-poetry.org/) and sets up pre-commit hooks with [pre-commit](https://pre-commit.com/).

The tests can be run with `make test`, which executes pytest with the doctest module.
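
As a rough illustration of what a doctest looks like, the sketch below documents a function with examples in its docstring and runs them with the standard `doctest` module. The `gc_content` function is hypothetical and not part of `modo`. Whether the Makefile invokes pytest with `--doctest-modules` (the pytest flag that collects such examples) is an assumption.

```python
import doctest


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence.

    >>> gc_content("GGCC")
    1.0
    >>> gc_content("ATGC")
    0.5
    >>> gc_content("")
    0.0
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Collect and run the docstring examples of this one function, without pytest.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(gc_content, "gc_content", module=False,
                        globs={"gc_content": gc_content}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

Keeping examples in docstrings means the documentation is checked on every test run, so it cannot silently drift from the code.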

## Implementation details

* To allow horizontal traversal of digital objects in the database (e.g. for listing), the metadata would need to be exported to a central database/knowledge-graph on the server side.
@@ -54,7 +89,7 @@ end;
+ Relative paths in the digital object could work, but need to be OS-independent
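
One possible way to keep relative paths OS-independent is to store them with forward slashes and only convert to the local convention when resolving. The sketch below uses the standard `pathlib` module; `to_portable` and `resolve` are hypothetical helpers, not part of `modo`.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_portable(relative_path):
    """Normalize a relative path (POSIX or Windows style) to forward slashes."""
    # PureWindowsPath accepts both "/" and "\\" as separators.
    p = PureWindowsPath(relative_path)
    if p.drive or p.root:
        raise ValueError(f"expected a relative path, got {relative_path!r}")
    return p.as_posix()


def resolve(object_root, portable_path, flavour=PurePosixPath):
    """Join a stored portable path onto the object's root directory."""
    return flavour(object_root).joinpath(*PurePosixPath(portable_path).parts)


stored = to_portable("data\\sample1.cram")
print(stored)
print(resolve("/srv/objects/ex", stored))
print(resolve("C:\\objects\\ex", stored, PureWindowsPath))
```

A caveat of this approach: a POSIX file name that contains a literal backslash would be split into two components, so such names would have to be disallowed.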


-# Status and limitations
+## Status and limitations

* The focus is on data retrieval; object creation is not yet implemented
* The htsget protocol supports streaming CRAM files, but it is currently only implemented for BAM in major genome browsers (igv.js, jbrowse)
