vignettes/SCopeLoomR_tutorial.Rmd

---
title: "SCopeLoomR Tutorial - Creating and reading .loom files"
package: r pkg_ver('SCopeLoomR')
output:
  html_notebook:
    toc: yes
  html_document:
    keep_md: true
    df_print: paged
    toc: yes
    toc_float: yes
  BiocStyle::html_document:
    number_sections: no
  pdf_document:
    toc: yes
vignette: |
  %\VignetteIndexEntry{SCopeLoomR} 
  %\VignetteEncoding{UTF-8} 
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(SCopeLoomR)
library(SingleCellExperiment)
```


*HTML built on `r format(Sys.time(), "%b %d, %Y")` with SCopeLoomR **version `r packageVersion("SCopeLoomR")`***.

# Introduction

The .loom file format is designed to store very large omics datasets. It has been created is maintained by the [Linnarsson lab](http://linnarssonlab.org/), and its naming conventions are described in [loompy](http://linnarssonlab.org/loompy/conventions/index.html)

`SCopeLoomR` is an R package to easily create and manipulate .loom files. These loom files are compatible with [SCope](http://scope.aertslab.org), a loom viewer which also allows to visualize the results from [SCENIC](http://scenic.aertslab.org) (e.g.: regulon, regulon activities). 

# Installation
For example:
```{r install, eval=FALSE}
#install.packages("devtools")
devtools::install_github("aertslab/SCopeLoomR")
```

# Creating a loom object

## Load Data

Load the data to include in the loom file:

```{r loadData}
library(SingleCellExperiment)
library(SCopeLoomR)
data(sce) # e.g. a SingleCellExperiment 

# DGEM (Digital gene expression matrix)
dgem <- counts(sce)
  dim(dgem)
  class(dgem)
  head(colnames(dgem))  #should contain the Cell ID/name

# Known cell information/annotation  
cell.info <- colData(sce)
cell.info$nGene <- colSums(dgem>0)
  head(cell.info)

# Default embedding (e.g. t-SNE or PCA coordinates)
data(tSNE_expr)
default.tsne <- tSNE_expr
default.tne.name <- "t-SNE on full expression matrix"
head(default.tsne)
```

## Create loom file 

Minimum information required to create the .loom object:

```{r createLoom}
library(SCopeLoomR)

### Create the minimal loom file
file.name <- "example.loom"
build_loom(
  file.name=file.name,
  dgem=dgem,
  title="Fake expression dataset for examples",
  genome="Mouse", # Just for user information, not used internally
  default.embedding=default.tsne,
  default.embedding.name=default.tne.name
)
# 
```

## Add optional information 

To add any following information to a loom, please run the following command before

```{r}
loom <- open_loom(file.path = file.name, mode = "r+")
```

### Hierarchy

You can organize/group your .loom files in SCope by specifiying differents grouping levels. The current .loom file will be put in `Mouse -> Toy Datasets` branch of the SCope loom tree. 

```{r addHierarchy}
add_hierarchy(
  loom = loom,
  hierarchy = create_hierarchy(
    level.1.name = "Mouse",
    level.2.name = "Toy Datasets",
    level.3.name = ""
  )
)
```

The same command can be used to update the hierarchy of a .loom file (set `overwrite=TRUE`):

```{r, eval=FALSE}
add_hierarchy(
  loom = loom,
  hierarchy = create_hierarchy(
    level.1.name = "[level-1-name]",
    level.2.name = "[level-2-name]",
    level.3.name = "[level-3-name]"
  ),
  overwrite = T
)
```


### Annotations/metrics

Annotations and/or metrics can be added to query in [SCope](http://scope.aertslab.org).

```{r addAnnot}
# Add annotation (categorical variable)
add_col_attr(
  loom=loom,
  key = "Cell type",
  value=cell.info$cellType,
  as.annotation=T
)
# Add metric (numerical variable)
add_col_attr(
  loom=loom,
  key = "Age",
  value=sample(0:20, nrow(cell.info), replace=T),
  as.metric = T
)
```

### SCENIC results 

```{r addSCENIC, eval=TRUE, warning=FALSE}
scenic.dir <- file.path(system.file('extdata', package='SCopeLoomR'), "SCENIC_fakeOutput/") # Modify for your analysis

# Regulon activity (AUC matrix)
library(AUCell)
regulonsAUC <- readRDS(file.path(scenic.dir, "int/3.4_regulonAUC.Rds"))
add_scenic_regulons_auc_matrix(loom=loom, regulons.AUC=getAUC(regulonsAUC))

# Regulons (gene list), regulon thresholds (optional) and regulon motifs (optional)
regulons <- readRDS(file.path(scenic.dir, "int/3.1_regulons_forAUCell.Rds"))
aucThresholds <- readRDS(file.path(scenic.dir, "int/3.5_AUCellThresholds.Rds"))
regulon.enrichment.table <- NULL # readRDS(file.path(scenic.dir, "int/2.3_motifEnrichment.Rds"))
add_scenic_regulons(loom=loom
                    , regulons=regulons
                    , regulon.threshold.assignments=aucThresholds# $threshold.assignment # Optional
                    , regulon.enrichment.table = regulon.enrichment.table) # Optional
                    
# Alternative t-SNE
tSNE <- readRDS(file.path(scenic.dir, "int/tSNE_AUC_50pcs_50perpl.Rds"))
add_embedding(loom=loom, embedding=tSNE$Y, name="SCENIC (t-SNE on AUC)")
```

### Seurat results 

*(Not run in this example. You would need to use your own file names)*

```{r addSeuratEmbedding, eval=FALSE}
seurat.dir <- "Seurat_output/"
seurat.tsne <- readRDS(file = paste0(seurat.dir, "seurat_tsne.rds.gz"))
seurat <- readRDS(paste0(seurat.dir, "seuratObject.rds.gz"))

# Add extra embeddings
add_embedding(loom=loom, embedding=seurat.tsne, name="Seurat 82PC, 30perp"))
```

#### Clusterings

```{r addSeuratClusters, eval=FALSE}
add_seurat_clustering(loom = loom
                      , seurat = seurat)
```

##### Adding clustering(s) along with annotation for a given resolution (default one if set)
```{r addSeuratClusters2, eval=FALSE}
seurat.annotation<-read.table(file = paste0(seuratDir, "Res2_Clusters.tsv", header=T, quote = '', sep = "\t", stringsAsFactors=F))
add_seurat_clustering(loom = loom
                      , seurat = seurat
                      , default.clustering.resolution = "res.2"
                      , annotation = seurat.annotation
                      , annotation.cluster.id.cn = "res.2" 
                      , annotation.cluster.description.cn = "Annotation")
```

##### Adding clustering(s) along with cluster markers 
```{r addSeuratClusters3, eval=FALSE}
seurat.resolutions <- get_seurat_clustering_resolutions(seurat = seurat)
seurat.markers.file.path.list<-do.call(what = "c", lapply(X=seurat.resolutions, FUN=function(resolution) {
  l<-list()
  l[paste0("res.",resolution)]<-paste0(seuratDir, "res.",resolution,"/Seurat_PC82_res",resolution,"_bimod_Cluster_Markers.rds.gz")
  return (l)
}))
add_seurat_clustering(loom = loom
                      , seurat = seurat
                      , seurat.markers.file.path.list = seurat.markers.file.path.list)
```

##### Adding clustering(s), along with cluster markers and metrics (e.g.: logFC, p-value, ...)

```{r addSeuratClusters4, eval=FALSE}
add_seurat_clustering(loom = loom
                      , seurat = seurat
                      , seurat.markers.file.path.list = seurat.markers.file.path.list
                      , seurat.marker.metric.accessors = c("avg_logFC", "p_val_adj")
                      , seurat.marker.metric.names = c("Avg. logFC", "Adj. p-value")
                      , seurat.marker.metric.description = c("Average log fold change from Wilcox differential test (cfr. Seurat)"
                                                             , "Adjusted p-value using Bonferroni correction based on the total number of genes in the dataset (cfr. Seurat)"))
```

### Trajectory inference methods results

*(Not run in this example. You would need to use your own file names)*

**Monocle**: Adding the trajectory data from a CellDataSet object generated by monocle.

```{r addMonocleTrajectory, eval=FALSE}
# Add monocle embedding and trajectory
S_matrix <- reducedDimS(cds = cds)
monocle.embedding<-data.frame(t(x = S_matrix[c(1, 2), ]))
add_embedding(loom = loom
              , embedding = monocle.embedding
              , name = "Monocle (DDRTree)"
              , trajectory = create_trajectory_from_monocle(cds = cds))
```

All trajectories methods (> 59) implemented in the **dyno** framework can be added to .loom files and subsequently be displayed in SCope. To see how to run these, please visit https://github.com/dynverse/dyno. 

```{r}
add_embedding_from_dyno(loom = loom
                        , dyno = model
                        , name = "PAGA (dyno)")
```


## Finalize (save)
```{r finalize}
finalize(loom=loom)
```

# Read data from a loom object

## Get the gene expression matrix

```{r openLoom}
loom_path <- file.path(system.file('extdata', package='SCopeLoomR'), "example.loom")
loom <- open_loom(loom_path, mode="r+")
dgem <- get_dgem(loom)
close_loom(loom)
dgem[1:5,1:5]
```

## Get the gene expression matrix of a given cluster

This .loom file can be downloaded at http://scope.aertslab.org in the left panel under `Drosophila` > `Brain`.

```{r dgemCluster, eval=FALSE}
loom <- open_loom(loom = "Aerts_Fly_AdultBrain_Filtered_57k.loom", mode="r")
cluster10_dgem <- get_cluster_dgem_by_name(loom = loom, cluster.name = "Astrocyte-like - Cluster 10")
close_loom(loom)
```

## Get annotations from a clustering

```{r}
# Get annotations from clustering (default is set to Annotation)
clustering_annotations_df <- get_clustering_annotations(loom = loom)
# Check available clusterings
list_clusterings_names()
# Get annotations from clustering with given name (here: [custom-clustering-name])
clustering_annotations_df <- get_clustering_annotations(
  loom = loom,
  clustering_name = "[custom-clustering-name]",
)
# Get annotations from clustering with given name (here: [custom-clustering-name]) annotated by person associated with the given ORCID.
clustering_annotations_df <- get_clustering_annotations(
  loom = loom,
  clustering_name = "[custom-clustering-name]",
  curator_orcid = "XXX-XXX-XXX-XXX"
)
# Get annotations from clustering with given name (here: [custom-clustering-name]) annotated by person associated with the given ORCID.
# Save the resulting output (data.frame) to CSV file (here: ./test.csv)
clustering_annotations_df <- get_clustering_annotations(
  loom = loom,
  clustering_name = "[custom-clustering-name]",
  curator_orcid = "XXX-XXX-XXX-XXX",
  out_file_path = "./tmp.csv"
)
```

## Get all newly added annotations by ORCID

```{r}
# Get all annotations across all clusterings that were added after 2019-01-01 and from the user with the given XXX-XXX-XXX-XXX ORCID
all_clustering_annotations_df <- get_all_clustering_annotations(
  loom = loom,
  curator_orcid = "XXX-XXX-XXX-XXX",
  from_date = "2019-01-01",
)
```


## Other general accessors

```{r loomSCENIC, fig.height=4, fig.width=4}
loomPath <- file.path(system.file('extdata', package='SCopeLoomR'), "example.loom")
loom <- open_loom(loomPath, mode="r")

# SCENIC results:
regulons_incidMat <- get_regulons(loom)
regulonsAUC <- get_regulonsAuc(loom)
regulonsAucThresholds <- get_regulonThresholds(loom)
embeddings <- get_embeddings(loom)
close_loom(loom)

# Use the loaded info... e.g.:
plot(embeddings[[1]], pch=16)
```

Don't forget...
```{r}
close_loom(loom)
# or finalize(loom) 
```