Cell atlas of the developing human brain

Data and code related to our manuscript Comprehensive cell atlas of the first-trimester developing human brain (Emelie Braun, Miri Danan-Gotthold et al. 2022, in review).

Fig1D

Preprint (bioRxiv)

https://www.biorxiv.org/content/10.1101/2022.10.24.513487v1

Code

We used the Shoji tensor database and the cytograph-shoji pipeline.

Code for making many of the figures is available as Jupyter notebooks

Data

Complete dataset

Metadata per sample: table_S1.xlsx

Metadata per cluster: table_S2.xlsx

Raw data: EGAS00001004107

Complete processed dataset: HumanFetalBrainPool.h5

Also available in h5ad format with CELLxGENE annotations: human_dev.h5ad

See further below for a description of the content of the .h5 files

Alternative expression matrices generated with the "standard" cellranger + velocyto pipeline using cellranger GRCh38-3.0.0 annotations are available in loom and anndata formats:

human_dev_GRCh38-3.0.0.loom

human_dev-GRCh38-3.0.0.h5ad (Annotations basically follow CELLxGENE standards.)

human_dev-GRCh38-3.0.0_all_layers.h5ad (The same but including 'ambiguous', 'spliced', and 'unspliced' layers.)

These files contain exactly the same cells as the HumanFetalBrainPool.h5 file. About 8,000 of these cells were filtered out by this procedure and therefore have a total UMI count of zero.

Working datasets

(coming soon)

Spatial EEL FISH datasets

HE_5week_7
Section 1 Z=970 µm
Section 2 Z=810 µm
Section 3 Z=640 µm

Three spatial EEL FISH datasets of a sagittally cut whole human embryo at 5 weeks post conception. The data are in Parquet format and can be opened with FISHscale, Python pandas or any other Parquet reader.
r_px_microscope_stitched and c_px_microscope_stitched contain the RNA molecule coordinates in pixels (pixel size of 0.18 µm).
r_transformed and c_transformed contain the RNA molecule coordinates in pixels (pixel size of 0.27 µm).
The Tissue and Brain columns indicate whether each detected molecule lies in the tissue or in the brain, respectively.
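
As a rough sketch of how these columns might be used with pandas, the helper below converts the stitched pixel coordinates to micrometres using the 0.18 µm pixel size stated above and keeps only molecules flagged as brain. The column names come from the description; `brain_molecules_um` is a hypothetical helper, the file name in the usage comment is a placeholder, and treating Brain as a boolean-like flag is an assumption about how the column is encoded.

```python
import pandas as pd

# Pixel size of the *_px_microscope_stitched columns, as stated in the text.
PX_STITCHED_UM = 0.18


def brain_molecules_um(df: pd.DataFrame) -> pd.DataFrame:
    """Return brain molecules with stitched coordinates converted to micrometres."""
    # Assumption: `Brain` is a boolean-like flag (True/1 = molecule lies in the brain).
    brain = df[df["Brain"].astype(bool)].copy()
    brain["r_um"] = brain["r_px_microscope_stitched"] * PX_STITCHED_UM
    brain["c_um"] = brain["c_px_microscope_stitched"] * PX_STITCHED_UM
    return brain


# Tiny synthetic table standing in for one section (all values are made up).
demo = pd.DataFrame({
    "r_px_microscope_stitched": [100.0, 200.0, 300.0],
    "c_px_microscope_stitched": [50.0, 150.0, 250.0],
    "Tissue": [True, True, False],
    "Brain": [True, False, False],
})
demo_brain = brain_molecules_um(demo)

# With a real section (placeholder file name; needs a Parquet engine such as pyarrow):
# section = pd.read_parquet("section_1.parquet")
# section_brain = brain_molecules_um(section)
```

If Brain turns out to be stored as text rather than as a flag, the `astype(bool)` step would need to be replaced by an explicit comparison.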

Description of tensors

The datasets are provided as HDF5 files containing the tensors listed below. In Python, they can be accessed using h5py (other languages have similar libraries).
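
A minimal sketch of reading one tensor with h5py. Because the internal group layout of the files is not described here, the helper searches the file for a dataset by name instead of assuming a path; `find_tensor` is a hypothetical helper, not part of Shoji or h5py. The demonstration writes a tiny stand-in file with made-up contents so that the snippet runs without the download.

```python
import os
import tempfile

import h5py
import numpy as np


def find_tensor(h5file: h5py.File, name: str) -> h5py.Dataset:
    """Return the first dataset whose final path component equals `name`."""
    def visitor(path, obj):
        if isinstance(obj, h5py.Dataset) and path.split("/")[-1] == name:
            return obj  # a non-None return value stops the traversal
        return None

    found = h5file.visititems(visitor)
    if found is None:
        raise KeyError(f"no dataset named {name!r} in {h5file.filename}")
    return found


# Stand-in file; the group name "demo" is arbitrary and not the real layout.
path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    data = np.arange(12, dtype="uint16").reshape(4, 3)
    f.create_dataset("demo/Expression", data=data)

with h5py.File(path, "r") as f:
    expression = find_tensor(f, "Expression")
    print(expression.shape, expression.dtype)
    first_rows = expression[:2, :]  # slicing reads only the requested block
```

For the real file, the same pattern would apply with the path to HumanFetalBrainPool.h5. Given the size of Expression (1,665,937 ✕ 59,480), reading slices rather than the whole dataset is likely the practical choice.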

The most important tensors are Expression (the expression matrix; sum of spliced and unspliced UMIs), Gene (gene names), Accession (Ensembl accessions), Clusters (cluster labels), Embedding (tSNE), Factors (PCA components), ManifoldIndices (KNN graph edges) and ManifoldWeights (KNN graph edge weights).
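
If ManifoldIndices holds one pair of cell indices per edge and ManifoldWeights the matching weight, as their shapes in the table below suggest, the KNN graph can be assembled into a SciPy sparse matrix. This is a sketch under that assumption, shown on toy arrays; `knn_graph` is a hypothetical helper, and whether the stored edges are directed or symmetric is not stated in this README.

```python
import numpy as np
from scipy import sparse


def knn_graph(indices, weights, n_cells: int) -> sparse.csr_matrix:
    """Build an n_cells x n_cells sparse adjacency matrix from an edge list."""
    indices = np.asarray(indices, dtype=np.int64)
    weights = np.asarray(weights)
    if (indices.ndim != 2 or indices.shape[1] != 2
            or indices.shape[0] != weights.shape[0]):
        raise ValueError("expected indices of shape (n_edges, 2) and one weight per edge")
    rows, cols = indices[:, 0], indices[:, 1]
    coo = sparse.coo_matrix((weights, (rows, cols)), shape=(n_cells, n_cells))
    return coo.tocsr()  # CSR gives fast access to the neighbours of one cell


# Toy edge list with made-up values, mimicking the layout of the two tensors.
toy_indices = np.array([[0, 1], [0, 2], [1, 2], [3, 0]], dtype=np.uint32)
toy_weights = np.array([0.97, 0.95, 0.99, 0.90], dtype=np.float32)
graph = knn_graph(toy_indices, toy_weights, n_cells=4)

# Column indices of the edges stored in row 0, i.e. the neighbours of cell 0.
neighbours_of_0 = graph[0].indices
```

Note that converting from COO to CSR sums the weights of any duplicated (row, column) pairs, which may or may not be desirable depending on how the edge list was written.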

| name | dtype | rank | dims | shape | (values) |
|---|---|---|---|---|---|
| Accession | string | 1 | genes | 59,480 | ["pCAG-DsRed2_101-650", "pCS-Cherry-DEST_101-850", "pCAG ··· |
| Age | float32 | 1 | cells | 1,665,937 | [8.0, 8.0, 8.0, 8.0, 8.0, ...] |
| AnnotationDefinition | string | 1 | annotations | 51 | ["+MPZ", "+EYA1 +ISL1", "+NHLH1", "+MEIS2 +ISL1 +SIX3", ··· |
| AnnotationDescription | string | 1 | annotations | 51 | ["Schwann cell-like (E-SCHWL; +MPZ)", "Otic vesicle of t ··· |
| AnnotationName | string | 1 | annotations | 51 | ["E-SCHWL", "HB-OTV", "NBL", "TH-RETN", "CB-PURK", ...] |
| AnnotationPosterior | float32 | 2 | clusters ✕ annotations | 617 ✕ 51 | [[-1.8189894e-12, 6.617445e-24, 1.0, 3.3087225e-24, 3.30 ··· |
| CellClass | string | 1 | cells | 1,665,937 | ["Erythrocyte", "Erythrocyte", "Erythrocyte", "Erythrocy ··· |
| CellCycleFraction | float32 | 1 | cells | 1,665,937 | [0.0, 0.0001071352, 0.0, 0.00095663266, 0.0, ...] |
| CellID | string | 1 | cells | 1,665,937 | ["10X89_1:AAACGGGAGGCTACGA", "10X89_1:ACGAGGAAGAGCCTAG", ··· |
| Chemistry | string | 1 | cells | 1,665,937 | ["v2", "v2", "v2", "v2", "v2", ...] |
| Chromosome | string | 1 | genes | 59,480 | ["chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXT ··· |
| Class | string | 1 | clusters | 617 | ["Neuroblast", "Radial glia", "Radial glia", "Glioblast" ··· |
| ClusterID | uint32 | 1 | clusters | 617 | [0, 1, 2, 3, 4, ...] |
| Clusters | uint32 | 1 | cells | 1,665,937 | [240, 240, 236, 240, 233, ...] |
| Donor | string | 1 | cells | 1,665,937 | ["BRC2006", "BRC2006", "BRC2006", "BRC2006", "BRC2006", ...] |
| DoubletFlag | bool | 1 | cells | 1,665,937 | [False, False, False, False, False, ...] |
| DoubletScore | float32 | 1 | cells | 1,665,937 | [0.02, 0.02, 0.03, 0.01, 0.02, ...] |
| DropletClass | uint8 | 1 | cells | 1,665,937 | [0, 0, 0, 0, 0, ...] |
| Embedding | float32 | 2 | cells ✕ 2 | 1,665,937 ✕ 2 | [[22.061909, 11.055673], [23.594717, 10.600938], [25.339 ··· |
| End | string | 1 | genes | 59,480 | ["550", "1320", "2090", "3610", "4730", ...] |
| Enrichment | float32 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[1.0, 1.0, 1.0, 1.0, 1.0, ...], [1.0, 1.0, 1.0, 1.0, 1. ··· |
| Expression | uint16 | 2 | cells ✕ genes | 1,665,937 ✕ 59,480 | [[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ··· |
| Factors | float32 | 2 | cells ✕ __ | 1,665,937 ✕ 50 | [[-1.5914472, 1.524089, 0.21222332, -4.3109193, -5.85292 ··· |
| Gene | string | 1 | genes | 59,480 | ["marker-DsRed", "marker-Cherry", "marker-GFP", "marker- ··· |
| GeneNonzeros | uint32 | 1 | genes | 59,480 | [0, 0, 0, 0, 0, ...] |
| GeneTotalUMIs | uint32 | 1 | genes | 59,480 | [0, 0, 0, 0, 0, ...] |
| Linkage | float32 | 2 | __ ✕ 4 | 616 ✕ 4 | [[238.0, 239.0, 0.0016231078, 2.0], [237.0, 617.0, 0.002 ··· |
| Loadings | float32 | 2 | genes ✕ __ | 59,480 ✕ 50 | [[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ··· |
| ManifoldIndices | uint32 | 2 | __ ✕ 2 | 40,164,783 ✕ 2 | [[0, 6], [0, 106], [0, 208], [0, 225], [0, 246], ...] |
| ManifoldRadius | float32 | 0 | () | () | 1.0 |
| ManifoldWeights | float32 | 1 | __ | 40,164,783 | [0.9746674, 0.9753966, 0.97435904, 0.9760038, 0.98073715 ··· |
| MeanAge | float64 | 1 | clusters | 617 | [10.651846331718932, 10.967863210449874, 10.768960981864 ··· |
| MeanCellCycle | float64 | 1 | clusters | 617 | [0.002357223176804402, 0.003319249633509612, 0.023186484 ··· |
| MeanDoubletScore | float64 | 1 | clusters | 617 | [0.09462042097992746, 0.11769588179965942, 0.19775236498 ··· |
| MeanExpression | float64 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ··· |
| MeanTotalUMI | float64 | 1 | clusters | 617 | [5449.63220088626, 5258.164957264958, 7567.301298701311, ··· |
| MitoFraction | float32 | 1 | cells | 1,665,937 | [0.0, 0.0038568673, 0.008797339, 0.0015943878, 0.0018687 ··· |
| NCells | uint64 | 1 | clusters | 617 | [1354, 1170, 770, 1232, 1536, ...] |
| NGenes | uint32 | 1 | cells | 1,665,937 | [121, 271, 674, 101, 113, ...] |
| Nonzeros | uint64 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ··· |
| OverallTotalUMIs | uint64 | 0 | () | () | 13029800607 |
| PrevClusters | uint32 | 1 | cells | 1,665,937 | [658, 658, 662, 658, 669, ...] |
| Recipe | string | 1 | __ | 2 | ["{'InitializeWorkspace': {'from_workspace': 'samples202 ··· |
| Region | string | 1 | cells | 1,665,937 | ["Telencephalon", "Telencephalon", "Telencephalon", "Tel ··· |
| SampleID | string | 1 | cells | 1,665,937 | ["10X89_1", "10X89_1", "10X89_1", "10X89_1", "10X89_1", ...] |
| SelectedFeatures | bool | 1 | genes | 59,480 | [False, False, False, False, False, ...] |
| Sex | string | 1 | cells | 1,665,937 | ["", "", "", "", "", ...] |
| Species | string | 0 | () | () | "Homo sapiens" |
| Start | string | 1 | genes | 59,480 | ["1", "571", "1341", "2111", "3631", ...] |
| StdevExpression | float32 | 1 | genes | 59,480 | [0.0, 0.0, 0.0, 0.0, 0.0, ...] |
| Subdivision | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
| Subregion | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
| Tissue | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
| TopLevelCluster | uint32 | 1 | cells | 1,665,937 | [25, 25, 25, 25, 25, ...] |
| TotalUMIs | uint32 | 1 | cells | 1,665,937 | [4630, 9334, 9321, 3136, 4281, ...] |
| Trinaries | float32 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[-1.8189894e-12, -1.8189894e-12, -1.8189894e-12, -1.818 ··· |
| UnsplicedFraction | float32 | 1 | cells | 1,665,937 | [0.3514039, 0.33833298, 0.3174552, 0.32589287, 0.3585611 ··· |
| ValidCells | bool | 1 | cells | 1,665,937 | [True, True, True, True, True, ...] |
| ValidGenes | bool | 1 | genes | 59,480 | [False, False, False, False, False, ...] |
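
The dims entries show which tensors are stored per cell, per gene or per cluster. One way to carry a per-cluster tensor such as Class or MeanAge over to individual cells is to look up each cell's Clusters value in ClusterID. The sketch below uses toy arrays and assumes that Clusters values refer to entries of ClusterID, which the names and value ranges suggest but this README does not state; `per_cell_from_cluster` is a hypothetical helper.

```python
import numpy as np


def per_cell_from_cluster(cell_clusters, cluster_ids, cluster_values):
    """Broadcast a per-cluster array to cells via each cell's cluster label."""
    cell_clusters = np.asarray(cell_clusters)
    cluster_ids = np.asarray(cluster_ids)
    cluster_values = np.asarray(cluster_values)
    order = np.argsort(cluster_ids)  # tolerate an unsorted ClusterID tensor
    sorted_ids = cluster_ids[order]
    pos = np.searchsorted(sorted_ids, cell_clusters)
    pos = np.clip(pos, 0, len(sorted_ids) - 1)
    if not np.array_equal(sorted_ids[pos], cell_clusters):
        raise ValueError("some cells carry a label that is missing from cluster_ids")
    return cluster_values[order][pos]


# Toy stand-ins for ClusterID and Class (per cluster) and Clusters (per cell).
cluster_id = np.array([0, 1, 2], dtype=np.uint32)
cluster_class = np.array(["Neuroblast", "Radial glia", "Glioblast"])
clusters = np.array([2, 0, 0, 1], dtype=np.uint32)

cell_class = per_cell_from_cluster(clusters, cluster_id, cluster_class)
print(cell_class)
```

The explicit lookup is used instead of indexing directly with Clusters so that the result stays correct even if ClusterID is not simply 0, 1, 2, ... in order.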

Genes and transcripts annotation

Our gene and transcript annotation is based on the GRCh38.p13 GENCODE v35 primary sequence assembly.

We discarded genes or transcripts that overlapped or mapped to the 3' UTRs of other genes or non-coding RNAs.

The GTF file used for read counts: gb_pri_annot_filtered.gtf.gz

The genes and transcripts that were discarded: gb_pri_filtered_transcripts.txt.gz