aertslab · LukasMahieu · Jan 15, 2025 · Jan 2, 2025 · Jan 2, 2025 · Jan 3, 2025
diff --git a/docs/api/datasets.md b/docs/api/datasets.md
@@ -12,6 +12,7 @@ Downloading of use case datasets which ar explored in the example analyses.
 
     get_dataset
     get_motif_db
+    get_model
     Genome
     register_genome
 ```
diff --git a/docs/index.md b/docs/index.md
@@ -9,6 +9,7 @@
 installation
 tutorials/index
 api/index
+models/index
 changelog
 contributing
 references

diff --git a/docs/models/biccn.rst b/docs/models/biccn.rst
@@ -0,0 +1,45 @@
+BICCN
+============
+
+.. sidebar:: Model Features
+
+   - **Genome**: *mm10*
+   - **Type**: Peak Regression
+   - **Parameters**: 6.3M
+   - **Size**: 23MB
+   - **Input shape**: (2114, 4)
+   - **Output shape**: (19,)
+
+The **BICCN** model is a peak regression model fine-tuned to cell type-specific regions for cell types in the mouse cortex. It was used in the BICCN Challenge to predict the in vivo activity of a large set of validated enhancers. The selected model was the highest-ranking of all submitted sequence models.
+
+After pretraining on all consensus peaks, the model was fine-tuned to specific peaks. Specific peaks were determined from the ratio of the highest to the second-highest peak and the ratio of the second- to the third-highest peak. These sets of regions were then used as input to the model, where 2114bp one-hot encoded DNA sequences were used to predict, per cell type, the mean peak accessibility over the center 1000 bp of the peak.
+
+The model is a CNN multiclass regression model using the :func:`~crested.tl.zoo.chrombpnet` architecture.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Johansen, N.J., Kempynck, N. et al. Evaluating Methods for the Prediction of Cell Type-Specific Enhancers in the Mammalian Cortex. bioRxiv (2024). https://doi.org/10.1101/2024.08.21.609075
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("BICCN")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions (sequence length must match the 2114 bp model input)
+    sequence = "A" * 2114
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
diff --git a/docs/models/deepchickenbrain1.rst b/docs/models/deepchickenbrain1.rst
@@ -0,0 +1,45 @@
+DeepChickenBrain1
+=================
+
+.. sidebar:: Model Features
+
+   - **Genome**: *galGal6*
+   - **Type**: Topic Classification
+   - **Parameters**: 11.1M
+   - **Size**: 38MB
+   - **Input shape**: (500, 4)
+   - **Output shape**: (20,)
+
+The **DeepChickenBrain1** model is a topic classification model, fine-tuned with differentially accessible regions (DARs) to make cell type-level predictions for cell types in the chicken telencephalon.
+
+After pretraining on topics, obtained through `pycistopic <https://pycistopic.readthedocs.io/en/latest/>`_, DARs were calculated per cell type and used as cell type representation. These sets of regions were then used as input to the model, where 500bp one-hot encoded DNA sequences were used to predict the cell type(s) to which the regions belong.
+
+The model is a CNN multiclass classifier which uses the :func:`~crested.tl.zoo.deeptopic_cnn` architecture.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Hecker, N., Kempynck, N. et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. bioRxiv (2024). https://doi.org/10.1101/2024.04.17.589795
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepChickenBrain1")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions
+    sequence = "A" * 500
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
diff --git a/docs/models/deepchickenbrain2.rst b/docs/models/deepchickenbrain2.rst
@@ -0,0 +1,48 @@
+DeepChickenBrain2
+=================
+
+.. sidebar:: Model Features
+
+   - **Genome**: *galGal6*
+   - **Type**: Peak Regression
+   - **Parameters**: 6.3M
+   - **Size**: 23MB
+   - **Input shape**: (2114, 4)
+   - **Output shape**: (20,)
+
+
+The **DeepChickenBrain2** model is a peak regression model fine-tuned to cell type-specific regions for cell types in the chicken telencephalon.
+
+After pretraining on all consensus peaks, the model was fine-tuned to specific peaks obtained with the :func:`~crested.pp.filter_regions_on_specificity` function. These sets of regions were then used as input to the model, where 2114bp one-hot encoded DNA sequences were used to predict, per cell type, the mean peak accessibility over the center 1000 bp of the peak.
+
+Peak heights were normalized across cell types with the :func:`~crested.pp.normalize_peaks` function.
+
+The model is a CNN multiclass regression model that uses the :func:`~crested.tl.zoo.chrombpnet` architecture.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Hecker, N., Kempynck, N. et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. bioRxiv (2024). https://doi.org/10.1101/2024.04.17.589795
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepChickenBrain2")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions (sequence length must match the 2114 bp model input)
+    sequence = "A" * 2114
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
diff --git a/docs/models/deepflybrain.rst b/docs/models/deepflybrain.rst
@@ -0,0 +1,53 @@
+DeepFlyBrain
+============
+
+.. sidebar:: Model Features
+
+   - **Genome**: *dm6*
+   - **Type**: Topic Classification
+   - **Parameters**: 3.2M
+   - **Size**: 12MB
+   - **Input shape**: (500, 4)
+   - **Output shape**: (81,)
+
+The **DeepFlyBrain** model is a topic classification model trained on KCs, T-Neurons, and Glia cells from the adult fly brain (17K cells total).
+
+Using `pycistopic <https://pycistopic.readthedocs.io/en/latest/>`_, binarized topics per region were extracted for 81 target topics. These sets of regions were then used as input for a DL model, where 500bp one-hot encoded (ACGT) DNA sequences were used to predict the topic set to which the region belongs.
+
+The model is a hybrid CNN-RNN multiclass classifier that is very similar to :func:`~crested.tl.zoo.deeptopic_lstm`, with the addition of a reverse complement layer as the first layer of the model.
+
+Details of the data and model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Janssens, J., Aibar, S., Taskiran, I.I. et al. Decoding gene regulation in the fly brain. Nature 601, 630–636 (2022). https://doi.org/10.1038/s41586-021-04262-z
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepFlyBrain")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions
+    sequence = "A" * 500
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
+
+-------------------
+
+.. warning::
+
+    DeepFlyBrain was originally trained using TensorFlow 1 as the backend.
+    Even though the model architecture and weights are exactly the same, there will be slight differences in the output compared to the original model due to backend changes between TensorFlow 1 and 2.
+    Overall, the correlation between the original and the Keras 3 model is very high (0.99+), but if you want exactly the same outputs and contribution plots as in the original publication, you should use an older, compatible environment, which you can find in `kipoi <https://kipoi.org/models/DeepFlyBrain/>`_.
diff --git a/docs/models/deephumanbrain.rst b/docs/models/deephumanbrain.rst
@@ -0,0 +1,48 @@
+DeepHumanBrain
+==============
+
+.. sidebar:: Model Features
+
+   - **Genome**: *hg38*
+   - **Type**: Peak Regression
+   - **Parameters**: 25.3M
+   - **Size**: 91MB
+   - **Input shape**: (2114, 4)
+   - **Output shape**: (76,)
+
+
+The **DeepHumanBrain** model is a peak regression model fine-tuned to cell type-specific regions for cell types in the whole human brain. The dataset was obtained from Li et al., 2023 (Science).
+
+After pretraining on all consensus peaks, the model was fine-tuned to specific peaks obtained with the :func:`~crested.pp.filter_regions_on_specificity` function. These sets of regions were then used as input to the model, where 2114bp one-hot encoded DNA sequences were used to predict, per cell type, the mean peak accessibility over the center 1000 bp of the peak.
+
+Peak heights were normalized across cell types with the :func:`~crested.pp.normalize_peaks` function.
+
+The model is a CNN multiclass regression model that uses the :func:`~crested.tl.zoo.chrombpnet` architecture. It has 1024 convolutional filters per layer instead of the default 512.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Hecker, N., Kempynck, N. et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. bioRxiv (2024). https://doi.org/10.1101/2024.04.17.589795
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepHumanBrain")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions (sequence length must match the 2114 bp model input)
+    sequence = "A" * 2114
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
diff --git a/docs/models/deephumancortex1.rst b/docs/models/deephumancortex1.rst
@@ -0,0 +1,46 @@
+DeepHumanCortex1
+================
+
+.. sidebar:: Model Features
+
+   - **Genome**: *hg38*
+   - **Type**: Topic Classification
+   - **Parameters**: 11.1M
+   - **Size**: 37MB
+   - **Input shape**: (500, 4)
+   - **Output shape**: (13,)
+
+
+The **DeepHumanCortex1** model is a topic classification model, fine-tuned with differentially accessible regions (DARs) to make cell type-level predictions for cell types in the human motor cortex. The dataset was obtained from Ma et al., 2022 (Science).
+
+After pretraining on topics, obtained through `pycistopic <https://pycistopic.readthedocs.io/en/latest/>`_, DARs were calculated per cell type and used as cell type representation. These sets of regions were then used as input to the model, where 500bp one-hot encoded DNA sequences were used to predict the cell type(s) to which the regions belong.
+
+The model is a CNN multiclass classifier which uses the :func:`~crested.tl.zoo.deeptopic_cnn` architecture.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Hecker, N., Kempynck, N. et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. bioRxiv (2024). https://doi.org/10.1101/2024.04.17.589795
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepHumanCortex1")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions
+    sequence = "A" * 500
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)
diff --git a/docs/models/deephumancortex2.rst b/docs/models/deephumancortex2.rst
@@ -0,0 +1,46 @@
+DeepHumanCortex2
+================
+
+.. sidebar:: Model Features
+
+   - **Genome**: *hg38*
+   - **Type**: Topic Classification
+   - **Parameters**: 13.9M
+   - **Size**: 47MB
+   - **Input shape**: (500, 4)
+   - **Output shape**: (14,)
+
+
+The **DeepHumanCortex2** model is a topic classification model, fine-tuned with differentially accessible regions (DARs) to make cell type-level predictions for cell types in the human motor cortex. The dataset was obtained from Bakken et al., 2021 (Science).
+
+After pretraining on topics, obtained through `pycistopic <https://pycistopic.readthedocs.io/en/latest/>`_, DARs were calculated per cell type and used as cell type representation. These sets of regions were then used as input to the model, where 500bp one-hot encoded DNA sequences were used to predict the cell type(s) to which the regions belong.
+
+The model is a CNN multiclass classifier which uses the :func:`~crested.tl.zoo.deeptopic_cnn` architecture.
+
+Details of the data and the model can be found in the original publication.
+
+-------------------
+
+.. admonition:: Citation
+
+    Hecker, N., Kempynck, N. et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. bioRxiv (2024). https://doi.org/10.1101/2024.04.17.589795
+
+Usage
+-------------------
+
+.. code-block:: python
+    :linenos:
+
+    import crested
+    import keras
+
+    # download model
+    model_path, output_names = crested.get_model("DeepHumanCortex2")
+
+    # load model
+    model = keras.models.load_model(model_path)
+
+    # make predictions
+    sequence = "A" * 500
+    predictions = crested.tl.predict(sequence, model)
+    print(predictions.shape)