[DL Edition] T038: Protein Ligand Interaction Prediction #290

Old-Shatterhand · 2022-12-08T15:09:46Z

Description

Proof of concept for GNN-based protein-ligand interaction prediction in talktorial T038

None

Initial draft for further discussion

review-notebook-app · 2022-12-08T15:09:50Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Old-Shatterhand · 2022-12-09T14:07:47Z

After some minor updates to the notebook, the following work is left.

Old-Shatterhand · 2022-12-19T14:01:50Z

Pull request for the talktorial on GNN-based protein-ligand interaction prediction.

One-line summary: Introduction of GNN-based protein-ligand interaction prediction
Potential labels or categories (e.g. machine learning, small molecules, online APIs): Kinases, Machine Learning, Graph Neural Networks
Time it took to execute (approx.): 1 hour
I have used the talktorial template and followed the content and formatting suggestions there
Packages must be open source and should be installable from conda-forge. If you are adding new packages to the TeachOpenCADD environment, please check whether already installed packages can provide the same functionality and, if not, add a sentence explaining why the new addition is needed. If a new package is not on conda-forge, please list it and its intended usage here.
- biotite, pypdb, chembl-webresource-client, rdkit: Already in TeachOpenCADD
- torch 1.10.1, torch-geometric 2.2.0, torch-cluster 1.6.0, torch-scatter 2.0.9, torch-sparse 0.6.13, torch-spline-conv 1.2.1, cpuonly 2.0: From "dubious" sources (either their own conda channel (pytorch) or installed as pip wheels). All of them are CPU-only builds.
Data must be publicly available, preferably accessible via a webserver or downloadable via a URL. Please list the data resources that you use and how to access them:
- KiBA dataset: Access via website
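Because the PyTorch stack above is pinned to specific CPU-only versions from outside conda-forge, a quick environment check can help reviewers reproduce the setup. The following is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper name `check_pins` is hypothetical. The sketch also assumes that the pip distribution names match the names listed above. `cpuonly` is a conda metapackage with no pip distribution, so it is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this PR (CPU-only builds); cpuonly is conda-only and omitted.
PINNED = {
    "torch": "1.10.1",
    "torch-geometric": "2.2.0",
    "torch-cluster": "1.6.0",
    "torch-scatter": "2.0.9",
    "torch-sparse": "0.6.13",
    "torch-spline-conv": "1.2.1",
}


def check_pins(pins):
    """Return {package: problem} for packages that are missing or at another version."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CPU wheels may carry a local version label such as "+cpu"; ignore it.
        if installed.split("+")[0] != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


print(check_pins(PINNED))  # an empty dict means every pin matches
```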

Talktorial includes cross-references to other talktorials if applicable
The table of contents reflects the talktorial story-line; order of #, ##, ### headers is correct
URLs are linked with meaningful words, instead of pasting the URL directly or linking words like here.
I have spell-checked the notebook
Images have enough resolution to be rendered with quality, without being too heavy.
All figures have a description
Markdown cell content is still in-line with code cell output (whenever results are discussed)
I have checked that cell outputs are not incredibly long (this applies also to DataFrames)
Formatting looks correct on the Sphinx render (bold, italics, figure placing)

We present our talktorials on our TeachOpenCADD website (https://projects.volkamerlab.org/teachopencadd/), so we have to check as well if the Jupyter notebook renders nicely there.

If this PR adds a new talktorial, please follow these steps:
- Add your talktorial to the complete list of talktorials here (at the end).
- Add your talktorial to one or multiple of the collections here. Or propose a new collection section in your PR.
- Add your talktorial's nblink file by running python generate_nblinks.py from within the directory teachopencadd/docs/talktorials.
- Please compile the website following the instructions here.
Check the rendering of the talktorial of this PR.
Is your talktorial listed in the talktorial list?
Is your talktorial listed in the talktorial collections?
- Add a picture for your talktorial in the collection view by following these instructions.

gerritgr · 2023-02-10T09:43:17Z

"I will" (and elsewhere) -> "We will"? (I think the other notebooks use "we"; I have not checked, though.)
"field of protein ligand interaction prediction" -> "protein-ligand"
Maybe explain the terms protein and ligand briefly in the intro.
Titles use sentence case in the other notebooks.
"..., I'll link to this otherwise, I'll explain new things below." -> "... , I will link to this. Otherwise, I will explain new things below."
"one wants to" -> wants
There are some other typos, but a tool like Grammarly can probably catch these.
simple Feed-forward Neural Network (FNN): do you mean MLP? GNNs are also technically feed-forward networks.
State at the beginning of the workflow that this is a binary classification task.
"from the PDB entry with ID 4O75": link to the earlier talktorial that explains the PDB.
explain C_alpha
In the "Technical background", say that we use the same GNN architecture for both ligands and proteins.
I think the BCE explanation should be clarified. What are the negative and positive samples here (binding and non-binding?)?
Suppress the DtypeWarning and add a comment explaining what kiba_preprocessing is doing.
"Storing and representing data in PLI-prediction is a bit different from other neural networks. ": You mean the input data (i.e., the graphs), right? Maybe clarify this. Also, even though it is obvious, PLI was not introduced as an abbreviation.
Maybe merge it with the "Data Points" subsection.
I have not checked the code yet.
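Two of the review points above, explaining C_alpha and the protein taken from the PDB entry, concern how a protein becomes a graph. The following is a minimal standard-library sketch of one common way to do this. It assumes that each residue is represented by its C_alpha atom (the alpha carbon of the backbone) and that two residues are connected when their C_alpha atoms lie within a distance cutoff. The 8 Å cutoff and the names `parse_calpha`, `contact_edges` and `_atom_line` are hypothetical and are not taken from the T038 notebook, which may build its graphs differently (e.g. with biotite).

```python
import math


def parse_calpha(pdb_lines):
    """Return (chain, res_seq, res_name, (x, y, z)) for every C_alpha atom."""
    residues = []
    for line in pdb_lines:
        # Skip HETATM records (ligands, water, ions such as calcium, whose
        # atom name is also "CA") and all non-coordinate records.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, PDB columns 13-16
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        residues.append((
            line[21],             # chain identifier
            int(line[22:26]),     # residue sequence number
            line[17:20].strip(),  # residue name
            (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return residues


def contact_edges(residues, cutoff=8.0):
    """Undirected edges (i, j), i < j, for C_alpha pairs within `cutoff` Angstrom."""
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(residues[i][3], residues[j][3]) <= cutoff:
                edges.append((i, j))
    return edges


def _atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    # Hypothetical helper: writes a minimal fixed-width ATOM record for the demo.
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


example = [
    _atom_line(1, "N", "MET", "A", 1, -1.2, 0.5, 0.0),
    _atom_line(2, "CA", "MET", "A", 1, 0.0, 0.0, 0.0),
    _atom_line(3, "CA", "GLY", "A", 2, 3.8, 0.0, 0.0),
    _atom_line(4, "CA", "LYS", "A", 3, 20.0, 0.0, 0.0),
]
nodes = parse_calpha(example)
edges = contact_edges(nodes)
print(edges)  # [(0, 1)]
```

For message passing, each undirected pair is usually stored in both directions. In PyTorch Geometric, for example, this means both directions appear as columns of a `2 x num_edges` `edge_index`.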

Old-Shatterhand · 2023-02-14T11:15:34Z

I implemented Gerrit's comments and uploaded a new notebook.
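Gerrit's review asks for three clarifications: that the same GNN architecture is used for ligands and proteins, that the prediction head is an MLP, and how BCE relates to binding and non-binding labels. The following is a minimal NumPy sketch of that overall setup. Several details are assumptions rather than the notebook's actual implementation. It swaps in a GCN-style layer with symmetric normalization and mean pooling, which is not necessarily the layer type used in T038. It uses hypothetical feature sizes of 9 per ligand atom and 20 per protein residue. Its weights are random and untrained, so the predicted probability is meaningless. Only the structure matters: two encoders with identical architecture but separate weights, concatenated embeddings, an MLP head with a sigmoid output, and BCE with y = 1 for binding pairs and y = 0 for non-binding pairs.

```python
import numpy as np


class GraphEncoder:
    """GCN-style encoder: h <- ReLU(A_hat @ h @ W) per layer, then mean pooling."""

    def __init__(self, in_dim, hidden_dim, num_layers, rng):
        dims = [in_dim] + [hidden_dim] * num_layers
        # One weight matrix per layer, scaled by 1/sqrt(fan_in).
        self.weights = [
            rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]

    def __call__(self, x, edges):
        n = x.shape[0]
        adj = np.eye(n)  # self-loops so each node keeps its own features
        for i, j in edges:
            adj[i, j] = adj[j, i] = 1.0  # undirected edges
        deg = adj.sum(axis=1)
        a_hat = adj / np.sqrt(np.outer(deg, deg))  # D^-1/2 (A + I) D^-1/2
        h = x
        for w in self.weights:
            h = np.maximum(a_hat @ h @ w, 0.0)
        return h.mean(axis=0)  # one embedding vector per graph


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def binary_cross_entropy(p, y, eps=1e-7):
    """Mean BCE; y = 1 for binding pairs, y = 0 for non-binding pairs."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


rng = np.random.default_rng(0)
HIDDEN = 16
# Same architecture (class, depth, width) for both inputs, but separate weights.
ligand_encoder = GraphEncoder(in_dim=9, hidden_dim=HIDDEN, num_layers=2, rng=rng)
protein_encoder = GraphEncoder(in_dim=20, hidden_dim=HIDDEN, num_layers=2, rng=rng)
# MLP head on the concatenated embeddings: 2*HIDDEN -> HIDDEN -> 1 logit.
w1 = rng.normal(0.0, 1.0 / np.sqrt(2 * HIDDEN), size=(2 * HIDDEN, HIDDEN))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), size=(HIDDEN, 1))
b2 = np.zeros(1)


def predict(lig_x, lig_edges, prot_x, prot_edges):
    """Probability that the ligand binds the protein (untrained weights)."""
    z = np.concatenate([ligand_encoder(lig_x, lig_edges), protein_encoder(prot_x, prot_edges)])
    hidden = np.maximum(z @ w1 + b1, 0.0)
    return float(sigmoid(hidden @ w2 + b2)[0])


# Toy inputs: a 4-atom ligand chain and a 5-residue protein ring with random features.
lig_x = rng.random((4, 9))
prot_x = rng.random((5, 20))
p = predict(lig_x, [(0, 1), (1, 2), (2, 3)], prot_x, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)])
loss = binary_cross_entropy(np.array([p]), np.array([1.0]))  # loss if labelled "binding"
```

In the actual notebook, the encoders and the head would be PyTorch Geometric and PyTorch modules trained by gradient descent on this loss. The sketch only makes the data flow explicit.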

mbackenkoehler · 2023-03-10T13:25:34Z