Merge branch 'develop' into fix/lipoic_acid_biosynth

SysBioChalmers · Oct 28, 2024 · 7095b2e · 7095b2e
2 parents 01254f6 + 19af535
commit 7095b2e
Show file tree

Hide file tree

Showing 21 changed files with 13,230 additions and 53 deletions.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -5,9 +5,9 @@ Is it an additional test/function/dataset?
 e.g. This PR improves/fixes # by ...
 -->
 
-
 **I hereby confirm that I have:**
 <!-- *Note: replace [ ] with [X] to check the box. -->
-- [ ] Tested my code on my own computer for running the model
-- [ ] Selected `develop` as a target branch
-- [ ] Any removed reactions and metabolites have been moved to the corresponding deprecated identifier lists
+- [ ] Any removed reactions and metabolites have been moved to the corresponding deprecated identifier lists in `data/deprecatedIdentifiers/`.
+<!-- Chose ONE of the following two options. -->
+- [ ] This PR has `develop` as target branch, and will be resolved with a **squash-merge**.
+- [ ] This PR has `main` as target branch, and will be resolved with a **merge commit**.
diff --git a/.github/workflows/check-metabolictasks.yml b/.github/workflows/check-metabolictasks.yml
@@ -1,6 +1,6 @@
 name: Test metabolic tasks
 
-on: [push, pull_request]
+on: [push]
 
 jobs:
   check-metabolictasks:

diff --git a/.github/workflows/commentGeneEssential.md b/.github/workflows/commentGeneEssential.md
@@ -0,0 +1,7 @@
+This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the gene essentiality test:
+
+<pre>
+{TEST_RESULTS}
+</pre>
+
+> _Note: In the case of multiple test runs, this post will be edited._
diff --git a/.github/workflows/commentMacaw.md b/.github/workflows/commentMacaw.md
@@ -0,0 +1,9 @@
+This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the [MACAW](https://github.com/Devlin-Moyer/macaw) test:
+
+<pre>
+{TEST_RESULTS}
+</pre>
+
+This and a more detailed output from MACAW are also committed to `data/macawResults/`.
+
+> _Note: In the case of multiple test runs, this post will be edited._
diff --git a/.github/workflows/commentsFromTests.md b/.github/workflows/commentsFromTests.md
@@ -0,0 +1,7 @@
+This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the gene essentiality test:
+
+<pre>
+{TEST_RESULTS}
+</pre>
+
+> _Note: In the case of multiple test runs, this post will be edited._
diff --git a/.github/workflows/gene-essentiality.yml b/.github/workflows/gene-essentiality.yml
@@ -0,0 +1,80 @@
+name: Check gene essentiality with Hart 2015
+
+on:
+  pull_request:
+    branches:
+      - "main"
+      - "develop"
+
+jobs:
+  check-metabolictasks:
+    runs-on: self-hosted
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Fetch RAVEN
+        uses: actions/checkout@v4
+        with:
+          repository: "SysBioChalmers/RAVEN"
+          path: "RAVEN"
+
+      - name: Run gene essentiality
+        id: essentiality
+        run: >
+          TEST_RESULTS=$(/usr/local/bin/matlab -batch
+          "warning('off', 'MATLAB:rmpath:DirNotFound');
+          rmpath(genpath('/home/m/ecModels-dependencies/RAVEN'));
+          rmpath(genpath('/home/m/actions-runner'));
+          addpath(genpath('.'));
+          setRavenSolver('gurobi');
+          ihuman = readYAMLmodel('model/Human-GEM.yml');
+          taskStruct = parseTaskList('data/metabolicTasks/metabolicTasks_Essential.txt');
+          [~, eGenes] = evalc('estimateEssentialGenes(ihuman, ''Hart2015_RNAseq.txt'', taskStruct);');
+          output = transpose(evaluateHart2015Essentiality(eGenes));
+          fid = fopen('data/testResults/gene-essential.csv','w');
+          fprintf(fid,[repmat('%s,',1,9) '%s\n'],output{:,1});
+          fprintf(fid,['%s,%d,%d,%d,%d' repmat(',%.4g',1,5) '\n'],output{:,2:end});
+          fclose(fid);
+          disp(cell2table(transpose(output(:,2:end)),'VariableNames',output(:,1)));") &&
+          echo "$TEST_RESULTS" &&
+          PARSED_RESULTS="${TEST_RESULTS//$'\n'/'<br>'}" &&
+          PARSED_RESULTS="${PARSED_RESULTS//$'\r'/'<br>'}" &&
+          echo "results=$PARSED_RESULTS" >> $GITHUB_OUTPUT
+
+      - name: Mention PR# in README.md
+        env:
+          PR_NUMBER: ${{ github.event.number }}
+        run: sed -i -e "s/[[:digit:]]\{3,4\}\*\* (gene /$PR_NUMBER\*\* (gene /" data/testResults/README.md
+
+      - name: Update local branch before committing changes
+        env:
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }} 
+        run: |
+          git stash
+          git fetch
+          git checkout $BRANCH_NAME
+          if git stash list | grep -q 'stash@{'; then
+            git stash pop
+          fi
+        
+      - name: Auto-commit results
+        uses: stefanzweifel/git-auto-commit-action@v5
+        with:
+          commit_user_name: memote-bot
+          commit_message: "chore: add gene essentiality test result"
+          file_pattern: data/testResults/*
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.number }}
+
+      - name: Post comment
+        uses: NejcZdovc/comment-pr@v2
+        with:
+          file: "commentGeneEssential.md"
+          identifier: "GITHUB_COMMENT_GENE"
+        env:
+          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
+          TEST_RESULTS: ${{steps.essentiality.outputs.results}}
+          GH_ACTION_RUN: ${{github.run_id}}
diff --git a/.github/workflows/macaw-tests.yml b/.github/workflows/macaw-tests.yml
@@ -0,0 +1,71 @@
+name: Run macaw tests
+
+on: [pull_request]
+
+jobs:
+  macaw-tests:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Test checkout
+        run: ls -la && cd code && ls -la
+
+      - name: Set up Python 3
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"
+
+      - name: Install macaw
+        run: pip install git+https://github.com/Devlin-Moyer/macaw.git@main numpy==1.26.4
+
+      - name: Run macaw
+        id: macaw-run
+        run: |
+          TEST_RESULTS=$(python code/test/macawTests.py)
+          echo $TEST_RESULTS
+          PARSED_RESULTS="${TEST_RESULTS//$'\n'/'<br>'}"
+          PARSED_RESULTS="${PARSED_RESULTS//$'\r'/'<br>'}"
+          echo $PARSED_RESULTS
+          echo "results=$PARSED_RESULTS" >> $GITHUB_OUTPUT
+          printf "$TEST_RESULTS" > data/testResults/macaw_summary.md
+
+      - name: Mention PR# in README.md
+        env:
+          PR_NUMBER: ${{ github.event.number }}
+        run: sed -i -e "s/[[:digit:]]\{3,4\}\*\* (MACAW)/$PR_NUMBER\*\* (MACAW)/" data/testResults/README.md
+
+      - name: Update local branch before committing changes
+        env:
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }} 
+        run: |
+          git stash
+          git fetch
+          git checkout $BRANCH_NAME
+          if git stash list | grep -q 'stash@{'; then
+            git stash pop
+          fi
+
+      - name: Auto-commit results
+        uses: stefanzweifel/git-auto-commit-action@v4
+        with:
+          commit_user_name: memote-bot
+          commit_message: "chore: add macaw test result"
+          file_pattern: data/testResults/*
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.number }}
+
+      - name: Post comment
+        uses: NejcZdovc/comment-pr@v2
+        with:
+          file: "commentMacaw.md"
+          identifier: "GITHUB_COMMENT_MACAW"
+        env:
+          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
+          TEST_RESULTS: ${{steps.macaw-run.outputs.results}}
+          GH_ACTION_RUN: ${{github.run_id}}
+
diff --git a/.github/workflows/yaml-conversion.yml b/.github/workflows/yaml-conversion.yml
@@ -1,6 +1,6 @@
 name: Test YAML conversion
 
-on: [push, pull_request]
+on: [push]
 
 jobs:
   yaml-conversion:

diff --git a/.github/workflows/yaml-validation.yml b/.github/workflows/yaml-validation.yml
@@ -1,6 +1,6 @@
 name: YAML validation
 
-on: [push, pull_request]
+on: [push]
 
 jobs:
   yaml-validation:

diff --git a/code/estimateEssentialGenes.m → code/test/estimateEssentialGenes.m b/code/estimateEssentialGenes.m → code/test/estimateEssentialGenes.m
diff --git a/code/evalGeneEssentialityPred.m → code/test/evalGeneEssentialityPred.m b/code/evalGeneEssentialityPred.m → code/test/evalGeneEssentialityPred.m
diff --git a/code/evaluateHart2015Essentiality.m → code/test/evaluateHart2015Essentiality.m b/code/evaluateHart2015Essentiality.m → code/test/evaluateHart2015Essentiality.m
@@ -31,16 +31,16 @@
 % Table S2 from the Hart2015 datasets:
 % In this study, essential genes are, by definition, a subset of fitness genes
 x = readtable('Hart2015_TableS2.xlsx');
-    
+
 % remove duplicated rows (MARCH1 and MARCH2 genes)
 ind1 = find(ismember(x.Gene,'MARCH1'),1,'last');
 ind2 = find(ismember(x.Gene,'MARCH2'),1,'last');
 x([ind1;ind2],:) = [];
-    
+
 % extract gene list and cell types
 genes = x.Gene;
 celltypes = regexprep(upper(x.Properties.VariableNames(3:7)'),'BF_','');
-    
+
 % for each cell type, determine the "fitness" genes, defined as those with
 % a 5% FDRs were chosen as thresholds (the values are extracted from the
 % supporting information of the Hart 2015 paper. Genes observed in 3 or	more
@@ -63,7 +63,7 @@
 % also get the genes that were essential in all 5 cell lines ("all")
 celltypes(end+1) = {'all'};
 fitness_mat(:,end+1) = all(fitness_mat == 1,2);
-    
+
 % put Hart2015 essentiality data into data structure for comparison
 expdata = {};
 expdata.genes = genes;
@@ -94,30 +94,31 @@
 % calculate true and false positives and negatives
 [TP,TN,FP,FN,Penr] = deal(nan(numel(tissues),1));  % initialize variables
 for i = 1:numel(tissues)
-    
+
     [~,tissue_ind] = ismember(tissues(i), expdata.tissues);
     if tissue_ind == 0
         continue  % a few tissues are missing from the DepMap dataset
     end
-    
+
     modelGenes = eGenes.refModel.genes;
     if strcmpi(tissues{i},'all')
         modelEssential = modelGenes(sum(modelPred,2) == size(modelPred,2));
     else
         modelEssential = modelGenes(modelPred(:,i));
     end
     modelNonEssential = setdiff(modelGenes,modelEssential);
-    
+
     expGenes = expdata.genes(~isnan(expdata.essential(:,tissue_ind)));
     expEssential = expdata.genes(expdata.essential(:,tissue_ind) == 1);
     expNonEssential = setdiff(expGenes,expEssential);
-    
+
     TP(i) = sum(ismember(modelEssential, expEssential));        % true positives
     TN(i) = sum(ismember(modelNonEssential, expNonEssential));  % true negatives
     FP(i) = sum(ismember(modelEssential, expNonEssential));     % false positives
-    FN(i) = sum(ismember(modelNonEssential, expEssential));     % false negatives    
+    FN(i) = sum(ismember(modelNonEssential, expEssential));     % false negatives
 
-    Penr(i) = EnrichmentTest(intersect(modelGenes,expGenes), intersect(modelEssential,expGenes), intersect(expEssential,modelGenes));
+    % Requires Statistics and Machine Learning Toolbox
+    %Penr(i) = EnrichmentTest(intersect(modelGenes,expGenes), intersect(modelEssential,expGenes), intersect(expEssential,modelGenes));
 end
 
 % calculate some metrics
@@ -126,13 +127,15 @@
 accuracy = (TP + TN)./(TP + TN + FP + FN);
 F1 = 2*TP./(2*TP + FP + FN);
 MCC = ((TP.*TN) - (FP.*FN))./sqrt((TP+FP).*(TP+FN).*(TN+FP).*(TN+FN));  % Matthews correlation coefficient
-PenrAdj = adjust_pvalues(Penr,'Benjamini');
+%PenrAdj = adjust_pvalues(Penr,'Benjamini');
 
 % get results for cell types
-results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC','Penr','logPenr','PenrAdj','logPenrAdj'};
-          [tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC, Penr, -log10(Penr), PenrAdj, -log10(PenrAdj)])]];
-
+results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC'};
+    [tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC])]];
 
+% Including Penr results
+%results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC','Penr','logPenr','PenrAdj','logPenrAdj'};
+%    [tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC, Penr, -log10(Penr), PenrAdj, -log10(PenrAdj)])]];
 end
 
 function [penr,pdep] = EnrichmentTest(pop,sample,successes)

diff --git a/code/test/macawTests.py b/code/test/macawTests.py
@@ -0,0 +1,8 @@
+import cobra
+from macaw.main import dead_end_test, duplicate_test
+
+model = cobra.io.load_yaml_model("model/Human-GEM.yml")
+(dead_end_results, dead_end_edges) = dead_end_test(model)
+(duplicate_results, duplicate_edges) = duplicate_test(model)
+output = dead_end_results.merge(duplicate_results)
+output.to_csv('data/testResults/macaw_results.csv', index = False)
diff --git a/data/deprecatedIdentifiers/deprecatedReactions.tsv b/data/deprecatedIdentifiers/deprecatedReactions.tsv
@@ -299,3 +299,4 @@ rxns	rxnKEGGID	rxnBiGGID	rxnEHMNID	rxnHepatoNET1ID	rxnREACTOMEID	rxnRecon3DID	rx
 "MAR04205"	""	""	"RT0084"	""	""	"HMR_4205"	""	"HMR_4205"	"RCR20258"	""	0	""	""	"HMR_4205"
 "MAR08745"	""	"THMMPtm"	""	""	""	""	"MNXR104822"	"HMR_8745"	"RCR20669"	""	0	""	""	"HMR_8745"
 "MAR08747"	""	"THMPPtm"	""	""	""	""	"MNXR104824"	"HMR_8747"	"RCR20670"	""	0	""	""	"HMR_8747"
+"MAR07799"	""	"ATPasel"	""	""	""	""	"MNXR96136"	"HMR_7799"	"RCR20334"	""	0	""	""	"HMR_7799"
diff --git a/data/metabolicTasks/metabolicTasks_Full.txt b/data/metabolicTasks/metabolicTasks_Full.txt
@@ -254,8 +254,8 @@
 							8,11-eicosadienoic acid[c]	1									
 	148	Mead acid de novo synthesis (minimal substrates, physiological excretion)		O2[e];glucose[e]			CO2[e];H2O[e]										
 							mead acid[c]	1									
-	149	Henicosanoic acid de novo synthesis (minimal substrates, physiological excretion)		O2[e];glucose[e];isoleucine[c]			CO2[e];H2O[e];urate[e]										
-							henicosanoic acid[c]	1									
+	149	heneicosanoic acid de novo synthesis (minimal substrates, physiological excretion)		O2[e];glucose[e];isoleucine[c]			CO2[e];H2O[e];urate[e]										
+							heneicosanoic acid[c]	1									
 	150	Behenic acid de novo synthesis (minimal substrates, physiological excretion)		O2[e];glucose[e]			CO2[e];H2O[e]										
 							behenic acid[c]	1									
 	151	cis-erucic acid de novo synthesis (minimal substrates, physiological excretion)		O2[e];glucose[e]			CO2[e];H2O[e]										
@@ -372,7 +372,7 @@
 				O2[e]													
 	207	Mead acid (complete oxidation)		mead acid[e]	1	1	CO2[e];H2O[e]										
 				O2[e]													
-	208	Henicosanoic acid (complete oxidation)		henicosanoic acid[e]	1	1	CO2[e];H2O[e]										
+	208	heneicosanoic acid (complete oxidation)		heneicosanoic acid[e]	1	1	CO2[e];H2O[e]										
 				O2[e]													
 	209	Behenic acid (complete oxidation)		behenic acid[e]	1	1	CO2[e];H2O[e]										
 				O2[e]													

diff --git a/data/testResults/README.md b/data/testResults/README.md
@@ -0,0 +1,23 @@
+# Test results
+
+The file here contains results from the [MACAW](https://github.com/Devlin-Moyer/macaw) `dead_end_test` and `duplicate_test` tests, and from cell-line specific gene essentiality prediction based on the [Hart _et al._ (2015)](https://doi.org/10.1016/j.cell.2015.11.015) dataset.
+
+The test results shown here were obtained by the GitHub Actions run in **PR #675** (MACAW) and **PR #675** (gene essentiality), and will be updated by any subsequent PR. Summary results are shown as a comment in the corresponding PR.
+
+### MACAW: `dead_end_test`
+Looks for metabolites in Human-GEM that can only be produced by all reactions they participate in or only consumed, then identifies all reactions that are prevented from sustaining steady-state fluxes because of each of these dead-end metabolites. The simplest case of a dead-end metabolite is one that only participates in a single reaction. Also flags all reversible reactions that can only carry fluxes in a single direction because one of their metabolites can either only be consumed or only be produced by all other reactions it participates in.
+
+### MACAW: `duplicate_test`
+Identifies sets of reactions that may be duplicates of each other because they:
+
+- Involve exactly the same metabolites with exactly the same stoichiometric coefficients (but potentially different associated genes).
+- Involve exactly the same metabolites, but go in different directions and/or some are reversible and some are not.
+- Involve exactly the same metabolites, but with different stoichiometric coefficients.
+- Represent the oxidation and/or reduction of the same metabolite, but use different electron acceptors/donors from the given list of pairs of oxidized and reduced forms of various electron carriers (e.g. NAD(H), NADP(H), FAD(H2), ubiquinone/ubiquinol, cytochromes).
+
+It is possible for a single reaction to fit in multiple of the above categories. There are sometimes cases where sets of reactions that fall into one of the above categories are completely legitimate representations of real biochemistry (e.g. separate irreversible reactions for importing vs exporting the same metabolite because two different transporters encoded by different genes are each responsible for transporting that metabolite in only one direction, enzymes that can use NAD(H) or NADP(H) interchangeably to catalyze the same redox reaction), but reactions that meet these criteria are generally worth close examination to ensure that they should actually all exist as separate reactions.
+
+### Cell-line specific gene essentiality
+Evaluate gene essentiality predictions in 5 cell-line specific GEMs with experimental fitness data gathered from the [Hart _et al._ (2015)](https://doi.org/10.1016/j.cell.2015.11.015).
+
+Cell-line specific GEMs are constructed with tINIT2 for DLD1, GBM, HCT116, HeLa and RPE1 cell lines. Then, the `metabolicTasks_Essential.txt` list of tasks is used to identify essential genes in each of these models. The predicted gene essentiality is compared to results from a high-throughput CRISPR-Cas9 screen for identifying genes that affect fitness. Only the summary statistics of this comparison are kept. 
diff --git a/data/testResults/gene-essential.csv b/data/testResults/gene-essential.csv
@@ -0,0 +1,7 @@
+cellLine,TP,TN,FP,FN,accuracy,sensitivity,specificity,F1,MCC
+DLD1,36,2185,59,279,0.8679,0.1143,0.9737,0.1756,0.1529
+GBM,34,2165,61,298,0.8597,0.1024,0.9726,0.1593,0.1333
+HCT116,46,2207,53,309,0.8616,0.1296,0.9765,0.2026,0.1905
+HELA,30,2263,69,254,0.8765,0.1056,0.9704,0.1567,0.124
+RPE1,14,2204,81,259,0.8671,0.05128,0.9646,0.07609,0.02585
+all,7,2408,92,109,0.9232,0.06034,0.9632,0.06512,0.0254