Dev 0.7.6 - Notebook updates #285

Merged 23 commits on Oct 3, 2023
Commits
8e7ec05
notebook: use heatmap to depict COG distribution
matinnuhamunada Sep 26, 2023
b068d5a
notebook: enrich deeptf with faa annotation
matinnuhamunada Sep 26, 2023
6f8117a
notebook: generate graphml file for cytoscape
matinnuhamunada Sep 27, 2023
3684896
fix: correct notebook links and display
matinnuhamunada Sep 27, 2023
403a349
feat: colorise bigscape class and add knownclusterblast
matinnuhamunada Sep 27, 2023
975ea78
fix: cleanup unused cell
matinnuhamunada Sep 27, 2023
e99f4d1
feat: extract ARTS 4 tables
matinnuhamunada Sep 29, 2023
07f7522
fix: correct new arts output format
matinnuhamunada Sep 29, 2023
0936781
fix: update rule for arts output and notebook
matinnuhamunada Sep 29, 2023
67a8dd9
test: update GTDB API result
matinnuhamunada Sep 29, 2023
75cdd38
test: update expected output for arts extract
matinnuhamunada Sep 29, 2023
d0c4cef
test: merge arts results
matinnuhamunada Sep 29, 2023
b261a0c
test: add missing expected duptable
matinnuhamunada Sep 29, 2023
cc24be6
test: add missing config and symlink
matinnuhamunada Sep 29, 2023
2989d76
test: add final step of arts
matinnuhamunada Sep 29, 2023
7bf99b8
test: add config
matinnuhamunada Sep 30, 2023
34ade74
feat: annotate bigfam models
matinnuhamunada Sep 30, 2023
c3e0474
fix: refrain using directory in params
matinnuhamunada Sep 30, 2023
aef96dc
fix: correct shell script
matinnuhamunada Sep 30, 2023
9e4de6d
chore: update java requirement for metabase
matinnuhamunada Oct 3, 2023
0c11c77
notebook: add instruction for cblaster-bgc
matinnuhamunada Oct 3, 2023
e057e48
chore: remove unused notebooks
matinnuhamunada Oct 3, 2023
d32336d
chore: bump version 0.7.6
matinnuhamunada Oct 3, 2023
notebook: enrich deeptf with faa annotation
matinnuhamunada committed Sep 26, 2023
commit b068d5afa92c595ddd606c7a06daa70a2d39f4c5
45 changes: 41 additions & 4 deletions workflow/notebook/deeptfactor.py.ipynb
@@ -59,9 +59,46 @@
 "outputs": [],
 "source": [
 "df = pd.read_csv(\"../tables/df_deeptfactor.csv\", index_col=0)\n",
-"df = df[df.deeptfactor_prediction == True]\n",
-"\n",
-"display(HTML(DT(df, columnDefs=[{\"className\": \"dt-center\", \"targets\": \"_all\", \"searchable\": True}], maxColumns=df.shape[1], maxBytes=0)))"
+"df = df[df.deeptfactor_prediction == True]"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "8236e8fb-710f-4e8c-8153-fc8984197b6b",
+"metadata": {},
+"outputs": [],
+"source": [
+"faa_dictionary = []\n",
+"df_gtdb = pd.read_csv(\"../tables/df_gtdb_meta.csv\")\n",
+"for genome_id in df_gtdb.genome_id:\n",
+" with open(f\"../../../interim/prokka/{genome_id}/{genome_id}.faa\", \"r\") as f:\n",
+" data = f.readlines()\n",
+" aa_dict = [i.strip(\"\\n\").strip(\">\").split(\" \", 1) for i in data if i.startswith(\">\")]\n",
+" df_aa = pd.DataFrame(aa_dict, columns=[\"locus_tag\", \"annotation\"]).set_index(\"locus_tag\")\n",
+" df_aa[\"genome_id\"] = genome_id\n",
+" faa_dictionary.append(df_aa)\n",
+"df_aa = pd.concat(faa_dictionary)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "321ccedf-5f5e-49e0-8417-ded2a706117e",
+"metadata": {},
+"outputs": [],
+"source": [
+"df_deeptf = pd.merge(df.reset_index().drop(columns='genome_id'), df_aa.reset_index(), on=\"locus_tag\", how=\"outer\")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "be0120e7-613b-4e26-93a9-380eb4a62b50",
+"metadata": {},
+"outputs": [],
+"source": [
+"display(HTML(DT(df_deeptf, columnDefs=[{\"className\": \"dt-center\", \"targets\": \"_all\", \"searchable\": True}], maxColumns=df_deeptf.shape[1], maxBytes=0)))"
 ]
 },
 {
@@ -97,7 +134,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.0"
+"version": "3.9.18"
 }
 },
 "nbformat": 4,
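
Note on the deeptfactor change above: the new cells read each Prokka `.faa` file, split every `>` header line into a locus tag and an annotation, and join the result onto the DeepTFactor predictions by `locus_tag`. The following is a minimal standard-library sketch of that idea, not the notebook's code. The function name `parse_faa_headers`, the sample headers, and the `predicted_tfs` list are hypothetical. The sketch also assumes that only predicted rows are wanted in the final table, whereas the notebook's `how="outer"` merge keeps every protein from the `.faa` files as well.

```python
import io


def parse_faa_headers(handle, genome_id):
    """Map each locus tag found in FASTA header lines to its annotation."""
    records = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # Drop only the leading ">" and the trailing newline, then split once:
        # the first token is the locus tag, the remainder is the annotation.
        # partition() also tolerates headers that carry no annotation.
        locus_tag, _, annotation = line[1:].rstrip("\n").partition(" ")
        records[locus_tag] = {"annotation": annotation, "genome_id": genome_id}
    return records


# Hypothetical two-protein file standing in for a Prokka .faa output.
example_faa = io.StringIO(
    ">ABC_00001 hypothetical protein\n"
    "MKT\n"
    ">ABC_00002 HTH-type transcriptional regulator\n"
    "MSE\n"
)
annotations = parse_faa_headers(example_faa, genome_id="genome_A")

# Hypothetical locus tags predicted to be transcription factors.
predicted_tfs = ["ABC_00002"]

# Left-join style lookup (an assumption): keep only predicted rows and
# attach the annotation and genome_id where a header was found.
missing = {"annotation": None, "genome_id": None}
enriched = [
    {"locus_tag": tag, **annotations.get(tag, missing)}
    for tag in predicted_tfs
]
print(enriched)
```

Slicing off the first character instead of calling `strip(">")` leaves any `>` inside the annotation text untouched, and `partition` avoids a ragged row when a header has no description.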
7 changes: 3 additions & 4 deletions workflow/notebook/eggnog.py.ipynb
@@ -197,9 +197,9 @@
 "id": "8984c203-d0b2-4bb9-ac8e-7db534864ea8",
 "metadata": {},
 "source": [
-"## Summary Table\n",
+"## Summary\n",
 "\n",
-"### Summary of number of unique genes belonging to each of the COG categories"
+"### Clustered Heatmap of COG Categories across Genomes"
 ]
 },
 {
@@ -212,7 +212,6 @@
 "outputs": [],
 "source": [
 "sns.clustermap(df_cog, cmap='YlGnBu', annot=True, fmt=\"d\", method='ward', linewidths=.5)\n",
-"plt.title('Clustered Heatmap of COG Categories across Genomes', loc='left', fontsize=16)\n",
 "plt.show()\n",
 "\n",
 "#display(HTML(DT(df_cog_unique, columnDefs=[{\"className\": \"dt-center\", \"targets\": \"_all\"}],)))"
@@ -223,7 +222,7 @@
 "id": "7e757544-2bde-4b14-a0a7-b74df3be9bf5",
 "metadata": {},
 "source": [
-"##### Legend"
+"#### Legend"
 ]
 },
 {