NLMyo🔧: a toolbox built to leverage the power of Large Language Models (LLMs) to exploit histology text reports.
Available tools:
- Anonymizer🕵️: a tool to automatically censor patient information in histology report PDFs.
- Extract Metadata📝: a tool to extract metadata (biopsy number, muscle, diagnosis...) from histology reports.
- Auto Classify 🪄: a tool to automatically predict a congenital myopathy subtype from a histology report using AI (large language models). It can currently distinguish between: Nemaline Myopathy, Core Myopathy, Centronuclear Myopathy, Non Congenital Myopathy (NON-MC).
- Report Search 🔎: a tool to search for a specific term in a set of histology reports. The tool returns the 5 reports from our database that most closely match your symptom query.
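To give an intuition for how a "top 5 closest reports" search can work, here is a minimal sketch of top-k retrieval. It is not NLMyo's implementation: the app relies on a ChromaDB vector store, whereas this sketch swaps in a plain bag-of-words cosine similarity so that it runs with the standard library only. The function names and the sample report snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_reports(query, reports, top_k=5):
    """Return up to top_k (report_id, score) pairs, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (report_id, cosine_similarity(query_vec, Counter(tokenize(text))))
        for report_id, text in reports.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, made-up report snippets used only for illustration.
reports = {
    "report_a": "numerous nemaline rods in type 1 fibres",
    "report_b": "central cores devoid of oxidative activity",
    "report_c": "internalized nuclei in most fibres",
}
print(search_reports("nemaline rods", reports, top_k=2))
```

A real embedding model would also match reports that describe the same finding with different words, which simple word counts cannot do.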
🚨 DISCLAIMER: For some tools, you can select OpenAI API mode for better results. In OpenAI mode, all data entered into these tools is sent to OpenAI servers. Please do not upload private or non-anonymized data. As per their terms of service, OpenAI does not retain data for longer than legally required and does not use it for training. However, we do not take any responsibility for any data leak.
This project is free and open-source under the AGPL license; feel free to fork it and contribute to its development.
You can use the demo version at https://lbgi.fr/NLMyo/ or see the How To Install section to run your own instance.
Once on the website, simply select the right tool in the sidebar on the left.
Here is a sample PDF that you can use with the tools: PDF File
- Create a `.env` file with your OpenAI API key, such as `OPENAI_API_KEY=sk-...`
- Install the venv with `poetry install` and activate it with `source .venv/bin/activate`
- Get the Vicuna LLM model: `cd models && wget https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/resolve/main/ggml-vic7b-q4_1.bin`
- If you are from our lab and have SSH access, you can pull the DVC data (raw data + ChromaDB) with `dvc pull`
- If you are not from our lab and want to create your own embeddings, create a folder `data/processed/` containing all the `*.txt` files to embed, then run `python ingest.py` to create the ChromaDB (vector store)
- Run the app using `streamlit run Home.py`
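Before launching the app, it can help to confirm that the files from the steps above are in place. The following is a minimal sketch of such a check, not part of the NLMyo code base: the `check_setup` helper is hypothetical, and the paths it looks for are simply the ones named in the installation steps.

```python
from pathlib import Path


def check_setup(root="."):
    """Return a list of human-readable problems found in the project folder."""
    root = Path(root)
    problems = []

    # Step 1: a .env file that defines OPENAI_API_KEY.
    env_file = root / ".env"
    if not env_file.is_file():
        problems.append("missing .env file")
    elif not any(
        line.strip().startswith("OPENAI_API_KEY=")
        for line in env_file.read_text(encoding="utf-8").splitlines()
    ):
        problems.append(".env does not define OPENAI_API_KEY")

    # Step 3: the Vicuna model downloaded into models/.
    model_file = root / "models" / "ggml-vic7b-q4_1.bin"
    if not model_file.is_file():
        problems.append(f"missing model file: {model_file}")

    # Own-embedding path: at least one *.txt report to ingest.
    data_dir = root / "data" / "processed"
    if not data_dir.is_dir() or not any(data_dir.glob("*.txt")):
        problems.append("no *.txt reports found in data/processed/")

    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("-", issue)
    if not issues:
        print("Setup looks complete.")
```

Run it from the repository root. The last check only matters if you build your own embeddings with `python ingest.py`; if you pulled the data with `dvc pull`, the layout of the pulled files may differ.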
Creator and Maintainer: Corentin Meyer, 3rd year PhD Student in the CSTB Team, ICube — CNRS — Unistra corentin.meyer@etu.unistra.fr
NLMyo was born from a collaboration between the CSTB Team @ ICube led by Julie D. Thompson, the Morphological Unit of the Institute of Myology of Paris led by Teresinha Evangelista, the imagery platform MyoImage of the Center of Research in Myology led by Bruno Cadot, the photonic microscopy platform of the IGBMC led by Bertrand Vernay, and the Pathophysiology of neuromuscular diseases team @ IGBMC led by Jocelyn Laporte.