This repository provides a script that uses a custom Large Language Model (LLM) endpoint to analyze images and convert their visual content into Markdown format. The script takes .jpg
or .jpeg
images from the images
directory, sends them to the LLM along with a prompt, and receives a fully Markdown-formatted description of the image.
- LLM Integration: Easily configure the script to use custom LLM endpoints such as Llama 3.2 Vision models.
- Image-to-Markdown Conversion: The response is returned solely in Markdown format for easy integration into documentation, reports, or websites.
- Customizable Output: Optionally save LLM responses to
.md
files within theoutput
directory.
- Python 3.9+ recommended
- A compatible LLM API endpoint, model name, and API key
pip
for installing dependencies
-
Clone the repository:
git clone https://github.com/yourusername/llm-vision-to-markdown.git cd llm-vision-to-markdown
-
Install the required dependencies:
pip install -r requirements.txt
Copy the .env.example
file to .env
and fill in your LLM configuration:
LLM_BASE_URL=<your-llm-endpoint>
LLM_MODEL_NAME=<your-llm-model-name>
LLM_API_KEY=<your-llm-api-key>
SAVE_RESPONSE_TO_FILE=true # or false
LLM_BASE_URL
: The base URL of the LLM inference endpoint.LLM_MODEL_NAME
: The model name (e.g., "llama3.2-vision").LLM_API_KEY
: Your API key for authentication.SAVE_RESPONSE_TO_FILE
: Iftrue
, Markdown responses are saved to theoutput
folder.
-
Place one or more
.jpg
or.jpeg
images in theimages
directory. -
Run the script:
python main.py
-
The script will process each image, send it to the LLM, and print the Markdown response. If
SAVE_RESPONSE_TO_FILE
istrue
, a corresponding.md
file will be created in theoutput
folder.
If you place sample.jpg
in the images
folder and run the script, you might see output like:
File: sample.jpg
User Prompt:
[...prompt...]
Model Response:
# A Beautiful Landscape
- **Mountains**: Tall, snow-capped peaks in the background
- **Lake**: A calm, reflective surface at the center
- **Trees**: Lush green foliage on both sides of the view
--------------------------------------------------
This response would be saved in output/sample.jpg.md
.
- If the script prints "No .jpg or .jpeg files in the images folder.", ensure that you have placed images in the correct directory.
- If you encounter authentication or network issues, verify your
.env
settings and ensure you have a stable internet connection.
Feel free to open issues or create pull requests for bug fixes, feature requests, or other improvements.
This project is licensed under the MIT License.