LLM Vision-to-Markdown Converter

This repository provides a script that uses a custom Large Language Model (LLM) endpoint to analyze images and convert their visual content into Markdown format. The script takes .jpg or .jpeg images from the images directory, sends them to the LLM along with a prompt, and receives a fully Markdown-formatted description of the image.

Features

LLM Integration: Easily configure the script to use custom LLM endpoints such as Llama 3.2 Vision models.
Image-to-Markdown Conversion: The response is returned solely in Markdown format for easy integration into documentation, reports, or websites.
Customizable Output: Optionally save LLM responses to .md files within the output directory.

Getting Started

Prerequisites

Python 3.9+ recommended
A compatible LLM API endpoint, model name, and API key
pip for installing dependencies

Installation

Clone the repository:

git clone https://github.com/yourusername/llm-vision-to-markdown.git
cd llm-vision-to-markdown

Install the required dependencies:
```
pip install -r requirements.txt
```

Configuration

Copy the .env.example file to .env and fill in your LLM configuration:

LLM_BASE_URL=<your-llm-endpoint>
LLM_MODEL_NAME=<your-llm-model-name>
LLM_API_KEY=<your-llm-api-key>
SAVE_RESPONSE_TO_FILE=true  # or false

LLM_BASE_URL: The base URL of the LLM inference endpoint.
LLM_MODEL_NAME: The model name (e.g., "llama3.2-vision").
LLM_API_KEY: Your API key for authentication.
SAVE_RESPONSE_TO_FILE: If true, Markdown responses are saved to the output folder.

Usage

Place one or more .jpg or .jpeg images in the images directory.
Run the script:
```
python main.py
```
The script will process each image, send it to the LLM, and print the Markdown response. If SAVE_RESPONSE_TO_FILE is true, a corresponding .md file will be created in the output folder.

Example

If you place sample.jpg in the images folder and run the script, you might see output like:

File: sample.jpg
User Prompt: 
[...prompt...]
Model Response:
# A Beautiful Landscape

- **Mountains**: Tall, snow-capped peaks in the background
- **Lake**: A calm, reflective surface at the center
- **Trees**: Lush green foliage on both sides of the view
--------------------------------------------------

This response would be saved in output/sample.jpg.md.

Troubleshooting

If the script prints "No .jpg or .jpeg files in the images folder.", ensure that you have placed images in the correct directory.
If you encounter authentication or network issues, verify your .env settings and ensure you have a stable internet connection.

Contributing

Feel free to open issues or create pull requests for bug fixes, feature requests, or other improvements.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.env-example		.env-example
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requrements.txt		requrements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Vision-to-Markdown Converter

Features

Getting Started

Prerequisites

Installation

Configuration

Usage

Example

Troubleshooting

Contributing

License

About

Releases

Packages

Languages

dennismeissel/LLM-Vision-to-Markdown-Converter

Folders and files

Latest commit

History

Repository files navigation

LLM Vision-to-Markdown Converter

Features

Getting Started

Prerequisites

Installation

Configuration

Usage

Example

Troubleshooting

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages