Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Enhanced Vision Parser and LLM functionality and introduced new image processing capabilities with support for base64 and URL image modes
1 parent c138c1e, commit 5bc5025
Showing 16 changed files with 1,438 additions and 230 deletions.
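The commit message mentions base64 and URL image modes, but the code implementing them is not part of the hunks shown here. As a rough illustration only, assuming "base64" means inlining the image bytes and "url" means passing a link the model can fetch, a helper could look like this. The function name `image_reference` and its parameters are hypothetical, not the package's API.

```python
import base64


def image_reference(
    mode: str, *, data: bytes = b"", url: str = "", mime_type: str = "image/png"
) -> str:
    """Build an image reference for a vision-model request (hypothetical helper).

    mode="base64": inline the raw bytes as an RFC 2397 data URL.
    mode="url":    pass an existing http(s) URL through unchanged.
    """
    if mode == "base64":
        encoded = base64.b64encode(data).decode("ascii")
        return f"data:{mime_type};base64,{encoded}"
    if mode == "url":
        # Only plain web URLs are accepted in this sketch.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"Expected an http(s) URL, got {url!r}")
        return url
    raise ValueError(f"Unknown image mode: {mode!r}")


print(image_reference("base64", data=b"abc"))  # data:image/png;base64,YWJj
print(image_reference("url", url="https://example.com/page-1.png"))
```

Base64 mode makes a request self-contained at the cost of roughly a third more payload; URL mode keeps requests small but requires the image to be reachable by the model provider.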
```
@@ -1,17 +1,23 @@
from .parser import VisionParser, PDFPageConfig, VisionParserError, UnsupportedFileError
from .llm import LLMError, UnsupportedModelError
from importlib.metadata import version
from .utils import ImageExtractionError
from importlib.metadata import version, PackageNotFoundError
from .constants import SUPPORTED_MODELS

try:
    __version__ = version("vision-parse")
except Exception:
    __version__ = "0.1.0"
except PackageNotFoundError:
    # Use a development version when package is not installed
    __version__ = "0.0.0.dev0"

__all__ = [
    "VisionParser",
    "PDFPageConfig",
    "ImageExtractionError",
    "VisionParserError",
    "UnsupportedFileError",
    "UnsupportedModelError",
    "LLMError",
    "SUPPORTED_MODELS",
    "__version__",
]
```
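The first hunk narrows the version fallback: instead of catching every `Exception` and reporting `0.1.0`, the package now catches only `PackageNotFoundError` and reports the development version `0.0.0.dev0`. Here is the same pattern as a standalone helper; `resolve_version` is a hypothetical name, while `importlib.metadata.version` and `PackageNotFoundError` are the standard-library APIs the hunk imports.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0.dev0") -> str:
    """Return the installed version of dist_name, or fallback if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Only a missing distribution triggers the fallback; any other
        # exception still propagates to the caller.
        return fallback


# A distribution name assumed not to be installed falls back to the dev version.
print(resolve_version("surely-not-an-installed-distribution-xyz"))
```

Catching only `PackageNotFoundError` means an unexpected failure while reading metadata is no longer silently reported as a released version number.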
```
@@ -0,0 +1,13 @@
from typing import Dict

SUPPORTED_MODELS: Dict[str, str] = {
    "llama3.2-vision:11b": "ollama",
    "llama3.2-vision:70b": "ollama",
    "llava:13b": "ollama",
    "llava:34b": "ollama",
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "gemini-1.5-flash": "gemini",
    "gemini-2.0-flash-exp": "gemini",
    "gemini-1.5-pro": "gemini",
}
```
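Because `SUPPORTED_MODELS` maps each model name to its provider, dispatch reduces to a dictionary lookup. Here is a minimal sketch, assuming callers want an error for unknown names. `get_provider` is hypothetical, and the local `UnsupportedModelError` only stands in for the exception of the same name that the package exports from `.llm`.

```python
from typing import Dict

# Copied from the SUPPORTED_MODELS mapping added in this commit.
SUPPORTED_MODELS: Dict[str, str] = {
    "llama3.2-vision:11b": "ollama",
    "llama3.2-vision:70b": "ollama",
    "llava:13b": "ollama",
    "llava:34b": "ollama",
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "gemini-1.5-flash": "gemini",
    "gemini-2.0-flash-exp": "gemini",
    "gemini-1.5-pro": "gemini",
}


class UnsupportedModelError(ValueError):
    """Hypothetical stand-in for the package's UnsupportedModelError."""


def get_provider(model_name: str) -> str:
    """Return the provider for a supported model, or raise UnsupportedModelError."""
    try:
        return SUPPORTED_MODELS[model_name]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise UnsupportedModelError(
            f"Model {model_name!r} is not supported; choose one of: {supported}"
        ) from None


print(get_provider("gpt-4o"))  # openai
print(get_provider("llava:13b"))  # ollama
```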
````
@@ -1,2 +1,14 @@
Analyze this image and return a detailed JSON description including any text detected, images detected, tables detected, extracted text and confidence score for the extracted text.
Confidence score for the extracted text should be a float value between 0 and 1. If you cannot determine certain details, leave those fields empty.
- Confidence score for the extracted text should be a float value between 0 and 1. If you cannot determine certain details, leave those fields empty or zero.
- Ensure markdown text formatting for extracted text is applied properly by analyzing the image.
- Please ensure that the JSON object is valid and all the fields are present in the response as below:

```json
{
    "text_detected": "Yes" | "No",
    "images_detected": "Yes" | "No",
    "tables_detected": "Yes" | "No",
    "extracted_text": "Extracted text from the image",
    "confidence_score_text": "Confidence score for the extracted text"
}
```
````
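The updated prompt asks the model for a JSON object with five fixed fields. A caller might validate a reply along these lines. `parse_vision_response` is a hypothetical helper. Three behaviors are assumptions rather than anything this commit confirms: accepting an optional `json` code fence, rejecting replies with missing fields, and clamping the confidence score to [0, 1].

```python
import json
import re
from typing import Any, Dict

# Field names taken from the JSON template in the prompt.
REQUIRED_FIELDS = (
    "text_detected",
    "images_detected",
    "tables_detected",
    "extracted_text",
    "confidence_score_text",
)

FENCE = "`" * 3  # built this way so no literal code fence appears in the source
_FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*\})\s*" + FENCE, re.DOTALL)


def parse_vision_response(raw: str) -> Dict[str, Any]:
    """Parse a reply that follows the prompt's JSON template (hypothetical helper)."""
    # Models often wrap JSON in a code fence; fall back to the raw text otherwise.
    match = _FENCED_JSON.search(raw)
    payload = match.group(1) if match else raw.strip()
    data = json.loads(payload)

    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Response is missing fields: {missing}")

    # The prompt asks for a float in [0, 1]; empty or unparsable values become 0.0.
    try:
        score = float(data["confidence_score_text"] or 0.0)
    except (TypeError, ValueError):
        score = 0.0
    data["confidence_score_text"] = min(max(score, 0.0), 1.0)
    return data


reply = FENCE + 'json\n{"text_detected": "Yes", "images_detected": "No", ' \
    '"tables_detected": "No", "extracted_text": "# Title", ' \
    '"confidence_score_text": 0.92}\n' + FENCE
print(parse_vision_response(reply)["confidence_score_text"])  # 0.92
```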