This repository has been archived by the owner on Jan 9, 2025. It is now read-only.

feat(document): add repair function
chuang8511 committed Sep 26, 2024
1 parent 89b9e88 commit 7dc87dc
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion operator/document/v0/python/transform_pdf_to_markdown.py
@@ -24,7 +24,11 @@ class PdfTransformer:
     base64_images: list[dict]
 
     def __init__(self, x: BytesIO, display_image_tag: bool = False, image_index: int = 0):
-        self.pdf = pdfplumber.open(x)
+        try:
+            self.pdf = pdfplumber.open(x)
+        except Exception as e:
+            self.errors = [str(e)]
+            self.pdf = pdfplumber.open(x, repair=True)
         self.raw_pages = self.pdf.pages
         self.metadata = self.pdf.metadata
         self.display_image_tag = display_image_tag
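
The fallback in this change can be sketched in isolation as below. This is a minimal illustration, not the code from this commit: `open_with_repair` and its `opener` parameter are hypothetical names introduced here so the logic runs without pdfplumber installed. The sketch also rewinds the stream before retrying, on the assumption that a failed first attempt may leave the `BytesIO` cursor somewhere other than the start; the commit itself does not do this.

```python
from io import BytesIO
from typing import Any, Callable


def open_with_repair(stream: BytesIO, opener: Callable[..., Any]) -> tuple[Any, list[str]]:
    """Try a normal open first; on failure, record the error and retry with repair.

    `opener` is a stand-in for a function such as pdfplumber.open that
    accepts a file-like object and a `repair` keyword argument.
    """
    errors: list[str] = []
    try:
        return opener(stream), errors
    except Exception as e:
        # Keep the message from the first attempt so callers can report it.
        errors.append(str(e))
        # Assumption: the failed attempt may have consumed part of the
        # stream, so rewind before handing it to the repair path.
        stream.seek(0)
        return opener(stream, repair=True), errors


# Usage, assuming pdfplumber is installed:
#   import pdfplumber
#   pdf, errors = open_with_repair(BytesIO(data), pdfplumber.open)
```

Returning the error list on both paths means callers always get a list, whereas the diff sets `self.errors` only when the first open fails. As far as I know, pdfplumber's repair path relies on a Ghostscript executable being available, so the retry can itself raise on hosts without it.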