Convert PDF to fixed-layout EPUB, conserving the table of contents, inner cross-references and hyperlinks.

aourednik/pdf2epub3fixed

PDF2epub3fixed

This Python script generates a fixed-layout EPUB3 e-book from a PDF file, in two variants:

  • your_file_html.epub : A rich-text variant with a table of contents, clickable cross-references and hyperlinks. The text body is selectable and searchable. Vector drawings are converted to EPUB-supported SVG. Positioning of all text boxes is 95% reliable, and the resulting file is readable by most EPUB readers. For fine-tuning, use an EPUB editor such as Sigil.
  • your_file_pageimages.epub : A variant containing high-resolution image renderings of all your pages, with a table of contents, clickable cross-references and hyperlinks. The only HTML elements included in the EPUB are the links. This conversion is more robust but yields a larger file whose text can be neither selected nor searched.

In addition, the script produces files and folders that can help you analyse the structure of your PDF file and understand possible conversion errors:

  • your_file_pageimages.json : JSON object containing the positions of your words, images and link-boxes.
  • your_file_html/ : Folder containing all XML and other resources, corresponding to the pre-zipped structure of your_file_html.epub
  • your_file_pageimages/ : Folder containing all XML and other resources, corresponding to the pre-zipped structure of your_file_pageimages.epub
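
The schema of the JSON file is not documented here, so a generic first look can help before digging into conversion errors. The sketch below assumes nothing about the keys; `summarize_json` is a hypothetical helper name and the file path is only an example.

```python
import json


def summarize_json(path):
    """Return a mapping of top-level keys to the type of their values.

    Makes no assumption about the schema produced by pdf2epub3fixed.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (e.g. a list of pages) are reported under a placeholder key.
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    # Example path only; adjust to your own output folder.
    print(summarize_json("output/your_file_pageimages.json"))
```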

(Yes, an EPUB is nothing but a zipped collection of XMLs.)
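
Because an EPUB is a ZIP archive, Python's standard `zipfile` module is enough to look inside one. The following is a minimal sketch, not part of the script itself; `inspect_epub` is a hypothetical helper and the path is only an example. By the EPUB container convention, the archive holds a `mimetype` entry whose content is `application/epub+zip`.

```python
import zipfile


def inspect_epub(path):
    """Return (entry names, mimetype string or None) for an EPUB file."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii").strip()
    return names, mimetype


if __name__ == "__main__":
    # Example path only; point this at one of the generated EPUB files.
    names, mimetype = inspect_epub("output/your_file_html.epub")
    print("mimetype:", mimetype)
    for name in names:
        print(name)
```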

This script is particularly suitable for the conversion of PDFs generated with LaTeX variants (XeLaTeX, LuaLaTeX etc.) as it reproduces the "link-boxes" that LaTeX usually generates for cross-refs and hyperlinks. Rendering of complex mathematical equations, nevertheless, is reliable only in the pageimages.epub variant.

Installation

Installing Git and Conda

The Python script requires dependencies and is best run in a managed environment. I advise using Git for download and Conda for environment management. If you already have them installed, skip this section.

Both Conda and Git are available for all major platforms (Linux, Mac, Windows); see their official installation instructions.

On Mac, you can also use Homebrew:

brew install git
brew install --cask miniconda

Installing PDF2epub3fixed using Git and Conda

These lines can be executed in any terminal, including the Windows Console:

git clone https://github.com/aourednik/pdf2epub3fixed.git
cd pdf2epub3fixed

# Create and activate Conda environment
conda create -y -n pdf2epub3fixed python=3.13
conda activate pdf2epub3fixed

# Install dependencies
# (shutil and zipfile ship with the Python standard library and need no installation)
pip3 install pymupdf
conda install pillow
conda install pyyaml
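
To confirm that the environment is complete, a quick import check can be run inside the activated Conda environment. This is a sketch under assumptions: the module names below are the usual import names for the packages installed above (`pymupdf` for PyMuPDF, `PIL` for Pillow, `yaml` for PyYAML), and older PyMuPDF releases expose only the name `fitz` instead. `check_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
REQUIRED_MODULES = ["pymupdf", "PIL", "yaml", "shutil", "zipfile"]


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```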

Use

If you have not already done so, activate the conda environment and navigate to where pdf2epub3fixed.py is located:

On Mac and Linux

conda activate pdf2epub3fixed
cd path/to/pdf2epub3fixed

On Windows

conda activate pdf2epub3fixed
cd path\to\pdf2epub3fixed

Execute with a configuration file

Prepare a configuration file (see the example in config.yml) and run:

python pdf2epub3fixed.py --yaml_config=config.yml

For all undefined arguments, PDF2epub3fixed will fall back on default values.

Execute with inline arguments

You can also directly provide the arguments in the command line:

python pdf2epub3fixed.py --pdf_path=path/to/your/pdffile.pdf

The following additional arguments are optional but recommended:

  • --output_folder="path/to/your/output/folder" (by default, this is set to the output subfolder of the folder from which you execute pdf2epub3fixed.py.)
  • --epub_file_name="your_epub_file_name_without_extension"
  • --title="Your title"
  • --author="Monica Example"
  • --language="en" ("fr-FR", "de" etc.)
  • --publisher="Publishing House"
  • --date="yyyy-mm-dd"
  • --description="Your book abstract"
  • --rights="All rights reserved."
  • --font_folder="path/to/your/Fonts" (This folder should contain all the fonts used in your PDF, in TTF format. Fonts are embedded in the EPUB and increase its size, so include only the fonts you really need.)
  • --cover_image="your_cover_image.png"
  • --urn="12345678-1234-1234-1234-123456789abc"

For all undefined arguments, PDF2epub3fixed will fall back on default values.
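
When many arguments are involved, the command line can be assembled programmatically. The sketch below only reuses the flags listed above in their `--name=value` form; `build_command` is a hypothetical helper, and the file names and metadata in the example are placeholders.

```python
import shlex
import subprocess
import sys


def build_command(pdf_path, script="pdf2epub3fixed.py", **options):
    """Build the argument list for a pdf2epub3fixed run.

    Options set to None are skipped, so the script's defaults apply to them.
    """
    command = [sys.executable, script, f"--pdf_path={pdf_path}"]
    for name, value in options.items():
        if value is not None:
            command.append(f"--{name}={value}")
    return command


if __name__ == "__main__":
    command = build_command(
        "path/to/your/pdffile.pdf",
        title="Your title",
        author="Monica Example",
        language="en",
        cover_image=None,  # left unset: the default is used
    )
    print(shlex.join(command))
    # Uncomment to actually run the conversion from the repository folder:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with titles or descriptions that contain spaces.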

Example files

This repository contains an example PDF, an excerpt of my English translation of my French book Robopoïèses, and its cover image. The translation is currently unpublished; rights can be discussed with my French editor at laurence.gudin@editions-baconniere.ch.

I use this for code testing, as the book has crosslinks, hyperlinks, a complex layout and contains text in several writing systems, including right-to-left scripts.
