# Trans-PolyDocs

Quickly translate Markdown documents or PDF documents (converted by Doc2X) into Markdown/Word while retaining the original format of formulas/tables/images.

Supports multiple translators and multi-threaded translation:

- DeepSeek (default translator)
- OpenAI (supports a custom URL, which must end with /v1)
- Ollama
- Google Translate (experimental and possibly unstable; provided by py-googletrans)
- DeepL (official API)
- DeepLX

| Main Interface | LLM Settings | Multiple Translators |
| --- | --- | --- |
| Drag or click to import Markdown/PDF files; supports automatic dark mode switching | Detailed configuration for LLM, offering more customization | Supports multiple translators |

## Translating with an LLM

If you use an LLM for translation, additional settings are available to control the translation process.

### Text Extraction Method for Translation

Because the format of LLM output is not guaranteed, three methods are built in for extracting the translated text from a response: json, markdown, and direct. The preset prompt asks for output that must be extracted with the markdown method.

- json: extracts the value of the key "translated" from a JSON response.
- markdown: extracts the text enclosed in ``` (everything between the first and the last ```).
- direct: uses the raw response text as-is.
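
The three extraction methods can be sketched roughly as follows. This is an illustrative sketch, not the program's actual code: the function name `extract_translation` is hypothetical, and dropping a language tag on the opening fence line is an assumption about how a fenced response would typically be cleaned up.

```python
import json

# Built programmatically so the marker never appears literally in this snippet.
FENCE = "`" * 3


def extract_translation(response: str, method: str = "markdown") -> str:
    """Pull the translated text out of a raw LLM response (hypothetical helper)."""
    if method == "json":
        # Expect a JSON object such as {"translated": "..."}.
        return json.loads(response)["translated"]
    if method == "markdown":
        # Keep everything between the first and the last fence marker.
        first = response.find(FENCE)
        last = response.rfind(FENCE)
        if first == -1 or first == last:
            raise ValueError("response does not contain a fenced block")
        body = response[first + len(FENCE):last]
        # Assumption: the rest of the opening fence line is a language tag
        # (for example "markdown") and is not part of the translation.
        newline = body.find("\n")
        if newline != -1:
            body = body[newline + 1:]
        return body.strip()
    if method == "direct":
        # Use the raw response unchanged.
        return response
    raise ValueError(f"unknown extraction method: {method}")
```

The markdown method tolerates chatty models that add remarks before or after the fenced block, which is presumably why the preset prompt relies on it.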

### Prompts

You can use variables in the prompt; they are filled in before the prompt is sent. In the GUI, click the corresponding button to copy a variable name. Supported variables:

- {{text}}: text to be translated (REQUIRED)
- {{prev_text}}: the preceding text, as context; an empty string if there is none
- {{next_text}}: the following text, as context; an empty string if there is none
- {{dest}}: target language for translation

For example, here is a sample input prompt using variables:

    Translate the following text to {{dest}}:
    {{text}}
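
To make the substitution concrete, here is a minimal sketch of how such a template could be filled in. The function name `fill_prompt` is hypothetical and the program's own implementation may differ; the sketch only assumes the four variables listed above.

```python
import re

# Matches exactly the four supported variables.
VARIABLE = re.compile(r"\{\{(text|prev_text|next_text|dest)\}\}")


def fill_prompt(template: str, text: str, dest: str,
                prev_text: str = "", next_text: str = "") -> str:
    """Fill the prompt variables in a single pass (hypothetical helper)."""
    if "{{text}}" not in template:
        raise ValueError("the prompt must contain {{text}}")
    values = {
        "text": text,
        "prev_text": prev_text,
        "next_text": next_text,
        "dest": dest,
    }
    # A single regex pass means that a document which itself contains
    # "{{dest}}" is inserted verbatim and never substituted a second time.
    return VARIABLE.sub(lambda match: values[match.group(1)], template)


prompt = fill_prompt(
    "Translate the following text to {{dest}}:\n{{text}}",
    text="Hello",
    dest="French",
)
```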

You do not need to ask the model to preserve formulas in the prompt: the program replaces formulas with emoji placeholders before translation and restores them afterwards. This lets any translator, including non-LLM services such as DeepL, keep formulas intact in the translated text.
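
The placeholder idea can be illustrated with a short sketch. Everything here is an assumption for illustration: the regular expression (which only covers `$...$` and `$$...$$`), the numbered-emoji placeholder format, and the function names are not taken from the program.

```python
import re

MARK = "\U0001F522"  # an emoji used as the placeholder delimiter (assumed format)
FORMULA = re.compile(r"\$\$.+?\$\$|\$[^$\n]+?\$", re.DOTALL)
PLACEHOLDER = re.compile(MARK + r"(\d+)" + MARK)


def mask_formulas(text: str) -> tuple[str, list[str]]:
    """Replace each formula with a numbered emoji placeholder."""
    formulas: list[str] = []

    def stash(match: re.Match) -> str:
        formulas.append(match.group(0))
        return f"{MARK}{len(formulas) - 1}{MARK}"

    return FORMULA.sub(stash, text), formulas


def restore_formulas(text: str, formulas: list[str]) -> str:
    """Put the formulas back; fail if the translator lost a placeholder."""
    for index in range(len(formulas)):
        if f"{MARK}{index}{MARK}" not in text:
            raise ValueError(f"placeholder {index} is missing after translation")
    return PLACEHOLDER.sub(lambda match: formulas[int(match.group(1))], text)


masked, formulas = mask_formulas("Energy is $E = mc^2$ and $$a^2+b^2=c^2$$.")
```

When a placeholder does not survive translation, the caller can catch the error and translate the segment as plain text instead.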

## Running the GUI

Important

To output the translated document as a Word file, install pandoc before running the program:

- Windows: download the installer, or run `winget install --source winget --exact --id JohnMacFarlane.Pandoc` in PowerShell
- macOS: run `brew install pandoc` in Terminal
- Ubuntu/Debian: run `sudo apt install pandoc` in a terminal
- Arch/Manjaro: run `sudo pacman -S pandoc-cli` in a terminal

### Precompiled Program

Open the repository's Releases page, download the precompiled program for your operating system, unzip the package, and run the program.

### Adjusting Output Word Styles

To change the styling of the Word output, edit the styles in reference.docx, located in the root directory of the unzipped package.
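
For context, pandoc applies the styles of a reference document through its `--reference-doc` option. The sketch below shows how such a conversion could be invoked from Python. It assumes, without confirmation from the source, that the program passes reference.docx to pandoc in this way; the function names and file paths are hypothetical.

```python
import subprocess


def build_pandoc_command(md_file: str, docx_file: str,
                         reference_doc: str = "reference.docx") -> list[str]:
    """Build a pandoc command that styles the output after reference_doc."""
    return [
        "pandoc",
        md_file,
        "-o", docx_file,
        f"--reference-doc={reference_doc}",
    ]


def convert_to_word(md_file: str, docx_file: str,
                    reference_doc: str = "reference.docx") -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(md_file, docx_file, reference_doc),
                   check=True)
```

Calling `convert_to_word("translated.md", "translated.docx")` requires pandoc to be installed and on the PATH.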

### Running from Source

After cloning the repository, run the following from the repository root:

    conda create -n translate python=3.12
    conda activate translate
    pip install uv
    uv pip install -r requirements.txt
    python app.py

## CLI Program

To use the CLI program, clone the repository, then copy the sample environment file from the repository root:

    cp example.env .env

Edit .env according to the instructions inside it, then set up the environment and run:

    conda create -n translate python=3.12
    conda activate translate
    pip install uv
    uv pip install -r requirements.txt
    python Main.py

## Custom Translator

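If you want to use your own translation API, you can define a custom translator: a function that takes the text to translate, plus the previous and next text as context, and returns the translated string. A sample translator:

    def translate(text: str, prev_text: str, next_text: str) -> str:
        try:
            # Call your translation API here; prev_text and next_text are context.
            return "This is an example!"
        except Exception as e:
            # On failure, report the error and fall back to the untranslated text.
            print(f"Error: {e}")
            return text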

Then import MD_Translate.py in your program and pass your function to Process_MD:

    from MD_Translate import Process_MD

    # translate is the function you defined above
    file_path = "path"  # Path to the MD file
    threads = 10  # Number of translation threads
    Process_MD(md_file=file_path, translate=translate, thread=threads)
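
To show how a translator with this signature can be driven by several threads, here is a minimal sketch. It is not the implementation of Process_MD: the helper `translate_segments` and the way segments are paired with their neighbours are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Translator = Callable[[str, str, str], str]


def translate_segments(segments: list[str], translate: Translator,
                       threads: int = 10) -> list[str]:
    """Translate every segment in parallel, passing its neighbours as context."""

    def work(index: int) -> str:
        # The first segment has no predecessor and the last has no successor.
        prev_text = segments[index - 1] if index > 0 else ""
        next_text = segments[index + 1] if index + 1 < len(segments) else ""
        return translate(segments[index], prev_text, next_text)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order, so the document order is kept
        # even when the segments finish translating out of order.
        return list(pool.map(work, range(len(segments))))


def fake_translate(text: str, prev_text: str, next_text: str) -> str:
    # Stand-in translator used only to demonstrate the call pattern.
    return f"[{prev_text}|{text.upper()}|{next_text}]"


result = translate_segments(["a", "b", "c"], fake_translate, threads=2)
```

Threads suit this workload because each call mostly waits on a network response from the translation service.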

    graph TD
    A[Input MD or PDF file] --> B{File Type}
    B -->|MD File| C[Parse MD file]
    B -->|PDF File| D[Use Doc2X to convert to MD file]
    D --> C
    C --> E[Segment based on type]
    E --> F[Call translator for translation]
    F --> G{Retain formula translation successful?}
    G -->|Yes| H[Retain formula translation]
    G -->|No| I[Rollback to plain text translation]
    H --> J[Combine translated document]
    I --> J
    J --> K[Use pandoc to convert to Word]


## Packaging

The program is packaged with pyinstaller. Install it with `pip install pyinstaller`, then run:

    pyinstaller -w --onefile -i icon.png app.py

Then copy reference.docx and example.env from the project into the same directory as the generated binary.
