release_notes and readme updated for v2.0-rc1
fritz-hh committed Jan 7, 2014
1 parent 828f195 commit 29d6748
Showing 3 changed files with 54 additions and 3 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -5,7 +5,7 @@ OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search

To get the script usage, call: sh ./OCRmyPDF.sh -h

-Features
+Main features
--------

- Generates a searchable PDF/A file from a PDF file containing only images
@@ -52,4 +52,9 @@ In case you detect an issue, please:
- if no problem report exists on github, please create one here: https://github.com/fritz-hh/OCRmyPDF/issues
- Describe your problem thoroughly
- Append the console output of the script when running the debug mode (-g option)
-- If possible provide your input PDF file as well as the content of the temporary folder (using a file sharing service like www.file-upload.net)
+- If possible provide your input PDF file as well as the content of the temporary folder (using a file sharing service like www.file-upload.net)

Press & Media
-------------

- c't 1-2014, page 59: Detailed presentation of OCRmyPDF v1.0 in the leading German IT magazine (c't)
46 changes: 46 additions & 0 deletions RELEASE_NOTES.md
@@ -5,6 +5,52 @@ Please always read this file before installing the package

Download software here: https://github.com/fritz-hh/OCRmyPDF/tags

v2.0-rc1 (2014-01-07):
====

New features
------------

- Huge performance improvement on machines having multiple CPU/cores (processing of several pages concurrently) (fixes #18)
- By default, refuse to process a PDF file that already contains fonts (i.e. text); this can be overridden with the -f flag (fixes #16)
- Warn if the resolution is too low to get reasonable OCR results (fixes #37)
- New option (-o) to perform automatic oversampling if the image resolution is too low. This can improve OCR results.
- Warn if using a tesseract version older than v3.02.02 (as older versions are known to produce invalid output) (fixes #41)
- Echo version of the installed dependencies (e.g. tesseract) in debug mode in order to ease support (fixes #35)
- Echo the arguments passed to the script in debug mode to ease support

Changes
-------

- In debug mode: The debug page is now placed after the respective "normal" page
- Reduced disk space usage if -d (deskew) or -c (cleanup) options are not selected
- New file src/config.sh containing various configuration parameters
- Documentation of the tesseract config file "tess-cfg/no_ligature" improved
- Improved consistency of the temporary file names

Fixes
-----

- Improved robustness:
  - in case vertical resolution differs from horizontal resolution (fixes #38)
  - in case a PDF page contains more than one image (fixes #36)
- Fix a problem occurring if python 3 is the standard interpreter (fixes #33)
- Fix a problem occurring if the input PDF file contains special characters like "#" (fixes #34)

Tested with
-----------

- Operating system: FreeBSD 9.1
- Dependencies:
  - poppler-utils 0.22.2
  - ImageMagick 6.8.0-7 2013-03-30
  - Unpaper 0.3
  - tesseract 3.02.02
  - Python 2.7.3
  - pdftk 1.45
  - ghostscript (gs): 9.06
  - java: openjdk version "1.7.0_17"

v1.1-stable (2014-01-06):
====

2 changes: 1 addition & 1 deletion src/config.sh
@@ -1,5 +1,5 @@
TOOLNAME="OCRmyPDF"
-VERSION="v2.x"
+VERSION="v2.0-rc1"

# possible exit codes
EXIT_BAD_ARGS="1"
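
The release notes in this commit mention a warning when the installed tesseract is older than v3.02.02. The sketch below shows one way such a check could be written in plain POSIX shell; it is not taken from OCRmyPDF itself. The names `version_lt`, `warn_if_old_tesseract` and `MIN_TESSERACT` are hypothetical, the version strings are assumed to be purely numeric and dot-separated, and the installed version is passed in as an argument because how the script actually obtains it is not shown in this diff.

```shell
# Hypothetical helper (not from OCRmyPDF): succeeds (exit status 0) if the
# dotted numeric version $1 is strictly older than $2.
# Note: POSIX sh has no "local", so a, b, fa and fb are global variables.
version_lt() {
    a="$1"; b="$2"
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the first dot-separated field of each version string
        fa="${a%%.*}"; fb="${b%%.*}"
        # Drop the field that was just consumed
        case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
        case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
        # Strip leading zeros ("02" -> "2"); a missing or all-zero field counts as 0
        while [ "${fa#0}" != "$fa" ]; do fa="${fa#0}"; done
        while [ "${fb#0}" != "$fb" ]; do fb="${fb#0}"; done
        fa="${fa:-0}"; fb="${fb:-0}"
        # The first differing field decides the comparison
        if [ "$fa" -lt "$fb" ]; then return 0; fi
        if [ "$fa" -gt "$fb" ]; then return 1; fi
    done
    return 1    # versions are equal, so $1 is not older
}

# Assumed minimum version, taken from the release notes above
MIN_TESSERACT="3.02.02"

# Hypothetical wrapper: prints a warning on stderr if version $1 is too old
warn_if_old_tesseract() {
    if version_lt "$1" "$MIN_TESSERACT"; then
        echo "Warning: tesseract $1 is older than $MIN_TESSERACT; its output may be invalid" >&2
    fi
}
```

Comparing field by field, rather than as plain strings, keeps versions such as 3.2 and 3.02 equal and avoids treating zero-padded fields as octal numbers.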