iiif2annos

Read a IIIF manifest, OCR the images, create AnnotationLists, and add them to a copy of the manifest.

This tool uses the Tesseract OCR engine. Ensure it is installed and on your $PATH before running the code below.
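
A quick way to confirm this prerequisite is to look for the binary before starting a run. The following is a minimal sketch and not part of the tool; the helper name `tesseract_available` is hypothetical, and it assumes the executable is named `tesseract`.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if an executable with this name is found on $PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract found on $PATH")
    else:
        print("tesseract not found; install it before running ocr.py")
```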

usage: ocr.py [-h] [--base-output-uri OUTPUTURI] [--lang LANG] [-c] manifest output

Read a manifest, OCR all the pages then adds the results as annotation lists

positional arguments:
  manifest              URL to Manifest file
  output                Output directory for annotation lists

options:
  -h, --help            show this help message and exit
  --base-output-uri OUTPUTURI
                        Output URI for annotations and annotation list
  --lang LANG           Language to pass to the OCR engine see: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
  -c, --confidence      Include OCR confidence value in text of the annotation?
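
To illustrate what the `-c` option could mean in practice, here is a hedged sketch of turning one OCR word box into a IIIF v2 annotation. It is not taken from `ocr.py`: the `WordBox` class, the `word_to_annotation` function, and the `text (confidence)` format are all assumptions made for illustration. Only the general shape of a v2 annotation and the `#xywh=` media fragment follow the IIIF and W3C conventions.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical container for one OCR result (pixel coordinates)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    confidence: float


def word_to_annotation(word: WordBox, canvas_id: str, anno_id: str,
                       include_confidence: bool = False) -> dict:
    """Build a IIIF Presentation v2 style annotation for one word."""
    chars = word.text
    if include_confidence:
        # Assumed format; the real tool may present confidence differently.
        chars = f"{word.text} ({word.confidence:.1f})"
    # The xywh fragment targets the word's region on the canvas.
    target = f"{canvas_id}#xywh={word.left},{word.top},{word.width},{word.height}"
    return {
        "@id": anno_id,
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": chars,
        },
        "on": target,
    }
```

Calling `word_to_annotation` with `include_confidence=True` appends the confidence to the annotation text; with the default, only the recognised word is stored.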

This should work with both v2 and v3 manifests. For v2 manifests, AnnotationLists are created; for v3 manifests, AnnotationPages are created.
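
How `ocr.py` tells the two versions apart is not shown here. One common approach, sketched below under that assumption, is to inspect the manifest's `@context`, which names the Presentation API version. The function names `presentation_version` and `container_type` are hypothetical.

```python
def presentation_version(manifest: dict) -> int:
    """Guess the IIIF Presentation API major version from @context."""
    context = manifest.get("@context", [])
    # @context may be a single string or a list of strings.
    contexts = [context] if isinstance(context, str) else list(context)
    if any("presentation/3" in str(c) for c in contexts):
        return 3
    if any("presentation/2" in str(c) for c in contexts):
        return 2
    raise ValueError("Could not determine IIIF Presentation API version")


def container_type(manifest: dict) -> str:
    """Return the annotation container type matching the manifest version."""
    if presentation_version(manifest) == 3:
        return "AnnotationPage"
    return "sc:AnnotationList"
```

For example, a manifest whose `@context` is `http://iiif.io/api/presentation/2/context.json` maps to `sc:AnnotationList`, while one listing `http://iiif.io/api/presentation/3/context.json` maps to `AnnotationPage`.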

Example

python iiif2annos/ocr.py --lang frk --base-output-uri http://localhost:5500/newspaper https://preview.iiif.io/cookbook/update_newspaper/recipe/0068-newspaper/newspaper_issue_1-manifest.json  newspaper

Using these blogs as a guide:
