remove workflow server branch #14

Merged 1 commit on Oct 25, 2022
16 changes: 5 additions & 11 deletions Dockerfile
@@ -13,16 +13,20 @@ LABEL \
org.label-schema.vcs-url="https://github.com/slub/ocrd_controller" \
org.label-schema.build-date=$BUILD_DATE

# keep PREFIX and VIRTUAL_ENV from ocrd/all
# keep PREFIX and VIRTUAL_ENV from ocrd/all (i.e. /usr/local)
# but export them for COPY etc
ENV PREFIX=$PREFIX
ENV VIRTUAL_ENV=$VIRTUAL_ENV
ENV HOME=/

# must mount a host-side directory for ocrd-resources
VOLUME /models
# override XDG_DATA_HOME from ocrd/all (i.e. /usr/local/share)
ENV XDG_DATA_HOME=/models
# override TESSDATA_PREFIX from ocrd/all
ENV TESSDATA_PREFIX=$XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize
RUN mkdir $TESSDATA_PREFIX
RUN mv /usr/local/share/tessdata/*.traineddata $TESSDATA_PREFIX
# must mount a host-side directory for ocrd/resource.yml
VOLUME /config
ENV XDG_CONFIG_HOME=/config
@@ -55,16 +59,6 @@ EXPOSE 22
WORKDIR /build

RUN ln /usr/bin/python3 /usr/bin/python
# prevent make from updating the git modules automatically
ENV NO_UPDATE=1
#
# update to core#652 (workflow server)
RUN git -C core fetch origin pull/652/head:workflow-server
RUN git -C core checkout workflow-server
RUN for venv in $VIRTUAL_ENV $VIRTUAL_ENV/sub-venv/*; do . $venv/bin/activate && make -C core install PIP_INSTALL="pip install -e"; done
# update ocrd-import
RUN git -C workflow-configuration pull origin master
RUN . $VIRTUAL_ENV/bin/activate && make -C workflow-configuration install
# configure writing to ocrd.log for profiling
COPY ocrd_logging.conf /etc

19 changes: 2 additions & 17 deletions README.md
@@ -15,7 +15,6 @@
* [Starting and mounting](#starting-and-mounting)
* [General management](#general-management)
* [Processing](#processing)
* [Workflow server](#workflow-server)
* [Data transfer](#data-transfer)
* [Parallel options](#parallel-options)
* [Logging](#logging)
@@ -86,19 +85,6 @@ Subsequently, you can use these models on your `DATA` files:
# or equivalently:
ssh -p 8022 ocrd@controller "ocrd-tesserocr-recognize -m some-document/mets.xml -P segmentation_level region -P model Fraktur"

#### Workflow server

Currently, the OCR-D installation hosts an implementation of the [workflow server](https://github.com/OCR-D/core/pull/652),
which can be used to significantly reduce initialization overhead when running the same workflow repeatedly
on many workspaces (especially with GPU-bound processors):

ssh -p 8022 ocrd@controller "ocrd workflow server -j 4 -t 120 'tesserocr-recognize -P segmentation_level region -P model Fraktur'"

And subsequently:

ssh -p 8022 ocrd@controller "ocrd workflow client process -m some-document/mets.xml"
ssh -p 8022 ocrd@controller "ocrd workflow client process -m other-document/mets.xml"

### Data transfer

If your data files cannot be directly mounted on the host (not even as a network share),
@@ -122,7 +108,6 @@ For parallel processing, you can either
- issuing parallel commands –
* via basic shell scripting
* via [ocrd-make](https://bertsky.github.io/workflow-configuration) calls
* via [`ocrd workflow server --processes`](#workflow-server) concurrency
- run processes on multiple controllers.

Note: internally, `WORKERS` is implemented as a (GNU parallel-based) semaphore
@@ -137,5 +122,5 @@ All logs are accumulated on standard output, which can be inspected via Docker:

## See also

- [Meta-repo for integration of Kitodo.Production with OCR-D in Docker](https://github.com/markusweigelt/kitodo_production_ocrd)
- [Sister component OCR-D Manager](https://github.com/markusweigelt/ocrd_manager)
- [Meta-repo for integration of Kitodo.Production with OCR-D in Docker](https://github.com/slub/kitodo_production_ocrd)
- [Sister component OCR-D Manager](https://github.com/slub/ocrd_manager)