Slowly deprecate LaMachine in favour of more lightweight and distributed application-containers #214

Open
proycon opened this issue Feb 16, 2022 · 4 comments

@proycon
Owner

proycon commented Feb 16, 2022

If you're a user wondering where to migrate from LaMachine, read this comment: #214 (comment)


CLARIAH, the project in which LaMachine is embedded, has (finally) been moving
towards clearer software & infrastructure requirements in the past year. These
requirements focus on industry-standard solutions like application containers
(OCI containers; e.g. usable with Docker) and orchestration of these containers.

This path has some implications for LaMachine: as far as containerization is
concerned, it makes use of 'fat' containers, alongside other deployment
options (VM, local installation, etc.), whereas aspects such as orchestration
are deliberately left out of scope. LaMachine is essentially an
infrastructure solution that I provided because there was no common solution to
speak of at the project (CLARIAH) level.

As that changes and the new direction becomes clearer, LaMachine's purpose also
needs to be revisited. The current solution is likely to become obsolete.
This is not something that will happen overnight and users needn't worry, but I
want to embrace the new direction and slowly facilitate this. Eventually
LaMachine will be deprecated (or possibly continue in a very different
form).

A major factor contributing to the decision to deprecate LaMachine is the fact
that LaMachine currently tries to cater for a wide variety of environments (OS
variants, Linux distros, Python versions, etc.), which comes with a significant
maintenance cost and makes it more prone to breakage. This is no longer
sustainable in the long run.

Practically, deprecating LaMachine entails the following:

  • Provide lightweight application containers for all software participating in LaMachine:
    • provide a build file (Dockerfile/Containerfile) for each, in the upstream software repository. No need for Ansible in most cases.
    • publish the resulting images (on Docker Hub and in the (yet to be established) CLARIAH docker registry)
      • automate building and publishing using the continuous deployment infrastructure being set up for CLARIAH
    • prefer a single light-weight distribution like Alpine Linux and actively contribute packages where possible
  • Drop the explicit VM support; container platforms like Docker automatically virtualize on non-Linux systems anyway. The added value is too small to warrant the maintenance cost.
  • Python bindings that bind with native code (e.g. python-frog, python-ucto, python-timbl, colibri-core, analiticcl) should be provided as wheels and distributed via PyPI
  • Drop the 'native compilation' / virtualenv support. People using virtualenvs can use wheels directly with pip (see above point). The use of virtualenvs remains encouraged but it is no longer needed to 'hold the user's hand' here and set everything up.
  • Preserve various useful aspects of LaMachine. A lot of expertise has been gathered over the years; most of it should be perfectly reusable.
    • development vs stable distinction (stable draws from repositories like Alpine, PyPI, etc.; development builds/installs the latest git head source)
    • port and reuse existing configuration templates
  • Configure orchestration solutions (infrastructure-as-code) consisting of multiple components; here is a role for a possible LaMachine v3 if we want to reuse the name.
  • Limit macOS support; macOS users can use containers (virtualised Linux). Certain software will continue to be provided natively for macOS via homebrew (as it currently is), without the overarching LaMachine layer.
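
As a rough illustration of the first point, the sketch below writes a minimal Containerfile for a single tool and prints the command that would build it. It assumes the tool is available as an Alpine package and that the package name doubles as the executable name; the image tag `example/frog`, the file name `Containerfile.frog` and the `DRY_RUN` switch are hypothetical choices for illustration, not the actual upstream build files.

```shell
#!/bin/sh
# Sketch: generate a minimal Containerfile for one tool and build it.
# DRY_RUN=1 (the default here) only prints the build command.

write_containerfile() {
    # $1 = Alpine package name (assumed to also be the executable name)
    # $2 = output path
    cat > "$2" <<EOF
FROM alpine:latest
RUN apk add --no-cache $1
ENTRYPOINT ["$1"]
EOF
}

build_image() {
    # $1 = image tag (hypothetical), $2 = Containerfile path
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "docker build -t $1 -f $2 ."
    else
        docker build -t "$1" -f "$2" .
    fi
}

write_containerfile frog Containerfile.frog
build_image example/frog Containerfile.frog
```

Setting `DRY_RUN=0` runs the actual build instead of printing it. One small file per tool, kept in that tool's own repository, is the 'lightweight' counterpart to the single large LaMachine image.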

This issue is meant to track this progress and provide a place for discussion. End-users need not be alarmed yet at this stage.

@proycon
Owner Author

proycon commented Jul 18, 2022

Status update: this has been ongoing for a while. Various things that were in LaMachine are now containerized independently. I will slowly continue this as time progresses.

@proycon
Owner Author

proycon commented Jul 22, 2022

Where to migrate to from LaMachine?

LaMachine is a meta-distribution that provided an all-in-one solution for the installation of
a variety of software. Now that LaMachine is being deprecated, you will need another solution
to install the software you want. The best solution depends a lot on the
specific software you want to use, the system you are on, and your use case.

As there will be a little less hand-holding without LaMachine, we expect users
who want to install and locally use software to at least be familiar with common technologies
such as Python Virtual Environments and Docker containers.

This post intends to guide you to new solutions. It will point to where you
can find information on how to install specific software that was previously
handled by LaMachine. I will attempt to keep this comment up to date for a while:

  • Frog
    • (command-line interface) -> Frog
      • Alpine Linux: apk add frog
      • Docker: docker pull proycon/frog
      • macOS + homebrew: brew install frog
    • (from python) -> Frog for Python
      • pip install python-frog (use a virtual environment!)
    • (webservice) -> Frog Webservice (CLAM)
      • Docker: docker pull proycon/frog-webservice
  • Ucto
    • (command-line interface) -> ucto
      • Alpine Linux: apk add ucto
      • Docker: docker pull proycon/ucto
      • macOS + homebrew: brew install ucto
    • (from python) -> Ucto for Python
      • pip install python-ucto (use a virtual environment!)
    • (webservice) -> Ucto Webservice (CLAM)
      • Docker: docker pull proycon/ucto
  • Timbl
    • (command-line interface) -> timbl
      • Alpine Linux: apk add timbl
      • Docker: docker pull proycon/timbl
      • macOS + homebrew: brew install timbl
    • (from python) -> Python timbl
      • pip install python-timbl (use a virtual environment!)
  • Colibri Core
    • (command-line interface & python binding) -> colibri-core
      • Python: pip install colibricore
      • Docker: docker pull proycon/colibri-core (no python binding)
  • FoLiA tools/utilities
    • (command-line interface) -> FoLiA utils & FoLiA tools
      • Docker: docker pull proycon/foliautils
      • Python: pip install folia-tools
  • FLAT
    • Docker: docker pull proycon/flat
    • Python: pip install FoLiA-Linguistic-Annotation-Tool (but demands a lot of configuration, docker container recommended for a more out-of-the-box experience like LaMachine provided!)
  • DeepFrog
    • (command-line interface & rust library) -> deepfrog
      • Cargo: cargo install deepfrog (may have some issues currently)
  • Analiticcl
    • (command-line interface & rust library) -> analiticcl
      • Cargo: cargo install analiticcl
    • (python binding) -> analiticcl
      • pip install analiticcl
  • Oersetter
  • Dutch Speech Recognition: Kaldi-NL, asr-nl (formerly oral history), and forcedalignment2
    • (command line interface): Kaldi-NL
      • Docker: docker pull proycon/kaldi_nl
    • (webservice): asr-nl
      • Docker: docker pull proycon/asr_nl
    • (webservice): forcedaligment2
      • Docker: docker pull proycon/forcedaligment2
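
To make the pattern in the list above concrete, here is a small sketch of the two most common routes: a Python virtual environment for the bindings and containers for the command-line tools. The environment path `./lamachine-replacement-env` and the `run` helper are hypothetical. By default the script only prints the commands it would execute (`DRY_RUN=1`); set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: replace a LaMachine install with a virtual environment plus containers.

run() {
    # Print the command in dry-run mode (default), otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_python_bindings() {
    # $1 = virtual environment directory, remaining args = PyPI package names
    venv="$1"; shift
    run python3 -m venv "$venv"
    # Use the environment's own pip so nothing leaks into the system Python.
    run "$venv/bin/pip" install "$@"
}

pull_images() {
    # Each argument is an image name as listed above.
    for image in "$@"; do
        run docker pull "$image"
    done
}

install_python_bindings ./lamachine-replacement-env python-frog python-ucto
pull_images proycon/frog proycon/ucto
```

Calling the environment's own `pip` avoids having to activate the environment inside a script; for interactive use, activating it first works just as well.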

For certain software, there are no convenient alternatives to LaMachine yet; solutions will hopefully emerge as needed.

LaMachine provided some software by CLARIAH/CLARIN partners; we now refer first and foremost to the partners:

  • Alpino -> Use the solutions provided by Groningen, or alternatively use my docker container (docker pull proycon/alpino) but without guarantees that it's up to date.
    • (webservice) -> Alpino Webservice (CLAM, with FoLiA support)
      • Docker: docker pull proycon/alpino_webservice

LaMachine also bundled a lot of third-party software like Jupyter Lab/Notebook,
PyTorch, Moses, TensorFlow, spaCy, FreeLing, CoreNLP, fastText, Nextflow. You will need to check
your distribution's or language's package manager or the upstream provider for
solutions.

Integrated environments that offer and interconnect multiple tools for researchers over the web,
as were already offered by LaMachine, will be offered instead by the larger
CLARIAH infrastructure. The Language and Speech Tools portal at CLST is a notable part of that infrastructure and will be kept up to date with services for much of the aforementioned software.

The deprecation of LaMachine does not mean it will suddenly become completely unavailable; it can still be used as-is. Things will keep working for a certain time until they break at some point due to divergences in the ecosystem. Such breakage will no longer be fixed, and users will be directed to the alternative solutions in this post instead.

@mhkuu

mhkuu commented Aug 24, 2023

Hey @proycon , I was just notified of this message by a co-worker. So a very belated "thank you very much" for your development and maintenance of LaMachine over the years! 🙏

@egpbos

egpbos commented May 17, 2024

Just in case anybody comes here looking for replacements: there has been a conda-forge package for ticcltools (and its dependency ticcutils) for some years. I am no longer involved in TICCL development, but the conda package is quite low maintenance, so I do keep the package synced with the ticcltools repo (i.e.: to my knowledge, it is up to date, but I haven't checked recently).

To install ticcltools this way with conda (or mamba): conda install ticcltools -c conda-forge.

If somebody wants to help maintain the conda-forge packages, wants to take over fully, or is simply interested, here they are. Feel free to contribute in any way:

https://github.com/conda-forge/ticcltools-feedstock
https://github.com/conda-forge/ticcutils-feedstock
