
Using Advanced Queue instead of ActiveMQ/Karaf/Alpaca #1746

Open
seth-shaw-unlv opened this issue Feb 1, 2021 · 9 comments
Labels
Type: enhancement Identifies work on an enhancement to the Islandora codebase

Comments

@seth-shaw-unlv
Contributor

During last week's Tech Call Kyle Huynh and Nat Kanthan demonstrated their new Triplestore Indexer.

Instead of relying on a message emitted to ActiveMQ being consumed by Karaf and Alpaca, it uses the Advanced Queue module to schedule a job and then uses an indexing action to push the JSON-LD into the triplestore. One of the big advantages of this approach is that the Advanced Queue module provides a user interface for tracking which jobs are pending, failed, or completed.
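
For anyone who hasn't looked at the Advanced Queue API, a minimal sketch of what such a job type plugin could look like is below. This is not the Triplestore Indexer's actual code; the module namespace, plugin ID, payload shape, serialization format, and triplestore endpoint are all illustrative assumptions.

```php
<?php

namespace Drupal\my_module\Plugin\AdvancedQueue\JobType;

use Drupal\advancedqueue\Job;
use Drupal\advancedqueue\JobResult;
use Drupal\advancedqueue\Plugin\AdvancedQueue\JobType\JobTypeBase;
use Drupal\node\Entity\Node;

/**
 * Pushes a node's JSON-LD into the triplestore.
 *
 * @AdvancedQueueJobType(
 *   id = "triplestore_index",
 *   label = @Translation("Triplestore index"),
 * )
 */
class TriplestoreIndex extends JobTypeBase {

  /**
   * {@inheritdoc}
   */
  public function process(Job $job) {
    $payload = $job->getPayload();
    $node = Node::load($payload['nid']);
    if (!$node) {
      return JobResult::failure('Node ' . $payload['nid'] . ' no longer exists.');
    }

    // Serialize the node as JSON-LD (format provided by the jsonld
    // module) and POST it to the triplestore. Endpoint and content
    // type are illustrative only.
    $jsonld = \Drupal::service('serializer')->serialize($node, 'jsonld');
    try {
      \Drupal::httpClient()->post('http://localhost:8080/bigdata/namespace/islandora/sparql', [
        'headers' => ['Content-Type' => 'application/ld+json'],
        'body' => $jsonld,
      ]);
      return JobResult::success('Indexed node ' . $payload['nid']);
    }
    catch (\Exception $e) {
      return JobResult::failure($e->getMessage());
    }
  }

}
```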

During the call it was suggested that this can be used as a pattern to replace other events we emit to ActiveMQ (e.g. Fedora indexing and derivatives). This would require:

  • Porting Alpaca to PHP actions (in place of the existing EmitEvent-based actions; we can probably copy/pasta some of the existing actions' form code to help with this)
  • Creating a new Context reaction that can populate the ported Alpaca actions into an Advanced Queue queue.

This way we keep all of our Context conditions logic in place; we are simply swapping out the Context reaction.
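
To make the shape of that reaction concrete, here is a rough sketch, assuming the Context reaction plugin API and Advanced Queue's Queue/Job classes. The plugin ID, queue machine name, and payload are placeholders rather than a proposed final design; a real implementation would presumably dispatch whatever actions are configured rather than enqueue one hard-coded job type.

```php
<?php

namespace Drupal\my_module\Plugin\ContextReaction;

use Drupal\advancedqueue\Entity\Queue;
use Drupal\advancedqueue\Job;
use Drupal\context\ContextReactionPluginBase;
use Drupal\Core\Entity\EntityInterface;

/**
 * Enqueues an Advanced Queue job instead of emitting a STOMP message.
 *
 * @ContextReaction(
 *   id = "advancedqueue_index",
 *   label = @Translation("Enqueue Advanced Queue indexing job")
 * )
 */
class AdvancedQueueIndexReaction extends ContextReactionPluginBase {

  /**
   * {@inheritdoc}
   */
  public function summary() {
    return $this->t('Enqueue an Advanced Queue indexing job.');
  }

  /**
   * {@inheritdoc}
   */
  public function execute(EntityInterface $entity = NULL) {
    if (!$entity) {
      return;
    }
    // Queue machine name and job type ID are illustrative.
    $queue = Queue::load('default');
    $job = Job::create('triplestore_index', ['nid' => $entity->id()]);
    $queue->enqueueJob($job);
  }

}
```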

One of the limitations of this proposal is the 'turn-around time' between performing an action (creating a node, etc.) and the resulting actions (e.g. indexing). Currently, Karaf constantly polls ActiveMQ, which can make the triggered actions seem nearly instantaneous (unless under significant load or for intensive actions like large image derivatives); whereas Advanced Queue needs either a cron run OR a drush command to perform its work. That said, there are ways to treat drush commands like a daemon to get the same effect we currently get with Karaf. For example, with the Triplestore Indexer, you can configure it to run for a certain amount of time, e.g. just under a minute, and then configure cron to run drush for that particular queue just as often, i.e. every minute. There are probably better ways to daemonize drush, but this would probably work well enough.
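
As a concrete (and untested) illustration of the cron angle: the queue could be handed to Advanced Queue's processor service from a plain hook_cron implementation, so every cron run drains whatever has accumulated. The module name, queue ID, and the assumption that the advancedqueue.processor service is the right entry point are mine, and Advanced Queue may already cover this for queues whose processor is set to Cron.

```php
<?php

use Drupal\advancedqueue\Entity\Queue;

/**
 * Implements hook_cron().
 *
 * Processes the indexing queue on every cron run so jobs don't sit
 * around waiting for a manual drush invocation. Queue ID and service
 * name are illustrative assumptions.
 */
function my_module_cron() {
  $queue = Queue::load('triplestore_index_queue');
  if ($queue) {
    // Claims and processes jobs, honouring the queue's configured
    // processing time limit.
    \Drupal::service('advancedqueue.processor')->processQueue($queue);
  }
}
```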

Did I miss anything? Other thoughts or corrections?

@whikloj
Member

whikloj commented Feb 2, 2021

I will still want to use ActiveMQ and Camel (though I'm working on dumping Karaf), so I'd like whatever solution we land on not to close off that possibility. As long as we can configure an external broker OR use Advanced Queue, I think that would be robust enough.

@seth-shaw-unlv
Contributor Author

We should be able to manage that, @whikloj. We would create new actions and a new Context reaction for those that want them while keeping the existing ones. The question would then be which we provide by default, which would involve getting feedback from several stakeholders; but we can cross that bridge when we get to it.

@seth-shaw-unlv
Contributor Author

Should have tagged @Natkeeran in the first post. I don't appear to have a GitHub handle for Kyle.

@whikloj
Member

whikloj commented Feb 2, 2021

I'm fine with defaulting to Advanced Queue and leaving ActiveMQ and Alpaca as an alternative option for those that want it.

@DiegoPino
Contributor

@seth-shaw-unlv just in case you have time to look at other projects' code: we have "daemonized" drush for our background processors in Archipelago (HOCR, any binary that runs on metadata conditionals/files/input, file transmutations, etc.) using queue workers and a hierarchical post-processor plugin system provided by Strawberry Runners. It was on our roadmap for a long time and the approach has been working well in 1.0.0-RC1 since it went public. We even have an open pull request now for multi-child processing using ReactPHP, written by @giancarlobi (we have been using it for a few months already). The approach is quite efficient and works perfectly. We decided not to go with the Advanced Queue module because core was enough for all these needs; adding an extra dependency would just make it all more complex for us to maintain.

Just wanted to put this here in case you want to look at our Drush (10) approach/code and our Background Service supervisor. Good luck

@seth-shaw-unlv
Contributor Author

Thanks for the tip, @DiegoPino. I don't know if I'll be the first one to tackle this issue as my stakeholder's to-do list is already quite long.

@kylehuynh205

> Should have tagged @Natkeeran in the first post. I don't appear to have a GitHub handle for Kyle.

Thanks Seth, mine is @kylehuynh205 (https://github.com/kylehuynh205)

@kylehuynh205

kylehuynh205 commented Apr 9, 2021

> @seth-shaw-unlv just in case you have time to look at other projects' code: we have "daemonized" drush for our background processors in Archipelago (HOCR, any binary that runs on metadata conditionals/files/input, file transmutations, etc.) using queue workers and a hierarchical post-processor plugin system provided by Strawberry Runners. It was on our roadmap for a long time and the approach has been working well in 1.0.0-RC1 since it went public. We even have an open pull request now for multi-child processing using ReactPHP, written by @giancarlobi (we have been using it for a few months already). The approach is quite efficient and works perfectly. We decided not to go with the Advanced Queue module because core was enough for all these needs; adding an extra dependency would just make it all more complex for us to maintain.
>
> Just wanted to put this here in case you want to look at our Drush (10) approach/code and our Background Service supervisor. Good luck

Thanks for the great suggestions from @seth-shaw-unlv and @DiegoPino. We have developed a prototype module that works as a daemonized ReactPHP event loop. The runner can be configured to run at an interval, check whether the queues have any queued jobs, and then process the Advanced Queue(s). This helps run the queues automatically without manually running a Drush command or setting up a cron job.
Please find the module at: https://www.drupal.org/project/advancedqueue_runner
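
For readers who haven't seen the pattern, the core idea can be sketched as a standalone script (this is not the module's actual code): a ReactPHP event loop with a periodic timer that shells out to the Advanced Queue Drush command. The interval, drush path, queue ID, and exact command name are assumptions.

```php
<?php

// Minimal standalone sketch of a ReactPHP-based queue runner.
// Requires: composer require react/event-loop
require __DIR__ . '/vendor/autoload.php';

use React\EventLoop\Factory;

$loop = Factory::create();

// Every 30 seconds, ask Drush to process the queue. The real
// advancedqueue_runner module does considerably more (checking for
// queued jobs first, restart handling via cron, etc.).
$loop->addPeriodicTimer(30, function () {
  $output = shell_exec('/var/www/drupal/vendor/bin/drush advancedqueue:queue:process triplestore_index_queue 2>&1');
  echo date('c') . ' ' . trim((string) $output) . PHP_EOL;
});

$loop->run();
```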

@kstapelfeldt added the Type: enhancement label and removed the architecture label on Sep 25, 2021
@kylehuynh205

A few enhancements to our approach for the Blazegraph micro-service with Advanced Queue and the Runner:

  • We added a feature to re-run a failed job, with options for how many times to re-run it and the delay between each re-run.

(screenshot: retry configuration options)

  • After monitoring the Advanced Queue Runner, we found that the runner can sometimes be interrupted, e.g. if the server is rebooted. We added a feature to check for this and re-run it when cron runs. If cron is set up to run frequently on a Drupal site, the runner rarely needs to be restarted manually.

With the latest versions:
https://www.drupal.org/project/triplestore_indexer/releases/8.x-1.5-beta1
https://www.drupal.org/project/advancedqueue_runner/releases/8.x-1.1-alpha2
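
For anyone wiring this up themselves, the retry behaviour presumably maps onto Advanced Queue's JobResult, which can request a number of retries and a delay when a job fails. The sketch below shows that call; the class, plugin ID, and the 3-retries/60-second values are illustrative, and how the module's new options map onto these arguments is my assumption rather than a statement about its internals.

```php
<?php

namespace Drupal\my_module\Plugin\AdvancedQueue\JobType;

use Drupal\advancedqueue\Job;
use Drupal\advancedqueue\JobResult;
use Drupal\advancedqueue\Plugin\AdvancedQueue\JobType\JobTypeBase;

/**
 * Illustrative job type that asks the queue to retry on failure.
 *
 * @AdvancedQueueJobType(
 *   id = "retrying_blazegraph_index",
 *   label = @Translation("Retrying Blazegraph index"),
 * )
 */
class RetryingBlazegraphIndex extends JobTypeBase {

  /**
   * {@inheritdoc}
   */
  public function process(Job $job) {
    try {
      // ... push the job payload to Blazegraph here ...
      return JobResult::success();
    }
    catch (\Exception $e) {
      // On failure, request up to 3 retries with 60 seconds between
      // attempts.
      return JobResult::failure($e->getMessage(), 3, 60);
    }
  }

}
```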
