Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reinstall + fileinstall: async clash #274

Closed
oliver-sanders opened this issue Dec 5, 2023 · 1 comment · Fixed by cylc/cylc-flow#5868, #276 or metomi/rose#2744
Closed

reinstall + fileinstall: async clash #274

oliver-sanders opened this issue Dec 5, 2023 · 1 comment · Fixed by cylc/cylc-flow#5868, #276 or metomi/rose#2744
Assignees
Labels
bug Something isn't working
Milestone

Comments

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 5, 2023

Due to a recent change in cylc-flow, reinstallation is now async (note unreleased bug).

Unfortunately, this puts cylc-rose in an awkward situation because of the following call chain:

  • cylc-flow - reinstall (async)
  • cylc-rose - fileinstall
  • rose - JobRunner.run (async)

Because cylc-rose is sync, we can't bridge the gap between the two async islands of code.

Try reinstalling a workflow which uses a rose fileinstallation with cylc-rose installed, bang!

Example

# rose-suite.conf
[file:foo]
source=bar
# bar
whatever
# flow.cylc
whatever
$ cylc reinstall reload --yes
REINSTALLED reload/run1 from /net/home/h06/osanders/cylc-tmp/reload
PluginError: Error in plugin cylc.post_install.rose_opts
RuntimeError: This event loop is already running

Possible Solutions

1 - Propagate async/await

Bridge the gap by propagating async/await right the way from cylc-flow, through cylc-rose into rose.

Problem! If we try to make cylc-rose async (by making the install plugin interfaces async), then we end up making the glbl_cfg async (because the install plugins are called during loading of the global config). This would be prohibitively disruptive as every single glbl_cfg call would need to be wrapped in async which would cascade to the point of making pretty much all of the cylc-flow codebase async.

Dropping plugin support for the global config (not the worst idea, although arguably preprocessors should be plugins) solves the issue, but would require duplicate sync/async parsec implementations, messy.

2 - Push fileinstallation into a separate process

Out of sight out of mind.

If the async part of rose happened in a different process, then there wouldn't be any clash with cylc-flow.

But this means forking another process. Too expensive?

3 - Try to drop back into the event loop

We can access the running event loop via asyncio.get_event_loop(), so we can schedule the file installation to run in this loop.

The problem is that we can't await it which means that we have no way of knowing when it has completed.

You can't poll for completion because sleep is blocking to asyncio. You can't poll from a thread, because of the GIL.

With no way of knowing when fileinstallation is complete, cylc play would follow on from cylc install in a cylc vip command call, potentially whilst file installation is still taking place.

This patch makes the error go away, but it would introduce a subtle new bug
diff --git a/metomi/rose/job_runner.py b/metomi/rose/job_runner.py
index 6c8d1610..6627080e 100644
--- a/metomi/rose/job_runner.py
+++ b/metomi/rose/job_runner.py
@@ -175,19 +175,22 @@ class JobRunner:
                 The maximum number of jobs to run concurrently.
 
         """
-        running = []
         loop = asyncio.get_event_loop()
         loop.set_exception_handler(self.job_processor.handle_event)
-        loop.run_until_complete(
-            asyncio.gather(
-                self._run_jobs(running, job_manager, args, concurrency),
-                self._post_process_jobs(running, job_manager, args),
-            )
+        loop.create_task(
+            self._run(job_manager, *args, concurrency=concurrency)
         )
         dead_jobs = job_manager.get_dead_jobs()
         if dead_jobs:
             raise JobRunnerNotCompletedError(dead_jobs)
 
+    async def _run(self, job_manager, *args, concurrency=6):
+        running = []
+        asyncio.gather(
+            self._run_jobs(running, job_manager, args, concurrency),
+            self._post_process_jobs(running, job_manager, args),
+        )
+
     async def _run_jobs(self, running, job_manager, args, concurrency):
         """Run pending jobs subject to the concurrency limit.
 

But we can shift the awaiting part of the problem back to cylc-flow, icky, but, hopefully functional...

@oliver-sanders oliver-sanders added the bug Something isn't working label Dec 5, 2023
@oliver-sanders oliver-sanders added this to the 1.4.0 milestone Dec 5, 2023
@oliver-sanders oliver-sanders self-assigned this Dec 5, 2023
@oliver-sanders
Copy link
Member Author

I'm got a solution along the lines of (3) where we create the async tasks in the Rose code, and await them in the cylc-flow code.

It's slightly icky but it's simple and effective. The main caveat to be aware of is that file installation will happen after the cylc-rose plugin returns, so don't go poking at any files from within the cylc-rose plugin becuase they might not be there yet.

oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 6, 2023
* Await any background tasks started by a plugin before continuing.
  See cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-rose that referenced this issue Dec 6, 2023
* Rose will now take over the role of managing its event loop.
* Addresses cylc#274
oliver-sanders added a commit to oliver-sanders/rose that referenced this issue Dec 6, 2023
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 6, 2023
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/rose that referenced this issue Dec 6, 2023
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Dec 6, 2023
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Jan 26, 2024
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-rose that referenced this issue Jan 26, 2024
* Rose will now take over the role of managing its event loop.
* Addresses cylc#274
oliver-sanders added a commit to oliver-sanders/rose that referenced this issue Jan 26, 2024
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
oliver-sanders added a commit to oliver-sanders/cylc-rose that referenced this issue Jan 26, 2024
* Rose will now take over the role of managing its event loop.
* Addresses cylc#274
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Jan 26, 2024
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Jan 26, 2024
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/rose that referenced this issue Jan 31, 2024
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Feb 7, 2024
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue Feb 20, 2024
* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
oliver-sanders added a commit to oliver-sanders/cylc-rose that referenced this issue Feb 20, 2024
* Rose will now take over the role of managing its event loop.
* Addresses cylc#274
oliver-sanders added a commit to oliver-sanders/rose that referenced this issue Feb 20, 2024
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
MetRonnie pushed a commit to metomi/rose that referenced this issue Feb 22, 2024
* Allow Rose file installation to be called by code which already
  has an event loop running by scheduling coroutines to run in the
  background (i.e. schedule but don't await).
* The calling code can list these tasks using `asyncio.all_tasks()`
  and await them as appropriate.
* Addresses cylc/cylc-rose#274
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment