Suite freeze on non-existent xtrigger #3054

Closed
dwsutherland opened this issue Mar 31, 2019 · 5 comments
Labels: bug

@dwsutherland
Member

dwsutherland commented Mar 31, 2019

If an assigned xtrigger function does not exist (or can't be found), e.g.:

    [[xtriggers]]
        oopsie = not_a_function()
        clock_pt1h = wall_clock(offset=PT1H)

    [[dependencies]]
        [[[P1M]]]
            graph = """
@clock_pt1h => qux
@oopsie => qux
"""

Then the suite completely freezes with the following error (I added the "Xtrigger name" print):

Xtrigger name: not_a_function
2019-04-01T09:23:07+13:00 ERROR - the JSON object must be str, bytes or bytearray, not NoneType
        Traceback (most recent call last):
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 257, in start
            self.run()
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 1534, in run
            self.proc_pool.process()
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 179, in process
            self._proc_exit(proc, "", ctx, callback, callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 170, in _proc_exit
            self._run_command_exit(ctx, callback, callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 366, in _run_command_exit
            callback(ctx, *callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/xtrigger_mgr.py", line 265, in callback
            satisfied, results = json.loads(ctx.out)
          File "/usr/lib/python3.7/json/__init__.py", line 341, in loads
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
        TypeError: the JSON object must be str, bytes or bytearray, not NoneType
2019-04-01T09:23:07+13:00 ERROR - error caught: cleaning up before exit
2019-04-01T09:23:07+13:00 INFO - Suite shutting down - ERROR: the JSON object must be str, bytes or bytearray, not NoneType
2019-04-01T09:23:07+13:00 ERROR - [Errno 3] No such process
        Traceback (most recent call last):
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 257, in start
            self.run()
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 1534, in run
            self.proc_pool.process()
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 179, in process
            self._proc_exit(proc, "", ctx, callback, callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 170, in _proc_exit
            self._run_command_exit(ctx, callback, callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 366, in _run_command_exit
            callback(ctx, *callback_args)
          File "/home/sutherlander/repos/cylc8/lib/cylc/xtrigger_mgr.py", line 265, in callback
            satisfied, results = json.loads(ctx.out)
          File "/usr/lib/python3.7/json/__init__.py", line 341, in loads
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
        TypeError: the JSON object must be str, bytes or bytearray, not NoneType
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 283, in start
            self.shutdown('ERROR: ' + str(exc))
          File "/home/sutherlander/repos/cylc8/lib/cylc/scheduler.py", line 1727, in shutdown
            self.proc_pool.terminate()
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 268, in terminate
            os.killpg(proc.pid, SIGKILL)
        ProcessLookupError: [Errno 3] No such process
2019-04-01T09:23:07+13:00 INFO - DONE

This happens because the subprocess stdout is None, and the xtrigger callback method (in xtrigger_mgr.py) tries to parse it as JSON.
One solution would be to catch this TypeError as well:

@@ -261,7 +261,7 @@ class XtriggerManager(object):
         self.active.remove(sig)
         try:
             satisfied, results = json.loads(ctx.out)
-        except ValueError:
+        except (ValueError, TypeError):
             return
         LOG.debug('%s: returned %s' % (sig, results))
         if satisfied:
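
To illustrate why the existing `except ValueError` is not enough, here is a minimal standalone sketch (not the actual XtriggerManager code; `parse_xtrigger_output` is a hypothetical helper name). `json.loads(None)` raises TypeError, whereas malformed JSON text raises a ValueError subclass, so both need catching:

```python
import json


def parse_xtrigger_output(out):
    """Return (satisfied, results), or None if the output is unusable.

    Hypothetical helper for illustration only; `out` stands in for ctx.out.
    """
    try:
        # None input raises TypeError; malformed JSON raises
        # json.JSONDecodeError, which is a subclass of ValueError.
        satisfied, results = json.loads(out)
    except (ValueError, TypeError):
        return None
    return satisfied, results


print(parse_xtrigger_output(None))                 # no stdout at all
print(parse_xtrigger_output("not json"))           # garbage on stdout
print(parse_xtrigger_output('[true, {"x": 1}]'))   # well-formed output
```

Note that this only stops the crash; the missing function would still be silently treated as "not satisfied" unless it is also reported somewhere.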

Running in debug mode already shows the underlying problem:

2019-04-01T10:07:36+13:00 DEBUG - [xtrigger-func cmd] cylc-function-run not_a_function '[]' '{}' /home/sutherlander/baz
        [xtrigger-func ret_code] 1
        [xtrigger-func err]
        Traceback (most recent call last):
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 52, in get_func
            mod_by_name = __import__(mod_name, fromlist=[mod_name])
        ModuleNotFoundError: No module named 'not_a_function'
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "/home/sutherlander/repos/cylc8/bin/cylc-function-run", line 37, in <module>
            run_function(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 81, in run_function
            func = get_func(func_name, src_dir)
          File "/home/sutherlander/repos/cylc8/lib/cylc/subprocpool.py", line 57, in get_func
            mod_by_name = __import__(mod_name, fromlist=[mod_name])
        ModuleNotFoundError: No module named 'cylc.xtriggers.not_a_function'

I would create a pull request; however, shouldn't this be caught at suite validation (on run/reload)?
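
A rough sketch of what such a validation-time check could look like, assuming the lookup order suggested by the debug traceback above (a module named after the function first, then `cylc.xtriggers.<name>`) and assuming the function has the same name as its module. `find_xtrigger_func` is a hypothetical name, not an existing cylc API:

```python
import importlib


def find_xtrigger_func(func_name, package="cylc.xtriggers"):
    """Return the xtrigger callable, or raise ValueError if it can't be found.

    Hypothetical helper: the search order and the "function named after its
    module" convention are assumptions based on the traceback in this issue.
    """
    for mod_name in (func_name, f"{package}.{func_name}"):
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
        func = getattr(module, func_name, None)
        if callable(func):
            return func
    raise ValueError(f"xtrigger function not found: {func_name}")


try:
    find_xtrigger_func("not_a_function")
except ValueError as exc:
    print(f"validation error: {exc}")
```

Calling something like this for every entry under [[xtriggers]] at validation would turn the run-time crash into an up-front error message.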

@dwsutherland
Member Author

BTW, the xtrigger shows as not satisfied in the cylc show output:

(cylc8proto) sutherlander@cortex-vbox:baz$ cylc show baz 'qux.*'
title: Some Top family
description: some task qux
URL: (not given)

prerequisites (- => not satisfied):
  (None)

outputs (- => not completed):
  - qux.20170101T0000+13 expired
  - qux.20170101T0000+13 submitted
  - qux.20170101T0000+13 submit-failed
  - qux.20170101T0000+13 started
  - qux.20170101T0000+13 succeeded
  - qux.20170101T0000+13 failed

other:
  o  Clock trigger time reached ... True
  o  Triggers at ... 2017-01-01T00:00:00+13:00
  o  xtrigger "oopsie" ... NOT satisfied
  o  xclock "clock_pt1h" ... satisfied

@hjoliver
Member

Bug reproduced (end of last week, when you mentioned this @dwsutherland).

Yes, we should detect this on validation if possible, as well as not crash like this at run time.

@hjoliver
Member

Assigning optimistically to next release, as this is a bit nasty.

@wxtim
Member

wxtim commented Apr 4, 2019

Can we mark this completed and create a new issue for the testing discussed after #3056 was merged?

@dwsutherland
Member Author

Yes we can.
