    original_trace = self.error_queues[error_index].get()
    msg = "\n\n-- Process %d terminated with the following error:\n" % error_index
    msg += original_trace
>   raise ProcessRaisedException(msg, error_index, failed_process.pid)
E   torch.multiprocessing.spawn.ProcessRaisedException:
E
E   -- Process 0 terminated with the following error:
E   Traceback (most recent call last):
E     File "/home/dagardner/work/morpheus/morpheus/models/dfencoder/multiprocessing.py", line 30, in _wrap
E       fn(i, *args)
E     File "/home/dagardner/work/morpheus/tests/dfencoder/test_dfencoder_distributed_e2e.py", line 176, in _run_test
E       assert min(losses) < LOSS_TARGETS[loss_type][ft] * LOSS_TOLERANCE_RATIO
E   AssertionError

../conda/envs/morpheus/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:160: ProcessRaisedException
Full env printout
No response
Other/Misc.
No response
Code of Conduct
I agree to follow Morpheus' Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report
Version
23.07
Which installation method(s) does this occur on?
Source
Describe the bug.
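The test in tests/dfencoder/test_dfencoder_distributed_e2e.py fails intermittently: the loss assertion in _run_test (min(losses) < LOSS_TARGETS[loss_type][ft] * LOSS_TOLERANCE_RATIO) is not always satisfied. When the test was run repeatedly, it first failed on the 51st iteration.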
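The report does not say how the test was repeated. One possible way to rerun a command until it fails is sketched below. The helper name `run_until_failure`, the iteration cap, and the exact pytest invocation are assumptions for illustration, not taken from the report. The test path comes from the traceback and is assumed to be relative to the repository root. Any extra pytest options the test may need are not stated here.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], max_iterations: int = 100) -> Optional[int]:
    """Run cmd repeatedly.

    Returns the 1-based iteration on which cmd first exited with a
    non-zero status, or None if every run succeeded.
    """
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Only print the output of the failing run.
            print(result.stdout)
            print(result.stderr, file=sys.stderr)
            return iteration
    return None


# Hypothetical invocation (assumption): run pytest on the failing test file
# from the repository root.
PYTEST_CMD = [
    sys.executable, "-m", "pytest",
    "tests/dfencoder/test_dfencoder_distributed_e2e.py",
]

# Example usage, left commented out so importing this file runs nothing:
# failed_on = run_until_failure(PYTEST_CMD, max_iterations=100)
# print("first failure on iteration:", failed_on)
```

Each run starts a fresh Python process, so no state carries over between iterations. A return value of `None` only means no failure was seen within the cap, not that the test is stable.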
Minimum reproducible example
Relevant log output