In the near term, Caffeine will likely treat most errors as immediately fatal (ideally with a high-quality message as part of the error crash).
However, one particularly important error that IMO should not get this treatment is memory allocation failure. Unlike hardware failure, out-of-memory is a common condition when scaling problems in real production science, and it needs to be handled robustly by a production-quality runtime. It's even plausible that some applications might perform non-trivial recovery from allocation failure.
prif_allocate and prif_allocate_non_symmetric currently ignore the possibility of errors, and I suspect they crash in obscure ways upon memory exhaustion. IMO these two calls should be fixed to strictly adhere to Fortran error-handling semantics, specifically by returning a meaningful stat and errmsg (when provided) or crashing with a useful console message (when not provided). Ideally, the error message in either case should include status information about the initial and current state of the shared heaps, along with recommendations to the end user about how to resolve the problem.
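To make the requested behavior concrete, here is a rough C sketch of the stat/errmsg contract described above. It is not Caffeine's actual code: `sketch_heap`, `sketch_allocate`, and the status constants are hypothetical names, the heap is modeled as a simple byte budget on top of `malloc`, and the wording of the recommendation is only a placeholder. A NULL `stat` stands in for an absent Fortran `stat` argument.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; these are not actual PRIF constants. */
enum { SKETCH_STAT_OK = 0, SKETCH_STAT_OUT_OF_MEMORY = 1 };

/* Hypothetical bookkeeping for a fixed-size heap. */
typedef struct {
    size_t initial_bytes; /* capacity reserved at startup */
    size_t used_bytes;    /* bytes currently handed out */
} sketch_heap;

/*
 * Allocate nbytes from the heap budget.
 *   - Success: returns the block, sets *stat to OK if stat was provided,
 *     and leaves errmsg untouched.
 *   - Failure with stat provided: returns NULL, sets *stat to a nonzero
 *     code, and writes a (possibly truncated) message into errmsg if given.
 *   - Failure without stat: prints the message and terminates.
 */
void *sketch_allocate(sketch_heap *heap, size_t nbytes,
                      int *stat, char *errmsg, size_t errmsg_len)
{
    char msg[256];

    assert(heap->used_bytes <= heap->initial_bytes);

    if (nbytes <= heap->initial_bytes - heap->used_bytes) {
        void *p = malloc(nbytes ? nbytes : 1);
        if (p != NULL) {
            heap->used_bytes += nbytes;
            if (stat != NULL) *stat = SKETCH_STAT_OK;
            return p;
        }
    }

    /* Report the request, the heap state, and a hint for the end user. */
    snprintf(msg, sizeof msg,
             "out of memory: requested %zu bytes; heap size at startup "
             "%zu bytes, %zu bytes in use; try a larger heap",
             nbytes, heap->initial_bytes, heap->used_bytes);

    if (stat == NULL) {
        /* No stat provided: fail loudly rather than obscurely. */
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }

    *stat = SKETCH_STAT_OUT_OF_MEMORY;
    if (errmsg != NULL && errmsg_len > 0)
        snprintf(errmsg, errmsg_len, "%s", msg);
    return NULL;
}
```

A caller that wants to attempt recovery passes a `stat` pointer and checks it after the call; a caller that passes NULL gets the message on stderr and a nonzero exit status instead of an obscure crash.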