ummunotify code paths broken #429
Over on #415, the possibility of removing the ummunotify code from OMPI was raised. I'd like to investigate the issue before we outright remove it.
I've just given ummunotify and openmpi-1.10.2 a try with the openib btl. I haven't got very far, certainly not to the point where I can witness the exact same bug as Joshua. Regarding what's been mentioned on #415, it very much seems that the ummunotify code in ompi is indeed broken. I've pulled ummunotify from the latest MOFED release where I found it shipped, which is MLNX_OFED_LINUX-2.2-1.0.1-debian7.2-x86_64/ (I'm on Debian). Anyway, given how little ompi does before crashing, that does not matter much. For the moment, BTL_OPENIB_MALLOC_HOOKS_ENABLED and -mca memory_linux_ummunotify_enable 1 cause a segfault in
I'm moving this to the future milestone since it's unlikely there will be anyone available to work on this issue in the foreseeable future.
Well, the future is here, and we're no longer considering using ummunotify or a similar package directly within Open MPI. Rather, we've enhanced our malloc hook methodology, and we still use memory registration caches within Open MPI itself.
@jladd-mlnx sent me mail a looooong time ago indicating that ummunotify code paths in OMPI are broken:
On Dec 11, 2013, at 10:51 AM, Joshua Ladd wrote:
Gentlemen,
ummunotify was recently added to MOFED, as a result, we are now observing rcache errors on our ConnectX-3 class of HCAs with both the OMPI 1.7.X and 1.6.X series. As far as I can tell, there is no way to disqualify ummunotify at configure time and the default runtime behavior is "-1" (enable it, if you have it.) The error goes away when we pass "-mca memory_linux_ummunotify_enable 0". We would like to disable ummunotify by default until this can be resolved. Do you guys have any objections to this?
Thanks,
Josh
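For reference, the workaround described in the mail above can be sketched as follows. This is only an illustration, assuming an Open MPI build from the affected series that still exposes the `memory_linux_ummunotify_enable` MCA parameter; the application name `./my_mpi_app` and the process count are hypothetical placeholders.

```shell
# Per-invocation form, as quoted in the mail (hypothetical app name and -np value):
#   mpirun -mca memory_linux_ummunotify_enable 0 -np 2 ./my_mpi_app

# Equivalent environment-variable form: Open MPI reads MCA parameters
# from variables carrying the OMPI_MCA_ prefix.
export OMPI_MCA_memory_linux_ummunotify_enable=0
```

The environment-variable form applies to every subsequent `mpirun` launched from the same shell, which can be handy when the command line is generated by a batch script.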