Error with cudaFreeHost(ptr) in syncedmem.hpp:30 #3053

Closed
LiberiFatali opened this issue Sep 10, 2015 · 16 comments · Fixed by #3073

Comments

@LiberiFatali

In my case, when the caffe model finishes predicting an image, the following error appears:

F0910 15:09:52.590445 21913 syncedmem.hpp:30] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
Aborted

It comes from CUDA_CHECK(cudaFreeHost(ptr)); in syncedmem.hpp.
Everything else works well. Any ideas for fixing this? I am using the latest code on master.
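
For reference, CUDA_CHECK is roughly the following glog-based wrapper (a simplified sketch along the lines of Caffe's device_alternate.hpp, not the verbatim source); it is where the "Check failed: error == cudaSuccess (11 vs. 0) invalid argument" line comes from:

#include <cuda_runtime.h>
#include <glog/logging.h>

// Simplified sketch of Caffe's CUDA_CHECK wrapper. CHECK_EQ is a glog
// assertion: when the CUDA call returns anything other than cudaSuccess,
// it aborts and prints "Check failed: error == cudaSuccess (<code> vs. 0)"
// followed by the CUDA error string.
#define CUDA_CHECK(condition) \
  do { \
    cudaError_t error = (condition); \
    CHECK_EQ(error, cudaSuccess) << " " << cudaGetErrorString(error); \
  } while (0)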

I'm using Ubuntu 14.04, NVIDIA driver 352.41, CUDA 7.5, and cuDNN v2.

@hberntsen

I get the same error in Python 2.7:

import caffe
#standard bvlc_alexnet
net = caffe.Net('deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)
caffe.set_mode_gpu()
exit

On Ubuntu 14.04, NVIDIA driver 346.82, CUDA 7. The error is encountered after the exit command.

@beniz

beniz commented Sep 12, 2015

I've seen this before, as well as a crash whenever a net is deleted after a predict. This is not guaranteed to help you immediately, but you may want to roll your Caffe tree back to one of these two commits and try your code again:

Hopefully, one of the versions above will work for you, thus helping to pinpoint the issue.

EDIT: d2f0457, which activates pinned memory, is crashing some of my setups at Net destruction, and this looks similar to the problem reported in this issue. Reverting the commit clears the bug.

@ronghanghu
Member

I think we have a bug here. I'll try to look into this today.

@ronghanghu
Member

@hberntsen @beniz I just took a look today. Can you reproduce the same error if you run caffe.set_mode_gpu() before creating your net? That is,

import caffe
#standard bvlc_alexnet
caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)
exit

Right now the mode is not (but actually should be) an attribute of the net that is set during creation (just like phase). In CPU mode host memory is allocated with malloc, while in GPU mode it is allocated with cudaMallocHost (introduced in #2903). So if you run caffe.set_mode_gpu() after creating a net, Caffe allocates host memory with malloc (since the mode is still CPU during net construction) but then tries to free it with cudaFreeHost instead of free when the net is destroyed, resulting in this error.
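
For illustration, the host allocation helpers added in #2903 look roughly like the sketch below (simplified, not the verbatim source; Caffe::mode() and CUDA_CHECK come from Caffe's own headers). The crash is the mismatch between the mode seen at allocation time and the mode seen at free time:

#include <cstdlib>
#include <cuda_runtime.h>

// Simplified sketch of the mode-dependent host allocation from #2903.
// If Caffe is in CPU mode when a blob is allocated but in GPU mode when
// it is destroyed, a pointer obtained from malloc() gets handed to
// cudaFreeHost(), which fails with "invalid argument".
inline void CaffeMallocHost(void** ptr, size_t size) {
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {        // mode at *allocation* time
    CUDA_CHECK(cudaMallocHost(ptr, size));  // pinned host memory
    return;
  }
#endif
  *ptr = malloc(size);                      // plain pageable memory
}

inline void CaffeFreeHost(void* ptr) {
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {        // mode at *free* time may differ
    CUDA_CHECK(cudaFreeHost(ptr));          // invalid for malloc'd pointers
    return;
  }
#endif
  free(ptr);
}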

@ronghanghu
Member

One solution is to always use cudaMallocHost to allocate host memory unless using a CPU_ONLY build. @shelhamer do you agree? Some docs regarding this function:

The cudaMallocHost operation under the hood is doing something like a malloc plus additional OS functions to "pin" each page associated with the allocation (making cudaMemcpy faster). These additional OS operations take extra time, as compared to just doing a malloc. And note that as the size of the allocation increases, the registration ("pinning") cost will generally increase as well.
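
A minimal sketch of that first option (always pin host memory unless built CPU_ONLY); this only illustrates the proposal above, not necessarily the change that eventually landed in #3073:

// Sketch of the proposed short-term fix: ignore the runtime mode so that
// allocation and deallocation can never disagree. Illustration only; the
// merged fix may differ.
inline void CaffeMallocHost(void** ptr, size_t size) {
#ifdef CPU_ONLY
  *ptr = malloc(size);
#else
  CUDA_CHECK(cudaMallocHost(ptr, size));  // always pinned in a GPU build
#endif
}

inline void CaffeFreeHost(void* ptr) {
#ifdef CPU_ONLY
  free(ptr);
#else
  CUDA_CHECK(cudaFreeHost(ptr));
#endif
}

The trade-off, per the quoted docs, is the extra pinning cost at allocation time, and (as noted further below) cudaMallocHost appears to require a usable CUDA device even when the net only ever runs in CPU mode.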

Alternatively, we could make mode an attribute of the net, but that involves more invasive changes, at the cost of an interface change and the risk of introducing new bugs.

@beniz

beniz commented Sep 12, 2015

@ronghanghu I believe this is indeed a good catch, thanks for the quick reaction! Though I cannot test-run immediately, my code appears to be calling set_mode_gpu after Net creation.

Alternatively, we could make mode an attribute of the net, but that involves more invasive changes, at the cost of an interface change and the risk of introducing new bugs.

If this gives the ability to have multiple nets in memory, some using the CPU and some using the GPU, then as a user of the caffe lib I would rate it as a very good feature to preserve (since, as far as I understand, this was working prior to enforcing the use of `cudaMallocHost`).

@ronghanghu
Member

If this gives the ability to have multiple nets in memory, some using the CPU and some using the GPU, then as a user of the caffe lib I would rate it as a very good feature to preserve (since, as far as I understand, this was working prior to enforcing the use of `cudaMallocHost`).

Eventually we would like to give mode and device to nets and layers, in the spirit of #1500. As a short-term fix, though, always using cudaMallocHost to allocate CPU memory, regardless of the net's mode, avoids this crash. cudaMallocHost seems to assume at least one GPU is present (I don't know why); we seem to be running into the same issue as mentioned in 46a431a.

However, although mode/device are currently not members of Net and Layer and are thus changeable at runtime, it is better to set them in advance and not change them during the lifetime of a net.

@hberntsen

@ronghanghu In that case the error does not appear. So setting the mode to GPU before loading the net circumvents the error.

@LiberiFatali
Author

I use caffe.Classifier to construct the net, not caffe.Net. So when I use

self.net = caffe.Classifier(MODEL_FILE, PRETRAINED) 
caffe.set_mode_gpu()

or

caffe.set_mode_gpu()
self.net = caffe.Classifier(MODEL_FILE, PRETRAINED)

this error is still there.

I will try the commits suggested above. Thanks.

@LiberiFatali
Author

I also tested on GPUs with and without a connected monitor, to see if the problem is freeing in-use GPU memory. Still got the error.

@ronghanghu
Member

@LiberiFatali I could not reproduce your error (here is the code snippet I used).

import caffe
caffe.set_mode_gpu()

model   = './caffe/models/bvlc_reference_caffenet/deploy.prototxt'
weights = './caffe/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
net = caffe.Classifier(model, weights)

impath = './caffe/examples/images/cat.jpg'
im = caffe.io.load_image(impath)
probs = net.predict([im])

It doesn't produce any error on a BVLC machine. Can you try out this code snippet on your machine?

@LiberiFatali
Author

@ronghanghu The code above works well on my machine. Setting GPU mode before creating the net solves it now.

@eldar

eldar commented Sep 23, 2015

I have exactly the same issue when cleaning up network instances in Matlab with a call to caffe.reset_all(). Placing caffe.set_mode_gpu(); before loading the model also solved it.

@cheer37

cheer37 commented Mar 4, 2016

I put set_mode before net creation, but I still get this error.
How can I solve this problem?
Thanks.

@ucasqcz

ucasqcz commented May 4, 2016

@cheer37, have you fixed the problem? I did not use the latest PR of caffe, and I get the same error:
Check failed: error == cudaSuccess (29 vs. 0) driver shutting down
How did you solve this problem?
Thanks.

@zhxjlbs

zhxjlbs commented Dec 19, 2016

@ucasqcz have you fixed the problem? I get the same error:
Check failed: error == cudaSuccess (29 vs. 0) driver shutting down
How did you solve this problem?
