GPU Memory Usage for Multiple GPUs #1399
Should have been fixed by #507, unless something has caused a regression. Are you having this problem with current dev? In the medium term, this should become totally impossible with per-net device settings.
This issue is still unresolved. The DataLayer prefetch thread, "DataLayer::InternalThreadEntry" in data_layer.cpp, indirectly calls "cudaEventCreate" in benchmark.cpp. The whole story is that "DataLayer::InternalThreadEntry()" instantiates an object of class "CPUTimer". "CPUTimer" is derived from "Timer", and the constructor of "Timer" calls "Timer::Init", which in turn calls "cudaEventCreate". Since only the main thread calls caffe::Caffe::SetDevice, the prefetch thread defaults to GPU 0. On my machine, this results in a memory allocation of about 38 MiB on GPU 0 for each instance of Caffe (see my example below). In my application, I cannot run two instances of Caffe on a GTX 690 because that exceeds the 2048 MB limit, so I have had to fix this problem by modifying class CPUTimer so that it no longer inherits from Timer. If you find my fix acceptable, please let me know and I will create and submit a pull request.
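To make the chain concrete, here is a minimal sketch of the pattern described in that comment, with simplified stand-ins for the classes in src/caffe/util/benchmark.cpp; this is not the verbatim Caffe source:

```cpp
#include <cuda_runtime.h>

class Timer {
 public:
  Timer() { Init(); }  // constructor eagerly initializes CUDA events
  virtual ~Timer() {
    cudaEventDestroy(start_);
    cudaEventDestroy(stop_);
  }
 protected:
  void Init() {
    // In a thread that never called cudaSetDevice(), the first CUDA
    // runtime call creates a context on device 0, allocating tens of
    // MiB there regardless of which GPU the main thread selected.
    cudaEventCreate(&start_);
    cudaEventCreate(&stop_);
  }
  cudaEvent_t start_, stop_;
};

// The fix proposed above: make CPUTimer a standalone class instead of a
// Timer subclass, so constructing one in the prefetch thread touches no
// CUDA state at all.
class CPUTimer {
 public:
  CPUTimer() {}
  // ... wall-clock timing via the CPU only, e.g. clock_gettime() ...
};
```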
…as causing the allocation of 38 MiB of memory on GPU 0 when we instructed Caffe to run on another GPU.
I got the same issue, and finally found it to be caused by the use of OpenCV in preprocessing, particularly cv::merge, cv::subtract, and cv::split, which are common operations in Caffe.
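If, as this comment suggests, OpenCV calls in a worker thread are what lazily create the CUDA context (e.g. in a CUDA-enabled OpenCV build), one hedged workaround is to pin the thread to the intended device before the first such call. The sketch below is illustrative only; PreprocessOnDevice and device_id are made-up names, not Caffe or OpenCV APIs:

```cpp
#include <cuda_runtime.h>
#include <opencv2/core/core.hpp>
#include <vector>

// Hypothetical helper: select the intended GPU in the calling thread
// before any call that might lazily initialize CUDA, so an implicit
// context lands on device_id rather than on device 0.
void PreprocessOnDevice(int device_id, const cv::Mat& img, cv::Mat* out) {
  cudaSetDevice(device_id);  // per-thread; must precede any CUDA use
  std::vector<cv::Mat> channels;
  cv::split(img, channels);  // lazy CUDA init (if any) now hits device_id
  cv::subtract(channels[0], cv::Scalar(104), channels[0]);  // mean subtraction
  cv::merge(channels, *out);
}
```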
Hi, guys,
I'm wondering if anyone has posted this issue before. I'm working on a machine with two Tesla K40 GPUs. When I train a network on GPU 1, a thread on GPU 0 takes up a portion of GPU memory. I used to see this problem in early versions of Caffe, where the process on GPU 0 held a duplicate memory block while its GPU utilization stayed at 0. Now it only takes a portion of the total memory usage.
I guess it is because some initialization in the code is fixed to use GPU 0, but where is it?
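As the comment above about the prefetch thread explains, nothing need hard-code GPU 0 explicitly: the CUDA runtime's device selection is per-thread and defaults to device 0, so any spawned thread that makes a CUDA call without first calling cudaSetDevice() creates its context there. A self-contained sketch of that behavior (illustrative, not Caffe code):

```cpp
#include <cuda_runtime.h>
#include <thread>
#include <cstdio>

int main() {
  cudaSetDevice(1);  // main thread: use GPU 1
  std::thread worker([] {
    int dev = -1;
    cudaGetDevice(&dev);  // this thread never selected a device
    std::printf("worker thread device: %d\n", dev);  // prints 0
    cudaFree(0);  // first runtime call: context (and memory) on GPU 0
  });
  worker.join();
  return 0;
}
```

A common process-level workaround is to launch training with only the target GPU visible (e.g. by setting CUDA_VISIBLE_DEVICES), so even a stray context ends up on the intended card.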