Caffe hang when creating data layer #3965
Comments
The problem appears to be solved by adding "batch_size: 1" to both the training and testing data layers, but I am still not sure why adding this prevents the hang. Any insight from you guys would be helpful!
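One possible explanation, sketched here as a toy model rather than as Caffe's actual code: if the reader thread pre-allocates prefetch * batch_size buffers and batch_size is left unset (so it is treated as 0), there are no free buffers to fill, the reader never produces anything, and the data layer waits on its blocking queue forever. The function name, the `prefetch` default, and the timeouts below are all assumptions made for illustration; the timeouts exist only so the demo terminates instead of hanging.

```python
import queue
import threading


def first_batch_arrives(batch_size, prefetch=4, timeout=0.5):
    """Toy model: return True if the consumer gets an item before the timeout."""
    free = queue.Queue()  # empty buffers the reader thread may fill
    full = queue.Queue()  # filled buffers the data layer consumes

    # Assumption: the reader pre-allocates prefetch * batch_size buffers.
    for _ in range(prefetch * batch_size):
        free.put(object())

    def reader():
        try:
            buf = free.get(timeout=timeout)  # waits if no free buffer exists
        except queue.Empty:
            return  # a real blocking queue would wait here indefinitely
        full.put(buf)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    try:
        full.get(timeout=timeout * 2)  # the data layer waiting for data
        return True
    except queue.Empty:
        return False
    finally:
        worker.join()


if __name__ == "__main__":
    print("batch_size=1:", first_batch_arrives(1))
    print("batch_size=0:", first_batch_arrives(0))
```

In this model, any positive batch_size gives the reader at least one buffer to fill, which would match the observation that adding "batch_size: 1" makes the hang go away.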
That being said, a better user interface would be to have Caffe raise an error instead of hanging. Feel free to PR this change.
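Until such a check exists in Caffe itself, a rough pre-flight check can be run on the net definition before training. The sketch below is not part of Caffe: the function name is hypothetical, it scans the prototxt text with regular expressions instead of parsing it properly, it only looks at new-style `layer { ... }` blocks with `type: "Data"`, and it does not handle `#` comments.

```python
import re


def data_layers_missing_batch_size(prototxt_text):
    """Return names of Data layers that have no positive batch_size."""
    missing = []
    for match in re.finditer(r'\blayer\s*\{', prototxt_text):
        # Walk forward to the brace that closes this layer block.
        depth, pos = 1, match.end()
        while pos < len(prototxt_text) and depth > 0:
            ch = prototxt_text[pos]
            depth += (ch == '{') - (ch == '}')
            pos += 1
        block = prototxt_text[match.end():pos]

        # Only Data layers are of interest here.
        if not re.search(r'\btype\s*:\s*"Data"', block):
            continue

        size = re.search(r'\bbatch_size\s*:\s*(\d+)', block)
        if size is None or int(size.group(1)) == 0:
            name = re.search(r'\bname\s*:\s*"([^"]*)"', block)
            missing.append(name.group(1) if name else "<unnamed>")
    return missing


if __name__ == "__main__":
    # Hypothetical net definition, used only to exercise the check.
    example = '''
    layer { name: "train_data" type: "Data" top: "data"
            data_param { source: "train_lmdb" backend: LMDB } }
    layer { name: "test_data" type: "Data" top: "data"
            data_param { source: "test_lmdb" batch_size: 1 backend: LMDB } }
    '''
    bad = data_layers_missing_batch_size(example)
    if bad:
        raise SystemExit("Data layers without batch_size: " + ", ".join(bad))
```

Failing fast with a message like this is the behavior suggested above: an explicit error naming the offending layer instead of a silent hang.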
@seanbell Sorry, I'm a first-timer on GitHub: what does "PR" mean?
It means to create a Pull Request. Some docs:
Can Caffe report the reason, e.g.
I created a very simple net to learn the division function "/" (the inputs are A and B, the label is A/B). However, when I try to run the trainer, it hangs forever. If I do killall caffe, I see that it is waiting for BlockingQueue. I searched around and found a mention (I didn't note down the source) that this might be caused by the training and testing phases sharing the same lmdb. So I copied the same data to separate training and testing folders, but the problem persists.
I'm wondering why it hangs, and how I should debug this problem.
Here is the console output:
Here is my solver.prototxt:
Here is my net.prototxt:
Here is how I generated the training and label data: