
[WIP] Use tensor random generator #1094

Closed
wants to merge 5 commits into from

Conversation

msperber
Collaborator

@msperber msperber commented Nov 24, 2017

Use the tensor random generator instead of the standard one. This should fix speed issues on GPUs caused by generating random numbers on the CPU and then transferring them to the GPU, as was done previously.

I implemented this carefully, but it would be good if someone could review the code, as this involves core functionality and is hard to unit-test.

Some numbers on a CPU machine and a GPU machine:

e = dy.random_normal((1000,1000), batch_size=40)
e2 = dy.dropout(e, 0.5)
v = e2.value()

old, CPU: 8.45s
new, CPU: 8.08s
old, GPU: 6.33s
new, GPU: 3.48s

I suspect that most of the remaining 3.48s is DyNet overhead: in my particular use case, dropout went from consuming 90% of computation time to under 1%.

fixes #542, #438
related to #1059

  • Initial implementation using Tensor random generator
  • fix strange behavior on some GPUs

Contributor

@neubig neubig left a comment


Thanks a lot! This is long needed. I have a few comments, though.

std::uniform_int_distribution<> seed_dist(1, 2147483647);
Eigen::internal::UniformRandomGenerator<float> uni_rg(seed_dist(*rndeng));
m.tvec().device(*dev.edevice) = m.tvec().random(uni_rg);
m.tvec().device(*dev.edevice) = (m.tvec() < m.tvec().constant((1.f-p))).cast<float>() * 1.f / (1.f-p);
Contributor


Do we need *1.f here?

Collaborator Author


This is fixed now.

@@ -30,7 +30,10 @@ size_t Dropout::aux_storage_size() const {
template<class MyDevice>
void Dropout::forward_dev_impl(const MyDevice & dev, const vector<const Tensor*>& xs, Tensor& fx) const {
Tensor m(dim, (float*)aux_mem, fx.device, DeviceMempool::FXS);
TensorTools::randomize_bernoulli(m, (1.f-p), 1.f / (1.f-p));
std::uniform_int_distribution<> seed_dist(1, 2147483647);
Contributor


I'm not a huge fan of this code being copied everywhere. If it's something we'll want to use frequently, perhaps it could be made a global helper in the DyNet namespace or something similar?

Also, tangentially, there are some random number generation functions in tensor.h/tensor.cc that it might be reasonable to move to their own rand.h/rand.cc files, along with this one. What do you think?

Collaborator Author


That sounds good, I moved the random helpers to their own files and added a draw_random_seed() helper to avoid code duplication.

Contributor


Also, is there a reason why you didn't implement all of this in TensorTools? It seems like that would work as well. If there's a reason, that's no problem.

Collaborator Author


I didn't do that partly because TensorTools is also used for parameter initialization, which I didn't want to touch, and partly because I wasn't sure how to properly move over the device-related code.

@neubig
Contributor

neubig commented Nov 28, 2017

OK, I tested this on CPU and GPU with the current master branch and this PR using the attached script. It looks like master CPU/GPU and PR CPU are consistent, but the PR on GPU gives weird results. Could you try running the script and seeing what's up?

dynet-random.py.zip

@msperber
Collaborator Author

msperber commented Nov 28, 2017

Thanks for taking a look, that is strange indeed. random_normal() and random_uniform() look fine to me, but bernoulli and dropout do not, which would suggest that this line is broken on GPU:

m.tvec().device(*dev.edevice) = (m.tvec() < m.tvec().constant((1.f-p))).cast<float>() / (1.f-p);

The result of operator< is a tensor of bool, which might be the problem?

@msperber msperber changed the title Use tensor random generator [WIP] Use tensor random generator Dec 3, 2017
@neubig
Contributor

neubig commented Dec 19, 2017

OK, I'm getting very strange behavior even on RandomNormal. Almost exactly half the time it samples the exact same value, while the other half it samples from the appropriate distribution. This looks like a relatively major bug in the upstream Eigen implementation. I'll try to reproduce it in Eigen alone and see if they're interested in fixing it.

@neubig
Contributor

neubig commented Dec 19, 2017

FYI, we're having some discussions offline about implementing this directly in cuRAND, which will presumably be more stable than relying on Eigen, and potentially easier.

@neubig neubig mentioned this pull request Dec 22, 2017
@neubig
Contributor

neubig commented Jan 24, 2018

Closed in favor of #1154.

@neubig neubig closed this Jan 24, 2018
Successfully merging this pull request may close these issues.

Dropout speed issues