
Computing var/stddev and mean at the same time #18731

Closed
wants to merge 46 commits

Conversation

ifedan
Contributor

@ifedan ifedan commented Apr 2, 2019

The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
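For reference, a minimal usage sketch of the fused API this PR adds (the torch.std_mean/torch.var_mean names, return order, and keyword arguments follow the test snippet further down in this thread; the asserts are only a sanity check, not part of the PR):

import torch

x = torch.randn(4, 8)

# One fused reduction returns both statistics.
std, mean = torch.std_mean(x, dim=1, unbiased=True, keepdim=False)
var, mean2 = torch.var_mean(x, dim=1, unbiased=True, keepdim=False)

# Numerically the same as two separate reductions, without the extra kernel launch.
assert torch.allclose(std, x.std(dim=1))
assert torch.allclose(mean, x.mean(dim=1))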

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ifedan
Contributor Author

ifedan commented Apr 2, 2019

@pytorchbot retest this please

@vadimkantorov
Contributor

(About naming: a similar function in TensorFlow is called tf.nn.moments and computes mean+variance, but mean+std seems useful as well, so name matching wouldn't be perfect anyway.)
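For comparison, a rough sketch of both calls (assuming TF 2.x-style tf.nn.moments; note the return order differs from the std_mean proposed here):

import tensorflow as tf
import torch

# TensorFlow: mean first, then variance.
mean_tf, var_tf = tf.nn.moments(tf.random.normal([4, 8]), axes=[1])

# This PR: std (or var) first, then mean.
std, mean = torch.std_mean(torch.randn(4, 8), dim=1)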

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ifedan ifedan requested a review from gchanan April 2, 2019 22:36
@ifedan
Contributor Author

ifedan commented Apr 4, 2019

@pytorchbot retest this please

@t-vi
Collaborator

t-vi commented Apr 5, 2019

This looks nice. Do you have a rough indication of the relative performance between mean_std/mean_var and batch_norm_gather_stats?

@ifedan
Contributor Author

ifedan commented Apr 5, 2019

@pytorchbot retest this please

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ifedan ifedan requested a review from umanwizard May 9, 2019 14:28
@pytorchbot pytorchbot added module: autograd, module: cpu, module: cuda, module: docs, module: internals, module: operators, module: tests labels May 10, 2019
@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@umanwizard
Contributor

@ifedan I think there is an issue from previous reviews that was never fixed: Can we get rid of the c10::optional<IntArrayRef> stuff in make_dim_mask and make_reduction?

Then we can have the no-dim functions work the same way as sum and prod, by calling into the other ones with an empty list {} of dims, e.g.:

Tensor sum(const Tensor &self, ScalarType dtype) {
  return at::native::sum(self, {}, false, optional<ScalarType>(dtype));
}

@ifedan
Contributor Author

ifedan commented May 14, 2019

@umanwizard I ran some regression performance tests comparing the previous and the current implementation:

Results from master:

import timeit
SETUP_CODE = '''import torch; a = torch.randn(10000, 10);'''
TEST_CODE = '''torch.std(a, dim=1);'''
timeit.repeat(setup=SETUP_CODE, stmt=TEST_CODE, repeat=3, number=10)
[0.04082131403265521, 0.02319741202518344, 0.024329718959052116]

SETUP_CODE = '''import torch; a = torch.randn(10000, 10).cuda();'''
TEST_CODE = '''d1=torch.std(a, dim=1);print(d1[0])'''
timeit.repeat(setup=SETUP_CODE, stmt=TEST_CODE, repeat=3, number=10)
[0.013087920029647648, 0.015553846955299377, 0.015307638968806714]

Results from new implementation:

import timeit
SETUP_CODE = '''import torch; a = torch.randn(10000, 10);'''
TEST_CODE = '''torch.std(a, dim=1);'''
timeit.repeat(setup=SETUP_CODE, stmt=TEST_CODE, repeat=3, number=10)
[0.029049404023680836, 0.023604059999343008, 0.02448320999974385]

SETUP_CODE = '''import torch; a = torch.randn(10000, 10).cuda();'''
TEST_CODE = '''d1=torch.std(a, dim=1);print(d1[0])'''
timeit.repeat(setup=SETUP_CODE, stmt=TEST_CODE, repeat=3, number=10)
[0.01722334197256714, 0.015372609021142125, 0.01548504998208955]
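A sketch of an alternative CUDA timing that synchronizes explicitly instead of relying on print() to force a sync, and that also times the fused call against the two separate reductions (torch.cuda.synchronize() waits for queued kernels to finish; numbers would of course vary by machine):

import timeit
import torch

a = torch.randn(10000, 10).cuda()

def fused():
    torch.std_mean(a, dim=1)
    torch.cuda.synchronize()

def separate():
    torch.std(a, dim=1)
    torch.mean(a, dim=1)
    torch.cuda.synchronize()

print(timeit.repeat(fused, repeat=3, number=10))
print(timeit.repeat(separate, repeat=3, number=10))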

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@umanwizard umanwizard left a comment

LGTM, but can you check to see if Greg wants to look at it again before landing.

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot left a comment

@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 16, 2019
Summary:
The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
Pull Request resolved: pytorch/pytorch#18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
@facebook-github-bot
Contributor

@ifedan merged this pull request in 4c23c34.

for dim in range(x.dim()):
    for unbiased in [False, True]:
        for keepdim in [False, True]:
            std1, mean1 = torch.std_mean(x, dim=dim, unbiased=unbiased, keepdim=keepdim)

do you ever check multiple dims that aren't all dims? the constituent functions support that, right?
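For context, a sketch of what such a multi-dim check could look like, assuming the reduction accepts a tuple of dims (the follow-up tests in #20650 exercise this):

import torch

x = torch.randn(3, 4, 5)
std, mean = torch.std_mean(x, dim=(0, 2), unbiased=True, keepdim=False)
assert torch.allclose(std, x.std(dim=(0, 2)))
assert torch.allclose(mean, x.mean(dim=(0, 2)))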


- name: std_mean(Tensor self, IntArrayRef dim, bool unbiased, bool keepdim)
  self: var_std_mean_backward(grads, self, result0, result1, dim, unbiased, keepdim, true)


I thought you were going to remove these?

Repeating my previous comment:

but that's very different -- __rpow__ is a real Python function (see https://docs.python.org/2.0/ref/numeric-types.html), not a made-up function for testing.

I think the issue here is actually that common_methods_invocations.py doesn't work on methods. We should fix that! Can you take a look?

For the purposes of this PR, I'm okay with making a fake method, i.e. _std_mean, that just calls your function std_mean; you can write a comment about the situation.
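A sketch of the kind of shim being suggested; _std_mean here is hypothetical test-only naming, and it simply delegates to the real function:

import torch

# Hypothetical test-only shim: lets the method-based test harness exercise
# the free function torch.std_mean as if it were a Tensor method.
def _std_mean(self, *args, **kwargs):
    return torch.std_mean(self, *args, **kwargs)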

facebook-github-bot pushed a commit that referenced this pull request Jun 19, 2019
…20650)

Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: #18731
Pull Request resolved: #20650

Differential Revision: D15396101

Pulled By: ifedan

fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9