This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

thrust::reduce_by_key error for 2^31 elements #1609

Closed
seunghwak opened this issue Jan 27, 2022 · 1 comment · Fixed by #1671
@seunghwak

Tested environment
GA102 (48 GB GPU memory)
CUDA 11.4
Thrust 1.15.0

The code below fails inside the 'reduce_by_key' call with a memory allocation failure: it tries to allocate 18446744073694639872 bytes (about 16 EiB, just under 2^64) and fails because my system has only 48 GB. This looks like a bug caused by 32-bit signed integer overflow.

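// Build with nvcc and --extended-lambda (the tabulate call uses a __device__ lambda).
#include <cstdint>
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/tabulate.h>

int main() {
  cudaDeviceSynchronize();
  std::cout << "test thrust reduce_by_key START" << std::endl;
  constexpr size_t num_elements = std::size_t{1} << 31;
  constexpr int32_t max_key = 8;
  thrust::device_vector<int32_t> int_values(num_elements);
  thrust::tabulate(thrust::device, int_values.begin(), int_values.end(), [max_key] __device__(auto i) {
    return static_cast<int32_t>(i % max_key);
  });
  thrust::sort(thrust::device, int_values.begin(), int_values.end());
  thrust::device_vector<int32_t> keys(max_key);
  thrust::device_vector<size_t> values(max_key);
  // The allocation failure is raised inside this call.
  thrust::reduce_by_key(thrust::device, int_values.begin(), int_values.end(),
                        thrust::make_constant_iterator(size_t{1}), keys.data(), values.data());
  cudaDeviceSynchronize();
  std::cout << "test thrust reduce_by_key END" << std::endl;
}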
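
For readers following along, the suspected overflow can be shown in isolation on the host. This is only a sketch and not Thrust's actual code path: `narrow_count` and `bytes_from_narrowed_count` are hypothetical helpers, and the sketch assumes a 64-bit `size_t`. It shows how a count of 2^31 becomes negative when stored in a 32-bit signed integer, and how that negative value turns into a request close to 2^64 bytes once it is converted back to an unsigned size. The exact byte count reported above depends on internals not shown here, so the sketch reproduces only the order of magnitude.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Thrust): stores an element count in a
// 32-bit signed integer. For values above INT32_MAX the result is
// implementation-defined before C++20 and wraps modulo 2^32 from C++20 on;
// mainstream compilers wrap in both cases.
std::int32_t narrow_count(std::size_t num_items) {
  return static_cast<std::int32_t>(num_items);
}

// Hypothetical helper (not part of Thrust): computes a byte size from the
// narrowed count and converts it back to size_t, as an allocator would see it.
// Assumes size_t is 64 bits wide.
std::size_t bytes_from_narrowed_count(std::size_t num_items,
                                      std::size_t bytes_per_item) {
  const std::int64_t count = narrow_count(num_items);
  const std::int64_t bytes = count * static_cast<std::int64_t>(bytes_per_item);
  // A negative byte count wraps to a value just below 2^64.
  return static_cast<std::size_t>(bytes);
}
```

With 2^31 - 1 items the helpers return the expected size, while 2^31 items yield a negative count and a byte size above 2^63, which no real device can satisfy.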
@alliepiper added the "type: bug: functional" and "P1: should have" labels on Jan 27, 2022
@alliepiper added this to the 1.17.0 milestone on Jan 27, 2022
@alliepiper added the "backend: CUDA" label on Apr 25, 2022
@gevtushenko
Collaborator

I can confirm the issue. Here's a reproducer that doesn't require a GPU with 48 GB of memory:

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/execution_policy.h>

int main() {
  constexpr size_t num_elements = std::size_t{1} << 31;
  constexpr char max_key = 8;
  thrust::device_vector<char> int_values(num_elements);
  thrust::tabulate(thrust::device, int_values.begin(), int_values.end(),
                   [max_key] __device__(auto i) {
                     return static_cast<char>(i % max_key);
                   });
  thrust::sort(thrust::device, int_values.begin(), int_values.end());
  thrust::device_vector<char> keys(max_key);
  thrust::device_vector<char> values(max_key);
  thrust::reduce_by_key(thrust::device, int_values.begin(), int_values.end(),
                        thrust::make_constant_iterator(char{1}), keys.data(),
                        values.data());
}

The issue is related to the following cast. I'll create a PR with a fix soon.
