Add unique_count algorithm #1612

upsj · 2022-02-04T23:34:21Z

For the common count -> allocate -> fill pattern (e.g. count_if/copy_if) there are some fill algorithms without a corresponding count algorithm, mainly unique_copy, unique_by_key_copy and reduce_by_key. So I want to suggest adding a count_unique algorithm with basically the same interface as unique, except for the return type:

difference_type thrust::count_unique(
        Policy exec,
        ForwardIterator first,
        ForwardIterator last,
        BinaryPredicate binary_pred)

The name probably needs a bit of discussion, since it doesn't actually count unique elements but runs of consecutive equal elements. unique has the same issue, though, so there is some prior art for it.
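To make the count -> allocate -> fill pattern concrete, here is a minimal host-side sketch using only the C++ standard library. It is not the Thrust implementation; `count_runs` and `compact_runs` are hypothetical names introduced for illustration, and `std::unique_copy` stands in for the fill step.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of Thrust): counts runs of consecutive
// elements for which pred(previous, current) holds.
template <typename ForwardIt, typename BinaryPred>
std::ptrdiff_t count_runs(ForwardIt first, ForwardIt last, BinaryPred pred) {
    if (first == last) {
        return 0;  // an empty range contains no runs
    }
    std::ptrdiff_t runs = 1;  // the first element always opens a run
    for (ForwardIt next = std::next(first); next != last; ++first, ++next) {
        if (!pred(*first, *next)) {
            ++runs;  // predicate failed, so *next opens a new run
        }
    }
    return runs;
}

// Hypothetical helper: count -> allocate -> fill.
inline std::vector<int> compact_runs(const std::vector<int>& in) {
    // 1. count: how many elements will the fill step write?
    const std::ptrdiff_t n =
        count_runs(in.begin(), in.end(), std::equal_to<int>{});
    // 2. allocate: exactly that many elements
    std::vector<int> out(static_cast<std::size_t>(n));
    // 3. fill: keep the first element of every run
    std::unique_copy(in.begin(), in.end(), out.begin(), std::equal_to<int>{});
    return out;
}
```

Calling `compact_runs` on `{1, 1, 2, 2, 2, 1, 3, 3}` allocates four elements, one per run, before the fill step writes anything.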

@allisonvacanti suggested using cub::DeviceRunLengthEncode::Encode to implement it, so I wanted to check whether there are any other approaches to consider before starting work on a PR. Another simple approach might be:

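// Pseudocode: pair each element with its successor.
auto zip_it = zip(it, it + 1);
if (size > 0) {
    // The first element opens a run; every adjacent pair that fails binary_pred opens another.
    return 1 + count_if(zip_it, zip_it + size - 1,
                        [binary_pred](auto a) { return !binary_pred(get<0>(a), get<1>(a)); });
} else {
    return 0;
}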
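The adjacent-pair idea above can be sketched on the host without a zip iterator. This version assumes a `std::vector` input and uses `std::inner_product` to walk the pairs (v[i], v[i + 1]); `unique_count_sketch` is a hypothetical name, and this is an illustration of the approach rather than the code that ended up in Thrust.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical sketch: 1 + the number of adjacent pairs that fail binary_pred.
template <typename T, typename BinaryPred>
std::ptrdiff_t unique_count_sketch(const std::vector<T>& v,
                                   BinaryPred binary_pred) {
    if (v.empty()) {
        return 0;  // no elements, no runs
    }
    // Walk [begin, end - 1) and [begin + 1, end) in lockstep, which visits
    // every adjacent pair exactly once. The initial value 1 accounts for the
    // run opened by the first element.
    return std::inner_product(
        v.begin(), std::prev(v.end()), std::next(v.begin()),
        std::ptrdiff_t{1},
        std::plus<std::ptrdiff_t>{},
        [&binary_pred](const T& a, const T& b) -> std::ptrdiff_t {
            return binary_pred(a, b) ? 0 : 1;
        });
}
```

A single-element input yields two empty pair ranges, so the result is just the initial value 1.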

jrhemstad · 2022-02-06T17:08:34Z

CC @codereport on naming bikeshedding. I'd advocate for unique_count so it sorts next to the other unique_* algorithms.

I suppose cub::DeviceRunLengthEncode::Encode/NonTrivialRuns with discard iterators for the unique values/counts would work. I'd benchmark it against just doing count_if.

codereport · 2022-02-06T17:41:26Z

Long story short, I agree with @jrhemstad, unique_count is the correct name.

More details: we sort of had the opposite issue at one point in RAPIDS. We had an algorithm/API called unique_count but it was really not the right name in the C++ sense because it was counting the number of "unique" (not in the C++ sense) elements. Basically the equivalent in Python is:

def unique_count(list):
   return len(set(list))

We ended up changing the name to distinct_count.
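The difference between the two meanings only shows up on unsorted input. The following host-side sketch contrasts them; `distinct_count` and `run_count` here are hypothetical illustrations, not the RAPIDS or Thrust functions.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical: number of distinct values, like len(set(values)) in Python.
inline std::size_t distinct_count(const std::vector<int>& v) {
    return std::set<int>(v.begin(), v.end()).size();
}

// Hypothetical: number of runs of consecutive equal values, which is what
// "unique" means in the C++ sense.
inline std::size_t run_count(const std::vector<int>& v) {
    if (v.empty()) {
        return 0;
    }
    std::size_t runs = 1;  // the first element opens a run
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] != v[i - 1]) {
            ++runs;  // value changed, so a new run starts here
        }
    }
    return runs;
}
```

For `{1, 1, 2, 1}` the distinct count is 2 while the run count is 3; once the input is sorted to `{1, 1, 1, 2}`, both are 2.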

upsj · 2022-02-07T14:42:29Z

I will go with unique_count then. Should this be part of a new header or in unique/unique_by_key? Also it looks like head_flags already provides the necessary zip functionality wrapped neatly.

upsj changed the title ~~Add count_unique algorithm~~ Add unique_count algorithm Feb 13, 2022

upsj added a commit to upsj/thrust that referenced this issue Feb 13, 2022

add unique_count algorithm

6ad2aca

Add a counting equivalent to unique_* algorithms that can be used to allocate the correct amount of data before actually filling it. Addresses issue NVIDIA#1612

upsj added a commit to upsj/thrust that referenced this issue Feb 13, 2022

add unique_count algorithm

e0c9acf

upsj mentioned this issue Feb 13, 2022

Add unique_count algorithm #1619

Merged

upsj added a commit to upsj/thrust that referenced this issue Mar 11, 2022

add unique_count algorithm

7c5c3bb

upsj added a commit to upsj/thrust that referenced this issue Apr 17, 2022

add unique_count algorithm

8aecfe5

alliepiper closed this as completed in #1619 May 7, 2022

This was referenced Feb 8, 2024

♻️📝 Update mode example to use thrust::unique_count #1986

Closed

♻️📝 Update mode example to use thrust::unique_count NVIDIA/cccl#1354

Merged
