[Data] Add ds.distinct() API to get unique values in a column
#32984
Labels
data: Ray Data-related issues
enhancement: Request for new feature and/or capability
P1: Issue that should be fixed within a few weeks
Ray-2.6
Description
Currently, if we want to get the distinct values in a Ray Dataset column, we have to write the following code.
This is complicated and messy for users. I propose adding a distinct() API, similar to what PySpark provides:
Use case
For example, suppose we have an image classification dataset and want to collect all the unique labels in it. We could call this distinct() method to do so.
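To make the requested semantics concrete, here is a minimal pure-Python sketch of what collecting the unique labels could look like. It is only an illustration: `distinct_values` and the sample rows are hypothetical, the code does not use Ray, and it does not reflect how a Ray Data implementation would distribute the work across blocks.

```python
from typing import Any, Hashable, Iterable, List, Mapping


def distinct_values(rows: Iterable[Mapping[str, Any]], column: str) -> List[Hashable]:
    """Return the unique values of `column`, in first-seen order.

    Hypothetical helper that mimics the result a column-level distinct()
    might produce; values in the column must be hashable.
    """
    seen = set()
    ordered = []
    for row in rows:
        value = row[column]
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered


# Made-up rows standing in for an image classification dataset.
rows = [
    {"image_id": 0, "label": "cat"},
    {"image_id": 1, "label": "dog"},
    {"image_id": 2, "label": "cat"},
    {"image_id": 3, "label": "bird"},
    {"image_id": 4, "label": "dog"},
]

labels = distinct_values(rows, "label")
print(labels)
```

A distributed version would presumably compute per-block sets and merge them, but that design is an assumption and is not specified in this issue.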