case when improvement: avoid copy_if_else #2079

res-life · 2024-05-28T11:17:48Z

closes #2084

Case when improvement: avoid repeated copy_if_else calls, each of which generates a string column.

For the following case when:

select
  case
    when bool_1_expr then "value_1"
    when bool_2_expr then "value_2"
    when bool_3_expr then "value_3"
    ......
    else "value_else"
  end
from tab

Current logic:

Iteratively invoke copy_if_else to merge the last two branches, working from the tail.
This incurs many memory (string column) operations; the example above introduces 3 copy_if_else calls.

      val elseRet = elseValue
        .map(_.columnarEvalAny(batch))
        .getOrElse(GpuScalar(null, branches.last._2.dataType))
      val any = branches.foldRight[Any](elseRet) {
        case ((predicateExpr, trueExpr), falseRet) =>
          computeIfElse(batch, predicateExpr, trueExpr, falseRet)
      }
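
For illustration, the fold above can be mimicked on the host. This is a minimal C++ sketch, not the plugin's code: `StringColumn`, `BoolColumn`, `copy_if_else` and `case_when_fold` are hypothetical stand-ins, nulls are not modeled, and the else scalar is expanded into a column for simplicity. It only shows that every branch materializes one more intermediate string column.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins for device columns (nulls are not modeled).
using StringColumn = std::vector<std::string>;
using BoolColumn   = std::vector<bool>;

// Stand-in for copy_if_else with a scalar "then" value: builds a new string
// column and counts how many columns have been materialized so far.
StringColumn copy_if_else(BoolColumn const& pred,
                          std::string const& if_true,
                          StringColumn const& if_false,
                          std::size_t& columns_built)
{
  StringColumn out(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i) {
    out[i] = pred[i] ? if_true : if_false[i];
  }
  ++columns_built;
  return out;
}

// Folds the branches from the right, mirroring the foldRight in the Scala snippet.
StringColumn case_when_fold(std::vector<BoolColumn> const& preds,
                            std::vector<std::string> const& values,
                            std::string const& else_value,
                            std::size_t rows,
                            std::size_t& columns_built)
{
  // Assumption for simplicity: the else scalar is expanded up front (not counted).
  StringColumn result(rows, else_value);
  for (std::size_t b = preds.size(); b-- > 0;) {
    result = copy_if_else(preds[b], values[b], result, columns_built);
  }
  return result;
}
```

With three branches the counter ends at 3, matching the three copy_if_else calls mentioned above.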

Improvement:

First, evaluate all the when expressions to get the bool columns.
Then, for each row, find the first true value across the bool columns and return its column index.
Then select scalars according to the index column.
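
The first two steps can be sketched on the host as follows. This is a minimal C++ illustration under stated assumptions, not the CUDA kernel from this PR: `BoolColumn` is a hypothetical stand-in for a nullable bool column, a null predicate is assumed to count as "not true", and a row with no true value is assumed to get the column count as its index.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical host-side stand-in for a nullable bool column (nullopt models null).
using BoolColumn = std::vector<std::optional<bool>>;

// For each row, returns the index of the first column holding true.
// Assumptions: null counts as "not true"; a row with no true value gets the
// column count, i.e. one past the last when branch.
std::vector<std::int32_t> select_first_true_index(
  std::vector<BoolColumn> const& when_bool_columns, std::size_t row_count)
{
  auto const num_columns = static_cast<std::int32_t>(when_bool_columns.size());
  std::vector<std::int32_t> result(row_count, num_columns);
  for (std::size_t row = 0; row < row_count; ++row) {
    for (std::int32_t col = 0; col < num_columns; ++col) {
      if (when_bool_columns[col][row].value_or(false)) {
        result[row] = col;
        break;  // later branches are ignored once one matches
      }
    }
  }
  return result;
}
```

If the else value is stored as the last entry of the scalar column, the "one past the last branch" index selects it in the third step.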

Implement 2 kernels to handle this:

/**
 * select the first column index with true value.
 * e.g.:
 * column 0 in table: true,  false, false
 * column 1 in table: false, true,  false
 * column 2 in table: false, false, true
 * 
 * return column: 0, 1, 2
*/
std::unique_ptr<cudf::column> select_first_true_index(
  cudf::table_view const& when_bool_columns,
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
 * Select strings in scalar column according to index column
 * scalar column: s0, s1, s2
 * index  column: 0,  1,  2,  2,  1,  0,  3
 * output column: s0, s1, s2, s2, s1, s0, null
 * 
*/
std::unique_ptr<cudf::column> select_from_index(
  cudf::strings_column_view const& then_and_else_scalar_column,
  cudf::column_view const& select_index_column,
  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

Signed-off-by: Chong Gao <res_life@163.com>

revans2 · 2024-05-28T13:00:31Z

My biggest concern is with side effects. To be clear, that is not an issue with this code; it is a more general problem with case/when that I want you to be aware of. Case/when and if/else in Spark are lazy. This means that if an expression in the when part has a side effect, such as throwing an exception, then we cannot evaluate it for any row that would trigger the exception. This appears to be specific to scalars in the case/when, so it should be fine.
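
To illustrate the general concern (not a problem observed in this PR), here is a minimal host-side C++ sketch. `checked_divide`, `lazy_case_when` and `eager_case_when` are hypothetical names; the throwing division stands in for any expression with a side effect. The expression modeled is `case when b = 0 then "zero" when a / b > 1 then "big" else "small" end`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an expression with a side effect.
int checked_divide(int a, int b)
{
  if (b == 0) { throw std::domain_error("division by zero"); }
  return a / b;
}

// Lazy: the second predicate runs only on rows the first predicate rejected.
std::vector<std::string> lazy_case_when(std::vector<int> const& a, std::vector<int> const& b)
{
  std::vector<std::string> out;
  out.reserve(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (b[i] == 0) {
      out.push_back("zero");
    } else if (checked_divide(a[i], b[i]) > 1) {
      out.push_back("big");
    } else {
      out.push_back("small");
    }
  }
  return out;
}

// Eager: every predicate is evaluated for every row before any selection.
std::vector<std::string> eager_case_when(std::vector<int> const& a, std::vector<int> const& b)
{
  std::size_t const n = a.size();
  std::vector<bool> is_zero(n);
  std::vector<bool> is_big(n);
  for (std::size_t i = 0; i < n; ++i) { is_zero[i] = (b[i] == 0); }
  // Throws on rows where b[i] == 0, although the first branch covers them.
  for (std::size_t i = 0; i < n; ++i) { is_big[i] = checked_divide(a[i], b[i]) > 1; }
  std::vector<std::string> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = is_zero[i] ? "zero" : (is_big[i] ? "big" : "small");
  }
  return out;
}
```

The lazy version returns a result for a row with `b = 0`, while the eager version throws on the same input.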

winningsix · 2024-05-29T06:07:15Z

@res-life created an issue for this. #2084

revans2 · 2024-05-29T13:18:30Z

src/test/java/com/nvidia/spark/rapids/jni/CaseWhenTest.java

@@ -42,6 +44,47 @@ void selectIndexTest() {
    }
  }

+  public static ColumnVector fromBooleansWithNulls(Boolean... values) {


Use https://github.com/rapidsai/cudf/blob/ff981a4048a389b0e2582e94d3397a83096d16c9/java/src/main/java/ai/rapids/cudf/ColumnVector.java#L1457 instead

res-life · 2024-05-30T07:56:54Z

build

ttnghia · 2024-05-30T17:49:49Z

Please provide benchmarks to better show how much benefit this can provide.

res-life · 2024-06-03T01:09:38Z

Please provide benchmarks to better show how much benefit this can provide.

Will do it.

res-life · 2024-06-03T09:06:21Z

For the end-to-end perf result, refer to:
NVIDIA/spark-rapids#10951 (comment)

@ttnghia Do we still need to add benchmark tests? The link above has the end-to-end result.

res-life · 2024-06-04T09:16:32Z

@ttnghia Please help review again.

ttnghia · 2024-06-04T18:02:22Z

src/main/cpp/src/case_when.cu

+  if (row_count == 0)  // empty begets empty
+    return cudf::make_empty_column(cudf::type_id::INT32);


Suggested change

-  if (row_count == 0)  // empty begets empty
-    return cudf::make_empty_column(cudf::type_id::INT32);
+  if (row_count == 0) {  // empty begets empty
+    return cudf::make_empty_column(cudf::type_id::INT32);
+  }

ttnghia · 2024-06-04T18:03:09Z

src/main/cpp/src/case_when.cu

+    cudf::data_type{cudf::type_id::INT32}, row_count, cudf::mask_state::ALL_VALID, stream, mr);
+
+  // select first true index
+  auto d_table = cudf::table_device_view::create(when_bool_columns, stream);


Suggested change

-  auto d_table = cudf::table_device_view::create(when_bool_columns, stream);
+  auto const d_table_ptr = cudf::table_device_view::create(when_bool_columns, stream);

ttnghia · 2024-06-04T18:23:53Z

src/main/cpp/src/case_when.hpp

+ * Select strings in scalar column according to index column.
+ * If index is out of bound, use NULL value
+ * e.g.:
+ *   scalar column: s0, s1, s2
+ *   index  column: 0,  1,  2,  2,  1,  0,  3
+ *   output column: s0, s1, s2, s2, s1, s0, NULL
+ *
+ */
+std::unique_ptr<cudf::column> select_from_index(
+  cudf::strings_column_view const& then_and_else_scalar_column,
+  cudf::column_view const& select_index_column,
+  rmm::cuda_stream_view stream        = cudf::get_default_stream(),
+  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());


Oh wait. I just realized that this is just a gather, so we don't need this function at all. Just call cudf::gather (through Java_ai_rapids_cudf_Table_gather), which already supports all data types.
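
To make the equivalence concrete, here is a minimal host-side C++ sketch of a gather that turns out-of-bounds indices into nulls. `gather_nullify` is a hypothetical helper written for illustration, not the cudf implementation; it mimics what `cudf::gather` does when called with `cudf::out_of_bounds_policy::NULLIFY`, with negative indices simply treated as out of bounds here. Because it is a template, the same code covers strings and any other element type.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical host-side gather: out[i] = source[gather_map[i]], or null
// (std::nullopt) when the index falls outside the source.
template <typename T>
std::vector<std::optional<T>> gather_nullify(std::vector<T> const& source,
                                             std::vector<std::int32_t> const& gather_map)
{
  std::vector<std::optional<T>> out;
  out.reserve(gather_map.size());
  for (std::int32_t const idx : gather_map) {
    if (idx >= 0 && static_cast<std::size_t>(idx) < source.size()) {
      out.emplace_back(source[static_cast<std::size_t>(idx)]);
    } else {
      out.emplace_back(std::nullopt);  // out of bounds becomes null
    }
  }
  return out;
}
```

Applied to the example in the doc comment above, the scalars `s0, s1, s2` with the index column `0, 1, 2, 2, 1, 0, 3` produce `s0, s1, s2, s2, s1, s0, NULL`.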

res-life · 2024-06-05T02:43:32Z

@ttnghia Thanks a lot, I will fix.

res-life · 2024-06-18T07:02:46Z

build

res-life · 2024-06-18T10:21:53Z

@ttnghia Please review again.

res-life · 2024-06-18T10:30:18Z

Spark-Rapids corresponding change:
NVIDIA/spark-rapids@7c43e69

// removed
val finalRet = CaseWhen.selectFromIndex(scalarCol, firstTrueIndex)

res-life · 2024-06-24T03:09:41Z

build

ttnghia · 2024-06-24T03:19:32Z

build

ttnghia

Sorry, there are redundant headers that need to be removed.

ttnghia · 2024-06-24T06:05:41Z

src/main/cpp/src/case_when.cu

+namespace spark_rapids_jni {
+namespace detail {
+
+/**
+ * Select the column index for the first true in bool columns for the specified row
+ */
+struct select_first_true_fn {


Nit: We should wrap anything that is used only locally in a source file in an anonymous namespace, to avoid future name clashes with other source files.

Done.
Added anonymous namespace
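
The pattern suggested in this thread could look like the sketch below. It is a host-side C++ illustration, not the code merged in this PR: the functor body and the plain `std::vector` types are assumptions made to keep the example self-contained. Only the helper functor sits in the anonymous namespace, so it gets internal linkage.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace spark_rapids_jni {
namespace detail {
namespace {  // internal linkage: names here are invisible to other source files

// Host-side stand-in for the device functor; the body is illustrative only.
struct select_first_true_fn {
  std::vector<std::vector<bool>> const& columns;

  std::int32_t operator()(std::size_t row) const
  {
    for (std::size_t col = 0; col < columns.size(); ++col) {
      if (columns[col][row]) { return static_cast<std::int32_t>(col); }
    }
    return static_cast<std::int32_t>(columns.size());
  }
};

}  // anonymous namespace

// This function keeps external linkage and can be called from other files.
std::vector<std::int32_t> select_first_true_index(
  std::vector<std::vector<bool>> const& columns, std::size_t row_count)
{
  select_first_true_fn const fn{columns};
  std::vector<std::int32_t> result(row_count);
  for (std::size_t row = 0; row < row_count; ++row) { result[row] = fn(row); }
  return result;
}

}  // namespace detail
}  // namespace spark_rapids_jni
```

Another source file can then define its own `select_first_true_fn` without a link-time clash.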

res-life · 2024-06-24T13:42:28Z

build

revans2

The code looks good, but what do the performance numbers look like?

wjxiz1992 · 2024-06-25T07:31:46Z

The perf numbers
NVIDIA/spark-rapids#10951 (comment)

res-life · 2024-06-25T13:06:11Z

Yes, please refer to the link above.
I retested against the latest code and got a similar result.

case when improvement: avoid copy_if_else

2bf976b

Signed-off-by: Chong Gao <res_life@163.com>

res-life requested review from revans2 and thirtiseven May 28, 2024 11:17

Chong Gao added 2 commits May 29, 2024 14:14

update doc

b8db222

Signed-off-by: Chong Gao <res_life@163.com>

Fix null handing bug

27a4e35

res-life marked this pull request as ready for review May 29, 2024 09:22

res-life mentioned this pull request May 29, 2024

[FEA][Performance] Merge multiple "copy_if_else" for "case when" in the case of multiple branches #2084

Closed

revans2 reviewed May 29, 2024

res-life changed the base branch from branch-24.06 to branch-24.08 May 29, 2024 14:25

Refactor code: remove duplicated code

644849d

winningsix mentioned this pull request May 31, 2024

Case when performance improvement: reduce the copy_if_else [databricks] NVIDIA/spark-rapids#10951

Merged

Merge branch 'branch-24.08' into case-when-perf

157fd86

ttnghia reviewed Jun 4, 2024