[BUG] ROW_CONVERSION test fails on GB100 and CUDA 12.8 #2988

Open
ttnghia opened this issue Feb 25, 2025 · 0 comments
Labels
? - Needs Triage, bug (Something isn't working)

ttnghia (Collaborator) commented Feb 25, 2025

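On a machine with a GB100 GPU and CUDA 12.8, the ROW_CONVERSION test fails: 19 of its 28 test cases throw "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal". Full output: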

 7/13 Test  #7: ROW_CONVERSION ...................***Failed   27.08 sec
Running main() from /root/work/spark-rapids-jni/target/jni/cmake-build/_deps/gtest-src/googletest/src/gtest_main.cc
[==========] Running 28 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 13 tests from ColumnToRowTests
[ RUN      ] ColumnToRowTests.Single
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Single (450 ms)
[ RUN      ] ColumnToRowTests.SimpleString
[       OK ] ColumnToRowTests.SimpleString (1 ms)
[ RUN      ] ColumnToRowTests.DoubleString
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.DoubleString (0 ms)
[ RUN      ] ColumnToRowTests.BigStrings
[       OK ] ColumnToRowTests.BigStrings (2 ms)
[ RUN      ] ColumnToRowTests.ManyStrings
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.ManyStrings (3556 ms)
[ RUN      ] ColumnToRowTests.Simple
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Simple (2 ms)
[ RUN      ] ColumnToRowTests.Tall
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Tall (1 ms)
[ RUN      ] ColumnToRowTests.Wide
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Wide (20 ms)
[ RUN      ] ColumnToRowTests.SingleByteWide
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.SingleByteWide (19 ms)
[ RUN      ] ColumnToRowTests.Non2Power
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Non2Power (41 ms)
[ RUN      ] ColumnToRowTests.Big
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Big (993 ms)
[ RUN      ] ColumnToRowTests.Bigger
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Bigger (4513 ms)
[ RUN      ] ColumnToRowTests.Biggest
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] ColumnToRowTests.Biggest (4640 ms)
[----------] 13 tests from ColumnToRowTests (14244 ms total)

[----------] 15 tests from RowToColumnTests
[ RUN      ] RowToColumnTests.Single
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.Single (1 ms)
[ RUN      ] RowToColumnTests.Simple
[       OK ] RowToColumnTests.Simple (16 ms)
[ RUN      ] RowToColumnTests.Tall
[       OK ] RowToColumnTests.Tall (0 ms)
[ RUN      ] RowToColumnTests.Wide
[       OK ] RowToColumnTests.Wide (44 ms)
[ RUN      ] RowToColumnTests.SingleByteWide
[       OK ] RowToColumnTests.SingleByteWide (43 ms)
[ RUN      ] RowToColumnTests.AllTypes
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.AllTypes (1 ms)
[ RUN      ] RowToColumnTests.AllTypesLarge
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.AllTypesLarge (4374 ms)
[ RUN      ] RowToColumnTests.Non2Power
[       OK ] RowToColumnTests.Non2Power (38 ms)
[ RUN      ] RowToColumnTests.Big
[       OK ] RowToColumnTests.Big (473 ms)
[ RUN      ] RowToColumnTests.Bigger
[       OK ] RowToColumnTests.Bigger (2159 ms)
[ RUN      ] RowToColumnTests.Biggest
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.Biggest (4176 ms)
[ RUN      ] RowToColumnTests.SimpleString
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.SimpleString (0 ms)
[ RUN      ] RowToColumnTests.DoubleString
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.DoubleString (0 ms)
[ RUN      ] RowToColumnTests.BigStrings
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.BigStrings (2 ms)
[ RUN      ] RowToColumnTests.ManyStrings
unknown file: Failure
C++ exception with description "parallel_for failed: cudaErrorInvalidDevice: invalid device ordinal" thrown in the test body.
[  FAILED  ] RowToColumnTests.ManyStrings (1232 ms)
[----------] 15 tests from RowToColumnTests (12567 ms total)

[----------] Global test environment tear-down
[==========] 28 tests from 2 test suites ran. (26811 ms total)
[  PASSED  ] 9 tests.
[  FAILED  ] 19 tests, listed below:
[  FAILED  ] ColumnToRowTests.Single
[  FAILED  ] ColumnToRowTests.DoubleString
[  FAILED  ] ColumnToRowTests.ManyStrings
[  FAILED  ] ColumnToRowTests.Simple
[  FAILED  ] ColumnToRowTests.Tall
[  FAILED  ] ColumnToRowTests.Wide
[  FAILED  ] ColumnToRowTests.SingleByteWide
[  FAILED  ] ColumnToRowTests.Non2Power
[  FAILED  ] ColumnToRowTests.Big
[  FAILED  ] ColumnToRowTests.Bigger
[  FAILED  ] ColumnToRowTests.Biggest
[  FAILED  ] RowToColumnTests.Single
[  FAILED  ] RowToColumnTests.AllTypes
[  FAILED  ] RowToColumnTests.AllTypesLarge
[  FAILED  ] RowToColumnTests.Biggest
[  FAILED  ] RowToColumnTests.SimpleString
[  FAILED  ] RowToColumnTests.DoubleString
[  FAILED  ] RowToColumnTests.BigStrings
[  FAILED  ] RowToColumnTests.ManyStrings
ttnghia added the "? - Needs Triage" and "bug" labels Feb 25, 2025