Allow 64-bit indexing for TMA instructions, with validation #3850

Merged
merged 68 commits into from
Feb 21, 2025
64b496f
Start implementing bounds-based index type calc
jacobhinkle Feb 6, 2025
85491cc
Fix up a bit.
jacobhinkle Feb 6, 2025
d2a0ce7
Move computation to after lowering in CompiledKernel ctor
jacobhinkle Feb 6, 2025
74d44d2
Fix dispatch of lowered kernel exprs
jacobhinkle Feb 7, 2025
1869f44
Fix some propagation
jacobhinkle Feb 7, 2025
e62266f
Fix dispatch. Still no parallel dims and some missing ops
jacobhinkle Feb 7, 2025
25b93ab
Bound namedscalars like parallel indices properly
jacobhinkle Feb 7, 2025
506d7bd
Implement xor
jacobhinkle Feb 7, 2025
ad2e69d
Fix bugs. Test passes
jacobhinkle Feb 7, 2025
d48912c
Remove most debug prints
jacobhinkle Feb 7, 2025
da5cb92
Merge remote-tracking branch 'origin/main' into jh/compute_index_type…
jacobhinkle Feb 7, 2025
9333054
WAR for toSmem swizzle evaluation problem
jacobhinkle Feb 7, 2025
6ebbc35
Add more bitwise ops
jacobhinkle Feb 7, 2025
efa8507
Check casts and cast TMA coords
jacobhinkle Feb 7, 2025
60f5f79
Remove check in canScheduleRuntime
jacobhinkle Feb 7, 2025
b861294
Fix unused variable
jacobhinkle Feb 7, 2025
9b8ec99
Set index type in kir::Kernel so that it affects generated code
jacobhinkle Feb 10, 2025
64f379b
Merge remote-tracking branch 'origin/main' into jh/compute_index_type…
jacobhinkle Feb 12, 2025
1fa0593
Place note about forcing index_type in heuristic
jacobhinkle Feb 12, 2025
0564558
Add summary.has_index_casts
jacobhinkle Feb 12, 2025
da20afd
Move to executor_utils::validateIndexCasts
jacobhinkle Feb 12, 2025
51a5f89
Lintrunner
jacobhinkle Feb 12, 2025
fa53d58
Undo more changes to executor.cpp
jacobhinkle Feb 12, 2025
d35b8c8
Remove commented code
jacobhinkle Feb 12, 2025
b80f62a
Add test of index type validation
jacobhinkle Feb 12, 2025
51f9ca7
More informative error message
jacobhinkle Feb 12, 2025
ea571c5
Annotate overridden methods with final
jacobhinkle Feb 12, 2025
5782e07
Improve "not yet implemented" error messages
jacobhinkle Feb 13, 2025
b37d6bc
Add abs
jacobhinkle Feb 13, 2025
f6d0590
Fix up operator/ and remove recip()
jacobhinkle Feb 14, 2025
c8b76b9
Fix countCommonHighBits in both branches of #if
jacobhinkle Feb 14, 2025
5dd7467
Improve comment on BoundedInt
jacobhinkle Feb 14, 2025
9effae0
Add comment on ScalarBoundsCalculator
jacobhinkle Feb 14, 2025
08eedd9
Unguard EpilogueFusionInt64Indexing test
jacobhinkle Feb 14, 2025
784b4e5
Use switch for better error message
jacobhinkle Feb 14, 2025
3ed95d3
Use push_back instead of emplace_back to fix clang build
jacobhinkle Feb 14, 2025
b55617c
Update csrc/runtime/executor_utils.cpp
jacobhinkle Feb 17, 2025
555ef76
Move code to new files interval_analysis.{cpp,h}
jacobhinkle Feb 17, 2025
4991ee6
Move Val compiler from test_expr_simplifier.cpp to own file
jacobhinkle Feb 18, 2025
4382e21
Add set helper for interval analysis tests
jacobhinkle Feb 18, 2025
7112749
Revert test_expr_simplifier to contain compiler code
jacobhinkle Feb 18, 2025
7badc08
Change visibility and use ScalarBoundsCalculator in tests
jacobhinkle Feb 18, 2025
bc003a1
Implement exhaustive checking for any number of inputs. Start BinaryO…
jacobhinkle Feb 18, 2025
26812eb
Fix errors in sub and div handling
jacobhinkle Feb 18, 2025
7c48fae
Add more tests
jacobhinkle Feb 18, 2025
5a3092b
Create ceilDiv function for BoundedInt
jacobhinkle Feb 18, 2025
3f3ae8d
More div tests
jacobhinkle Feb 18, 2025
ec1afab
Add missing check in BinaryOp::evaluate for Div
jacobhinkle Feb 18, 2025
0b4f427
Check for division or modulo by zero and ignore
jacobhinkle Feb 18, 2025
3bd0e7c
More tests
jacobhinkle Feb 18, 2025
f583de3
Test bitwise binary ops
jacobhinkle Feb 18, 2025
d42bf28
Fix bitwise binary ops and add shift tests
jacobhinkle Feb 18, 2025
0b83c10
Add Loops test
jacobhinkle Feb 18, 2025
9b8133c
Add ParallelLoops test
jacobhinkle Feb 19, 2025
3ff7f3f
Merge remote-tracking branch 'origin/main' into jh/compute_index_type…
jacobhinkle Feb 19, 2025
6cd63f8
clang-tidy, set default inits for BoundedInt, remove a leftover if st…
jacobhinkle Feb 19, 2025
4d9b4e3
Made dtor override instead of final
jacobhinkle Feb 19, 2025
8e96dad
Remove unneeded include to try and fix build on aarch64
jacobhinkle Feb 19, 2025
c1b5388
Add runtime check to avoid triggering failed validation
jacobhinkle Feb 20, 2025
3c51a54
Use macro to clean up bitwise ops
jacobhinkle Feb 21, 2025
9f5589a
Remove unused functions for computing index type
jacobhinkle Feb 21, 2025
e0239a8
Guard handle(LoadStoreOp*) to only handle integral scalars
jacobhinkle Feb 21, 2025
2f4a958
Rename boundNamedScalar->setBoundsForNamedScalar
jacobhinkle Feb 21, 2025
f17d340
Write macro to handle division like ops, fix mod and add tests
jacobhinkle Feb 21, 2025
2b819c5
Add example to comment on division by zero and split ranges
jacobhinkle Feb 21, 2025
9dc2db0
Remove NVF_API
jacobhinkle Feb 21, 2025
943f554
Rename as has_narrowing_index_casts
jacobhinkle Feb 21, 2025
95998df
Fix build
jacobhinkle Feb 21, 2025
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -157,6 +157,7 @@ list(APPEND NVFUSER_SRCS
     ${NVFUSER_SRCS_DIR}/id_model/validation_utils.cpp
     ${NVFUSER_SRCS_DIR}/index_compute.cpp
     ${NVFUSER_SRCS_DIR}/instrumentation.cpp
+    ${NVFUSER_SRCS_DIR}/interval_analysis.cpp
     ${NVFUSER_SRCS_DIR}/ir/base_nodes.cpp
     ${NVFUSER_SRCS_DIR}/ir/builder.cpp
     ${NVFUSER_SRCS_DIR}/ir/cloner.cpp
@@ -580,6 +581,7 @@ list(APPEND JIT_TEST_SRCS
     ${NVFUSER_ROOT}/tests/cpp/test_indexing.cpp
     ${NVFUSER_ROOT}/tests/cpp/test_indexing_advanced.cpp
     ${NVFUSER_ROOT}/tests/cpp/test_inlining.cpp
+    ${NVFUSER_ROOT}/tests/cpp/test_interval_analysis.cpp
     ${NVFUSER_ROOT}/tests/cpp/test_iter_visitor.cpp
     ${NVFUSER_ROOT}/tests/cpp/test_linked_hash_map.cpp
     ${NVFUSER_ROOT}/tests/cpp/test_loop_domain_scheduling.cpp
9 changes: 9 additions & 0 deletions csrc/index_compute.cpp
@@ -20,6 +20,7 @@
 #include <expr_simplifier.h>
 #include <instrumentation.h>
 #include <ir/all_nodes.h>
+#include <ir/builder.h>
 #include <ir/iostream.h>
 #include <ir/utils.h>
 #include <logical_domain_map.h>
@@ -2718,6 +2719,14 @@ std::pair<Val*, Val*> Index::getCpAsyncBulkGmemIndex(
   auto indices_inner_to_outer =
       indexer.getIndexFor(ldst, !is_load, ids_to_index, loops);

+  // These are the box coordinates of the TMA box, which must be of type
+  // int32_t. Possible overflow in each of these dims should be checked
+  // elsewhere.
+  for (size_t i : c10::irange(indices_inner_to_outer.size())) {
+    indices_inner_to_outer[i] =
+        IrBuilder::maybeCastExpr(DataType::Int32, indices_inner_to_outer[i]);
+  }
+
   auto coordinate = IrBuilder::arrayExpr(indices_inner_to_outer);
   auto descriptor = tma_info.tensorMap();
   if (is_load) {
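
The comment in the hunk above says overflow of the int32_t TMA box coordinates "should be checked elsewhere", and the commit list points to a bounds-based interval analysis for that check. The following is a minimal standalone sketch of the idea, not the code from this PR: `Interval`, `add`, `mul`, and `fitsInInt32` are hypothetical names chosen for illustration, and the sketch assumes the interval arithmetic itself does not overflow int64_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed integer interval [min, max]. This is an illustration
// only; it is not nvFuser's BoundedInt.
struct Interval {
  int64_t min;
  int64_t max;
};

// Bounds of a + b. Assumes the sums fit in int64_t.
inline Interval add(Interval a, Interval b) {
  assert(a.min <= a.max && b.min <= b.max);
  return {a.min + b.min, a.max + b.max};
}

// Bounds of a * b. The extremes of a product over two intervals are always
// attained at corner points, so it is enough to check all four corners.
// Assumes the products fit in int64_t.
inline Interval mul(Interval a, Interval b) {
  assert(a.min <= a.max && b.min <= b.max);
  const int64_t corners[4] = {
      a.min * b.min, a.min * b.max, a.max * b.min, a.max * b.max};
  Interval result{corners[0], corners[0]};
  for (int64_t c : corners) {
    result.min = std::min(result.min, c);
    result.max = std::max(result.max, c);
  }
  return result;
}

// True if every value in the interval survives a narrowing cast to int32_t.
inline bool fitsInInt32(Interval v) {
  return v.min >= std::numeric_limits<int32_t>::min() &&
      v.max <= std::numeric_limits<int32_t>::max();
}
```

As a usage example under the same assumptions: for a 65536 x 65536 tensor, the row and column coordinates each lie in `[0, 65535]` and pass `fitsInInt32`, while the linearized index `row * 65536 + col` has an upper bound of 4294967295 and does not. That is the situation the PR title describes: the kernel needs 64-bit indexing overall, yet each TMA coordinate can still be cast to int32_t safely once its bounds are validated.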