[WIP] Compute bounds for Index scalars in lowered kernel #3850

jacobhinkle · 2025-02-07T17:37:06Z

This extends #3599 by also computing the minimal dtype required by the expressions in the lowered kernel. As in #3599, we cast from nvfuser_index_t to int32_t when passing coordinates to the TMA expression. Unlike #3599, however, we verify that this is safe by checking the bounds of the inputs to those casts. This lets us use 64-bit indexing with TMA and know that we will not get silently incorrect results. We will also use 32-bit indexing more often: because TMA allows multi-dimensional indexing, index variables rarely take extremely large values.
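The following is a minimal sketch of why per-dimension coordinates are easier to fit in 32 bits than a linearized index. It is not code from this PR: `Interval`, `fitsInInt32`, `linearIndexBounds`, and `coordinateBounds` are hypothetical names chosen for illustration, and the sketch assumes positive extents small enough that the `int64_t` arithmetic itself does not overflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the PR's BoundedInt: a closed interval [min, max].
struct Interval {
  int64_t min;
  int64_t max;
};

// True when every value in the interval is representable as int32_t, so a
// narrowing cast of an expression with these bounds cannot change its value.
bool fitsInInt32(const Interval& b) {
  return b.min >= std::numeric_limits<int32_t>::min() &&
      b.max <= std::numeric_limits<int32_t>::max();
}

// Bounds of the linearized index row * num_cols + col, with
// row in [0, num_rows) and col in [0, num_cols).
Interval linearIndexBounds(int64_t num_rows, int64_t num_cols) {
  return Interval{0, (num_rows - 1) * num_cols + (num_cols - 1)};
}

// Bounds of a single per-dimension coordinate in [0, extent).
Interval coordinateBounds(int64_t extent) {
  return Interval{0, extent - 1};
}
```

For a 65536 x 65536 tensor, the linearized index can reach 2^32 - 1 and does not fit in `int32_t`, while each individual coordinate stays below 65536 and does.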

Fixes #3601

TODO: add a few tests

Need to figure out where to put this. I think we should not concern ourselves with index type during lowering at all, and only do this afterward.

github-actions · 2025-02-07T17:37:51Z

Review updated until commit 9b8ec99

Description

Added bounds-based index type calculation
Implemented BoundedInt struct for interval arithmetic
Created ScalarBoundsCalculator class for computing bounds of scalars
Updated KernelExecutor to compute index type after lowering

Changes walkthrough 📝

Relevant files

Enhancement

| File | Summary | Details | Lines |
|---|---|---|---|
| csrc/index_compute.cpp | Cast TMA box coordinates to `int32_t` | Included `ir/builder.h`; added casting of TMA box coordinates to `int32_t` | +9/-0 |
| csrc/runtime/executor.cpp | Compute bounds for index scalars | Included additional headers for expression evaluation; defined `BoundedInt` struct for interval arithmetic; implemented `ScalarBoundsCalculator` class for bounds computation; updated `KernelExecutor::compile` to compute index type after lowering | +556/-7 |
| csrc/scheduler/matmul_utils.cpp | Remove index type check for Hopper matmul | Removed index type check for Hopper matmul | +0/-8 |
| csrc/kernel.h | Add `setIndexType` method to `Kernel` | Included `type.h`; added `setIndexType` method to `Kernel` class | +5/-0 |

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests

⚡ Recommended focus areas for review

Possible Issue

The `recip` method of `BoundedInt` may not handle the case where `min` and `max` are both negative; the bounds returned in that case should be verified.

BoundedInt recip() const {
  if (!canBeZero()) {
    return BoundedInt{1L / max, 1L / min};
  }

  if (min == 0L) {
    if (max == 0) {
      return BoundedInt{};
    }
    return BoundedInt{1 / max, std::numeric_limits<int64_t>::max()};
  } else if (max == 0L) {
    return BoundedInt{std::numeric_limits<int64_t>::min(), 1L / min};
  } else {
    return BoundedInt{};
  }
}
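One way to examine this concern is a brute-force check of the zero-free branch. The sketch below is a standalone approximation, not code from the PR: `Interval` is a hypothetical stand-in for `BoundedInt`, and `recipNonZero` copies only the `!canBeZero()` branch. It relies on C++ integer division truncating toward zero.

```cpp
#include <cstdint>

// Hypothetical stand-in for BoundedInt: a closed interval [min, max].
struct Interval {
  int64_t min;
  int64_t max;
};

// Copy of the zero-free branch of recip(): assumes 0 is not in [min, max].
Interval recipNonZero(const Interval& b) {
  return Interval{1 / b.max, 1 / b.min};
}

// Enumerates every x in [b.min, b.max] and checks that the truncating
// quotient 1 / x lies inside the bounds computed by recipNonZero.
bool recipBoundsHold(const Interval& b) {
  const Interval r = recipNonZero(b);
  for (int64_t x = b.min; x <= b.max; ++x) {
    const int64_t q = 1 / x;
    if (q < r.min || q > r.max) {
      return false;
    }
  }
  return true;
}

// Runs the check on every sub-interval [a, b] of [lo, hi].
// The caller must choose lo and hi so that the range excludes 0.
bool allSubintervalsHold(int64_t lo, int64_t hi) {
  for (int64_t a = lo; a <= hi; ++a) {
    for (int64_t b = a; b <= hi; ++b) {
      if (!recipBoundsHold(Interval{a, b})) {
        return false;
      }
    }
  }
  return true;
}
```

Calling `allSubintervalsHold(-8, -1)` and `allSubintervalsHold(1, 8)` exercises the all-negative and all-positive cases; the branches for intervals containing zero are not covered by this sketch.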

Performance Concern

The `boundByDataType` method of `ScalarBoundsCalculator` does not act on bounds that exceed the `int32_t` limits: the range check at the end of the loop has an empty body. It should either report a more detailed error or handle the out-of-range case explicitly.

//! Return the bounds, computed over all scalars in the fusion with the given
//! data type
BoundedInt boundByDataType(DataType dtype = DataType::Index) {
  BoundedInt ret;
  bool initialized = false;
  for (auto& [val, b] : bounds_) {
    if (val->dtype() != dtype) {
      continue;
    }
    if (!initialized) {
      ret = b;
      initialized = true;
    } else {
      ret.min = std::min(ret.min, b.min);
      ret.max = std::max(ret.max, b.max);
    }
    if (b.min < std::numeric_limits<int32_t>::min() ||
        b.max > std::numeric_limits<int32_t>::max()) {
    }
  }
  return ret;
}
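As a hedged illustration of how the combined bounds could drive the index type decision, the sketch below takes the union of a set of intervals and picks a width. `Interval`, `IndexWidth`, and `chooseIndexWidth` are hypothetical names that stand in for `BoundedInt` and nvFuser's index data types; the PR's actual selection logic is not shown in this excerpt.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for BoundedInt: a closed interval [min, max].
struct Interval {
  int64_t min;
  int64_t max;
};

// Hypothetical stand-in for the two candidate index types.
enum class IndexWidth { Int32, Int64 };

// Takes the union of all intervals and returns Int32 only if the union is
// fully representable as int32_t; otherwise returns Int64.
IndexWidth chooseIndexWidth(const std::vector<Interval>& bounds) {
  if (bounds.empty()) {
    // Assumption for this sketch: with no index scalars, 32 bits suffice.
    return IndexWidth::Int32;
  }
  Interval u = bounds.front();
  for (const Interval& b : bounds) {
    u.min = std::min(u.min, b.min);
    u.max = std::max(u.max, b.max);
  }
  const bool fits = u.min >= std::numeric_limits<int32_t>::min() &&
      u.max <= std::numeric_limits<int32_t>::max();
  return fits ? IndexWidth::Int32 : IndexWidth::Int64;
}
```

Making the out-of-range case an explicit return value, rather than an empty branch, keeps the decision visible to the caller.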

Code Quality

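The `operator*` overloads of `BoundedInt` do not guard against overflow: the `int64_t` products can overflow, which is undefined behavior for signed integers in C++ and can yield invalid bounds. Overflow should be handled more robustly, possibly by using a larger data type or checked arithmetic for the intermediate calculations.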

BoundedInt operator*(const BoundedInt& other) const {
  // TODO: How should we handle overflow here?
  std::vector<int64_t> xs{
      min * other.min, min * other.max, max * other.min, max * other.max};
  return BoundedInt{
      *std::min_element(xs.begin(), xs.end()),
      *std::max_element(xs.begin(), xs.end())};
}

BoundedInt operator*(const int64_t other) const {
  if (other < 0L) {
    return BoundedInt{max * other, min * other};
  }
  return BoundedInt{min * other, max * other};
}
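One possible answer to the TODO above is to saturate on overflow, sketched below. This is an assumption about a reasonable policy, not the PR's chosen approach. `Interval`, `saturatingMul`, and `mulSaturating` are hypothetical names, and `__builtin_mul_overflow` is a GCC/Clang builtin, so the sketch is not portable to every compiler.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for BoundedInt: a closed interval [min, max].
struct Interval {
  int64_t min;
  int64_t max;
};

// Multiplies a and b, clamping to the int64_t limits when the exact product
// is not representable. Overflow implies both operands are non-zero, so the
// sign of the exact product is negative exactly when the signs differ.
int64_t saturatingMul(int64_t a, int64_t b) {
  int64_t result = 0;
  if (__builtin_mul_overflow(a, b, &result)) {
    const bool negative = (a < 0) != (b < 0);
    return negative ? std::numeric_limits<int64_t>::min()
                    : std::numeric_limits<int64_t>::max();
  }
  return result;
}

// Interval product: the extremes of x * y over two intervals occur at the
// corner products, each computed here with saturation.
Interval mulSaturating(const Interval& a, const Interval& b) {
  const int64_t p0 = saturatingMul(a.min, b.min);
  const int64_t p1 = saturatingMul(a.min, b.max);
  const int64_t p2 = saturatingMul(a.max, b.min);
  const int64_t p3 = saturatingMul(a.max, b.max);
  return Interval{std::min({p0, p1, p2, p3}), std::max({p0, p1, p2, p3})};
}
```

A saturated bound lies far outside the `int32_t` range, so a later 32-bit safety check would still reject it. Whether saturation is acceptable when the result is consumed as a 64-bit bound is a separate design question.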

jacobhinkle added 11 commits February 6, 2025 15:47

Start implementing bounds-based index type calc

64b496f

Fix up a bit.

85491cc

Need to figure out where to put this. I think we should not concern ourselves with index type during lowering at all, and only do this afterward.

Move computation to after lowering in CompiledKernel ctor

d2a0ce7

Fix dispatch of lowered kernel exprs

74d44d2

Fix some propagation

1869f44

Fix dispatch. Still no parallel dims and some missing ops

e62266f

Bound namedscalars like parallel indices properly

25b93ab

Implement xor

506d7bd

Fix bugs. Test passes

ad2e69d

Remove most debug prints

d48912c

Merge remote-tracking branch 'origin/main' into jh/compute_index_type…

da5cb92

…_by_bounds

jacobhinkle added 6 commits February 7, 2025 13:01

WAR for toSmem swizzle evaluation problem

9333054

Add more bitwise ops

6ebbc35

Check casts and cast TMA coords

efa8507

Remove check in canScheduleRuntime

60f5f79

Fix unused variable

b861294

Set index type in kir::Kernel so that it affects generated code

9b8ec99
