fix: CUDA error 710 bugfix #1424
Conversation
@bowang007 Make sure to review this.
From my perspective, I see nothing wrong with sampling between
- Issue arising when compiling BERT models with 3+ inputs
- Added a temporary fix by decreasing the range of values allowed to the random number generator for creating input tensors to [0, 2), instead of [0, 5)
- Used random float inputs in the range [0, 2) instead of ints, then cast to the desired type. The ultimate effect of this change with regard to bug pytorch#1418 is that random floats are selected in the range [0, 2) and then cast to Int, effectively restricting the set of allowed ints to {0, 1}, as required by the model
- More robust fix to follow
8f41d83 to 595b9f4
// Make the value range for input tensor a uniform (float) distribution
// over [LoValIncl, HiValExcl), then cast to the desired dtype
auto in = ((HiValExcl - LoValIncl) * at::rand(shape, {at::kCUDA}) + LoValIncl).to(type);
Used float inputs in the range [0, 2), then cast to the desired dtype.
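As a sanity check, here is a minimal standalone sketch of the same idea (not the actual `shape_analysis` code) showing that uniform floats drawn from [0, 2) collapse to the integer set {0, 1} once cast. The bounds, tensor shape, and CPU device are placeholder assumptions chosen so the snippet runs on its own:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Placeholder bounds mirroring LoValIncl/HiValExcl from the patch.
  const double LoValIncl = 0.0;
  const double HiValExcl = 2.0;

  // Uniform floats over [LoValIncl, HiValExcl), then cast to Long (int64).
  // The real code samples on kCUDA with the input's shape; a small CPU tensor
  // is used here purely for illustration.
  auto in = ((HiValExcl - LoValIncl) * at::rand({2, 8}) + LoValIncl).to(at::kLong);

  // Every element is now 0 or 1, i.e. a valid token_type_ids value.
  std::cout << in << std::endl;
  std::cout << "min: " << in.min().item<int64_t>()
            << ", max: " << in.max().item<int64_t>() << std::endl;
  return 0;
}
```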
This seems a bit hard-coded for this model only, but it will be resolved once the input range is opened up to users via RFC #1425.
LGTM
Description
Resolves a CUDA 710 error arising when compiling BERT models with 3+ inputs. The issue arises due to the role of the third input tensor in inference computations. Specifically, as specified in the BERT model code linked here, the third argument, `token_type_ids`, is of type `torch.LongTensor`, but can only take indices in $[0, 1]$. This means that when values outside of this set are used, the input is invalid. This becomes problematic when the inputs are, for example, indices into a dictionary or embedding, which seems to be the case here: `aten::embedding` is used with tensors which are the product of `token_type_ids`. The issue traces to one line in the `shape_analysis` code, previewed below, which initializes a random tensor with values in the range $[0, 4]$. This tensor is run through the `forward` function of the module to determine the shapes of the outputs, and it causes the compile-time error, as featured here in the shape analysis code.

I have added a temporary fix by decreasing the range of values allowed to the random number generator for creating input tensors to $[0, 1]$, instead of $[0, 4]$, and am working on a more robust fix.
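To make the failure mode concrete, below is an illustrative sketch in plain libtorch (not Torch-TensorRT code) of what happens when `token_type_ids` falls outside {0, 1}. The embedding sizes and index values are hypothetical; the point is that an out-of-range index into BERT's token-type embedding table trips a device-side assert on GPU, which is what surfaces as CUDA error 710 (`cudaErrorAssert`):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // BERT's token_type_embeddings table has only 2 rows (token type vocab size = 2).
  // The sizes here are hypothetical stand-ins for the real model's.
  auto token_type_embeddings =
      torch::nn::Embedding(/*num_embeddings=*/2, /*embedding_dim=*/4);

  // Valid token_type_ids: every value is in {0, 1}.
  auto ok_ids = torch::tensor({{0, 1, 1, 0}}, torch::kLong);
  std::cout << token_type_embeddings(ok_ids).sizes() << std::endl;  // [1, 4, 4]

  // Ids sampled from [0, 4] may contain 2, 3, or 4, which index past the table.
  // On CPU this throws an out-of-range error; on CUDA the same lookup trips a
  // device-side assert, surfacing as CUDA error 710 (cudaErrorAssert).
  auto bad_ids = torch::tensor({{0, 3, 4, 1}}, torch::kLong);
  try {
    token_type_embeddings(bad_ids);
  } catch (const c10::Error& e) {
    std::cout << "embedding lookup failed: index out of range" << std::endl;
  }
  return 0;
}
```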
Fixes #1418
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: