
Expected all tensors to be on the same device #172

Closed
TimRepke opened this issue May 22, 2024 · 1 comment · Fixed by #173

@TimRepke

There appears to be an issue when running the code from chapter 6 (other sections not tested):

Error

Traceback (most recent call last):
  File "/home/user/workspace/project/llm/tune_incl.py", line 359, in <module>
    train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 155, in train_classifier_simple
    loss = calc_loss_batch(input_batch, target_batch, model, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 112, in calc_loss_batch
    logits = model(input_batch)[:, -1, :]  # Logits of last output token
             ^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/util.py", line 173, in forward
    return logits
             ^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Cause

I narrowed it down to this line:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/gpt-class-finetune.py#L398

This line replaces the output layer after the model has already been moved to the GPU, so the new layer's parameters remain on the CPU and later trigger this error.
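A quick way to confirm this kind of mismatch (not from the issue, just an illustrative helper) is to group the model's parameter names by the device their tensors live on:

```python
import torch
import torch.nn as nn

def params_by_device(model: nn.Module) -> dict:
    """Group parameter names by the device their tensors live on."""
    groups: dict = {}
    for name, param in model.named_parameters():
        groups.setdefault(str(param.device), []).append(name)
    return groups

# Example: the "meta" device stands in for "cuda:0" so this runs without a GPU.
model = nn.Sequential(nn.Linear(4, 4, device="meta"), nn.Linear(4, 2))
print(params_by_device(model))  # two keys -> forward pass will fail
```

If the returned dict has more than one key, a forward pass will fail with exactly the RuntimeError shown above.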

Solution

The issue can be fixed by adding one line right after that statement:

[...]
num_classes = 2
model.out_head = torch.nn.Linear(in_features=BASE_CONFIG["emb_dim"], out_features=num_classes)
# add this to move all model parameters to GPU
model = model.to(device)
[...]
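For context, here is a self-contained sketch of both the failure and the fix. The tiny `nn.Sequential` model is a hypothetical stand-in for the book's GPT model, and the `"meta"` device stands in for `"cuda:0"` so the example runs on a machine without a GPU:

```python
import torch
import torch.nn as nn

# "meta" stands in for "cuda:0" so this runs without a GPU.
device = torch.device("meta")

# Hypothetical stand-in for the GPT model in the chapter.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
model = model.to(device)  # everything moved, as gpt-class-finetune.py does

# Replacing the head AFTER .to(device) leaves the new layer on the CPU:
model[1] = nn.Linear(8, 2)
print({str(p.device) for p in model.parameters()})  # two devices -> mismatch

# The fix from this issue: move the model again after swapping the head.
model = model.to(device)
print({str(p.device) for p in model.parameters()})  # a single device again
```

The same applies to any submodule created after the initial `.to(device)` call: freshly constructed layers always start on the CPU (unless a `device=` argument is passed), so the model must be moved again, or the new layer moved individually.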
@rasbt (Owner) commented May 22, 2024

Thanks! I must have only tested this script on CPU. (The notebook should work fine on both CPU and GPU)
