
Expected all tensors to be on the same device #172

Closed
TimRepke opened this issue May 22, 2024 · 1 comment · Fixed by #173

@TimRepke

There appears to be an issue when running the code from chapter 6 (other sections not tested):

Error

Traceback (most recent call last):
  File "/home/user/workspace/project/llm/tune_incl.py", line 359, in <module>
    train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 155, in train_classifier_simple
    loss = calc_loss_batch(input_batch, target_batch, model, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/tune_incl.py", line 112, in calc_loss_batch
    logits = model(input_batch)[:, -1, :]  # Logits of last output token
             ^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/project/llm/util.py", line 173, in forward
    return logits
             ^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.venvs/main/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Cause

I narrowed it down to this line:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/gpt-class-finetune.py#L398

This line replaces the output layer after the model has already been moved to the GPU, so the new layer's parameters remain on the CPU and later trigger this error.
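A quick way to confirm this kind of mismatch (not from the issue, just an illustrative helper) is to group the model's parameter names by the device their tensors live on:

```python
import torch
import torch.nn as nn

def params_by_device(model: nn.Module) -> dict:
    """Group parameter names by the device their tensors live on."""
    groups: dict = {}
    for name, param in model.named_parameters():
        groups.setdefault(str(param.device), []).append(name)
    return groups

# Example: the "meta" device stands in for "cuda:0" so this runs without a GPU.
model = nn.Sequential(nn.Linear(4, 4, device="meta"), nn.Linear(4, 2))
print(params_by_device(model))  # two keys -> forward pass will fail
```

If the returned dict has more than one key, a forward pass will fail with exactly the RuntimeError shown above.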

Solution

The issue can be fixed by adding one line right after that statement:

[...]
num_classes = 2
model.out_head = torch.nn.Linear(in_features=BASE_CONFIG["emb_dim"], out_features=num_classes)
# add this to move all model parameters to GPU
model = model.to(device)
[...]
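For context, here is a self-contained sketch of both the failure and the fix. The tiny `nn.Sequential` model is a hypothetical stand-in for the book's GPT model, and the `"meta"` device stands in for `"cuda:0"` so the example runs on a machine without a GPU:

```python
import torch
import torch.nn as nn

# "meta" stands in for "cuda:0" so this runs without a GPU.
device = torch.device("meta")

# Hypothetical stand-in for the GPT model in the chapter.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
model = model.to(device)  # everything moved, as gpt-class-finetune.py does

# Replacing the head AFTER .to(device) leaves the new layer on the CPU:
model[1] = nn.Linear(8, 2)
print({str(p.device) for p in model.parameters()})  # two devices -> mismatch

# The fix from this issue: move the model again after swapping the head.
model = model.to(device)
print({str(p.device) for p in model.parameters()})  # a single device again
```

The same applies to any submodule created after the initial `.to(device)` call: freshly constructed layers always start on the CPU (unless a `device=` argument is passed), so the model must be moved again, or the new layer moved individually.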
@rasbt (Owner) commented May 22, 2024

Thanks! I must have only tested this script on CPU. (The notebook should work fine on both CPU and GPU)
