Llama 3.1: replace for loop by tensor ops at inv_freq initialization #32244

Merged: 2 commits, Jul 27, 2024. Showing changes from 1 commit.
28 changes: 16 additions & 12 deletions src/transformers/modeling_rope_utils.py
@@ -324,18 +324,22 @@ def _compute_llama3_parameters(

     low_freq_wavelen = old_context_len / low_freq_factor
     high_freq_wavelen = old_context_len / high_freq_factor
-    new_freqs = []
-    for freq in inv_freq:
-        wavelen = 2 * math.pi / freq
-        if wavelen < high_freq_wavelen:
-            new_freqs.append(freq)
-        elif wavelen > low_freq_wavelen:
-            new_freqs.append(freq / factor)
-        else:
-            assert low_freq_wavelen != high_freq_wavelen
-            smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
-            new_freqs.append((1 - smooth) * freq / factor + smooth * freq)
-    inv_freq = torch.tensor(new_freqs, dtype=inv_freq.dtype, device=inv_freq.device)
+    assert low_freq_wavelen != high_freq_wavelen
Review comment (Collaborator):
let's raise an error here please
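
One way to act on this comment, as a minimal sketch: swap the `assert` for an explicit `ValueError`. An `assert` is skipped when Python runs with `-O`, while a raised exception always fires. The helper name `check_wavelen_band` and the message wording are hypothetical and not taken from the PR.

```python
def check_wavelen_band(low_freq_wavelen: float, high_freq_wavelen: float) -> None:
    """Hypothetical helper (name and message are assumptions, not from the PR).

    Rejects configurations where both wavelength thresholds coincide, because
    the smoothing term divides by (high_freq_factor - low_freq_factor).
    """
    if low_freq_wavelen == high_freq_wavelen:
        raise ValueError(
            "`low_freq_wavelen` and `high_freq_wavelen` must differ, "
            f"but both are {low_freq_wavelen}."
        )


# Illustrative values only: an 8192-token context with factors 1.0 and 4.0.
check_wavelen_band(8192 / 1.0, 8192 / 4.0)  # passes silently
```

Inside `_compute_llama3_parameters` the same `if ... raise` could be inlined where the `assert` sits in the hunk.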


+    # wavelen < high_freq_wavelen: do nothing
+    # wavelen > low_freq_wavelen: divide by factor
+    # otherwise: interpolate between the two, using a smooth factor
+    wavelen = 2 * math.pi / inv_freq
+
+    inv_freq_new = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
+
+    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
+    inv_freq_new = torch.where(
+        ~(wavelen < high_freq_wavelen) * ~(wavelen > low_freq_wavelen),
+        (1 - smooth) * inv_freq_new / factor + smooth * inv_freq_new,
+        inv_freq_new,
+    )
Review comment (Collaborator):
We can split this in 3 lines!
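
A possible reading of this suggestion, as a sketch only: name the interpolated values and the medium-band mask, so the final `torch.where` fits on one line. The function name `llama3_scale_inv_freq` and the variable names `smoothed_inv_freq` and `is_medium_freq` are assumptions for illustration. The mask uses `&` where the diff uses `*`; both give the same boolean tensor. The sketch returns the rescaled tensor, whereas the context line in this hunk still reads `return inv_freq, attention_factor`.

```python
import math

import torch


def llama3_scale_inv_freq(inv_freq, factor, low_freq_factor, high_freq_factor, old_context_len):
    """Standalone sketch of the vectorized rescaling (hypothetical function name)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq

    # wavelen > low_freq_wavelen: divide by factor; everything else is unchanged for now
    inv_freq_new = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)

    # otherwise (medium band): interpolate, written as three short statements
    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
    smoothed_inv_freq = (1 - smooth) * inv_freq_new / factor + smooth * inv_freq_new
    is_medium_freq = ~(wavelen < high_freq_wavelen) & ~(wavelen > low_freq_wavelen)
    inv_freq_new = torch.where(is_medium_freq, smoothed_inv_freq, inv_freq_new)

    return inv_freq_new


# Illustrative inputs (assumed values, not taken from this PR).
inv_freq = 1.0 / (500000.0 ** (torch.arange(0, 128, 2, dtype=torch.float32) / 128))
scaled = llama3_scale_inv_freq(
    inv_freq, factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192
)
```

Inside the medium band `inv_freq_new` still equals `inv_freq`, because the first `torch.where` only touches entries with `wavelen > low_freq_wavelen`. The interpolation therefore matches the removed loop.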


     return inv_freq, attention_factor
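
For reference, the three cases handled by both the removed loop and the tensor version can be written as one piecewise rule. The symbols are shorthand introduced here, not notation from the PR: f is one entry of `inv_freq`, α is `factor`, L is `old_context_len`, and β_low, β_high are `low_freq_factor` and `high_freq_factor`.

```latex
\[
\lambda = \frac{2\pi}{f}, \qquad
\lambda_{\text{low}} = \frac{L}{\beta_{\text{low}}}, \qquad
\lambda_{\text{high}} = \frac{L}{\beta_{\text{high}}}, \qquad
s = \frac{L/\lambda - \beta_{\text{low}}}{\beta_{\text{high}} - \beta_{\text{low}}}
\]
\[
f' =
\begin{cases}
f, & \lambda < \lambda_{\text{high}} \\[2pt]
f/\alpha, & \lambda > \lambda_{\text{low}} \\[2pt]
(1 - s)\, f/\alpha + s\, f, & \text{otherwise}
\end{cases}
\]
```

At λ = λ_high the smooth factor is s = 1 and the middle branch gives f. At λ = λ_low it is s = 0 and the branch gives f/α. The rule is therefore continuous at both thresholds, and the denominator of s is why the two factors must differ.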

