
[model support] Requesting support for Gemma 2 #10

Closed
sand-bit opened this issue Jul 28, 2024 · 1 comment

Comments

@sand-bit

Requesting support for Gemma 2 27B/9B.

  1. Sliding window attention, used on every other layer
  2. Logit soft capping (a.k.a. soft capping), which means no Flash Attention support right now (see the sketch after this list)
  3. Possible tokenizer bugs in the transformers implementation: Investigate gemma 2 generation quality ggerganov/llama.cpp#8240 (comment) (see that issue for other discussions on tokenizer bugs; the behavior may be okay in Google's official Keras implementation)
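For context on points 1 and 2, here is a minimal, illustrative sketch (not this project's API) of how soft capping and the interleaved sliding-window mask work. The values `cap=50.0` and the window size, as well as the even/odd layer assignment, are assumptions based on the public Gemma 2 config and may differ from what an actual implementation uses:

```python
# Illustrative sketch only -- not this repo's code. soft-cap and window
# values are assumptions taken from the public Gemma 2 config.
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Logit soft capping: squash scores into (-cap, cap) with tanh.
    # Since this is applied to attention scores before the softmax,
    # stock FlashAttention kernels (which fuse the softmax) cannot be
    # used without a kernel that supports the extra tanh.
    return cap * torch.tanh(logits / cap)

def attention_mask(seq_len: int, layer_idx: int, window: int) -> torch.Tensor:
    # Gemma 2 interleaves local and global attention: roughly every
    # other layer attends only to the last `window` tokens, the rest
    # attend to the full causal prefix (exact assignment depends on config).
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    if layer_idx % 2 == 0:                   # local (sliding-window) layer
        return causal & (i - j < window)
    return causal                            # global layer

# Toy usage: capped attention scores for a local layer.
q = torch.randn(1, 8, 16, 64)                # (batch, heads, seq, head_dim)
k = torch.randn(1, 8, 16, 64)
scores = soft_cap(q @ k.transpose(-2, -1) / 64**0.5, cap=50.0)
mask = attention_mask(16, layer_idx=0, window=4)
probs = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
```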
@james0zan
Member

Since Gemma 2 is a dense model, we are currently unlikely to run it faster than its official implementations, so Gemma 2 support will probably not be available in the near term.
