initial grok #169
Merged
52 commits
ba87a04 initial grok (dan-garvey)
de9842b use name prefix instead of new dataclass (dan-garvey)
b7965b1 some hacks (dan-garvey)
6d3d261 more hack (dan-garvey)
b5f535d fix moe-ffn (dan-garvey)
4095db0 Add in some missing grok specific model structure and constants (KyleHerndon)
e71630a Add attn_output_norm layer (archana-ramalingam)
5772a3d Update MOE block in decode (archana-ramalingam)
3f2914a Some fixes to the grok model (KyleHerndon)
7c2e133 Merge branch 'main' into grokstar (archana-ramalingam)
e1261f5 Revert "Merge branch 'main' into grokstar" (archana-ramalingam)
a242bde Fix merging main changes (archana-ramalingam)
bb40d12 Update tensor trace names (archana-ramalingam)
cfa8420 Update moe block test (archana-ramalingam)
325696f Update paged attention block with grok changes (archana-ramalingam)
48fce0c Update paged attention block with grok changes (archana-ramalingam)
d9e787c Add use_grok to MOE block (archana-ramalingam)
ab084cc Use use_grok in MOE block (archana-ramalingam)
29e3603 Change MOE activation from silu to gelu for Grok (archana-ramalingam)
0670e1d Allow router weight norm for all MOEs (archana-ramalingam)
a4be20b Update llm_configs to support llama and grok architectures (archana-ramalingam)
3049f87 Remove comment (archana-ramalingam)
b8240c8 Add optional params for Grok (archana-ramalingam)
5bf30e0 Add all models supported in sharktank (archana-ramalingam)
d970944 Make rope_freq_base mandatory param (archana-ramalingam)
b1fd818 small refactor/cleanup (dan-garvey)
85e2f87 more cleanup (dan-garvey)
7ed9a23 this shouldn't have been unrebased?? (dan-garvey)
3510634 fix use_hf args (dan-garvey)
a4ff36a Make use_grok optional in MOE and Attention blocks (archana-ramalingam)
940db2f Add use_grok to moe_block_test (archana-ramalingam)
bb2f5a1 fix kv cache test (dan-garvey)
b6e52eb Add PreGatherMoeBlock to import from layers (archana-ramalingam)
b790cb5 Add MOE block export for prefill + decode (archana-ramalingam)
19218f3 Fix architecture variable (archana-ramalingam)
7deb42a Fix imports (archana-ramalingam)
1b6cb6d Fix rope_freq_base (archana-ramalingam)
43b20c4 fix flaky test (dan-garvey)
d938a08 Merge branch 'main' into grokstar (archana-ramalingam)
6aeeb4f Add short versions for args (archana-ramalingam)
cac489c Remove use_hf and use_grok options from llama (archana-ramalingam)
d5c27fe Move create_kv_cache to utils folder (archana-ramalingam)
10d6c87 Fix error (archana-ramalingam)
4816c93 Merge branch 'main' into grokstar (archana-ramalingam)
124503f revert addition of dtype arg (dan-garvey)
46c6eb6 Merge branch 'main' into grokstar (dan-garvey)
f3a8fb1 Remove attention_dtype (archana-ramalingam)
dcc1e8f Merge branch 'main' into grokstar (dan-garvey)
88e38e2 fix missing parenth (dan-garvey)
430045b correctly rebase T_T (dan-garvey)
f0a3e31 nonstrict (dan-garvey)
e5dc9e9 Merge branch 'main' into grokstar (dan-garvey)
Remove attention_dtype
commit f3a8fb18612b60cb2748ffa5ac423104c3833300
nice catch