vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…
mgroeber9110 authored Jan 30, 2025
1 parent 4314e56 commit ffd0821
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/llama-vocab.cpp
@@ -1692,7 +1692,7 @@ void llama_vocab::impl::load(llama_model_loader & ml, const LLM_KV & kv) {
GGML_ASSERT(!ids.empty() && "model vocab missing newline token");
linefeed_id = ids[0];
} else {
- const std::vector<int> ids = tokenize("\xC4\x8A", false); // U+010A
+ const std::vector<int> ids = tokenize("\n", false);

//GGML_ASSERT(!ids.empty() && "model vocab missing newline token");
if (ids.empty()) {
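
For context on the two string literals in this hunk: in the GPT-2 byte-level BPE convention, every raw byte is mapped to a printable code point before vocabulary lookup, and the linefeed byte 0x0A becomes U+010A, whose UTF-8 encoding is "\xC4\x8A". The sketch below reproduces that mapping under the assumption, suggested by the change but not shown in the hunk, that tokenize() performs the byte mapping itself. All helper names here are hypothetical and are not part of llama.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bytes that the GPT-2 convention keeps unchanged:
// '!'..'~', 0xA1..0xAC and 0xAE..0xFF.
static bool gpt2_byte_is_printable(uint8_t b) {
    return (b >= 0x21 && b <= 0x7E) || (b >= 0xA1 && b <= 0xAC) || b >= 0xAE;
}

// Map a raw byte to its vocabulary code point. Non-printable bytes are
// assigned 256, 257, ... in ascending byte order.
static uint32_t gpt2_byte_to_codepoint(uint8_t b) {
    if (gpt2_byte_is_printable(b)) {
        return b;
    }
    uint32_t n = 0;
    for (uint32_t i = 0; i < b; ++i) {
        if (!gpt2_byte_is_printable((uint8_t) i)) {
            ++n;
        }
    }
    return 256 + n;
}

// UTF-8 encode a code point below U+0800, which covers every value
// the mapping above can produce.
static std::string utf8_encode(uint32_t cp) {
    assert(cp < 0x800);
    std::string out;
    if (cp < 0x80) {
        out += (char) cp;
    } else {
        out += (char) (0xC0 | (cp >> 6));
        out += (char) (0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: the text a byte-level BPE vocabulary would
// contain for the given raw input bytes.
static std::string gpt2_vocab_form(const std::string & raw) {
    std::string out;
    for (unsigned char c : raw) {
        out += utf8_encode(gpt2_byte_to_codepoint(c));
    }
    return out;
}
```

With these helpers, gpt2_vocab_form("\n") yields "\xC4\x8A", while gpt2_vocab_form("\xC4\x8A") yields a different four-byte string. Under the stated assumption, passing the already-mapped form to a tokenizer that maps bytes again would therefore look up the wrong text.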