Use fallback config if class not defined (#53)
Fixes distilgpt2 tokenization.

Previously, we used the fallback configuration only if the model repo had no
`tokenizer_config.json`. These files are now being added to some repos as part
of removing dependencies on transformers' internals, as in
huggingface/transformers#29112. However, to minimize potential breaking
changes, only the keys removed from the hardcoded rules are being added, so the
new files may not specify a tokenizer class.

We now use the fallback config if tokenizer_config.json exists, no
tokenizer class is specified, and we do have a fallback config for this
architecture.
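
The resolution order described above can be sketched roughly as follows. This is a simplified illustration, not the library's code: plain dictionaries stand in for `Config`, and `TokenizerConfig`, `fallbackConfigs`, and `resolveTokenizerConfig` are hypothetical names with made-up data.

```swift
import Foundation

// Hypothetical simplification: a plain dictionary stands in for the library's Config type.
typealias TokenizerConfig = [String: String]

// Illustrative fallback table keyed by model type; not the library's real data.
let fallbackConfigs: [String: TokenizerConfig] = [
    "gpt2": ["tokenizer_class": "GPT2Tokenizer"]
]

func resolveTokenizerConfig(hubConfig: TokenizerConfig, modelType: String?) -> TokenizerConfig {
    // 1. An explicit tokenizer class in the hub config is used as-is.
    if hubConfig["tokenizer_class"] != nil { return hubConfig }

    // 2. Without a model type there is nothing to look up or guess from.
    guard let modelType = modelType else { return hubConfig }

    // 3. Prefer a known fallback config for this architecture.
    if let fallback = fallbackConfigs[modelType] {
        return fallback.merging(hubConfig, uniquingKeysWith: { current, _ in current })
    }

    // 4. Last resort: guess the class name by capitalizing the model type.
    var guessed = hubConfig
    guessed["tokenizer_class"] = "\(modelType.capitalized)Tokenizer"
    return guessed
}
```

With this sketch, a config lacking `tokenizer_class` for model type `"gpt2"` picks up the fallback entry, while an unknown model type such as `"bert"` falls through to the capitalization guess.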
pcuenca authored Feb 29, 2024
1 parent bbbd7bf commit 03d86ac
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions Sources/Hub/Hub.swift
@@ -130,6 +130,14 @@ public class LanguageModelConfigurationFromHub {
// Try to guess the class if it's not present and the modelType is
if let _ = hubConfig.tokenizerClass?.stringValue { return hubConfig }
guard let modelType = try await modelType else { return hubConfig }

// If the config exists but doesn't contain a tokenizerClass, use a fallback config if we have it
if let fallbackConfig = Self.fallbackTokenizerConfig(for: modelType) {
let configuration = fallbackConfig.dictionary.merging(hubConfig.dictionary, uniquingKeysWith: { current, _ in current })
return Config(configuration)
}

// Guess by capitalizing
var configuration = hubConfig.dictionary
configuration["tokenizer_class"] = "\(modelType.capitalized)Tokenizer"
return Config(configuration)
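
One detail of the merge in this change is worth spelling out: with Swift's `Dictionary.merging(_:uniquingKeysWith:)`, the closure receives the receiver's value first, so `{ current, _ in current }` keeps the fallback's value whenever both dictionaries define the same key. A minimal sketch, using made-up keys and values rather than real tokenizer data:

```swift
// Hypothetical stand-ins for the two configurations; keys and values are illustrative only.
let fallback: [String: String] = [
    "tokenizer_class": "GPT2Tokenizer",
    "unk_token": "fallback-unk",
]
let hub: [String: String] = [
    "unk_token": "hub-unk",
    "model_max_length": "1024",
]

// Same call shape as in the diff: on a key collision, the receiver's value (`current`) is kept.
let merged = fallback.merging(hub, uniquingKeysWith: { current, _ in current })

// Keys unique to either side are preserved; the shared key keeps the fallback's value.
print(merged["unk_token"] ?? "nil")
print(merged["model_max_length"] ?? "nil")
```

In other words, keys present only in the hub config are carried over, but on a conflict the fallback entry takes precedence.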