Use fallback config if class not defined
Fixes distilgpt2 tokenization.

Previously, we only used the fallback configuration if the model repo contained no
`tokenizer_config.json` at all. These files are now being added to some repos as part
of removing dependencies on transformers' internals, as in
huggingface/transformers#29112. However, to minimize potential breaking changes, those
files only contain the keys that were removed from the hardcoded rules, so they may not
specify a tokenizer class.

We now use the fallback config if `tokenizer_config.json` exists, it does not specify a
tokenizer class, and we do have a fallback config for this architecture.
pcuenca committed Feb 29, 2024
1 parent bbbd7bf · commit e54ca4b
Showing 1 changed file with 8 additions and 0 deletions.
Sources/Hub/Hub.swift: 8 additions, 0 deletions
@@ -130,6 +130,14 @@ public class LanguageModelConfigurationFromHub
 // Try to guess the class if it's not present and the modelType is
 if let _ = hubConfig.tokenizerClass?.stringValue { return hubConfig }
 guard let modelType = try await modelType else { return hubConfig }
+
+// If the config exists but doesn't contain a tokenizerClass, use a fallback config if we have it
+if let fallbackConfig = Self.fallbackTokenizerConfig(for: modelType) {
+    let configuration = fallbackConfig.dictionary.merging(hubConfig.dictionary, uniquingKeysWith: { current, _ in current })
+    return Config(configuration)
+}
+
+// Guess by capitalizing
 var configuration = hubConfig.dictionary
 configuration["tokenizer_class"] = "\(modelType.capitalized)Tokenizer"
 return Config(configuration)
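
For context on the merge in the added line: Swift's `Dictionary.merging(_:uniquingKeysWith:)` passes the receiver's value as the first closure argument, so returning `current` keeps the fallback config's value for any duplicate keys while keys present only in the hub config are still carried over. A minimal standalone sketch (the keys and values below are hypothetical, not taken from any real config):

    // Receiver = fallback config, argument = hub config.
    let fallback = ["tokenizer_class": "GPT2Tokenizer", "unk_token": "<|endoftext|>"]
    let fromHub = ["model_max_length": "1024", "unk_token": "from-hub"]

    // `current` comes from `fallback`, so the fallback value wins on duplicate keys.
    let merged = fallback.merging(fromHub, uniquingKeysWith: { current, _ in current })
    // merged == ["tokenizer_class": "GPT2Tokenizer", "unk_token": "<|endoftext|>", "model_max_length": "1024"]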
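
As an illustration of the effect (a sketch only; the `init(modelName:)` initializer and the async `tokenizerConfig` accessor are assumptions about the surrounding `LanguageModelConfigurationFromHub` API, not part of this commit):

    import Hub

    // Sketch: load distilgpt2's tokenizer config through the Hub module.
    // With this fix, `tokenizer_class` is filled in from the fallback config even
    // though the repo's tokenizer_config.json does not specify one.
    func printTokenizerClass() async throws {
        let config = LanguageModelConfigurationFromHub(modelName: "distilgpt2")
        let tokenizerConfig = try await config.tokenizerConfig
        print(tokenizerConfig?.tokenizerClass?.stringValue ?? "no tokenizer_class")
    }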
