When matching with a single tokenizer (or multiple tokenizers of the same type), the tokens returned are of the concrete token type associated with that tokenizer.
let tokens = "123 456 abc def".tokens(matchedWith: .letters)
// tokens: [CharacterSet.Token]
// tokens.count -> 2
//
// token[1].text -> "abc"
// token[2].text -> "def"
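The same holds when mixing several tokenizers of the same type. As a sketch (the messy input string here is just for illustration), two `CharacterSet` tokenizers still produce a single array of the concrete type `CharacterSet.Token`:

```swift
// both tokenizers are CharacterSets, so no type erasure is needed
let tokens = "123Hello world&^45.67".tokens(matchedWith: .decimalDigits, .letters)
// tokens: [CharacterSet.Token]
// tokens.count -> 5
// tokens.map({ $0.text }) -> ["123", "Hello", "world", "45", "67"]
```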
When using multiple tokenizers of different types, you'll first need to convert the tokenizers to `AnyTokenizer` so that they share the same type. Tokens returned from tokenizing with `AnyTokenizer` have the protocol type `TokenType`, but you can use type casting to access a specific instance, or type checking to identify a particular type of token. Here's an example using `DateTokenizer` and `CharacterSet` together:
let tokens = "12/01/17 123".tokens(matchedWith: DateTokenizer.defaultTokenizer, CharacterSet.decimalDigits.anyTokenizer)
// tokens: [TokenType]
// tokens.count -> 2
for token in tokens {
switch token {
// type check to see if token is a CharacterSet token
case is CharacterSet.Token:
print("digits are:", token.text)
// type cast to access the `date` property
case let dateToken as DateTokenizer.Token:
print("date is:", dateToken.date)
default: break
}
}
// -> prints
// date is: 2017-12-01 05:00:00 +0000
// digits are: 123
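Type casting also works outside of a switch. Continuing with the `tokens` array from the snippet above, one possible sketch for pulling out just the dates:

```swift
// keep only tokens created by DateTokenizer, and grab their `date` property
let dates = tokens.compactMap({ ($0 as? DateTokenizer.Token)?.date })
// dates.count -> 1
```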
Providing a custom `TokenType` can make your code more expressive, but it is entirely optional. If a tokenizer does not define an associated token type, then the default type `AnyToken` is used instead. This means that it's entirely possible for two different tokenizers to return tokens with the type `AnyToken`. To distinguish between `AnyToken`s produced by different tokenizers, `AnyToken` provides a `tokenizerType` property that identifies the tokenizer that created the token.
Here's an example using the tokenizers `ThreeLetterWord` and `FourLetterWord`, which both return `AnyToken`, with `tokenizerType` used to drive the logic:
```swift
let tokens: [AnyToken] = "one two three four five six seven 8 9 10"
    .tokens(matchedWith: ThreeLetterWord.defaultTokenizer, FourLetterWord.defaultTokenizer)

for token in tokens {
    switch token.tokenizerType {
    // token was made by the `ThreeLetterWord` tokenizer
    case is ThreeLetterWord.Type:
        print("3 letters:", token.text)
    // token was made by the `FourLetterWord` tokenizer
    case is FourLetterWord.Type:
        print("4 letters:", token.text)
    default: break
    }
}
// prints ->
// 3 letters: one
// 3 letters: two
// 4 letters: four
// 4 letters: five
// 3 letters: six
```
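The same `tokenizerType` check can drive logic outside a switch too. Continuing with the `tokens` array from above, here's a sketch that filters for tokens made by a single tokenizer:

```swift
// keep only the tokens created by the ThreeLetterWord tokenizer
let threeLetterTokens = tokens.filter({ $0.tokenizerType is ThreeLetterWord.Type })
// threeLetterTokens.map({ $0.text }) -> ["one", "two", "six"]
```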