I heard about a new feature coming to Llama: speculative sampling, a method for speeding up a model's inference. The reported benefit varies, up to around 2x the speed, though probably closer to 1.5x in practice. It works by having a big target model, like a 34b, use a smaller draft model, like a 7b, to propose tokens that the big model then verifies. The GitHub thread has a video showing the performance benefits of the method.
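As I understand it, the loop looks roughly like this. A minimal greedy sketch (the real method uses a rejection-sampling rule on the probability ratios so the target's output distribution is preserved exactly; the toy `target_model`/`draft_model` distributions here are made up for illustration):

```python
# Toy stand-ins for the models: each maps a context to a next-token
# distribution over a tiny vocabulary. In practice these would be
# e.g. a 34b target and a 7b draft.
def target_model(ctx):
    if len(ctx) < 3:
        return {"a": 0.6, "b": 0.3, "c": 0.1}
    return {"a": 0.2, "b": 0.7, "c": 0.1}

def draft_model(ctx):
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def greedy(dist):
    return max(dist, key=dist.get)

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then verify them with the target.

    Accept the longest prefix where the target agrees with the draft;
    one (batched) target pass can thus yield several tokens, which is
    where the speedup comes from.
    """
    drafted, ctx = [], list(context)
    for _ in range(k):
        tok = greedy(draft_model(ctx))
        drafted.append(tok)
        ctx.append(tok)

    accepted, ctx = [], list(context)
    for tok in drafted:
        want = greedy(target_model(ctx))
        accepted.append(want)
        ctx.append(want)
        if want != tok:  # first disagreement: keep the target's token, stop
            break
    return accepted

print(speculative_step(["a"]))  # ['a', 'a', 'b']
```

The key point is that verifying k drafted tokens is one batched forward pass through the big model, not k sequential ones.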
My thoughts immediately jumped to Airoboros's LMoE. Would it be possible to integrate an "inference" LMoE into vanilla Airoboros to benefit from speculative sampling?
One of the people posting in the Llama GitHub thread mentioned that chaining draft models might have potential: something like 3b -> 7b -> 13b -> 34b -> 70b. My gut says that, much like parameter counts themselves, there is probably a sweet spot in the number of draft models and their respective sizes in that configuration.
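The sweet-spot intuition can be sanity-checked on paper with the simple cost model used in the speculative decoding literature, before touching real hardware (the 80% acceptance rate and the 10x draft/target cost ratio below are assumed numbers, not measurements):

```python
def expected_speedup(alpha, gamma, c):
    """Idealized speedup of speculative sampling for one draft model.

    alpha: chance the target accepts each drafted token
    gamma: tokens drafted per target pass
    c:     cost of one draft step relative to one target step
    """
    expected_tokens = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    cost_per_pass = gamma * c + 1  # gamma draft steps + 1 target pass
    return expected_tokens / cost_per_pass

# Sweep draft lengths for an assumed 80% acceptance rate and a draft
# 10x cheaper than the target: speedup rises, peaks, then falls again,
# so "more speculation" is not monotonically better.
for gamma in range(1, 9):
    print(gamma, round(expected_speedup(0.8, gamma, 0.1), 2))
```

A chain of drafts would nest this: each stage's acceptance rate and relative cost shifts the curve, which is exactly why I'd expect a sweet spot in both chain length and model sizes.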
Fortunately, I believe it would be relatively easy to test multiple permutations objectively: the metric is speed, which is easy to record. Provided that speculative sampling doesn't impact output quality, it should be a painless concept to test.
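A measurement harness could be as small as this; `run_config` is a hypothetical placeholder for however a given draft-chain configuration is actually invoked, and the sleeps just simulate per-token costs for illustration:

```python
import time

def tokens_per_second(run_config, n_tokens=100):
    """Time one configuration and report throughput in tokens/sec."""
    start = time.perf_counter()
    run_config(n_tokens)
    return n_tokens / (time.perf_counter() - start)

# Fake configurations: pretend the baseline costs 2 ms/token and the
# speculative setup 1 ms/token. Swap in real generation calls to
# compare permutations like 7b->34b vs 3b->13b->70b.
def baseline(n):
    for _ in range(n):
        time.sleep(0.002)

def speculative(n):
    for _ in range(n):
        time.sleep(0.001)

base = tokens_per_second(baseline)
spec = tokens_per_second(speculative)
print(f"baseline: {base:.0f} t/s, speculative: {spec:.0f} t/s "
      f"({spec / base:.2f}x)")
```

To cover the "doesn't impact quality" assumption, the same sweep could also record perplexity on a fixed eval set per configuration.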
speculative : PoC for speeding-up inference via speculative sampling