
Add optimal model size and stopping time feature #4847

Closed · TevenLeScao opened this issue Jun 8, 2020 · 14 comments

@TevenLeScao (Contributor)

🚀 Feature request

The calculator blog post presented an automated way to derive scaling laws relating model size and compute budget on language modeling tasks. Adding it to the library would help save on training costs by picking an optimal model size and training time.
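
To make the idea concrete, here is a minimal sketch of the kind of helper this could expose, assuming a Kaplan-et-al.-style power law between compute budget and compute-optimal model size. The function names and constants below are illustrative placeholders, not the values fitted for the calculator:

```python
# Hypothetical sketch of a scaling-law helper (names and constants are placeholders,
# not the ones fitted in the calculator blog post).

def optimal_model_size(compute_pf_days: float, coeff: float = 1.3e9, exponent: float = 0.73) -> float:
    """Rough compute-optimal parameter count, N* ~ coeff * C ** exponent."""
    return coeff * compute_pf_days ** exponent

def loss_at_convergence(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Converged-loss estimate of the form L(N) = (N_c / N) ** alpha_N (Kaplan et al., 2020)."""
    return (n_c / n_params) ** alpha_n

if __name__ == "__main__":
    budget = 1.0  # compute budget in PF-days, e.g. a few V100s for a few days
    n_star = optimal_model_size(budget)
    print(f"suggested size: {n_star:.2e} parameters, target loss ~ {loss_at_convergence(n_star):.2f}")
```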

Motivation

Estimating how big of a model to use and how long to train for is more of an art than a science. An automated tool to perform that task would allow researchers and practitioners to concentrate on the high-level parts of their projects as opposed to parameter tweaking.

Your contribution

I can submit a PR with my existing work, probably integrating it within Trainer and/or knockknock.

@TevenLeScao TevenLeScao added Discussion Discussion on a topic (keep it focused or open a new issue though) High-Level feature Ex: LM (Pretraining) Related to language modeling pre-training labels Jun 8, 2020
@TevenLeScao TevenLeScao self-assigned this Jun 8, 2020
@lopuhin commented Jun 8, 2020

Great stuff, thank you! The energy estimates look about 1000× worse than reality, though: a V100 running for 12 h should not consume 5432 kWh, I think, or else we'd all be dead. 5.4 kWh looks more reasonable.

[Screenshot: the calculator's energy usage estimate]
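
(As a quick sanity check on the order of magnitude, assuming roughly 300 W of board power for a single V100, which is an assumption rather than a figure from the calculator:)

```python
# Back-of-the-envelope check, assuming ~300 W board power for one V100.
power_kw = 0.3           # a V100 draws on the order of 250-300 W under load
hours = 12
print(power_kw * hours)  # 3.6 kWh -- same order as 5.4 kWh, three orders of magnitude below 5432 kWh
```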

@TevenLeScao (Contributor, Author) commented Jun 8, 2020

> Great stuff, thank you! The energy estimates look about 1000× worse than reality, though: a V100 running for 12 h should not consume 5432 kWh, I think, or else we'd all be dead. 5.4 kWh looks more reasonable.

Ah yes - I remember having had a doubt about that. I checked the library we used to estimate those again, and there might have been a unit conversion error; I'll fix that ASAP tomorrow!

Edit: it's fixed, thank you @lopuhin !

@BramVanroy (Collaborator) commented Jun 9, 2020

This is already looking very promising! Good stuff.

When clicking the "initialize in transformers" button, the code block should probably not center-align the code but left-align it instead. That would make the code a lot more readable.

@TevenLeScao (Contributor, Author)

> This is already looking very promising! Good stuff.
>
> When clicking the "initialize in transformers" button, the code block should probably not center-align the code but left-align it instead. That would make the code a lot more readable.

Yeah, that was a bit of an aesthetic choice so as not to break the flow of the web page; it definitely wouldn't be like this in a tool rather than a demo!

stale bot commented Aug 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 9, 2020
@stale stale bot closed this as completed Aug 16, 2020
@julien-c (Member)

Unstale - what's the status on this, @TevenLeScao? Should we close?

@julien-c julien-c reopened this Aug 17, 2020
@stale stale bot removed the wontfix label Aug 17, 2020
@TevenLeScao (Contributor, Author)

@julien-c we had originally decided not to go forward with this, but I started working on it again amid the discussions about the scale of GPT-3. I didn't get to finish it before leaving for holidays two weeks ago, but the PR will be ready this week.

stale bot commented Oct 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 24, 2020
@stale stale bot closed this as completed Nov 1, 2020
@moinnadeem

Hi! The "initialize in Huggingface" button is broken -- is there something I can do locally to fix it? I just want the lines of training code for a given wall-clock time.

@TevenLeScao (Contributor, Author)

Hey! The page seems broken, I'm not sure why; I'll relaunch it.

@moinnadeem

@TevenLeScao Thanks for the immediate reply! The button to launch in Huggingface Transformers still isn't working, but I'm happy to help debug or send any reports if that helps! Alternatively, do you think you could help me understand what the button does? I'm just hoping to generate the configuration string `n_layers=N_LAYERS,n_ctx=N_CTX`, with the variables filled in by the calculator.

Thanks for your time!
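
For reference, a hypothetical reconstruction of the kind of snippet such a button could generate, mapping a suggested model shape onto a Transformer-XL config in transformers. The shape values below are placeholders, not actual calculator output, and this requires a transformers version that still ships the Transformer-XL model:

```python
# Hypothetical sketch: build a Transformer-XL of the size a calculator might suggest.
# Shape values are placeholders, not calculator output.
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

config = TransfoXLConfig(
    n_layer=16,    # number of layers (placeholder)
    d_model=410,   # hidden size (placeholder)
    d_inner=2100,  # feed-forward inner dimension (placeholder)
    n_head=10,     # attention heads (placeholder)
    d_head=41,     # per-head dimension (placeholder)
)
model = TransfoXLLMHeadModel(config)
print(f"{model.num_parameters():,} parameters")
```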

@TevenLeScao (Contributor, Author)

I've relaunched it, it should work now (just gotta figure out why the page doesn't center on my desktop).

@moinnadeem

@TevenLeScao Yes, it works -- thanks!

Out of curiosity, why did you use Transformer-XL as opposed to something like GPT-2? Does Transformer-XL reach a lower validation loss on WikiText-103 than GPT-2 when training for the same number of steps?

@TevenLeScao (Contributor, Author)

Yeah, it was the state-of-the-art at the time!
