Is there a way to train/fine-tune with fp-16 flag? #126
Comments
Thanks a lot for your comments. It looks like there are at least two ways to train with mixed precision.

Method 1: wrap an optimizer in TensorFlow's mixed-precision graph rewrite, where opt is an optimizer like Adam; a rough sketch is shown below.
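A minimal sketch of the graph-rewrite approach, assuming a TF 2.x release where tf.train.experimental.enable_mixed_precision_graph_rewrite is still available (it was deprecated in later versions); the learning rate and the commented compile call are placeholders, not part of the original:

import tensorflow as tf

# build a regular optimizer first
opt = tf.keras.optimizers.Adam(learning_rate=5e-5)

# wrap it so that eligible ops run in float16 with automatic loss scaling
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

# then compile the Keras model with the wrapped optimizer as usual, e.g.:
# model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])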
Method 2:

from tensorflow.keras.mixed_precision import experimental as mixed_precision
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

Method 2 doesn't seem to work here, but Method 1 works and does yield a speedup, which may or may not be worth it for you. Since mixed-precision support in TF 2 is still somewhat experimental and a little brittle, I've postponed adding direct support for it in ktrain for the time being. But you can still experiment on your own using the instructions above.

However, if you're having trouble training BERT on your system, I would try DistilBERT instead of using mixed precision with BERT, as DistilBERT is smaller and faster and has nearly the same performance as BERT in my experience.

**DistilBERT example:**

# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
# build, train, and validate model (Transformer is wrapper around transformers library)
import ktrain
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 4)
learner.validate(class_names=t.get_classes()) # class_names must be string values
# Output from learner.validate()
#                          precision    recall  f1-score   support
#
#             alt.atheism       0.92      0.93      0.93       319
#           comp.graphics       0.97      0.97      0.97       389
#                 sci.med       0.97      0.95      0.96       396
#  soc.religion.christian       0.96      0.96      0.96       398
#
#                accuracy                           0.96      1502
#               macro avg       0.95      0.96      0.95      1502
#            weighted avg       0.96      0.96      0.96      1502
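If you then want to classify new documents, a short follow-up sketch using ktrain's predictor API as in the example above (the sample sentence and save path are made up for illustration):

# make predictions on new text with a ktrain Predictor
predictor = ktrain.get_predictor(learner.model, preproc=t)
predictor.predict('Jesus Christ is the central figure of Christianity.')

# save the predictor for later use or deployment
predictor.save('/tmp/my_distilbert_predictor')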
Thank you so much for this detailed response! It most definitely solved the problem.
As said above, thanks for the detailed info @amaiya. I'm running xlm-roberta-large on v3-8 TPUs, and Method 1 actually seems to use more memory, i.e., the maximum batch size I can fit is smaller than without the three lines above. Has anyone gotten it working properly on TPUs?
I haven't tried mixed precision on TPUs, but this TensorFlow page has information on it, including TPU-specific info.
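For what it's worth, on TPUs the Keras policy is 'mixed_bfloat16' rather than 'mixed_float16'. A minimal, untested sketch using the same experimental API shown earlier in this thread:

from tensorflow.keras.mixed_precision import experimental as mixed_precision

# bfloat16 is the reduced-precision dtype supported on TPUs
policy = mixed_precision.Policy('mixed_bfloat16')
mixed_precision.set_policy(policy)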
I am trying to train a BERT text classifier for a custom classification task. I have an RTX 2070 for accelerating the workflow.
It runs out of memory a lot of the time, even with small batch sizes. Is there a way to leverage fp16 support for training?
It would be really helpful and would allow the model to train better.
Also, love your work! Thank you for creating this library.