Can't Import Fine-Tuned BERT Sentence Embeddings Model #13860
Comments
Hi @nreamaroon, could you please show the output? Since you fine-tuned the model, it was probably renamed or changed to something else. If you can have the output in the signature set to
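One way to answer the question about the output is to load the exported SavedModel and list the outputs of its serving_default signature. This is a minimal sketch assuming TensorFlow 2.x. The helper names and the "pooler_output" key are illustrative assumptions, not Spark NLP's documented contract.

```python
from typing import Dict, Iterable, Tuple

# Assumed name of the pooled [CLS] output in a TFBertModel serving signature.
POOLED_OUTPUT_KEY = "pooler_output"


def describe_serving_outputs(saved_model_dir: str) -> Dict[str, Tuple[str, str]]:
    """Map each serving_default output name to its (dtype, shape)."""
    import tensorflow as tf  # lazy import so has_pooled_output works without TF

    loaded = tf.saved_model.load(saved_model_dir)
    signature = loaded.signatures["serving_default"]
    return {
        name: (tensor.dtype.name, str(tensor.shape))
        for name, tensor in signature.structured_outputs.items()
    }


def has_pooled_output(output_names: Iterable[str]) -> bool:
    """True if the signature exposes the (assumed) pooled-output key."""
    return POOLED_OUTPUT_KEY in set(output_names)
```

For example, describe_serving_outputs("bert-finetuned/saved_model/1") (hypothetical path) would likely list only logits for a masked-LM export. The shell command saved_model_cli show --dir <dir> --tag_set serve --signature_def serving_default prints the same information.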
Hi @maziyarpanahi - thanks for following up! Here's the output of
Thanks for sharing this. I can see that the output is
It seems BERT was fine-tuned for a classification task rather than CLM/MLM, which is strange. I also followed the same notebook and saved the model for use in Spark NLP (however, I used MLM, so it is more appropriate to use the model with BertEmbeddings/WordEmbeddings). This is how I saved the fine-tuned model:
Thanks for your response. When I tried your suggestion (saving the fine-tuned BERT model initialized by
Is there a way to resolve this? Also, could you provide complete code or a notebook that replicates your workflow above, so I can get a working example? Specifically, fine-tuning a model initialized by
I've only found guidance on fine-tuning for an MLM task using
You are welcome. My notebook is an exact duplicate of the notebook you provided; I just wanted to replicate the work and make sure the model can be imported into Spark NLP. Your new error suggests you didn't incorporate this part, which you had the first time:
Not the first time, but the very last time you save the model, you should use the tf.function to make sure the input types are int32.
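The int32 advice can be sketched as follows: wrap the model call in a tf.function whose input_signature pins every input to tf.int32, and pass it as the serving_default signature when exporting. This assumes a transformers version whose TF save_pretrained accepts the saved_model and signatures arguments; the function and constant names are hypothetical.

```python
import os

# Input names produced by BERT tokenizers and accepted by the HF TF BERT models.
INPUT_NAMES = ("input_ids", "attention_mask", "token_type_ids")


def save_with_int32_signature(model, export_dir: str) -> str:
    """Export a Hugging Face TF model as a SavedModel whose inputs are all int32."""
    import tensorflow as tf  # lazy import

    @tf.function(
        input_signature=[
            {name: tf.TensorSpec((None, None), tf.int32, name=name) for name in INPUT_NAMES}
        ]
    )
    def serving_fn(inputs):
        # (batch, sequence) int32 tensors in, the model's output dict out
        return model(inputs)

    model.save_pretrained(
        export_dir, saved_model=True, signatures={"serving_default": serving_fn}
    )
    # save_pretrained(saved_model=True) writes to <export_dir>/saved_model/<version>;
    # version defaults to 1.
    return os.path.join(export_dir, "saved_model", "1")
```

Under these assumptions, the returned folder is the one to pass to loadSavedModel.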
Just out of curiosity, do you check the
Thanks for the clarification. There are a few issues on my end now after trying this again. First, loading the model (after the initial save) with:
If
I can only load the model without
The next issue is with saving the model a second time.
Which yields:
If I exclude
Hi all, I prepared a notebook that shows how to port both original and fine-tuned models to Spark NLP.
This issue is stale because it has been open for 180 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
Is there an existing issue for this?
Who can help?
No response
What are you working on?
I am attempting to fine-tune a BERT model with the Transformers library from HuggingFace, then import it into Spark NLP with the loadSavedModel function in BertSentenceEmbeddings.
To fine-tune BERT for language modeling (as a Fill Mask task), I am following the instructions provided by HuggingFace in this notebook.
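For reference, MLM fine-tuning in TensorFlow can be condensed into a sketch like the one below. It is not the HuggingFace notebook's code: it skips the datasets library and masks the whole corpus once up front instead of collating per batch. The function name, hyperparameters and max_length are illustrative assumptions.

```python
from typing import Sequence


def fine_tune_mlm(
    texts: Sequence[str],
    output_dir: str,
    checkpoint: str = "bert-base-cased",
    epochs: int = 1,
    batch_size: int = 8,
    max_length: int = 128,
    mlm_probability: float = 0.15,
) -> None:
    """Fine-tune BERT on raw text with the masked-language-modeling objective."""
    import tensorflow as tf  # lazy imports
    from transformers import (
        AdamWeightDecay,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        TFAutoModelForMaskedLM,
    )

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = TFAutoModelForMaskedLM.from_pretrained(checkpoint)

    encodings = tokenizer(list(texts), truncation=True, max_length=max_length)
    features = [{key: encodings[key][i] for key in encodings.keys()} for i in range(len(texts))]

    # Pads the examples, masks ~mlm_probability of the tokens and adds a "labels"
    # tensor. Done once here (a simplification), so masked positions never change.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=mlm_probability, return_tensors="tf"
    )
    batch = collator(features)
    dataset = tf.data.Dataset.from_tensor_slices(dict(batch)).shuffle(len(texts)).batch(batch_size)

    # No loss argument: HF TF models use their internal MLM loss when the
    # input dict contains "labels".
    model.compile(optimizer=AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01))
    model.fit(dataset, epochs=epochs)

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```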
To import BERT models from HuggingFace for sentence embeddings in SparkNLP, I am following the instructions provided by John Snow Labs in this notebook.
Current Behavior
Currently, this isn't working as expected: BertSentenceEmbeddings.loadSavedModel is unable to import the fine-tuned BERT model (for example, one initialized with TFAutoModelForMaskedLM.from_pretrained('bert-base-cased')). I receive the following error when trying to do so:
IllegalArgumentException: No Operation named [missing_pooled_output_key] in the Graph
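The placeholder name missing_pooled_output_key suggests that loadSavedModel could not find a pooled-output tensor in the model's signature. That would be consistent with TFAutoModelForMaskedLM, whose BERT body is built without a pooling layer, while TFBertModel exposes pooler_output. A possible workaround, sketched below under that assumption, is to reload the fine-tuned checkpoint into TFBertModel before exporting. Because the MLM checkpoint carries no pooler weights, the sketch copies them from the original checkpoint. This is a hypothetical mitigation, not a documented Spark NLP or Hugging Face procedure.

```python
def load_encoder_with_pooler(finetuned_dir: str, base_checkpoint: str = "bert-base-cased"):
    """Reload a fine-tuned MLM checkpoint as TFBertModel so it exposes pooler_output."""
    from transformers import TFBertModel  # lazy import

    # Encoder weights come from the fine-tuned checkpoint; transformers warns
    # that the pooler is newly initialized because the MLM model has none.
    tuned = TFBertModel.from_pretrained(finetuned_dir)

    # Hypothetical mitigation: reuse the original checkpoint's pooler (dense + tanh)
    # instead of a randomly initialized one.
    base = TFBertModel.from_pretrained(base_checkpoint)
    tuned.bert.pooler.set_weights(base.bert.pooler.get_weights())
    return tuned
```

The returned model can then be exported with the int32 serving signature recommended in the thread, and its signature checked for pooler_output before importing.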
However, I don't have any issues using BertSentenceEmbeddings.loadSavedModel to import models from HuggingFace that were not fine-tuned on custom data, for example, TFBertModel.from_pretrained('bert-base-cased') without any fine-tuning.
Expected Behavior
The expected behavior is that BertSentenceEmbeddings.loadSavedModel imports fine-tuned BERT models (Fill Mask category) without returning an error.
Steps To Reproduce
1. Fine-tune BERT using Transformers, primarily adapted from HuggingFace's instructions in this notebook (since JSL instructs that the BERT model must be in the Fill Mask category).
2. Import the fine-tuned BERT model into Spark NLP, adapted from JSL's instructions in this notebook.
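Step 2 can be sketched as two helpers. The vocabulary copy mirrors what the JSL import notebooks do (assumption: Spark NLP 4.4.x reads vocab.txt from the SavedModel's assets folder). The loading code uses BertSentenceEmbeddings.loadSavedModel(folder, spark) and assumes an upstream annotator produces a "sentence" column. Helper names and paths are hypothetical.

```python
import shutil
from pathlib import Path


def copy_vocab_to_assets(tokenizer_dir: str, saved_model_dir: str) -> Path:
    """Copy the tokenizer's vocab.txt into <saved_model_dir>/assets (assumed layout)."""
    assets = Path(saved_model_dir) / "assets"
    assets.mkdir(parents=True, exist_ok=True)
    target = assets / "vocab.txt"
    shutil.copyfile(Path(tokenizer_dir) / "vocab.txt", target)
    return target


def import_sentence_embeddings(spark, saved_model_dir: str, output_path: str):
    """Load the exported SavedModel into Spark NLP and persist it as a Spark ML model."""
    from sparknlp.annotator import BertSentenceEmbeddings  # lazy import

    embeddings = (
        BertSentenceEmbeddings.loadSavedModel(saved_model_dir, spark)
        .setInputCols(["sentence"])  # assumes a SentenceDetector upstream
        .setOutputCol("sentence_embeddings")
    )
    embeddings.write().overwrite().save(output_path)
    return embeddings
```

For example, after exporting to bert-finetuned/saved_model/1, call copy_vocab_to_assets("bert-finetuned", "bert-finetuned/saved_model/1") and then import_sentence_embeddings(spark, "bert-finetuned/saved_model/1", "bert_sentence_finetuned").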
Spark NLP version and Apache Spark
spark == 3.3.0
sparknlp == 4.4.4
Type of Spark Application
Python Application
Java Version
No response
Java Home Directory
No response
Setup and installation
AWS SageMaker and Databricks
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response