-
Hello everyone, I am following the TrainAmazonReviewRanking example (https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/training/transferlearning/TrainAmazonReviewRanking.java). I have trained my model, which produced a params file, and I now want to reload the model from this file, but I haven't found anything very relevant on how to do it. This is what I tried:

```scala
val reloaded = Model.newInstance("reloaded")
reloaded.setBlock(new SequentialBlock())
reloaded.load(Path.of(outputDir), "AmazonReviewRatingClassification", Map("epoch" -> epochs.toString).asJava)
```

Should I reload the BERT model, re-add the layers, and then reload the file using Model.load, or is there a way to load everything at once? Alternatively, would there be a way to export a .pt file instead? I haven't found a lot on the .params files; is there some documentation I should read? Thanks in advance.

François
-
The params file contains only parameters, not the model structure. So rather than just a placeholder block, you should add a block constructed the same way the model was constructed during training. You can see an example of saving and loading in our tutorial (http://docs.djl.ai/docs/demos/jupyter/tutorial/03_image_classification_with_your_model.html). Our model saving uses DJL's own format (which is also engine agnostic), and we don't support saving into other formats like .pt.
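To illustrate the idea (this is a plain-Java sketch of the concept, not DJL's actual file format or API): with parameters-only persistence, the layer structure lives in code and must be rebuilt identically before loading, while only the numeric weights travel through the file. All names here (`ParamsOnlyDemo`, `buildBlock`) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ParamsOnlyDemo {
    // Hypothetical "block": the structure (number and size of layers) is code, not data.
    static List<float[]> buildBlock() {
        List<float[]> layers = new ArrayList<>();
        layers.add(new float[4]);
        layers.add(new float[2]);
        return layers;
    }

    // Save only the raw weights: no shapes, no layer types, no graph.
    static void save(List<float[]> params, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (float[] layer : params) {
                for (float w : layer) {
                    out.writeFloat(w);
                }
            }
        }
    }

    // Loading only works if the receiving structure matches the one used to save.
    static void load(List<float[]> params, Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            for (float[] layer : params) {
                for (int i = 0; i < layer.length; i++) {
                    layer[i] = in.readFloat();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("weights", ".params");
        List<float[]> trained = buildBlock();
        trained.get(0)[0] = 3.14f; // pretend training changed a weight
        save(trained, file);

        List<float[]> reloaded = buildBlock(); // must rebuild the SAME structure in code
        load(reloaded, file);
        System.out.println(reloaded.get(0)[0]); // prints 3.14
    }
}
```

This is why a placeholder `SequentialBlock` cannot work: the file carries no information about how many layers there are or what they look like.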
-
Hi Zach,
So would this be the correct way?

```java
public void buildModel(boolean createModel)
        throws ModelNotFoundException, MalformedModelException, IOException,
                TranslateException {
    // MXNet base model
    String modelUrls = "https://resources.djl.ai/test-models/distilbert.zip";
    if ("PyTorch".equals(Engine.getInstance().getEngineName())) {
        modelUrls = "https://resources.djl.ai/test-models/traced_distilbert_wikipedia_uncased.zip";
    }
    // Building the model from URL
    Criteria<NDList, NDList> criteria =
            Criteria.builder()
                    .optApplication(Application.NLP.WORD_EMBEDDING)
                    .setTypes(NDList.class, NDList.class)
                    .optModelUrls(modelUrls)
                    .optProgress(new ProgressBar())
                    .optEngine(PtEngine.ENGINE_NAME)
                    .optOption("trainParam", createModel ? "true" : "false")
                    .build();
    embedding = criteria.loadModel();
    SequentialBlock classifier =
            new SequentialBlock()
                    .add(ndList -> {
                        NDArray data = ndList.singletonOrThrow();
                        long maxLen = data.getShape().get(1);
                        NDList inputs = new NDList();
                        inputs.add(data.toType(DataType.INT64, false));
                        inputs.add(data.getManager().full(data.getShape(), 1, DataType.INT64));
                        inputs.add(
                                data.getManager()
                                        .arange(maxLen)
                                        .toType(DataType.INT64, false)
                                        .broadcast(data.getShape()));
                        return inputs;
                    })
                    .add(embedding.getBlock())
                    .add(Linear.builder().setUnits(768).build())
                    .add(Activation::relu)
                    .add(Dropout.builder().optRate(0.01f).build())
                    .add(Linear.builder().setUnits(5).build())
                    .addSingleton(nd -> nd.get(":,0"));
    model = Model.newInstance("ReviewRatingClassifier");
    model.setBlock(classifier);
    if (!createModel) {
        model.load(Paths.get("build/model"), "ReviewRatingClassifier");
        // embedding.load(Paths.get("build/model"), "ReviewRatingClassifierEmbedding");
    }
    System.out.println("MODEL LOADED PROPERLY\nREADY FOR INFERENCING OR TRAINING...");
}
```

The createModel flag would be true when training and false when testing.
-
Thanks for your answers. Doing this works, and I get the same results with the reloaded model as with the original one. Still, there is a point that I don't really like: I must know the precise architecture of the model to recreate it. Is there a way to serialize the structure, to have something more generic for reloading the model?
So this is actually a big question within the design of deep learning frameworks, typically referred to as imperative vs. symbolic. You always need to know the architecture of a model to recreate it. In a symbolic framework like the original TensorFlow, the architecture is a data structure, so the framework just serializes that data structure. The downside is that this only works if everything in the model (including the logic, functions, and control flow) is in the framework.
In comparison, an imperative framework views a model as code. This is what DJL does, so a model is arbitrary Java code. In that sense, I don't know of any official way to save this, outside of maybe Java byte code. Inste…
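As a rough plain-Java sketch of the symbolic alternative (this is not a DJL API; `LayerSpec` and the spec-string format are invented for illustration): when the architecture is restricted to a closed vocabulary of layer descriptors, the architecture itself can be written to a file and rebuilt generically, which is exactly the flexibility a code-as-model design gives up.

```java
import java.util.ArrayList;
import java.util.List;

public class SymbolicSketch {
    // A closed vocabulary of layers: this is what makes generic reload possible,
    // and also what limits a purely symbolic framework.
    record LayerSpec(String type, int units) {}

    // "Serialize" the architecture as a simple text form, e.g. "linear:768;linear:5;".
    static String toSpecString(List<LayerSpec> arch) {
        StringBuilder sb = new StringBuilder();
        for (LayerSpec l : arch) {
            sb.append(l.type()).append(':').append(l.units()).append(';');
        }
        return sb.toString();
    }

    // Rebuild the architecture from the spec alone: no hand-written model code needed.
    static List<LayerSpec> fromSpecString(String spec) {
        List<LayerSpec> arch = new ArrayList<>();
        for (String part : spec.split(";")) {
            if (part.isEmpty()) {
                continue;
            }
            String[] kv = part.split(":");
            arch.add(new LayerSpec(kv[0], Integer.parseInt(kv[1])));
        }
        return arch;
    }

    public static void main(String[] args) {
        List<LayerSpec> arch =
                List.of(new LayerSpec("linear", 768), new LayerSpec("linear", 5));
        String spec = toSpecString(arch);
        List<LayerSpec> rebuilt = fromSpecString(spec);
        System.out.println(rebuilt.equals(arch)); // prints true
    }
}
```

An arbitrary Java lambda inside a block (like the input-preprocessing lambda in the code above) has no such descriptor, which is why it cannot be serialized this way.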