-
Hello everyone, I am following the TrainAmazonReviewRanking example (https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/training/transferlearning/TrainAmazonReviewRanking.java). I have trained my model, which produced a params file, and I now want to reload the model from this file, but I haven't found anything very relevant on how to do it. This is what I tried:

```scala
val reloaded = Model.newInstance("reloaded")
reloaded.setBlock(new SequentialBlock())
reloaded.load(Path.of(outputDir), "AmazonReviewRatingClassification", Map("epoch" -> epochs.toString).asJava)
```

Should I reload the BERT model, re-add the layers, and then reload the file using Model.load, or is there a way to load everything at once? Alternatively, would there be a way to export a .pt file instead? I haven't found a lot on the .params files; is there some documentation I should read? Thanks in advance.

François
-
The params file contains only parameters, not the model structure. So rather than just a placeholder block, you should add a block constructed the same way the model was constructed during training. You can see an example of saving and loading in our tutorial (http://docs.djl.ai/docs/demos/jupyter/tutorial/03_image_classification_with_your_model.html). Our model saving uses DJL's own format (which is also engine agnostic), and we don't support saving into other formats like .pt.
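To illustrate the idea (this is a plain-Java sketch of the concept, not DJL's actual file format or API): with parameters-only persistence, the layer structure lives in code and must be rebuilt identically before loading, while only the numeric weights travel through the file. All names here (`ParamsOnlyDemo`, `buildBlock`) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ParamsOnlyDemo {
    // Hypothetical "block": the structure (number and size of layers) is code, not data.
    static List<float[]> buildBlock() {
        List<float[]> layers = new ArrayList<>();
        layers.add(new float[4]);
        layers.add(new float[2]);
        return layers;
    }

    // Save only the raw weights: no shapes, no layer types, no graph.
    static void save(List<float[]> params, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (float[] layer : params) {
                for (float w : layer) {
                    out.writeFloat(w);
                }
            }
        }
    }

    // Loading only works if the receiving structure matches the one used to save.
    static void load(List<float[]> params, Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            for (float[] layer : params) {
                for (int i = 0; i < layer.length; i++) {
                    layer[i] = in.readFloat();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("weights", ".params");
        List<float[]> trained = buildBlock();
        trained.get(0)[0] = 3.14f; // pretend training changed a weight
        save(trained, file);

        List<float[]> reloaded = buildBlock(); // must rebuild the SAME structure in code
        load(reloaded, file);
        System.out.println(reloaded.get(0)[0]); // prints 3.14
    }
}
```

This is why a placeholder `SequentialBlock` cannot work: the file carries no information about how many layers there are or what they look like.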
-
Hi Zach,
So would this be the correct way?

```java
public void buildModel(boolean createModel)
        throws ModelNotFoundException, MalformedModelException, IOException,
                TranslateException {
    // MXNet base model
    String modelUrls = "https://resources.djl.ai/test-models/distilbert.zip";
    if ("PyTorch".equals(Engine.getInstance().getEngineName())) {
        modelUrls = "https://resources.djl.ai/test-models/traced_distilbert_wikipedia_uncased.zip";
    }
    // Building the model from URL
    Criteria<NDList, NDList> criteria =
            Criteria.builder()
                    .optApplication(Application.NLP.WORD_EMBEDDING)
                    .setTypes(NDList.class, NDList.class)
                    .optModelUrls(modelUrls)
                    .optProgress(new ProgressBar())
                    .optEngine(PtEngine.ENGINE_NAME)
                    .optOption("trainParam", createModel ? "true" : "false")
                    .build();
    embedding = criteria.loadModel();
    SequentialBlock classifier =
            new SequentialBlock()
                    .add(ndList -> {
                        NDArray data = ndList.singletonOrThrow();
                        long maxLen = data.getShape().get(1);
                        NDList inputs = new NDList();
                        inputs.add(data.toType(DataType.INT64, false));
                        inputs.add(data.getManager().full(data.getShape(), 1, DataType.INT64));
                        inputs.add(
                                data.getManager()
                                        .arange(maxLen)
                                        .toType(DataType.INT64, false)
                                        .broadcast(data.getShape()));
                        return inputs;
                    })
                    .add(embedding.getBlock())
                    .add(Linear.builder().setUnits(768).build())
                    .add(Activation::relu)
                    .add(Dropout.builder().optRate(0.01f).build())
                    .add(Linear.builder().setUnits(5).build())
                    .addSingleton(nd -> nd.get(":,0"));
    model = Model.newInstance("ReviewRatingClassifier");
    model.setBlock(classifier);
    if (!createModel) {
        model.load(Paths.get("build/model"), "ReviewRatingClassifier");
        // embedding.load(Paths.get("build/model"), "ReviewRatingClassifierEmbedding");
    }
    System.out.println("MODEL LOADED PROPERLY\nREADY FOR INFERENCING OR TRAINING...");
}
```

The createModel flag would be true when training and false when testing.
-
Thanks for your answers. Doing this works, and I get the same results with the reloaded model as with the original one. Still, there is a point that I don't really like: I must know the precise architecture of the model to recreate it. Is there a way to serialize the structure, to have something more generic for reloading the model?
So this is actually a big question within the design of deep learning frameworks, typically referred to as imperative vs. symbolic. You always need to know the architecture of a model to recreate it. In a symbolic framework like the original TensorFlow, the architecture is a data structure, so the framework just serializes that data structure. The downside is that this only works if everything in the model (including the logic, functions, and control flow) is in the framework.
In comparison, an imperative framework views a model as code. This is what DJL does, so a model is arbitrary Java code. In that sense, I don't know of any official way to save this, outside of maybe Java byte code. Inste…
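As a rough plain-Java sketch of the symbolic alternative (this is not a DJL API; `LayerSpec` and the spec-string format are invented for illustration): when the architecture is restricted to a closed vocabulary of layer descriptors, the architecture itself can be written to a file and rebuilt generically, which is exactly the flexibility a code-as-model design gives up.

```java
import java.util.ArrayList;
import java.util.List;

public class SymbolicSketch {
    // A closed vocabulary of layers: this is what makes generic reload possible,
    // and also what limits a purely symbolic framework.
    record LayerSpec(String type, int units) {}

    // "Serialize" the architecture as a simple text form, e.g. "linear:768;linear:5;".
    static String toSpecString(List<LayerSpec> arch) {
        StringBuilder sb = new StringBuilder();
        for (LayerSpec l : arch) {
            sb.append(l.type()).append(':').append(l.units()).append(';');
        }
        return sb.toString();
    }

    // Rebuild the architecture from the spec alone: no hand-written model code needed.
    static List<LayerSpec> fromSpecString(String spec) {
        List<LayerSpec> arch = new ArrayList<>();
        for (String part : spec.split(";")) {
            if (part.isEmpty()) {
                continue;
            }
            String[] kv = part.split(":");
            arch.add(new LayerSpec(kv[0], Integer.parseInt(kv[1])));
        }
        return arch;
    }

    public static void main(String[] args) {
        List<LayerSpec> arch =
                List.of(new LayerSpec("linear", 768), new LayerSpec("linear", 5));
        String spec = toSpecString(arch);
        List<LayerSpec> rebuilt = fromSpecString(spec);
        System.out.println(rebuilt.equals(arch)); // prints true
    }
}
```

An arbitrary Java lambda inside a block (like the input-preprocessing lambda in the code above) has no such descriptor, which is why it cannot be serialized this way.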