-
Notifications
You must be signed in to change notification settings - Fork 3
New Machine Learning Model Extension Version 2.0.alpha schema and (de)serialization, validation package #2
Conversation
@rbavery |
Hey @fmigneault happy new year! Here's an update on progress with the stac_model validator and updates to the extension. I've made some substantial updates and looking to wrap this PR up early next week. I've made some extensive edits to the schema for more tasks and using common metadata objects where possible (Statistics, Raster, Asset Object). The hackmd for this README is at https://hackmd.io/@cHP95b4sTDWQdP7uy1Vv7A/rkneCaru6 If you have any time to comment on it in the coming weeks I'd much appreciate it! You can also refer to this Changelog where I've tried to keep track of some substantial updates. I'll need to make some further changes to make sure this complies with STAC Extension guidelines before this is ready to merge.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ rbavery for this first draft!
I didn't dig too much in the code (yet) since I felt there was already enough items to discuss about from the spec itself.
I'm guessing most of the generic files under stac_model
such as editorconfig, changes, etc. would be moved at the root? Is this only temporary while working on it from another source repo copy?
README.md
Outdated
|
||
- Examples: | ||
- [Example with a UNet trained with thelper](examples/item.json) | ||
- [Example with a ??? trained with torchgeo](examples/item.json) TODO update example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previously mentioned UNet
is what the contents of https://github.com/crim-ca/dlm-extension/blob/main/examples/model-arch-summary.txt refers to.
This is based on https://github.com/nyoki-mtl/pytorch-segmentation/blob/master/src/models/decoder.py that uses a generic pytorch model.
If you can provide a more relevant model with torchgeo
, then I'm fine with updating this reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! I'd like to use torchgeo for the example since I think it is the most popular framework specific to EO for training deep learning models. They have pretrained models on EO datasets, Sentinel-2, Landsat, and they intend to host others.
I've been working with their 13 band pretrained Sentinel-2 Resnet model and was thinking of proposing this as the new example: https://torchgeo.readthedocs.io/en/stable/tutorials/trainers.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree to use the more popular torchgeo. Just need to update the example with the corresponding output. The Sentinel-2 Resnet is good. I've been using it also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The https://github.com/rbavery/dlm-extension/blob/validate/stac_model/example.json seems appropriate. Is there a persistent location where the .pt
could be hosted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huggingface is down now but I'll upload the Torchscript version of https://huggingface.co/torchgeo/resnet18_sentinel2_all_moco/ I've been working with that corresponds to the example.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coming back to this, I can upload this after remaking a notebook demo with the new schema and a new version of stac_model
Yes should have specified but let's focus on discussing the spec! I can move this to root once it is finalized. |
…t or as separate inputs
…array for channelwise norm
…type and how to name model json by unique mlm:name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmigneault here's my review from rbavery#2, it's looking really comprehsensive. Looking forward to using this extension!
README.md
Outdated
| `detection` | `detection` | Generic detection of the "presence" of objects or entities, with or without positions. | | ||
| `object-detection` | *n/a* | Task corresponding to the identification of positions as bounding boxes of object detected in the scene. | | ||
| `segmentation` | `segmentation` | Generic tasks that regroups all types of segmentations tasks consisting of applying labels to pixels. | | ||
| `semantic-segmentation` | *n/a* | Specific segmentation task where all pixels are attributed labels, without consideration of similar instances. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `semantic-segmentation` | *n/a* | Specific segmentation task where all pixels are attributed labels, without consideration of similar instances. | | |
| `semantic-segmentation` | *n/a* | Specific segmentation task where all pixels are attributed labels, without consideration for segments as unique objects. | |
- `MXNet` | ||
- `Keras` | ||
- `Caffe` | ||
- `Weka` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should Weka
be listed? I've never heard of it or seen it in the wild.
Here's a suggested reordering based on my subjective interpretation of current popularity + longevity.
I also added rgee
and spatialRF
to showcase some R options. especially in academia, lots of folks use R, particularly random forest models for semantic segmentation.
I removed Caffe (no updates in 4 years) and MxNet (archived last year). I don't think anyone will publish models for these frameworks.
Removed ONNX since it isn't a training framework and I think the purpose of this field is to describe the framework used to train the model. this might be different than the inference runtime and format.
- `Weka` | |
- `PyTorch` | |
- `TensorFlow` | |
- `Scikit-learn` | |
- `Huggingface` | |
- `Keras` | |
- `rgee` | |
- `spatialRF` | |
- `JAX` | |
- `PyMC` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weka is not as common as others, but I have seen it once or twice. I prefer to have the option than not.
I think it is a good idea to include ONNX. Remember that MLM is not for training only. It can be used for inference runtime.
I don't think popularity or older frameworks should be ignored. People might want to support older algorithms that work well.
Will add the R-based references, those are good examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I'm fine to include Weka then and include the old frameworks.
I think it is a good idea to include ONNX. Remember that MLM is not for training only. It can be used for inference runtime.
I totally agree (in fact I don't think MLM is a good fit for describing how to train a model), but we call out that this framework
field should denote the framework used for training in the table. ONNX isn't used to train ML models. Instead, that can be denoted in the asset details. I f we include it, someone could get confused and not supply the actual framework used for training.
README.md
Outdated
- `wrap-fill-outliers` | ||
- `wrap-inverse-map` | ||
|
||
See [OpenCV - Normalization Flags](https://docs.opencv.org/4.x/d2/de8/group__core__array.html#ga87eef7ee3970f86906d69a92cbf064bd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this reference to normalization flags needs to be switched with the earlier reference to interpolation/resize methods.
long term I'd be interested in picking a different reference than OpenCV's C++ documentation, since the OpenCV lib is lower level than most folks encounter and the docs are a bit hard to follow (python programmer might get confused with the C data types for example). But I think this is better than us rolling our own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
Ideally, MLM would have its own listing and description (with OpenCV or other libs as references for corresponding implementations), but I didn't feel like replicating the formulas provided by OpenCV here. Some normalizations have very subtle differences, so it is not really trivial to describe properly instead of simply showing the actual formula.
I'm open to other references, but unless we got them really quickly and exhaustively, I'd leave them to a follow-up PR not to delay this one further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes let's do this later, no need to block this
README.md
Outdated
|
||
| Artifact Type | Description | | ||
|--------------------|--------------------------------------------------------------------------------------------------------------------------| | ||
| `torch.compile` | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `torch.compile` | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html). | | |
| `.pt2` | A model artifact obtained using [Pytorch's AOTInductor](https://pytorch.org/docs/main/torch.compiler_aot_inductor.html). | |
I think this is a necessary edit since torch.compile is an API for compiling pytorch nn.Modules. it's used to speed up Pytorch code in general, for training or inference. Many of the backend internals that make torch.compile work are used in AOTInductor (the tool that creates compiled model artifacts) but they aren't the same thing and I think here we want to refer to artifacts produced by AOTInductor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some reference mentions of AOTInductor and torch.export
. Let me know what you think of the edit.
I must say, though, I'm not 100% convinced about these definitions included in the spec (or rather, how explicit to be about them). They are still fairly prone to change soon. Even the doc mentions that many terms are used interchangeably.
https://pytorch.org/docs/main/torch.compiler.html#torch-compiler
In some cases, the terms torch.compile, TorchDynamo, torch.compiler might be used interchangeably in this documentation.
There is also a lot of ambiguity about torch.export
using TorchDynamo under the hood, but then, torch.compile
accepts backend='inductor'
for TorchInductor (https://pytorch.org/docs/main/torch.compiler.html#torch-compiler), although they are also listed as distinct technologies under torch.compiler
. And then, there is torch.export.save
/torch.export.save
for .pt2
(https://pytorch.org/docs/main/export.html#serialization) as parallel to the torch.save
/torch.load
for .pt
.
I do not want to make the maintenance of the spec a burden for any new subtle change from pytorch. It's a fast-evolving ecosystem. Since this is an open field, I fear it is getting slightly too specific for this framework.
README.md
Outdated
| Artifact Type | Description | | ||
|--------------------|--------------------------------------------------------------------------------------------------------------------------| | ||
| `torch.compile` | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html). | | ||
| `torch.jit.script` | A model artifact obtained by [`TorchScript`](https://pytorch.org/docs/stable/jit.html). | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `torch.jit.script` | A model artifact obtained by [`TorchScript`](https://pytorch.org/docs/stable/jit.html). | | |
| `torchscript` | A model artifact obtained by [`TorchScript Scripting`](https://pytorch.org/docs/stable/jit.html) and/or [`TorchScript Tracing`](https://pytorch.org/docs/stable/generated/torch.jit.trace.html). | |
There are two types of graph capture in torchscript, trace and script. I think we can either enumerate both or leave only one option. Either traced or scripted models can be loaded the same way so I favor only one field for both of them.
Somewhat confusingly, both can also be used together though this is not common. https://ppwwyyxx.com/blog/2022/TorchScript-Tracing-vs-Scripting/
Since Torchscript is bring phased out in favor of AOTInductor, I think we shouldn't make this too complex and only provide them as one field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to other comment about the export. I'm not 100% convinced about this. I got somewhat mixed feelings about it.
I find that torchscript
might just make it more ambiguous by not being explicit (about the actual function) whether it refers to torch.jit.trace
or torch.jit.script
, but then, it might also just be a technicality about pytorch that we should omit for MLM since this table is already extremely PyTorch-opinionated.
Also, I would like to remind that, even if things are being phased out, it does not mean we can ignore them. Some code/artifacts exists already, and users might what to represent them with the corresponding old technology.
@rbavery I also thought a bit more about the media-type for the artifact, and I think it is better to guide users in a single direction than multiple combinations. |
Almost ready to merge! |
Here's a python package for validation, import, and export of the stac model metadata standard. I'm thinking this should be named stac-model on PYPI?
This uses the old model metadata standard I have been using and ports my pydantic code. I'll update this to get it in-line with the DLM spec and then it'll be ready for a code review.
I've linked everything in the README assuming this will be transferred to the stac-extensions repo pretty soon, which I think will help with visibility and adoption.
cc @fmigneault-crim