fix JSON schema with MLM fields + support pydantic/pystac objects #2

fmigneault · 2024-03-28T19:56:54Z

PR just to demonstrate the changes from crim-ca#2

rbavery · 2024-03-30T00:41:17Z

@fmigneault awesome thanks. I can review this next week when it is ready.

fmigneault · 2024-03-30T00:54:21Z

@rbavery
I think I have addressed most of the editorial items.
I'll work on the JSON-schema definition to reflect the descriptions at the start of next week.

fmigneault · 2024-04-04T05:11:31Z

@rbavery Almost there with the JSON schema. Only a few definitions for `mlm:output` and the MLM roles are left to implement. I also made a lot of adjustments to the README to move sections closer to where their fields are first mentioned. There was a lot of back and forth between Item and Asset fields when following links.

rbavery

@fmigneault this looks great. Some requests to edit the documentation and revised schema

rbavery · 2024-04-09T18:30:58Z

README.md

+| `detection`             | `detection`                 | Generic detection of the "presence" of objects or entities, with or without positions.                          |
+| `object-detection`      | *n/a*                       | Task corresponding to the identification of positions as bounding boxes of object detected in the scene.        |
+| `segmentation`          | `segmentation`              | Generic tasks that regroups all types of segmentations tasks consisting of applying labels to pixels.           |
+| `semantic-segmentation` | *n/a*                       | Specific segmentation task where all pixels are attributed labels, without consideration of similar instances.  |


Suggested change

-| `semantic-segmentation` | *n/a* | Specific segmentation task where all pixels are attributed labels, without consideration of similar instances. |
+| `semantic-segmentation` | *n/a* | Specific segmentation task where all pixels are attributed labels, without consideration for segments as unique objects. |

rbavery · 2024-04-09T18:32:50Z

README.md

+such a model that produces pixel-wise "classifications" should be attributed the `segmentation` task
+(and more specifically `semantic-segmentation`) rather than `classification`. To avoid this kind of ambiguity,
+it is strongly recommended that `tasks` always aim to provide the most specific definitions possible to explicitly
+describe what the model accomplishes.


great explanation here!
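
To make the "most specific task" recommendation concrete, here is a minimal sketch of a check that flags generic task names listed without a more specific companion. The `SPECIFIC_VARIANTS` mapping and the `generic_only_tasks` helper are hypothetical names, not part of the extension or of `stac_model`. The generic/specific pairing is an assumption drawn only from the four table rows quoted earlier in this review.

```python
# Hypothetical mapping from a generic task to its more specific variants.
# Assumption: only the pairs visible in the quoted README table are covered.
SPECIFIC_VARIANTS = {
    "detection": {"object-detection"},
    "segmentation": {"semantic-segmentation"},
}


def generic_only_tasks(tasks):
    """Return generic tasks that appear without any of their specific variants."""
    listed = set(tasks)
    return sorted(
        generic
        for generic, variants in SPECIFIC_VARIANTS.items()
        if generic in listed and not (variants & listed)
    )


# A model described only as "segmentation" would be flagged as too generic.
print(generic_only_tasks(["segmentation"]))
# Adding the specific task removes the flag.
print(generic_only_tasks(["segmentation", "semantic-segmentation"]))
```

A check like this would presumably only warn rather than fail, since the README text recommends specificity without requiring it.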

rbavery · 2024-04-09T19:13:32Z

README.md

+- `MXNet`
+- `Keras`
+- `Caffe`
+- `Weka`


Should Weka be listed? I've never heard of it or seen it in the wild.

Here's a suggested reordering based on my subjective interpretation of current popularity + longevity.

I also added rgee and spatialRF to showcase some R options. Especially in academia, lots of folks use R, particularly random forest models for semantic segmentation.

I removed Caffe (no updates in 4 years) and MXNet (archived last year). I don't think anyone will publish models for these frameworks.

Removed ONNX since it isn't a training framework, and I think the purpose of this field is to describe the framework used to train the model. This might be different from the inference runtime and format.

Suggested change

-- `Weka`
+- `PyTorch`
+- `TensorFlow`
+- `Scikit-learn`
+- `Huggingface`
+- `Keras`
+- `rgee`
+- `spatialRF`
+- `JAX`
+- `PyMC`
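
As an illustration of how the suggested list could be consumed in code, here is a minimal sketch that matches a framework name against it. `SuggestedFramework` and `match_framework` are hypothetical names and do not come from `stac_model`. The sketch returns `None` for unlisted names instead of raising, on the assumption that the list is a set of recommended values and not a closed set.

```python
from enum import Enum
from typing import Optional


class SuggestedFramework(Enum):
    """Framework names from the suggested list in this review (illustrative only)."""

    PYTORCH = "PyTorch"
    TENSORFLOW = "TensorFlow"
    SCIKIT_LEARN = "Scikit-learn"
    HUGGINGFACE = "Huggingface"
    KERAS = "Keras"
    RGEE = "rgee"
    SPATIAL_RF = "spatialRF"
    JAX = "JAX"
    PYMC = "PyMC"


# Case-insensitive index so "pytorch" and "PyTorch" resolve to the same member.
_BY_LOWERCASE = {member.value.lower(): member for member in SuggestedFramework}


def match_framework(name: str) -> Optional[SuggestedFramework]:
    """Return the matching suggested framework, or None for any other name."""
    return _BY_LOWERCASE.get(name.strip().lower())


print(match_framework("pytorch"))
print(match_framework("Weka"))
```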

rbavery · 2024-04-09T19:14:24Z

README.md


-### Accelerator Enum
+In most cases, this should correspond to common library names of well-established ML frameworks.


Suggested change

-In most cases, this should correspond to common library names of well-established ML frameworks.
+This should correspond to the common library name of the well-established ML framework used to train the model.

rbavery · 2024-04-09T19:20:09Z

README.md

+- `wrap-fill-outliers`
+- `wrap-inverse-map`
+
+See [OpenCV - Normalization Flags](https://docs.opencv.org/4.x/d2/de8/group__core__array.html#ga87eef7ee3970f86906d69a92cbf064bd)


I think this reference to normalization flags needs to be switched with the reference to interpolation.

Long term, I'd be interested in picking a different reference than OpenCV's C++ documentation, since the OpenCV library is lower level than what most folks encounter and the docs are a bit hard to follow (a Python programmer might get confused by the C data types, for example). But I think this is better than us rolling our own.

rbavery · 2024-04-09T19:22:06Z

README.md

+
+#### Model Artifact Media-Type
+
+Not all ML framework, libraries or model artifacts provide explicit media-type. When those are not provided, custom


Suggested change

-Not all ML framework, libraries or model artifacts provide explicit media-type. When those are not provided, custom
+Not all ML framework, libraries or model artifacts provide an explicit media-type. When those are not provided, custom

rbavery · 2024-04-09T19:24:15Z

README.md

+
+This value can be used to provide additional details about the specific model artifact being described.
+For example, PyTorch offers various strategies for providing model definitions, such as Pickle (`.pt`), TorchScript,
+or the compiled approach. Since they all refer to the same ML framework,


Suggested change

-or the compiled approach. Since they all refer to the same ML framework,
+or the upcoming [Ahead-Of-Time Compiled .pt2 format](https://pytorch.org/docs/main/torch.compiler_aot_inductor.html). Since they all refer to the same ML framework,

rbavery · 2024-04-09T19:27:06Z

README.md

+
+| Artifact Type      | Description                                                                                                              |
+|--------------------|--------------------------------------------------------------------------------------------------------------------------|
+| `torch.compile`    | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).  |


Suggested change

-| `torch.compile` | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html). |
+| `.pt2` | A model artifact obtained using [Pytorch's AOTInductor](https://pytorch.org/docs/main/torch.compiler_aot_inductor.html). |

I think this is a necessary edit since `torch.compile` is an API for compiling PyTorch `nn.Module`s. It's used to speed up PyTorch code in general, for training or inference. Many of the backend internals that make `torch.compile` work are used in AOTInductor (the tool that creates compiled model artifacts), but they aren't the same thing, and I think here we want to refer to artifacts produced by AOTInductor.

rbavery · 2024-04-09T19:32:37Z

README.md

+| Artifact Type      | Description                                                                                                              |
+|--------------------|--------------------------------------------------------------------------------------------------------------------------|
+| `torch.compile`    | A model artifact obtained by [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).  |
+| `torch.jit.script` | A model artifact obtained by [`TorchScript`](https://pytorch.org/docs/stable/jit.html).                                  |


Suggested change

-| `torch.jit.script` | A model artifact obtained by [`TorchScript`](https://pytorch.org/docs/stable/jit.html). |
+| `torchscript` | A model artifact obtained by [`TorchScript Scripting`](https://pytorch.org/docs/stable/jit.html) and/or [`TorchScript Tracing`](https://pytorch.org/docs/stable/generated/torch.jit.trace.html). |

There are two types of graph capture in TorchScript: trace and script. I think we can either enumerate both or leave only one option. Either traced or scripted models can be loaded the same way, so I favor only one field for both of them.

Somewhat confusingly, both can also be used together, though this is not common. https://ppwwyyxx.com/blog/2022/TorchScript-Tracing-vs-Scripting/

Since TorchScript is being phased out in favor of AOTInductor, I think we shouldn't make this too complex and only provide them as one field.
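
The "one field for both" idea can be sketched as a small lookup from the export path to the artifact type label proposed in this review. This is only an illustration: `ARTIFACT_TYPE_BY_EXPORT` and `artifact_type` are hypothetical names, the keys are plain strings (nothing here imports or calls PyTorch), and `torch.compile` is deliberately left out because, per the comment above, it does not itself produce a model artifact.

```python
# Hypothetical lookup from how a PyTorch model was exported to the artifact
# type label proposed in this review. Scripted and traced models share one
# label because both are reloaded the same way (with torch.jit.load).
ARTIFACT_TYPE_BY_EXPORT = {
    "torch.jit.script": "torchscript",
    "torch.jit.trace": "torchscript",
    "AOTInductor": ".pt2",
}


def artifact_type(export_method: str) -> str:
    """Return the proposed artifact type label for a given export method."""
    try:
        return ARTIFACT_TYPE_BY_EXPORT[export_method]
    except KeyError:
        # Anything else (for example "torch.compile") has no proposed label here.
        raise ValueError(
            f"no artifact type label proposed for {export_method!r}"
        ) from None


print(artifact_type("torch.jit.script"))
print(artifact_type("torch.jit.trace"))
print(artifact_type("AOTInductor"))
```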

rbavery · 2024-04-09T19:34:01Z

stac_model/output.py

+    data_type: DataType
+
+
+# MLMClassification: TypeAlias = Annotated[


can be deleted

[wip] address PR comments about tasks definitions

67b4688

fmigneault mentioned this pull request Mar 28, 2024

New Machine Learning Model Extension Version 2.0.alpha schema and (de)serialization, validation package crim-ca/dlm-extension#2

Merged

fmigneault-crim added 5 commits March 29, 2024 17:36

apply PR recommendations

efe223b

add best practice details

c79ea01

add yet again more best practices to integrate other STAC extensions

4d765c2

more best practices (relates to stac-extensions/classification#48 and stac-extensions/example-links#4)

4db3b94

adjustments from PR review

669c9a3

add more mlm:accelerator details (relates to https://github.com/crim-ca/dlm-extension/pull/2#discussion_r1538309152)

edcc8a2

fmigneault added 5 commits March 29, 2024 21:08

add details about link releation types

06ee0ef

add details about dimensions and tasks

1a50057

more examples and details

1faf4d9

[wip] updating JSON-schema with MLM fields

501971a

[wip] more updates to JSON schema for MLM definitions

6ec1cd5

fmigneault added 7 commits April 4, 2024 13:14

more schema adjustments

8aca9b3

more details about expected values for dim_order + pretrained flag

ab41765

address incompatibility of 'end_datetime=null' with STAC Core (relates to radiantearth/stac-spec#1268)

be58e86

add mlm:hyperparameters defintion (fixes #14)

8b46388

add example bands and statitics details

2b87297

update pydantic models with new json-schema fields

269bd73

add details & example with 'eo:bands' for special JSON schema consideration (relates to stac-extensions/eo#12)"

03e7e06

fmigneault-crim force-pushed the validate branch from b8bd14b to 03e7e06 on April 4, 2024 23:49

fmigneault added 3 commits April 4, 2024 20:52

update examples working against JSON schema (except check for cross-bands [AnyBandsRef])

2d6c70b

adjust pydantic eurosat_example with json-schema fields

d111678

fix pydantic drop unset fields as intended

4d57e41

fmigneault mentioned this pull request Apr 5, 2024

Roadmap for V2 of the ML Model Extension crim-ca/dlm-extension#7

Closed

20 tasks

fmigneault marked this pull request as ready for review April 5, 2024 04:30

add OmitIfNone reference code

4eb30da

fmigneault changed the title ~~[wip] address PR comments~~ fix JSON schema with MLM fields + support pydantic/pystac objects Apr 5, 2024

fix invalid raster/eo bands/statistics definitions in examples

2155745

devisperessutti reviewed Apr 9, 2024

json-schema/schema.json (outdated, resolved)

update schema title and description

9d14ac6

rbavery merged commit 9d14ac6 into rbavery:validate Apr 9, 2024

rbavery requested changes Apr 9, 2024

fmigneault mentioned this pull request Apr 23, 2024

Roadmap for V2 of the ML Model Extension crim-ca/mlm-extension#4

Closed

21 tasks

rbavery pushed a commit that referenced this pull request May 2, 2024

Merge pull request #2 from rbavery/validate

c3a3a67
