ONNX Model Hub Proposal #455

mhamilton723 · 2021-08-04T16:49:03Z

ONNX Model Hub Proposal

Why:

Our goal is to create an experience similar to the torch model hub, which lets users load a model with a single line:

import torch
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')

We propose an API which looks like the following:

import onnx
# Load a model from the main onnx/models repo
model = onnx.hub.load('resnet50')

# Load from a custom repo or pin a commit hash ('owner/repo:ref')
model2 = onnx.hub.load('resnet50', 'onnx/models:2jd82019dj23')

This dramatically simplifies user interaction with the models hosted in the current ONNX model zoo. It also lets the ONNX community simplify and automate model sharing with minimal overhead by leveraging the models already stored in Git LFS.
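The second argument in the proposed call packs a repository and an optional ref into one string. A minimal sketch of how a client might split it is shown here; the `RepoSpec` type, the `parse_repo_spec` helper, and the fallback to a `main` ref are assumptions for illustration and are not part of the proposal.

```python
from typing import NamedTuple


class RepoSpec(NamedTuple):
    """Hypothetical container for the parts of an 'owner/name[:ref]' string."""
    owner: str
    name: str
    ref: str


def parse_repo_spec(spec: str = "onnx/models") -> RepoSpec:
    """Split 'owner/name[:ref]' into its parts.

    The ref may be a branch, tag, or commit hash. Falling back to 'main'
    when no ref is given is an assumption made for this sketch.
    """
    repo, _, ref = spec.partition(":")
    owner, sep, name = repo.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name[:ref]', got {spec!r}")
    return RepoSpec(owner, name, ref or "main")
```

Pinning a commit hash in the spec is what would give production users a reproducible download.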

Design:

The current ONNX model repository has settled on Git LFS for model storage and download. Within this repository, models are organized by type and application domain and have a roughly uniform structure, with “.onnx” models hosted within the GitHub repo and available for download. Our goal is to align as closely as possible with the current Git LFS based system to enable versioned model downloads, adding only a manifest to the ONNX model repo:

ONNX_HUB_MANIFEST.json:

[
   {
      "name":"resnet50",
      "path":"vision/classification/resnet/model/resnet50-v2-7.tar.gz",
      "metadata":{
         "domain":"vision/classification",
         "checksum":"s872her8dj209274n",
         "foo":"bar"
      }
   },
   {
      "name":"arcface",
      "path":"...",
      "metadata":{
         "domain":"vision/face",
         "foo":"bar"
      }
   }
]

The client code will look for ONNX_HUB_MANIFEST.json at the specified GitHub location and use this file to:

support queries on which models are available, informed by metadata domain tags (if available)
download the model from the relative GitHub path (a relative path allows version-specific downloading and deterministic behavior for production systems)
optionally verify the checksum using the metadata
cache downloaded models locally in a configurable location, with parameters to override the cache
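The first two client responsibilities above (querying by domain and resolving a download location) can be sketched as follows. The helper names `list_models` and `download_url` are hypothetical, and the `https://github.com/<owner>/<repo>/raw/<ref>/<path>` URL layout is an assumption about how GitHub-hosted files would be addressed, not something the proposal specifies.

```python
import json

# A trimmed copy of the example manifest; "..." is the elided path from the proposal.
MANIFEST_TEXT = """[
  {"name": "resnet50",
   "path": "vision/classification/resnet/model/resnet50-v2-7.tar.gz",
   "metadata": {"domain": "vision/classification"}},
  {"name": "arcface", "path": "...", "metadata": {"domain": "vision/face"}}
]"""

manifest = json.loads(MANIFEST_TEXT)


def list_models(entries, domain=None):
    """Return model names, optionally filtered by a domain prefix.

    Entries without a domain tag are only returned when no filter is given.
    """
    names = []
    for entry in entries:
        entry_domain = entry.get("metadata", {}).get("domain", "")
        if domain is None or entry_domain.startswith(domain):
            names.append(entry["name"])
    return names


def download_url(entry, owner, repo, ref):
    """Build a download URL from the manifest's relative path (assumed layout)."""
    return f"https://github.com/{owner}/{repo}/raw/{ref}/{entry['path']}"
```

Because the path is relative and the ref is part of the URL, the same manifest entry resolves to a fixed file for any given commit.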

The official ONNX hub manifest file will be checked for correctness by a CI/CD gate in the onnx/models repository, which:

Checks that links are live
Optional: downloads each model and verifies its checksum
Optional: ensures model metadata is neat and falls into a reasonable ontology
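As an illustration of what such a gate might run before any network checks, here is a small structural validator. The specific rules (non-empty unique names, relative paths, dictionary metadata) and the `validate_manifest` name are assumptions chosen for this sketch; the link-liveness and checksum steps are deliberately left out because they need network access.

```python
def validate_manifest(entries):
    """Return a list of human-readable problems found in manifest entries.

    An empty list means the manifest passed these (assumed) structural rules.
    """
    errors = []
    seen_names = set()
    for index, entry in enumerate(entries):
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            errors.append(f"entry {index}: missing or empty 'name'")
        elif name in seen_names:
            errors.append(f"entry {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)

        path = entry.get("path")
        if not isinstance(path, str) or not path or path.startswith("/"):
            errors.append(f"entry {index}: 'path' must be a non-empty relative path")

        # Metadata is optional, but when present it must be a JSON object.
        if not isinstance(entry.get("metadata", {}), dict):
            errors.append(f"entry {index}: 'metadata' must be an object")
    return errors
```

A CI job could fail the build whenever the returned list is non-empty and print each message.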

FAQ:

Will we need to re-host the models somewhere other than Git LFS?

No

Will we break production users by introducing breaking changes to the manifest?

No, production users can depend on a commit hash or git tag directly, or host their own hub.

Will users be able to host their own ONNX hubs?

Yes. By copying the manifest format they can turn their own repos into hubs, which allows enterprises to use this API on their own model collections.

How will this work in Synapse’s Data Exfiltration Protection (DEP) mode?

Synapse can host a fork of ONNX model hub within the DEP layer to allow DEP customers to use MMLSpark’s upcoming ONNX on Spark transformer with a private copy of the model hub. Maintenance of this fork is handled via git pulls.

Is this secure?

This is as secure as the current practice of downloading via the GitHub link (as that is exactly what the client will do). Checksums provide an additional level of security.
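A hedged sketch of the optional checksum step follows. The proposal does not name a hash algorithm, so SHA-256 is an assumption here, as are the `sha256_hex` and `verify_checksum` helper names.

```python
import hashlib
import io
from typing import BinaryIO, Optional


def sha256_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(stream: BinaryIO, expected: Optional[str]) -> bool:
    """Return True when the stream matches the expected hex digest.

    A missing checksum is accepted, because the manifest treats it as optional.
    """
    if expected is None:
        return True
    return sha256_hex(stream) == expected.lower()


# In-memory stand-in for a downloaded model file.
payload = io.BytesIO(b"fake model bytes")
expected_digest = hashlib.sha256(b"fake model bytes").hexdigest()
assert verify_checksum(payload, expected_digest)
```

A client would typically refuse to load, or delete, a cached file whose digest does not match the manifest.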

Is this performant?

GitHub downloads are backed by a CDN, so they should be fast across the globe.

Is this download functionality limited to Python?

No, any client can read the manifest and duplicate this very simple download, cache, and load logic.
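To show how little logic the caching part needs, here is a minimal Python sketch that another client could mirror. The `ONNX_HUB_CACHE` environment variable, the default cache directory, and the `cached_fetch` helper are all hypothetical; the `fetch` callable stands in for the actual download.

```python
import os
from pathlib import Path
from typing import Callable


def default_cache_dir() -> Path:
    """Pick a cache directory; the env var name and fallback are assumptions."""
    fallback = Path.home() / ".cache" / "onnx_hub"
    return Path(os.environ.get("ONNX_HUB_CACHE", fallback))


def cached_fetch(
    cache_dir: Path,
    ref: str,
    rel_path: str,
    fetch: Callable[[], bytes],
    force_reload: bool = False,
) -> Path:
    """Return the local path of a model, downloading only when needed.

    Files are keyed by ref and manifest-relative path, so different
    commits of the same model never overwrite each other.
    """
    target = cache_dir / ref / rel_path
    if force_reload or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch())
    return target
```

The returned path can then be handed to whatever model loader the client language provides.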

Why metadata?

The metadata fields hold optional information that is not required for the basic functionality of downloading models. The exact metadata used by the various clients can be discussed in this proposal. A simple starting point might include a “domain” tag to help organize the models and a “checksum” tag for increased security. We encourage discussion of what metadata should be included in the official hub at onnx/models.

Example Metadata

Here is a (noncomprehensive) list of metadata that can be attached to each model in the manifest:

Category	Name	Type	Note
Metrics	Top1-Accuracy	Double
	Top5-Accuracy	Double
	AUC	Double
	Objective	String	Classification, Regression, Forecast, Cluster, etc.
	Domain	String	vision, text, etc.
Inputs and outputs	Inputs	Map[String, NodeInfo]	This could allow models within a domain to have the same default API
	Outputs	Map[String, NodeInfo]	This could allow models within a domain to have the same default API
	PreProcessing	String	Instruction for preprocessing, such as input normalization.
	PostProcessing	String	Instruction for postprocessing, such as softmax.
Reproducibility	TrainingData	URL
	TestingData	URL
Interpretability	Notebook/Dashboard	URL	Link to interpretability notebook, widgets or dashboards
Fairness	Demographic Parity
	Equalized Odds Ratio

Relevant Model Zoo feature areas

Which area in the Model Zoo infrastructure does this impact?
Feature Area (e.g. CI, file organization, model storage, other): other

Notes:

Microsoft ML for Apache Spark plans to replace its CNTK on Spark inference stack with a more modern ONNX inference stack. As part of this work, they hope to replace their current CNTK model repository with a more broadly applicable solution. Instead of hosting their own model repository, they hope to align with the ONNX community and design this bit of infrastructure for general-purpose consumption both inside and outside of the Apache Spark ecosystem.

wenbingl · 2021-08-10T22:34:45Z

That's great! Looking forward to the PR, which may not be in this repo.

askhade · 2021-08-19T15:54:19Z

Thanks for the proposal and the PR.
Wanted to add some details around Git LFS... Recently we have been hitting quota limits in Git LFS more often. This prevents model downloads until we increase the quota, and it is disruptive. Although at this moment we don't have any plans for moving away from Git LFS, it is possible we may explore other solutions in the future. In light of this, we should make sure the hub API is not tightly coupled with Git LFS and that changing the backend (where we host the models) is fairly easy.

GeorgeS2019 · 2022-01-12T07:03:06Z

The Readme.md of this site has not been updated, especially with regard to the contributed PRs on STT/ASR and TTS.

An up-to-date ONNX Model Hub is urgently needed by the ONNX community.

This issue is a tentative site to collect various ONNX Zoo sites to accelerate the sharing of ONNX models (ONNX seems to be a success as a standard!).

jcwchen · 2022-02-28T21:23:46Z

Thank you @mhamilton723 for the contribution: the ONNX Model Hub is now included in the recent ONNX 1.11 release. I think we can close this issue now.

mhamilton723 added the enhancement label Aug 4, 2021

This was referenced Aug 13, 2021

Add ONNX Hub Manifest and Generation Code #458

Merged

Add ONNX model hub python client onnx/onnx#3663

Merged

prasanthpul mentioned this issue Sep 2, 2021

Model Zoo API #355

Closed

wenbingl pinned this issue Sep 10, 2021

da-liii mentioned this issue Jan 11, 2022

Discussion: A Rikai Model Zoo eto-ai/rikai#481

Open

9 tasks

jcwchen closed this as completed Feb 28, 2022

jcwchen unpinned this issue Mar 10, 2022
