Dataset metadata spec #164
Changes from 14 commits
4ff110f
2980c07
c4b6e94
8dab2ac
9e9414b
431fe02
d32c1e2
c31422e
6224a83
4fef59f
28d25fc
f4ccca6
e7a9e5c
032cf97
89e35a9
e7d7641
6a32aca
af1b16a
a592be6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# STAC Dataset Spec | ||
|
||
[STAC Items](https://github.com/radiantearth/stac-spec/json-spec/) are focused on search within a dataset*. Another topic of interest is the search of datasets, instead of within a dataset. The Dataset Spec is an independent spec that STAC Items are *strongly recommended* to use. Other parties can also independently use this spec to describe datasets in a lightweight way. | ||
|
||
The Datasets Spec is a superset of the [Catalog Spec](../static-catalog/). I shares the same fields and therefore every Dataset is also a valid Catalog. Datasets can have both parent Catalogs and Datasets and child Items, Catalogs and Datasets. | ||
Review comment: I -> It Also not sure if 'is a superset' is the best way to explain it. I'd say more like 'extends the catalog spec'. Probably would add a bit of color, like 'extends the catalog spec with additional fields to describe the set of items in the catalog'. Reply: Both fixed/added. |
||
|
||
A Dataset can be represented in JSON format. Any JSON object that contains all the required fields is a valid STAC Dataset and Catalog. | ||
|
||
* [Example (Sentinel 2)](example-s2.json) | ||
* [JSON Schema](json-schema/dataset.json) | ||
|
||
*\* There is no standardized name for the concept we are describing here. Others called it: dataset series (ISO 19115), collection (CNES, NASA), dataset (JAXA), dataset series (ESA), product (JAXA).* | ||
|
||
## WARNING | ||
|
||
**This is still an early version of the STAC spec; expect some changes before everything is finalized.** | ||
|
||
Implementations are encouraged, however, as good effort will be made to not change anything too drastically. Using the specification now will ensure that needed changes can be made before everything is locked in. So now is an ideal time to implement, as your feedback will be directly incorporated. | ||
|
||
## Dataset fields | ||
|
||
| Element | Type | Description | | ||
| ----------- | ----------------- | ------------------------------------------------------------ | | ||
| name | string | **REQUIRED.** Identifier for the dataset that is unique across the provider. | | ||
| title | string | A short descriptive one-line title for the dataset. | | ||
| description | string | **REQUIRED.** Detailed multi-line description to fully explain the entity. [CommonMark 0.28](http://commonmark.org/) syntax MAY be used for rich text representation. | | ||
| keywords | [string] | List of keywords describing the dataset. | | ||
| version | string | Version of the dataset. [Semantic Versioning (SemVer)](https://semver.org/) SHOULD be followed. | | ||
| license | string | **REQUIRED.** Dataset's license(s) as an [SPDX License identifier or expression](https://spdx.org/licenses/) or `proprietary` if the license is not on the SPDX license list. Proprietary licensed data SHOULD add a link to the license text, see the `license` relation type. | | ||
| provider | [Provider Object] | Data provider, the organizations which influenced the content of the dataset. | | ||
| host | Host Object | Storage provider, the organization that hosts the dataset. | | ||
Review comment: In addition to the 'host' field, I suggest the 'source' field of the same type Host that points at the canonical source of the data. This is needed for 99% of datasets in the EE catalog, as we are mostly a mirror. Reply: Open to add that, but I think we can add that with another PR shortly after this PR. |
||
| extent | [Extent Object] | **REQUIRED.** Spatial and temporal extents. | | ||
| links | [Link Object] | **REQUIRED.** A list of references to other documents. | | ||
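The required fields in the table above can be checked mechanically. The following is a minimal Python sketch, not part of the spec: the helper name `missing_required_fields` and the constant `REQUIRED_DATASET_FIELDS` are hypothetical, and the check only looks at the presence of top-level keys, not at their types.

```python
# Required top-level fields, taken from the "Dataset fields" table.
REQUIRED_DATASET_FIELDS = ("name", "description", "license", "extent", "links")


def missing_required_fields(dataset):
    """Return the required top-level fields that are absent from a Dataset dict."""
    return [field for field in REQUIRED_DATASET_FIELDS if field not in dataset]


# A trimmed-down Dataset, loosely based on the Sentinel-2 example file.
example = {
    "name": "COPERNICUS/S2",
    "description": "Sentinel-2 MSI, Level-1C",
    "license": "proprietary",
    "extent": {
        "spatial": [180.0, -56.0, -180.0, 83.0],
        "temporal": "2015-06-23T00:00:00/",
    },
    "links": [],
}

print(missing_required_fields(example))        # no fields missing
print(missing_required_fields({"name": "x"}))  # everything but "name" is missing
```

For full validation, including types and nested objects, the JSON Schema shipped with the spec is the better tool; this sketch only illustrates which keys are mandatory.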
|
||
### Extent Object | ||
|
||
The object describes the spatio-temporal extents of the dataset. Both spatial and temporal extents are required to be specified. | ||
|
||
**Note:** STAC Datasets try to be compliant with [WFS 3.0](https://github.com/opengeospatial/WFS_FES), but there are still issues to be solved. The WFS specification is in draft state and may change, especially regarding [3D support](https://github.com/opengeospatial/WFS_FES/issues/143) for spatial extents or the handling of [open date ranges](https://github.com/opengeospatial/WFS_FES/issues/155) for temporal extents. Therefore, it is likely that the following fields will change over time. | ||
|
||
| Element | Type | Description | | ||
| -------- | -------- | ------------------------------------------------------------ | | ||
| spatial | [number] | **REQUIRED.** Potential *spatial extent* covered by the dataset. West, north, east, south edges of the spatial extent. Only WGS84 longitude/latitude is supported. The list of four numbers can be extended to six numbers to support a 3D spatial extent. | | ||
| temporal | string | **REQUIRED.** Potential *temporal extent* covered by the dataset. Date/time intervals MUST be formatted according to ISO 8601. Open date ranges are supported by omitting either the start or the end time. Example for data from the beginning of 2019 until now: `2019-01-01T00:00:00Z/`. | | ||
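To illustrate how a client might read an Extent Object, here is a minimal Python sketch. The function names are hypothetical. It assumes the temporal extent is a `start/end` pair of ISO 8601 date-times (durations and other ISO 8601 interval forms are not handled), and it reads "omitting either the start or the end time" as meaning that at least one side must be present, which is an interpretation rather than something the spec states.

```python
from datetime import datetime


def parse_instant(text):
    """Parse an ISO 8601 date-time; a trailing 'Z' is read as UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_temporal_extent(value):
    """Split 'start/end' into a (start, end) tuple; None marks an open side."""
    start, separator, end = value.partition("/")
    if not separator:
        raise ValueError("expected an interval of the form 'start/end'")
    if not start and not end:
        # Assumption: an interval that is open on both sides is rejected.
        raise ValueError("at least one side of the interval must be given")
    return (
        parse_instant(start) if start else None,
        parse_instant(end) if end else None,
    )


def is_valid_spatial_extent(values):
    """Accept a list of four numbers (2D) or six numbers (3D)."""
    return len(values) in (4, 6) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


start, end = parse_temporal_extent("2019-01-01T00:00:00Z/")
print(start, end)  # the end is None because the range is open-ended
```

The sketch does not check coordinate ranges or the order of the edges, since the note above says the extent fields may still change.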
|
||
### Provider Object | ||
|
||
The object provides information about a provider. A provider is any of the organizations that created or processed the content of the dataset and therefore influenced the data offered by this dataset. | ||
Review comment: I think you said somewhere that multiple providers are allowed? If so I'd make it a bit more explicit here. If not then we should provide guidance on which one of the 'any...that created or processed'. Reply: The Provider object provides information about a single provider, but the Dataset object can hold multiple providers in an array. I made that more clear in the provider field. Reply: Makes sense, sounds great. |
||
|
||
| Field Name | Type | Description | | ||
| ---------- | ------ | ------------------------------------------------------------ | | ||
| name | string | **REQUIRED.** The name of the organization or the individual. | | ||
| url | string | Homepage of the provider. | | ||
|
||
### Host Object | ||
|
||
The object provides information about the storage provider hosting the data. | ||
|
||
**Note:** The idea of storage profiles is currently [discussed](https://github.com/radiantearth/stac-spec/issues/148). Therefore, scheme, id and region may be removed from the final spec once this concept id introduced to STAC. | ||
Review comment: typo: id -> is Reply: Fixed. |
||
|
||
| Field Name | Type | Description | | ||
| -------------- | ------- | ------------------------------------------------------------ | | ||
| name | string | **REQUIRED.** The name of the organization or the individual hosting the data. | | ||
| description | string | Detailed description to explain the hosting details. [CommonMark 0.28](http://commonmark.org/) syntax MAY be used for rich text representation. | | ||
| scheme | string | **REQUIRED.** The protocol/scheme used to access the data. Any of: `S3`, `GCS`, `URL`, `OTHER` | | ||
| id | string | **REQUIRED.** Host-specific identifier such as a URL or asset id. | | ||
| region | string | Provider specific region where the data is stored. | | ||
| requester_pays | boolean | `true` if requester pays, `false` if host pays. Defaults to `false`. | | ||
|
||
### Link Object | ||
|
||
This object describes a relationship with another entity. Data providers are advised to be liberal with links. | ||
|
||
| Field Name | Type | Description | | ||
| ---------- | ------ | ------------------------------------------------------------ | | ||
| href | string | **REQUIRED.** The actual link in the format of a URL. Relative and absolute links are both allowed. | | ||
| rel | string | **REQUIRED.** Relationship between the current document and the linked document. See chapter "Relation types" for more information. | | ||
| type | string | MIME-type of the referenced entity. | | ||
|
||
#### Relation types | ||
|
||
The following types are commonly used as `rel` types in the Link Object of a Dataset: | ||
|
||
| Type | Description | | ||
| ------- | ------------------------------------------------------------ | | ||
| self | **REQUIRED.** *Absolute* URL to the dataset file itself. This is required to record where the file can be found online. It is particularly useful in a download package that includes metadata, so that the downstream user knows where the data came from. | | ||
| root | URL to the root [STAC Catalog](../static-catalog/) or Dataset. | | ||
| parent | URL to the parent [STAC Catalog](../static-catalog/) or Dataset. | | ||
| child | URL to a child [STAC Catalog](../static-catalog/) or Dataset. | | ||
| item | URL to a [STAC Item](../json-spec/). | | ||
| license | The license URL for the dataset SHOULD be specified if the `license` field is set to `proprietary`. If there is no public license URL available, it is RECOMMENDED to supplement the STAC catalog with the license text in a separate file and link to this file. | | ||
Review comment: Should we distinguish between links to official license pages vs local copies? Eg, rel="license_copy" Reply: Why should we distinguish them? Don't see a good reason yet and we don't do it for items, too. Reply: Having to create a local license file means that the data provider has not organized their licenses well and may change them later, making local copies obsolete. But this is speculative, so I'm okay with not changing this now. Reply: Okay, great. Please, make sure to open an issue on this one so we don't forget. |
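The link rules in the relation types table can be sketched as a small check. This Python snippet is illustrative only: `check_links` is a hypothetical helper, "absolute" is approximated as "has both a scheme and a host", and the `license` rule is reported as a problem even though the spec only words it as SHOULD.

```python
from urllib.parse import urlparse


def check_links(dataset):
    """Return a list of human-readable problems found in dataset['links']."""
    problems = []
    links = dataset.get("links", [])

    # The 'self' relation is required and must be an absolute URL.
    self_links = [link for link in links if link.get("rel") == "self"]
    if not self_links:
        problems.append("missing required 'self' link")
    for link in self_links:
        parsed = urlparse(link.get("href", ""))
        if not (parsed.scheme and parsed.netloc):
            problems.append("'self' link must be an absolute URL")

    # A proprietary license SHOULD be accompanied by a 'license' link.
    has_license_link = any(link.get("rel") == "license" for link in links)
    if dataset.get("license") == "proprietary" and not has_license_link:
        problems.append("proprietary license SHOULD come with a 'license' link")

    return problems


dataset = {
    "license": "proprietary",
    "links": [{"rel": "self", "href": "./dataset.json"}],
}
for problem in check_links(dataset):
    print(problem)
```

Relative links remain valid for every other relation type, so the sketch deliberately inspects only `self`.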
||
|
||
## Extensions | ||
|
||
Important related extensions for the dataset spec: | ||
|
||
* [EO extension](../extensions/stac-eo-spec.md) | ||
Please note that some fields such as `eo:sun_elevation` or `eo:sun_azimuth` are only meaningful on the item level and MUST NOT be used in datasets. | ||
* [Dimensions extension](../extensions/dimensions) | ||
Review comment: we don't have any examples of data with extra dimensions yet, so I suggest dropping dimensions from this version (and just mentioning it as 'planned" here, like we do for the provenance extension) Reply: We have at least a draft for this extension in contrast to provenance. And I have data for the dimension extension. I'll ask my colleague to publish that. But we can move that to a separate PR, if that's preferred by you or others. Reply: Right, I'd rather reduce this PR and then calmly discuss the change separately. Reply: Okay, will separate when I'm back in the office in 9 hours ;) Reply: Dimensions extension has been moved to PR #227. |
||
* [Scientific extension](../extensions/scientific) | ||
* Provenance extension (planned, see [issue #179](https://github.com/radiantearth/stac-spec/issues/179)) | ||
|
||
The [extensions page](../extensions/) gives a full overview of relevant extensions for STAC Datasets. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
{ | ||
"name": "COPERNICUS/S2", | ||
"title": "Sentinel-2 MSI: MultiSpectral Instrument, Level-1C", | ||
"description": "Sentinel-2 is a wide-swath, high-resolution, multi-spectral\nimaging mission supporting Copernicus Land Monitoring studies,\nincluding the monitoring of vegetation, soil and water cover,\nas well as observation of inland waterways and coastal areas.\n\nThe Sentinel-2 data contain 13 UINT16 spectral bands representing\nTOA reflectance scaled by 10000. See the [Sentinel-2 User Handbook](https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook)\nfor details. In addition, three QA bands are present where one\n(QA60) is a bitmask band with cloud mask information. For more\ndetails, [see the full explanation of how cloud masks are computed.](https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/level-1c/cloud-masks)\n\nEach Sentinel-2 product (zip archive) may contain multiple\ngranules. Each granule becomes a separate Earth Engine asset.\nEE asset ids for Sentinel-2 assets have the following format:\nCOPERNICUS/S2/20151128T002653_20151128T102149_T56MNN. Here the\nfirst numeric part represents the sensing date and time, the\nsecond numeric part represents the product generation date and\ntime, and the final 6-character string is a unique granule identifier\nindicating its UTM grid reference (see [MGRS](https://en.wikipedia.org/wiki/Military_Grid_Reference_System)).\n\nFor more details on Sentinel-2 radiometric resolution, [see this page](https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/resolutions/radiometric).\n", | ||
"license": "proprietary", | ||
Review comment: Is the S2 license 'proprietary'? If it is in our current definition maybe we should expand the definition a bit. Or ideally find the ideal spdx license. For landsat I think we just used https://spdx.org/licenses/PDDL-1.0.html as getting across the intent of US public domain stuff... Reply: Honestly, I don't really know and don't feel in the position to decide that. That example is taken from GEE and I handled it the way GEE does. Reply: Ok, cool. Super minor issue, so let's just leave it, and following GEE sounds good. |
||
"keywords": [ | ||
"copernicus", | ||
"esa", | ||
"eu", | ||
"msi", | ||
"radiance", | ||
"sentinel" | ||
], | ||
"provider": [ | ||
{ | ||
"name": "European Union/ESA/Copernicus", | ||
"url": "https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi" | ||
} | ||
], | ||
"extent": { | ||
"spatial": [ | ||
180.0, | ||
-56.0, | ||
-180.0, | ||
83.0 | ||
], | ||
"temporal": "2015-06-23T00:00:00/" | ||
}, | ||
"links": [ | ||
{ | ||
"rel": "self", | ||
"href": "https://storage.cloud.google.com/earthengine-test/catalog/COPERNICUS_S2.json" | ||
}, | ||
{ | ||
"rel": "parent", | ||
"href": "https://storage.cloud.google.com/earthengine-test/catalog/catalog.json" | ||
}, | ||
{ | ||
"rel": "root", | ||
"href": "https://storage.cloud.google.com/earthengine-test/catalog/catalog.json" | ||
}, | ||
{ | ||
"rel": "license", | ||
"href": "https://scihub.copernicus.eu/twiki/pub/SciHubWebPortal/TermsConditions/Sentinel_Data_Terms_and_Conditions.pdf" | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,148 @@ | ||
{ | ||
"$schema": "http://json-schema.org/draft-06/schema#", | ||
"id": "dataset.json#", | ||
"title": "Dataset Item", | ||
"description": "This object represents the dataset in a SpatioTemporal Asset Catalog.", | ||
"type": "object", | ||
"required": [ | ||
"name", | ||
"description", | ||
"license", | ||
"extent", | ||
"links" | ||
], | ||
"additionalProperties": true, | ||
"properties": { | ||
"name": { | ||
"title": "Identifier", | ||
"type": "string" | ||
}, | ||
"title": { | ||
"title": "Title", | ||
"type": "string" | ||
}, | ||
"description": { | ||
"title": "Description", | ||
"type": "string" | ||
}, | ||
"keywords": { | ||
"title": "Keywords", | ||
"type": "array", | ||
"items": { | ||
"type": "string" | ||
} | ||
}, | ||
"version": { | ||
"title": "Dataset Version", | ||
"type": "string" | ||
}, | ||
"license": { | ||
"title": "Dataset License Name", | ||
"type": "string" | ||
}, | ||
"provider": { | ||
"type": "array", | ||
"items": { | ||
"properties": { | ||
"name": { | ||
"title": "Organization Name", | ||
"type": "string" | ||
}, | ||
"url": { | ||
"title": "Organization homepage", | ||
"type": "string", | ||
"format": "url" | ||
} | ||
} | ||
} | ||
}, | ||
"host": { | ||
"required": [ | ||
"name", | ||
"scheme", | ||
"id" | ||
], | ||
"properties": { | ||
"name": { | ||
"title": "Organization name", | ||
"type": "string" | ||
}, | ||
"description": { | ||
"title": "Description", | ||
"type": "string" | ||
}, | ||
"scheme": { | ||
"title": "Scheme", | ||
"type": "string", | ||
"enum": [ | ||
"S3", | ||
"GCS", | ||
"URL", | ||
"OTHER" | ||
] | ||
}, | ||
"id": { | ||
"title": "Identifier", | ||
"type": "string" | ||
}, | ||
"region": { | ||
"title": "Region", | ||
"type": "string" | ||
}, | ||
"requester_pays": { | ||
"title": "Requester Pays", | ||
"type": "boolean", | ||
"default": false | ||
} | ||
}, | ||
"additionalProperties": true | ||
}, | ||
"extent": { | ||
"title": "Extents", | ||
"type": "object", | ||
"required": [ | ||
"spatial", | ||
"temporal" | ||
], | ||
"properties": { | ||
"spatial": { | ||
"title": "Spatial extent", | ||
"type": "array", | ||
"items": { | ||
"type": "number" | ||
} | ||
}, | ||
"temporal": { | ||
"title": "Temporal extent", | ||
"type": "string" | ||
} | ||
}, | ||
"additionalProperties": true | ||
}, | ||
"links": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"required": [ | ||
"href", | ||
"rel" | ||
], | ||
"properties": { | ||
"href": { | ||
"title": "Link", | ||
"type": "string" | ||
}, | ||
"rel": { | ||
"title": "Relation", | ||
"type": "string" | ||
}, | ||
"type": { | ||
"title": "type", | ||
"type": "string" | ||
} | ||
}, | ||
"additionalProperties": true | ||
} | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# STAC Dimensions Extension Spec | ||
|
||
This document explains the fields of the STAC Dimensions Extension (dim) to a STAC `Dataset`. Data can have different dimensions (= axes), e.g. in meteorology. The properties of these dimensions can be defined with this extension. | ||
|
||
## Dimensions Extension Description | ||
|
||
This is the field that extends the `Dataset` object: | ||
|
||
| Element | Type | Name | Description | | ||
| ---------------- | -------------------- | ------------------------- | ------------------------------------------------------------ | | ||
| dim:dimensions | [Dimension Object] | Dimensions | Dimensions of the data. If the dimensions have an order, the order SHOULD be reflected in the order of the array. | | ||
|
||
### Dimension Object | ||
|
||
| Element | Type | Name | Description | | ||
| ------- | ---------------- | ------------------- | ------------------------------------------------------------ | | ||
| label | string | Label (required) | Human-readable label for the dimension. | | ||
| unit | string | Unit of Measurement | Unit of measurement, preferably SI. ToDo: Any standard to express this, e.g. [UDUNITS](https://www.unidata.ucar.edu/software/udunits/) or this [dict](https://www.unc.edu/~rowlett/units/)? | | ||
| extent | [number\|string] | Data Extent | Specifies the extent of the data, i.e. the lower bound as the first element and the upper bound as the second element of the array. | |
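The constraints on a Dimension Object (required `label`, optional `unit`, and an `extent` of exactly two numbers or strings) can be sketched as follows. This is a hypothetical Python helper written for illustration, not part of the extension; the two-element rule is taken from the description of `extent` as a lower and an upper bound.

```python
def check_dimension(dimension):
    """Return a list of problems found in a single Dimension Object dict."""
    problems = []

    # 'label' is the only required field.
    if not isinstance(dimension.get("label"), str):
        problems.append("'label' is required and must be a string")

    if "unit" in dimension and not isinstance(dimension["unit"], str):
        problems.append("'unit' must be a string")

    if "extent" in dimension:
        extent = dimension["extent"]
        is_pair = isinstance(extent, list) and len(extent) == 2
        has_valid_types = is_pair and all(
            isinstance(v, (int, float, str)) and not isinstance(v, bool)
            for v in extent
        )
        if not has_valid_types:
            problems.append("'extent' must be [lower, upper] of numbers or strings")

    return problems


print(check_dimension({"label": "Temperature", "unit": "K", "extent": [253, 333]}))
print(check_dimension({"extent": [1]}))
```

The sketch does not verify that the lower bound is smaller than the upper bound, because mixed number and string extents have no obvious ordering.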
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
{ | ||
"dim:dimensions": [ | ||
{ | ||
"label": "Longitude", | ||
"unit": "°", | ||
"extent": [-180, 180] | ||
}, | ||
{ | ||
"label": "Latitude", | ||
"unit": "°", | ||
"extent": [-90, 90] | ||
}, | ||
{ | ||
"label": "Temperature", | ||
"unit": "°C", | ||
"extent": [-20, 60] | ||
}, | ||
{ | ||
"label": "Date", | ||
"extent": ["2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z"] | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
{ | ||
"$schema": "http://json-schema.org/draft-07/schema#", | ||
"type": "object", | ||
"title": "STAC Dimensions Extension Spec", | ||
"properties": { | ||
"dim:dimensions": { | ||
"type": "array", | ||
"title": "Dimensions", | ||
"items": { | ||
"type": "object", | ||
"required": [ | ||
"label" | ||
], | ||
"properties": { | ||
"label": { | ||
"type": "string", | ||
"title": "Label" | ||
}, | ||
"unit": { | ||
"type": "string", | ||
"title": "Unit of Measurement" | ||
}, | ||
"extent": { | ||
"type": "array", | ||
"title": "Data Extent", | ||
"minItems": 2, | ||
"maxItems": 2, | ||
"items": { | ||
"type": ["number", "string"] | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} |
Review comment: Is it that STAC Items 'use' the spec? I guess 'use' is less clear to me, it seems to imply that they should maybe implement the fields? Maybe describe what 'use' means? Like 'STAC Items are recommended to provide a link to a dataset definition'?
Reply: Improved that as suggested.