Add type field to Collection & Catalog #971

Merged
merged 29 commits into from
Mar 3, 2021
Changes from 13 commits
3b59ad0
added type parameter to collection & catalog, plus best practice`
cholmes Feb 5, 2021
96a7bc8
added type info to changelog
cholmes Feb 5, 2021
3dec709
fix capitalization i broke in rebase
jisantuc Mar 2, 2021
44d0879
fix other capitalization i broke in rebase
jisantuc Mar 2, 2021
bd490c4
update schemata, remove extension language
jisantuc Mar 2, 2021
355c947
add type to example collections
jisantuc Mar 2, 2021
3bdca94
remove single-file-stac references
jisantuc Mar 2, 2021
0dcf18f
catalog doesn't declare a summaries field
jisantuc Mar 2, 2021
4e746db
remove more collection <- catalog inheritance language
jisantuc Mar 2, 2021
740e451
remove inheritance refs from best-practices
jisantuc Mar 2, 2021
1742c81
Update best-practices.md
jisantuc Mar 2, 2021
2b9840e
Update best-practices.md
jisantuc Mar 2, 2021
e77843a
capitalization
jisantuc Mar 2, 2021
b63641c
fixed small typo
cholmes Mar 2, 2021
75a9acd
typo fix: entityt -> entity
cholmes Mar 2, 2021
7e5d0f5
Update best-practices.md
cholmes Mar 3, 2021
4b20b90
Update best-practices.md
cholmes Mar 3, 2021
1c2db9b
Update best-practices.md
cholmes Mar 3, 2021
97362a5
Update best-practices.md
cholmes Mar 3, 2021
bdf49d5
Update best-practices.md
cholmes Mar 3, 2021
fdb6deb
Update collection-spec/collection-spec.md
cholmes Mar 3, 2021
4133b1d
Update best-practices.md
cholmes Mar 3, 2021
c624dac
Clarity on which type field to use
cholmes Mar 3, 2021
339ac7e
Update from PR review
cholmes Mar 3, 2021
6f34897
Added line back in that was removed.
cholmes Mar 3, 2021
681264c
More clarity on types.
cholmes Mar 3, 2021
24e5476
More on what collections share with catalogs
cholmes Mar 3, 2021
11ecb1a
Merge remote-tracking branch 'origin/dev' into add-type
m-mohr Mar 3, 2021
44f4e6d
removed inherits language
cholmes Mar 3, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Recommendation to enable CORS
- A 'visual' option as an asset role.
- Added a best practice recommendation to keep collections at consistent levels.
- Catalog and Collection now require a `type` parameter, to be set to `Catalog` or `Collection`, so that clients can more easily distinguish them.

### Changed

40 changes: 35 additions & 5 deletions best-practices.md
@@ -40,7 +40,8 @@
- [Static to Dynamic best practices](#static-to-dynamic-best-practices)
- [Ingestion and links](#ingestion-and-links)
- [Keep catalogs in sync with cloud notification and queue services](#keep-catalogs-in-sync-with-cloud-notification-and-queue-services)

- [How to Differentiate STAC Files](#how-to-differentiate-stac-files)

This document makes a number of recommendations for creating real world SpatioTemporal Asset Catalogs. None of them
are required to meet the core specification, but following these practices will make life easier for client tooling
and for users. They come about from practical experience of implementors and introduce a bit more 'constraint' for
@@ -355,8 +356,10 @@ as they can only have a single representation of their catalog, since the static
While it is up to the implementor to organize the catalog, it is recommended to arrange it in a way that would make sense
for a human to browse a set of STAC Items in an intuitive manner.

The recommendation for static catalogs is to define them using the file name `catalog.json` or `collection.json` to distinguish
the Catalog from other JSON type files. In order to support multiple catalogs, the recommended practice
Users indicate their intent for a file to be parsed as a Collection or Catalog using the required `type` field on
each entity. For Collections, this field must have the value `Collection`, while for Catalogs, it must have the
value `Catalog`. Additionally, we recommend that static STACs indicate contents using the filenames `catalog.json`
or `collection.json` to distinguish the Catalog from other JSON type files. In order to support multiple catalogs, the recommended practice
is to place the Catalog file in namespaces "directories". For example:

- current/catalog.json
@@ -372,7 +375,7 @@ for clients to consume. A dynamic catalog will sometimes be populated by a stati
fields stored as a cached static catalog.

Dynamic catalogs often also implement the [STAC API](https://github.com/radiantearth/stac-api-spec/) specification, that
responds to search queries (like give me all imagery in Oahu gathered on January 15, 2017). But they are not required to. One
responds to search queries (like "give me all imagery in Oahu gathered on January 15, 2017"). But they are not required to. One
can have a dynamic service that only implements the core STAC specification, and is crawled by STAC API implementations that
provide 'search'. For example a Content Management Service like Drupal or an open data catalog like CKAN could choose to expose
its content as linked STAC Items by implementing a dynamic catalog.
@@ -399,7 +402,7 @@ ended up doing. Following these recommendations makes for more legible catalogs,
if you follow these recommendations.

1. Root documents (Catalogs / Collections) should be at the root of a directory tree containing the static catalog.
2. Catalogs that are not also Collections should be named `catalog.json` and Collections should be named `collection.json`.
2. Catalogs should be named `catalog.json` and Collections should be named `collection.json`.
3. Items should be named `<id>.json`.
4. Sub-Catalogs should be stored in subdirectories of their parent (and only 1 subdirectory deeper than a document's parent) (e.g. `.../sample/sub1/catalog.json`).
5. Items should be stored in subdirectories of their parent Catalog.
@@ -623,3 +626,30 @@ basic geographic filtering from listeners.
The dynamic STAC API would then listen to the notifications and update its internal datastore whenever new data comes into
the static catalog. Implementors have had success using AWS Lambda to do a full 'serverless' updating of the elasticsearch
database, but it could just as easily be a server-based process.

## How to Differentiate STAC Files

Any tool that crawls a STAC implementation or encounters a STAC file in the wild needs a clear way to determine if it is an Item,
Collection, Catalog or [ItemCollection](https://github.com/radiantearth/stac-api-spec/tree/v1.0.0-beta.1/fragments/itemcollection)
(part of the [STAC API spec](https://github.com/radiantearth/stac-api-spec/tree/v1.0.0-beta.1)). As of 1.0.0 this is done primarily with the `type` field, and secondarily in Items with `stac_version`, or optionally the `rel` of the link to it.
Contributor

I think this challenge is less pronounced in API settings than static catalog settings, since the OpenAPI spec for the API tells you what to expect at each endpoint (and also catalogs don't live anywhere in the API). That said if someone saves off an ItemCollection and then refers to it from a static catalog, I guess all bets are off, so I think I've talked myself into this inclusion.

Contributor Author

Thanks for sharing your thoughts, even if you talked yourself into it. And I agree with both points - some people may save the itemcollection out in the wild.


```text
if type is 'Collection'
  => Collection
else if type is 'Catalog'
  => Catalog
else if type is 'Feature' and stac_version is defined
  // stac_version in Items is only available since 0.8;
  // check for (stac_version or assets) to support pre-0.8 data
  => Item
else if type is 'FeatureCollection' and stac_version is defined
  => ItemCollection
else
  => Invalid (GeoJSON)
```
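
The decision procedure above could be sketched in Python roughly as follows. This is only an illustration, not part of the specification: the function name `classify_stac_object` and the returned labels are assumptions made for the example, and the `assets` fallback for pre-0.8 Items follows the comment in the pseudocode.

```python
from typing import Any, Dict


def classify_stac_object(obj: Dict[str, Any]) -> str:
    """Label a parsed STAC JSON object (hypothetical helper, not a STAC API)."""
    stac_type = obj.get("type")
    if stac_type == "Collection":
        return "Collection"
    if stac_type == "Catalog":
        return "Catalog"
    # Items are GeoJSON Features; `stac_version` exists in Items since 0.8,
    # so `assets` is accepted as a fallback signal for older data.
    if stac_type == "Feature" and ("stac_version" in obj or "assets" in obj):
        return "Item"
    if stac_type == "FeatureCollection" and "stac_version" in obj:
        return "ItemCollection"
    # Anything else is not recognisable as STAC (it may be plain GeoJSON).
    return "Invalid"
```

A crawler would call this on each parsed JSON document before deciding how to follow its links.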

When actually crawling a STAC implementation one can also make use of the [relation type](catalog-spec/catalog-spec.md#relation-types)
(`rel` field) when following a link. If it is an `item` rel type then the file must be a STAC Item. If it is `child`, `parent` or
`root` then it must be a Catalog or a Collection, and the `type` field can be used to distinguish which one.
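
As a rough sketch of the `rel`-based check described above, a crawler might compare the `type` of a fetched document with what the link's `rel` implies. The names `EXPECTED_BY_REL` and `check_link_target` are hypothetical, and only the four relation types mentioned in the text are covered.

```python
from typing import Any, Dict, Optional

# What the `type` field of the link target may be for each relation type.
# An Item is a GeoJSON Feature, so its `type` is "Feature".
EXPECTED_BY_REL = {
    "item": {"Feature"},
    "child": {"Catalog", "Collection"},
    "parent": {"Catalog", "Collection"},
    "root": {"Catalog", "Collection"},
}


def check_link_target(rel: str, target: Dict[str, Any]) -> Optional[bool]:
    """Return True or False when `rel` implies a type, None when it gives no hint."""
    expected = EXPECTED_BY_REL.get(rel)
    if expected is None:
        return None
    return target.get("type") in expected
```

For `child`, `parent` and `root` links, the `type` value of the target then tells the crawler whether it reached a Catalog or a Collection.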

In versions of STAC prior to 1.0 the process was a bit more complicated, as there was no `type` field for catalogs and collections.
See [this issue comment](https://github.com/radiantearth/stac-spec/issues/889#issuecomment-684529444) for a heuristic that works
for older STAC versions (back to 0.8.0; supporting anything older should not be needed).
7 changes: 4 additions & 3 deletions catalog-spec/catalog-spec.md
@@ -22,7 +22,7 @@ that contains all the required fields is a valid STAC Catalog.

- [Examples](../examples/)
- See an example [catalog.json](../examples/catalog.json). The [collection.json](../examples/collection.json) is also a valid
catalog file, demonstrating linking to items (it is also a Collection, so has additional fields)
Catalog file, demonstrating linking to items (it is also a Collection, so has additional fields)
- [JSON Schema](json-schema/catalog.json)

The [Catalog section of the Overview](../overview.md#catalog-overview) document provides background information on
@@ -40,6 +40,7 @@ also a valid STAC Catalog.
| Element | Type | Description |
| --------------- | ------------- | ------------------------------------------------------------ |
| stac_version | string | **REQUIRED.** The STAC version the Catalog implements. STAC versions can be mixed, but please keep the [recommended best practices](../best-practices.md#mixing-stac-versions) in mind. |
| type | string | **REQUIRED.** Set to `Catalog` if this Catalog only implements the Catalog spec, or `Collection` if it additionally meets the [collection](../collection-spec/collection-spec.md) requirements and should be treated by clients as such. |
| stac_extensions | \[string] | A list of extension identifiers the Catalog implements. |
| id | string | **REQUIRED.** Identifier for the Catalog. |
| title | string | A short descriptive one-line title for the Catalog. |
@@ -51,7 +52,7 @@ also a valid STAC Catalog.

#### stac_extensions

A list of extensions the Catalog implements. This does NOT declare the extensions of children or Items. The list contains URLs to the JSON Schema files it can be validated against. For official [content extensions](../extensions/README.md#list-of-stac-extensions), a "shortcut" can be used. This means you can specify the folder name of the extension, for example `single-file-stac` for the Point Cloud extension. If the versions of the extension and the Catalog diverge, you can specify the URL of the JSON schema file.
A list of extensions the Catalog implements. This does NOT declare the extensions of children or Items. The list contains URLs to the JSON Schema files it can be validated against. For official [content extensions](../extensions/README.md#list-of-stac-extensions), a "shortcut" can be used. This means you can specify the folder name of the extension, for example `pointcloud` for the Point Cloud extension. If the versions of the extension and the Catalog diverge, you can specify the URL of the JSON schema file.
This list must only contain extensions that extend the Catalog itself; see the 'Scope' column in the list of extensions.
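
To illustrate the two kinds of entries a `stac_extensions` list may hold, here is a small Python sketch that separates schema URLs from folder-name shortcuts. The helper name `split_extensions` is hypothetical, and treating every entry with an `http` or `https` scheme as a schema URL is an assumption made for the example, not a rule stated by the spec.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_extensions(stac_extensions: List[str]) -> Tuple[List[str], List[str]]:
    """Split entries into (schema URLs, shortcut names). Hypothetical helper."""
    schema_urls: List[str] = []
    shortcuts: List[str] = []
    for entry in stac_extensions:
        # Assumption: anything with an http(s) scheme is a JSON Schema URL,
        # everything else is an extension folder name such as `pointcloud`.
        if urlparse(entry).scheme in ("http", "https"):
            schema_urls.append(entry)
        else:
            shortcuts.append(entry)
    return schema_urls, shortcuts
```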

### Link Object
@@ -75,7 +76,7 @@ The following types are commonly used as `rel` types in the Link Object of a STA

| Type | Description |
| ------- | ----------- |
| self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the catalog file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. |
| self | STRONGLY RECOMMENDED. *Absolute* URL to the location that the Catalog file can be found online, if available. This is particularly useful when in a download package that includes metadata, so that the downstream user can know where the data has come from. |
| root | STRONGLY RECOMMENDED. URL to the root STAC Catalog or [Collection](../collection-spec/README.md). Catalogs should include a link to their root, even if it's the root and points to itself. |
| parent | URL to the parent STAC Catalog or Collection. Non-root Catalogs should include a link to their parent. |
| child | URL to a child STAC Catalog or Collection. |
130 changes: 130 additions & 0 deletions catalog-spec/json-schema/catalog-core.json
@@ -0,0 +1,130 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://schemas.stacspec.org/v1.0.0-beta.2/catalog-spec/json-schema/catalog-core.json#",
"title": "Core STAC Catalog Fields Specification",
"description": "This object represents the fields shared by Catalogs and Collections",
"allOf": [
{
"$ref": "#/definitions/catalogCore"
}
],
"definitions": {
"catalogCore": {
"title": "Catalog Core Fields",
"type": "object",
"required": [
"stac_version",
"id",
"description",
"links"
],
"properties": {
"stac_version": {
"title": "STAC version",
"type": "string",
"const": "1.0.0-beta.2"
},
"stac_extensions": {
"title": "STAC extensions",
"type": "array",
"uniqueItems": true,
"items": {
"anyOf": [
{
"title": "Reference to a JSON Schema",
"type": "string",
"format": "iri"
},
{
"title": "Reference to a core extension",
"type": "string"
}
]
}
},
"id": {
"title": "Identifier",
"type": "string"
},
"title": {
"title": "Title",
"type": "string"
},
"description": {
"title": "Description",
"type": "string"
},
"links": {
"title": "Links",
"type": "array",
"items": {
"$ref": "#/definitions/link"
}
},
"summaries": {
"$ref": "#/definitions/summaries"
}
}
},
"link": {
"type": "object",
"required": [
"rel",
"href"
],
"properties": {
"href": {
"title": "Link reference",
"type": "string",
"format": "iri-reference"
},
"rel": {
"title": "Link relation type",
"type": "string"
},
"type": {
"title": "Link type",
"type": "string"
},
"title": {
"title": "Link title",
"type": "string"
}
}
},
"summaries": {
"type": "object",
"additionalProperties": {
"oneOf": [
{
"title": "Stats",
"type": "object",
"required": [
"minimum",
"maximum"
],
"properties": {
"minimum": {
"title": "Minimum value",
"type": ["number", "string"]
},
"maximum": {
"title": "Maximum value",
"type": ["number", "string"]
}
}
},
{
"title": "Set of values",
"type": "array",
"minItems": 1,
"items": {
"description": "Any data type could occur."
}
}
]
}
}
}
}

112 changes: 6 additions & 106 deletions catalog-spec/json-schema/catalog.json
@@ -4,6 +4,9 @@
"title": "STAC Catalog Specification",
"description": "This object represents Catalogs in a SpatioTemporal Asset Catalog.",
"allOf": [
{
"$ref": "./catalog-core.json"
},
{
"$ref": "#/definitions/catalog"
}
@@ -13,117 +16,14 @@
"title": "Catalog",
"type": "object",
"required": [
"stac_version",
"id",
"description",
"links"
],
"properties": {
"stac_version": {
"title": "STAC version",
"type": "string",
"const": "1.0.0-beta.2"
},
"stac_extensions": {
"title": "STAC extensions",
"type": "array",
"uniqueItems": true,
"items": {
"anyOf": [
{
"title": "Reference to a JSON Schema",
"type": "string",
"format": "iri"
},
{
"title": "Reference to a core extension",
"type": "string"
}
]
}
},
"id": {
"title": "Identifier",
"type": "string"
},
"title": {
"title": "Title",
"type": "string"
},
"description": {
"title": "Description",
"type": "string"
},
"links": {
"title": "Links",
"type": "array",
"items": {
"$ref": "#/definitions/link"
}
},
"summaries": {
"$ref": "#/definitions/summaries"
}
}
},
"link": {
"type": "object",
"required": [
"rel",
"href"
"type"
],
"properties": {
"href": {
"title": "Link reference",
"type": "string",
"format": "iri-reference"
},
"rel": {
"title": "Link relation type",
"type": "string"
},
"type": {
"title": "Link type",
"type": "string"
},
"title": {
"title": "Link title",
"type": "string"
"title": "Type of STAC entity",
"const": "Catalog"
}
}
},
"summaries": {
"type": "object",
"additionalProperties": {
"oneOf": [
{
"title": "Stats",
"type": "object",
"required": [
"minimum",
"maximum"
],
"properties": {
"minimum": {
"title": "Minimum value",
"type": ["number", "string"]
},
"maximum": {
"title": "Maximum value",
"type": ["number", "string"]
}
}
},
{
"title": "Set of values",
"type": "array",
"minItems": 1,
"items": {
"description": "Any data type could occur."
}
}
]
}
}
}
}
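
The combined effect of the two schemas (the shared core fields plus the `type` constant) can be approximated without a JSON Schema library. The sketch below is only an illustration under stated assumptions: `catalog_problems` and `CORE_REQUIRED` are hypothetical names, only required-field presence and the `type` value are checked, and details such as the pinned `stac_version` constant or string formats are ignored.

```python
from typing import Any, Dict, List

# Required fields listed in the catalogCore definition.
CORE_REQUIRED = ("stac_version", "id", "description", "links")


def catalog_problems(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [
        f"missing required field: {name}"
        for name in CORE_REQUIRED
        if name not in doc
    ]
    # catalog.json additionally requires `type` with the constant "Catalog".
    if doc.get("type") != "Catalog":
        problems.append("type must be 'Catalog'")
    # Each Link Object requires `rel` and `href`.
    for index, link in enumerate(doc.get("links", [])):
        for key in ("rel", "href"):
            if key not in link:
                problems.append(f"links[{index}] missing {key}")
    return problems
```

A full validator would instead load both schema files and resolve the `$ref` between them; this sketch only mirrors the required-field logic.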
9 changes: 4 additions & 5 deletions collection-spec/README.md
@@ -1,12 +1,11 @@
# STAC Collections

The STAC [Collection specification](collection-spec.md) provides JSON fields to describe a set of Items, to help enable discovery.
It builds on the [Catalog Specification](../catalog-spec/catalog-spec.md), using the flexible structure detailed there to
further define and explain logical groups of [Items](../item-spec/item-spec.md). It shares the same fields and therefore every
Collection is also a valid Catalog - the JSON structure extends the core Catalog definition. Collections can have both parent Catalogs
and Collections as well as child Items, Catalogs and Collections.
It builds on fields from the [Catalog Specification](../catalog-spec/catalog-spec.md), using the flexible structure detailed there to
further define and explain logical groups of [Items](../item-spec/item-spec.md). Collections can have both parent Catalogs
and Collections as well as child Items, Catalogs and Collections.

The Collection concept can be used very flexibly - it just provides additional metadata about a set of Items. But it generally
The Collection concept can be used very flexibly - it provides additional metadata about a set of Items. But it generally
is used to describe a set of assets that are defined with the same properties and share higher level metadata. There is no standardized
name for this, and others called it: dataset series (ESA, ISO 19115), collection (CNES, NASA), dataset (JAXA), product (JAXA). Or
viewed in GIS terms, the Items are '[features](https://en.wikipedia.org/wiki/Simple_Features)' (that link to assets) and a