This is a proposal for a recipe for "category tables", which allows categorical variables to define their categories via table resources. As indicated in the recipe, this approach has a number of advantages:
In case of a large number of categories it is often easier to maintain these in files, such as CSV files. This also keeps the datapackage.json file compact and readable for humans.
The data set in the category table resource can store additional information besides the 'value' and 'label'. For example, the categories could have descriptions or the categories could form a hierarchy.
It is also possible to store additional metadata in the category table resource. For example, it is possible to indicate the source, license, version and owner of the data resource. This is important for many 'official' category lists, where there can be many similar versions maintained by different organisations.
When different fields use the same categories, they can all refer to the same category table resource. First, this allows the categories to be reused. Second, by referring to the same data resource, the field descriptors can indicate that the categories are comparable between fields.
It is possible to refer to category table resources in other data packages. This makes it, for example, possible to create centrally maintained repositories of categories.
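To make the points above concrete, here is a minimal Python sketch of two fields sharing one category table. It is an illustration under stated assumptions, not the recipe's definitive syntax: the `categories` property with a `resource` key, the resource names `fruit-codes` and `responses`, the column names and the helper `load_categories` are all hypothetical. The CSV is inlined as a string only to keep the example self-contained; in practice it would live in its own file.

```python
import csv
import io

# Hypothetical category table. Besides 'value' and 'label' it carries an
# extra 'description' column, as the list above suggests is possible.
FRUIT_CODES_CSV = (
    "value,label,description\n"
    "1,apple,Pome fruit\n"
    "2,banana,Tropical fruit\n"
    "3,cherry,Stone fruit\n"
)

# Hypothetical descriptor. The shape of "categories" is an assumption made
# for illustration; consult the recipe text for the actual property names.
descriptor = {
    "name": "survey",
    "resources": [
        {
            "name": "fruit-codes",
            "format": "csv",
            "data": FRUIT_CODES_CSV,  # inlined here; normally a CSV file
        },
        {
            "name": "responses",
            "schema": {
                "fields": [
                    {
                        "name": "favourite",
                        "type": "integer",
                        "categories": {"resource": "fruit-codes"},
                    },
                    {
                        "name": "least_favourite",
                        "type": "integer",
                        "categories": {"resource": "fruit-codes"},
                    },
                ]
            },
        },
    ],
}


def load_categories(package, resource_name,
                    value_field="value", label_field="label"):
    """Return a {value: label} mapping read from a category table resource."""
    resource = next(
        r for r in package["resources"] if r["name"] == resource_name
    )
    rows = csv.DictReader(io.StringIO(resource["data"]))
    return {row[value_field]: row[label_field] for row in rows}


labels = load_categories(descriptor, "fruit-codes")
fields = descriptor["resources"][1]["schema"]["fields"]
# Both fields point at the same resource, so their codes are comparable.
same_table = (
    fields[0]["categories"]["resource"] == fields[1]["categories"]["resource"]
)
```

Because both fields name the same resource, a consumer can tell that their codes are comparable without inspecting the values, and the field descriptors stay short no matter how many categories the table holds.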
The recipe was first proposed and discussed in #888.
The current PR was workshopped in great detail by myself, @djvanderlaan, @fomcl and @pschumm, and we plan to have a live discussion and Q&A at our next community meeting (2025-02-06). In the meantime, we look forward to everyone's thoughts and feedback.