Recipe Proposal: Category Tables #1081

Open, wants to merge 11 commits into base: next

@khusmann commented Feb 5, 2025

This is a proposal for a recipe for "category tables", which allows categorical variables to define their categories via table resources. As indicated in the recipe, this approach has a number of advantages:

  • When there are many categories, it is often easier to maintain them in files, such as CSV files. This also keeps the datapackage.json file compact and readable for humans.

  • The data set in the category table resource can store additional information besides the 'value' and 'label'. For example, the categories could have descriptions or the categories could form a hierarchy.

  • It is also possible to store additional metadata in the category table resource. For example, it is possible to indicate the source, license, version and owner of the data resource. This is important for many 'official' category lists, where there can be many similar versions maintained by different organisations.

  • When different fields use the same categories, they can all refer to the same category table resource. First, this allows the categories to be reused. Second, by referring to the same data resource, the field descriptors can indicate that the categories are comparable between fields.

  • It is possible to refer to category table resources in other data packages. This makes it possible, for example, to create centrally maintained repositories of categories.
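
To make the points above concrete, here is a rough sketch of how a consumer might resolve categories from a table resource. It is only an illustration: the descriptor layout (a `categories` object with a `resource` key), the resource name `education-levels`, the column names, and the helper functions are all hypothetical and are not taken from the recipe text itself.

```python
import csv
import io

# Hypothetical category table, as it might be stored in a CSV resource.
# The extra "description" column shows that more than value/label can be kept.
CATEGORY_CSV = (
    "value,label,description\n"
    "1,Primary,Primary education\n"
    "2,Secondary,Secondary education\n"
    "3,Tertiary,Tertiary education\n"
)

# Hypothetical in-memory stand-in for the resources of a data package.
RESOURCES = {"education-levels": CATEGORY_CSV}

# Hypothetical field descriptors; the "categories" layout is an assumption.
FIELDS = [
    {"name": "education_mother", "type": "integer",
     "categories": {"resource": "education-levels"}},
    {"name": "education_father", "type": "integer",
     "categories": {"resource": "education-levels"}},
]


def load_categories(field, resources):
    """Return a mapping from category value to its full row in the table."""
    name = field["categories"]["resource"]
    reader = csv.DictReader(io.StringIO(resources[name]))
    return {row["value"]: row for row in reader}


def share_categories(field_a, field_b):
    """Two fields are treated as comparable if they name the same resource."""
    return field_a["categories"]["resource"] == field_b["categories"]["resource"]


categories = load_categories(FIELDS[0], RESOURCES)
comparable = share_categories(FIELDS[0], FIELDS[1])
```

Here both fields point at the same table, so the categories are defined once and the shared reference signals that the two fields are comparable.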

It was first proposed / discussed here: #888

The current PR was workshopped in great detail by myself, @djvanderlaan, @fomcl and @pschumm, and we plan to have a live discussion / Q&A at our next community meeting (2025-02-06). In the meantime, we look forward to everyone's thoughts and feedback.

@roll roll added the docs label Feb 10, 2025