Improve data documentation #270

Open
6 of 24 tasks
jesper-friis opened this issue Dec 13, 2024 · 0 comments
Labels
issue collection Collection of related issues

jesper-friis commented Dec 13, 2024

Follow up dataset PR #256 with more functionality, including:

  • Add some kind of session with user-specific information that should not be stored in the triplestore, including:
    - credentials like username and password to external storages
    - local root for relative paths
    - default storage (e.g. redis) to work against during execution of the user workflow
    Exactly how this session should work is still unclear. It will probably be a class with the functions documented at the
    top of the dataset module as methods, plus some additional methods for dealing with the internal state.
  • Decide whether we want to move the protocol plugins from DLite to Tripper. It may also be possible to create a new package.
  • Update CONTEXT_URL to point to master branch (or to github pages if we copy the context there)
  • Create a good example/tutorial - addressed in PR Added documentation for datasets #280
  • Document the @context - addressed in PR Added documentation for datasets #280
  • Fix the __TODO__'s in the code - addressed in PR Dataset TODOs #279
  • Add interface for documenting using tables - addressed by PR New TableDoc class providing a table interface for data documentation #273
  • Check DCAT-AP and DCAT-AP-NO and add missing keywords to the default context. Update the documentation accordingly. The figure on DCAT-AP shows mandatory and recommended keywords.
  • Support custom resource types. These should be specified in the multi-resource dict representation under the keyword custom_resources. The custom resource types should be stored in the KB, so that they can later be retrieved and reused.
  • Also store prefixes in the triplestore so that they can easily be fetched and reused. Is this really needed?
  • Add a command-line tool for populating and searching the triplestore - addressed in PR Command-line datadoc script #281
  • Rename the dataset submodule to datadoc
  • Rename save_datadoc() / load_datadoc() to store_yaml() / load_yaml()
  • Rename save_dict() to store_dict()
  • Add documentation about the four types of roles/users: data provider, data user, data consumer and data producer. So far we have focused on data provider and data user (no mapping transformations).
  • Add an interface to dataaccess for a data user to get the requested data as an instance of a specific datamodel serialised with a given plugin
  • Add API for data consumers and data producers
  • Update the naming used in the current dataset implementation to be consistent with the documentation. Especially check the following new naming conventions introduced in the documentation (which is newer than the original implementation):
    • documented items are now called resources. Not entries or something similar...
    • distinguish between single-resource and multi-resource dict representations. Make sure that every function that takes a dict argument documents which of these dict representations it expects.
    • standardise argument naming. Use different and consistent names for single-resource dicts and multi-resource dicts. What would be good names? single_resource and multi_resource are very long. Are singleres / multires acceptable?
  • Add tests for the datadoc tool.
  • Add test for the contains argument to search_iris()
  • Add an --output option to datadoc find to save the output to a file (since PowerShell adds unwanted newlines when redirecting stdout to a file)
  • Update search_iris() to support nested properties (using dots). Ex: add support for criterias={"distribution.mediaType": "text/csv"}.
  • Follow up PR Added support for username/password for sparqlwrapper #308 by:
    • adding support for accessing username/password via either environment variables or keyring (if installed). Should probably be added to the __init__() function of the Triplestore class.
    • adding support for username/password in the datadoc tool. Is this really needed if the above point has been implemented?
  • Implement full CRUD functionality. Currently we have no Delete, and Update has issues with nested properties represented as blank-node individuals.
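
To make the session item above more concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for discussion: the class name `Session`, its fields and its methods are hypothetical and do not exist in the current code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Session:
    """Hypothetical holder for user-specific state kept out of the triplestore."""

    # Local root used to resolve relative paths.
    root: Path = field(default_factory=Path.cwd)
    # Name of the storage to work against by default, e.g. "redis".
    default_storage: Optional[str] = None
    # Maps storage name -> (username, password). Excluded from repr()
    # so that passwords do not leak into logs.
    credentials: dict = field(default_factory=dict, repr=False)

    def resolve(self, path):
        """Return `path` unchanged if absolute, else relative to `root`."""
        p = Path(path)
        return p if p.is_absolute() else self.root / p

    def set_credentials(self, storage, username, password):
        """Register credentials for the named storage."""
        self.credentials[storage] = (username, password)

    def get_credentials(self, storage=None):
        """Return credentials for `storage`, or for the default storage."""
        return self.credentials.get(storage or self.default_storage)
```

The functions documented at the top of the dataset module could then become methods that consult this state instead of taking it as arguments.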
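
For the nested-properties item for search_iris(), one possible approach is to translate each dotted key into a SPARQL 1.1 sequence path, which also steps through intermediate blank nodes. The helper below is only a sketch: `criteria_to_patterns` and its `keywords` argument are hypothetical, and the `keywords` mapping stands in for the keyword expansion that the context would normally provide.

```python
def criteria_to_patterns(criterias, keywords, subject="?iri"):
    """Turn dotted criteria into SPARQL triple patterns.

    `keywords` maps each keyword to a prefixed name. It is an assumed
    stand-in for the expansion done via the context.
    """
    patterns = []
    for key, value in criterias.items():
        # "distribution.mediaType" -> "dcat:distribution/dcat:mediaType"
        path = "/".join(keywords[part] for part in key.split("."))
        # Naive literal quoting; a real implementation must escape the
        # value and distinguish IRIs from literals.
        patterns.append(f'{subject} {path} "{value}" .')
    return patterns


patterns = criteria_to_patterns(
    {"distribution.mediaType": "text/csv"},
    {"distribution": "dcat:distribution", "mediaType": "dcat:mediaType"},
)
print(patterns[0])
# ?iri dcat:distribution/dcat:mediaType "text/csv" .
```

Keys without dots reduce to an ordinary single-predicate pattern, so existing criteria would keep working unchanged.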
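
For the follow-up to #308, the lookup of username/password from environment variables or keyring could look roughly like the sketch below. The function name `lookup_credentials`, the `TRIPPER_USERNAME` / `TRIPPER_PASSWORD` variable names and the `"tripper"` service name are assumptions, not an agreed interface. Only `keyring.get_password()` is an existing call, and it is used only if keyring is installed.

```python
import os


def lookup_credentials(service="tripper", env_prefix="TRIPPER"):
    """Return (username, password), either of which may be None.

    Environment variables take precedence. If only the username is set,
    the password is looked up in keyring when that package is installed.
    """
    username = os.environ.get(f"{env_prefix}_USERNAME")
    password = os.environ.get(f"{env_prefix}_PASSWORD")
    if username and password is None:
        try:
            import keyring  # optional dependency
        except ImportError:
            keyring = None
        if keyring is not None:
            password = keyring.get_password(service, username)
    return username, password
```

If something like this were called from the __init__() function of the Triplestore class whenever no explicit username/password is given, the datadoc tool would pick up credentials without needing extra options.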
@jesper-friis jesper-friis changed the title Improve datasets Improve data documentation Jan 6, 2025
@jesper-friis jesper-friis added the issue collection Collection of related issues label Jan 7, 2025