Improve data documentation #270

jesper-friis · 2024-12-13T22:44:27Z

Follow up on dataset PR #256 with more functionality, including:

- Add some kind of session with user-specific information that should not be stored in the triplestore, including:
  - credentials, such as username and password, for external storages
  - a local root for relative paths
  - a default storage (e.g. Redis) to work against while the user workflow is executed

  Exactly how this session should work is still unclear. It will probably be a class that provides the functions documented at the top of the dataset module as methods, plus some additional methods for managing its internal state.
- Decide whether to move the protocol plugins from DLite to Tripper. Creating a new package for them may also be an option.
- Update `CONTEXT_URL` to point to the master branch (or to GitHub Pages if we copy the context there).
- Create a good example/tutorial (addressed in PR #280: Added documentation for datasets).
- Document the `@context` (addressed in PR #280: Added documentation for datasets).
- Fix the TODOs in the code (addressed in PR #279: Dataset TODOs).
- Add an interface for documenting using tables (addressed in PR #273: New `TableDoc` class providing a table interface for data documentation).
- Check DCAT-AP and DCAT-AP-NO and add missing keywords to the default context. Update the documentation accordingly. The DCAT-AP figure shows the mandatory and recommended keywords.
- Support custom resource types. These should be specified in the multi-resource dict representation under the keyword `custom_resources`. The custom resource types should be stored in the knowledge base (KB), so that they can later be retrieved and reused.
- Also store prefixes in the triplestore, so that they can easily be fetched and reused. Is this really needed?
- Add a command-line tool for populating and searching the triplestore (addressed in PR #281: Command-line datadoc script).
- Rename the `dataset` submodule to `datadoc`.
- Rename `save_datadoc()` / `load_datadoc()` to `store_yaml()` / `load_yaml()`.
- Rename `save_dict()` to `store_dict()`.
- Add documentation about the four types of roles/users: data provider, data user, data consumer and data producer. So far, we have focused on data providers and data users (no mapping transformations).
- Add an interface to `dataaccess` that lets a data user get the requested data as an instance of a specific datamodel, serialised with a given plugin.
- Add an API for data consumers and data producers.
- Update the naming used in the current dataset implementation to be consistent with the documentation. In particular, check the following naming conventions introduced in the documentation (which is newer than the original implementation):
  - documented items are now called *resources*, not entries or similar
  - distinguish between single-resource and multi-resource dict representations. Make sure that all functions that take a dict argument document which of these representations they expect.
  - standardise argument naming. Use distinct, consistent names for single-resource dicts and multi-resource dicts. What would be good names? `single_resource` and `multi_resource` are very long. Are `singleres` / `multires` acceptable?
- Add tests for the `datadoc` tool.
- Add a test for the `contains` argument to `search_iris()`.
- Add an `--output` option to `datadoc find` to save the output to a file (since PowerShell adds unwanted newlines when stdout is redirected to a file).
- Update `search_iris()` to support nested properties (using dots). For example, add support for `criterias={"distribution.mediaType": "text/csv"}`.
- Follow up on PR #308 (Added support for username/password for sparqlwrapper) by:
  - adding support for reading username/password from either environment variables or `keyring` (if installed). This should probably be added to the `__init__()` method of the `Triplestore` class.
  - adding support for username/password in the `datadoc` tool. Is this really needed once the point above has been implemented?
- Implement full CRUD functionality. Currently, there is no Delete, and Update has issues with nested properties represented as blank-node individuals.
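
Since the design of the session is still open, the following is only a minimal sketch of one possible shape, assuming a plain Python dataclass. The class name `Session`, its attributes and its methods are hypothetical and not part of Tripper. It keeps credentials, the local root and the default storage in memory only, resolves relative paths against the local root, and hides passwords from its `repr()`.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional, Tuple


@dataclass
class Session:
    """Hypothetical holder of user-specific state that must never be
    written to the triplestore."""

    # Credentials per external storage, e.g. {"minio": ("alice", "secret")}
    credentials: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # Local root that relative paths are resolved against
    local_root: Path = field(default_factory=Path.cwd)
    # Default storage to work against, e.g. a Redis URL (assumption)
    default_storage: Optional[str] = None

    def resolve(self, path: str) -> Path:
        """Return `path` as-is if absolute, otherwise relative to local_root."""
        p = Path(path)
        return p if p.is_absolute() else Path(self.local_root) / p

    def get_credentials(self, storage: str) -> Optional[Tuple[str, str]]:
        """Return (username, password) for `storage`, or None if unknown."""
        return self.credentials.get(storage)

    def __repr__(self) -> str:
        # Only list storage names, so that passwords never end up in logs
        return (
            f"Session(local_root={str(self.local_root)!r}, "
            f"default_storage={self.default_storage!r}, "
            f"storages={sorted(self.credentials)!r})"
        )
```

Keeping passwords out of `repr()` matters because session objects tend to show up in logs and tracebacks; none of this state is meant to be serialised to the triplestore.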
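
For the `--output` option of `datadoc find`, a minimal sketch using `argparse` could look as follows. It is not the actual `datadoc` implementation: the parser layout, the `main()` function and the placeholder search result are assumptions. Writing the file from Python bypasses shell redirection, which is where PowerShell introduces the unwanted newlines.

```python
import argparse
import sys


def main(argv=None):
    """Hypothetical `datadoc` entry point with only the `find` subcommand."""
    parser = argparse.ArgumentParser(prog="datadoc")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    find = subparsers.add_parser("find", help="Find documented resources.")
    find.add_argument(
        "--output",
        metavar="FILENAME",
        help="Write the output to FILENAME instead of stdout.",
    )
    args = parser.parse_args(argv)

    # Placeholder for the real search result (hypothetical IRIs)
    result = "http://example.com/ds1\nhttp://example.com/ds2\n"

    if args.output:
        # Write the file directly instead of relying on shell redirection.
        # newline="\n" disables newline translation, so the file gets the
        # same LF line endings on all platforms.
        with open(args.output, "w", encoding="utf-8", newline="\n") as f:
            f.write(result)
    else:
        sys.stdout.write(result)
    return 0
```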
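
One way to support dotted keys in `search_iris()` is to translate each key into a SPARQL sequence property path, so that `distribution.mediaType` becomes `dcat:distribution/dcat:mediaType`. The sketch below builds such a query as a string. The function name `build_search_query()`, its arguments and the keyword-to-IRI mapping are assumptions for illustration, not Tripper's API, and values are assumed to be stored as plain string literals.

```python
def build_search_query(criterias, keywords, prefixes, rdf_type=None):
    """Build a SPARQL SELECT query where dotted keys in `criterias`,
    such as "distribution.mediaType", become SPARQL property paths.

    keywords: maps each keyword to a prefixed IRI (assumed to come from
              the JSON-LD context); an unknown keyword raises KeyError.
    prefixes: maps prefix names to namespace IRIs.
    """
    lines = [f"PREFIX {name}: <{ns}>" for name, ns in prefixes.items()]
    patterns = []
    if rdf_type:
        patterns.append(f"?iri a {rdf_type} .")
    for key, value in criterias.items():
        # "distribution.mediaType" -> "dcat:distribution/dcat:mediaType"
        path = "/".join(keywords[part] for part in key.split("."))
        # Assumption: values are stored as plain string literals
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        patterns.append(f'?iri {path} "{escaped}" .')
    lines.append("SELECT DISTINCT ?iri")
    lines.append("WHERE {")
    lines.extend("  " + p for p in patterns)
    lines.append("}")
    return "\n".join(lines)
```

Called with `{"distribution.mediaType": "text/csv"}`, the `WHERE` clause contains `?iri dcat:distribution/dcat:mediaType "text/csv" .`. A sequence path also traverses blank nodes, so it matches distributions stored as blank-node individuals.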
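
For the follow-up on PR #308, here is a minimal sketch of a credential lookup that first checks environment variables and then falls back to the optional `keyring` package. The environment variable names (`TRIPPER_USERNAME`, `TRIPPER_PASSWORD`), the lookup order and the function name `get_credentials()` are hypothetical; `keyring.get_credential()` is the real `keyring` call. Something like this could be called from `Triplestore.__init__()` when no username/password is given explicitly, as suggested above.

```python
import os
from typing import Optional, Tuple


def get_credentials(
    service: str, env_prefix: str = "TRIPPER"
) -> Tuple[Optional[str], Optional[str]]:
    """Return (username, password) for `service`; missing parts are None.

    Hypothetical lookup order:
      1. environment variables <env_prefix>_USERNAME / <env_prefix>_PASSWORD
      2. the system keyring, if the optional `keyring` package is installed
    """
    username = os.environ.get(f"{env_prefix}_USERNAME")
    password = os.environ.get(f"{env_prefix}_PASSWORD")
    if username and password:
        return username, password

    try:
        import keyring
        from keyring.errors import KeyringError
    except ImportError:
        # keyring is not installed: return whatever the environment gave us
        return username, password

    try:
        # With username=None, keyring may return any credential for service
        cred = keyring.get_credential(service, username)
    except KeyringError:
        # e.g. no usable keyring backend on this machine
        cred = None
    if cred is not None:
        return cred.username, cred.password
    return username, password
```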
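
Regarding Delete and nested properties: when a resource is removed, blank nodes reachable from it (such as a nested distribution) must be removed as well, otherwise they are left orphaned in the triplestore. The sketch below collects those triples from an in-memory set of `(subject, predicate, object)` tuples, with blank nodes written as `_:label`. It is illustrative only and not Tripper code: the helper names are hypothetical, it assumes blank nodes are not shared between resources, and it ignores triples where the resource appears as an object.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(node: str) -> bool:
    """In this sketch, blank nodes are strings of the form "_:label"."""
    return node.startswith("_:")


def triples_to_delete(triples: Iterable[Triple], iri: str) -> Set[Triple]:
    """Return all triples with `iri` as subject, plus the triples of every
    blank node reachable from it through object positions."""
    triples = list(triples)
    result: Set[Triple] = set()
    visited = set()
    stack = [iri]
    while stack:
        subject = stack.pop()
        if subject in visited:
            continue  # guard against cycles between blank nodes
        visited.add(subject)
        for s, p, o in triples:
            if s == subject:
                result.add((s, p, o))
                if is_blank(o):
                    stack.append(o)
    return result
```

With such a helper, one option for updating a nested property is to delete the resource together with its blank-node subtree and then re-insert it, rather than patching individual blank nodes in place.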


jesper-friis changed the title ~~Improve datasets~~ Improve data documentation Jan 6, 2025

jesper-friis added the `issue collection` label (Collection of related issues) Jan 7, 2025
