[Requested feature]: separate file with ontology and taxonomy rules #66

Open
sneumann opened this issue Jun 13, 2023 · 6 comments

sneumann commented Jun 13, 2023

Summary

The extended keywords for ontology and taxonomy validation are a quite unique feature of this validator, but they require graphRestriction, isChildTermOf and isValidTaxonomy to be present in the test_schema.json file. If the JSON-LD schema definition is not under my control, I would like these semantic validations to be passed into biovalidator from a second file.

Motivation

I would like to allow better validation for schema.org and bioschemas metadata. Currently, there are types defined in JSON schema for e.g. https://schema.org/Dataset or https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE, which are developed in e.g. https://github.com/BioSchemas/specifications/tree/master/Dataset/ or https://github.com/BioSchemas/specifications/tree/master/MolecularEntity/.

These types allow various properties to have values as https://schema.org/DefinedTerm, and I'd expect the majority of these come from OBO ontologies you'd find on terminology services like OLS or NCBO.

However, I'd expect that schema.org wants to keep their types lean and won't allow people to add further validation into their schema definition. Also, for one schema type, there might be multiple profiles in different communities suggesting / requesting different restrictions on allowed ontology terms.

Example

An example would probably be great, but I don't have one yet. I only found biovalidator at last week's AllHands in Dublin :-)

Yours,
Steffen

@theisuru
Collaborator

theisuru commented Jun 15, 2023

Hi Steffen,

Thanks for reaching out. After Dublin, we were also thinking about bioschemas and how we can extend support.

If I understood your use case correctly:
Since the second schema file is under your control, you can reference the bioschemas definition from your own schema file. allof is one of the keywords provided by JSON Schema, and you can use it to validate against multiple schemas.

        "allof": [
            {
                "$ref": "https://schema.org/TYPE/JSON_SCHEMA_REPRESENTATION"
            },
            {
                "type": "object",
                ..... secondary validation here
            }
        ]
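
A minimal sketch of how the "second file" idea could be assembled, assuming the semantic rules live in their own schema fragment and are wrapped together with a reference to the externally maintained schema. The function name combine_schemas and the rules content are hypothetical; the isChildTermOf values are borrowed from the examples further down this thread, and the $ref is the placeholder from the snippet above. Note that the keyword is spelled allOf, since JSON Schema keywords are case-sensitive.

```python
import json


def combine_schemas(base_ref: str, semantic_rules: dict) -> dict:
    """Wrap an external schema reference and local semantic rules in one allOf."""
    return {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "type": "object",
        "allOf": [
            {"$ref": base_ref},  # schema maintained elsewhere, left untouched
            semantic_rules,      # rules kept in a separate, locally controlled file
        ],
    }


# Hypothetical content of the second file holding only the semantic rules.
rules = {
    "type": "object",
    "properties": {
        "termCode": {
            "type": "string",
            "isChildTermOf": {
                "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
                "ontologyId": "chmo",
            },
        }
    },
    "required": ["termCode"],
}

# Placeholder reference, exactly as written in the comment above.
combined = combine_schemas("https://schema.org/TYPE/JSON_SCHEMA_REPRESENTATION", rules)
print(json.dumps(combined, indent=2))
```

The printed document could then be saved and handed to the validator as a single schema, while the rules fragment stays in its own file. Whether biovalidator can resolve the remote $ref is not established here.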

@theisuru
Collaborator

I just quickly glanced over the bioschemas dataset definition. I can see that JSON Schema is used in the $validation section. I will try to put together an example of how we can use biovalidator to validate bioschemas (thinking about the biosamples type).

@M-casado
Contributor

   "allof": [

Just a minor comment: JSON Schema keywords are case-sensitive, so it would be allOf instead of allof. Otherwise I don't think it'll work.

@sneumann
Author

sneumann commented Jun 15, 2023

Hi,
Indeed, correct direction. Here is the promised example for a Defined Term:

{
    "@type": "DefinedTerm",
    "@id": "http://purl.obolibrary.org/obo/CHMO_0000230",
    "termCode": "CHMO_0000230",
    "name": "alpha-particle spectroscopy",
    "identifier": "http://purl.obolibrary.org/obo/CHMO_0000230",
    "url": "http://purl.obolibrary.org/obo/CHMO_0000230",

    "inDefinedTermSet":
    {
        "@type": "DefinedTermSet",
        "@id": "http://purl.bioontology.org/ontology/CHMO",
        "name": "Chemical Methods Ontology",
        "identifier": "http://purl.bioontology.org/ontology/CHMO",
        "url": "https://github.com/rsc-ontologies/rsc-cmo"
    }
}

And what I want to validate for the above could be:

      "isChildTermOf": {
        "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
        "ontologyId": "CHMO" ## Or "chmo" ?! Probably "Ontology ID" from https://www.ebi.ac.uk/ols/ontologies/chmo 
      }

Other examples for DefinedTerm are in e.g.
https://github.com/BioSchemas/specifications/blob/75b427325742f8e2d3b2c00299bec4f826c56f47/Course/examples/1.0-RELEASE/course.json#L11

Yours,
Steffen

@sneumann
Author

Hi, we are currently trying to come up with more examples, and we will prepare more validation rules. It would be great to have some biovalidator functionality to play with at the ELIXIR and ELIXIR-DE Biohackathons.
Any progress, or did you hit a roadblock? Thanks in advance. Yours, Steffen

@theisuru
Collaborator

theisuru commented Sep 4, 2023

Hi Steffen,

I tried using allOf at the top level and created a test case that combines a given schema with a custom schema, but it failed to validate correctly. I am not sure whether this is due to wrong JSON Schema syntax or an implementation problem. I will check this further and let you know.

This is the example I tried.

{
  "$id": "BioSchema/plus/customSchema/for/DefinedTerm",
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "description": "Use custom schema on top of BioSchema to validate BioSchema type",
  "type": "object",
  "$allOf": [
    {
      "$ref": "path/to/bioschemas/definedterm"
    },
    {
      "description": "My custom schema for DefinedTerm",
      "type": "object",
      "properties": {
        "termCode": {
          "type": "string",
          "isChildTermOf": {
            "parentTerm": "http://purl.obolibrary.org/obo/CHMO_0000800",
            "ontologyId": "chmo"
          }
        }
      },
      "required": [
        "termCode"
      ]
    }
  ]
}
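
One possible cause, offered as an assumption rather than a confirmed diagnosis: the example above spells the keyword $allOf, whereas JSON Schema defines allOf without a $ prefix. Depending on its configuration, a validator may either ignore or reject an unknown keyword; if it is ignored, neither subschema would be applied. The sketch below is a naive check for this kind of slip. The helper name unknown_dollar_keywords is hypothetical, the keyword set is the core vocabulary of draft 2019-09, and the sample schema is a trimmed copy of the example above.

```python
# "$"-prefixed keywords defined by the JSON Schema draft 2019-09 core vocabulary.
CORE_KEYWORDS = {
    "$id", "$schema", "$ref", "$anchor", "$recursiveRef",
    "$recursiveAnchor", "$vocabulary", "$comment", "$defs",
}


def unknown_dollar_keywords(schema, path="#"):
    """Return (location, key) pairs for "$"-prefixed keys outside the core set.

    Naive on purpose: it walks every nested object, so an instance property
    whose name starts with "$" (listed under "properties") would be flagged too.
    """
    found = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key.startswith("$") and key not in CORE_KEYWORDS:
                found.append((path, key))
            found.extend(unknown_dollar_keywords(value, f"{path}/{key}"))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            found.extend(unknown_dollar_keywords(item, f"{path}/{index}"))
    return found


# Trimmed copy of the schema from the comment above.
schema = {
    "$id": "BioSchema/plus/customSchema/for/DefinedTerm",
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "type": "object",
    "$allOf": [
        {"$ref": "path/to/bioschemas/definedterm"},
        {"type": "object", "required": ["termCode"]},
    ],
}

print(unknown_dollar_keywords(schema))  # [('#', '$allOf')]
```

Renaming the key to allOf makes the check return an empty list. Whether that alone resolves the failed validation is not verified in this thread.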
