You are looking at Jackalope, the open-source tool for enabling support for SNOMED CT-style post-coordinated expressions during ETL conversion of source EMR data into OMOP CDM format. Jackalope is developed and supported by Sciforce for the benefit of open science and the OHDSI community.
Scroll to the bottom of the document for API reference.
SNOMED CT is a large, complex, and highly structured ontology. It also features a syntax for expressing complex clinical ideas using post-coordinated expressions, meaning the coordinated use of multiple SNOMED CT concepts to represent a single complex idea.
The OMOP Common Data Model (CDM) is a standard for representing observational clinical data, also featuring a rich ecosystem of tools for research. SNOMED CT forms a centerpiece of the CDM, being used as the primary Standard coding system for many concept Domains.
However, the CDM does not support the use of SNOMED CT post-coordinated expressions. Or, rather, it used to not support them, until Jackalope came along. Now, it is possible to build a custom standard representation for any source concept by representing it as a SNOMED CT post-coordinated expression. Jackalope will build an extension of the SNOMED CT hierarchy inside a deployed OMOP CDM instance, allowing for full use of OMOP CDM tools.
Jackalope is written in Python 3.10. It requires an installed Python interpreter and the ANTLR4 runtime. Running Jackalope in a virtual environment is recommended. All required dependencies are listed in the `requirements.txt` file.
Jackalope also requires source RF2 files for SNOMED CT. Using the matching version of SNOMED CT US is strongly recommended for compatibility, but not enforced. Check `VOCABULARY.VOCABULARY_VERSION` for `vocabulary_id = 'SNOMED'` in your OMOP CDM instance to find out which version of SNOMED CT US to download.
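The version lookup described above can be sketched in Python. This is a minimal illustration, not part of Jackalope: the helper name `snomed_version` is hypothetical, and the in-memory SQLite table with the version string `2021-09-01` is only a stand-in for a real OMOP CDM instance.

```python
import sqlite3


def snomed_version(conn):
    """Return VOCABULARY.VOCABULARY_VERSION for SNOMED, or None if absent.

    `conn` is any DB-API 2.0 connection to an OMOP CDM instance.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT vocabulary_version FROM vocabulary "
        "WHERE vocabulary_id = 'SNOMED'"
    )
    row = cur.fetchone()
    return row[0] if row else None


# Stand-in for a real CDM instance: one table, one placeholder row.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)"
)
demo.execute("INSERT INTO vocabulary VALUES ('SNOMED', '2021-09-01')")
print(snomed_version(demo))
```

On a real instance the table may need a schema prefix, depending on how the CDM was deployed.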
Jackalope can be run on any system and device, but having at least 8 GB of free RAM is recommended.
For installation, simply download or clone the repository and run `pip install -r requirements.txt`.
Jackalope is abstracted from the backend database through the SQLAlchemy library, allowing for easy integration with any SQL-compatible database. Currently, PostgreSQL and sqlite3 have been tested as backends, but none of the generated SQL is platform-specific.
To connect to a database, create a configuration file `connection_properties.json` in JSON format. Use `connection_properties.json.example` as a template.
Field name | Description | Optional |
---|---|---|
"protocol" | Database protocol as specified by SQLAlchemy. Currently, only "postgresql" and "sqlite" are tested, but all are supported. | No |
"ssh_address" | Address of the SSH tunnel to use for connecting to the database. Leave empty if not using SSH tunnel. | Yes |
"ssh_port" | Port of the SSH tunnel to use for connecting to the database. Leave empty if not using SSH tunnel. | Yes |
"ssh_username" | Username for the SSH tunnel to use for connecting to the database. Leave empty if not using SSH tunnel. | Yes |
"ssh_password" | Password for the SSH tunnel to use for connecting to the database. If not specified, but other ssh fields are present, it will be prompted for on `stdin`. | Yes |
"db_address" | Address of the database to connect to. Localhost is accepted. | No |
"db_port" | Port of the database to connect to. 5432 is default for Postgres. | No |
"db_user" | Username for the database to connect to. | No |
"db_password" | Password for the database to connect to. If not specified, attempt to connect will be made without password. | Yes |
"db_name" | Name of the database to connect to. | No |
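To make the role of the database fields concrete, the sketch below assembles a SQLAlchemy-style URL (`protocol://user:password@host:port/name`) from such a properties dictionary. It is an illustration under assumptions, not Jackalope's own code: the helper `build_db_url` and the sample values are hypothetical, the SSH tunnel fields are ignored, and SQLite URLs follow a different, file-based form that this sketch does not cover.

```python
from urllib.parse import quote

# Fields marked as non-optional in the table above.
REQUIRED = ("protocol", "db_address", "db_port", "db_user", "db_name")


def build_db_url(props):
    """Build a database URL string from connection properties."""
    missing = [key for key in REQUIRED if not props.get(key)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Percent-encode credentials so characters like '@' do not break the URL.
    credentials = quote(str(props["db_user"]), safe="")
    if props.get("db_password"):
        credentials += ":" + quote(str(props["db_password"]), safe="")
    return "{}://{}@{}:{}/{}".format(
        props["protocol"],
        credentials,
        props["db_address"],
        props["db_port"],
        props["db_name"],
    )


# Hypothetical sample values, for illustration only.
sample_props = {
    "protocol": "postgresql",
    "db_address": "localhost",
    "db_port": 5432,
    "db_user": "omop",
    "db_password": "p@ss",
    "db_name": "cdm",
}
print(build_db_url(sample_props))
```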
WARNING: Jackalope will make changes to the database.
Jackalope also has a fallback option to work directly with CSV files downloaded from Athena. In this case, changes will be represented as _DELTA.csv files, which can be used to update the database manually.
Jackalope is a server architecture application, which can be run in a terminal or as a background process.
It has a separate configuration file, `config.json`, in JSON format. The default values should be good enough for most use cases.
Its fields are:
- `host` - host to run the server on. Default is `localhost`.
- `port` - port to run the server on. Default is `52252`.
- `snomed_path` - relative or absolute path to the SNOMED CT US RF2 files. Default is `snomed`.
- `vocabs_path` - relative or absolute path to the vocabulary files. Default is `omop_vocab`. Will only be used in the `csv` backend.
- `backend` - backend to use. Can be `csv` or `sql`. Default is `sql`.
- `pickle_ont` - path where to store (and look for) the binary file of the cached SNOMED ontology.
- `connection_properties` - path to the connection properties file. Default is `connection_properties.json`.
- `rebuild_omop` - whether to reset all custom concepts in the OMOP CDM instance on connect. Default is `false`.
- `stateless` - whether to run the server in stateless mode. Default is `false`. When set to `true`, Jackalope will not make any changes to the database, and will instead output the changes in JSON format to `stdout`. This is useful for an open-use web-service implementation. Important: all `concept_id` and `concept_code` values will be set to 0 or `null` in this mode, except for hash-generated ones. This option is ignored if a filename is passed as a command line argument.
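The documented defaults can be pictured as a dictionary that a user-supplied `config.json` overrides. The sketch below is an assumption about how such merging could work, not Jackalope's actual loader; `load_config` is a hypothetical name, and `pickle_ont` is left out because no default is documented for it.

```python
import json

# Defaults as documented in the field list above.
DEFAULTS = {
    "host": "localhost",
    "port": 52252,
    "snomed_path": "snomed",
    "vocabs_path": "omop_vocab",
    "backend": "sql",
    "connection_properties": "connection_properties.json",
    "rebuild_omop": False,
    "stateless": False,
}


def load_config(text):
    """Parse JSON text and overlay it on the documented defaults."""
    config = {**DEFAULTS, **json.loads(text)}
    if config["backend"] not in ("csv", "sql"):
        raise ValueError("backend must be 'csv' or 'sql'")
    return config


# Only the overridden keys need to appear in the file.
cfg = load_config('{"backend": "csv", "port": 8080}')
print(cfg["backend"], cfg["port"], cfg["host"])
```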
On the first run, Jackalope will pickle the SNOMED CT ontology to a file so that subsequent runs load faster.
Jackalope can be run in two ways:
- As a tool which will evaluate expressions from a single CSV file and exit.
- As a background service, responding to REST API requests.
To run Jackalope as a tool, run `$ python main.py %FILENAME%` in the root directory of the repository.
This will evaluate all expressions in the file and load the changes to the database.
`%FILENAME%` must be a path to a UTF-8 encoded, comma-delimited CSV file. An example file, `use_cases_icd.csv`, can be found in the root directory of the repository.
The file should have the following structure:
Column name | Description | Optional |
---|---|---|
vocabulary_id | Vocabulary ID of the source concept. Defaults to "SciForce". Must have an entry present in `VOCABULARY` | Yes |
concept_code | Concept code of the source concept. | No |
concept_name | Concept name of the source concept. Will also be used for the new standard concept | No |
concept_class_id | Concept class ID of the source concept. Defaults to "Clinical Finding" | Yes |
domain_id | Domain ID of the source concept. Defaults to "Condition". For concept set purposes will not matter. | Yes |
post_coordinated_expression | Post-coordinated expression to be evaluated. Must be in Compositional Grammar syntax* | No |
*-Subexpressions are not currently supported.
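As an illustration of the file layout, the sketch below reads such a CSV with Python's `csv` module, checks the non-optional columns, and fills the documented defaults for the optional ones. The helper `read_rows` and the sample row (code `X1`, name `Example finding`) are hypothetical; the expression is the one used later in this document. Note that an expression containing commas has to be quoted to remain a single CSV field.

```python
import csv
import io

# Documented defaults for the optional columns.
DEFAULTS = {
    "vocabulary_id": "SciForce",
    "concept_class_id": "Clinical Finding",
    "domain_id": "Condition",
}
REQUIRED = ("concept_code", "concept_name", "post_coordinated_expression")


def read_rows(handle):
    """Read input rows, enforce required columns, apply defaults."""
    rows = []
    for index, row in enumerate(csv.DictReader(handle), start=1):
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                raise ValueError(f"data row {index}: missing {column}")
        for column, default in DEFAULTS.items():
            if not (row.get(column) or "").strip():
                row[column] = default
        rows.append(row)
    return rows


# Hypothetical one-row file with only the required columns present.
sample = io.StringIO(
    "concept_code,concept_name,post_coordinated_expression\n"
    'X1,Example finding,"74580009: {405816004 = 25723000, '
    '260686004 = 129404003}"\n'
)
rows = read_rows(sample)
print(rows[0]["vocabulary_id"], rows[0]["post_coordinated_expression"])
```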
Found parents and changes to the database will be printed to the console during the run.
To run Jackalope as a service, run `$ python main.py` in the root directory of the repository.
This will start a server on the specified port and host, which will respond to REST API requests.
The server is not intended as a full replacement for an SQL client, and can only make changes in line with the post-coordinated expression evaluation. `rest_client/client.py` can be referred to as an example of how to use the API.
Default endpoint is http://localhost:52252/jackalope/v1.0/
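A request against this endpoint could be assembled with the standard library as sketched below. This assumes that endpoints requiring a data package accept it as a JSON POST body, which is not confirmed here. The helper `build_post` is hypothetical, and the path `example` is a placeholder, not a real route; see `rest_client/client.py` for the actual ones.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "http://localhost:52252/jackalope/v1.0/"


def build_post(path, payload, base_url=BASE_URL):
    """Prepare (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        urljoin(base_url, path),  # base ends with '/', so path is appended
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 'example' is a placeholder path, not a documented Jackalope route.
req = build_post("example", {"vocabulary_id": "SciForce"})
print(req.get_method(), req.full_url)
```

With the server running, such a request could be sent with `urllib.request.urlopen(req)` and the response body parsed with `json.loads`.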
Returns the versions of the server, the concept_code hasher, and SNOMED CT US in both the RF2 files and the database backend.
{
"app": "0.2-ALPHA",
"hasher": "1ALPHA",
"snomed_omop": "2021-09-01",
"snomed_rf2": "2021-09-01"
}
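Since a matching SNOMED CT US version between the RF2 files and the database is recommended, a client could compare the two fields of this response. The sketch below works on the example payload shown above; `snomed_versions_match` is a hypothetical helper, not part of the Jackalope client.

```python
import json


def snomed_versions_match(version_info):
    """True when the RF2 and OMOP SNOMED versions are identical."""
    return version_info["snomed_omop"] == version_info["snomed_rf2"]


# The example response from this section, as a JSON string.
response_text = """
{
  "app": "0.2-ALPHA",
  "hasher": "1ALPHA",
  "snomed_omop": "2021-09-01",
  "snomed_rf2": "2021-09-01"
}
"""
info = json.loads(response_text)
if not snomed_versions_match(info):
    print("Warning: SNOMED versions differ between RF2 and the database")
```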
Returns information about the concept in the database whose unique id equals `%CONCEPT_ID%`.
This is a convenience tool; it does not provide information about relationships, synonyms or hierarchy.
{
"concept_id": 123456,
"concept_code": "123456789",
"concept_name": "Some concept",
"vocabulary_id": "SNOMED",
"domain_id": "Condition",
"concept_class_id": "Clinical Finding",
"standard_concept": "S"
}
Requires a data package in JSON format. Adds a new custom vocabulary to the database. Must be run before adding a new concept to that vocabulary.
{
"vocabulary_id": "SciForce",
"vocabulary_name": "SciForce Ukraine",
"vocabulary_reference": "https://sciforce.solutions/industries/medtech",
"vocabulary_version": "2021-09-01"
}
Requires a data package in JSON format. Adds a new source non-standard concept to the database.
{
"concept_code": "123456789",
"concept_name": "Some concept",
"vocabulary_id": "SciForce",
"concept_class_id": "Clinical Finding",
"domain_id": "Condition"
}
Returns a data package in JSON format. Contains the concept_id of the newly created concept. Value of concept_id will be in conventionally designated "manual" space (>2,000,000,000).
{
"concept_id": 2123456789
}
Requires a data package in JSON format. Evaluates a post-coordinated expression and adds the relevant changes affecting the source concept. Will not deprecate or overwrite existing relations.
{
"source_id": 2123456789,
"post_coordinated_expression": "74580009: {405816004 = 25723000, 260686004 = 129404003}"
}
Returns a data package in JSON format describing changes in the database.
{
"mapped_concepts": [123456]
}
Assigns concept_id in designated "synthetic" space (1,000,000,000<concept_id<2,000,000,000). Not yet an established convention. Any number of matching parents can be created.
{
"parent_concepts": [123456, 78910, 11121314],
"concept_id": 123456789,
"concept_code": "e7bc867c5d56acdf0fa0ff4b291d85137c375142234f580a68"
}
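The two custom `concept_id` ranges mentioned in this document can be told apart with a simple check. This is only a sketch of the stated conventions; `concept_id_space` is a hypothetical helper, and the value `1500000000` is made up for illustration.

```python
def concept_id_space(concept_id):
    """Classify a concept_id by the ranges described in this document."""
    if concept_id > 2_000_000_000:
        return "manual"  # source concepts added by hand
    if 1_000_000_000 < concept_id < 2_000_000_000:
        return "synthetic"  # concepts generated from expressions
    # Everything else, including the exact boundary values.
    return "other"


print(concept_id_space(2123456789))
print(concept_id_space(1500000000))
print(concept_id_space(123456))
```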
The value of concept_code is a 25-byte BLAKE2b hash (50 hexadecimal characters) of the canonical normal form of the post-coordinated expression. It is guaranteed to be shared with all semantically synonymous expressions.
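The hashing step can be sketched with `hashlib.blake2b`, whose `digest_size` is given in bytes: a 25-byte digest yields the 50 hexadecimal characters seen in the example above. The normalisation here is a deliberately naive stand-in (whitespace removed, attribute pairs sorted) and is an assumption; it is not the real canonical normal form, so the output will not reproduce Jackalope's codes. Both function names are hypothetical.

```python
import hashlib
import re


def naive_normal_form(expression):
    """Toy normalisation: NOT the real canonical normal form.

    Handles only 'focus: {attr = value, ...}' with numeric ids.
    """
    focus, _, refinement = expression.partition(":")
    pairs = re.findall(r"(\d+)\s*=\s*(\d+)", refinement)
    attributes = ",".join(f"{attr}={value}" for attr, value in sorted(pairs))
    return f"{focus.strip()}:{{{attributes}}}"


def concept_code(expression):
    """Hex BLAKE2b digest of the normalised expression (25 bytes)."""
    data = naive_normal_form(expression).encode("utf-8")
    return hashlib.blake2b(data, digest_size=25).hexdigest()


# Same attributes in a different order and spacing give the same code.
a = concept_code("74580009: {405816004 = 25723000, 260686004 = 129404003}")
b = concept_code("74580009:{260686004=129404003,405816004=25723000}")
print(a == b, len(a))
```

Hashing a normalised form rather than the raw string is what lets differently written but equivalent expressions share one code.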
Removes all mapping relationships, `CONCEPT_ANCESTOR` entries and contents of the `STANDARD_CONCEPT` field for the concept.
Returns a boolean value indicating whether any changes were made. It is recommended to call this endpoint before building a new mapping for the concept, as this is not done automatically.
{
"changes_made": true
}
Removes all entries in all tables related to the vocabulary. Returns empty JSON object.