A lightweight, serverless schema registry to host schemas for product analytics events defined in a Reflekt project. Compatible with popular event producers and consumers.
Currently, reflekt-registry is focused on supporting Segment as a producer and consumer. Support for more producers and consumers is on the way.
You can deploy your own instance of reflekt-registry to your AWS account in a few simple steps. We used the AWS Free Tier to host our registry, so you can too!
You will need:
- An Amazon Web Services (AWS) account.
- The AWS CLI installed and configured.
- A Reflekt project with schemas defined. See Reflekt for details.
- An AWS S3 bucket to host schemas from your Reflekt project. See the AWS documentation for instructions on creating an S3 bucket.
  - Run reflekt push to push schemas from your Reflekt project to your S3 bucket. See the Reflekt docs for details.
- A local clone of this repo (git clone https://github.com/GClunies/reflekt-registry.git).
- A virtual environment.
  - This repo contains a pyproject.toml file, so you can use poetry to create a virtual environment.
  - Or use pip with reflekt-registry/requirements.txt and your favorite virtual environment manager.
Set up the following environment variables in reflekt-registry/.chalice/config.json:
Variable | Description |
---|---|
REGISTRY_BUCKET | The name of the S3 bucket that hosts schemas from your Reflekt project. |
REGISTRY_BUCKET_REGION | The name of the region where the S3 bucket is located. |
SEGMENT_WRITE_KEY_VALID | The write key for the Segment source where VALID events should be sent. |
SEGMENT_WRITE_KEY_INVALID | The write key for the Segment source where INVALID events should be sent. |
DEBUG | Set to "true" to enable debug logging to AWS CloudWatch. |
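Before deploying, it can help to confirm that every variable from the table is actually set. The following is a minimal sketch, not part of reflekt-registry: it assumes the variables live under Chalice's "environment_variables" key (top-level or per stage), that the stage is named dev, and that DEBUG is optional. The function name missing_env_vars and all values in the example config are hypothetical placeholders.

```python
import json

# Variable names taken from the table above. DEBUG is treated as optional
# here, which is an assumption of this sketch.
REQUIRED_VARS = (
    "REGISTRY_BUCKET",
    "REGISTRY_BUCKET_REGION",
    "SEGMENT_WRITE_KEY_VALID",
    "SEGMENT_WRITE_KEY_INVALID",
)


def missing_env_vars(config: dict, stage: str = "dev") -> list:
    """Return the required variables that are unset or empty for a stage.

    Top-level "environment_variables" are merged with the stage-level ones,
    with stage-level values taking precedence.
    """
    merged = dict(config.get("environment_variables", {}))
    stage_config = config.get("stages", {}).get(stage, {})
    merged.update(stage_config.get("environment_variables", {}))
    return [name for name in REQUIRED_VARS if not merged.get(name)]


# Example config with placeholder values; one variable is left out on purpose.
example_config = json.loads("""
{
  "version": "2.0",
  "app_name": "reflekt-registry",
  "stages": {
    "dev": {
      "api_gateway_stage": "api",
      "environment_variables": {
        "REGISTRY_BUCKET": "my-reflekt-schemas",
        "REGISTRY_BUCKET_REGION": "us-west-1",
        "SEGMENT_WRITE_KEY_VALID": "<valid-source-write-key>",
        "DEBUG": "false"
      }
    }
  }
}
""")

print(missing_env_vars(example_config))  # ['SEGMENT_WRITE_KEY_INVALID']
```

To check a real project, load reflekt-registry/.chalice/config.json with json.load and pass the result to the same function.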
Inside the reflekt-registry/ directory, run the following to deploy your registry to AWS:
$ chalice deploy
Creating deployment package.
Updating policy for IAM role: reflekt-registry-dev
Updating lambda function: reflekt-registry-dev
Updating rest API
Resources deployed:
# NAME OF LAMBDA FUNCTION (reflekt-registry-dev)
- Lambda ARN: arn:aws:lambda:us-west-1:012345678987:function:reflekt-registry-dev
# API ENDPOINT TO BE USED IN SDK CLIENTS
- Rest API URL: https://foo77bar99.execute-api.us-west-1.amazonaws.com/api/
Chalice handles creating IAM roles, the Lambda function, and API Gateway endpoint for you. You can view these resources in the AWS Console.
After your first deploy only, you will need to grant your Lambda IAM role (reflekt-registry-dev in the example above) permission to access your S3 bucket. For instructions on how to add a policy to an IAM role, see the AWS IAM documentation (read the section "To embed an inline policy for a user or role (console)"). Add this policy to your IAM role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
    }
  ]
}
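If you prefer scripting this step over using the console, the sketch below builds the same policy for a given bucket name and embeds it in the role with boto3's put_role_policy. It is an illustration under assumptions: the helper names and the policy name reflekt-registry-s3-access are hypothetical, and attach_policy requires boto3 plus AWS credentials allowed to modify the role.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Build the inline policy shown above for a given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": [bucket_arn]},
            {"Effect": "Allow", "Action": "s3:*Object", "Resource": f"{bucket_arn}/*"},
        ],
    }


def attach_policy(
    role_name: str,
    bucket_name: str,
    policy_name: str = "reflekt-registry-s3-access",  # hypothetical name
) -> None:
    """Embed the policy in the IAM role as an inline policy."""
    import boto3  # imported lazily so build_bucket_policy works without boto3

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_bucket_policy(bucket_name)),
    )


# Print the policy for a placeholder bucket name to review it before attaching.
print(json.dumps(build_bucket_policy("my-reflekt-schemas"), indent=2))
```

With the role from the deploy output above, the call would look like attach_policy("reflekt-registry-dev", "<YOUR_BUCKET_NAME>").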
Most SDKs (e.g., Segment's analytics.js, analytics-python) support custom endpoints. Simply configure the SDK client to send events to your reflekt-registry endpoint. The registry will validate the events against schemas in your S3 bucket and send them to the appropriate consumer.
import segment.analytics as segment_analytics
from datetime import datetime

segment_analytics.write_key = "abc123def456ghi789jkl012mno345pqr678stu901vwx234yz567"

# Specify custom endpoint + 'validate/<sdk_vendor>' (e.g., 'validate/segment')
# so that reflekt-registry knows how to handle events from the SDK
segment_analytics.host = "https://foo77bar99.execute-api.us-west-1.amazonaws.com/api/validate/segment"

segment_analytics.track(
    user_id="test_user",
    event="Test Event",
    timestamp=datetime.now(),
    properties={
        "schema_id": "segment/demo/Test_Event/1-0.json",  # REQUIRED TO VALIDATE EVENT
        "test_property": "test_value",
    },
)
When sending events to reflekt-registry, you must include the schema_id as a property in the event, set to the $id of the schema in your Reflekt project (and S3 bucket) that the event should be validated against. For example, the schema_id in the example above validates against this schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "segment/demo/Test_Event/1-0.json",
  "description": "User viewed their shopping cart.",
  "self": {
    "vendor": "com.reflekt-ci",
    "name": "Test Event",
    "format": "jsonschema",
    "version": "1-0",
    "metadata": {
      "code_owner": "Maura",
      "product_owner": "Greg"
    }
  },
  "type": "object",
  "properties": {
    "schema_id": {
      "description": "The schema ID of the event.",
      "const": "segment/demo/Test_Event/1-0.json"
    },
    "test_property": {
      "type": "string",
      "description": "This is a test property."
    }
  },
  "required": [
    "schema_id",
    "test_property"
  ],
  "additionalProperties": false
}
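To make the effect of this schema concrete, here is a hand-rolled check that covers only the keywords the schema uses: required, additionalProperties, const, and type string. It is a simplified sketch for illustration and does not claim to reflect how reflekt-registry validates events internally; a full JSON Schema validator (such as the jsonschema package) handles far more. The function name validate_event_properties and the sample events are hypothetical.

```python
def validate_event_properties(properties: dict, schema: dict) -> list:
    """Return a list of error messages; an empty list means the event is valid.

    Handles only "required", "additionalProperties": false, "const", and
    "type": "string". All other JSON Schema keywords are ignored.
    """
    errors = []
    declared = schema.get("properties", {})

    # Every required property must be present.
    for name in schema.get("required", []):
        if name not in properties:
            errors.append(f"missing required property: {name}")

    # With additionalProperties false, undeclared properties are rejected.
    if schema.get("additionalProperties") is False:
        for name in properties:
            if name not in declared:
                errors.append(f"unexpected property: {name}")

    # Per-property rules apply only to properties that are present.
    for name, rules in declared.items():
        if name not in properties:
            continue
        value = properties[name]
        if "const" in rules and value != rules["const"]:
            errors.append(f"{name} must equal {rules['const']!r}")
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")

    return errors


# Trimmed copy of the schema above (descriptions and metadata omitted).
schema = {
    "$id": "segment/demo/Test_Event/1-0.json",
    "type": "object",
    "properties": {
        "schema_id": {"const": "segment/demo/Test_Event/1-0.json"},
        "test_property": {"type": "string"},
    },
    "required": ["schema_id", "test_property"],
    "additionalProperties": False,
}

good = {"schema_id": "segment/demo/Test_Event/1-0.json", "test_property": "test_value"}
bad = {"schema_id": "segment/demo/Test_Event/1-0.json", "extra": 1}

print(validate_event_properties(good, schema))  # []
print(validate_event_properties(bad, schema))
# ['missing required property: test_property', 'unexpected property: extra']
```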
Valid events will be sent to the Segment consumer specified by SEGMENT_WRITE_KEY_VALID. Invalid events will be sent to the Segment consumer specified by SEGMENT_WRITE_KEY_INVALID.
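The routing rule can be pictured as a lookup keyed on the validation result. This is a sketch of the idea only; the function name select_write_key is hypothetical, and the env argument stands in for the Lambda's environment (os.environ in a deployed function).

```python
def select_write_key(is_valid: bool, env: dict) -> str:
    """Pick the Segment write key for an event based on its validation result."""
    name = "SEGMENT_WRITE_KEY_VALID" if is_valid else "SEGMENT_WRITE_KEY_INVALID"
    return env[name]


# Placeholder keys for illustration.
env = {
    "SEGMENT_WRITE_KEY_VALID": "valid-source-key",
    "SEGMENT_WRITE_KEY_INVALID": "invalid-source-key",
}

print(select_write_key(True, env))   # valid-source-key
print(select_write_key(False, env))  # invalid-source-key
```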
Supported producers:
- Segment SDKs (e.g. analytics.js). See Segment docs for full SDK list.
👀 More producers to come! 👀
Supported consumers:
- Segment Sources. By default, reflekt-registry is configured to:
  - Send valid events to a Segment source with write key SEGMENT_WRITE_KEY_VALID
  - Send invalid events to a Segment source with write key SEGMENT_WRITE_KEY_INVALID
👀 More consumers to come! 👀
reflekt-registry is built on top of AWS and Chalice, making it easy to manage and deploy. We used the AWS Free Tier to host our registry, so you can too!
Behind the scenes, reflekt-registry is composed of 3 AWS components:
- An S3 bucket to store schemas from a Reflekt project.
- An API Gateway endpoint that accepts events from producers.
- A Lambda function to validate events against schemas in the S3 bucket, routing them to the appropriate consumer.
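As an illustration of how the Lambda function might read a schema from the bucket, the sketch below fetches an object with the S3 client's get_object call and parses it as JSON. It assumes the S3 object key is identical to the event's schema_id, which is an assumption of this example and not something the text above states. The helper load_schema and the in-memory FakeS3Client are hypothetical and exist only so the example runs without AWS access.

```python
import io
import json


def load_schema(s3_client, bucket: str, schema_id: str) -> dict:
    """Fetch and parse a schema from S3.

    Assumes the object key equals the event's schema_id.
    """
    response = s3_client.get_object(Bucket=bucket, Key=schema_id)
    return json.loads(response["Body"].read())


class FakeS3Client:
    """In-memory stand-in for boto3.client("s3"), for local experimentation."""

    def __init__(self, objects: dict):
        self._objects = objects  # maps (bucket, key) to raw bytes

    def get_object(self, Bucket: str, Key: str) -> dict:
        # Mirrors the part of the real response that load_schema uses.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


fake_s3 = FakeS3Client(
    {
        ("my-reflekt-schemas", "segment/demo/Test_Event/1-0.json"): json.dumps(
            {"$id": "segment/demo/Test_Event/1-0.json", "type": "object"}
        ).encode("utf-8")
    }
)

loaded = load_schema(fake_s3, "my-reflekt-schemas", "segment/demo/Test_Event/1-0.json")
print(loaded["$id"])  # segment/demo/Test_Event/1-0.json
```

Against a real bucket, the first argument would be boto3.client("s3") and the bucket name would come from the REGISTRY_BUCKET environment variable.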