A Terraform module which deploys the Transformer Kinesis service on EC2. If you want to use a custom AMI for this deployment (via amazon_linux_2_ami_id), you will need to ensure it is based on Amazon Linux 2.
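As a sketch, assume you have baked your own image on top of Amazon Linux 2 and published it in the same account under a name matching the hypothetical pattern my-al2-transformer-*. You could then look it up with the AWS provider's aws_ami data source and pass its ID through the module's amazon_linux_2_ami_id input:

```hcl
# Hypothetical lookup of a self-owned AMI built on top of Amazon Linux 2.
# The name pattern is an assumption; adjust it to your own image naming.
data "aws_ami" "custom_al2" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["my-al2-transformer-*"]
  }
}

module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Replace the default community Amazon Linux 2 AMI with the custom image.
  amazon_linux_2_ami_id = data.aws_ami.custom_al2.id
}
```

Leaving amazon_linux_2_ami_id at its empty-string default keeps the latest community Amazon Linux 2 image.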
WARNING: Because scaling this application horizontally can introduce large numbers of duplicates, the application is locked to a single instance. If you need more throughput, scale it vertically instead: change instance_type to a larger node type and re-apply the module. The default, t3a.small, should handle over 100 RPS without needing any scale-up.
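For example, a vertical scale-up only touches instance_type. The t3a.large value below is purely illustrative, not a sizing recommendation. Because the default java_opts sizes the JVM heap as a percentage of RAM (-XX:MaxRAMPercentage=75), the heap should grow with the larger instance without further changes.

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Scale vertically: the module always runs a single instance, so extra
  # throughput comes from a bigger node rather than more nodes.
  # The size below is illustrative only.
  instance_type = "t3a.large"
}
```

Running terraform apply again rolls the change out.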
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; it is only basic information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable telemetry_enabled = false.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
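A minimal sketch of the two telemetry options follows; the email address is a placeholder, and the other module inputs are omitted.

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Option 1: keep telemetry on and attach a contact address (placeholder)
  # so you can receive module updates and security advisories.
  user_provided_id = "ops@example.com"

  # Option 2: opt out of telemetry entirely.
  # telemetry_enabled = false
}
```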
Transformer reads data from an enriched input stream, transforms it and writes the result into S3. There are two types of transformation: shredding and wide row. When shredding is activated, Transformer shreds each event into its custom entities. When wide row is activated, it only converts each event into a single wide row, written as JSON or Parquet depending on widerow_file_format.
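As an illustration, switching from the default shredding to wide row output written as Parquet needs only two inputs from the Inputs table; everything else stays as in the usage example.

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Wide row: one row per event instead of shredded entities.
  transformation_type = "widerow"

  # Wide row output format: "json" (default) or "parquet".
  widerow_file_format = "parquet"

  # Shredding alternative (the default):
  # transformation_type  = "shred"
  # default_shred_format = "TSV" # or "JSON"
}
```

With shredding, schemas_json, schemas_tsv and schemas_skip override the default shred format for individual schemas.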
```hcl
# Kinesis stream holding enriched events (the Transformer's input)
module "enriched_stream" {
  source = "snowplow-devops/kinesis-stream/aws"

  name = var.stream_name
}

# S3 bucket that transformed data is written to
module "transformed_bucket" {
  source = "snowplow-devops/s3-bucket/aws"

  bucket_name = var.transformed_bucket
}

# FIFO queue that receives the "transformation complete" messages
resource "aws_sqs_queue" "message_queue" {
  content_based_deduplication = true
  kms_master_key_id           = "alias/aws/sqs"
  # FIFO queue names must end with '.fifo'
  name       = var.queue_name
  fifo_queue = true
}

module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  accept_limited_use_license = true

  name       = var.name
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  stream_name             = module.enriched_stream.name
  s3_bucket_name          = var.transformed_bucket
  s3_bucket_object_prefix = "transformed/good"
  window_period_min       = 10
  sqs_queue_name          = aws_sqs_queue.message_queue.name

  ssh_key_name     = var.key_name
  ssh_ip_allowlist = ["0.0.0.0/0"]
}
```
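The example above uses an SQS FIFO queue, but according to the sqs_queue_name and sns_topic_arn inputs either one can receive the completion message. A sketch of the SNS variant follows. The topic name is hypothetical, and making the topic FIFO simply mirrors the queue above; it is an assumption, not a confirmed requirement.

```hcl
# Hypothetical FIFO SNS topic, mirroring the FIFO SQS queue in the example.
resource "aws_sns_topic" "message_topic" {
  name                        = "transformer-messages.fifo" # FIFO topic names must end in .fifo
  fifo_topic                  = true
  content_based_deduplication = true
}

module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Publish the completion message to SNS instead of SQS (leave sqs_queue_name unset).
  sns_topic_arn = aws_sns_topic.message_topic.arn
}
```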
Requirement | Version |
---|---|
terraform | >= 1.0.0 |
aws | >= 3.72.0 |
Provider | Version |
---|---|
aws | >= 3.72.0 |
Module | Source | Version |
---|---|---|
instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
kcl_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
service | snowplow-devops/service-ec2/aws | 0.2.1 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Resource | Type |
---|---|
aws_cloudwatch_log_group.log_group | resource |
aws_dynamodb_table.kcl | resource |
aws_iam_instance_profile.instance_profile | resource |
aws_iam_policy.iam_policy | resource |
aws_iam_role.iam_role | resource |
aws_iam_role_policy_attachment.policy_attachment | resource |
aws_security_group.sg | resource |
aws_security_group_rule.egress_tcp_443 | resource |
aws_security_group_rule.egress_tcp_80 | resource |
aws_security_group_rule.egress_udp_123 | resource |
aws_security_group_rule.ingress_tcp_22 | resource |
aws_caller_identity.current | data source |
aws_region.current | data source |
Input | Description | Type | Default | Required |
---|---|---|---|---|
name | A name which will be prepended to the resources created | string | n/a | yes |
s3_bucket_name | The name of the S3 bucket events will be loaded into | string | n/a | yes |
s3_bucket_object_prefix | An optional prefix under which Snowplow data will be saved | string | n/a | yes |
ssh_key_name | The name of the SSH key pair to attach to all EC2 nodes deployed | string | n/a | yes |
stream_name | The name of the input Kinesis stream that Transformer will pull data from | string | n/a | yes |
subnet_ids | The list of subnets to deploy Transformer across | list(string) | n/a | yes |
vpc_id | The VPC to deploy Transformer within | string | n/a | yes |
window_period_min | Frequency, in minutes, at which to emit the loading-finished message (e.g. 5, 10, 15, 20, 30, 60) | number | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
amazon_linux_2_ami_id | The AMI ID to use, which must be based on Amazon Linux 2; by default the latest community version is used | string | "" | no |
app_version | Version of Transformer Kinesis | string | "5.6.0" | no |
associate_public_ip_address | Whether to assign a public IP address to this instance | bool | true | no |
cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
config_override_b64 | App config uploaded as a base64-encoded blob. This variable facilitates the development flow; if the config is incorrect, it can break the deployment | string | "" | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Transformer | list(object({...})) | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Transformer | list(object({...})) | [...] | no |
default_shred_format | Format used by default when the transformation type is 'shred' (TSV or JSON) | string | "TSV" | no |
iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
initial_position | Where to start processing the input Kinesis stream from (TRIM_HORIZON or LATEST) | string | "TRIM_HORIZON" | no |
instance_type | The instance type to use | string | "t3a.small" | no |
java_opts | Custom Java options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
kcl_read_max_capacity | The maximum READ capacity for the KCL DynamoDB table | number | 10 | no |
kcl_read_min_capacity | The minimum READ capacity for the KCL DynamoDB table | number | 1 | no |
kcl_write_max_capacity | The maximum WRITE capacity for the KCL DynamoDB table | number | 10 | no |
kcl_write_min_capacity | The minimum WRITE capacity for the KCL DynamoDB table | number | 1 | no |
private_ecr_registry | The URL of an ECR registry that the sub-account has access to (e.g. '000000000000.dkr.ecr.cn-north-1.amazonaws.com.cn/') | string | "" | no |
schemas_json | List of schemas to get shredded as JSON | list(string) | [] | no |
schemas_skip | List of schemas to not get shredded (and thus not loaded) | list(string) | [] | no |
schemas_tsv | List of schemas to get shredded as TSV | list(string) | [] | no |
sns_topic_arn | The ARN of the SNS topic to which Transformer will send the transformation-complete message. Either sqs_queue_name or sns_topic_arn must be set | string | "" | no |
sqs_queue_name | The name of the SQS queue to which Transformer will send the transformation-complete message. Either sqs_queue_name or sns_topic_arn must be set | string | "" | no |
ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [...] | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
transformation_type | Type of the transformation (shred or widerow) | string | "shred" | no |
transformer_compression | Transformer output compression, GZIP or NONE | string | "GZIP" | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
widerow_file_format | The output file format when the widerow transformation_type is selected (json or parquet) | string | "json" | no |
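For development, config_override_b64 replaces the generated application config with your own. The sketch below assumes a hypothetical transformer.hocon file (HOCON is assumed as the config format) stored next to your Terraform code; Terraform's built-in file and base64encode functions produce the base64 blob the input expects. As the input's description warns, an invalid config can break the deployment.

```hcl
module "transformer_kinesis" {
  source = "snowplow-devops/transformer-kinesis-ec2/aws"

  # ... required inputs omitted; see the usage example ...

  # Read a local config file (hypothetical name) and base64-encode it.
  config_override_b64 = base64encode(file("${path.module}/transformer.hocon"))
}
```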
Output | Description |
---|---|
asg_id | ID of the ASG |
asg_name | Name of the ASG |
sg_id | ID of the security group attached to the Transformer servers |
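The outputs make it possible to wire the Transformer into surrounding infrastructure. As a sketch, the configuration below re-exports the ASG name and lets the Transformer servers reach a hypothetical internal service on port 443 by referencing sg_id. The variable internal_service_sg_id and the port are assumptions for illustration.

```hcl
variable "internal_service_sg_id" {
  description = "Hypothetical security group of an internal service the Transformer must reach"
  type        = string
}

# Re-export the Auto Scaling group name, e.g. for dashboards or alarms.
output "transformer_asg_name" {
  value = module.transformer_kinesis.asg_name
}

# Allow inbound HTTPS on the internal service from the Transformer servers only.
resource "aws_security_group_rule" "transformer_to_internal_service" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = var.internal_service_sg_id
  source_security_group_id = module.transformer_kinesis.sg_id
}
```

The module already creates an egress rule for TCP 443 (aws_security_group_rule.egress_tcp_443), so presumably only the ingress side on the target needs to be added.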
Copyright 2021-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)