This module provides a configurable set of AWS infrastructure for real-time analytics. The full set of AWS resources is listed below, but the data pipeline it produces is built on three core services: Kinesis Data Streams, Kinesis Data Firehose, and S3.
To reference this module from your project's Terraform, add a module block to your existing configuration. For example, the following block shows the minimally required arguments, using a local path as its source; pinning a released version such as 1.3.0 is shown just after.
module "analytics" {
source = "../../"
project_name = "example"
log_group_name = aws_cloudwatch_log_group.project_log_group.name
kinesis_data_producers = {
aws = [aws_iam_role.producer.arn]
}
s3_data_consumers = {
aws = [aws_iam_role.consumer.arn]
}
}
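To explicitly use the 1.3.0 version of this module, pin the source. A minimal sketch, assuming the module is consumed directly from its git repo and that releases are tagged `v<version>`:

```hcl
module "analytics" {
  # Pin to the v1.3.0 release tag; adjust the ref for other releases.
  source = "git::[email protected]:<org>/<repo>.git?ref=v1.3.0"

  project_name   = "example"
  log_group_name = aws_cloudwatch_log_group.project_log_group.name

  kinesis_data_producers = {
    aws = [aws_iam_role.producer.arn]
  }
  s3_data_consumers = {
    aws = [aws_iam_role.consumer.arn]
  }
}
```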
See examples for more, or the variable and output reference below.
- Clone this repo:

  ```sh
  git clone [email protected]:<org>/<repo>
  ```

- Prepare the environment:

  ```sh
  cd <repo>; scripts/prep.sh
  ```

- Source useful functions:

  ```sh
  . scripts/functions.sh
  ```
| Function | Description |
|---|---|
| `dev-start` | Launches the development container interactively; useful if you want to plan or apply the examples. |
| `dev-docs` | Uses terraform-docs to update documentation. |
| `dev-fmt` | Runs `terraform fmt`. |
| `dev-lint` | Uses tflint to lint the Terraform. |
| `dev-test` | Runs `terraform test`. |
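For example, a typical pre-push tidy-up might chain the formatting, docs, lint, and test helpers:

```sh
. scripts/functions.sh
dev-fmt && dev-docs && dev-lint && dev-test
```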
The only tooling required on a contributor machine is `git` and `docker`.
The following requirements are needed by this module:

- terraform
- aws
- random

The following providers are used by this module:

- aws
- random
The following resources are used by this module:
- aws_cloudwatch_dashboard.this (resource)
- aws_cloudwatch_log_stream.firehose_logging_stream (resource)
- aws_cloudwatch_metric_alarm.firehose_incoming_records_high_alarm (resource)
- aws_cloudwatch_metric_alarm.firehose_incoming_records_low_alarm (resource)
- aws_iam_role.firehose_role (resource)
- aws_kinesis_firehose_delivery_stream.affiliate_firehose (resource)
- aws_kinesis_resource_policy.this (resource)
- aws_kinesis_stream.affiliate_stream (resource)
- aws_s3_bucket.firehose_target (resource)
- aws_s3_bucket_ownership_controls.this (resource)
- aws_s3_bucket_policy.this (resource)
- aws_s3_bucket_public_access_block.this (resource)
- aws_s3_bucket_server_side_encryption_configuration.encrypt (resource)
- aws_sns_topic.firehose-alarms (resource)
- aws_sns_topic_subscription.alarm_subscriptions (resource)
- random_string.random (resource)
- aws_iam_policy_document.bucket_policy (data source)
- aws_iam_policy_document.firehose_assume_role_policy (data source)
- aws_iam_policy_document.firehose_policy (data source)
- aws_iam_policy_document.stream_policy (data source)
- aws_kms_key.key (data source)
- aws_region.current (data source)
The following input variables are required:
Description: Map of principals allowed to put records into the Kinesis stream

Example:

```hcl
kinesis_data_producers = {
  aws = [
    "arn:aws:iam::123456789012:user/JohnDoe",
    "arn:aws:iam::123456789012:role/ec2_app/kinesis_role"
  ]
  federated = ["arn:aws:iam::123456789012:saml-provider/okta"]
}
```

Type:

```hcl
object({
  aws       = optional(set(string))
  federated = optional(set(string))
})
```
Description: The name of the log group in which to create the firehose logging stream
Type: string
Description: Name of the project this module is being included in, used when naming resources
Type: string
Description: Map of principals allowed to read from the target S3 bucket

Example:

```hcl
s3_data_consumers = {
  aws = [
    "arn:aws:iam::123456789012:user/JohnDoe",
    "arn:aws:iam::123456789012:role/ec2_app/kinesis_role"
  ]
  federated = ["arn:aws:iam::123456789012:saml-provider/okta"]
}
```

Type:

```hcl
object({
  aws       = optional(set(string))
  federated = optional(set(string))
})
```
The following input variables are optional (have default values):
Description: The number of incoming records during a 5-minute period above which the alarm should trigger
Type: number
Default: 1000
Description: The number of incoming records during a 5-minute period below which the alarm should trigger
Type: number
Default: 50
Description: The set of email addresses to subscribe to alarm notifications
Each will receive an initial email asking to confirm the subscription
Type: set(string)
Default: []
Description: Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
Note that both this and `buffering_size` may be set
Type: number
Default: 400
Description: Buffer incoming data to the specified size, in MBs, before delivering it to the destination
Note that both this and `buffering_interval` may be set
Type: number
Default: 10
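Both buffering hints may be set together; Firehose delivers as soon as the first threshold is reached. A minimal sketch overriding both defaults:

```hcl
buffering_interval = 120 # seconds; flush at least every two minutes
buffering_size     = 64  # MB; or as soon as 64 MB has accumulated
```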
Description: The ARN of a Lambda to use if data transformation in Firehose is desired
To control which revision is executed, ensure you specify it in the ARN
Type: string
Default: ""
Description: Supply a configuration object if you wish to use dynamic partitioning
If `enable_dynamic_partitioning` is true, you must provide `jq_metadata_query`, `s3_dynamic_prefix`, and `s3_error_prefix`
See the AWS blog post and documentation for more information. Of particular note:

> When you use the Data Transformation feature in Firehose, the deaggregation will be applied before the Data Transformation. Data coming into Firehose will be processed in the following order: Deaggregation → Data Transformation via Lambda → Partitioning Keys.
Type:

```hcl
object({
  jq_metadata_query           = optional(string, "")
  s3_dynamic_prefix           = optional(string, "")
  s3_error_prefix             = optional(string, "")
  enable_newline_appending    = optional(bool, false)
  enable_record_deaggregation = optional(bool, false)
  record_deaggregation_config = optional(object({
    type      = optional(string, "JSON")
    delimiter = optional(string, "")
  }), {})
})
```
Default: {}
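A sketch of a configuration that partitions delivered objects by a `customer_id` field in the incoming JSON (the JQ query and prefixes are illustrative; `!{partitionKeyFromQuery:...}` and `!{firehose:error-output-type}` are Firehose's prefix namespace syntax):

```hcl
enable_dynamic_partitioning = true

dynamic_partitioning_config = {
  # Pull customer_id out of each JSON record as a partition key.
  jq_metadata_query = "{customer_id: .customer_id}"
  # Lay delivered objects out under the extracted key.
  s3_dynamic_prefix = "data/customer_id=!{partitionKeyFromQuery:customer_id}/"
  # Records that fail processing land here instead.
  s3_error_prefix = "errors/!{firehose:error-output-type}/"
  # Append a newline to each record so downstream readers can split them.
  enable_newline_appending = true
}
```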
Description: Whether to enable Firehose's dynamic partitioning
`dynamic_partitioning_config` must be supplied if true
Changing this after creation will force destruction and recreation of the Firehose stream!
Type: bool
Default: false
Description: Current deployment environment name ('Dev', 'Test', or 'Prod')
Type: string
Default: "Dev"
Description: The KMS key to use for encryption
Note that this is required if either `use_kms_for_kinesis` or `use_kms_for_s3` is true
Note also that the key policy will need to permit the Firehose role (`firehose_role_arn`) to perform certain actions:

- `kms:Decrypt` if used for Kinesis
- `kms:GenerateDataKey` if used for S3
Type: string
Default: ""
Description: The number of hours records remain accessible in the Kinesis stream
Type: number
Default: 24
Description: The number of shards that the Kinesis stream will use, ignored if stream_mode is 'ON_DEMAND'
Type: number
Default: 1
Description: The capacity mode for the Kinesis stream ('PROVISIONED', or 'ON_DEMAND')
Type: string
Default: "ON_DEMAND"
Description: Use KMS key provided in `kms_key_id` for encryption of data in Kinesis
Type: bool
Default: false
Description: Use KMS key provided in `kms_key_id` for S3 encryption (this is required for cross-account access)
Type: bool
Default: false
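A sketch enabling customer-managed KMS for both stages, as cross-account S3 access requires (`aws_kms_key.analytics` stands in for a key defined in the calling configuration):

```hcl
kms_key_id          = aws_kms_key.analytics.key_id
use_kms_for_kinesis = true
use_kms_for_s3      = true # required for cross-account access to the bucket
```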
The following outputs are exported:
Description: The ARN of the Kinesis stream accepting source input for the analytics pipeline.
This will be needed to grant the appropriate permissions to data producers' roles.
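For example, a producer-side identity policy might consume it like this sketch (the output name `kinesis_stream_arn` is an assumption; check this module's outputs for the exact name):

```hcl
data "aws_iam_policy_document" "producer" {
  statement {
    actions = [
      "kinesis:PutRecord",
      "kinesis:PutRecords",
      "kinesis:DescribeStreamSummary",
    ]
    resources = [module.analytics.kinesis_stream_arn] # assumed output name
  }
}

resource "aws_iam_role_policy" "producer" {
  role   = aws_iam_role.producer.id
  policy = data.aws_iam_policy_document.producer.json
}
```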
Description: Link to the created dashboard
Description: The ARN of the role Firehose will use.
This will be useful in resource policies such as:
- Data transformation Lambda
- KMS key policy
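For the Lambda case, a resource-based permission might look like the following sketch (`aws_lambda_function.transform` is a hypothetical function in the calling configuration; Lambda accepts an IAM role ARN as a permission principal):

```hcl
resource "aws_lambda_permission" "allow_firehose_transform" {
  statement_id  = "AllowFirehoseInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.transform.function_name
  principal     = module.analytics.firehose_role_arn
}
```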
Description: The S3 bucket to which Firehose will write output
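The module manages the bucket policy for the principals in `s3_data_consumers`, but cross-account consumers also need a matching identity policy of their own. A sketch, assuming a bucket-ARN output named `s3_bucket_arn`:

```hcl
data "aws_iam_policy_document" "consumer" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = [module.analytics.s3_bucket_arn] # assumed output name
  }
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${module.analytics.s3_bucket_arn}/*"]
  }
}

resource "aws_iam_role_policy" "consumer" {
  role   = aws_iam_role.consumer.id
  policy = data.aws_iam_policy_document.consumer.json
}
```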
- CI/CD for the module
- Module publishing
- Tool improvement
- Get linting, testing, formatting, etc. into a pre-commit hook
- Get tidiness tools working on examples as well
- Add Auto-versioning
- Clean up of the handling of the configuration for dynamic partitioning
- Support for conversion of data formats and schema based on existing Glue data
- Allow for more granular access control to the S3 bucket
- Additional modularity
- Examples' module `source` should be updated to point to the source repo (to prevent dev confusion)
- Examples should include additional comments
- Additional examples, particularly around dynamic partitioning
- Testing of the examples
- Switch over to using the Python `tftest` package, rather than the built-in framework
- Add tests that exercise more functionality, end to end
- Add tests verifying compatibility with additional tool versions