feat: support AWS EventBridge #4188
base: main
Conversation
Loading the configuration was quite a mess. It is replaced by a singleton that loads the configuration per deployed entrypoint. For each lambda handler function a different ConfigABC class is created.
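As a hedged sketch of the singleton-per-entrypoint idea described here (the names `ConfigLoader` and `ConfigWebhook`'s shape are illustrative, not the PR's actual code):

```typescript
// Illustrative sketch only: one cached configuration object per deployed entrypoint.
class ConfigLoader {
  private static instances = new Map<string, object>();

  // Create the config once per entrypoint name; later calls return the cached instance.
  static load<T extends object>(entrypoint: string, create: () => T): T {
    let instance = ConfigLoader.instances.get(entrypoint) as T | undefined;
    if (!instance) {
      instance = create();
      ConfigLoader.instances.set(entrypoint, instance);
    }
    return instance;
  }
}

// Each lambda handler gets its own config class (hypothetical example).
class ConfigWebhook {
  eventBusName = process.env.EVENT_BUS_NAME ?? '';
}

const config = ConfigLoader.load('webhook', () => new ConfigWebhook());
```

This keeps each handler's cold-start configuration load to a single read while still allowing different config classes per entrypoint.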
}
}

function validateEventBusName(config: ConfigWebhookEventBridge): void {
Validation is limited in general, but the input is controlled via the internal Terraform modules.
}

private loadProperty(propertyName: keyof this, value: string) {
  try {
Could not find a better way to parse an object or list into a typed property. In case of an error the input is just parsed as a string.
Not perfect, but the Terraform modules are typed as well.
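A hedged sketch of the fallback parsing described above (the helper name `parseConfigValue` is illustrative, not the PR's actual code):

```typescript
// Illustrative sketch: try to JSON-parse values that look like objects or lists;
// on a parse error, fall back to the raw string.
function parseConfigValue(value: string): unknown {
  try {
    // Handles JSON-encoded lists/objects such as '["workflow_job"]' or '{"a": 1}'.
    return JSON.parse(value);
  } catch {
    // Plain strings like 'my-event-bus' are kept as-is.
    return value;
  }
}
```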
export async function githubWebhook(event: APIGatewayEvent, context: Context): Promise<Response> {
export async function directWebhook(event: APIGatewayEvent, context: Context): Promise<Response> {
Only the configuration object is changed for the old flow (webhook only), to accommodate other configurations.
const logger = createChildLogger('handler');

export async function dispatch(event: WorkflowJobEvent, eventType: string, config: Config): Promise<Response> {
export async function dispatch(
The only real change here is the configuration object.
await verifySignature(headers, body, config.webhookSecret);

const eventType = headers['x-github-event'] as string;
checkEventIsSupported(eventType, config.allowedEvents);
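A hedged sketch of what an allow-list check like the `checkEventIsSupported` call above could look like (the internals, including the empty-list-accepts-all behavior, are assumptions, not confirmed by this PR):

```typescript
// Illustrative sketch; the signature mirrors the call above, the body is assumed.
function checkEventIsSupported(eventType: string, allowedEvents: string[]): void {
  // Assumption: an empty allow-list means every event type is accepted.
  if (allowedEvents.length > 0 && !allowedEvents.includes(eventType)) {
    throw new Error(`Unsupported event type: ${eventType}`);
  }
}
```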
Adjusted the Terraform module to expose the configuration. This allows limiting the events accepted on the bus.
With this part of the Terraform code, all GitHub events can be put in a bucket for further analysis. The code comes from this source: https://040code.github.io/2023/01/06/github-event-aws-eventbridge
locals {
prefix = var.environment
event_bus_name = module.runners.webhook.eventbridge.event_bus.name
}
resource "aws_cloudwatch_event_rule" "all" {
name = "${local.prefix}-github-events-all"
description = "Capture all GitHub events"
event_bus_name = local.event_bus_name
event_pattern = <<EOF
{
"source": [{
"prefix": "github"
}]
}
EOF
}
resource "aws_cloudwatch_event_target" "main" {
rule = aws_cloudwatch_event_rule.all.name
arn = aws_kinesis_firehose_delivery_stream.extended_s3_stream.arn
event_bus_name = local.event_bus_name
role_arn = aws_iam_role.event_rule_firehose_role.arn
}
data "aws_iam_policy_document" "event_rule_firehose_role" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["events.amazonaws.com"]
}
}
}
resource "aws_iam_role" "event_rule_firehose_role" {
name = "${local.prefix}-eventbridge-github-rule"
assume_role_policy = data.aws_iam_policy_document.event_rule_firehose_role.json
}
data "aws_iam_policy_document" "firehose_stream" {
statement {
actions = [
"firehose:DeleteDeliveryStream",
"firehose:PutRecord",
"firehose:PutRecordBatch",
"firehose:UpdateDestination"
]
resources = [aws_kinesis_firehose_delivery_stream.extended_s3_stream.arn]
}
}
resource "aws_iam_role_policy" "event_rule_firehose_role" {
name = "target-event-rule-firehose"
role = aws_iam_role.event_rule_firehose_role.name
policy = data.aws_iam_policy_document.firehose_stream.json
}
resource "random_uuid" "firehose_stream" {}
resource "aws_s3_bucket" "firehose_stream" {
bucket = "${local.prefix}-${random_uuid.firehose_stream.result}"
force_destroy = true
}
# resource "aws_s3_bucket_acl" "firehose_stream" {
# bucket = aws_s3_bucket.firehose_stream.id
# acl = "private"
# }
data "aws_iam_policy_document" "firehose_assume_role_policy" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["firehose.amazonaws.com"]
}
}
}
resource "aws_iam_role" "firehose_role" {
name = "${local.prefix}-firehose-role"
assume_role_policy = data.aws_iam_policy_document.firehose_assume_role_policy.json
}
data "aws_iam_policy_document" "firehose_s3" {
statement {
actions = [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
]
resources = [aws_s3_bucket.firehose_stream.arn, "${aws_s3_bucket.firehose_stream.arn}/*"]
}
}
resource "aws_iam_role_policy" "firehose_s3" {
name = "${local.prefix}-s3"
role = aws_iam_role.firehose_role.name
policy = data.aws_iam_policy_document.firehose_s3.json
}
data "aws_iam_policy_document" "firehose_log" {
statement {
actions = [
"logs:PutLogEvents",
"logs:CreateLogStream",
"logs:CreateLogGroup"
]
resources = [aws_cloudwatch_log_group.firehose_delivery_stream.arn]
}
}
resource "aws_iam_role_policy" "firehose_log" {
name = "${local.prefix}-log"
role = aws_iam_role.firehose_role.name
policy = data.aws_iam_policy_document.firehose_log.json
}
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
name = "${local.prefix}-stream"
destination = "extended_s3"
extended_s3_configuration {
role_arn = aws_iam_role.firehose_role.arn
bucket_arn = aws_s3_bucket.firehose_stream.arn
}
}
resource "aws_cloudwatch_log_group" "firehose_delivery_stream" {
name = "/aws/kinesisfirehose/${aws_kinesis_firehose_delivery_stream.extended_s3_stream.name}"
retention_in_days = 14
}
webhook_mode = "eventbridge"
# adjust the allowed events to only accept specific events, like workflow_job
# eventbridge_allowed_events = ["workflow_job"]
# enable this section for tracing
# tracing_config = {
#   mode = "Active"
In the examples there is a lot of commented-out Terraform. This leaves room for mistakes; we tend to forget options that are deleted or have been renamed.
You are right, but feel free to suggest another way. Would be great to do it differently. But the examples are also a great way to check a PR quickly.
I do not see big problems, only some styling/spelling issues.
@mpas can you check 'my' last commit? I updated the top-level object to configure the feature, and marked it beta in the docs.
Co-authored-by: Marco Pas <marco.pasopas@gmail.com>
@@ -6,7 +6,7 @@ To be able to support a number of use-cases, the module has quite a lot of confi

- Org vs Repo level. You can configure the module to connect the runners in GitHub on an org level and share the runners in your org, or set the runners on repo level and the module will install the runner to the repo. There can be multiple repos but runners are not shared between repos.
- Multi-Runner module. This module allows you to create multiple runner configurations with a single webhook and single GitHub App to simplify deployment of different types of runners. Check the detailed module [documentation](modules/public/multi-runner.md) for more information or check out the [multi-runner example](examples/multi-runner.md).
- Workflow job event. You can configure the webhook in GitHub to send workflow job events to the webhook. Workflow job events were introduced by GitHub in September 2021 and are designed to support scalable runners. We advise using the workflow job event when possible.
- Webhook mode, the module can be deployed in the mode `direct` or `eventbridge` (experimental). The `direct` mode is the default and will distribute directly to SQS for the scale-up lambda. The `eventbridge` mode will publish the event to an event bus; with a target rule the events are sent to a dispatch lambda. The dispatch lambda will send the event to the SQS queue. The `eventbridge` mode is useful when you want to have more control over the events and potentially filter them. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode will be the future direction to build a data lake, build metrics, act on `workflow_job` started events, etc.
Suggested change:
- - Webhook mode, the module can be deployed in the mode `direct` and `eventbridge` (Experimental). The `direct` mode is the default and will directly distribute to SQS for the scale-up lambda. The `eventbridge` mode will publish the event to an event bus with a target rule the events are sent to a dispatch lambda. The dispatch lambda will send the event to the SQS queue. The `eventbridge` mode is useful when you want to have more control over the events and potentially filter them. The `eventbridge` mode is disabled by default. We expect thhe `eventbridge` mode will be the future direction to build a data lake, build metrics, acto on `workflow_job` job started events, etc.
+ - Webhook mode, the module can be deployed in the mode `direct` and `eventbridge` (Experimental). The `direct` mode is the default and will directly distribute to SQS for the scale-up lambda. The `eventbridge` mode will publish the event to an event bus with a target rule the events are sent to a dispatch lambda. The dispatch lambda will send the event to the SQS queue. The `eventbridge` mode is useful when you want to have more control over the events and potentially filter them. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode will be the future direction to build a data lake, build metrics, act on `workflow_job` job started events, etc.
@@ -31,7 +31,7 @@ The diagram below shows the architecture of the module, groups are indicating th

### Webhook

The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. This module reacts to GitHub's [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) for the triggered workflow and creates a new runner if necessary.
The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. The module can be deployed in two modes. In the first mode, `direct`, after accepting the [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) the module dispatches the event to an SQS queue on which the scale-up function will act. The second mode, `eventbridge`, funnels events via AWS EventBridge. EventBridge enables acting on other events than only the `workflow_job` event with status `queued`. Besides that, EventBridge supports replay functionality. For future extensions to act on events or create a data lake we will rely on EventBridge.
Suggested change:
- The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. The module can be deployed in two modes. One mode called `direct`, after accepting the [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) event the module will dispatch the event to a SQS queue on which the scale-up function will act. The second mode, `eventbridge` will funnel events via the AWS EventBridge. the EventBridge enables act on other events then only the `workflow_job` event with status `queued`. besides that the EventBridge suppors replay functionality. For future exenstions to act on events or create a data lake we will relay on the EventBridge.
+ The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. The module can be deployed in two modes. In the first mode, `direct`, after accepting the [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) the module dispatches the event to an SQS queue on which the scale-up function will act. The second mode, `eventbridge`, funnels events via AWS EventBridge. EventBridge enables acting on other events than only the `workflow_job` event with status `queued`. Besides that, EventBridge supports replay functionality. For future extensions to act on events or create a data lake we will rely on EventBridge.
@@ -132,6 +132,7 @@ module "multi-runner" {
| <a name="input_enable_managed_runner_security_group"></a> [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enabling the default managed security group creation. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no |
| <a name="input_enable_metrics_control_plane"></a> [enable\_metrics\_control\_plane](#input\_enable\_metrics\_control\_plane) | (Experimental) Enable or disable the metrics for the module. The feature can change or be renamed without a major release. | `bool` | `false` | no |
| <a name="input_enable_workflow_job_events_queue"></a> [enable\_workflow\_job\_events\_queue](#input\_enable\_workflow\_job\_events\_queue) | Enabling this experimental feature will create a secondary sqs queue to which a copy of the workflow\_job event will be delivered. | `bool` | `false` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature, events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), [])<br/> })</pre> | `{}` | no |
Suggested change:
- | <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enable this feature events will be putted on the EventBridge bhy the webhook instead of directly dispatchting to queues for sacling. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), [])<br/> })</pre> | `{}` | no |
+ | <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature, events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), [])<br/> })</pre> | `{}` | no |
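A hedged usage sketch of this `eventbridge` input object. The module source and the other inputs are placeholders, not taken from this PR:

```hcl
module "runners" {
  # Placeholder: use the registry source or local path of your deployment.
  source = "philips-labs/github-runner/aws"
  # ... other required inputs elided ...

  eventbridge = {
    enable        = true
    # Limit the events accepted on the bus, e.g. to workflow_job only.
    accept_events = ["workflow_job"]
  }
}
```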
{
  "Effect": "Allow",
  "Action": ["events:PutEvents"],
  "Resource": ${resource_arns}
Suggested change:
- "Resource": ${resource_arns}
+ "Resource": "${resource_arns}"
"${github_app_webhook_secret_arn}",
"${parameter_runner_matcher_config_arn}"
]
"Resource": ${resource_arns}
Suggested change:
- "Resource": ${resource_arns}
+ "Resource": "${resource_arns}"
sid = "AllowXRay"
}
}
# data "aws_iam_policy_document" "lambda_xray" {
Can this be deleted? It is all commented out.
@@ -157,6 +157,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
| <a name="input_enable_user_data_debug_logging_runner"></a> [enable\_user\_data\_debug\_logging\_runner](#input\_enable\_user\_data\_debug\_logging\_runner) | Option to enable debug logging for user-data, this logs all secrets as well. | `bool` | `false` | no |
| <a name="input_enable_userdata"></a> [enable\_userdata](#input\_enable\_userdata) | Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI. | `bool` | `true` | no |
| <a name="input_enable_workflow_job_events_queue"></a> [enable\_workflow\_job\_events\_queue](#input\_enable\_workflow\_job\_events\_queue) | Enabling this experimental feature will create a secondary sqs queue to which a copy of the workflow\_job event will be delivered. | `bool` | `false` | no |
Suggested change:
- | <a name="input_enable_workflow_job_events_queue"></a> [enable\_workflow\_job\_events\_queue](#input\_enable\_workflow\_job\_events\_queue) | Enabling this experimental feature will create a secondory sqs queue to which a copy of the workflow\_job event will be delivered. | `bool` | `false` | no |
+ | <a name="input_enable_workflow_job_events_queue"></a> [enable\_workflow\_job\_events\_queue](#input\_enable\_workflow\_job\_events\_queue) | Enabling this experimental feature will create a secondary sqs queue to which a copy of the workflow\_job event will be delivered. | `bool` | `false` | no |
Description
This PR introduces AWS EventBridge support. EventBridge can be enabled with the option `webhook_mode`, which can be set to either `direct` or `eventbridge`. In `direct` mode the old way of handling is still applied. When setting the mode to `eventbridge`, events will be published on the AWS EventBridge, which is not limited to only the event `workflow_job` with status `queued`. Via a target rule, the events relevant for scaling are sent to the dispatcher lambda to distribute to an SQS queue for scaling.

Todo

Migration directions
The change is backwards compatible but will recreate resources managed by the internal module webhook. The only resource containing data is the CloudWatch log group. To retain the log group you can run a terraform state move, or add a `moved` block to your deployment.

Migrating to this version
With module defaults or `webhook_mode = direct`
Or with `webhook_mode = eventbridge`

When switching between direct and eventbridge
When enabling mode `eventbridge`
Or vice versa for moving from `eventbridge` to `webhook`