feat: support AWS EventBridge #4188
base: main
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -6,7 +6,7 @@ To be able to support a number of use-cases, the module has quite a lot of confi | |||||
|
||||||
- Org vs Repo level. You can configure the module to connect the runners in GitHub on an org level and share the runners in your org, or set the runners on repo level and the module will install the runner to the repo. There can be multiple repos but runners are not shared between repos. | ||||||
- Multi-Runner module. This module allows you to create multiple runner configurations with a single webhook and single GitHub App to simplify deployment of different types of runners. Check the detailed module [documentation](modules/public/multi-runner.md) for more information or check out the [multi-runner example](examples/multi-runner.md). | ||||||
- Workflow job event. You can configure the webhook in GitHub to send workflow job events to the webhook. Workflow job events were introduced by GitHub in September 2021 and are designed to support scalable runners. We advise using the workflow job event when possible. | ||||||
- Webhook mode. The module can be deployed in the mode `direct` or `eventbridge` (Experimental). The `direct` mode is the default and distributes events directly to SQS for the scale-up lambda. The `eventbridge` mode publishes each event to an event bus; a target rule then sends the events to a dispatch lambda, which forwards them to the SQS queue. The `eventbridge` mode is useful when you want more control over the events and potentially want to filter them. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode to be the future direction for building a data lake, building metrics, acting on `workflow_job` started events, etc. | ||||||
|
||||||
- Linux vs Windows. You can configure the OS types linux and win. Linux will be used by default. | ||||||
- Re-use vs Ephemeral. By default runners are re-used, until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners only work in combination with the workflow job event. For ephemeral runners the lambda requests a JIT (just in time) configuration via the GitHub API to register the runner. [JIT configuration](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-just-in-time-runners) is limited to ephemeral runners (and currently not supported by GHES). For non-ephemeral runners, a registration token is always requested. In both cases the configuration is made available to the instance via the same SSM parameter. To disable JIT configuration for ephemeral runners set `enable_jit_config` to `false`. We also suggest using a pre-built AMI to improve the start time of jobs for ephemeral runners. | ||||||
- Job retry (**Beta**). By default the scale-up lambda discards the message once it is handled. In the ephemeral use-case this means an instance is created and the resulting runner asks GitHub for a job, with no guarantee that it will run the job it was scaled for. As a result, a small system hiccup can leave a job waiting for a runner. Enabling a pool (org runners) is one option to avoid this problem. Another option is to enable the job retry function, which retries the job after a delay for a configured number of times. | ||||||
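
The ephemeral-runner option described in the list above can be sketched as module configuration. Only `enable_jit_config` is named in the text; `enable_ephemeral_runners` is assumed here to be the input that turns on ephemeral mode, and the remaining inputs are left out. Treat this as an illustration rather than a complete setup.

```hcl
module "runners" {
  source = "philips-labs/github-runners/aws"

  # Other required inputs (GitHub App, VPC, lambda zips, ...) are omitted.

  # Assumed input name: each runner is used for exactly one job.
  enable_ephemeral_runners = true

  # Named in the text above: set to false to register ephemeral runners
  # with a registration token instead of a JIT configuration.
  enable_jit_config = false
}
```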
|
@@ -259,8 +259,83 @@ Below an example of the the log messages created. | |||||
} | ||||||
``` | ||||||
|
||||||
### EventBridge | ||||||
|
||||||
The module can be deployed in the mode `eventbridge` (Experimental). The `eventbridge` mode publishes each event to an event bus; a target rule then sends the events to a dispatch lambda, which forwards them to the SQS queue. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode to be the future direction for building a data lake, building metrics, acting on `workflow_job` started events, etc. | ||||||
|
||||||
Example of using the EventBridge: | ||||||
|
||||||
```hcl | ||||||
module "runners" { | ||||||
  source = "philips-labs/github-runners/aws" | ||||||
  ... | ||||||
  eventbridge = { | ||||||
    enable = true | ||||||
  } | ||||||
  ... | ||||||
} | ||||||
locals { | ||||||
  event_bus_name = module.runners.webhook.eventbridge.event_bus.name | ||||||
} | ||||||
# Rule that matches every event whose source starts with "github" | ||||||
resource "aws_cloudwatch_event_rule" "example" { | ||||||
  name           = "${local.prefix}-github-events-all" | ||||||
  description    = "Capture all GitHub events" | ||||||
  event_bus_name = local.event_bus_name | ||||||
  event_pattern  = <<EOF | ||||||
{ | ||||||
  "source": [{ | ||||||
    "prefix": "github" | ||||||
  }] | ||||||
} | ||||||
EOF | ||||||
} | ||||||
# Target that receives the matched events | ||||||
resource "aws_cloudwatch_event_target" "main" { | ||||||
  rule           = aws_cloudwatch_event_rule.example.name | ||||||
  arn            = <arn of target> | ||||||
  event_bus_name = local.event_bus_name | ||||||
  role_arn       = aws_iam_role.event_rule_firehose_role.arn | ||||||
} | ||||||
# Trust policy that lets EventBridge assume the role | ||||||
data "aws_iam_policy_document" "event_rule_firehose_role" { | ||||||
  statement { | ||||||
    actions = ["sts:AssumeRole"] | ||||||
    principals { | ||||||
      type        = "Service" | ||||||
      identifiers = ["events.amazonaws.com"] | ||||||
    } | ||||||
  } | ||||||
} | ||||||
# Named to match the references in the target and the role policy | ||||||
resource "aws_iam_role" "event_rule_firehose_role" { | ||||||
  name               = "${local.prefix}-eventbridge-github-rule" | ||||||
  assume_role_policy = data.aws_iam_policy_document.event_rule_firehose_role.json | ||||||
} | ||||||
data "aws_iam_policy_document" "firehose_stream" { | ||||||
  statement { | ||||||
    INSERT_YOUR_POLICY_HERE_TO_ACCESS_THE_TARGET | ||||||
  } | ||||||
} | ||||||
resource "aws_iam_role_policy" "event_rule_firehose_role" { | ||||||
  name   = "target-event-rule-firehose" | ||||||
  role   = aws_iam_role.event_rule_firehose_role.name | ||||||
  policy = data.aws_iam_policy_document.firehose_stream.json | ||||||
} | ||||||
``` | ||||||
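
As a narrower variant of the catch-all rule in the example, a pattern can match only `workflow_job` events for jobs that have started. This is a sketch under assumptions: it assumes the module publishes the GitHub event name as the EventBridge `detail-type` and the webhook payload as `detail`, so that `detail.action` is `in_progress` when a job starts. `local.prefix` and `local.event_bus_name` are taken from the example; the resource name is hypothetical.

```hcl
resource "aws_cloudwatch_event_rule" "workflow_job_started" {
  name           = "${local.prefix}-github-workflow-job-started"
  description    = "Match workflow_job events for jobs that have started"
  event_bus_name = local.event_bus_name

  # jsonencode keeps the pattern valid JSON without a heredoc.
  # Assumption: detail-type carries the GitHub event name and
  # detail carries the webhook payload.
  event_pattern = jsonencode({
    "detail-type" = ["workflow_job"]
    detail = {
      action = ["in_progress"]
    }
  })
}
```

A target for this rule is attached in the same way as for the catch-all rule.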
|
||||||
### Queue to publish workflow job events | ||||||
|
||||||
!!! warning "Deprecated" | ||||||
|
||||||
This feature will be removed since we are introducing the EventBridge. The same functionality can be implemented by adding a rule to the EventBridge that forwards `workflow_job` events to the SQS queue. | ||||||
|
||||||
This queue is an experimental feature that allows you to receive a copy of the `workflow_job` events sent by the GitHub App. This can be used to calculate a matrix or monitor the system. | ||||||
|
||||||
To enable the feature set `enable_workflow_job_events_queue = true`. Be aware though, this feature is experimental! | ||||||
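
The replacement suggested in the deprecation warning, an EventBridge rule that forwards `workflow_job` events to an SQS queue, could look roughly like the sketch below. It assumes the module publishes the GitHub event name as the EventBridge `detail-type`, and that `local.prefix` and `local.event_bus_name` are defined as in the EventBridge example. The queue and all resource names are hypothetical.

```hcl
# Hypothetical queue that receives a copy of the workflow_job events.
resource "aws_sqs_queue" "workflow_job_events" {
  name = "${local.prefix}-workflow-job-events"
}

resource "aws_cloudwatch_event_rule" "workflow_job" {
  name           = "${local.prefix}-github-workflow-job"
  description    = "Forward workflow_job events to SQS"
  event_bus_name = local.event_bus_name

  # Assumption: detail-type carries the GitHub event name.
  event_pattern = jsonencode({
    "detail-type" = ["workflow_job"]
  })
}

resource "aws_cloudwatch_event_target" "workflow_job_to_sqs" {
  rule           = aws_cloudwatch_event_rule.workflow_job.name
  event_bus_name = local.event_bus_name
  arn            = aws_sqs_queue.workflow_job_events.arn
}

# SQS targets are authorized through the queue policy, not an IAM role.
data "aws_iam_policy_document" "allow_eventbridge" {
  statement {
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.workflow_job_events.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }

    # Only the rule defined above may send messages to the queue.
    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_cloudwatch_event_rule.workflow_job.arn]
    }
  }
}

resource "aws_sqs_queue_policy" "workflow_job_events" {
  queue_url = aws_sqs_queue.workflow_job_events.id
  policy    = data.aws_iam_policy_document.allow_eventbridge.json
}
```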
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -31,7 +31,7 @@ The diagram below shows the architecture of the module, groups are indicating th | |||||
|
||||||
### Webhook | ||||||
|
||||||
The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. This module reacts to GitHub's [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) for the triggered workflow and creates a new runner if necessary. | ||||||
The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. The module can be deployed in two modes. In the first mode, `direct`, the module accepts the [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) and dispatches it to an SQS queue on which the scale-up function acts. The second mode, `eventbridge`, funnels events via AWS EventBridge. EventBridge makes it possible to act on events other than the `workflow_job` event with status `queued`. In addition, EventBridge supports replay functionality. Future extensions that act on events or create a data lake will rely on EventBridge. | ||||||
|
||||||
|
||||||
For receiving the `workflow_job` event by the webhook (lambda), a webhook needs to be created in GitHub. The same app as for API calls can be used to create the webhook. Or a dedicated webhook can be defined. | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -77,6 +77,14 @@ module "runners" { | |
id = var.github_app.id | ||
webhook_secret = random_id.random.hex | ||
} | ||
|
||
# Deploy webhook using the EventBridge | ||
eventbridge = { | ||
enable = true | ||
# adjust the accepted events to only allow specific events, like workflow_job | ||
accept_events = ["workflow_job"] | ||
} | ||
|
||
# enable this section for tracing | ||
# tracing_config = { | ||
# mode = "Active" | ||
The examples contain a lot of commented-out TF. This leaves room for mistakes; we tend to forget options that are deleted or have been renamed, etc.

You are right, but feel free to suggest another way. It would be great to do it differently. But the examples are also a great way to check a PR quickly.
||
|