Releases: Netflix/metaflow
2.9.10
Features
Introduce PagerDuty support for workflows running on Argo Workflows
With this release, Metaflow users can get events on PagerDuty when their workflows succeed or fail on Argo Workflows.
Setting up these notifications is similar to the existing Slack notifications support:
- Follow these instructions on PagerDuty to set up an Events API V2 integration for your PagerDuty service
- You should be able to view the required integration key from the Events API V2 dropdown
- To enable notifications on PagerDuty when your Metaflow flow running on Argo Workflows succeeds or fails, deploy it using the --notify-on-error or --notify-on-success flags:
python flow.py argo-workflows create --notify-on-error --notify-on-success --notify-pager-duty-integration-key <pager-duty-integration-key>
- You can also set the following environment variable instead of specifying --notify-pager-duty-integration-key on the CLI every time:
METAFLOW_ARGO_WORKFLOWS_CREATE_NOTIFY_PAGER_DUTY_INTEGRATION_KEY=<pager-duty-integration-key>
- Next time the flow fails or succeeds, you should receive a new event on PagerDuty under Incidents (Flow failed) or Changes (Flow succeeded)
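Before deploying, you can verify that the integration key works by sending a test event straight to PagerDuty's Events API V2. A minimal sketch in Python, assuming the requests library is available; the key and summary text are placeholders:
import requests

resp = requests.post(
    "https://events.pagerduty.com/v2/enqueue",
    json={
        "routing_key": "<pager-duty-integration-key>",  # placeholder key
        "event_action": "trigger",
        "payload": {
            "summary": "Test event from Metaflow setup",
            "source": "metaflow",
            "severity": "info",
        },
    },
)
resp.raise_for_status()  # a 202 response means the key was accepted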
What's Changed
- fix: validate required configuration for Batch by @saikonen in #1483
- feature: add PagerDuty support for Argo Workflows by @saikonen in #1478
- Bump version to 2.9.10 by @saikonen in #1484
Full Changelog: 2.9.9...2.9.10
2.9.9
Improvements
Fixes a bug with the S3 operations affecting @conda with some S3 providers
This release fixes a bug in the @conda bootstrapping process. An issue with the ServerSideEncryption support affected some S3 operations when using S3 providers that do not implement the encryption headers (for example, MinIO).
The affected operations were all those that handle multiple files at once:
get_many / get_all / get_recursive / put_many / info_many
which are used as part of bootstrapping a @conda environment when executing remotely.
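For reference, these operations belong to the metaflow.S3 client. A minimal sketch of the multi-file calls, with a placeholder bucket and keys:
from metaflow import S3

with S3(s3root="s3://my-bucket/my-prefix/") as s3:
    # upload several objects in one call
    s3.put_many([("a.txt", "contents of a"), ("b.txt", "contents of b")])
    # download several keys in parallel
    objs = s3.get_many(["a.txt", "b.txt"])
    print([obj.path for obj in objs])
    # fetch metadata only, without downloading the objects
    infos = s3.info_many(["a.txt", "b.txt"])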
What's Changed
- fix: s3 op bug with ServerSideEncryption by @saikonen in #1479
- Bump version to 2.9.9 by @saikonen in #1480
Full Changelog: 2.9.8...2.9.9
2.9.8
Improvements
Fixes bug with Argo events parameters
This release fixes an issue with mapping values with spaces from the Argo events payload to flow parameters.
What's Changed
- sanitize / in secret names by @oavdeev in #1470
- chore: upgrade packages in cards plugin by @saikonen in #1473
- fix: Argo events parameters with spaces by @saikonen in #1475
- allow to customize env var name in @secrets by @oavdeev in #1474
- Bump version to 2.9.8 by @saikonen in #1476
Full Changelog: 2.9.7...2.9.8
2.9.7
Features
New commands for managing Argo Workflows through the CLI
This release includes new commands for managing workflows on Argo Workflows.
When needed, commands can be authorized by supplying a production token with --authorize.
argo-workflows delete
A deployed workflow can be deleted through the CLI with
python flow.py argo-workflows delete
argo-workflows terminate
A run can be terminated mid-execution through the CLI with
python flow.py argo-workflows terminate RUN_ID
argo-workflows suspend/unsuspend
A run can be suspended temporarily with
python flow.py argo-workflows suspend RUN_ID
Note that the suspended flow will eventually show up as failed in the Metaflow UI, because suspending also pauses the heartbeat process. Unsuspending resumes the flow, and its status will show as running again. This can be done with
python flow.py argo-workflows unsuspend RUN_ID
Improvements
Faster Job completion checks for Kubernetes
Previously the status for tasks running on Kubernetes was determined through the pod status, which can take a while to update after the last container finishes. This release changes the status checks to use container statuses directly instead.
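Conceptually, the check now looks at per-container state rather than the pod phase. A rough sketch with the official Kubernetes Python client; this is an illustration, not Metaflow's actual implementation, and the pod name and namespace are placeholders:
from kubernetes import client, config

config.load_kube_config()
pod = client.CoreV1Api().read_namespaced_pod("my-pod", "my-namespace")

for status in pod.status.container_statuses or []:
    terminated = status.state.terminated
    if terminated is not None:
        # the container has finished even if the pod phase has not updated yet
        print(status.name, "exited with code", terminated.exit_code)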
What's Changed
- Job completion check based on container status. by @shrinandj in #1369
- feature: add argo workflows suspend command by @saikonen in #1420
- feature: add delete and terminate for argo workflows by @saikonen in #1307
- Bump version to 2.9.7 by @saikonen in #1467
Full Changelog: 2.9.6...2.9.7
2.9.6
Features
AWS Step Function state machines can now be deleted through the CLI
This release introduces the command step-functions delete for deleting state machines through the CLI.
For a regular flow
python flow.py step-functions delete
For another user's project branch
Comment out the @project decorator from the flow file, as we do not allow using --name with projects.
python project_flow.py step-functions --name project_a.user.saikonen.ProjectFlow delete
For a production or custom branch flow
python project_flow.py --production step-functions delete
# or
python project_flow.py --branch custom step-functions delete
Add --authorize PRODUCTION_TOKEN to the command if you do not have the correct production token locally.
Improvements
Fixes a bug in the S3 server-side encryption feature with some S3-compliant providers
This release fixes an issue with the S3 server-side encryption support, where some S3-compliant providers do not respond with the expected encryption method in the payload. This bug specifically affected regular operation when using MinIO.
Fixes support for --with environment in Airflow
This release fixes a bug in the Airflow support for environment variables, where values set in the @environment decorator could get overwritten.
What's Changed
- [bugfix] support --with environment in Airflow by @valayDave in #1459
- feat: sfn delete workflow (with prod token validation and messaging) by @stevenhoelscher, @saikonen in #1379
- [bugfix]: Optional check for encryption in s3op response by @valayDave in #1460
- Bump version to 2.9.6 by @saikonen in #1461
Full Changelog: 2.9.5...2.9.6
2.9.5
Features
Ability to choose server side encryption method for S3 uploads
You can now choose which server-side encryption method is used for S3 uploads by setting the environment variable METAFLOW_S3_SERVER_SIDE_ENCRYPTION to an appropriate value, for example aws:kms or AES256.
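For instance, exporting the variable makes uploads through the metaflow.S3 client request the chosen method. A sketch, assuming the variable is set before Metaflow is imported so that the configuration picks it up; the bucket is a placeholder:
import os
os.environ["METAFLOW_S3_SERVER_SIDE_ENCRYPTION"] = "aws:kms"

from metaflow import S3  # import after setting the variable

with S3(s3root="s3://my-bucket/my-prefix/") as s3:
    s3.put("hello.txt", "hello")  # this upload requests aws:kms encryption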
Improvements
Fixes double quotes with Parameters on Argo Workflows
This release fixes an issue where using parameters on Argo Workflows caused the values to be unnecessarily quoted.
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
What's Changed
- feat: ability to use ServerSideEncryption for S3 uploads by @zendesk-klross in #1436
- fix quoting issue with argo by @savingoyal in #1456
- Bump version to 2.9.5 by @saikonen in #1457
New Contributors
- @zendesk-klross made their first contribution in #1436
Full Changelog: 2.9.4...2.9.5
2.9.4
Improvements
Fix using email addresses as usernames for Argo Workflows
Using an email address as the username when deploying to Argo Workflows with a @project decorator is now possible. This release fixes an issue where some generated resources contained characters that are not permitted in the names of Argo Workflows resources.
The @secrets decorator now supports assuming roles
This release adds the capability to assume specific roles when accessing secrets with the @secrets decorator. The role for accessing a secret can be defined in the following ways:
As a global default
Set the METAFLOW_DEFAULT_SECRET_ROLE environment variable; this role will then be assumed when accessing any secret specified in the decorator.
As a global option in the decorator
This will assume the role secret-iam-role for accessing all of the secrets in the sources list:
@secrets(
sources=["first-secret-source", "second-secret-source"],
role="secret-iam-role"
)
Or on a per-secret basis
Assuming a different role for each secret can be done as well:
@secrets(
sources=[
{"type": "aws-secrets-manager", "id": "first-secret-source", "role": "first-secret-role"},
{"type": "aws-secrets-manager", "id": "second-secret-source", "role": "second-secret-role"}
]
)
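Once fetched, the secret values are exposed to the task as environment variables. A minimal sketch of a flow using a per-secret role as above; the secret name, role, and the DB_PASSWORD key are placeholders:
import os
from metaflow import FlowSpec, step, secrets

class SecretFlow(FlowSpec):

    @secrets(sources=[
        {"type": "aws-secrets-manager", "id": "first-secret-source", "role": "first-secret-role"},
    ])
    @step
    def start(self):
        # keys stored in the secret appear as environment variables
        print("DB_PASSWORD is set:", "DB_PASSWORD" in os.environ)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    SecretFlow()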
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
What's Changed
- [OBP] support assuming roles to read secrets by @jackie-ob in #1418
- fix two docstrings that make API docs unhappy by @tuulos in #1441
- Properly validate a config value against the type of its default by @romain-intel in #1426
- Add additional options to @trigger and @trigger_on_finish by @romain-intel in #1398
- Wrap errors importing over the escape hatch as ImportError by @romain-intel in #1446
- Setting default time for files in code package to Dec 3, 2019 by @pjoshi30 in #1445
- Fix issue with handling of exceptions in the escape hatch by @romain-intel in #1444
- fix: support email in argo workflow names by @saikonen in #1448
- fix: email naming support for argo events by @saikonen in #1450
- bump version to 2.9.4 by @saikonen in #1451
Full Changelog: 2.9.3...2.9.4
2.9.3
Improvements
Ignore duplicate Metaflow Extensions packages
Duplicate Metaflow Extensions packages were not properly ignored in all cases. This release fixes this and will allow the loading of extensions even if they are present in duplicate form in your sys.path.
Fix package leaks for the environment escape
In some cases, packages from the outside environment (non Conda) could leak into the Conda environment when using the environment escape functionality. This release addresses this issue and ensures that no spurious packages are imported in the Conda environment.
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
What's Changed
- Update README.md by @savingoyal in #1431
- Add labels and fix argo by @dhpollack in #1360
- Update KubernetesDecorator class docstring to include persistent_volume_claims by @tfurmston in #1435
- Properly ignore a duplicate metaflow extension package in sys.path by @romain-intel in #1437
- Fix an issue with the escape hatch that could cause outside packages to "leak" by @romain-intel in #1439
- Bump version to 2.9.3 by @romain-intel in #1440
Full Changelog: 2.9.2...2.9.3
2.9.2
Features
Introduce support for image pull policy for @kubernetes
With this release, Metaflow users can specify image pull policy for their workloads through the @kubernetes decorator for Metaflow tasks.
@kubernetes(image='foo:tag', image_pull_policy='Always') # Allowed values are Always, IfNotPresent, Never
@step
def train(self):
...
...
If an image pull policy is not specified, and the tag for the container image is :latest or the tag for the container image is not specified, image pull policy is automatically set to Always.
If an image pull policy is not specified, and the tag for the container image is specified as a value that is not :latest, image pull policy is automatically set to IfNotPresent.
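The defaulting rule can be paraphrased in a few lines; this is an illustration of the behavior described above, not Metaflow's actual code:
def default_image_pull_policy(image: str) -> str:
    # simplified: ignores registry ports and image digests
    tag = image.rsplit(":", 1)[1] if ":" in image else None
    if tag is None or tag == "latest":
        return "Always"
    return "IfNotPresent"
For example, default_image_pull_policy("foo") returns "Always", while default_image_pull_policy("foo:1.2") returns "IfNotPresent".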
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
What's Changed
- introduce support for intra-cluster webhook url by @savingoyal in #1417
- add and improve docstrings for event-triggering by @tuulos in #1419
- Update readme by @emattia in #1397
- Update README.md by @savingoyal in #1422
- fix includefile for argo-workflows by @savingoyal in #1428
- feature: support configuring image pull policy for Kubernetes and Argo Workflows by @saikonen in #1427
- fix error message by @savingoyal in #1429
- Update to 2.9.2 by @savingoyal in #1430
Full Changelog: 2.9.1...2.9.2
2.9.1
Features
Introduce Slack notifications support for workflows running on Argo Workflows
With this release, Metaflow users can get notified on Slack when their workflows succeed or fail on Argo Workflows. Using this feature is quite straightforward:
- Follow these instructions on Slack to set up incoming webhooks for your Slack workspace.
- You should now have a webhook URL that Slack provides. Here is an example webhook:
https://hooks.slack.com/services/T0XXXXXXXXX/B0XXXXXXXXX/qZXXXXXX
- To enable notifications on Slack when your Metaflow flow running on Argo Workflows succeeds or fails, deploy it using the --notify-on-error or --notify-on-success flags:
python flow.py argo-workflows create --notify-on-error --notify-on-success --notify-slack-webhook-url <slack-webhook-url>
- You can also set
METAFLOW_ARGO_WORKFLOWS_CREATE_NOTIFY_SLACK_WEBHOOK_URL=<slack-webhook-url>
in your environment instead of specifying --notify-slack-webhook-url on the CLI every time.
- Next time your workflow succeeds or fails on Argo Workflows, you will get a helpful notification on Slack.
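You can sanity-check the webhook URL itself by posting a test message to it directly. A minimal sketch, assuming the requests library; the URL is the placeholder from above:
import requests

resp = requests.post(
    "https://hooks.slack.com/services/T0XXXXXXXXX/B0XXXXXXXXX/qZXXXXXX",  # placeholder URL
    json={"text": "Test message from Metaflow setup"},
)
resp.raise_for_status()  # Slack responds with "ok" on success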
FAQ
I deployed my workflow following the instructions above, but I haven’t received any notifications yet?
This issue may very well happen if you are running Kubernetes v1.24 or newer.
Since v1.24, Kubernetes stopped automatically creating a secret for every serviceAccount. Argo Workflows relies on the existence of these secrets to run lifecycle hooks responsible for the emission of these notifications.
Follow these steps to explicitly create a secret for the service account that is responsible for executing Argo Workflows steps:
- Run the following command, replacing service-account.name with the serviceAccount in your deployment. Also change the name of the secret to reflect the name of the serviceAccount for which this secret is intended:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: default-sa-token # change according to the name of the sa
  annotations:
    kubernetes.io/service-account.name: default # replace with your sa
type: kubernetes.io/service-account-token
EOF
- Edit the serviceAccount object to add the name of the secret created above. You can use kubectl edit for this. The serviceAccount YAML should look like the following:
$ kubectl edit sa default -n mynamespace
...
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2023-05-05T20:58:58Z"
  name: default
  namespace: jobs-default
  resourceVersion: "6739507"
  uid: 4a708eff-d6ba-4dd8-80ee-8fb3c4c1e1c7
secrets:
- name: default-sa-token # should match the secret above
- That’s it! Try executing your workflow again on Argo Workflows. If you are still running into issues, reach out to us!
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
What's Changed
- feature: add argo events environment variables to metaflow configure kubernetes by @saikonen in #1405
- handle whitespaces in argo events parameters by @savingoyal in #1408
- Add back comment for argo workflows by @savingoyal in #1409
- Support ArgoEvent object with @kubernetes by @savingoyal in #1410
- Print workflow template location as part of argo-workflows create by @savingoyal in #1411
Full Changelog: 2.9.0...2.9.1