Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make argo events sensor namespace configurable #1463

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

dhpollack
Copy link
Contributor

  • Add ARGO_EVENTS_SENSOR_NAMESPACE to config (defaults to KUBERNETES_NAMESPACE )
  • Add sensor_namespace parameter to argo client
  • Inititalize argo client with sensor namespace parameter when creating sensor

Resolves #1455

@dhpollack
Copy link
Contributor Author

In our set up at work, we don't have event buses in the namespaces where the workflows run. Thus the "normal" behavior is that if one tries to deploy a workflow (argo-workflows create) with the @trigger or @triggered_on_finish decorators then I receive an error from kubernetes that the eventbus does not exist.

With this PR, I was able to test specifying a namespace and not specifying a namespace. When I did not specify a namespace, I got the same behavior as the master branch (i.e. kubernetes error). When I specified the argo events namespace, the argo workflow and sensor successfully get created. When triggered, the triggered workflow runs in the same workspace as the triggering workflow (not the namespace of the sensor) which is the desired behavior.

Copy link
Collaborator

@saikonen saikonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argo-workflows delete (was merged in recently) and argo-workflows create both try to delete a sensor as part of the command. There's a slight concern of leaving dangling sensors in custom namespaces if the user issuing the command does not remember to specify the correct one for the sensors

metaflow/metaflow_config.py Show resolved Hide resolved
Copy link
Collaborator

@saikonen saikonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To revisit my earlier point on tracking where resources get created: One addition that could be done in scope of this PR is that we could add the sensor name, and the sensor namespace to the workflow template we deploy, either as annotations or as labels.

The main point for this is to provide visibility on where resources are deployed as part of the workflow template, but as an added benefit these could be used as another check for create(redeploy)/delete operations when looking for existing sensors

Comment on lines 14 to 17
def __init__(self, namespace=None, sensor_namespace=None):
self._client = KubernetesClient()
self._namespace = namespace or "default"
self._sensor_namespace = sensor_namespace or self._namespace
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a clear tradeoff here with granularity vs. simplicity regarding the ArgoClient. For now we could do with the original implementation of only allowing namespace=, and simply having a different client instance for each namespace we want to operate in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if that would work. The namespace is also used for knowing where to trigger the workflow. So I have workflow 1 in namespace A that communicates with a sensor in namespace B that triggers workflow 2 in namespace A again. And the way the argo client works is namespace = A in this case. If you were to set namespace = B, I would have to make sure that workflow 2 does not run in namespace B.

metaflow/plugins/argo/argo_workflows.py Outdated Show resolved Hide resolved
- Add a cleanup function during `argo-workflows create`
- Clean up sensors and cronworkflows
- Refactor ArgoClient to no longer need the `sensor_namespace`
  parameter.
metaflow/plugins/argo/argo_workflows.py Outdated Show resolved Hide resolved
metaflow/plugins/argo/argo_client.py Outdated Show resolved Hide resolved
metaflow/plugins/argo/argo_workflows.py Outdated Show resolved Hide resolved
@@ -225,6 +225,7 @@ def create(
obj.echo_always(str(flow), err=False, no_bold=True)
# TODO: Support echo-ing Argo Events Sensor template
else:
flow.cleanup_previous()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is an issue with running the cleanup only as part of the deploy command. Consider the following scenario:

  • user has deployed a sensor flow to namespace dev-A, sensor exists in the same namespace
  • this feature gets released and the Argo sensor namespace env_var gets configured to dev-B
  • the user decides that they want to get rid of their flow, and issue an argo-workflows delete directly.

Wouldn't this only check for the currently configured sensor namespace, leaving the original sensor in dev-A untouched?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in the case of delete it seems sensible to me to keep the concern of cleaning up resources as part of the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the legacy case, we need to do cleanup before creating a new argo workflow. This can only happen if we clean up before calling flow.deploy().

One reason I did this is that there is some inconsistency in the API. The current implementation already does some sort of "cleanup". For schedules, we create/suspend them within the schedule_workflow_template method which is in the argo_client.py, which is called within the schedule method in argo_workflows.py. However the sensors are created/deleted within the schedule method in argo_workflows.py. So I wanted to move this all into one place, but I couldn't put my code into this schedule method. The previous annotations for the triggers would be overwritten before the schedulemethod.

Currently, the flow.schedule function is used to create both the schedule and the sensor with the assumption that the sensor will get overwritten because it always exists in the same namespace. So in your scenario, if you configured you local environment with dev-B then you would have to do an argo-workflows create in order to create a sensor in dev-B. Once you did that, it would delete the sensor in dev-A due to this line:

https://github.com/dhpollack/metaflow/blob/feature/configure-sensor-namespace/metaflow/plugins/argo/argo_workflows.py#L195

@saikonen saikonen added the in review Currently under review label Nov 10, 2023
saikonen
saikonen previously approved these changes Nov 13, 2023
@jmmclean
Copy link

bump on this please; looks to be approved, just maybe some formalities around getting it merged/deployed

metaflow/plugins/argo/argo_workflows.py Outdated Show resolved Hide resolved
saikonen
saikonen previously approved these changes Nov 21, 2023
@jmmclean
Copy link

jmmclean commented Dec 5, 2023

Bueller? Anyone looking into this PR - its approved but likely need some maintainers to merge/release it

@dhpollack
Copy link
Contributor Author

While this continues to work as of today, I recently changed our setup to deploy the required event infra into all of our namespaces so I probably will not update this again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in review Currently under review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow configuration of the namespace argo event sensors
4 participants