Enabling Ray Cluster #1501

Closed
wants to merge 26 commits into from
@rileyhun rileyhun commented Aug 14, 2023

Overview

We need the ability to start a Ray cluster using AWS Batch multi-node parallel jobs.

This PR accomplishes the following:

  • Creates a @ray_parallel decorator that a user can apply to spin up a Ray cluster from an AWS Batch multi-node parallel job. The decorator follows the official Ray documentation on setting up Ray on-premise.

What has changed

  • @ray_parallel decorator modeled on the base @parallel decorator

Notes

Testing evidence

@savingoyal savingoyal added the in review Currently under review label Aug 21, 2023
@rileyhun rileyhun changed the title from "Adding EFA Support + Enabling Ray Cluster" to "Enabling Ray Cluster" on Aug 24, 2023
from threading import Thread
from metaflow.exception import MetaflowException
from metaflow.unbounded_foreach import UBF_CONTROL
from metaflow.plugins.parallel_decorator import ParallelDecorator, _local_multinode_control_task_step_func

Small change proposal: importing an internal method might lead to breaking changes in the future. Could this be done with only ParallelDecorator, relying on super() in task_decorate?
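The suggestion above could look roughly like the sketch below. It is illustrative only: `ParallelDecorator` here is a local stand-in so the snippet runs without Metaflow installed, the `task_decorate` signature is assumed to match Metaflow's step-decorator hook, and `_start_ray` is a hypothetical placeholder for whatever cluster setup the PR performs.

```python
class ParallelDecorator:
    """Local stand-in for metaflow.plugins.parallel_decorator.ParallelDecorator.

    Assumption: the real class exposes task_decorate with this signature and
    handles the local multinode control task internally.
    """

    name = "parallel"

    def task_decorate(self, step_func, flow, graph, retry_count,
                      max_user_code_retries, ubf_context):
        # The real base class may swap in its own control-task function here.
        return step_func


class RayParallelDecorator(ParallelDecorator):
    name = "ray_parallel"

    def _start_ray(self):
        # Hypothetical hook: start the Ray head or worker process on this node.
        pass

    def task_decorate(self, step_func, flow, graph, retry_count,
                      max_user_code_retries, ubf_context):
        # Let the parent decide how the step is wrapped, instead of importing
        # the internal _local_multinode_control_task_step_func directly.
        base_func = super().task_decorate(
            step_func, flow, graph, retry_count,
            max_user_code_retries, ubf_context,
        )

        def wrapped(*args, **kwargs):
            self._start_ray()
            return base_func(*args, **kwargs)

        return wrapped
```

With this shape, the subclass depends only on the public class and its overridable method, so a rename of the internal helper would not break it.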

        break
    except ImportError:
        print("Ray is not installed. Installing latest version of ray-air package.")
        subprocess.run([sys.executable, "-m", "pip", "install", "-U", "ray[air]"], check=True)

Is the -U necessary? It could add overhead or cause breakage at task startup whenever a new version is released, even with custom-built images that already have Ray installed.
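One way to address the concern above is to install only when the module is missing, and to omit `-U` so an image that already ships Ray is left alone. The following is a minimal sketch, not the PR's code: `ensure_installed` and its `runner` parameter are hypothetical names introduced here, with `runner` defaulting to `subprocess.run` so it can be swapped out in tests.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, pip_spec, runner=subprocess.run):
    """Install pip_spec only if module_name cannot be found.

    Returns True if an install was triggered, False if the module was
    already available. No -U flag is passed, so an existing installation
    is never upgraded at task startup.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    print(f"{module_name} is not installed. Installing {pip_spec}.")
    runner([sys.executable, "-m", "pip", "install", pip_spec], check=True)
    return True
```

A call such as `ensure_installed("ray", "ray[air]")` would then be a no-op on images that already include Ray. Pinning a version in `pip_spec` would further reduce the chance of a new release changing behavior between runs.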

@savingoyal

This PR is encapsulated in https://github.com/outerbounds/metaflow-ray and we can close this for now.

@savingoyal savingoyal closed this Aug 28, 2023
Labels
in review Currently under review
4 participants