[@parallel][jobsets] bug fixes and refactor #1900

Closed (2 commits)

Commits on Jul 4, 2024

  1. [parallel-fixes] core + test changes

    - fix tests for interplay of @secrets and @parallel
    - Local runtime will allow secrets only on worker tasks
    - Secrets will be set on all kinds of tasks when run remotely (control/worker)
    - fix tests for ubf based on new changes to core.
    - fix tests for tag-catch for ubf based on new changes to core.
    - internal ubf decorator has an `internal_task_type` set on it.
    - [feedback] register metadata in parallel decorator
    - [feedback] @parallel injection into current:
        - move `current.parallel` from `metaflow_current` to `parallel_decorator`
    - [feedback] appropriately setting task-metadata for parallel stuff
        - The 'world size' metadata will be set in the @parallel decorator.
        - The 'node-index' metadata, however, varies depending on the type of computing environment executing the task so it will be set in the appropriate compute decorators.
        - One reason to specify 'node-index' within compute decorators is that the parallel implementation in the compute decorator might not directly set the required environment variables (`MF_PARALLEL_*`). Instead, these values may be established during the `task_pre_step` phase of the compute decorator using other environment variables set during the implementation.
        - adding some aws batch changes
    - [feedback] safety check for _parallel_buf_iter in task_pre_step for @parallel
    - [feedback] set `is_parallel` in current to denote a step is running under an `@parallel` decorator.
    valayDave committed Jul 4, 2024
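
The split described in this commit message (world size registered by `@parallel`, node index registered by the compute decorator) could look roughly like the sketch below. It is not the code from this PR: the helper names are hypothetical, the `MF_PARALLEL_NUM_NODES` and `MF_PARALLEL_NODE_INDEX` names are assumed expansions of the `MF_PARALLEL_*` prefix mentioned above, and `AWS_BATCH_JOB_NODE_INDEX` stands in for "other environment variables" set by a compute backend.

```python
import os


def world_size_metadata(environ=None):
    """Hypothetical helper: what an @parallel-level hook might register."""
    env = os.environ if environ is None else environ
    num_nodes = env.get("MF_PARALLEL_NUM_NODES")  # assumed variable name
    if num_nodes is None:
        return {}
    return {"world size": int(num_nodes)}


def node_index_metadata(environ=None, backend_var="AWS_BATCH_JOB_NODE_INDEX"):
    """Hypothetical helper: what a compute decorator might register in
    task_pre_step. It prefers MF_PARALLEL_NODE_INDEX and falls back to a
    backend-specific variable when the MF_PARALLEL_* value is not set yet."""
    env = os.environ if environ is None else environ
    for var in ("MF_PARALLEL_NODE_INDEX", backend_var):
        value = env.get(var)
        if value is not None:
            return {"node-index": int(value)}
    return {}


if __name__ == "__main__":
    fake_env = {"MF_PARALLEL_NUM_NODES": "4", "AWS_BATCH_JOB_NODE_INDEX": "2"}
    print(world_size_metadata(fake_env))
    print(node_index_metadata(fake_env))
```

Passing the environment in as a dictionary keeps the helpers testable without touching the real process environment.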
    653f1e1
  2. [refactor-js] refactor to new implementation

    - Some changes stemming from Netflix#1854
    - reorder imports
    - changes to env var prefixes + id generation
    - Abstraction over Jobset Spec
    - Create general abstraction and copy style of how we create jobspec for native k8s jobs.
    - simplified the implementation.
    - similar pattern/trend to follow for argo
    - remove older implementation
    - @parallel metadata from k8s decorator
        - added attempt to MetaDatum tags
        - We didn't have this earlier
    valayDave committed Jul 4, 2024
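
To illustrate what an "abstraction over Jobset Spec" in the chained style of a native k8s job spec might look like, here is a minimal sketch. The class `JobSetSpecSketch` and its methods are hypothetical and are not the implementation in this commit; the `jobset.x-k8s.io/v1alpha2` API version and the `replicatedJobs` layout are assumptions based on the upstream JobSet CRD, and the control/worker split simply mirrors the task kinds named in the first commit.

```python
class JobSetSpecSketch:
    """Hypothetical builder that accumulates settings and renders a dict."""

    def __init__(self, name):
        self._name = name
        self._labels = {}
        self._replicated_jobs = []

    def label(self, key, value):
        self._labels[key] = value
        return self  # returning self allows chained calls

    def replicated_job(self, name, replicas, image, env=None):
        self._replicated_jobs.append(
            {"name": name, "replicas": replicas, "image": image, "env": dict(env or {})}
        )
        return self

    def dump(self):
        """Render the accumulated settings as a JobSet-shaped dictionary."""
        jobs = []
        for job in self._replicated_jobs:
            container = {
                "name": job["name"],
                "image": job["image"],
                "env": [{"name": k, "value": v} for k, v in sorted(job["env"].items())],
            }
            jobs.append(
                {
                    "name": job["name"],
                    "replicas": job["replicas"],
                    # Each replicated job wraps a Job template, which wraps a pod template.
                    "template": {"spec": {"template": {"spec": {"containers": [container]}}}},
                }
            )
        return {
            "apiVersion": "jobset.x-k8s.io/v1alpha2",  # assumed API version
            "kind": "JobSet",
            "metadata": {"name": self._name, "labels": dict(self._labels)},
            "spec": {"replicatedJobs": jobs},
        }


if __name__ == "__main__":
    spec = (
        JobSetSpecSketch("demo")
        .label("app", "metaflow")
        .replicated_job("control", 1, "python:3.11")
        .replicated_job("worker", 3, "python:3.11")
        .dump()
    )
    print([j["name"] for j in spec["spec"]["replicatedJobs"]])
```

Keeping the builder separate from the final `dump()` step means the same accumulated settings could be rendered for another target, which may be what the "similar pattern/trend to follow for argo" note refers to.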
    896474f