Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready for Review] Support more convenient foreach variable/value search-ability. #1664

Merged
merged 5 commits into from
Jan 29, 2024

Conversation

darinyu
Copy link
Collaborator

@darinyu darinyu commented Jan 3, 2024

In this change, we

  1. add values of foreach items to flowspec for convenient access to values in foreach steps.
  2. Update foreach stack to include value.
  3. Add formatted foreach stack (path from root to current level) to metadata, guarded by a flag.

Currently, we try to get "value" of the for each in the following order:

  1. If the item is list of primitive types like ["a", "b", "c"], we will use the literal value directly
  2. Otherwise, we use replib to get a serialized value of the iterable, limited to 30 charactors.

The end result will be of the form [value]. We will then

  1. Use the list of value (generated in a split/parent node) in the task_finish function inside a custom decorator.
  2. Use the list of value when constructing foreach stack, which will eventually be used to construct the for-loop path in metadata.

An example for the updated search UI is like the following.
Metaflow-UI (2)

@darinyu darinyu changed the title add new param to store for each params as string [Ready to review] add new param to store for each params as string Jan 3, 2024
@darinyu darinyu force-pushed the add_foreach_value branch 2 times, most recently from c5ae0b3 to f337a5b Compare January 5, 2024 20:40
@darinyu darinyu changed the title [Ready to review] add new param to store for each params as string [WIP] add new param to store for each params as string Jan 11, 2024
@darinyu darinyu changed the title [WIP] add new param to store for each params as string [WIP] Support more convenient foreach variable/value search-ability. Jan 17, 2024
@darinyu darinyu changed the title [WIP] Support more convenient foreach variable/value search-ability. [Ready for Review] Support more convenient foreach variable/value search-ability. Jan 17, 2024
@darinyu darinyu force-pushed the add_foreach_value branch 2 times, most recently from 33c52a1 to 054b9fd Compare January 18, 2024 00:48
metaflow/flowspec.py Outdated Show resolved Hide resolved
metaflow/task.py Show resolved Hide resolved
metaflow/task.py Show resolved Hide resolved
metaflow/task.py Show resolved Hide resolved
@@ -1027,7 +1027,13 @@ def results(self):
@property
def finished_id(self):
# note: id is not available before the task has finished
return (self.step, tuple(self.results["_foreach_stack"]))
# index already identifies the task within the foreach,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? Isn't the presence of the value backward compatible with the old code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a node, we may want to quickly get the finish_id of the sibling node and check if they all finished (can check usage of finished_id within _queue_task_join). It is non-trivial to get value of the sibling, so we may as well not compare value at all (afterall, the index of the foreach already identifies the node).

romain-intel
romain-intel previously approved these changes Jan 23, 2024
Copy link
Contributor

@romain-intel romain-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a small typo. Let's wait for someone from OB but we can merge afterwards.

metaflow/task.py Outdated Show resolved Hide resolved
metaflow/task.py Show resolved Hide resolved
metaflow/task.py Outdated Show resolved Hide resolved
@romain-intel romain-intel merged commit 4780962 into master Jan 29, 2024
22 checks passed
@romain-intel romain-intel deleted the add_foreach_value branch January 29, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants