Modularized Nested Workflows/DAGs Best Practices Question #895
Replies: 1 comment 2 replies
-
Thanks for the question and example code @greenbagels! The code looks pretty sensible to me; my only comment is that when you said
the context manager actually closes upon leaving the For
If you're using containers (rather than script functions), you should just need to declare the input params in |
-
First of all, thank you for writing this wonderful library; our organization has used Hera v4 and v5 extensively for semi-automating our bioinformatics pipelines. But I have a question pertaining to how to best modularize (in the Python sense) some Hera code.
To keep the bioinformatics-specific details out of the picture, let's say I'm trying to model the work items needed for a group order at a bakery. The group order is composed of individual items, and each item requires a different sequence of steps to be completed. Previously, our work operated under the assumption that each individual in a group order wanted the same items. This meant we had this kind of file structure:
containers.py: just defines all the Container variables used across all possible items
croissant.py: defines and runs the workflow for making a batch order of croissants, passing the customer's special requests to the containers
and so forth for the different items. This naturally involved a lot of code duplication, and doesn't allow us to consider group orders with mixed item types. To allow for mixed-item orders, I'm trying to define each individual item in an order as its own sub-Workflow, but one that is returned from a different module. So basically now we have something that looks like:
croissant.py: now defines but returns the context manager for making a single croissant
order_dispatch.py: reads orders from the list and assembles the correct subworkflows
I modeled this on the "Workflows of Workflows" example, but I'm unsure if it looks very hacky; I'm wondering if it would be better to use only DAGs and Workflows instead of DAGs, Workflows, and Steps?
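For illustration only, here is a minimal sketch of the dispatch idea behind order_dispatch.py in plain Python. It deliberately makes no Hera calls: the builder functions, the `BUILDERS` registry, the `baguette` item, and the dict they return are all hypothetical stand-ins for whatever each item module would really return (a sub-Workflow or DAG).

```python
from typing import Callable, Dict, List

# Hypothetical builders. In the real project each would live in its own
# module (croissant.py, ...) and return a Hera sub-workflow, not a dict.
def build_croissant(request: str) -> dict:
    return {"item": "croissant", "steps": ["laminate", "proof", "bake"], "request": request}


def build_baguette(request: str) -> dict:
    return {"item": "baguette", "steps": ["mix", "shape", "bake"], "request": request}


# Registry mapping an item type to the function that builds its plan.
BUILDERS: Dict[str, Callable[[str], dict]] = {
    "croissant": build_croissant,
    "baguette": build_baguette,
}


def assemble_order(order: List[dict]) -> List[dict]:
    """Look up the builder for each entry and collect the resulting plans."""
    plans = []
    for entry in order:
        item = entry["item"]
        if item not in BUILDERS:
            raise ValueError(f"no builder registered for item {item!r}")
        # A missing request is treated as an empty string.
        plans.append(BUILDERS[item](entry.get("request", "")))
    return plans
```

With this shape, supporting a new item type means adding one module and one registry entry, and a mixed order is just a list such as `[{"item": "croissant", "request": "extra butter"}, {"item": "baguette"}]`.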
Also, in my actual (bioinformatics) code, I get issues where the containers are unable to resolve any input parameters (akin to the request variables here); is this unavoidable when a function in a different module returns the subworkflows?
Apologies for all the questions; Python is not my native tongue, so any help (even if pointing out an XY problem or similar) would be greatly appreciated :)
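As a guess at one possible cause (the actual code is not shown here): in an Argo Workflows manifest, a template can only resolve `{{inputs.parameters.request}}` if `request` is declared under that template's `inputs.parameters`, and the calling task has to pass a matching argument. The sketch below builds that manifest shape as plain dicts rather than through Hera, so the helper names, the `bake` template, and the `alpine:3.18` image are all assumptions for illustration.

```python
from typing import Dict, List


def container_template(name: str, image: str, param_names: List[str]) -> dict:
    """Build an Argo-style container template that declares its inputs."""
    return {
        "name": name,
        # Each parameter the container references must be declared here.
        "inputs": {"parameters": [{"name": p} for p in param_names]},
        "container": {
            "image": image,  # hypothetical image
            "command": ["echo"],
            "args": ["{{inputs.parameters." + p + "}}" for p in param_names],
        },
    }


def dag_task(task_name: str, template: dict, arguments: Dict[str, str]) -> dict:
    """Build a DAG task that passes arguments to the template by name."""
    return {
        "name": task_name,
        "template": template["name"],
        "arguments": {
            "parameters": [{"name": k, "value": v} for k, v in arguments.items()]
        },
    }


bake = container_template("bake", "alpine:3.18", ["request"])
task = dag_task("bake-croissant", bake, {"request": "extra butter"})
```

If the rendered manifest from the real code is missing either the `inputs.parameters` declaration or the matching `arguments` entry, that would explain unresolved parameters regardless of which module built the template.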