feat: proposition of task management for any kind of init, data processing, data trasnformation tasks #124

Closed
wants to merge 1 commit into from

rv2931
Collaborator

@rv2931 rv2931 commented Mar 17, 2024

The idea is to manage any kind of "task" with a complete task management system.
Maybe there is already a Python module that does this, but here is a proposal for the simplest, dumbest possible task manager.

Here we replace the existing init_script with independent tasks. Because a task can itself run other tasks, we can implement an init task that runs, for example, load_amp, load_vessels, load_vessel_positions, load_ports, and other tasks.
Each task can be run on its own as a unit task or be run by a parent task.

This will be useful when we run data processing that is configured in code but can be started by a simple cron job.

This system runs tasks sequentially, but it could be improved to run tasks in parallel, in threads, or in independent processes.
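As a rough illustration of the threaded variant mentioned above, here is a minimal sketch using the standard library's `concurrent.futures`. It is not part of this PR: `ParallelTask` is a hypothetical name, and the `BaseTask` shown is only a stand-in that assumes `start()` logs and then calls `run()`.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("bloom.tasks")


class BaseTask:
    """Stand-in for the PR's BaseTask (assumed interface: start() calls run())."""

    def start(self):
        logger.info("Starting task %s", type(self).__name__)
        self.run()
        logger.info("Task %s finished", type(self).__name__)

    def run(self):
        raise NotImplementedError


class ParallelTask(BaseTask):
    """Hypothetical parent task that starts its subtasks in worker threads."""

    def __init__(self, subtasks, max_workers=4):
        self.subtasks = list(subtasks)
        self.max_workers = max_workers

    def run(self):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(task.start) for task in self.subtasks]
            # result() re-raises any exception raised inside a subtask.
            for future in futures:
                future.result()
```

Threads mainly help when the subtasks are I/O-bound (database or network loads). For CPU-bound work, `ProcessPoolExecutor` could be swapped in, provided the tasks can be pickled.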

Example:

# Start unit task load_amp_data
python src/tasks/data/load_amp_data.py
INFO:bloom.tasks:Starting task LoadAmpDataTask
INFO:bloom.tasks:Task LoadAmpDataTask finished

# Start complete init pipeline
python src/tasks/init.py
INFO:bloom.tasks:Starting task InitTask
INFO:bloom.tasks:Starting task LoadAmpDataTask
INFO:bloom.tasks:Task LoadAmpDataTask finished
INFO:bloom.tasks:Starting task LoadPortDataTask
INFO:bloom.tasks:Task LoadPortDataTask finished
INFO:bloom.tasks:Starting task LoadVesselsDataTask
INFO:bloom.tasks:Task LoadVesselsDataTask finished
INFO:bloom.tasks:Starting task LoadVesselPositionsDataTask
INFO:bloom.tasks:Task LoadVesselPositionsDataTask finished
INFO:bloom.tasks:Task InitTask finished

Here is how the init pipeline is created:

import logging

from bloom.config import settings
from tasks.base import BaseTask
from tasks.data import (
    LoadAmpDataTask,
    LoadPortDataTask,
    LoadVesselPositionsDataTask,
    LoadVesselsDataTask,
)

# Send task messages through the "bloom.tasks" logger at the configured level.
logging.basicConfig()
logging.getLogger("bloom.tasks").setLevel(settings.logging_level)


class InitTask(BaseTask):
    """Parent task that runs the data-loading tasks one after another."""

    def run(self):
        LoadAmpDataTask().start()
        LoadPortDataTask().start()
        LoadVesselsDataTask().start()
        LoadVesselPositionsDataTask().start()


if __name__ == "__main__":
    InitTask().start()
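The `BaseTask` imported from `tasks.base` is not shown in this description. A minimal sketch of what such a base class could look like, assuming only what the log output above suggests (`start()` logs, calls `run()`, then logs again), might be the following. The error-handling branch and its "failed" message are assumptions, and `LoadAmpDataTask` here is an empty placeholder rather than the real loader.

```python
import logging

logger = logging.getLogger("bloom.tasks")


class BaseTask:
    """Hypothetical base class: start() wraps run() with log messages."""

    def start(self):
        name = type(self).__name__
        logger.info("Starting task %s", name)
        try:
            self.run()
        except Exception:
            # Assumption: failures are logged with a traceback, then re-raised.
            logger.exception("Task %s failed", name)
            raise
        logger.info("Task %s finished", name)

    def run(self):
        # Subclasses put their actual work here.
        raise NotImplementedError


class LoadAmpDataTask(BaseTask):
    def run(self):
        pass  # placeholder for the real loading logic


class InitTask(BaseTask):
    def run(self):
        # A parent task simply starts its children in order.
        LoadAmpDataTask().start()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    InitTask().start()
```

Because nesting is just one `start()` calling another, the same class works both as a unit task and as a step in a pipeline.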

@rv2931 rv2931 self-assigned this Mar 17, 2024
@rv2931 rv2931 changed the title feat: proposition of task management for any king of init, data processing, data trasnformation tasks feat: proposition of task management for any kind of init, data processing, data trasnformation tasks Mar 18, 2024
@njouanin
Collaborator

This looks pretty good to me, and it integrates easily with the tasks I created (src/bloom/tasks). I'm also fine with the pipeline concept, which will be useful to us.
We could add support for sending logs by email or as a message to a Slack channel.
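One possible way to do this is with a custom `logging.Handler` attached to the `bloom.tasks` logger. The sketch below is only an illustration: `WebhookHandler`, its `post` parameter, and the `{"text": ...}` payload shape are assumptions to be checked against whatever webhook is actually used. For email, the standard library already provides `logging.handlers.SMTPHandler`.

```python
import json
import logging
import urllib.request


class WebhookHandler(logging.Handler):
    """Hypothetical handler that posts each formatted record to a webhook URL."""

    def __init__(self, url, level=logging.ERROR, post=None):
        super().__init__(level)
        self.url = url
        # `post` is injectable so the handler can be exercised without network access.
        self._post = post or self._http_post

    def _http_post(self, url, payload):
        # Passing `data` makes urllib send a POST request with a JSON body.
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=5):
            pass

    def emit(self, record):
        try:
            self._post(self.url, {"text": self.format(record)})
        except Exception:
            # Never let a notification failure break the task itself.
            self.handleError(record)
```

It would be attached with something like `logging.getLogger("bloom.tasks").addHandler(WebhookHandler(url))`, where `url` is the channel's webhook address. With the default `ERROR` level, only failures are sent.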

@rv2931 rv2931 mentioned this pull request Mar 24, 2024
@rv2931 rv2931 closed this Mar 28, 2024

2 participants