Enable custom metrics exporter with DLSIA #7

Open
taxe10 opened this issue Feb 23, 2024 · 10 comments

@taxe10
Member

taxe10 commented Feb 23, 2024

Currently, we are modifying an existing DLSIA function train_segmentation to write/export the loss and metrics at every epoch while the training process is ongoing.

We (@xiaoyachong @zhuowenzhao @taxe10) discussed how to integrate DVC without modifying the DLSIA function on our end. Here we summarize our initial thoughts:

In MLExchange:

# train.py
from dvclive import Live
from dlsia.core import train_segmentation

# Creates custom exporter class compatible with DVC
class DvcExporter:
    def __init__(self, live):
        self.live = live

    def export_train_metrics(self, metrics):
        # DVC code
        ...

# Init DVC Live
with Live() as live:

    # Define the parameters to be tracked
    live.log_param("epochs", NUM_EPOCHS)

    # Init custom exporter
    dvc_exporter = DvcExporter(live)

    # Calls DLSIA function with custom exporter
    train_segmentation(net, trainloader, validationloader, NUM_EPOCHS,
                       criterion, optimizer, device,
                       savepath=None, saveevery=None,
                       scheduler=None, show=0,
                       use_amp=False, clip_value=None, custom_exporter=dvc_exporter)

    live.log_artifact(path, type="model", name=name)

This would require a PR in DLSIA that would look as follows:

def train_segmentation(net, trainloader, validationloader, NUM_EPOCHS,
                       criterion, optimizer, device,
                       savepath=None, saveevery=None,
                       scheduler=None, show=0,
                       use_amp=False, clip_value=None, custom_exporter=None):
    ...
    for epoch in range(NUM_EPOCHS):
        ...
        # After validation, maybe around https://github.com/phzwart/dlsia/blob/f3f50a78faeb99aca4b9725ffa63c7b95c0613df/dlsia/core/train_scripts.py#L228
        if custom_exporter is not None:
            custom_exporter.export_train_metrics(metrics)

This is a very rough draft, mostly to gather feedback.
Any thoughts and/or comments? @Wiebke @dylanmcreynolds @TibbersHao
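
A minimal sketch of how the exporter and the hook could fit together, under several assumptions: the `metrics` argument is taken to be a flat dict of metric names to floats (the issue does not specify its shape), and `FakeLive` and `toy_train` are hypothetical stand-ins for `dvclive.Live` and for DLSIA's `train_segmentation`, so the example runs without either library. The only DVCLive calls the exporter relies on are `log_metric(name, value)` and `next_step()`.

```python
class DvcExporter:
    """Forwards per-epoch metrics to a DVCLive-style logger."""

    def __init__(self, live):
        self.live = live

    def export_train_metrics(self, metrics):
        # Assumption: `metrics` is a flat dict such as {"train_loss": 0.5}.
        for name, value in metrics.items():
            self.live.log_metric(name, value)
        # Advance the step counter so each epoch becomes one point on the plots.
        self.live.next_step()


class FakeLive:
    """Hypothetical stand-in for dvclive.Live that records calls in memory."""

    def __init__(self):
        self.step = 0
        self.history = []

    def log_metric(self, name, value):
        self.history.append((self.step, name, value))

    def next_step(self):
        self.step += 1


def toy_train(num_epochs, custom_exporter=None):
    """Hypothetical stand-in for train_segmentation with the proposed hook."""
    for epoch in range(num_epochs):
        # Placeholder numbers; a real loop would compute these from the data.
        metrics = {"train_loss": 1.0 / (epoch + 1), "val_loss": 1.5 / (epoch + 1)}
        if custom_exporter is not None:
            custom_exporter.export_train_metrics(metrics)


live = FakeLive()
toy_train(3, custom_exporter=DvcExporter(live))
print(live.history[:2])
```

With the real library, `FakeLive()` would be replaced by the `live` object from `with Live() as live:`. Leaving `custom_exporter=None` keeps the existing behavior, so the hook should be safe for current callers.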

@zhuowenzhao
Member

zhuowenzhao commented Feb 23, 2024

Another thought of mine is along the lines of implementing a TrainModel class, which might be very doable within the timeframe of the Diamond trip, since I assume the train_segmentation() function is not called elsewhere in DLSIA. I am putting it here for the (future) record with Peter.

We could implement a function (or internal method) called train_epoch() in DLSIA that runs one epoch of updates and can be called from an outside loop:

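class TrainModel:
    def __init__(self, **args):
        # Initialize metrics if needed (a dict, since the outer loop calls .items() on it)
        self.metrics = {}

    def train_epoch(self, *args, **kwargs):
        # Run one epoch and update self.metrics
        ...

    def train_segmentation(self, *args, **kwargs):
        ...
        for epoch in range(NUM_EPOCHS):
            ...
            self.train_epoch()
            ...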

Then, for DVCLive, we can use the same code that the DVCLive documentation suggests:

from dvclive import Live

train_model = TrainModel()

# Init DVC Live; this code stays unchanged from the DVC documentation
with Live() as live:

    live.log_param("epochs", NUM_EPOCHS)

    for epoch in range(NUM_EPOCHS):
        train_model.train_epoch()
        metrics = train_model.metrics

        for metric_name, value in metrics.items():
            live.log_metric(metric_name, value)

        live.next_step()

    live.log_artifact(path, type="model", name=name)
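
To make the per-epoch pattern concrete, here is a small sketch under stated assumptions: `ToyTrainModel` is a hypothetical stand-in for the proposed `TrainModel` (it fits a single weight with plain gradient descent rather than calling DLSIA), and `run` accepts any object exposing `log_metric` and `next_step`, which is the part of the `Live` interface the loop above uses.

```python
class ToyTrainModel:
    """Hypothetical stand-in for TrainModel: fits y = w * x by gradient descent."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr
        self.data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on y = 2x
        self.metrics = {}

    def _loss(self):
        return sum((self.w * x - y) ** 2 for x, y in self.data) / len(self.data)

    def train_epoch(self):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (self.w * x - y) * x for x, y in self.data) / len(self.data)
        self.w -= self.lr * grad
        # Refresh the metrics so the caller can read them after every epoch.
        self.metrics = {"train_loss": self._loss()}


class RecordingLogger:
    """Hypothetical logger with the two Live methods used by the outer loop."""

    def __init__(self):
        self.rows = []
        self.step = 0

    def log_metric(self, name, value):
        self.rows.append((self.step, name, value))

    def next_step(self):
        self.step += 1


def run(train_model, logger, num_epochs):
    # The training loop lives outside the model, as in the DVCLive-style snippet.
    for _ in range(num_epochs):
        train_model.train_epoch()
        for name, value in train_model.metrics.items():
            logger.log_metric(name, value)
        logger.next_step()


model = ToyTrainModel()
logger = RecordingLogger()
run(model, logger, num_epochs=5)
losses = [value for _, _, value in logger.rows]
print(losses)
```

The trade-off compared with the exporter approach is that the caller owns the epoch loop, so no DVC-specific argument has to be added to the training function itself.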

@dylanmcreynolds
Member

How would this Live() communicate updated loss to the user? Writing to a file that the segmentation app polls? Writing to a web socket?

@phzwart
Collaborator

phzwart commented Feb 23, 2024 via email

@phzwart
Collaborator

phzwart commented Feb 23, 2024 via email

@xiaoyachong
Contributor

DVCLive supports many existing ML frameworks (e.g. Fast.ai, PyTorch, Keras, Hugging Face, etc.).

Tanny's idea is similar to how DVCLive supports Keras (https://dvc.org/doc/dvclive/ml-frameworks/keras), while Zhuowen's idea is similar to the Hugging Face method (https://dvc.org/doc/dvclive/ml-frameworks/huggingface). I think both will be fine.

@xiaoyachong
Contributor

> How would this Live() communicate updated loss to the user? Writing to a file that the segmentation app polls? Writing to a web socket?

Live() automatically generates a local file called 'report.html' during training, which is updated after each epoch. The report looks like this:
[Screenshot of report.html, 2024-02-23]
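
On the polling option raised in the question: in addition to the HTML report, DVCLive is understood to write a JSON summary of the latest metrics into its output directory (`dvclive/metrics.json` by default; this path and its layout are an assumption here and should be checked against the DVCLive documentation). A minimal sketch of how an app could poll such a file follows. The `MetricsPoller` class and the file contents are hypothetical, and the demo writes to a temporary file so it runs without DVCLive.

```python
import json
import os
import tempfile


class MetricsPoller:
    """Re-reads a JSON metrics file only when it has changed on disk."""

    def __init__(self, path):
        self.path = path
        self._last_seen = None

    def poll(self):
        """Return the parsed metrics if the file changed, otherwise None."""
        try:
            stat = os.stat(self.path)
        except FileNotFoundError:
            return None  # training has not written anything yet
        signature = (stat.st_mtime_ns, stat.st_size)
        if signature == self._last_seen:
            return None
        try:
            with open(self.path) as f:
                data = json.load(f)
        except json.JSONDecodeError:
            return None  # file caught mid-write; try again on the next poll
        self._last_seen = signature
        return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.json")  # stand-in for the assumed dvclive path
    poller = MetricsPoller(path)
    first = poller.poll()   # file does not exist yet
    with open(path, "w") as f:
        json.dump({"step": 0, "train_loss": 0.9}, f)
    second = poller.poll()  # file appeared, so its contents are returned
    third = poller.poll()   # unchanged since the last poll
```

A web socket would avoid the polling delay, but it would need a process that pushes updates; file polling only needs read access to the training output directory.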

@xiaoyachong
Contributor

@phzwart
Hi Peter, based on Zhuowen's idea, I created a new class called Trainer() and tested DVC using a Jupyter notebook (https://drive.google.com/file/d/1Hy7qKViilWDV_fHk0F1NbGkw1TM7vnBI/view?usp=sharing).

Could you take a look at it and let us know whether we could add it to DLSIA?

@phzwart
Collaborator

phzwart commented Mar 5, 2024 via email

@phzwart
Collaborator

phzwart commented Mar 5, 2024 via email

@xiaoyachong
Contributor

@phzwart
Hi Peter, I modified Trainer() accordingly and tested it. It works well. For the definition of Trainer(), please refer to the latest version: https://github.com/mlexchange/mlex_dlsia_segmentation_prototype/blob/xchong-dvc/src/seg_utils.py
