Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: missing loglines from tailing logs with GCP #2112

Merged
merged 5 commits into from
Oct 23, 2024

Conversation

saikonen
Copy link
Collaborator

Fixes an issue where log tailing stops working after a point on GCP, resulting in missing log output on the terminal.

The cause is with the behaviour of google.cloud.storage blob client, where consecutive requests with identical attributes on a single instance will not yield different responses, even if the underlying blob has advanced.

Example flow to reproduce missing log lines on GCP:

from time import sleep
from metaflow import (
    FlowSpec,
    step,
    kubernetes,
)
import os

class MinimalKubeGCPLogExample(FlowSpec):
    @kubernetes
    @step
    def start(self):
        print("BUFFERED: %s" % os.getenv("PYTHONBUFFERED"))
        print("first log line so as not to have an empty file")
        for i in range(10):
            print(f"{i}\r", end="")
            sleep(1)
        print("\r\n")
        print("after progress lines")
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    MinimalKubeGCPLogExample()

also small fix to Azure log tailing as well.

@saikonen saikonen requested review from savingoyal, valayDave and oavdeev and removed request for savingoyal October 22, 2024 13:19
@@ -710,12 +710,23 @@ def wait_for_launch(job):
wait_for_launch(self._job)

# 2) Tail logs until the job has finished
self._output_final_logs = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use generator -

def _has_updates():
    while self._job.is_running:
        yield True
    yield True  # One final update after job finishes
    yield False

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup this should work too, and it reduces reliance on the other status internals. I'll make the changes

Copy link
Collaborator Author

@saikonen saikonen Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scratch that, it won't work without larger changes as the tail_logs has_log_updates= callable is simply used in a while loop, which won't exhaust the generator (it counts as true always)

@savingoyal
Copy link
Collaborator

is there any perf impact?

@saikonen
Copy link
Collaborator Author

is there any perf impact?

the tailing happens client-side so no performance impact on the actual task execution code during remote exec.
there is of course overhead with re-instantiating the blob client during every tailing poll but this didn't seem significant during testing

@saikonen saikonen merged commit 5fd5675 into master Oct 23, 2024
29 checks passed
@saikonen saikonen deleted the fix/missing-loglines-from-tailing-logs branch October 23, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants