Wait tokio tasks finish before CKB process exit #3999

Merged
11 commits merged on Jul 18, 2023

Conversation

@eval-exec eval-exec (Collaborator) commented Jun 1, 2023

What problem does this PR solve?

Issue Number: close #3994

Problem Summary:

What is changed and how it works?

What's Changed:

Related changes

Implementation

  • Add a std::thread JoinHandle registry to collect the JoinHandles of all child threads.
  • Broadcast CKB's exit signal to child threads and to the tokio CancellationToken.
  • Wait for all ckb threads to finish before the process exits.
  • Wait for all ckb tokio-spawned tasks to finish before the process exits.
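
For readers who want a concrete picture of the thread half of this list, here is a minimal sketch using only the standard library. It is not the code in this PR: `HandleRegistry`, `exit_flag`, `spawn_worker` and `wait_all_threads_exit` are hypothetical names, and a shared `AtomicBool` stands in for the crossbeam exit channel and the tokio CancellationToken that the PR actually uses.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical registry: collects named thread handles plus one shared exit flag.
#[derive(Default)]
struct HandleRegistry {
    threads: Mutex<Vec<(String, JoinHandle<()>)>>,
    exit: Arc<AtomicBool>,
}

impl HandleRegistry {
    /// Remember a thread so it can be joined at shutdown. The name is only used for messages.
    fn register_thread(&self, name: &str, handle: JoinHandle<()>) {
        self.threads.lock().unwrap().push((name.to_owned(), handle));
    }

    /// Hand out the exit flag (assumption: stands in for an exit receiver or token).
    fn exit_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.exit)
    }

    /// Tell every worker that the process is shutting down.
    fn broadcast_exit_signals(&self) {
        self.exit.store(true, Ordering::SeqCst);
    }

    /// Join every registered thread and return the names of those that stopped cleanly.
    fn wait_all_threads_exit(&self) -> Vec<String> {
        // Take the handles out first so the lock is not held while joining.
        let handles = std::mem::take(&mut *self.threads.lock().unwrap());
        let mut stopped = Vec::new();
        for (name, handle) in handles {
            match handle.join() {
                Ok(()) => stopped.push(name),
                Err(_) => eprintln!("thread {} panicked", name),
            }
        }
        stopped
    }
}

/// Spawn a worker that polls the exit flag, and register its handle.
fn spawn_worker(registry: &HandleRegistry, name: &str) {
    let exit = registry.exit_flag();
    let handle = thread::spawn(move || {
        while !exit.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5));
        }
    });
    registry.register_thread(name, handle);
}

fn main() {
    let registry = HandleRegistry::default();
    for i in 0..3 {
        spawn_worker(&registry, &format!("worker-{}", i));
    }
    registry.broadcast_exit_signals();
    let stopped = registry.wait_all_threads_exit();
    println!("stopped threads: {:?}", stopped);
}
```

The point of the sketch is the ordering: broadcast first, then join, so that no worker is still running when the process exits.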

Test

  • Unit test for broadcast_exit_signals, wait_all_ckb_services_exit, register_thread, new_crossbeam_exit_rx and new_tokio_exit_rx
  • Integration test: Bats CLI test
    Ideas for implementing the integration test:
    1. Start the ckb process with ckb run.
    2. Wait about 10 seconds, then kill the ckb process.
    3. Wait for the ckb process to exit.
    4. Assert that log lines such as "all ckb threads have been stopped" and "ckb shutdown" exist in the log file.
  • Fix the Bats CLI test timeout issue

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code ci-runs-only: [ quick_checks,linters ]

Side effects

  • None

Release note

Title Only: Include only the PR title in the release note.

@eval-exec eval-exec force-pushed the exec/stop_register branch 3 times, most recently from 8c09ab7 to 1cd206f Compare June 5, 2023 09:06
@eval-exec eval-exec force-pushed the exec/stop_register branch 13 times, most recently from 2e73c71 to f69efe1 Compare June 20, 2023 07:12
@eval-exec eval-exec changed the title from "[WIP] Wait tokio tasks finish before CKB process exit" to "Wait tokio tasks finish before CKB process exit" Jun 20, 2023
@eval-exec eval-exec marked this pull request as ready for review June 20, 2023 08:19
@eval-exec eval-exec requested a review from a team as a code owner June 20, 2023 08:19
@eval-exec eval-exec requested review from zhangsoledad and removed request for a team June 20, 2023 08:19
@eval-exec eval-exec added the t:enhancement Type: Feature, refactoring. label Jun 20, 2023
@eval-exec eval-exec self-assigned this Jun 20, 2023
@eval-exec eval-exec force-pushed the exec/stop_register branch 3 times, most recently from 6d29416 to 95cc858 Compare June 20, 2023 09:05
@eval-exec eval-exec added the s:pr-created Status: PR is ready for review label Jun 20, 2023
@eval-exec eval-exec marked this pull request as draft June 20, 2023 09:10
@eval-exec eval-exec marked this pull request as ready for review June 23, 2023 01:52
@eval-exec eval-exec force-pushed the exec/stop_register branch 4 times, most recently from 75514fa to 4ba0ec3 Compare July 12, 2023 07:50
}

/// Register a thread `JoinHandle` to `CKB_HANDLES`
pub fn register_thread(name: &str, thread_handle: std::thread::JoinHandle<()>) {
Collaborator

Is the name only used for printing messages?
If so, it's not an issue when multiple names are the same.

Collaborator Author

Yes.

Collaborator

Should we publish a new version of ckb-stop-handler?
I'm not sure whether any other projects are using it.

Collaborator Author

Yes, we should.

debug!("wait thread {} done", name);
}
Err(e) => {
warn!("wait thread {}: ERROR: {:?}", name, e)
Collaborator

Collaborator Author

@eval-exec eval-exec Jul 13, 2023

This WARN log doesn't print the thread's panic message. I should use:

    warn!("wait thread {}: ERROR: {:?}", name, e.downcast_ref::<&str>())

Collaborator Author

Update:
If the panic! uses a format string with arguments, e.downcast_ref::<&str>() returns None, so I also have to try e.downcast_ref::<String>():

                // Try &str first (literal panics), then String (formatted panics).
                let msg = e.downcast_ref::<&str>().copied().unwrap_or_else(|| {
                    e.downcast_ref::<String>()
                        .map(|s| s.as_str())
                        .unwrap_or("downcast_ref didn't get any error")
                });
                warn!("wait thread {}: ERROR: {:?}", name, msg)
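
The two payload types discussed in this thread can be checked with a small standalone program. This is an illustrative sketch, not code from the PR: `panic_message` is a hypothetical helper name, and the fallback string is made up for the example.

```rust
use std::any::Any;
use std::thread;

/// Hypothetical helper: turn a panic payload from `JoinHandle::join` into a message.
fn panic_message(payload: &(dyn Any + Send)) -> &str {
    if let Some(s) = payload.downcast_ref::<&str>() {
        // `panic!("literal")` carries a `&'static str`.
        *s
    } else if let Some(s) = payload.downcast_ref::<String>() {
        // `panic!("{}", value)` carries a formatted `String`.
        s.as_str()
    } else {
        // `std::panic::panic_any` can carry any other type.
        "non-string panic payload"
    }
}

fn main() {
    let literal = thread::spawn(|| -> () { panic!("plain literal") })
        .join()
        .unwrap_err();

    let code = 42;
    let formatted = thread::spawn(move || -> () { panic!("failed with code {}", code) })
        .join()
        .unwrap_err();

    println!("literal payload:   {}", panic_message(&*literal));
    println!("formatted payload: {}", panic_message(&*formatted));
}
```

Each spawned thread also prints the default panic report to stderr. That is expected and separate from the message recovered from the join result.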

#!/usr/bin/env bats
bats_load_library 'bats-assert'
bats_load_library 'bats-support'

Collaborator

Can we add a test case where some sub-services terminate before receiving the exit notification, to confirm that wait_all_ckb_services_exit also works in this scenario? Maybe also double-check the log messages.

Collaborator Author

@eval-exec eval-exec Jul 13, 2023

Yes, we can.
I'd like to construct a bats cli test like this:

  1. Start a ckb process.
  2. Append some random bytes to ${DATA_DIR}/data/db/*.sst
  3. Wait for the running ckb process's BlockDownload thread to panic:

    ckb/sync/src/utils.rs

    Lines 171 to 173 in f6466c3

    if error_kind == InternalErrorKind::DataCorrupted {
        panic!("{}", error)
    } else {
  4. Assert that wait_all_ckb_services_exit got the panic message.
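
Independently of the Bats test, the scenario the reviewer describes (a sub-service that dies before the exit notification is sent) can be sketched at unit level with the standard library. This is an assumption-laden illustration, not the PR's code: `broadcast_exit`, `wait_all` and `run_scenario` are hypothetical names, and `std::sync::mpsc` stands in for the crossbeam channels used by ckb.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Send the exit signal to every subscriber and count successful deliveries.
/// A send error only means that the receiver is already gone, so it is ignored.
fn broadcast_exit(subscribers: &[Sender<()>]) -> usize {
    subscribers.iter().filter(|tx| tx.send(()).is_ok()).count()
}

/// Join every handle and report, per thread name, whether it exited without panicking.
fn wait_all(handles: Vec<(String, JoinHandle<()>)>) -> Vec<(String, bool)> {
    handles
        .into_iter()
        .map(|(name, handle)| {
            let clean = handle.join().is_ok();
            (name, clean)
        })
        .collect()
}

/// One healthy worker waits for the signal; one worker panics before it is sent.
fn run_scenario() -> (usize, Vec<(String, bool)>) {
    let (healthy_tx, healthy_rx) = channel::<()>();
    let (crashed_tx, crashed_rx) = channel::<()>();

    let healthy = thread::spawn(move || {
        let _ = healthy_rx.recv();
    });
    let crashed = thread::spawn(move || {
        let _rx = crashed_rx; // dropped while the panic unwinds
        panic!("simulated early failure");
    });

    // Make sure the failing worker is really gone before broadcasting.
    while !crashed.is_finished() {
        thread::sleep(Duration::from_millis(1));
    }

    let delivered = broadcast_exit(&[healthy_tx, crashed_tx]);
    let results = wait_all(vec![
        ("healthy".to_owned(), healthy),
        ("crashed".to_owned(), crashed),
    ]);
    (delivered, results)
}

fn main() {
    let (delivered, results) = run_scenario();
    println!("signals delivered: {}", delivered);
    println!("join results: {:?}", results);
}
```

Two properties matter here: the broadcast must tolerate receivers that no longer exist, and joining an already-finished thread returns immediately with its panic payload, so the wait step cannot hang on it.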

@zhangsoledad zhangsoledad added this pull request to the merge queue Jul 18, 2023
Merged via the queue into nervosnetwork:develop with commit 5e67845 Jul 18, 2023
43 checks passed
@doitian doitian mentioned this pull request Oct 9, 2023
Development

Successfully merging this pull request may close these issues.

Fix ckb graceful shutdown
3 participants