
Preserve ordering of ava_async APIs between multiple threads #2

Open
yuhc opened this issue Apr 7, 2020 · 4 comments
Labels: bug, enhancement

yuhc (Member) commented Apr 7, 2020

In the current prototype, ava_async means that an API call returns as soon as it has been sent to the API server; the API server may not have received or executed the call yet (see the toy sketch below).
In a multi-threaded program, the order in which ava_async APIs are executed by the API server can differ from the order in which they were issued, which becomes incorrect when the threads synchronize with each other.
To ensure execution correctness, ava_async should preserve the ordering of these APIs between the guest library and the API server.
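
For concreteness, here is a toy model of that return semantics (this is not AvA code; the channel and function names below are made up for illustration): the guest-side wrapper hands the command to the transport and returns immediately, while the API server receives and executes it some time later.

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

struct Command { int api_id; };

std::mutex mu;
std::queue<Command> channel;  // stand-in for the guest -> API-server channel

// Guest side: an ava_async wrapper only enqueues the command and returns.
void guest_async_call(int api_id) {
    std::lock_guard<std::mutex> lk(mu);
    channel.push({api_id});
    // Returning here does NOT mean the API server has received or run the call.
}

// API server side: drains the channel and executes commands some time later.
void api_server() {
    int executed = 0;
    while (executed < 3) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::lock_guard<std::mutex> lk(mu);
        while (!channel.empty()) {
            std::printf("server executes API %d\n", channel.front().api_id);
            channel.pop();
            ++executed;
        }
    }
}

int main() {
    std::thread server(api_server);
    guest_async_call(1);  // e.g. a forwarded cudaLaunchKernel
    guest_async_call(2);
    guest_async_call(3);
    // The guest can keep running (and synchronize with other threads)
    // before the server has executed any of these calls.
    server.join();
    return 0;
}
```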

Related issue: [wait for merge from ava-serverless].

arthurp commented Apr 7, 2020

I don't understand how this is possible. The communication channel is ordered and the dispatch is ordered, so reordering can only happen between operations on different threads, and those reorderings are allowed for async calls, since async calls are by definition allowed to return before synchronizing with the underlying library.

What am I missing in my understanding?

yuhc (Member, Author) commented Apr 8, 2020

The mistake happens when the multi-threaded program makes additional assumptions about when a call has taken effect.
For example, it can assume a CUDA kernel has been enqueued once a cudaLaunchKernel call returns. Another thread can then call cudaEventRecord and cudaStreamWaitEvent to wait for that kernel to finish (but since the kernel has not actually been enqueued on the stream, the second thread does not wait), and finally call cudaMemcpy, corrupting the kernel's input CUDA memory if the kernel has not finished. A sketch of this interleaving is shown below.
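
Here is a minimal sketch of that interleaving (an assumed repro written for illustration, not code from the TensorFlow workload). Under native CUDA it is correct, because cudaLaunchKernel returns only after the kernel is enqueued on streamA; if the launch is forwarded ava_async, thread A can signal "launched" before the API server has actually enqueued the kernel, so thread B's event/wait pair no longer orders the copy after the kernel:

```cpp
#include <cuda_runtime.h>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

__global__ void scale(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;
}

int main() {
    int n = 1 << 20;
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);
    cudaEvent_t done;
    cudaEventCreate(&done);

    std::vector<float> h_input(n, 0.0f);
    std::mutex m;
    std::condition_variable cv;
    bool launched = false;

    std::thread a([&] {
        void *args[] = { &d_buf, &n };
        // Thread A assumes the kernel is enqueued on streamA once this returns.
        cudaLaunchKernel((const void *)scale, dim3(n / 256), dim3(256),
                         args, 0, streamA);
        { std::lock_guard<std::mutex> lk(m); launched = true; }
        cv.notify_one();
    });

    std::thread b([&] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return launched; });
        // Thread B tries to order itself after the kernel via an event...
        cudaEventRecord(done, streamA);
        cudaStreamWaitEvent(streamB, done, 0);
        // ...then overwrites the kernel's input. If the launch has not actually
        // reached streamA yet, this copy races with the kernel.
        cudaMemcpyAsync(d_buf, h_input.data(), n * sizeof(float),
                        cudaMemcpyHostToDevice, streamB);
    });

    a.join();
    b.join();
    cudaDeviceSynchronize();
    cudaFree(d_buf);
    return 0;
}
```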

This error occurred when I was improving support for more complicated TensorFlow workloads. The work is in another private repo, and I'll cc you on a related commit email.

arthurp commented Apr 8, 2020

OK, I see. From a simplistic Lapis semantics perspective, this means that cudaLaunchKernel should NOT be ava_async, since it performs some action before returning.

It sounds like you are saying that we can get the ordering we need by enforcing some additional ordering on the execution of functions in the server. What ordering are you planning to enforce? I worry that enforcing an ordering will have a surprisingly high performance cost. I suspect that we should find a way to encode the need for this ordering in the spec and only enforce the ordering in cases where it is needed.
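
For illustration, here is one hypothetical form that "additional ordering" could take (not what AvA currently implements; all names below are made up): the guest library stamps every ava_async command with a global sequence number, and the API server releases commands to execution strictly in that order. The sketch also makes the performance concern concrete, since every async call from every thread funnels through a single ordered executor:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <map>
#include <mutex>

// Guest side: sequence number taken at the moment a call is issued,
// shared across all application threads.
std::atomic<uint64_t> next_seq{0};

struct Command {
    uint64_t seq;
    std::function<void()> execute;  // stand-in for the marshalled API call
};

Command make_async_command(std::function<void()> fn) {
    return Command{next_seq.fetch_add(1), std::move(fn)};
}

// Worker (API server) side: commands may arrive out of order because they
// travel on different channels or dispatcher threads, but they are executed
// strictly in sequence order.
class OrderedExecutor {
public:
    void submit(Command cmd) {
        std::lock_guard<std::mutex> lk(mu_);
        pending_.emplace(cmd.seq, std::move(cmd.execute));
        cv_.notify_all();
    }

    // Single executor loop; runs the next expected command once it has arrived.
    void run(uint64_t total) {
        for (uint64_t expected = 0; expected < total; ++expected) {
            std::unique_lock<std::mutex> lk(mu_);
            cv_.wait(lk, [&] { return pending_.count(expected) > 0; });
            auto fn = std::move(pending_[expected]);
            pending_.erase(expected);
            lk.unlock();
            fn();  // e.g. the real cudaLaunchKernel / cudaEventRecord
        }
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::map<uint64_t, std::function<void()>> pending_;
};

int main() {
    OrderedExecutor exec;
    Command c1 = make_async_command([] { std::puts("launch kernel"); });
    Command c2 = make_async_command([] { std::puts("record event"); });
    // Even if c2 reaches the worker first, it runs after c1.
    exec.submit(std::move(c2));
    exec.submit(std::move(c1));
    exec.run(2);
    return 0;
}
```

Encoding the need for ordering in the spec, as suggested above, would let the server restrict this kind of total order to just the calls that actually depend on each other, instead of serializing every async call.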

rossbach (Member) commented Apr 8, 2020 via email

yuhc added the bug and enhancement labels on Jun 1, 2021