
[Fix] Fix the Error of q, k, and v states must have the same dtype when using flash attention forward. #15

Merged
2 commits merged into Leeroo-AI:main from jacklanda:fix-use-cache-error on May 30, 2024

Conversation

jacklanda (Contributor) opened this pull request:
Steps for Error Reproduction

Enable the argument use_cache=True in the Hugging Face GenerationConfig, like the following:

generation_config = GenerationConfig(
    bos_token_id=128000,
    eos_token_id=128001,
    pad_token_id=self.tokenizer.pad_token_id,
    use_cache=True,
)

Then simply pass this generation config with use_cache=True to the model and run the forward pass as usual; I believe you will hit the same error I did.
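
For context, here is a minimal reproduction sketch, assuming a plain transformers setup rather than the exact mergoo checkpoint from this report (the model id is a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Hypothetical Llama-3-style checkpoint; any model loaded with
# attn_implementation="flash_attention_2" should exercise the same code path.
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

generation_config = GenerationConfig(
    bos_token_id=128000,
    eos_token_id=128001,
    pad_token_id=tokenizer.pad_token_id,
    use_cache=True,  # enabling the KV cache is what triggers the dtype mismatch
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=generation_config, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))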

Error Message

I get the following error message: "query and key must have the same dtype". I found that this is caused by a dtype mismatch between the query and value states.

Potential Solution

I suspect an implicit cast is applied to some of the q, k, and v states along the way. Hence, the fix casts q, k, and v back to the same target dtype before the flash attention forward, as sketched below.
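
The following is a minimal sketch of that casting approach (the helper name and the fallback to the query's dtype are assumptions for illustration, not the exact committed code):

import torch

def ensure_same_dtype(query_states, key_states, value_states, target_dtype=None):
    """Cast q, k, and v to a common dtype before calling flash attention.

    If no target dtype is given, fall back to the query's dtype. The idea is to
    detect a mismatch and cast back, instead of letting flash attention raise
    "query and key must have the same dtype".
    """
    if target_dtype is None:
        target_dtype = query_states.dtype
    if query_states.dtype != target_dtype:
        query_states = query_states.to(target_dtype)
    if key_states.dtype != target_dtype:
        key_states = key_states.to(target_dtype)
    if value_states.dtype != target_dtype:
        value_states = value_states.to(target_dtype)
    return query_states, key_states, value_states

# Usage inside the attention forward (sketch):
# query_states, key_states, value_states = ensure_same_dtype(
#     query_states, key_states, value_states, target_dtype=torch.bfloat16
# )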

Reference

The committed code closely follows the existing implementation in mergoo.

@jacklanda jacklanda changed the title [Bug] Query and key must have the same dtype when use flash attention forward [Fix] Fix the Error of q, k, and v states must have the same dtype when using flash attention forward. May 28, 2024
mergoo/models/modeling_llama.py (review thread, outdated, resolved)
@jacklanda (Contributor, Author) left a comment:

Submitted a new commit: add dtype checking for q, k, and v states before auto-casting.

mergoo/models/modeling_llama.py (review thread, outdated, resolved)
@jacklanda jacklanda requested a review from gitsailor5 May 29, 2024 03:07
@gitsailor5 (Contributor) left a comment:

looks good!

@gitsailor5 gitsailor5 merged commit c73a047 into Leeroo-AI:main May 30, 2024
1 check passed
@jacklanda jacklanda deleted the fix-use-cache-error branch May 30, 2024 15:33