Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable auto vectorization of xxhash64, when AVX512 is present. #3819

Closed
TocarIP opened this issue Nov 13, 2023 · 5 comments · Fixed by #3933
Closed

Disable auto vectorization of xxhash64, when AVX512 is present. #3819

TocarIP opened this issue Nov 13, 2023 · 5 comments · Fixed by #3933
Assignees

Comments

@TocarIP
Copy link
Contributor

TocarIP commented Nov 13, 2023

AVX512 adds support for VPMULLQ instructions, which makes is possible to vectorize XXH64_update. Here is a minimal reproducer where clang already vectrozies it with avx512 enabled: https://godbolt.org/z/cnM3986dx . Unfortunately some architectures (e.g skylake) have a very significant latency (uops.info report 15+ cycles), so this makes xxhash ~2x slower. Trivial benchmark shows BM_hash/16M 1.29ms ± 3% vs BM_hash/16M 2.74ms ± 4%. We already disable vectorization for xxhash32 if sse4 is present, so we should probably do the same for xxhash64 and avx512.

@Cyan4973
Copy link
Contributor

We already disable vectorization for xxhash32 if sse4 is present, so we should probably do the same for xxhash64 and avx512.

Indeed,
this seems like the same issue,
and we should probably use a similar solution.

Cyan4973 added a commit that referenced this issue Nov 13, 2023
List of updates : https://github.com/Cyan4973/xxHash/releases/tag/v0.8.2

This is also a preparation task before taking care of #3819
@TocarIP
Copy link
Contributor Author

TocarIP commented Nov 16, 2023

Someone asked me about AMD offline so might as well post here. On genoa (Zen4) AVX512 is actually ~30% faster ( 1135871 ns/op vs 885935 ns/op).

@Cyan4973
Copy link
Contributor

OK, so it makes the situation a bit less clear,
since avx512 is now sometimes beneficial, sometimes detrimental.
It probably opens the door towards offering a user choice on this topic.

However, it doesn't remove the question of "what's a good default",
and if I read the situation correctly, it seems that disabling avx512 for xxh64 remains a reasonable default at this point in time.

@TocarIP
Copy link
Contributor Author

TocarIP commented Nov 17, 2023

Regression on Intel is bigger than gain on AMD, so disabled by default makes sense. Using the same XXH_ENABLE_AUTOVECTORIZE to allow opt-in is probably the best.

gcflymoto pushed a commit to gcflymoto/zstd that referenced this issue Dec 9, 2023
List of updates : https://github.com/Cyan4973/xxHash/releases/tag/v0.8.2

This is also a preparation task before taking care of facebook#3819
@Cyan4973
Copy link
Contributor

Cyan4973 commented Mar 6, 2024

Cyan4973/xxHash#924

@Cyan4973 Cyan4973 self-assigned this Mar 8, 2024
hswong3i pushed a commit to alvistack/facebook-zstd that referenced this issue Mar 27, 2024
List of updates : https://github.com/Cyan4973/xxHash/releases/tag/v0.8.2

This is also a preparation task before taking care of facebook#3819
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants