fix: CpuGemmDirectConv2d: accumulate in fp32 unless in fast_mode#1287
Sqvid wants to merge 1 commit into ARM-software:main
Conversation
Change-Id: Id00f8e17b3349893164eb0b7edd616345488515e Signed-off-by: Siddhartha Menon <siddhartha.menon@arm.com>
Dongsung-arm
approved these changes
May 5, 2026
Dongsung-arm
left a comment
Looks good to me.
This matches the intended behavior: FP32 accumulation by default, and only disabled when enable_fast_math is set.
We have a bug in PyTorch+oneDNN+ACL where numerical errors were observed in certain f16 convs.
PyTorch issue: pytorch/pytorch#177245
oneDNN issue: uxlfoundation/oneDNN#5106
The root cause is that we are accumulating in f16 rather than in f32 even when fast_mode is false. This happens because CpuGemmDirectConv2d::configure() calls CpuGemmAssemblyDispatch::configure() here:

ComputeLibrary/src/cpu/operators/CpuGemmDirectConv2d.cpp, line 136 in d619e50

which does the following:

ComputeLibrary/src/cpu/operators/internal/CpuGemmAssemblyDispatch.cpp, lines 853 to 856 in d619e50
My solution is therefore to change CpuGemmDirectConv2d::init_assembly_metadata() such that use_fp32_acc is true unless enable_fast_math is set. This resolves the bug at the oneDNN level. Hopefully @robert-hardwick can confirm whether this fixes the PyTorch bug as well.