So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes).

On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400.

I could not see any significant impact of this offset.

On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0.

So let's just go with 0!

Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!
This commit is contained in:
Benoit Jacob 2015-02-25 12:37:14 -05:00
parent 531fa9de77
commit 692136350b

View File

@ -878,7 +878,7 @@ void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,Conjuga
EIGEN_ASM_COMMENT("end step of gebp micro kernel 3pX4"); \
} while(false)
internal::prefetch(blB + 8 * pk); /* Bug 953 */
internal::prefetch(blB);
EIGEN_GEBP_ONESTEP(0);
EIGEN_GEBP_ONESTEP(1);
EIGEN_GEBP_ONESTEP(2);