As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable. Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).
This is otherwise causing unpredictable results in some TF tests.
(cherry picked from commit 3a087ccb99b454dc34484333e608e836e7032213)