Added benchmarks for contraction on CPU.

2025-09-17 11:53:12 +08:00 · 2016-05-13 14:32:17 -07:00
commit 069a0b04d7
parent c4fc8b70ec
2 changed files with 45 additions and 3 deletions
--- a/bench/tensors/README
+++ b/bench/tensors/README
@@ -1,4 +1,6 @@
-Each benchmark comes in 2 flavors: one that runs on CPU, and one that runs on GPU.
+The tensor benchmark suite is made of several parts.
 The first part is a generic suite, in which each benchmark comes in 2 flavors: one that runs on CPU, and one that runs on GPU.
 To compile the floating point CPU benchmarks, simply call:
 g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
@@ -6,7 +8,8 @@ g++ tensor_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG
 To compile the floating point GPU benchmarks, simply call:
 nvcc tensor_benchmarks_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -arch compute_35 -o benchmarks_gpu
-
+We also provide a version of the generic GPU tensor benchmarks that uses half floats (aka fp16) instead of regular floats. To compile these benchmarks, simply call the command line below. You'll need a recent GPU that supports compute capability 5.3 or higher to run them and nvcc 7.5 or higher to compile the code.
-To compile the half float GPU benchmarks, simply call the command line below. You'll need a recent GPU that supports compute capability 5.3 or higher to run them and nvcc 7.5 or higher to compile the code.
 nvcc tensor_benchmarks_fp16_gpu.cu benchmark_main.cc -I ../../ -std=c++11 -O2 -DNDEBUG -arch compute_53 -o benchmarks_fp16_gpu
 Last but not least, we also provide a suite of benchmarks to measure the scalability of the contraction code on CPU. To compile these benchmarks, call:
 g++ contraction_benchmarks_cpu.cc benchmark_main.cc -I ../../ -std=c++11 -O3 -DNDEBUG -pthread -mavx -o benchmarks_cpu
--- a/bench/tensors/contraction_benchmarks_cpu.cc
+++ b/bench/tensors/contraction_benchmarks_cpu.cc
@@ -0,0 +1,39 @@
 #define EIGEN_USE_THREADS
 #include <string>
 #include "tensor_benchmarks.h"
 #define CREATE_THREAD_POOL(threads)             \
 Eigen::ThreadPool pool(threads);                \
 Eigen::ThreadPoolDevice device(&pool, threads);
 // Contractions for number of threads ranging from 1 to 32
 // Dimensions are Rows, Cols, Depth
 #define BM_ContractionCPU(D1, D2, D3)                                         \
  static void BM_##Contraction##_##D1##x##D2##x##D3(int iters, int Threads) { \
    StopBenchmarkTiming();                                                    \
    CREATE_THREAD_POOL(Threads);                                              \
    BenchmarkSuite<Eigen::ThreadPoolDevice, float> suite(device, D1, D2, D3); \
    suite.contraction(iters);                                                 \
  }                                                                           \
  BENCHMARK_RANGE(BM_##Contraction##_##D1##x##D2##x##D3, 1, 32);
 // Vector Matrix and Matrix Vector products
 BM_ContractionCPU(1, 2000, 500);
 BM_ContractionCPU(2000, 1, 500);
 // Various skinny matrices
 BM_ContractionCPU(250, 3, 512);
 BM_ContractionCPU(1500, 3, 512);
 BM_ContractionCPU(512, 800, 4);
 BM_ContractionCPU(512, 80, 800);
 BM_ContractionCPU(512, 80, 13522);
 BM_ContractionCPU(1, 80, 13522);
 BM_ContractionCPU(3200, 512, 4);
 BM_ContractionCPU(3200, 512, 80);
 BM_ContractionCPU(3200, 80, 512);
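
Note on the dimensions used above: the following is a minimal standalone sketch, not the Eigen implementation and not part of this commit. It takes the "Rows, Cols, Depth" comment at face value and assumes each benchmark multiplies a Rows x Depth matrix by a Depth x Cols matrix. The argument order that BenchmarkSuite actually expects is defined in tensor_benchmarks.h, which is not shown in this diff. The helpers `naive_contraction` and `contraction_flops` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): multiplies a rows x depth matrix A
// by a depth x cols matrix B, both stored row-major, and returns the
// rows x cols result C. This is a plain single-threaded triple loop.
std::vector<float> naive_contraction(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t depth) {
  std::vector<float> c(rows * cols, 0.0f);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t k = 0; k < depth; ++k) {
      const float a_ik = a[i * depth + k];
      for (std::size_t j = 0; j < cols; ++j) {
        c[i * cols + j] += a_ik * b[k * cols + j];
      }
    }
  }
  return c;
}

// Hypothetical helper: one multiply and one add per (row, col, depth) triple.
// The count is symmetric in its arguments, so it does not depend on which
// of the three macro parameters is the contracted dimension.
std::size_t contraction_flops(std::size_t rows, std::size_t cols,
                              std::size_t depth) {
  return 2 * rows * cols * depth;
}
```

Under this reading, the BM_ContractionCPU(1, 2000, 500) case performs 2 * 1 * 2000 * 500 = 2,000,000 floating point operations per contraction, which is why the vector-matrix and skinny-matrix shapes are useful for checking how well the work spreads across 1 to 32 threads.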