I could not find a framework that makes General Matrix Multiply (GEMM) easy to use, so I wrote one. The latest version is available here.
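For readers new to GEMM, the operation is C = alpha * op(A) * op(B) + beta * C, where op(X) is either X or its transpose. The following is a minimal, unoptimized reference sketch in C, not the framework's own code. The function name `gemm_ref`, its argument order, and the row-major layout are assumptions made for illustration; the `ta` and `tb` flags are meant to mirror the `TA` and `TB` values printed in the benchmark output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference GEMM (illustration only, not the framework's API):
 *   C = alpha * op(A) * op(B) + beta * C, row-major storage.
 * op(X) is X when the flag is 0 and X^T when the flag is nonzero.
 * op(A) is m x k, op(B) is k x n, C is m x n.
 * lda, ldb, ldc are the row strides of the arrays as stored. */
static void gemm_ref(int ta, int tb, int m, int n, int k,
                     double alpha, const double *a, int lda,
                     const double *b, int ldb,
                     double beta, double *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++) {
                /* With ta set, A is stored as k x m, so op(A)[i][p] = A[p][i]. */
                double av = ta ? a[(size_t)p * lda + i] : a[(size_t)i * lda + p];
                /* With tb set, B is stored as n x k, so op(B)[p][j] = B[j][p]. */
                double bv = tb ? b[(size_t)j * ldb + p] : b[(size_t)p * ldb + j];
                sum += av * bv;
            }
            double *out = &c[(size_t)i * ldc + j];
            /* When beta is 0, C is overwritten without reading its old value. */
            *out = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *out;
        }
    }
}
```

A triple loop like this is useful mainly as a correctness baseline: the output of an SSE or AVX kernel can be compared against it on small matrices before timing the fast version.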

Benchmark

$ clang -Ofast -o axpy axpy.c -march=native
$ ./axpy 
axpy Multiplication 10000: 0.000140 ms
axpy Multiplication 100000: 0.001506 ms
axpy Multiplication 1000000: 0.022846 ms

$ clang -Ofast -o dgemm_sse dgemm_sse.c -mavx
$ ./dgemm_sse 
Matrix Multiplication 10x10 * 10x10, TA=0, TB=0: 0.000028 ms
Matrix Multiplication 100x100 * 100x100, TA=0, TB=0: 0.004218 ms
Matrix Multiplication 1000x1000 * 1000x1000, TA=0, TB=0: 2.819324 ms

$ clang -Ofast -o dgemm_avx dgemm_avx.c -mavx
$ ./dgemm_avx 
Matrix Multiplication 32x32 * 32x32, TA=0, TB=0: 0.000150 ms
Matrix Multiplication 100x100 * 100x100, TA=0, TB=0: 0.003353 ms
Matrix Multiplication 1000x1000 * 1000x1000, TA=0, TB=0: 2.354636 ms

$ clang -Ofast -o sgemm_avx256 sgemm_avx256.c -mavx
$ ./sgemm_avx256 
Matrix Multiplication 32x32 * 32x32, TA=0, TB=0: 0.000104 ms
Matrix Multiplication 100x100 * 100x100, TA=0, TB=0: 0.001764 ms
Matrix Multiplication 1000x1000 * 1000x1000, TA=0, TB=0: 1.725346 ms
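
The axpy rows above time the vector update y = a * x + y for the given vector length. As a rough sketch of what such a measurement involves, the block below gives a scalar axpy and a millisecond conversion helper. The names `axpy_ref` and `elapsed_ms`, the use of single precision, and the use of the standard `clock()` timer are all assumptions for illustration; the actual benchmark programs may measure time differently.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical scalar axpy (illustration only): y[i] = a * x[i] + y[i].
 * With -Ofast and -march=native the compiler may auto-vectorize this loop. */
static void axpy_ref(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Convert two clock() readings into elapsed CPU time in milliseconds.
 * clock() reports processor time, not wall-clock time. */
static double elapsed_ms(clock_t start, clock_t end)
{
    return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

To time a kernel with these helpers, read `clock()` before and after a loop that calls `axpy_ref` many times, then divide the result of `elapsed_ms` by the repetition count. A single call on a short vector usually finishes faster than the timer can resolve.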

axpy.c

dgemm_sse.c

dgemm_avx.c

sgemm_avx256.c

matmul4x4_sse

matmul4x4_avx