Eigen + MKL or OpenBLAS slower than Numpy/Scipy + OpenBLAS

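I'm starting out with C++ at the moment and want to work with matrices and speed things up in general. I previously used Python + Numpy + OpenBLAS.
I thought C++ + Eigen + MKL might be faster, or at least not slower.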

My C++ code:

#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>

using namespace std;
using namespace Eigen;

int main()
{
    int n = Eigen::nbThreads( );
    cout <<"#Threads:" << n << endl;

    uint16_t size = 4000;
    MatrixXd a = MatrixXd::Random(size,size);

    clock_t start = clock ();
    PartialPivLU<MatrixXd> lu = PartialPivLU<MatrixXd>(a);

    float timeElapsed = double( clock() - start ) / CLOCKS_PER_SEC;
    cout <<"Elasped time is" << timeElapsed <<" seconds." << endl ;
}

My Python code:

import numpy as np
from time import time
from scipy import linalg as la

size = 4000

A = np.random.random((size, size))

t = time()
LU, piv = la.lu_factor(A)
print(time()-t)

My timings:

C++ 2.4s
Python 1.2s

Why is C++ slower than Python?

I am compiling the C++ code with:

g++ main.cpp -o main -lopenblas -O3 -fopenmp -DMKL_LP64 -I/usr/local/include/mkl/include

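MKL is definitely working: if I disable it, the run time is around 13 seconds.

I also tried C++ + OpenBLAS, which gives me around 2.4s as well.

Any ideas why C++ and Eigen are slower than numpy/scipy?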

Related discussion

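The timing is simply wrong. This is a typical symptom of wall-clock time versus CPU time: clock() reports the CPU time used by the process, which on Linux is summed over all threads, so a multi-threaded factorization looks slower than it really is. When I use system_clock from the <chrono> header, it "magically" becomes faster.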

#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>

int main()
{
    int const n = Eigen::nbThreads( );
    std::cout << "#Threads: " << n << std::endl;

    int const size = 4000;
    Eigen::MatrixXd a = Eigen::MatrixXd::Random(size,size);

    // Wall-clock timestamp before the factorization
    auto start = std::chrono::system_clock::now();

    Eigen::PartialPivLU<Eigen::MatrixXd> lu(a);

    // Wall-clock timestamp after the factorization
    auto stop = std::chrono::system_clock::now();

    std::cout << "Elasped time is "
              << std::chrono::duration<double>{stop - start}.count()
              << " seconds." << std::endl;
}

I compile with

icc -O3 -mkl -std=c++11 -DNDEBUG -I/usr/include/eigen3/ test.cpp

and get the output

#Threads: 1
Elasped time is 0.295782 seconds.

Your Python version reports 0.399146080017 on my machine.
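The gap between wall-clock time and CPU time can be reproduced in Python without Eigen or a BLAS library. The following is only a sketch under stated assumptions: the helper names `hash_chunk` and `timed_parallel` are hypothetical, and hashing large buffers with `hashlib` is a stand-in for the multi-threaded LU factorization, chosen because CPython's `hashlib` releases the GIL for large inputs so the threads can actually run in parallel.

```python
import hashlib
import threading
import time


def hash_chunk(data: bytes, rounds: int) -> None:
    """CPU-bound stand-in workload (hypothetical, not the LU factorization)."""
    for _ in range(rounds):
        # CPython's hashlib releases the GIL while hashing large buffers,
        # so several of these calls can run on different cores at once.
        hashlib.sha256(data).digest()


def timed_parallel(n_threads: int, data: bytes, rounds: int):
    """Run the workload on n_threads threads; return (wall_s, cpu_s)."""
    threads = [
        threading.Thread(target=hash_chunk, args=(data, rounds))
        for _ in range(n_threads)
    ]
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of the whole process
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros
    wall, cpu = timed_parallel(4, payload, 10)
    print(f"wall-clock: {wall:.3f} s, CPU (all threads): {cpu:.3f} s")
```

On a multi-core machine the CPU figure would be expected to exceed the wall-clock figure, roughly in proportion to the number of threads that ran in parallel. That is the same effect that makes a `clock()`-based measurement of a threaded MKL or OpenBLAS call look slow.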

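Alternatively, to get comparable timings, you could use time.clock() (CPU time) instead of time.time() (wall-clock time) in Python. Note that time.clock() was removed in Python 3.8; time.process_time() is the current way to read CPU time.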
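A minimal sketch of such a like-for-like measurement, reporting both clocks for a single call. The helper name `time_both` is hypothetical. It uses `time.perf_counter()` for wall-clock time and `time.process_time()` for CPU time, both from the standard library, and `time.sleep` serves only as a demonstration workload.

```python
import time


def time_both(fn, *args, **kwargs):
    """Call fn once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of this process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    # Sleeping takes wall-clock time but almost no CPU time,
    # so the two clocks disagree in the opposite direction here.
    _, wall, cpu = time_both(time.sleep, 0.2)
    print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

For the benchmark in the question, the call would presumably look like `time_both(la.lu_factor, A)`. The CPU figure is the one to compare against a C++ measurement taken with `clock()`, and the wall-clock figure against one taken with `std::chrono`.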