C++11 regex slower than Python

Hi, I would like to understand why the following code, which splits a string using a regex,

#include <regex>
#include <vector>
#include <string>

std::vector<std::string> split(const std::string &s){
    // Compiled once, on the first call, and reused afterwards.
    static const std::regex rsplit(" +");
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    auto rit = std::sregex_token_iterator(s.begin(), s.end(), rsplit, -1);
    auto rend = std::sregex_token_iterator();
    auto res = std::vector<std::string>(rit, rend);
    return res;
}

int main(){
    for(auto i=0; i< 10000; ++i)
        split("a b c");
    return 0;
}

is slower than the following Python code:

import re
for i in range(10000):
    re.split(' +', 'a b c')

Here are the timings:

> python test.py  0.05s user 0.01s system 94% cpu 0.070 total
./test  0.26s user 0.00s system 99% cpu 0.296 total

I'm using clang++ on OS X.

Compiling with -O3 brings it down to 0.09s user 0.00s system 99% cpu 0.109 total.

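Comments

Are you running a debug build? When using templates, make sure optimizations are on and debugging is off; otherwise a lot of safety checks end up in your code.
No, just clang++ -o test.o -c -std=c++11 -stdlib=libc++ -Wall -O3 test.cpp
Well, Python is awesome, so this is to be expected.
They don't do the same thing. For example, the C++ code does string concatenation and the Python code doesn't.
I think you're spending a lot of time in C++ creating temporary strings. Perhaps tosplit + '+' is faster than tosplit + "+".
@interjay Good point, but it made no difference for me. I've updated my question.
For Python, the regex only needs to be compiled/optimized once. The C++ regex library will build and optimize the regex again and again. For the record, try defining the rsplit regex as a static constant. As for Python, the re library can work together with the compiler to maintain a list of optimized regexes.
You might be generating a lot of regexes in the C++ version. What happens if you move the declaration of rsplit outside of split?
Making it static or moving it outside brings it down to 0.09 with -O3.
OK, so one optimization netted you 20%. Now find the next one. :) Keep in mind that Python (the interpreter) already implements most of these optimizations.
Also, increase the number of loop iterations to make the timing more consistent.
@DiegoSevilla Increased to 100000: Python 0.31s vs. C++ (-O4) 0.9s.
Very relevant: stackoverflow.com/questions/9378500/…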
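This is why people use Python for tasks like this: it relieves the programmer of the need to analyze the technicalities that affect performance.
@纳蒂科尔 - Good answer. Also, I'd guess it's all the memory management (switching between iterators and vectors). Try returning rit right away and see whether you get a huge performance boost.
@FrankieTheKneeMan: I did, and the performance gain is huge, but Python puts all of this into a list.
Of course it does, because that's the Python way of dealing with that object. But C++ (or rather, whoever wrote the library) doesn't see the world the way Python does, and so only gives you the option of using iterators. I'd bet you could write a fairly efficient implementation in C++, but you've specifically chosen an inefficient way to implement this solution. Efficiency is a game of inches, and while I'm sure Python iterates over the string only once while matching, you're doing it at least twice, if not more, depending on how s.end() is implemented.
Which CRT/heap are you using? Python has an optimized small-block heap, so allocating and, more importantly, deallocating the string "a" doesn't hurt the way it does with a C++ library that lacks such a thing. I suspect this is the root of many "why is <scripting language> faster than C++" questions.
Ben: in all the STL implementations I've seen, strings shorter than 16 characters or so are allocated on the stack.
I can roughly reproduce your results, and simply replacing libc++'s std::regex with boost::regex makes the C++ version beat Python by about 10-15%. I don't think libc++'s implementation is particularly efficient yet.
@Cubbi: Thanks for taking a look at the implementation.
It occurs to me that actually allocating a vector res in main() and writing res = split("a b c") might give the optimizer more to work with (edit: or another pattern to match, anyway), since the code currently has nowhere to put the returned res. That said, since the compiler evidently doesn't just eliminate the whole program as a no-op, the analysis it does is limited. Have you tried this with a recent GCC?
@ViktorSehr: Then you haven't seen the libstdc++ implementation (prior to C++11); it used COW and systematically required a memory allocation.
There is a proposal in C++1y for string_ref, a string-like interface with no ownership of the underlying memory. I hope they make it compatible with this code!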

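Notice

See also this answer: https://stackoverflow.com/a/21708215, which is the basis for EDIT 2 at the bottom.

I've increased the loop to 1000000 iterations to get a better timing measure.

This is my Python timing: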

real 0m2.038s
user 0m2.009s
sys 0m0.024s

Here's an equivalent of your code, just a bit prettier:

#include <regex>
#include <vector>
#include <string>

std::vector<std::string> split(const std::string &s, const std::regex &r)
{
    return {
        std::sregex_token_iterator(s.begin(), s.end(), r, -1),
        std::sregex_token_iterator()
    };
}

int main()
{
    const std::regex r(" +");
    for(auto i=0; i < 1000000; ++i)
        split("a b c", r);
    return 0;
}

Timing:

real 0m5.786s
user 0m5.779s
sys 0m0.005s

This is an optimization that avoids constructing/allocating the vector and string objects on every call:

#include <regex>
#include <vector>
#include <string>

void split(const std::string &s, const std::regex &r, std::vector<std::string> &v)
{
    auto rit = std::sregex_token_iterator(s.begin(), s.end(), r, -1);
    auto rend = std::sregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        v.push_back(*rit);
        ++rit;
    }
}

int main()
{
    const std::regex r(" +");
    std::vector<std::string> v;
    for(auto i=0; i < 1000000; ++i)
        split("a b c", r, v);
    return 0;
}

Timing:

real 0m3.034s
user 0m3.029s
sys 0m0.004s

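This is almost a 100% performance improvement (5.786s down to 3.034s).

The vector is created before the loop and can grow its memory during the first iteration. After that, clear() does not deallocate anything; the vector keeps its memory and constructs the strings in place.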

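Another performance boost would be to avoid constructing/destroying std::string altogether, and with it the allocation/deallocation of those objects.

This is a tentative step in that direction: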

#include <cstring>  // std::strlen
#include <regex>
#include <vector>
#include <string>

void split(const char *s, const std::regex &r, std::vector<std::string> &v)
{
    auto rit = std::cregex_token_iterator(s, s + std::strlen(s), r, -1);
    auto rend = std::cregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        v.push_back(*rit);
        ++rit;
    }
}

Timing:

real 0m2.509s
user 0m2.503s
sys 0m0.004s

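An ultimate improvement would be to return a std::vector of const char *, where each char pointer points to a substring inside the original s C string itself. The problem is that you can't do that, because none of those substrings would be null-terminated (for this, see the use of C++1y string_ref in a later sample).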

This last improvement could also be achieved like this:

#include <regex>
#include <vector>
#include <string>

void split(const std::string &s, const std::regex &r, std::vector<std::string> &v)
{
    auto rit = std::cregex_token_iterator(s.data(), s.data() + s.length(), r, -1);
    auto rend = std::cregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        v.push_back(*rit);
        ++rit;
    }
}

int main()
{
    const std::regex r(" +");
    std::vector<std::string> v;
    for(auto i=0; i < 1000000; ++i)
        split("a b c", r, v); // the constant string("a b c") should be optimized
                              // by the compiler. I got the same performance as
                              // if it was an object outside the loop
    return 0;
}

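I built the samples with clang 3.3 (from trunk) and -O3. Maybe other regex libraries perform better, but in any case, allocations/deallocations are frequently a performance hit.

Boost

This is the boost::regex timing for the C string argument sample: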

real 0m1.284s
user 0m1.278s
sys 0m0.005s

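Same code: the boost::regex and std::regex interfaces in this sample are identical, so only the namespace and the include needed to change.

Best wishes for it to get better over time; C++ stdlib regex implementations are in their infancy.

EDIT

For the sake of completeness, I've tried this (the "ultimate improvement" suggestion mentioned above), and it did not improve on the performance of the equivalent std::vector<std::string> &v version in any way: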

#include <regex>
#include <vector>
#include <string>

template<typename Iterator> class intrusive_substring
{
private:
    Iterator begin_, end_;

public:
    intrusive_substring(Iterator begin, Iterator end) : begin_(begin), end_(end) {}

    Iterator begin() {return begin_;}
    Iterator end() {return end_;}
};

using intrusive_char_substring = intrusive_substring<const char *>;

void split(const std::string &s, const std::regex &r, std::vector<intrusive_char_substring> &v)
{
    auto rit = std::cregex_token_iterator(s.data(), s.data() + s.length(), r, -1);
    auto rend = std::cregex_token_iterator();
    v.clear(); // This can potentially be optimized away by the compiler because
               // the intrusive_char_substring destructor does nothing, so
               // resetting the internal size is the only thing to be done.
               // Formerly allocated memory is maintained.
    while(rit != rend)
    {
        v.emplace_back(rit->first, rit->second);
        ++rit;
    }
}

int main()
{
    const std::regex r(" +");
    std::vector<intrusive_char_substring> v;
    for(auto i=0; i < 1000000; ++i)
        split("a b c", r, v);

    return 0;
}

This has to do with the array_ref and string_ref proposal. Here's sample code using it:

#include <regex>
#include <vector>
#include <string>
#include <string_ref>  // header from the string_ref proposal, not a standard header

void split(const std::string &s, const std::regex &r, std::vector<std::string_ref> &v)
{
    auto rit = std::cregex_token_iterator(s.data(), s.data() + s.length(), r, -1);
    auto rend = std::cregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        // Store a (pointer, length) view into s instead of copying the token.
        v.emplace_back(rit->first, rit->length());
        ++rit;
    }
}

int main()
{
    const std::regex r(" +");
    std::vector<std::string_ref> v;
    for(auto i=0; i < 1000000; ++i)
        split("a b c", r, v);

    return 0;
}

For the version of split that returns a vector, it will also be cheaper to return a vector of string_ref than a vector of string copies.

EDIT 2

This new solution is able to deliver its output by return value. I have used Marshall Clow's string_view (string_ref got renamed) libc++ implementation, found at https://github.com/mclow/string_view.

#include <string>
#include <string_view>
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace std::experimental;
using namespace boost;

string_view stringfier(const cregex_token_iterator::value_type &match) {
    return {match.first, static_cast<size_t>(match.length())};
}

using string_view_iterator =
    transform_iterator<decltype(&stringfier), cregex_token_iterator>;

iterator_range<string_view_iterator> split(string_view s, const regex &r) {
    return {
        string_view_iterator(
            cregex_token_iterator(s.begin(), s.end(), r, -1),
            stringfier
        ),
        string_view_iterator()
    };
}

int main() {
    const regex r(" +");
    for (size_t i = 0; i < 1000000; ++i) {
        split("a b c", r);
    }
}

Timing:

real 0m0.385s
user 0m0.385s
sys 0m0.000s

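Note how much faster this is compared with the previous results. Of course, it isn't filling a vector inside the loop, and it probably isn't matching anything in advance either, but you get a range anyway, which you can iterate over with a range-based for, or even use to fill a vector.

Since ranging over the iterator_range creates string_views over the original string (or a null-terminated string), this gets very lightweight and never generates unnecessary string allocations.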

Just to compare, using this split implementation but actually filling a vector, we could do this:

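// In addition to the includes above, this needs <vector>, <iterator>
// and <boost/range/algorithm/copy.hpp>.
int main() {
    const regex r(" +");
    vector<string_view> v;
    v.reserve(10);
    for (size_t i = 0; i < 1000000; ++i) {
        copy(split("a b c", r), back_inserter(v));
        v.clear();
    }
}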

This uses the Boost range copy algorithm to fill the vector in each iteration; the timing is:

real 0m1.002s
user 0m0.997s
sys 0m0.004s

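As can be seen, there is not much difference compared with the optimized string_view output-parameter version.

Note also that there is a proposal for a std::split that would work like this.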