Why is splitting a string slower in C++ than Python?

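I'm trying to convert some code from Python to C++ to gain a bit of speed and sharpen my rusty C++ skills. Yesterday I was shocked when a naive implementation of reading lines from stdin turned out to be much faster in Python than in C++ (see that question). Today I finally figured out how to split a string in C++ with merging delimiters (similar semantics to Python's split()), and I am now experiencing deja vu! My C++ code takes much longer to do the work (though not an order of magnitude longer, as was the case in yesterday's lesson).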

Python code:

#!/usr/bin/env python
from __future__ import print_function
import time
import sys

count = 0
start_time = time.time()
dummy = None

for line in sys.stdin:
    dummy = line.split()
    count += 1

delta_sec = int(time.time() - start_time)
print("Python: Saw {0} lines in {1} seconds.".format(count, delta_sec), end='')
if delta_sec > 0:
    lps = int(count/delta_sec)
    print(" Crunch Speed: {0}".format(lps))
else:
    print('')

C++ code:

#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the first delimiter after the start of the token
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the vector
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }
}

void split2(vector<string> &tokens, const string &str, char delim=' ') {
    stringstream ss(str); //convert string to stream
    string item;
    while(getline(ss, item, delim)) {
        tokens.push_back(item); //add token to vector
    }
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
        // split1(spline, input_line);
        split2(spline, input_line);

        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp

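Note that I tried two different split implementations. One (split1) uses string methods to search for tokens; it can merge consecutive delimiters and handle any number of tokens (it comes from here). The second (split2) uses getline to read the string as a stream, does not merge delimiters, and only supports a single delimiter character (that one was posted by several StackOverflow users in answers to string-splitting questions).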
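The difference between the two semantics is easiest to see on a line that contains two consecutive spaces. The following is only a minimal sketch, not the code that was benchmarked; the names `split_merging` and `split_single` are made up for illustration, and both return their tokens by value for brevity.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Merging split: a run of delimiter characters counts as one separator,
// so no empty tokens are produced (the behaviour of split1).
std::vector<std::string> split_merging(const std::string& str,
                                       const std::string& delimiters = " ") {
    // With an empty delimiter set the whole line would become one token.
    assert(!delimiters.empty());
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        const std::string::size_type end = str.find_first_of(delimiters, start);
        // If end is npos, substr clamps the length to the rest of the string.
        tokens.push_back(str.substr(start, end - start));
        start = str.find_first_not_of(delimiters, end);
    }
    return tokens;
}

// Non-merging split: every single delimiter ends a token, so two adjacent
// delimiters yield an empty token (the behaviour of split2).
std::vector<std::string> split_single(const std::string& str, char delim = ' ') {
    std::vector<std::string> tokens;
    std::istringstream ss(str);
    std::string item;
    while (std::getline(ss, item, delim)) {
        tokens.push_back(item);
    }
    return tokens;
}
```

For the input `"a  b c"` (two spaces after `a`), `split_merging` yields three tokens, while `split_single` yields four because of the empty token between the two spaces. Python's `split()` without arguments behaves like the merging variant.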

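I ran this multiple times in various orders. My test machine is a MacBook Pro (2011, 8GB, quad core), not that it matters much. I'm testing with a 20M-line text file with three space-separated columns, where each line looks similar to this: "foo.bar 127.0.0.1 home.foo.bar"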

Results:

$ /usr/bin/time cat test_lines_double | ./split.py
15.61 real 0.01 user 0.38 sys
Python: Saw 20000000 lines in 15 seconds. Crunch Speed: 1333333
$ /usr/bin/time cat test_lines_double | ./split1
23.50 real 0.01 user 0.46 sys
C++ : Saw 20000000 lines in 23 seconds. Crunch speed: 869565
$ /usr/bin/time cat test_lines_double | ./split2
44.69 real 0.02 user 0.62 sys
C++ : Saw 20000000 lines in 45 seconds. Crunch speed: 444444

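What am I doing wrong? Is there a better way to do string splitting in C++ that does not rely on external libraries (i.e. no Boost), supports merging sequences of delimiters (like Python's split), is thread safe (so no strtok), and whose performance is at least on par with Python?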

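Edit 1 / Partial solution?:

I tried making it a fairer comparison by having Python reset the dummy list and append to it each time, as the C++ code does. This still isn't exactly what the C++ code is doing, but it's a bit closer. Basically, the loop is now: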

for line in sys.stdin:
    dummy = []
    dummy += line.split()
    count += 1

The performance of Python is now about the same as that of the split1 C++ implementation.

/usr/bin/time cat test_lines_double | ./split5.py
22.61 real 0.01 user 0.40 sys
Python: Saw 20000000 lines in 22 seconds. Crunch Speed: 909090

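I am still surprised that, even if Python is this well optimized for string processing (as Matt Joiner suggested), these C++ implementations are not faster. If anyone has ideas about how to do this more efficiently in C++, please share your code. (I think my next step will be to try implementing this in pure C, although I'm not going to trade off programmer productivity to re-implement my whole project in C, so this will just be an experiment in string-splitting speed.)

Thanks to all for your help.

Final edit / Solution:

Please see Alf's accepted answer. Since Python handles strings strictly by reference and STL strings are often copied, performance is better with the vanilla Python implementation. For comparison, I compiled Alf's code and ran my data through it. Here is the performance on the same machine as all the other runs; it is essentially identical to the naive Python implementation (though faster than the Python implementation that resets/appends the list, as shown in the edit above):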

$ /usr/bin/time cat test_lines_double | ./split6
15.09 real 0.01 user 0.45 sys
C++ : Saw 20000000 lines in 15 seconds. Crunch speed: 1333333

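My only remaining complaint is about the amount of code necessary to get C++ to perform well in this case.

One of the lessons from this question and from yesterday's stdin line-reading question (linked above) is that one should always benchmark instead of making naive assumptions about languages' relative "default" performance. I appreciate the education.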
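Since the benchmark programs above time themselves with `time()`, which only has one-second resolution, a finer-grained harness can make small differences visible. The following is a minimal sketch under stated assumptions: `BenchResult` and `time_lines` are hypothetical names, and the per-line work is passed in as a callable so the same harness can wrap any of the split functions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>  // only needed for the in-memory usage shown below
#include <string>

// Result of one benchmark run (hypothetical helper type).
struct BenchResult {
    std::size_t lines;
    double seconds;
};

// Reads `in` line by line, calls `per_line` on each line, and measures the
// elapsed wall-clock time with a monotonic clock.
template <typename Fn>
BenchResult time_lines(std::istream& in, Fn per_line) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t lines = 0;
    std::string line;
    // Testing the result of getline ends the loop exactly at end of input,
    // so no correction for an extra iteration is needed.
    while (std::getline(in, line)) {
        per_line(line);
        ++lines;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    assert(elapsed.count() >= 0.0);  // steady_clock never goes backwards
    return BenchResult{lines, elapsed.count()};
}
```

A possible use, with an in-memory stream standing in for stdin: construct `std::istringstream in("a b\nc d\n");`, call `time_lines(in, [](const std::string& l) { /* split l here */ });`, and divide `lines` by `seconds` to get a crunch speed. To measure stdin, pass `std::cin` instead.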

Thanks again to all for your suggestions!

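Related discussion

How did you compile the C++ program? Do you have optimizations turned on?
@interjay: It's in the last comment in his source: g++ -Wall -O3 -o split1 split_1.cpp. @JJC: How does your benchmark fare when you actually use dummy and spline, respectively? Maybe Python removes the call to line.split() because it has no side effects?
What results do you get if you remove the splitting and leave only the reading of lines from stdin?
Python is written in C. That means there is an efficient way of doing it in C. Maybe there is a better way to split a string than using the STL?
@interjay Please see my question from yesterday (linked near the top of the question). The code that only reads from stdin is slightly faster than Python once I turn off I/O synchronization, i.e. cin.sync_with_stdio(false).
Might it have to do with push_back resizing the vector all the time? Try reserve(200) and see what it gives you.
@ixe013: I think so too. I've given a solution that either uses reserve() or uses std::list to push_back the sentences and assigns the list to the vector at the end.
Related question: "Why do std::string operations perform poorly?"
JJC: I've edited my answer to add what I believe should be added to your Python code to make it a "fair" comparison with C++.
@MattJoiner I hadn't seen that question before, though I think the discussion generated here, including your comments and Alf's, is worth keeping. The string-splitting code is also useful for Python programmers learning C++ who may search for these precise terms (i.e. Python, C++, string).
@JJC: Could you please make one more edit that I mentioned in my post?
On my system, split2 is a bit faster than split.py, and a hand-written split that matches the functionality of Python's exactly is more than twice as fast as split.py.
@n.m. Hmm, I wonder if it depends on the data. I ran my benchmarks on two different machines and got consistent results. Can you share your hand-written split code?
@vitefalcon Tried it, and posted the results on your answer.
@JJC: Thanks for trying it out. It cleared up some of my doubts too. A great question, and I'm glad you found your answers. Upvoted and starred :)
@vitefalcon Thanks, and thanks for your suggestions! This has been eye-opening.
My code is similar to the code in the accepted answer, except that I just use plain std::string everywhere instead of StringRef.
@n.m. Can you share it anyway? If you don't need to write your own StringRef class, your code would be more parsimonious and thereby a better solution for most people. Please share, I promise to upvote. ;-)
@JJC OK, see the answer.
JJC: The following article gives a nice implementation of string splitting in C++: the CodeProject article on the C++ String Toolkit (StrTk) Tokenizer.
Part of the problem is that you aren't "using" the data. If you look at the results when tallying the number of words and characters (pull request 2 of the tobbez string-splitting repository on GitHub), almost all of the C/C++ versions beat the Python versions.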

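As a guess, Python strings are reference-counted immutable strings, so no strings are copied around in the Python code, while C++ std::string is a mutable value type and is copied at the smallest opportunity.

If the goal is fast splitting, then one would use constant-time substring operations, which means only referring to parts of the original string, as in Python (and Java, and C#...).

The C++ std::string class has one redeeming feature, though: it is standard, so it can be used to pass strings around safely and portably where efficiency is not a main consideration. But enough chat. Here is the code. On my machine it is of course faster than Python, since Python's string handling is implemented in C, which is a subset of C++ (he he):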

#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

class StringRef
{
private:
    char const*     begin_;
    int             size_;

public:
    int size() const { return size_; }
    char const* begin() const { return begin_; }
    char const* end() const { return begin_ + size_; }

    StringRef( char const* const begin, int const size )
        : begin_( begin )
        , size_( size )
    {}
};

vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
    vector<StringRef>   result;

    enum State { inSpace, inToken };

    State state = inSpace;
    char const*     pTokenBegin = 0;    // Init to satisfy compiler.
    for( auto it = str.begin(); it != str.end(); ++it )
    {
        State const newState = (*it == delimiter? inSpace : inToken);
        if( newState != state )
        {
            switch( newState )
            {
            case inSpace:
                result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
                break;
            case inToken:
                pTokenBegin = &*it;
            }
        }
        state = newState;
    }
    if( state == inToken )
    {
        // One past the last character, computed without dereferencing end().
        char const* const pEnd = str.data() + str.size();
        result.push_back( StringRef( pTokenBegin, pEnd - pTokenBegin ) );
    }
    return result;
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        //spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
        // split1(spline, input_line);
        //split2(spline, input_line);

        vector<StringRef> const v = split3( input_line );
        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp -std=c++0x

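Disclaimer: I hope there aren't any bugs. I haven't tested the functionality, only checked the speed. But I think that even if there is a bug or two, correcting them won't significantly affect the speed.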
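On compilers that support C++17, the role of the hand-written StringRef class can be played by `std::string_view`, which is also a (pointer, size) pair that refers to part of an existing string. The following is a sketch of that idea, not the code measured in this answer; the names `split_views` and `materialize` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits `line` on runs of `delimiter`. The returned tokens are views into
// the caller's buffer, so that buffer must outlive the result.
std::vector<std::string_view> split_views(std::string_view line,
                                          char delimiter = ' ') {
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos < line.size()) {
        // Skip a run of delimiters.
        const std::size_t start = line.find_first_not_of(delimiter, pos);
        if (start == std::string_view::npos) break;
        // The token ends at the next delimiter or at the end of the line.
        std::size_t end = line.find(delimiter, start);
        if (end == std::string_view::npos) end = line.size();
        assert(end > start);  // a token is never empty here
        tokens.push_back(line.substr(start, end - start));  // no characters copied
        pos = end;
    }
    return tokens;
}

// Copies a token into an owning std::string, for values that must outlive
// the line they came from.
std::string materialize(std::string_view token) { return std::string(token); }
```

The design choice is the same as in the answer: splitting only records where each token lives, and a copy is made later, and only for the tokens that are actually kept.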

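Related discussion

Thanks! That makes sense. I'm getting an error about the iterator type, so I can't compile it yet (still googling), but once I figure it out I'll update my question with apples-to-apples speed results (if this is something others might run into, please feel free to post an update). I copied your code verbatim and am using g++ version 4.2.1 on OS X 10.6.
Yes, Python strings are reference-counted objects, so Python does much less copying. They still contain null-terminated C strings under the hood, though, not (pointer, size) pairs like your code.
@JJC: Did you remember to add the -std=c++0x option? If version 4.2.1 doesn't support auto, just use std::string::const_iterator instead.
@Cheers and hth. - Alf: I was missing the hyphen, and the compiler was spewing nonsense before telling me it didn't recognize that argument. ;-) Nonetheless, this led me to learn that const_iterator != iterator, so it was a worthwhile goose chase. Thanks again for your answer!
Nice solution. Personally I probably wouldn't have bothered and would just have used pairs of iterators. Not that I'm advocating that. ;-)
In other words: for higher-level work such as text manipulation, stick to a higher-level language, where the effort to do it efficiently has been put in cumulatively by tens of developers over tens of years, or be prepared to work as much as all those developers did to get something comparable at a lower level.
@jsbueno After thinking about how to use Alf's StringRef objects in my code, I've just come to a similar conclusion. :-) I still like his solution when raw split speed matters and most of the values aren't used later. Otherwise, it's too hard for me to see how to use these StringRef objects easily in code. I think a class method that returns an actual string object would be a good start.
On a related note, Alf, if you have more sample code illustrating how to use such string references in a larger program that needs to store, compare, and map these string objects, that would be very helpful and educational. If you have time, a note or a blog post would be great. Thanks again.
@JJC: The poor man's solution is [inline code lost in the source]. But keep in mind that this is not a solution to this problem, but to your more general needs, especially the ability to store such references. With C++03 that primitive solution can speed up e.g. very large sorts, but there is a threshold: at some point the string data has to be copied, e.g. for a Google API or any other API that doesn't support this. For a sketch of a more proper solution, check out my old "StringValue" project at SourceForge.
@JJC: With StringRef, you can easily copy the substring to a std::string, just string( sr.begin(), sr.end() ).
@Cheers and hth. - Alf: Thanks, I'll look into these hints.
I wish CPython strings were copied less. Yes, they are reference counted and immutable, but str.split() allocates a new string for each item using PyString_FromStringAndSize(), which calls PyObject_MALLOC(). So there is no optimization with a shared representation that exploits the fact that strings are immutable in Python.
Maintainers: please don't introduce bugs by trying to fix perceived bugs (especially not with reference to cplusplus.com). TIA.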

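I'm not providing any better solution (at least performance-wise), but some additional data that could be interesting.

Using strtok_r (the reentrant variant of strtok):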

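#include <cstdlib>   // malloc, free
#include <cstring>   // strcpy, strlen; strtok_r on POSIX systems
#include <string>
#include <vector>

using namespace std;

void splitc1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    char *saveptr;
    char *cpy, *token;

    // strtok_r modifies its input, so work on a private copy
    cpy = (char*)malloc(str.size() + 1);
    strcpy(cpy, str.c_str());

    for(token = strtok_r(cpy, delimiters.c_str(), &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters.c_str(), &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}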

Additionally, using character strings for the parameters and fgets for input:

void splitc2(vector<string> &tokens, const char *str,
        const char *delimiters) {
    char *saveptr;
    char *cpy, *token;

    cpy = (char*)malloc(strlen(str) + 1);
    strcpy(cpy, str);

    for(token = strtok_r(cpy, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}

And, for the cases where destroying the input string is acceptable:

void splitc3(vector<string> &tokens, char *str,
        const char *delimiters) {
    char *saveptr;
    char *token;

    for(token = strtok_r(str, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }
}

The timings for these are as follows (including my results for the other variants from the question and for the accepted answer):

split1.cpp: C++ : Saw 20000000 lines in 31 seconds. Crunch speed: 645161
split2.cpp: C++ : Saw 20000000 lines in 45 seconds. Crunch speed: 444444
split.py: Python: Saw 20000000 lines in 33 seconds. Crunch Speed: 606060
split5.py: Python: Saw 20000000 lines in 35 seconds. Crunch Speed: 571428
split6.cpp: C++ : Saw 20000000 lines in 18 seconds. Crunch speed: 1111111

splitc1.cpp: C++ : Saw 20000000 lines in 27 seconds. Crunch speed: 740740
splitc2.cpp: C++ : Saw 20000000 lines in 22 seconds. Crunch speed: 909090
splitc3.cpp: C++ : Saw 20000000 lines in 20 seconds. Crunch speed: 1000000

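As we can see, the solution from the accepted answer is still the fastest.

For anyone who wants to do further tests, I also put up a GitHub repo with all the programs from the question, the accepted answer, and this answer, plus a Makefile and a script to generate test data: https://github.com/tobbez/string-spliting.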
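To make concrete what "destroying the input string" means for the strtok_r variants above: strtok_r overwrites the delimiter that ends each token with a terminating null character, so afterwards the buffer reads as its first token only. The sketch below goes one step further than splitc3 and stores pointers into the buffer instead of std::string copies; `split_in_place` is a hypothetical name, and strtok_r is assumed to be available (it is a POSIX function, not part of standard C++).

```cpp
#include <cassert>
#include <string.h>  // strtok_r (POSIX), strcmp, strlen
#include <vector>

// Collects pointers to the tokens inside `buf` without copying any characters.
// `buf` is modified in place, and the pointers stay valid only as long as
// `buf` itself is alive and unchanged.
std::vector<const char*> split_in_place(char* buf, const char* delimiters) {
    assert(buf != nullptr && delimiters != nullptr);
    std::vector<const char*> tokens;
    char* saveptr = nullptr;
    for (char* tok = strtok_r(buf, delimiters, &saveptr);
         tok != nullptr;
         tok = strtok_r(nullptr, delimiters, &saveptr)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Called on a writable buffer holding `"foo.bar  127.0.0.1 home.foo.bar"`, this returns three pointers, and `strlen` of the buffer afterwards is 7, the length of `foo.bar`, because the first space has been replaced by a null character.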

Related discussion

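My suspicion is that this is caused by the way std::vector gets resized during push_back() calls. If you use std::list, or call std::vector::reserve() to set aside enough space for the sentences, you should get better performance. Or you could use a combination of both, as below for split1():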

#include <list>  // in addition to the headers of the original program

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find the first delimiter after the start of the token
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    list<string> token_list;

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the list
        token_list.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }
    tokens.assign(token_list.begin(), token_list.end());
}
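Whether reallocation actually happens can be checked directly by watching the vector's capacity. The following is a small sketch for that purpose only; `count_reallocations` is a made-up helper, and it says nothing about how much time a reallocation costs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends n_tokens strings to a fresh vector and counts how many push_back
// calls changed its capacity, i.e. triggered a reallocation.
// A reserve_hint of 0 means "do not call reserve()".
std::size_t count_reallocations(std::size_t n_tokens, std::size_t reserve_hint) {
    std::vector<std::string> tokens;
    if (reserve_hint > 0) tokens.reserve(reserve_hint);
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n_tokens; ++i) {
        const std::size_t before = tokens.capacity();
        tokens.push_back("token");
        if (tokens.capacity() != before) ++reallocations;
    }
    assert(tokens.size() == n_tokens);
    return reallocations;
}
```

With a hint at least as large as the number of tokens, the count is zero; without a hint it grows roughly with the logarithm of the token count, because the capacity grows geometrically. Note that a vector that is reused across lines via clear(), as in the question's main loop, keeps its capacity from one line to the next.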

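EDIT: The other obvious thing I see is that the Python variable dummy gets assigned each time but is never modified, so it is not a fair comparison against C++. You should try changing your Python code to initialize it with dummy = [] and then do dummy += line.split(). Can you report the runtime after that?

EDIT2: To make it even fairer, you could change the while loop in the C++ code to: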

while(cin) {
    getline(cin, input_line);
    std::vector<string> spline; // create a new vector

    //I'm trying one of the two implementations, per compilation, obviously:
    // split1(spline, input_line);
    split2(spline, input_line);

    count++;
};

Related discussion

I think the following code is better; it uses some C++17 and C++14 features:

// This code was untested when I wrote this post, but I'll test it
// when I'm free, and I sincerely welcome others to test and modify it.

// C++17
#include <istream>     // For std::istream.
#include <string_view> // New in C++17; sizeof(std::string_view) == 16 in libc++ on my x86-64 Debian 9.4 machine.
#include <string>
#include <utility>     // std::move (available since C++11).

template <template <class...> class Container, class Allocator>
void split1(Container<std::string_view, Allocator> &tokens,
            std::string_view str,
            std::string_view delimiter = " ")
{
    /*
     * The model of the input string:
     *
     * (optional) delimiter | content | delimiter | content | delimiter|
     * ... | delimiter | content
     *
     * Using std::string::find_first_not_of or
     * std::string_view::find_first_not_of is a bad idea, because it
     * actually does the following:
     *
     *     Finds the first character not equal to any of the characters
     *     in the given character sequence.
     *
     * That means it does not treat your delimiter as a whole, but as
     * a set of characters.
     *
     * This has 2 effects:
     *
     *  1. When your delimiter is not a single character, the function
     *     won't behave as you expected.
     *
     *  2. When your delimiter is a single character, the function may
     *     carry extra overhead, because it still has to compare every
     *     character against a range of characters. Even though the range
     *     holds only one character, the inner loop is still there.
     *
     * So, as a solution, I wrote the following code.
     *
     * The code below skips a delimiter prefix at the start of the string.
     * However, if there is nothing between 2 delimiters, it still adds
     * an (empty) token for that position.
     * The delimiter must not be empty, or the loop never advances.
     *
     * Note:
     * Here I use the standard library's substring search, but you can
     * change it to Boyer-Moore, KMP (takes additional memory),
     * Rabin-Karp or another algorithm to speed up your code.
     */

    // Establish loop invariant 1.
    typename std::string_view::size_type
        next,
        delimiter_size = delimiter.size(),
        pos = str.find(delimiter) ? 0 : delimiter_size;

    // The loop invariants:
    //  1. pos is the start of the content that should be saved.
    //  2. The position of the next delimiter is stored in next, which
    //     may be std::string_view::npos.

    do {
        // Find the next delimiter, maintaining loop invariant 2.
        next = str.find(delimiter, pos);

        // Found a token, add it to the container. The second argument of
        // substr is a length, not an end position; when next is npos the
        // length is clamped to the rest of the string.
        tokens.push_back(str.substr(pos, next - pos));

        // Skip the delimiter, maintaining loop invariant 1.
        //
        // When next == std::string_view::npos this unsigned addition
        // wraps around (which is well defined), but the loop terminates
        // right after, so the value is never used.
        pos = next + delimiter_size;
    } while(next != std::string_view::npos);
}

template <template <class...> class Container, class traits, class Allocator2, class Allocator>
void split2(Container<std::basic_string<char, traits, Allocator2>, Allocator> &tokens,
            std::istream &stream,
            char delimiter = ' ')
{
    std::basic_string<char, traits, Allocator2> item;

    // Unfortunately, std::getline can only accept a single-character
    // delimiter.
    while(std::getline(stream, item, delimiter))
        // Move item into tokens. A moved-from string is valid but
        // unspecified; std::getline erases it before refilling it.
        tokens.push_back(std::move(item));
}

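Choice of container:

std::vector

Assuming the internal array starts with size 1, ends with size N, and doubles each time it grows, it will allocate and deallocate log2(N) times and copy elements (2^(log2(N)+1) - 1) = (2N - 1) times. As pointed out in "Is the poor performance of std::vector due to not calling realloc a logarithmic number of times?", this can perform poorly when the size of the vector is unpredictable and may be very large. If you can estimate the size, however, this is much less of a problem.

std::list

Every push_back takes constant time, but an individual push_back will probably take longer than on std::vector. Using a per-thread memory pool and a custom allocator can ease this problem.

std::forward_list

Same as std::list, but it occupies less memory per element. It needs a wrapper class to work, because it has no push_back API.

std::array

If you know the limit of growth, you can use std::array. Of course, you can't use it directly, since it has no push_back API. But you can define a wrapper, and I think this is the fastest option here; it can also save some memory if your estimate is quite accurate.

std::deque

This option lets you trade memory for performance. There will not be (2^(N+1) - 1) copies of elements, only N allocations, and no deallocation. You also get constant-time random access and the ability to add new elements at both ends.

According to cppreference on std::deque: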

On the other hand, deques typically have large minimal memory cost; a
deque holding just one element has to allocate its full internal array
(e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size
or 4096 bytes, whichever is larger, on 64-bit libc++)

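Or you can use one of these combinations:

1. std::vector< std::array >

This is similar to std::deque; the difference is that this container does not support adding elements at the front. It is still faster, because it does not copy the underlying std::array (2^(N+1) - 1) times. It only copies the array of pointers (2^(N-M+1) - 1) times, allocates a new array only when the current one is full, and never needs to deallocate anything. You also get constant-time random access.

2. std::list< std::array >

This greatly eases the pressure of memory fragmentation. It only allocates a new array when the current one is full, and it never needs to copy anything. You still pay the price of an additional pointer compared with combination 1.

3. std::forward_list< std::array >

Same as 2, but it costs the same memory as combination 1.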
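The std::array wrapper mentioned in the container list could look roughly like the sketch below. `FixedVector` is a hypothetical name, the element type is assumed to be default-constructible and copy-assignable, and the capacity N has to be an upper bound chosen by the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Fixed-capacity sequence backed by std::array: no heap allocation and no
// reallocation, at the price of a hard upper limit of N elements.
template <typename T, std::size_t N>
class FixedVector {
public:
    // Appends a copy of value; returns false instead of growing when full.
    bool push_back(const T& value) {
        if (size_ == N) return false;
        data_[size_++] = value;
        return true;
    }

    // Forgets the stored elements so the storage can be reused for the next line.
    void clear() { size_ = 0; }

    std::size_t size() const { return size_; }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    const T* begin() const { return data_.data(); }
    const T* end() const { return data_.data() + size_; }

private:
    std::array<T, N> data_{};
    std::size_t size_ = 0;
};
```

For the three-column test data, something like `FixedVector<std::string_view, 8>` would hold one line's tokens. Because this class takes a non-type template parameter, it does not match the `Container<std::string_view, Allocator>` signature of the split1 template above; a split function would need its own overload to fill it.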

Related discussion

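If you take the split1 implementation and change its signature to match that of split2 more closely, by changing this:

void split1(vector<string> &tokens, const string &str, const string &delimiters = " ")

to this:

void split1(vector<string> &tokens, const string &str, const char delimiters = ' ')

you get a more dramatic difference between split1 and split2, and a fairer comparison: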

split1 C++ : Saw 10000000 lines in 41 seconds. Crunch speed: 243902
split2 C++ : Saw 10000000 lines in 144 seconds. Crunch speed: 69444
split1' C++ : Saw 10000000 lines in 33 seconds. Crunch speed: 303030

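You're making the mistaken assumption that your chosen C++ implementation is necessarily faster than Python's. String handling in Python is highly optimized. See this question for more: "Why do std::string operations perform poorly?"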
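For reference, the body of split1 works unchanged with the modified signature, because find_first_of and find_first_not_of both have overloads that take a single char. The sketch below spells that out; the name `split1_char` is only used here to keep it apart from the original.

```cpp
#include <cassert>
#include <string>
#include <vector>

// split1 with a single-character delimiter. Only the parameter type differs
// from the version in the question; the calls now resolve to the char
// overloads of find_first_of / find_first_not_of.
void split1_char(std::vector<std::string>& tokens, const std::string& str,
                 const char delimiter = ' ') {
    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of(delimiter, 0);
    // Find the first delimiter after the start of the token.
    std::string::size_type pos = str.find_first_of(delimiter, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos) {
        // pos is searched from lastPos, so a token start always exists here.
        assert(lastPos != std::string::npos);
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip the run of delimiters, then find the end of the next token.
        lastPos = str.find_first_not_of(delimiter, pos);
        pos = str.find_first_of(delimiter, lastPos);
    }
}
```

It still merges consecutive delimiters, so it is not semantically identical to split2; only the delimiter type is the same.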

Related discussion

void split5(vector<string> &tokens, const string &str, char delim=' ') {

    enum { do_token, do_delim } state = do_delim;
    int idx = 0, tok_start = 0;
    for (string::const_iterator it = str.begin() ; ; ++it, ++idx) {
        switch (state) {
            case do_token:
                if (it == str.end()) {
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                    return;
                }
                else if (*it == delim) {
                    state = do_delim;
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                }
                break;

            case do_delim:
                if (it == str.end()) {
                    return;
                }
                if (*it != delim) {
                    state = do_token;
                    tok_start = idx;
                }
                break;
        }
    }
}

Related discussion

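I suspect that this is related to sys.stdin being buffered in Python, while there is no buffering in the C++ implementation.

See this post for details on how to change the buffer size, then try the comparison again: "Setting smaller buffer size for sys.stdin?"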
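On the C++ side, the standard knob for this is `std::ios::sync_with_stdio(false)`, which the question's program already calls: once synchronization with C stdio is turned off, the standard streams are allowed to buffer their I/O independently. The sketch below shows that setup together with a reading loop that works on any input stream; `prepare_cin_for_bulk_reads` and `count_lines` are hypothetical names, and how much buffering the library actually does remains implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>  // only needed for the in-memory usage shown below
#include <string>

// Call once, before the first read from std::cin.
void prepare_cin_for_bulk_reads() {
    // Allow the C++ streams to buffer independently of C stdio.
    std::ios::sync_with_stdio(false);
    // Do not flush std::cout before every read from std::cin.
    std::cin.tie(nullptr);
}

// Counts the lines available on `in`.
std::size_t count_lines(std::istream& in) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++count;
    }
    assert(in.fail());  // getline ends the loop by setting failbit
    return count;
}
```

Passing `std::cin` to `count_lines` after calling `prepare_cin_for_bulk_reads()` gives a read-only baseline to compare against Python; passing a `std::istringstream` allows the loop to be checked without any input file.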

Related discussion