C++: Splitting a string

I have this code to split a string. For some reason, it just sits there doing nothing. I'm not sure what the problem is. By the way, delim = ' ' here.

vector<string> split( const string &str, const char &delim )
{
    typedef string::const_iterator iter;

    iter beg = str.begin();

    vector<string> tokens;

    while(beg != str.end())
    {
        iter temp = find(beg, str.end(), delim);
        if(beg != str.end())
            tokens.push_back(string(beg, temp));
        beg = temp;
    }

    return tokens;
}

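Related discussion

I could debug it for you, but that wouldn't help you in the long run. Here's what to do.

After every line, put a printf() or cout statement that dumps the changed variables to standard output. Then run your code, passing it a simple set of parameters:

vector<string> x = split ("Hello there, Bob.", ' ');

Then examine the output to see why your implementation isn't working. You'll probably have to break out of the program because, if it's just sitting there, you've probably got yourself one of those new-fangled infinite loops.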

Give a man a fish and he'll eat for a day, teach a man to fish, he'll never be hungry again.

Or the Terry Pratchett version:

Give a man some fire and he'll be warm for a day, set a man on fire, he'll be warm for the rest of his life.

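Update:

Since you've stated that you actually did what I suggested, here's what I found by doing the same. When you set beg to temp at the end of the while loop, it is pointing at the space. I found this by printing the string at beg at the top of the while loop: it never changed after the first word was extracted.

Then, when you do the next find, it finds that exact same space instead of skipping the spaces first and then calling find properly. You need to skip the spaces after each find, making sure you don't iterate past the end of the string.

This is my solution. Use it as you wish.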

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>  // std::find
using namespace std;

vector<string> split( const string &str, const char &delim ) {
    typedef string::const_iterator iter;
    iter beg = str.begin();
    vector<string> tokens;

    while(beg != str.end()) {
        // Debug trace; _Myptr is an implementation-specific member (it appears
        // to come from MSVC's library), so this line is not portable.
        //cout <<":" << beg._Myptr <<":" << endl;
        iter temp = find(beg, str.end(), delim);
        if(beg != str.end())
            tokens.push_back(string(beg, temp));
        beg = temp;
        // Skip the delimiter(s) so the next find() starts on a new token.
        while ((beg != str.end()) && (*beg == delim))
            beg++;
    }

    return tokens;
}

int main () {
    vector<string> x = split ("Hello, my name is Bob.", ' ');
    return 0;
}

Without that space-skipping code at the end of the while loop, the output is:

:Hello, my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :

And so on, ad infinitum. With the skipping code, you get:

:Hello, my name is Bob. :
:my name is Bob. :
:name is Bob. :
:is Bob. :
:Bob. :
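
The commented-out trace line in the solution prints through beg._Myptr, which may not exist on other standard library implementations. As a rough, portable sketch of the same idea, the remainder can be built with std::string(beg, str.end()). The helper name trace_split, its skip_delims switch and its max_iterations safety cap are all made up for this illustration; the cap is only there so the buggy variant stops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracing helper (not part of the original answer): runs the
// split loop and records the unparsed remainder seen at the top of each
// iteration. With skip_delims == false it reproduces the stuck loop, so
// max_iterations caps the number of recorded iterations.
std::vector<std::string> trace_split(const std::string& str, char delim,
                                     bool skip_delims, std::size_t max_iterations) {
    assert(max_iterations > 0);
    typedef std::string::const_iterator iter;
    std::vector<std::string> remainders;
    iter beg = str.begin();
    while (beg != str.end() && remainders.size() < max_iterations) {
        // Portable replacement for printing beg._Myptr.
        remainders.push_back(std::string(beg, str.end()));
        beg = std::find(beg, str.end(), delim);
        if (skip_delims) {
            while (beg != str.end() && *beg == delim)
                ++beg;
        }
    }
    return remainders;
}
```

Printing each element of the returned vector between colons gives a trace like the ones above: with skip_delims set to false, every entry after the first is the same string starting at the space.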

Related discussion

Here's another nice, short Boost-based version that uses a whole string as the delimiter:

std::vector<std::string> result;
boost::iter_split(result, str, boost::first_finder(delim));

Or case-insensitive:

std::vector<std::string> result;
boost::iter_split(result, str,
                  boost::first_finder(delim, boost::is_iequal()));

Related discussion

I'm a big fan of Boost, as it provides a handy solution for this too:

std::vector<std::string> Split(const std::string &s, const std::string &d)
{
    // s is const, so the split iterator has to wrap a const_iterator.
    typedef boost::split_iterator<std::string::const_iterator> split_iter;
    std::vector<std::string> v;

    // first_finder with is_iequal() matches the delimiter d case-insensitively.
    for (split_iter i = boost::make_split_iterator(s, boost::first_finder(d, boost::is_iequal()));
         i != split_iter();
         ++i) {
        v.push_back(boost::copy_range<std::string>(*i));
    }

    return v;
}

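There is a problem in your while loop: if a delimiter is found, temp points to the first delimiter after the first call to find.

At the end of the while loop, you set beg to the value of temp.

Now beg also points to the first delimiter.

The next time find is called, it returns the current value of beg again, because beg already points to a delimiter.

temp never moves from its previous value, so you are in an infinite loop.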
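
To see that behaviour in isolation, here is a minimal sketch. find_offset is a hypothetical helper written only for this illustration; it reports the offset at which std::find stops when the search starts at a given offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): returns the offset where std::find
// stops when searching for delim from offset start. A result equal to
// str.size() means "not found".
std::size_t find_offset(const std::string& str, std::size_t start, char delim) {
    assert(start <= str.size());
    std::string::const_iterator it =
        std::find(str.begin() + start, str.end(), delim);
    return static_cast<std::size_t>(it - str.begin());
}
```

For "Hello there", find_offset(s, 0, ' ') gives 5, and find_offset(s, 5, ' ') gives 5 again, because std::find examines the starting element first. Only a search starting at offset 6 moves on (here to 11, the end of the string).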

You don't have to reinvent the wheel; Boost provides a string splitting function for you. Example code:

string stringtobesplit = "AA/BB-CC";
vector<string> tokens;

boost::split(tokens, stringtobesplit, boost::is_any_of("/-"));
// tokens now holds 3 items: AA BB CC

Maybe this:

std::vector<std::string> &mysplit(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
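
One thing worth knowing about the std::getline approach is how it treats repeated and trailing delimiters. The sketch below wraps the same loop in a value-returning function (split_getline is a name I made up, not part of the answer) and uses asserts to document what comes back.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical value-returning wrapper around the same getline loop.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Usage sketch; the asserts record the behaviour of each case.
void demo_split_getline() {
    assert(split_getline("a b c", ' ').size() == 3);  // a, b, c
    assert(split_getline("a  b", ' ').size() == 3);   // a, "", b: empty token kept
    assert(split_getline("a b ", ' ').size() == 2);   // trailing delimiter adds nothing
    assert(split_getline("", ' ').empty());           // empty input, no tokens
}
```

So, unlike the space-skipping loop, this version keeps an empty token between two adjacent delimiters; whether that is wanted depends on the input format.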

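Related discussion

find() returns the position X of the next delimiter. When you then assign that to beg and go into the next iteration, the search starts at position X again, and again, and again. In other words, you are stuck in an endless loop.

Try this code: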

vector<string> split( const string &str, const char &delim )
{
    typedef string::const_iterator iter;

    vector<string> tokens;
    iter pos = str.begin(), last = str.begin();

    while(pos != str.end()) {
        last = pos;
        pos = find(pos, str.end(), delim);

        if (pos != str.end()) {
            string token = string(last, pos);
            if (token.length() > 0)
                tokens.push_back(token);

            last = ++pos;
        }
    }

    string lastToken = string(last, pos);
    if (lastToken.length() > 0)
        tokens.push_back(lastToken);

    return tokens;
}

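This has the additional benefit that it includes the last token in the list (e.g. when splitting on spaces, the string "a b c" now returns the tokens a, b and c instead of only a and b), and that multiple consecutive delimiters do not produce empty tokens.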
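
For comparison, the same observable behaviour (last token kept, runs of delimiters collapsed) can be sketched with the index-based members of std::string instead of iterators. This is a different technique from the code above, and split_skip_empty is a hypothetical name used only here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical index-based variant: skips runs of delimiters and keeps the
// final token, using std::string::find and find_first_not_of.
std::vector<std::string> split_skip_empty(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_not_of(delim);
    while (start != std::string::npos) {
        // stop is npos when no further delimiter exists; substr then simply
        // takes everything up to the end of the string.
        std::string::size_type stop = str.find(delim, start);
        tokens.push_back(str.substr(start, stop - start));
        start = str.find_first_not_of(delim, stop);
    }
    return tokens;
}

// Usage sketch; the asserts record the behaviour of each case.
void demo_split_skip_empty() {
    assert(split_skip_empty("a b c", ' ').size() == 3);  // last token kept
    assert(split_skip_empty("a   b", ' ').size() == 2);  // no empty tokens
    assert(split_skip_empty("abc", ' ').size() == 1);    // no delimiter at all
    assert(split_skip_empty("", ' ').empty());           // empty input
}
```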

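Besides beg needing to be advanced by the size of the delimiter, one special case is missing: the case where the string contains no delimiter at all.

Related discussion

The simplest way to debug this code is to print every position that beg takes. If beg doesn't increase, that's your problem.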