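How do I iterate over the words of a string?

I'm trying to iterate over the words of a string.

The string can be assumed to consist of words separated by whitespace.

Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answers.

The best solution I have right now is: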

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);

    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

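Is there a more elegant way to do this?

Related discussion

I use this to split a string by a delimiter. The first function puts the results into a pre-constructed vector; the second returns a new vector.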

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template<typename Out>
void split(const std::string &s, char delim, Out result) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
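
To make the empty-token behaviour concrete, here is a minimal sketch of the same getline loop in two flavours. The names split_keep_empty and split_skip_empty are hypothetical and only used for illustration; they are not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Keeps empty tokens, like the split() shown above.
std::vector<std::string> split_keep_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Same loop, but tokens that turn out empty are dropped.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With "one:two::three" the first variant yields 4 elements and the second 3. Note that a single trailing delimiter does not produce a trailing empty token in either variant, because getline hits end-of-input without extracting anything.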

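Related discussion

Elegant solution. I always forget about this particular "getline", though I don't believe it is aware of quotes and escape sequences.
@stijn: are you saying that split("one two three", ' '); returns a vector with 4 elements? I'm not sure that's the case, but I'll test it.
Wait, the formatting seems to have removed some spaces (or I forgot them): I was talking about the string "one two  three", with two spaces between "two" and "three".
I like this solution; however, I wrapped the function in a template, changing the vector's std::string template argument into a template parameter. For my purposes I also apply boost::lexical_cast to that template parameter in the push_back.
How can this be modified to work with std::wstring, given that std::getline does not work as-is?
std::getline is templated, so it may "just work"; if not, see en.cppreference.com/w/cpp/string/basic_string/getline for how to adapt it. Passing a wchar_t character as the delimiter may be enough to trigger the correct template.
If you are relying on return value optimization, couldn't you make the function return void?
To skip empty tokens, add an empty() check: if (!item.empty()) elems.push_back(item).
What about a delimiter made of two characters, such as ->?
@herohuyongtao, this solution only works for single-character delimiters.
How did you do it in the template?
@EvanTeran This may not be about splitting strings but a general doubt about the code: you take elems as a reference parameter and then return the reference again. I was just wondering whether there is a reason for that?
@JeshwanthKumarNK, it isn't necessary, but it lets you do things like pass the result directly to a function, as in f(split(s, d, v)), while still getting the benefit of a pre-allocated vector if you like.
Caveat: split("one:two:three", ':') and split("one:two:three:", ':') return the same value.
Almost perfect: split(":abc:def:", ':'); returns only 3 elements instead of 4!
Being able to set a maximum number of returned elements is essential for me.
@Jonny, that should be trivial: just add an extra condition to the while loop that compares the vector's size with max.
Jonny, I see. Your answer looks a bit complicated. If you default max to size_t(-1), it is effectively "infinity" (the largest size your system can represent, so you would run out of RAM before hitting it). Then you can simplify the condition to what I described in my comment above. There is no need to check the stream state again and do a second read, and so on. Just a suggestion :-).
@Jonny, for an example see: pastebin.com/njyikvda
I may be wrong, but that way you could lose the end of the string. Basically I was mimicking PHP's explode function, I believe.
Gotcha. My solution stops at max_count and skips the rest of the string (because it has found the requested number). I guess you are looking for something where the last element always becomes the rest of the string. I have a similar function here: github.com/eteran/cpp-utilities/blob/master/string.h. Some of the functions there are designed to match PHP's string functions as closely as possible :-)
Why not return split(s, delim, std::vector<std::string>());?
@Gabriel, you could. But I think that when this was written (years ago), having a named variable more reliably encouraged NRVO. With C++11 move semantics it probably makes little difference.
Note that if you are using OpenCV, split may be confused with OpenCV's split, which splits images.
I really wish they would add a standard method with this signature: vector<string> std::string::split(char delimiter = ' ');.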

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, you can insert them into a container using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}};
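
As a self-contained sketch of the "create the vector directly" variant, the snippet above could be wrapped in a small function. The name words_of is hypothetical; everything else is standard library.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns the whitespace-separated words of `sentence`.
// operator>> skips runs of whitespace, so no empty tokens are produced.
std::vector<std::string> words_of(const std::string& sentence) {
    std::istringstream iss(sentence);
    std::istream_iterator<std::string> first(iss), last;
    // vector's iterator-range constructor does the copying for us.
    return std::vector<std::string>(first, last);
}
```

Calling words_of("And I feel fine...") gives a vector of four words; leading, trailing and repeated whitespace (including tabs and newlines) is ignored.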

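Related discussion

Is it possible to specify a delimiter for this? For instance, splitting on commas?
@l3dx: the parameter "\n" seems to be the delimiter. This code is nice, but I'd like to understand it better. Maybe someone could explain each line of this snippet?
@Jonathan: \n is not the delimiter in this case; it is the delimiter for the output to cout.
Based on this: cplusplus.com/reference/algorithm/copy, no. The whitespace behaviour is a property of istream_iterator. Rolling your own would be more elegant.
@graham.reeds, @l3dx: Please don't write yet another CSV parser that can't handle quoted fields: en.wikipedia.org/wiki/Comma-separated_values
This is a poor solution, as it doesn't take any other delimiter and is therefore not scalable and not maintainable.
For those asking how this works: the equivalent code using less of the STL looks like string token; istringstream iss(sentence); while (iss >> token) { cout << token; } or { tokens.push_back(token); }.
Why do I get "error C2664: 'std::back_inserter': cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)(void)' to 'std::vector<_Ty> &'" in VS2008?
The template argument of back_inserter should be string, not vector. That is, it should be back_inserter<string>(tokens), not back_inserter<vector<string> >(tokens).
If you care about elegance in practice (e.g. doing more with less code): slideshare.net/rawwell/iteratorsmusto
Actually, this can work just fine with other delimiters (though doing so is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.
The main use of istream_iterator is that you can parse int, float, double, etc. from an istream: istream_iterator<double> does a decent job of reading whitespace-separated doubles. Combined with a front or back inserter, it's a great combination! :)
vector has a constructor that takes a begin and an end iterator, so there is no need for the copy call to insert them into the container.
@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - hmm, that doesn't sound like a poor solution to the question's problem. "Not scalable and not maintainable" - ha, nice one.
@Nawaz Why should it be? You are inserting into a std::vector, not a std::string. But then again, there shouldn't be an explicit template argument anyway (well, there shouldn't even be a back_inserter or copy, but OK).
@ChristianRau: Oh, you're right; the first code snippet may have confused me. What I should have said is that you don't need to mention the template argument to std::back_inserter at all; in fact, mentioning it defeats the purpose of back_inserter.
Why are braces needed in vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}};? Is it because it would otherwise look like a function call?
Questions: 1. Why does istream_iterator stop at whitespace? To me, whitespace is part of the string as well. 2. Why is this inefficient?
The "elegant" way needs 5 includes, 3 lines (not counting the using line) and fairly cryptic code... to split a string? Dear God.
We can also use the STL to split strings.
If you only need to split on whitespace, this is much faster than Evan Teran's answer.
While the point about the missing delimiter is valid, it should be considered that the OP's solution cannot handle that either. So it does not seem to be a requirement.
The only place where the braces are required is istream_iterator<string>{}, because otherwise this would be treated as a function declaration.
If you use wstring and the code breaks, check this answer for fixing the istream_iterator usage with wchar_t: stackoverflow.com/a/20959347/354343437
Yes. You can imbue the stream with a specialized locale that makes a given character count as whitespace (and all other characters as non-whitespace). Then the code works exactly the same. codereview.stackexchange.com/a/57467/507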
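
The ctype-facet idea mentioned in the comments can be sketched roughly as follows. This is only an illustration under assumptions: the facet name comma_is_space and the function split_on_commas_and_spaces are hypothetical, and the sketch treats ',' as whitespace in addition to the classic whitespace characters rather than instead of them.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies ',' as whitespace so that operator>>
// (and therefore istream_iterator) also splits on commas.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start from the classic "C" classification table, then mark ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] |= space;
        return table.data();
    }
};

std::vector<std::string> split_on_commas_and_spaces(const std::string& text) {
    std::istringstream iss(text);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    std::istream_iterator<std::string> first(iss), last;
    return std::vector<std::string>(first, last);
}
```

Because runs of "whitespace" are skipped by operator>>, consecutive commas do not produce empty tokens with this approach, which also means it is not suitable for CSV-style data where empty fields matter.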

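A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function, it can be used to split other types of strings (wchar_t, etc., or UTF-8) using all kinds of delimiters.

See the documentation for details.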

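Related discussion

Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.
If you know the line will contain only a few tokens, this is practical and fast enough, but if it contains many you will burn a lot of memory (and time) growing the vector. So no, it is not faster than the stringstream solution, at least not for large n, which is the only case where speed matters.
And for those who don't already have Boost... bcp copies over 1,000 files for this :)
strtok is a trap. It is not thread-safe.
Warning: when given an empty string (""), this method returns a vector containing the "" string. So add an "if (!string_to_split.empty())" before the split.
@Ian Embedded developers aren't all using Boost.
AksHistFoover, are embedded developers using C++?
The way bcp pulls in libraries such as MPL, which I think are hardly needed for splitting text, is a pain. Man, what a PITA...
@j_random_hacker: "at least not for large n, which is the only case where speed matters" - it also matters for small n inside a loop over a large N...
@tuxSlayer: various POSIX/XOPEN/UNIX standards also specify strtok_r.
Yes, and in MSVC it is called strtok_s (s for "safe"? :) Not very portable...
@tuxSlayer: If you'd rather write your own implementation than have a five-line #if/#else/#endif, then knock yourself out...
Using std::string::find(..) and std::string::substr(..) requires no Boost.
Actually, at our company we are not allowed to use Boost for security reasons. Yes, I know, but legal has decided.
As an addendum: I only use Boost when I must; normally I prefer adding to my own library of code, which is standalone and portable, so that I can write small, precise, specific code that accomplishes a given goal. That way the code is non-public, performant, trivial and portable. Boost has its place, but I'd suggest it is a bit of overkill for tokenizing strings: you wouldn't have your whole house transported to an engineering firm just to get a new nail hammered into the wall to hang a picture... They may do it extremely well, but the cons far outweigh the pros.
Nice, it even works when calling the Boost framework from a cpp class in Xcode (iOS project).
My personal view is that C and C++ are not languages meant for being agile or delivering quick-to-market solutions; using Boost is almost the same as opting for a more abstract high-level language, like when we choose Java, C#, etc., because there we don't care what exactly happens under the hood. Using Boost also means I have to tell my customers that I have included a third-party library. Thanks anyway. :)
Using Boost is like using a booster seat: no, thanks.
Does boost::split really work on UTF-8 strings? Can you share any documentation on this? I am trying to split a UTF-8 string at newlines. If the string I pass in is UTF-8 encoded, will boost::split work correctly?
@Andrew: any_of has been part of the standard library since 2011: en.cppreference.com/w/cpp/algorithm/all_any_none_of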

#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;            // Have a buffer string
    std::stringstream ss(str);  // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}

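Related discussion

For those with whom it doesn't sit well to sacrifice all efficiency for code size, and who see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):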

template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
        {
            pos = length;
        }

        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));

        lastPos = pos + 1;
    }
}

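I usually choose to use std::vector<std::string> as my second parameter (ContainerT). However, list<> is much faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does not make any copies, for incredible speed increases.

It is more than twice as fast as the fastest tokenizer on this page and almost 5 times faster than some others. Also, with the right parameter types you can eliminate all string and list copies for additional speed.

Additionally, it does not return the result (which would be extremely inefficient); rather, it takes the token container by reference, which also allows you to build up tokens using multiple calls if you wish.

Lastly, it lets you specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest is optional. It does not use streams or the Boost library, but is flexible enough to accept some of these foreign types naturally.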
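
A usage sketch, complete with the required includes, might look like the following. The tokenize template is repeated from the answer so that the snippet is self-contained; the two helper functions count_fields and fields_with_empties are hypothetical and exist only to show typical calls.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Repeated from the answer above (default delimiter assumed to be a space).
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;

    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical helper: number of non-empty fields separated by ',' or ';'.
std::size_t count_fields(const std::string& line) {
    std::vector<std::string> fields;
    tokenize(line, fields, ",;", true);   // trimEmpty = true
    return fields.size();
}

// Hypothetical helper: same input, but into a std::list and keeping empties.
std::list<std::string> fields_with_empties(const std::string& line) {
    std::list<std::string> fields;
    tokenize(line, fields, ",;", false);  // trimEmpty = false
    return fields;
}
```

For the input "a,b;;c" the first helper reports 3 fields, while the second returns 4 elements, one of them empty. Any container with push_back and a value_type constructible from (const char*, size) should work the same way.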

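Related discussion

I'm quite a fan of this, but for g++ (and probably as good practice) anyone using it will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; then substitute value_type and size_type accordingly.
For those of us to whom the template stuff and the first comment are completely foreign, a usage example complete with the required includes would be lovely.
Ah well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change ContainerT::value_type to just ValueType and (ContainerT::value_type::size_type) to (SizeType). That fixed the bits g++ had been whining about. Just invoke it as tokenize(some_string, some_vector);
Would anyone be so kind as to provide a summary explaining why this code is more performant?
Apart from running a few performance tests on sample data, primarily I've reduced it to as few instructions as possible and also as few memory copies as possible, enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations.) Unfortunately there is not much else one can do to improve on this, but incremental gains are possible.
There may be a bug. Given "xxxabcyabczzzabc" and "abo", the split result is "xxx cyy czzz c".
That is the correct output for trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but a list of delimiter characters. It would be simple to modify it to take a single delimiter string (I think str.find_first_of should change to str.find_first, but I could be wrong... can't test).
Thanks @thomas perl for the revision; it definitely makes it more readable and compact. My original implementation avoided the extra comparison on every loop iteration because I was optimizing for a very low-latency application. However, your edit will be more applicable to most users visiting here.
Initially I had some problems, but this actually works with wstring/Unicode if you update the template accordingly. Be careful, though: I ran into a few places that easily led to runtime errors which the compiler did not catch.
Thanks @ Kayle Effyyon OnDead, I haven't worked with C++ at this level for a few years and may be a little rusty on the new specs, but if there is anything in this post I need to fix, let me know and I'll look into it.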

Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.
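
One way to see why keeping empty tokens can be a reasonable design is that it makes the split reversible. The sketch below repeats the function above and adds a hypothetical join helper (not part of the answer) to illustrate the round trip.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same algorithm as the first version above: empty tokens are kept.
std::vector<std::string> split(const std::string& text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}

// Hypothetical inverse: glue the tokens back together with `sep`.
std::string join(const std::vector<std::string>& parts, char sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}
```

Because split always returns at least one element and never drops an empty field, join(split(s, sep), sep) gives back s for any input, including "" and ",".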

It can also easily be extended to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    if (start < text.size()) { // skip the empty token after a trailing separator
        tokens.push_back(text.substr(start));
    }
    return tokens;
}

If you need to split a string at multiple delimiters while skipping empty tokens, this version can be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while ((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if (start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}
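
For comparison, the multi-delimiter idea can also be written as a loop driven by the start position only. This is a sketch with the same intended behaviour, not the answer's code; the name split_any is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `text` at any character contained in `delims`, skipping empty tokens.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims) {
    std::vector<std::string> tokens;
    // Position of the first character that is not a delimiter.
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the current token: next delimiter, or end of the string.
        std::size_t end = text.find_first_of(delims, start);
        if (end == std::string::npos) end = text.size();
        tokens.push_back(text.substr(start, end - start));
        // Skip the run of delimiters that follows the token.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With the delimiter set " ,;" the input "  hello, world;; again " yields exactly three tokens, and an empty or all-delimiter input yields an empty vector.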

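Related discussion

The first version is simple and gets the job done perfectly. The only change I would make would be to return the result directly instead of passing it as a parameter.
The output is passed as a parameter for efficiency. If the result were returned, it would require either a copy of the vector or a heap allocation that would then have to be freed.
My bad, I wrongly assumed the STL used lazy copying like the Qt containers do. Too bad it doesn't.
I like this one because it requires the fewest extra headers. I might suggest an edit to make it follow best-practice namespace usage (i.e. std:: in front of everything).
A slight addendum to the comments above: this function can return the vector without penalty if you use C++11 move semantics.
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)
@PeterM I would rather pass it in by reference in case it gets large.
@Veritas In what way does it not work if the delimiter is the last character? Also, outputting empty tokens is intentional, though it can obviously be changed easily if needed.
Of all the answers, this seems to be one of the most appealing and flexible, together with getline with a delimiter, although that is a less obvious solution. Does the C++11 standard not have anything for this? Does C++11 support punch cards these days?
If an empty string is passed in, it returns a vector with 1 element (an empty string). If the string passed in equals sep, it returns a vector with 2 elements (both empty strings). There should be an "if (end > 0)" before the push_back inside the while loop, and an "if (start > 0)" before the push_back below the while loop, to fix this.
@learncocos2d Please don't change the meaning of a post with an edit. This behaviour is by design. It is identical to the behaviour of Python's split operator. I'll add a note to make this clear.
I suggest using std::string::size_type instead of int, as some compilers might otherwise emit signed/unsigned warnings.
Good stuff. I like sample code I can read and try out.
The first function in this answer is the best solution; it pairs perfectly with the inverse join function: std::string strJoin(const std::vector<std::string>& v, const char& delimiter) { if(!v.empty()) { std::stringstream ss; auto it = v.cbegin(); while(true) { ss << *it++; if(it != v.cend()) ss << delimiter; else return ss.str(); } } return ""; }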

This is my favourite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )
{
    // Do something on `word` here...
}

Related discussion

This is similar to the Stack Overflow question "How do I tokenize a string in C++?".

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}

Related discussion

I like the following because it puts the results into a vector, supports a string as a delimiter and gives control over keeping empty values. But it doesn't look as good then.

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        // std::search finds the next occurrence of the whole delimiter string
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

Of course, Boost has a split() that works partially like that. And if by "whitespace" you really mean any type of whitespace, using Boost's split with is_any_of() works great.

Related discussion

The STL does not have such a method available already.

However, you can either use C's strtok() function via the std::string::c_str() member, or write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token start.
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter.
        pos = str.find_first_of(delimiters, lastPos);
    }
}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf are both faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "elegance over performance" deal.

Related discussion

Here's a split function that:

is generic
uses standard C++ (no Boost)
accepts multiple delimiters
ignores empty tokens (can easily be changed)

#include <string>
#include <vector>

using namespace std;

template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
    vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while (pos != T::npos) {
        if (pos != start) // ignore empty tokens
            v.emplace_back(str, start, pos - start);
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    if (start < str.length()) // ignore trailing delimiter
        v.emplace_back(str, start, str.length() - start); // add what's left of the string
    return v;
}

Example usage:

vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> wv = split<wstring>(L"Hello, there; World", L";,");

Related discussion

I have a two-line solution to this problem:

char sep = ' ';
std::string s = "1 This is an example";

for (size_t p = 0, q = 0; p != s.npos; p = q)
    std::cout << s.substr(p + (p != 0), (q = s.find(sep, p + 1)) - p - (p != 0)) << std::endl;

Then, instead of printing, you can put it in a vector.

Another flexible and fast way

#include <cstring> // strchr

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        // Advance e to the next delimiter or to the terminating null.
        while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}

Use it with a vector of strings (edit: since someone pointed out not to inherit from STL classes... hrmf ;) ):

template<class ContainerType>
class Appender {
public:
    Appender(ContainerType& container) : container_(container) {;}
    void operator() (const char* s, unsigned length) {
        container_.push_back(std::string(s, length));
    }
private:
    ContainerType& container_;
};

std::vector<std::string> strVector;
Appender<std::vector<std::string> > v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, like how to just count words:

class WordCounter {
public:
    WordCounter() : noOfWords(0) {}
    void operator() (const char*, unsigned) {
        ++noOfWords;
    }
    unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Related discussion

Here's a simple solution that uses only the standard regex library:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    // -1 selects the text between matches, i.e. the tokens
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) // token could be empty: check
            result.emplace_back( it->str() );
    }

    return result;
}

The regex argument allows checking for multiple delimiters (spaces, commas, etc.).

I usually only check for splitting on spaces and commas, so I also have this default function:

std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

The "[\\s,]+" checks for spaces (\\s) and commas (,).

Note, if you want to split wstring instead of string,

change all std::regex to std::wregex
change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.
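
Rather than editing the code by hand for wstring, the character type can be made a template parameter. The following is a sketch under that assumption; tokenize_regex is a hypothetical name, and it takes both arguments by const reference.

```cpp
#include <regex>
#include <string>
#include <vector>

// Works for std::string/std::regex and std::wstring/std::wregex alike.
template <class CharT>
std::vector<std::basic_string<CharT>>
tokenize_regex(const std::basic_string<CharT>& str,
               const std::basic_regex<CharT>& re)
{
    using Iter = std::regex_token_iterator<
        typename std::basic_string<CharT>::const_iterator>;

    std::vector<std::basic_string<CharT>> result;
    Iter it(str.begin(), str.end(), re, -1); // -1: text between matches
    Iter end;
    for (; it != end; ++it) {
        if (it->length() != 0)               // drop empty tokens
            result.push_back(it->str());
    }
    return result;
}
```

A call such as tokenize_regex(std::string("a, b,,c  d"), std::regex("[\\s,]+")) deduces CharT as char; passing a std::wstring together with a std::wregex deduces wchar_t.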

Related discussion

Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::find()/std::find_first_of() and std::string::substr().

Here's an example:

#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;

    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }

    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';

    return 0;
}

Related discussion

If you like to use Boost but want to use a whole string as the delimiter (instead of single characters as in most of the previously proposed solutions), you can use boost_split_iterator.

Example code including a convenient template:

#include <iostream>
#include <iterator>
#include <vector>
#include <boost/algorithm/string.hpp>

template<typename _OutputIterator>
inline void split(
    const std::string& str,
    const std::string& delim,
    _OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;

    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
        iter!=It();
        ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;

    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}

There is a function named strtok.

#include <cstring> // strtok_r is POSIX, declared in <string.h>
#include <string>
#include <vector>
using namespace std;

vector<string> split(char* str, const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str, delim, &saveptr);

    vector<string> result;

    while (token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL, delim, &saveptr);
    }
    return result;
}

Related discussion

Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea.)

#include <regex>
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string& s) {
    regex r("\\w+"); // regex matches whole words (greedy, so no fragment words)
    sregex_iterator rit(s.begin(), s.end(), r);
    sregex_iterator rend; // end-of-sequence iterator
    vector<string> result;
    for (; rit != rend; ++rit)
        result.push_back(rit->str()); // iterate through the matches to fill the vector
    return result;
}

Related discussion

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';');

Related discussion

So far I used the one in Boost, but I needed something that doesn't depend on it, so I came to this:

#include <sstream>
#include <string>
#include <vector>

static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
    std::ostringstream word;
    for (size_t n = 0; n < input.size(); ++n)
    {
        if (std::string::npos == separators.find(input[n]))
            word << input[n];
        else
        {
            if (!word.str().empty() || !remove_empty)
                lst.push_back(word.str());
            word.str("");
        }
    }
    if (!word.str().empty() || !remove_empty)
        lst.push_back(word.str());
}

A good point is that in separators you can pass more than one character.

Short and elegant

#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}

It can use any string as the delimiter, and can also be used with binary data (std::string supports binary data, including nulls).

Usage:

auto a = split("this!!is!!!example!string", "!!");

Output:

this
is
!example!string

Related discussion

I rolled my own using strtok and used Boost to split strings. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }

    {   // parsing a string into a vector of floats with other separators
        // besides spaces

        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }

    {   // parsing a string into specific variables

        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }

    return 0;
}

The toolkit has much more flexibility than this simple example shows, but its utility in parsing a string into useful elements is incredible.

I made this because I needed an easy way to split strings and C-based strings... Hopefully someone else can find it useful as well. Also, it doesn't rely on tokens and you can use fields as delimiters, which is another key feature I needed.

I'm sure there are improvements that can be made to further improve its elegance; please do so by all means.

StringSplit.hpp:

#include <vector>
#include <iostream>
#include <string.h>

using namespace std;

class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);

public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
char* String;
bool do_string;
bool keep_empty;
vector<char*> Container;
vector<string> ContainerS;

StringSplit(char * in)
{
String = in;
}

StringSplit(string in)
{
size_t len = calc_string_size((char*)in.c_str());
String = new char[len + 1];
memset(String, 0, len + 1);
copy_string(String, (char*)in.c_str());
do_string = true;
}

~StringSplit()
{
for (int i = 0; i < Container.size(); i++)
{
if (Container[i] != NULL)
{
delete[] Container[i];
}
}
if (do_string)
{
delete[] String;
}
}
};

StringSplit.cpp:

#include <string.h>
#include <iostream>
#include <vector>
#include"StringSplit.hpp"

using namespace std;

void StringSplit::assimilate(char*src, char delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}

}
else
{
delete[] temp;
}
}
}

void StringSplit::assimilate(char*src, char* delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}

long StringSplit::calc_string_size(char* _in)
{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}

bool StringSplit::string_contains(char* haystack, char* needle)
{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}

bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}

int StringSplit::untilnextdelim(char* _in, char delim)
{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}

int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}

return c;
}

int StringSplit::untilnextdelim(char* _in, char* delim)
{
int s = calc_string_size(delim);
int c = 1 + s;

if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}

while (!match_fragment(_in + c, delim, s))
{
c++;
}

return c;
}

void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
if (*src == delim)
{
src++;
}

int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}

void StringSplit::copy_string(char* dest, char* src)
{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}

void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);

if (match_fragment(src, delim, len))
{
src += len;
lens -= len;
}

int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}

vector<char*> StringSplit::split_cstr(char Delimiter)
{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return Container;
}

vector<string> StringSplit::split_string(char Delimiter)
{
do_string = true;

int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return ContainerS;
}

vector<char*> StringSplit::split_cstr(char* Delimiter)
{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);

while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return Container;
}

vector<string> StringSplit::split_string(char* Delimiter)
{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);

while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return ContainerS;
}

Examples:

int main(int argc, char*argv[])
{
StringSplit ss ="This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");

for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}

return 0;
}

Will output:

This
is
an
example
cstring

int main(int argc, char*argv[])
{
StringSplit ss ="This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');

for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}

return 0;
}

int main(int argc, char*argv[])
{
string mystring ="This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");

for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}

return 0;
}

int main(int argc, char*argv[])
{
string mystring ="This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');

for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}

return 0;
}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method, where splitting a string is as easy as:

String[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:

How about this:

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, const char delim) {
    vector<string> v;
    string tmp;

    for (string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if (*i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp);
            tmp = "";
        }
    }
    v.push_back(tmp); // the last token is not followed by a delimiter

    return v;
}

Related discussion

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
using namespace std;

vector<string> split(const string &s, char delim) {
    vector<string> elems;
    stringstream ss(s);
    string item;
    while (getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

int main() {

    vector<string> x = split("thi is an sample test", ' ');
    unsigned int i;
    for (i = 0; i < x.size(); i++)
        cout << i << ":" << x[i] << endl;
    return 0;
}

I like to use the Boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string line("A:::line::to:split");
    const boost::regex re(":+"); // one or more colons

    // -1 means find inverse matches aka split
    boost::sregex_token_iterator tokens(line.begin(), line.end(), re, -1);
    boost::sregex_token_iterator end;

    for (; tokens != end; ++tokens)
        std::cout << *tokens << std::endl;
}

最近我不得不把一个用骆驼壳包装的词分解成子字。没有分隔符，只有大写字符。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

#include <string>
#include <list>
#include <locale> // std::isupper

template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;

for (String::const_iterator i = s.begin(); i < s.end(); ++i) { {
if (std::isupper(*i)) {
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}

if (w.length())
R.push_back(w);
return R;
}

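For instance, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with both narrow and wide strings. Because it respects the current locale, it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note: std::isupper should really be passed as a function template argument. A more general form of this function could then also split at delimiters such as ",", ";" or " ".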
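The generalisation suggested in the note can be sketched as follows. This is only an illustration: `split_before` and `is_upper_char` are hypothetical names, not part of the answer. The predicate decides where a new piece starts, and the character that triggers the split is kept at the front of the next piece, so a delimiter-dropping variant would need one more branch.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: starts a new piece whenever starts_piece(c) is true.
// The triggering character stays at the front of the new piece.
template <class String, class StartsPiece>
std::vector<String> split_before(const String& s, StartsPiece starts_piece)
{
    std::vector<String> parts;
    String current;
    for (typename String::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (starts_piece(*it) && !current.empty()) {
            parts.push_back(current);
            current.clear();
        }
        current += *it;
    }
    if (!current.empty())
        parts.push_back(current);
    return parts;
}

// Camel-case predicate; the cast avoids undefined behaviour for negative char values.
inline bool is_upper_char(char c)
{
    return std::isupper(static_cast<unsigned char>(c)) != 0;
}
```

Calling `split_before(std::string("AQueryTrades"), is_upper_char)` gives the three pieces "A", "Query" and "Trades"; any other predicate with the same shape can be passed instead.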

Related discussion

This answer takes the string and puts it into a vector of strings. It uses the Boost library.

#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;
boost::split(strs,"string to split", boost::is_any_of("\t "));

Here's another way of doing it:

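#include <cctype>
#include <string>
#include <vector>

using namespace std;

void split_string(const string& text, vector<string>& words)
{
string word;

for (string::size_type i = 0; i < text.size(); ++i)
{
const char ch = text[i];
// cast before calling isspace: a negative char value is undefined behaviour there
if (isspace(static_cast<unsigned char>(ch)))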
{
if (!word.empty())
{
words.push_back(word);
}
word ="";
}
else
{
word += ch;
}
}
if (!word.empty())
{
words.push_back(word);
}
}

Boost! :-)

#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace boost;

int main(int argc, char**argv) {
typedef vector < string > list_type;

list_type list;
string line;

line ="Somewhere down the road";
split(list, line, is_any_of(" "));

for(int i = 0; i < list.size(); i++)
{
cout << list[i] << endl;
}

return 0;
}

This example gives the output:

Somewhere
down
the
road

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

#include <iostream>
#include <cstring> // strtok
#include <vector>
#include <string>

using namespace std;

char one_line_string[] ="hello hi how are you nice weather we are having ok then bye";
char seps[] =" ,\t\n";
char *token;

int main()
{
    vector<string> vec_String_Lines;
    token = strtok( one_line_string, seps );

    cout <<"Extracting and storing data in a vector..\n\n";

    while( token != NULL )
    {
        vec_String_Lines.push_back(token);
        token = strtok( NULL, seps );
    }
    cout <<"Displaying end result in vector line storage..\n\n";

    for ( size_t i = 0; i < vec_String_Lines.size(); ++i)
        cout << vec_String_Lines[i] <<"\n";
    cout <<"\n\n";

    return 0;
}

I use this simpleton because our String class is "special" (i.e. not standard):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
const int l = delim.length();
int f = 0;
int i = s.indexOf(delim,f);
while (i>=0) {
String token( i-f > 0 ? s.substring(f,i-f) :"");
result.push_back(token);
f=i+l;
i = s.indexOf(delim,f);
}
String token = s.substring(f);
result.push_back(token);
}

#include <iostream>
#include <regex>

using namespace std;

int main() {
string s ="foo bar baz";
regex e("\\s+");
regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
regex_token_iterator<string::iterator> end;
while (i != end)
cout <<" [" << *i++ <<"]";
}

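In my opinion this is the closest thing to Python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (the 4th argument to the regex_token_iterator constructor) selects the parts of the sequence that are not matched, using the matches as separators.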
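To make the role of the -1 argument concrete, the same idea can be wrapped in a small function. This is a sketch under assumptions: the name `regex_split` is made up for illustration, and the function copies every piece into a `std::vector<std::string>` rather than iterating lazily.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper: returns the text between matches of `pattern`.
std::vector<std::string> regex_split(const std::string& s, const std::string& pattern)
{
    const std::regex re(pattern);
    // -1 selects the unmatched stretches, so each match acts as a separator
    std::sregex_token_iterator first(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

Passing 0 instead of -1 would yield the matches themselves (here, the runs of whitespace) rather than the text between them.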

Here's a much nicer way of doing this. It can take any character and doesn't split lines unless you want it to. No special libraries are needed (well, besides std, but who really considers that an extra library), no pointers, no references, and it's static. Just plain, simple C++.

#pragma once
#include <vector>
#include <sstream>
using namespace std;
class Helpers
{
public:
static vector<string> split(string s, char delim)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
if (s.size() == 0 || delim == 0)
return elems;
for(char c : s)
{
if(c == delim)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
}
else
temp << c;
}
if (temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}

//Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
//split at the following letters, a, b, c we would make delims="abc".
static vector<string> split(string s, string delims)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
bool found;
if(s.size() == 0 || delims.size() == 0)
return elems;
for(char c : s)
{
found = false;
for(char d : delims)
{
if (c == d)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
found = true;
break;
}
}
if(!found)
temp << c;
}
if(temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}
};

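I wrote the following code. You can specify a delimiter, which can be a string. The result is similar to Java's String.split, with empty strings kept in the result.

For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows: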

0 <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4 <len:0>

Code:

vector <string> split(const string& str, const string& delimiter = " ") {
vector <string> tokens;

string::size_type lastPos = 0;
string::size_type pos = str.find(delimiter, lastPos);

while (string::npos != pos) {
// Found a token, add it to the vector.
cout << str.substr(lastPos, pos - lastPos) << endl;
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = pos + delimiter.size();
pos = str.find(delimiter, lastPos);
}

tokens.push_back(str.substr(lastPos, str.size() - lastPos));
return tokens;
}

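When dealing with whitespace as the separator, the obvious answer of using std::istream_iterator has already been given and voted up a lot. Of course, elements may not be separated by whitespace but by some other separator instead. I didn't spot any answer that just redefines the meaning of whitespace to be that separator and then uses the conventional approach.

The way to change what a stream considers whitespace is simply to change the stream's std::locale, using std::istream::imbue() with a std::ctype<char> facet that carries its own definition of whitespace (this can be done for std::ctype<wchar_t> too, but it is slightly different, because std::ctype<char> is table-driven while std::ctype<wchar_t> is driven by virtual functions).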

#include <iostream>
#include <algorithm> // std::copy, std::for_each
#include <iterator>
#include <sstream>
#include <locale>

struct whitespace_mask {
std::ctype_base::mask mask_table[std::ctype<char>::table_size];
whitespace_mask(std::string const& spaces) {
std::ctype_base::mask* table = this->mask_table;
std::ctype_base::mask const* tab
= std::use_facet<std::ctype<char>>(std::locale()).table();
for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
table[i] = tab[i] & ~std::ctype_base::space;
}
std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
table[c] |= std::ctype_base::space;
});
}
};
class whitespace_facet
: private whitespace_mask
, public std::ctype<char> {
public:
whitespace_facet(std::string const& spaces)
: whitespace_mask(spaces)
, std::ctype<char>(this->mask_table) {
}
};

struct whitespace {
std::string spaces;
whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
in.imbue(loc);
return in;
}
// everything above would probably go into a utility library...

int main() {
std::istringstream in("a, b, c, d, e");
std::copy(std::istream_iterator<std::string>(in >> whitespace(",")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout,"\n"));

std::istringstream pipes("a b c| d |e e");
std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout,"\n"));
}

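Most of the code is for packaging up a general-purpose tool that provides soft delimiters: multiple delimiters in a row are merged, so there is no way to produce an empty sequence. When different delimiters are needed within one stream, you would probably use differently set-up streams that share a stream buffer: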

void f(std::istream& in) {
std::istream pipes(in.rdbuf());
pipes >> whitespace("|");
std::istream comma(in.rdbuf());
comma >> whitespace(",");

std::string s0, s1;
if (pipes >> s0 >> std::ws // read up to first pipe and ignore sequence of pipes
&& comma >> s1 >> std::ws) { // read up to first comma and ignore commas
// ...
}
}

As a hobbyist, this is the first solution that came to my mind. I'm kind of curious why I haven't seen a similar solution here yet; is there something fundamentally wrong with how I did it?

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &s, const std::string &delims)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
        auto pos2 = s.find_first_of(delims, pos);
        result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
        pos = pos2;
    }
    return result;
}

int main()
{
    std::string text = "And then I said: \"I don't get it, why would you even do that?\"";
    // ...
}

<div class="suo-content">[collapse title=""]<ul><li>This works well!</li></ul>[/collapse]</div><hr><P>I use the following code:</P>[cc lang="cpp"]namespace Core
{
typedef std::wstring String;

void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
{
if (splitter.empty())
{
throw std::invalid_argument("splitter must not be empty"); // for example
}

std::list<Core::String> lines;

Core::String::size_type offset = 0;

for (;;)
{
Core::String::size_type splitterPos = input.find(splitter, offset);

if (splitterPos != Core::String::npos)
{
lines.push_back(input.substr(offset, splitterPos - offset));
offset = splitterPos + splitter.size();
}
else
{
lines.push_back(input.substr(offset));
break;
}
}

lines.swap(output);
}
}

// gtest:

class SplitStringTest: public testing::Test
{
};

TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
std::list<Core::String> result;
ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}

TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
std::list<Core::String> result;
ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}

TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
std::list<Core::String> result;
Core::SplitString(Core::String(), Core::String(L","), result);
ASSERT_EQ(1, result.size());
ASSERT_EQ(Core::String(), *result.begin());
}

TEST_F(SplitStringTest, OneCharSplitter)
{
std::list<Core::String> result;

Core::SplitString(L"x,y", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y", *result.rbegin());

Core::SplitString(L",xy", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L"xy", *result.rbegin());

Core::SplitString(L"xy,", L",", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"xy", *result.begin());
ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, TwoCharsSplitter)
{
std::list<Core::String> result;

Core::SplitString(L"x,.y,z", L",.", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y,z", *result.rbegin());

Core::SplitString(L"x,,y,z", L",,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y,z", *result.rbegin());
}

TEST_F(SplitStringTest, RecursiveSplitter)
{
std::list<Core::String> result;

Core::SplitString(L",,,", L",,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L",", *result.rbegin());

Core::SplitString(L",.,.,", L",.,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(L".,", *result.rbegin());

Core::SplitString(L"x,.,.,y", L",.,", result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L".,y", *result.rbegin());

Core::SplitString(L",.,,.,", L",.,", result);
ASSERT_EQ(3, result.size());
ASSERT_EQ(Core::String(), *result.begin());
ASSERT_EQ(Core::String(), *(++result.begin()));
ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, NullTerminators)
{
std::list<Core::String> result;

Core::SplitString(L"xy", Core::String(L"\0", 1), result);
ASSERT_EQ(1, result.size());
ASSERT_EQ(L"xy", *result.begin());

Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
ASSERT_EQ(2, result.size());
ASSERT_EQ(L"x", *result.begin());
ASSERT_EQ(L"y", *result.rbegin());
}

Here's my version, based on Kev's source:

#include <string>
#include <vector>

using namespace std;

void split(vector<string> &result, const string &str, char delim) {
    string tmp;
    result.clear();

    // stop at end(): dereferencing the end iterator is undefined behaviour
    for (string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if (*i != delim) {
            tmp += *i;
        } else {
            result.push_back(tmp);
            tmp.clear();
        }
    }
    result.push_back(tmp); // the token after the last delimiter
}

After that, call the function and do something with the result:

vector<string> hosts;
split(hosts,"192.168.1.2,192.168.1.3", ',');
for( size_t i = 0; i < hosts.size(); i++){
cout << "Connecting host :" << hosts.at(i) <<"..." << endl;
}

Here's my solution using C++11 and the STL. It should be reasonably efficient:

#include <vector>
#include <string>
#include <cctype> // std::isspace
#include <iostream>
#include <algorithm> // std::find_if
#include <functional>

std::vector<std::string> split(const std::string& s)
{
std::vector<std::string> v;

const auto end = s.end();
auto to = s.begin();
decltype(to) from;

while((from = std::find_if(to, end,
[](char c){ return !std::isspace(c); })) != end)
{
to = std::find_if(from, end, [](char c){ return std::isspace(c); });
v.emplace_back(from, to);
}

return v;
}

int main()
{
std::string s ="this is the string to split";

auto v = split(s);

for(auto&& s: v)
std::cout << s << '\n';
}

Output:

this
is
the
string
to
split

Related discussion

A quick version that uses vector as the base class, giving full access to all of its operators:

// Split string into parts.
class Split : public std::vector<std::string>
{
public:
Split(const std::string& str, const char* delimList)
{
size_t lastPos = 0;
size_t pos = str.find_first_of(delimList);

while (pos != std::string::npos)
{
if (pos != lastPos)
push_back(str.substr(lastPos, pos-lastPos));
lastPos = pos + 1;
pos = str.find_first_of(delimList, lastPos);
}
if (lastPos < str.length())
push_back(str.substr(lastPos, pos-lastPos));
}
};

Example used to populate an STL set:

std::set<std::string> words;
Split split("Hello,World",",");
words.insert(split.begin(), split.end());

Related discussion

LazyStringSplitter:

#include <string>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
    string text;            // owned copy, so nothing dangles when a temporary is passed in
    string::size_type pos;  // index of the next unread character
    unordered_set<char> chop;

    // advance past any run of delimiter characters
    void skip_delims()
    {
        while (pos < text.size() && chop.count(text[pos]) == 1)
            ++pos;
    }

public:

    // Empty Constructor
    LazyStringSplitter()
    : pos(0)
    {}

    LazyStringSplitter(const string& cstr, const string& delims)
    : text(cstr)
    , pos(0)
    , chop(delims.begin(), delims.end())
    {
        skip_delims();
    }

    void operator () (const string& cstr, const string& delims)
    {
        chop.insert(delims.begin(), delims.end());
        text = cstr;
        pos = 0;
        skip_delims();
    }

    // true once no further token is available
    bool empty() const { return pos >= text.size(); }

    string next()
    {
        // return empty string
        // if ran out of characters
        if (empty())
            return string("");

        // the token runs up to the next delimiter or the end of the text
        const string::size_type first = pos;
        while (pos < text.size() && chop.count(text[pos]) == 0)
            ++pos;
        string ret = text.substr(first, pos - first);

        // Never return empty string: delimiters are skipped eagerly,
        // so the next call starts on a token or finds the text exhausted
        skip_delims();
        return ret;
    }
};

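I call this LazyStringSplitter for one reason: it does not split the string in one go.
In essence it behaves like a Python generator.
It exposes a method called next, which returns the next string split off from the original.
It uses unordered_set from the C++11 STL, so that looking up delimiters is that much faster.
Here is how it works: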

Test program

#include <iostream>
using namespace std;

int main()
{
LazyStringSplitter splitter;

// split at the characters ' ', '!', '.', ','
splitter("This, is a string. And here is another string! Let's test and see how well this does."," !.,");

while (!splitter.empty())
cout << splitter.next() << endl;
return 0;
}

Output

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

The next level of improvement would be to implement begin and end methods, so that one can do something like:

vector<string> split_string(splitter.begin(), splitter.end());

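Related discussion

I had been looking for a way to split a string by a separator of any length, so I started writing one from scratch, as the existing solutions didn't suit me.

Here is my little algorithm, using only the STL: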

//use like this
//std::vector<std::wstring> vec = Split<std::wstring> (L"Hello##world##!", L"##");

template <typename valueType>
static std::vector <valueType> Split (valueType text, const valueType& delimiter)
{
std::vector <valueType> tokens;
size_t pos = 0;
valueType token;

while ((pos = text.find(delimiter)) != valueType::npos)
{
token = text.substr(0, pos);
tokens.push_back (token);
text.erase(0, pos + delimiter.length());
}
tokens.push_back (text);

return tokens;
}

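As far as I have tested, it works with a separator of any length and form. Instantiate it with either the string or the wstring type.

All the algorithm does is search for the delimiter, take the part of the string up to the delimiter, delete the delimiter, and search again until it finds no more.

Of course, you can use any number of whitespace characters as the delimiter.

I hope it helps.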
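The search, take and delete loop described above erases from the front of the text on every hit. A variant that keeps a running offset instead and leaves the input untouched might look like the sketch below; the name `split_by` and the empty-delimiter guard are assumptions added here for illustration, not part of the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: same results as the erase-based loop, but the
// input is only read, with `start` marking where the next token begins.
template <typename StringT>
std::vector<StringT> split_by(const StringT& text, const StringT& delimiter)
{
    std::vector<StringT> tokens;
    if (delimiter.empty()) {          // guard: an empty delimiter would never advance
        tokens.push_back(text);
        return tokens;
    }
    typename StringT::size_type start = 0;
    typename StringT::size_type pos;
    while ((pos = text.find(delimiter, start)) != StringT::npos) {
        tokens.push_back(text.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(text.substr(start)); // whatever follows the last delimiter
    return tokens;
}
```

As with the original, it can be instantiated for `std::string` or `std::wstring`, and adjacent delimiters produce empty tokens.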

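Related discussion

No Boost, no string streams, just the standard C library cooperating with std::string and std::list: C library functions for easy analysis, C++ data types for easy memory management.

Whitespace is considered to be any combination of newlines, tabs and spaces. The set of whitespace characters is established by the wschars variable.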

#include <string>
#include <list>
#include <iostream>
#include <cstring>

using namespace std;

const char *wschars ="\t\n ";

list<string> split(const string &str)
{
const char *cstr = str.c_str();
list<string> out;

while (*cstr) { // while remaining string not empty
size_t toklen;
cstr += strspn(cstr, wschars); // skip leading whitespace
toklen = strcspn(cstr, wschars); // figure out token length
if (toklen) // if we have a token, add to list
out.push_back(string(cstr, toklen));
cstr += toklen; // skip over token
}

// ran out of string; return list

return out;
}

int main(int argc, char **argv)
{
list<string> li = split(argv[1]);
for (list<string>::iterator i = li.begin(); i != li.end(); i++)
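cout <<"{" << *i <<"}" << endl;

return 0;
}

<div class="suo-content">[collapse title=""]<ul><li>Please use std::vector instead of list.</li><li>@fmuecke The question does not require a specific representation for the string pieces, so there is no need to incorporate your suggestion into the answer.</li></ul>[/collapse]</div><hr><P>Here is a function I wrote that helps me do a lot. It helped me when doing protocol work for <wyn>WebSockets</wyn>.</P>[cc lang="cpp"]using namespace std;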
#include <iostream>
#include <vector>
#include <sstream>
#include <string>

vector<string> split ( string input , string split_id ) {
vector<string> result;
int i = 0;
bool add;
string temp;
stringstream ss;
size_t found;
string real;
int r = 0;
while ( i != input.length() ) {
add = false;
ss << input.at(i);
temp = ss.str();
found = temp.find(split_id);
if ( found != string::npos ) {
add = true;
real.append ( temp , 0 , found );
} else if ( r > 0 && ( i+1 ) == input.length() ) {
add = true;
real.append ( temp , 0 , found );
}
if ( add ) {
result.push_back(real);
ss.str(string());
ss.clear();
temp.clear();
real.clear();
r = 0;
}
i++;
r++;
}
return result;
}

int main() {
string s ="S,o,m,e,w,h,e,r,e, down the road\n"
"In a really big C++ house.\n"
"Lives a little old lady.\n"
"That no one ever knew.\n"
"She comes outside.\n"
"In the very hot sun.\n"
"\n"
"And throws C++ at us.\n"
"The End. FIN.";
vector < string > Token;
Token = split ( s ,"," );
for ( int i = 0 ; i < Token.size(); i++) cout << Token.at(i) << endl;
cout << endl << Token.size();
int a;
cin >> a;
return a;
}

For those who need an alternative way to split a string with a string delimiter, you can try the solution below.

std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
std::vector<size_t> founds;

if(!search.empty())
{
size_t start_pos = 0;

while (true)
{
size_t found_pos = target.find(search, start_pos);

if(found_pos != std::string::npos)
{
size_t found = found_pos;

founds.push_back(found);

start_pos = (found_pos + 1);
}
else
{
break;
}
}
}

return founds;
}

std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
std::string sub;

size_t size = target.length();

const char* copy = target.c_str();

for(size_t i = begin_index; i <= end_index; i++)
{
if(i >= size)
{
break;
}
else
{
char c = copy[i];

sub += c;
}
}

return sub;
}

std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
std::vector<std::string> splits;

if(!delimiter.empty())
{
std::vector<size_t> founds = str_pos(delimiter, target);

size_t founds_size = founds.size();

if(founds_size > 0)
{
size_t search_len = delimiter.length();

size_t begin_index = 0;

for(int i = 0; i <= founds_size; i++)
{
std::string sub;

if(i != founds_size)
{
size_t pos = founds.at(i);

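// guard: when pos <= begin_index the piece is empty, and pos - 1 would wrap around for pos == 0
sub = (pos > begin_index) ? str_sub_index(begin_index, pos - 1, target) : std::string();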

begin_index = (pos + search_len);
}
else
{
sub = str_sub_index(begin_index, (target.length() - 1), target);
}

splits.push_back(sub);
}
}
}

return splits;
}

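These snippets consist of 3 functions. The bad news is that to use the str_split function you need the other two functions as well. Yes, it is a big chunk of code. But the good news is that the two additional functions work independently and can sometimes be useful on their own. :)

I tested the functions in a main() block like this: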

int main()
{
std::string s ="Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";

std::vector<std::string> split = str_split("world", s);

for(int i = 0; i < split.size(); i++)
{
std::cout << split[i] << std::endl;
}
}

It produces:

Hello,
! We need to make the
a better place. Because your
is also my
, and our children's
.

I believe this is not the most efficient code, but at least it works. I hope it helps.

Here's my approach to this problem:

vector<string> get_tokens(const string& str) {
    vector<string> dt;
    stringstream ss(str);
    string tmp;
    // test the extraction itself: looping on eof() repeats the last
    // token when the input ends in whitespace
    while (ss >> tmp) {
        dt.push_back(tmp);
    }
    return dt;
}

This function returns a vector of strings.

Yes, I looked through all 30 examples.

I couldn't find a version of split that works with multi-character delimiters, so here's mine:

#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &str, const string &delim)
{
const auto delim_pos = str.find(delim);

if (delim_pos == string::npos)
return {str};

vector<string> ret{str.substr(0, delim_pos)};
auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);

ret.insert(ret.end(), tail.begin(), tail.end());

return ret;
}

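It is probably not the most efficient implementation, but it is a very straightforward recursive solution that uses only <string> and <vector>.

Ah, it's written in C++11, but there is nothing special about this code, so you could easily adapt it to C++98.

I use the following: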

void split(string in, vector<string>& parts, char separator) {
string::iterator ts, curr;
ts = curr = in.begin();
for(; curr <= in.end(); curr++ ) {
if( (curr == in.end() || *curr == separator) && curr > ts )
parts.push_back( string( ts, curr ));
if( curr == in.end() )
break;
if( *curr == separator ) ts = curr + 1;
}
}

Plasmah, I forgot to include the extra check (curr > ts) that removes tokens consisting only of whitespace.

Related discussion

Here's my version:

#include <vector>

inline std::vector<std::string> Split(const std::string &str, const std::string &delim ="")
{
std::vector<std::string> tokens;
if (str.size() > 0)
{
if (delim.size() > 0)
{
std::string::size_type currPos = 0, prevPos = 0;
while ((currPos = str.find(delim, prevPos)) != std::string::npos)
{
std::string item = str.substr(prevPos, currPos - prevPos);
if (item.size() > 0)
{
tokens.push_back(item);
}
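prevPos = currPos + delim.size(); // step past the whole delimiter, not just one character
}
if (prevPos < str.size())
{
tokens.push_back(str.substr(prevPos)); // trailing token, skipped when empty
}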
}
else
{
tokens.push_back(str);
}
}
return tokens;
}

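It works with multi-character delimiters. It prevents empty tokens from getting into the results. It uses a single header. When you don't provide a delimiter, it returns the string as a single token. If the string is empty, it returns an empty result. Unfortunately, it may be inefficient because of the large copy of the returned vector, unless you compile with C++11, which should use move semantics. In C++11 this code should be fast.

Loop on getline with ' ' as the delimiter.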
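The getline suggestion is terse, so here is a minimal sketch of what such a loop could look like. The function name `split_on_space` is made up for illustration, and skipping empty items (which appear when two spaces are adjacent) is an assumption about the desired behaviour.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads space-terminated items from a string stream.
std::vector<std::string> split_on_space(const std::string& line)
{
    std::vector<std::string> words;
    std::istringstream in(line);
    std::string word;
    while (std::getline(in, word, ' ')) {
        if (!word.empty())        // consecutive spaces yield empty items; drop them
            words.push_back(word);
    }
    return words;
}
```

Unlike `operator>>`, this treats only the space character as a separator, so tabs and newlines stay inside the words.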

Here's my entry:

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
ForwardIter s_first, ForwardIter s_last)
{
Container output;

while (true) {
auto pos = std::find_first_of(first, last, s_first, s_last);
output.emplace_back(first, pos);
if (pos == last) {
break;
}

first = ++pos;
}

return output;
}

template <typename Output = std::vector<std::string>,
typename Input = std::string,
typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
using std::cbegin;
using std::cend;
return split<Output>(cbegin(input), cend(input),
cbegin(delims), cend(delims));
}

auto vec = split("Mary had a little lamb");

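The first definition is an STL-style generic function that takes two pairs of iterators. The second is a convenience function that saves you from writing all the begin() and end() calls yourself. You can also specify the output container type as a template parameter if you want to use a list, for example.

What makes it elegant (IMO) is that, unlike most of the other answers, it is not restricted to strings but works with any STL-compatible container. Without changing the code above, you can say: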

using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});

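This splits the vector v into separate vectors every time a 0 or a 2 is encountered.

(There is also the added bonus that, with strings, this implementation is faster than both the strtok()-based and the getline()-based versions, at least on my system.)

I believe no one has posted this solution yet. Rather than using delimiters directly, it does basically the same as boost::split(), i.e. it lets you pass a predicate that returns true if a char is a delimiter and false otherwise. I think this gives the programmer a lot more control, and the best part is that you don't need Boost.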

template <class Container, class String, class Predicate>
void split(Container& output, const String& input,
const Predicate& pred, bool trimEmpty = false) {
auto it = begin(input);
auto itLast = it;
while (it = find_if(it, end(input), pred), it != end(input)) {
if (not (trimEmpty and it == itLast)) {
output.emplace_back(itLast, it);
}
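++it;
itLast = it;
}
// the loop stops at the last delimiter, so emit the remaining tail here
if (not (trimEmpty and itLast == end(input))) {
output.emplace_back(itLast, end(input));
}
}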

Then you can use it like this:

struct Delim {
bool operator()(char c) {
return not isalpha(c);
}
};

int main() {
string s("#include<iostream>\n"
"int main() { std::cout << \"Hello world!\" << std::endl; }");

vector<string> v;

split(v, s, Delim(), true);
/* Which is also the same as */
split(v, s, [](char c) { return not isalpha(c); }, true);

for (const auto& i : v) {
cout << i << endl;
}
}

Using std::string_view and Eric Niebler's range-v3 library:

https://wandbox.org/permlink/kw5lwrcl1pxjp2pw

#include <iostream>
#include <string>
#include <string_view>
#include"range/v3/view.hpp"
#include"range/v3/algorithm.hpp"

int main() {
std::string s ="Somewhere down the range v3 library";
ranges::for_each(s
| ranges::view::split(' ')
| ranges::view::transform([](auto &&sub) {
return std::string_view(&*sub.begin(), ranges::distance(sub));
}),
[](auto s) {std::cout <<"Substring:" << s <<"\n";}
);
}
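For readers without range-v3, a similar copy-free split can be sketched with std::string_view alone (C++17). This is an illustration rather than the answer's method: `split_views` is a hypothetical name, and the returned views are only valid while the original string is alive.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into `text`, one per delimited piece.
// No characters are copied; the caller must keep the underlying buffer alive.
std::vector<std::string_view> split_views(std::string_view text, char delim)
{
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start)); // final piece, possibly empty
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

Each element can be streamed or compared like a string; converting one to `std::string` is the point at which a copy is made.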

I've just written a nice example of how to split a char array by a symbol and place each resulting piece (the words separated by that symbol) into a vector. For simplicity, I made the vector's element type std::string.

I hope this helps and that you find it readable.

#include <vector>
#include <string>
#include <cstring> // strlen
#include <iostream>

void push(std::vector<std::string> &WORDS, std::string &TMP){
WORDS.push_back(TMP);
TMP ="";
}
std::vector<std::string> mySplit(char STRING[]){
std::vector<std::string> words;
std::string s;
for(unsigned short i = 0; i < strlen(STRING); i++){
if(STRING[i] != ' '){
s += STRING[i];
}else{
push(words, s);
}
}
push(words, s);//Used to get last split
return words;
}

int main(){
char string[] ="My awesome string.";
std::cout << mySplit(string)[2];
std::cin.get();
return 0;
}

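Based on Galik's answer I made this. It's mostly here so I don't have to keep writing it again and again. It's crazy that C++ still doesn't have a native split function. Features:

Should be very fast.
Easy to understand (I think).
Merges empty sections.
Trivial to use several delimiters (e.g. "\r\n").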

#include <string>
#include <vector>
#include <cstddef> // size_t

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
using namespace std;

vector<string> v;

// Start of an element.
size_t elemStart = 0;

// We start searching from the end of the previous element, which
// initially is the start of the string.
size_t elemEnd = 0;

// Find the first non-delim, i.e. the start of an element, after the end of the previous element.
while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
{
// Find the first delem, i.e. the end of the element (or if this fails it is the end of the string).
elemEnd = s.find_first_of(delims, elemStart);
// Add it.
v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
}
// When there are no more non-spaces, we are done.

return v;
}

// adapted from a"regular" csv parse
std::string stringIn ="my csv is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
if (stringIn[i] == ' ') {
commaSeparated.push_back("");
commaCounter++;
} else {
commaSeparated.at(commaCounter) += stringIn[i];
}
}

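In the end you get a vector of strings in which each element is one space-separated piece of the sentence. The only non-standard resource is std::vector (but since std::string is involved anyway, I figured that would be acceptable).

Empty strings are saved as separate items.

Here's my take on this. I had to process the input string word by word, which could have been done by using the spaces to count words, but I felt that would be tedious and that I should split the words into a vector instead.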

#include<iostream>
#include<vector>
#include<string>
#include<stdio.h>
using namespace std;
int main()
{
char x = '\0';
string s ="";
vector<string> q;
x = getchar();
while(x != '\n')
{
if(x == ' ')
{
q.push_back(s);
s ="";
x = getchar();
continue;
}
s = s + x;
x = getchar();
}
q.push_back(s);
for(int i = 0; i<q.size(); i++)
cout<<q[i]<<" ";
return 0;
}

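It doesn't take multiple spaces into account.

If the last word is not immediately followed by a newline, the result includes the whitespace between the last word's last character and the newline.

For convenience: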

template<class V, typename T>
bool in(const V &v, const T &el) {
return std::find(v.begin(), v.end(), el) != v.end();
}

The actual split based on multiple delimiters:

std::vector<std::string> split(const std::string &s,
const std::vector<char> &delims) {
std::vector<std::string> res;
auto stuff = [&delims](char c) { return !in(delims, c); };
auto space = [&delims](char c) { return in(delims, c); };
auto first = std::find_if(s.begin(), s.end(), stuff);
while (first != s.end()) {
auto last = std::find_if(first, s.end(), space);
res.push_back(std::string(first, last));
first = std::find_if(last, s.end(), stuff); // last may equal s.end(), so never step past it
}
return res;
}

Usage:

int main() {
std::string s =" aaa, bb cc";
for (auto el: split(s, {' ', ','}))
std::cout << el << std::endl;
return 0;
}

We can use strtok in C++:

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
char *pch = strtok (str,";,");
while (pch != NULL)
{
cout<<pch<<"\n";
pch = strtok (NULL,";,");
}
return 0;
}

My code is:

#include <list>
#include <string>
template<class StringType = std::string, class ContainerType = std::list<StringType> >
class DSplitString:public ContainerType
{
public:
explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
{
size_t iPos = 0;
size_t iPos_char = 0;
while(StringType::npos != (iPos_char = strString.find(cChar, iPos)))
{
AddPart(strString.substr(iPos, iPos_char - iPos), bSkipEmptyParts);
iPos = iPos_char + 1;
}
AddPart(strString.substr(iPos), bSkipEmptyParts); // text after the last delimiter
}
explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
{
size_t iPos = 0;
size_t iPos_char = 0;
while(StringType::npos != (iPos_char = strString.find(strSub, iPos)))
{
AddPart(strString.substr(iPos, iPos_char - iPos), bSkipEmptyParts);
iPos = iPos_char + strSub.length();
}
AddPart(strString.substr(iPos), bSkipEmptyParts); // text after the last delimiter
}
private:
void AddPart(const StringType& strPart, bool bSkipEmptyParts)
{
if(!bSkipEmptyParts || !strPart.empty())
this->push_back(strPart); // this-> is needed because push_back comes from a dependent base class
}
};

Example:

#include <iostream>
#include <string>
int main()
{
DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
for (const std::string& var : aa) // standard range-for instead of the MSVC-only "for each"
{
std::cout << var << std::endl;
}
std::cin.get();
return 0;
}

Related discussion

#include <iostream>
#include <vector>
using namespace std;

int main() {
string str ="ABC AABCD CDDD RABC GHTTYU FR";
str += " "; //dirty hack: adding extra space to the end
vector<string> v;

for (int i=0; i<(int)str.size(); i++) {
int a, b;
a = i;

for (int j=i; j<(int)str.size(); j++) {
if (str[j] == ' ') {
b = j;
i = j;
break;
}
}
v.push_back(str.substr(a, b-a));
}

for (int i=0; i<v.size(); i++) {
cout<<v[i].size()<<" "<<v[i]<<endl;
}
return 0;
}

#include <iostream>
#include <string>
#include <deque>

std::deque<std::string> split(
const std::string& line,
std::string::value_type delimiter,
bool skipEmpty = false
) {
std::deque<std::string> parts{};

if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
parts.push_back({});
}

for (const std::string::value_type& c : line) {
if (
(
c == delimiter
&&
(skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
)
||
(c != delimiter && parts.empty())
) {
parts.push_back({});
}

if (c != delimiter) {
parts.back().push_back(c);
}
}

if (skipEmpty && !parts.empty() && parts.back().empty()) {
parts.pop_back();
}

return parts;
}

void test(const std::string& line) {
std::cout << line << std::endl;

std::cout <<"skipEmpty=0 |";
for (const std::string& part : split(line, ':')) {
std::cout << part << '|';
}
std::cout << std::endl;

std::cout <<"skipEmpty=1 |";
for (const std::string& part : split(line, ':', true)) {
std::cout << part << '|';
}
std::cout << std::endl;

std::cout << std::endl;
}

int main() {
test("foo:bar:::baz");
test("");
test("foo");
test(":");
test("::");
test(":foo");
test("::foo");
test(":foo:");
test(":foo::");

return 0;
}

Output:

foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|

skipEmpty=0 |
skipEmpty=1 |

foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|

:
skipEmpty=0 |||
skipEmpty=1 |

::
skipEmpty=0 ||||
skipEmpty=1 |

:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|

::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|

:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|

:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|

Related discussion

My general implementation for string and u32string, using the boost::algorithm::split signature.


#include <algorithm>     // std::find_if
#include <string>
#include <unordered_set>
#include <utility>       // std::swap
#include <vector>

// Splits s at every character for which predicate returns true.
// Empty tokens are kept, so an empty input yields one empty string.
template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
UnaryPredicate predicate)
{
using ST = std::basic_string<CharT>;
using std::swap;
std::vector<ST> tmp_result;
auto iter = s.cbegin(),
end_iter = s.cend();
while (true)
{
/**
* edge case: empty str -> push an empty str and exit.
*/
auto find_iter = std::find_if(iter, end_iter, predicate);
tmp_result.emplace_back(iter, find_iter);
if (find_iter == end_iter) { break; }
iter = ++find_iter;
}
swap(tmp_result, split_result);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
const std::basic_string<CharT>& char_candidate)
{
std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
char_candidate.cend());
auto predicate = [&candidate_set](const CharT& c) {
return candidate_set.count(c) > 0U;
};
return split(split_result, s, predicate);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
const std::basic_string<CharT>& s,
const CharT* literals)
{
return split(split_result, s, std::basic_string<CharT>(literals));
}
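
The answer shows no call site, so a usage sketch may help. The block below is a condensed, self-contained variant of the predicate overload that returns its result instead of filling an out-parameter. The names `split_if`, `split_csv_line` and `split_on_space` are hypothetical, chosen only to avoid clashing with the overloads above. As in the original, empty tokens are kept.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical condensed variant of the predicate overload above.
// Splits s wherever predicate(c) is true; empty tokens are kept.
template <typename CharT, typename UnaryPredicate>
std::vector<std::basic_string<CharT>> split_if(const std::basic_string<CharT>& s,
                                               UnaryPredicate predicate)
{
    std::vector<std::basic_string<CharT>> result;
    auto iter = s.cbegin();
    const auto end_iter = s.cend();
    while (true) {
        auto find_iter = std::find_if(iter, end_iter, predicate);
        result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter;  // step over the delimiter
    }
    return result;
}

// Example call site for narrow strings: split on ',' or ';'.
inline std::vector<std::string> split_csv_line(const std::string& line)
{
    return split_if(line, [](char c) { return c == ',' || c == ';'; });
}

// Example call site for UTF-32 strings: split on a single space.
inline std::vector<std::u32string> split_on_space(const std::u32string& text)
{
    return split_if(text, [](char32_t c) { return c == U' '; });
}
```

Because the character type is deduced from the string argument, the same template serves both std::string and std::u32string; only the lambda's parameter type changes.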

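This is an extension of one of the top answers. It now supports setting a maximum number of returned elements, N. The remainder of the string ends up in the Nth element. The MAXELEMENTS parameter is optional; if it is left at its default of 0, an unlimited number of elements is returned. :-)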

.h:


class Myneatclass {
public:
static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};

.cpp:


std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
std::getline(ss, item);
elems.push_back(item);
break;
}
}
return elems;
}
std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
std::vector<std::string> elems;
split(s, delim, elems, MAXELEMENTS);
return elems;
}
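
Reading the code above, a limit of 1 appears to yield two elements, because the first token is pushed before the limit is checked. The sketch below is one possible way to make the limit exact. It is not the author's method: it swaps std::getline for std::string::find, so unlike the original it also keeps empty and trailing tokens. The name `split_max` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function variant: returns at most max_elements items
// (0 means unlimited); the unsplit remainder goes into the last item.
std::vector<std::string> split_max(const std::string& s, char delim,
                                   std::size_t max_elements = 0)
{
    std::vector<std::string> elems;
    std::string::size_type start = 0;
    for (;;) {
        // When only one slot is left, stop searching and take the remainder.
        const bool last_slot = max_elements > 0 && elems.size() + 1 == max_elements;
        const std::string::size_type pos =
            last_slot ? std::string::npos : s.find(delim, start);
        if (pos == std::string::npos) {
            elems.push_back(s.substr(start));  // remainder, delimiters included
            break;
        }
        elems.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return elems;
}
```

With this sketch, split_max("a:b:c:d", ':', 2) gives "a" and "b:c:d", and a limit of 1 returns the whole input as a single element.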

If you want to split a string by several characters, you can use:


#include<iostream>
#include<string>
#include<vector>
#include<iterator>
#include<sstream>

using namespace std;
// Replaces every occurrence of the secondary dividers with the first divider,
// so that a single getline pass can split on all of them.
void replaceOtherChars(string &input, vector<char> &dividers)
{
const char divider = dividers.at(0);
vector<char>::iterator it_begin = dividers.begin()+1,
it_end= dividers.end();
for(;it_begin!=it_end;++it_begin)
{
// size_type rather than int, so the comparison with npos is well defined
string::size_type replaceIndex = 0;
while(true)
{
replaceIndex=input.find(*it_begin,replaceIndex);
if(replaceIndex==string::npos)
break;
input.at(replaceIndex)=divider;
}
}
}
vector<string> split(string str, vector<char> chars, bool missEmptySpace =true )
{
vector<string> result;
const char divider = chars.at(0);
replaceOtherChars(str,chars);
stringstream stream;
stream<<str;
string temp;
while(getline(stream,temp,divider))
{
if(missEmptySpace && temp.empty())
continue;
result.push_back(temp);
}
return result;
}
int main()
{
string str ="milk, pigs.... hot-dogs";
vector<char> arr;
arr.push_back(' '); arr.push_back(','); arr.push_back('.');
vector<string> result = split(str,arr);
vector<string>::iterator it_begin= result.begin(),
it_end= result.end();
for(;it_begin!=it_end;++it_begin)
{
cout<<*it_begin<<endl;
}
return 0;
}

Thank you, @jairo abdiel toribio cisneros. It works for me, but your function returns some empty elements. So, to return results without the empty ones, I edited it as follows:


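#include <string>
#include <vector>

// Splits str on the first character of delim and drops empty tokens.
std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    // Stop at end(): the original loop ran one step further and dereferenced end().
    for (std::string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if (*i != *delim) {
            tmp += *i;
        } else {
            if (!tmp.empty()) {
                v.push_back(tmp);
            }
            tmp.clear();
        }
    }

    // Flush the last token, which is not followed by a delimiter.
    if (!tmp.empty()) {
        v.push_back(tmp);
    }

    return v;
}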

Usage:


std::string s ="one:two::three";
std::string delim =":";
std::vector<std::string> vv = split(s, delim.c_str());

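I know I'm very late to the party, but I was thinking about the most elegant way to do this when you are given a range of delimiters rather than whitespace, using nothing but the standard library.

Here are my thoughts:

To split words into a vector of strings by a sequence of delimiters: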


template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
std::vector<std::string> result;

for (auto current = begin(input) ; current != end(input) ; )
{
auto first = find_if(current, end(input), not_in(delimiters));
if (first == end(input)) break;
auto last = find_if(first, end(input), is_in(delimiters));
result.emplace_back(first, last);
current = last;
}
return result;
}

To split the other way, by providing a sequence of valid characters:


template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
std::vector<std::string> result;

for (auto current = begin(input) ; current != end(input) ; )
{
auto first = find_if(current, end(input), is_in(valid_chars));
if (first == end(input)) break;
auto last = find_if(first, end(input), not_in(valid_chars));
result.emplace_back(first, last);
current = last;
}
return result;
}

is_in and not_in are defined as follows:


namespace detail {
template<class Container>
struct is_in {
is_in(const Container& charset)
: _charset(charset)
{}

bool operator()(char c) const
{
return find(begin(_charset), end(_charset), c) != end(_charset);
}

const Container& _charset;
};

template<class Container>
struct not_in {
not_in(const Container& charset)
: _charset(charset)
{}

bool operator()(char c) const
{
return find(begin(_charset), end(_charset), c) == end(_charset);
}

const Container& _charset;
};

}

template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
return detail::not_in<Container>(c);
}

template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
return detail::is_in<Container>(c);
}
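
For comparison, here is a minimal self-contained sketch of the same two-step scan (skip delimiters, then find the end of the word) with the two functor classes replaced by one lambda. It assumes C++11 for std::find_if_not and takes the delimiters as a std::string rather than a generic container. The name `split_by_any` is hypothetical and not part of the answer.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch: same scan as split_by_delimiters above, using one lambda.
// Runs of delimiters are skipped, so no empty tokens are produced.
std::vector<std::string> split_by_any(const std::string& input,
                                      const std::string& delimiters)
{
    const auto is_delim = [&delimiters](char c) {
        return delimiters.find(c) != std::string::npos;
    };

    std::vector<std::string> result;
    for (auto current = input.begin(); current != input.end(); ) {
        // Start of the next word: first character that is not a delimiter.
        auto first = std::find_if_not(current, input.end(), is_delim);
        if (first == input.end()) break;
        // End of the word: next delimiter, or the end of the input.
        auto last = std::find_if(first, input.end(), is_delim);
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}
```

Calling split_by_any("milk, pigs.... hot-dogs", " ,.") yields the three words milk, pigs and hot-dogs.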

My implementation could be another solution:


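std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator)
{
    std::vector<std::wstring> Lines;
    if (Seperator.empty())
    {
        // An empty separator would never advance the search position.
        Lines.push_back(String);
        return Lines;
    }
    size_t stSearchPos = 0;
    size_t stFoundPos;
    // Compare against size(), not size() - 1: the latter wraps around for an
    // empty string and drops a one-character final token.
    while (stSearchPos < String.size())
    {
        stFoundPos = String.find(Seperator, stSearchPos);
        stFoundPos = (stFoundPos == std::wstring::npos) ? String.size() : stFoundPos;
        Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));
        stSearchPos = stFoundPos + Seperator.size();
    }
    return Lines;
}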

Test code:


// The calls below assume SplitString is a static member of a class named IniFile;
// drop the IniFile:: prefix when using the free function shown above.
std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");
std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");
std::wcout << L"The string: " << MyString << std::endl;
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}
std::wcout << std::endl;
MyString = L"this,time,a,comma separated,string";
std::wcout << L"The string: " << MyString << std::endl;
Parts = IniFile::SplitString(MyString, L",");
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}

Test code output:


The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---

The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---

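I have a very different approach from the other solutions. It offers a lot of value in ways the other solutions lack, but of course it also has its own downsides. Below is the working implementation; the example use case is wrapping something around each word.

First, this problem can be solved with a single loop, no additional memory, and only four logical cases to consider. Conceptually, we are interested in boundaries. Our code should reflect that: let's iterate over the string and look at two characters at a time, keeping in mind that the start and end of the string are special cases.

The downside is that we have to write the implementation ourselves, which is somewhat verbose, though it is mostly convenient boilerplate.

The upside is that we wrote the implementation, so it is easy to customize for specific needs, such as distinguishing left and right word boundaries, using any set of delimiters, or handling other cases such as non-boundary or erroneous positions.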


#include <iostream>
#include <string>

#include <cctype>

using namespace std;

typedef enum boundary_type_e {
E_BOUNDARY_TYPE_ERROR = -1,
E_BOUNDARY_TYPE_NONE,
E_BOUNDARY_TYPE_LEFT,
E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
boundary_type_t type;
int pos;
} boundary_t;

bool is_delim_char(int c) {
return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
return ' ' <= c && c <= '~' && !is_delim_char(c);
}

// Takes the string by const reference to avoid copying it on every call.
// Brace initialization replaces the C99 compound literals, which are not standard C++.
boundary_t maybe_word_boundary(const string& str, int pos) {
int len = str.length();
if (pos < 0 || pos >= len) {
return boundary_t{E_BOUNDARY_TYPE_ERROR, 0};
} else {
if (pos == 0 && is_word_char(str[pos])) {
// if the first character is word-y, we have a left boundary at the beginning
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
} else if (pos == len - 1 && is_word_char(str[pos])) {
// if the last character is word-y, we have a right boundary left of the null terminator
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
} else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
// if we have a delimiter followed by a word char, we have a left boundary left of the word char
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
} else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
// if we have a word char followed by a delimiter, we have a right boundary right of the word char
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
}
return boundary_t{E_BOUNDARY_TYPE_NONE, 0};
}
}

int main() {
string str;
getline(cin, str);

int len = str.length();
for (int i = 0; i < len; i++) {
boundary_t boundary = maybe_word_boundary(str, i);
if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
// whatever
} else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
// whatever
}
}
}

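As you can see, the code is very easy to understand and fine-tune, and its actual usage is short and simple. Using C++ should not stop us from writing the simplest and most readily customized code possible, even if that means not using the STL. I think this is an example of what Linus Torvalds might call "taste": we have eliminated all the logic we don't need, while writing in a style that naturally allows more cases to be handled when and if the need arises.

What could improve this code is using enum class, and having maybe_word_boundary accept a function pointer to is_word_char instead of invoking is_word_char directly, so that a lambda can be passed.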
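
The improvements suggested above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: it uses enum class, takes the word-character test as a template parameter (so a lambda or a function pointer both work), and treats a position as the gap before the character at that index, which differs from the implementation above. The names `BoundaryType`, `boundary_at` and `wrap_words` are hypothetical, and the word-character test here is simply "not whitespace".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class BoundaryType { None, Left, Right };

// Classifies the gap just before index pos (0 <= pos <= str.size()).
// Characters outside the string count as non-word characters.
template <typename IsWordChar>
BoundaryType boundary_at(const std::string& str, std::size_t pos,
                         IsWordChar is_word_char)
{
    const bool before = pos > 0 && is_word_char(str[pos - 1]);
    const bool after = pos < str.size() && is_word_char(str[pos]);
    if (!before && after) return BoundaryType::Left;   // a word starts here
    if (before && !after) return BoundaryType::Right;  // a word ends here
    return BoundaryType::None;
}

// Example use: wrap every word in the given opening and closing markers.
std::string wrap_words(const std::string& str, const std::string& open,
                       const std::string& close)
{
    // Assumed word test: anything that is not whitespace.
    const auto is_word_char = [](char c) {
        return !std::isspace(static_cast<unsigned char>(c));
    };

    std::string out;
    for (std::size_t pos = 0; pos <= str.size(); ++pos) {
        const BoundaryType type = boundary_at(str, pos, is_word_char);
        if (type == BoundaryType::Left) out += open;
        if (type == BoundaryType::Right) out += close;
        if (pos < str.size()) out += str[pos];
    }
    return out;
}
```

Looking at the gap between two characters, rather than at a character and its successor, removes the special cases for the first and last index. With this sketch, wrap_words("a bc", "[", "]") produces "[a] [bc]".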

My approach here: cut and split


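#include <string>
#include <vector>
using namespace std;

// Removes and returns everything in str up to the first occurrence of del.
// If del is empty or does not occur, the whole string is returned and str is emptied.
string cut(string& str, const string& del)
{
    string f = str;
    const string::size_type pos = str.find(del);

    if (!del.empty() && pos != string::npos)
    {
        f = str.substr(0, pos);
        str = str.substr(pos + del.length());
    }
    else
    {
        str.clear();
    }

    return f;
}

vector<string> split(const string& in, const string& del = " ")
{
    vector<string> out;
    string t = in;

    // Each call to cut shortens t, so the loop always terminates.
    while (!t.empty())
        out.push_back(cut(t, del));

    return out;
}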

By the way, if there is anything that can be done to optimize this...

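There are already many good answers to this question; this is just a small detail.

Splitting a string for output is one thing, but if you split into a container class such as vector, calling reserve() beforehand can give a small performance gain, because the vector then avoids repeated reallocations (chunks allocated at different sizes) while it grows.

Even though this incurs a small cost of its own, it can be seen as a preliminary analysis step:

#include <algorithm>
size_t n = std::count(s.begin(), s.end(), ' ');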
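
To show where that count would be used, here is a minimal sketch that counts the delimiters, reserves once, and then splits with std::getline. The function name `split_reserved` is hypothetical. Whether the extra pass over the string pays off depends on the input, so treat this as an illustration and not a measured optimization.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Sketch: count the delimiters first, reserve once, then split.
std::vector<std::string> split_reserved(const std::string& s, char delim = ' ')
{
    std::vector<std::string> tokens;

    // n delimiters produce at most n + 1 tokens.
    const std::size_t n =
        static_cast<std::size_t>(std::count(s.begin(), s.end(), delim));
    tokens.reserve(n + 1);

    std::istringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        tokens.push_back(item);  // no reallocation: capacity was reserved above
    }
    return tokens;
}
```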

Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.


#include <string>
#include <vector>

std::vector<std::string> split(const std::string &input, char delimiter, bool skipEmpty = true) {
    /*
    Splits a string at each delimiter and returns these strings as a string vector.
    If the delimiter is not found then nothing is returned.
    If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
    */
    bool delimiterFound = false;
    std::string::size_type pos = 0, pPos = 0; // current and previous search positions
    std::vector<std::string> result;
    while (true) {
        pos = input.find(delimiter, pPos);
        if (pos != std::string::npos) {
            if (!skipEmpty || pos > pPos) // keep empty tokens only when asked to
                result.push_back(input.substr(pPos, pos - pPos));
            delimiterFound = true;
        } else {
            // Append the tail after the last delimiter, if there is one.
            if (pPos < input.length() && delimiterFound)
                result.push_back(input.substr(pPos));
            break;
        }
        pPos = pos + 1;
    }
    return result;
}


#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>

int main() {
using namespace std;
int n=8;
string sentence ="10 20 30 40 5 6 7 8";
istringstream iss(sentence);

vector<string> tokens;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(tokens));

for(int i=0;i<n;i++){
cout<<tokens.at(i);
}

}


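// Splits str on delim into array, writing at most arraySize entries.
// Assumes <string> is included and using namespace std; is in effect.
void splitString(const string& str, char delim, string array[], const int arraySize)
{
    string::size_type subStrStart = 0;

    for (int index = 0; index < arraySize; index++)
    {
        const string::size_type delimPosition = str.find(delim, subStrStart);
        if (delimPosition == string::npos)
        {
            // No delimiter left: the rest of the string is the last token.
            array[index] = str.substr(subStrStart);
            break;
        }
        array[index] = str.substr(subStrStart, delimPosition - subStrStart);
        subStrStart = delimPosition + 1;
    }
}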

Related discussion

For a very large and probably redundant version, try lots of for loops.


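// Assumes `sequence` is a std::string declared earlier that holds the input
// and contains at most 10 space-separated words.
string stringlist[10];
int count = 0;

for (int i = 0; i < (int)sequence.length(); i++)
{
    if (sequence[i] == ' ')
    {
        stringlist[count] = sequence.substr(0, i);
        sequence.erase(0, i+1);
        i = -1; // restart at index 0 once the loop increments i
        count++;
    }
    else if (i == (int)sequence.length()-1) // Last word
    {
        stringlist[count] = sequence.substr(0, i+1);
    }
}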

It isn't pretty, but by and large (barring punctuation and a slew of other bugs) it works!

Related discussion