How do I iterate over the words of a string?
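I'm trying to iterate over the words of a string.
The string can be assumed to be composed of words separated by whitespace.
Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.
The best solution I have right now is:
```cpp
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);

    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
```
Is there a more elegant way to do this?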
I use this to split a string by a delimiter. The first overload puts the results into a pre-constructed container through an output iterator; the second returns a new vector.
```cpp
#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        *(result++) = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}
```
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
```cpp
std::vector<std::string> x = split("one:two::three", ':');
```
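The empty-token behaviour noted above is easy to check. The following is a minimal sketch; `split_keep_empty` and `split_skip_empty` are hypothetical names introduced here for illustration (they are not part of the answer), and both use the same `std::getline` loop:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same std::getline loop as the answer's split(): empty tokens are kept.
std::vector<std::string> split_keep_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Hypothetical variant: identical loop, but empty tokens are dropped.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

For `"one:two::three"` the first function yields four items (the third is empty) and the second yields three.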
For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.
```cpp
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}
```
Instead of copying the extracted tokens to an output stream, the same generic `copy` algorithm can insert them into a container:
```cpp
vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));
```
... or create the `vector` directly:
```cpp
vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
```
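A possible solution using Boost might be:
```cpp
#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;
// Split on tabs and spaces.
boost::split(strs, "string to split", boost::is_any_of("\t "));
```
This approach might be even faster than the `stringstream` approach.
See the documentation for details.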
```cpp
#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                  // Have a buffer string
    std::stringstream ss(str);        // Insert the string into a stream

    std::vector<std::string> tokens;  // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}
```
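For those who are not comfortable sacrificing all efficiency for code size, and who see "efficient" as a kind of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):
```cpp
#include <string>

template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type  = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
        {
            pos = length;  // no more delimiters: the last token runs to the end
        }

        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));

        lastPos = pos + 1;
    }
}
```
I usually choose to use …
It is more than twice as fast as the fastest tokenizer on this page and almost 5 times faster than some others. Also, with the right parameter types you can eliminate all string and list copies for additional speed.
Additionally, it does not return the result (which is extremely inefficient); instead the token container is passed by reference, which also allows you to build up tokens over multiple calls if you wish.
Lastly, it lets you specify whether to trim empty tokens from the results via the last optional parameter.
All it needs is `std::string`.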
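As a usage sketch for the points above (passing the container by reference, accumulating over several calls, and trimming empty tokens): the `tokenize()` template is repeated so the example compiles on its own, and `collect_words` is a hypothetical helper name introduced only for this illustration.

```cpp
#include <list>
#include <string>
#include <vector>

// The tokenize() template from the answer, repeated so the sketch is self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false) {
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical helper: fills one std::list from two separate calls.
std::list<std::string> collect_words() {
    std::list<std::string> words;
    tokenize("alpha  beta", words, " ", true);  // double space: the empty token is trimmed
    tokenize("gamma,delta", words, ",", true);  // second call appends to the same list
    return words;
}
```

With `trimEmpty` left at its default of `false`, the input `"a  b"` produces three tokens, the middle one empty.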
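Here's another solution. It's compact and reasonably efficient:
```cpp
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));  // whatever follows the last separator
    return tokens;
}
```
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting `""` results in a single empty string, and splitting `","` (i.e. just the separator) results in two empty strings.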
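The two remarks above (templating, and the behaviour on empty input) can be sketched together. This is only one possible generalisation, not the answer's code: `split_on` is a hypothetical name, and treating an empty separator as "do not split" is an assumption made here to avoid an endless loop.

```cpp
#include <string>
#include <vector>

// Hypothetical generalisation: works for std::string, std::wstring, ... and
// takes a whole string as the separator.
template <typename CharT>
std::vector<std::basic_string<CharT>> split_on(
        const std::basic_string<CharT>& text,
        const std::basic_string<CharT>& sep) {
    std::vector<std::basic_string<CharT>> tokens;
    if (sep.empty()) {          // assumption: an empty separator means "no split"
        tokens.push_back(text);
        return tokens;
    }
    typename std::basic_string<CharT>::size_type start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::basic_string<CharT>::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();   // skip the whole separator, not one character
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Called as `split_on<char>("", ",")` it returns one empty string, and as `split_on<char>(",", ",")` it returns two, matching the note above. The character type is spelled out because string literals alone do not let the compiler deduce `CharT`.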
It can also be easily extended to skip empty tokens:
```cpp
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // Only add the tail if it is non-empty (e.g. not after a trailing separator).
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
```
If you need to split a string at multiple delimiters while skipping empty tokens, this version may be used:
```cpp
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while ((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if (start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}
```
This is my favorite way to iterate through a string. You can do whatever you want per word.
```cpp
string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while (iss >> word)
{
    // Do something on `word` here...
}
```
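As one example of "doing something per word", the loop above could count word frequencies. A minimal sketch; `count_words` is a hypothetical helper introduced for illustration:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts how often each whitespace-separated word occurs.
std::map<std::string, int> count_words(const std::string& line) {
    std::map<std::string, int> counts;
    std::istringstream iss(line);
    std::string word;
    while (iss >> word) {   // operator>> skips any run of whitespace
        ++counts[word];
    }
    return counts;
}
```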
This is similar to the Stack Overflow question "How do I tokenize a string in C++?".
```cpp
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
```
I like the following because it puts the results into a vector, supports a string as the delimiter, and gives control over keeping empty values. But it doesn't look as good as a result.
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
```
Of course, Boost has a … that works partially like that.
The STL does not have such a method available yet.
However, you can use a function such as the following:
```cpp
void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next "non-delimiter"
        pos = str.find_first_of(delimiters, lastPos);
    }
}
```
Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
If you have questions about the code sample, leave a comment and I will explain.
And just because it does not implement a … called iterator …
Don't get sold on this "elegance over performance" deal.
Here is a split function that:
- is generic
- uses standard C++ (no Boost)
- accepts multiple delimiters
- ignores empty tokens (can easily be changed)
```cpp
#include <string>
#include <vector>
using namespace std;

template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
    vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while (pos != T::npos) {
        if (pos != start) // ignore empty tokens
            v.emplace_back(str, start, pos - start);
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    if (start < str.length()) // ignore trailing delimiter
        v.emplace_back(str, start, str.length() - start); // add what's left of the string
    return v;
}
```
Example usage:
```cpp
vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> w = split<wstring>(L"Hello, there; World", L";,");
```
I have a two-line solution to this problem:
```cpp
char sep = ' ';
std::string s = "1 This is an example";

for (size_t p = 0, q = 0; p != s.npos; p = q)
    std::cout << s.substr(p + (p != 0), (q = s.find(sep, p + 1)) - p - (p != 0)) << std::endl;
```
Then, instead of printing, you can put the tokens in a vector.
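A sketch of that last suggestion, assuming a hypothetical function name `split_words`. It follows the same `find`/`substr` idea but is written as an explicit loop rather than the compact `for` above, so a separator at position 0 is handled slightly differently (it yields a leading empty token here):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: walk the string with find(), collecting each piece instead of printing it.
std::vector<std::string> split_words(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::size_t p = 0;
    while (true) {
        std::size_t q = s.find(sep, p);
        // When no separator is left, q is npos and substr() takes the rest.
        out.push_back(s.substr(p, q == std::string::npos ? q : q - p));
        if (q == std::string::npos) break;
        p = q + 1;
    }
    return out;
}
```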
Yet another flexible and fast way:
```cpp
#include <cstring>

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);   // hand the token (pointer + length) to the callback
        }
        s = e + 1;
    }
}
```
To use it with a vector of strings (Edit: since someone pointed out not to inherit from STL classes... hrmf ;) ):
```cpp
template<class ContainerType>
class Appender {
public:
    Appender(ContainerType& container) : container_(container) {;}
    void operator() (const char* s, unsigned length) {
        container_.push_back(std::string(s, length));
    }
private:
    ContainerType& container_;
};

std::vector<std::string> strVector;
Appender<std::vector<std::string>> v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");
```
That's it! And that's just one way to use the tokenizer; here is how to just count words:
```cpp
#include <cassert>

class WordCounter {
public:
    WordCounter() : noOfWords(0) {}
    void operator() (const char*, unsigned) {
        ++noOfWords;
    }
    unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
assert(wc.noOfWords == 7);
```
Limited by imagination ;)
Here's a simple solution that uses only the standard regex library:
```cpp
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize(const std::string str, const std::regex regex)
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it(str.begin(), str.end(), regex, -1);
    sregex_token_iterator reg_end;

    for (; it != reg_end; ++it) {
        if (!it->str().empty()) // token could be empty: check
            result.emplace_back(it->str());
    }

    return result;
}
```
The regex argument allows checking for multiple delimiters (spaces, commas, etc.).
I usually only need to split on spaces and commas, so I also have this default function:
```cpp
std::vector<std::string> TokenizeDefault(const std::string str)
{
    using namespace std;

    regex re("[\\s,]+");

    return Tokenize(str, re);
}
```
The `"[\\s,]+"` pattern matches runs of whitespace (`\\s`) and commas (`,`).
Note, if you want to split `wstring` instead of `string`:
- change all `std::regex` to `std::wregex`
- change all `sregex_token_iterator` to `wsregex_token_iterator`
Note, you might also want to take the string argument by reference, depending on your compiler.
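Applying the two substitutions listed above gives roughly the following wide-character version. This is a sketch: `TokenizeW` is a hypothetical name, and the arguments are taken by const reference as the final note suggests.

```cpp
#include <regex>
#include <string>
#include <vector>

// Wide-character sketch: std::wregex and std::wsregex_token_iterator
// replace their narrow counterparts; everything else is unchanged.
std::vector<std::wstring> TokenizeW(const std::wstring& str,
                                    const std::wregex& regex) {
    std::vector<std::wstring> result;
    std::wsregex_token_iterator it(str.begin(), str.end(), regex, -1);
    std::wsregex_token_iterator reg_end;
    for (; it != reg_end; ++it) {
        if (!it->str().empty())   // a leading separator yields an empty token
            result.emplace_back(it->str());
    }
    return result;
}
```

For example, `TokenizeW(L"a, b,,c", std::wregex(L"[\\s,]+"))` yields the three tokens `a`, `b` and `c`.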
You can also use `std::string::find()` together with `std::string::substr()`.
Here's an example:
```cpp
#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;

    while ((pos = s.find(' ', pos)) != std::string::npos)
    {
        std::string substring(s.substr(prev_pos, pos - prev_pos));

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }

    std::string substring(s.substr(prev_pos, pos - prev_pos)); // Last word
    std::cout << substring << '\n';

    return 0;
}
```
If you like to use Boost, but want to use a whole string as the delimiter (instead of single characters, as in most of the previously proposed solutions), you can use Boost's `split_iterator`.
Example code including a convenient template:
```cpp
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

template<typename OutputIterator>
inline
void split(
    const std::string& str,
    const std::string& delim,
    OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;

    for (It iter = make_split_iterator(str, first_finder(delim, is_equal()));
         iter != It();
         ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;

    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
```
There is a function named `strtok` (used below in its reentrant POSIX form, `strtok_r`).
```cpp
#include <cstring>
#include <string>
#include <vector>
using namespace std;

// Note: strtok_r modifies `str` in place.
vector<string> split(char* str, const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str, delim, &saveptr);

    vector<string> result;

    while (token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL, delim, &saveptr);
    }
    return result;
}
```
Here is a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea.)
```cpp
#include <regex>
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string& s) {
    regex r("\\w+");   // regex matches whole words (greedy, so no fragment words)
    sregex_iterator rit(s.begin(), s.end(), r);
    sregex_iterator rend;   // default-constructed iterator marks the end

    vector<string> result;
    for (; rit != rend; ++rit)   // iterates through the matches to fill the vector
        result.push_back(rit->str());
    return result;
}
```
If you need to parse a string by non-space symbols, a string stream can be convenient:
```cpp
string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';');
```
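The same `getline` calls can be put in a loop when the number of fields is not known in advance. A minimal sketch, assuming the `Key:Value;` layout of the example string; `parse_pairs` is a hypothetical helper name:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses "Key:Value; Key:Value;" pairs with std::getline.
std::map<std::string, std::string> parse_pairs(const std::string& s) {
    std::map<std::string, std::string> fields;
    std::istringstream iss(s);
    std::string key, value;
    // std::ws skips the blank after each ';' so keys carry no leading space.
    while (iss >> std::ws &&
           std::getline(iss, key, ':') &&
           std::getline(iss, value, ';')) {
        fields[key] = value;
    }
    return fields;
}
```

A trailing fragment without a `:` (such as the `...` in the example string) is read as a key with no value and is not stored.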
So far I have used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:
```cpp
#include <sstream>
#include <string>
#include <vector>

static void Split(std::vector<std::string>& lst, const std::string& input,
                  const std::string& separators, bool remove_empty = true)
{
    std::ostringstream word;
    for (size_t n = 0; n < input.size(); ++n)
    {
        if (std::string::npos == separators.find(input[n]))
            word << input[n];
        else
        {
            if (!word.str().empty() || !remove_empty)
                lst.push_back(word.str());
            word.str("");
        }
    }
    if (!word.str().empty() || !remove_empty)
        lst.push_back(word.str());
}
```
A good point is that you can pass more than one character in `separators`.
Short and elegant:
```cpp
#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}
```
It can use any string as the delimiter, and it can also be used with binary data (`std::string` supports binary data, including nulls).
Usage:
```cpp
auto a = split("this!!is!!!example!string", "!!");
```
Output:
```
this
is
!example!string
```
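The remark about binary data can be illustrated with NUL-separated fields. In this sketch the answer's `split()` is repeated so the example compiles alone, and `split_nul_separated` is a hypothetical demo function. The strings are built with an explicit length, because constructing them from a plain C string would stop at the first NUL.

```cpp
#include <string>
#include <vector>

// split() as given in the answer, repeated so the sketch is self-contained.
std::vector<std::string> split(std::string data, const std::string& token) {
    std::vector<std::string> output;
    std::size_t pos = std::string::npos;
    do {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (pos != std::string::npos)
            data = data.substr(pos + token.size());
    } while (pos != std::string::npos);
    return output;
}

// Hypothetical demo: three fields separated by NUL bytes.
std::vector<std::string> split_nul_separated() {
    const std::string data("a\0bc\0d", 6);  // length given explicitly: the literal contains NULs
    const std::string nul("\0", 1);         // a one-character delimiter holding a NUL
    return split(data, nul);
}
```

One caveat: with an empty delimiter `find` always returns 0, so this `split()` would never terminate; callers should make sure the delimiter is non-empty.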
I rolled my own using strtok and used Boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.
```cpp
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace                 = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if (strtk::parse(s, whitespace, result))
        {
            for (size_t i = 0; i < result.size(); ++i)
                std::cout << result[i] << std::endl;
        }
    }

    {   // parsing a string into a vector of floats with other separators
        // besides spaces
        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if (strtk::parse(s, whitespace_and_punctuation, values))
        {
            for (size_t i = 0; i < values.size(); ++i)
                std::cout << values[i] << std::endl;
        }
    }

    {   // parsing a string into specific variables
        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if (strtk::parse(s, whitespace_and_punctuation, w1, v1, w2, v2))
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }

    return 0;
}
```
The toolkit has much more flexibility than this simple example shows, but its utility in parsing a string into useful elements is incredible.
I made this because I needed an easy way to split both strings and C-style strings... Hopefully someone else will find it useful as well. Also, it doesn't rely on tokens, and you can use whole fields as delimiters, which is another key thing I needed.
I'm sure there are improvements that could further improve its elegance; please make them by all means.
StringSplit.hpp:
```cpp
#include <vector>
#include <iostream>
#include <string>
#include <string.h>

using namespace std;

class StringSplit
{
private:
    void copy_fragment(char*, char*, char*);
    void copy_fragment(char*, char*, char);
    bool match_fragment(char*, char*, int);
    int untilnextdelim(char*, char);
    int untilnextdelim(char*, char*);
    void assimilate(char*, char);
    void assimilate(char*, char*);
    bool string_contains(char*, char*);
    long calc_string_size(char*);
    void copy_string(char*, char*);

public:
    vector<char*> split_cstr(char);
    vector<char*> split_cstr(char*);
    vector<string> split_string(char);
    vector<string> split_string(char*);
    char* String;
    bool do_string = false;   // initialised so the flags are never read uninitialised
    bool keep_empty = false;  // empty entries are excluded by default
    vector<char*> Container;
    vector<string> ContainerS;

    StringSplit(char * in)
    {
        String = in;
    }

    StringSplit(string in)
    {
        size_t len = calc_string_size((char*)in.c_str());
        String = new char[len + 1];
        memset(String, 0, len + 1);
        copy_string(String, (char*)in.c_str());
        do_string = true;
    }

    ~StringSplit()
    {
        for (int i = 0; i < Container.size(); i++)
        {
            if (Container[i] != NULL)
            {
                delete[] Container[i];
            }
        }
        if (do_string)
        {
            delete[] String;
        }
    }
};
```
StringSplit.cpp:
```cpp
#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplit.hpp"

using namespace std;

void StringSplit::assimilate(char*src, char delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }
        }
        else
        {
            delete[] temp;
        }
    }
}

void StringSplit::assimilate(char*src, char* delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }
        }
        else
        {
            delete[] temp;
        }
    }
}

long StringSplit::calc_string_size(char* _in)
{
    long i = 0;
    while (*_in++)
    {
        i++;
    }
    return i;
}

bool StringSplit::string_contains(char* haystack, char* needle)
{
    size_t len = calc_string_size(needle);
    size_t lenh = calc_string_size(haystack);
    while (lenh--)
    {
        if (match_fragment(haystack + lenh, needle, len))
        {
            return true;
        }
    }
    return false;
}

bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
    while (len--)
    {
        if (*(_src + len) != *(cmp + len))
        {
            return false;
        }
    }
    return true;
}

int StringSplit::untilnextdelim(char* _in, char delim)
{
    size_t len = calc_string_size(_in);
    if (*_in == delim)
    {
        _in += 1;
        return len - 1;
    }

    int c = 0;
    while (*(_in + c) != delim && c < len)
    {
        c++;
    }

    return c;
}

int StringSplit::untilnextdelim(char* _in, char* delim)
{
    int s = calc_string_size(delim);
    int c = 1 + s;

    if (!string_contains(_in, delim))
    {
        return calc_string_size(_in);
    }
    else if (match_fragment(_in, delim, s))
    {
        _in += s;
        return calc_string_size(_in);
    }

    while (!match_fragment(_in + c, delim, s))
    {
        c++;
    }

    return c;
}

void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
    if (*src == delim)
    {
        src++;
    }

    int c = 0;
    while (*(src + c) != delim && *(src + c))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

void StringSplit::copy_string(char* dest, char* src)
{
    int i = 0;
    while (*(src + i))
    {
        *(dest + i) = *(src + i);
        i++;
    }
}

void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
    size_t len = calc_string_size(delim);
    size_t lens = calc_string_size(src);

    if (match_fragment(src, delim, len))
    {
        src += len;
        lens -= len;
    }

    int c = 0;
    while (!match_fragment(src + c, delim, len) && (c < lens))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

vector<char*> StringSplit::split_cstr(char Delimiter)
{
    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return Container;
}

vector<string> StringSplit::split_string(char Delimiter)
{
    do_string = true;

    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return ContainerS;
}

vector<char*> StringSplit::split_cstr(char* Delimiter)
{
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while (*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return Container;
}

vector<string> StringSplit::split_string(char* Delimiter)
{
    do_string = true;
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while (*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return ContainerS;
}
```
Examples:
```cpp
int main(int argc, char*argv[])
{
    StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
    vector<char*> Split = ss.split_cstr(":CUT:");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}
```
Will output:
```
This
is
an
example
cstring
```
```cpp
int main(int argc, char*argv[])
{
    StringSplit ss = "This:is:an:example:cstring";
    vector<char*> Split = ss.split_cstr(':');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string("[SPLIT]");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This|is|an|example|string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string('|');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}
```
To keep empty entries (by default, empty entries are excluded):
```cpp
StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");
```
The goal was to make it similar to C#'s Split() method, where splitting a string is as easy as:
```csharp
String[] Split = "Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut: …
```
How about this:
```cpp
#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, const char delim) {
    vector<string> v;
    string tmp;

    // Iterate one past the last character so the final token is flushed too.
    for (string::const_iterator i = str.begin(); i <= str.end(); ++i) {
        // Test for end() first so the end iterator is never dereferenced.
        if (i != str.end() && *i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp);
            tmp = "";
        }
    }

    return v;
}
```
```cpp
#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;

vector<string> split(const string &s, char delim) {
    vector<string> elems;
    stringstream ss(s);
    string item;
    while (getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

int main() {
    vector<string> x = split("thi is an sample test", ' ');
    unsigned int i;
    for (i = 0; i < x.size(); i++)
        cout << i << ":" << x[i] << endl;
    return 0;
}
```
I like to use the boost/regex methods for this task, since they provide maximum flexibility for specifying the splitting criteria.
```cpp
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string line("A:::line::to:split");
    const boost::regex re(":+"); // one or more colons

    // -1 means find inverse matches aka split
    boost::sregex_token_iterator tokens(line.begin(), line.end(), re, -1);
    boost::sregex_token_iterator end;

    for (; tokens != end; ++tokens)
        std::cout << *tokens << std::endl;
}
```
Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper-case characters.
```cpp
#include <string>
#include <list>
#include <locale> // std::isupper(ch, locale)

template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
    std::list<String> R;
    String w;
    const std::locale loc;  // copy of the current global locale

    for (typename String::const_iterator i = s.begin(); i < s.end(); ++i) {
        if (std::isupper(*i, loc)) {
            if (w.length()) {
                R.push_back(w);
                w.clear();
            }
        }
        w += *i;
    }

    if (w.length())
        R.push_back(w);
    return R;
}
```
For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale, it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
This answer takes the string and puts it into a vector of strings. It uses the Boost library.
```cpp
#include <boost/algorithm/string.hpp>

std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
```
Here's another way of doing it...
```cpp
#include <cctype>
#include <string>
#include <vector>
using namespace std;

void split_string(string text, vector<string>& words)
{
    int i = 0;
    char ch;
    string word;

    // Relies on the terminating '\0' that operator[] returns at text.size().
    while ((ch = text[i++]))
    {
        if (isspace(static_cast<unsigned char>(ch)))
        {
            if (!word.empty())
            {
                words.push_back(word);
            }
            word = "";
        }
        else
        {
            word += ch;
        }
    }
    if (!word.empty())
    {
        words.push_back(word);
    }
}
```
Get on with Boost! :-)
```cpp
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace boost;

int main(int argc, char**argv) {
    typedef vector<string> list_type;

    list_type list;
    string line;

    line = "Somewhere down the road";
    split(list, line, is_any_of(" "));

    for (int i = 0; i < list.size(); i++)
    {
        cout << list[i] << endl;
    }

    return 0;
}
```
This example gives the output:
```
Somewhere
down
the
road
```
The code below uses `strtok()` to split a string into tokens and stores the tokens in a vector.
```cpp
#include <iostream>
#include <cstring>
#include <vector>
#include <string>

using namespace std;

char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[] = " ,\t\n";
char *token;

int main()
{
    vector<string> vec_String_Lines;
    token = strtok(one_line_string, seps);

    cout << "Extracting and storing data in a vector..\n\n\n";

    while (token != NULL)
    {
        vec_String_Lines.push_back(token);
        token = strtok(NULL, seps);
    }
    cout << "Displaying end result in vector line storage..\n\n";

    for (int i = 0; i < vec_String_Lines.size(); ++i)
        cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";

    return 0;
}
```
I use this simpleton because our String class is "special" (i.e. not standard):
```cpp
void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim, f);
    while (i >= 0) {
        String token(i - f > 0 ? s.substring(f, i - f) : "");
        result.push_back(token);
        f = i + l;
        i = s.indexOf(delim, f);
    }
    String token = s.substring(f);
    result.push_back(token);
}
```
```cpp
#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main() {
    string s = "foo bar  baz";
    regex e("\\s+");
    regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
    regex_token_iterator<string::iterator> end;
    while (i != end)
        cout << " [" << *i++ << "]";
}
```
In my opinion, this is the closest thing to Python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (the 4th argument of the regex_token_iterator constructor) selects the parts of the sequence that are not matched, using the matches as separators.
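To make the role of that fourth argument concrete, here is a small sketch that runs the same iterator with different submatch selectors; `tokens_for` is a hypothetical helper introduced for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects tokens for a given submatch selector:
// -1 yields the text between matches, 0 yields the matches themselves.
std::vector<std::string> tokens_for(const std::string& s,
                                    const std::regex& re, int submatch) {
    std::sregex_token_iterator first(s.begin(), s.end(), re, submatch), last;
    return std::vector<std::string>(first, last);
}
```

With the input `"foo bar  baz"` and the pattern `\s+`, selector `-1` gives `foo`, `bar`, `baz`, while selector `0` gives the two whitespace runs that separated them.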
Here's a better way. It can take any character, and doesn't split lines unless you want it to. No special libraries are needed (well, besides std, but who really considers that an extra library), no pointers, no references, and it's static. Just simple, plain C++.
```cpp
#pragma once
#include <string>
#include <vector>
#include <sstream>
using namespace std;

class Helpers
{
public:
    static vector<string> split(string s, char delim)
    {
        stringstream temp(stringstream::in | stringstream::out);
        vector<string> elems(0);
        if (s.size() == 0 || delim == 0)
            return elems;
        for (char c : s)
        {
            if (c == delim)
            {
                elems.push_back(temp.str());
                temp = stringstream(stringstream::in | stringstream::out);
            }
            else
                temp << c;
        }
        if (temp.str().size() > 0)
            elems.push_back(temp.str());
        return elems;
    }

    // Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
    // split at the following letters, a, b, c we would make delims="abc".
    static vector<string> split(string s, string delims)
    {
        stringstream temp(stringstream::in | stringstream::out);
        vector<string> elems(0);
        bool found;
        if (s.size() == 0 || delims.size() == 0)
            return elems;
        for (char c : s)
        {
            found = false;
            for (char d : delims)
            {
                if (c == d)
                {
                    elems.push_back(temp.str());
                    temp = stringstream(stringstream::in | stringstream::out);
                    found = true;
                    break;
                }
            }
            if (!found)
                temp << c;
        }
        if (temp.str().size() > 0)
            elems.push_back(temp.str());
        return elems;
    }
};
```
I wrote the following code. You can specify the delimiter, which can be a string. The result is similar to Java's String.split, with empty strings in the result.
For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows:
```
0  <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4  <len:0>
```
Code:
```cpp
vector<string> split(const string& str, const string& delimiter = " ") {
    vector<string> tokens;

    string::size_type lastPos = 0;
    string::size_type pos = str.find(delimiter, lastPos);

    while (string::npos != pos) {
        // Found a token, add it to the vector.
        cout << str.substr(lastPos, pos - lastPos) << endl;
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.size();
        pos = str.find(delimiter, lastPos);
    }

    tokens.push_back(str.substr(lastPos, str.size() - lastPos));
    return tokens;
}
```
When dealing with whitespace as the separator, the obvious answer of using `std::istream_iterator<T>` has already been given.
To change what a stream considers whitespace, you simply change the stream's `std::locale` (using `std::istream::imbue()`) to one with a `std::ctype<char>` facet that has its own definition of whitespace:
```cpp
#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <locale>

struct whitespace_mask {
    std::ctype_base::mask mask_table[std::ctype<char>::table_size];
    whitespace_mask(std::string const& spaces) {
        std::ctype_base::mask* table = this->mask_table;
        std::ctype_base::mask const* tab
            = std::use_facet<std::ctype<char>>(std::locale()).table();
        for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
            table[i] = tab[i] & ~std::ctype_base::space;
        }
        std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
            table[c] |= std::ctype_base::space;
        });
    }
};
class whitespace_facet
    : private whitespace_mask
    , public std::ctype<char> {
public:
    whitespace_facet(std::string const& spaces)
        : whitespace_mask(spaces)
        , std::ctype<char>(this->mask_table) {
    }
};

struct whitespace {
    std::string spaces;
    whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
    std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
    in.imbue(loc);
    return in;
}
// everything above would probably go into a utility library...

int main() {
    std::istringstream in("a, b, c, d, e");
    std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    std::istringstream pipes("a b c|  d |e     e");
    std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}
```
Most of the code is for packaging up a general-purpose tool providing soft delimiters: multiple delimiters in a row are merged, so there is no way to produce an empty sequence. When different delimiters are needed within one stream, you'd probably use differently set-up streams that share a stream buffer:
```cpp
void f(std::istream& in) {
    std::istream pipes(in.rdbuf());
    pipes >> whitespace("|");
    std::istream comma(in.rdbuf());
    comma >> whitespace(",");

    std::string s0, s1;
    if (pipes >> s0 >> std::ws    // read up to first pipe and ignore sequence of pipes
        && comma >> s1 >> std::ws) { // read up to first comma and ignore commas
        // ...
    }
}
```
作为一个业余爱好者,这是我想到的第一个解决方案。我有点好奇为什么我还没有在这里看到类似的解决方案,我是怎么做的有根本的问题吗?
```cpp
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    // Jump to the start of the next token, then find where it ends.
    while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
        auto pos2 = s.find_first_of(delims, pos);
        result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
        pos = pos2;
    }
    return result;
}

int main()
{
    std::string text = "And then I said: \"I don't get it, why would you even do that??\"";
    // The rest of the original main() is cut off in the source.
}
```

- This works great!

I use the following code:

```cpp
#include <list>
#include <stdexcept>
#include <string>
#include <gtest/gtest.h>

namespace Core
{
    typedef std::wstring String;

    void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
    {
        if (splitter.empty())
        {
            // std::invalid_argument has no default constructor, so a message is required.
            throw std::invalid_argument("splitter must not be empty"); // for example
        }

        std::list<Core::String> lines;

        Core::String::size_type offset = 0;

        for (;;)
        {
            Core::String::size_type splitterPos = input.find(splitter, offset);

            if (splitterPos != Core::String::npos)
            {
                lines.push_back(input.substr(offset, splitterPos - offset));
                offset = splitterPos + splitter.size();
            }
            else
            {
                lines.push_back(input.substr(offset));
                break;
            }
        }

        lines.swap(output);
    }
}

// gtest:

class SplitStringTest: public testing::Test
{
};

TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}

TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}

TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
    std::list<Core::String> result;
    Core::SplitString(Core::String(), Core::String(L","), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
}

TEST_F(SplitStringTest, OneCharSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,y", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());

    Core::SplitString(L",xy", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L"xy", *result.rbegin());

    Core::SplitString(L"xy,", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"xy", *result.begin());
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, TwoCharsSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,.y,z", L",.", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());

    Core::SplitString(L"x,,y,z", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());
}

TEST_F(SplitStringTest, RecursiveSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L",,,", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L",", *result.rbegin());

    Core::SplitString(L",.,.,", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L".,", *result.rbegin());

    Core::SplitString(L"x,.,.,y", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L".,y", *result.rbegin());

    Core::SplitString(L",.,,.,", L",.,", result);
    ASSERT_EQ(3, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(Core::String(), *(++result.begin()));
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, NullTerminators)
{
    std::list<Core::String> result;

    Core::SplitString(L"xy", Core::String(L"\0", 1), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(L"xy", *result.begin());

    Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());
}
```
Here is my version, based on Kev's source:
```cpp
#include <string>
#include <vector>
using namespace std;

void split(vector<string> &result, string str, char delim) {
    string tmp;
    result.clear();
    // The loop deliberately visits end() once so the last token is flushed.
    for (string::iterator i = str.begin(); i <= str.end(); ++i) {
        // Test for end() first so the end iterator is never dereferenced.
        if (i != str.end() && *i != delim) {
            tmp += *i;
        } else {
            result.push_back(tmp);
            tmp = "";
        }
    }
}
```
Afterwards, call the function and do something with the result:
```cpp
vector<string> hosts;
split(hosts, "192.168.1.2,192.168.1.3", ',');
for (size_t i = 0; i < hosts.size(); i++) {
    cout << "Connecting host : " << hosts.at(i) << "..." << endl;
}
```
Here is my solution using C++11 and the STL. It should be reasonably efficient:
```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& s)
{
    std::vector<std::string> v;

    const auto end = s.end();
    auto to = s.begin();
    decltype(to) from;

    // Taking unsigned char keeps std::isspace well-defined for non-ASCII bytes.
    while ((from = std::find_if(to, end,
        [](unsigned char c) { return !std::isspace(c); })) != end)
    {
        to = std::find_if(from, end, [](unsigned char c) { return std::isspace(c); });
        v.emplace_back(from, to);
    }

    return v;
}

int main()
{
    std::string s = "this is the string to split";

    auto v = split(s);

    for (auto&& word : v)
        std::cout << word << '\n';
}
```
Output:
```
this
is
the
string
to
split
```
Using a small class derived from `std::vector<std::string>`:
```cpp
#include <string>
#include <vector>

// Split string into parts.
class Split : public std::vector<std::string>
{
public:
    // delimList is const so that string literals can be passed.
    Split(const std::string& str, const char* delimList)
    {
        size_t lastPos = 0;
        size_t pos = str.find_first_of(delimList);

        while (pos != std::string::npos)
        {
            if (pos != lastPos)
                push_back(str.substr(lastPos, pos - lastPos));
            lastPos = pos + 1;
            pos = str.find_first_of(delimList, lastPos);
        }
        // Whatever follows the last delimiter is the final token.
        if (lastPos < str.length())
            push_back(str.substr(lastPos));
    }
};
```
Example, used to populate an STL set:
```cpp
std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());
```
LazyStringSplitter:
```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
using namespace std;

class LazyStringSplitter
{
    // The splitter keeps its own copy of the text. Storing iterators into a
    // by-value parameter would leave them dangling once the call returns.
    string text;
    string::size_type pos = 0;   // start of the next token
    unordered_set<char> chop;

public:
    // Empty constructor
    explicit LazyStringSplitter()
    {}

    explicit LazyStringSplitter(const string& cstr, const string& delims)
        : text(cstr)
        , chop(delims.begin(), delims.end())
    {}

    void operator()(const string& cstr, const string& delims)
    {
        chop.insert(delims.begin(), delims.end());
        text = cstr;
        pos = 0;
    }

    bool empty() const
    {
        return pos >= text.size();
    }

    string next()
    {
        // Loop instead of recursing, so empty tokens are never returned.
        while (!empty())
        {
            auto first = text.begin() + pos;
            auto runner = find_if(first, text.end(), [&](char c) {
                return chop.count(c) == 1;
            });

            // construct next string
            string ret(first, runner);
            // Step over the delimiter, but never past the end of the text.
            pos = (runner == text.end()) ? text.size()
                                         : (runner - text.begin()) + 1;
            if (!ret.empty())
                return ret;
        }
        // return empty string if we ran out of characters
        return string("");
    }
};
```
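- I call this `LazyStringSplitter` for one reason: it does not split the string in one go.
- In essence it behaves like a Python generator.
- It exposes a method called `next`, which returns the next string split off from the original.
- It uses the C++11 `unordered_set`, so looking up delimiters is fast.
- Here is how it works: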
Test program
```cpp
#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}
```
Output
```
This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does
```
The next level of improvement would be to implement `begin()` and `end()`, so that one can write:
```cpp
vector<string> split_string(splitter.begin(), splitter.end());
```
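I had been looking for a way to split a string by a separator of any length, so I started writing one from scratch, because the existing solutions didn't suit me.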
Here is my little algorithm, using only the STL:
```cpp
#include <string>
#include <vector>

// Use like this:
// std::vector<std::wstring> vec = Split<std::wstring>(L"Hello##world##!", L"##");

template <typename valueType>
static std::vector<valueType> Split(valueType text, const valueType& delimiter)
{
    std::vector<valueType> tokens;
    size_t pos = 0;
    valueType token;

    // Cut off everything up to the next delimiter until none is left.
    while ((pos = text.find(delimiter)) != valueType::npos)
    {
        token = text.substr(0, pos);
        tokens.push_back(token);
        text.erase(0, pos + delimiter.length());
    }
    tokens.push_back(text);

    return tokens;
}
```
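It can be used with a separator of any length and form, as far as I've tested. Instantiate it with either `string` or `wstring`.
All the algorithm does is search for the delimiter, take the part of the string up to the delimiter, delete it together with the delimiter, and search again until no delimiter is found.
Of course, you can use any number of whitespace characters as the delimiter.
I hope it helps.
No Boost, no string streams, just the standard C library cooperating with `std::string` and `std::list`.
Whitespace is considered to be any combination of newlines, tabs and spaces. The set of whitespace characters is established by the `wschars` variable.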
```cpp
#include <string>
#include <list>
#include <iostream>
#include <cstring>

using namespace std;

const char *wschars = "\t\n ";

list<string> split(const string &str)
{
    const char *cstr = str.c_str();
    list<string> out;

    while (*cstr) {                     // while remaining string not empty
        size_t toklen;
        cstr += strspn(cstr, wschars);      // skip leading whitespace
        toklen = strcspn(cstr, wschars);    // figure out token length
        if (toklen)                         // if we have a token, add to list
            out.push_back(string(cstr, toklen));
        cstr += toklen;                     // skip over token
    }

    // ran out of string; return list

    return out;
}

int main(int argc, char **argv)
{
    list<string> li = split(argv[1]);
    for (list<string>::iterator i = li.begin(); i != li.end(); i++)
        cout << "{" << *i << "}" << endl;
    return 0;
}
```

- Please use std::vector instead of list.
- @fmuecke The question does not require any particular representation for the string pieces, so there is no need to incorporate your suggestion into the answer.

This is a function I wrote that helps me do a lot of things. It helped me when implementing a protocol for `WebSockets`.

```cpp
#include <iostream>
#include <vector>
#include <sstream>
#include <string>
using namespace std;

vector<string> split(string input, string split_id) {
    vector<string> result;
    size_t i = 0;
    bool add;
    string temp;
    stringstream ss;
    size_t found;
    string real;
    int r = 0;
    while (i != input.length()) {
        add = false;
        ss << input.at(i);
        temp = ss.str();
        found = temp.find(split_id);
        if (found != string::npos) {
            // The accumulated text now contains the separator: emit what precedes it.
            add = true;
            real.append(temp, 0, found);
        } else if (r > 0 && (i + 1) == input.length()) {
            // End of input: emit whatever has been accumulated.
            add = true;
            real.append(temp, 0, found);
        }
        if (add) {
            result.push_back(real);
            ss.str(string());
            ss.clear();
            temp.clear();
            real.clear();
            r = 0;
        }
        i++;
        r++;
    }
    return result;
}

int main() {
    string s = "S,o,m,e,w,h,e,r,e, down the road In a really big C++ house. Lives a little old lady. That no one ever knew. She comes outside. In the very hot sun. And throws C++ at us. The End. FIN.";
    vector<string> Token;
    Token = split(s, ",");
    for (size_t i = 0; i < Token.size(); i++)
        cout << Token.at(i) << endl;
    cout << endl << Token.size();
    int a;
    cin >> a;
    return a;
}
```
For those who need an alternative way to split a string with a string delimiter, you could try the following solution.
```cpp
#include <string>
#include <vector>

// Returns the start position of every occurrence of `search` in `target`.
std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
    std::vector<size_t> founds;

    if (!search.empty())
    {
        size_t start_pos = 0;

        while (true)
        {
            size_t found_pos = target.find(search, start_pos);

            if (found_pos != std::string::npos)
            {
                founds.push_back(found_pos);
                start_pos = (found_pos + 1);
            }
            else
            {
                break;
            }
        }
    }

    return founds;
}

// Returns the characters of `target` from begin_index to end_index, inclusive.
std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
    std::string sub;

    size_t size = target.length();

    for (size_t i = begin_index; i <= end_index; i++)
    {
        if (i >= size)
        {
            break;
        }
        sub += target[i];
    }

    return sub;
}

std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
    std::vector<std::string> splits;

    if (!delimiter.empty())
    {
        std::vector<size_t> founds = str_pos(delimiter, target);

        size_t founds_size = founds.size();

        if (founds_size > 0)
        {
            size_t search_len = delimiter.length();

            size_t begin_index = 0;

            for (size_t i = 0; i <= founds_size; i++)
            {
                std::string sub;

                if (i != founds_size)
                {
                    size_t pos = founds.at(i);

                    // Guard against pos - 1 wrapping around when the
                    // delimiter sits at the very start of the target.
                    if (pos > begin_index)
                    {
                        sub = str_sub_index(begin_index, pos - 1, target);
                    }

                    begin_index = (pos + search_len);
                }
                else
                {
                    sub = str_sub_index(begin_index, (target.length() - 1), target);
                }

                splits.push_back(sub);
            }
        }
    }

    return splits;
}
```
These snippets consist of three functions. The bad news is that to use `str_split` you also need the other two.
In `main()`, it can be tested like this:
```cpp
int main()
{
    std::string s = "Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";

    std::vector<std::string> split = str_split("world", s);

    for (size_t i = 0; i < split.size(); i++)
    {
        std::cout << split[i] << std::endl;
    }
}
```
It would produce:
```
Hello, 
! We need to make the 
 a better place. Because your 
 is also my 
, and our children's 
.
```
I'm sure this is not the most efficient code, but at least it works. Hope it helps.
Here is my approach to this problem:
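```cpp
#include <sstream>
#include <string>
#include <vector>
using namespace std;

vector<string> get_tokens(string str) {
    vector<string> dt;
    stringstream ss;
    string tmp;
    ss << str;
    // Testing the extraction itself (rather than eof()) avoids pushing a
    // stale token when the input ends with whitespace.
    while (ss >> tmp) {
        dt.push_back(tmp);
    }
    return dt;
}
```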
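This function returns a vector of strings.
Yes, I looked through all 30 examples.
I couldn't find a `split` that works with multi-character delimiters, so here is mine: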
```cpp
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &str, const string &delim)
{
    const auto delim_pos = str.find(delim);

    if (delim_pos == string::npos)
        return {str};

    vector<string> ret{str.substr(0, delim_pos)};
    auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);

    ret.insert(ret.end(), tail.begin(), tail.end());

    return ret;
}
```
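Probably not the most efficient implementation, but it is a very straightforward recursive solution that uses only `<string>` and `<vector>`.
Ah, it's written in C++11, but there is nothing special about this code, so you could easily adapt it to C++98.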
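For what it's worth, the adaptation mentioned above could look roughly like the sketch below. It is not the author's code: `split_iterative` is a made-up name, the recursion is replaced by a loop, and treating an empty delimiter as "do not split" is my own assumption (the recursive version would never terminate in that case).

```cpp
#include <string>
#include <vector>

// Hypothetical iterative counterpart of the recursive split above.
// Uses only C++98 features: no auto, no initializer lists.
std::vector<std::string> split_iterative(const std::string &str,
                                         const std::string &delim)
{
    std::vector<std::string> ret;

    // Assumption: an empty delimiter yields the whole string as one token.
    if (delim.empty()) {
        ret.push_back(str);
        return ret;
    }

    std::string::size_type start = 0;
    std::string::size_type pos;

    // Each match closes one token; the search resumes right after the delimiter.
    while ((pos = str.find(delim, start)) != std::string::npos) {
        ret.push_back(str.substr(start, pos - start));
        start = pos + delim.size();
    }

    // Whatever follows the last delimiter is the final token (possibly empty).
    ret.push_back(str.substr(start));
    return ret;
}
```

Like the recursive version, this keeps empty tokens between adjacent delimiters, so `split_iterative("a----b", "--")` gives three elements with an empty one in the middle.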
I use the following:
```cpp
void split(string in, vector<string>& parts, char separator) {
    string::iterator ts, curr;
    ts = curr = in.begin();
    for (; curr <= in.end(); curr++) {
        if ((curr == in.end() || *curr == separator) && curr > ts)
            parts.push_back(string(ts, curr));
        if (curr == in.end())
            break;
        if (*curr == separator)
            ts = curr + 1;
    }
}
```
Plasmah, I forgot to include the extra check (curr > ts) that removes tokens consisting only of whitespace.
Here is my version
```cpp
#include <string>
#include <vector>

inline std::vector<std::string> Split(const std::string &str, const std::string &delim = "")
{
    std::vector<std::string> tokens;
    if (str.size() > 0)
    {
        if (delim.size() > 0)
        {
            std::string::size_type currPos = 0, prevPos = 0;
            while ((currPos = str.find(delim, prevPos)) != std::string::npos)
            {
                std::string item = str.substr(prevPos, currPos - prevPos);
                if (item.size() > 0)
                {
                    tokens.push_back(item);
                }
                // Skip the whole delimiter, not just its first character.
                prevPos = currPos + delim.size();
            }
            tokens.push_back(str.substr(prevPos));
        }
        else
        {
            tokens.push_back(str);
        }
    }
    return tokens;
}
```
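It works with multi-character delimiters. It keeps empty tokens out of the results. It needs only standard headers. When you don't provide a delimiter, it returns the string as a single token. If the string is empty, it returns an empty result. Unfortunately, it may be inefficient because of the large `std::vector` copy unless you compile with C++11, which should use move semantics. In C++11, this code should be fast.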
Here is my entry:
```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
      ForwardIter s_first, ForwardIter s_last)
{
    Container output;

    while (true) {
        auto pos = std::find_first_of(first, last, s_first, s_last);
        output.emplace_back(first, pos);
        if (pos == last) {
            break;
        }

        first = ++pos;
    }

    return output;
}

template <typename Output = std::vector<std::string>,
          typename Input = std::string,
          typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
    using std::cbegin;   // std::cbegin and std::cend need C++14
    using std::cend;
    return split<Output>(cbegin(input), cend(input),
                         cbegin(delims), cend(delims));
}

// Passing a std::string (not a raw literal) keeps the literal's
// terminating '\0' out of the last token.
auto vec = split(std::string("Mary had a little lamb"));
```
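The first definition is an STL-style generic function that takes two pairs of iterators. The second is a convenience function that saves you from doing all the `begin()` and `end()` calls yourself.
What makes it elegant (IMO) is that, unlike most of the other answers, it is not restricted to strings but works with any STL-compatible container. Without changing the code above, you can say: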
```cpp
using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});
```
This splits the vector `v` into separate vectors every time a `0` or a `2` is encountered.
(As an added bonus, with strings this implementation is faster than the versions based on ...)
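To make the vector example concrete, here is a stripped-down, non-generic sketch of the same loop, specialised to `int`. The name `split_ints` is hypothetical and exists only for illustration; the logic is meant to mirror the `find_first_of` loop of the generic version, not replace it.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: splits `input` at every element that occurs in `delims`.
std::vector<std::vector<int>> split_ints(const std::vector<int> &input,
                                         const std::vector<int> &delims)
{
    std::vector<std::vector<int>> output;
    auto first = input.begin();

    while (true) {
        // Next delimiter element, or end() if none is left.
        auto pos = std::find_first_of(first, input.end(),
                                      delims.begin(), delims.end());
        output.emplace_back(first, pos);   // may be empty between two delimiters
        if (pos == input.end()) {
            break;
        }
        first = pos + 1;                   // continue right after the delimiter
    }

    return output;
}
```

For `v = {1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9}` and delimiters `{0, 2}` this yields five groups: `{1}`, an empty group (the `2` and the `0` are adjacent), `{3, 4, 5}`, `{7, 8}` and `{9}`. So empty groups are kept, just as empty tokens are in the string case.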
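I believe no one has posted this solution yet. Instead of using delimiters directly, it basically does the same as boost::split(), i.e. it lets you pass a predicate that returns true if a char is a delimiter and false otherwise. I think this gives the programmer a lot more control, and the best part is that you don't need Boost.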
```cpp
#include <algorithm>
#include <iterator>
using namespace std;

template <class Container, class String, class Predicate>
void split(Container& output, const String& input,
           const Predicate& pred, bool trimEmpty = false) {
    auto it = begin(input);
    auto itLast = it;
    while (it = find_if(it, end(input), pred), it != end(input)) {
        if (not (trimEmpty and it == itLast)) {
            output.emplace_back(itLast, it);
        }
        ++it;
        itLast = it;
    }
    // Keep the text after the last delimiter as well.
    if (not (trimEmpty and itLast == end(input))) {
        output.emplace_back(itLast, end(input));
    }
}
```
Then you can use it like this:
```cpp
struct Delim {
    // Must be const: split() takes the predicate by const reference.
    bool operator()(char c) const {
        return not isalpha(c);
    }
};

int main() {
    string s("#include<iostream>\n"
             "int main() { std::cout << \"Hello world!\" << std::endl; }");

    vector<string> v;

    split(v, s, Delim(), true);
    /* Which is also the same as */
    split(v, s, [](char c) { return not isalpha(c); }, true);

    for (const auto& i : v) {
        cout << i << endl;
    }
}
```
Using `std::string_view` and the range-v3 library:
https://wandbox.org/permlink/kw5lwrcl1pxjp2pw
```cpp
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"

int main() {
    std::string s = "Somewhere down the range v3 library";
    ranges::for_each(s
        | ranges::view::split(' ')
        | ranges::view::transform([](auto &&sub) {
              return std::string_view(&*sub.begin(), ranges::distance(sub));
          }),
        [](auto s) { std::cout << "Substring: " << s << "\n"; }
    );
}
```
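I just wrote a nice example of how to split a char array at a given symbol and put each piece (a word delimited by that symbol) into a vector. For simplicity, I made the vector hold std::string.
I hope this helps and that it is readable.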
```cpp
#include <cstring>   // strlen
#include <iostream>
#include <string>
#include <vector>

void push(std::vector<std::string> &WORDS, std::string &TMP) {
    WORDS.push_back(TMP);
    TMP = "";
}

std::vector<std::string> mySplit(char STRING[]) {
    std::vector<std::string> words;
    std::string s;
    for (unsigned short i = 0; i < strlen(STRING); i++) {
        if (STRING[i] != ' ') {
            s += STRING[i];
        } else {
            push(words, s);
        }
    }
    push(words, s); // Used to get last split
    return words;
}

int main() {
    char string[] = "My awesome string.";
    std::cout << mySplit(string)[2];
    std::cin.get();
    return 0;
}
```
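Based on Galik's answer, I made this. It is mostly here so that I don't have to write it again and again. It's crazy that C++ still doesn't have a native split function. Features: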
- Should be very fast.
- Easy to understand (I think).
- Merges empty sections.
- Using several delimiters at once is trivial.
```cpp
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    using namespace std;

    vector<string> v;

    // Start of an element.
    size_t elemStart = 0;

    // We start searching from the end of the previous element, which
    // initially is the start of the string.
    size_t elemEnd = 0;

    // Find the first non-delim, i.e. the start of an element, after the end of the previous element.
    while ((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
    {
        // Find the first delim, i.e. the end of the element (or if this fails it is the end of the string).
        elemEnd = s.find_first_of(delims, elemStart);
        // Add it.
        v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
    }
    // When there are no more non-delimiters, we are done.
    return v;
}
```
```cpp
// adapted from a "regular" csv parse
std::string stringIn = "my csv is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (std::string::size_type i = 0; i < stringIn.size(); i++) {
    // Compare against the character ' ', not a string literal.
    if (stringIn[i] == ' ') {
        commaSeparated.push_back("");
        commaCounter++;
    } else {
        commaSeparated.at(commaCounter) += stringIn[i];
    }
}
```
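In the end you will have a vector of strings, with each element being a piece of the sentence separated by spaces. The only non-standard resource is std::vector (but since std::string is involved, I figured it would be acceptable).
Empty strings are saved as separate items.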
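To illustrate the remark about empty strings, here is the same loop wrapped in a function so it can be tried on its own. The name `split_keep_empty` is hypothetical; the body is a sketch of the idea above, not the original snippet verbatim.

```cpp
#include <string>
#include <vector>

// Hypothetical wrapper: every space starts a new item, so two adjacent
// spaces (or a leading/trailing space) produce an empty item.
std::vector<std::string> split_keep_empty(const std::string &in)
{
    std::vector<std::string> parts(1);   // start with one empty item
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == ' ') {
            parts.push_back("");         // open the next item
        } else {
            parts.back() += in[i];       // extend the current item
        }
    }
    return parts;
}
```

With this, `split_keep_empty("a  b")` (two spaces) has three items, the middle one empty, and an empty input still yields a single empty item. If that is not what you want, filter out the empty items afterwards.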
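Here is my take on this. I had to process the input string word by word, which could have been done by counting words using the spaces, but I felt that would be tedious and that I should split the words into a vector instead.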
```cpp
#include <iostream>
#include <vector>
#include <string>
#include <stdio.h>
using namespace std;

int main()
{
    int x = 0;          // getchar() returns int so that EOF can be detected
    string s = "";
    vector<string> q;
    x = getchar();
    // Read one line; a space ends the current word.
    while (x != '\n' && x != EOF)
    {
        if (x == ' ')
        {
            q.push_back(s);
            s = "";
            x = getchar();
            continue;
        }
        s = s + static_cast<char>(x);
        x = getchar();
    }
    q.push_back(s);
    for (size_t i = 0; i < q.size(); i++)
        cout << q[i] << " ";
    return 0;
}
```
For convenience:
```cpp
template<class V, typename T>
bool in(const V &v, const T &el) {
    return std::find(v.begin(), v.end(), el) != v.end();
}
```
The actual split, based on multiple delimiters:
```cpp
std::vector<std::string> split(const std::string &s,
                               const std::vector<char> &delims) {
    std::vector<std::string> res;
    auto stuff = [&delims](char c) { return !in(delims, c); };
    auto space = [&delims](char c) { return in(delims, c); };
    auto first = std::find_if(s.begin(), s.end(), stuff);
    while (first != s.end()) {
        auto last = std::find_if(first, s.end(), space);
        res.push_back(std::string(first, last));
        // Resume from `last` itself: `last + 1` would step past end()
        // when the final token runs to the end of the string.
        first = std::find_if(last, s.end(), stuff);
    }
    return res;
}
```
Usage:
```cpp
int main() {
    std::string s = " aaa, bb cc";
    for (auto el : split(s, {' ', ','}))
        std::cout << el << std::endl;
    return 0;
}
```
We can use strtok in C++:
```cpp
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[] = "Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
    char *pch = strtok(str, ";,");
    while (pch != NULL)
    {
        cout << pch << "\n";
        pch = strtok(NULL, ";,");
    }
    return 0;
}
```
My code is:
```cpp
#include <list>
#include <string>

template<class StringType = std::string, class ContainerType = std::list<StringType> >
class DSplitString : public ContainerType
{
public:
    explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        while (StringType::npos != (iPos_char = strString.find(cChar, iPos)))
        {
            StringType strTemp = strString.substr(iPos, iPos_char - iPos);
            // this-> is needed because push_back lives in a dependent base class.
            if (!bSkipEmptyParts || !strTemp.empty())
                this->push_back(strTemp);
            iPos = iPos_char + 1;
        }
        // Also keep whatever follows the last delimiter.
        StringType strTail = strString.substr(iPos);
        if (!bSkipEmptyParts || !strTail.empty())
            this->push_back(strTail);
    }

    explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        while (StringType::npos != (iPos_char = strString.find(strSub, iPos)))
        {
            StringType strTemp = strString.substr(iPos, iPos_char - iPos);
            if (!bSkipEmptyParts || !strTemp.empty())
                this->push_back(strTemp);
            iPos = iPos_char + strSub.length();
        }
        // Also keep whatever follows the last delimiter.
        StringType strTail = strString.substr(iPos);
        if (!bSkipEmptyParts || !strTail.empty())
            this->push_back(strTail);
    }
};
```
Example:
```cpp
#include <iostream>
#include <string>

// Portable main() and range-based for in place of the
// MSVC-only _tmain and "for each ... in".
int main()
{
    DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
    for (const std::string& var : aa)
    {
        std::cout << var << std::endl;
    }
    std::cin.get();
    return 0;
}
```
```cpp
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    string str = "ABC AABCD CDDD RABC GHTTYU FR";
    str += " "; // dirty hack: adding extra space to the end
    vector<string> v;

    for (int i = 0; i < (int)str.size(); i++) {
        int a = i, b = i;
        // Advance to the space that ends the word starting at a.
        for (int j = i; j < (int)str.size(); j++) {
            if (str[j] == ' ') {
                b = j;
                i = j;
                break;
            }
        }
        v.push_back(str.substr(a, b - a));
    }

    for (size_t i = 0; i < v.size(); i++) {
        cout << v[i].size() << " " << v[i] << endl;
    }
    return 0;
}
```
```cpp
#include <iostream>
#include <string>
#include <deque>

std::deque<std::string> split(
    const std::string& line,
    std::string::value_type delimiter,
    bool skipEmpty = false
) {
    std::deque<std::string> parts{};

    if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
        parts.push_back({});
    }

    for (const std::string::value_type& c : line) {
        if (
            (
                c == delimiter
                &&
                (skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
            )
            ||
            (c != delimiter && parts.empty())
        ) {
            parts.push_back({});
        }

        if (c != delimiter) {
            parts.back().push_back(c);
        }
    }

    if (skipEmpty && !parts.empty() && parts.back().empty()) {
        parts.pop_back();
    }

    return parts;
}

void test(const std::string& line) {
    std::cout << line << std::endl;

    std::cout << "skipEmpty=0 |";
    for (const std::string& part : split(line, ':')) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << "skipEmpty=1 |";
    for (const std::string& part : split(line, ':', true)) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << std::endl;
}

int main() {
    test("foo:bar:::baz");
    test("");
    test("foo");
    test(":");
    test("::");
    test(":foo");
    test("::foo");
    test(":foo:");
    test(":foo::");

    return 0;
}
```
Output:
```
foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|


skipEmpty=0 |
skipEmpty=1 |

foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|

:
skipEmpty=0 |||
skipEmpty=1 |

::
skipEmpty=0 ||||
skipEmpty=1 |

:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|

::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|

:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|

:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|
```
My implementation:
```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           UnaryPredicate predicate)
{
    using ST = std::basic_string<CharT>;
    using std::swap;
    std::vector<ST> tmp_result;
    auto iter = s.cbegin(),
         end_iter = s.cend();
    while (true)
    {
        /**
         * edge case: empty str -> push an empty str and exit.
         */
        auto find_iter = std::find_if(iter, end_iter, predicate);
        tmp_result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter;
    }
    swap(tmp_result, split_result);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const std::basic_string<CharT>& char_candidate)
{
    // Any character contained in char_candidate acts as a delimiter.
    std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
                                            char_candidate.cend());
    auto predicate = [&candidate_set](const CharT& c) {
        return candidate_set.count(c) > 0U;
    };
    return split(split_result, s, predicate);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const CharT* literals)
{
    return split(split_result, s, std::basic_string<CharT>(literals));
}
```
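This is an extension of one of the top answers. It now supports setting a maximum number of returned elements, N. The remainder of the string ends up in the Nth element. The MAXELEMENTS parameter is optional; if it is left at its default of 0, an unlimited number of elements is returned. :-)
.h: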
```cpp
class Myneatclass {
public:
    static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
    static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};
```
.cpp:
```cpp
std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
            // Put the rest of the string into the last element.
            std::getline(ss, item);
            elems.push_back(item);
            break;
        }
    }
    return elems;
}

std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
    std::vector<std::string> elems;
    split(s, delim, elems, MAXELEMENTS);
    return elems;
}
```
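To see what the element limit does in practice, here is a free-function sketch of the same loop. The name `split_max` and the parameter name `max_elements` are hypothetical; the logic is intended to match the member function above, with the class wrapper dropped for brevity.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical free-function version of the limited split.
// max_elements == 0 means "no limit".
std::vector<std::string> split_max(const std::string &s, char delim,
                                   std::size_t max_elements = 0)
{
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;

    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        // One slot left: put the unsplit remainder into it and stop.
        if (max_elements > 0 && !ss.eof() && elems.size() + 1 >= max_elements) {
            std::getline(ss, item);
            elems.push_back(item);
            break;
        }
    }
    return elems;
}
```

For the input `"a:b:c:d"` with `':'`, a limit of 2 gives `a` and `b:c:d`, a limit of 3 gives `a`, `b` and `c:d`, and the default of 0 gives all four pieces. One quirk of this loop: a limit of 1 still returns two elements whenever the delimiter occurs, because the first piece is pushed before the limit is checked.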
If you want to split a string by several characters, you can use:
```cpp
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <sstream>

using namespace std;

// Replaces every occurrence of the other dividers with the first one.
void replaceOtherChars(string &input, vector<char> &dividers)
{
    const char divider = dividers.at(0);
    vector<char>::iterator it_begin = dividers.begin() + 1,
                           it_end = dividers.end();
    for (; it_begin != it_end; ++it_begin)
    {
        string::size_type replaceIndex = 0;
        while (true)
        {
            replaceIndex = input.find_first_of(*it_begin, replaceIndex);
            if (replaceIndex == string::npos)
                break;
            input.at(replaceIndex) = divider;
        }
    }
}

vector<string> split(string str, vector<char> chars, bool missEmptySpace = true)
{
    vector<string> result;
    const char divider = chars.at(0);
    replaceOtherChars(str, chars);
    stringstream stream;
    stream << str;
    string temp;
    while (getline(stream, temp, divider))
    {
        if (missEmptySpace && temp.empty())
            continue;
        result.push_back(temp);
    }
    return result;
}

int main()
{
    string str = "milk, pigs.... hot-dogs";
    vector<char> arr;
    arr.push_back(' ');
    arr.push_back(',');
    arr.push_back('.');
    vector<string> result = split(str, arr);
    vector<string>::iterator it_begin = result.begin(),
                             it_end = result.end();
    for (; it_begin != it_end; ++it_begin)
    {
        cout << *it_begin << endl;
    }
    return 0;
}
```
Thank you @Jairo Abdiel Toribio Cisneros. It works for me, but your function returns some empty elements. So, to get a result without empty elements, I edited it as follows:
```cpp
std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    // The loop deliberately visits end() once so the last token is flushed.
    for (std::string::const_iterator i = str.begin(); i <= str.end(); ++i) {
        // Test for end() first so the end iterator is never dereferenced.
        if (i != str.end() && *i != *delim) {
            tmp += *i;
        } else {
            if (tmp.length() > 0) {
                v.push_back(tmp);
            }
            tmp = "";
        }
    }

    return v;
}
```
Usage:
```cpp
std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());
```
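I know I'm very late to the party, but I was thinking about the most elegant way to do this if you were given a range of delimiters rather than whitespace, using nothing but the standard library.
Here are my thoughts:
To split words into a vector of strings by a sequence of delimiters: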
```cpp
template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), not_in(delimiters));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), is_in(delimiters));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}
```
To split the other way, by providing a sequence of valid characters:
```cpp
template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), is_in(valid_chars));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), not_in(valid_chars));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}
```
`is_in` and `not_in` are defined as follows (they need to be declared before the two split functions above):
```cpp
namespace detail {
    template<class Container>
    struct is_in {
        is_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) != end(_charset);
        }

        const Container& _charset;
    };

    template<class Container>
    struct not_in {
        not_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) == end(_charset);
        }

        const Container& _charset;
    };
}

template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
    return detail::not_in<Container>(c);
}

template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
    return detail::is_in<Container>(c);
}
```
My implementation can be an alternative solution:
```cpp
std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator)
{
    std::vector<std::wstring> Lines;
    size_t stSearchPos = 0;
    size_t stFoundPos;
    // Comparing against size() (not size() - 1) avoids unsigned wrap-around
    // on an empty string and keeps a one-character last token.
    while (stSearchPos < String.size())
    {
        stFoundPos = String.find(Seperator, stSearchPos);
        stFoundPos = (stFoundPos == std::wstring::npos) ? String.size() : stFoundPos;
        Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));
        stSearchPos = stFoundPos + Seperator.size();
    }
    return Lines;
}
```
Test code:
```cpp
std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");
std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");
std::wcout << L"The string: " << MyString << std::endl;
for (std::vector<std::wstring>::const_iterator it = Parts.begin(); it < Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}
std::wcout << std::endl;
MyString = L"this,time,a,comma separated,string";
std::wcout << L"The string: " << MyString << std::endl;
Parts = IniFile::SplitString(MyString, L",");
for (std::vector<std::wstring>::const_iterator it = Parts.begin(); it < Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}
```
Test code output:
```
The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---

The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---
```
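I have a very different approach from the other solutions. It offers a lot of value in ways the other solutions lack, but of course it also has its own downsides. Below is the working implementation.
First of all, this problem can be solved with a single loop, with no additional memory, and by considering only four logical cases. Conceptually, we are interested in boundaries. Our code should reflect that: let's iterate through the string and look at two characters at a time, bearing in mind that the start and end of the string are special cases.
The downside is that we have to write the implementation ourselves, which is somewhat verbose, but it is mostly convenient boilerplate.
The upside is that we wrote the implementation ourselves, so it is very easy to customize it to specific needs, such as distinguishing left and right word boundaries, using any set of delimiters, or handling other cases such as non-boundary or erroneous positions.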
```cpp
#include <cctype>
#include <iostream>
#include <string>

using namespace std;

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

// Builds a boundary_t. This replaces the C-style compound literals with
// designated initializers, which are not standard C++.
static boundary_t make_boundary(boundary_type_t type, int pos = 0)
{
    boundary_t b;
    b.type = type;
    b.pos = pos;
    return b;
}

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(const string& str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return make_boundary(E_BOUNDARY_TYPE_ERROR);
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return make_boundary(E_BOUNDARY_TYPE_LEFT, pos);
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return make_boundary(E_BOUNDARY_TYPE_RIGHT, pos + 1);
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return make_boundary(E_BOUNDARY_TYPE_LEFT, pos + 1);
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return make_boundary(E_BOUNDARY_TYPE_RIGHT, pos + 1);
        }
        return make_boundary(E_BOUNDARY_TYPE_NONE);
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}
```
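As you can see, the code is easy to understand and fine-tune, and actually using it is short and simple. Using C++ should not stop us from writing the simplest and most easily customized code, even if that means not using the STL. I think this is an example of what Linus Torvalds might call "taste": we have eliminated all the logic we do not need, while writing in a style that naturally allows more cases to be handled when and if the need arises.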
What could improve this code might be the use of
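For what it's worth, here is a rough sketch of how left and right boundaries could be turned into actual words. It is not the code from this answer: `is_delimiter` and `collect_words` are names made up for illustration, and it assumes plain whitespace delimiters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for characters that separate words.
// Extend this predicate to treat other characters as delimiters.
bool is_delimiter(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: collects the text between each left boundary
// (delimiter -> word char) and the following right boundary (word char -> delimiter).
std::vector<std::string> collect_words(const std::string& text) {
    std::vector<std::string> words;
    std::size_t left = std::string::npos;  // start of the current word, if any

    // Iterate one past the end so the end of the string closes an open word.
    for (std::size_t i = 0; i <= text.size(); ++i) {
        const bool in_word = i < text.size() && !is_delimiter(text[i]);
        if (in_word && left == std::string::npos) {
            left = i;  // left boundary found
        } else if (!in_word && left != std::string::npos) {
            words.push_back(text.substr(left, i - left));  // right boundary at i
            left = std::string::npos;
        }
    }
    return words;
}
```

Calling `collect_words("Somewhere down the road")` would give the four words as a vector, and runs of delimiters are skipped because only boundary transitions start or end a word.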
Here is my approach: cut and split.
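```cpp
#include <string>
#include <vector>

using namespace std;

// Removes the first token (and the delimiter after it) from str and returns the token.
string cut(string& str, const string& del)
{
    string f = str;
    const string::size_type pos = str.find(del);

    if (pos != string::npos)
    {
        f = str.substr(0, pos);
        str = str.substr(pos + del.length());
    }
    else
    {
        str.clear(); // no delimiter left: the rest of the string is the last token
    }

    return f;
}

vector<string> split(const string& in, const string& del = " ")
{
    vector<string> out;
    string t = in;

    if (del.empty()) // an empty delimiter would never consume any input
    {
        out.push_back(t);
        return out;
    }

    while (!t.empty())
        out.push_back(cut(t, del));

    return out;
}
```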
By the way, if there is anything that could be done to optimize this…
There are already a lot of good answers to this question; this is just a small detail.
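Splitting a string for output is one thing, but if you are splitting into a container, it can help to know roughly how many pieces to expect beforehand.
Even though it may come at a small cost, counting the delimiters can be treated as a preceding analysis step: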
```cpp
#include <algorithm>

size_t n = std::count(s.begin(), s.end(), ' ');
```
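One way such a count could be used, assuming the aim is to reserve storage before splitting into a `std::vector`. The helper name `split_reserved` is made up for this sketch, and the count is only an estimate when the words are separated by anything other than single spaces.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on whitespace after reserving room for the tokens.
std::vector<std::string> split_reserved(const std::string& s) {
    std::vector<std::string> tokens;

    // Pre-analysis: with single-space separators there is at most one more
    // word than there are spaces, so reserve that much up front.
    const auto spaces = std::count(s.begin(), s.end(), ' ');
    tokens.reserve(static_cast<std::size_t>(spaces) + 1);

    std::istringstream iss(s);
    std::string word;
    while (iss >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

The extra pass over the string costs a little, but it can avoid repeated reallocations of the vector while the tokens are being appended.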
Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.
```cpp
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &input, char delimiter, bool skipEmpty = true) {
    /*
    Splits a string at each delimiter and returns these strings as a string vector.
    If the delimiter is not found then nothing is returned.
    If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
    */
    bool delimiterFound = false;
    std::string::size_type pos = 0, pPos = 0;
    std::vector<std::string> result;

    while (true) {
        pos = input.find(delimiter, pPos);
        if (pos != std::string::npos) {
            if (!skipEmpty || pos > pPos)  // if empty values are to be kept or not
                result.push_back(input.substr(pPos, pos - pPos));
            delimiterFound = true;
        } else {
            // keep the text after the last delimiter, if there is any
            if (pPos < input.length() && delimiterFound)
                result.push_back(input.substr(pPos));
            break;
        }
        pPos = pos + 1;
    }
    return result;
}
```
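If the delimiter can be longer than one character, the search has to advance by the delimiter's length rather than by one. A minimal sketch of that variant follows; `split_on` is a made-up name, and unlike the function above it returns the whole input as a single piece when the delimiter never occurs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: splits input at every occurrence of a (possibly
// multi-character) delimiter. Empty pieces are dropped when skip_empty is true.
std::vector<std::string> split_on(const std::string& input,
                                  const std::string& delimiter,
                                  bool skip_empty = true) {
    std::vector<std::string> result;
    std::size_t start = 0;

    while (true) {
        // An empty delimiter is treated as "never found" to avoid looping forever.
        const std::size_t pos = delimiter.empty()
                                    ? std::string::npos
                                    : input.find(delimiter, start);
        const std::size_t length =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        const std::string piece = input.substr(start, length);

        if (!skip_empty || !piece.empty()) {
            result.push_back(piece);
        }
        if (pos == std::string::npos) {
            break;  // the last piece has been handled
        }
        start = pos + delimiter.size();  // skip past the whole delimiter
    }
    return result;
}
```

For example, splitting `"one::two::::three"` on `"::"` yields three pieces by default, and four (one of them empty) when `skip_empty` is `false`.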
```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main() {
    using namespace std;
    string sentence = "10 20 30 40 5 6 7 8";
    istringstream iss(sentence);
    vector<string> tokens;
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         back_inserter(tokens));

    // iterate over however many tokens were actually read
    for (const string& token : tokens) {
        cout << token << '\n';
    }
}
```
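```cpp
#include <string>

using std::string;

// Splits str at each delim and stores the pieces in array,
// writing at most arraySize entries.
void splitString(const string& str, char delim, string array[], const int arraySize)
{
    string::size_type subStrStart = 0;

    for (int index = 0; index < arraySize; index++)
    {
        const string::size_type delimPosition = str.find(delim, subStrStart);
        if (delimPosition == string::npos)
        {
            array[index] = str.substr(subStrStart); // last piece: everything that is left
            break;
        }
        array[index] = str.substr(subStrStart, delimPosition - subStrStart);
        subStrStart = delimPosition + 1;
    }
}
```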
For a very large and probably redundant version, try a lot of for loops.
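```cpp
// `sequence` is the std::string to split; it is consumed as words are extracted.
const int maxWords = 10;
string stringlist[maxWords];
int count = 0;

for (string::size_type i = 0; i < sequence.length() && count < maxWords; )
{
    if (sequence[i] == ' ')
    {
        stringlist[count] = sequence.substr(0, i);
        sequence.erase(0, i + 1);
        i = 0; // restart the scan at the beginning of the remaining text
        count++;
    }
    else if (i == sequence.length() - 1) // Last word
    {
        stringlist[count] = sequence.substr(0, i + 1);
        count++;
        break;
    }
    else
    {
        i++;
    }
}
```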
It isn't pretty, but by and large (barring punctuation and a slew of other bugs) it works!