C++: Parse only specific columns from a CSV file using tokens

Suppose I have a file filled with comma-separated values, for example:

"myComputer",5,192.168.1.0,25
"herComputer",6,192.168.1.1,26
"hisComputer",7,192.168.1.2,27

To pull each line of data out as a string, I would do this:

    std::string line;
    std::ifstream myfile ("myCSVFile.txt");

    if(myfile.is_open())
    {
        while(getline(myfile,line))
        {
            std::string tempString = line;
            std::string delimiter =",";
        }
    }

To parse the values one by one, I use the approach from "Parse (split) a string in C++ using string delimiter (standard C++)":

    std::string s ="scott>=tiger>=mushroom";
    std::string delimiter =">=";

    size_t pos = 0;
    std::string token;
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());
    }
    std::cout << s << std::endl;

The problem is: what if I only want the first and third values? That is, for the CSV file above, I want to output only

"myComputer" 192.168.1.0
"herComputer" 192.168.1.1
"hisComputer" 192.168.1.2

Is there a way to do this with the approach above, or should I use an entirely different approach? Thanks,

Related discussion

This task is much easier with a dedicated library. With the escaped_list_separator from Boost Tokenizer, it is a breeze:

    #include <vector>
    #include <string>
    #include <iostream>
    #include <fstream>
    #include <boost/tokenizer.hpp>

    int main()
    {
        std::ifstream myfile("myCSVFile.txt");

        if (myfile.is_open())
        {
            std::string line;
            while (std::getline(myfile, line))
            {
                typedef boost::escaped_list_separator<char> Separator;
                typedef boost::tokenizer<Separator> Tokenizer;

                // Collect every field of the current line.
                std::vector<std::string> tokens;
                Tokenizer tokenizer(line);
                for (Tokenizer::iterator iter = tokenizer.begin(); iter != tokenizer.end(); ++iter)
                {
                    tokens.push_back(*iter);
                }

                // Print the first and third fields of a well-formed line.
                if (tokens.size() == 4)
                {
                    std::cout << tokens[0] << "\t" << tokens[2] << "\n";
                }
                else
                {
                    std::cerr << "illegal line\n";
                }
            }
        }
    }

Note that in C++11 the loop can be simplified:

    for (const auto &token : tokenizer)
    {
        tokens.push_back(token);
    }

As you can see, the idea is to store all the values of a line in a std::vector and then output the ones you need.
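One thing escaped_list_separator does for you is treat a comma inside double quotes as part of the field; the quote characters themselves are not included in the resulting tokens. If Boost is not available, a rough standard-library approximation might look like the sketch below. The name split_csv_line is hypothetical, and this is not Boost's implementation: it assumes fields contain no backslash escapes and no doubled quotes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits one CSV line on commas.
// Text inside double quotes stays in the field, and the quotes are dropped.
// Assumption: no backslash escapes and no doubled quotes ("") inside fields.
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;

    for (char c : line)
    {
        if (c == '"')
        {
            in_quotes = !in_quotes;     // toggle quoted state, do not copy the quote
        }
        else if (c == ',' && !in_quotes)
        {
            fields.push_back(current);  // an unquoted comma ends the field
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    fields.push_back(current);          // the last field has no trailing comma
    return fields;
}
```

Calling split_csv_line on a line from the file gives a vector whose elements [0] and [2] are the two columns asked for, in the same way as the tokens vector above.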

Now, if you are dealing with really large files, this may cause performance problems. In that case, combine the tokenizer with a counter:

    #include <string>
    #include <iostream>
    #include <fstream>
    #include <boost/tokenizer.hpp>

    int main()
    {
        std::ifstream myfile("myCSVFile.txt");

        if (myfile.is_open())
        {
            std::string line;
            while (std::getline(myfile, line))
            {
                typedef boost::escaped_list_separator<char> Separator;
                typedef boost::tokenizer<Separator> Tokenizer;

                Tokenizer tokenizer(line);
                int count = 0;
                // Stop after the third field; nothing is stored.
                for (Tokenizer::iterator iter = tokenizer.begin(); (iter != tokenizer.end()) && (count < 3); ++iter)
                {
                    if ((count == 0) || (count == 2))
                    {
                        std::cout << *iter;
                        if (count == 0)
                        {
                            std::cout << "\t";
                        }
                    }
                    ++count;
                }
                std::cout << "\n";
            }
        }
    }

Both techniques (a std::vector followed by output, or a loop with a counter) also work with a hand-rolled string-splitting algorithm. The basic idea is the same:

With a std::vector:

    std::vector<std::string> tokens;
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        tokens.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    // What remains after the last delimiter is the final field.
    tokens.push_back(s);

    if (tokens.size() == 4)
    {
        std::cout << tokens[0] << "\t" << tokens[2] << "\n";
    }
    else
    {
        std::cerr << "illegal line\n";
    }
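For reference, here is a self-contained sketch of the vector approach using only the standard library. The helper names split and first_and_third are hypothetical. Two details are worth noting: the text after the last delimiter has to be stored as a field too, otherwise a four-column line yields only three tokens, and searching from a moving offset avoids erasing from the front of the string on every step. The sketch does no quote handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on `delimiter`, keeping the final field.
std::vector<std::string> split(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> tokens;
    if (delimiter.empty())
    {
        tokens.push_back(s);  // nothing to split on
        return tokens;
    }

    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos)
    {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start));  // text after the last delimiter
    return tokens;
}

// Hypothetical helper: returns fields 0 and 2 joined by a tab,
// or an empty string if the line does not have exactly 4 fields.
std::string first_and_third(const std::string& line)
{
    const std::vector<std::string> tokens = split(line, ",");
    if (tokens.size() != 4)
    {
        return std::string();
    }
    return tokens[0] + "\t" + tokens[2];
}
```

Inside the getline loop from the question, you would call first_and_third(line) and print the result when it is not empty.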

With a counter:

    int count = 0;
    // Stop once the third field has been handled.
    while ((pos = s.find(delimiter)) != std::string::npos && (count < 3)) {
        token = s.substr(0, pos);

        if ((count == 0) || (count == 2))
        {
            std::cout << token;
            if (count == 0)
            {
                std::cout << "\t";
            }
        }
        ++count;
        s.erase(0, pos + delimiter.length());
    }
    std::cout << "\n";
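The counter variant can also be written as a small function that never builds a vector and stops scanning after the third field. The sketch below is one possible shape, with a hypothetical name (pick_first_and_third) and a single-character delimiter as a simplifying assumption. Like the loop above, it does not interpret quotes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns field 0, a tab, then field 2 (if present).
// Scans at most three fields and copies only the two that are needed.
std::string pick_first_and_third(const std::string& line, char delimiter = ',')
{
    std::string result;
    std::size_t start = 0;

    for (int count = 0; count < 3; ++count)
    {
        const std::size_t pos = line.find(delimiter, start);
        // A length of npos tells substr to take everything up to the end.
        const std::size_t len =
            (pos == std::string::npos) ? std::string::npos : pos - start;

        if (count == 0)
        {
            result = line.substr(start, len);
            result += '\t';
        }
        else if (count == 2)
        {
            result += line.substr(start, len);
        }

        if (pos == std::string::npos)
        {
            break;  // no more delimiters on this line
        }
        start = pos + 1;
    }
    return result;
}
```

Because the function handles a missing delimiter explicitly, it also works when the wanted column happens to be the last one on the line.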