C++: Parse a C-string of floating-point numbers

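I have a C-string that contains a list of floating-point numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point whose X and Y fields are separated by a comma (and optionally by spaces as well).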

" 10,9 2.5, 3 4 ,150.32"

I need to parse this string to fill a list of Point(x, y). Here is my current implementation:

const char* strPoints = getString();
std::istringstream sstream(strPoints);

float x, y;
char comma;

while (sstream >> x >> comma >> y)
{
    myList.push(Point(x, y));
}

Since I need to parse a lot of these strings (up to 500,000), I'm wondering whether there is a faster solution.

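Have a look at Boost Spirit:

How to parse space-separated floats in C++ quickly?

It supports NaN, positive infinity and negative infinity. It also lets you express the constraining grammar succinctly.

Simple adaptation of the code

Here is the sample adapted to your grammar: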

struct Point { float x,y; };
typedef std::vector<Point> data_t;

// And later:
bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);

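The iterators can be any iterators, so you can hook the parser up to your C string directly.

Here is a straight adaptation of the linked benchmark case. It shows how to parse from any std::istream or directly from a memory-mapped file.

Live On Coliru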

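Further optimization (strictly for C strings)

Here is a version that does not need to know the length of the string up front (this is neat because it avoids a strlen call in case you do not have the length available):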

template <typename OI>
static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
    namespace qi  = boost::spirit::qi;
    namespace phx = boost::phoenix;

    // The default `last` is only a sentinel that `it` never reaches;
    // parsing stops at the NUL terminator because it matches no parser.
    bool ok = qi::phrase_parse(it, last,
            *(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
            qi::space);

    if (!ok || !(it == last || *it == '\0')) {
        throw it; // TODO proper error reporting?
    }
}

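Note how I used an output iterator, so you get to decide how the results are accumulated. The obvious wrapper to /just/ parse into a vector would be: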

static inline data_t parse_points(char const* szInput) {
    data_t pts;
    parse_points(back_inserter(pts), szInput);
    return pts;
}

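But you can also do different things (such as appending to an existing container, which could have reserved a known capacity up front). Things like this are often what make a truly optimized integration possible in the end.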
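The idea of letting the caller choose the output iterator can be sketched without Boost. The following is a minimal illustration only, assuming std::strtof as a stand-in for the Spirit grammar above; `parse_points_into` is a hypothetical name and not part of the answer's code.

```cpp
#include <cctype>
#include <cstdlib>
#include <iterator>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper (not from the answer above): parses "x , y" pairs from
// a NUL-terminated string and writes each Point through `out`.
// Returns true only if the whole string was consumed.
template <typename OutIt>
bool parse_points_into(OutIt out, const char* s) {
    auto skip_ws = [](const char* p) {
        while (std::isspace(static_cast<unsigned char>(*p))) ++p;
        return p;
    };
    for (;;) {
        char* end = nullptr;
        const float x = std::strtof(s, &end);  // strtof skips leading whitespace
        if (end == s) break;                   // no number here: stop scanning
        s = skip_ws(end);
        if (*s != ',') return false;           // X must be followed by a comma
        ++s;
        const float y = std::strtof(s, &end);
        if (end == s) return false;            // comma without a Y value
        s = end;
        *out++ = Point{x, y};
    }
    return *skip_ws(s) == '\0';                // only trailing whitespace allowed
}
```

A caller that already knows roughly how many points to expect could call `reserve` on a `std::vector<Point>` and then pass `std::back_inserter` of that vector as `out`, which appends to the existing container without the parser knowing anything about it.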

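Here is that code fully demonstrated in about 30 lines of essential code:

Live On Coliru

Extra bonus

To show off the flexibility of this parser: if you just want to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adding a newly constructed point.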

int main() {
    int count = 0;
    // Output iterator first, then the input, matching parse_points(OI out, char const* it, ...)
    parse_points(boost::make_function_output_iterator([&](Point const&){count++;}), " 10,9 2.5, 3 4 ,150.32 ");
    std::cout << "elements in sample: " << count << "\n";
}

Live On Coliru
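For readers without Boost at hand, the counting trick can be approximated with a small hand-written output iterator. This is a sketch under assumptions: `CountingIterator` and `count_points` are hypothetical names, and std::copy stands in for the parser that would normally write the points.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

struct Point { float x, y; };

// Hypothetical output iterator: every assignment bumps a counter and the
// assigned Point is discarded.
struct CountingIterator {
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    std::size_t* count;  // counter owned by the caller

    CountingIterator& operator*()     { return *this; }
    CountingIterator& operator++()    { return *this; }
    CountingIterator  operator++(int) { return *this; }
    CountingIterator& operator=(const Point&) { ++*count; return *this; }
};

// Counts elements by "writing" them to the counting iterator.
std::size_t count_points(const std::vector<Point>& pts) {
    std::size_t n = 0;
    std::copy(pts.begin(), pts.end(), CountingIterator{&n});
    return n;
}
```

Any algorithm or parser that writes through `*out++ = value` can be handed such an iterator, so counting needs no change on the producing side.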

Since everything is inlined, the compiler notices that the whole Point does not need to be constructed here and eliminates that code: http://paste.ubuntu.com/9781055/

The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.

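I got better performance by parsing the points with a combination of std::find and std::strtof, and the code is not much more complicated. Here is the test I ran: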

#include <iostream>
#include <sstream>
#include <random>
#include <chrono>
#include <cctype>
#include <algorithm>
#include <cstdlib>
#include <forward_list>
#include <iterator>
#include <string>
#include <tuple>

struct Point { float x; float y; };
using PointList = std::forward_list<Point>;
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Builds a test string of n points, each formatted as "x ,y\t\n".
std::string generate_points(int n) {
    static auto random_generator = std::mt19937{std::random_device{}()};
    std::ostringstream oss;
    std::uniform_real_distribution<float> distribution(-1, 1);
    for (int i=0; i<n; ++i) {
        oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t\n";
    }
    return oss.str();
}

// Baseline: the stream-based parser from the question.
PointList parse_points1(const char* s) {
    std::istringstream iss(s);
    PointList points;
    float x, y;
    char comma;
    while (iss >> x >> comma >> y)
        points.push_front(Point{x, y});
    return points;
}

// Parses one "x , y" point and returns it with the start of the next point.
// Assumes well-formed input: a comma must exist in [x_first, last).
inline
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {
    // Cast to unsigned char: std::isspace is undefined for negative values.
    auto is_whitespace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    auto x_last = std::find(x_first, last, ',');
    auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);
    auto y_last = std::find_if(y_first, last, is_whitespace);
    auto x = std::strtof(x_first, (char**)&x_last);
    auto y = std::strtof(y_first, (char**)&y_last);
    auto next_x_first = std::find_if_not(y_last, last, is_whitespace);
    return std::make_tuple(Point{x, y}, next_x_first);
}

PointList parse_points2(const char* i, const char* last) {
    PointList points;
    Point point;
    while (i != last) {
        std::tie(point, i) = parse_point2(i, last);
        points.push_front(point);
    }
    return points;
}

int main() {
    auto s = generate_points(500000);
    auto time0 = Clock::now();
    auto points1 = parse_points1(s.c_str());
    auto time1 = Clock::now();
    auto points2 = parse_points2(s.data(), s.data() + s.size());
    auto time2 = Clock::now();
    std::cout << "using stringstream: "
              << std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';
    std::cout << "using strtof: "
              << std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';
    return 0;
}

Output (times in milliseconds):

using stringstream: 1262
using strtof: 120

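You could first try disabling synchronization with C I/O:

std::ios::sync_with_stdio(false);

Source: Using scanf() in C++ programs is faster than using cin?

You could also try alternatives to iostream:

boost::lexical_cast with BOOST_LEXICAL_CAST_ASSUME_C_LOCALE defined
scanf

I think you should give sync_with_stdio(false) a try first. The other options require more coding, and I'm not sure you would gain much (if anything).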
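As a rough idea of what the scanf route could look like for this input format, here is a minimal sketch using std::sscanf with the `%n` conversion to advance through the string. The function name `parse_points_sscanf` is hypothetical, and the sketch assumes the default "C" numeric locale, so `.` is the decimal separator.

```cpp
#include <cctype>
#include <cstdio>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper: appends every "x , y" pair found in `s` to `out`.
// Returns true only if nothing but whitespace is left after the last pair.
bool parse_points_sscanf(const char* s, std::vector<Point>& out) {
    float x = 0, y = 0;
    int used = 0;
    // A blank in the format matches any amount of whitespace (including none).
    // %n stores the number of characters consumed so far and is not counted
    // in the return value, so a complete pair yields 2.
    while (std::sscanf(s, " %f , %f%n", &x, &y, &used) == 2) {
        out.push_back(Point{x, y});
        s += used;  // continue right after the pair just read
    }
    while (std::isspace(static_cast<unsigned char>(*s))) ++s;
    return *s == '\0';
}
```

Whether this beats the stream version would have to be measured on the real data; the format string is re-parsed on every call, which is one reason a hand-written strtof loop may still come out ahead.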
