C++: Using stringstream instead of `sscanf` to parse a fixed-format string

I want to use the facilities that stringstream provides to extract values from a fixed-format string, as a type-safe alternative to sscanf. How can I do this?

Consider the following specific use case. I have a std::string in this fixed format:

YYYYMMDDHHMMSSmmm

where:

```
YYYY = 4 digits representing the year
MM = 2 digits representing the month ('0' padded to 2 characters)
DD = 2 digits representing the day ('0' padded to 2 characters)
HH = 2 digits representing the hour ('0' padded to 2 characters)
MM = 2 digits representing the minute ('0' padded to 2 characters)
SS = 2 digits representing the second ('0' padded to 2 characters)
mmm = 3 digits representing the milliseconds ('0' padded to 3 characters)
```

Previously I did something along these lines:

```cpp
string s = "20101220110651184";
unsigned year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0, milli = 0;
sscanf(s.c_str(), "%4u%2u%2u%2u%2u%2u%3u", &year, &month, &day, &hour, &minute, &second, &milli);
```

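The width values are magic numbers, and that's fine. For type safety, I'd like to use streams to extract these values and convert them to unsigned. But when I try this: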

```cpp
stringstream ss;
ss << "20101220110651184";
ss >> setw(4) >> year;
```

`year` keeps the value 0. It should be 2010.

How can I do this? I can't use Boost or any other third-party library, and I can't use C++0x either.


One option, though not a particularly efficient one, is to construct some temporary strings and use a lexical cast:

```cpp
std::string s("20101220110651184");
int year = lexical_cast<int>(s.substr(0, 4));
// etc.
```

lexical_cast can be implemented in a few lines of code. Herb Sutter presents a bare-minimum version in his article "The String Formatters of Manor Farm".

It isn't exactly what you're looking for, but it is a type-safe way to extract fixed-width fields from a string.
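A minimal `lexical_cast` in the spirit of the one mentioned above might look like the following sketch. It is not `boost::lexical_cast`; the `std::runtime_error` it throws, the rule that the whole field must be consumed, and the `Timestamp`/`parseTimestamp` names are assumptions made for this example, which sticks to C++03 to match the question's constraints.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical minimal lexical_cast (not boost::lexical_cast): converts the
// text to T via a stringstream and throws if it is not entirely a T.
template <typename T>
T lexical_cast(const std::string& text)
{
    std::istringstream iss(text);
    T value;
    // The extraction must succeed and must consume the whole field.
    if (!(iss >> value) || !iss.eof())
        throw std::runtime_error("lexical_cast: cannot convert \"" + text + "\"");
    return value;
}

// Hypothetical result type for the YYYYMMDDHHMMSSmmm format.
struct Timestamp { int year, month, day, hour, minute, second, milli; };

// Cuts the fixed-width fields out with substr and converts each one.
Timestamp parseTimestamp(const std::string& s)
{
    assert(s.size() == 17);  // precondition: exactly 17 characters
    Timestamp t;
    t.year   = lexical_cast<int>(s.substr(0, 4));
    t.month  = lexical_cast<int>(s.substr(4, 2));
    t.day    = lexical_cast<int>(s.substr(6, 2));
    t.hour   = lexical_cast<int>(s.substr(8, 2));
    t.minute = lexical_cast<int>(s.substr(10, 2));
    t.second = lexical_cast<int>(s.substr(12, 2));
    t.milli  = lexical_cast<int>(s.substr(14, 3));
    return t;
}
```

Each `substr` allocates a temporary string, which is the inefficiency the answer refers to; for a 17-character timestamp that cost is usually negligible.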


Well, if the format is fixed, why not do this?

```cpp
std::string sd("20101220110651184");
// insert spaces from the back, so the earlier offsets stay valid
sd.insert(14, 1, ' ');
sd.insert(12, 1, ' ');
sd.insert(10, 1, ' ');
sd.insert(8, 1, ' ');
sd.insert(6, 1, ' ');
sd.insert(4, 1, ' ');
int year, month, day, hour, min, sec, ms;
std::istringstream str(sd);
str >> year >> month >> day >> hour >> min >> sec >> ms;
```
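The hard-coded `insert` calls can also be driven by a table of field widths. The sketch below uses hypothetical names (`separateFields`, `parseDate`); it builds a copy of the input with a space after every field so that a plain `std::istringstream` can read the values, and, like the approach above, it assumes the fields themselves contain no spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: copies `s`, appending a space after each fixed-width
// field, so that operator>> can split the fields on whitespace.
std::string separateFields(const std::string& s, const int* widths, int count)
{
    std::string out;
    std::string::size_type pos = 0;
    for (int i = 0; i < count && pos < s.size(); ++i) {
        assert(widths[i] > 0);
        out += s.substr(pos, widths[i]);
        out += ' ';
        pos += widths[i];
    }
    return out;
}

// Parses YYYYMMDDHHMMSSmmm; returns false if any field could not be read.
bool parseDate(const std::string& s, int& year, int& month, int& day,
               int& hour, int& minute, int& second, int& milli)
{
    static const int widths[] = { 4, 2, 2, 2, 2, 2, 3 };
    std::istringstream str(separateFields(s, widths, 7));
    str >> year >> month >> day >> hour >> minute >> second >> milli;
    return !str.fail();
}
```

Keeping the widths in one table means a format change touches a single line, and checking `str.fail()` catches input that is too short.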


I use the following, which may be useful to you:

```cpp
template<typename T> T stringTo( const std::string& s )
   {
      std::istringstream iss(s);
      T x;
      iss >> x;
      return x;
   }

template<typename T> inline std::string toString( const T& x )
   {
      std::ostringstream o;
      o << x;
      return o.str();
   }
```

These templates require:

```cpp
#include <sstream>
#include <string>
```

Usage:

```cpp
std::string line;
std::getline(std::cin, line);   // stringTo takes a std::string, not a stream
long date = stringTo<long>(line);
```
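`stringTo` as written returns whatever `iss >> x` left in `x`, so a malformed field is silently returned as some fallback value (possibly an indeterminate one, since `x` is never initialized). A checked variant, sketched here under the hypothetical name `tryStringTo`, reports success through its return value and also rejects trailing characters:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical checked counterpart of stringTo: returns false, leaving `out`
// untouched, unless the entire string converts to a T.
template <typename T>
bool tryStringTo(const std::string& s, T& out)
{
    std::istringstream iss(s);
    T x;
    // Fail on a bad conversion or on leftover characters such as "20x0".
    if (!(iss >> x) || !iss.eof())
        return false;
    out = x;
    return true;
}
```

Returning `bool` instead of throwing keeps the call site close to the `sscanf` style the question started from, where each field can be checked in turn.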



From here, you may find this useful:

```cpp
#include <sstream>
#include <string>

template<typename T, typename charT, typename traits>
std::basic_istream<charT, traits>&
  fixedread(std::basic_istream<charT, traits>& in, T& x)
{
  if (in.width() == 0)
    // Not fixed size, so read normally.
    in >> x;
  else {
    // Extraction into a string honors the width (and resets it to 0).
    std::basic_string<charT, traits> field;
    in >> field;
    std::basic_istringstream<charT, traits> stream(field);
    if (!(stream >> x))
      in.setstate(std::ios_base::failbit);
  }
  return in;
}
```

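On input, setw() only affects extraction into strings and C strings (character arrays). The function above relies on this: it reads the field into a string and then converts that string to the desired type. You can combine it with setw() or ss.width(w) to read fixed-width fields of any type.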
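The behavior this answer relies on, which also explains the result in the question, can be checked directly. The two functions below are illustrative only: `std::setw` bounds an extraction into `std::string`, but the extraction into `unsigned` ignores the width and consumes every digit.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// std::setw bounds extraction into a std::string: only "2010" is read,
// and the width is reset to 0 afterwards.
bool setwLimitsStringField()
{
    std::istringstream ss("20101220110651184");
    std::string field;
    ss >> std::setw(4) >> field;
    return field == "2010" && ss.width() == 0;
}

// For an unsigned, the width is ignored: operator>> consumes the whole
// 17-digit run (which also overflows a 32-bit unsigned and sets failbit).
bool setwIgnoredForUnsigned()
{
    std::istringstream ss("20101220110651184");
    unsigned year = 0;
    ss >> std::setw(4) >> year;
    return ss.eof();  // every digit was consumed, not just the first four
}
```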


ps5mh's solution is really nice, but it doesn't work for fixed-size fields that contain whitespace. The following solution fixes that:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

template<typename T, typename T2>
struct FixedRead
{
    T& content;
    T2& number;
    int size;
    FixedRead(T& content, int size, T2& number) :
        content(content), number(number), size(size)
    {
        assert(size > 0);
    }
    template<typename charT, typename traits>
    friend std::basic_istream<charT, traits>&
    operator >>(std::basic_istream<charT, traits>& in, FixedRead<T, T2> x)
    {
        if (!in.eof() && in.good())
        {
            // read exactly x.size characters, whitespace included
            std::vector<charT> buffer(x.size);
            in.read(&buffer[0], x.size);  // &buffer[0]: vector::data() is C++11
            std::streamsize num_read = in.gcount();
            std::basic_string<charT, traits> field(buffer.begin(), buffer.begin() + num_read);
            std::basic_istringstream<charT, traits> os(field);
            if (!(os >> x.content))
                in.setstate(std::ios_base::failbit);
            else
                ++x.number;  // count successful conversions, like sscanf's return value
        }
        return in;
    }
};
template<typename T, typename T2>
FixedRead<T, T2> fixedread(T& content, int size, T2& number) {
    return FixedRead<T, T2>(content, size, number);
}
```

It can be used like this:

```cpp
std::string s = "90007127       19000715790007397";
std::vector<int> ints(5);
int num_read = 0;
std::istringstream in(s);
in >> fixedread(ints[0], 8, num_read)
   >> fixedread(ints[1], 8, num_read)
   >> fixedread(ints[2], 8, num_read)
   >> fixedread(ints[3], 8, num_read)
   >> fixedread(ints[4], 8, num_read);
// result:
//   num_read = 4 (like the return value of sscanf)
//   ints = 90007127, 1, 90007157, 90007397
//   ints[4] keeps its initial value 0 (no fifth field was available)
```
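The essential step in `FixedRead` is `in.read`, which takes exactly `size` characters, spaces included, instead of stopping at whitespace the way `operator>>` into a string does. Without the manipulator syntax, that step might be sketched as a plain function; `readField` and `readInts` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads exactly `width` characters (spaces included)
// and converts them to T. Returns false on a short read or a bad conversion.
template <typename T>
bool readField(std::istream& in, std::streamsize width, T& out)
{
    assert(width > 0);
    std::vector<char> buf(static_cast<std::vector<char>::size_type>(width));
    if (!in.read(&buf[0], width))      // fewer than `width` characters left
        return false;
    std::string field(buf.begin(), buf.end());
    std::istringstream conv(field);
    return !(conv >> out).fail();      // leading spaces are skipped here
}

// Reads up to n consecutive 8-character integer fields and returns how many
// were converted, much like the return value of sscanf.
int readInts(const std::string& s, int* ints, int n)
{
    std::istringstream in(s);
    int count = 0;
    while (count < n && readField(in, 8, ints[count]))
        ++count;
    return count;
}
```

Unlike the manipulator, this version rejects a short trailing field instead of converting the partial text.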

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

template<typename T>
struct FixedRead {
    T& content;
    int size;
    FixedRead(T& content, int size) :
            content(content), size(size) {
        assert(size != 0);
    }
    template<typename charT, typename traits>
    friend std::basic_istream<charT, traits>&
    operator >>(std::basic_istream<charT, traits>& in, FixedRead<T> x) {
        std::streamsize orig_w = in.width();
        std::basic_string<charT, traits> o;
        in >> std::setw(x.size) >> o;  // setw limits string extraction to x.size chars
        std::basic_istringstream<charT, traits> os(o);
        if (!(os >> x.content))
            in.setstate(std::ios_base::failbit);
        in.width(orig_w);
        return in;
    }
};

template<typename T>
FixedRead<T> fixed_read(T& content, int size) {
    return FixedRead<T>(content, size);
}

void test4() {
    std::stringstream ss("20101220110651184");
    int year = 0, month = 0, day = 0, hour = 0, min = 0, sec = 0, ms = 0;
    ss >> fixed_read(year, 4) >> fixed_read(month, 2) >> fixed_read(day, 2)
       >> fixed_read(hour, 2) >> fixed_read(min, 2) >> fixed_read(sec, 2)
       >> fixed_read(ms, 3);  // milliseconds are 3 digits wide
    std::cout << "year:" << year << "," << "month:" << month << "," << "day:" << day
              << "," << "hour:" << hour << "," << "min:" << min << "," << "sec:"
              << sec << "," << "ms:" << ms << std::endl;
    // prints: year:2010,month:12,day:20,hour:11,min:6,sec:51,ms:184
}
```
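Since every `fixed_read` call in `test4` differs only in its target and width, the fields can also be described by two parallel arrays and read in a loop. The sketch below performs the same "setw into a string, then convert" step inline; `parseFixed` and its array parameters are hypothetical, and like `fixed_read` it cannot handle fields that contain whitespace.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical table-driven parser: for each field, extract at most
// widths[i] characters into a string with setw, then convert that string.
bool parseFixed(const std::string& s, const int* widths, int* values, int count)
{
    std::istringstream in(s);
    for (int i = 0; i < count; ++i) {
        assert(widths[i] > 0);
        std::string field;
        if (!(in >> std::setw(widths[i]) >> field))  // no characters left
            return false;
        std::istringstream conv(field);
        if (!(conv >> values[i]))                    // field is not a number
            return false;
    }
    return true;
}
```

For the question's format the widths would be `{ 4, 2, 2, 2, 2, 2, 3 }`; a `false` result plays the role of a short `sscanf` count.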