C: strcmp() and signed / unsigned chars

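I am confused about strcmp(), or rather about how it is defined by the standard. Consider comparing two strings where one of them contains characters outside the 7-bit ASCII range (0-127).

The C standard says: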

int strcmp(const char *s1, const char *s2);

The strcmp function compares the string pointed to by s1 to the string
pointed to by s2.

The strcmp function returns an integer greater than, equal to, or less
than zero, accordingly as the string pointed to by s1 is greater than,
equal to, or less than the string pointed to by s2.

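The parameters are char *. Not unsigned char *. Nothing here says that the comparison should be done as unsigned.

Yet every standard library I checked treats a "high" character as exactly that: higher in value than any 7-bit ASCII character.

I understand that this is useful and is the expected behavior. I am not saying the existing implementations are wrong. I just want to know: which part of the standard did I miss?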

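#include <stdio.h>
#include <string.h>

/* Naive comparison: subtracts plain char values, so the result depends on
   whether char is signed on this platform. */
int strcmp_default( const char * s1, const char * s2 )
{
    while ( ( *s1 ) && ( *s1 == *s2 ) )
    {
        ++s1;
        ++s2;
    }
    return ( *s1 - *s2 );
}

/* Same loop, but both strings are read as unsigned char. */
int strcmp_unsigned( const char * s1, const char * s2 )
{
    const unsigned char * p1 = (const unsigned char *)s1;
    const unsigned char * p2 = (const unsigned char *)s2;

    while ( ( *p1 ) && ( *p1 == *p2 ) )
    {
        ++p1;
        ++p2;
    }
    return ( *p1 - *p2 );
}

int main( void )
{
    char x1[] = "abc";
    /* Assumes the source file is ISO-8859-1, where 'ü' is the single byte 0xFC. */
    char x2[] = "abü";
    printf( "%d\n", strcmp_default( x1, x2 ) );
    printf( "%d\n", strcmp_unsigned( x1, x2 ) );
    printf( "%d\n", strcmp( x1, x2 ) );
    return 0;
}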

The output is:

103
-153
-153


7.21.4/1 (C99), emphasis mine:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp,
and strncmp is determined by the sign of the difference between the values of the first
pair of characters (both interpreted as unsigned char) that differ in the objects being
compared.

C90 has similar wording.

Note that strcoll() may be a better fit than strcmp(), especially if your strings contain characters outside the basic character set.