C: How does strcmp() work?

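I've been looking around for an answer. I'm going to write a series of my own string functions, such as my_strcmp(), my_strcat(), and so on.

Does strcmp() walk through each index of the two character arrays and, if the ASCII value at the same index of the two strings is smaller, treat that string as alphabetically greater and therefore return 0 or 1 or 2? I guess what I'm asking is: does it use the ASCII values of the characters to return these results?

Any help would be greatly appreciated.

[Edit]

OK, so I came up with this... it works in all cases except when the second string is greater than the first.

Any tips?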

#include <stdio.h>

int my_strcmp(char s1[], char s2[])
{
    int i = 0;
    while ( s1[i] != '\0' )
    {
        if( s2[i] == '\0' ) { return 1; }
        else if( s1[i] < s2[i] ) { return -1; }
        else if( s1[i] > s2[i] ) { return 1; }
        i++;
    }
    return 0;   /* s1 has ended; s2 is never checked for leftover characters */
}

int main (int argc, char *argv[])
{
    if (argc < 3) { return 1; }   /* two strings are required */

    int result = my_strcmp(argv[1], argv[2]);

    printf("Value: %d\n", result);

    return 0;
}

Related discussion

A pseudo-code "implementation" of strcmp would go something like:

define strcmp (s1, s2):
    p1 = address of first character of s1
    p2 = address of first character of s2

    while contents of p1 not equal to null:
        if contents of p2 equal to null:
            return 1

        if contents of p2 greater than contents of p1:
            return -1

        if contents of p1 greater than contents of p2:
            return 1

        advance p1
        advance p2

    if contents of p2 not equal to null:
        return -1

    return 0

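That's basically it. Each character is compared in turn, and the decision as to whether the first or second string is greater is made based on that character.

Only if the characters are identical do you move on to the next character, and if all the characters are identical, zero is returned.

Note that you may not necessarily get 1 and -1. The specification says that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.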
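To make the point about return values concrete, here is a minimal sketch of sign-based checking. The helper names compare_sign and sorts_before are made up for this example and are not part of the standard library.

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp() result to -1, 0, or 1. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: nonzero when a sorts strictly before b. */
int sorts_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;   /* test the sign, never "== -1" */
}
```

A caller that really needs exactly -1, 0, or 1 can normalize the result as compare_sign does; otherwise testing the sign directly is enough.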

Turning that pseudo-code into real C is relatively simple:

int myStrCmp (const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0') {
        if (*p2 == '\0') return 1;
        if (*p2 > *p1) return -1;
        if (*p1 > *p2) return 1;

        p1++;
        p2++;
    }

    if (*p2 != '\0') return -1;

    return 0;
}

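Also keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions.

C has a concept called 'locales' which specify (among other things) collation, or the ordering of the underlying character set, and you may find, for example, that the characters a, á, à and ä are all considered identical. This happens for functions like strcoll.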
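As a rough illustration of the difference, the sketch below asks strcoll for an ordering under a named locale. The helper name collate_sign is an assumption made for this example, and which locale names are installed depends on the system. Only the "C" locale is guaranteed to exist, and there strcoll orders strings the same way strcmp does.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: sign of strcoll(a, b) under the named LC_COLLATE
   locale. Returns 2 if that locale is not available on this system.
   Note that it changes the program-wide collation locale as a side effect. */
int collate_sign(const char *locale_name, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;

    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

With a language-specific locale, strcoll may order accented or mixed-case strings differently from strcmp, which is exactly the effect described above.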

Here is the BSD implementation:

int
strcmp(s1, s2)
    register const char *s1, *s2;
{
    while (*s1 == *s2++)
        if (*s1++ == 0)
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}

As soon as two characters fail to match, it simply returns the difference between those two characters.
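The BSD source uses the old K&R-style parameter declarations, which newer compilers may reject. Below is a sketch of the same loop with a prototype-style signature. The name bsd_style_strcmp is made up here so that it does not clash with the library function.

```c
/* Prototype-style rewrite of the same loop; the name is hypothetical. */
int bsd_style_strcmp(const char *s1, const char *s2)
{
    /* s2 is advanced on every comparison, so after a mismatch the
       differing byte of the second string is at s2 - 1. */
    while (*s1 == *s2++)
        if (*s1++ == '\0')
            return 0;
    return *(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1);
}
```

For example, bsd_style_strcmp("1", "3") returns -2, the difference between the two digit characters, while a standard-conforming strcmp is only required to return some negative value.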

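It uses the byte values of the characters, returning a negative value if the first string comes before the second (ordered by byte values), zero if they are equal, and a positive value if the first string comes after the second. Since it operates on bytes, it is not encoding-aware.

For example: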

strcmp("abc","def") < 0
strcmp("abc","abcd") < 0 // null character is less than 'd'
strcmp("abc","ABC") > 0 // 'a' > 'A' in ASCII
strcmp("abc","abc") == 0
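The four comparisons can be checked directly. This is only a sketch: the function name examples_hold is made up, and the third comparison assumes an ASCII-compatible character set, as the comment in the example says.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when all four comparisons hold.
   The third one assumes an ASCII-compatible character set. */
int examples_hold(void)
{
    return strcmp("abc", "def") < 0
        && strcmp("abc", "abcd") < 0
        && strcmp("abc", "ABC") > 0
        && strcmp("abc", "abc") == 0;
}
```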

More precisely, as stated in the Open Group specification for strcmp:

The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.

Note that the return value may not be equal to this difference, but it will carry the same sign.
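The "interpreted as type unsigned char" part matters for bytes above 127. Here is a minimal sketch that follows the quoted wording. The names spec_strcmp and agrees_with_library are assumptions for this example, not standard functions.

```c
#include <string.h>

/* Hypothetical reference version that follows the quoted wording:
   compare bytes as unsigned char and return only the sign. */
int spec_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return (*p1 > *p2) - (*p1 < *p2);
}

/* Nonzero when spec_strcmp and the library strcmp agree on the sign. */
int agrees_with_library(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return spec_strcmp(a, b) == (r > 0) - (r < 0);
}
```

With this rule "\001" compares less than "\377" even on platforms where plain char is signed. A version that subtracts plain char values would compute 1 - (-1) there and report the opposite order.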

This is from the masters themselves (K&R, 2nd edition, page 106):

// strcmp: return < 0 if s < t, 0 if s == t, > 0 if s > t
int strcmp(char *s, char *t)
{
    int i;

    for (i = 0; s[i] == t[i]; i++)
        if (s[i] == '\0')
            return 0;
    return s[i] - t[i];
}

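Here is my version, written for small microcontroller applications and MISRA-C compliant.
The main aim of this code was to be readable, rather than the one-line goo found in most compiler libraries.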

#include <stdint.h>

int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
    while ( (*s1 != '\0') && (*s1 == *s2) )
    {
        s1++;
        s2++;
    }

    return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}

Note: the code assumes a 16-bit int type.

Related discussion

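Note: there is no need for && (*s2 != '\0').
This code is incorrect: (*s1 != *s2) should be (*s1 == *s2). Why would strcmp return int8_t? It is defined to return int! The final return has 3 useless casts: return (*s1 > *s2) - (*s1 < *s2); makes no assumptions about the size of int or the size of char.
If the strings can contain non-ASCII characters, the return value (int8_t)( (int16_t)*s1 - (int16_t)*s2 ); is actually incorrect, because the difference of 2 characters may be outside the range of int8_t. You will get implementation-defined behavior and possibly an incorrect sign.
@chqrlie Indeed, no idea where the != came from. The original code I took the snippet from doesn't have that bug. Fixed.
@chqrlie As for the return type int8_t, it is an optimization for microcontroller systems, where int would be unusable. The casts are there to ensure that each operand is of the expected type. When writing MISRA-compliant code, you are not allowed to write code that relies on implicit type promotions, which is a sound rule. Similarly, you are not allowed to write code that relies on implicit conversion to a smaller type, hence the final cast.
@Lundin: From my own experience auditing this kind of code, these constraints do not help programmers produce better or safer code. In this particular case, the sign of the result is incorrect for simple cases. For example: strcmp("\001","\377"); returns 2, where the result should be negative, as it would be if the return type were int.
@chqrlie Your test case is flawed even for standard C strcmp with the signature int strcmp(const char *s1, const char *s2);. If you attempt to store the value 377o in a char variable, there are no guarantees about what will happen, so your test case invokes undefined behavior. That being said, this code obviously assumes 7-bit ASCII. It was written for an embedded systems context, where extended symbol tables are not used. Or if they are, wcscmp is used, not strcmp.
@Lundin: "\377" is a perfectly valid string if char has at least 8 bits. It defines the same string as "\xFF", and if the char type is signed, this character is negative, as is the character literal '\377'. Extended symbol tables are a completely unrelated topic, and I agree that you definitely do not want to use the wide-character API in embedded systems.
@chqrlie No. If char has 8 bits but is signed, the literal \377 is too large to fit in it. The literal itself has type int, so your code is equivalent to writing something like char ch = 255;. Such code is not safe and invokes poorly specified behavior per C11 6.3.1.3: "Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised." This is a bug in your test case, not in the posted answer.
@Lundin: I'm afraid you are mistaken: if the char type is 8-bit signed, the character literal '\377' does have type int but its value is -1 (see C11 6.4.4.4 Character constants, paragraphs 9 to 13, with the equivalent example '\xFF'). Regarding string literals, C11 6.4.5 p4 explicitly states that the same considerations apply to each element of the sequence in a string literal as if it were in an integer character constant. Hence, on the same platform, the string literal "\377" is parsed as a single character with the value -1. Is there any other problem in the test case?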

This code is equivalent, shorter, and more readable:

int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
    while( (*s1!='\0') && (*s1==*s2) ){
        s1++;
        s2++;
    }

    return (int8_t)*s1 - (int8_t)*s2;
}

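We only need to test for the end of s1, because if we reach the end of s2 before the end of s1, the loop terminates anyway (since *s2 != *s1).

The return expression computes the correct value in every case, provided we only use 7-bit (pure ASCII) characters. Producing correct code for 8-bit characters needs careful thought, because of the risk of integer overflow.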
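One way around the overflow concern for 8-bit characters is the comparison-based return that was suggested in the discussion of the previous version. The sketch below applies it to the same loop. The name strcmp_u8 is an assumption chosen for this example, and the result is limited to -1, 0, or 1, which always fits in int8_t.

```c
#include <stdint.h>

/* Hypothetical variant: returns only -1, 0, or 1, so the result cannot
   overflow int8_t even when the bytes differ by more than 127. */
int8_t strcmp_u8(const uint8_t *s1, const uint8_t *s2)
{
    while ((*s1 != '\0') && (*s1 == *s2)) {
        s1++;
        s2++;
    }

    return (int8_t)((*s1 > *s2) - (*s1 < *s2));
}
```

The trade-off is that the magnitude of the difference is lost, which is fine for callers that only look at the sign.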

I found this on the web.

http://www.opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c

int strcmp(const char *s1, const char *s2)
{
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

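This is how I implemented strcmp.
It works as follows:
it compares the first letters of the two strings and, if they are the same, moves on to the next letter. If not, it returns the corresponding value. It is very simple and easy to understand:
#include <stdio.h>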

//function declaration:
int strcmp(const char string1[], const char string2[]);

int main()
{
    char string1[]=" The San Antonio spurs";
    char string2[]=" will be champions again!";
    //calling the function- strcmp
    printf("\nnumber returned by the strcmp function: %d", strcmp(string1, string2));
    getchar(); //wait for input (standard replacement for the non-standard getch())
    return(0);
}

/**This function calculates the dictionary value of the string and compares it to another string.
it returns a number bigger than 0 if the first string is bigger than the second
it returns a number smaller than 0 if the second string is bigger than the first
input: string1, string2
output: value- can be 1, 0 or -1 according to the case*/
int strcmp(const char string1[], const char string2[])
{
    int i=0;
    int value=2; //this initialization value could be any number but the numbers that can be returned by the function
    while(value==2)
    {
        if (string1[i]>string2[i])
        {
            value=1;
        }
        else if (string1[i]<string2[i])
        {
            value=-1;
        }
        else if (string1[i]=='\0') //both strings ended together: they are equal
        {
            value=0;
        }
        else
        {
            i++;
        }
    }
    return(value);
}

Here it is:

int strcmp(char *str1, char *str2){
    while( (*str1 == *str2) && (*str1 != 0) ){
        ++str1;   /* advance the pointer, not the character it points to */
        ++str2;
    }
    return (*str1-*str2);
}

If you want it to be faster, you can add "register" before the type, like this:
register char

Then it looks like this:

int strcmp(register char *str1, register char *str2){
    while( (*str1 == *str2) && (*str1 != 0) ){
        ++str1;   /* advance the pointer, not the character it points to */
        ++str2;
    }
    return (*str1-*str2);
}

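That way, the pointers are kept in CPU registers where possible. Note that register is only a hint, and optimizing compilers usually allocate registers on their own.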
