How does “while(*s++ = *t++)” copy a string?

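My question is: what does this code do (from http://www.joelonsoftware.com/articles/collegeadvice.html)?

while (*s++ = *t++);

The website says the code above copies a string, but I don't understand why...

Does it have something to do with pointers?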


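It is equivalent to:

while (*t) {
    *s = *t;
    s++;
    t++;
}
*s = *t;

The while loop terminates when the character that t points to is '\0'. Until then, it copies the character t points to into the character s points to, then increments s and t so that each points to the next character in its array.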
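To try the loop out, here is a minimal sketch that wraps it in a function. The name copy_chars is made up for this illustration, and the sketch assumes the caller supplies a destination buffer large enough for the string plus its terminator.

```c
#include <assert.h>
#include <stddef.h>

/* copy_chars is a hypothetical helper, not part of any library.
   Assumption: s points to a buffer with room for all of t plus '\0'. */
void copy_chars(char *s, const char *t)
{
    assert(s != NULL && t != NULL);

    while ((*s++ = *t++))   /* store one char, advance both, test the stored char */
        ;                   /* empty body: the condition does all the work */
}
```

With char buf[8], calling copy_chars(buf, "hi") should leave 'h', 'i' and '\0' in the first three elements of buf. The extra pair of parentheses around the assignment only tells the compiler that the = is intentional.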


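There is a lot going on behind the scenes here:

while (*s++ = *t++);

The s and t variables are pointers (almost certainly to characters); s is the destination. The following steps illustrate what happens:

  • The contents of t (*t) are copied to the contents of s (*s), one character.
  • s and t are both incremented (++).
  • The assignment (the copy) returns the character that was copied (to the while).
  • The while continues until that character is zero (the end of a string in C).

Effectively, it is:

while (*t != 0) {
    *s = *t;
    s++;
    t++;
}
*s = *t;
s++;
t++;

but written out in a much more compact way.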


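Let's assume s and t are char * pointers to strings (and assume the buffer s points to is at least as large as the string at t). In C, strings all end in 0 (ASCII "NUL"), right? So what does this do:

*s++ = *t++;

First, it does *s = *t, copying the value at *t to *s. Then it does s++, so s now points to the next character. And then it does t++, so t points to the next character. This has to do with operator precedence and with prefix versus postfix increment/decrement.

Operator precedence is the order in which operators are resolved. For a simple example, look at:

4 + 2 * 3

Is this 4 + (2 * 3) or (4 + 2) * 3? Well, we know it is the first one because of precedence: the binary * (multiplication operator) has higher precedence than the binary + (addition operator), and is resolved first.

In *s++, we have unary * (the pointer dereference operator) and unary ++ (the postfix increment operator). In this case, ++ has higher precedence than * (it is also said to "bind tighter"). If we had written ++*s, we would increment the value at *s rather than the address held in s, because prefix increment has lower precedence than dereference here (see the footnote marked * at the end of this answer), but we used postfix increment, which has higher precedence. If we had wanted prefix increment, we could have written *(++s), since the parentheses force ++s to come first, but this would have the undesirable side effect of skipping the first position, leaving the first character of the destination unwritten.

Note that having higher precedence does not mean it happens first. Postfix increment yields the value the pointer had before the increment, which is why *s = *t works on the old positions even though s++ binds tighter.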
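The three spellings discussed above can be compared side by side. This is only an illustrative sketch: the function name precedence_demo and the sample string "ace" are made up, and the asserts state what each expression is expected to do.

```c
#include <assert.h>

/* Illustrative only: the function name and the sample data are made up.
   Returns the final offset of p within buf. */
int precedence_demo(void)
{
    char buf[] = "ace";
    char *p = buf;
    char c;

    c = *p++;   /* parsed as *(p++): reads buf[0], then p moves to buf[1] */
    assert(c == 'a' && p == buf + 1);

    c = ++*p;   /* parsed as ++(*p): buf[1] goes from 'c' to 'd', p does not move */
    assert(c == 'd' && buf[1] == 'd' && p == buf + 1);

    c = *++p;   /* parsed as *(++p): p moves to buf[2] first, then that char is read */
    assert(c == 'e' && p == buf + 2);

    return (int)(p - buf);
}
```

Only the first form, *p++, both reads the current character and steps to the next one, which is why the copy loop uses it on both sides of the assignment.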

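So now you understand *s++ = *t++. But they put it in a loop:

while(*s++ = *t++);

This loop has no body; all the action is in the condition. But look at that condition: it evaluates to "false" as soon as the character stored into *s is 0, which means *t was 0, which means they were at the end of the string (yay for ASCII "NUL"). So the loop runs as long as there are characters in t, dutifully copying them into s and incrementing s and t all the way. When the loop exits, s has been NUL-terminated and is a proper string. The only problem is that s now points past the end. Keep another pointer handy that points to the beginning of s (that is, s as it was before the while() loop); that will be your copied string:

char *string = s;  // remember where s started; s must already point to a buffer
while(*s++ = *t++);
printf("%s", string); // prints the string that was in *t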

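Alternatively, check this out:

size_t i = strlen(t);
while(*s++ = *t++);
s -= i + 1;
printf("%s\n", s); // prints the string that was in *t

We started by getting the length, so when we were done we did a little more pointer arithmetic to put s back at the beginning, where it started.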

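Of course, this code fragment (like all my code fragments) ignores buffer issues for simplicity. A better version is:

size_t i = strlen(t);
char *s = malloc(i + 1);  // room for the characters plus the '\0'
while(*s++ = *t++);
s -= i + 1;               // back to the start of the buffer
printf("%s\n", s); // prints the string that was in *t
free(s);

But you knew that already, or you'll soon be asking a question about it on everyone's favorite website. ;)

* Actually, prefix increment and dereference have the same precedence, but the tie is resolved by a different rule (unary operators associate right to left). In this situation, prefix increment effectively behaves as if it had the lower precedence.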
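Putting the pieces of this answer together (measure, allocate, remember the start, copy), a self-contained version might look like the sketch below. The name dup_string is hypothetical, and returning NULL on allocation failure is a choice made for this example rather than something the answer specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* dup_string is a hypothetical helper. The caller owns the result and
   must free() it; NULL is returned if malloc fails. */
char *dup_string(const char *t)
{
    char *copy;
    char *s;

    assert(t != NULL);

    copy = malloc(strlen(t) + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;

    s = copy;                       /* s walks forward; copy keeps the start */
    while ((*s++ = *t++))
        ;
    return copy;
}
```

Keeping a second pointer to the start avoids the s -= i + 1 arithmetic, since the walking pointer is simply discarded when the function returns.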


while(*s++ = *t++);

Why do people think it is equivalent to:

while (*t) {
    *s = *t;
    s++;
    t++;
}
*s = *t; /* if *t was 0 at the beginning s and t are not incremented  */

when it obviously isn't?

char tmp = 0;
do {
   tmp = *t;
   *s = tmp;
   s++;
   t++;
} while(tmp);

is more like it.

EDIT: Corrected the compile errors. The tmp variable must be declared outside of the loop.
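The difference only shows up in where the pointers end, so here is a small sketch that measures it. All function names are made up for this comparison; each one copies t into s and reports how far the pointers advanced.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are made up for this comparison. Each function copies
   t into s and reports how many positions the pointers advanced. */

/* The idiom itself. */
ptrdiff_t advance_idiom(char *s, const char *t)
{
    const char *t0 = t;
    while ((*s++ = *t++))
        ;
    return t - t0;
}

/* The "while (*t) { ... } *s = *t;" expansion. */
ptrdiff_t advance_while(char *s, const char *t)
{
    const char *t0 = t;
    while (*t) {
        *s = *t;
        s++;
        t++;
    }
    *s = *t;            /* copies the '\0' but does not step past it */
    return t - t0;
}

/* The do/while expansion. */
ptrdiff_t advance_do_while(char *s, const char *t)
{
    const char *t0 = t;
    char tmp;
    do {
        tmp = *t;
        *s = tmp;
        s++;
        t++;
    } while (tmp);
    return t - t0;
}

/* Self-check summarizing the difference for "abc" and for "". */
void compare_expansions(void)
{
    char a[8], b[8], c[8];

    assert(advance_idiom(a, "abc") == 4);     /* 3 chars + the '\0' */
    assert(advance_while(b, "abc") == 3);     /* stops on the '\0' */
    assert(advance_do_while(c, "abc") == 4);  /* matches the idiom */

    assert(advance_idiom(a, "") == 1);
    assert(advance_while(b, "") == 0);
    assert(advance_do_while(c, "") == 1);
}
```

All three produce the same characters in the destination; only the final pointer positions differ, which matters if the code after the loop keeps using s or t.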


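The mysterious part of this is the order of operations. If you look up the C language spec, the effect in this context is as if the operations happened in the following order:

1. * operator
2. = (assignment) operator
3. ++ operator

So the while loop becomes, in English:

while (some condition):
  Take what is at address "t" and copy it over to the location at address "s".
  Increment "s" by one address location.
  Increment "t" by one address location.

Now, what is "some condition"? The C language specification also says that the value of an assignment expression is the assigned value itself, which in this case is *t.

So "some condition" is "t points to something that is non-zero", or put more simply, "while the data at location t is not NUL (the '\0' character)".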
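The rule that an assignment is itself an expression with a value can be checked in isolation. In this sketch the names store_and_report and chained are invented for the illustration.

```c
#include <assert.h>

/* store_and_report is a made-up name for this illustration.
   It returns the value of the assignment expression itself. */
int store_and_report(char *dst, char src)
{
    int value = (*dst = src);   /* the expression's value is what was stored */
    assert(value == *dst);
    return value;
}

/* Chained assignment relies on the same rule: c = a = b means c = (a = b). */
int chained(int b)
{
    int a, c;
    c = a = b;
    assert(a == b && c == b);
    return a + c;
}
```

When the stored character is '\0', store_and_report returns 0, and 0 is exactly what makes a while condition false.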


The C Programming Language (K&R) by Brian W. Kernighan and Dennis M. Ritchie gives a detailed explanation of this.

Second Edition, page 104:

5.5 Character Pointers and Functions

A string constant, written as

"I am a string"

is an array of characters. In the internal representation, the array is terminated with the null character '\0' so that programs can find the end. The length in storage is thus one more than the number of characters between the double quotes.

Perhaps the most common occurrence of string constants is as arguments to functions, as in

printf("hello, world\n");

Where a character string like this appears in a program, access to it is through a character pointer; printf receives a pointer to the beginning of the character array. That is, a string constant is accessed by a pointer to its first element.

String constants need not be function arguments. If pmessage is declared as

char *pmessage;

then the statement

pmessage = "now is the time";

assigns to pmessage a pointer to the character array. This is not a string copy; only pointers are involved. C does not provide any operators for processing an entire string of characters as a unit.

There is an important difference between these definitions:

char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */

amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.

          +---+       +--------------------+
pmessage: | o-------->| now is the time \0 |
          +---+       +--------------------+

          +--------------------+
amessage: | now is the time \0 |
          +--------------------+

We will illustrate more aspects of pointers and arrays by studying versions of two useful functions adapted from the standard library. The first function is strcpy(s,t), which copies the string t to the string s. It would be nice just to say s = t but this copies the pointer, not the characters. To copy the characters, we need a loop. The array version is first:

/* strcpy: copy t to s; array subscript version */
void strcpy(char *s, char *t)
{
    int i;

    i = 0;
    while((s[i] = t[i]) != '\0')
        i++;
}

For contrast, here is a version of strcpy with pointers:

/* strcpy: copy t to s; pointer version 1 */
void strcpy(char *s, char *t)
{
    while((*s = *t) != '\0')
    {
        s++;
        t++;
    }
}

Because arguments are passed by value, strcpy can use the parameters s and t in any way it pleases. Here they are conveniently initialized pointers, which are marched along the arrays a character at a time, until the '\0' that terminates t has been copied to s.

In practice, strcpy would not be written as we showed it above. Experienced C programmers would prefer

/* strcpy: copy t to s; pointer version 2 */
void strcpy(char *s, char *t)
{
    while((*s++ = *t++) != '\0')
        ;
}

This moves the increment of s and t into the test part of the loop. The value of *t++ is the character that t pointed to before t was incremented; the postfix ++ doesn't change t until after this character has been fetched. In the same way, the character is stored into the old s position before s is incremented. This character is also the value that is compared against '\0' to control the loop. The net effect is that characters are copied from t to s, up to and including the terminating '\0'.

As the final abbreviation, observe that a comparison against '\0' is redundant, since the question is merely whether the expression is zero. So the function would likely be written as

/* strcpy: copy t to s; pointer version 3 */
void strcpy(char *s, char *t)
{
    while(*s++ = *t++)
        ;
}

Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, because you will see it frequently in C programs.

The strcpy in the standard library (<string.h>) returns the target string as its function value.

This is the end of the relevant part of the section.

P.S. If you enjoyed reading this, consider buying a copy of K&R; it is not expensive.
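The quoted versions all return void, while the last sentence of the excerpt notes that the library strcpy returns the target. A minimal sketch of that variant follows; the name copy_and_return is hypothetical (chosen so it does not clash with the real strcpy), and the const on t is an addition that the K&R versions do not have.

```c
#include <assert.h>
#include <stddef.h>

/* copy_and_return is a hypothetical name chosen to avoid clashing with
   the real strcpy. Assumption: s has room for all of t plus '\0'. */
char *copy_and_return(char *s, const char *t)
{
    char *start = s;        /* remember the beginning before s is advanced */

    assert(s != NULL && t != NULL);
    while ((*s++ = *t++))
        ;
    return start;           /* like strcpy, hand back the destination */
}
```

Returning the destination lets the call sit inside another expression, for example as an argument to printf.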


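Hints:

  • What does the operator "=" do?
  • What is the value of the expression "a = b"? For example, if you write "c = a = b", what value does c get?
  • What terminates a C string? Does it evaluate as true or false?
  • In "*s++", which operator has higher precedence?

Advice:

  • Use strncpy() instead.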
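One caveat with that advice: strncpy() does not write a terminator when the source is at least as long as the count it is given, so the caller has to add one. The sketch below shows one common way to handle that; bounded_copy is a made-up wrapper name, and cap is assumed to be the full size of the destination buffer.

```c
#include <assert.h>
#include <string.h>

/* bounded_copy is a hypothetical wrapper, not a library function.
   cap is the total size of dst in bytes and must be at least 1. */
void bounded_copy(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    strncpy(dst, src, cap - 1);   /* copies at most cap-1 chars; pads with '\0' if src is shorter */
    dst[cap - 1] = '\0';          /* strncpy adds no terminator when src is cap-1 chars or longer */
}
```

With a 4-byte buffer, copying "hello" this way should leave "hel" in the buffer, truncated but still a valid string.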

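It works by copying characters from the string pointed to by "t" into the string pointed to by "s". For each character copied, both pointers are incremented. The loop terminates when it finds a NUL character (equal to zero, hence the exit).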


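Many devotees of the C language are convinced that "while (*s++ = *t++)" is a thing of genuine grace.

Three side effects are packed into the conditional expression of the "while" loop (the shift of one pointer, the shift of the second pointer, and the assignment).

As a result the loop body is empty, since all the functionality is placed in the conditional expression.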


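It copies a string because arrays are always passed as a pointer to their first element, and a string is just an array of char. Basically, what is happening is (if I remember the term correctly) pointer arithmetic. Wikipedia's article on C arrays has more information.

You are storing the value that was dereferenced from t into s and then moving on to the next index via ++.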


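Say you have something like this:

char *someString = "Hello, World!";

someString points to the first character in the string, in this case 'H'.

Now, if you increment the pointer by one:

someString++

someString will now point to 'e'.

while ( *someString++ );

will loop until the value someString points to is the null character ('\0'), which signifies the end of the string ("null terminated").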
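That bare loop is already enough to measure a string. As a rough sketch, with walk_length being a name invented here rather than a library function:

```c
#include <assert.h>
#include <stddef.h>

/* walk_length is a hypothetical helper that counts characters the same
   way the loop above walks them. */
size_t walk_length(const char *p)
{
    const char *start = p;

    assert(p != NULL);
    while (*p++)            /* read the current char, then step forward */
        ;
    /* p ended one past the '\0', so subtract 1 to leave the terminator out */
    return (size_t)(p - start) - 1;
}
```

For "Hello, World!" this should give 13, the same answer strlen() would return.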

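The code:

while (*s++ = *t++);

is equivalent, as far as the copied characters go, to:

while ( *t != '\0' ) { // While whatever t points to isn't the null character
    *s = *t;           // copy whatever t points to into s
    s++;
    t++;
}
*s = *t;               // finally copy the terminating '\0' too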


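Starts a while loop....

*s = *t goes first; this assigns what t points at to what s points at. That is, it copies a character from the t string to the s string.

What is being assigned gets passed to the while condition... anything non-zero is "true", so it will continue, and 0 is false, so it will stop... and it just so happens that the end of a string is also zero.

s++ and t++ increment the pointers

and it all starts again.

So it keeps looping, assigning and moving the pointers, until it hits a 0, which is the end of the string.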


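The question I wrote the following answer for was closed as a duplicate of this one, so I am copying the relevant part of the answer here.

The actual semantics of the while loop can be spelled out like this:

for (;;) {
    char *olds = s;             // original s in olds
    char *oldt = t;             // original t in oldt
    char c = *oldt;             // original *t in c
    s += 1;                     // complete post increment of s
    t += 1;                     // complete post increment of t
    *olds = c;                  // copy character c into *olds
    if (c) continue;            // continue if c is not 0
    break;                      // otherwise loop ends
}

The order in which s and t are saved, and the order in which s and t are incremented, can be interchanged. *oldt can be saved into c at any time after oldt is saved and before c is used. c can be assigned to *olds at any time after c and olds have been saved. By my back-of-the-envelope count, that gives at least 40 different interpretations.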


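Yes, this does have to do with pointers.

The way to read the code is: "the value pointed to by the pointer "s" (which gets incremented after this operation) gets the value pointed to by the pointer "t" (which gets incremented after this operation); the whole operation evaluates to the value of the character copied; iterate over this operation until that value equals zero". Since the string null terminator is the character with value zero ('\0'), the loop iterates until a string has been copied from the location pointed to by t to the location pointed to by s.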


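Yes, it uses pointers, and it does all the work while evaluating the while condition. C allows conditional expressions to have side effects.

The "*" operator dereferences the pointers s and t.

The increment operator ("++") increments the pointers s and t after the assignment.

The loop terminates on the null character, which evaluates as false in C.

One additional comment... this is not safe code, as it does nothing to ensure that s has enough memory allocated.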
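One way to address that last point is to pass the destination size along and stop before overrunning it. The following is only a sketch under that assumption; copy_checked and its return convention (1 if everything fit, 0 if the copy was truncated) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* copy_checked is a hypothetical helper. cap is the total size of the
   buffer at s. Returns 1 if all of t was copied, 0 if it was truncated
   or if cap is 0 and nothing could be written. */
int copy_checked(char *s, size_t cap, const char *t)
{
    assert(s != NULL && t != NULL);

    if (cap == 0)
        return 0;                    /* no room even for the terminator */

    while (cap > 1 && *t != '\0') {  /* always keep one slot for '\0' */
        *s++ = *t++;
        cap--;
    }
    *s = '\0';                       /* the result is always a valid string */
    return *t == '\0';               /* 1 only if the source was used up */
}
```

The price is an extra comparison per character and a longer loop, which is the trade-off the compact idiom avoids by trusting the caller.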


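Well, this holds only in the char case. If there is no '\0', or if it is an integer array, the program may crash, because the loop will reach an address whose elements are not part of the array or pointer; if the system has memory that was allocated using malloc, it will just keep handing over that memory.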