Does the C++ volatile keyword introduce a memory fence?
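I know that ...
From what I understand, the sequence of operations on volatile objects cannot be reordered and must be preserved. That seems to imply that some memory fences are necessary and that there isn't really a way around this. Am I right?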
There is an interesting discussion at this related question.
Jonathan Wakely writes:
... Accesses to distinct volatile variables cannot be reordered by the
compiler as long as they occur in separate full expressions ... right
that volatile is useless for thread-safety, but not for the reasons he
gives. It's not because the compiler might reorder accesses to
volatile objects, but because the CPU might reorder them. Atomic
operations and memory barriers prevent the compiler and the CPU from
reordering
David Schwartz replies in the comments:
... There's no difference, from the point of view of the C++ standard,
between the compiler doing something and the compiler emitting
instructions that cause the hardware to do something. If the CPU may
reorder accesses to volatiles, then the standard doesn't require that
their order be preserved. ...
... The C++ standard doesn't make any distinction about what does the
reordering. And you can't argue that the CPU can reorder them with no
observable effect so that's okay -- the C++ standard defines their
order as observable. A compiler is compliant with the C++ standard on
a platform if it generates code that makes the platform do what the
standard requires. If the standard requires accesses to volatiles not
be reordered, then a platform that reorders them isn't compliant. ...
My point is that if the C++ standard prohibits the compiler from
reordering accesses to distinct volatiles, on the theory that the
order of such accesses is part of the program's observable behavior,
then it also requires the compiler to emit code that prohibits the CPU
from doing so. The standard does not differentiate between what the
compiler does and what the compiler's generated code makes the CPU do.
Which raises two questions: Is either of them "right"? What do actual implementations really do?
Rather than explaining ...
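- When inside a signal handler. Writing to a volatile variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can use std::atomic for that purpose, but only if the atomic is lock-free.
- When dealing with setjmp, according to Intel.
- When dealing directly with hardware and you want to make sure the compiler does not optimize your reads or writes away.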
For example:
volatile int *foo = some_memory_mapped_device;
while (*foo)
    ; // wait until *foo turns false
Without the ...
Note that ...
In all other cases, ... should ...
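The signal-handler case from the first bullet might look like the sketch below, which assumes a hosted implementation. The names g_stop, on_sigint and stop_was_requested are hypothetical. std::raise does not return until the handler has finished, so the flag is already set when it is read back.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag. volatile std::sig_atomic_t is the type the standard
// allows a signal handler to write and the interrupted code to read.
volatile std::sig_atomic_t g_stop = 0;

void on_sigint(int) {
    g_stop = 1;  // the only thing this handler does
}

// Installs the handler, raises SIGINT synchronously, and reports whether
// the normal flow of control observed the flag.
bool stop_was_requested() {
    std::signal(SIGINT, on_sigint);
    std::raise(SIGINT);   // returns only after on_sigint has run
    return g_stop != 0;   // volatile: the value is actually re-read
}
```

For example, assert(stop_was_requested()); holds after the call.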
Does the C++ volatile keyword introduce a memory fence?
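A conforming C++ compiler is not required to introduce a memory fence. Your particular compiler might; direct your question to the authors of your compiler.
The purpose of "volatile" in C++ has nothing to do with threading. Remember, the point of "volatile" is to disable compiler optimizations so that reads from a register that is changing due to exogenous conditions are not optimized away. Is a memory address that is being written to by a different thread on a different CPU a register that is changing due to exogenous conditions? No. Again, if some compiler authors have chosen to treat memory addresses being written to by different threads on different CPUs as though they were registers changing due to exogenous conditions, that's their business; they are not required to do so. Nor are they required, even if they do introduce a memory fence, to, for instance, ensure that every thread sees a consistent ordering of volatile reads and writes.
In fact, volatile is pretty much useless for threading in C/C++. Best practice is to avoid it.
Moreover, memory fences are an implementation detail of particular processor architectures. In C#, where volatile explicitly is designed for multithreading, the specification does not say that half fences will be introduced, because the program might be running on an architecture that doesn't have fences in the first place. Rather, again, the specification makes certain (extremely weak) guarantees about which optimizations the compiler, runtime and CPU will avoid, in order to put certain (extremely weak) constraints on how some side effects are ordered. In practice these optimizations are eliminated by the use of half fences, but that is an implementation detail subject to change in the future.
The fact that you care about the semantics of volatile in any language as they pertain to multithreading indicates that you're thinking about sharing memory across threads. Consider simply not doing that. It makes your program far harder to understand and far more likely to contain subtle, impossible-to-reproduce bugs.
What David is overlooking is the fact that the C++ standard specifies the behavior of several threads interacting only in specific situations, and everything else results in undefined behavior. A race condition involving at least one write is undefined if you don't use atomic variables.
Consequently, the compiler is perfectly within its rights to forgo any synchronization instructions, since your CPU will only notice the difference in a program that exhibits undefined behavior due to missing synchronization.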
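A minimal sketch of the portable alternative, with hypothetical names: a non-atomic payload is published through a std::atomic<bool> using release/acquire ordering. This pairing, not volatile, is what guarantees the payload write is visible before the flag is seen, on every platform.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names. `payload` is ordinary, non-atomic data; `ready`
// publishes it. With a volatile bool instead of std::atomic this would be
// a data race, and nothing would order the payload write before the flag.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publishes payload
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();                 // spin until published
    }
    return payload;                                // guaranteed to see 42
}

int run_publication() {
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```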
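First of all, the C++ standard does not guarantee the memory barriers needed to properly order reads/writes that are not atomic. volatile variables are recommended for use with MMIO, signal handling, etc. On most implementations volatile is not useful for multithreading, and it's generally not recommended.
Regarding the implementation of volatile accesses, this is the compiler's choice.
This article, describing gcc behavior, shows that you cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory.
Regarding ICC behavior, I found this source also saying that volatile does not guarantee ordering of memory accesses.
The Microsoft VS2013 compiler behaves differently. This documentation explains how volatile enforces release/acquire semantics and enables volatile objects to be used in locks/releases in multithreaded applications.
Another aspect to take into consideration is that the same compiler may behave differently with respect to volatile depending on the targeted hardware architecture. This post about the MSVS 2013 compiler clearly states the specifics of compiling with volatile for ARM platforms.
So my answer to: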
Does the C++ volatile keyword introduce a memory fence?
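would be: not guaranteed, probably not, but some compilers might do it. You should not rely on it.
As far as I know, compilers only insert memory fences on the Itanium architecture.
It depends on which compiler "the compiler" is. Visual C++ does, since 2005. But the Standard doesn't require it, so some other compilers don't.
This is mostly from memory and based on the pre-C++11 standard, without threads. But having taken part in the committee's discussions on threading, I can say that the committee never intended ...
It can be argued (and it's an argument I accept) that this violates the standard, since unless the hardware recognizes the address as memory-mapped IO and forbids any reordering, etc., you can't even use volatile for memory-mapped IO, at least on SPARC or Intel architectures. Nevertheless, none of the compilers I've looked at (Sun CC, g++ and MSC) emits any fence or membar instructions. (About the time Microsoft proposed extending the rules for ...)
It doesn't have to. volatile is not a synchronization primitive. It just disables optimizations, i.e. within a thread you get a predictable sequence of reads and writes in the same order as prescribed by the abstract machine. But reads and writes in different threads have no order in the first place; it makes no sense to speak of preserving or not preserving their order. An order between threads can be established by synchronization primitives; without them you get UB.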
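To make the last point concrete, here is a minimal sketch (names hypothetical) in which a std::mutex, not volatile, orders two threads' updates to a shared counter: each unlock happens-before the next lock of the same mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state protected by a mutex.
long counter = 0;
std::mutex counter_mutex;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(counter_mutex);
        ++counter;  // read-modify-write is safe inside the critical section
    }
}

long run_two_threads(int n) {
    counter = 0;
    std::thread a(add_many, n);
    std::thread b(add_many, n);
    a.join();
    b.join();
    return counter;  // join() also synchronizes, so this read is race-free
}
```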
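A bit of explanation about memory barriers. A typical CPU has several levels of memory access: there is a memory pipeline, several levels of cache, then RAM, etc.
Membar instructions flush the pipeline. They don't change the order in which reads and writes are executed; they just force outstanding ones to be executed at a given moment. This is useful for multithreaded programs, but not much otherwise.
Caches are normally kept coherent between CPUs automatically. If you want to make sure a cache is in sync with RAM, you need a cache flush, which is very different from a membar.
The compiler needs to ...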
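Note that some compilers do go beyond what the C++ standard requires in order to make volatile more powerful or useful on those platforms. Portable code shouldn't rely on volatile to do anything beyond what the C++ standard specifies.
I always use volatile in interrupt service routines: for example, the ISR (often assembly code) modifies some memory location, and the higher-level code running outside the interrupt context accesses that location through a pointer to volatile.
I do this for RAM as well as for memory-mapped IO.
Based on the discussion here, this still seems to be a valid use of volatile, but it has nothing to do with multiple threads or CPUs. If the compiler for a microcontroller "knows" that there can't be any other accesses (e.g. everything is on-chip, there is no cache, and there is only one core), I would think that no memory fence is implied at all; the compiler just needs to prevent certain optimizations.
As we pile more stuff into the "system" that executes the object code, almost all bets are off; at least that's how I read this discussion. How could a compiler possibly cover all the bases?
The keyword ...
It also means that no read should be expected to produce a predictable value: the compiler should not assume anything about a read, even right after a write to the same volatile object: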
volatile int i;
i = 1;
int j = i;
if (j == 1) // not assumed to be true
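Whether declaring an object volatile is sufficient for code that deals with asynchronous changes to behave correctly depends on the platform: different CPUs give different levels of guaranteed synchronization for normal memory reads and writes. Unless you are an expert in the area, you probably shouldn't try to write such low-level multithreaded code.
Atomic primitives provide a higher-level view of objects for multithreading that makes it easy to reason about code. Almost all programmers should use either atomic primitives or primitives that provide mutual exclusion, such as mutexes, read-write locks, semaphores, or other blocking primitives.
While working on 3D graphics and game engine development, I was following an online downloadable video tutorial. We did use volatile in one of our classes.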
"Since we can now have our games run in multiple threads it is important to synchronize data between threads properly. In this video I show how to create a volitile locking class to ensure volitile variables are properly synchronized..."
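If you subscribe to his website and can access his videos, in this video he references this article, which concerns the use of volatile in multithreaded programming.
Here is the article from the link above: http://www.drdobbs.com/cpp/volatile-the-multithread-programmers-b/184403766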
volatile: The Multithreaded Programmer's Best Friend
By Andrei Alexandrescu, February 01, 2001
The volatile keyword was devised to prevent compiler optimizations that might render code incorrect in the presence of certain asynchronous events.
I don't want to spoil your mood, but this column addresses the dreaded topic of multithreaded programming. If — as the previous installment of Generic says — exception-safe programming is hard, it's child's play compared to multithreaded programming.
Programs using multiple threads are notoriously hard to write, prove correct, debug, maintain, and tame in general. Incorrect multithreaded programs might run for years without a glitch, only to unexpectedly run amok because some critical timing condition has been met.
Needless to say, a programmer writing multithreaded code needs all the help she can get. This column focuses on race conditions — a common source of trouble in multithreaded programs — and provides you with insights and tools on how to avoid them and, amazingly enough, have the compiler work hard at helping you with that.
Just a Little Keyword
Although both C and C++ Standards are conspicuously silent when it comes to threads, they do make a little concession to multithreading, in the form of the volatile keyword.
Just like its better-known counterpart const, volatile is a type modifier. It's intended to be used in conjunction with variables that are accessed and modified in different threads. Basically, without volatile, either writing multithreaded programs becomes impossible, or the compiler wastes vast optimization opportunities. An explanation is in order.
Consider the following code:
class Gadget {
public:
    void Wait() {
        while (!flag_) {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
    }
    void Wakeup() {
        flag_ = true;
    }
    ...
private:
    bool flag_;
};
The purpose of Gadget::Wait above is to check the flag_ member variable every second and return when that variable has been set to true by another thread. At least that's what its programmer intended, but, alas, Wait is incorrect.
Suppose the compiler figures out that Sleep(1000) is a call into an external library that cannot possibly modify the member variable flag_. Then the compiler concludes that it can cache flag_ in a register and use that register instead of accessing the slower on-board memory. This is an excellent optimization for single-threaded code, but in this case, it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_. The optimization is too ... optimistic.
Caching variables in registers is a very valuable optimization that applies most of the time, so it would be a pity to waste it. C and C++ give you the chance to explicitly disable such caching. If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable. So all you have to do to make Gadget's Wait/Wakeup combo work is to qualify flag_ appropriately:
class Gadget {
public:
    ... as above ...
private:
    volatile bool flag_;
};
Most explanations of the rationale and usage of volatile stop here and advise you to volatile-qualify the primitive types that you use in multiple threads. However, there is much more you can do with volatile, because it is part of C++'s wonderful type system.
Using volatile with User-Defined Types
You can volatile-qualify not only primitive types, but also user-defined types. In that case, volatile modifies the type in a way similar to const. (You can also apply const and volatile to the same type simultaneously.)
Unlike const, volatile discriminates between primitive types and user-defined types. Namely, unlike classes, primitive types still support all of their operations (addition, multiplication, assignment, etc.) when volatile-qualified. For example, you can assign a non-volatile int to a volatile int, but you cannot assign a non-volatile object to a volatile object.
Let's illustrate how volatile works on user-defined types on an example.
class Gadget {
public:
    void Foo() volatile;
    void Bar();
    ...
private:
    String name_;
    int state_;
};
...
Gadget regularGadget;
volatile Gadget volatileGadget;
If you think volatile is not that useful with objects, prepare for some surprise.
volatileGadget.Foo(); // ok, volatile fun called for
                      // volatile object
regularGadget.Foo();  // ok, volatile fun called for
                      // non-volatile object
volatileGadget.Bar(); // error! Non-volatile function called for
                      // volatile object!
The conversion from a non-qualified type to its volatile counterpart is trivial. However, just as with const, you cannot make the trip back from volatile to non-qualified. You must use a cast:
Gadget& ref = const_cast<Gadget&>(volatileGadget);
ref.Bar(); // ok
A volatile-qualified class gives access only to a subset of its interface, a subset that is under the control of the class implementer. Users can gain full access to that type's interface only by using a const_cast. In addition, just like constness, volatileness propagates from the class to its members (for example, volatileGadget.name_ and volatileGadget.state_ are volatile variables).
volatile, Critical Sections, and Race Conditions
The simplest and the most often-used synchronization device in multithreaded programs is the mutex. A mutex exposes the Acquire and Release primitives. Once you call Acquire in some thread, any other thread calling Acquire will block. Later, when that thread calls Release, precisely one thread blocked in an Acquire call will be released. In other words, for a given mutex, only one thread can get processor time in between a call to Acquire and a call to Release. The executing code between a call to Acquire and a call to Release is called a critical section. (Windows terminology is a bit confusing because it calls the mutex itself a critical section, while "mutex" is actually an inter-process mutex. It would have been nice if they were called thread mutex and process mutex.)
Mutexes are used to protect data against race conditions. By definition, a race condition occurs when the effect of more threads on data depends on how threads are scheduled. Race conditions appear when two or more threads compete for using the same data. Because threads can interrupt each other at arbitrary moments in time, data can be corrupted or misinterpreted. Consequently, changes and sometimes accesses to data must be carefully protected with critical sections. In object-oriented programming, this usually means that you store a mutex in a class as a member variable and use it whenever you access that class' state.
Experienced multithreaded programmers might have yawned reading the two paragraphs above, but their purpose is to provide an intellectual workout, because now we will link with the volatile connection. We do this by drawing a parallel between the C++ types' world and the threading semantics world.
- Outside a critical section, any thread might interrupt any other at any time; there is no control, so consequently variables accessible from multiple threads are volatile. This is in keeping with the original intent of volatile — that of preventing the compiler from unwittingly caching values used by multiple threads at once.
- Inside a critical section defined by a mutex, only one thread has access. Consequently, inside a critical section, the executing code has single-threaded semantics. The controlled variable is not volatile anymore — you can remove the volatile qualifier.
In short, data shared between threads is conceptually volatile outside a critical section, and non-volatile inside a critical section.
You enter a critical section by locking a mutex. You remove the volatile qualifier from a type by applying a const_cast. If we manage to put these two operations together, we create a connection between C++'s type system and an application's threading semantics. We can make the compiler check race conditions for us.
LockingPtr
We need a tool that collects a mutex acquisition and a const_cast. Let's develop a LockingPtr class template that you initialize with a volatile object obj and a mutex mtx. During its lifetime, a LockingPtr keeps mtx acquired. Also, LockingPtr offers access to the volatile-stripped obj. The access is offered in a smart pointer fashion, through operator-> and operator*. The const_cast is performed inside LockingPtr. The cast is semantically valid because LockingPtr keeps the mutex acquired for its lifetime.
First, let's define the skeleton of a class Mutex with which LockingPtr will work:
class Mutex {
public:
    void Acquire();
    void Release();
    ...
};
To use LockingPtr, you implement Mutex using your operating system's native data structures and primitive functions.
LockingPtr is templated with the type of the controlled variable. For example, if you want to control a Widget, you use a LockingPtr that you initialize with a variable of type volatile Widget.
LockingPtr's definition is very simple. LockingPtr implements an unsophisticated smart pointer. It focuses solely on collecting a const_cast and a critical section.
template <typename T>
class LockingPtr {
public:
    // Constructors/destructors
    LockingPtr(volatile T& obj, Mutex& mtx)
        : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx) {
        mtx.Acquire();  // Mutex (above) exposes Acquire/Release
    }
    ~LockingPtr() {
        pMtx_->Release();
    }
    // Pointer behavior
    T& operator*() {
        return *pObj_;
    }
    T* operator->() {
        return pObj_;
    }
private:
    T* pObj_;
    Mutex* pMtx_;
    // Declared but not defined: LockingPtr is not copyable
    LockingPtr(const LockingPtr&);
    LockingPtr& operator=(const LockingPtr&);
};
In spite of its simplicity, LockingPtr is a very useful aid in writing correct multithreaded code. You should define objects that are shared between threads as volatile and never use const_cast with them; always use LockingPtr automatic objects. Let's illustrate this with an example.
Say you have two threads that share a vector object:
class SyncBuf {
public:
    void Thread1();
    void Thread2();
private:
    typedef vector<char> BufT;
    volatile BufT buffer_;
    Mutex mtx_; // controls access to buffer_
};
Inside a thread function, you simply use a LockingPtr to get controlled access to the buffer_ member variable:
void SyncBuf::Thread1() {
    LockingPtr<BufT> lpBuf(buffer_, mtx_);
    BufT::iterator i = lpBuf->begin();
    for (; i != lpBuf->end(); ++i) {
        ... use *i ...
    }
}
The code is very easy to write and understand: whenever you need to use buffer_, you must create a LockingPtr pointing to it. Once you do that, you have access to vector's entire interface.
The nice part is that if you make a mistake, the compiler will point it out:
void SyncBuf::Thread2() {
    // Error! Cannot access 'begin' for a volatile object
    BufT::iterator i = buffer_.begin();
    // Error! Cannot access 'end' for a volatile object
    for ( ; i != buffer_.end(); ++i ) {
        ... use *i ...
    }
}
You cannot access any function of buffer_ until you either apply a const_cast or use LockingPtr. The difference is that LockingPtr offers an ordered way of applying const_cast to volatile variables.
LockingPtr is remarkably expressive. If you only need to call one function, you can create an unnamed temporary LockingPtr object and use it directly:
unsigned int SyncBuf::Size() {
    return LockingPtr<BufT>(buffer_, mtx_)->size();
}
Back to Primitive Types
We saw how nicely volatile protects objects against uncontrolled access and how LockingPtr provides a simple and effective way of writing thread-safe code. Let's now return to primitive types, which are treated differently by volatile.
Let's consider an example where multiple threads share a variable of type int.
class Counter {
public:
    ...
    void Increment() { ++ctr_; }
    void Decrement() { --ctr_; }
private:
    int ctr_;
};
If Increment and Decrement are to be called from different threads, the fragment above is buggy. First, ctr_ must be volatile. Second, even a seemingly atomic operation such as ++ctr_ is actually a three-stage operation. Memory itself has no arithmetic capabilities. When incrementing a variable, the processor:
- Reads that variable into a register
- Increments the value in the register
- Writes the result back to memory
This three-step operation is called RMW (Read-Modify-Write). During the Modify part of an RMW operation, most processors free the memory bus in order to give other processors access to the memory.
If at that time another processor performs a RMW operation on the same variable, we have a race condition: the second write overwrites the effect of the first.
To avoid that, you can rely, again, on LockingPtr:
class Counter {
public:
    ...
    void Increment() { ++*LockingPtr<int>(ctr_, mtx_); }
    void Decrement() { --*LockingPtr<int>(ctr_, mtx_); }
private:
    volatile int ctr_;
    Mutex mtx_;
};
Now the code is correct, but its quality is inferior when compared to SyncBuf's code. Why? Because with Counter, the compiler will not warn you if you mistakenly access ctr_ directly (without locking it). The compiler compiles ++ctr_ if ctr_ is volatile, although the generated code is simply incorrect. The compiler is not your ally anymore, and only your attention can help you avoid race conditions.
What should you do then? Simply encapsulate the primitive data that you use in higher-level structures and use volatile with those structures. Paradoxically, it's worse to use volatile directly with built-ins, in spite of the fact that initially this was the usage intent of volatile!
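Again outside the article: since C++11 the primitive-type case has a direct answer, std::atomic<int>, whose fetch_add/fetch_sub perform the whole read-modify-write indivisibly. A minimal sketch; relaxed ordering is assumed to be enough here because only the final count is read, after join(). run_counter is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// C++11 replacement for the volatile int Counter (not from the article).
class Counter {
public:
    void Increment() { ctr_.fetch_add(1, std::memory_order_relaxed); }
    void Decrement() { ctr_.fetch_sub(1, std::memory_order_relaxed); }
    int Get() const { return ctr_.load(std::memory_order_relaxed); }
private:
    std::atomic<int> ctr_{0};
};

int run_counter(int n) {
    Counter c;
    auto work = [&c, n] {
        for (int i = 0; i < n; ++i) {
            c.Increment();
            c.Increment();
            c.Decrement();  // net +1 per iteration
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return c.Get();  // join() makes every update visible here
}
```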
volatile Member Functions
So far, we've had classes that aggregate volatile data members; now let's think of designing classes that in turn will be part of larger objects and shared between threads. Here is where volatile member functions can be of great help.
When designing your class, you volatile-qualify only those member functions that are thread safe. You must assume that code from the outside will call the volatile functions from any code at any time. Don't forget: volatile equals free multithreaded code and no critical section; non-volatile equals single-threaded scenario or inside a critical section.
For example, you define a class Widget that implements an operation in two variants — a thread-safe one and a fast, unprotected one.
class Widget {
public:
    void Operation() volatile;
    void Operation();
    ...
private:
    Mutex mtx_;
};
Notice the use of overloading. Now Widget's user can invoke Operation using a uniform syntax either for volatile objects and get thread safety, or for regular objects and get speed. The user must be careful about defining the shared Widget objects as volatile.
When implementing a volatile member function, the first operation is usually to lock this with a LockingPtr. Then the work is done by using the non-volatile sibling:
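void Widget::Operation() volatile {
    // Inside a volatile member function mtx_ is itself volatile, so it needs
    // a cast before it can bind to LockingPtr's Mutex& parameter.
    LockingPtr<Widget> lpThis(*this, const_cast<Mutex&>(mtx_));
    lpThis->Operation(); // invokes the non-volatile function
}
Summary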
When writing multithreaded programs, you can use volatile to your advantage. You must stick to the following rules:
- Define all shared objects as volatile.
- Don't use volatile directly with primitive types.
- When defining shared classes, use volatile member functions to express thread safety.
If you do this, and if you use the simple generic component LockingPtr, you can write thread-safe code and worry much less about race conditions, because the compiler will worry for you and will diligently point out the spots where you are wrong.
A couple of projects I've been involved with use volatile and LockingPtr to great effect. The code is clean and understandable. I recall a couple of deadlocks, but I prefer deadlocks to race conditions because they are so much easier to debug. There were virtually no problems related to race conditions. But then you never know.
Acknowledgements
Many thanks to James Kanze and Sorin Jianu who helped with insightful ideas.
Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (www.realnetworks.com), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (www.gotw.ca/cpp_seminar).
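This article might be a little dated, but it does give good insight into how using the volatile modifier in multithreaded programming can help keep events asynchronous while having the compiler check for race conditions for us. It may not directly answer the OP's original question about creating a memory fence, but I chose to post it as an answer for others, as an excellent reference on making good use of volatile in multithreaded applications.
I think the confusion about volatile and instruction reordering stems from the two notions of reordering that CPUs do.
volatile affects how the compiler generates code, assuming single-threaded execution (this includes interrupts). It doesn't imply anything about memory barrier instructions; rather, it prevents the compiler from performing certain kinds of optimizations related to memory accesses. A typical example is re-fetching a value from memory instead of using one cached in a register.
Out-of-order execution: the CPU can execute instructions out of order or speculatively, provided the end result could have happened with the original code. The CPU can perform transformations that a compiler is not allowed to, because a compiler can only perform transformations that are correct in all circumstances. The CPU, in contrast, can check the validity of these optimizations and back out of them if they turn out to be incorrect.
Sequence of memory reads/writes as seen by other CPUs: the end result of a sequence of instructions (the effective order) must agree with the semantics of the code generated by the compiler. However, the actual execution order chosen by the CPU can be different. The effective order as seen by other CPUs (each CPU can have a different view) can be constrained by memory barriers. I'm not sure how much the actual and effective orders can differ, because I don't know to what extent memory barriers prevent the CPU from executing out of order.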
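The "order as seen by other CPUs" point is usually illustrated with the store-buffering litmus test. The sketch below is illustrative (names hypothetical): each thread stores to one variable and then loads the other. With memory_order_seq_cst the outcome r1 == 0 && r2 == 0 is forbidden, so the count below is always 0; with relaxed atomics, or with plain volatile ints (which would also be a data race), hardware with store buffers such as x86 can produce it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// One trial of the store-buffering test; returns true for the outcome
// that sequential consistency forbids.
bool both_zero_once() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&r1] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&r2] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

int count_forbidden_outcomes(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_zero_once()) ++hits;
    }
    return hits;  // 0 with seq_cst
}
```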
Sources:
- Memory Barriers
- LLVM: Atomics
- ACCESS_ONCE() and compiler bugs