mirror of https://github.com/rust-lang/nomicon
parent
9e8b997dd0
commit
36943a9b98
@ -0,0 +1,47 @@
|
||||
% Atomics
|
||||
|
||||
Rust pretty blatantly just inherits C11's memory model for atomics. This is not
|
||||
due this model being particularly excellent or easy to understand. Indeed, this
|
||||
model is quite complex and known to have [several flaws][C11-busted]. Rather,
|
||||
it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
|
||||
atomics. At very least, we can benefit from existing tooling and research around
|
||||
C.
|
||||
|
||||
Trying to fully explain the model is fairly hopeless. If you want all the
|
||||
nitty-gritty details, you should check out [C's specification][C11-model].
|
||||
Still, we'll try to cover the basics and some of the problems Rust developers
|
||||
face.
|
||||
|
||||
The C11 memory model is fundamentally about trying to bridge the gap between C's
|
||||
single-threaded semantics, common compiler optimizations, and hardware peculiarities
|
||||
in the face of a multi-threaded environment. It does this by splitting memory
|
||||
accesses into two worlds: data accesses, and atomic accesses.
|
||||
|
||||
Data accesses are the bread-and-butter of the programming world. They are
|
||||
fundamentally unsynchronized and compilers are free to aggressively optimize
|
||||
them. In particular data accesses are free to be reordered by the compiler
|
||||
on the assumption that the program is single-threaded. The hardware is also free
|
||||
to propagate the changes made in data accesses as lazily and inconsistently as
|
||||
it wants to other threads. Mostly critically, data accesses are where we get data
|
||||
races. These are pretty clearly awful semantics to try to write a multi-threaded
|
||||
program with.
|
||||
|
||||
Atomic accesses are the answer to this. Each atomic access can be marked with
|
||||
an *ordering*. The set of orderings Rust exposes are:
|
||||
|
||||
* Sequentially Consistent (SeqCst)
|
||||
* Release
|
||||
* Acquire
|
||||
* Relaxed
|
||||
|
||||
(Note: We explicitly do not expose the C11 *consume* ordering)
|
||||
|
||||
TODO: give simple "basic" explanation of these
|
||||
TODO: implementing Arc example (why does Drop need the trailing barrier?)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
[C11-busted]: http://plv.mpi-sws.org/c11comp/popl15.pdf
|
||||
[C11-model]: http://en.cppreference.com/w/c/atomic/memory_order
|
@ -1,217 +1,13 @@
|
||||
% Concurrency and Paralellism
|
||||
|
||||
|
||||
|
||||
# Data Races and Race Conditions
|
||||
|
||||
Safe Rust guarantees an absence of data races, which are defined as:
|
||||
|
||||
* two or more threads concurrently accessing a location of memory
|
||||
* one of them is a write
|
||||
* one of them is unsynchronized
|
||||
|
||||
A data race has Undefined Behaviour, and is therefore impossible to perform
|
||||
in Safe Rust. Data races are *mostly* prevented through rust's ownership system:
|
||||
it's impossible to alias a mutable reference, so it's impossible to perform a
|
||||
data race. Interior mutability makes this more complicated, which is largely why
|
||||
we have the Send and Sync traits (see below).
|
||||
|
||||
However Rust *does not* prevent general race conditions. This is
|
||||
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
|
||||
is racy, your OS is racy, the other programs on your computer are racy, and the
|
||||
world this all runs in is racy. Any system that could genuinely claim to prevent
|
||||
*all* race conditions would be pretty awful to use, if not just incorrect.
|
||||
|
||||
So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
|
||||
something incredibly stupid with incorrect synchronization. Obviously such a
|
||||
program isn't very good, but Rust can only hold your hand so far. Still, a
|
||||
race condition can't violate memory safety in a Rust program on
|
||||
its own. Only in conjunction with some other unsafe code can a race condition
|
||||
actually violate memory safety. For instance:
|
||||
|
||||
```rust
|
||||
use std::thread;
|
||||
use std::sync::atomic::{AtomicUsize, Ordering};
|
||||
use std::sync::Arc;
|
||||
|
||||
let data = vec![1, 2, 3, 4];
|
||||
// Arc so that the memory the AtomicUsize is stored in still exists for
|
||||
// the other thread to increment, even if we completely finish executing
|
||||
// before it. Rust won't compile the program without it, because of the
|
||||
// lifetime requirements of thread::spawn!
|
||||
let idx = Arc::new(AtomicUsize::new(0));
|
||||
let other_idx = idx.clone();
|
||||
|
||||
// `move` captures other_idx by-value, moving it into this thread
|
||||
thread::spawn(move || {
|
||||
// It's ok to mutate idx because this value
|
||||
// is an atomic, so it can't cause a Data Race.
|
||||
other_idx.fetch_add(10, Ordering::SeqCst);
|
||||
});
|
||||
|
||||
// Index with the value loaded from the atomic. This is safe because we
|
||||
// read the atomic memory only once, and then pass a *copy* of that value
|
||||
// to the Vec's indexing implementation. This indexing will be correctly
|
||||
// bounds checked, and there's no chance of the value getting changed
|
||||
// in the middle. However our program may panic if the thread we spawned
|
||||
// managed to increment before this ran. A race condition because correct
|
||||
// program execution (panicing is rarely correct) depends on order of
|
||||
// thread execution.
|
||||
println!("{}", data[idx.load(Ordering::SeqCst)]);
|
||||
|
||||
if idx.load(Ordering::SeqCst) < data.len() {
|
||||
unsafe {
|
||||
// Incorrectly loading the idx *after* we did the bounds check.
|
||||
// It could have changed. This is a race condition, *and dangerous*
|
||||
// because we decided to do `get_unchecked`, which is `unsafe`.
|
||||
println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
# Send and Sync
|
||||
|
||||
Not everything obeys inherited mutability, though. Some types allow you to multiply
|
||||
alias a location in memory while mutating it. Unless these types use synchronization
|
||||
to manage this access, they are absolutely not thread safe. Rust captures this with
|
||||
through the `Send` and `Sync` traits.
|
||||
|
||||
* A type is Send if it is safe to send it to another thread.
|
||||
* A type is Sync if it is safe to share between threads (`&T` is Send).
|
||||
|
||||
Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
|
||||
substantial amount of special tooling exists to make them work right. First and
|
||||
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
|
||||
and other unsafe code can *trust* that they are correctly implemented. Since
|
||||
they're *marker traits* (they have no associated items like methods), correctly
|
||||
implemented simply means that they have the intrinsic properties an implementor
|
||||
should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
|
||||
|
||||
Send and Sync are also what Rust calls *opt-in builtin traits*.
|
||||
This means that, unlike every other trait, they are *automatically* derived:
|
||||
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
|
||||
Almost all primitives are Send and Sync, and as a consequence pretty much
|
||||
all types you'll ever interact with are Send and Sync.
|
||||
|
||||
Major exceptions include:
|
||||
|
||||
* raw pointers are neither Send nor Sync (because they have no safety guards)
|
||||
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
|
||||
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
|
||||
|
||||
`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
|
||||
unsynchronized shared mutable state. However raw pointers are, strictly speaking,
|
||||
marked as thread-unsafe as more of a *lint*. Doing anything useful
|
||||
with a raw pointer requires dereferencing it, which is already unsafe. In that
|
||||
sense, one could argue that it would be "fine" for them to be marked as thread safe.
|
||||
|
||||
However it's important that they aren't thread safe to prevent types that
|
||||
*contain them* from being automatically marked as thread safe. These types have
|
||||
non-trivial untracked ownership, and it's unlikely that their author was
|
||||
necessarily thinking hard about thread safety. In the case of Rc, we have a nice
|
||||
example of a type that contains a `*mut` that is *definitely* not thread safe.
|
||||
|
||||
Types that aren't automatically derived can *opt-in* to Send and Sync by simply
|
||||
implementing them:
|
||||
|
||||
```rust
|
||||
struct MyBox(*mut u8);
|
||||
|
||||
unsafe impl Send for MyBox {}
|
||||
unsafe impl Sync for MyBox {}
|
||||
```
|
||||
|
||||
In the *incredibly rare* case that a type is *inappropriately* automatically
|
||||
derived to be Send or Sync, then one can also *unimplement* Send and Sync:
|
||||
|
||||
```rust
|
||||
struct SpecialThreadToken(u8);
|
||||
|
||||
impl !Send for SpecialThreadToken {}
|
||||
impl !Sync for SpecialThreadToken {}
|
||||
```
|
||||
|
||||
Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
|
||||
Only types that are ascribed special meaning by other unsafe code can possible cause
|
||||
trouble by being incorrectly Send or Sync.
|
||||
|
||||
Most uses of raw pointers should be encapsulated behind a sufficient abstraction
|
||||
that Send and Sync can be derived. For instance all of Rust's standard
|
||||
collections are Send and Sync (when they contain Send and Sync types)
|
||||
in spite of their pervasive use raw pointers to
|
||||
manage allocations and complex ownership. Similarly, most iterators into these
|
||||
collections are Send and Sync because they largely behave like an `&` or `&mut`
|
||||
into the collection.
|
||||
|
||||
TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
|
||||
only to data races?
|
||||
|
||||
|
||||
|
||||
|
||||
# Atomics
|
||||
|
||||
Rust pretty blatantly just inherits C11's memory model for atomics. This is not
|
||||
due this model being particularly excellent or easy to understand. Indeed, this
|
||||
model is quite complex and known to have [several flaws][C11-busted]. Rather,
|
||||
it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
|
||||
atomics. At very least, we can benefit from existing tooling and research around
|
||||
C.
|
||||
|
||||
Trying to fully explain the model is fairly hopeless. If you want all the
|
||||
nitty-gritty details, you should check out [C's specification][C11-model].
|
||||
Still, we'll try to cover the basics and some of the problems Rust developers
|
||||
face.
|
||||
|
||||
The C11 memory model is fundamentally about trying to bridge the gap between C's
|
||||
single-threaded semantics, common compiler optimizations, and hardware peculiarities
|
||||
in the face of a multi-threaded environment. It does this by splitting memory
|
||||
accesses into two worlds: data accesses, and atomic accesses.
|
||||
|
||||
Data accesses are the bread-and-butter of the programming world. They are
|
||||
fundamentally unsynchronized and compilers are free to aggressively optimize
|
||||
them. In particular data accesses are free to be reordered by the compiler
|
||||
on the assumption that the program is single-threaded. The hardware is also free
|
||||
to propagate the changes made in data accesses as lazily and inconsistently as
|
||||
it wants to other threads. Mostly critically, data accesses are where we get data
|
||||
races. These are pretty clearly awful semantics to try to write a multi-threaded
|
||||
program with.
|
||||
|
||||
Atomic accesses are the answer to this. Each atomic access can be marked with
|
||||
an *ordering*. The set of orderings Rust exposes are:
|
||||
|
||||
* Sequentially Consistent (SeqCst)
|
||||
* Release
|
||||
* Acquire
|
||||
* Relaxed
|
||||
|
||||
(Note: We explicitly do not expose the C11 *consume* ordering)
|
||||
|
||||
TODO: give simple "basic" explanation of these
|
||||
TODO: implementing Arc example (why does Drop need the trailing barrier?)
|
||||
|
||||
|
||||
|
||||
|
||||
# Actually Doing Things Concurrently
|
||||
|
||||
Rust as a language doesn't *really* have an opinion on how to do concurrency or
|
||||
parallelism. The standard library exposes OS threads and blocking sys-calls
|
||||
because *everyone* has those and they're uniform enough that you can provide
|
||||
because *everyone* has those, and they're uniform enough that you can provide
|
||||
an abstraction over them in a relatively uncontroversial way. Message passing,
|
||||
green threads, and async APIs are all diverse enough that any abstraction over
|
||||
them tends to involve trade-offs that we weren't willing to commit to for 1.0.
|
||||
|
||||
However Rust's current design is setup so that you can set up your own
|
||||
concurrent paradigm or library as you see fit. Just require the right
|
||||
lifetimes and Send and Sync where appropriate and everything should Just Work
|
||||
with everyone else's stuff.
|
||||
|
||||
|
||||
|
||||
|
||||
[C11-busted]: http://plv.mpi-sws.org/c11comp/popl15.pdf
|
||||
[C11-model]: http://en.cppreference.com/w/c/atomic/memory_order
|
||||
However the way Rust models concurrency makes it relatively easy design your own
|
||||
concurrency paradigm as a library and have *everyone else's* code Just Work
|
||||
with yours. Just require the right lifetimes and Send and Sync where appropriate
|
||||
and you're off to the races. Or rather, not having races. Races are bad.
|
||||
|
@ -0,0 +1,66 @@
|
||||
% Data Races and Race Conditions
|
||||
|
||||
Safe Rust guarantees an absence of data races, which are defined as:
|
||||
|
||||
* two or more threads concurrently accessing a location of memory
|
||||
* one of them is a write
|
||||
* one of them is unsynchronized
|
||||
|
||||
A data race has Undefined Behaviour, and is therefore impossible to perform
|
||||
in Safe Rust. Data races are *mostly* prevented through rust's ownership system:
|
||||
it's impossible to alias a mutable reference, so it's impossible to perform a
|
||||
data race. Interior mutability makes this more complicated, which is largely why
|
||||
we have the Send and Sync traits (see below).
|
||||
|
||||
However Rust *does not* prevent general race conditions. This is
|
||||
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
|
||||
is racy, your OS is racy, the other programs on your computer are racy, and the
|
||||
world this all runs in is racy. Any system that could genuinely claim to prevent
|
||||
*all* race conditions would be pretty awful to use, if not just incorrect.
|
||||
|
||||
So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
|
||||
something incredibly stupid with incorrect synchronization. Obviously such a
|
||||
program isn't very good, but Rust can only hold your hand so far. Still, a
|
||||
race condition can't violate memory safety in a Rust program on
|
||||
its own. Only in conjunction with some other unsafe code can a race condition
|
||||
actually violate memory safety. For instance:
|
||||
|
||||
```rust
|
||||
use std::thread;
|
||||
use std::sync::atomic::{AtomicUsize, Ordering};
|
||||
use std::sync::Arc;
|
||||
|
||||
let data = vec![1, 2, 3, 4];
|
||||
// Arc so that the memory the AtomicUsize is stored in still exists for
|
||||
// the other thread to increment, even if we completely finish executing
|
||||
// before it. Rust won't compile the program without it, because of the
|
||||
// lifetime requirements of thread::spawn!
|
||||
let idx = Arc::new(AtomicUsize::new(0));
|
||||
let other_idx = idx.clone();
|
||||
|
||||
// `move` captures other_idx by-value, moving it into this thread
|
||||
thread::spawn(move || {
|
||||
// It's ok to mutate idx because this value
|
||||
// is an atomic, so it can't cause a Data Race.
|
||||
other_idx.fetch_add(10, Ordering::SeqCst);
|
||||
});
|
||||
|
||||
// Index with the value loaded from the atomic. This is safe because we
|
||||
// read the atomic memory only once, and then pass a *copy* of that value
|
||||
// to the Vec's indexing implementation. This indexing will be correctly
|
||||
// bounds checked, and there's no chance of the value getting changed
|
||||
// in the middle. However our program may panic if the thread we spawned
|
||||
// managed to increment before this ran. A race condition because correct
|
||||
// program execution (panicing is rarely correct) depends on order of
|
||||
// thread execution.
|
||||
println!("{}", data[idx.load(Ordering::SeqCst)]);
|
||||
|
||||
if idx.load(Ordering::SeqCst) < data.len() {
|
||||
unsafe {
|
||||
// Incorrectly loading the idx *after* we did the bounds check.
|
||||
// It could have changed. This is a race condition, *and dangerous*
|
||||
// because we decided to do `get_unchecked`, which is `unsafe`.
|
||||
println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
|
||||
}
|
||||
}
|
||||
```
|
@ -0,0 +1,76 @@
|
||||
% Send and Sync
|
||||
|
||||
Not everything obeys inherited mutability, though. Some types allow you to multiply
|
||||
alias a location in memory while mutating it. Unless these types use synchronization
|
||||
to manage this access, they are absolutely not thread safe. Rust captures this with
|
||||
through the `Send` and `Sync` traits.
|
||||
|
||||
* A type is Send if it is safe to send it to another thread.
|
||||
* A type is Sync if it is safe to share between threads (`&T` is Send).
|
||||
|
||||
Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
|
||||
substantial amount of special tooling exists to make them work right. First and
|
||||
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
|
||||
and other unsafe code can *trust* that they are correctly implemented. Since
|
||||
they're *marker traits* (they have no associated items like methods), correctly
|
||||
implemented simply means that they have the intrinsic properties an implementor
|
||||
should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
|
||||
|
||||
Send and Sync are also what Rust calls *opt-in builtin traits*.
|
||||
This means that, unlike every other trait, they are *automatically* derived:
|
||||
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
|
||||
Almost all primitives are Send and Sync, and as a consequence pretty much
|
||||
all types you'll ever interact with are Send and Sync.
|
||||
|
||||
Major exceptions include:
|
||||
|
||||
* raw pointers are neither Send nor Sync (because they have no safety guards)
|
||||
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
|
||||
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
|
||||
|
||||
`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
|
||||
unsynchronized shared mutable state. However raw pointers are, strictly speaking,
|
||||
marked as thread-unsafe as more of a *lint*. Doing anything useful
|
||||
with a raw pointer requires dereferencing it, which is already unsafe. In that
|
||||
sense, one could argue that it would be "fine" for them to be marked as thread safe.
|
||||
|
||||
However it's important that they aren't thread safe to prevent types that
|
||||
*contain them* from being automatically marked as thread safe. These types have
|
||||
non-trivial untracked ownership, and it's unlikely that their author was
|
||||
necessarily thinking hard about thread safety. In the case of Rc, we have a nice
|
||||
example of a type that contains a `*mut` that is *definitely* not thread safe.
|
||||
|
||||
Types that aren't automatically derived can *opt-in* to Send and Sync by simply
|
||||
implementing them:
|
||||
|
||||
```rust
|
||||
struct MyBox(*mut u8);
|
||||
|
||||
unsafe impl Send for MyBox {}
|
||||
unsafe impl Sync for MyBox {}
|
||||
```
|
||||
|
||||
In the *incredibly rare* case that a type is *inappropriately* automatically
|
||||
derived to be Send or Sync, then one can also *unimplement* Send and Sync:
|
||||
|
||||
```rust
|
||||
struct SpecialThreadToken(u8);
|
||||
|
||||
impl !Send for SpecialThreadToken {}
|
||||
impl !Sync for SpecialThreadToken {}
|
||||
```
|
||||
|
||||
Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
|
||||
Only types that are ascribed special meaning by other unsafe code can possible cause
|
||||
trouble by being incorrectly Send or Sync.
|
||||
|
||||
Most uses of raw pointers should be encapsulated behind a sufficient abstraction
|
||||
that Send and Sync can be derived. For instance all of Rust's standard
|
||||
collections are Send and Sync (when they contain Send and Sync types)
|
||||
in spite of their pervasive use raw pointers to
|
||||
manage allocations and complex ownership. Similarly, most iterators into these
|
||||
collections are Send and Sync because they largely behave like an `&` or `&mut`
|
||||
into the collection.
|
||||
|
||||
TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
|
||||
only to data races?
|
Loading…
Reference in new issue