diff --git a/SUMMARY.md b/SUMMARY.md
index 3b5a04c..1456e36 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -31,6 +31,9 @@
 * [Leaking](leaking.md)
 * [Unwinding](unwinding.md)
 * [Concurrency](concurrency.md)
+    * [Races](races.md)
+    * [Send and Sync](send-and-sync.md)
+    * [Atomics](atomics.md)
 * [Example: Implementing Vec](vec.md)
     * [Layout](vec-layout.md)
     * [Allocating](vec-alloc.md)
diff --git a/atomics.md b/atomics.md
new file mode 100644
index 0000000..e13fb01
--- /dev/null
+++ b/atomics.md
@@ -0,0 +1,47 @@
+% Atomics
+
+Rust pretty blatantly just inherits C11's memory model for atomics. This is not
+due to this model being particularly excellent or easy to understand. Indeed, this
+model is quite complex and known to have [several flaws][C11-busted]. Rather,
+it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
+atomics. At the very least, we can benefit from existing tooling and research around
+C.
+
+Trying to fully explain the model is fairly hopeless. If you want all the
+nitty-gritty details, you should check out [C's specification][C11-model].
+Still, we'll try to cover the basics and some of the problems Rust developers
+face.
+
+The C11 memory model is fundamentally about trying to bridge the gap between C's
+single-threaded semantics, common compiler optimizations, and hardware peculiarities
+in the face of a multi-threaded environment. It does this by splitting memory
+accesses into two worlds: data accesses, and atomic accesses.
+
+Data accesses are the bread-and-butter of the programming world. They are
+fundamentally unsynchronized and compilers are free to aggressively optimize
+them. In particular data accesses are free to be reordered by the compiler
+on the assumption that the program is single-threaded. The hardware is also free
+to propagate the changes made in data accesses as lazily and inconsistently as
+it wants to other threads. Most critically, data accesses are where we get data
+races. These are pretty clearly awful semantics to try to write a multi-threaded
+program with.
+
+Atomic accesses are the answer to this. Each atomic access can be marked with
+an *ordering*. The set of orderings Rust exposes are:
+
+* Sequentially Consistent (SeqCst)
+* Release
+* Acquire
+* Relaxed
+
+(Note: We explicitly do not expose the C11 *consume* ordering)
+
+TODO: give simple "basic" explanation of these
+TODO: implementing Arc example (why does Drop need the trailing barrier?)
+
+
+
+
+
+[C11-busted]: http://plv.mpi-sws.org/c11comp/popl15.pdf
+[C11-model]: http://en.cppreference.com/w/c/atomic/memory_order
diff --git a/concurrency.md b/concurrency.md
index 71cfec3..e527632 100644
--- a/concurrency.md
+++ b/concurrency.md
@@ -1,217 +1,13 @@
 % Concurrency and Paralellism
 
-
-
-# Data Races and Race Conditions
-
-Safe Rust guarantees an absence of data races, which are defined as:
-
-* two or more threads concurrently accessing a location of memory
-* one of them is a write
-* one of them is unsynchronized
-
-A data race has Undefined Behaviour, and is therefore impossible to perform
-in Safe Rust. Data races are *mostly* prevented through rust's ownership system:
-it's impossible to alias a mutable reference, so it's impossible to perform a
-data race. Interior mutability makes this more complicated, which is largely why
-we have the Send and Sync traits (see below).
-
-However Rust *does not* prevent general race conditions. This is
-pretty fundamentally impossible, and probably honestly undesirable. Your hardware
-is racy, your OS is racy, the other programs on your computer are racy, and the
-world this all runs in is racy. Any system that could genuinely claim to prevent
-*all* race conditions would be pretty awful to use, if not just incorrect.
-
-So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
-something incredibly stupid with incorrect synchronization. Obviously such a
-program isn't very good, but Rust can only hold your hand so far. Still, a
-race condition can't violate memory safety in a Rust program on
-its own. Only in conjunction with some other unsafe code can a race condition
-actually violate memory safety. For instance:
-
-```rust
-use std::thread;
-use std::sync::atomic::{AtomicUsize, Ordering};
-use std::sync::Arc;
-
-let data = vec![1, 2, 3, 4];
-// Arc so that the memory the AtomicUsize is stored in still exists for
-// the other thread to increment, even if we completely finish executing
-// before it. Rust won't compile the program without it, because of the
-// lifetime requirements of thread::spawn!
-let idx = Arc::new(AtomicUsize::new(0));
-let other_idx = idx.clone();
-
-// `move` captures other_idx by-value, moving it into this thread
-thread::spawn(move || {
-    // It's ok to mutate idx because this value
-    // is an atomic, so it can't cause a Data Race.
-    other_idx.fetch_add(10, Ordering::SeqCst);
-});
-
-// Index with the value loaded from the atomic. This is safe because we
-// read the atomic memory only once, and then pass a *copy* of that value
-// to the Vec's indexing implementation. This indexing will be correctly
-// bounds checked, and there's no chance of the value getting changed
-// in the middle. However our program may panic if the thread we spawned
-// managed to increment before this ran. A race condition because correct
-// program execution (panicing is rarely correct) depends on order of
-// thread execution.
-println!("{}", data[idx.load(Ordering::SeqCst)]);
-
-if idx.load(Ordering::SeqCst) < data.len() {
-    unsafe {
-        // Incorrectly loading the idx *after* we did the bounds check.
-        // It could have changed. This is a race condition, *and dangerous*
-        // because we decided to do `get_unchecked`, which is `unsafe`.
-        println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
-    }
-}
-```
-
-
-
-
-# Send and Sync
-
-Not everything obeys inherited mutability, though. Some types allow you to multiply
-alias a location in memory while mutating it. Unless these types use synchronization
-to manage this access, they are absolutely not thread safe. Rust captures this with
-through the `Send` and `Sync` traits.
-
-* A type is Send if it is safe to send it to another thread.
-* A type is Sync if it is safe to share between threads (`&T` is Send).
-
-Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
-substantial amount of special tooling exists to make them work right. First and
-foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
-and other unsafe code can *trust* that they are correctly implemented. Since
-they're *marker traits* (they have no associated items like methods), correctly
-implemented simply means that they have the intrinsic properties an implementor
-should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
-
-Send and Sync are also what Rust calls *opt-in builtin traits*.
-This means that, unlike every other trait, they are *automatically* derived:
-if a type is composed entirely of Send or Sync types, then it is Send or Sync.
-Almost all primitives are Send and Sync, and as a consequence pretty much
-all types you'll ever interact with are Send and Sync.
-
-Major exceptions include:
-
-* raw pointers are neither Send nor Sync (because they have no safety guards)
-* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
-* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
-
-`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
-unsynchronized shared mutable state. However raw pointers are, strictly speaking,
-marked as thread-unsafe as more of a *lint*. Doing anything useful
-with a raw pointer requires dereferencing it, which is already unsafe. In that
-sense, one could argue that it would be "fine" for them to be marked as thread safe.
-
-However it's important that they aren't thread safe to prevent types that
-*contain them* from being automatically marked as thread safe. These types have
-non-trivial untracked ownership, and it's unlikely that their author was
-necessarily thinking hard about thread safety. In the case of Rc, we have a nice
-example of a type that contains a `*mut` that is *definitely* not thread safe.
-
-Types that aren't automatically derived can *opt-in* to Send and Sync by simply
-implementing them:
-
-```rust
-struct MyBox(*mut u8);
-
-unsafe impl Send for MyBox {}
-unsafe impl Sync for MyBox {}
-```
-
-In the *incredibly rare* case that a type is *inappropriately* automatically
-derived to be Send or Sync, then one can also *unimplement* Send and Sync:
-
-```rust
-struct SpecialThreadToken(u8);
-
-impl !Send for SpecialThreadToken {}
-impl !Sync for SpecialThreadToken {}
-```
-
-Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
-Only types that are ascribed special meaning by other unsafe code can possible cause
-trouble by being incorrectly Send or Sync.
-
-Most uses of raw pointers should be encapsulated behind a sufficient abstraction
-that Send and Sync can be derived. For instance all of Rust's standard
-collections are Send and Sync (when they contain Send and Sync types)
-in spite of their pervasive use raw pointers to
-manage allocations and complex ownership. Similarly, most iterators into these
-collections are Send and Sync because they largely behave like an `&` or `&mut`
-into the collection.
-
-TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
-only to data races?
-
-
-
-
-# Atomics
-
-Rust pretty blatantly just inherits C11's memory model for atomics. This is not
-due this model being particularly excellent or easy to understand. Indeed, this
-model is quite complex and known to have [several flaws][C11-busted]. Rather,
-it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
-atomics. At very least, we can benefit from existing tooling and research around
-C.
-
-Trying to fully explain the model is fairly hopeless. If you want all the
-nitty-gritty details, you should check out [C's specification][C11-model].
-Still, we'll try to cover the basics and some of the problems Rust developers
-face.
-
-The C11 memory model is fundamentally about trying to bridge the gap between C's
-single-threaded semantics, common compiler optimizations, and hardware peculiarities
-in the face of a multi-threaded environment. It does this by splitting memory
-accesses into two worlds: data accesses, and atomic accesses.
-
-Data accesses are the bread-and-butter of the programming world. They are
-fundamentally unsynchronized and compilers are free to aggressively optimize
-them. In particular data accesses are free to be reordered by the compiler
-on the assumption that the program is single-threaded. The hardware is also free
-to propagate the changes made in data accesses as lazily and inconsistently as
-it wants to other threads. Mostly critically, data accesses are where we get data
-races. These are pretty clearly awful semantics to try to write a multi-threaded
-program with.
-
-Atomic accesses are the answer to this. Each atomic access can be marked with
-an *ordering*. The set of orderings Rust exposes are:
-
-* Sequentially Consistent (SeqCst)
-* Release
-* Acquire
-* Relaxed
-
-(Note: We explicitly do not expose the C11 *consume* ordering)
-
-TODO: give simple "basic" explanation of these
-TODO: implementing Arc example (why does Drop need the trailing barrier?)
-
-
-
-
-# Actually Doing Things Concurrently
-
 Rust as a language doesn't *really* have an opinion on how to do concurrency or
 parallelism. The standard library exposes OS threads and blocking sys-calls
-because *everyone* has those and they're uniform enough that you can provide
+because *everyone* has those, and they're uniform enough that you can provide
 an abstraction over them in a relatively uncontroversial way. Message passing,
 green threads, and async APIs are all diverse enough that any abstraction over
 them tends to involve trade-offs that we weren't willing to commit to for 1.0.
 
-However Rust's current design is setup so that you can set up your own
-concurrent paradigm or library as you see fit. Just require the right
-lifetimes and Send and Sync where appropriate and everything should Just Work
-with everyone else's stuff.
-
-
-
-
-[C11-busted]: http://plv.mpi-sws.org/c11comp/popl15.pdf
-[C11-model]: http://en.cppreference.com/w/c/atomic/memory_order
+However the way Rust models concurrency makes it relatively easy to design your own
+concurrency paradigm as a library and have *everyone else's* code Just Work
+with yours. Just require the right lifetimes and Send and Sync where appropriate
+and you're off to the races. Or rather, not having races. Races are bad.
diff --git a/races.md b/races.md
new file mode 100644
index 0000000..240e4ac
--- /dev/null
+++ b/races.md
@@ -0,0 +1,66 @@
+% Data Races and Race Conditions
+
+Safe Rust guarantees an absence of data races, which are defined as:
+
+* two or more threads concurrently accessing a location of memory
+* one of them is a write
+* one of them is unsynchronized
+
+A data race has Undefined Behaviour, and is therefore impossible to perform
+in Safe Rust. Data races are *mostly* prevented through Rust's ownership system:
+it's impossible to alias a mutable reference, so it's impossible to perform a
+data race. Interior mutability makes this more complicated, which is largely why
+we have the Send and Sync traits (see below).
+
+However Rust *does not* prevent general race conditions. This is
+pretty fundamentally impossible, and probably honestly undesirable. Your hardware
+is racy, your OS is racy, the other programs on your computer are racy, and the
+world this all runs in is racy. Any system that could genuinely claim to prevent
+*all* race conditions would be pretty awful to use, if not just incorrect.
+
+So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
+something incredibly stupid with incorrect synchronization. Obviously such a
+program isn't very good, but Rust can only hold your hand so far. Still, a
+race condition can't violate memory safety in a Rust program on
+its own. Only in conjunction with some other unsafe code can a race condition
+actually violate memory safety. For instance:
+
+```rust
+use std::thread;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::Arc;
+
+let data = vec![1, 2, 3, 4];
+// Arc so that the memory the AtomicUsize is stored in still exists for
+// the other thread to increment, even if we completely finish executing
+// before it. Rust won't compile the program without it, because of the
+// lifetime requirements of thread::spawn!
+let idx = Arc::new(AtomicUsize::new(0));
+let other_idx = idx.clone();
+
+// `move` captures other_idx by-value, moving it into this thread
+thread::spawn(move || {
+    // It's ok to mutate idx because this value
+    // is an atomic, so it can't cause a Data Race.
+    other_idx.fetch_add(10, Ordering::SeqCst);
+});
+
+// Index with the value loaded from the atomic. This is safe because we
+// read the atomic memory only once, and then pass a *copy* of that value
+// to the Vec's indexing implementation. This indexing will be correctly
+// bounds checked, and there's no chance of the value getting changed
+// in the middle. However our program may panic if the thread we spawned
+// managed to increment before this ran. A race condition because correct
+// program execution (panicking is rarely correct) depends on order of
+// thread execution.
+println!("{}", data[idx.load(Ordering::SeqCst)]);
+
+if idx.load(Ordering::SeqCst) < data.len() {
+    unsafe {
+        // Incorrectly loading the idx *after* we did the bounds check.
+        // It could have changed. This is a race condition, *and dangerous*
+        // because we decided to do `get_unchecked`, which is `unsafe`.
+        println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
+    }
+}
+```
diff --git a/send-and-sync.md b/send-and-sync.md
new file mode 100644
index 0000000..0ac14a8
--- /dev/null
+++ b/send-and-sync.md
@@ -0,0 +1,76 @@
+% Send and Sync
+
+Not everything obeys inherited mutability, though. Some types allow you to multiply
+alias a location in memory while mutating it. Unless these types use synchronization
+to manage this access, they are absolutely not thread safe. Rust captures this
+through the `Send` and `Sync` traits.
+
+* A type is Send if it is safe to send it to another thread.
+* A type is Sync if it is safe to share between threads (`&T` is Send).
+
+Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
+substantial amount of special tooling exists to make them work right. First and
+foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
+and other unsafe code can *trust* that they are correctly implemented. Since
+they're *marker traits* (they have no associated items like methods), correctly
+implemented simply means that they have the intrinsic properties an implementor
+should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
+
+Send and Sync are also what Rust calls *opt-in builtin traits*.
+This means that, unlike every other trait, they are *automatically* derived:
+if a type is composed entirely of Send or Sync types, then it is Send or Sync.
+Almost all primitives are Send and Sync, and as a consequence pretty much
+all types you'll ever interact with are Send and Sync.
+
+Major exceptions include:
+
+* raw pointers are neither Send nor Sync (because they have no safety guards)
+* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
+* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
+
+`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
+unsynchronized shared mutable state. However raw pointers are, strictly speaking,
+marked as thread-unsafe as more of a *lint*. Doing anything useful
+with a raw pointer requires dereferencing it, which is already unsafe. In that
+sense, one could argue that it would be "fine" for them to be marked as thread safe.
+
+However it's important that they aren't thread safe to prevent types that
+*contain them* from being automatically marked as thread safe. These types have
+non-trivial untracked ownership, and it's unlikely that their author was
+necessarily thinking hard about thread safety. In the case of Rc, we have a nice
+example of a type that contains a `*mut` that is *definitely* not thread safe.
+
+Types that aren't automatically derived can *opt-in* to Send and Sync by simply
+implementing them:
+
+```rust
+struct MyBox(*mut u8);
+
+unsafe impl Send for MyBox {}
+unsafe impl Sync for MyBox {}
+```
+
+In the *incredibly rare* case that a type is *inappropriately* automatically
+derived to be Send or Sync, then one can also *unimplement* Send and Sync:
+
+```rust
+struct SpecialThreadToken(u8);
+
+impl !Send for SpecialThreadToken {}
+impl !Sync for SpecialThreadToken {}
+```
+
+Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
+Only types that are ascribed special meaning by other unsafe code can possibly cause
+trouble by being incorrectly Send or Sync.
+
+Most uses of raw pointers should be encapsulated behind a sufficient abstraction
+that Send and Sync can be derived. For instance all of Rust's standard
+collections are Send and Sync (when they contain Send and Sync types)
+in spite of their pervasive use of raw pointers to
+manage allocations and complex ownership. Similarly, most iterators into these
+collections are Send and Sync because they largely behave like an `&` or `&mut`
+into the collection.
+
+TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
+only to data races?
\ No newline at end of file
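
Review note on the orderings listed in atomics.md, which the patch leaves as a TODO: the following is a minimal sketch, not part of this patch, of how a `Release` store pairs with an `Acquire` load to publish a value between threads. The function name `publish_and_read` and the payload value `42` are assumptions made up for illustration; only the `std` calls are real APIs.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper (not from the patch): one thread publishes a payload,
// the calling thread waits for it and returns what it observed.
fn publish_and_read() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (data2, ready2) = (data.clone(), ready.clone());
    let producer = thread::spawn(move || {
        // The payload write itself is Relaxed; it carries no ordering.
        data2.store(42, Ordering::Relaxed);
        // Release: everything written before this store becomes visible
        // to a thread that observes `true` with an Acquire load.
        ready2.store(true, Ordering::Release);
    });

    // Acquire: once this load returns `true`, the payload write above
    // is guaranteed to have happened-before the code that follows.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```

Because the `Acquire` load that observes `true` synchronizes with the `Release` store, the earlier `Relaxed` write to `data` is visible afterwards and the function returns 42. If the flag were also accessed with `Relaxed`, the memory model would no longer guarantee that.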