Merge pull request #28 from Gankro/cleanup

Cleanup the first chapter
Steve Klabnik 8 years ago committed by GitHub
commit a4322ccb28

@@ -2,7 +2,7 @@
#### The Dark Arts of Advanced and Unsafe Rust Programming
# NOTE: This is a draft document that discusses several unstable aspects of Rust, and may contain serious errors or outdated information.
> Instead of the programs I had hoped for, there came only a shuddering blackness
and ineffable loneliness; and I saw at last a fearful truth which no one had
@@ -19,20 +19,23 @@ infinitesimal fragments of despair.
Should you wish a long and happy career of writing Rust programs, you should
turn back now and forget you ever saw this book. It is not necessary. However
if you intend to write unsafe code — or just want to dig into the guts of the
language — this book contains lots of useful information.
Unlike *[The Rust Programming Language][trpl]*, we will be assuming considerable
prior knowledge. In particular, you should be comfortable with basic systems
programming and Rust. If you don't feel comfortable with these topics, you
should consider [reading The Book][trpl] first. That said, we won't assume you
have read it, and we will take care to occasionally give a refresher on the
basics where appropriate. You can skip straight to this book if you want;
just know that we won't be explaining everything from the ground up.
We're going to dig into exception-safety, pointer aliasing, memory models,
compiler and hardware implementation details, and even some type-theory.
Much text will be devoted to exotic corner cases that no one *should* ever have
to care about, but that suddenly become important because we wrote `unsafe`.
We will also be spending a lot of time talking about the different kinds of
safety and guarantees that programs could care about.
[trpl]: ../book/index.html

@@ -4,6 +4,7 @@
* [Meet Safe and Unsafe](meet-safe-and-unsafe.md)
* [How Safe and Unsafe Interact](safe-unsafe-meaning.md)
* [What Unsafe Can Do](what-unsafe-does.md)
* [Working with Unsafe](working-with-unsafe.md)
* [Data Layout](data.md)
* [repr(Rust)](repr-rust.md)

@@ -1,41 +1,51 @@
# Meet Safe and Unsafe
![safe and unsafe](img/safeandunsafe.svg)
It would be great to not have to worry about low-level implementation details.
Who could possibly care how much space the empty tuple occupies? Sadly, it
sometimes matters and we need to worry about it. The most common reason
developers start to care about implementation details is performance, but more
importantly, these details can become a matter of correctness when interfacing
directly with hardware, operating systems, or other languages.
When implementation details start to matter in a safe programming language,
programmers usually have three options:
* fiddle with the code to encourage the compiler/runtime to perform an optimization
* adopt a more unidiomatic or cumbersome design to get the desired implementation
* rewrite the implementation in a language that lets you deal with those details
For that last option, the language programmers tend to use is *C*. This is often
necessary to interface with systems that only declare a C interface.
Unfortunately, C is incredibly unsafe to use (sometimes for good reason),
and this unsafety is magnified when trying to interoperate with another
language. Care must be taken to ensure C and the other language agree on
what's happening, and that they don't step on each other's toes.
So what does this have to do with Rust?
Well, unlike C, Rust is a safe programming language.
But, like C, Rust is an unsafe programming language.
More accurately, Rust *contains* both a safe and unsafe programming language.
Rust can be thought of as a combination of two programming languages: *Safe
Rust* and *Unsafe Rust*. Conveniently, these names mean exactly what they say:
Safe Rust is Safe. Unsafe Rust is, well, not. In fact, Unsafe Rust lets us
do some *really* unsafe things. Things the Rust authors will implore you not to
do, but we'll do anyway.
Safe Rust is the *true* Rust programming language. If all you do is write Safe
Rust, you will never have to worry about type-safety or memory-safety. You will
never endure a dangling pointer, a use-after-free, or any other kind of
Undefined Behavior.
The standard library also gives you enough utilities out of the box that you'll
be able to write high-performance applications and libraries in pure idiomatic
Safe Rust.
But maybe you want to talk to another language. Maybe you're writing a
low-level abstraction not exposed by the standard library. Maybe you're
@@ -44,57 +54,15 @@ need to do something the type-system doesn't understand and just *frob some dang
bits*. Maybe you need Unsafe Rust.
Unsafe Rust is exactly like Safe Rust with all the same rules and semantics.
It just lets you do some *extra* things that are Definitely Not Safe
(which we will define in the next section).
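Here is a minimal sketch of that relationship. It uses one of those extra things (dereferencing a raw pointer, defined in the next section), and the names and values are our own:

```rust
fn main() {
    let x = 42;

    // Safe Rust: references are always valid, so no check is needed.
    let r = &x;
    println!("{}", *r);

    // Unsafe Rust: all the same rules still apply, but we may also
    // dereference a raw pointer, and *we* must guarantee it's valid.
    let p = &x as *const i32;
    unsafe {
        println!("{}", *p);
    }
}
```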
The value of this separation is that we gain the benefits of using an unsafe
language like C — low-level control over implementation details — without most
of the problems that come with trying to integrate it with a completely
different safe language.
There are still some problems — most notably, we must become aware of properties
that the type system assumes and audit them in any code that interacts with
Unsafe Rust. That's the purpose of this book: to teach you about these assumptions
and how to manage them.

@@ -6,25 +6,29 @@ interact?
The separation between Safe Rust and Unsafe Rust is controlled with the
`unsafe` keyword, which acts as an interface from one to the other. This is
why we can say Safe Rust is a safe language: all the unsafe parts are kept
exclusively behind the `unsafe` boundary. If you wish, you can even toss
`#![forbid(unsafe_code)]` into your code base to statically guarantee that
you're only writing Safe Rust.
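As a quick sketch, a crate that opts into that guarantee looks like this; with the attribute in place, the `unsafe` block becomes a hard compile error:

```rust,ignore
// At the crate root:
#![forbid(unsafe_code)]

fn main() {
    // This no longer compiles:
    // error: usage of an `unsafe` block
    unsafe {
        let p = 0x2a as *const i32;
        let _y = *p;
    }
}
```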
The `unsafe` keyword has two uses: to declare the existence of contracts the
compiler can't check, and to declare that a programmer has checked that these
contracts have been upheld.
You can use `unsafe` to indicate the existence of unchecked contracts on
_functions_ and _trait declarations_. On functions, `unsafe` means that
users of the function must check that function's documentation to ensure
they are using it in a way that maintains the contracts the function
requires. On trait declarations, `unsafe` means that implementors of the
trait must check the trait documentation to ensure their implementation
maintains the contracts the trait requires.
You can use `unsafe` on a block to declare that all unsafe actions performed
within are verified to uphold the contracts of those operations. For instance,
that the index passed to `slice::get_unchecked` is in-bounds.
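A sketch of the reasoning such a block encodes (the array and index here are our own):

```rust
fn main() {
    let arr = [1, 2, 3];
    let idx = 2;

    if idx < arr.len() {
        // The contract of `get_unchecked` is that the index is in
        // bounds; the check above is what justifies this block.
        let elem = unsafe { *arr.get_unchecked(idx) };
        println!("{}", elem);
    }
}
```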
You can use `unsafe` on a trait implementation to declare that the implementation
upholds the trait's contract. For instance, that a type implementing `Send` is
really safe to move to another thread.
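A sketch of what that looks like, using a hypothetical pointer-owning wrapper (`MySendablePtr` is our own invention, not a standard type):

```rust
use std::ptr::NonNull;

// Hypothetical: a wrapper that, by construction, uniquely owns the
// allocation its pointer refers to.
struct MySendablePtr(NonNull<u8>);

// By writing `unsafe impl`, we -- not the compiler -- are vouching
// that moving this type to another thread really is sound.
unsafe impl Send for MySendablePtr {}

fn main() {}
```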
The standard library has a number of unsafe functions, including:
@@ -32,11 +36,10 @@ The standard library has a number of unsafe functions, including:
memory safety to be freely violated.
* `mem::transmute` reinterprets some value as having a given type, bypassing
type safety in arbitrary ways (see [conversions] for details).
* Every raw pointer to a sized type has an `offset` method that
invokes Undefined Behavior if the passed offset is not ["in bounds"][ptr_offset].
* All FFI functions are `unsafe` to call because the other language can do
arbitrary operations that the Rust compiler can't check.
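Here is a small sketch exercising two of those operations under contracts we can actually uphold (the specific values are our own):

```rust
use std::mem;

fn main() {
    // `transmute`: reinterpret the bits of an f32 as a u32. The types
    // have the same size, so this particular use is well-defined.
    let bits: u32 = unsafe { mem::transmute(1.0f32) };
    println!("{:08x}", bits); // prints 3f800000

    // `offset`: the result must stay in bounds of the same allocation;
    // offsetting to the last element of this array satisfies that.
    let arr = [10, 20, 30];
    let p = arr.as_ptr();
    unsafe {
        println!("{}", *p.offset(2)); // prints 30
    }
}
```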
As of Rust 1.0 there are exactly two unsafe traits:
@@ -45,41 +48,60 @@ As of Rust 1.0 there are exactly two unsafe traits:
* `Sync` is a marker trait that promises threads can safely share implementors
through a shared reference.
Much of the Rust standard library also uses Unsafe Rust internally. These
implementations have generally been rigorously manually checked, so the Safe Rust
interfaces built on top of these implementations can be assumed to be safe.
The need for all of this separation boils down to a single fundamental property
of Safe Rust:
**No matter what, Safe Rust can't cause Undefined Behavior.**
The design of the safe/unsafe split means that there is an asymmetric trust
relationship between Safe and Unsafe Rust. Safe Rust inherently has to
trust that any Unsafe Rust it touches has been written correctly.
On the other hand, Unsafe Rust has to be very careful about trusting Safe Rust.
As an example, Rust has the `PartialOrd` and `Ord` traits to differentiate
between types which can "just" be compared, and those that provide a total
ordering (where every value of the type is either equal to, greater than,
or less than any other value of the same type). The sorted map type
`BTreeMap` doesn't make sense for partially-ordered types, and so it
requires that any key type for it implements the `Ord` trait. However,
`BTreeMap` has Unsafe Rust code inside of its implementation, and this
Unsafe Rust code cannot assume that any `Ord` implementation it gets makes
sense. The unsafe portions of `BTreeMap`'s internals have to be careful to
maintain all necessary contracts, even if a key type's `Ord` implementation
does not implement a total ordering.
Unsafe Rust cannot automatically trust Safe Rust. When writing Unsafe Rust,
you must be careful to only rely on specific Safe Rust code, and not make
assumptions about potential future Safe Rust code providing the same
guarantees.
This is the problem that `unsafe` traits exist to resolve. The `BTreeMap`
type could theoretically require that keys implement a new trait called
`UnsafeOrd`, rather than `Ord`, that might look like this:
between types which can "just" be compared, and those that provide a "total"
ordering (which basically means that comparison behaves reasonably).
`BTreeMap` doesn't really make sense for partially-ordered types, and so it
requires that its keys implement `Ord`. However, `BTreeMap` has Unsafe Rust code
inside of its implementation. Because it would be unacceptable for a sloppy `Ord`
implementation (which is Safe to write) to cause Undefined Behavior, the Unsafe
code in `BTreeMap` must be written to be robust against `Ord` implementations which
aren't actually total — even though that's the whole point of requiring `Ord`.
The Unsafe Rust code just can't trust the Safe Rust code to be written correctly.
That said, `BTreeMap` will still behave completely erratically if you feed in
values that don't have a total ordering. It just won't ever cause Undefined
Behavior.
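To make that concrete, here is a sketch of a perfectly Safe (and perfectly sloppy) `Ord` implementation; `Evil` is our own strawman type:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(PartialEq, Eq)]
struct Evil(u32);

impl PartialOrd for Evil {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Evil {
    // Not remotely a total ordering: everything is "less" than everything.
    fn cmp(&self, _other: &Self) -> Ordering {
        Ordering::Less
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Evil(1), "a");
    map.insert(Evil(1), "b");
    // The map's contents are now erratic (duplicate keys, failed lookups),
    // but its unsafe internals guarantee this is never Undefined Behavior.
    println!("{}", map.len());
}
```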
One may wonder, if `BTreeMap` cannot trust `Ord` because it's Safe, why can it
trust *any* Safe code? For instance, `BTreeMap` relies on integers and slices to
be implemented correctly. Those are safe too, right?
The difference is one of scope. When `BTreeMap` relies on integers and slices,
it's relying on one very specific implementation. This is a measured risk that
can be weighed against the benefit. In this case there's basically zero risk;
if integers and slices are broken, *everyone* is broken. Also, they're maintained
by the same people who maintain `BTreeMap`, so it's easy to keep tabs on them.
On the other hand, `BTreeMap`'s key type is generic. Trusting its `Ord` implementation
means trusting every `Ord` implementation in the past, present, and future.
Here the risk is high: someone somewhere is going to make a mistake and mess up
their `Ord` implementation, or even just straight up lie about providing a total
ordering because "it seems to work". When that happens, `BTreeMap` needs to be
prepared.
The same logic applies to trusting a closure that's passed to you to behave
correctly.
This problem of unbounded generic trust is the problem that `unsafe` traits
exist to resolve. The `BTreeMap` type could theoretically require that keys
implement a new trait called `UnsafeOrd`, rather than `Ord`, that might look
like this:
```rust
use std::cmp::Ordering;

unsafe trait UnsafeOrd {
    fn cmp(&self, other: &Self) -> Ordering;
}
```

@@ -92,32 +114,32 @@
Then, a type would use `unsafe` to implement `UnsafeOrd`, indicating that
they've ensured their implementation maintains whatever contracts the
trait expects. In this situation, the Unsafe Rust in the internals of
`BTreeMap` would be justified in trusting that the key type's `UnsafeOrd`
implementation is correct. If it isn't, it's the fault of the unsafe trait
implementation, which is consistent with Rust's safety guarantees.
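Continuing the hypothetical, an implementation would look something like this (recall that `UnsafeOrd` is an illustrative trait, not part of the standard library):

```rust
use std::cmp::Ordering;

unsafe trait UnsafeOrd {
    fn cmp(&self, other: &Self) -> Ordering;
}

struct MyType(u32);

// The `unsafe impl` is our promise that this really is a total ordering.
// Deferring to u32's own Ord makes that promise easy to keep.
unsafe impl UnsafeOrd for MyType {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.cmp(&other.0)
    }
}

fn main() {}
```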
The decision of whether to mark a trait `unsafe` is an API design choice.
Rust has traditionally avoided doing this because it makes Unsafe
Rust pervasive, which isn't desirable. `Send` and `Sync` are marked unsafe
because thread safety is a *fundamental property* that unsafe code can't
possibly hope to defend against in the way it could defend against a bad
`Ord` implementation. The decision of whether to mark your own traits `unsafe`
depends on the same sort of consideration. If `unsafe` code can't reasonably
expect to defend against a bad implementation of the trait, then marking the
trait `unsafe` is a reasonable choice.
As an aside, while `Send` and `Sync` are `unsafe` traits, they are *also*
automatically implemented for types when such derivations are provably safe
to do. `Send` is automatically derived for all types composed only of values
whose types also implement `Send`. `Sync` is automatically derived for all
types composed only of values whose types also implement `Sync`. This minimizes
the pervasive unsafety of making these two traits `unsafe`.
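A sketch of that automatic derivation at work; `assert_send` is just a helper we define so the compiler checks the property for us:

```rust
use std::rc::Rc;

// Every field is Send and Sync, so this type automatically is too.
struct Plain {
    x: u32,
    name: String,
}

// Rc is neither Send nor Sync, so this type automatically isn't either.
struct NotThreadSafe {
    shared: Rc<u32>,
}

fn assert_send<T: Send>() {}

fn main() {
    assert_send::<Plain>();
    // assert_send::<NotThreadSafe>(); // error: Rc<u32> cannot be sent
}
```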
This is the balance between Safe and Unsafe Rust. The separation is designed to
make using Safe Rust as ergonomic as possible, but requires extra effort and
care when writing Unsafe Rust. The rest of this book is largely a discussion
of the sort of care that must be taken, and what contracts Unsafe Rust must uphold.
[drop flags]: drop-flags.html
[conversions]: conversions.html
[ptr_offset]: ../std/primitive.pointer.html#method.offset

@@ -0,0 +1,58 @@
# What Unsafe Rust Can Do
The only things that are different in Unsafe Rust are that you can:
* Dereference raw pointers
* Call `unsafe` functions (including C functions, compiler intrinsics, and the raw allocator)
* Implement `unsafe` traits
* Mutate statics
That's it. The reason these operations are relegated to Unsafe is that misusing
any of these things will cause the ever dreaded Undefined Behavior. Invoking
Undefined Behavior gives the compiler full rights to do arbitrarily bad things
to your program. You definitely *should not* invoke Undefined Behavior.
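All four abilities fit in a few lines. A sketch (every name here is our own, and each contract is trivially upheld):

```rust
static mut COUNTER: u32 = 0;

unsafe trait Trusted {}        // declaring an unsafe trait...
unsafe impl Trusted for u32 {} // ...and implementing it

unsafe fn double(p: *const u32) -> u32 {
    *p * 2 // unsafe fns may dereference raw pointers directly
}

fn main() {
    let x: u32 = 21;
    unsafe {
        COUNTER += 1;                        // mutate a static
        println!("{}", double(&x));          // call an unsafe function
        println!("{}", *(&x as *const u32)); // dereference a raw pointer
    }
}
```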
Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core
language cares about is preventing the following things:
* Dereferencing null, dangling, or unaligned pointers
* Reading [uninitialized memory][]
* Breaking the [pointer aliasing rules][]
* Producing invalid primitive values:
    * dangling/null references
    * a `bool` that isn't 0 or 1
    * an undefined `enum` discriminant
    * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
    * a non-utf8 `str`
* Unwinding into another language
* Causing a [data race][race]
That's it. That's all the causes of Undefined Behavior baked into Rust. Of
course, unsafe functions and traits are free to declare arbitrary other
constraints that a program must maintain to avoid Undefined Behavior. For
instance, the allocator APIs declare that deallocating unallocated memory is
Undefined Behavior.
However, violations of these constraints will generally just transitively lead to one of
the above problems. Some additional constraints may also derive from compiler
intrinsics that make special assumptions about how code can be optimized. For instance,
`Vec` and `Box` make use of intrinsics that require their pointers to be non-null at all times.
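As a negative example (do not actually run this), producing an invalid primitive value is enough to trigger Undefined Behavior, even if the value is never inspected:

```rust,ignore
use std::mem;

fn main() {
    // UNDEFINED BEHAVIOR: a bool must be exactly 0 or 1. Merely
    // producing this value breaks the compiler's assumptions.
    let b: bool = unsafe { mem::transmute(3u8) };
    let _ = b;
}
```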
Rust is otherwise quite permissive with respect to other dubious operations.
Rust considers it "safe" to:
* Deadlock
* Have a [race condition][race]
* Leak memory
* Fail to call destructors
* Overflow integers
* Abort the program
* Delete the production database
However any program that actually manages to do such a thing is *probably*
incorrect. Rust provides lots of tools to make these things rare, but
these problems are considered impractical to categorically prevent.
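For instance, here is a perfectly Safe program that leaks and overflows, sketched with the standard tools for doing both on purpose:

```rust
use std::mem;

fn main() {
    // Leaking is Safe: `forget` skips the destructor, so this allocation
    // is never freed. Wasteful, but not Undefined Behavior.
    let v = vec![1, 2, 3];
    mem::forget(v);

    // Overflow is Safe: the wrapping API makes it explicit, and even a
    // plain `+` would panic in debug builds rather than corrupt memory.
    let x: u8 = 255;
    println!("{}", x.wrapping_add(1)); // prints 0
}
```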
[pointer aliasing rules]: references.html
[uninitialized memory]: uninitialized.html
[race]: races.html

@@ -16,7 +16,7 @@

```rust
fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```
This function is safe and correct. We check that the index is in bounds, and if it
is, index into the array in an unchecked manner. But even in such a trivial
function, the scope of the unsafe block is questionable. Consider changing the
`<` to a `<=`:
@@ -45,13 +45,13 @@ null or containing uninitialized memory. Nothing fundamentally changes. However
safety *isn't* modular in the sense that programs are inherently stateful and
your unsafe operations may depend on arbitrary other state.
This non-locality gets much worse when we incorporate actual persistent state.
Consider a simple implementation of `Vec`:
```rust
use std::ptr;

// Note: This definition is naive. See the chapter on implementing Vec.
pub struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

// Note this implementation does not correctly handle zero-sized types.
// See the chapter on implementing Vec.
impl<T> Vec<T> {
    pub fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example
            self.reallocate();
        }
        unsafe {
            ptr::write(self.ptr.offset(self.len as isize), elem);
            self.len += 1;
        }
    }
# fn reallocate(&mut self) { }
}
# fn main() {}
```
This code is simple enough to reasonably audit and informally verify. Now consider
adding the following method:
```rust,ignore
fn make_room(&mut self) {
    // grow the capacity
    self.cap += 1;
}
```
@@ -106,14 +104,12 @@ as Vec.
It is therefore possible for us to write a completely safe abstraction that
relies on complex invariants. This is *critical* to the relationship between
Safe Rust and Unsafe Rust.
We have already seen that Unsafe code must trust *some* Safe code, but shouldn't
trust *generic* Safe code. Privacy is important to unsafe code for similar reasons:
it means we don't have to trust all the safe code in the universe not to mess
with our trusted state.
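A sketch of that module-privacy trick, applied to a `Vec`-like type of our own:

```rust
mod my_vec {
    // Because these fields are private, only code in this module can
    // write to them. Upholding "len <= cap, and ptr points to cap
    // allocated elements" is therefore a local audit, not a global one.
    pub struct MyVec<T> {
        ptr: *mut T,
        len: usize,
        cap: usize,
    }

    impl<T> MyVec<T> {
        pub fn len(&self) -> usize {
            self.len // exposing a read is harmless
        }
    }
}

fn main() {
    // Client code cannot do `v.cap += 1` out here: the fields are
    // private, so that kind of cap-fiddling simply doesn't compile.
}
```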
Safety lives!
