rewrite intro

10 years ago · 165132a1ad
parent 5f3cec4a00
commit 165132a1ad
5 changed files with 202 additions and 154 deletions
--- a/README.md
+++ b/README.md
@@ -1,20 +1,39 @@
-% The Unsafe Rust Programming Language
+% The Advanced Rust Programming Language

 # NOTE: This is a draft document, and may contain serious errors

-**This document is about advanced functionality and low-level development practices
-in the Rust Programming Language. Most of the things discussed won't matter
-to the average Rust programmer. However if you wish to correctly write unsafe
-code in Rust, this text contains invaluable information.**
+So you've played around with Rust a bit. You've written a few simple programs and
+you think you grok the basics. Maybe you've even read through
+*[The Rust Programming Language][trpl]*. Now you want to get neck-deep in all the
+nitty-gritty details of the language. You want to know those weird corner-cases.
+You want to know what the heck `unsafe` really means, and how to properly use it.
+This is the book for you.

-The Unsafe Rust Programming Language (TURPL) seeks to complement
-[The Rust Programming Language Book][trpl] (TRPL).
-Where TRPL introduces the language and teaches the basics, TURPL dives deep into
-the specification of the language, and all the nasty bits necessary to write
-Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
-the basics of the language and systems programming. We will not explain the
-stack or heap. We will not explain the basic syntax.
+To be clear, this book goes into *serious* detail. We're going to dig into
+exception-safety and pointer aliasing. We're going to talk about memory
+models. We're even going to do some type-theory. This is stuff that you
+absolutely *don't* need to know to write fast and safe Rust programs.
+You could probably close this book *right now* and still have a productive
+and happy career in Rust.

+However if you intend to write unsafe code -- or just *really* want to dig into
+the guts of the language -- this book contains *invaluable* information.

+Unlike *The Rust Programming Language* we *will* be assuming considerable prior
+knowledge. In particular, you should be comfortable with:

-[trpl]: https://doc.rust-lang.org/book/
+* Basic Systems Programming:
+    * Pointers
+    * [The stack and heap][]
+    * The memory hierarchy (caches)
+    * Threads
+
+* [Basic Rust][]
+
+Due to the nature of advanced Rust programming, we will be spending a lot of time
+talking about *safety* and *guarantees*. In particular, a significant portion of
+the book will be dedicated to correctly writing and understanding Unsafe Rust.
+
+[trpl]: https://doc.rust-lang.org/book/
+[The stack and heap]: https://doc.rust-lang.org/book/the-stack-and-the-heap.html
+[Basic Rust]: https://doc.rust-lang.org/book/syntax-and-semantics.html
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -1,7 +1,7 @@
 # Summary

 * [Meet Safe and Unsafe](meet-safe-and-unsafe.md)
-	* [What Do Safe and Unsafe Mean](safe-unsafe-meaning.md)
+	* [How Safe and Unsafe Interact](safe-unsafe-meaning.md)
 	* [Working with Unsafe](working-with-unsafe.md)
 * [Data Layout](data.md)
 	* [repr(Rust)](repr-rust.md)
--- a/meet-safe-and-unsafe.md
+++ b/meet-safe-and-unsafe.md
@@ -1,82 +1,98 @@
 % Meet Safe and Unsafe

-Safe and Unsafe are Rust's chief engineers.
-
-TODO: ADORABLE PICTURES OMG
-
-Unsafe handles all the dangerous internal stuff. They build the foundations
-and handle all the dangerous materials. By all accounts, Unsafe is really a bit
-unproductive, because the nature of their work means that they have to spend a
-lot of time checking and double-checking everything. What if there's an earthquake
-on a leap year? Are we ready for that? Unsafe better be, because if they get
-*anything* wrong, everything will blow up! What Unsafe brings to the table is
-*quality*, not quantity. Still, nothing would ever get done if everything was
-built to Unsafe's standards!
-
-That's where Safe comes in. Safe has to handle *everything else*. Since Safe needs
-to *get work done*, they've grown to be fairly careless and clumsy! Safe doesn't worry
-about all the crazy eventualities that Unsafe does, because life is too short to deal
-with leap-year-earthquakes. Of course, this means there's some jobs that Safe just
-can't handle. Safe is all about quantity over quality.
-
-Unsafe loves Safe to bits, but knows that they *can never trust them to do the
-right thing*. Still, Unsafe acknowledges that not every problem needs quite the
-attention to detail that they apply. Indeed, Unsafe would *love* if Safe could do
-*everything* for them. To accomplish this, Unsafe spends most of their time
-building *safe abstractions*. These abstractions handle all the nitty-gritty
-details for Safe, and choose good defaults so that the simplest solution (which
-Safe will inevitably use) is usually the *right* one. Once a safe abstraction is
-built, Unsafe ideally needs to never work on it again, and Safe can blindly use
-it in all their work.
-
-Unsafe's attention to detail means that all the things that they mark as ok for
-Safe to use can be combined in arbitrarily ridiculous ways, and all the rules
-that Unsafe is forced to uphold will never be violated. If they *can* be violated
-by Safe, that means *Unsafe*'s the one in the wrong. Safe can work carelessly,
-knowing that if anything blows up, it's not *their* fault. Safe can also call in
-Unsafe at any time if there's a hard problem they can't quite work out, or if they
-can't meet the client's quality demands. Of course, Unsafe will beg and plead Safe
-to try their latest safe abstraction first!
-
-In addition to being adorable, Safe and Unsafe are what makes Rust possible.
-Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust.
-Any time someone opines the guarantees of Rust, they are almost surely talking about
-Safe. However Safe is not sufficient to write every program. For that,
-we need the Unsafe superset.
-
-Most fundamentally, writing bindings to other languages
-(such as the C exposed by your operating system) is never going to be safe. Rust
-can't control what other languages do to program execution! However Unsafe is
-also necessary to construct fundamental abstractions where the type system is not
-sufficient to automatically prove what you're doing is sound.
-
-Indeed, the Rust standard library is implemented in Rust, and it makes substantial
-use of Unsafe for implementing IO, memory allocation, collections,
-synchronization, and other low-level computational primitives.
-
-Upon hearing this, many wonder why they would not simply just use C or C++ in place of
-Rust (or just use a "real" safe language). If we're going to do unsafe things, why not
-lean on these much more established languages?
-
-The most important difference between C++ and Rust is a matter of defaults:
-Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a modular
-action. In deciding to work with unchecked uninitialized memory, this does not
-suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
-one does not have to suddenly worry about indexing out of bounds on `y`.
-C and C++, by contrast, have pervasive unsafety baked into the language. Even the
-modern best practices like `unique_ptr` have various safety pitfalls.
-
-It cannot be emphasized enough that Unsafe should be regarded as an exceptional
-thing, not a normal one. Unsafe is often the domain of *fundamental libraries*: anything that needs
-to make FFI bindings or define core abstractions. These fundamental libraries then expose
-a safe interface for intermediate libraries and applications to build upon. And these
-safe interfaces make an important promise: if your application segfaults, it's not your
-fault. *They* have a bug.
-
-And really, how is that different from *any* safe language? Python, Ruby, and Java libraries
-can internally do all sorts of nasty things. The languages themselves are no
-different. Safe languages *regularly* have bugs that cause critical vulnerabilities.
-The fact that Rust is written with a healthy spoonful of Unsafe is no different.
-However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of
-C to do the nasty things that need to get done.
+Programmers in safe "high-level" languages face a fundamental dilemma. On one
+hand, it would be *really* great to just say what you want and not worry about
+how it's done. On the other hand, that can lead to some *really* poor
+performance. It may be necessary to drop down to less clear or idiomatic
+practices to get the performance characteristics you want. Or maybe you just
+throw up your hands in disgust and decide to shell out to an implementation in
+a less sugary-wonderful *unsafe* language.

+Worse, when you want to talk directly to the operating system, you *have* to
+talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the
+lingua franca of the programming world.
+Even other safe languages generally expose C interfaces for the world at large!
+Regardless of *why* you're doing it, as soon as your program starts talking to
+C it stops being safe.
+
+With that said, Rust is *totally* a safe programming language.
+
+Well, Rust *has* a safe programming language. Let's step back a bit.
+
+Rust can be thought of as being composed of two
+programming languages: *Safe* and *Unsafe*. Safe is For Reals Totally Safe.
+Unsafe, unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe lets
+you do some really crazy unsafe things.
+
+Safe is *the* Rust programming language. If all you do is write Safe Rust,
+you will never have to worry about type-safety or memory-safety. You will never
+endure a null or dangling pointer, or any of that Undefined Behaviour nonsense.
+
+*That's totally awesome*.
+
+The standard library also gives you enough utilities out-of-the-box that you'll
+be able to write awesome high-performance applications and libraries in pure
+idiomatic Safe Rust.
+
+But maybe you want to talk to another language. Maybe you're writing a
+low-level abstraction not exposed by the standard library. Maybe you're
+*writing* the standard library (which is written entirely in Rust). Maybe you
+need to do something the type-system doesn't understand and just *frob some dang
+bits*. Maybe you need Unsafe Rust.
+
+Unsafe Rust is exactly like Safe Rust with *all* the same rules and semantics.
+However Unsafe Rust lets you do some *extra* things that are Definitely Not Safe.
+
+The only things that are different in Unsafe Rust are that you can:
+
+* Dereference raw pointers
+* Call `unsafe` functions (including C functions, intrinsics, and the raw allocator)
+* Implement `unsafe` traits
+* Mutate statics
+
+That's it. The reason these operations are relegated to Unsafe is that misusing
+any of these things will cause the ever dreaded Undefined Behaviour. Invoking
+Undefined Behaviour gives the compiler full rights to do arbitrarily bad things
+to your program. You definitely *should not* invoke Undefined Behaviour.
+
+Unlike in C, Undefined Behaviour is pretty limited in scope in Rust. All the core
+language cares about is preventing the following things:
+
+* Dereferencing null or dangling pointers
+* Reading [uninitialized memory][]
+* Breaking the [pointer aliasing rules][]
+* Producing invalid primitive values:
+    * dangling/null references
+    * a `bool` that isn't 0 or 1
+    * an undefined `enum` discriminant
+    * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
+    * a non-UTF-8 `str`
+* Unwinding into another language
+* Causing a [data race][race]
+* Double-dropping a value
+
+That's it. That's all the Undefined Behaviour baked into Rust. Of course, unsafe
+functions and traits are free to declare arbitrary other constraints that a
+program must maintain to avoid Undefined Behaviour. However these are generally
+just things that will transitively lead to one of the above problems. Some
+additional constraints may also derive from compiler intrinsics that make special
+assumptions about how code can be optimized.
+
+Rust is otherwise quite permissive with respect to other dubious operations. Rust
+considers it "safe" to:
+
+* Deadlock
+* Have a [race condition][race]
+* Leak memory
+* Fail to call destructors
+* Overflow integers
+* Abort the program
+* Delete the production database
+
+However any program that actually manages to do such a thing is *probably*
+incorrect. Rust provides lots of tools to make these things rare, but
+these problems are considered impractical to categorically prevent.
+
+[pointer aliasing rules]: references.html
+[uninitialized memory]: uninitialized.html
+[race]: races.html
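
The list of extra abilities added to `meet-safe-and-unsafe.md` above can be illustrated with a minimal sketch. The names `read_raw` and `read_via_reference` are hypothetical and are not part of this commit. The sketch only shows two of the four abilities (dereferencing a raw pointer and calling an `unsafe` function) wrapped behind a safe function.

```rust
/// Hypothetical helper: reads the value behind a raw pointer.
///
/// # Safety
/// `ptr` must be non-null, properly aligned, and point to an initialized `u32`.
unsafe fn read_raw(ptr: *const u32) -> u32 {
    // Dereferencing a raw pointer is only allowed in Unsafe Rust.
    unsafe { *ptr }
}

/// Hypothetical safe wrapper: a reference is always valid, so the
/// contract of `read_raw` is upheld for every possible caller.
fn read_via_reference(value: &u32) -> u32 {
    // Creating a raw pointer is safe; only dereferencing it needs `unsafe`.
    let ptr: *const u32 = value;
    // SAFETY: `ptr` was derived from a live shared reference.
    unsafe { read_raw(ptr) }
}
```

Callers of `read_via_reference` never write `unsafe` themselves, which is the sense in which the text calls Safe Rust "For Reals Totally Safe".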
--- a/safe-unsafe-meaning.md
+++ b/safe-unsafe-meaning.md
@@ -1,38 +1,17 @@
-% What do Safe and Unsafe really mean?
-
-Rust cares about preventing the following things:
-
-* Dereferencing null or dangling pointers
-* Reading [uninitialized memory][]
-* Breaking the [pointer aliasing rules][]
-* Producing invalid primitive values:
-    * dangling/null references
-    * a `bool` that isn't 0 or 1
-    * an undefined `enum` discriminant
-    * a `char` larger than char::MAX (TODO: check if stronger restrictions apply)
-    * A non-utf8 `str`
-* Unwinding into another language
-* Causing a [data race][]
-* Invoking Misc. Undefined Behaviour (in e.g. compiler intrinsics)
-
-That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
-declare arbitrary requirements if they could transitively cause memory safety
-issues, but it all boils down to the above actions. Rust is otherwise
-quite permisive with respect to other dubious operations. Rust considers it
-"safe" to:
-
-* Deadlock
-* Have a Race Condition
-* Leak memory
-* Fail to call destructors
-* Overflow integers
-* Delete the production database
-
-However any program that does such a thing is *probably* incorrect. Rust
-provides lots of tools to make doing these things rare, but these problems are
-considered impractical to categorically prevent.
-
-Rust models the seperation between Safe and Unsafe with the `unsafe` keyword.
+% How Safe and Unsafe Interact
+
+So what's the relationship between Safe and Unsafe? How do they interact?
+
+Rust models the separation between Safe and Unsafe with the `unsafe` keyword, which
+can be thought of as a sort of *foreign function interface* (FFI) between Safe and Unsafe.
+This is the magic behind why we can say Safe is a safe language: all the scary unsafe
+bits are relegated *exclusively* to FFI *just like every other safe language*.
+
+However because one language is a subset of the other, the two can be cleanly
+intermixed as long as the boundary between Safe and Unsafe is denoted with the
+`unsafe` keyword. No need to write headers, initialize runtimes, or any of that
+other FFI boiler-plate.
+
 There are several places `unsafe` can appear in Rust today, which can largely be
 grouped into two categories:

@@ -44,7 +23,7 @@ you to write `unsafe` elsewhere:
      the danger.
    * On trait declarations, `unsafe` is declaring that *implementing* the trait
      is an unsafe operation, as it has contracts that other unsafe code is free to
-      trust blindly.
+      trust blindly. (More on this below.)

 * I am declaring that I have, to the best of my knowledge, adhered to the
 unchecked contracts:
@@ -55,14 +34,14 @@ unchecked contracts:

 There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
 historical reasons and is in the process of being phased out. See the section on
-[destructors][] for details.
+[drop flags][] for details.

 Some examples of unsafe functions:

 * `slice::get_unchecked` will perform unchecked indexing, allowing memory
  safety to be freely violated.
 * `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
-  not "in bounds" as defined by LLVM (see the lifetimes section for details).
+  not "in bounds" as defined by LLVM.
 * `mem::transmute` reinterprets some value as having the given type,
  bypassing type safety in arbitrary ways. (see [conversions][] for details)
 * All FFI functions are `unsafe` because they can do arbitrary things.
@@ -72,14 +51,34 @@ Some examples of unsafe functions:
 As of Rust 1.0 there are exactly two unsafe traits:

 * `Send` is a marker trait (it has no actual API) that promises implementors
-  are safe to send to another thread.
+  are safe to send (move) to another thread.
 * `Sync` is a marker trait that promises that threads can safely share
  implementors through a shared reference.

-The need for unsafe traits boils down to the fundamental lack of trust that Unsafe
-has for Safe. All safe traits are free to declare arbitrary contracts, but because
-implementing them is a job for Safe, Unsafe can't trust those contracts to actually
-be upheld.
+The need for unsafe traits boils down to the fundamental property of safe code:
+
+**No matter how completely awful Safe code is, it can't cause Undefined
+Behaviour.**
+
+This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be
+*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe
+code (or else you would degenerate into infinite spirals of paranoid despair).
+It is generally regarded as ok to trust the standard library to be correct, as
+`std` is effectively an extension of the language (and you *really* just have to trust
+the language). If `std` fails to uphold the guarantees it declares, then it's
+basically a language bug.
+
+That said, it would be best to minimize *needlessly* relying on properties of
+concrete safe code. Bugs happen! Of course, I must reinforce that this is only
+a concern for Unsafe code. Safe code can blindly trust anyone and everyone
+as far as basic memory-safety is concerned.
+
+On the other hand, safe traits are free to declare arbitrary contracts, but because
+implementing them is Safe, Unsafe can't trust those contracts to actually
+be upheld. This is different from the concrete case because *anyone* can
+randomly implement the interface. There is something fundamentally different
+about trusting a *particular* piece of code to be correct, and trusting *all the
+code that will ever be written* to be correct.

 For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
 between types which can "just" be compared, and those that actually implement a
@@ -99,14 +98,13 @@ destructors will be successfully called! Hooray!
 However BTreeMap is implemented using a modest spoonful of Unsafe (most collections
 are). That means that it is not necessarily *trivially true* that a bad Ord
 implementation will make BTreeMap behave safely. Unsafe must be sure not to rely
-on Ord *where safety is at stake*, because Ord is provided by Safe, and memory
-safety is not Safe's responsibility to uphold. *It must be impossible for Safe
-code to violate memory safety*.
+on Ord *where safety is at stake*. Ord is provided by Safe, and safety is not
+Safe's responsibility to uphold.

 But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
 contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
 *the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
-to be correct (because Unsafe can trust themself).
+to be correct.

 Rust has traditionally avoided making traits unsafe because it makes Unsafe
 pervasive, which is not desirable. Send and Sync are unsafe is because
@@ -114,11 +112,12 @@ thread safety is a *fundamental property* that Unsafe cannot possibly hope to
 defend against in the same way it would defend against a bad Ord implementation.
 The only way to possibly defend against thread-unsafety would be to *not use
 threading at all*. Making every operation atomic isn't even sufficient, because
-it's possible for complex invariants between disjoint locations in memory.
+it's possible for complex invariants to exist between disjoint locations in
+memory. For instance, the pointer and capacity of a Vec must be in sync.

 Even concurrent paradigms that are traditionally regarded as Totally Safe like
 message passing implicitly rely on some notion of thread safety -- are you
-really message-passing if you send a *pointer*? Send and Sync therefore require
+really message-passing if you pass a *pointer*? Send and Sync therefore require
 some *fundamental* level of trust that Safe code can't provide, so they must be
 unsafe to implement. To help obviate the pervasive unsafety that this would
 introduce, Send (resp. Sync) is *automatically* derived for all types composed only
@@ -128,8 +127,6 @@ primitives).



-[pointer aliasing rules]: lifetimes.html#references
-[uninitialized memory]: uninitialized.html
-[data race]: concurrency.html
-[destructors]: raii.html
-[conversions]: conversions.html
+
+[drop flags]: drop-flags.html
+[conversions]: conversions.html
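
The point about unsafe traits in `safe-unsafe-meaning.md` can be sketched as follows. `TrustedLen`, `Buf`, and `sum_trusted` are hypothetical names invented for this illustration; they do not come from this commit or from the standard library. The idea is that marking the trait `unsafe` to implement is what lets generic unsafe code rely on its contract.

```rust
/// Hypothetical unsafe trait. Implementors promise that `trusted_len()`
/// never exceeds the length of the slice returned by `as_slice()`.
unsafe trait TrustedLen {
    fn as_slice(&self) -> &[u8];
    fn trusted_len(&self) -> usize;
}

/// Hypothetical implementor that wraps a byte vector.
struct Buf(Vec<u8>);

// SAFETY: `trusted_len` is exactly the length of the slice we hand out.
unsafe impl TrustedLen for Buf {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
    fn trusted_len(&self) -> usize {
        self.0.len()
    }
}

/// Generic code that skips bounds checks. This is only sound because
/// `TrustedLen` is unsafe to implement; with a safe trait, a careless
/// implementation could make this function read out of bounds.
fn sum_trusted<T: TrustedLen>(t: &T) -> u32 {
    let slice = t.as_slice();
    let mut total = 0u32;
    for i in 0..t.trusted_len() {
        // SAFETY: the trait contract guarantees `i < slice.len()`.
        total += u32::from(unsafe { *slice.get_unchecked(i) });
    }
    total
}
```

If `TrustedLen` were a safe trait, `sum_trusted` would have to fall back to checked indexing, in the same way the text says BTreeMap must not rely on `Ord` where safety is at stake.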
--- a/working-with-unsafe.md
+++ b/working-with-unsafe.md
@@ -1,11 +1,11 @@
 % Working with Unsafe

-Rust generally only gives us the tools to talk about safety in a scoped and
-binary manner. Unfortunately reality is significantly more complicated than that.
+Rust generally only gives us the tools to talk about Unsafe in a scoped and
+binary manner. Unfortunately, reality is significantly more complicated than that.
 For instance, consider the following toy function:

 ```rust
-fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
+pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
@@ -22,7 +22,7 @@ function, the scope of the unsafe block is questionable. Consider changing the
 `<` to a `<=`:

 ```rust
-fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
+pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx <= arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
@@ -45,7 +45,7 @@ implementation of `Vec`:

 ```rust
 // Note this defintion is insufficient. See the section on lifetimes.
-struct Vec<T> {
+pub struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
@@ -55,7 +55,7 @@ struct Vec<T> {
 // We currently live in a nice imaginary world of only positive fixed-size
 // types.
 impl<T> Vec<T> {
-    fn push(&mut self, elem: T) {
+    pub fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example
            self.reallocate();
@@ -80,9 +80,25 @@ adding the following method:

 This code is safe, but it is also completely unsound. Changing the capacity
 violates the invariants of Vec (that `cap` reflects the allocated space in the
-Vec). This is not something the rest of `Vec` can guard against. It *has* to
+Vec). This is not something the rest of Vec can guard against. It *has* to
 trust the capacity field because there's no way to verify it.

 `unsafe` does more than pollute a whole function: it pollutes a whole *module*.
 Generally, the only bullet-proof way to limit the scope of unsafe code is at the
 module boundary with privacy.
+
+However this works *perfectly*. The existence of `make_room` is *not* a
+problem for the soundness of Vec because we didn't mark it as public. Only the
+module that defines this function can call it. Also, `make_room` directly
+accesses the private fields of Vec, so it can only be written in the same module
+as Vec.
+
+It is therefore possible for us to write a completely safe abstraction that
+relies on complex invariants. This is *critical* to the relationship between
+Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust
+*some* Safe code, but can't trust *arbitrary* Safe code. However if Unsafe
+couldn't prevent client Safe code from messing with its state in arbitrary ways,
+safety would be a lost cause.
+
+Safety lives!
+
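
The closing argument in `working-with-unsafe.md`, that privacy at the module boundary is what keeps an invariant out of reach of client Safe code, can be sketched with a small hypothetical type. `cursor::Cursor` is an assumption made up for this illustration and is not the `Vec` from the text. Its invariant is `pos < data.len()`, and because both fields are private, only the functions inside the module need to be audited.

```rust
mod cursor {
    /// Hypothetical type. Invariant: `pos < data.len()`.
    /// Both fields are private, so code outside this module cannot break it.
    pub struct Cursor {
        data: Vec<u8>,
        pos: usize,
    }

    impl Cursor {
        /// Rejects empty input so the invariant holds from the start.
        pub fn new(data: Vec<u8>) -> Option<Cursor> {
            if data.is_empty() {
                None
            } else {
                Some(Cursor { data, pos: 0 })
            }
        }

        /// Steps forward, refusing to move past the last element.
        /// Returns whether the position changed.
        pub fn advance(&mut self) -> bool {
            if self.pos + 1 < self.data.len() {
                self.pos += 1;
                true
            } else {
                false
            }
        }

        /// Reads the current element without a bounds check.
        pub fn current(&self) -> u8 {
            // SAFETY: every function in this module preserves
            // `pos < data.len()`, and no outside code can touch the fields.
            unsafe { *self.data.get_unchecked(self.pos) }
        }
    }
}
```

Adding a safe method inside `mod cursor` that set `pos` to an arbitrary value would make `current` unsound without touching its `unsafe` block, which mirrors the `make_room` discussion: the unit that must be reviewed is the module, not the block.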