diff --git a/src/README.md b/src/README.md index d577d7b..f1efd91 100644 --- a/src/README.md +++ b/src/README.md @@ -2,7 +2,7 @@ #### The Dark Arts of Advanced and Unsafe Rust Programming -# NOTE: This is a draft document, and may contain serious errors +# NOTE: This is a draft document that discusses several unstable aspects of Rust, and may contain serious errors or outdated information. > Instead of the programs I had hoped for, there came only a shuddering blackness and ineffable loneliness; and I saw at last a fearful truth which no one had @@ -19,20 +19,23 @@ infinitesimal fragments of despair. Should you wish a long and happy career of writing Rust programs, you should turn back now and forget you ever saw this book. It is not necessary. However -if you intend to write unsafe code -- or just want to dig into the guts of the -language -- this book contains invaluable information. - -Unlike [The Book][trpl] we will be assuming considerable prior knowledge. In -particular, you should be comfortable with basic systems programming and Rust. -If you don't feel comfortable with these topics, you should consider [reading -The Book][trpl] first. Though we will not be assuming that you have, and will -take care to occasionally give a refresher on the basics where appropriate. You -can skip straight to this book if you want; just know that we won't be -explaining everything from the ground up. - -To be clear, this book goes into deep detail. We're going to dig into -exception-safety, pointer aliasing, memory models, and even some type-theory. -We will also be spending a lot of time talking about the different kinds -of safety and guarantees. +if you intend to write unsafe code — or just want to dig into the guts of the +language — this book contains lots of useful information. + +Unlike *[The Rust Programming Language][trpl]*, we will be assuming considerable +prior knowledge. In particular, you should be comfortable with basic systems +programming and Rust. 
If you don't feel comfortable with these topics, you +should consider [reading The Book][trpl] first. That said, we won't assume you +have read it, and we will take care to occasionally give a refresher on the +basics where appropriate. You can skip straight to this book if you want; +just know that we won't be explaining everything from the ground up. + +We're going to dig into exception-safety, pointer aliasing, memory models, +compiler and hardware implementation details, and even some type-theory. +Much text will be devoted to exotic corner cases that no one *should* ever have +to care about, but suddenly become important because we wrote `unsafe`. + +We will also be spending a lot of time talking about the different kinds of +safety and guarantees that programs could care about. [trpl]: ../book/index.html diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 0b34952..a7348a6 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -4,6 +4,7 @@ * [Meet Safe and Unsafe](meet-safe-and-unsafe.md) * [How Safe and Unsafe Interact](safe-unsafe-meaning.md) + * [What Unsafe Can Do](what-unsafe-does.md) * [Working with Unsafe](working-with-unsafe.md) * [Data Layout](data.md) * [repr(Rust)](repr-rust.md) diff --git a/src/meet-safe-and-unsafe.md b/src/meet-safe-and-unsafe.md index 4d0bb70..d7ad923 100644 --- a/src/meet-safe-and-unsafe.md +++ b/src/meet-safe-and-unsafe.md @@ -1,41 +1,51 @@ # Meet Safe and Unsafe -![safe and unsafe](img/safeandunsafe.svg) +![safe and unsafe](img/safeandunsafe.svg) -Programmers in safe "high-level" languages face a fundamental dilemma. On one -hand, it would be *really* great to just say what you want and not worry about -how it's done. On the other hand, that can lead to unacceptably poor -performance. It may be necessary to drop down to less clear or idiomatic -practices to get the performance characteristics you want. 
Or maybe you just -throw up your hands in disgust and decide to shell out to an implementation in -a less sugary-wonderful *unsafe* language. +It would be great to not have to worry about low-level implementation details. +Who could possibly care how much space the empty tuple occupies? Sadly, it +sometimes matters and we need to worry about it. The most common reason +developers start to care about implementation details is performance, but more +importantly, these details can become a matter of correctness when interfacing +directly with hardware, operating systems, or other languages. -Worse, when you want to talk directly to the operating system, you *have* to -talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the -lingua-franca of the programming world. -Even other safe languages generally expose C interfaces for the world at large! -Regardless of why you're doing it, as soon as your program starts talking to -C it stops being safe. +When implementation details start to matter in a safe programming language, +programmers usually have three options: -With that said, Rust is *totally* a safe programming language. +* fiddle with the code to encourage the compiler/runtime to perform an optimization +* adopt a more unidiomatic or cumbersome design to get the desired implementation +* rewrite the implementation in a language that lets you deal with those details -Well, Rust *has* a safe programming language. Let's step back a bit. +For that last option, the language programmers tend to use is *C*. This is often +necessary to interface with systems that only declare a C interface. -Rust can be thought of as being composed of two programming languages: *Safe -Rust* and *Unsafe Rust*. Safe Rust is For Reals Totally Safe. Unsafe Rust, -unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe Rust lets you -do some really, *really* unsafe things. 
+Unfortunately, C is incredibly unsafe to use (sometimes for good reason), +and this unsafety is magnified when trying to interoperate with another +language. Care must be taken to ensure C and the other language agree on +what's happening, and that they don't step on each other's toes. + +So what does this have to do with Rust? + +Well, unlike C, Rust is a safe programming language. + +But, like C, Rust is an unsafe programming language. + +More accurately, Rust *contains* both a safe and unsafe programming language. + +Rust can be thought of as a combination of two programming languages: *Safe +Rust* and *Unsafe Rust*. Conveniently, these names mean exactly what they say: +Safe Rust is Safe. Unsafe Rust is, well, not. In fact, Unsafe Rust lets us +do some *really* unsafe things. Things the Rust authors will implore you not to +do, but we'll do anyway. Safe Rust is the *true* Rust programming language. If all you do is write Safe Rust, you will never have to worry about type-safety or memory-safety. You will -never endure a null or dangling pointer, or any of that Undefined Behavior -nonsense. - -*That's totally awesome.* +never endure a dangling pointer, a use-after-free, or any other kind of +Undefined Behavior. -The standard library also gives you enough utilities out-of-the-box that you'll -be able to write awesome high-performance applications and libraries in pure -idiomatic Safe Rust. +The standard library also gives you enough utilities out of the box that you'll +be able to write high-performance applications and libraries in pure idiomatic +Safe Rust. But maybe you want to talk to another language. Maybe you're writing a low-level abstraction not exposed by the standard library. Maybe you're @@ -44,57 +54,15 @@ need to do something the type-system doesn't understand and just *frob some dang bits*. Maybe you need Unsafe Rust. Unsafe Rust is exactly like Safe Rust with all the same rules and semantics. 
-However Unsafe Rust lets you do some *extra* things that are Definitely Not Safe. - -The only things that are different in Unsafe Rust are that you can: - -* Dereference raw pointers -* Call `unsafe` functions (including C functions, intrinsics, and the raw allocator) -* Implement `unsafe` traits -* Mutate statics - -That's it. The reason these operations are relegated to Unsafe is that misusing -any of these things will cause the ever dreaded Undefined Behavior. Invoking -Undefined Behavior gives the compiler full rights to do arbitrarily bad things -to your program. You definitely *should not* invoke Undefined Behavior. - -Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core -language cares about is preventing the following things: - -* Dereferencing null or dangling pointers -* Reading [uninitialized memory] -* Breaking the [pointer aliasing rules] -* Producing invalid primitive values: - * dangling/null references - * a `bool` that isn't 0 or 1 - * an undefined `enum` discriminant - * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF] - * A non-utf8 `str` -* Unwinding into another language -* Causing a [data race][race] - -That's it. That's all the causes of Undefined Behavior baked into Rust. Of -course, unsafe functions and traits are free to declare arbitrary other -constraints that a program must maintain to avoid Undefined Behavior. However, -generally violations of these constraints will just transitively lead to one of -the above problems. Some additional constraints may also derive from compiler -intrinsics that make special assumptions about how code can be optimized. - -Rust is otherwise quite permissive with respect to other dubious operations. 
-Rust considers it "safe" to: - -* Deadlock -* Have a [race condition][race] -* Leak memory -* Fail to call destructors -* Overflow integers -* Abort the program -* Delete the production database - -However any program that actually manages to do such a thing is *probably* -incorrect. Rust provides lots of tools to make these things rare, but -these problems are considered impractical to categorically prevent. - -[pointer aliasing rules]: references.html -[uninitialized memory]: uninitialized.html -[race]: races.html +It just lets you do some *extra* things that are Definitely Not Safe +(which we will define in the next section). + +The value of this separation is that we gain the benefits of using an unsafe +language like C — low level control over implementation details — without most +of the problems that come with trying to integrate it with a completely +different safe language. + +There are still some problems — most notably, we must become aware of properties +that the type system assumes and audit them in any code that interacts with +Unsafe Rust. That's the purpose of this book: to teach you about these assumptions +and how to manage them. diff --git a/src/safe-unsafe-meaning.md b/src/safe-unsafe-meaning.md index 0a655a3..662fbe3 100644 --- a/src/safe-unsafe-meaning.md +++ b/src/safe-unsafe-meaning.md @@ -6,25 +6,29 @@ interact? The separation between Safe Rust and Unsafe Rust is controlled with the `unsafe` keyword, which acts as an interface from one to the other. This is why we can say Safe Rust is a safe language: all the unsafe parts are kept -exclusively behind the boundary. +exclusively behind the `unsafe` boundary. If you wish, you can even toss +`#![forbid(unsafe_code)]` into your code base to statically guarantee that +you're only writing Safe Rust. 
The `unsafe` keyword has two uses: to declare the existence of contracts the -compiler can't check, and to declare that the adherence of some code to -those contracts has been checked by the programmer. +compiler can't check, and to declare that a programmer has checked that these +contracts have been upheld. You can use `unsafe` to indicate the existence of unchecked contracts on -_functions_ and on _trait declarations_. On functions, `unsafe` means that +_functions_ and _trait declarations_. On functions, `unsafe` means that users of the function must check that function's documentation to ensure they are using it in a way that maintains the contracts the function requires. On trait declarations, `unsafe` means that implementors of the trait must check the trait documentation to ensure their implementation maintains the contracts the trait requires. -You can use `unsafe` on a block to declare that all constraints required -by an unsafe function within the block have been adhered to, and the code -can therefore be trusted. You can use `unsafe` on a trait implementation -to declare that the implementation of that trait has adhered to whatever -contracts the trait's documentation requires. +You can use `unsafe` on a block to declare that all unsafe actions performed +within are verified to uphold the contracts of those operations. For instance, +that the index passed to `slice::get_unchecked` is in bounds. + +You can use `unsafe` on a trait implementation to declare that the implementation +upholds the trait's contract. For instance, that a type implementing `Send` is +really safe to move to another thread. The standard library has a number of unsafe functions, including: @@ -32,11 +36,10 @@ The standard library has a number of unsafe functions, including: memory safety to be freely violated. * `mem::transmute` reinterprets some value as having a given type, bypassing type safety in arbitrary ways (see [conversions] for details).
-* Every raw pointer to a sized type has an intrinstic `offset` method that - invokes Undefined Behavior if the passed offset is not "in bounds" as - defined by LLVM. -* All FFI functions are `unsafe` because the other language can do arbitrary - operations that the Rust compiler can't check. +* Every raw pointer to a sized type has an `offset` method that + invokes Undefined Behavior if the passed offset is not ["in bounds"][ptr_offset]. +* All FFI functions are `unsafe` to call because the other language can do + arbitrary operations that the Rust compiler can't check. As of Rust 1.0 there are exactly two unsafe traits: @@ -45,41 +48,60 @@ As of Rust 1.0 there are exactly two unsafe traits: * `Sync` is a marker trait that promises threads can safely share implementors through a shared reference. -Much of the Rust standard library also uses Unsafe Rust internally, although -these implementations are rigorously manually checked, and the Safe Rust -interfaces provided on top of these implementations can be assumed to be safe. +Much of the Rust standard library also uses Unsafe Rust internally. These +implementations have generally been rigorously manually checked, so the Safe Rust +interfaces built on top of these implementations can be assumed to be safe. The need for all of this separation boils down a single fundamental property of Safe Rust: **No matter what, Safe Rust can't cause Undefined Behavior.** -The design of the safe/unsafe split means that Safe Rust inherently has to -trust that any Unsafe Rust it touches has been written correctly (meaning -the Unsafe Rust actually maintains whatever contracts it is supposed to -maintain). On the other hand, Unsafe Rust has to be very careful about -trusting Safe Rust. +The design of the safe/unsafe split means that there is an asymmetric trust +relationship between Safe and Unsafe Rust. Safe Rust inherently has to +trust that any Unsafe Rust it touches has been written correctly. 
+On the other hand, Unsafe Rust has to be very careful about trusting Safe Rust. As an example, Rust has the `PartialOrd` and `Ord` traits to differentiate -between types which can "just" be compared, and those that provide a total -ordering (where every value of the type is either equal to, greater than, -or less than any other value of the same type). The sorted map type -`BTreeMap` doesn't make sense for partially-ordered types, and so it -requires that any key type for it implements the `Ord` trait. However, -`BTreeMap` has Unsafe Rust code inside of its implementation, and this -Unsafe Rust code cannot assume that any `Ord` implementation it gets makes -sense. The unsafe portions of `BTreeMap`'s internals have to be careful to -maintain all necessary contracts, even if a key type's `Ord` implementation -does not implement a total ordering. - -Unsafe Rust cannot automatically trust Safe Rust. When writing Unsafe Rust, -you must be careful to only rely on specific Safe Rust code, and not make -assumptions about potential future Safe Rust code providing the same -guarantees. - -This is the problem that `unsafe` traits exist to resolve. The `BTreeMap` -type could theoretically require that keys implement a new trait called -`UnsafeOrd`, rather than `Ord`, that might look like this: +between types which can "just" be compared, and those that provide a "total" +ordering (which basically means that comparison behaves reasonably). + +`BTreeMap` doesn't really make sense for partially-ordered types, and so it +requires that its keys implement `Ord`. However, `BTreeMap` has Unsafe Rust code +inside of its implementation. Because it would be unacceptable for a sloppy `Ord` +implementation (which is Safe to write) to cause Undefined Behavior, the Unsafe +code in BTreeMap must be written to be robust against `Ord` implementations which +aren't actually total — even though that's the whole point of requiring `Ord`. 
+ +The Unsafe Rust code just can't trust the Safe Rust code to be written correctly. +That said, `BTreeMap` will still behave completely erratically if you feed in +values that don't have a total ordering. It just won't ever cause Undefined +Behavior. + +One may wonder, if `BTreeMap` cannot trust `Ord` because it's Safe, why can it +trust *any* Safe code? For instance `BTreeMap` relies on integers and slices to +be implemented correctly. Those are safe too, right? + +The difference is one of scope. When `BTreeMap` relies on integers and slices, +it's relying on one very specific implementation. This is a measured risk that +can be weighed against the benefit. In this case there's basically zero risk; +if integers and slices are broken, *everyone* is broken. Also, they're maintained +by the same people who maintain `BTreeMap`, so it's easy to keep tabs on them. + +On the other hand, `BTreeMap`'s key type is generic. Trusting its `Ord` implementation +means trusting every `Ord` implementation in the past, present, and future. +Here the risk is high: someone somewhere is going to make a mistake and mess up +their `Ord` implementation, or even just straight up lie about providing a total +ordering because "it seems to work". When that happens, `BTreeMap` needs to be +prepared. + +The same logic applies to trusting a closure that's passed to you to behave +correctly. + +This problem of unbounded generic trust is the problem that `unsafe` traits +exist to resolve. The `BTreeMap` type could theoretically require that keys +implement a new trait called `UnsafeOrd`, rather than `Ord`, that might look +like this: ```rust use std::cmp::Ordering; @@ -92,32 +114,32 @@ unsafe trait UnsafeOrd { Then, a type would use `unsafe` to implement `UnsafeOrd`, indicating that they've ensured their implementation maintains whatever contracts the trait expects. 
In this situation, the Unsafe Rust in the internals of -`BTreeMap` could trust that the key type's `UnsafeOrd` implementation is -correct. If it isn't, it's the fault of the unsafe trait implementation -code, which is consistent with Rust's safety guarantees. +`BTreeMap` would be justified in trusting that the key type's `UnsafeOrd` +implementation is correct. If it isn't, it's the fault of the unsafe trait +implementation, which is consistent with Rust's safety guarantees. The decision of whether to mark a trait `unsafe` is an API design choice. -Rust has traditionally avoided marking traits unsafe because it makes Unsafe -Rust pervasive, which is not desirable. `Send` and `Sync` are marked unsafe +Rust has traditionally avoided doing this because it makes Unsafe +Rust pervasive, which isn't desirable. `Send` and `Sync` are marked unsafe because thread safety is a *fundamental property* that unsafe code can't possibly hope to defend against in the way it could defend against a bad `Ord` implementation. The decision of whether to mark your own traits `unsafe` -depends on the same sort of consideration. If `unsafe` code cannot reasonably +depends on the same sort of consideration. If `unsafe` code can't reasonably expect to defend against a bad implementation of the trait, then marking the trait `unsafe` is a reasonable choice. -As an aside, while `Send` and `Sync` are `unsafe` traits, they are +As an aside, while `Send` and `Sync` are `unsafe` traits, they are *also* automatically implemented for types when such derivations are provably safe to do. `Send` is automatically derived for all types composed only of values whose types also implement `Send`. `Sync` is automatically derived for all -types composed only of values whose types also implement `Sync`. +types composed only of values whose types also implement `Sync`. This minimizes +the pervasive unsafety of making these two traits `unsafe`. -This is the dance of Safe Rust and Unsafe Rust. 
It is designed to make using -Safe Rust as ergonomic as possible, but requires extra effort and care when -writing Unsafe Rust. The rest of the book is largely a discussion of the sort -of care that must be taken, and what contracts it is expected of Unsafe Rust -to uphold. +This is the balance between Safe and Unsafe Rust. The separation is designed to +make using Safe Rust as ergonomic as possible, but requires extra effort and +care when writing Unsafe Rust. The rest of this book is largely a discussion +of the sort of care that must be taken, and what contracts Unsafe Rust must uphold. -[drop flags]: drop-flags.html [conversions]: conversions.html +[ptr_offset]: ../std/primitive.pointer.html#method.offset diff --git a/src/what-unsafe-does.md b/src/what-unsafe-does.md new file mode 100644 index 0000000..91f9145 --- /dev/null +++ b/src/what-unsafe-does.md @@ -0,0 +1,58 @@ +# What Unsafe Rust Can Do + +The only things that are different in Unsafe Rust are that you can: + +* Dereference raw pointers +* Call `unsafe` functions (including C functions, compiler intrinsics, and the raw allocator) +* Implement `unsafe` traits +* Mutate statics + +That's it. The reason these operations are relegated to Unsafe is that misusing +any of these things will cause the ever dreaded Undefined Behavior. Invoking +Undefined Behavior gives the compiler full rights to do arbitrarily bad things +to your program. You definitely *should not* invoke Undefined Behavior. + +Unlike C, Undefined Behavior is pretty limited in scope in Rust. 
All the core +language cares about is preventing the following things: + +* Dereferencing null, dangling, or unaligned pointers +* Reading [uninitialized memory][] +* Breaking the [pointer aliasing rules][] +* Producing invalid primitive values: + * dangling/null references + * a `bool` that isn't 0 or 1 + * an undefined `enum` discriminant + * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF] + * a non-UTF-8 `str` +* Unwinding into another language +* Causing a [data race][race] + +That's it. That's all the causes of Undefined Behavior baked into Rust. Of +course, unsafe functions and traits are free to declare arbitrary other +constraints that a program must maintain to avoid Undefined Behavior. For +instance, the allocator APIs declare that deallocating unallocated memory is +Undefined Behavior. + +However, violations of these constraints generally will just transitively lead to one of +the above problems. Some additional constraints may also derive from compiler +intrinsics that make special assumptions about how code can be optimized. For instance, +`Vec` and `Box` make use of intrinsics that require their pointers to be non-null at all times. + +Rust is otherwise quite permissive with respect to other dubious operations. +Rust considers it "safe" to: + +* Deadlock +* Have a [race condition][race] +* Leak memory +* Fail to call destructors +* Overflow integers +* Abort the program +* Delete the production database + +However, any program that actually manages to do such a thing is *probably* +incorrect. Rust provides lots of tools to make these things rare, but +these problems are considered impractical to categorically prevent.
+ +[pointer aliasing rules]: references.html +[uninitialized memory]: uninitialized.html +[race]: races.html diff --git a/src/working-with-unsafe.md b/src/working-with-unsafe.md index 5724f3d..09056c5 100644 --- a/src/working-with-unsafe.md +++ b/src/working-with-unsafe.md @@ -16,7 +16,7 @@ fn index(idx: usize, arr: &[u8]) -> Option<u8> { } ``` -Clearly, this function is safe. We check that the index is in bounds, and if it +This function is safe and correct. We check that the index is in bounds, and if it is, index into the array in an unchecked manner. But even in such a trivial function, the scope of the unsafe block is questionable. Consider changing the `<` to a `<=`: @@ -45,13 +45,13 @@ null or containing uninitialized memory. Nothing fundamentally changes. However safety *isn't* modular in the sense that programs are inherently stateful and your unsafe operations may depend on arbitrary other state. -Trickier than that is when we get into actual statefulness. Consider a simple -implementation of `Vec`: +This non-locality gets much worse when we incorporate actual persistent state. +Consider a simple implementation of `Vec`: ```rust use std::ptr; -// Note this definition is insufficient. See the section on implementing Vec. +// Note: This definition is naive. See the chapter on implementing Vec. pub struct Vec<T> { ptr: *mut T, len: usize, @@ -59,8 +59,7 @@ pub struct Vec<T> { } // Note this implementation does not correctly handle zero-sized types. -// We currently live in a nice imaginary world of only positive fixed-size -// types. +// See the chapter on implementing Vec. impl<T> Vec<T> { pub fn push(&mut self, elem: T) { if self.len == self.cap { @@ -72,14 +71,13 @@ impl<T> Vec<T> { self.len += 1; } } - # fn reallocate(&mut self) { } } # fn main() {} ``` -This code is simple enough to reasonably audit and informally verify.
Now consider adding the following method: ```rust,ignore @@ -106,14 +104,12 @@ as Vec. It is therefore possible for us to write a completely safe abstraction that relies on complex invariants. This is *critical* to the relationship between -Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust -*some* Safe code, but can't trust *generic* Safe code. It can't trust an -arbitrary implementor of a trait or any function that was passed to it to be -well-behaved in a way that safe code doesn't care about. - -However if unsafe code couldn't prevent client safe code from messing with its -state in arbitrary ways, safety would be a lost cause. Thankfully, it *can* -prevent arbitrary code from messing with critical state due to privacy. +Safe Rust and Unsafe Rust. + +We have already seen that Unsafe code must trust *some* Safe code, but shouldn't +trust *generic* Safe code. Privacy is important to unsafe code for similar reasons: +it spares us from having to trust all the safe code in the universe not to mess +with our trusted state. Safety lives!