Merge pull request #28 from Gankro/cleanup

Cleanup the first chapter
Steve Klabnik 8 years ago committed by GitHub
commit a4322ccb28

@@ -2,7 +2,7 @@
#### The Dark Arts of Advanced and Unsafe Rust Programming
# NOTE: This is a draft document that discusses several unstable aspects of Rust, and may contain serious errors or outdated information.
> Instead of the programs I had hoped for, there came only a shuddering blackness
and ineffable loneliness; and I saw at last a fearful truth which no one had
@@ -19,20 +19,23 @@ infinitesimal fragments of despair.
Should you wish a long and happy career of writing Rust programs, you should
turn back now and forget you ever saw this book. It is not necessary. However
if you intend to write unsafe code — or just want to dig into the guts of the
language — this book contains lots of useful information.
Unlike *[The Rust Programming Language][trpl]*, we will be assuming considerable
prior knowledge. In particular, you should be comfortable with basic systems
programming and Rust. If you don't feel comfortable with these topics, you
should consider [reading The Book][trpl] first. That said, we won't assume you
have read it, and we will take care to occasionally give a refresher on the
basics where appropriate. You can skip straight to this book if you want;
just know that we won't be explaining everything from the ground up.
We're going to dig into exception-safety, pointer aliasing, memory models,
compiler and hardware implementation details, and even some type-theory.
Much text will be devoted to exotic corner cases that no one *should* ever have
to care about, but that suddenly become important because we wrote `unsafe`.
We will also be spending a lot of time talking about the different kinds of
safety and guarantees that programs could care about.
[trpl]: ../book/index.html

@@ -4,6 +4,7 @@
* [Meet Safe and Unsafe](meet-safe-and-unsafe.md)
* [How Safe and Unsafe Interact](safe-unsafe-meaning.md)
* [What Unsafe Can Do](what-unsafe-does.md)
* [Working with Unsafe](working-with-unsafe.md)
* [Data Layout](data.md)
* [repr(Rust)](repr-rust.md)

@@ -1,41 +1,51 @@
# Meet Safe and Unsafe
![safe and unsafe](img/safeandunsafe.svg)
It would be great to not have to worry about low-level implementation details.
Who could possibly care how much space the empty tuple occupies? Sadly, it
sometimes matters and we need to worry about it. The most common reason
developers start to care about implementation details is performance, but more
importantly, these details can become a matter of correctness when interfacing
directly with hardware, operating systems, or other languages.
When implementation details start to matter in a safe programming language,
programmers usually have three options:
* fiddle with the code to encourage the compiler/runtime to perform an optimization
* adopt a more unidiomatic or cumbersome design to get the desired implementation
* rewrite the implementation in a language that lets you deal with those details
For that last option, the language programmers tend to use is *C*. This is often
necessary to interface with systems that only declare a C interface.
Unfortunately, C is incredibly unsafe to use (sometimes for good reason),
and this unsafety is magnified when trying to interoperate with another
language. Care must be taken to ensure C and the other language agree on
what's happening, and that they don't step on each other's toes.
So what does this have to do with Rust?
Well, unlike C, Rust is a safe programming language.
But, like C, Rust is an unsafe programming language.
More accurately, Rust *contains* both a safe and unsafe programming language.
Rust can be thought of as a combination of two programming languages: *Safe
Rust* and *Unsafe Rust*. Conveniently, these names mean exactly what they say:
Safe Rust is Safe. Unsafe Rust is, well, not. In fact, Unsafe Rust lets us
do some *really* unsafe things. Things the Rust authors will implore you not to
do, but we'll do anyway.
Safe Rust is the *true* Rust programming language. If all you do is write Safe
Rust, you will never have to worry about type-safety or memory-safety. You will
never endure a dangling pointer, a use-after-free, or any other kind of
Undefined Behavior.
The standard library also gives you enough utilities out of the box that you'll
be able to write high-performance applications and libraries in pure idiomatic
Safe Rust.
But maybe you want to talk to another language. Maybe you're writing a
low-level abstraction not exposed by the standard library. Maybe you're
@@ -44,57 +54,15 @@ need to do something the type-system doesn't understand and just *frob some dang
bits*. Maybe you need Unsafe Rust.
Unsafe Rust is exactly like Safe Rust with all the same rules and semantics.
It just lets you do some *extra* things that are Definitely Not Safe
(which we will define in the next section).
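Here is a minimal sketch of that relationship. It uses one of those extra things (dereferencing a raw pointer, defined in the next section), and the names and values are our own:

```rust
fn main() {
    let x = 42;

    // Safe Rust: references are always valid, so no check is needed.
    let r = &x;
    println!("{}", *r);

    // Unsafe Rust: all the same rules still apply, but we may also
    // dereference a raw pointer, and *we* must guarantee it's valid.
    let p = &x as *const i32;
    unsafe {
        println!("{}", *p);
    }
}
```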
The value of this separation is that we gain the benefits of using an unsafe
language like C — low-level control over implementation details — without most
of the problems that come with trying to integrate it with a completely
different safe language.
There are still some problems — most notably, we must become aware of properties
that the type system assumes and audit them in any code that interacts with
Unsafe Rust. That's the purpose of this book: to teach you about these assumptions
and how to manage them.

@@ -6,25 +6,29 @@ interact?
The separation between Safe Rust and Unsafe Rust is controlled with the
`unsafe` keyword, which acts as an interface from one to the other. This is
why we can say Safe Rust is a safe language: all the unsafe parts are kept
exclusively behind the `unsafe` boundary. If you wish, you can even toss
`#![forbid(unsafe_code)]` into your code base to statically guarantee that
you're only writing Safe Rust.
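As a quick sketch, a crate that opts into that guarantee looks like this; with the attribute in place, the `unsafe` block becomes a hard compile error:

```rust,ignore
// At the crate root:
#![forbid(unsafe_code)]

fn main() {
    // This no longer compiles:
    // error: usage of an `unsafe` block
    unsafe {
        let p = 0x2a as *const i32;
        let _y = *p;
    }
}
```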
The `unsafe` keyword has two uses: to declare the existence of contracts the
compiler can't check, and to declare that a programmer has checked that these
contracts have been upheld.
You can use `unsafe` to indicate the existence of unchecked contracts on
_functions_ and _trait declarations_. On functions, `unsafe` means that
users of the function must check that function's documentation to ensure
they are using it in a way that maintains the contracts the function
requires. On trait declarations, `unsafe` means that implementors of the
trait must check the trait documentation to ensure their implementation
maintains the contracts the trait requires.
You can use `unsafe` on a block to declare that all unsafe actions performed
within are verified to uphold the contracts of those operations. For instance,
that the index passed to `slice::get_unchecked` is in-bounds.
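A sketch of the reasoning such a block encodes (the array and index here are our own):

```rust
fn main() {
    let arr = [1, 2, 3];
    let idx = 2;

    if idx < arr.len() {
        // The contract of `get_unchecked` is that the index is in
        // bounds; the check above is what justifies this block.
        let elem = unsafe { *arr.get_unchecked(idx) };
        println!("{}", elem);
    }
}
```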
You can use `unsafe` on a trait implementation to declare that the implementation
upholds the trait's contract. For instance, that a type implementing `Send` is
really safe to move to another thread.
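A sketch of what that looks like, using a hypothetical pointer-owning wrapper (`MySendablePtr` is our own invention, not a standard type):

```rust
use std::ptr::NonNull;

// Hypothetical: a wrapper that, by construction, uniquely owns the
// allocation its pointer refers to.
struct MySendablePtr(NonNull<u8>);

// By writing `unsafe impl`, we -- not the compiler -- are vouching
// that moving this type to another thread really is sound.
unsafe impl Send for MySendablePtr {}

fn main() {}
```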
The standard library has a number of unsafe functions, including:
@@ -32,11 +36,10 @@ The standard library has a number of unsafe functions, including:
memory safety to be freely violated.
* `mem::transmute` reinterprets some value as having a given type, bypassing
type safety in arbitrary ways (see [conversions] for details).
* Every raw pointer to a sized type has an `offset` method that
invokes Undefined Behavior if the passed offset is not ["in bounds"][ptr_offset].
* All FFI functions are `unsafe` to call because the other language can do
arbitrary operations that the Rust compiler can't check.
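Here is a small sketch exercising two of those operations under contracts we can actually uphold (the specific values are our own):

```rust
use std::mem;

fn main() {
    // `transmute`: reinterpret the bits of an f32 as a u32. The types
    // have the same size, so this particular use is well-defined.
    let bits: u32 = unsafe { mem::transmute(1.0f32) };
    println!("{:08x}", bits); // prints 3f800000

    // `offset`: the result must stay in bounds of the same allocation;
    // offsetting to the last element of this array satisfies that.
    let arr = [10, 20, 30];
    let p = arr.as_ptr();
    unsafe {
        println!("{}", *p.offset(2)); // prints 30
    }
}
```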
As of Rust 1.0 there are exactly two unsafe traits:
@@ -45,41 +48,60 @@ As of Rust 1.0 there are exactly two unsafe traits:
* `Sync` is a marker trait that promises threads can safely share implementors
through a shared reference.
Much of the Rust standard library also uses Unsafe Rust internally. These
implementations have generally been rigorously manually checked, so the Safe Rust
interfaces built on top of these implementations can be assumed to be safe.
The need for all of this separation boils down to a single fundamental property
of Safe Rust:
**No matter what, Safe Rust can't cause Undefined Behavior.**
The design of the safe/unsafe split means that there is an asymmetric trust
relationship between Safe and Unsafe Rust. Safe Rust inherently has to
trust that any Unsafe Rust it touches has been written correctly.
On the other hand, Unsafe Rust has to be very careful about trusting Safe Rust.
As an example, Rust has the `PartialOrd` and `Ord` traits to differentiate
between types which can "just" be compared, and those that provide a total
ordering (where every value of the type is either equal to, greater than,
or less than any other value of the same type). The sorted map type
`BTreeMap` doesn't make sense for partially-ordered types, and so it
requires that any key type for it implements the `Ord` trait. However,
`BTreeMap` has Unsafe Rust code inside of its implementation, and this
Unsafe Rust code cannot assume that any `Ord` implementation it gets makes
sense. The unsafe portions of `BTreeMap`'s internals have to be careful to
maintain all necessary contracts, even if a key type's `Ord` implementation
does not implement a total ordering.
Unsafe Rust cannot automatically trust Safe Rust. When writing Unsafe Rust,
you must be careful to only rely on specific Safe Rust code, and not make
assumptions about potential future Safe Rust code providing the same
guarantees.
This is the problem that `unsafe` traits exist to resolve. The `BTreeMap`
type could theoretically require that keys implement a new trait called
`UnsafeOrd`, rather than `Ord`, that might look like this:
between types which can "just" be compared, and those that provide a "total"
ordering (which basically means that comparison behaves reasonably).
`BTreeMap` doesn't really make sense for partially-ordered types, and so it
requires that its keys implement `Ord`. However, `BTreeMap` has Unsafe Rust code
inside of its implementation. Because it would be unacceptable for a sloppy `Ord`
implementation (which is Safe to write) to cause Undefined Behavior, the Unsafe
code in `BTreeMap` must be written to be robust against `Ord` implementations which
aren't actually total — even though that's the whole point of requiring `Ord`.
The Unsafe Rust code just can't trust the Safe Rust code to be written correctly.
That said, `BTreeMap` will still behave completely erratically if you feed in
values that don't have a total ordering. It just won't ever cause Undefined
Behavior.
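To make that concrete, here is a sketch of a perfectly Safe (and perfectly sloppy) `Ord` implementation; `Evil` is our own strawman type:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(PartialEq, Eq)]
struct Evil(u32);

impl PartialOrd for Evil {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Evil {
    // Not remotely a total ordering: everything is "less" than everything.
    fn cmp(&self, _other: &Self) -> Ordering {
        Ordering::Less
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Evil(1), "a");
    map.insert(Evil(1), "b");
    // The map's contents are now erratic (duplicate keys, failed lookups),
    // but its unsafe internals guarantee this is never Undefined Behavior.
    println!("{}", map.len());
}
```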
One may wonder, if `BTreeMap` cannot trust `Ord` because it's Safe, why can it
trust *any* Safe code? For instance, `BTreeMap` relies on integers and slices to
be implemented correctly. Those are safe too, right?
The difference is one of scope. When `BTreeMap` relies on integers and slices,
it's relying on one very specific implementation. This is a measured risk that
can be weighed against the benefit. In this case there's basically zero risk;
if integers and slices are broken, *everyone* is broken. Also, they're maintained
by the same people who maintain `BTreeMap`, so it's easy to keep tabs on them.
On the other hand, `BTreeMap`'s key type is generic. Trusting its `Ord` implementation
means trusting every `Ord` implementation in the past, present, and future.
Here the risk is high: someone somewhere is going to make a mistake and mess up
their `Ord` implementation, or even just straight up lie about providing a total
ordering because "it seems to work". When that happens, `BTreeMap` needs to be
prepared.
The same logic applies to trusting a closure that's passed to you to behave
correctly.
This problem of unbounded generic trust is the problem that `unsafe` traits
exist to resolve. The `BTreeMap` type could theoretically require that keys
implement a new trait called `UnsafeOrd`, rather than `Ord`, that might look
like this:
```rust
use std::cmp::Ordering;

unsafe trait UnsafeOrd {
    fn cmp(&self, other: &Self) -> Ordering;
}
```

@@ -92,32 +114,32 @@
Then, a type would use `unsafe` to implement `UnsafeOrd`, indicating that
they've ensured their implementation maintains whatever contracts the
trait expects. In this situation, the Unsafe Rust in the internals of
`BTreeMap` would be justified in trusting that the key type's `UnsafeOrd`
implementation is correct. If it isn't, it's the fault of the unsafe trait
implementation, which is consistent with Rust's safety guarantees.
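Continuing the hypothetical, an implementation would look something like this (recall that `UnsafeOrd` is an illustrative trait, not part of the standard library):

```rust
use std::cmp::Ordering;

unsafe trait UnsafeOrd {
    fn cmp(&self, other: &Self) -> Ordering;
}

struct MyType(u32);

// The `unsafe impl` is our promise that this really is a total ordering.
// Deferring to u32's own Ord makes that promise easy to keep.
unsafe impl UnsafeOrd for MyType {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.cmp(&other.0)
    }
}

fn main() {}
```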
The decision of whether to mark a trait `unsafe` is an API design choice.
Rust has traditionally avoided doing this because it makes Unsafe
Rust pervasive, which isn't desirable. `Send` and `Sync` are marked unsafe
because thread safety is a *fundamental property* that unsafe code can't
possibly hope to defend against in the way it could defend against a bad
`Ord` implementation. The decision of whether to mark your own traits `unsafe`
depends on the same sort of consideration. If `unsafe` code can't reasonably
expect to defend against a bad implementation of the trait, then marking the
trait `unsafe` is a reasonable choice.
As an aside, while `Send` and `Sync` are `unsafe` traits, they are *also*
automatically implemented for types when such derivations are provably safe
to do. `Send` is automatically derived for all types composed only of values
whose types also implement `Send`. `Sync` is automatically derived for all
types composed only of values whose types also implement `Sync`. This minimizes
the pervasive unsafety of making these two traits `unsafe`.
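A sketch of that automatic derivation at work; `assert_send` is just a helper we define so the compiler checks the property for us:

```rust
use std::rc::Rc;

// Every field is Send and Sync, so this type automatically is too.
struct Plain {
    x: u32,
    name: String,
}

// Rc is neither Send nor Sync, so this type automatically isn't either.
struct NotThreadSafe {
    shared: Rc<u32>,
}

fn assert_send<T: Send>() {}

fn main() {
    assert_send::<Plain>();
    // assert_send::<NotThreadSafe>(); // error: Rc<u32> cannot be sent
}
```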
This is the balance between Safe and Unsafe Rust. The separation is designed to
make using Safe Rust as ergonomic as possible, but requires extra effort and
care when writing Unsafe Rust. The rest of this book is largely a discussion
of the sort of care that must be taken, and what contracts Unsafe Rust must uphold.
[drop flags]: drop-flags.html
[conversions]: conversions.html
[ptr_offset]: ../std/primitive.pointer.html#method.offset

@@ -0,0 +1,58 @@
# What Unsafe Rust Can Do
The only things that are different in Unsafe Rust are that you can:
* Dereference raw pointers
* Call `unsafe` functions (including C functions, compiler intrinsics, and the raw allocator)
* Implement `unsafe` traits
* Mutate statics
That's it. The reason these operations are relegated to Unsafe is that misusing
any of these things will cause the ever dreaded Undefined Behavior. Invoking
Undefined Behavior gives the compiler full rights to do arbitrarily bad things
to your program. You definitely *should not* invoke Undefined Behavior.
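All four abilities fit in a few lines. A sketch (every name here is our own, and each contract is trivially upheld):

```rust
static mut COUNTER: u32 = 0;

unsafe trait Trusted {}        // declaring an unsafe trait...
unsafe impl Trusted for u32 {} // ...and implementing it

unsafe fn double(p: *const u32) -> u32 {
    *p * 2 // unsafe fns may dereference raw pointers directly
}

fn main() {
    let x: u32 = 21;
    unsafe {
        COUNTER += 1;                        // mutate a static
        println!("{}", double(&x));          // call an unsafe function
        println!("{}", *(&x as *const u32)); // dereference a raw pointer
    }
}
```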
Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core
language cares about is preventing the following things:
* Dereferencing null, dangling, or unaligned pointers
* Reading [uninitialized memory][]
* Breaking the [pointer aliasing rules][]
* Producing invalid primitive values:
    * dangling/null references
    * a `bool` that isn't 0 or 1
    * an undefined `enum` discriminant
    * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
    * a non-utf8 `str`
* Unwinding into another language
* Causing a [data race][race]
That's it. That's all the causes of Undefined Behavior baked into Rust. Of
course, unsafe functions and traits are free to declare arbitrary other
constraints that a program must maintain to avoid Undefined Behavior. For
instance, the allocator APIs declare that deallocating unallocated memory is
Undefined Behavior.
However, violations of these constraints will generally just transitively lead to one of
the above problems. Some additional constraints may also derive from compiler
intrinsics that make special assumptions about how code can be optimized. For instance,
`Vec` and `Box` make use of intrinsics that require their pointers to be non-null at all times.
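As a negative example (do not actually run this), producing an invalid primitive value is enough to trigger Undefined Behavior, even if the value is never inspected:

```rust,ignore
use std::mem;

fn main() {
    // UNDEFINED BEHAVIOR: a bool must be exactly 0 or 1. Merely
    // producing this value breaks the compiler's assumptions.
    let b: bool = unsafe { mem::transmute(3u8) };
    let _ = b;
}
```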
Rust is otherwise quite permissive with respect to other dubious operations.
Rust considers it "safe" to:
* Deadlock
* Have a [race condition][race]
* Leak memory
* Fail to call destructors
* Overflow integers
* Abort the program
* Delete the production database
However any program that actually manages to do such a thing is *probably*
incorrect. Rust provides lots of tools to make these things rare, but
these problems are considered impractical to categorically prevent.
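For instance, here is a perfectly Safe program that leaks and overflows, sketched with the standard tools for doing both on purpose:

```rust
use std::mem;

fn main() {
    // Leaking is Safe: `forget` skips the destructor, so this allocation
    // is never freed. Wasteful, but not Undefined Behavior.
    let v = vec![1, 2, 3];
    mem::forget(v);

    // Overflow is Safe: the wrapping API makes it explicit, and even a
    // plain `+` would panic in debug builds rather than corrupt memory.
    let x: u8 = 255;
    println!("{}", x.wrapping_add(1)); // prints 0
}
```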
[pointer aliasing rules]: references.html
[uninitialized memory]: uninitialized.html
[race]: races.html

@@ -16,7 +16,7 @@

```rust
fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
        }
    } else {
        None
    }
}
```
This function is safe and correct. We check that the index is in bounds, and if it
is, index into the array in an unchecked manner. But even in such a trivial
function, the scope of the unsafe block is questionable. Consider changing the
`<` to a `<=`:
@@ -45,13 +45,13 @@ null or containing uninitialized memory. Nothing fundamentally changes. However
safety *isn't* modular in the sense that programs are inherently stateful and
your unsafe operations may depend on arbitrary other state.
This non-locality gets much worse when we incorporate actual persistent state.
Consider a simple implementation of `Vec`:
```rust
use std::ptr;

// Note: This definition is naive. See the chapter on implementing Vec.
pub struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

// Note this implementation does not correctly handle zero-sized types.
// See the chapter on implementing Vec.
impl<T> Vec<T> {
    pub fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example
            self.reallocate();
        }
        unsafe {
            ptr::write(self.ptr.offset(self.len as isize), elem);
            self.len += 1;
        }
    }
# fn reallocate(&mut self) { }
}
# fn main() {}
```
This code is simple enough to reasonably audit and informally verify. Now consider
adding the following method:
```rust,ignore
fn make_room(&mut self) {
    // grow the capacity
    self.cap += 1;
}
```
@@ -106,14 +104,12 @@ as Vec.
It is therefore possible for us to write a completely safe abstraction that
relies on complex invariants. This is *critical* to the relationship between
Safe Rust and Unsafe Rust.
We have already seen that Unsafe code must trust *some* Safe code, but shouldn't
trust *generic* Safe code. Privacy is important to unsafe code for similar reasons:
it means we don't have to trust all the safe code in the universe not to mess
with our trusted state.
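A sketch of that module-privacy trick, applied to a `Vec`-like type of our own:

```rust
mod my_vec {
    // Because these fields are private, only code in this module can
    // write to them. Upholding "len <= cap, and ptr points to cap
    // allocated elements" is therefore a local audit, not a global one.
    pub struct MyVec<T> {
        ptr: *mut T,
        len: usize,
        cap: usize,
    }

    impl<T> MyVec<T> {
        pub fn len(&self) -> usize {
            self.len // exposing a read is harmless
        }
    }
}

fn main() {
    // Client code cannot do `v.cap += 1` out here: the fields are
    // private, so that kind of cap-fiddling simply doesn't compile.
}
```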
Safety lives!
