rewrite intro

pull/10/head
Alexis Beingessner, 9 years ago; committed by Manish Goregaokar
parent 5f3cec4a00
commit 165132a1ad

@@ -1,20 +1,39 @@
% The Advanced Rust Programming Language

# NOTE: This is a draft document, and may contain serious errors

So you've played around with Rust a bit. You've written a few simple programs
and you think you grok the basics. Maybe you've even read through *[The Rust
Programming Language][trpl]*. Now you want to get neck-deep in all the
nitty-gritty details of the language. You want to know those weird
corner-cases. You want to know what the heck `unsafe` really means, and how to
properly use it. This is the book for you.

To be clear, this book goes into *serious* detail. We're going to dig into
exception-safety and pointer aliasing. We're going to talk about memory
models. We're even going to do some type-theory. This is stuff that you
absolutely *don't* need to know to write fast and safe Rust programs. You
could probably close this book *right now* and still have a productive and
happy career in Rust.

However if you intend to write unsafe code -- or just *really* want to dig
into the guts of the language -- this book contains *invaluable* information.

Unlike *The Rust Programming Language*, we *will* be assuming considerable
prior knowledge. In particular, you should be comfortable with:

* Basic Systems Programming:
    * Pointers
    * [The stack and heap][]
    * The memory hierarchy (caches)
    * Threads
* [Basic Rust][]

Due to the nature of advanced Rust programming, we will be spending a lot of
time talking about *safety* and *guarantees*. In particular, a significant
portion of the book will be dedicated to correctly writing and understanding
Unsafe Rust.

[trpl]: https://doc.rust-lang.org/book/
[The stack and heap]: https://doc.rust-lang.org/book/the-stack-and-the-heap.html
[Basic Rust]: https://doc.rust-lang.org/book/syntax-and-semantics.html

@@ -1,7 +1,7 @@
# Summary

* [Meet Safe and Unsafe](meet-safe-and-unsafe.md)
    * [How Safe and Unsafe Interact](safe-unsafe-meaning.md)
    * [Working with Unsafe](working-with-unsafe.md)
* [Data Layout](data.md)
    * [repr(Rust)](repr-rust.md)

@@ -1,82 +1,98 @@
% Meet Safe and Unsafe

Programmers in safe "high-level" languages face a fundamental dilemma. On one
hand, it would be *really* great to just say what you want and not worry about
how it's done. On the other hand, that can lead to some *really* poor
performance. It may be necessary to drop down to less clear or idiomatic
practices to get the performance characteristics you want. Or maybe you just
throw up your hands in disgust and decide to shell out to an implementation in
a less sugary-wonderful *unsafe* language.

Worse, when you want to talk directly to the operating system, you *have* to
talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the
lingua franca of the programming world.
Even other safe languages generally expose C interfaces for the world at large!
Regardless of *why* you're doing it, as soon as your program starts talking to
C it stops being safe.

With that said, Rust is *totally* a safe programming language.

Well, Rust *has* a safe programming language. Let's step back a bit.

Rust can be thought of as being composed of two programming languages: *Safe*
and *Unsafe*. Safe is For Reals Totally Safe. Unsafe, unsurprisingly, is *not*
For Reals Totally Safe. In fact, Unsafe lets you do some really crazy unsafe
things.

Safe is *the* Rust programming language. If all you do is write Safe Rust, you
will never have to worry about type-safety or memory-safety. You will never
endure a null or dangling pointer, or any of that Undefined Behaviour
nonsense.

*That's totally awesome*.

The standard library also gives you enough utilities out-of-the-box that you'll
be able to write awesome high-performance applications and libraries in pure
idiomatic Safe Rust.

But maybe you want to talk to another language. Maybe you're writing a
low-level abstraction not exposed by the standard library. Maybe you're
*writing* the standard library (which is written entirely in Rust). Maybe you
need to do something the type-system doesn't understand and just *frob some
dang bits*. Maybe you need Unsafe Rust.

Unsafe Rust is exactly like Safe Rust with *all* the same rules and semantics.
However Unsafe Rust lets you do some *extra* things that are Definitely Not
Safe.

The only things that are different in Unsafe Rust are that you can:

* Dereference raw pointers
* Call `unsafe` functions (including C functions, intrinsics, and the raw allocator)
* Implement `unsafe` traits
* Mutate statics

That's it. The reason these operations are relegated to Unsafe is that
misusing any of these things will cause the ever dreaded Undefined Behaviour.
Invoking Undefined Behaviour gives the compiler full rights to do arbitrarily
bad things to your program. You definitely *should not* invoke Undefined
Behaviour.
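
As a rough sketch of what those four powers look like in practice (the
`dangerous` function, `Scary` trait, and `COUNTER` static here are all
invented for illustration):

```rust
static mut COUNTER: u32 = 0;

unsafe fn dangerous() { /* trusted to uphold some unchecked contract */ }

unsafe trait Scary {}
unsafe impl Scary for i32 {} // 3. implement an `unsafe` trait

fn main() {
    let x = 5u32;
    let raw = &x as *const u32;

    unsafe {
        let y = *raw;    // 1. dereference a raw pointer
        dangerous();     // 2. call an `unsafe` function
        COUNTER += y;    // 4. mutate a static
    }
}
```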

Unlike C, Undefined Behaviour is pretty limited in scope in Rust. All the
core language cares about is preventing the following things:

* Dereferencing null or dangling pointers
* Reading [uninitialized memory][]
* Breaking the [pointer aliasing rules][]
* Producing invalid primitive values:
    * dangling/null references
    * a `bool` that isn't 0 or 1
    * an undefined `enum` discriminant
    * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
    * a non-utf8 `str`
* Unwinding into another language
* Causing a [data race][race]
* Double-dropping a value

That's it. That's all the Undefined Behaviour baked into Rust. Of course,
unsafe functions and traits are free to declare arbitrary other constraints
that a program must maintain to avoid Undefined Behaviour. However these are
generally just things that will transitively lead to one of the above
problems. Some additional constraints may also derive from compiler
intrinsics that make special assumptions about how code can be optimized.
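
For example, here's a sketch of such a declared constraint (`read_at` is a
made-up function, not a standard library API):

```rust
/// Illustrative contract: `idx` must be in bounds, or the call is Undefined
/// Behaviour -- transitively, because it dereferences a dangling pointer.
unsafe fn read_at(v: &[u32], idx: usize) -> u32 {
    *v.as_ptr().offset(idx as isize)
}

fn main() {
    let v = [10, 20, 30];
    // The caller is responsible for discharging the contract:
    let x = unsafe { read_at(&v, 1) };
    assert_eq!(x, 20);
}
```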

Rust is otherwise quite permissive with respect to other dubious operations.
Rust considers it "safe" to:

* Deadlock
* Have a [race condition][race]
* Leak memory
* Fail to call destructors
* Overflow integers
* Abort the program
* Delete the production database

However any program that actually manages to do such a thing is *probably*
incorrect. Rust provides lots of tools to make these things rare, but these
problems are considered impractical to categorically prevent.
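
To make that concrete, here's Safe code doing two of those dubious things (a
small sketch; note that nothing here is `unsafe`):

```rust
use std::mem;

fn main() {
    // Leaking memory and skipping destructors is "safe": this String's
    // destructor never runs and its buffer is never freed. Wasteful, not UB.
    let s = String::from("leaked");
    mem::forget(s);

    // Integer overflow is also "safe": wrapping_add makes the wraparound
    // explicit (a plain `+` would panic in debug builds), but neither is UB.
    let x: u8 = 255;
    assert_eq!(x.wrapping_add(1), 0);
}
```
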
[pointer aliasing rules]: references.html
[uninitialized memory]: uninitialized.html
[race]: races.html

@@ -1,38 +1,17 @@
% How Safe and Unsafe Interact

So what's the relationship between Safe and Unsafe? How do they interact?

Rust models the separation between Safe and Unsafe with the `unsafe` keyword,
which can be thought of as a sort of *foreign function interface* (FFI)
between Safe and Unsafe. This is the magic behind why we can say Safe is a
safe language: all the scary unsafe bits are relegated *exclusively* to FFI,
*just like in every other safe language*.

However because one language is a subset of the other, the two can be cleanly
intermixed as long as the boundary between Safe and Unsafe is denoted with
the `unsafe` keyword. No need to write headers, initialize runtimes, or deal
with any of that other FFI boilerplate.

There are several places `unsafe` can appear in Rust today, which can largely
be grouped into two categories:
@@ -44,7 +23,7 @@ you to write `unsafe` elsewhere:
the danger.

* On trait declarations, `unsafe` is declaring that *implementing* the trait
  is an unsafe operation, as it has contracts that other unsafe code is free
  to trust blindly. (More on this below.)
* I am declaring that I have, to the best of my knowledge, adhered to the
  unchecked contracts:
@@ -55,14 +34,14 @@ unchecked contracts:

There is also `#[unsafe_no_drop_flag]`, which is a special case that exists
for historical reasons and is in the process of being phased out. See the
section on [drop flags][] for details.

Some examples of unsafe functions:

* `slice::get_unchecked` will perform unchecked indexing, allowing memory
  safety to be freely violated.
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is not
  "in bounds" as defined by LLVM.
* `mem::transmute` reinterprets some value as having the given type,
  bypassing type safety in arbitrary ways. (see [conversions][] for details)
* All FFI functions are `unsafe` because they can do arbitrary things.
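
As a quick sketch of what calling two of these looks like (the values here
are arbitrary):

```rust
use std::mem;

fn main() {
    let v = [1u32, 2, 3];
    let idx = 2;

    // get_unchecked: we must establish the bound ourselves, or it's UB.
    let elem = if idx < v.len() {
        unsafe { *v.get_unchecked(idx) }
    } else {
        0
    };
    assert_eq!(elem, 3);

    // transmute: reinterpret the bits of an f32 as a u32.
    let bits: u32 = unsafe { mem::transmute(1.0f32) };
    assert_eq!(bits, 0x3f800000);
}
```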
@@ -72,14 +51,34 @@ Some examples of unsafe functions:

As of Rust 1.0 there are exactly two unsafe traits:

* `Send` is a marker trait (it has no actual API) that promises implementors
  are safe to send (move) to another thread.
* `Sync` is a marker trait that promises that threads can safely share
  implementors through a shared reference.
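
Implementing them looks like this (a sketch with an invented type; the burden
of proof is entirely on the author of the `impl`):

```rust
// A wrapper around a raw pointer that we claim (for illustration) is only
// ever touched by one thread at a time.
struct MyBox(*mut u8);

// By writing these impls, *we* -- not the compiler -- vouch for MyBox's
// thread-safety. If the claim is wrong, that's Undefined Behaviour fuel.
unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}

fn main() {}
```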

The need for unsafe traits boils down to the fundamental property of safe code:

**No matter how completely awful Safe code is, it can't cause Undefined
Behaviour.**

This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to
be *super paranoid* about generic safe code. Unsafe is free to trust
*specific* safe code (or else you would degenerate into infinite spirals of
paranoid despair). It is generally regarded as ok to trust the standard
library to be correct, as std is effectively an extension of the language
(and you *really* just have to trust the language). If std fails to uphold
the guarantees it declares, then it's basically a language bug.

That said, it would be best to minimize *needlessly* relying on properties of
concrete safe code. Bugs happen! Of course, I must reinforce that this is
only a concern for Unsafe code. Safe code can blindly trust anyone and
everyone as far as basic memory-safety is concerned.

On the other hand, safe traits are free to declare arbitrary contracts, but
because implementing them is Safe, Unsafe can't trust those contracts to
actually be upheld. This is different from the concrete case because *anyone*
can randomly implement the interface. There is something fundamentally
different about trusting a *particular* piece of code to be correct, and
trusting *all the code that will ever be written* to be correct.

For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
between types which can "just" be compared, and those that actually implement a between types which can "just" be compared, and those that actually implement a
@@ -99,14 +98,13 @@ destructors will be successfully called! Hooray!

However BTreeMap is implemented using a modest spoonful of Unsafe (most
collections are). That means that it is not necessarily *trivially true* that
a bad Ord implementation will make BTreeMap behave safely. Unsafe must be
sure not to rely on Ord *where safety is at stake*. Ord is provided by Safe,
and safety is not Safe's responsibility to uphold.
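
Here's a sketch of the kind of lawless (but perfectly Safe) `Ord` that Unsafe
has to assume exists:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(PartialEq, Eq)]
struct Evil;

impl PartialOrd for Evil {
    fn partial_cmp(&self, other: &Evil) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Evil {
    // Claims Evil < Evil, always: a flagrant violation of Ord's contract,
    // written in 100% Safe code.
    fn cmp(&self, _: &Evil) -> Ordering {
        Ordering::Less
    }
}

fn main() {
    // A map keyed on Evil may lose entries or fail lookups, but it must
    // never cause Undefined Behaviour.
    let mut map = BTreeMap::new();
    map.insert(Evil, 1);
    map.insert(Evil, 2);
}
```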

But wouldn't it be grand if there was some way for Unsafe to trust *some*
trait contracts *somewhere*? This is the problem that unsafe traits tackle:
by marking *the trait itself* as unsafe *to implement*, Unsafe can trust the
implementation to be correct.

Rust has traditionally avoided making traits unsafe because it makes Unsafe
pervasive, which is not desirable. Send and Sync are unsafe because
@@ -114,11 +112,12 @@ thread safety is a *fundamental property* that Unsafe cannot possibly hope to
defend against in the same way it would defend against a bad Ord
implementation. The only way to possibly defend against thread-unsafety would
be to *not use threading at all*. Making every operation atomic isn't even
sufficient, because it's possible for complex invariants to exist between
disjoint locations in memory. For instance, the pointer and capacity of a Vec
must be in sync.

Even concurrent paradigms that are traditionally regarded as Totally Safe
like message passing implicitly rely on some notion of thread safety -- are
you really message-passing if you pass a *pointer*? Send and Sync therefore
require some *fundamental* level of trust that Safe code can't provide, so
they must be unsafe to implement. To help obviate the pervasive unsafety that
this would introduce, Send (resp. Sync) is *automatically* derived for all
types composed only
@@ -128,8 +127,6 @@ primitives).

[drop flags]: drop-flags.html
[conversions]: conversions.html

@@ -1,11 +1,11 @@
% Working with Unsafe

Rust generally only gives us the tools to talk about Unsafe in a scoped and
binary manner. Unfortunately, reality is significantly more complicated than
that. For instance, consider the following toy function:

```rust
pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx < arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
```
@@ -22,7 +22,7 @@ function, the scope of the unsafe block is questionable. Consider changing the
`<` to a `<=`:

```rust
pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
    if idx <= arr.len() {
        unsafe {
            Some(*arr.get_unchecked(idx))
```
@@ -45,7 +45,7 @@ implementation of `Vec`:

```rust
// Note this definition is insufficient. See the section on lifetimes.
pub struct Vec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
```
@@ -55,7 +55,7 @@ struct Vec<T> {

```rust
// We currently live in a nice imaginary world of only positive fixed-size
// types.
impl<T> Vec<T> {
    pub fn push(&mut self, elem: T) {
        if self.len == self.cap {
            // not important for this example
            self.reallocate();
```
@@ -80,9 +80,25 @@ adding the following method:
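
(The method itself falls outside this hunk; based on the surrounding
discussion it is presumably a capacity-fiddling helper along these lines -- a
reconstruction, not the verbatim source:)

```rust
impl<T> Vec<T> {
    fn make_room(&mut self) {
        // grow the capacity (but not the allocation!)
        self.cap += 1;
    }
}
```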

This code is safe, but it is also completely unsound. Changing the capacity
violates the invariants of Vec (that `cap` reflects the allocated space in
the Vec). This is not something the rest of Vec can guard against. It *has*
to trust the capacity field because there's no way to verify it.

`unsafe` does more than pollute a whole function: it pollutes a whole
*module*. Generally, the only bullet-proof way to limit the scope of unsafe
code is at the module boundary with privacy.

However this works *perfectly*. The existence of `make_room` is *not* a
problem for the soundness of Vec because we didn't mark it as public. Only
the module that defines this function can call it. Also, `make_room` directly
accesses the private fields of Vec, so it can only be written in the same
module as Vec.

It is therefore possible for us to write a completely safe abstraction that
relies on complex invariants. This is *critical* to the relationship between
Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust
*some* Safe code, but can't trust *arbitrary* Safe code. However if Unsafe
couldn't prevent client Safe code from messing with its state in arbitrary
ways, safety would be a lost cause.

Safety lives!
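
As a final sketch of that privacy boundary at work (module and type names
invented here):

```rust
mod mini_vec {
    use std::ptr;

    pub struct MiniVec {
        ptr: *mut u8,
        len: usize,
        cap: usize, // private: code outside this module can't desync it
    }

    impl MiniVec {
        pub fn new() -> MiniVec {
            MiniVec { ptr: ptr::null_mut(), len: 0, cap: 0 }
        }
    }
}

fn main() {
    let v = mini_vec::MiniVec::new();
    // v.cap += 1; // ERROR: field `cap` of struct `MiniVec` is private
    let _ = v;
}
```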
