rework unsafe intro to be 1000% more adorable

pull/10/head
Alexis Beingessner 10 years ago committed by Manish Goregaokar
parent cec46bf948
commit b0f30f264e

@@ -13,26 +13,66 @@ Where TRPL introduces the language and teaches the basics, TURPL dives deep into
the specification of the language, and all the nasty bits necessary to write
Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
the basics of the language and systems programming. We will not explain the
stack or heap, we will not explain the basic syntax.

# Meet Safe and Unsafe

Safe and Unsafe are Rust's chief engineers.
TODO: ADORABLE PICTURES OMG
Unsafe handles all the dangerous internal stuff. They build the foundations
and handle all the dangerous materials. By all accounts, Unsafe is really a bit
unproductive, because the nature of their work means that they have to spend a
lot of time checking and double-checking everything. What if there's an earthquake
on a leap year? Are we ready for that? Unsafe better be, because if they get
*anything* wrong, everything will blow up! What Unsafe brings to the table is
*quality*, not quantity. Still, nothing would ever get done if everything was
built to Unsafe's standards!
That's where Safe comes in. Safe has to handle *everything else*. Since Safe needs
to *get work done*, they've grown to be fairly careless and clumsy! Safe doesn't worry
about all the crazy eventualities that Unsafe does, because life is too short to deal
with leap-year-earthquakes. Of course, this means there are some jobs that Safe just
can't handle. Safe is all about quantity over quality.
Unsafe loves Safe to bits, but knows that they *can never trust them to do the
right thing*. Still, Unsafe acknowledges that not every problem needs quite the
attention to detail that they apply. Indeed, Unsafe would *love* it if Safe could do
*everything* for them. To accomplish this, Unsafe spends most of their time
building *safe abstractions*. These abstractions handle all the nitty-gritty
details for Safe, and choose good defaults so that the simplest solution (which
Safe will inevitably use) is usually the *right* one. Once a safe abstraction is
built, Unsafe ideally never needs to work on it again, and Safe can blindly use
it in all their work.
Unsafe's attention to detail means that all the things that they mark as ok for
Safe to use can be combined in arbitrarily ridiculous ways, and all the rules
that Unsafe is forced to uphold will never be violated. If they *can* be violated
by Safe, that means *Unsafe*'s the one in the wrong. Safe can work carelessly,
knowing that if anything blows up, it's not *their* fault. Safe can also call in
Unsafe at any time if there's a hard problem they can't quite work out, or if they
can't meet the client's quality demands. Of course, Unsafe will beg and plead with Safe
to try their latest safe abstraction first!
In addition to being adorable, Safe and Unsafe are what makes Rust possible.

Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust.
Any time someone opines about the guarantees of Rust, they are almost surely talking about
Safe. However Safe is not sufficient to write every program. For that,
we need the Unsafe superset.

Most fundamentally, writing bindings to other languages
(such as the C exposed by your operating system) is never going to be safe. Rust
can't control what other languages do to program execution! However Unsafe is
also necessary to construct fundamental abstractions where the type system is not
sufficient to automatically prove what you're doing is sound.
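
For instance, here is a minimal sketch of the shape this usually takes, binding a
single libc function (`abs`) and wrapping the unsafe call in a safe Rust function
(the wrapper name `absolute` is made up for illustration):

```rust
// Declare the foreign function. Rust has to take our word for this signature;
// nothing about it can be checked, which is why calling it is unsafe.
extern "C" {
    fn abs(input: i32) -> i32;
}

// A safe wrapper. We rule out the one input (`i32::MIN`) whose absolute value
// doesn't fit in an `i32` before handing control over to C.
fn absolute(x: i32) -> i32 {
    assert!(x != i32::MIN);
    unsafe { abs(x) }
}

fn main() {
    println!("{}", absolute(-3)); // prints 3
}
```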

Indeed, the Rust standard library is implemented in Rust, and it makes substantial
use of Unsafe for implementing IO, memory allocation, collections,
synchronization, and other low-level computational primitives.

Upon hearing this, many wonder why they would not simply just use C or C++ in place of

@@ -47,46 +87,40 @@ one does not have to suddenly worry about indexing out of bounds on `y`.
C and C++, by contrast, have pervasive unsafety baked into the language. Even the
modern best practices like `unique_ptr` have various safety pitfalls.

It cannot be emphasized enough that Unsafe should be regarded as an exceptional
thing, not a normal one. Unsafe is often the domain of *fundamental libraries*: anything that needs
to make FFI bindings or define core abstractions. These fundamental libraries then expose
a safe interface for intermediate libraries and applications to build upon. And these
safe interfaces make an important promise: if your application segfaults, it's not your
fault. *They* have a bug.

And really, how is that different from *any* safe language? Python, Ruby, and Java libraries
can internally do all sorts of nasty things. The languages themselves are no
different. Safe languages *regularly* have bugs that cause critical vulnerabilities.
The fact that Rust is written with a healthy spoonful of Unsafe is no different.
However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of
C to do the nasty things that need to get done.

# What do Safe and Unsafe really mean?

Rust cares about preventing the following things:

* Dereferencing null or dangling pointers
* Reading [uninitialized memory][]
* Breaking the [pointer aliasing rules][]
* Producing invalid primitive values:
    * dangling/null references
    * a `bool` that isn't 0 or 1
    * an undefined `enum` discriminant
    * a `char` larger than `char::MAX` (TODO: check if stronger restrictions apply)
    * A non-utf8 `str`
* Unwinding into another language
* Causing a [data race][]
* Invoking Misc. Undefined Behaviour (in e.g. compiler intrinsics)
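
To make that concrete, here is a small sketch (not something you should actually
run!) of code that compiles but commits two of the sins above. It needs an
`unsafe` block to even be expressible, and executing it is Undefined Behaviour:

```rust
use std::mem;

fn main() {
    unsafe {
        // Producing a `bool` that isn't 0 or 1: Undefined Behaviour,
        // even if the value is never inspected afterwards.
        let _bad: bool = mem::transmute(3u8);

        // Dereferencing a dangling pointer: `x` is gone by the time we read.
        let dangling: *const i32;
        {
            let x = 5;
            dangling = &x;
        }
        let _oops = *dangling; // also Undefined Behaviour
    }
}
```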

That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
declare arbitrary requirements if they could transitively cause memory safety

@@ -95,15 +129,17 @@ quite permissive with respect to other dubious operations. Rust considers it
"safe" to:

* Deadlock
* Have a Race Condition
* Leak memory
* Fail to call destructors
* Overflow integers
* Delete the production database

However any program that does such a thing is *probably* incorrect. Rust
provides lots of tools to make doing these things rare, but these problems are
considered impractical to categorically prevent.
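
As a quick illustration, here is a hedged sketch of perfectly Safe code doing two
of the things on that list -- leaking memory and overflowing an integer -- without
a single `unsafe` block (the `Node` type is made up for the example):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A node that can own another node, letting us build a reference cycle.
struct Node {
    next: Option<Rc<RefCell<Node>>>,
}

fn main() {
    // Leak memory: two Rc's point at each other, so neither refcount ever
    // reaches zero and neither destructor ever runs. Entirely Safe.
    let a = Rc::new(RefCell::new(Node { next: None }));
    let b = Rc::new(RefCell::new(Node { next: Some(a.clone()) }));
    a.borrow_mut().next = Some(b.clone());
    println!("cycle built: {}", a.borrow().next.is_some());

    // Overflow an integer: wrapping_add makes the wraparound explicit,
    // but it is still a completely Safe operation.
    let x: u8 = 255u8.wrapping_add(1);
    println!("255 + 1 = {}", x); // prints 0
}
```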

Rust models the separation between Safe and Unsafe with the `unsafe` keyword.
There are several places `unsafe` can appear in Rust today, which can largely be
grouped into two categories:

@@ -112,7 +148,7 @@ you to write `unsafe` elsewhere:
* On functions, `unsafe` is declaring the function to be unsafe to call. Users
  of the function must check the documentation to determine what this means,
  and then have to write `unsafe` somewhere to identify that they're aware of
  the danger.
* On trait declarations, `unsafe` is declaring that *implementing* the trait
  is an unsafe operation, as it has contracts that other unsafe code is free to
  trust blindly.
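
As a rough sketch of what those forms look like in source (the trait and
function names here are made up for illustration):

```rust
// An `unsafe fn`: callers must uphold a contract the compiler can't check.
// Here the (hypothetical) contract is that `idx` is in bounds for `data`.
unsafe fn get_unchecked_at(data: &[u8], idx: usize) -> u8 {
    *data.get_unchecked(idx)
}

// An `unsafe trait`: implementing it is a promise that unsafe code may trust.
unsafe trait TrustedLength {
    fn length(&self) -> usize;
}

// An `unsafe impl`: the implementor writes `unsafe` to accept that burden.
unsafe impl TrustedLength for Vec<u8> {
    fn length(&self) -> usize { self.len() }
}

fn main() {
    let data = vec![1, 2, 3];
    // An `unsafe` block: the caller acknowledges the danger of the unsafe fn.
    let first = unsafe { get_unchecked_at(&data, 0) };
    println!("{} of {} bytes", first, data.length());
}
```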

@@ -126,19 +162,19 @@ unchecked contracts:
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
historical reasons and is in the process of being phased out. See the section on
[destructors][] for details.

Some examples of unsafe functions:

* `slice::get_unchecked` will perform unchecked indexing, allowing memory
  safety to be freely violated.
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
  not "in bounds" as defined by LLVM (see the lifetimes section for details).
* `mem::transmute` reinterprets some value as having the given type,
  bypassing type safety in arbitrary ways. (see [conversions][] for details)
* All FFI functions are `unsafe` because they can do arbitrary things.
  C being an obvious culprit, but generally any language can do something
  that Rust isn't happy about.
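
A minimal sketch of calling one of these, `mem::transmute`, with the caller taking
on the burden of checking that the reinterpretation is actually valid:

```rust
use std::mem;

fn main() {
    let x: f32 = 1.0;

    // Reinterpret the bits of an `f32` as a `u32`. Both types are 4 bytes and
    // every bit pattern is a valid `u32`, so this particular transmute is fine,
    // but the compiler can't verify that reasoning for us: hence `unsafe`.
    let bits: u32 = unsafe { mem::transmute(x) };
    println!("{:#x}", bits); // prints 0x3f800000
}
```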

As of Rust 1.0 there are exactly two unsafe traits:

@@ -147,25 +183,60 @@
* `Sync` is a marker trait that promises that threads can safely share
  implementors through a shared reference.

The need for unsafe traits boils down to the fundamental lack of trust that Unsafe
has for Safe. All safe traits are free to declare arbitrary contracts, but because
implementing them is a job for Safe, Unsafe can't trust those contracts to actually
be upheld.

For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
between types which can "just" be compared, and those that actually implement a
*total* ordering. Pretty much every API that wants to work with data that can be
compared *really* wants Ord data. For instance, a sorted map like BTreeMap
*doesn't even make sense* for partially ordered types. If you claim to implement
Ord for a type, but don't actually provide a proper total ordering, BTreeMap will
get *really confused* and start making a total mess of itself. Data that is
inserted may be impossible to find!

But that's ok. BTreeMap is safe, so it guarantees that even if you give it a
*completely* garbage Ord implementation, it will still do something *safe*. You
won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap
manages to not actually lose any of your data. When the map is dropped, all the
destructors will be successfully called! Hooray!
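
A hedged sketch of what that looks like in practice. `Evil` is a made-up type
whose `Ord` implementation violates the total-ordering contract; the `BTreeMap`
stays perfectly memory-safe but becomes useless:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// A made-up type with a nonsensical ordering: it claims everything is Less.
#[derive(PartialEq, Eq)]
struct Evil(u32);

impl PartialOrd for Evil {
    fn partial_cmp(&self, other: &Evil) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Evil {
    fn cmp(&self, _other: &Evil) -> Ordering {
        Ordering::Less // never Equal, never Greater: not a total order!
    }
}

fn main() {
    let mut map = BTreeMap::new();
    let key = Evil(1);
    println!("inserting key {}", key.0);
    map.insert(key, "one");

    // The lookup may fail even though we just inserted the key, because the
    // search is guided by a garbage ordering. But nothing unsafe happens: no
    // wild pointers, no uninitialized reads, and all destructors still run.
    println!("found: {:?}", map.get(&Evil(1)));
}
```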
However BTreeMap is implemented using a modest spoonful of Unsafe (most collections
are). That means that it is not necessarily *trivially true* that a bad Ord
implementation will make BTreeMap behave safely. Unsafe must be sure not to rely
on Ord *where safety is at stake*, because Ord is provided by Safe, and memory
safety is not Safe's responsibility to uphold. *It must be impossible for Safe
code to violate memory safety*.
But wouldn't it be grand if there were some way for Unsafe to trust *some* trait
contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
to be correct (because Unsafe can trust themself).
Rust has traditionally avoided making traits unsafe because it makes Unsafe
pervasive, which is not desirable. Send and Sync are unsafe because
thread safety is a *fundamental property* that Unsafe cannot possibly hope to
defend against in the same way it would defend against a bad Ord implementation.
The only way to possibly defend against thread-unsafety would be to *not use
threading at all*. Making every operation atomic isn't even sufficient, because
it's possible for complex invariants to exist between disjoint locations in memory.
Even concurrent paradigms that are traditionally regarded as Totally Safe like
message passing implicitly rely on some notion of thread safety -- are you
really message-passing if you send a *pointer*? Send and Sync therefore require
some *fundamental* level of trust that Safe code can't provide, so they must be
unsafe to implement. To help obviate the pervasive unsafety that this would
introduce, Send (resp. Sync) is *automatically* derived for all types composed only
of Send (resp. Sync) values. 99% of types are Send and Sync, and 99% of those
never actually say it (the remaining 1% is overwhelmingly synchronization
primitives).
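
As a rough sketch of what opting in looks like (the `BufferHandle` type is made up
for illustration): a wrapper around a raw pointer is not automatically Send or Sync,
because raw pointers opt out, so its author has to assert thread safety themselves:

```rust
use std::thread;

// A made-up wrapper around a raw pointer into a buffer owned elsewhere.
// Raw pointers are neither Send nor Sync, so this struct isn't either, unless
// its author explicitly promises that moving or sharing it across threads is fine.
struct BufferHandle {
    ptr: *const u8,
    len: usize,
}

// Unsafe impls: we, the authors, take responsibility for these claims.
unsafe impl Send for BufferHandle {}
unsafe impl Sync for BufferHandle {}

fn main() {
    let data = vec![1u8, 2, 3];
    let handle = BufferHandle { ptr: data.as_ptr(), len: data.len() };

    // Only possible because we declared BufferHandle to be Send. The thread
    // never dereferences the pointer, and `data` outlives the join anyway.
    let t = thread::spawn(move || {
        println!("handle at {:p} covers {} bytes", handle.ptr, handle.len);
    });
    t.join().unwrap();
}
```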
# Working with Unsafe

Rust generally only gives us the tools to talk about safety in a scoped and
binary manner. Unfortunately reality is significantly more complicated than that.

@@ -254,5 +325,11 @@ trust the capacity field because there's no way to verify it.
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
module boundary with privacy.
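
A hedged sketch of that idea (the names are made up, and the real `Vec` is far more
involved): the module keeps its fields private, so only code inside the module --
which gets audited together with the unsafe blocks -- can put them into a state the
unsafe code isn't prepared for:

```rust
mod tiny_vec {
    // `data` and `len` are private: outside this module nobody can set `len`
    // larger than the allocation, so the unsafe code below only has to trust
    // the handful of functions in this one module.
    pub struct TinyVec {
        data: Box<[u32; 16]>,
        len: usize,
    }

    impl TinyVec {
        pub fn new() -> TinyVec {
            TinyVec { data: Box::new([0; 16]), len: 0 }
        }

        pub fn push(&mut self, value: u32) {
            assert!(self.len < 16, "capacity exhausted");
            self.data[self.len] = value;
            self.len += 1;
        }

        pub fn get(&self, idx: usize) -> Option<u32> {
            if idx < self.len {
                // Sound because `len` never exceeds the array length, an
                // invariant every function in this module maintains.
                Some(unsafe { *self.data.get_unchecked(idx) })
            } else {
                None
            }
        }
    }
}

fn main() {
    let mut v = tiny_vec::TinyVec::new();
    v.push(42);
    println!("{:?}", v.get(0)); // prints Some(42)
}
```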

[trpl]: https://doc.rust-lang.org/book/
[pointer aliasing rules]: lifetimes.html#references
[uninitialized memory]: uninitialized.html
[data race]: concurrency.html
[destructors]: raii.html
[conversions]: conversions.html
