mirror of https://github.com/rust-lang/nomicon
parent
70ab1bf88a
commit
8fd1169aab
@ -0,0 +1,217 @@
|
|||||||
|
% Exception Safety
|
||||||
|
|
||||||
|
Although programs should use unwinding sparingly, there's *a lot* of code that
|
||||||
|
*can* panic. If you unwrap a None, index out of bounds, or divide by 0, your
|
||||||
|
program *will* panic. On debug builds, *every* arithmetic operation can panic
|
||||||
|
if it overflows. Unless you are very careful and tightly control what code runs,
|
||||||
|
pretty much everything can unwind, and you need to be ready for it.
|
||||||
|
|
||||||
|
Being ready for unwinding is often referred to as *exception safety*
|
||||||
|
in the broader programming world. In Rust, their are two levels of exception
|
||||||
|
safety that one may concern themselves with:
|
||||||
|
|
||||||
|
* In unsafe code, we *must* be exception safe to the point of not violating
|
||||||
|
memory safety. We'll call this *minimal* exception safety.
|
||||||
|
|
||||||
|
* In safe code, it is *good* to be exception safe to the point of your program
|
||||||
|
doing the right thing. We'll call this *maximal* exception safety.
|
||||||
|
|
||||||
|
As is the case in many places in Rust, Unsafe code must be ready to deal with
|
||||||
|
bad Safe code when it comes to unwinding. Code that transiently creates
|
||||||
|
unsound states must be careful that a panic does not cause that state to be
|
||||||
|
used. Generally this means ensuring that only non-panicking code is run while
|
||||||
|
these states exist, or making a guard that cleans up the state in the case of
|
||||||
|
a panic. This does not necessarily mean that the state a panic witnesses is a
|
||||||
|
fully *coherent* state. We need only guarantee that it's a *safe* state.
|
||||||
|
|
||||||
|
Most Unsafe code is leaf-like, and therefore fairly easy to make exception-safe.
|
||||||
|
It controls all the code that runs, and most of that code can't panic. However
|
||||||
|
it is not uncommon for Unsafe code to work with arrays of temporarily
|
||||||
|
uninitialized data while repeatedly invoking caller-provided code. Such code
|
||||||
|
needs to be careful and consider exception safety.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## Vec::push_all
|
||||||
|
|
||||||
|
`Vec::push_all` is a temporary hack to get extending a Vec by a slice reliably
|
||||||
|
effecient without specialization. Here's a simple implementation:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
impl<T: Clone> Vec<T> {
|
||||||
|
fn push_all(&mut self, to_push: &[T]) {
|
||||||
|
self.reserve(to_push.len());
|
||||||
|
unsafe {
|
||||||
|
// can't overflow because we just reserved this
|
||||||
|
self.set_len(self.len() + to_push.len());
|
||||||
|
|
||||||
|
for (i, x) in to_push.iter().enumerate() {
|
||||||
|
self.ptr().offset(i as isize).write(x.clone());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
We bypass `push` in order to avoid redundant capacity and `len` checks on the
|
||||||
|
Vec that we definitely know has capacity. The logic is totally correct, except
|
||||||
|
there's a subtle problem with our code: it's not exception-safe! `set_len`,
|
||||||
|
`offset`, and `write` are all fine, but *clone* is the panic bomb we over-looked.
|
||||||
|
|
||||||
|
Clone is completely out of our control, and is totally free to panic. If it does,
|
||||||
|
our function will exit early with the length of the Vec set too large. If
|
||||||
|
the Vec is looked at or dropped, uninitialized memory will be read!
|
||||||
|
|
||||||
|
The fix in this case is fairly simple. If we want to guarantee that the values
|
||||||
|
we *did* clone are dropped we can set the len *in* the loop. If we just want to
|
||||||
|
guarantee that uninitialized memory can't be observed, we can set the len *after*
|
||||||
|
the loop.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## BinaryHeap::sift_up
|
||||||
|
|
||||||
|
Bubbling an element up a heap is a bit more complicated than extending a Vec.
|
||||||
|
The pseudocode is as follows:
|
||||||
|
|
||||||
|
```text
|
||||||
|
bubble_up(heap, index):
|
||||||
|
while index != 0 && heap[index] < heap[parent(index)]:
|
||||||
|
heap.swap(index, parent(index))
|
||||||
|
index = parent(index)
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
A literal transcription of this code to Rust is totally fine, but has an annoying
|
||||||
|
performance characteristic: the `self` element is swapped over and over again
|
||||||
|
uselessly. We would *rather* have the following:
|
||||||
|
|
||||||
|
```text
|
||||||
|
bubble_up(heap, index):
|
||||||
|
let elem = heap[index]
|
||||||
|
while index != 0 && element < heap[parent(index)]:
|
||||||
|
heap[index] = heap[parent(index)]
|
||||||
|
index = parent(index)
|
||||||
|
heap[index] = elem
|
||||||
|
```
|
||||||
|
|
||||||
|
This code ensures that each element is copied as little as possible (it is in
|
||||||
|
fact necessary that elem be copied twice in general). However it now exposes
|
||||||
|
some exception safety trouble! At all times, there exists two copies of one
|
||||||
|
value. If we panic in this function something will be double-dropped.
|
||||||
|
Unfortunately, we also don't have full control of the code: that comparison is
|
||||||
|
user-defined!
|
||||||
|
|
||||||
|
Unlike Vec, the fix isn't as easy here. One option is to break the user-defined
|
||||||
|
code and the unsafe code into two separate phases:
|
||||||
|
|
||||||
|
```text
|
||||||
|
bubble_up(heap, index):
|
||||||
|
let end_index = index;
|
||||||
|
while end_index != 0 && heap[end_index] < heap[parent(end_index)]:
|
||||||
|
end_index = parent(end_index)
|
||||||
|
|
||||||
|
let elem = heap[index]
|
||||||
|
while index != end_index:
|
||||||
|
heap[index] = heap[parent(index)]
|
||||||
|
index = parent(index)
|
||||||
|
heap[index] = elem
|
||||||
|
```
|
||||||
|
|
||||||
|
If the user-defined code blows up, that's no problem anymore, because we haven't
|
||||||
|
actually touched the state of the heap yet. Once we do start messing with the
|
||||||
|
heap, we're working with only data and functions that we trust, so there's no
|
||||||
|
concern of panics.
|
||||||
|
|
||||||
|
Perhaps you're not happy with this design. Surely, it's cheating! And we have
|
||||||
|
to do the complex heap traversal *twice*! Alright, let's bite the bullet. Let's
|
||||||
|
intermix untrusted and unsafe code *for reals*.
|
||||||
|
|
||||||
|
If Rust had `try` and `finally` like in Java, we could do the following:
|
||||||
|
|
||||||
|
```text
|
||||||
|
bubble_up(heap, index):
|
||||||
|
let elem = heap[index]
|
||||||
|
try:
|
||||||
|
while index != 0 && element < heap[parent(index)]:
|
||||||
|
heap[index] = heap[parent(index)]
|
||||||
|
index = parent(index)
|
||||||
|
finally:
|
||||||
|
heap[index] = elem
|
||||||
|
```
|
||||||
|
|
||||||
|
The basic idea is simple: if the comparison panics, we just toss the loose
|
||||||
|
element in the logically uninitialized index and bail out. Anyone who observes
|
||||||
|
the heap will see a potentially *inconsistent* heap, but at least it won't
|
||||||
|
cause any double-drops! If the algorithm terminates normally, then this
|
||||||
|
operation happens to coincide precisely with the how we finish up regardless.
|
||||||
|
|
||||||
|
Sadly, Rust has no such construct, so we're going to need to roll our own! The
|
||||||
|
way to do this is to store the algorithm's state in a separate struct with a
|
||||||
|
destructor for the "finally" logic. Whether we panic or not, that destructor
|
||||||
|
will run and clean up after us.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
struct Hole<'a, T: 'a> {
|
||||||
|
data: &'a mut [T],
|
||||||
|
/// `elt` is always `Some` from new until drop.
|
||||||
|
elt: Option<T>,
|
||||||
|
pos: usize,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<'a, T> Hole<'a, T> {
|
||||||
|
fn new(data: &'a mut [T], pos: usize) -> Self {
|
||||||
|
unsafe {
|
||||||
|
let elt = ptr::read(&data[pos]);
|
||||||
|
Hole {
|
||||||
|
data: data,
|
||||||
|
elt: Some(elt),
|
||||||
|
pos: pos,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn pos(&self) -> usize { self.pos }
|
||||||
|
|
||||||
|
fn removed(&self) -> &T { self.elt.as_ref().unwrap() }
|
||||||
|
|
||||||
|
unsafe fn get(&self, index: usize) -> &T { &self.data[index] }
|
||||||
|
|
||||||
|
unsafe fn move_to(&mut self, index: usize) {
|
||||||
|
let index_ptr: *const _ = &self.data[index];
|
||||||
|
let hole_ptr = &mut self.data[self.pos];
|
||||||
|
ptr::copy_nonoverlapping(index_ptr, hole_ptr, 1);
|
||||||
|
self.pos = index;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<'a, T> Drop for Hole<'a, T> {
|
||||||
|
fn drop(&mut self) {
|
||||||
|
// fill the hole again
|
||||||
|
unsafe {
|
||||||
|
let pos = self.pos;
|
||||||
|
ptr::write(&mut self.data[pos], self.elt.take().unwrap());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl<T: Ord> BinaryHeap<T> {
|
||||||
|
fn sift_up(&mut self, pos: usize) {
|
||||||
|
unsafe {
|
||||||
|
// Take out the value at `pos` and create a hole.
|
||||||
|
let mut hole = Hole::new(&mut self.data, pos);
|
||||||
|
|
||||||
|
while hole.pos() != 0 {
|
||||||
|
let parent = parent(hole.pos());
|
||||||
|
if hole.removed() <= hole.get(parent) { break }
|
||||||
|
hole.move_to(parent);
|
||||||
|
}
|
||||||
|
// Hole will be unconditionally filled here; panic or not!
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
@ -0,0 +1,35 @@
|
|||||||
|
% Poisoning
|
||||||
|
|
||||||
|
Although all unsafe code *must* ensure it has minimal exception safety, not all
|
||||||
|
types ensure *maximal* exception safety. Even if the type does, your code may
|
||||||
|
ascribe additional meaning to it. For instance, an integer is certainly
|
||||||
|
exception-safe, but has no semantics on its own. It's possible that code that
|
||||||
|
panics could fail to correctly update the integer, producing an inconsistent
|
||||||
|
program state.
|
||||||
|
|
||||||
|
This is *usually* fine, because anything that witnesses an exception is about
|
||||||
|
to get destroyed. For instance, if you send a Vec to another thread and that
|
||||||
|
thread panics, it doesn't matter if the Vec is in a weird state. It will be
|
||||||
|
dropped and go away forever. However some types are especially good at smuggling
|
||||||
|
values across the panic boundary.
|
||||||
|
|
||||||
|
These types may choose to explicitly *poison* themselves if they witness a panic.
|
||||||
|
Poisoning doesn't entail anything in particular. Generally it just means
|
||||||
|
preventing normal usage from proceeding. The most notable example of this is the
|
||||||
|
standard library's Mutex type. A Mutex will poison itself if one of its
|
||||||
|
MutexGuards (the thing it returns when a lock is obtained) is dropped during a
|
||||||
|
panic. Any future attempts to lock the Mutex will return an `Err` or panic.
|
||||||
|
|
||||||
|
Mutex poisons not for *true* safety in the sense that Rust normally cares about. It
|
||||||
|
poisons as a safety-guard against blindly using the data that comes out of a Mutex
|
||||||
|
that has witnessed a panic while locked. The data in such a Mutex was likely in the
|
||||||
|
middle of being modified, and as such may be in an inconsistent or incomplete state.
|
||||||
|
It is important to note that one cannot violate memory safety with such a type
|
||||||
|
if it is correctly written. After all, it must be minimally exception-safe!
|
||||||
|
|
||||||
|
However if the Mutex contained, say, a BinaryHeap that does not actually have the
|
||||||
|
heap property, it's unlikely that any code that uses it will do
|
||||||
|
what the author intended. As such, the program should not proceed normally.
|
||||||
|
Still, if you're double-plus-sure that you can do *something* with the value,
|
||||||
|
the Mutex exposes a method to get the lock anyway. It *is* safe, after all.
|
||||||
|
Just maybe nonsense.
|
Loading…
Reference in new issue