|
|
|
# Unchecked Uninitialized Memory
|
|
|
|
|
|
|
|
One interesting exception to this rule is working with arrays. Safe Rust doesn't
|
|
|
|
permit you to partially initialize an array. When you initialize an array, you
|
|
|
|
can either set every value to the same thing with `let x = [val; N]`, or you can
|
|
|
|
specify each member individually with `let x = [val1, val2, val3]`.
|
|
|
|
Unfortunately this is pretty rigid, especially if you need to initialize your
|
|
|
|
array in a more incremental or dynamic way.
|
|
|
|
|
|
|
|
Unsafe Rust gives us a powerful tool to handle this problem:
|
|
|
|
[`MaybeUninit`]. This type can be used to handle memory that has not been fully
|
|
|
|
initialized yet.
|
|
|
|
|
|
|
|
With `MaybeUninit`, we can initialize an array element-for-element as follows:
|
|
|
|
|
|
|
|
```rust
|
|
|
|
use std::mem::{self, MaybeUninit};
|
|
|
|
|
|
|
|
// Size of the array is hard-coded but easy to change (meaning, changing just
|
|
|
|
// the constant is sufficient). This means we can't use [a, b, c] syntax to
|
|
|
|
// initialize the array, though, as we would have to keep that in sync
|
|
|
|
// with `SIZE`!
|
|
|
|
const SIZE: usize = 10;
|
|
|
|
|
|
|
|
let x = {
|
|
|
|
// Create an uninitialized array of `MaybeUninit`. The `assume_init` is
|
|
|
|
// safe because the type we are claiming to have initialized here is a
|
|
|
|
// bunch of `MaybeUninit`s, which do not require initialization.
|
|
|
|
let mut x: [MaybeUninit<Box<u32>>; SIZE] = unsafe {
|
|
|
|
MaybeUninit::uninit().assume_init()
|
|
|
|
};
|
|
|
|
|
|
|
|
// Dropping a `MaybeUninit` does nothing. Thus using raw pointer
|
|
|
|
// assignment instead of `ptr::write` does not cause the old
|
|
|
|
// uninitialized value to be dropped.
|
|
|
|
// Exception safety is not a concern because Box can't panic
|
|
|
|
for i in 0..SIZE {
|
|
|
|
x[i] = MaybeUninit::new(Box::new(i as u32));
|
|
|
|
}
|
|
|
|
|
|
|
|
// Everything is initialized. Transmute the array to the
|
|
|
|
// initialized type.
|
|
|
|
unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) }
|
|
|
|
};
|
|
|
|
|
|
|
|
dbg!(x);
|
|
|
|
```
|
|
|
|
|
|
|
|
This code proceeds in three steps:
|
|
|
|
|
|
|
|
1. Create an array of `MaybeUninit<T>`. With current stable Rust, we have to use
|
|
|
|
unsafe code for this: we take some uninitialized piece of memory
|
|
|
|
(`MaybeUninit::uninit()`) and claim we have fully initialized it
|
|
|
|
([`assume_init()`][assume_init]). This seems ridiculous, because we didn't!
|
|
|
|
The reason this is correct is that the array consists itself entirely of
|
|
|
|
`MaybeUninit`, which do not actually require initialization. For most other
|
|
|
|
types, doing `MaybeUninit::uninit().assume_init()` produces an invalid
|
|
|
|
instance of said type, so you got yourself some Undefined Behavior.
|
|
|
|
|
|
|
|
2. Initialize the array. The subtle aspect of this is that usually, when we use
|
|
|
|
`=` to assign to a value that the Rust type checker considers to already be
|
|
|
|
initialized (like `x[i]`), the old value stored on the left-hand side gets
|
|
|
|
dropped. This would be a disaster. However, in this case, the type of the
|
|
|
|
left-hand side is `MaybeUninit<Box<u32>>`, and dropping that does not do
|
|
|
|
anything! See below for some more discussion of this `drop` issue.
|
|
|
|
|
|
|
|
3. Finally, we have to change the type of our array to remove the
|
|
|
|
`MaybeUninit`. With current stable Rust, this requires a `transmute`.
|
|
|
|
This transmute is legal because in memory, `MaybeUninit<T>` looks the same as `T`.
|
|
|
|
|
|
|
|
However, note that in general, `Container<MaybeUninit<T>>>` does *not* look
|
|
|
|
the same as `Container<T>`! Imagine if `Container` was `Option`, and `T` was
|
|
|
|
`bool`, then `Option<bool>` exploits that `bool` only has two valid values,
|
|
|
|
but `Option<MaybeUninit<bool>>` cannot do that because the `bool` does not
|
|
|
|
have to be initialized.
|
|
|
|
|
|
|
|
So, it depends on `Container` whether transmuting away the `MaybeUninit` is
|
|
|
|
allowed. For arrays, it is (and eventually the standard library will
|
|
|
|
acknowledge that by providing appropriate methods).
|
|
|
|
|
|
|
|
It's worth spending a bit more time on the loop in the middle, and in particular
|
|
|
|
the assignment operator and its interaction with `drop`. If we would have
|
|
|
|
written something like
|
|
|
|
```rust,ignore
|
|
|
|
*x[i].as_mut_ptr() = Box::new(i as u32); // WRONG!
|
|
|
|
```
|
|
|
|
we would actually overwrite a `Box<u32>`, leading to `drop` of uninitialized
|
|
|
|
data, which will cause much sadness and pain.
|
|
|
|
|
|
|
|
The correct alternative, if for some reason we cannot use `MaybeUninit::new`, is
|
|
|
|
to use the [`ptr`] module. In particular, it provides three functions that allow
|
|
|
|
us to assign bytes to a location in memory without dropping the old value:
|
|
|
|
[`write`], [`copy`], and [`copy_nonoverlapping`].
|
|
|
|
|
|
|
|
* `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed
|
|
|
|
to by `ptr`.
|
|
|
|
* `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy
|
|
|
|
from src to dest. (this is equivalent to memmove -- note that the argument
|
|
|
|
order is reversed!)
|
|
|
|
* `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a
|
|
|
|
little faster on the assumption that the two ranges of memory don't overlap.
|
|
|
|
(this is equivalent to memcpy -- note that the argument order is reversed!)
|
|
|
|
|
|
|
|
It should go without saying that these functions, if misused, will cause serious
|
|
|
|
havoc or just straight up Undefined Behavior. The only things that these
|
|
|
|
functions *themselves* require is that the locations you want to read and write
|
|
|
|
are allocated and properly aligned. However, the ways writing arbitrary bits to
|
|
|
|
arbitrary locations of memory can break things are basically uncountable!
|
|
|
|
|
|
|
|
It's worth noting that you don't need to worry about `ptr::write`-style
|
|
|
|
shenanigans with types which don't implement `Drop` or contain `Drop` types,
|
|
|
|
because Rust knows not to try to drop them. This is what we relied on in the
|
|
|
|
above example.
|
|
|
|
|
|
|
|
However when working with uninitialized memory you need to be ever-vigilant for
|
|
|
|
Rust trying to drop values you make like this before they're fully initialized.
|
|
|
|
Every control path through that variable's scope must initialize the value
|
|
|
|
before it ends, if it has a destructor.
|
|
|
|
*[This includes code panicking](unwinding.html)*. `MaybeUninit` helps a bit
|
|
|
|
here, because it does not implicitly drop its content - but all this really
|
|
|
|
means in case of a panic is that instead of a double-free of the not yet
|
|
|
|
initialized parts, you end up with a memory leak of the already initialized
|
|
|
|
parts.
|
|
|
|
|
|
|
|
Note that, to use the `ptr` methods, you need to first obtain a *raw pointer* to
|
|
|
|
the data you want to initialize. It is illegal to construct a *reference* to
|
|
|
|
uninitialized data, which implies that you have to be careful when obtaining
|
|
|
|
said raw pointer:
|
|
|
|
* For an array of `T`, you can use `base_ptr.add(idx)` where `base_ptr: *mut T`
|
|
|
|
to compute the address of array index `idx`. This relies on
|
|
|
|
how arrays are laid out in memory.
|
|
|
|
* For a struct, however, in general we do not know how it is laid out, and we
|
|
|
|
also cannot use `&mut base_ptr.field` as that would be creating a
|
|
|
|
reference. Thus, it is currently not possible to create a raw pointer to a field
|
|
|
|
of a partially initialized struct, and also not possible to initialize a single
|
|
|
|
field of a partially initialized struct. (A
|
|
|
|
[solution to this problem](https://github.com/rust-lang/rfcs/pull/2582) is being
|
|
|
|
worked on.)
|
|
|
|
|
|
|
|
One last remark: when reading old Rust code, you might stumble upon the
|
|
|
|
deprecated `mem::uninitialized` function. That function used to be the only way
|
|
|
|
to deal with uninitialized memory on the stack, but it turned out to be
|
|
|
|
impossible to properly integrate with the rest of the language. Always use
|
|
|
|
`MaybeUninit` instead in new code, and port old code over when you get the
|
|
|
|
opportunity.
|
|
|
|
|
|
|
|
And that's about it for working with uninitialized memory! Basically nothing
|
|
|
|
anywhere expects to be handed uninitialized memory, so if you're going to pass
|
|
|
|
it around at all, be sure to be *really* careful.
|
|
|
|
|
|
|
|
[`MaybeUninit`]: ../core/mem/union.MaybeUninit.html
|
|
|
|
[assume_init]: ../core/mem/union.MaybeUninit.html#method.assume_init
|
|
|
|
[`ptr`]: ../core/ptr/index.html
|
|
|
|
[`write`]: ../core/ptr/fn.write.html
|
|
|
|
[`copy`]: ../std/ptr/fn.copy.html
|
|
|
|
[`copy_nonoverlapping`]: ../std/ptr/fn.copy_nonoverlapping.html
|