get into the weeds over GEP and allocations

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 9d578c5d11
commit 2479c91cf7

@ -1,5 +1,22 @@
% Allocating Memory
Using Unique throws a wrench in an important feature of Vec (and indeed all of
the std collections): an empty Vec doesn't actually allocate at all. So if we
can't allocate, but also can't put a null pointer in `ptr`, what do we do in
`Vec::new`? Well, we just put some other garbage in there!
This is perfectly fine because we already have `cap == 0` as our sentinel for no
allocation. We don't even need to handle it specially in almost any code because
we usually need to check if `cap > len` or `len > 0` anyway. The traditional
Rust value to put here is `0x01`. The standard library actually exposes this
as `std::rt::heap::EMPTY`. There are quite a few places where we'll
want to use `heap::EMPTY` because there's no real allocation to talk about but
`null` would make the compiler do bad things.
All of the `heap` API is totally unstable under the `heap_api` feature, though.
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
the `heap` API anyway, so let's just get that dependency over with.
So:
```rust,ignore
@ -24,15 +41,29 @@ I slipped in that assert there because zero-sized types will require some
special handling throughout our code, and I want to defer the issue for now.
Without this assert, some of our early drafts will do some Very Bad Things.
Next we need to figure out what to actually do when we *do* want space. For that,
we'll need to use the rest of the heap APIs. These basically allow us to
talk directly to Rust's instance of jemalloc.
We'll also need a way to handle out-of-memory conditions. The standard library
calls the `abort` intrinsic, but calling intrinsics from normal Rust code is a
pretty bad idea. Unfortunately, the `abort` exposed by the standard library
allocates. Not something we want to do during `oom`! Instead, we'll call
`std::process::exit`.
Next we need to figure out what to actually do when we *do* want space. For
that, we'll need to use the rest of the heap APIs. These basically allow us to
talk directly to Rust's allocator (jemalloc by default).
We'll also need a way to handle out-of-memory (OOM) conditions. The standard
library calls the `abort` intrinsic, which just calls an illegal instruction to
crash the whole program. The reason we abort and don't panic is because
unwinding can cause allocations to happen, and that seems like a bad thing to do
when your allocator just came back with "hey I don't have any more memory".
Of course, this is a bit silly since most platforms don't actually run out of
memory in a conventional way. Your operating system will probably kill the
application by another means if you legitimately start using up all the memory.
The most likely way we'll trigger OOM is by just asking for ludicrous quantities
of memory at once (e.g. half the theoretical address space). As such it's
*probably* fine to panic and nothing bad will happen. Still, we're trying to be
like the standard library as much as possible, so we'll just kill the whole
program.
We said we don't want to use intrinsics, so doing *exactly* what `std` does is
out. `std::rt::util::abort` actually exists, but it takes a message to print,
which will probably allocate. Also it's still unstable. Instead, we'll call
`std::process::exit` with some random number.
```rust
fn oom() {
@ -51,29 +82,104 @@ else:
cap *= 2
```
But Rust's only supported allocator API is so low level that we'll need to
do a fair bit of extra work, though. We also need to guard against some special
conditions that can occur with really large allocations. In particular, we index
into arrays using unsigned integers, but `ptr::offset` takes signed integers. This
means Bad Things will happen if we ever manage to grow to contain more than
`isize::MAX` elements. Thankfully, this isn't something we need to worry about
in most cases.
But Rust's only supported allocator API is so low level that we'll need to do a
fair bit of extra work. We also need to guard against some special
conditions that can occur with really large allocations or empty allocations.
In particular, `ptr::offset` will cause us *a lot* of trouble, because it has
the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to
not have dealt with this instruction, here's the basic story with GEP: alias
analysis, alias analysis, alias analysis. It's super important to an optimizing
compiler to be able to reason about data dependencies and aliasing.
On 64-bit targets we're artifically limited to only 48-bits, so we'll run out
of memory far before we reach that point. However on 32-bit targets, particularly
those with extensions to use more of the address space, it's theoretically possible
to successfully allocate more than `isize::MAX` bytes of memory. Still, we only
really need to worry about that if we're allocating elements that are a byte large.
Anything else will use up too much space.
As a simple example, consider the following fragment of code:
```rust
# let x = &mut 0;
# let y = &mut 0;
*x *= 7;
*y *= 3;
```
However since this is a tutorial, we're not going to be particularly optimal here,
and just unconditionally check, rather than use clever platform-specific `cfg`s.
If the compiler can prove that `x` and `y` point to different locations in
memory, the two operations can in theory be executed in parallel (by e.g.
loading them into different registers and working on them independently).
However in *general* the compiler can't do this because if x and y point to
the same location in memory, the operations need to be done to the same value,
and they can't just be merged afterwards.
When you use GEP inbounds, you are specifically telling LLVM that the offsets
you're about to do are within the bounds of a single allocated entity. The
ultimate payoff being that LLVM can assume that if two pointers are known to
point to two disjoint objects, all the offsets of those pointers are *also*
known to not alias (because you won't just end up in some random place in
memory). LLVM is heavily optimized to work with GEP offsets, and inbounds
offsets are the best of all, so it's important that we use them as much as
possible.
So that's what GEP's about, how can it cause us trouble?
The first problem is that we index into arrays with unsigned integers, but
GEP (and as a consequence `ptr::offset`) takes a *signed integer*. This means
that half of the seemingly valid indices into an array will overflow GEP and
actually go in the wrong direction! As such we must limit all allocations to
`isize::MAX` elements. This actually means we only need to worry about
byte-sized objects, because e.g. `> isize::MAX` `u16`s will truly exhaust all of
the system's memory. However in order to avoid subtle corner cases where someone
reinterprets some array of `< isize::MAX` objects as bytes, std limits all
allocations to `isize::MAX` bytes.
On all 64-bit targets that Rust currently supports we're artificially limited
to significantly less than all 64 bits of the address space (modern x64
platforms only expose 48-bit addressing), so we can rely on just running out of
memory first. However on 32-bit targets, particularly those with extensions to
use more of the address space (PAE x86 or x32), it's theoretically possible to
successfully allocate more than `isize::MAX` bytes of memory.
However since this is a tutorial, we're not going to be particularly optimal
here, and just unconditionally check, rather than use clever platform-specific
`cfg`s.
The other corner-case we need to worry about is *empty* allocations. There will
be two kinds of empty allocations we need to worry about: `cap = 0` for all T,
and `cap > 0` for zero-sized types.
These cases are tricky because they come
down to what LLVM means by "allocated". LLVM's notion of an
allocation is significantly more abstract than how we usually use it. Because
LLVM needs to work with different languages' semantics and custom allocators,
it can't really intimately understand allocation. Instead, the main idea behind
allocation is "doesn't overlap with other stuff". That is, heap allocations,
stack allocations, and globals don't randomly overlap. Yep, it's about alias
analysis. As such, Rust can technically play a bit fast an loose with the notion of
an allocation as long as it's *consistent*.
Getting back to the empty allocation case, there are a couple of places where
we want to offset by 0 as a consequence of generic code. The question is then:
is it consistent to do so? For zero-sized types, we have concluded that it is
indeed consistent to do a GEP inbounds offset by an arbitrary number of
elements. This is a runtime no-op because every element takes up no space,
and it's fine to pretend that there's infinite zero-sized types allocated
at `0x01`. No allocator will ever allocate that address, because they won't
allocate `0x00` and they generally allocate to some minimal alignment higher
than a byte.
However what about for positive-sized types? That one's a bit trickier. In
principle, you can argue that offsetting by 0 gives LLVM no information: either
there's an element before the address, or after it, but it can't know which.
However we've chosen to conservatively assume that it may do bad things. As
such we *will* guard against this case explicitly.
*Phew*
Ok with all the nonsense out of the way, let's actually allocate some memory:
```rust,ignore
fn grow(&mut self) {
// this is all pretty delicate, so let's say it's all unsafe
unsafe {
let align = mem::min_align_of::<T>();
// current API requires us to specify size and alignment manually.
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {

@ -13,15 +13,64 @@ pub struct Vec<T> {
# fn main() {}
```
And indeed this would compile. Unfortunately, it would be incorrect. The
compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
And indeed this would compile. Unfortunately, it would be incorrect. First, the
compiler will give us too strict variance. So a `&Vec<&'static str>`
couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
will give incorrect ownership information to dropck, as it will conservatively
assume we don't own any values of type `T`. See [the chapter on ownership and
lifetimes] (lifetimes.html) for details.
will give incorrect ownership information to the drop checker, as it will
conservatively assume we don't own any values of type `T`. See [the chapter
on ownership and lifetimes][ownership] for all the details on variance and
drop check.
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
`*mut T` when we have a raw pointer to an allocation we own:
As we saw in the ownership chapter, we should use `Unique<T>` in place of
`*mut T` when we have a raw pointer to an allocation we own. Unique is unstable,
so we'd like to not use it if possible, though.
As a recap, Unique is a wrapper around a raw pointer that declares that:
* We are variant over `T`
* We may own a value of type `T` (for drop check)
* We are Send/Sync if `T` is Send/Sync
* We deref to `*mut T` (so it largely acts like a `*mut` in our code)
* Our pointer is never null (so `Option<Vec<T>>` is null-pointer-optimized)
We can implement all of the above requirements except for the last
one in stable Rust:
```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::mem;
struct Unique<T> {
ptr: *const T, // *const for variance
_marker: PhantomData<T>, // For the drop checker
}
// Deriving Send and Sync is safe because we are the Unique owners
// of this data. It's like Unique<T> is "just" T.
unsafe impl<T: Send> Send for Unique<T> {}
unsafe impl<T: Sync> Sync for Unique<T> {}
impl<T> Unique<T> {
pub fn new(ptr: *mut T) -> Self {
Unique { ptr: ptr, _marker: PhantomData }
}
}
impl<T> Deref for Unique<T> {
type Target = *mut T;
fn deref(&self) -> &*mut T {
// There's no way to cast the *const to a *mut
// while also taking a reference. So we just
// transmute it since it's all "just pointers".
unsafe { mem::transmute(&self.ptr) }
}
}
```
Unfortunately the mechanism for stating that your value is non-zero is
unstable and unlikely to be stabilized soon. As such we're just going to
take the hit and use std's Unique:
```rust
@ -38,29 +87,11 @@ pub struct Vec<T> {
# fn main() {}
```
As a recap, Unique is a wrapper around a raw pointer that declares that:
* We may own a value of type `T`
* We are Send/Sync iff `T` is Send/Sync
* Our pointer is never null (and therefore `Option<Vec>` is
null-pointer-optimized)
That last point is subtle. First, it makes `Unique::new` unsafe to call, because
putting `null` inside of it is Undefined Behaviour. It also throws a
wrench in an important feature of Vec (and indeed all of the std collections):
an empty Vec doesn't actually allocate at all. So if we can't allocate,
but also can't put a null pointer in `ptr`, what do we do in
`Vec::new`? Well, we just put some other garbage in there!
This is perfectly fine because we already have `cap == 0` as our sentinel for no
allocation. We don't even need to handle it specially in almost any code because
we usually need to check if `cap > len` or `len > 0` anyway. The traditional
Rust value to put here is `0x01`. The standard library actually exposes this
as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
`heap::EMPTY` because there's no real allocation to talk about but `null` would
make the compiler angry.
All of the `heap` API is totally unstable under the `heap_api` feature, though.
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
the `heap` API anyway, so let's just get that dependency over with.
If you don't care about the null-pointer optimization, then you can use the
stable code. However we will be designing the rest of the code around enabling
the optimization. In particular, `Unique::new` is unsafe to call, because
putting `null` inside of it is Undefined Behaviour. Our stable Unique doesn't
need `new` to be unsafe because it doesn't make any interesting guarantees about
its contents.
[ownership]: ownership.html

@ -2,5 +2,19 @@
To bring everything together, we're going to write `std::Vec` from scratch.
Because all the best tools for writing unsafe code are unstable, this
project will only work on nightly (as of Rust 1.2.0).
project will only work on nightly (as of Rust 1.2.0). With the exception of the
allocator API, much of the unstable code we'll use is expected to be stabilized
in a similar form as it is today.
However we will generally try to avoid unstable code where possible. In
particular we won't use any intrinsics that could make a code a little
bit nicer or efficient because intrinsics are permanently unstable. Although
many intrinsics *do* become stabilized elsewhere (`std::ptr` and `str::mem`
consist of many intrinsics).
Ultimately this means out implementation may not take advantage of all
possible optimizations, though it will be by no means *naive*. We will
definitely get into the weeds over nitty-gritty details, even
when the problem doesn't *really* merit it.
You wanted advanced. We're gonna go advanced.

Loading…
Cancel
Save