clean up vec chapter of tarpl

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 7415230ad1
commit 3ddcf0929c

@ -46,6 +46,7 @@
* [Deref](vec-deref.md) * [Deref](vec-deref.md)
* [Insert and Remove](vec-insert-remove.md) * [Insert and Remove](vec-insert-remove.md)
* [IntoIter](vec-into-iter.md) * [IntoIter](vec-into-iter.md)
* [RawVec](vec-raw.md)
* [Drain](vec-drain.md) * [Drain](vec-drain.md)
* [Handling Zero-Sized Types](vec-zsts.md) * [Handling Zero-Sized Types](vec-zsts.md)
* [Final Code](vec-final.md) * [Final Code](vec-final.md)

@ -46,7 +46,7 @@ Okay, now we can write growing. Roughly, we want to have this logic:
if cap == 0: if cap == 0:
allocate() allocate()
cap = 1 cap = 1
else else:
reallocate reallocate
cap *= 2 cap *= 2
``` ```
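The growth rule above can be sketched on its own, separate from any allocation. `next_cap` is a hypothetical helper name used only for illustration, and the overflow check is an added assumption rather than something the pseudocode spells out:

```rust
/// Hypothetical helper: the capacity a buffer should grow to,
/// following the rule above (0 becomes 1, anything else doubles).
fn next_cap(cap: usize) -> usize {
    if cap == 0 {
        1
    } else {
        // Doubling can overflow `usize`; panic rather than wrap around.
        cap.checked_mul(2).expect("capacity overflow")
    }
}
```

Keeping the arithmetic separate from the allocator calls makes it easy to check the `cap == 0` special case in isolation.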

@ -2,13 +2,13 @@
Next we should implement Drop so that we don't massively leak tons of resources. Next we should implement Drop so that we don't massively leak tons of resources.
The easiest way is to just call `pop` until it yields None, and then deallocate The easiest way is to just call `pop` until it yields None, and then deallocate
our buffer. Note that calling `pop` is uneeded if `T: !Drop`. In theory we can our buffer. Note that calling `pop` is unneeded if `T: !Drop`. In theory we can
ask Rust if T needs_drop and omit the calls to `pop`. However in practice LLVM ask Rust if `T` `needs_drop` and omit the calls to `pop`. However in practice
is *really* good at removing simple side-effect free code like this, so I wouldn't LLVM is *really* good at removing simple side-effect free code like this, so I
bother unless you notice it's not being stripped (in this case it is). wouldn't bother unless you notice it's not being stripped (in this case it is).
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we haven't We must not call `heap::deallocate` when `self.cap == 0`, as in this case we
actually allocated any memory. haven't actually allocated any memory.
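As a rough illustration of the `needs_drop` idea, here is a sketch that uses `std::mem::needs_drop` and the standard library's `Vec` as a stand-in for our own type. `drop_elements` is a hypothetical helper, not part of the chapter's code; it only shows when the `pop` loop would be skipped:

```rust
use std::mem;

/// Hypothetical helper: pop every element so its destructor runs, but skip
/// the loop entirely when `T` has no drop glue. Returns the number of pops.
fn drop_elements<T>(stack: &mut Vec<T>) -> usize {
    let mut pops = 0;
    if mem::needs_drop::<T>() {
        while let Some(_) = stack.pop() {
            pops += 1;
        }
    }
    pops
}
```

For `u32` the loop is never entered, while for `String` every element is popped and dropped. As noted above, the optimizer usually gets the same result without the explicit check.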
```rust,ignore ```rust,ignore

@ -1,13 +1,15 @@
% Deref % Deref
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can Alright! We've got a decent minimal stack implemented. We can push, we can
pop, and we can clean up after ourselves. However there's a whole mess of functionality pop, and we can clean up after ourselves. However there's a whole mess of
we'd reasonably want. In particular, we have a proper array, but none of the slice functionality we'd reasonably want. In particular, we have a proper array, but
functionality. That's actually pretty easy to solve: we can implement `Deref<Target=[T]>`. none of the slice functionality. That's actually pretty easy to solve: we can
This will magically make our Vec coerce to and behave like a slice in all sorts of implement `Deref<Target=[T]>`. This will magically make our Vec coerce to, and
conditions. behave like, a slice in all sorts of conditions.
All we need is `slice::from_raw_parts`. All we need is `slice::from_raw_parts`. It will correctly handle empty slices
for us. Later once we set up zero-sized type support it will also Just Work
for those too.
```rust,ignore ```rust,ignore
use std::ops::Deref; use std::ops::Deref;
@ -36,5 +38,5 @@ impl<T> DerefMut for Vec<T> {
} }
``` ```
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`, Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`,
and all other sorts of bells and whistles provided by slice. Sweet! `iter_mut`, and all other sorts of bells and whistles provided by slice. Sweet!
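To see the effect end to end, here is a self-contained sketch. `Stack` is a hypothetical stand-in for our Vec; to avoid repeating the allocator code it takes over the buffer of a standard `Vec` and hands it back on drop, which is an assumption of this example and not how the chapter builds its buffer:

```rust
use std::mem::ManuallyDrop;
use std::ops::{Deref, DerefMut};
use std::slice;

/// Hypothetical stand-in for the chapter's Vec: a raw (ptr, cap, len) triple.
struct Stack<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}

impl<T> Stack<T> {
    /// Take over the buffer of a std `Vec` so the sketch needs no allocator code.
    fn from_vec(v: Vec<T>) -> Self {
        let mut v = ManuallyDrop::new(v);
        Stack { ptr: v.as_mut_ptr(), cap: v.capacity(), len: v.len() }
    }
}

impl<T> Deref for Stack<T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        // Safety: `ptr` is non-null, aligned, and points at `len` initialized elements.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl<T> DerefMut for Stack<T> {
    fn deref_mut(&mut self) -> &mut [T] {
        // Safety: as above, and `&mut self` guarantees unique access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        // Rebuild the std `Vec` so it drops the elements and frees the buffer.
        unsafe { drop(Vec::from_raw_parts(self.ptr, self.len, self.cap)) }
    }
}
```

With these two impls in place, calls such as `s.len()`, `s.first()`, `s.sort()`, and `s.iter()` all resolve through the slice, even though `Stack` defines none of them itself.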

@ -2,7 +2,7 @@
Let's move on to Drain. Drain is largely the same as IntoIter, except that Let's move on to Drain. Drain is largely the same as IntoIter, except that
instead of consuming the Vec, it borrows the Vec and leaves its allocation instead of consuming the Vec, it borrows the Vec and leaves its allocation
free. For now we'll only implement the "basic" full-range version. untouched. For now we'll only implement the "basic" full-range version.
```rust,ignore ```rust,ignore
use std::marker::PhantomData; use std::marker::PhantomData;
@ -38,6 +38,9 @@ impl<T> RawValIter<T> {
RawValIter { RawValIter {
start: slice.as_ptr(), start: slice.as_ptr(),
end: if slice.len() == 0 { end: if slice.len() == 0 {
// if `len = 0`, then this is not actually allocated memory.
// Need to avoid offsetting because that will give wrong
// information to LLVM via GEP.
slice.as_ptr() slice.as_ptr()
} else { } else {
slice.as_ptr().offset(slice.len() as isize) slice.as_ptr().offset(slice.len() as isize)
@ -137,5 +140,7 @@ impl<T> Vec<T> {
} }
``` ```
For more details on the `mem::forget` problem, see the
[section on leaks][leaks].
[leaks]: leaking.html

@ -1,12 +1,13 @@
% Insert and Remove % Insert and Remove
Something *not* provided but slice is `insert` and `remove`, so let's do those next. Something *not* provided by slice is `insert` and `remove`, so let's do those
next.
Insert needs to shift all the elements at the target index to the right by one. Insert needs to shift all the elements at the target index to the right by one.
To do this we need to use `ptr::copy`, which is our version of C's `memmove`. To do this we need to use `ptr::copy`, which is our version of C's `memmove`.
This copies some chunk of memory from one location to another, correctly handling This copies some chunk of memory from one location to another, correctly
the case where the source and destination overlap (which will definitely happen handling the case where the source and destination overlap (which will
here). definitely happen here).
If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]` If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]`
using the *old* len. using the *old* len.
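The shift can be tried out on a plain buffer before wiring it into Vec. The sketch below works on a fixed `i32` slice with spare room at the end; `insert_at` and its parameters are hypothetical names for illustration only:

```rust
use std::ptr;

/// Hypothetical helper: insert `elem` at `index` among the first `len`
/// elements of `buf`, which must have at least one spare slot.
/// Returns the new length.
fn insert_at(buf: &mut [i32], len: usize, index: usize, elem: i32) -> usize {
    assert!(index <= len, "index out of bounds");
    assert!(len < buf.len(), "no spare capacity");
    unsafe {
        let p = buf.as_mut_ptr();
        // Shift [index .. len] to [index+1 .. len+1] using the *old* len.
        // The ranges overlap, so this must be `ptr::copy` (memmove) and
        // not `ptr::copy_nonoverlapping` (memcpy).
        ptr::copy(p.add(index), p.add(index + 1), len - index);
        // The slot at `index` is now free to overwrite.
        ptr::write(p.add(index), elem);
    }
    len + 1
}
```

Inserting at `index == len` copies zero elements, so appending falls out of the same code path.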

@ -11,19 +11,20 @@ allocation.
IntoIter needs to be DoubleEnded as well, to enable reading from both ends. IntoIter needs to be DoubleEnded as well, to enable reading from both ends.
Reading from the back could just be implemented as calling `pop`, but reading Reading from the back could just be implemented as calling `pop`, but reading
from the front is harder. We could call `remove(0)` but that would be insanely from the front is harder. We could call `remove(0)` but that would be insanely
expensive. Instead we're going to just use ptr::read to copy values out of either expensive. Instead we're going to just use ptr::read to copy values out of
end of the Vec without mutating the buffer at all. either end of the Vec without mutating the buffer at all.
To do this we're going to use a very common C idiom for array iteration. We'll To do this we're going to use a very common C idiom for array iteration. We'll
make two pointers; one that points to the start of the array, and one that points make two pointers; one that points to the start of the array, and one that
to one-element past the end. When we want an element from one end, we'll read out points to one-element past the end. When we want an element from one end, we'll
the value pointed to at that end and move the pointer over by one. When the two read out the value pointed to at that end and move the pointer over by one. When
pointers are equal, we know we're done. the two pointers are equal, we know we're done.
Note that the order of read and offset are reversed for `next` and `next_back` Note that the order of read and offset are reversed for `next` and `next_back`
For `next_back` the pointer is always *after* the element it wants to read next, For `next_back` the pointer is always *after* the element it wants to read next,
while for `next` the pointer is always *at* the element it wants to read next. while for `next` the pointer is always *at* the element it wants to read next.
To see why this is, consider the case where every element but one has been yielded. To see why this is, consider the case where every element but one has been
yielded.
The array looks like this: The array looks like this:
@ -35,6 +36,10 @@ The array looks like this:
If E pointed directly at the element it wanted to yield next, it would be If E pointed directly at the element it wanted to yield next, it would be
indistinguishable from the case where there are no more elements to yield. indistinguishable from the case where there are no more elements to yield.
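The read/offset ordering is easier to see in a small sketch. To keep it safe to run, this version only borrows a slice of `Copy` values, so reading an element out never duplicates ownership. `TwoEnded` is a hypothetical name, and the `Copy` bound and the borrowed slice are assumptions of the example, not part of IntoIter:

```rust
use std::marker::PhantomData;
use std::ptr;

/// Hypothetical two-pointer reader over a borrowed slice of `Copy` values.
struct TwoEnded<'a, T: Copy> {
    start: *const T,
    end: *const T, // one element past the last unread value
    _marker: PhantomData<&'a [T]>,
}

impl<'a, T: Copy> TwoEnded<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        assert!(std::mem::size_of::<T>() != 0, "ZSTs are not handled here");
        let range = slice.as_ptr_range();
        TwoEnded { start: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a, T: Copy> Iterator for TwoEnded<'a, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                // `start` points *at* the next element: read, then advance.
                let value = ptr::read(self.start);
                self.start = self.start.add(1);
                Some(value)
            }
        }
    }
}

impl<'a, T: Copy> DoubleEndedIterator for TwoEnded<'a, T> {
    fn next_back(&mut self) -> Option<T> {
        if self.start == self.end {
            None
        } else {
            unsafe {
                // `end` points *after* the next element: step back, then read.
                self.end = self.end.sub(1);
                Some(ptr::read(self.end))
            }
        }
    }
}
```

Because `end` is always one past the next element, `start == end` unambiguously means "empty", including for a slice that was empty to begin with.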
Although we don't actually care about it during iteration, we also need to hold
onto the Vec's allocation information in order to free it once IntoIter is
dropped.
So we're going to use the following struct: So we're going to use the following struct:
```rust,ignore ```rust,ignore
@ -46,8 +51,8 @@ struct IntoIter<T> {
} }
``` ```
One last subtle detail: if our Vec is empty, we want to produce an empty iterator. One last subtle detail: if our Vec is empty, we want to produce an empty
This will actually technically fall out doing the naive thing of: iterator. This will actually technically fall out doing the naive thing of:
```text ```text
start = ptr start = ptr
@ -155,139 +160,3 @@ impl<T> Drop for IntoIter<T> {
} }
} }
``` ```
We've actually reached an interesting situation here: we've duplicated the logic
for specifying a buffer and freeing its memory. Now that we've implemented it and
identified *actual* logic duplication, this is a good time to perform some logic
compression.
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
allocating, growing, and freeing:
```rust,ignore
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
unsafe {
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
}
}
// unchanged from Vec
fn grow(&mut self) {
unsafe {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
if self.cap != 0 {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
```
And change vec as follows:
```rust,ignore
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
// push/pop/insert/remove largely unchanged:
// * `self.ptr -> self.ptr()`
// * `self.cap -> self.cap()`
// * `self.grow -> self.buf.grow()`
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// deallocation is handled by RawVec
}
}
```
And finally we can really simplify IntoIter:
```rust,ignore
struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
start: *const T,
end: *const T,
}
// next and next_back literally unchanged since they never referred to the buf
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
// only need to ensure all our elements are read;
// buffer will clean itself up afterwards.
for _ in &mut *self {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy.
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
}
}
}
```
Much better.

@ -13,15 +13,15 @@ pub struct Vec<T> {
# fn main() {} # fn main() {}
``` ```
And indeed this would compile. Unfortunately, it would be incorrect. The compiler And indeed this would compile. Unfortunately, it would be incorrect. The
will give us too strict variance, so e.g. an `&Vec<&'static str>` couldn't be used compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
where an `&Vec<&'a str>` was expected. More importantly, it will give incorrect couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
ownership information to dropck, as it will conservatively assume we don't own will give incorrect ownership information to dropck, as it will conservatively
any values of type `T`. See [the chapter on ownership and lifetimes] assume we don't own any values of type `T`. See [the chapter on ownership and
(lifetimes.html) for details. lifetimes] (lifetimes.html) for details.
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of `*mut T` As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
when we have a raw pointer to an allocation we own: `*mut T` when we have a raw pointer to an allocation we own:
```rust ```rust
@ -40,9 +40,10 @@ pub struct Vec<T> {
As a recap, Unique is a wrapper around a raw pointer that declares that: As a recap, Unique is a wrapper around a raw pointer that declares that:
* We own at least one value of type `T` * We may own a value of type `T`
* We are Send/Sync iff `T` is Send/Sync * We are Send/Sync iff `T` is Send/Sync
* Our pointer is never null (and therefore `Option<Vec>` is null-pointer-optimized) * Our pointer is never null (and therefore `Option<Vec>` is
null-pointer-optimized)
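The non-null guarantee in the third bullet can be observed directly. `Unique` is not available outside the standard library, so this sketch uses the stable `std::ptr::NonNull` as a stand-in; that substitution and the helper name `option_overhead` are assumptions of the example:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical helper: how many extra bytes wrapping `T` in `Option` costs.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}
```

`option_overhead::<NonNull<u8>>()` is zero, because `None` can be stored as the null bit pattern that the pointer itself is never allowed to hold. A plain `*mut u8` has no such forbidden value, so `Option<*mut u8>` needs extra space for a discriminant.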
That last point is subtle. First, it makes `Unique::new` unsafe to call, because That last point is subtle. First, it makes `Unique::new` unsafe to call, because
putting `null` inside of it is Undefined Behaviour. It also throws a putting `null` inside of it is Undefined Behaviour. It also throws a

@ -0,0 +1,136 @@
% RawVec
We've actually reached an interesting situation here: we've duplicated the logic
for specifying a buffer and freeing its memory. Now that we've implemented it
and identified *actual* logic duplication, this is a good time to perform some
logic compression.
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
allocating, growing, and freeing:
```rust,ignore
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
unsafe {
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
}
}
// unchanged from Vec
fn grow(&mut self) {
unsafe {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
if self.cap != 0 {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
```
And change vec as follows:
```rust,ignore
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
// push/pop/insert/remove largely unchanged:
// * `self.ptr -> self.ptr()`
// * `self.cap -> self.cap()`
// * `self.grow -> self.buf.grow()`
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// deallocation is handled by RawVec
}
}
```
And finally we can really simplify IntoIter:
```rust,ignore
struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
start: *const T,
end: *const T,
}
// next and next_back literally unchanged since they never referred to the buf
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
// only need to ensure all our elements are read;
// buffer will clean itself up afterwards.
for _ in &mut *self {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy, and Vec implements Drop (so we can't destructure it).
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
}
}
}
```
Much better.
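The `heap::` functions and `Unique` used above are not available on stable Rust. As a rough sketch of the same RawVec under that constraint, the version below swaps in `std::alloc` and `std::ptr::NonNull`. The substitution is an assumption of this example and not the code from this commit:

```rust
use std::alloc::{self, Layout};
use std::mem;
use std::ptr::NonNull;

struct RawVec<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> RawVec<T> {
    fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
        // A dangling, well-aligned pointer plays the role of `heap::EMPTY`.
        RawVec { ptr: NonNull::dangling(), cap: 0 }
    }

    fn grow(&mut self) {
        let new_cap = if self.cap == 0 { 1 } else { 2 * self.cap };
        // `Layout::array` fails if the byte size would exceed `isize::MAX`.
        let new_layout = Layout::array::<T>(new_cap).expect("capacity overflow");
        let new_ptr = unsafe {
            if self.cap == 0 {
                alloc::alloc(new_layout)
            } else {
                let old_layout = Layout::array::<T>(self.cap).unwrap();
                alloc::realloc(self.ptr.as_ptr() as *mut u8, old_layout, new_layout.size())
            }
        };
        // If alloc or realloc fail, we get `null` back.
        self.ptr = match NonNull::new(new_ptr as *mut T) {
            Some(p) => p,
            None => alloc::handle_alloc_error(new_layout),
        };
        self.cap = new_cap;
    }
}

impl<T> Drop for RawVec<T> {
    fn drop(&mut self) {
        // Nothing was allocated while `cap == 0`, so there is nothing to free.
        if self.cap != 0 {
            let layout = Layout::array::<T>(self.cap).unwrap();
            unsafe { alloc::dealloc(self.ptr.as_ptr() as *mut u8, layout) }
        }
    }
}
```

The shape is the same as before: `new` never allocates, `grow` handles the `cap == 0` case separately, and `Drop` frees only what was actually allocated. Dropping the elements themselves stays the job of Vec and IntoIter.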