clean up vec chapter of tarpl

pull/10/head
Alexis Beingessner 9 years ago committed by Manish Goregaokar
parent 7415230ad1
commit 3ddcf0929c

@ -46,6 +46,7 @@
* [Deref](vec-deref.md)
* [Insert and Remove](vec-insert-remove.md)
* [IntoIter](vec-into-iter.md)
* [RawVec](vec-raw.md)
* [Drain](vec-drain.md)
* [Handling Zero-Sized Types](vec-zsts.md)
* [Final Code](vec-final.md)

@ -46,7 +46,7 @@ Okay, now we can write growing. Roughly, we want to have this logic:
if cap == 0:
allocate()
cap = 1
else
else:
reallocate
cap *= 2
```

@ -2,13 +2,13 @@
Next we should implement Drop so that we don't massively leak tons of resources.
The easiest way is to just call `pop` until it yields None, and then deallocate
our buffer. Note that calling `pop` is uneeded if `T: !Drop`. In theory we can
ask Rust if T needs_drop and omit the calls to `pop`. However in practice LLVM
is *really* good at removing simple side-effect free code like this, so I wouldn't
bother unless you notice it's not being stripped (in this case it is).
our buffer. Note that calling `pop` is unneeded if `T: !Drop`. In theory we can
ask Rust if `T` `needs_drop` and omit the calls to `pop`. However in practice
LLVM is *really* good at removing simple side-effect free code like this, so I
wouldn't bother unless you notice it's not being stripped (in this case it is).
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we haven't
actually allocated any memory.
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we
haven't actually allocated any memory.
```rust,ignore

@ -1,13 +1,15 @@
% Deref
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
pop, and we can clean up after ourselves. However there's a whole mess of functionality
we'd reasonably want. In particular, we have a proper array, but none of the slice
functionality. That's actually pretty easy to solve: we can implement `Deref<Target=[T]>`.
This will magically make our Vec coerce to and behave like a slice in all sorts of
conditions.
Alright! We've got a decent minimal stack implemented. We can push, we can
pop, and we can clean up after ourselves. However there's a whole mess of
functionality we'd reasonably want. In particular, we have a proper array, but
none of the slice functionality. That's actually pretty easy to solve: we can
implement `Deref<Target=[T]>`. This will magically make our Vec coerce to, and
behave like, a slice in all sorts of conditions.
All we need is `slice::from_raw_parts`.
All we need is `slice::from_raw_parts`. It will correctly handle empty slices
for us. Later once we set up zero-sized type support it will also Just Work
for those too.
```rust,ignore
use std::ops::Deref;
@ -36,5 +38,5 @@ impl<T> DerefMut for Vec<T> {
}
```
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`,
and all other sorts of bells and whistles provided by slice. Sweet!
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`,
`iter_mut`, and all other sorts of bells and whistles provided by slice. Sweet!

@ -2,7 +2,7 @@
Let's move on to Drain. Drain is largely the same as IntoIter, except that
instead of consuming the Vec, it borrows the Vec and leaves its allocation
free. For now we'll only implement the "basic" full-range version.
untouched. For now we'll only implement the "basic" full-range version.
```rust,ignore
use std::marker::PhantomData;
@ -38,6 +38,9 @@ impl<T> RawValIter<T> {
RawValIter {
start: slice.as_ptr(),
end: if slice.len() == 0 {
// if `len = 0`, then this is not actually allocated memory.
// Need to avoid offsetting because that will give wrong
// information to LLVM via GEP.
slice.as_ptr()
} else {
slice.as_ptr().offset(slice.len() as isize)
@ -137,5 +140,7 @@ impl<T> Vec<T> {
}
```
For more details on the `mem::forget` problem, see the
[section on leaks][leaks].
[leaks]: leaking.html

@ -1,12 +1,13 @@
% Insert and Remove
Something *not* provided but slice is `insert` and `remove`, so let's do those next.
Something *not* provided by slice is `insert` and `remove`, so let's do those
next.
Insert needs to shift all the elements at the target index to the right by one.
To do this we need to use `ptr::copy`, which is our version of C's `memmove`.
This copies some chunk of memory from one location to another, correctly handling
the case where the source and destination overlap (which will definitely happen
here).
This copies some chunk of memory from one location to another, correctly
handling the case where the source and destination overlap (which will
definitely happen here).
If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]`
using the *old* len.

@ -11,19 +11,20 @@ allocation.
IntoIter needs to be DoubleEnded as well, to enable reading from both ends.
Reading from the back could just be implemented as calling `pop`, but reading
from the front is harder. We could call `remove(0)` but that would be insanely
expensive. Instead we're going to just use ptr::read to copy values out of either
end of the Vec without mutating the buffer at all.
expensive. Instead we're going to just use ptr::read to copy values out of
either end of the Vec without mutating the buffer at all.
To do this we're going to use a very common C idiom for array iteration. We'll
make two pointers; one that points to the start of the array, and one that points
to one-element past the end. When we want an element from one end, we'll read out
the value pointed to at that end and move the pointer over by one. When the two
pointers are equal, we know we're done.
make two pointers; one that points to the start of the array, and one that
points to one-element past the end. When we want an element from one end, we'll
read out the value pointed to at that end and move the pointer over by one. When
the two pointers are equal, we know we're done.
Note that the order of read and offset are reversed for `next` and `next_back`
For `next_back` the pointer is always *after* the element it wants to read next,
while for `next` the pointer is always *at* the element it wants to read next.
To see why this is, consider the case where every element but one has been yielded.
To see why this is, consider the case where every element but one has been
yielded.
The array looks like this:
@ -35,6 +36,10 @@ The array looks like this:
If E pointed directly at the element it wanted to yield next, it would be
indistinguishable from the case where there are no more elements to yield.
Although we don't actually care about it during iteration, we also need to hold
onto the Vec's allocation information in order to free it once IntoIter is
dropped.
So we're going to use the following struct:
```rust,ignore
@ -46,8 +51,8 @@ struct IntoIter<T> {
}
```
One last subtle detail: if our Vec is empty, we want to produce an empty iterator.
This will actually technically fall out doing the naive thing of:
One last subtle detail: if our Vec is empty, we want to produce an empty
iterator. This will actually technically fall out doing the naive thing of:
```text
start = ptr
@ -155,139 +160,3 @@ impl<T> Drop for IntoIter<T> {
}
}
```
We've actually reached an interesting situation here: we've duplicated the logic
for specifying a buffer and freeing its memory. Now that we've implemented it and
identified *actual* logic duplication, this is a good time to perform some logic
compression.
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
allocating, growing, and freeing:
```rust,ignore
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
unsafe {
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
}
}
// unchanged from Vec
fn grow(&mut self) {
unsafe {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
if self.cap != 0 {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
```
And change vec as follows:
```rust,ignore
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
// push/pop/insert/remove largely unchanged:
// * `self.ptr -> self.ptr()`
// * `self.cap -> self.cap()`
// * `self.grow -> self.buf.grow()`
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// deallocation is handled by RawVec
}
}
```
And finally we can really simplify IntoIter:
```rust,ignore
struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
start: *const T,
end: *const T,
}
// next and next_back literally unchanged since they never referred to the buf
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
// only need to ensure all our elements are read;
// buffer will clean itself up afterwards.
for _ in &mut *self {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy.
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
}
}
}
```
Much better.

@ -13,15 +13,15 @@ pub struct Vec<T> {
# fn main() {}
```
And indeed this would compile. Unfortunately, it would be incorrect. The compiler
will give us too strict variance, so e.g. an `&Vec<&'static str>` couldn't be used
where an `&Vec<&'a str>` was expected. More importantly, it will give incorrect
ownership information to dropck, as it will conservatively assume we don't own
any values of type `T`. See [the chapter on ownership and lifetimes]
(lifetimes.html) for details.
And indeed this would compile. Unfortunately, it would be incorrect. The
compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
will give incorrect ownership information to dropck, as it will conservatively
assume we don't own any values of type `T`. See [the chapter on ownership and
lifetimes] (lifetimes.html) for details.
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of `*mut T`
when we have a raw pointer to an allocation we own:
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
`*mut T` when we have a raw pointer to an allocation we own:
```rust
@ -40,9 +40,10 @@ pub struct Vec<T> {
As a recap, Unique is a wrapper around a raw pointer that declares that:
* We own at least one value of type `T`
* We may own a value of type `T`
* We are Send/Sync iff `T` is Send/Sync
* Our pointer is never null (and therefore `Option<Vec>` is null-pointer-optimized)
* Our pointer is never null (and therefore `Option<Vec>` is
null-pointer-optimized)
That last point is subtle. First, it makes `Unique::new` unsafe to call, because
putting `null` inside of it is Undefined Behaviour. It also throws a

@ -0,0 +1,136 @@
% RawVec
We've actually reached an interesting situation here: we've duplicated the logic
for specifying a buffer and freeing its memory. Now that we've implemented it
and identified *actual* logic duplication, this is a good time to perform some
logic compression.
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
allocating, growing, and freeing:
```rust,ignore
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
unsafe {
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
}
}
// unchanged from Vec
fn grow(&mut self) {
unsafe {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
if self.cap != 0 {
let align = mem::align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
```
And change vec as follows:
```rust,ignore
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
// push/pop/insert/remove largely unchanged:
// * `self.ptr -> self.ptr()`
// * `self.cap -> self.cap()`
// * `self.grow -> self.buf.grow()`
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// deallocation is handled by RawVec
}
}
```
And finally we can really simplify IntoIter:
```rust,ignore
struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
start: *const T,
end: *const T,
}
// next and next_back literally unchanged since they never referred to the buf
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
// only need to ensure all our elements are read;
// buffer will clean itself up afterwards.
for _ in &mut *self {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy, and Vec implements Drop (so we can't destructure it).
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
}
}
}
```
Much better.
Loading…
Cancel
Save