From 37d42cdcefb327d5df3b9b65938c51f1c701b4d1 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 30 Jul 2015 22:48:36 -0700 Subject: [PATCH] last of the emphasis cleanup --- races.md | 16 ++++++++------ repr-rust.md | 14 ++++++------ safe-unsafe-meaning.md | 50 +++++++++++++++++++++--------------------- send-and-sync.md | 29 ++++++++++++------------ subtyping.md | 26 ++++++++++++---------- unwinding.md | 4 ++-- vec-alloc.md | 14 ++++++------ vec-drain.md | 2 +- vec-insert-remove.md | 2 +- vec-into-iter.md | 6 ++--- vec-push-pop.md | 6 ++--- vec-zsts.md | 4 ++-- 12 files changed, 89 insertions(+), 84 deletions(-) diff --git a/races.md b/races.md index 21a67f1..3b47502 100644 --- a/races.md +++ b/races.md @@ -12,11 +12,13 @@ it's impossible to alias a mutable reference, so it's impossible to perform a data race. Interior mutability makes this more complicated, which is largely why we have the Send and Sync traits (see below). -However Rust *does not* prevent general race conditions. This is -pretty fundamentally impossible, and probably honestly undesirable. Your hardware -is racy, your OS is racy, the other programs on your computer are racy, and the -world this all runs in is racy. Any system that could genuinely claim to prevent -*all* race conditions would be pretty awful to use, if not just incorrect. +**However Rust does not prevent general race conditions.** + +This is pretty fundamentally impossible, and probably honestly undesirable. Your +hardware is racy, your OS is racy, the other programs on your computer are racy, +and the world this all runs in is racy. Any system that could genuinely claim to +prevent *all* race conditions would be pretty awful to use, if not just +incorrect. So it's perfectly "fine" for a Safe Rust program to get deadlocked or do something incredibly stupid with incorrect synchronization. Obviously such a @@ -46,7 +48,7 @@ thread::spawn(move || { }); // Index with the value loaded from the atomic. 
This is safe because we -// read the atomic memory only once, and then pass a *copy* of that value +// read the atomic memory only once, and then pass a copy of that value // to the Vec's indexing implementation. This indexing will be correctly // bounds checked, and there's no chance of the value getting changed // in the middle. However our program may panic if the thread we spawned @@ -75,7 +77,7 @@ thread::spawn(move || { if idx.load(Ordering::SeqCst) < data.len() { unsafe { - // Incorrectly loading the idx *after* we did the bounds check. + // Incorrectly loading the idx after we did the bounds check. // It could have changed. This is a race condition, *and dangerous* // because we decided to do `get_unchecked`, which is `unsafe`. println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst))); diff --git a/repr-rust.md b/repr-rust.md index 639d64a..9073495 100644 --- a/repr-rust.md +++ b/repr-rust.md @@ -70,7 +70,7 @@ struct B { Rust *does* guarantee that two instances of A have their data laid out in exactly the same way. However Rust *does not* guarantee that an instance of A has the same field ordering or padding as an instance of B (in practice there's -no *particular* reason why they wouldn't, other than that its not currently +no particular reason why they wouldn't, other than that its not currently guaranteed). With A and B as written, this is basically nonsensical, but several other @@ -88,9 +88,9 @@ struct Foo { ``` Now consider the monomorphizations of `Foo` and `Foo`. If -Rust lays out the fields in the order specified, we expect it to *pad* the -values in the struct to satisfy their *alignment* requirements. So if Rust -didn't reorder fields, we would expect Rust to produce the following: +Rust lays out the fields in the order specified, we expect it to pad the +values in the struct to satisfy their alignment requirements. 
So if Rust +didn't reorder fields, we would expect it to produce the following: ```rust,ignore struct Foo { @@ -112,7 +112,7 @@ The latter case quite simply wastes space. An optimal use of space therefore requires different monomorphizations to have *different field orderings*. **Note: this is a hypothetical optimization that is not yet implemented in Rust -**1.0 +1.0** Enums make this consideration even more complicated. Naively, an enum such as: @@ -128,8 +128,8 @@ would be laid out as: ```rust struct FooRepr { - data: u64, // this is *really* either a u64, u32, or u8 based on `tag` - tag: u8, // 0 = A, 1 = B, 2 = C + data: u64, // this is either a u64, u32, or u8 based on `tag` + tag: u8, // 0 = A, 1 = B, 2 = C } ``` diff --git a/safe-unsafe-meaning.md b/safe-unsafe-meaning.md index 9093083..2f15b70 100644 --- a/safe-unsafe-meaning.md +++ b/safe-unsafe-meaning.md @@ -5,7 +5,7 @@ So what's the relationship between Safe and Unsafe Rust? How do they interact? Rust models the separation between Safe and Unsafe Rust with the `unsafe` keyword, which can be thought as a sort of *foreign function interface* (FFI) between Safe and Unsafe Rust. This is the magic behind why we can say Safe Rust -is a safe language: all the scary unsafe bits are relegated *exclusively* to FFI +is a safe language: all the scary unsafe bits are relegated exclusively to FFI *just like every other safe language*. However because one language is a subset of the other, the two can be cleanly @@ -61,13 +61,13 @@ The need for unsafe traits boils down to the fundamental property of safe code: **No matter how completely awful Safe code is, it can't cause Undefined Behaviour.** -This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be -*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe -code (or else you would degenerate into infinite spirals of paranoid despair). 
-It is generally regarded as ok to trust the standard library to be correct, as -`std` is effectively an extension of the language (and you *really* just have -to trust the language). If `std` fails to uphold the guarantees it declares, -then it's basically a language bug. +This means that Unsafe Rust, **the royal vanguard of Undefined Behaviour**, has to be +*super paranoid* about generic safe code. To be clear, Unsafe Rust is totally free to trust +specific safe code. Anything else would degenerate into infinite spirals of +paranoid despair. In particular it's generally regarded as ok to trust the standard library +to be correct. `std` is effectively an extension of the language, and you +really just have to trust the language. If `std` fails to uphold the +guarantees it declares, then it's basically a language bug. That said, it would be best to minimize *needlessly* relying on properties of concrete safe code. Bugs happen! Of course, I must reinforce that this is only @@ -75,36 +75,36 @@ a concern for Unsafe code. Safe code can blindly trust anyone and everyone as far as basic memory-safety is concerned. On the other hand, safe traits are free to declare arbitrary contracts, but because -implementing them is Safe, Unsafe can't trust those contracts to actually +implementing them is safe, unsafe code can't trust those contracts to actually be upheld. This is different from the concrete case because *anyone* can randomly implement the interface. There is something fundamentally different -about trusting a *particular* piece of code to be correct, and trusting *all the +about trusting a particular piece of code to be correct, and trusting *all the code that will ever be written* to be correct. For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate between types which can "just" be compared, and those that actually implement a -*total* ordering. Pretty much every API that wants to work with data that can be -compared *really* wants Ord data. 
For instance, a sorted map like BTreeMap +total ordering. Pretty much every API that wants to work with data that can be +compared wants Ord data. For instance, a sorted map like BTreeMap *doesn't even make sense* for partially ordered types. If you claim to implement Ord for a type, but don't actually provide a proper total ordering, BTreeMap will get *really confused* and start making a total mess of itself. Data that is inserted may be impossible to find! But that's okay. BTreeMap is safe, so it guarantees that even if you give it a -*completely* garbage Ord implementation, it will still do something *safe*. You -won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap +completely garbage Ord implementation, it will still do something *safe*. You +won't start reading uninitialized or unallocated memory. In fact, BTreeMap manages to not actually lose any of your data. When the map is dropped, all the destructors will be successfully called! Hooray! -However BTreeMap is implemented using a modest spoonful of Unsafe (most collections -are). That means that it is not necessarily *trivially true* that a bad Ord -implementation will make BTreeMap behave safely. Unsafe must be sure not to rely -on Ord *where safety is at stake*. Ord is provided by Safe, and safety is not -Safe's responsibility to uphold. +However BTreeMap is implemented using a modest spoonful of Unsafe Rust (most collections +are). That means that it's not necessarily *trivially true* that a bad Ord +implementation will make BTreeMap behave safely. BTreeMap must be sure not to rely +on Ord *where safety is at stake*. Ord is provided by safe code, and safety is not +safe code's responsibility to uphold. -But wouldn't it be grand if there was some way for Unsafe to trust *some* trait +But wouldn't it be grand if there was some way for Unsafe to trust some trait contracts *somewhere*? 
This is the problem that unsafe traits tackle: by marking -*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation +*the trait itself* as unsafe to implement, unsafe code can trust the implementation to uphold the trait's contract. Although the trait implementation may be incorrect in arbitrary other ways. @@ -126,7 +126,7 @@ But it's probably not the implementation you want. Rust has traditionally avoided making traits unsafe because it makes Unsafe pervasive, which is not desirable. Send and Sync are unsafe is because thread -safety is a *fundamental property* that Unsafe cannot possibly hope to defend +safety is a *fundamental property* that unsafe code cannot possibly hope to defend against in the same way it would defend against a bad Ord implementation. The only way to possibly defend against thread-unsafety would be to *not use threading at all*. Making every load and store atomic isn't even sufficient, @@ -135,10 +135,10 @@ in memory. For instance, the pointer and capacity of a Vec must be in sync. Even concurrent paradigms that are traditionally regarded as Totally Safe like message passing implicitly rely on some notion of thread safety -- are you -really message-passing if you pass a *pointer*? Send and Sync therefore require -some *fundamental* level of trust that Safe code can't provide, so they must be +really message-passing if you pass a pointer? Send and Sync therefore require +some fundamental level of trust that Safe code can't provide, so they must be unsafe to implement. To help obviate the pervasive unsafety that this would -introduce, Send (resp. Sync) is *automatically* derived for all types composed only +introduce, Send (resp. Sync) is automatically derived for all types composed only of Send (resp. Sync) values. 99% of types are Send and Sync, and 99% of those never actually say it (the remaining 1% is overwhelmingly synchronization primitives). 
diff --git a/send-and-sync.md b/send-and-sync.md index 5b00709..af8fb43 100644 --- a/send-and-sync.md +++ b/send-and-sync.md @@ -8,20 +8,19 @@ captures this with through the `Send` and `Sync` traits. * A type is Send if it is safe to send it to another thread. A type is Sync if * it is safe to share between threads (`&T` is Send). -Send and Sync are *very* fundamental to Rust's concurrency story. As such, a +Send and Sync are fundamental to Rust's concurrency story. As such, a substantial amount of special tooling exists to make them work right. First and -foremost, they're *unsafe traits*. This means that they are unsafe *to -implement*, and other unsafe code can *trust* that they are correctly +foremost, they're [unsafe traits][]. This means that they are unsafe to +implement, and other unsafe code can trust that they are correctly implemented. Since they're *marker traits* (they have no associated items like methods), correctly implemented simply means that they have the intrinsic properties an implementor should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour. -Send and Sync are also what Rust calls *opt-in builtin traits*. This means that, -unlike every other trait, they are *automatically* derived: if a type is -composed entirely of Send or Sync types, then it is Send or Sync. Almost all -primitives are Send and Sync, and as a consequence pretty much all types you'll -ever interact with are Send and Sync. +Send and Sync are also automatically derived traits. This means that, unlike +every other trait, if a type is composed entirely of Send or Sync types, then it +is Send or Sync. Almost all primitives are Send and Sync, and as a consequence +pretty much all types you'll ever interact with are Send and Sync. Major exceptions include: @@ -37,13 +36,12 @@ sense, one could argue that it would be "fine" for them to be marked as thread safe. 
However it's important that they aren't thread safe to prevent types that -*contain them* from being automatically marked as thread safe. These types have +contain them from being automatically marked as thread safe. These types have non-trivial untracked ownership, and it's unlikely that their author was necessarily thinking hard about thread safety. In the case of Rc, we have a nice -example of a type that contains a `*mut` that is *definitely* not thread safe. +example of a type that contains a `*mut` that is definitely not thread safe. -Types that aren't automatically derived can *opt-in* to Send and Sync by simply -implementing them: +Types that aren't automatically derived can simply implement them if desired: ```rust struct MyBox(*mut u8); @@ -52,12 +50,13 @@ unsafe impl Send for MyBox {} unsafe impl Sync for MyBox {} ``` -In the *incredibly rare* case that a type is *inappropriately* automatically -derived to be Send or Sync, then one can also *unimplement* Send and Sync: +In the *incredibly rare* case that a type is inappropriately automatically +derived to be Send or Sync, then one can also unimplement Send and Sync: ```rust #![feature(optin_builtin_traits)] +// I have some magic semantics for some synchronization primitive! struct SpecialThreadToken(u8); impl !Send for SpecialThreadToken {} @@ -77,3 +76,5 @@ largely behave like an `&` or `&mut` into the collection. TODO: better explain what can or can't be Send or Sync. Sufficient to appeal only to data races? + +[unsafe traits]: safe-unsafe-meaning.html diff --git a/subtyping.md b/subtyping.md index 767a0ac..3c57297 100644 --- a/subtyping.md +++ b/subtyping.md @@ -1,14 +1,14 @@ % Subtyping and Variance Although Rust doesn't have any notion of structural inheritance, it *does* -include subtyping. In Rust, subtyping derives entirely from *lifetimes*. Since +include subtyping. In Rust, subtyping derives entirely from lifetimes. 
Since lifetimes are scopes, we can partially order them based on the *contains* (outlives) relationship. We can even express this as a generic bound. -Subtyping on lifetimes in terms of that relationship: if `'a: 'b` ("a contains +Subtyping on lifetimes is in terms of that relationship: if `'a: 'b` ("a contains b" or "a outlives b"), then `'a` is a subtype of `'b`. This is a large source of confusion, because it seems intuitively backwards to many: the bigger scope is a -*sub type* of the smaller scope. +*subtype* of the smaller scope. This does in fact make sense, though. The intuitive reason for this is that if you expect an `&'a u8`, then it's totally fine for me to hand you an `&'static @@ -72,7 +72,7 @@ to be able to pass `&&'static str` where an `&&'a str` is expected. The additional level of indirection does not change the desire to be able to pass longer lived things where shorted lived things are expected. -However this logic *does not* apply to `&mut`. To see why `&mut` should +However this logic doesn't apply to `&mut`. To see why `&mut` should be invariant over T, consider the following code: ```rust,ignore @@ -109,7 +109,7 @@ between `'a` and T is that `'a` is a property of the reference itself, while T is something the reference is borrowing. If you change T's type, then the source still remembers the original type. However if you change the lifetime's type, no one but the reference knows this information, so it's fine. -Put another way, `&'a mut T` owns `'a`, but only *borrows* T. +Put another way: `&'a mut T` owns `'a`, but only *borrows* T. `Box` and `Vec` are interesting cases because they're variant, but you can definitely store values in them! This is where Rust gets really clever: it's @@ -118,7 +118,7 @@ in them *via a mutable reference*! The mutable reference makes the whole type invariant, and therefore prevents you from smuggling a short-lived type into them. 
-Being variant *does* allows `Box` and `Vec` to be weakened when shared +Being variant allows `Box` and `Vec` to be weakened when shared immutably. So you can pass a `&Box<&'static str>` where a `&Box<&'a str>` is expected. @@ -126,7 +126,7 @@ However what should happen when passing *by-value* is less obvious. It turns out that, yes, you can use subtyping when passing by-value. That is, this works: ```rust -fn get_box<'a>(str: &'a u8) -> Box<&'a str> { +fn get_box<'a>(str: &'a str) -> Box<&'a str> { // string literals are `&'static str`s Box::new("hello") } @@ -150,7 +150,7 @@ signature: fn foo(&'a str) -> usize; ``` -This signature claims that it can handle any `&str` that lives *at least* as +This signature claims that it can handle any `&str` that lives at least as long as `'a`. Now if this signature was variant over `&'a str`, that would mean @@ -159,10 +159,12 @@ fn foo(&'static str) -> usize; ``` could be provided in its place, as it would be a subtype. However this function -has a *stronger* requirement: it says that it can *only* handle `&'static str`s, -and nothing else. Therefore functions are not variant over their arguments. +has a stronger requirement: it says that it can only handle `&'static str`s, +and nothing else. Giving `&'a str`s to it would be unsound, as it's free to +assume that what it's given lives forever. Therefore functions are not variant +over their arguments. -To see why `Fn(T) -> U` should be *variant* over U, consider the following +To see why `Fn(T) -> U` should be variant over U, consider the following function signature: ```rust,ignore @@ -177,7 +179,7 @@ therefore completely reasonable to provide fn foo(usize) -> &'static str; ``` -in its place. Therefore functions *are* variant over their return type. +in its place. Therefore functions are variant over their return type. `*const` has the exact same semantics as `&`, so variance follows. 
`*mut` on the other hand can dereference to an `&mut` whether shared or not, so it is marked diff --git a/unwinding.md b/unwinding.md index 59494d8..3ad95dd 100644 --- a/unwinding.md +++ b/unwinding.md @@ -31,12 +31,12 @@ panics can only be caught by the parent thread. This means catching a panic requires spinning up an entire OS thread! This unfortunately stands in conflict to Rust's philosophy of zero-cost abstractions. -There is an *unstable* API called `catch_panic` that enables catching a panic +There is an unstable API called `catch_panic` that enables catching a panic without spawning a thread. Still, we would encourage you to only do this sparingly. In particular, Rust's current unwinding implementation is heavily optimized for the "doesn't unwind" case. If a program doesn't unwind, there should be no runtime cost for the program being *ready* to unwind. As a -consequence, *actually* unwinding will be more expensive than in e.g. Java. +consequence, actually unwinding will be more expensive than in e.g. Java. Don't build your programs to unwind under normal circumstances. Ideally, you should only panic for programming errors or *extreme* problems. diff --git a/vec-alloc.md b/vec-alloc.md index 93efbbb..fc7feba 100644 --- a/vec-alloc.md +++ b/vec-alloc.md @@ -60,7 +60,7 @@ of memory at once (e.g. half the theoretical address space). As such it's like the standard library as much as possible, so we'll just kill the whole program. -We said we don't want to use intrinsics, so doing *exactly* what `std` does is +We said we don't want to use intrinsics, so doing exactly what `std` does is out. Instead, we'll call `std::process::exit` with some random number. ```rust @@ -84,7 +84,7 @@ But Rust's only supported allocator API is so low level that we'll need to do a fair bit of extra work. We also need to guard against some special conditions that can occur with really large allocations or empty allocations. 
-In particular, `ptr::offset` will cause us *a lot* of trouble, because it has +In particular, `ptr::offset` will cause us a lot of trouble, because it has the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to not have dealt with this instruction, here's the basic story with GEP: alias analysis, alias analysis, alias analysis. It's super important to an optimizing @@ -102,7 +102,7 @@ As a simple example, consider the following fragment of code: If the compiler can prove that `x` and `y` point to different locations in memory, the two operations can in theory be executed in parallel (by e.g. loading them into different registers and working on them independently). -However in *general* the compiler can't do this because if x and y point to +However the compiler can't do this in general because if x and y point to the same location in memory, the operations need to be done to the same value, and they can't just be merged afterwards. @@ -118,7 +118,7 @@ possible. So that's what GEP's about, how can it cause us trouble? The first problem is that we index into arrays with unsigned integers, but -GEP (and as a consequence `ptr::offset`) takes a *signed integer*. This means +GEP (and as a consequence `ptr::offset`) takes a signed integer. This means that half of the seemingly valid indices into an array will overflow GEP and actually go in the wrong direction! As such we must limit all allocations to `isize::MAX` elements. This actually means we only need to worry about @@ -138,7 +138,7 @@ However since this is a tutorial, we're not going to be particularly optimal here, and just unconditionally check, rather than use clever platform-specific `cfg`s. -The other corner-case we need to worry about is *empty* allocations. There will +The other corner-case we need to worry about is empty allocations. There will be two kinds of empty allocations we need to worry about: `cap = 0` for all T, and `cap > 0` for zero-sized types. 
@@ -165,9 +165,9 @@ protected from being allocated anyway (a whole 4k, on many platforms). However what about for positive-sized types? That one's a bit trickier. In principle, you can argue that offsetting by 0 gives LLVM no information: either -there's an element before the address, or after it, but it can't know which. +there's an element before the address or after it, but it can't know which. However we've chosen to conservatively assume that it may do bad things. As -such we *will* guard against this case explicitly. +such we will guard against this case explicitly. *Phew* diff --git a/vec-drain.md b/vec-drain.md index 3be295f..4521bbd 100644 --- a/vec-drain.md +++ b/vec-drain.md @@ -130,7 +130,7 @@ impl<'a, T> Drop for Drain<'a, T> { impl Vec { pub fn drain(&mut self) -> Drain { // this is a mem::forget safety thing. If Drain is forgotten, we just - // leak the whole Vec's contents. Also we need to do this *eventually* + // leak the whole Vec's contents. Also we need to do this eventually // anyway, so why not do it now? self.len = 0; diff --git a/vec-insert-remove.md b/vec-insert-remove.md index 6f88a77..0a37170 100644 --- a/vec-insert-remove.md +++ b/vec-insert-remove.md @@ -10,7 +10,7 @@ handling the case where the source and destination overlap (which will definitely happen here). If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]` -using the *old* len. +using the old len. ```rust,ignore pub fn insert(&mut self, index: usize, elem: T) { diff --git a/vec-into-iter.md b/vec-into-iter.md index a9c1917..ebb0a79 100644 --- a/vec-into-iter.md +++ b/vec-into-iter.md @@ -21,8 +21,8 @@ read out the value pointed to at that end and move the pointer over by one. When the two pointers are equal, we know we're done. 
Note that the order of read and offset are reversed for `next` and `next_back` -For `next_back` the pointer is always *after* the element it wants to read next, -while for `next` the pointer is always *at* the element it wants to read next. +For `next_back` the pointer is always after the element it wants to read next, +while for `next` the pointer is always at the element it wants to read next. To see why this is, consider the case where every element but one has been yielded. @@ -124,7 +124,7 @@ impl DoubleEndedIterator for IntoIter { ``` Because IntoIter takes ownership of its allocation, it needs to implement Drop -to free it. However it *also* wants to implement Drop to drop any elements it +to free it. However it also wants to implement Drop to drop any elements it contains that weren't yielded. diff --git a/vec-push-pop.md b/vec-push-pop.md index 2ef15e3..b518e8a 100644 --- a/vec-push-pop.md +++ b/vec-push-pop.md @@ -32,14 +32,14 @@ pub fn push(&mut self, elem: T) { Easy! How about `pop`? Although this time the index we want to access is initialized, Rust won't just let us dereference the location of memory to move -the value out, because that *would* leave the memory uninitialized! For this we +the value out, because that would leave the memory uninitialized! For this we need `ptr::read`, which just copies out the bits from the target address and intrprets it as a value of type T. This will leave the memory at this address -*logically* uninitialized, even though there is in fact a perfectly good instance +logically uninitialized, even though there is in fact a perfectly good instance of T there. For `pop`, if the old len is 1, we want to read out of the 0th index. So we -should offset by the *new* len. +should offset by the new len. ```rust,ignore pub fn pop(&mut self) -> Option { diff --git a/vec-zsts.md b/vec-zsts.md index 931aed3..72e8a34 100644 --- a/vec-zsts.md +++ b/vec-zsts.md @@ -2,7 +2,7 @@ It's time. 
We're going to fight the spectre that is zero-sized types. Safe Rust *never* needs to care about this, but Vec is very intensive on raw pointers and -raw allocations, which are exactly the *only* two things that care about +raw allocations, which are exactly the two things that care about zero-sized types. We need to be careful of two things: * The raw allocator API has undefined behaviour if you pass in 0 for an @@ -22,7 +22,7 @@ So if the allocator API doesn't support zero-sized allocations, what on earth do we store as our allocation? Why, `heap::EMPTY` of course! Almost every operation with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs to be considered to store or load them. This actually extends to `ptr::read` and -`ptr::write`: they won't actually look at the pointer at all. As such we *never* need +`ptr::write`: they won't actually look at the pointer at all. As such we never need to change the pointer. Note however that our previous reliance on running out of memory before overflow is