diff --git a/README.md b/README.md index a801a21..874f6f2 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,39 @@ -% The Unsafe Rust Programming Language +% The Advanced Rust Programming Language # NOTE: This is a draft document, and may contain serious errors -**This document is about advanced functionality and low-level development practices -in the Rust Programming Language. Most of the things discussed won't matter -to the average Rust programmer. However if you wish to correctly write unsafe -code in Rust, this text contains invaluable information.** +So you've played around with Rust a bit. You've written a few simple programs and +you think you grok the basics. Maybe you've even read through +*[The Rust Programming Language][trpl]*. Now you want to get neck-deep in all the +nitty-gritty details of the language. You want to know those weird corner-cases. +You want to know what the heck `unsafe` really means, and how to properly use it. +This is the book for you. -The Unsafe Rust Programming Language (TURPL) seeks to complement -[The Rust Programming Language Book][trpl] (TRPL). -Where TRPL introduces the language and teaches the basics, TURPL dives deep into -the specification of the language, and all the nasty bits necessary to write -Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know -the basics of the language and systems programming. We will not explain the -stack or heap. We will not explain the basic syntax. +To be clear, this book goes into *serious* detail. We're going to dig into +exception-safety and pointer aliasing. We're going to talk about memory +models. We're even going to do some type-theory. This is stuff that you +absolutely *don't* need to know to write fast and safe Rust programs. +You could probably close this book *right now* and still have a productive +and happy career in Rust. +However if you intend to write unsafe code -- or just *really* want to dig into +the guts of the language -- this book contains *invaluable* information. +Unlike *The Rust Programming Language* we *will* be assuming considerable prior +knowledge. In particular, you should be comfortable with: -[trpl]: https://doc.rust-lang.org/book/ \ No newline at end of file +* Basic Systems Programming: + * Pointers + * [The stack and heap][] + * The memory hierarchy (caches) + * Threads + +* [Basic Rust][] + +Due to the nature of advanced Rust programming, we will be spending a lot of time +talking about *safety* and *guarantees*. In particular, a significant portion of +the book will be dedicated to correctly writing and understanding Unsafe Rust. + +[trpl]: https://doc.rust-lang.org/book/ +[The stack and heap]: https://doc.rust-lang.org/book/the-stack-and-the-heap.html +[Basic Rust]: https://doc.rust-lang.org/book/syntax-and-semantics.html diff --git a/SUMMARY.md b/SUMMARY.md index 1d66f1b..dc494d2 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -1,7 +1,7 @@ # Summary * [Meet Safe and Unsafe](meet-safe-and-unsafe.md) - * [What Do Safe and Unsafe Mean](safe-unsafe-meaning.md) + * [How Safe and Unsafe Interact](safe-unsafe-meaning.md) * [Working with Unsafe](working-with-unsafe.md) * [Data Layout](data.md) * [repr(Rust)](repr-rust.md) diff --git a/meet-safe-and-unsafe.md b/meet-safe-and-unsafe.md index dfcbd4a..5ff000f 100644 --- a/meet-safe-and-unsafe.md +++ b/meet-safe-and-unsafe.md @@ -1,82 +1,98 @@ % Meet Safe and Unsafe -Safe and Unsafe are Rust's chief engineers. - -TODO: ADORABLE PICTURES OMG - -Unsafe handles all the dangerous internal stuff. 
They build the foundations -and handle all the dangerous materials. By all accounts, Unsafe is really a bit -unproductive, because the nature of their work means that they have to spend a -lot of time checking and double-checking everything. What if there's an earthquake -on a leap year? Are we ready for that? Unsafe better be, because if they get -*anything* wrong, everything will blow up! What Unsafe brings to the table is -*quality*, not quantity. Still, nothing would ever get done if everything was -built to Unsafe's standards! - -That's where Safe comes in. Safe has to handle *everything else*. Since Safe needs -to *get work done*, they've grown to be fairly careless and clumsy! Safe doesn't worry -about all the crazy eventualities that Unsafe does, because life is too short to deal -with leap-year-earthquakes. Of course, this means there's some jobs that Safe just -can't handle. Safe is all about quantity over quality. - -Unsafe loves Safe to bits, but knows that they *can never trust them to do the -right thing*. Still, Unsafe acknowledges that not every problem needs quite the -attention to detail that they apply. Indeed, Unsafe would *love* if Safe could do -*everything* for them. To accomplish this, Unsafe spends most of their time -building *safe abstractions*. These abstractions handle all the nitty-gritty -details for Safe, and choose good defaults so that the simplest solution (which -Safe will inevitably use) is usually the *right* one. Once a safe abstraction is -built, Unsafe ideally needs to never work on it again, and Safe can blindly use -it in all their work. - -Unsafe's attention to detail means that all the things that they mark as ok for -Safe to use can be combined in arbitrarily ridiculous ways, and all the rules -that Unsafe is forced to uphold will never be violated. If they *can* be violated -by Safe, that means *Unsafe*'s the one in the wrong. Safe can work carelessly, -knowing that if anything blows up, it's not *their* fault. Safe can also call in -Unsafe at any time if there's a hard problem they can't quite work out, or if they -can't meet the client's quality demands. Of course, Unsafe will beg and plead Safe -to try their latest safe abstraction first! - -In addition to being adorable, Safe and Unsafe are what makes Rust possible. -Rust can be thought of as two different languages: Safe Rust, and Unsafe Rust. -Any time someone opines the guarantees of Rust, they are almost surely talking about -Safe. However Safe is not sufficient to write every program. For that, -we need the Unsafe superset. - -Most fundamentally, writing bindings to other languages -(such as the C exposed by your operating system) is never going to be safe. Rust -can't control what other languages do to program execution! However Unsafe is -also necessary to construct fundamental abstractions where the type system is not -sufficient to automatically prove what you're doing is sound. - -Indeed, the Rust standard library is implemented in Rust, and it makes substantial -use of Unsafe for implementing IO, memory allocation, collections, -synchronization, and other low-level computational primitives. - -Upon hearing this, many wonder why they would not simply just use C or C++ in place of -Rust (or just use a "real" safe language). If we're going to do unsafe things, why not -lean on these much more established languages? - -The most important difference between C++ and Rust is a matter of defaults: -Rust is 100% safe by default. 
Even when you *opt out* of safety in Rust, it is a modular -action. In deciding to work with unchecked uninitialized memory, this does not -suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`, -one does not have to suddenly worry about indexing out of bounds on `y`. -C and C++, by contrast, have pervasive unsafety baked into the language. Even the -modern best practices like `unique_ptr` have various safety pitfalls. - -It cannot be emphasized enough that Unsafe should be regarded as an exceptional -thing, not a normal one. Unsafe is often the domain of *fundamental libraries*: anything that needs -to make FFI bindings or define core abstractions. These fundamental libraries then expose -a safe interface for intermediate libraries and applications to build upon. And these -safe interfaces make an important promise: if your application segfaults, it's not your -fault. *They* have a bug. - -And really, how is that different from *any* safe language? Python, Ruby, and Java libraries -can internally do all sorts of nasty things. The languages themselves are no -different. Safe languages *regularly* have bugs that cause critical vulnerabilities. -The fact that Rust is written with a healthy spoonful of Unsafe is no different. -However it *does* mean that Rust doesn't need to fall back to the pervasive unsafety of -C to do the nasty things that need to get done. +Programmers in safe "high-level" languages face a fundamental dilemma. On one +hand, it would be *really* great to just say what you want and not worry about +how it's done. On the other hand, that can lead to some *really* poor +performance. It may be necessary to drop down to less clear or idiomatic +practices to get the performance characteristics you want. Or maybe you just +throw up your hands in disgust and decide to shell out to an implementation in +a less sugary-wonderful *unsafe* language. +Worse, when you want to talk directly to the operating system, you *have* to +talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the +lingua-franca of the programming world. +Even other safe languages generally expose C interfaces for the world at large! +Regardless of *why* you're doing it, as soon as your program starts talking to +C it stops being safe. + +With that said, Rust is *totally* a safe programming language. + +Well, Rust *has* a safe programming language. Let's step back a bit. + +Rust can be thought of as being composed of two +programming languages: *Safe* and *Unsafe*. Safe is For Reals Totally Safe. +Unsafe, unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe lets +you do some really crazy unsafe things. + +Safe is *the* Rust programming language. If all you do is write Safe Rust, +you will never have to worry about type-safety or memory-safety. You will never +endure a null or dangling pointer, or any of that Undefined Behaviour nonsense. + +*That's totally awesome*. + +The standard library also gives you enough utilities out-of-the-box that you'll +be able to write awesome high-performance applications and libraries in pure +idiomatic Safe Rust. + +But maybe you want to talk to another language. Maybe you're writing a +low-level abstraction not exposed by the standard library. Maybe you're +*writing* the standard library (which is written entirely in Rust). Maybe you +need to do something the type-system doesn't understand and just *frob some dang +bits*. Maybe you need Unsafe Rust. 
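+
+Just to give you a taste before we define anything, here is a tiny, purely
+illustrative sketch (nothing in it is required reading, and the names are made
+up for this example) of the kind of bit-frobbing that only Unsafe Rust will
+let you write:
+
+```rust
+// Read the raw bit pattern of a float by casting through raw pointers.
+// Creating the raw pointer is Safe; *dereferencing* it requires `unsafe`.
+fn main() {
+    let x: f32 = 1.0;
+    let p = &x as *const f32 as *const u32;
+    let bits = unsafe { *p };
+    println!("1.0f32 is stored as {:#010x}", bits); // prints 0x3f800000
+}
+```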
+ +Unsafe Rust is exactly like Safe Rust with *all* the same rules and semantics. +However Unsafe Rust lets you do some *extra* things that are Definitely Not Safe. + +The only things that are different in Unsafe Rust are that you can: + +* Dereference raw pointers +* Call `unsafe` functions (including C functions, intrinsics, and the raw allocator) +* Implement `unsafe` traits +* Mutate statics + +That's it. The reason these operations are relegated to Unsafe is that misusing +any of these things will cause the ever dreaded Undefined Behaviour. Invoking +Undefined Behaviour gives the compiler full rights to do arbitrarily bad things +to your program. You definitely *should not* invoke Undefined Behaviour. + +Unlike C, Undefined Behaviour is pretty limited in scope in Rust. All the core +language cares about is preventing the following things: + +* Dereferencing null or dangling pointers +* Reading [uninitialized memory][] +* Breaking the [pointer aliasing rules][] +* Producing invalid primitive values: + * dangling/null references + * a `bool` that isn't 0 or 1 + * an undefined `enum` discriminant + * a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF] + * A non-utf8 `str` +* Unwinding into another language +* Causing a [data race][race] +* Double-dropping a value + +That's it. That's all the Undefined Behaviour baked into Rust. Of course, unsafe +functions and traits are free to declare arbitrary other constraints that a +program must maintain to avoid Undefined Behaviour. However these are generally +just things that will transitively lead to one of the above problems. Some +additional constraints may also derive from compiler intrinsics that make special +assumptions about how code can be optimized. + +Rust is otherwise quite permissive with respect to other dubious operations. Rust +considers it "safe" to: + +* Deadlock +* Have a [race condition][race] +* Leak memory +* Fail to call destructors +* Overflow integers +* Abort the program +* Delete the production database + +However any program that actually manages to do such a thing is *probably* +incorrect. Rust provides lots of tools to make these things rare, but +these problems are considered impractical to categorically prevent. + +[pointer aliasing rules]: references.html +[uninitialized memory]: uninitialized.html +[race]: races.html diff --git a/safe-unsafe-meaning.md b/safe-unsafe-meaning.md index 6c3d408..082970d 100644 --- a/safe-unsafe-meaning.md +++ b/safe-unsafe-meaning.md @@ -1,38 +1,17 @@ -% What do Safe and Unsafe really mean? - -Rust cares about preventing the following things: - -* Dereferencing null or dangling pointers -* Reading [uninitialized memory][] -* Breaking the [pointer aliasing rules][] -* Producing invalid primitive values: - * dangling/null references - * a `bool` that isn't 0 or 1 - * an undefined `enum` discriminant - * a `char` larger than char::MAX (TODO: check if stronger restrictions apply) - * A non-utf8 `str` -* Unwinding into another language -* Causing a [data race][] -* Invoking Misc. Undefined Behaviour (in e.g. compiler intrinsics) - -That's it. That's all the Undefined Behaviour in Rust. Libraries are free to -declare arbitrary requirements if they could transitively cause memory safety -issues, but it all boils down to the above actions. Rust is otherwise -quite permisive with respect to other dubious operations. 
Rust considers it
-"safe" to:
-
-* Deadlock
-* Have a Race Condition
-* Leak memory
-* Fail to call destructors
-* Overflow integers
-* Delete the production database
-
-However any program that does such a thing is *probably* incorrect. Rust
-provides lots of tools to make doing these things rare, but these problems are
-considered impractical to categorically prevent.
-
-Rust models the seperation between Safe and Unsafe with the `unsafe` keyword.
+% How Safe and Unsafe Interact
+
+So what's the relationship between Safe and Unsafe? How do they interact?
+
+Rust models the separation between Safe and Unsafe with the `unsafe` keyword, which
+can be thought of as a sort of *foreign function interface* (FFI) between Safe and Unsafe.
+This is the magic behind why we can say Safe is a safe language: all the scary unsafe
+bits are relegated *exclusively* to FFI *just like every other safe language*.
+
+However because one language is a subset of the other, the two can be cleanly
+intermixed as long as the boundary between Safe and Unsafe is denoted with the
+`unsafe` keyword. No need to write headers, initialize runtimes, or any of that
+other FFI boilerplate.
+
 
 There are several places `unsafe` can appear in Rust today, which can largely
 be grouped into two categories:
@@ -44,7 +23,7 @@ you to write `unsafe` elsewhere:
   the danger.
 * On trait declarations, `unsafe` is declaring that *implementing* the trait
   is an unsafe operation, as it has contracts that other unsafe code is free to
-  trust blindly.
+  trust blindly. (More on this below.)
+
 * I am declaring that I have, to the best of my knowledge, adhered to the
 unchecked contracts:
@@ -55,14 +34,14 @@ unchecked contracts:
 
 There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
 historical reasons and is in the process of being phased out. See the section on
-[destructors][] for details.
+[drop flags][] for details.
 
 Some examples of unsafe functions:
 
 * `slice::get_unchecked` will perform unchecked indexing, allowing memory
   safety to be freely violated.
 * `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
-  not "in bounds" as defined by LLVM (see the lifetimes section for details).
+  not "in bounds" as defined by LLVM.
 * `mem::transmute` reinterprets some value as having the given type,
   bypassing type safety in arbitrary ways. (see [conversions][] for details)
 * All FFI functions are `unsafe` because they can do arbitrary things.
@@ -72,14 +51,34 @@ Some examples of unsafe functions:
 As of Rust 1.0 there are exactly two unsafe traits:
 
 * `Send` is a marker trait (it has no actual API) that promises implementors
-  are safe to send to another thread.
+  are safe to send (move) to another thread.
 * `Sync` is a marker trait that promises that threads can safely share
   implementors through a shared reference.
 
-The need for unsafe traits boils down to the fundamental lack of trust that Unsafe
-has for Safe. All safe traits are free to declare arbitrary contracts, but because
-implementing them is a job for Safe, Unsafe can't trust those contracts to actually
-be upheld.
+The need for unsafe traits boils down to the fundamental property of safe code:
+
+**No matter how completely awful Safe code is, it can't cause Undefined
+Behaviour.**
+
+This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be
+*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe
+code (or else you would degenerate into infinite spirals of paranoid despair).
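+
+To make that paranoia concrete, here is a small sketch (the `ClaimsLen` trait
+and `first_byte` function are made up purely for illustration) of unsafe code
+that is unsound *precisely because* it trusts a contract that any Safe
+implementation is free to break:
+
+```rust
+// A safe trait can *document* a contract, but nothing enforces it.
+trait ClaimsLen {
+    /// Claimed (but unenforceable) contract: returns `self.as_ref().len()`.
+    fn claimed_len(&self) -> usize;
+}
+
+// UNSOUND: if an implementation lies about its length, this unchecked read
+// is Undefined Behaviour, and the bug is in *this* unsafe code, not in the
+// perfectly Safe code that lied.
+fn first_byte<C: ClaimsLen + AsRef<[u8]>>(c: &C) -> Option<u8> {
+    if c.claimed_len() > 0 {
+        unsafe { Some(*c.as_ref().get_unchecked(0)) }
+    } else {
+        None
+    }
+}
+```
+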
+It is generally regarded as ok to trust the standard library to be correct, as
+std is effectively an extension of the language (and you *really* just have to trust
+the language). If `std` fails to uphold the guarantees it declares, then it's
+basically a language bug.
+
+That said, it would be best to minimize *needlessly* relying on properties of
+concrete safe code. Bugs happen! Of course, I must reinforce that this is only
+a concern for Unsafe code. Safe code can blindly trust anyone and everyone
+as far as basic memory-safety is concerned.
+
+On the other hand, safe traits are free to declare arbitrary contracts, but because
+implementing them is Safe, Unsafe can't trust those contracts to actually
+be upheld. This is different from the concrete case because *anyone* can
+randomly implement the interface. There is something fundamentally different
+about trusting a *particular* piece of code to be correct, and trusting *all the
+code that will ever be written* to be correct.
 
 For instance Rust has `PartialOrd` and `Ord` traits to try to differentiate
 between types which can "just" be compared, and those that actually implement a
@@ -99,14 +98,13 @@ destructors will be successfully called! Hooray!
 However BTreeMap is implemented using a modest spoonful of Unsafe (most collections
 are). That means that it is not necessarily *trivially true* that a bad Ord
 implementation will make BTreeMap behave safely. Unsafe must be sure not to rely
-on Ord *where safety is at stake*, because Ord is provided by Safe, and memory
-safety is not Safe's responsibility to uphold. *It must be impossible for Safe
-code to violate memory safety*.
+on Ord *where safety is at stake*. Ord is provided by Safe, and safety is not
+Safe's responsibility to uphold.
 
 But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
 contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
 *the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
-to be correct (because Unsafe can trust themself).
+to be correct.
 
 Rust has traditionally avoided making traits unsafe because it makes Unsafe
 pervasive, which is not desirable. Send and Sync are unsafe is because
@@ -114,11 +112,12 @@ thread safety is a *fundamental property* that Unsafe cannot possibly hope to
 defend against in the same way it would defend against a bad Ord implementation.
 The only way to possibly defend against thread-unsafety would be to *not use
 threading at all*. Making every operation atomic isn't even sufficient, because
-it's possible for complex invariants between disjoint locations in memory.
+it's possible for complex invariants to exist between disjoint locations in
+memory. For instance, the pointer and capacity of a Vec must be in sync.
 
 Even concurrent paradigms that are traditionally regarded as Totally Safe like
 message passing implicitly rely on some notion of thread safety -- are you
-really message-passing if you send a *pointer*? Send and Sync therefore require
+really message-passing if you pass a *pointer*? Send and Sync therefore require
 some *fundamental* level of trust that Safe code can't provide, so they must be
 unsafe to implement. To help obviate the pervasive unsafety that this would
 introduce, Send (resp. Sync) is *automatically* derived for all types composed only
@@ -128,8 +127,6 @@ primitives).
-[pointer aliasing rules]: lifetimes.html#references
-[uninitialized memory]: uninitialized.html
-[data race]: concurrency.html
-[destructors]: raii.html
-[conversions]: conversions.html
\ No newline at end of file
+
+[drop flags]: drop-flags.html
+[conversions]: conversions.html
diff --git a/working-with-unsafe.md b/working-with-unsafe.md
index 57b71d1..b1174a7 100644
--- a/working-with-unsafe.md
+++ b/working-with-unsafe.md
@@ -1,11 +1,11 @@
 % Working with Unsafe
 
-Rust generally only gives us the tools to talk about safety in a scoped and
-binary manner. Unfortunately reality is significantly more complicated than that.
+Rust generally only gives us the tools to talk about Unsafe in a scoped and
+binary manner. Unfortunately, reality is significantly more complicated than that.
 For instance, consider the following toy function:
 
 ```rust
-fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
+pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
     if idx < arr.len() {
         unsafe {
             Some(*arr.get_unchecked(idx))
@@ -22,7 +22,7 @@ function, the scope of the unsafe block is questionable. Consider changing
 the `<` to a `<=`:
 
 ```rust
-fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
+pub fn index(idx: usize, arr: &[u8]) -> Option<u8> {
     if idx <= arr.len() {
         unsafe {
             Some(*arr.get_unchecked(idx))
@@ -45,7 +45,7 @@ implementation of `Vec`:
 
 ```rust
 // Note this defintion is insufficient. See the section on lifetimes.
-struct Vec<T> {
+pub struct Vec<T> {
     ptr: *mut T,
     len: usize,
     cap: usize,
@@ -55,7 +55,7 @@ struct Vec<T> {
 // We currently live in a nice imaginary world of only positive fixed-size
 // types.
 impl<T> Vec<T> {
-    fn push(&mut self, elem: T) {
+    pub fn push(&mut self, elem: T) {
         if self.len == self.cap {
             // not important for this example
             self.reallocate();
@@ -80,9 +80,25 @@ adding the following method:
 
 This code is safe, but it is also completely unsound. Changing the capacity
 violates the invariants of Vec (that `cap` reflects the allocated space in the
-Vec). This is not something the rest of `Vec` can guard against. It *has* to
+Vec). This is not something the rest of Vec can guard against. It *has* to
 trust the capacity field because there's no way to verify it.
 
 `unsafe` does more than pollute a whole function: it pollutes a whole *module*.
 Generally, the only bullet-proof way to limit the scope of unsafe code is at the
 module boundary with privacy.
+
+However this works *perfectly*. The existence of `make_room` is *not* a
+problem for the soundness of Vec because we didn't mark it as public. Only the
+module that defines this function can call it. Also, `make_room` directly
+accesses the private fields of Vec, so it can only be written in the same module
+as Vec.
+
+It is therefore possible for us to write a completely safe abstraction that
+relies on complex invariants. This is *critical* to the relationship between
+Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust
+*some* Safe code, but can't trust *arbitrary* Safe code. However if Unsafe
+couldn't prevent client Safe code from messing with its state in arbitrary ways,
+safety would be a lost cause.
+
+Safety lives!
+
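+To round the chapter out, here is a rough sketch of what such a privacy-protected
+abstraction looks like end-to-end. The `Bytes` type is invented for illustration
+(it is *not* the book's Vec), but the shape is the same: unsafe code inside,
+private fields guarding the invariant, and a safe public API outside.
+
+```rust
+mod safe_abstraction {
+    pub struct Bytes {
+        data: Box<[u8]>, // private: only this module can touch the fields
+        len: usize,      // invariant: len <= data.len()
+    }
+
+    impl Bytes {
+        pub fn new(data: Box<[u8]>) -> Bytes {
+            let len = data.len();
+            Bytes { data: data, len: len }
+        }
+
+        pub fn get(&self, idx: usize) -> Option<u8> {
+            if idx < self.len {
+                // Sound only because every function in this module maintains
+                // `len <= data.len()`; code outside the module can't reach the
+                // fields to break that.
+                unsafe { Some(*self.data.get_unchecked(idx)) }
+            } else {
+                None
+            }
+        }
+    }
+}
+
+fn main() {
+    let b = safe_abstraction::Bytes::new(vec![1u8, 2, 3].into_boxed_slice());
+    assert_eq!(b.get(1), Some(2));
+    assert_eq!(b.get(3), None);
+    // b.len = 1_000_000; // does not compile: `len` is private
+}
+```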