# PhantomData
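When working with unsafe code, we can often end up in a situation where types or lifetimes are logically associated with a struct, but are not actually part of any of its fields. This most commonly occurs with lifetimes. For instance, the `Iter` for `&'a [T]` is (approximately) defined as follows: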
```rust,compile_fail
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
}
```
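However, because `'a` is unused within the struct's body, it is *unbounded*. [Because of the troubles this has historically caused][unused-param], unbounded lifetimes and types are *forbidden* in struct definitions. Therefore we must somehow refer to these types in the body. Doing this correctly is necessary for correct variance and drop checking.

[unused-param]: https://rust-lang.github.io/rfcs/0738-variance.html#the-corner-case-unused-parameters-and-parameters-that-are-only-used-unsafely

We do this using `PhantomData`, which is a special marker type. `PhantomData` consumes no space, but simulates a field of the given type for the purpose of static analysis. This was deemed to be less error-prone than explicitly telling the type system the kind of variance that you want, while also providing other useful things such as the information needed for auto traits and drop check.

`Iter` logically contains a bunch of `&'a T`s, so this is exactly what we tell the `PhantomData` to simulate: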
```rust
use std::marker;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: marker::PhantomData<&'a T>,
}
```
And that's it. The lifetime will be bounded, and your iterator will be covariant over `'a` and `T`. Everything Just Works.
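As a minimal sketch of how such a type might be used, the following fills in a constructor and an `Iterator` impl for an `Iter` of the shape above. The `new` constructor and the `sum_all` helper are hypothetical names introduced here for illustration, and zero-sized `T` is deliberately not handled; this is not the standard library's implementation.

```rust
use std::marker::PhantomData;

// Illustrative stand-in for a slice iterator (not the real std definition).
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    // Ties the raw pointers to the borrow `'a` and makes `Iter`
    // behave, for the compiler's analysis, as if it held `&'a T`s.
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iter<'a, T> {
    // Hypothetical constructor: borrows the slice for `'a`.
    fn new(slice: &'a [T]) -> Self {
        let range = slice.as_ptr_range();
        Iter { ptr: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            None
        } else {
            // SAFETY: `ptr` lies within the slice borrowed for `'a`, and we
            // never step past `end`. (Zero-sized `T` would need extra care.)
            unsafe {
                let item = &*self.ptr;
                self.ptr = self.ptr.add(1);
                Some(item)
            }
        }
    }
}

// Hypothetical helper showing the iterator in use.
fn sum_all(values: &[i32]) -> i32 {
    Iter::new(values).copied().sum()
}
```

Because the items are typed `&'a T`, the borrow checker will refuse to let the slice be invalidated while items obtained from the iterator are still in use, even though the struct only stores raw pointers.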
## Generic parameters and drop-checking

In the past, there used to be another thing to take into consideration.

This very documentation used to say:

> Another important example is Vec, which is (approximately) defined as follows:
>
> ```rust
> struct Vec<T> {
>     data: *const T, // *const for variance!
>     len: usize,
>     cap: usize,
> }
> ```
>
> Unlike the previous example, it *appears* that everything is exactly as we
> want. Every generic argument to Vec shows up in at least one field.
> Good to go!
>
> Nope.
>
> The drop checker will generously determine that `Vec<T>` does not own any values
> of type T. This will in turn make it conclude that it doesn't need to worry
> about Vec dropping any T's in its destructor for determining drop check
> soundness. This will in turn allow people to create unsoundness using
> Vec's destructor.
>
> In order to tell the drop checker that we *do* own values of type T, and
> therefore may drop some T's when *we* drop, we must add an extra `PhantomData`
> saying exactly that:
>
> ```rust
> use std::marker;
>
> struct Vec<T> {
>     data: *const T, // *const for variance!
>     len: usize,
>     cap: usize,
>     _owns_T: marker::PhantomData<T>,
> }
> ```
But ever since [RFC 1238](https://rust-lang.github.io/rfcs/1238-nonparametric-dropck.html),
**this is no longer true nor necessary**.

If you were to write:
```rust
struct Vec<T> {
    data: *const T, // `*const` for variance!
    len: usize,
    cap: usize,
}
# #[cfg(any())]
impl<T> Drop for Vec<T> { /* … */ }
```
then the existence of that `impl<T> Drop for Vec<T>` makes Rust consider
that this `Vec<T>` _owns_ values of type `T` (more precisely: may use values of type `T`
in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a
`Vec<T>` be dropped.
When a type already has a `Drop impl`, **adding an extra `_owns_T: PhantomData<T>` field
is thus _superfluous_ and accomplishes nothing**, dropck-wise (it still affects variance
and auto-traits).
- (advanced edge case: if the type containing the `PhantomData` has no `Drop` impl at all,
but still has drop glue (by having _another_ field with drop glue), then the
dropck/`#[may_dangle]` considerations mentioned herein do apply as well: a `PhantomData<T>`
field will then require `T` to be droppable whenever the containing type goes out of scope).
___
But this situation can sometimes lead to overly restrictive code. That's why the
standard library uses an unstable and `unsafe` attribute to opt back into the old
"unchecked" drop-checking behavior that this very documentation warned about: the
`#[may_dangle]` attribute.
### An exception: the special case of the standard library and its unstable `#[may_dangle]`
This section can be skipped if you are only writing your own library code; but if you are
curious about what the standard library does with the actual `Vec` definition, you'll notice
that it still needs to use a `_owns_T: PhantomData<T>` field for soundness.
<details><summary>Click here to see why</summary>

Consider the following example:
```rust
fn main() {
    let mut v: Vec<&str> = Vec::new();
    let s: String = "Short-lived".into();
    v.push(&s);
    drop(s);
} // <- `v` is dropped here
```
With a classical `impl<T> Drop for Vec<T> {` definition, the above [is denied].

[is denied]: https://rust.godbolt.org/z/ans15Kqz3

Indeed, in this case we have a `Vec</* T = */ &'s str>` vector of `'s`-lived references
to `str`ings, but in the case of `let s: String`, it is dropped before the `Vec` is, and
thus `'s` **is expired** by the time the `Vec` is dropped, and the
`impl<'s> Drop for Vec<&'s str> {` is used.
This means that if such `Drop` were to be used, it would be dealing with an _expired_, or
_dangling_ lifetime `'s`. But this is contrary to Rust principles, where by default all
Rust references involved in a function signature are non-dangling and valid to dereference.
Hence why Rust has to conservatively deny this snippet.
And yet, in the case of the real `Vec`, the `Drop` impl does not care about `&'s str`,
_since it has no drop glue of its own_: it only wants to deallocate the backing buffer.
In other words, it would be nice if the above snippet was somehow accepted, by special
casing `Vec`, or by relying on some special property of `Vec`: `Vec` could try to
_promise not to use the `&'s str`s it holds when being dropped_.
This is the kind of `unsafe` promise that can be expressed with `#[may_dangle]`:
```rust ,ignore
unsafe impl<#[may_dangle] 's> Drop for Vec<&'s str> { /* … */ }
```
or, more generally:
```rust ,ignore
unsafe impl<#[may_dangle] T> Drop for Vec<T> { /* … */ }
```
is the `unsafe` way to opt out of this conservative assumption that Rust's drop
checker makes about type parameters of a dropped instance not being allowed to dangle.
And when this is done, such as in the standard library, we need to be careful in the
case where `T` has drop glue of its own. In this instance, imagine replacing the
`&'s str`s with a `struct PrintOnDrop<'s> /* = */ (&'s str);` which would have a
`Drop` impl wherein the inner `&'s str` would be dereferenced and printed to the screen.
Indeed, `Drop for Vec<T> {`, before deallocating the backing buffer, does have to transitively
drop each `T` item when it has drop glue; in the case of `PrintOnDrop<'s>`, it means that
`Drop for Vec<PrintOnDrop<'s>>` has to transitively drop the `PrintOnDrop<'s>`s elements before
deallocating the backing buffer.
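The transitive drop described above can be observed with ordinary safe code. The sketch below uses a hypothetical `RecordOnDrop` type, a variant of the `PrintOnDrop` idea that pushes into a log instead of printing, so the behavior can be checked; `dropped_in_order` is likewise a name invented for this example.

```rust
use std::cell::RefCell;

// Hypothetical variant of `PrintOnDrop<'s>`: records instead of printing.
struct RecordOnDrop<'s> {
    text: &'s str,
    log: &'s RefCell<Vec<String>>,
}

impl Drop for RecordOnDrop<'_> {
    fn drop(&mut self) {
        // This dereferences the `&'s str`, so `'s` must not dangle here.
        self.log.borrow_mut().push(self.text.to_owned());
    }
}

// Drops a `Vec` of `RecordOnDrop`s and returns what their destructors saw.
fn dropped_in_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let s = String::from("Short-lived");
    {
        let v = vec![
            RecordOnDrop { text: &s, log: &log },
            RecordOnDrop { text: "static", log: &log },
        ];
        // Dropping the `Vec` first runs each element's destructor,
        // and only then frees the backing buffer.
        drop(v);
    }
    log.into_inner()
}
```

Here `s` outlives `v`, so the code is accepted. If `s` were dropped before `v`, each element's destructor would read a dangling `&'s str`, which is exactly the case the drop checker must keep rejecting.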
So when we said that `'s` `#[may_dangle]`, it was an excessively loose statement. We'd rather want
to say: "`'s` may dangle provided it not be involved in some transitive drop glue". Or, more generally,
"`T` may dangle provided it not be involved in some transitive drop glue". This "exception to the
exception" is a pervasive situation whenever **we own a `T`**. That's why Rust's `#[may_dangle]` is
smart enough to know of this opt-out, and will thus be disabled _when the generic parameter is held
in an owned fashion_ by the fields of the struct.
Hence why the standard library ends up with:
```rust
# #[cfg(any())]
// we pinky-swear not to use `T` when dropping a `Vec`…
unsafe impl<#[may_dangle] T> Drop for Vec<T> {
    fn drop(&mut self) {
        unsafe {
            if mem::needs_drop::<T>() {
                /* … except here, that is, … */
                ptr::drop_in_place::<[T]>(/* … */);
            }
            // …
            dealloc(/* … */)
            // …
        }
    }
}

struct Vec<T> {
    // … except for the fact that a `Vec` owns `T` items and
    // may thus be dropping `T` items on drop!
    _owns_T: core::marker::PhantomData<T>,
    ptr: *const T, // `*const` for variance (but this does not express ownership of a `T` *per se*)
    len: usize,
    cap: usize,
}
```
</details>

___
Raw pointers that own an allocation are such a pervasive pattern that the
standard library made a utility for itself called `Unique<T>` which:
* wraps a `*const T` for variance
* includes a `PhantomData<T>`
* auto-derives `Send`/`Sync` as if `T` was contained
* marks the pointer as `NonZero` for the null-pointer optimization
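Since `Unique<T>` is internal to the standard library, the following is only a rough approximation of the four properties listed above, built from stable pieces. `MyUnique`, its methods, and `niche_is_used` are hypothetical names, and using `NonNull<T>` for the non-null, covariant pointer is an assumption of this sketch rather than a statement about how `Unique<T>` is written.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical approximation of an owning, non-null, covariant pointer.
struct MyUnique<T> {
    ptr: NonNull<T>,       // covariant over `T`, never null
    _owns: PhantomData<T>, // analysed as if a `T` were stored inline
}

// SAFETY (assumption of this sketch): the pointee is uniquely owned,
// so thread-safety simply follows that of `T`.
unsafe impl<T: Send> Send for MyUnique<T> {}
unsafe impl<T: Sync> Sync for MyUnique<T> {}

impl<T> MyUnique<T> {
    fn new(value: T) -> Self {
        // Leak a `Box` to obtain an owning, non-null pointer.
        let ptr = NonNull::from(Box::leak(Box::new(value)));
        MyUnique { ptr, _owns: PhantomData }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` is valid for as long as `self` is alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for MyUnique<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from a leaked `Box` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}

// Checks that `None` fits in the null niche, i.e. that `Option` adds no size.
fn niche_is_used() -> bool {
    size_of::<Option<MyUnique<u64>>>() == size_of::<*const u64>()
}
```

The size check reflects what current compilers do with a `NonNull` field; it is an observation about layout in practice, not a documented guarantee for user-defined structs.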
## Table of `PhantomData` patterns
Here's a table of all the wonderful ways `PhantomData` could be used:
| Phantom type | variance of `'a` | variance of `T` | `Send`/`Sync`<br/>(or lack thereof) | dangling `'a` or `T` in drop glue<br/>(_e.g._, `#[may_dangle] Drop`) |
|-----------------------------|:----------------:|:-----------------:|:-----------------------------------------:|:------------------------------------------------:|
| `PhantomData<T>` | - | **cov**ariant | inherited | disallowed ("owns `T`") |
| `PhantomData<&'a T>` | **cov**ariant | **cov**ariant | `Send + Sync`<br/>requires<br/>`T : Sync` | allowed |
| `PhantomData<&'a mut T>` | **cov**ariant | **inv**ariant | inherited | allowed |
| `PhantomData<*const T>` | - | **cov**ariant | `!Send + !Sync` | allowed |
| `PhantomData<*mut T>` | - | **inv**ariant | `!Send + !Sync` | allowed |
| `PhantomData<fn(T)>` | - | **contra**variant | `Send + Sync` | allowed |
| `PhantomData<fn() -> T>` | - | **cov**ariant | `Send + Sync` | allowed |
| `PhantomData<fn(T) -> T>` | - | **inv**ariant | `Send + Sync` | allowed |
| `PhantomData<Cell<&'a ()>>` | **inv**ariant | - | `Send + !Sync` | allowed |
- Note: opting out of the `Unpin` auto-trait requires the dedicated [`PhantomPinned`] type instead.

[`PhantomPinned`]: ../core/marker/struct.PhantomPinned.html
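A few rows of the table can be spot-checked with code that only compiles if the stated property holds. This is a sketch, not an exhaustive test: `Cov`, `Contra`, `shorten`, `lengthen`, and `check_table_rows` are hypothetical names, and only positive properties are checked, since asserting `!Send` or `!Sync` needs extra machinery.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::size_of;

// These compile only if the bound holds, so a call is a static assertion.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Hypothetical marker-only wrappers, one per variance being checked.
struct Cov<'a>(PhantomData<&'a ()>);
struct Contra<T>(PhantomData<fn(T)>);

// Covariance over `'a`: a longer lifetime may be shortened.
fn shorten<'short, 'long: 'short>(x: Cov<'long>) -> Cov<'short> {
    x
}

// Contravariance over `T`: something accepting any `&'short str`
// may be used where only `&'static str` will ever be supplied.
fn lengthen<'short>(x: Contra<&'short str>) -> Contra<&'static str> {
    x
}

// Runs the static assertions and returns the combined size of some markers.
fn check_table_rows() -> usize {
    // `fn` pointers are `Send + Sync` whatever `T` is, even for `Cell`.
    assert_send::<PhantomData<fn(Cell<u8>)>>();
    assert_sync::<PhantomData<fn() -> Cell<u8>>>();
    // `&'a T` is `Send + Sync` when `T: Sync`.
    assert_send::<PhantomData<&'static u8>>();
    assert_sync::<PhantomData<&'static u8>>();
    // `Cell<&'a ()>` is `Send` (but not `Sync`, which is not checked here).
    assert_send::<PhantomData<Cell<&'static ()>>>();
    // `PhantomData` is zero-sized regardless of its parameter.
    size_of::<PhantomData<*mut String>>()
        + size_of::<Cov<'static>>()
        + size_of::<Contra<String>>()
}
```

Swapping the marker inside `Cov` for `PhantomData<Cell<&'a ()>>` should make `shorten` fail to compile, which is one way to confirm the invariant row of the table.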