You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
nomicon/repr-rust.md

4.9 KiB

% repr(Rust)

First and foremost, all types have an alignment specified in bytes. The alignment of a type specifies what addresses are valid to store the value at. A value of alignment n must only be stored at an address that is a multiple of n. So alignment 2 means you must be stored at an even address, and 1 means that you can be stored anywhere. Alignment is at least 1, and always a power of 2. Most primitives are generally aligned to their size, although this is platform-specific behaviour. In particular, on x86 u64 and f64 may be only aligned to 32 bits.

A type's size must always be a multiple of its alignment. This ensures that an array of that type may always be indexed by offsetting by a multiple of its size. Note that the size and alignment of a type may not be known statically in the case of dynamically sized types.

Rust gives you the following ways to lay out composite data:

  • structs (named product types)
  • tuples (anonymous product types)
  • arrays (homogeneous product types)
  • enums (named sum types -- tagged unions)

An enum is said to be C-like if none of its variants have associated data.

Composite structures will have an alignment equal to the maximum of their fields' alignment. Rust will consequently insert padding where necessary to ensure that all fields are properly aligned and that the overall type's size is a multiple of its alignment. For instance:

struct A {
    a: u8,
    b: u32,
    c: u16,
}

will be 32-bit aligned on an architecture that aligns these primitives to their respective sizes. The whole struct will therefore have a size that is a multiple of 32-bits. It will potentially become:

struct A {
    a: u8,
    _pad1: [u8; 3], // to align `b`
    b: u32,
    c: u16,
    _pad2: [u8; 2], // to make overall size multiple of 4
}

There is no indirection for these types; all data is stored within the struct, as you would expect in C. However with the exception of arrays (which are densely packed and in-order), the layout of data is not by default specified in Rust. Given the two following struct definitions:

struct A {
    a: i32,
    b: u64,
}

struct B {
    a: i32,
    b: u64,
}

Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not currently guarantee that an instance of A has the same field ordering or padding as an instance of B, though in practice there's no reason why they wouldn't.

With A and B as written, this point would seem to be pedantic, but several other features of Rust make it desirable for the language to play with data layout in complex ways.

For instance, consider this struct:

struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

Now consider the monomorphizations of Foo<u32, u16> and Foo<u16, u32>. If Rust lays out the fields in the order specified, we expect it to pad the values in the struct to satisfy their alignment requirements. So if Rust didn't reorder fields, we would expect it to produce the following:

struct Foo<u16, u32> {
    count: u16,
    data1: u16,
    data2: u32,
}

struct Foo<u32, u16> {
    count: u16,
    _pad1: u16,
    data1: u32,
    data2: u16,
    _pad2: u16,
}

The latter case quite simply wastes space. An optimal use of space therefore requires different monomorphizations to have different field orderings.

Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0

Enums make this consideration even more complicated. Naively, an enum such as:

enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

would be laid out as:

struct FooRepr {
    data: u64, // this is either a u64, u32, or u8 based on `tag`
    tag: u8,   // 0 = A, 1 = B, 2 = C
}

And indeed this is approximately how it would be laid out in general (modulo the size and position of tag).

However there are several cases where such a representation is inefficient. The classic case of this is Rust's "null pointer optimization": an enum consisting of a single outer unit variant (e.g. None) and a (potentially nested) non- nullable pointer variant (e.g. &T) makes the tag unnecessary, because a null pointer value can safely be interpreted to mean that the unit variant is chosen instead. The net result is that, for example, size_of::<Option<&T>>() == size_of::<&T>().

There are many types in Rust that are, or contain, non-nullable pointers such as Box<T>, Vec<T>, String, &T, and &mut T. Similarly, one can imagine nested enums pooling their tags into a single discriminant, as they are by definition known to have a limited range of valid values. In principle enums could use fairly elaborate algorithms to cache bits throughout nested types with special constrained representations. As such it is especially desirable that we leave enum layout unspecified today.