diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index e33dcbb..0b34952 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -53,3 +53,4 @@
 * [Handling Zero-Sized Types](vec-zsts.md)
 * [Final Code](vec-final.md)
 * [Implementing Arc and Mutex](arc-and-mutex.md)
+* [FFI](ffi.md)
diff --git a/src/ffi.md b/src/ffi.md
new file mode 100644
index 0000000..03693bb
--- /dev/null
+++ b/src/ffi.md
@@ -0,0 +1,756 @@
+# Foreign Function Interface
+
+# Introduction
+
+This guide will use the [snappy](https://github.com/google/snappy)
+compression/decompression library as an introduction to writing bindings for
+foreign code. Rust is currently unable to call directly into a C++ library, but
+snappy includes a C interface (documented in
+[`snappy-c.h`](https://github.com/google/snappy/blob/master/snappy-c.h)).
+
+## A note about libc
+
+Many of these examples use [the `libc` crate][libc], which provides various
+type definitions for C types, among other things. If you’re trying these
+examples yourself, you’ll need to add `libc` to your `Cargo.toml`:
+
+```toml
+[dependencies]
+libc = "0.2.0"
+```
+
+and add `extern crate libc;` to your crate root.
+
+[libc]: https://crates.io/crates/libc
+
+## Calling foreign functions
+
+The following is a minimal example of calling a foreign function which will
+compile if snappy is installed:
+
+```rust,ignore
+extern crate libc;
+use libc::size_t;
+
+#[link(name = "snappy")]
+extern {
+    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
+}
+
+fn main() {
+    let x = unsafe { snappy_max_compressed_length(100) };
+    println!("max compressed length of a 100 byte buffer: {}", x);
+}
+```
+
+The `extern` block is a list of function signatures in a foreign library, in
+this case with the platform's C ABI. The `#[link(...)]` attribute is used to
+instruct the linker to link against the snappy library so the symbols are
+resolved.
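To try the same pattern without installing snappy, here is a minimal sketch that binds `strlen` from the platform C library instead. It assumes a target where the Rust standard library already links the C library (as on the common Linux, macOS, and Windows targets), so no `#[link]` attribute is needed; `c_string_len` is a hypothetical wrapper name, not part of any library.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// `strlen` comes from the platform C library. Assumption: the standard library
// already links that library on this target, so no #[link] attribute is needed.
// C's `size_t` is declared as `usize`, which has the same size on these targets.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: the length in bytes of `s` as C would see it.
fn c_string_len(s: &str) -> usize {
    // `CString` appends the trailing NUL byte that `strlen` relies on and
    // rejects input containing an interior NUL.
    let c = CString::new(s).expect("string must not contain NUL bytes");
    // SAFETY: `c.as_ptr()` points to a valid NUL-terminated buffer that stays
    // alive until `c` is dropped at the end of this function.
    unsafe { strlen(c.as_ptr()) }
}

fn main() {
    println!("strlen(\"snappy\") = {}", c_string_len("snappy"));
}
```

Apart from the missing `#[link]` attribute, the only difference from the snappy example is the safe wrapper: it upholds `strlen`'s precondition (a valid NUL-terminated pointer) itself, so callers never need an `unsafe` block.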
+
+Foreign functions are assumed to be unsafe, so calls to them need to be wrapped
+with `unsafe {}` as a promise to the compiler that everything contained within
+truly is safe. C libraries often expose interfaces that aren't thread-safe, and
+almost any function that takes a pointer argument isn't valid for all possible
+inputs, since the pointer could be dangling and raw pointers fall outside of
+Rust's safe memory model.
+
+When declaring the argument types to a foreign function, the Rust compiler
+cannot check if the declaration is correct, so specifying it correctly is part
+of keeping the binding correct at runtime.
+
+The `extern` block can be extended to cover the entire snappy API:
+
+```rust,ignore
+extern crate libc;
+use libc::{c_int, size_t};
+
+#[link(name = "snappy")]
+extern {
+    fn snappy_compress(input: *const u8,
+                       input_length: size_t,
+                       compressed: *mut u8,
+                       compressed_length: *mut size_t) -> c_int;
+    fn snappy_uncompress(compressed: *const u8,
+                         compressed_length: size_t,
+                         uncompressed: *mut u8,
+                         uncompressed_length: *mut size_t) -> c_int;
+    fn snappy_max_compressed_length(source_length: size_t) -> size_t;
+    fn snappy_uncompressed_length(compressed: *const u8,
+                                  compressed_length: size_t,
+                                  result: *mut size_t) -> c_int;
+    fn snappy_validate_compressed_buffer(compressed: *const u8,
+                                         compressed_length: size_t) -> c_int;
+}
+# fn main() {}
+```
+
+# Creating a safe interface
+
+The raw C API needs to be wrapped to provide memory safety and make use of higher-level concepts
+like vectors. A library can choose to expose only the safe, high-level interface and hide the unsafe
+internal details.
+
+Wrapping the functions which expect buffers involves passing Rust slices and vectors to C as raw
+pointers, obtained with methods such as `as_ptr` and `as_mut_ptr`. Rust's vectors are guaranteed to
+be a contiguous block of memory. The length is the number of elements currently contained, and the
+capacity is the total size in elements of the allocated memory. The length is less than or equal to
+the capacity.
+
+```rust,ignore
+# extern crate libc;
+# use libc::{c_int, size_t};
+# unsafe fn snappy_validate_compressed_buffer(_: *const u8, _: size_t) -> c_int { 0 }
+# fn main() {}
+pub fn validate_compressed_buffer(src: &[u8]) -> bool {
+    unsafe {
+        snappy_validate_compressed_buffer(src.as_ptr(), src.len() as size_t) == 0
+    }
+}
+```
+
+The `validate_compressed_buffer` wrapper above makes use of an `unsafe` block, but it makes the
+guarantee that calling it is safe for all inputs by leaving off `unsafe` from the function
+signature.
+
+The `snappy_compress` and `snappy_uncompress` functions are more complex, since a buffer has to be
+allocated to hold the output too.
+
+The `snappy_max_compressed_length` function can be used to allocate a vector with the maximum
+required capacity to hold the compressed output. The vector can then be passed to the
+`snappy_compress` function as an output parameter. A second output parameter receives the actual
+compressed length, which is then used to set the vector's length.
+
+```rust,ignore
+# extern crate libc;
+# use libc::{size_t, c_int};
+# unsafe fn snappy_compress(a: *const u8, b: size_t, c: *mut u8,
+#                           d: *mut size_t) -> c_int { 0 }
+# unsafe fn snappy_max_compressed_length(a: size_t) -> size_t { a }
+# fn main() {}
+pub fn compress(src: &[u8]) -> Vec<u8> {
+    unsafe {
+        let srclen = src.len() as size_t;
+        let psrc = src.as_ptr();
+
+        let mut dstlen = snappy_max_compressed_length(srclen);
+        let mut dst = Vec::with_capacity(dstlen as usize);
+        let pdst = dst.as_mut_ptr();
+
+        snappy_compress(psrc, srclen, pdst, &mut dstlen);
+        dst.set_len(dstlen as usize);
+        dst
+    }
+}
+```
+
+Decompression is similar, because snappy stores the uncompressed size as part of the compression
+format and `snappy_uncompressed_length` will retrieve the exact buffer size required.
+
+```rust,ignore
+# extern crate libc;
+# use libc::{size_t, c_int};
+# unsafe fn snappy_uncompress(compressed: *const u8,
+#                             compressed_length: size_t,
+#                             uncompressed: *mut u8,
+#                             uncompressed_length: *mut size_t) -> c_int { 0 }
+# unsafe fn snappy_uncompressed_length(compressed: *const u8,
+#                                      compressed_length: size_t,
+#                                      result: *mut size_t) -> c_int { 0 }
+# fn main() {}
+pub fn uncompress(src: &[u8]) -> Option<Vec<u8>> {
+    unsafe {
+        let srclen = src.len() as size_t;
+        let psrc = src.as_ptr();
+
+        let mut dstlen: size_t = 0;
+        if snappy_uncompressed_length(psrc, srclen, &mut dstlen) != 0 {
+            return None; // The length header could not be parsed.
+        }
+
+        let mut dst = Vec::with_capacity(dstlen as usize);
+        let pdst = dst.as_mut_ptr();
+
+        if snappy_uncompress(psrc, srclen, pdst, &mut dstlen) == 0 {
+            dst.set_len(dstlen as usize);
+            Some(dst)
+        } else {
+            None // SNAPPY_INVALID_INPUT
+        }
+    }
+}
+```
+
+Finally, we can add some tests to show how to use them.
+
+```rust,ignore
+# extern crate libc;
+# use libc::{c_int, size_t};
+# unsafe fn snappy_compress(input: *const u8,
+#                           input_length: size_t,
+#                           compressed: *mut u8,
+#                           compressed_length: *mut size_t)
+#                           -> c_int { 0 }
+# unsafe fn snappy_uncompress(compressed: *const u8,
+#                             compressed_length: size_t,
+#                             uncompressed: *mut u8,
+#                             uncompressed_length: *mut size_t)
+#                             -> c_int { 0 }
+# unsafe fn snappy_max_compressed_length(source_length: size_t) -> size_t { 0 }
+# unsafe fn snappy_uncompressed_length(compressed: *const u8,
+#                                      compressed_length: size_t,
+#                                      result: *mut size_t)
+#                                      -> c_int { 0 }
+# unsafe fn snappy_validate_compressed_buffer(compressed: *const u8,
+#                                             compressed_length: size_t)
+#                                             -> c_int { 0 }
+# fn main() { }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn valid() {
+        let d = vec![0xde, 0xad, 0xd0, 0x0d];
+        let c: &[u8] = &compress(&d);
+        assert!(validate_compressed_buffer(c));
+        assert!(uncompress(c) == Some(d));
+    }
+
+    #[test]
+    fn invalid() {
+        let d = vec![0, 0, 0, 0];
+        assert!(!validate_compressed_buffer(&d));
+        assert!(uncompress(&d).is_none());
+    }
+
+    #[test]
+    fn empty() {
+        let d = vec![];
+        assert!(!validate_compressed_buffer(&d));
+        assert!(uncompress(&d).is_none());
+        let c = compress(&d);
+        assert!(validate_compressed_buffer(&c));
+        assert!(uncompress(&c) == Some(d));
+    }
+}
+```
+
+# Destructors
+
+Foreign libraries often hand off ownership of resources to the calling code.
+When this occurs, we must use Rust's destructors to provide safety and guarantee
+the release of these resources (especially in the case of panic).
+
+For more about destructors, see the [Drop trait](../std/ops/trait.Drop.html).
+
+# Callbacks from C code to Rust functions
+
+Some external libraries use callbacks to report their current state or
+intermediate data back to the caller.
+Functions defined in Rust can be passed to such a library, provided that the
+callback function is marked as `extern` with the correct calling convention,
+which makes it callable from C code.
+
+The callback function can then be sent to the C library through a registration
+call and afterwards be invoked from there.
+
+A basic example is:
+
+Rust code:
+
+```rust,no_run
+extern fn callback(a: i32) {
+    println!("I'm called from C with value {0}", a);
+}
+
+#[link(name = "extlib")]
+extern {
+    fn register_callback(cb: extern fn(i32)) -> i32;
+    fn trigger_callback();
+}
+
+fn main() {
+    unsafe {
+        register_callback(callback);
+        trigger_callback(); // Triggers the callback.
+    }
+}
+```
+
+C code:
+
+```c
+#include <stdint.h>
+
+typedef void (*rust_callback)(int32_t);
+rust_callback cb;
+
+int32_t register_callback(rust_callback callback) {
+    cb = callback;
+    return 1;
+}
+
+void trigger_callback() {
+    cb(7); // Will call callback(7) in Rust.
+}
+```
+
+In this example, Rust's `main()` calls `trigger_callback()` in C, which in
+turn calls back into `callback()` in Rust.
+
+## Targeting callbacks to Rust objects
+
+The former example showed how a global function can be called from C code.
+However, it is often desirable for the callback to target a particular
+Rust object. This could be the object that represents the wrapper for the
+corresponding C object.
+
+This can be achieved by passing a raw pointer to the object down to the
+C library. The C library can then include the pointer to the Rust object in
+the notification. This will allow the callback to unsafely access the
+referenced Rust object.
+
+Rust code:
+
+```rust,no_run
+#[repr(C)]
+struct RustObject {
+    a: i32,
+    // Other members...
+}
+
+extern "C" fn callback(target: *mut RustObject, a: i32) {
+    println!("I'm called from C with value {0}", a);
+    unsafe {
+        // Update the value in RustObject with the value received from the callback:
+        (*target).a = a;
+    }
+}
+
+#[link(name = "extlib")]
+extern {
+    fn register_callback(target: *mut RustObject,
+                         cb: extern fn(*mut RustObject, i32)) -> i32;
+    fn trigger_callback();
+}
+
+fn main() {
+    // Create the object that will be referenced in the callback:
+    let mut rust_object = Box::new(RustObject { a: 5 });
+
+    unsafe {
+        register_callback(&mut *rust_object, callback);
+        trigger_callback();
+    }
+}
+```
+
+C code:
+
+```c
+#include <stdint.h>
+
+typedef void (*rust_callback)(void*, int32_t);
+void* cb_target;
+rust_callback cb;
+
+int32_t register_callback(void* callback_target, rust_callback callback) {
+    cb_target = callback_target;
+    cb = callback;
+    return 1;
+}
+
+void trigger_callback() {
+    cb(cb_target, 7); // Will call callback(&rust_object, 7) in Rust.
+}
+```
+
+## Asynchronous callbacks
+
+In the previously given examples, the callbacks are invoked as a direct reaction
+to a function call to the external C library.
+The control over the current thread is switched from Rust to C to Rust for the
+execution of the callback, but in the end the callback is executed on the
+same thread that called the function which triggered the callback.
+
+Things get more complicated when the external library spawns its own threads
+and invokes callbacks from there.
+In these cases, access to Rust data structures inside the callbacks is
+especially unsafe, and proper synchronization mechanisms must be used.
+Besides classical synchronization mechanisms like mutexes, one possibility in
+Rust is to use channels (in `std::sync::mpsc`) to forward data from the C
+thread that invoked the callback into a Rust thread.
+
+If an asynchronous callback targets a particular object in the Rust address
+space, it is also absolutely necessary that the C library performs no more
+callbacks after that Rust object is destroyed.
+This can be achieved by unregistering the callback in the object's
+destructor and designing the library in a way that guarantees that no
+callback will be performed after deregistration.
+
+# Linking
+
+The `link` attribute on `extern` blocks provides the basic building block for
+instructing rustc how it will link to native libraries. There are two accepted
+forms of the link attribute today:
+
+* `#[link(name = "foo")]`
+* `#[link(name = "foo", kind = "bar")]`
+
+In both of these cases, `foo` is the name of the native library that we're
+linking to, and in the second case `bar` is the type of native library that the
+compiler is linking to. There are currently three known types of native
+libraries:
+
+* Dynamic - `#[link(name = "readline")]`
+* Static - `#[link(name = "my_build_dependency", kind = "static")]`
+* Frameworks - `#[link(name = "CoreFoundation", kind = "framework")]`
+
+Note that frameworks are only available on macOS targets.
+
+The different `kind` values are meant to differentiate how the native library
+participates in linkage. From a linkage perspective, the Rust compiler creates
+two flavors of artifacts: partial (rlib/staticlib) and final (dylib/binary).
+Native dynamic library and framework dependencies are propagated to the final
+artifact boundary, while static library dependencies are not propagated at
+all, because the static libraries are integrated directly into the subsequent
+artifact.
+
+A few examples of how this model can be used are:
+
+* A native build dependency. Sometimes some C/C++ glue is needed when writing
+  some Rust code, but distribution of the C/C++ code in a library format is
+  a burden. In this case, the code will be archived into `libfoo.a` and then the
+  Rust crate would declare a dependency via
+  `#[link(name = "foo", kind = "static")]`.
+
+  Regardless of the flavor of output for the crate, the native static library
+  will be included in the output, meaning that distribution of the native static
+  library is not necessary.
+
+* A normal dynamic dependency. Common system libraries (like `readline`) are
+  available on a large number of systems, and often a static copy of these
+  libraries cannot be found. When this dependency is included in a Rust crate,
+  partial targets (like rlibs) will not link to the library, but when the rlib
+  is included in a final target (like a binary), the native library will be
+  linked in.
+
+On macOS, frameworks behave with the same semantics as a dynamic library.
+
+# Unsafe blocks
+
+Some operations, like dereferencing raw pointers or calling functions that have been marked
+unsafe, are only allowed inside unsafe blocks. Unsafe blocks isolate unsafety and are a promise to
+the compiler that the unsafety does not leak out of the block.
+
+Unsafe functions, on the other hand, advertise their unsafety to the world. An unsafe function is
+written like this:
+
+```rust
+unsafe fn kaboom(ptr: *const i32) -> i32 { *ptr }
+```
+
+This function can only be called from an `unsafe` block or another `unsafe` function.
+
+# Accessing foreign globals
+
+Foreign APIs often export a global variable which could do something like track
+global state. In order to access these variables, you declare them in `extern`
+blocks with the `static` keyword:
+
+```rust,ignore
+extern crate libc;
+
+#[link(name = "readline")]
+extern {
+    static rl_readline_version: libc::c_int;
+}
+
+fn main() {
+    println!("You have readline version {} installed.",
+             unsafe { rl_readline_version as i32 });
+}
+```
+
+Alternatively, you may need to alter global state provided by a foreign
+interface. To do this, statics can be declared with `mut` so we can mutate
+them.
+
+```rust,ignore
+extern crate libc;
+
+use std::ffi::CString;
+use std::ptr;
+
+#[link(name = "readline")]
+extern {
+    static mut rl_prompt: *const libc::c_char;
+}
+
+fn main() {
+    let prompt = CString::new("[my-awesome-shell] $").unwrap();
+    unsafe {
+        rl_prompt = prompt.as_ptr();
+
+        println!("{:?}", rl_prompt);
+
+        rl_prompt = ptr::null();
+    }
+}
+```
+
+Note that all interaction with a `static mut` is unsafe, both reading and
+writing. Dealing with global mutable state requires a great deal of care.
+
+# Foreign calling conventions
+
+Most foreign code exposes a C ABI, and Rust uses the platform's C calling convention by default when
+calling foreign functions. Some foreign functions, most notably the Windows API, use other calling
+conventions. Rust provides a way to tell the compiler which convention to use:
+
+```rust,ignore
+extern crate libc;
+
+#[cfg(all(target_os = "windows", target_arch = "x86"))]
+#[link(name = "kernel32")]
+#[allow(non_snake_case)]
+extern "stdcall" {
+    fn SetEnvironmentVariableA(n: *const u8, v: *const u8) -> libc::c_int;
+}
+# fn main() { }
+```
+
+This applies to the entire `extern` block. The list of supported ABI constraints
+is:
+
+* `stdcall`
+* `aapcs`
+* `cdecl`
+* `fastcall`
+* `vectorcall` (currently hidden behind the `abi_vectorcall` feature gate and subject to change)
+* `Rust`
+* `rust-intrinsic`
+* `system`
+* `C`
+* `win64`
+* `sysv64`
+
+Most of the ABIs in this list are self-explanatory, but the `system` ABI may
+seem a little odd. This constraint selects whatever the appropriate ABI is for
+interoperating with the target's libraries. For example, on Windows with a
+32-bit x86 architecture, this means that the ABI used would be `stdcall`. On
+x86_64, however, Windows uses the `C` calling convention, so `C` would be used.
+This means that in our previous example, we could have used
+`extern "system" { ... }` to define a block for all Windows systems, not only
+x86 ones.
+
+# Interoperability with foreign code
+
+Rust guarantees that the layout of a `struct` is compatible with the platform's
+representation in C only if the `#[repr(C)]` attribute is applied to it.
+`#[repr(C, packed)]` can be used to lay out struct members without padding.
+`#[repr(C)]` can also be applied to an enum.
+
+Rust's owned boxes (`Box<T>`) use non-nullable pointers as handles which point
+to the contained object. However, they should not be manually created because
+they are managed by internal allocators. References can safely be assumed to be
+non-nullable pointers directly to the type. However, breaking the borrow
+checking or mutability rules is not guaranteed to be safe, so prefer using raw
+pointers (`*const T` and `*mut T`) if that's needed, because the compiler can
+make fewer assumptions about them.
+
+Vectors and strings share the same basic memory layout, and utilities are
+available in the `vec` and `str` modules for working with C APIs. However,
+strings are not terminated with `\0`. If you need a NUL-terminated string for
+interoperability with C, you should use the `CString` type in the `std::ffi`
+module.
+
+The [`libc` crate on crates.io][libc] includes type aliases and function
+definitions for the C standard library in the `libc` module, and Rust links
+against `libc` and `libm` by default.
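The layout and string rules above can be exercised in a small, self-contained sketch. `Point`, `point_sum`, and `through_c` are hypothetical names; `point_sum` is written in Rust with the C ABI as a stand-in for a real C function, so the example runs without any C library.

```rust
use std::ffi::{CStr, CString};
use std::mem::size_of;
use std::os::raw::c_char;

// `#[repr(C)]` gives this struct the field order, alignment, and padding of
// the C declaration `struct Point { int32_t x; int32_t y; };`.
#[repr(C)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical stand-in for a C function, written in Rust with the C ABI so
// the sketch runs without a C library. A null pointer means "no point".
unsafe extern "C" fn point_sum(p: *const Point) -> i32 {
    if p.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `p` is null or points to a live `Point`.
    unsafe { (*p).x + (*p).y }
}

// Round-trips a Rust string through the NUL-terminated form that C expects.
fn through_c(s: &str) -> String {
    // `CString` appends the trailing NUL; Rust strings are not NUL-terminated.
    let owned = CString::new(s).expect("no interior NUL bytes");
    let ptr: *const c_char = owned.as_ptr();
    // SAFETY: `ptr` points into `owned`, which is alive and NUL-terminated.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    borrowed.to_str().expect("valid UTF-8").to_owned()
}

fn main() {
    let p = Point { x: 3, y: 4 };
    // `&p` coerces to `*const Point` at the call site.
    let sum = unsafe { point_sum(&p) };
    let none = unsafe { point_sum(std::ptr::null()) };
    println!("sum = {}, null = {}", sum, none);
    println!("round trip: {}", through_c("hello"));
    // `Box<T>` is never null, so `Option<Box<T>>` can store `None` as a null
    // pointer and needs no extra space for a discriminant.
    println!(
        "Option<Box<Point>> is pointer-sized: {}",
        size_of::<Option<Box<Point>>>() == size_of::<*const Point>()
    );
}
```

Declaring `point_sum` as `unsafe` mirrors how real foreign functions are treated: the compiler cannot check that every caller passes a valid pointer, so that obligation is placed on the caller.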
+
+# Variadic functions
+
+In C, functions can be 'variadic', meaning they accept a variable number of arguments. This can
+be achieved in Rust by specifying `...` within the argument list of a foreign function declaration:
+
+```no_run
+extern {
+    fn foo(x: i32, ...);
+}
+
+fn main() {
+    unsafe {
+        foo(10, 20, 30, 40, 50);
+    }
+}
+```
+
+Normal Rust functions can *not* be variadic:
+
+```ignore
+// This will not compile
+
+fn foo(x: i32, ...) { }
+```
+
+# The "nullable pointer optimization"
+
+Certain Rust types are defined to never be `null`. This includes references (`&T`,
+`&mut T`), boxes (`Box<T>`), and function pointers (`extern "abi" fn()`). When
+interfacing with C, pointers that might be `null` are often used, which would seem to
+require some messy `transmute`s and/or unsafe code to handle conversions to/from Rust types.
+However, the language provides a workaround.
+
+As a special case, an `enum` is eligible for the "nullable pointer optimization" if it contains
+exactly two variants, one of which contains no data and the other contains a field of one of the
+non-nullable types listed above. This means no extra space is required for a discriminant; rather,
+the empty variant is represented by putting a `null` value into the non-nullable field. This is
+called an "optimization", but unlike other optimizations it is guaranteed to apply to eligible
+types.
+
+The most common type that takes advantage of the nullable pointer optimization is `Option<T>`,
+where `None` corresponds to `null`. So `Option<extern "C" fn(c_int) -> c_int>` is a correct way
+to represent a nullable function pointer using the C ABI (corresponding to the C type
+`int (*)(int)`).
+
+Here is a contrived example. Let's say some C library has a facility for registering a
+callback, which gets called in certain situations. The callback is passed a function pointer
+and an integer, and it is supposed to run the function with the integer as a parameter. So
+we have function pointers flying across the FFI boundary in both directions.
+
+```rust,ignore
+extern crate libc;
+use libc::c_int;
+
+# #[cfg(hidden)]
+extern "C" {
+    /// Registers the callback.
+    fn register(cb: Option<extern "C" fn(Option<extern "C" fn(c_int) -> c_int>, c_int) -> c_int>);
+}
+# unsafe fn register(_: Option<extern "C" fn(Option<extern "C" fn(c_int) -> c_int>,
+#                                            c_int) -> c_int>)
+# {}
+
+/// This fairly useless function receives a function pointer and an integer
+/// from C, and returns the result of calling the function with the integer.
+/// In case no function is provided, it squares the integer by default.
+extern "C" fn apply(process: Option<extern "C" fn(c_int) -> c_int>, int: c_int) -> c_int {
+    match process {
+        Some(f) => f(int),
+        None    => int * int
+    }
+}
+
+fn main() {
+    unsafe {
+        register(Some(apply));
+    }
+}
+```
+
+And the code on the C side looks like this:
+
+```c
+void register(int (*f)(int (*)(int), int)) {
+    ...
+}
+```
+
+No `transmute` required!
+
+# Calling Rust code from C
+
+You may wish to compile Rust code in a way so that it can be called from C. This is
+fairly easy, but requires a few things:
+
+```rust
+#[no_mangle]
+pub extern fn hello_rust() -> *const u8 {
+    "Hello, world!\0".as_ptr()
+}
+# fn main() {}
+```
+
+The `extern` makes this function adhere to the C calling convention, as
+discussed above in "[Foreign Calling Conventions](ffi.html#foreign-calling-conventions)".
+The `no_mangle` attribute turns off Rust's name mangling, so that it is easier to link to.
+
+# FFI and panics
+
+It’s important to be mindful of `panic!`s when working with FFI. A `panic!`
+across an FFI boundary is undefined behavior. If you’re writing code that may
+panic, you should run it in a closure with [`catch_unwind`]:
+
+```rust
+use std::panic::catch_unwind;
+
+#[no_mangle]
+pub extern fn oh_no() -> i32 {
+    let result = catch_unwind(|| {
+        panic!("Oops!");
+    });
+    match result {
+        Ok(_) => 0,
+        Err(_) => 1,
+    }
+}
+
+fn main() {}
+```
+
+Please note that [`catch_unwind`] will only catch unwinding panics, not
+those that abort the process. See the documentation of [`catch_unwind`]
+for more information.
+
+[`catch_unwind`]: ../std/panic/fn.catch_unwind.html
+
+# Representing opaque structs
+
+Sometimes, a C library wants to provide a pointer to something, but not let you
+know the internal details of the thing it wants. The simplest way is to use a
+`void *` argument:
+
+```c
+void foo(void *arg);
+void bar(void *arg);
+```
+
+We can represent this in Rust with the `c_void` type:
+
+```rust,ignore
+extern crate libc;
+
+extern "C" {
+    pub fn foo(arg: *mut libc::c_void);
+    pub fn bar(arg: *mut libc::c_void);
+}
+# fn main() {}
+```
+
+This is a perfectly valid way of handling the situation. However, `void *`
+offers no type safety: nothing stops a caller from passing the pointer meant for
+`foo` to `bar`. To avoid this, some C libraries will instead create a `struct`,
+where the details and memory layout of the struct are private. This gives some
+amount of type safety. These structures are called ‘opaque’. Here’s an example,
+in C:
+
+```c
+struct Foo; /* Foo is a structure, but its contents are not part of the public interface */
+struct Bar;
+void foo(struct Foo *arg);
+void bar(struct Bar *arg);
+```
+
+To do this in Rust, let’s create our own opaque types with `enum`:
+
+```rust
+pub enum Foo {}
+pub enum Bar {}
+
+extern "C" {
+    pub fn foo(arg: *mut Foo);
+    pub fn bar(arg: *mut Bar);
+}
+# fn main() {}
+```
+
+By using an `enum` with no variants, we create an opaque type that we can’t
+instantiate. But because our `Foo` and `Bar` types are different, we’ll get
+type safety between the two of them, so we cannot accidentally pass a pointer
+to `Foo` to `bar()`.
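The opaque-type pattern pairs naturally with the advice from the Destructors section. The sketch below is an illustration under stated assumptions, not any real library's API: `foo_new`, `foo_get`, and `foo_free` are hypothetical functions, implemented in Rust with the C ABI so the example runs without a C library, and `FooHandle` is a hypothetical safe wrapper whose `Drop` implementation frees the handle.

```rust
use std::os::raw::c_int;

// Opaque type declared as in the text: Rust code can hold a `*mut Foo` but
// can never construct or read a `Foo` value.
pub enum Foo {}

// ---- Hypothetical "C library" side, written in Rust so the sketch runs. ----
// The real layout behind a `Foo` handle is private to the library.
struct FooImpl {
    value: c_int,
}

unsafe extern "C" fn foo_new(value: c_int) -> *mut Foo {
    // Hand the allocation out only as an opaque pointer.
    Box::into_raw(Box::new(FooImpl { value })) as *mut Foo
}

unsafe extern "C" fn foo_get(arg: *const Foo) -> c_int {
    // Only the library knows that a `Foo` handle is really a `FooImpl`.
    unsafe { (*(arg as *const FooImpl)).value }
}

unsafe extern "C" fn foo_free(arg: *mut Foo) {
    if !arg.is_null() {
        // Rebuild the Box so the allocation is released exactly once.
        unsafe { drop(Box::from_raw(arg as *mut FooImpl)) };
    }
}

// ---- Safe Rust wrapper over the opaque handle. ----
/// Owns one `Foo` handle; `Drop` releases it, including during unwinding.
pub struct FooHandle {
    raw: *mut Foo,
}

impl FooHandle {
    pub fn new(value: c_int) -> FooHandle {
        // SAFETY: `foo_new` has no preconditions in this sketch.
        FooHandle { raw: unsafe { foo_new(value) } }
    }

    pub fn get(&self) -> c_int {
        // SAFETY: `raw` came from `foo_new` and is freed only in `drop`.
        unsafe { foo_get(self.raw) }
    }
}

impl Drop for FooHandle {
    fn drop(&mut self) {
        // SAFETY: `raw` is live here and never used again afterwards.
        unsafe { foo_free(self.raw) }
    }
}

fn main() {
    let handle = FooHandle::new(42);
    println!("value = {}", handle.get());
    // `foo_free` runs automatically when `handle` goes out of scope.
}
```

Because `FooHandle` never exposes its raw pointer, safe code cannot free the handle twice or pass it to a function expecting a `Bar`; the only `unsafe` calls are concentrated in three small methods.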