<!DOCTYPE html>
<html>
<head>
<title>Perhaps Rust needs "defer"</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="application/atom+xml" href="/blog/feed.xml" rel="self"/>
<link rel="shortcut icon" type="image/ico" href="/blog/favicon.ico">
<link rel="stylesheet" type="text/css" href="main.css">
<link rel="stylesheet" href="https://unpkg.com/@highlightjs/[email protected]/styles/default.min.css">
<script src="highlight.min.js"></script>
<!-- From https://github.com/odin-lang/odin-lang.org/blob/6f48c2cfb094a42dffd34143884fa958bd9c0ba2/themes/odin/layouts/partials/head.html#L71 -->
<script src="x86asm.min.js"></script>
<script>
window.onload = function() {
hljs.registerLanguage("odin", function(e) {
return {
aliases: ["odin", "odinlang", "odin-lang"],
keywords: {
keyword: "auto_cast bit_field bit_set break case cast context continue defer distinct do dynamic else enum fallthrough for foreign if import in map matrix not_in or_else or_return package proc return struct switch transmute type_of typeid union using when where",
literal: "true false nil",
built_in: "abs align_of cap clamp complex conj expand_to_tuple imag jmag kmag len max min offset_of quaternion real size_of soa_unzip soa_zip swizzle type_info_of type_of typeid_of"
},
illegal: "</",
contains: [e.C_LINE_COMMENT_MODE, e.C_BLOCK_COMMENT_MODE, {
className: "string",
variants: [e.QUOTE_STRING_MODE, {
begin: "'",
end: "[^\\\\]'"
}, {
begin: "`",
end: "`"
}]
}, {
className: "number",
variants: [{
begin: e.C_NUMBER_RE + "[ijk]",
relevance: 1
}, e.C_NUMBER_MODE]
}]
}
});
hljs.highlightAll();
document.querySelectorAll('code').forEach((el, _i) => {
if (0 == el.classList.length || el.classList.contains('language-sh') || el.classList.contains('language-shell') || el.classList.contains('language-bash')){
el.classList.add('code-no-line-numbers');
return;
}
var lines = el.innerHTML.trimEnd().split('\n');
var out = [];
lines.forEach(function(l, i){
out.push('<span class="line-number">' + (i+1).toString() + '</span> ' + l);
});
el.innerHTML = out.join('\n');
});
}
</script>
</head>
<body>
<div id="banner">
<div id="name">
<img id="me" src="me.jpeg">
<span>Philippe Gaultier</span>
</div>
<ul>
<li> <a href="/blog/body_of_work.html">Body of work</a> </li>
<li> <a href="/blog/articles-by-tag.html">Tags</a> </li>
<li> <a href="https://github.com/gaultier/resume/raw/master/Philippe_Gaultier_resume_en.pdf">Resume</a> </li>
<li> <a href="https://www.linkedin.com/in/philippegaultier/">LinkedIn</a> </li>
<li> <a href="https://github.com/gaultier">Github</a> </li>
<li> <a href="/blog/feed.xml">Atom feed</a> </li>
</ul>
</div>
<div class="body">
<div class="article-prelude">
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<p class="publication-date">Published on 2024-11-06</p>
</div>
<div class="article-title">
<h1>Perhaps Rust needs "defer"</h1>
<div class="tags"> <a href="/blog/articles-by-tag.html#rust" class="tag">Rust</a> <a href="/blog/articles-by-tag.html#c" class="tag">C</a></div>
</div>
<strong>Table of contents</strong>
<ul>
<li>
<a href="#setting-the-stage">Setting the stage</a>
</li>
<li>
<a href="#first-attempt-at-freeing-the-memory-properly">First attempt at freeing the memory properly</a>
</li>
<li>
<a href="#second-attempt-at-freeing-the-memory-properly">Second attempt at freeing the memory properly</a>
</li>
<li>
<a href="#third-attempt-at-freeing-the-memory-properly">Third attempt at freeing the memory properly</a>
</li>
<li>
<a href="#defer">Defer</a>
</li>
<li>
<a href="#possible-solutions">Possible solutions</a>
</li>
<li>
<a href="#conclusion">Conclusion</a>
</li>
<li>
<a href="#addendum-one-more-gotcha">Addendum: One more gotcha</a>
</li>
</ul>
<p><em>Or, how FFI in Rust is a pain in the neck.</em></p>
<p><em>Discussions: <a href="https://old.reddit.com/r/rust/comments/1gktuw6/perhaps_rust_needs_defer/?">/r/rust</a>, <a href="https://old.reddit.com/r/programming/comments/1gktum4/perhaps_rust_needs_defer/?">/r/programming</a>, <a href="https://news.ycombinator.com/item?id=42058091">HN</a>, <a href="https://lobste.rs/s/2ka0ps/perhaps_rust_needs_defer">lobsters</a></em></p>
<p>In a previous article I <a href="/blog/lessons_learned_from_a_successful_rust_rewrite.html#i-am-still-chasing-memory-leaks">mentioned</a> that we use the <code>defer</code> idiom in Rust through a crate, but that it actually rarely gets past the borrow checker. Some comments were <s>claiming this issue does not exist</s> surprised and I did not have an example at hand.</p>
<p>Well, today at work I hit this issue again so I thought I would document it. The whole experience also showcases well what working in Rust with lots of FFI interop feels like.</p>
<h2 id="setting-the-stage">
<a class="title" href="#setting-the-stage">Setting the stage</a>
<a class="hash-anchor" href="#setting-the-stage" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>So, I have a Rust API like this:</p>
<pre><code class="language-rust">#[repr(C)]
pub struct Foo {
    value: usize,
}

#[no_mangle]
pub extern "C" fn MYLIB_get_foos(out_foos: *mut *mut Foo, out_foos_count: &mut usize) -> i32 {
    let res = vec![Foo { value: 42 }, Foo { value: 99 }];
    *out_foos_count = res.len();
    unsafe { *out_foos = res.leak().as_mut_ptr() };
    0
}
</code></pre>
<p>It returns a dynamically allocated array as a pointer and a length. Of course, in reality <code>Foo</code> has many fields and the values are not known in advance: we send messages to a Smartcard to ask it for a piece of data residing on it, and it replies with encoded messages that our library decodes and returns to the user.</p>
<p>I tell Cargo this is a static library:</p>
<pre><code class="language-toml"># Cargo.toml
[lib]
crate-type = ["staticlib"]
</code></pre>
<p>It's a straightforward API, so I generate the corresponding C header with cbindgen:</p>
<pre><code class="language-sh">$ cbindgen -v src/lib.rs --lang=c -o mylib.h
</code></pre>
<p>And I get:</p>
<pre><code class="language-c">#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
typedef struct Foo {
  uintptr_t value;
} Foo;
int32_t MYLIB_get_foos(struct Foo **out_foos, uintptr_t *out_foos_count);
</code></pre>
<p>I can now use it from C like so:</p>
<pre><code class="language-c">#include "mylib.h"
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
  Foo *foos = NULL;
  size_t foos_count = 0;
  assert(0 == MYLIB_get_foos(&foos, &foos_count));

  for (size_t i = 0; i < foos_count; i++) {
    printf("%lu\n", foos[i].value);
  }

  if (NULL != foos) {
    free(foos);
  }
}
</code></pre>
<p>I build it with all the warnings enabled, run it with sanitizers on, and/or in Valgrind, all good.</p>
<p><em>This code has a subtle mistake (can you spot it?), so keep on reading.</em></p>
<blockquote>
<p>If we feel fancy (and non-portable), we can even automate the freeing of the memory in C with <code>__attribute__((cleanup))</code>, like <code>defer</code> (ominous sounds). But let's not, today. Let's focus on the Rust side.</p>
</blockquote>
<p>Now, we are principled developers who test their code (right?). So let's write a Rust test for it. We expect it to be exactly the same as the C code:</p>
<pre><code class="language-rust">#[cfg(test)]
mod tests {
    #[test]
    fn test_get_foos() {
        let mut foos = std::ptr::null_mut();
        let mut foos_count = 0;
        assert_eq!(0, super::MYLIB_get_foos(&mut foos, &mut foos_count));
    }
}
</code></pre>
<p>And it passes:</p>
<pre><code class="language-sh">$ cargo test
...
running 1 test
test tests::test_get_foos ... ok
...
</code></pre>
<p>Of course, we have not yet freed anything, so we expect Miri to complain, and it does:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
error: memory leaked: alloc59029 (Rust heap, size: 16, align: 8), allocated here:
...
</code></pre>
<p>Note that the standard test runner does not report memory leaks, unfortunately. If Miri does not work for a given use case, and we still want to check that there are no leaks, we have to reach for nightly sanitizers or Valgrind.</p>
<h2 id="first-attempt-at-freeing-the-memory-properly">
<a class="title" href="#first-attempt-at-freeing-the-memory-properly">First attempt at freeing the memory properly</a>
<a class="hash-anchor" href="#first-attempt-at-freeing-the-memory-properly" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Great, so let's free it at the end of the test, like C does, with <code>free</code> from libc, which we add as a dependency:</p>
<pre><code class="language-rust">#[test]
fn test_get_foos() {
    ...
    unsafe { libc::free(foos as *mut std::ffi::c_void) };
}
</code></pre>
<p>The test passes, great. Let's try with Miri:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
error: Undefined Behavior: deallocating alloc59029, which is Rust heap memory, using C heap deallocation operation
...
</code></pre>
<p>Hmm...ok...Well that's a bit weird, because what Rust does, when the <code>Vec</code> is allocated, is to call out to <code>malloc</code> from libc, as we can see with <code>strace</code>:</p>
<pre><code class="language-sh">$ strace -k -v -e brk ./a.out
...
brk(0x213c0000) = 0x213c0000
> /usr/lib64/libc.so.6(brk+0xb) [0x10fa9b]
> /usr/lib64/libc.so.6(__sbrk+0x6b) [0x118cab]
> /usr/lib64/libc.so.6(__default_morecore@GLIBC_2.2.5+0x15) [0xa5325]
> /usr/lib64/libc.so.6(sysmalloc+0x57b) [0xa637b]
> /usr/lib64/libc.so.6(_int_malloc+0xd39) [0xa7399]
> /usr/lib64/libc.so.6(tcache_init.part.0+0x36) [0xa7676]
> /usr/lib64/libc.so.6(__libc_malloc+0x125) [0xa7ef5]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::alloc+0x6a) [0x4a145a]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::Global::alloc_impl+0x140) [0x4a15a0]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::exchange_malloc+0x3a) [0x4a139a]
> /home/pg/scratch/rust-blog2/a.out(MYLIB_get_foos+0x26) [0x407cc6]
> /home/pg/scratch/rust-blog2/a.out(main+0x2b) [0x407bfb]
</code></pre>
<p><em>Depending on your system, the call stack and the specific system call may vary, since they depend on the libc implementation. The point is that <code>malloc</code> from libc gets called by Rust.</em></p>
<p>Note the irony that we do not need to have a third-party dependency on the <code>libc</code> crate to allocate with <code>malloc</code> (being called under the hood), but we do need it, in order to deallocate the memory with <code>free</code>. Perhaps it's by design. Anyway. Where was I.</p>
<p>The docs for <code>Vec</code> indeed state:</p>
<blockquote>
<p>In general, Vec’s allocation details are very subtle — if you intend to allocate memory using a Vec and use it for something else (either to pass to unsafe code, or to build your own memory-backed collection), be sure to deallocate this memory by using from_raw_parts to recover the Vec and then dropping it.</p>
</blockquote>
<p>But a few sentences later it also says:</p>
<blockquote>
<p>That is, the reported capacity is completely accurate, and can be relied on. It can even be used to manually free the memory allocated by a Vec if desired.</p>
</blockquote>
<p>So now I am confused, am I allowed to <code>free()</code> the <code>Vec</code>'s pointer directly or not?</p>
<p>By the way, we also spot in the same docs that there is no way to correctly free the <code>Vec</code> by calling <code>free()</code> on the pointer without knowing the capacity because:</p>
<blockquote>
<p>The pointer will never be null, so this type is null-pointer-optimized. However, the pointer might not actually point to allocated memory.</p>
</blockquote>
<p>Hmm, ok... So I guess the only way to not trigger Undefined Behavior on the C side when freeing, would be to keep the <code>capacity</code> of the <code>Vec</code> around and do:</p>
<pre><code class="language-c">if (capacity > 0) {
  free(foos);
}
</code></pre>
<p>Let's ignore for now that this will surprise every C developer out there who has been writing <code>if (NULL != ptr) free(ptr)</code> for 50 years now.</p>
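<p>The doc quote above can be checked with a few lines of Rust. This is only a sketch: <code>empty_vec_parts</code> is a hypothetical helper, not part of our library, and it merely observes what the <code>Vec</code> docs promise for an empty vector of a non-zero-sized type.</p>

```rust
#[repr(C)]
pub struct Foo {
    value: usize,
}

/// Hypothetical helper: reports whether the pointer of an empty `Vec<Foo>`
/// is null, along with its capacity.
fn empty_vec_parts() -> (bool, usize) {
    // `Vec::new()` does not allocate: the pointer is dangling, not null.
    let v: Vec<Foo> = Vec::new();
    (v.as_ptr().is_null(), v.capacity())
}

#[test]
fn empty_vec_is_dangling_not_null() {
    // Non-null pointer, zero capacity: a `NULL` check on the C side
    // cannot tell that there is nothing to free.
    assert_eq!(empty_vec_parts(), (false, 0));
}
```

<p>So a C caller that only looks at the pointer would happily pass a dangling address to <code>free()</code>.</p>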
<p>I also tried to investigate how <code>drop</code> is implemented for <code>Vec</code> to understand what's going on and I stopped at this function in <code>core/src/alloc/mod.rs</code>:</p>
<pre><code class="language-rust"> unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
</code></pre>
<p>Not sure where the implementation is located... Ok, let's move on.</p>
<p>Let's stay on the safe side and assume that we ought to use <code>Vec::from_raw_parts</code> and let the <code>Vec</code> free the memory when it gets dropped at the end of the scope. The only problem is: This function requires the pointer, the length, <em>and the capacity</em>. Wait, but we lost the capacity when we returned the pointer + length to the caller in <code>MYLIB_get_foos()</code>, and the caller <em>does not care one bit about the capacity</em>! It's irrelevant to them! At work, the mobile developers using our library rightfully asked: wait, what is this <code>cap</code> field? Why do I care? What do I do with it? If you are used to manually managing your own memory, this is a very old concept, but if you are used to a Garbage Collector, it's very much new.</p>
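<p>To make the three required values concrete, here is a minimal sketch of the round trip, assuming the producer is free to keep the capacity around. <code>vec_into_parts</code> and <code>vec_from_parts</code> are hypothetical helper names, not functions from our library or from the standard library.</p>

```rust
use std::mem::ManuallyDrop;

#[repr(C)]
pub struct Foo {
    value: usize,
}

/// Hypothetical helper: splits a `Vec` into (pointer, length, capacity)
/// without freeing it.
fn vec_into_parts(v: Vec<Foo>) -> (*mut Foo, usize, usize) {
    // `ManuallyDrop` stops the destructor from running, much like `leak()`,
    // but still lets us read the capacity next to the pointer and length.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

/// Hypothetical helper: rebuilds the `Vec` so that dropping it frees the memory.
///
/// # Safety
/// The three values must come from one `vec_into_parts` call and be used once.
unsafe fn vec_from_parts(ptr: *mut Foo, len: usize, cap: usize) -> Vec<Foo> {
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

#[test]
fn round_trip() {
    let mut v = Vec::with_capacity(8);
    v.push(Foo { value: 42 });
    v.push(Foo { value: 99 });

    let (ptr, len, cap) = vec_into_parts(v);
    assert_eq!(len, 2);
    assert!(cap >= 8); // the capacity is not the length

    // Dropping the rebuilt `Vec` at the end of the scope frees the allocation.
    let v = unsafe { vec_from_parts(ptr, len, cap) };
    assert_eq!(v[1].value, 99);
}
```

<p>Note how the capacity differs from the length here, which is exactly the piece of information a pointer + length API throws away.</p>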
<h2 id="second-attempt-at-freeing-the-memory-properly">
<a class="title" href="#second-attempt-at-freeing-the-memory-properly">Second attempt at freeing the memory properly</a>
<a class="hash-anchor" href="#second-attempt-at-freeing-the-memory-properly" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>So, let's first try to dodge the problem the <s>hacky</s> simple way by pretending that the memory is allocated by a <code>Box</code>, which only needs the pointer, just like <code>free()</code>:</p>
<pre><code class="language-rust">#[test]
fn test_get_foos() {
    ...
    unsafe {
        let _ = Box::from_raw(foos);
    }
}
</code></pre>
<p>That is, I think, the first instinct for a C developer. Whichever way the memory was heap allocated, be it with <code>malloc</code>, <code>calloc</code>, or <code>realloc</code>, be it for one struct or for a whole array, we want to free it with one call, passing it the base pointer. Let's ignore for a moment the docs that state that sometimes the pointer is heap-allocated and sometimes not.</p>
<p>So this Rust code builds. The test passes. And Miri is unhappy. I guess you know the drill by now:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
incorrect layout on deallocation: alloc59029 has size 16 and alignment 8, but gave size 8 and alignment 8
...
</code></pre>
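<p>The sizes in that message line up with the types: the allocation holds two <code>Foo</code> (16 bytes), while a <code>Box</code> of a single <code>Foo</code> describes 8 bytes. For what it's worth, here is a sketch of a variant that still only needs the pointer and the length, assuming the producer side can be changed to hand out a boxed slice instead of a leaked <code>Vec</code>. <code>foos_into_raw</code> and <code>foos_free</code> are hypothetical names, and this is not what our library does.</p>

```rust
#[repr(C)]
pub struct Foo {
    value: usize,
}

/// Hypothetical producer: hands out pointer + length. No capacity is needed
/// because a boxed slice is allocated for exactly `len` elements.
fn foos_into_raw(v: Vec<Foo>) -> (*mut Foo, usize) {
    let boxed: Box<[Foo]> = v.into_boxed_slice(); // drops any excess capacity
    let len = boxed.len();
    (Box::into_raw(boxed) as *mut Foo, len)
}

/// Hypothetical consumer: frees what `foos_into_raw` returned.
///
/// # Safety
/// `ptr` and `len` must come from one `foos_into_raw` call and be used once.
unsafe fn foos_free(ptr: *mut Foo, len: usize) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the slice pointer so that the `Box` is dropped with the layout
    // of `len` elements instead of the layout of a single `Foo`.
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}

#[test]
fn free_with_pointer_and_length() {
    let (ptr, len) = foos_into_raw(vec![Foo { value: 42 }, Foo { value: 99 }]);
    assert_eq!(len, 2);
    assert_eq!(unsafe { (*ptr).value }, 42);
    unsafe { foos_free(ptr, len) };
}
```

<p>The caller still has to hand back the length, and still cannot use <code>free()</code> from libc, so this does not give C developers the one-pointer API they expect.</p>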
<p>Let's take a second to marvel at the fact that Rust, probably the strictest programming language at compile time, the if-it-builds-it-runs-dude-I-swear language, seems to work at compile time and at run time, and only fails when run under an experimental analyzer that only works on nightly and does not support lots of FFI patterns, which is exactly where you need Miri the most!</p>
<p>That's the power of Undefined Behavior and <code>unsafe{}</code>. Again: audit all of your <code>unsafe</code> blocks, and be very suspicious of any third-party code that uses <code>unsafe</code>. I think Rust developers on average do not realize how easy it is to harm a program by using <code>unsafe</code> unwisely, even if everything seems fine.</p>
<p>Anyways, I guess we have to refactor our whole C API to do it the Rust Way(tm)!</p>
<h2 id="third-attempt-at-freeing-the-memory-properly">
<a class="title" href="#third-attempt-at-freeing-the-memory-properly">Third attempt at freeing the memory properly</a>
<a class="hash-anchor" href="#third-attempt-at-freeing-the-memory-properly" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>So, in our codebase at work, we have defined this type:</p>
<pre><code class="language-rust">/// Owning Array i.e. `Vec<T>` in Rust or `std::vector<T>` in C++.
#[repr(C)]
pub struct OwningArrayC<T> {
    pub data: *mut T,
    pub len: usize,
    pub cap: usize,
}
</code></pre>
<p>It clearly signifies to the caller that they are in charge of freeing the memory, and also it carries the capacity of the <code>Vec</code> with it, so it's not lost.</p>
<p>In our project, this struct is used a lot. We also define a struct for non owning arrays (slices), etc.</p>
<p>So let's adapt the function, and also add a function in the API to free it for convenience:</p>
<pre><code class="language-rust">#[no_mangle]
pub extern "C" fn MYLIB_get_foos(out_foos: &mut OwningArrayC<Foo>) -> i32 {
    let res = vec![Foo { value: 42 }, Foo { value: 99 }];
    let len = res.len();
    let cap = res.capacity();

    *out_foos = OwningArrayC {
        data: res.leak().as_mut_ptr(),
        len,
        cap,
    };
    0
}

#[no_mangle]
pub extern "C" fn MYLIB_free_foos(foos: &mut OwningArrayC<Foo>) {
    if foos.cap > 0 {
        unsafe {
            let _ = Vec::from_raw_parts(foos.data, foos.len, foos.cap);
        }
    }
}
</code></pre>
<p>Let's also re-generate the C header, adapt the C code, rebuild it, communicate with the various projects that use our C API to make them adapt, etc...</p>
<p>Back to the Rust test:</p>
<pre><code class="language-rust">#[cfg(test)]
mod tests {
    #[test]
    fn test_get_foos() {
        let mut foos = crate::OwningArrayC {
            data: std::ptr::null_mut(),
            len: 0,
            cap: 0,
        };
        assert_eq!(0, super::MYLIB_get_foos(&mut foos));
        println!("foos: {}", foos.len);
        super::MYLIB_free_foos(&mut foos);
    }
}
</code></pre>
<p>And now, Miri is happy. Urgh. So, back to what we set out to do originally, <code>defer</code>.</p>
<h2 id="defer">
<a class="title" href="#defer">Defer</a>
<a class="hash-anchor" href="#defer" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>The test is trivial right now, but in real code there are many code paths that sometimes allocate and sometimes do not, with validation interleaved and early returns, so we'd really like to be able to statically demonstrate that the memory is always correctly freed. To ourselves, to auditors, etc.</p>
<p>One example at work of such hairy code is: building a linked list (in Rust), fetching more from the network based on the content of the last node in the list, and appending the additional data to the linked list, until some flag is detected in the encoded data. Oh, and there is also validation of the incoming data, so you might have to return early with a partially constructed list which should be properly cleaned up.</p>
<p>There are many examples like this, where the memory is often allocated/deallocated with a C API and it's not always possible to use RAII. So <code>defer</code> comes in handy.</p>
<hr />
<p>Let's use the <code>scopeguard</code> crate which provides a <code>defer!</code> macro, in the test, to automatically free the memory:</p>
<pre><code class="language-rust">#[test]
fn test_get_foos() {
    let mut foos = crate::OwningArrayC {
        data: std::ptr::null_mut(),
        len: 0,
        cap: 0,
    };
    assert_eq!(0, super::MYLIB_get_foos(&mut foos));
    defer! {
        super::MYLIB_free_foos(&mut foos);
    }

    println!("foos: {}", foos.len);
}
</code></pre>
<p>And we get a compile error:</p>
<pre><code class="language-sh">$ cargo test
error[E0502]: cannot borrow `foos.len` as immutable because it is also borrowed as mutable
--> src/lib.rs:54:30
|
50 | / defer! {
51 | | super::MYLIB_free_foos(&mut foos);
| | ---- first borrow occurs due to use of `foos` in closure
52 | | }
| |_________- mutable borrow occurs here
53 |
54 | println!("foos: {}", foos.len);
| ^^^^^^^^ immutable borrow occurs here
55 | }
| - mutable borrow might be used here, when `_guard` is dropped and runs the `Drop` code for type `ScopeGuard`
|
</code></pre>
<p>Dum dum duuuum... Yes, we cannot use the <code>defer</code> idiom here (or at least I did not find a way). In some cases it's possible, in lots of cases it's not. The borrow checker considers that the <code>defer</code> block holds an exclusive mutable reference, so the rest of the code cannot use that reference in any way.</p>
<p>This is despite the fact that the versions without and with <code>defer</code> are semantically equivalent: the borrow checker accepts the former and rejects the latter.</p>
<h2 id="possible-solutions">
<a class="title" href="#possible-solutions">Possible solutions</a>
<a class="hash-anchor" href="#possible-solutions" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>So that is why I argue that Rust should get a <code>defer</code> statement in the language and the borrow checker should be made aware of this construct to allow this approach to take place.</p>
<p>But what can we do otherwise? Are there any alternatives?</p>
<ul>
<li>We can be very careful and make sure we deallocate everything by hand in every code path. Obviously that doesn't scale with team size, code complexity, etc. It's also unfortunate, since a defer-like approach is trivial in C with <code>__attribute__((cleanup))</code> and in C++ by implementing our <a href="https://www.gingerbill.org/article/2015/08/19/defer-in-cpp/">own</a> <code>defer</code>. Even Go, which is garbage-collected, has a first-class <code>defer</code>.</li>
<li>We can use a goto-like approach, as a reader <a href="https://lobste.rs/s/n6gciw/lessons_learned_from_successful_rust#c_8pzmqg">suggested</a> in a previous article, even though Rust does not have <code>goto</code> per se:
<pre><code class="language-rust">fn foo_init() -> *mut () { &mut () }
fn foo_bar(_: *mut ()) -> bool { false }
fn foo_baz(_: *mut ()) -> bool { true }
fn foo_free(_: *mut ()) {}

fn main() {
    let f = foo_init();

    'free: {
        if foo_bar(f) {
            break 'free;
        }

        if foo_baz(f) {
            break 'free;
        }

        // ...
    };

    foo_free(f);
}
</code></pre>
It's very nifty, but I am not sure I would enjoy reading and writing this kind of code, especially with multiple levels of nesting. Again, it does not scale very well. But it's something.</li>
<li>We can work around the borrow checker and still use <code>defer</code> by refactoring our code to make it happy. Again, tedious and not always possible. One thing that possibly works is using handles (numerical ids) instead of pointers, so that they are <code>Copy</code> and the borrow checker does not see an issue with sharing/copying them, the way file descriptors work in Unix. The potential downside here is that it creates global state, since some component has to bookkeep these handles and their mapping to the real pointer. But it's a <a href="https://floooh.github.io/2018/06/17/handles-vs-pointers.html">common</a> pattern in gamedev.</li>
<li>Perhaps the borrow checker can be improved upon without adding <code>defer</code> to the language, 'just' by making it smarter?</li>
<li>We can use arenas everywhere and sail away in the sunset, leaving all these nasty problems behind us</li>
<li>Rust can stabilize various nightly APIs and tools, like custom allocators and sanitizers, to make development simpler</li>
</ul>
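<p>As an illustration of the third option, reshaping the code so that the borrow checker accepts a deferred cleanup, here is a sketch of an owning guard: it holds the value itself and only lends it to the cleanup function when it is dropped, so the rest of the code can keep using the value through the guard. <code>Guard</code>, <code>get_foos</code>, <code>free_foos</code> and <code>count_foos</code> are hypothetical plain-Rust stand-ins written for this example. It likely covers simple cases such as the test above, and probably not the hairier ones where the value cannot be moved into a wrapper.</p>

```rust
use std::mem::ManuallyDrop;
use std::ops::{Deref, DerefMut};

#[repr(C)]
pub struct Foo {
    value: usize,
}

#[repr(C)]
pub struct OwningArrayC<T> {
    pub data: *mut T,
    pub len: usize,
    pub cap: usize,
}

// Plain-Rust stand-ins for MYLIB_get_foos / MYLIB_free_foos.
fn get_foos(out_foos: &mut OwningArrayC<Foo>) -> i32 {
    let mut res = ManuallyDrop::new(vec![Foo { value: 42 }, Foo { value: 99 }]);
    *out_foos = OwningArrayC { data: res.as_mut_ptr(), len: res.len(), cap: res.capacity() };
    0
}

fn free_foos(foos: &mut OwningArrayC<Foo>) {
    if !foos.data.is_null() {
        drop(unsafe { Vec::from_raw_parts(foos.data, foos.len, foos.cap) });
    }
}

/// Hypothetical owning guard: runs `cleanup` on the value when dropped.
struct Guard<T, F: FnOnce(&mut T)> {
    value: T,
    cleanup: Option<F>,
}

impl<T, F: FnOnce(&mut T)> Guard<T, F> {
    fn new(value: T, cleanup: F) -> Self {
        Guard { value, cleanup: Some(cleanup) }
    }
}

impl<T, F: FnOnce(&mut T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T, F: FnOnce(&mut T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

impl<T, F: FnOnce(&mut T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        if let Some(cleanup) = self.cleanup.take() {
            cleanup(&mut self.value);
        }
    }
}

fn count_foos() -> usize {
    let empty = OwningArrayC { data: std::ptr::null_mut(), len: 0, cap: 0 };
    // The guard owns the array, so no long-lived `&mut` borrow is held.
    let mut foos = Guard::new(empty, free_foos);
    assert_eq!(0, get_foos(&mut foos));
    if foos.len == 0 {
        return 0; // early return: the guard still frees the array
    }
    foos.len
}
```

<p>As far as I know, the <code>scopeguard</code> crate ships a similar owning <code>guard</code> function next to the <code>defer!</code> macro, so the wrapper would not need to be hand-written. It is still RAII in disguise rather than a real <code>defer</code>.</p>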
<h2 id="conclusion">
<a class="title" href="#conclusion">Conclusion</a>
<a class="hash-anchor" href="#conclusion" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Rust + FFI is nasty and has a lot of friction. At work, I went through all the steps described in this article, and this happens a lot.</p>
<p>The crux of the issue is that there is a lot of knowledge to keep in our heads, lots of easy ways to shoot ourselves in the foot, and we have to reconcile what various tools tell us: even if the compiler is happy, the tests might not be. Even if the tests are happy, Miri might not be. Even if we think we have done the right thing, we discover later, buried deep in the docs, that in fact, we didn't. It's definitely for experts only.</p>
<p>This should not be so hard! Won't somebody think of the <s>children</s> Rust FFI users?</p>
<p>EDIT: It's been <a href="https://chaos.social/@filmroellchen/113464336212759405">pointed</a> out to me that there are two on-going internal discussions by the Rust developers about this topic to possibly reserve the <code>defer</code> keyword for future use and maybe one day add this facility to the language: <a href="https://internals.rust-lang.org/t/pre-rfc-defer-statement/16644">1</a>, <a href="https://internals.rust-lang.org/t/a-defer-discussion/20387/71">2</a>.</p>
<h2 id="addendum-one-more-gotcha">
<a class="title" href="#addendum-one-more-gotcha">Addendum: One more gotcha</a>
<a class="hash-anchor" href="#addendum-one-more-gotcha" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Rust guarantees that the underlying pointer in <code>Vec</code> is not null. And <code>OwningArrayC</code> mirrors <code>Vec</code>, so it should be the same, right? Well consider this C code:</p>
<pre><code class="language-c">int main() {
  OwningArrayC_Foo foos = {0};
  if (some_condition) {
    MYLIB_get_foos(&foos);
  }

  // `foos.data` is null here in some code paths.
  MYLIB_free_foos(&foos);
}
</code></pre>
<p>In this case, <code>MYLIB_free_foos</code> actually can receive an argument with a null pointer (the <code>data</code> field), which would then trigger an assert inside <code>Vec::from_raw_parts</code>. So we should check that in <code>MYLIB_free_foos</code>:</p>
<pre><code class="language-rust">#[no_mangle]
pub extern "C" fn MYLIB_free_foos(foos: &mut OwningArrayC<Foo>) {
    if !foos.data.is_null() {
        unsafe {
            let _ = Vec::from_raw_parts(foos.data, foos.len, foos.cap);
        }
    }
}
</code></pre>
<p>It might be a bit surprising to a pure Rust developer given the <code>Vec</code> guarantees, but since the C side could pass anything, we must be defensive.</p>
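<p>In the same defensive spirit, a C caller could also call the free function twice on the same struct. A sketch of one way to make that harmless, assuming we are allowed to write to the struct being freed: reset its fields after the deallocation so that a second call takes the null branch. <code>get_foos</code> and <code>free_foos</code> are plain-Rust stand-ins for the exported functions, and the reset is an addition for this example, not what our library currently does.</p>

```rust
use std::mem::ManuallyDrop;
use std::ptr;

#[repr(C)]
pub struct Foo {
    value: usize,
}

#[repr(C)]
pub struct OwningArrayC<T> {
    pub data: *mut T,
    pub len: usize,
    pub cap: usize,
}

// Stand-in for MYLIB_get_foos.
fn get_foos(out_foos: &mut OwningArrayC<Foo>) -> i32 {
    let mut res = ManuallyDrop::new(vec![Foo { value: 42 }, Foo { value: 99 }]);
    *out_foos = OwningArrayC { data: res.as_mut_ptr(), len: res.len(), cap: res.capacity() };
    0
}

// Stand-in for MYLIB_free_foos, with a reset added after the deallocation.
fn free_foos(foos: &mut OwningArrayC<Foo>) {
    if foos.data.is_null() {
        return; // zero-initialized or already freed: nothing to do
    }
    drop(unsafe { Vec::from_raw_parts(foos.data, foos.len, foos.cap) });
    // Leave the struct in the same state as `{0}` in C.
    *foos = OwningArrayC { data: ptr::null_mut(), len: 0, cap: 0 };
}

#[test]
fn double_free_is_a_no_op() {
    let mut foos = OwningArrayC { data: ptr::null_mut(), len: 0, cap: 0 };
    assert_eq!(0, get_foos(&mut foos));
    free_foos(&mut foos);
    assert!(foos.data.is_null());
    free_foos(&mut foos); // second call takes the early return
}
```

<p>This only protects against a second call on the very same struct. If the C side copied the struct beforehand, the copy still carries the stale pointer.</p>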
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<blockquote id="donate">
<p>If you enjoy what you're reading, you want to support me, and can afford it: <a href="https://paypal.me/philigaultier?country.x=DE&locale.x=en_US">Support me</a>. That allows me to write more cool articles!</p>
</blockquote>
<blockquote>
<p>
This blog is <a href="https://github.com/gaultier/blog">open-source</a>!
If you find a problem, please open a Github issue.
The content of this blog as well as the code snippets are under the <a href="https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)">BSD-3 License</a> which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.
</p>
</blockquote>
</div>
</body>
</html>