<!DOCTYPE html>
<html>
<head>
<title>I want Rust to have "defer"</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="application/atom+xml" href="/blog/feed.xml" rel="self"/>
<link rel="shortcut icon" type="image/ico" href="/blog/favicon.ico">
<link rel="stylesheet" type="text/css" href="main.css">
<link rel="stylesheet" href="https://unpkg.com/@highlightjs/[email protected]/styles/default.min.css">
<script src="highlight.min.js"></script>
<!-- From https://github.com/odin-lang/odin-lang.org/blob/6f48c2cfb094a42dffd34143884fa958bd9c0ba2/themes/odin/layouts/partials/head.html#L71 -->
<script src="x86asm.min.js"></script>
<script>
window.onload = function() {
hljs.registerLanguage("odin", function(e) {
return {
aliases: ["odin", "odinlang", "odin-lang"],
keywords: {
keyword: "auto_cast bit_field bit_set break case cast context continue defer distinct do dynamic else enum fallthrough for foreign if import in map matrix not_in or_else or_return package proc return struct switch transmute type_of typeid union using when where",
literal: "true false nil",
built_in: "abs align_of cap clamp complex conj expand_to_tuple imag jmag kmag len max min offset_of quaternion real size_of soa_unzip soa_zip swizzle type_info_of type_of typeid_of"
},
illegal: "</",
contains: [e.C_LINE_COMMENT_MODE, e.C_BLOCK_COMMENT_MODE, {
className: "string",
variants: [e.QUOTE_STRING_MODE, {
begin: "'",
end: "[^\\\\]'"
}, {
begin: "`",
end: "`"
}]
}, {
className: "number",
variants: [{
begin: e.C_NUMBER_RE + "[ijk]",
relevance: 1
}, e.C_NUMBER_MODE]
}]
}
});
hljs.highlightAll();
document.querySelectorAll('code').forEach((el, _i) => {
if (0 == el.classList.length || el.classList.contains('language-sh') || el.classList.contains('language-shell') || el.classList.contains('language-bash')){
el.classList.add('code-no-line-numbers');
return;
}
var lines = el.innerHTML.trimEnd().split('\n');
var out = [];
lines.forEach(function(l, i){
out.push('<span class="line-number">' + (i+1).toString() + '</span> ' + l);
});
el.innerHTML = out.join('\n');
});
}
</script>
</head>
<body>
<div id="banner">
<div id="name">
<img id="me" src="me.jpeg">
<span>Philippe Gaultier</span>
</div>
<ul>
<li> <a href="/blog/body_of_work.html">Body of work</a> </li>
<li> <a href="/blog/articles-by-tag.html">Tags</a> </li>
<li> <a href="https://github.com/gaultier/resume/raw/master/Philippe_Gaultier_resume_en.pdf">Resume</a> </li>
<li> <a href="https://www.linkedin.com/in/philippegaultier/">LinkedIn</a> </li>
<li> <a href="https://github.com/gaultier">Github</a> </li>
<li> <a href="/blog/feed.xml">Atom feed</a> </li>
</ul>
</div>
<div class="body">
<div class="article-prelude">
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<p class="publication-date">Published on 2024-11-05</p>
</div>
<div class="article-title">
<h1>I want Rust to have "defer"</h1>
<div class="tags"> <a href="/blog/articles-by-tag.html#rust" class="tag">Rust</a> <a href="/blog/articles-by-tag.html#c" class="tag">C</a></div>
</div>
<p>In a previous article I <a href="/blog/lessons_learned_from_a_successful_rust_rewrite.html#i-am-still-chasing-memory-leaks">mentioned</a> that we use the <code>defer</code> idiom in Rust through a crate, but that it actually rarely gets past the borrow checker. Some comments were <s>doubtful</s> surprised and I did not have an example at hand.</p>
<p>Well, today at work I hit this issue again so I thought I would document it.</p>
<p>I have a Rust API like this:</p>
<pre><code class="language-rust">#[repr(C)]
pub struct Foo {
value: usize,
}
#[no_mangle]
pub extern "C" fn MYLIB_get_foos(out_foos: *mut *mut Foo, out_foos_count: &mut usize) -> i32 {
let res = vec![Foo { value: 42 }, Foo { value: 99 }];
*out_foos_count = res.len();
unsafe { *out_foos = res.leak().as_mut_ptr() };
0
}
</code></pre>
<p>It allocates and returns a dynamically allocated array as a pointer and a length. Of course in reality, <code>Foo</code> has many fields and the values are not known in advance but decoded from the network.</p>
<p>I tell Cargo this is a static library:</p>
<pre><code class="language-toml"># Cargo.toml
[lib]
crate-type = ["staticlib"]
</code></pre>
<p>It's a straightforward API, so I generate the corresponding C header with cbindgen:</p>
<pre><code class="language-sh">$ cbindgen -v src/lib.rs --lang=c -o mylib.h
</code></pre>
<p>And I get:</p>
<pre><code class="language-c">#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
typedef struct Foo {
uintptr_t value;
} Foo;
int32_t MYLIB_get_foos(struct Foo **out_foos, uintptr_t *out_foos_count);
</code></pre>
<p>I can now use it from C like so:</p>
<pre><code class="language-c">#include "mylib.h"
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
Foo *foos = NULL;
size_t foos_count = 0;
assert(0 == MYLIB_get_foos(&foos, &foos_count));
for (size_t i = 0; i < foos_count; i++) {
printf("%lu\n", foos[i].value);
}
if (NULL != foos) {
free(foos);
}
}
</code></pre>
<p>I build it with all the warnings enabled, run it with sanitizers on, and/or in valgrind, all good.</p>
<blockquote>
<p>If I feel fancy (and non-portable), I can even automate the freeing of the memory in C with <code>__attribute__((cleanup))</code>, like <code>defer</code> (ominous sounds). But let's not, today. Let's focus on the Rust side.</p>
</blockquote>
<p>Now, we are principled developers who test their code (right?). So let's write a Rust test for it. We expect it to be exactly the same as the C code:</p>
<pre><code class="language-rust">#[cfg(test)]
mod tests {
#[test]
fn test_get_foos() {
let mut foos = std::ptr::null_mut();
let mut foos_count = 0;
assert_eq!(0, super::MYLIB_get_foos(&mut foos, &mut foos_count));
}
}
</code></pre>
<p>And it passes:</p>
<pre><code class="language-sh">$ cargo test
...
running 1 test
test tests::test_get_foos ... ok
...
</code></pre>
<p>Of course, we have not yet freed anything, so we expect Miri to complain, and it does:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
error: memory leaked: alloc59029 (Rust heap, size: 16, align: 8), allocated here:
...
</code></pre>
<p>Great, so let's free it at the end of the test, like C does, with <code>free</code> from libc, which we add as a dependency:</p>
<pre><code class="language-rust">
#[test]
fn test_get_foos() {
..
if !foos.is_null() {
unsafe { libc::free(foos as *mut std::ffi::c_void) };
}
}
</code></pre>
<p>The test passes, great. Let's try with Miri:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
error: Undefined Behavior: deallocating alloc59029, which is Rust heap memory, using C heap deallocation operation
...
</code></pre>
<p>Hmm... OK. Well, that's a bit weird, because what Rust does when the <code>Vec</code> is allocated is call out to <code>malloc</code> from libc, as we can see with <code>strace</code>:</p>
<pre><code class="language-sh">$ strace -k -v -e brk ./a.out
...
brk(0x213c0000) = 0x213c0000
> /usr/lib64/libc.so.6(brk+0xb) [0x10fa9b]
> /usr/lib64/libc.so.6(__sbrk+0x6b) [0x118cab]
> /usr/lib64/libc.so.6(__default_morecore@GLIBC_2.2.5+0x15) [0xa5325]
> /usr/lib64/libc.so.6(sysmalloc+0x57b) [0xa637b]
> /usr/lib64/libc.so.6(_int_malloc+0xd39) [0xa7399]
> /usr/lib64/libc.so.6(tcache_init.part.0+0x36) [0xa7676]
> /usr/lib64/libc.so.6(__libc_malloc+0x125) [0xa7ef5]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::alloc+0x6a) [0x4a145a]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::Global::alloc_impl+0x140) [0x4a15a0]
> /home/pg/scratch/rust-blog2/a.out(alloc::alloc::exchange_malloc+0x3a) [0x4a139a]
> /home/pg/scratch/rust-blog2/a.out(MYLIB_get_foos+0x26) [0x407cc6]
> /home/pg/scratch/rust-blog2/a.out(main+0x2b) [0x407bfb]
</code></pre>
<p>Note the irony that we do not need to have a third-party dependency on the <code>libc</code> crate to allocate with <code>malloc</code> being called under the hood, but we do need it to free the memory with <code>free</code>. Anyway. Where was I.</p>
<p>Right, Rust wants to free the memory it allocated. Ok. Let's do that I guess.</p>
<p>The only problem is that to do so properly, we ought to use <code>Vec::from_raw_parts</code> and let the <code>Vec</code> free the memory when it gets dropped at the end of the scope. And this function requires the pointer, the length, <em>and the capacity</em>. Wait, but we lost the capacity when we returned the pointer + length to the caller in <code>MYLIB_get_foos()</code>, and the caller <em>does not care one bit about the capacity</em>! It's irrelevant to them! At work, the mobile developers using our library rightfully asked: wait, what is this <code>cap</code> field? Why do I care?</p>
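<p>To make the capacity requirement concrete, here is a minimal sketch using only the standard library. The helper names <code>into_parts</code> and <code>rebuild</code> are made up for illustration and are not part of our API; the point is only that length and capacity can differ, and that <code>Vec::from_raw_parts</code> wants the capacity the buffer was allocated with:</p>

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: give up ownership of a `Vec` and return its raw parts.
fn into_parts(v: Vec<usize>) -> (*mut usize, usize, usize) {
    // `ManuallyDrop` prevents the `Vec` destructor from freeing the buffer.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

/// Hypothetical helper: take ownership back, so that dropping the returned
/// `Vec` frees the buffer.
///
/// # Safety
/// The three values must come from a single `into_parts` call and must be
/// used only once.
unsafe fn rebuild(ptr: *mut usize, len: usize, cap: usize) -> Vec<usize> {
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

<p>With a <code>Vec</code> created by <code>Vec::with_capacity(8)</code> and filled with two elements, the length is 2 while the capacity is at least 8, so passing the length where the capacity is expected would describe the wrong allocation.</p>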
<p>So, let's first try to dodge the problem the <s>hacky</s> easy way by pretending that the memory is allocated by a <code>Box</code>, which only needs the pointer, just like <code>free()</code>:</p>
<pre><code class="language-rust"> #[test]
fn test_get_foos() {
...
if !foos.is_null() {
unsafe {
let _ = Box::from_raw(foos);
}
}
}
</code></pre>
<p>It builds. The test passes. And Miri is unhappy. I guess you know the drill by now:</p>
<pre><code class="language-sh">$ cargo +nightly miri test
...
incorrect layout on deallocation: alloc59029 has size 16 and alignment 8, but gave size 8 and alignment 8
...
</code></pre>
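<p>The two sizes in that message can be reproduced with <code>std::alloc::Layout</code>. This is only a sketch to illustrate the mismatch: the function names are made up, and <code>Foo</code> is redeclared to keep the snippet self-contained. Freeing through a <code>Box</code> describes a single <code>Foo</code>, whereas the buffer was allocated for two of them:</p>

```rust
use std::alloc::Layout;

#[repr(C)]
pub struct Foo {
    pub value: usize,
}

/// Layout that freeing a `Box` holding a single `Foo` describes.
fn layout_of_one_foo() -> Layout {
    Layout::new::<Foo>()
}

/// Layout of a contiguous buffer with room for `cap` elements of `Foo`,
/// which is what a `Vec` with that capacity allocates.
fn layout_of_foo_buffer(cap: usize) -> Layout {
    Layout::array::<Foo>(cap).expect("capacity overflow")
}
```

<p>On a 64-bit target, the first layout is 8 bytes and the second is 16 bytes for a capacity of 2, both with the same alignment, which lines up with what Miri reports.</p>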
<p>Let's take a second to marvel at the fact that Rust, probably the strictest programming language at compile time, the if-it-builds-it-runs-dude-I-swear language, seems to work at compile time and at run time, and only fails when run under an experimental analyzer that only works on nightly and does not support lots of FFI patterns. Anyway, I guess we have to refactor our whole API!</p>
<p>So, in our codebase at work, we have defined this type:</p>
<pre><code class="language-rust">/// Owning Array i.e. `Vec<T>` in Rust or `std::vector<T>` in C++.
#[repr(C)]
pub struct OwningArrayC<T> {
pub data: *mut T,
pub len: usize,
pub cap: usize,
}
</code></pre>
<p>It clearly signifies to the caller that they are in charge of freeing the memory, and also it carries the capacity of the <code>Vec</code> with it, so it's not lost.</p>
<p>In our project, this struct is used a lot.</p>
<p>So let's adapt the function, and also add a function in the API to free it for convenience:</p>
<pre><code class="language-rust">#[no_mangle]
pub extern "C" fn MYLIB_get_foos(out_foos: &mut OwningArrayC<Foo>) -> i32 {
let res = vec![Foo { value: 42 }, Foo { value: 99 }];
let len = res.len();
let cap = res.capacity();
*out_foos = OwningArrayC {
data: res.leak().as_mut_ptr(),
len,
cap,
};
0
}
#[no_mangle]
pub extern "C" fn MYLIB_free_foos(foos: &mut OwningArrayC<Foo>) {
if !foos.data.is_null() {
unsafe {
let _ = Vec::from_raw_parts(foos.data, foos.len, foos.cap);
}
}
}
</code></pre>
<p>Let's also re-generate the C header, adapt the C code, rebuild it, etc.</p>
<p>Back to the Rust test:</p>
<pre><code class="language-rust">#[cfg(test)]
mod tests {
#[test]
fn test_get_foos() {
let mut foos = crate::OwningArrayC {
data: std::ptr::null_mut(),
len: 0,
cap: 0,
};
assert_eq!(0, super::MYLIB_get_foos(&mut foos));
println!("foos: {}", foos.len);
super::MYLIB_free_foos(&mut foos);
}
}
</code></pre>
<p>And now, Miri is happy. Urgh. So, back to what we set out to do originally, <code>defer</code>.</p>
<p>Let's use the <code>scopeguard</code> crate which provides a <code>defer!</code> macro, in the test, to automatically free the memory:</p>
<pre><code class="language-rust"> #[test]
fn test_get_foos() {
let mut foos = crate::OwningArrayC {
data: std::ptr::null_mut(),
len: 0,
cap: 0,
};
assert_eq!(0, super::MYLIB_get_foos(&mut foos));
defer! {
super::MYLIB_free_foos(&mut foos);
}
println!("foos: {}", foos.len);
}
</code></pre>
<p>And we get a compile error:</p>
<pre><code class="language-sh">$ cargo test
error[E0502]: cannot borrow `foos.len` as immutable because it is also borrowed as mutable
--> src/lib.rs:54:30
|
50 | / defer! {
51 | | super::MYLIB_free_foos(&mut foos);
| | ---- first borrow occurs due to use of `foos` in closure
52 | | }
| |_________- mutable borrow occurs here
53 |
54 | println!("foos: {}", foos.len);
| ^^^^^^^^ immutable borrow occurs here
55 | }
| - mutable borrow might be used here, when `_guard` is dropped and runs the `Drop` code for type `ScopeGuard`
|
</code></pre>
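<p>The error message hints at the mechanism: a guard value holds a closure, and the closure keeps its captures borrowed until the guard is dropped at the end of the scope. Here is a minimal hand-rolled sketch of that mechanism. The <code>Defer</code> type and the <code>run_order</code> function are hypothetical stand-ins written for this illustration, not the actual implementation of the crate, and the <code>RefCell</code> is only there so that the closure needs a shared borrow:</p>

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for what a `defer!`-style macro might expand to:
/// a value that stores a closure and calls it when dropped.
struct Defer<F: FnMut()>(F);

impl<F: FnMut()> Drop for Defer<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

/// Returns the order in which the steps ran.
fn run_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // The closure only captures `&log` (a shared borrow), so `log` can
        // still be used in the rest of the scope.
        let _guard = Defer(|| log.borrow_mut().push("deferred cleanup"));
        log.borrow_mut().push("body");
        // Had the closure captured `&mut something` instead, any use of
        // `something` at this point would be rejected with E0502, because
        // `_guard` keeps that mutable borrow alive until the end of the scope.
    } // `_guard` is dropped here, which runs the closure.
    log.into_inner()
}
```

<p>In the test above, the deferred call needs <code>&mut foos</code>, so the guard holds a mutable borrow for the whole scope and the later read of <code>foos.len</code> conflicts with it.</p>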
<p>Dum dum duuuum... Yes, we cannot use the <code>defer</code> idiom here (or at least I did not find a way). In some cases it's possible, in lots of cases it's not. And that is despite the versions without and with <code>defer</code> being equivalent: the borrow checker accepts the former and rejects the latter.</p>
<p>So that is why I argue that Rust should get a <code>defer</code> statement in the language and the borrow checker should be made aware of this construct to allow this approach to take place.</p>
<p>And that's irrespective of the annoying constraints around freeing memory that Rust has allocated. Or that the code builds and runs fine even though it is subtly flawed.</p>
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<blockquote id="donate">
<p>If you enjoy what you're reading, you want to support me, and can afford it: <a href="https://paypal.me/philigaultier?country.x=DE&locale.x=en_US">Support me</a>. That allows me to write more cool articles!</p>
</blockquote>
<blockquote>
<p>
This blog is <a href="https://github.com/gaultier/blog">open-source</a>!
If you find a problem, please open a Github issue.
The content of this blog as well as the code snippets are under the <a href="https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)">BSD-3 License</a> which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.
</p>
</blockquote>
</div>
</body>
</html>