
Specify Sync and Send #41

Open · wants to merge 6 commits into base: main
Conversation

@rikhuijzer (Owner) commented Dec 22, 2024

Implement some basic multithreading (fix #37). After reading Rust for Rustaceans, chapter 10 on concurrency (and parallelism), rayon seems like a suitable way to implement it (EDIT: yes, it looks like it; rayon is also used inside the Rust compiler, together with the query-based approach). The book says that shared memory is usually the best approach if threads need to "jointly update some shared state in a way that does not commute". On the other hand, having a worker pool via rayon looks appealing too, and its API is very simple. It should be possible to figure out beforehand which rewrites have side effects and which don't, and then call rayon differently based on that. For example, if 2 out of 6 op rewrites have side effects (| | x | | x), then the iterator could be split into multiple parts (| |, x, | |, x). This loses some concurrency, of course, since it essentially introduces 4 barriers (synchronization points), but depending on how many rewrites have side effects it could still save a lot of time.
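A minimal sketch of this phase-splitting idea, using `std::thread::scope` instead of rayon so it stays dependency-free (with rayon the parallel phase would be a `par_iter` call); the rewrite functions and the `i64` "ops" are hypothetical stand-ins for real IR rewrites:

```rust
use std::sync::Mutex;
use std::thread;

// A side-effect-free rewrite: touches only its own op, so many can run at once.
fn pure_rewrite(op: &Mutex<i64>) {
    *op.lock().unwrap() *= 2;
}

// A side-effecting rewrite: touches other ops, so it runs alone (acts as the `x`).
fn effectful_rewrite(ops: &[Mutex<i64>]) {
    for op in ops {
        *op.lock().unwrap() += 1;
    }
}

fn main() {
    let ops: Vec<Mutex<i64>> = (0..6i64).map(Mutex::new).collect();

    // Phase `| |`: pure rewrites in parallel.
    thread::scope(|s| {
        for op in &ops {
            s.spawn(move || pure_rewrite(op));
        }
    }); // leaving the scope is the synchronization point (barrier)

    // Phase `x`: the effectful rewrite runs serially.
    effectful_rewrite(&ops);

    let sum: i64 = ops.iter().map(|m| *m.lock().unwrap()).sum();
    assert_eq!(sum, 36); // (0+1+2+3+4+5)*2 + 6
}
```

Each extra `x` phase adds another barrier, which is exactly the concurrency loss mentioned above.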

EDIT: It crashes with out-of-bounds errors, such as trying to access a block that has been removed. This makes sense, but how do you stop rewrites in a certain area once one rewrite already knows that it needs to rewrite a few layers up? Or maybe the locks are currently not correct, and certain rewrites should lock a whole area? The book says that only commutativity is a problem for rayon's worker pool. Or maybe the rewrites should take multiple locks? A more hands-off approach could be to determine independent regions beforehand and then process those in parallel (like Mojo did with LLVM: https://youtu.be/yuSBEXkjfEA).
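The "determine independent regions beforehand" option has a nice property in Rust: if the regions are disjoint, no locks are needed at all, because each thread gets exclusive `&mut` access. A toy sketch (the `i64` slice stands in for a region of IR ops, and the chunking stands in for real region analysis):

```rust
use std::thread;

// A rewrite that only touches ops inside its own region.
fn rewrite_region(region: &mut [i64]) {
    for op in region.iter_mut() {
        *op += 10; // placeholder for a local rewrite
    }
}

fn main() {
    let mut ops = vec![1, 2, 3, 4, 5, 6];

    // Split the IR into independent regions and rewrite them in parallel.
    // `chunks_mut` hands out non-overlapping &mut slices, so no Mutex is needed.
    thread::scope(|s| {
        for region in ops.chunks_mut(3) {
            s.spawn(move || rewrite_region(region));
        }
    });

    assert_eq!(ops, vec![11, 12, 13, 14, 15, 16]);
}
```

A rewrite that wants to touch ops "a few layers up" would cross a region boundary, which is exactly what the region analysis would have to forbid.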

EDIT 2: Interesting text from https://stackoverflow.com/questions/4430001/: "In practice, fine-grained parallelism in compiler passes probably isn't worth the overhead of synchronization (and the inevitable bugs when a pass touches more than it claims to) given that individual source files in large programs can be compiled in parallel. The different pass classes are primarily useful for documentation. They can also help in scheduling passes in a cache-friendly way; for example, when running a bunch of FunctionPasses on all the translation unit's functions, it's faster to run each pass on one function (keeping it in cache) before moving to the next function."
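The cache-friendly scheduling from that quote amounts to swapping the loop order: all passes over one function before moving to the next, rather than one pass over all functions. A sketch with hypothetical pass and function types:

```rust
// Hypothetical stand-ins: a "function" is a list of ops, a "pass" mutates one function.
type Function = Vec<i64>;
type Pass = fn(&mut Function);

fn constant_fold(f: &mut Function) {
    for op in f.iter_mut() {
        *op *= 2; // placeholder for folding
    }
}

fn dce(f: &mut Function) {
    f.retain(|&op| op != 0); // placeholder "dead code elimination"
}

// Cache-friendly order: the inner loop is over passes, so each function
// stays hot in cache while every pass visits it.
fn run_passes(functions: &mut [Function], passes: &[Pass]) {
    for f in functions.iter_mut() {
        for pass in passes {
            pass(f);
        }
    }
}

fn main() {
    let mut functions = vec![vec![0, 1, 2], vec![3, 0, 4]];
    run_passes(&mut functions, &[constant_fold, dce]);
    assert_eq!(functions, vec![vec![2, 4], vec![6, 8]]);
}
```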

EDIT 3: Probably need to think about this in terms of separate parts like parser, optimizer, etc. From the Rust compiler docs: "As of November 2024, most of the rust compiler is now parallelized.

- The codegen part is executed concurrently by default. You can use the -C codegen-units=n option to control the number of concurrent tasks.
- The parts after HIR lowering to codegen such as type checking, borrowing checking, and mir optimization are parallelized in the nightly version. Currently, they are executed in serial by default, and parallelization is manually enabled by the user using the -Z threads = n option.
- Other parts, such as lexical parsing, HIR lowering, and macro expansion, are still executed in serial mode"
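For reference, the two flags quoted above are used like this (the concrete numbers are just examples, and the nightly flag requires a nightly toolchain via rustup):

```shell
# Stable: split the crate into 16 LLVM modules ("codegen units") compiled in parallel.
rustc -C codegen-units=16 main.rs

# Nightly only: also parallelize the front end (type checking, borrow checking, MIR opts).
rustc +nightly -Z threads=8 main.rs
```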

Here codegen, as far as I understand, means LLVM. This part has been parallelized since 2015: "In a normal build we generate an LLVM module for each crate. LLVM then processes that module into an object file, which is eventually linked with other object files (from other crates or libs) to make the final program. Under parallel codegen, rustc creates multiple LLVM modules per crate (one per ‘codegen unit’), these modules are processed in parallel to produce multiple object files. Then these object files are linked together to produce a single object file for the whole crate. That object file can then be further linked as normal."

More generally, the linker seems to be single-threaded: in a large CMake-based project, CPU utilization visibly dropped to a single core during linking.

EDIT 4: The query system is mostly meant to make incremental compilation faster, but it also helps parallelization, since query results are immutable and query providers can only access data via the query context, which lets the query context take care of synchronizing access (https://rustc-dev-guide.rust-lang.org/parallel-rustc.html#query-system). Also from Ole Fredriksson's blog: "Query-based compilers are also surprisingly easy to parallelise. Since we're allowed to make any query at any time, and they're memoised the first time they're run, we can fire off queries in parallel without having to think much. In Sixty, the default behaviour is for all input modules to be type checked in parallel."
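A toy version of that idea, assuming a made-up `QueryCtx` with a single memoised `type_check` query (the real rustc query system also tracks dependencies, which this sketch ignores): results are computed once, then immutable, so any thread may ask any query at any time.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical query context: all access to results goes through it,
// so it alone is responsible for synchronization.
struct QueryCtx {
    cache: Mutex<HashMap<u64, u64>>,
}

impl QueryCtx {
    fn type_check(&self, module: u64) -> u64 {
        if let Some(&r) = self.cache.lock().unwrap().get(&module) {
            return r; // memoised result; never recomputed, effectively immutable
        }
        let result = module * module; // placeholder for real type checking
        self.cache.lock().unwrap().insert(module, result);
        result
    }
}

fn main() {
    let ctx = QueryCtx { cache: Mutex::new(HashMap::new()) };
    let ctx = &ctx;

    // "Fire off queries in parallel without having to think much."
    thread::scope(|s| {
        for m in 0..4u64 {
            s.spawn(move || {
                ctx.type_check(m);
            });
        }
    });

    assert_eq!(ctx.type_check(3), 9); // cache hit after the parallel phase
}
```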

EDIT 5: According to matklad, a query-based compiler might be overkill for many languages (https://www.reddit.com/r/ProgrammingLanguages/comments/hfs53y/comment/fyeri3v/). Usually it's fine to parallelize at a higher level (like the file level).
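File-level parallelism is the simplest of all the options above: whole files are compiled independently with no shared state, so there is nothing to lock. A sketch where `compile_file` is a hypothetical stand-in (here it just counts lines):

```rust
use std::thread;

// Placeholder for compiling one source file; no state is shared between files.
fn compile_file(source: &str) -> usize {
    source.lines().count()
}

fn main() {
    let files = ["fn a() {}", "fn b() {}\nfn c() {}"];

    // One task per file, results collected in order via join.
    let results: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|f| s.spawn(move || compile_file(f)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    assert_eq!(results, vec![1, 2]);
}
```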

Successfully merging this pull request may close these issues.

Multithreading