Implement some basic multithreading (fix #37). After reading Rust for Rustaceans, chapter 10 on concurrency (and parallelism), it seems that `rayon` could be a suitable way to implement it (EDIT: yes, it looks like it; `rayon` is also used within the Rust compiler together with the query-based approach). The book says that shared memory is usually the best approach if threads need to "jointly update some shared state in a way that does not commute". But on the other hand, having a worker pool via `rayon` does look appealing too, and the API is very simple. I think it would be possible to figure out beforehand which rewrites have side effects and which don't; then, based on that, `rayon`
can probably be called differently. For example, say 2 out of 6 op rewrites have side effects (`| | x | | x`, where `x` marks a side-effecting rewrite); then maybe the `iter` can be split up into multiple parts (`| |`, `x`, `| |`, `x`). This would lose some concurrency of course, since it essentially introduces 4 barriers (synchronization points), but depending on how many rewrites have side effects this could still save a lot of time.

EDIT: So it crashes with out-of-bounds errors, like trying to access a block that has been removed. This makes sense. But how to stop rewrites in a certain area if one rewrite already knows that it needs to rewrite a few layers up? Or maybe the locks are currently not correct? Should certain rewrites just lock a whole area? The book says that only commutativity is a problem for
`rayon`, which uses a worker pool. Or maybe the rewrites should take multiple locks? A more hands-off approach could be to determine independent regions beforehand and then just process those in parallel (like Mojo did with LLVM: https://youtu.be/yuSBEXkjfEA).

EDIT 2: Interesting text from https://stackoverflow.com/questions/4430001/: "In practice, fine-grained parallelism in compiler passes probably isn't worth the overhead of synchronization (and the inevitable bugs when a pass touches more than it claims to) given that individual source files in large programs can be compiled in parallel. The different pass classes are primarily useful for documentation. They can also help in scheduling passes in a cache-friendly way; for example, when running a bunch of FunctionPasses on all the translation unit's functions, it's faster to run each pass on one function (keeping it in cache) before moving to the next function."
EDIT 3: Probably need to think about this in terms of separate parts like parser, optimizer, etc. From the Rust compiler docs: "As of November 2024, most of the rust compiler is now parallelized.
The codegen part is executed concurrently by default. You can use the -C codegen-units=n option to control the number of concurrent tasks.
The parts after HIR lowering to codegen such as type checking, borrowing checking, and mir optimization are parallelized in the nightly version. Currently, they are executed in serial by default, and parallelization is manually enabled by the user using the -Z threads = n option.
Other parts, such as lexical parsing, HIR lowering, and macro expansion, are still executed in serial mode"
Here "codegen", as far as I understand, means LLVM. This part has already been parallelized since 2015: "In a normal build we generate an LLVM module for each crate. LLVM then processes that module into an object file, which is eventually linked with other object files (from other crates or libs) to make the final program. Under parallel codegen, rustc creates multiple LLVM modules per crate (one per ‘codegen unit’), these modules are processed in parallel to produce multiple object files. Then these object files are linked together to produce a single object file for the whole crate. That object file can then be further linked as normal."
The linker seems to be single-threaded more generally: in a large CMake-based project, CPU utilization visibly dropped to a single core during linking.
EDIT 4: The query system is mostly meant to make incremental compilation faster, but it has benefits for parallelization too, since query results are immutable and query providers can only access data via the query context, which allows the query context to take care of synchronizing access (https://rustc-dev-guide.rust-lang.org/parallel-rustc.html#query-system). Also from Ole Fredriksson's blog: "Query-based compilers are also surprisingly easy to parallelise. Since we're allowed to make any query at any time, and they're memoised the first time they're run, we can fire off queries in parallel without having to think much. In Sixty, the default behaviour is for all input modules to be type checked in parallel."
EDIT 5: According to matklad, a query-based compiler might be overkill for many languages (https://www.reddit.com/r/ProgrammingLanguages/comments/hfs53y/comment/fyeri3v/). Usually it's fine to do parallelism at a higher level (like the file level).