
Specify Sync and Send #41

Open · wants to merge 6 commits into base: main
Conversation

@rikhuijzer (Owner) commented Dec 22, 2024

Implement some basic multithreading (fix #37). After reading Rust for Rustaceans, chapter 10 on concurrency (and parallelism), rayon seems like a suitable way to implement it (EDIT: yes, it looks like it; rayon is also used inside the Rust compiler, together with the query-based approach). The book says that shared memory is usually the best approach if threads need to "jointly update some shared state in a way that does not commute". On the other hand, having a worker pool via rayon looks appealing too, and its API is very simple. It should be possible to figure out beforehand which rewrites have side effects and which don't, and then call rayon differently based on that. For example, if 2 out of 6 op rewrites have side effects (| | x | | x), then the iterator could be split into multiple parts (| |, x, | |, x). This loses some concurrency, of course, since it essentially introduces 4 barriers (synchronization points), but depending on how many rewrites have side effects it could still save a lot of time.
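A minimal sketch of this phase-splitting idea, using `std::thread::scope` instead of rayon so it stays dependency-free (with rayon the parallel phase would be a `par_iter` call); the rewrite functions and the `i64` "ops" are hypothetical stand-ins for real IR rewrites:

```rust
use std::sync::Mutex;
use std::thread;

// A side-effect-free rewrite: touches only its own op, so many can run at once.
fn pure_rewrite(op: &Mutex<i64>) {
    *op.lock().unwrap() *= 2;
}

// A side-effecting rewrite: touches other ops, so it runs alone (acts as the `x`).
fn effectful_rewrite(ops: &[Mutex<i64>]) {
    for op in ops {
        *op.lock().unwrap() += 1;
    }
}

fn main() {
    let ops: Vec<Mutex<i64>> = (0..6i64).map(Mutex::new).collect();

    // Phase `| |`: pure rewrites in parallel.
    thread::scope(|s| {
        for op in &ops {
            s.spawn(move || pure_rewrite(op));
        }
    }); // leaving the scope is the synchronization point (barrier)

    // Phase `x`: the effectful rewrite runs serially.
    effectful_rewrite(&ops);

    let sum: i64 = ops.iter().map(|m| *m.lock().unwrap()).sum();
    assert_eq!(sum, 36); // (0+1+2+3+4+5)*2 + 6
}
```

Each extra `x` phase adds another barrier, which is exactly the concurrency loss mentioned above.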

EDIT: It crashes with out-of-bounds errors, such as trying to access a block that has been removed. This makes sense, but how do you stop rewrites in a certain area once one rewrite already knows that it needs to rewrite a few layers up? Or maybe the locks are currently not correct, and certain rewrites should lock a whole area? The book says that only commutativity is a problem for rayon's worker pool. Or maybe the rewrites should take multiple locks? A more hands-off approach could be to determine independent regions beforehand and then process those in parallel (like Mojo did with LLVM: https://youtu.be/yuSBEXkjfEA).
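The "determine independent regions beforehand" option has a nice property in Rust: if the regions are disjoint, no locks are needed at all, because each thread gets exclusive `&mut` access. A toy sketch (the `i64` slice stands in for a region of IR ops, and the chunking stands in for real region analysis):

```rust
use std::thread;

// A rewrite that only touches ops inside its own region.
fn rewrite_region(region: &mut [i64]) {
    for op in region.iter_mut() {
        *op += 10; // placeholder for a local rewrite
    }
}

fn main() {
    let mut ops = vec![1, 2, 3, 4, 5, 6];

    // Split the IR into independent regions and rewrite them in parallel.
    // `chunks_mut` hands out non-overlapping &mut slices, so no Mutex is needed.
    thread::scope(|s| {
        for region in ops.chunks_mut(3) {
            s.spawn(move || rewrite_region(region));
        }
    });

    assert_eq!(ops, vec![11, 12, 13, 14, 15, 16]);
}
```

A rewrite that wants to touch ops "a few layers up" would cross a region boundary, which is exactly what the region analysis would have to forbid.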

EDIT 2: Interesting text from https://stackoverflow.com/questions/4430001/: "In practice, fine-grained parallelism in compiler passes probably isn't worth the overhead of synchronization (and the inevitable bugs when a pass touches more than it claims to) given that individual source files in large programs can be compiled in parallel. The different pass classes are primarily useful for documentation. They can also help in scheduling passes in a cache-friendly way; for example, when running a bunch of FunctionPasses on all the translation unit's functions, it's faster to run each pass on one function (keeping it in cache) before moving to the next function."
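The cache-friendly scheduling from that quote amounts to swapping the loop order: all passes over one function before moving to the next, rather than one pass over all functions. A sketch with hypothetical pass and function types:

```rust
// Hypothetical stand-ins: a "function" is a list of ops, a "pass" mutates one function.
type Function = Vec<i64>;
type Pass = fn(&mut Function);

fn constant_fold(f: &mut Function) {
    for op in f.iter_mut() {
        *op *= 2; // placeholder for folding
    }
}

fn dce(f: &mut Function) {
    f.retain(|&op| op != 0); // placeholder "dead code elimination"
}

// Cache-friendly order: the inner loop is over passes, so each function
// stays hot in cache while every pass visits it.
fn run_passes(functions: &mut [Function], passes: &[Pass]) {
    for f in functions.iter_mut() {
        for pass in passes {
            pass(f);
        }
    }
}

fn main() {
    let mut functions = vec![vec![0, 1, 2], vec![3, 0, 4]];
    run_passes(&mut functions, &[constant_fold, dce]);
    assert_eq!(functions, vec![vec![2, 4], vec![6, 8]]);
}
```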

EDIT 3: Probably need to think about this in terms of separate parts like parser, optimizer, etc. From the Rust compiler docs: "As of November 2024, most of the rust compiler is now parallelized.

- The codegen part is executed concurrently by default. You can use the -C codegen-units=n option to control the number of concurrent tasks.
- The parts after HIR lowering to codegen such as type checking, borrowing checking, and mir optimization are parallelized in the nightly version. Currently, they are executed in serial by default, and parallelization is manually enabled by the user using the -Z threads = n option.
- Other parts, such as lexical parsing, HIR lowering, and macro expansion, are still executed in serial mode"
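For reference, the two flags quoted above are used like this (the concrete numbers are just examples, and the nightly flag requires a nightly toolchain via rustup):

```shell
# Stable: split the crate into 16 LLVM modules ("codegen units") compiled in parallel.
rustc -C codegen-units=16 main.rs

# Nightly only: also parallelize the front end (type checking, borrow checking, MIR opts).
rustc +nightly -Z threads=8 main.rs
```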

Here codegen, as far as I understand, means LLVM. This part has been parallelized since 2015: "In a normal build we generate an LLVM module for each crate. LLVM then processes that module into an object file, which is eventually linked with other object files (from other crates or libs) to make the final program. Under parallel codegen, rustc creates multiple LLVM modules per crate (one per ‘codegen unit’), these modules are processed in parallel to produce multiple object files. Then these object files are linked together to produce a single object file for the whole crate. That object file can then be further linked as normal."

More generally, the linker seems to be single-threaded: in a large CMake-based project, CPU utilization visibly dropped to a single core during linking.

EDIT 4: The query system is mostly meant to make incremental compilation faster, but it also helps parallelization, since query results are immutable and query providers can only access data via the query context, which lets the query context take care of synchronizing access (https://rustc-dev-guide.rust-lang.org/parallel-rustc.html#query-system). Also from Ole Fredriksson's blog: "Query-based compilers are also surprisingly easy to parallelise. Since we're allowed to make any query at any time, and they're memoised the first time they're run, we can fire off queries in parallel without having to think much. In Sixty, the default behaviour is for all input modules to be type checked in parallel."
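A toy version of that idea, assuming a made-up `QueryCtx` with a single memoised `type_check` query (the real rustc query system also tracks dependencies, which this sketch ignores): results are computed once, then immutable, so any thread may ask any query at any time.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical query context: all access to results goes through it,
// so it alone is responsible for synchronization.
struct QueryCtx {
    cache: Mutex<HashMap<u64, u64>>,
}

impl QueryCtx {
    fn type_check(&self, module: u64) -> u64 {
        if let Some(&r) = self.cache.lock().unwrap().get(&module) {
            return r; // memoised result; never recomputed, effectively immutable
        }
        let result = module * module; // placeholder for real type checking
        self.cache.lock().unwrap().insert(module, result);
        result
    }
}

fn main() {
    let ctx = QueryCtx { cache: Mutex::new(HashMap::new()) };
    let ctx = &ctx;

    // "Fire off queries in parallel without having to think much."
    thread::scope(|s| {
        for m in 0..4u64 {
            s.spawn(move || {
                ctx.type_check(m);
            });
        }
    });

    assert_eq!(ctx.type_check(3), 9); // cache hit after the parallel phase
}
```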

EDIT 5: According to matklad, a query-based compiler might be overkill for many languages (https://www.reddit.com/r/ProgrammingLanguages/comments/hfs53y/comment/fyeri3v/). Usually it's fine to parallelize at a higher level (like the file level).
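File-level parallelism is the simplest of all the options above: whole files are compiled independently with no shared state, so there is nothing to lock. A sketch where `compile_file` is a hypothetical stand-in (here it just counts lines):

```rust
use std::thread;

// Placeholder for compiling one source file; no state is shared between files.
fn compile_file(source: &str) -> usize {
    source.lines().count()
}

fn main() {
    let files = ["fn a() {}", "fn b() {}\nfn c() {}"];

    // One task per file, results collected in order via join.
    let results: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|f| s.spawn(move || compile_file(f)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    assert_eq!(results, vec![1, 2]);
}
```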

Successfully merging this pull request may close these issues.

Multithreading