Move first RISC-V builder over to cross-compiler + execute tests under qemu-system setup #279

asb · 2024-10-15T14:59:11Z

This PR is structured as two patches (at the time of writing), though it might make sense to split out the first one for separate review. I'm posting them together to get initial feedback.

My goal is to move clang-riscv-rva20-2stage, clang-riscv-rva23-mrvv-vec-bits-2stage, and clang-riscv-rva23-evl-vec-2stage over so that:

- All building is done on the x86 host (stage 1 is a straightforward native build; stage 2 is a cross-build using the just-built stage 1).
- To execute the tests (ninja check-all) we rely on an llvm-lit wrapper script that spins up a special-purpose VM under qemu-system, with the source+build tree copied into it (see here), and runs the tests there. This should be much faster than the current approach, which does all building under qemu-system.

This PR makes this change just for clang-riscv-rva20-2stage, for ease of review and so we can get any teething problems sorted out. Although it might be possible to configure the cross-build properly in some other way, I've added a mechanism for creating and adding options to a toolchains file used by the second stage. This matches CMake recommended practice for cross-building, and after spending a lot of time iterating on various cross-build configs recently I've found it's the option that gives the most control and the fewest problems (notably, some CMake variables can't be set outside of a toolchain file).

@gkistanova: The underlying worker is currently configured within a RISC-V qemu-system. Once this is ready to merge, I'll bring it over to running under the host, meaning I shouldn't need to set up a new password and so on. But let me know if you'd prefer I do set up a new worker with a new name and retire this name.

We'll later want to add the ability to customise the enabled projects/runtimes separately for stage1 and stage2.
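One way that per-stage customisation might look, in the same style as the reference script further down. This is only a sketch: the `projects_for_stage` helper is hypothetical, and dropping clang-tools-extra from stage 1 is an assumption based on the TODO in that script, not what ClangBuilder.py does today.

```sh
#!/bin/sh
# Hypothetical helper (not part of ClangBuilder.py): print the value to use
# for LLVM_ENABLE_PROJECTS in a given stage.
projects_for_stage () {
  case "$1" in
    # Assumption: stage 1 only needs the compiler and linker that are used
    # to cross-build stage 2.
    stage1) printf '%s\n' "clang;lld" ;;
    # Stage 2 is the build under test, so it keeps the full project list.
    stage2) printf '%s\n' "clang;clang-tools-extra;lld" ;;
    *) printf 'unknown stage: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Print the CMake option each stage would be configured with.
for stage in stage1 stage2; do
  printf '%s\n' "-DLLVM_ENABLE_PROJECTS=$(projects_for_stage "$stage")"
done
```

Keeping the per-stage list in one place would let the stage 1 build shrink without touching the stage 2 configuration.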

For reference, this approximates the logic that the configured builder should follow:

#!/bin/sh

die () {
  printf "%s\n" "$*" >&2
  exit 1
}

mkdir -p build && cd build
mkdir -p stage1 stage1.install stage2
printf "!!!!!!!!!! Building stage 1 !!!!!!!!!!\n"
cd stage1
# TODO: no need to build clang-tools-extra in stage1, but this matches logic in ClangBuilder.py
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=True \
  -DCMAKE_INSTALL_PREFIX=../stage1.install \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld" \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DLLVM_TARGETS_TO_BUILD=RISCV \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  ../../llvm || die "Failed to configure stage 1 build"
ninja || die "Failed to build stage 1"
printf "!!!!!!!!!! Installing stage 1 !!!!!!!!!!\n"
ninja install || die "Failed to install stage 1"
printf "!!!!!!!!!! Building stage 2 !!!!!!!!!!\n"
cd ../stage2
cat - <<EOF > stage1-toolchain.cmake
set(CMAKE_SYSTEM_NAME Linux)

set(CMAKE_SYSROOT $HOME/rvsysroot)

set(CMAKE_C_COMPILER $(pwd)/../stage1.install/bin/clang)
set(CMAKE_CXX_COMPILER $(pwd)/../stage1.install/bin/clang++)

set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_C_FLAGS_INIT "-march=rva20u64")
set(CMAKE_CXX_FLAGS_INIT "-march=rva20u64")

set(CMAKE_LINKER_TYPE LLD)

set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
EOF
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=True \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld" \
  -DCMAKE_TOOLCHAIN_FILE=$(pwd)/stage1-toolchain.cmake \
  -DLLVM_NATIVE_TOOL_DIR=$(pwd)/../stage1/bin \
  -DLLVM_BUILD_TESTS=True \
  -DLLVM_EXTERNAL_LIT=$HOME/lit-on-qemu-system-rva20.py \
  ../../llvm || die "Failed to configure stage 2 build"
ninja || die "Failed to build stage 2"
printf "!!!!!!!!!! Running check-all on stage 2 build !!!!!!!!!!\n"
ninja check-all || die "check-all failed"

DavidSpickett · 2024-10-16T09:33:55Z

The obvious alternative is binfmt_misc. We have used this at Linaro to run AArch64 compiler binaries on x86 to build benchmarks. Though we then copied those to real hardware of course, it just let us use spare machines for compiling. Once we had enough Arm hardware we did it all on AArch64.

Splitting the test build vs. test run apart for lit tests is not a viable option however. Benchmark build and run were already discrete steps so it was easy.

Even if you run the tests via qemu-user too, you could argue it's not a "real" system layout. You pass a sysroot but having a proper Linux install is going to be more accurate, even if only by a small amount. It's the "test what you ship" saying but "test where you're going to ship to". And it gives you a VM image you can send to someone if they want to dig into the problem.

There's probably a lot of overhead in ~10,000 qemu-users being set up and torn down as well, even compared to the kernel boot time.

Am I right in thinking that this qemu-system VM is per run?

That clears out any leftover files, which is good; the disadvantage is that you can't get at reproducers as easily. Then again, when people want a reproducer from a Linaro bot they have to ask us for it, so the only difference here is you would start a VM locally and reproduce it again.

I thought maybe you could cross compile in stage 1 right away, but any bug in the host compiler's RISC-V codegen would break that build. So it makes sense to build latest clang as the compiler for stage 2. You have enabled ccache so this mitigates the overhead of stage 1.

We had some issues like this where SVE codegen in our host clang compiler for stage 1 would hit bugs that had already been fixed. Luckily there was a version we could upgrade to but otherwise you're trying to host nightly builds and it's a mess.

DavidSpickett · 2024-10-16T09:56:38Z

> The obvious alternative is binfmt_misc.

Also we cannot test lldb with qemu-user because it doesn't emulate ptrace calls. You can use qemu's own GDB stub as the debug server but lldb -> lldb-server tests don't work.

To be clear, don't add lldb here. I am happy to see this though as interest in RISC-V lldb has been picking up lately and it will need checking eventually.

asb · 2024-10-16T12:49:34Z

> The obvious alternative is binfmt_misc. We have used this at Linaro to run AArch64 compiler binaries on x86 to build benchmarks. Though we then copied those to real hardware of course, it just let us use spare machines for compiling. Once we had enough Arm hardware we did it all on AArch64.

binfmt_misc with qemu-user does work pretty well and is slightly faster, but it requires masking out some tests that are incompatible with the approach. My feeling is that for something I hope to move to the non-staging buildmaster, using qemu-system in this way, and thus minimising the chance of test failures being an artifact of the emulation process and its interaction with the build system, is probably more viable. As you say below, LLDB tests can't run under qemu-user. I'd be a lot happier running libc tests under qemu-system as well rather than using it as a stress test for qemu-user syscall translation! So my reasoning is this approach can work well as a baseline that doesn't require alternate approaches for different sub-projects.
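For context on the binfmt_misc route discussed here: the kernel hands foreign-architecture binaries to a registered interpreter such as qemu-user, and each registration shows up as an entry under the binfmt_misc mount point. A minimal sketch of checking for one, assuming the handler is named qemu-riscv64 (the name Debian's qemu-user packages usually register; other setups may differ) and using a hypothetical helper `binfmt_handler_enabled`:

```sh
#!/bin/sh
# Report whether a binfmt_misc handler is registered and enabled.
# $1: handler name (assumption: "qemu-riscv64")
# $2: binfmt_misc mount point (defaults to the usual /proc location)
binfmt_handler_enabled () {
  entry="${2:-/proc/sys/fs/binfmt_misc}/$1"
  [ -r "$entry" ] || return 1
  # The first line of a handler entry is "enabled" or "disabled".
  status=$(head -n 1 "$entry" 2>/dev/null)
  [ "$status" = "enabled" ]
}

if binfmt_handler_enabled qemu-riscv64; then
  echo "RISC-V binaries will be run through the registered qemu-riscv64 handler"
else
  echo "no enabled qemu-riscv64 binfmt_misc handler found"
fi
```

With qemu-system none of this is needed on the host, since the tests run inside a full RISC-V Linux guest.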

> Splitting the test build vs. test run apart for lit tests is not a viable option however. Benchmark build and run were already discrete steps so it was easy.
>
> Even if you run the tests via qemu-user too, you could argue it's not a "real" system layout. You pass a sysroot but having a proper Linux install is going to be more accurate, even if only by a small amount. It's the "test what you ship" saying but "test where you're going to ship to". And it gives you a VM image you can send to someone if they want to dig into the problem.
>
> There's probably a lot of overhead in ~10,000 qemu-users being set up and torn down as well, even compared to the kernel boot time.

Running each process under its own qemu-user typically scales better than running the same set of processes within qemu-system (sorry, I don't have numbers to hand right now). But for the reasons above, and due to concern about qemu-user needing more work in terms of masking specific tests and the like, I prefer qemu-system.

> Am I right in thinking that this qemu-system VM is per run?

Yes, effectively launched as a single-use appliance.

> That clears out any leftover files, which is good; the disadvantage is that you can't get at reproducers as easily. Then again, when people want a reproducer from a Linaro bot they have to ask us for it, so the only difference here is you would start a VM locally and reproduce it again.

Plus a documented cross-compile flow means it's a bit easier to reproduce! I can look at making it easier to reproduce an equivalent VM image, though this will be easier once the next Debian release happens so there's a stable base rather than just using sid at the time the image was generated.

> I thought maybe you could cross compile in stage 1 right away, but any bug in the host compiler's RISC-V codegen would break that build. So it makes sense to build latest clang as the compiler for stage 2. You have enabled ccache so this mitigates the overhead of stage 1.

My thinking exactly. Especially with the different rva23 vectorisation codegen options, we're using building+testing clang as a bit of a stress test for codegen itself. Plus old enough clang/llvm didn't recognise all the options we're using.

asb · 2024-10-30T13:03:27Z

FYI I've taken a slight diversion to set up a good flow for testing builder config changes locally, to make it easier to land this and other related changes: #289

asb · 2024-11-12T21:58:34Z

With a proper testing flow in place, I've fixed the issues related to the toolchains file generation as well as bugs in my initial implementation related to buildbot interpolation.

I considered the idea of having the toolchains file as a file in the llvm-zorg repo, but I don't think it would be overly useful (IMHO):

- We need buildbot to inject paths into it, so it needs interpolation.
- Configuration for the build is then split between ClangBuilder.py, builders.py, and this toolchain file, so arguably more complex than before.
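To illustrate the first point: the toolchain file has to carry absolute paths that are only known on the worker, so some form of templating is needed wherever the file lives. A minimal shell sketch of that substitution follows. `render_toolchain` and its two arguments are hypothetical names, the paths in the example call are placeholders, and the real builder does this through buildbot interpolation rather than a shell function.

```sh
#!/bin/sh
# Hypothetical helper: print a (shortened) toolchain file for the given
# stage1 install directory and sysroot. Both arguments stand in for values
# that buildbot would interpolate.
render_toolchain () {
  stage1_install=$1
  sysroot=$2
  # Unquoted heredoc delimiter, so the shell expands the variables below.
  cat <<EOF
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSROOT ${sysroot})
set(CMAKE_C_COMPILER ${stage1_install}/bin/clang)
set(CMAKE_CXX_COMPILER ${stage1_install}/bin/clang++)
EOF
}

# Example call with placeholder paths.
render_toolchain /tmp/build/stage1.install /tmp/rvsysroot
```

A static file checked into the repo could not contain these paths, which is why it would still need a substitution step like this one.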

DavidSpickett · 2024-11-19T14:01:50Z

As long as it's possible to reproduce without starting a whole master. If we're lucky this download string step records it as an artifact or shows it.

asb · 2024-12-11T11:56:58Z

@DavidSpickett I've generalised the lit-on-qemu script and checked it in. The implementation is modified to execute it from an llvm-zorg checkout. I've also checked in the script used to build the qemu 'appliance' virtual machine used to execute the test suite and retrieve the results. For lit-on-qemu, although it could certainly be generalised for other targets, I've opted to leave that until there's at least one other target we want to generalise it for (which I'd love to see!) - mainly I want to avoid adding additional configurability/complexity until someone takes the time to go through and see what is needed.

I think this is ready to go now, if you're happy. Many thanks.

DavidSpickett

A little hard to assess without just running it all, but no objections from me at this time.

Given this is the first of its kind, @gkistanova should be the approver.

DavidSpickett · 2024-12-11T13:24:00Z

> mainly I want to avoid adding additional configurability/complexity until someone takes the time to go through and see what is needed.

Agreed.

asb · 2024-12-11T14:21:03Z

> A little hard to assess without just running it all, but no objections from me at this time.
>
> Given this is the first of its kind, @gkistanova should be the approver.

Thanks for taking a look. What do you think @gkistanova? I think I've managed to keep the ClangBuilder.py changes as minimal and reusable as possible.

asb · 2024-12-18T09:25:15Z

> > A little hard to assess without just running it all, but no objections from me at this time.
> >
> > Given this is the first of its kind, @gkistanova should be the approver.
>
> Thanks for taking a look. What do you think @gkistanova? I think I've managed to keep the ClangBuilder.py changes as minimal and reusable as possible.

Ping. Many thanks in advance.

gkistanova

Thanks, Alex!
LGTM.

> But let me know if you'd prefer I do set up a new worker with a new name and retire this name.

No, I'd prefer to keep the same name, since you are migrating the build configuration rather than adding a new one.

asb added 2 commits October 15, 2024 15:39

Add ability to use a CMake toolchains file for specifying details of …

ec9391e

…the stage1 CC/CXX This allows us to follow best/usual CMake practice for configuring cross compiling.

[RISCV] Move clang-riscv-rva20-2stage to cross-build and then test un…

6f37f5b

…der qemu-system

asb requested review from DavidSpickett and gkistanova October 15, 2024 14:59

DavidSpickett reviewed Oct 16, 2024

buildbot/osuosl/master/config/builders.py

DavidSpickett reviewed Oct 16, 2024

zorg/buildbot/builders/ClangBuilder.py

DavidSpickett reviewed Oct 16, 2024

zorg/buildbot/builders/ClangBuilder.py

Use StringDownload to transfer toolchains file and fix issues related…

03f14e6

… to interpolation

DavidSpickett reviewed Nov 19, 2024

buildbot/osuosl/master/config/builders.py

asb added 2 commits December 11, 2024 06:46

Add scripts used for RISE RISC-V builders and use them

7571d35

Merge branch 'main' into 2024q4-riscv-add-cross-build-then-qemu-syste…

2699963

…m-test-builders

asb closed this Dec 11, 2024

asb force-pushed the 2024q4-riscv-add-cross-build-then-qemu-system-test-builders branch from 03f14e6 to 7c43daf Compare December 11, 2024 11:42

Fixups

4a7fce4

asb reopened this Dec 11, 2024

DavidSpickett reviewed Dec 11, 2024

zorg/buildbot/builders/ClangBuilder.py

Don't use llvm-zorg fork (holdover from testing)

be98c96

gkistanova approved these changes Dec 23, 2024
