Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker: rework flash endpoint #728

Closed
wants to merge 1 commit into from

Conversation

rcooke-warwick
Copy link
Contributor

This PR modifies the flash endpoint.

What we were doing:

  • streaming the image over the flash endpoint in the worker, and piping that stream into the flash function of the testbotSDK/qemu worker

This was causing problems for the usbboot device types like the fin - where the http connection was closing while waiting for the fin to attach as a block device during the flashing procedure.

What this PR does:

  • (for testbot) we first send the image in its entirety over rsync over SSH to the worker, before trying to flash.
  • (for qemu) the image is already in a volume shared by the core and worker - so we don't send anything
  • the flash endpoint on the worker now just flashes the image to the DUT with the image path provided in the request body.

other things to note:

  • no interface change - tests won't need to be modified
  • removed needless http transfer of image for qemu worker
  • more resistant to quick VPN drops while flashing
  • fixes fin and other usbboot device flashing

Change-type: patch
Signed-off-by: Ryan Cooke [email protected]

Change-type: patch
Signed-off-by: Ryan Cooke <[email protected]>
@rcooke-warwick rcooke-warwick added the versionbot/pr-draft Draft PR - Don't merge this PR automatically label Apr 20, 2022
@rcooke-warwick
Copy link
Contributor Author

@balena-ci rebase

@ghost ghost force-pushed the ryan/send-img-pre-flash branch from 42b25aa to 21146a4 Compare April 20, 2022 15:01
this.logger.log(`Preparing to flash, attempt ${attempt}...`);
// if qemu worker, image is already in volume
let flashPath = imagePath;
if(!this.url.includes(`worker`)){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish we had a better way to check this but nothing is coming to mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only think of initialising the class with a flag that says if its for emulated or testbot tbh

await pipeline(
fs.createReadStream(imagePath),
createGzip({ level: 6 }),
fs.createWriteStream(`/tmp/os.img`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already gzip the image before providing it to the client?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wouldn't we want to add a .gz extension if we are zipping it, just for clarity?

@jakogut
Copy link
Contributor

jakogut commented Apr 20, 2022

Expanding on our last sync (that I was present for), I think the hacks for detecting emulated workers are because of the asymmetry between the two approaches currently. For testbot, we have:
[ client/core ] -> [ testbot (worker) ] -> [ test device (DUT) ]

For QEMU, we have:
[ client/core | worker ] -> [ test device ]

In English, the testbot has three separate hosts connected by the network (with no guaranteed direct route between the client and test device), whereas the emulated approach really only has two hosts, and the direct route is guaranteed.

We can make these symmetrical by running the client/core on the testbot itself, which is easily doable over SSH. This has several advantages:

  • No more redundant and overly complicated interfaces between core/client and worker. For example, executeCommandInWorkerHost becomes wholly unnecessary, executeCommandInHostOS and the like can be replaced with simpler and more concise direct SSH calls.
  • No more tunneling required, nor manually creating and managing SSH keys
  • No uploads required from workstations, which often have asymmetrical connections with slow upload speed
  • No code required locally on a workstation to run test suites at all, either using a testbot or QEMU, though simple tasks could still be automated with bash scripts
  • No managing releases on the worker, the image is pulled explicitly at runtime

This also means we wouldn't need any bespoke code or APIs for managing artifacts, configuration, images, and the like. If I have a suite and an image on my local machine I want to run remotely on a Pi 4, I simply rsync or sftp them over before doing something like:

echo balena run -v /mnt/data/:/mnt/data \
    balena/leviathan --worker testbot \
                     --suite /mnt/data/suite \
                     --image /mnt/data/os.img \
    | balena ssh [mydevice]

Same thing on my workstation, if I want to run tests on QEMU, I'd do something like:

docker run -v .:/data \
    balena/leviathan --worker qemu \
                     --suite /data/suite \
                     --image /data/os.img`

@vipulgupta2048
Copy link
Member

No longer feasible as flashing endpoint needs to be reworked to accommodate #965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
versionbot/pr-draft Draft PR - Don't merge this PR automatically
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants