worker: rework flash endpoint #728

rcooke-warwick · 2022-04-20T15:00:37Z

This PR modifies the flash endpoint.

What we were doing:

Streaming the image over the worker's flash endpoint and piping that stream into the flash function of the testbotSDK/qemu worker.

This caused problems for usbboot device types like the fin: the HTTP connection would close while waiting for the fin to attach as a block device during the flashing procedure.

What this PR does:

(for testbot) We first send the entire image to the worker via rsync over SSH, before trying to flash.
(for qemu) The image is already in a volume shared by the core and worker, so we don't send anything.
The flash endpoint on the worker now just flashes the image to the DUT, using the image path provided in the request body.
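
The flow above (copy the whole image first, then ask the worker to flash from a path) could be sketched roughly as follows. This is not the PR's code: the helper names, the `root` SSH user, the remote path, and the `{ imagePath }` body shape are all assumptions for illustration. The PR only states that the image goes over rsync over SSH and that the image path is provided in the request body.

```javascript
// Hypothetical helpers, not taken from the PR.

// Build the argument list for copying the image to the worker with rsync over SSH.
function buildRsyncArgs(imagePath, workerHost, remotePath) {
	return [
		'-a', // archive mode
		'--partial', // keep partially transferred files so a retried transfer can reuse them
		'-e', 'ssh', // use ssh as the transport
		imagePath,
		`root@${workerHost}:${remotePath}`, // the SSH user is an assumption
	];
}

// Build the JSON body for the flash request: it carries a path, not the image bytes.
function buildFlashBody(isQemu, imagePath, remotePath) {
	// qemu: the image is already in the shared volume, so the local path is reused
	return JSON.stringify({ imagePath: isQemu ? imagePath : remotePath });
}
```

The argument list would typically be handed to something like `child_process.spawn('rsync', args)`, with the flash request sent only after rsync exits successfully.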

Other things to note:

No interface change, so tests won't need to be modified
Removed the needless HTTP transfer of the image for the qemu worker
More resistant to quick VPN drops while flashing
Fixes flashing for the fin and other usbboot devices

Change-type: patch
Signed-off-by: Ryan Cooke [email protected]

rcooke-warwick · 2022-04-20T15:00:48Z

@balena-ci rebase

klutchell · 2022-04-20T15:12:12Z

core/lib/common/worker.js

-					this.logger.log(`Preparing to flash, attempt ${attempt}...`);
+					// if qemu worker, image is already in volume
+					let flashPath = imagePath;
+					if(!this.url.includes(`worker`)){


I really wish we had a better way to check this but nothing is coming to mind.

I can only think of initialising the class with a flag that says whether it's for an emulated worker or a testbot, tbh.
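
A minimal sketch of the flag idea, assuming a constructor option named `emulated` and a helper named `resolveFlashPath` (both hypothetical, neither exists in the PR). The point is that the caller states the worker type once, instead of the code inferring it from the URL.

```javascript
// Hypothetical sketch: the option and method names are assumptions.
class Worker {
	constructor(url, { emulated = false } = {}) {
		this.url = url;
		this.emulated = emulated; // set by the caller, not inferred from the URL
	}

	// Decide which path the flash endpoint should receive.
	resolveFlashPath(imagePath, remotePath) {
		// emulated: image is already in the shared volume; testbot: image was copied over
		return this.emulated ? imagePath : remotePath;
	}
}
```

With this shape, `this.url.includes('worker')` would no longer be needed to tell the two worker types apart.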

klutchell · 2022-04-20T16:44:23Z

core/lib/common/worker.js

+						await pipeline(
+							fs.createReadStream(imagePath),
+							createGzip({ level: 6 }),
+							fs.createWriteStream(`/tmp/os.img`)


Don't we already gzip the image before providing it to the client?

Also wouldn't we want to add a .gz extension if we are zipping it, just for clarity?
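
If the compression step is kept, the extension suggestion could look something like the sketch below. The helper name `gzipImage` and the default output path are assumptions. Whether the image is already gzipped upstream is the open question raised above; if it is, this step would be redundant.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');
const { createGzip } = require('node:zlib');

// Hypothetical helper: compress imagePath into a file whose name ends in .gz,
// so the on-disk name reflects the contents.
async function gzipImage(imagePath, outPath = '/tmp/os.img.gz') {
	await pipeline(
		fs.createReadStream(imagePath),
		createGzip({ level: 6 }),
		fs.createWriteStream(outPath),
	);
	return outPath;
}
```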

jakogut · 2022-04-20T19:29:19Z

Expanding on our last sync (that I was present for), I think the hacks for detecting emulated workers are because of the asymmetry between the two approaches currently. For testbot, we have:
[ client/core ] -> [ testbot (worker) ] -> [ test device (DUT) ]

For QEMU, we have:
[ client/core | worker ] -> [ test device ]

In English, the testbot has three separate hosts connected by the network (with no guaranteed direct route between the client and test device), whereas the emulated approach really only has two hosts, and the direct route is guaranteed.

We can make these symmetrical by running the client/core on the testbot itself, which is easily doable over SSH. This has several advantages:

No more redundant and overly complicated interfaces between core/client and worker. For example, executeCommandInWorkerHost becomes wholly unnecessary, and executeCommandInHostOS and the like can be replaced with simpler, more concise direct SSH calls.
No more tunneling required, nor manually creating and managing SSH keys
No uploads required from workstations, which often have asymmetrical connections with slow upload speed
No code required locally on a workstation to run test suites at all, either using a testbot or QEMU, though simple tasks could still be automated with bash scripts
No managing releases on the worker, the image is pulled explicitly at runtime
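
As a rough illustration of the "direct SSH calls" point in the list above, a helper along these lines could stand in for the existing wrappers once the core runs on the testbot and has a direct route to the DUT. The function names and the default `root` user are assumptions, and the sketch presumes `ssh` is on the PATH with key-based authentication already set up.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

// Hypothetical helper: build ssh arguments for running a command on the DUT.
function sshArgs(host, command, user = 'root') {
	// BatchMode=yes makes ssh fail instead of prompting for a password
	return ['-o', 'BatchMode=yes', `${user}@${host}`, command];
}

// Run the command on the device and return its trimmed stdout.
async function runOnDevice(host, command) {
	const { stdout } = await execFileAsync('ssh', sshArgs(host, command));
	return stdout.trim();
}
```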

This also means we wouldn't need any bespoke code or APIs for managing artifacts, configuration, images, and the like. If I have a suite and an image on my local machine I want to run remotely on a Pi 4, I simply rsync or sftp them over before doing something like:

echo balena run -v /mnt/data/:/mnt/data \
    balena/leviathan --worker testbot \
                     --suite /mnt/data/suite \
                     --image /mnt/data/os.img \
    | balena ssh [mydevice]

Same thing on my workstation: if I want to run tests on QEMU, I'd do something like:

docker run -v "$(pwd)":/data \
    balena/leviathan --worker qemu \
                     --suite /data/suite \
                     --image /data/os.img

vipulgupta2048 · 2023-02-15T21:30:26Z

No longer feasible as flashing endpoint needs to be reworked to accommodate #965

worker: rework flash endpoint

21146a4

Change-type: patch Signed-off-by: Ryan Cooke <[email protected]>

rcooke-warwick added the versionbot/pr-draft label (Draft PR - Don't merge this PR automatically) Apr 20, 2022

rcooke-warwick requested review from jakogut, Bucknalla, klutchell and vipulgupta2048 April 20, 2022 15:00

ghost force-pushed the ryan/send-img-pre-flash branch from 42b25aa to 21146a4 April 20, 2022 15:01

klutchell reviewed Apr 20, 2022

vipulgupta2048 closed this Feb 15, 2023
