Cryptsetup toolstack version bump (2.3.3-> 2.6.1) + reencryption cleanup (LUKS v1/v2 proper support + reencryption on Q4.2 + BTRFS dual LUKS containers install) #1541

Merged

@tlaurion (Collaborator) commented Nov 28, 2023

  • Bump kernel from 5.10.5 -> 5.10.214

    • Cloudflare patches to speed up LUKS encryption were upstreamed into the Linux kernel and backported to 5.10.9+.
    • cryptsetup2 toolstack version bump and script fixes to support multi-LUKS containers (BTRFS QubesOS 4.2)
      • Argon2 is used externally and internally: it requires a lot of RAM and CPU to derive, from the passphrase, the key validated in key slots.
        • This efficiently rate-limits brute forcing of LUKS key slots: each offline brute-force attempt consumes ~15-30 seconds.
        • Of course, strong passphrases are still recommended, but brute forcing LUKSv2 containers with Argon2 would require immense time, RAM and CPU, even for low-entropy passphrases/PINs.
      • Passphrase change doesn't reuse the LUKS key slot anymore: cryptsetup passphrase change enforces key slot rotation (a new one is consumed per operation and the old one is wiped internally after the change. E.g.: LUKS key slot 1 created, then 0 deleted)
      • Reencryption doesn't permit the old call arguments. No more direct-io; unacceptably slow through AIO (async) calls; workarounds are needed for good enough performance (arguments + newer kernel with Cloudflare fixes in tree)
    • cryptsetup required whole toolstack version bump:
      • cryptsetup requires lvm2 2.03.23+
      • cryptsetup requires libaio, which is also included in this PR (could be hacked out but deep dependency at first sight: left in, deprecates all legacy boards)
        • which requires util-linux 2.39
      • patches for reproducible builds are included for above 4 packages.
  • luks-functions was updated to support the new cryptsetup 2.6.1 version calls/changes as opposed to old 2.3 in master

    • reencryption happens in direct-io, offline mode and without locking, requiring Linux 5.10.9+ to bypass Linux queues
      • from tests, this is best for performance and reliability in single-user mode. As before, the operation cannot be interrupted.
    • LUKS container ops now validate the Disk Recovery Key (DRK) passphrase and the DRK key slot before going forward if needed, failing closed early.
      • Heads no longer expects the DRK to be in a static key slot; it finds and keeps track of the used DRK key slot dynamically.
      • On reencryption/passphrase change: make sure all LUKS containers on the same block device can be unlocked with the same DRK
        • Reencryption: requires knowing which key slot to reencrypt.
          • Find the LUKS key slot that unlocks with the DRK passphrase prior to the reencrypt call
        • Passphrase change: no slot can be passed, but the key slot of the DRK rotates.
  • kexec-seal-key

    • TPM LUKS Disk Unlock Key (DUK) key slots are now set in the highest slot per LUKS version (LUKSv1: 7 / LUKSv2: 31)
      • If an in-use key slot is neither the found DRK key slot nor the LUKS version's default DUK key slot: prompt the user before wiping that key slot. The default DUK key slot itself is wiped automatically.
        • This takes for granted that the DRK key slot alone is needed on the system and that Heads solely controls the LUKS key slots.
          • If the user has something else going on, e.g. using a USB Security dongle + TPM DUK, then the user will need to say no when asked to wipe an additional key slot that is neither the DRK nor LUKSv1:7/LUKSv2:31.
            • It was suggested to leave LUKS key slots other than the DRK alone, but then: what to do when all key slots would be used on the next op?
              • An alternative implementation could be to only prompt users to wipe key slots other than the DRK when all key slots are used (LUKSv1: 0-7, LUKSv2: 0-31)
                • But then cleanup would need to happen prior to operations (LUKS passphrase change, TPM DUK setup) and could be problematic where the OS might fail (the QubesOS DRK util doesn't test for this, and there is no reason to expect any other user util to check for that corner case). This was decided against.
    • LUKS containers are now checked to be the same LUKS version before permitting TPM DUK setup; Heads refuses to go forward if different LUKS versions are found across the LUKS containers specified to be unlocked with the TPM DUK, even across different block devices.
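
The key slot policy described above can be sketched in shell. This is only an illustration of the stated rules, not the code in kexec-seal-key: the function names `duk_slot_for_version` and `slot_action` are hypothetical, and the sketch assumes the DRK key slot has already been found.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the described policy; not the PR's code.

# Print the highest key slot for a LUKS version: 7 for LUKSv1, 31 for LUKSv2.
duk_slot_for_version() {
    case "$1" in
    1) echo 7 ;;
    2) echo 31 ;;
    *) return 1 ;;
    esac
}

# slot_action SLOT DRK_SLOT LUKS_VERSION
# Decide what to do with an in-use key slot before sealing a new TPM DUK:
#   keep   -> the slot holds the DRK
#   wipe   -> the slot is the version's default DUK slot (wiped automatically)
#   prompt -> any other slot (ask the user before wiping)
slot_action() {
    local slot="$1" drk_slot="$2" luks_version="$3" duk_slot
    duk_slot="$(duk_slot_for_version "$luks_version")" || return 1
    if [ "$slot" -eq "$drk_slot" ]; then
        echo keep
    elif [ "$slot" -eq "$duk_slot" ]; then
        echo wipe
    else
        echo prompt
    fi
}
```

For example, `slot_action 5 0 2` prints `prompt`: on a LUKSv2 container whose DRK sits in slot 0, slot 5 is neither the DRK nor the default DUK slot (31), so the user would be asked before it is wiped.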

TODOs:

  • async (AIO) calls are not used. direct-io is used instead. libaio could be hacked out
    • this could be subject to future work.

Notes:

  • Time to deprecate legacy boards that do not have enough space for the new space requirements
    • x230-legacy, x230-legacy-flash, x230-hotp-legacy
      • t430-legacy, t430-legacy-flash, t430-hotp-legacy were already deprecated. No more legacy board support
    • None of the legacy boards are built by CircleCI anymore. Legacy boards are dead; long live legacy boards.

Unrelated:

  • typo fixes found along the way

OLD:
I finally got a grip on where the problem discussed under #1539 stems from

  • cryptsetup requires async/sync ops from the kernel (removed directio ops)
    • cryptsetup uses direct-io when in offline mode without locking, which are the new calls changed in the LUKS helper script
  • libaio is a new strong dependency of lvm2 (hacking needed on the lvm2 side to remove the dep)
  • kernel AIO is needed, otherwise a warning is shown (let's see later if that is verbose without being under debug mode)
  • cryptsetup requires newer libdevmapper and dmsetup to deal also with that
  • those are provided by newer lvm2 binaries, including blkid and dmsetup itself
    • which required newer util-linux version to provide such

Todo:

  • Make sure kernel crypto backend requirements are as small as needed
  • Review patches and clean them
  • Remove initrd/test_reencrypt_ram.sh when ramfs raw disk reencryption meets normal speed
  • test on real hardware
  • legacy boards now deprecated. refactoring of /etc/ash_functions can now occur to depend only on bash
  • documentation for deprecation of legacy board
  • open other issues, including newer distro impossibilities to having fs reencrypted if done through cryptsetup 2.6.1
  • Cloudflare optimizations passed down from cryptsetup calls (Choose stronger encryption by default and/or re-use encryption parameters of LUKS container #1539 (comment)): still needed?

  • Publicly disclose the needed firmware upgrade/kernel commit bump downstream. That this went unnoticed confirms that no one reencrypted a Q4.2/Q4.2.1 installation until I discovered this. I can only repeat this: an OEM not pushing users to reencrypt their OEM-encrypted drives means that a malicious worker could back up the LUKS header of every laptop with a preinstalled OS, then sell those LUKS header backups at high prices to whoever steals those daily-used laptops, where the OEM-provisioned DRK passphrase can then be reused to access FDE content supposedly protected by encryption. The only way to completely transfer ownership of a laptop from the OEM to the end user is to rerun the Re-Ownership wizard and accept reencrypting the disk. DRK != DRK passphrase: changing the DRK passphrase alone doesn't prevent a restored LUKS header backup from letting the old DRK passphrase decrypt the FDE content.

Old branch prior to cleanup at https://github.com/tlaurion/heads/tree/staging_cryptsetup_261
That was way too much and unexpected work.

@tlaurion tlaurion force-pushed the cryptsetup_version_bump-reencryption_cleanup branch 2 times, most recently from 8048f06 to 4cb56d4 Compare November 28, 2023 20:38
@tlaurion tlaurion force-pushed the cryptsetup_version_bump-reencryption_cleanup branch from e110960 to 302452d Compare November 29, 2023 06:55
@tlaurion tlaurion force-pushed the cryptsetup_version_bump-reencryption_cleanup branch from 9d0458b to 63ad6f9 Compare December 10, 2023 22:36
@tlaurion tlaurion force-pushed the cryptsetup_version_bump-reencryption_cleanup branch 3 times, most recently from 20e884d to 2ea3195 Compare April 7, 2024 16:56
@tlaurion tlaurion changed the title WiP: Cryptsetup version bump reencryption cleanup (LUKS2 reencryption speed disastrous otherwise) WiP: Cryptsetup version bump reencryption cleanup (LUKS2 reencryption impossible otherwise on Q4.2 and others) Apr 7, 2024
@tlaurion tlaurion marked this pull request as ready for review April 7, 2024 17:07
@tlaurion tlaurion marked this pull request as draft April 10, 2024 19:06
@tlaurion tlaurion marked this pull request as ready for review October 27, 2024 14:19
@tlaurion
Collaborator Author

Oops, reencryption happens twice for each LUKS container. Reviewing.

@tlaurion tlaurion force-pushed the cryptsetup_version_bump-reencryption_cleanup branch 2 times, most recently from 413db0c to c670e6a Compare October 29, 2024 17:21
@tlaurion
Collaborator Author

Ready for review. Will address unresolved discussion points next.

@tlaurion
Collaborator Author

tlaurion commented Oct 29, 2024

> The TPM DUK work is, IMO, confusing and I can't really tell from review whether it will work reliably or have issues in corner cases (per comments: #1541 (comment), #1541 (comment)). The UX is also confusing/surprising, not something you usually want when dealing with LUKS encryption keys (#1541 (comment)). I've made suggestions for each of those things. While I try to offer actual changes whenever I can, unfortunately due to internal needs I can't allocate more time toward this feature we don't ship right now.
>
> But like I said, I won't block based on those things.
>
> So, up to you @tlaurion whether you want to merge as-is or work on those suggestions, feel free to discuss further though if you'd like.

I refactored and replied in the discussion threads. Please review again @JonathonHall-Purism.
The problem today is that LUKS containers created by Fedora 37+/cryptsetup 2.6+ cannot be reencrypted or have their passphrase changed with older cryptsetup (Heads master has 2.3.3).

If there are areas of work outside of the TODO left in the last commit (attempt to unlock LUKS containers with the cached LUKS DRK passphrase), please let me know and I'll open a separate issue to work on this further, but I think that what was blocking this PR has since been addressed.

OEM-shipped OS installations should be reencrypted/passphrase changed by the user through the re-ownership wizard, which cannot happen unless this PR is merged.

Also note that OEMs are expected to record videos for end users, ideally prior to #1821. So GUI changes should happen sooner rather than later so that those recordings can be done. Otherwise, either recordings will need to be redone or fixes postponed to the next downstream releases.

@tlaurion tlaurion changed the title Cryptsetup toolstack version bump + reencryption cleanup (LUKSv2+Luksv1 proper support + reencryption on Q4.2 + BTRFS dual LUKS containers install) Cryptsetup toolstack version bump (2.3.3-> 2.6.1) + reencryption cleanup (LUKS v1/v2 proper support + reencryption on Q4.2 + BTRFS dual LUKS containers install) Oct 29, 2024
@JonathonHall-Purism
Collaborator

Thanks @tlaurion. This implementation makes much more sense, and I think the UX is clearer too. Looks good to me 👍

Cloudflare patches to speed up LUKS encryption were upstreamed into the Linux kernel and backported to 5.10.9: cloudflare/linux#1 (comment)
Therefore, we bump to the latest 5.10.x (from 5.10.5, which doesn't contain the fixes)
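
The 5.10.9 minimum can be checked with a small version comparison. The sketch below is illustrative only: `version_at_least` is a hypothetical helper, and it assumes a `sort` that supports `-V` (GNU coreutils and BusyBox both do).

```shell
#!/bin/sh
# version_at_least HAVE WANT
# Succeeds when dotted numeric version HAVE is greater than or equal to WANT.
# Hypothetical helper; relies on `sort -V` (version sort) being available.
version_at_least() {
    # After a version sort, the smaller of the two versions comes first.
    # HAVE >= WANT exactly when that first entry is WANT.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is `version_at_least "$(uname -r | cut -d- -f1)" 5.10.9 || echo "kernel lacks the backported fixes"`, where the `cut` strips any local version suffix.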

Trace:
    sed -i 's/5.10.5/5.10.214/g' boards/*/*.config
    find ./boards/*/*.config | awk -F "/" {'print $3'}| while read board; do echo "make BOARD=$board linux"; make BOARD=$board linux; echo make BOARD=$board linux.save_in_oldconfig_format_in_place || make BOARD=$board linux.modify_and_save_oldconfig_in_place; done
    git status | grep modified | awk -F ":" {'print $2'}| xargs git add
    git commit --signoff

- Move patches from 5.10.5 -> 5.10.214
- Add linux kernel hash and version under modules/linux
- Change board configs accordingly

Signed-off-by: Thierry Laurion <[email protected]>
…LUKS containers (BTRFS QubesOS 4.2)

cryptsetup2 2.6.1 is a new release that supports reencryption of Q4.2 release LUKS2 volumes created at installation.
 This is a critical feature for the Qubes OS 4.2 release for added data at rest protection

Cryptsetup 2.6.x internal changes:
 - Argon2 is used externally and internally: it requires a lot of RAM and CPU to derive, from the passphrase, the key validated in key slots.
  - This efficiently rate-limits brute forcing of LUKS key slots: each offline brute-force attempt consumes ~15-30 seconds
  - Of course, strong passphrases are still recommended, but brute forcing LUKSv2 containers with Argon2 would require immense time, RAM and CPU, even for low-entropy passphrases/PINs.
 - passphrase change doesn't permit LUKS key slot specification anymore: the key slot rotates (a new one is consumed per op, then the old one is wiped internally. E.g.: LUKS key slot 1 created, then 0 deleted)
 - reencryption doesn't permit the old call arguments. No more direct-io; unacceptably slow through AIO (async) calls; workarounds are needed for good enough performance (arguments + newer kernel with Cloudflare fixes in tree)

cryptsetup 2.6.1 requires:
 - lvm2 2.03.23, which is also included in this PR.
   - requires libaio, which is also included in this PR (could be hacked out but deep dependency at first sight: left in)
   - requires util-linux 2.39
 - patches for reproducible builds are included for above 3 packages.

luks-functions was updated to support the new cryptsetup2 version calls/changes
 - reencryption happens in direct-io, offline mode and without locking, requiring Linux 5.10.9+ to bypass Linux queues
   - from tests, this is best for performance and reliability in single-user mode
 - LUKS container ops now validate the Disk Recovery Key (DRK) passphrase and the DRK key slot before going forward if needed, failing early.
  - Heads no longer expects the DRK to be in a static key slot, and finds the DRK key slot dynamically.
  - On reencryption/passphrase change: make sure all LUKS containers on the same block device can be unlocked with the same DRK
 - Reencryption: requires knowing which key slot to reencrypt.
   - Find the LUKS key slot that unlocks with the DRK passphrase prior to the reencrypt call
 - Passphrase change: no slot can be passed, but the key slot of the DRK rotates.
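
Finding the DRK key slot dynamically, as described in the list above, can be sketched as a loop that tests the passphrase against each key slot. This is a minimal sketch, not the luks-functions implementation: `slot_unlocks` and `find_drk_slot` are hypothetical names, and the passphrase is assumed to be stored in a key file without a trailing newline. It relies on `cryptsetup open --test-passphrase`, which only verifies the passphrase and maps nothing.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch of dynamic DRK key slot discovery.

# slot_unlocks DEVICE SLOT KEYFILE
# Succeeds when the passphrase in KEYFILE opens key slot SLOT of DEVICE.
slot_unlocks() {
    cryptsetup open --test-passphrase --key-slot "$2" --key-file "$3" "$1" \
        >/dev/null 2>&1
}

# find_drk_slot DEVICE KEYFILE MAX_SLOT
# Print the first key slot (0..MAX_SLOT) that the passphrase unlocks.
# MAX_SLOT would be 7 for LUKSv1 and 31 for LUKSv2.
find_drk_slot() {
    local device="$1" keyfile="$2" max_slot="$3" slot=0
    while [ "$slot" -le "$max_slot" ]; do
        if slot_unlocks "$device" "$slot" "$keyfile"; then
            echo "$slot"
            return 0
        fi
        slot=$((slot + 1))
    done
    return 1
}
```

The printed slot could then be handed to the reencrypt call through `--key-slot`. A non-zero return means no slot accepted the passphrase, which is the point where an operation would fail early.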

kexec-seal-key
 - TPM LUKS Disk Unlock Key key slots have changed to be set in max slots per LUKS version (LUKSv1:7 /LUKSv2: 31)
  - If an in-use key slot is neither the DRK key slot nor the LUKS version's default DUK key slot: prompt the user before wiping that key slot. The default DUK key slot itself is wiped automatically.
    - This takes for granted that the DRK key slot alone is needed on the system and that Heads controls the LUKS key slots.
      - If the user has something else going on, e.g. using a USB Security dongle + TPM DUK, then the user will need to say no when asked to wipe keys.
      - It was suggested to leave LUKS key slots other than the DRK alone, but then: what to do when all key slots would be used?
        - An alternative implementation could be to only prompt users to wipe key slots other than the DRK when all key slots are used (LUKSv1: 0-7, LUKSv2: 0-31)
          - But then cleanup would need to happen prior to operations (LUKS passphrase change, TPM DUK setup) and could be problematic.
  - LUKS containers are now checked to be the same LUKS version before permitting TPM DUK setup; Heads refuses to go forward if versions differ.
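
The same-version check can be illustrated as follows. This is a sketch under assumptions, not the kexec-seal-key code: both function names are hypothetical, and the parser assumes the `Version:` line that `cryptsetup luksDump` prints for LUKS1 and LUKS2 headers.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch of the "same LUKS version" precondition.

# luks_version_from_dump
# Read `cryptsetup luksDump DEVICE` output on stdin and print the version.
luks_version_from_dump() {
    awk '$1 == "Version:" { print $2; exit }'
}

# same_luks_version V1 [V2 ...]
# Succeeds only when every given version equals the first one.
same_luks_version() {
    local first="$1" v
    [ -n "$first" ] || return 1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
}
```

A caller could collect one version per container with `cryptsetup luksDump "$dev" | luks_version_from_dump` and refuse TPM DUK setup when `same_luks_version` fails.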

TODO:
- async (AIO) calls are not used. direct-io is used instead. libaio could be hacked out
  - this could be subject to future work

Notes:
- Time to deprecate legacy boards that do not have enough space for the new space requirements
 - x230-legacy, x230-legacy-flash, x230-hotp-legacy
 - t430-legacy, t430-legacy-flash, t430-hotp-legacy already deprecated

Unrelated:
- typo fixes found along the way

Signed-off-by: Thierry Laurion <[email protected]>
…seems like luks passphrase change only happens on one of the containers; not all

Signed-off-by: Thierry Laurion <[email protected]>
… DEBUG for TOTP secret/qrcode output to console

Signed-off-by: Thierry Laurion <[email protected]>
… first LUKS volume, not all

Remove unneeded loop under luks_reencrypt

Signed-off-by: Thierry Laurion <[email protected]>
…wiped when going to recovery shell and upon automatic cleanup as all other secret

Signed-off-by: Thierry Laurion <[email protected]>
Signed-off-by: Thierry Laurion <[email protected]>

Signed-off-by: Thierry Laurion <[email protected]>
…ne last time: seems like luks passphrase change only happens on one of the containers; not all"

This reverts commit 20e9392.

To test this PR without reencryption, just 'git revert' this commit

Signed-off-by: Thierry Laurion <[email protected]>
…ith prompted DRK then ask user to confirm that those are all ok to reencrypt/change passphrase onto (oem factory reset/manual, whatever)

- cache/reuse that passphrase, used afterward to find which LUKS keyslot contains the DRK, which is used to direct reencryption, also reused for passphrase change.
- refactoring detection + testing of prompted LUKS passphrase for discovered LUKS containers that can be unlocked with same passphrase to prompt user for selection

TODO: remove duplicate luks passphrase unlocking volumes functions for the moment

Signed-off-by: Thierry Laurion <[email protected]>
- fi misplaced
- rework reencryption loop
- added verbose output on TPM DUK key addition when LUKS container can be unlocked with DRK

Current state, left todo for future work:

TPM DUK:
- TPM DUK setup on default boot reuses /boot/kexec_key_devices.txt if present
- If not, list all LUKS partitions, asks user for selection and makes sure LUKS passphrase can unlock all
- Works on both LUKSv1 and LUKSv2 containers, reusing OS installer settings (Heads doesn't enforce better then OS installer LUKS parameters)

LUKS passphrase change/LUKS reencryption:
- Reuses /boot/kexec_key_devices.txt if existing
- If not, prompts for the LUKS passphrase, lists all LUKS containers that are not USB based and attempts to unlock all of those, listing only the ones successfully unlocked
- Prompts the user to reuse the LUKS partitions found to be unlockable with the LUKS passphrase, caches it and reuses it in other LUKS operations (passphrase change as well, from oem factory reset/re-ownership)
- Deals properly with LUKSv1/LUKSv2/multiple LUKS containers and reencrypts/passphrase changes them all if accepted, otherwise asks the user to select individual LUKS containers
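
The "reuse /boot/kexec_key_devices.txt, otherwise fall back to discovery" step can be sketched as below. This is illustrative only: `list_luks_targets` is a hypothetical name, and the sketch assumes that the first whitespace-separated field of each line in the file is the device path.

```shell
#!/bin/sh
# list_luks_targets FILE [FALLBACK_DEVICE ...]
# Print one LUKS device per line: the devices recorded in FILE when it exists
# and is non-empty, otherwise the fallback devices given as arguments (for
# example, the containers that unlocked with the prompted passphrase).
# Hypothetical helper; the file format is an assumption stated above.
list_luks_targets() {
    local file="$1"
    shift
    if [ -s "$file" ]; then
        # Skip blank lines, keep only the device path column.
        awk 'NF { print $1 }' "$file"
    else
        [ "$#" -eq 0 ] || printf '%s\n' "$@"
    fi
}
```

A call such as `list_luks_targets /boot/kexec_key_devices.txt /dev/sda2 /dev/sda3` would prefer the recorded devices and only use the discovered ones when no record exists.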

Tested on luksv1,luksv2, btrfs under luks (2x containers) and TPM DUK setup up to booting OS. All good

TODO:
- LUKS passphrase check is done multiple times across TPM DUK, reencryption and LUKS passphrase change. Could refactor to change this, but since this op is done only once (reencrypt + passphrase change) upon hardware reception from the OEM, I stopped caring here.

Signed-off-by: Thierry Laurion <[email protected]>
@UndeadDevel
Contributor

> @UndeadDevel ping for review if you have a chance as well. Code suggestions welcome. Also incorporates #1547 AFAIK as said in other comments.

Sorry for not replying; I've been completely swamped with work and didn't have time for my many hobbies. Unfortunately this state of affairs is going to continue for a while, so I won't be able to collaborate for the foreseeable future.

Development

Successfully merging this pull request may close these issues.

Choose stronger encryption by default and/or re-use encryption parameters of LUKS container
3 participants