Minnowboard Turbot's CPU soft locks at random, affecting automated tests #1000

Open
wiktormowinski opened this issue Aug 13, 2024 · 5 comments

@wiktormowinski

wiktormowinski commented Aug 13, 2024

Component

Dasharo firmware

Device

other

Dasharo version

v0.9.0-rc1

Dasharo Tools Suite version

No response

Test case ID

No response

Brief summary

Sometimes the Minnowboard becomes very slow or even freezes, as indicated by the watchdog reporting that a CPU has been stuck for X seconds.

How reproducible

Intermittent: about 30% of tests should hit it during a regression run.

How to reproduce

For automated runs:
Run absolutely any automated test suite. For a near 100% chance of failure I suggest CPF002.001 from dasharo-performance, because CPF002.001 lasts 1h and the soft lock can occur at any time during that window (most often around the 15 min mark).

For manual reproduction:

  1. Try logging in to the OS via serial.
  2. If you manage to do that, just stay idle for a while and the error should eventually pop up.

Expected behavior

The tests should continue uninterrupted.

Actual behavior

Instead, the tests either:

  1. get lost (80%) while:
  • looking for the OS login prompt (90%)
  • looking for any checkpoint phrase (10%)
  2. stop because the CPU gets stuck and the watchdog prints a message (20%):
    watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [systemd-udevd:120]

That said, I am almost certain both of these stem from the very same problem.
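
Since both failure modes are suspected to share one cause, it may help to scan captured serial logs for the watchdog message even when a test was merely "lost". The following is a minimal sketch, assuming the log has been saved as plain text; the function name `find_soft_lockups` and the regular expression are illustrative and not part of any Dasharo test tooling. The pattern is modelled only on the single message quoted above.

```python
import re

# Modelled on the message quoted in this report:
#   watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [systemd-udevd:120]
# The task name is matched lazily so names containing ':' still parse.
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    hits = []
    for m in SOFT_LOCKUP_RE.finditer(log_text):
        hits.append({
            "cpu": int(m.group("cpu")),
            "seconds": int(m.group("secs")),
            "task": m.group("task"),
            "pid": int(m.group("pid")),
        })
    return hits


if __name__ == "__main__":
    sample = ("watchdog: BUG: soft lockup - CPU#3 stuck for 26s! "
              "[systemd-udevd:120]")
    print(find_soft_lockups(sample))
```

Running this over the logs of "lost" runs would show whether a lockup report preceded the missed login prompt or checkpoint phrase, which would support or weaken the single-cause suspicion.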

Screenshots

No response

Additional context

My CPF002.001 attempts are documented in:
faile.zip

Solutions you've tried

No response

@miczyg1
Contributor

miczyg1 commented Aug 14, 2024

Hard to say what may be going wrong. It needs further investigation, especially since it can be hard to reproduce, e.g. it may only show up after 1h.

@wiktormowinski
Author

wiktormowinski commented Aug 20, 2024

After the initial fixes to the fw, which focused on other issues, it seems this one got fixed too.
You can boot to Ubuntu and it doesn't soft lock after a minute (which used to be a common occurrence, if not sooner).

@filipleple
Member

filipleple commented Aug 20, 2024

It used to be pretty much unbearable, leaving the platform usable for about 30 s at most. After building from the byt_fsp_parity branch, however, it hasn't occurred during a while of normal use. I will run lengthy performance/stability tests today to confirm whether this has been completely resolved.

EDIT: that's only true for the SB binary. It still persists after building the non-SB config.

@wiktormowinski
Author

thanks for confirming

@miczyg1
Contributor

miczyg1 commented Nov 7, 2024

The issue does not happen in a deterministic way. Sometimes the CPU soft-locks while the system is booting, sometimes after it has been running for a few seconds or minutes. Printing the cbmem console or dmesg on the serial console helps trigger the issue a little faster if it doesn't happen right off the bat. Some platforms were not affected by the issue (mainly quad-core platforms).

I have analyzed the Bay Trail FSP source and compared it against the Bay Trail native silicon init in coreboot and haven't found any major problems. A couple of things caught my eye regarding CPU P/C states, which I fixed per the BWG; however, that didn't help. The work is in a WIP PR: Dasharo/coreboot#575

Now that I think about it, maybe it is an issue with the C6 state and the C6 DRAM that ought to be reserved for it. That would imply some difference between the MRC binary and FSP memory init.
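
One way to probe the C6 hypothesis from the OS side, without rebuilding firmware, would be to disable the deeper idle states through the standard Linux cpuidle sysfs interface and check whether the soft lockups stop. This is only a sketch of that experiment and was not tried in this thread. The function name `set_deep_cstates_disabled` and the default `keep` list are assumptions: the state names actually exposed depend on the idle driver in use, so the `name` files on the board should be checked first. Writing to the `disable` files requires root.

```python
from pathlib import Path

# Standard Linux location of per-CPU cpuidle state directories. It is a
# parameter below so the function can be pointed at a test directory.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def set_deep_cstates_disabled(disable, keep=("POLL", "C1"), base=CPU_SYSFS):
    """Write 1 (disable) or 0 (enable) to the 'disable' file of every
    idle state whose name is not listed in `keep`.

    The names in `keep` are an assumption; adjust them to what the
    'name' files report on the target. Returns the sorted list of
    state names that were touched.
    """
    touched = set()
    for state_dir in base.glob("cpu[0-9]*/cpuidle/state[0-9]*"):
        name = (state_dir / "name").read_text().strip()
        if name in keep:
            continue
        (state_dir / "disable").write_text("1" if disable else "0")
        touched.add(name)
    return sorted(touched)


if __name__ == "__main__":
    # Needs root on real hardware; prints which states were disabled.
    print(set_deep_cstates_disabled(True))
```

If the lockups disappear with the deep states disabled and return when they are re-enabled (`set_deep_cstates_disabled(False)`), that would be consistent with a C6-related cause, though it would not by itself distinguish between an MRC and an FSP memory init difference.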

coreboot 4.11 (the latest version that still had FSP Bay Trail support) did not have the problem, so it may be related to the MRC binary not doing something that should be done.
