Releases: open-power/skiboot
v5.10.4
skiboot-5.10.4
skiboot 5.10.4 was released on Wednesday April 4th, 2018. It replaces
skiboot-5.10.3 as the current stable release in the 5.10.x series.
It is recommended that 5.10.4 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.
Over skiboot-5.10.3, we have one bug fix:
-
xive: disable store EOI support
Hardware has limitations which would require to put a sync after
each store EOI to make sure the MMIO operations that change the ESB
state are ordered. This is a killer for performance and the PHBs do
not support the sync. So remove the store EOI for the moment, until
hardware is improved.
Also, while we are changing the XIVE source flags, let’s fix the
settings for the PHB4s, which should follow these rules:
-
SHIFT_BUG for DD10
-
STORE_EOI for DD20 and if enabled
-
TRIGGER_PAGE for DDx0 and if not STORE_EOI
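The PHB4 rules listed above can be read as a small decision function. The following is a minimal Python sketch of one reading of those rules, not skiboot's actual code: the function name, the encoding of the DD level as an integer such as 0x10 or 0x20, the `store_eoi_enabled` parameter, and the interpretation of "DDx0" as "any revision with a zero minor number" are all assumptions made for illustration.

```python
def phb4_xive_source_flags(dd_level, store_eoi_enabled):
    """Return the XIVE source flag names for a PHB4 (illustrative only).

    dd_level: hypothetical encoding, e.g. 0x10 for DD1.0, 0x20 for DD2.0.
    store_eoi_enabled: whether store EOI support is turned on at all.
    """
    flags = set()
    # SHIFT_BUG applies to DD10 only.
    if dd_level == 0x10:
        flags.add("SHIFT_BUG")
    # STORE_EOI applies to DD20, and only when store EOI is enabled.
    if dd_level == 0x20 and store_eoi_enabled:
        flags.add("STORE_EOI")
    # "DDx0" is read here as any major revision with a zero minor number.
    if (dd_level & 0x0F) == 0 and "STORE_EOI" not in flags:
        flags.add("TRIGGER_PAGE")
    return flags
```

Under this reading, with store EOI disabled as in this release, a DD20 part would get TRIGGER_PAGE rather than STORE_EOI.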
v5.10.3
skiboot-5.10.3
skiboot 5.10.3 was released on Thursday March 28th, 2018. It replaces
skiboot-5.10.2 as the current stable release in the 5.10.x series.
It is recommended that 5.10.3 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.
Over skiboot-5.10.2, we have a few improvements and bug fixes:
-
NPU2: dump NPU2 registers on npu2 HMI
Due to the nature of debugging npu2 issues, folk are wanting the
full list of NPU2 registers dumped when there’s a problem.
This is different than the solution introduced in 5.10.1 as there we
would dump the registers in a way that would trigger a FIR bit that
would confuse PRD.
-
npu2: Add performance tuning SCOM inits
Peer-to-peer GPU bandwidth latency testing has produced some tunable
values that improve performance. Add them to our device
initialization.
File these under things that need to be cleaned up with nice
#defines for the register names and bitfields when we get time.
A few of the settings are dependent on the system’s particular
NVLink topology, so introduce a helper to determine how many links
go to a single GPU.
-
hw/npu2: Assign a unique LPARSHORTID per GPU
This gets used elsewhere to index items in the XTS tables.
-
occ: Set up OCC messaging even if we fail to setup pstates
This means that we no longer hit this bug if we fail to get valid
pstates from the OCC.
[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
[ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 10.318805] Disabling lock debugging due to kernel taint
[ 10.318808] Severe Machine check interrupt [Not recovered]
[ 10.318812] NIP [000000003003e434]: 0x3003e434
[ 10.318813] Initiator: CPU
[ 10.318815] Error type: Real address [Load/Store (foreign)]
[ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
[ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
[ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
[ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
[ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
-
core/fast-reboot: disable fast reboot upon fundamental
entry/exit/locking errors
This disables fast reboot in several more cases where serious errors
like lock corruption or call re-entrancy are detected.
-
core/opal: allow some re-entrant calls
This allows a small number of OPAL calls to succeed despite
re-entering the firmware, and rejects others rather than aborting.
This allows a system reset interrupt that interrupts OPAL to do
something useful: sreset other CPUs, use the console (which allows
xmon to work or stack traces to be printed), or reboot the system.
Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which
is used for many other things that do not mean a serious permanent
error.
-
core/opal: abort in case of re-entrant OPAL call
The stack is already destroyed by the time we get here, so there is
not much point continuing.
-
npu2: Disable fast reboot
Fast reboot does not yet work right with the NPU. It’s been disabled
on NVLink and OpenCAPI machines. Do the same for NVLink2.
This amounts to a port of 3e45779 (“npu: Fix broken fast
reset”) from the npu code to npu2.
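The "core/opal: allow some re-entrant calls" entry above describes an entry-time policy: a short allow-list of calls may proceed when a CPU re-enters firmware, and everything else is rejected with OPAL_INTERNAL_ERROR instead of OPAL_BUSY. A minimal Python sketch of that policy follows. The call names in the allow-list, the `opal_entry` function, and the numeric return codes are placeholders chosen for illustration, not skiboot's actual tables.

```python
# Placeholder return codes; the numeric values are illustrative.
OPAL_SUCCESS = 0
OPAL_BUSY = -2
OPAL_INTERNAL_ERROR = -11

# Hypothetical allow-list, loosely matching the uses named in the notes:
# sreset other CPUs, use the console, reboot the system.
REENTRANT_SAFE_CALLS = {"signal_system_reset", "console_write", "reboot"}


def opal_entry(call_name, cpu_already_in_opal):
    """Decide whether an OPAL call may proceed (illustrative model)."""
    if not cpu_already_in_opal:
        return OPAL_SUCCESS          # normal, non-re-entrant entry
    if call_name in REENTRANT_SAFE_CALLS:
        return OPAL_SUCCESS          # re-entrant, but explicitly allowed
    # Rejected: signal a serious error rather than "try again later".
    return OPAL_INTERNAL_ERROR
```

Returning a distinct error rather than OPAL_BUSY matters because callers commonly retry on a busy status, which would never succeed here.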
v5.11-rc1
skiboot-5.11-rc1
skiboot v5.11-rc1 was released on Wednesday March 28th 2018. It is the
first release candidate of skiboot 5.11, which will become the new
stable release of skiboot following the 5.10 release, first released
February 23rd 2018.
It is not expected to keep the 5.11 branch around for long, and
instead quickly move onto a 6.0, which will mark the basis for
op-build v2.0 and will be required for POWER9 systems.
skiboot v5.11-rc1 contains all bug fixes as of skiboot-5.10.3 and
skiboot-5.4.9 (the currently maintained stable releases). There may
be more 5.10.x stable releases; it will depend on demand.
For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.
The current plan is to cut the final 5.11 in March, with skiboot 5.11
being for all POWER8 and POWER9 platforms in op-build v1.22. This
release is targeted to early POWER9 systems.
Over skiboot-5.10, we have the following changes:
New Platforms
-
Add VESNIN platform support
The Vesnin platform from YADRO is a 4 socket POWER8 system with up
to 8TB of memory with 460GB/s of memory bandwidth in only 2U. Many
kudos to the team from Yadro for submitting their code upstream!
New Features
-
fast-reboot: enable by default for POWER9
- Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is
used
-
PCI tunneled operations on PHB4
-
phb4: set PBCQ Tunnel BAR for tunneled operations
P9 supports PCI tunneled operations (atomics and as_notify) that
are initiated by devices.
A subset of the tunneled operations requires a response that must
be sent back from the host to the device. For example, an atomic
compare and swap will return the compare status, as the swap will
only be performed in case of success. Similarly, as_notify reports
if the target thread has been woken up or not, because the operation
may fail.
To enable tunneled operations, a device driver must tell the host
where it expects tunneled operation responses, by setting the PBCQ
Tunnel BAR Response register with a specific value within the
range of its BARs.
This register is currently initialized by enable_capi_mode(). But,
as tunneled operations may also operate in PCI mode, a new API is
required to set the PBCQ Tunnel BAR Response register, without
switching to CAPI mode.
This patch provides two new OPAL calls to get/set the PBCQ Tunnel
BAR Response register.
Note: as there is only one PBCQ Tunnel BAR register, shared
between all the devices connected to the same PHB, only one of
these devices will be able to use tunneled operations, at any
time.
-
phb4: set PHB CMPM registers for tunneled operations
P9 supports PCI tunneled operations (atomics and as_notify) that
require setting the PHB ASN Compare/Mask register with a 16-bit
indication.
This register is currently initialized by enable_capi_mode(). But,
as tunneled operations may also work in PCI mode, the ASN
Compare/Mask register should rather be initialized in
phb4_init_ioda3().
This patch also adds “ibm,phb-indications” to the device tree, to
tell Linux the values of CAPI, ASN, and NBW indications, when
supported.
Tunneled operations tested by IBM in CAPI mode, by Mellanox
Technologies in PCI mode.
-
Tie tm-suspend fw-feature and opal_reinit_cpus() together
Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
always returns OPAL_UNSUPPORTED.
This ties the tm suspend fw-feature to the
opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
tm suspend is disabled, we correctly report it to the kernel. For
backwards compatibility, it’s assumed tm suspend is available if the
fw-feature is not present.
Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
below have TM disabled completely (not just suspend).
We are using opal_reinit_cpus() to determine this setting (rather
than the device tree/HDAT) as some future firmware may let us change
this dynamically after boot. That is not the case currently though.
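The tm-suspend behaviour described in the "Tie tm-suspend fw-feature and opal_reinit_cpus() together" entry above comes down to a default plus a lookup. The sketch below models it in Python under stated assumptions: firmware features are represented as a plain dictionary, the helper names are hypothetical, the numeric return codes are illustrative, and the direction of the return value (success when suspend is disabled, unsupported otherwise) is one plausible reading of "we correctly report it to the kernel".

```python
# Illustrative return codes, not taken from the release notes.
OPAL_SUCCESS = 0
OPAL_UNSUPPORTED = -7


def tm_suspend_enabled(fw_features):
    """True if TM suspend is available.

    fw_features: hypothetical dict of feature name -> bool. For backwards
    compatibility, a missing entry is treated as "available".
    """
    return fw_features.get("TM_SUSPEND_ENABLED", True)


def reinit_cpus_tm_suspend_disabled(fw_features):
    """Model of opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)."""
    if tm_suspend_enabled(fw_features):
        # Suspend is present, so "suspend disabled" mode is not on offer.
        return OPAL_UNSUPPORTED
    return OPAL_SUCCESS
```

For example, a P9N DD2.1 system, where hostboot clears the feature, would take the second branch in this model, while firmware that does not publish the feature at all would take the first.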
Power Management
-
SLW: Increase stop4-5 residency by 10x
Using the DGEMM benchmark, we observed a 5-9% drop in throughput
with stop4/5 enabled compared to without. In this benchmark the GPU
waits on the CPU to wake up and provide the subsequent data block to
compute. The wakeup latency accumulates over the run and shows up as
a performance drop.
Linux enters stop4/5 more aggressively for its wakeup latency.
Increasing the residency from 1ms to 10ms makes the performance drop
<1%.
-
occ: Set up OCC messaging even if we fail to setup pstates
This means that we no longer hit this bug if we fail to get valid
pstates from the OCC.
[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
[ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 10.318805] Disabling lock debugging due to kernel taint
[ 10.318808] Severe Machine check interrupt [Not recovered]
[ 10.318812] NIP [000000003003e434]: 0x3003e434
[ 10.318813] Initiator: CPU
[ 10.318815] Error type: Real address [Load/Store (foreign)]
[ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
[ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
[ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
[ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
[ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
mbox based platforms
For platforms using the mbox protocol for host flash access (all BMC
based OpenPOWER systems, most OpenBMC based systems) there have been
some hardening efforts in the event of the BMC being poorly behaved.
-
mbox: Reduce default BMC timeouts
Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin
for 70 seconds waiting for a BMC to come back. This also makes the
current default of 30 seconds a bit pointless, as it is far too short
to be a worst case wait time but too long to avoid hitting
hardlockup detectors and wreaking havoc inside host Linux.
Just change it to three seconds so that host Linux will survive;
reads and writes will fail, but at least the host stays up.
Also refactored the waiting loop just a bit so that it’s easier to
read.
-
mbox: Harden against BMC daemon errors
Bugs present in the BMC daemon mean that skiboot gets presented with
mbox windows of size zero. These windows cannot be valid and skiboot
already detects these conditions.
Currently skiboot warns quite strongly about the occurrence of these
problems. The problem for skiboot is that it doesn’t take any
action. Initially I wanted to avoid putting policy like this into
skiboot, but since these bugs aren’t going away and skiboot barfing
is leading to lockups and ultimately the host going down, something
needs to be done.
I propose that when we detect the problem we fail the mbox call and
punt the problem back up to Linux. I don’t like it but at least it
will cause errors to cascade and won’t bring the host down. I’m not
sure how Linux is supposed to detect this or what it can even do but
this is better than a crash.
Diagnosing a failure to boot if skiboot itself fails to read flash
may be marginally more difficult with this patch. This is because
skiboot will now only print one warning about the zero sized window
rather than continuously spitting it out.
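The "mbox: Reduce default BMC timeouts" entry above describes a bounded wait: poll for the BMC, and give up after a short deadline so that the host stays responsive. A minimal Python sketch of such a loop follows. The function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions made so the example can run on its own; only the three second default comes from the notes.

```python
import time


def wait_for_bmc(poll, timeout_s=3.0, interval_s=0.005,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until poll() returns True or timeout_s elapses.

    Returns True if the BMC responded in time, False on timeout. The
    caller is expected to fail the read or write on False rather than
    keep spinning.
    """
    deadline = clock() + timeout_s
    while True:
        if poll():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Keeping the deadline computation in one place is the kind of small refactoring that makes a waiting loop easier to read and makes the timeout trivial to change.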
Fast Reboot Improvements
Around fast-reboot we have made several improvements to harden the
fast reboot code paths and resort to a full IPL if something doesn’t
look right.
-
core/fast-reboot: zero memory after fast reboot
This improves the security and predictability of the fast reboot
environment.
There can not be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However a well-behaved
OS can have a reasonable expectation that OS memory regions it has
modified will be cleared upon fast reboot.
The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
-
core/fast-reboot: verify mem regions before fast reboot
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on opal data for
fast reboot (which complements the reactive disable_fast_reboot
cases). It is encouraged to re-use and share any kind of debug
code and unit test code.
-
fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they
exist
-
core/fast-reboot: disable fast reboot upon fundamental
entry/exit/locking errors
This disables fast reboot in several more cases where serious errors
like lock corruption or call re-entrancy are detected.
-
capp: Disable fast-reboot whenever enable_capi_mode() is called
This patch updates phb4_set_capi_mode() to disable fast-reboot
whenever enable_capi_mode() is called, irres...
v5.10.2
skiboot-5.10.2
skiboot 5.10.2 was released on Tuesday March 6th, 2018. It replaces
skiboot-5.10.1 as the current stable release in the 5.10.x series.
Over skiboot-5.10.1, we have one improvement:
-
Tie tm-suspend fw-feature and opal_reinit_cpus() together
Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
always returns OPAL_UNSUPPORTED.
This ties the tm suspend fw-feature to the
opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
tm suspend is disabled, we correctly report it to the kernel. For
backwards compatibility, it’s assumed tm suspend is available if the
fw-feature is not present.
Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
below have TM disabled completely (not just suspend).
We are using opal_reinit_cpus() to determine this setting (rather
than the device tree/HDAT) as some future firmware may let us change
this dynamically after boot. That is not the case currently though.
v5.10.1
skiboot-5.10.1
skiboot 5.10.1 was released on Thursday March 1st, 2018. It replaces
skiboot-5.10 as the current stable release in the 5.10.x series.
Over skiboot-5.10, we have an improvement for debugging NPU2/NVLink
problems and a bug fix. These changes are:
-
NPU2 HMIs: dump out a LOT of npu2 registers for debugging
-
libflash/blocklevel: Correct miscalculation in
blocklevel_smart_erase()
This fixes a bug in pflash.
If blocklevel_smart_erase() detects that the smart erase fits entirely
in one erase block, it has an early bail path. In this path it
miscalculates where in the buffer the backend needs to read from to
perform the final write.
Fixes: #151
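To make the blocklevel_smart_erase() fix above more concrete, here is a minimal Python model of erasing a range that sits entirely inside one erase block. It is not libflash code: flash is modelled as a `bytearray` whose length is a multiple of the erase size, and the function name and parameters are hypothetical. The point it illustrates is the one named in the notes: the final write-back must read from the saved buffer at an offset relative to the start of the erase block, not at the flash address.

```python
ERASED = 0xFF  # value of an erased flash byte in this model


def smart_erase_within_block(flash, pos, length, erase_size):
    """Erase flash[pos:pos+length] when the range fits in one erase block.

    flash is a bytearray standing in for the device; it is modified in
    place and also returned.
    """
    base = pos - (pos % erase_size)      # start of the containing block
    end = pos + length
    if end > base + erase_size:
        raise ValueError("range spans more than one erase block")

    buf = bytes(flash[base:base + erase_size])   # save the whole block
    flash[base:base + erase_size] = bytes([ERASED]) * erase_size

    # Restore the data that precedes the erased range.
    flash[base:pos] = buf[:pos - base]

    # Final write: restore the data after the range. The offset into buf
    # is relative to the block start, which is the easy thing to get wrong.
    tail_off = end - base
    flash[end:base + erase_size] = buf[tail_off:]
    return flash
```

For example, erasing 3 bytes at offset 10 with an 8 byte erase block leaves bytes 8 to 9 and 13 to 15 intact and sets bytes 10 to 12 to the erased value.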
v5.10
skiboot-5.10
skiboot v5.10 was released on Friday February 23rd 2018. It is the first
release of skiboot 5.10, and becomes the new stable release of skiboot
following the 5.9 release, first released October 31st 2017.
skiboot v5.10 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9. We do not foresee any further 5.9.x releases.
For how the skiboot stable releases work, see stable-rules for details.
Over skiboot-5.9, we have the following changes:
New Features
Since skiboot-5.10-rc3:
-
sensor-groups: occ: Add support to disable/enable sensor group
This patch adds a new opal call to enable/disable a sensor group.
This call is used to select the sensor groups that need to be
copied to main memory by OCC at runtime.
-
sensors: occ: Add energy counters
Export the accumulated power values as energy sensors. The
accumulator field of power sensors is used for representing energy
counters which can be exported as energy counters in Linux hwmon
interface.
-
sensors: Support reading u64 sensor values
This patch adds support to read u64 sensor values. This also adds
changes to the core and the backend implementation code to make this
API the base call. Host can use this new API to read sensors up to
64 bits.
This adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
-
dt: add /cpus/ibm,powerpc-cpu-features device tree bindings
This is a new CPU feature advertising interface that is
fine-grained, extensible, aware of privilege levels, and gives
control of features to all levels of the stack (firmware,
hypervisor, and OS).
The design and binding specification is described in detail in doc/.
Since skiboot-5.10-rc2:
-
DT: Add "version" property under ibm,firmware-versions node
First line of VERSION section in PNOR contains firmware version. Use
that to add "version" property under firmware versions dt node.
Sample output:
root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop version "witherspoon-ibm-OP9_v1.19_1.94"
Since skiboot-5.10-rc1:
- hw/npu2: Implement logging HMI actions
Since skiboot-5.9:
-
hdata: Parse IPL FW feature settings
Add parsing for the firmware feature flags in the HDAT. This
indicates the settings of various parameters which are set at IPL
time by firmware.
-
opal/xstop: Use nvram option to enable/disable sw checkstop.
Add a mechanism to enable/disable sw checkstop by looking at nvram
option opal-sw-xstop=<enable/disable>.
For now this patch disables the sw checkstop trigger unless
explicitly enabled through nvram option 'opal-sw-xstop=enable' for
p9. This gives the host kernel an opportunity to reach its panic path
or xmon on unrecoverable HMIs or MCEs, so that the issue can be
debugged effectively.
To enable sw checkstop in opal, issue the following command:
nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
NOTE: This is a workaround patch to disable sw checkstop by
default to gain control in host kernel for better checkstop
debugging. Once we have most of the checkstop issues
stabilized/resolved, revisit this patch to enable sw checkstop by
default.
For p8 platform it will remain enabled by default unless explicitly
disabled.
To disable sw checkstop on p8, issue the following command:
nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
-
hdata: Parse SPD data
Parse SPD data and populate device tree.
List of properties parsed from SPD:
[root@ltc-wspoon dimm@d00f]# lsprop .
memory-id 0000000c (12) # DIMM type
product-version 00000032 (50) # Module Revision Code
device_type "memory-dimm-ddr4"
serial-number 15d9acb6 (366587062)
status "okay"
size 00004000 (16384)
phandle 000000bd (189)
ibm,loc-code "UOPWR.0000000-Node0-DIMM7"
part-number "36ASF2G72PZ-2G6B2 "
reg 0000d007 (53255)
name "dimm"
manufacturer-id 0000802c (32812) # Vendor ID, we can get vendor name from this ID
Also update documentation.
-
hdata: Add memory hierarchy under xscom node
We have memory to chip mapping but don't have the complete memory
hierarchy. This patch adds the memory hierarchy under the xscom node.
This is specific to P9 systems as this hierarchy may change between
processor generations.
It uses memory controller ID details and populates nodes like:
xscom@<addr>/mcbist@<mcbist_id>/mcs@<mcs_id>/mca@<mca_id>/dimm@<resource_id>
Also this patch adds a few properties under the dimm node. Finally,
make sure xscom nodes are created before calling memory_parse().
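The defaults described in the "opal/xstop" entry above differ by processor generation: on p9 the sw checkstop is off unless the nvram option enables it, while on p8 it is on unless the option disables it. A minimal Python sketch of that decision follows. The function name and the string encoding of the processor generation are assumptions for illustration; only the option values `enable` and `disable` come from the notes.

```python
def sw_xstop_enabled(proc_gen, nvram_value=None):
    """Decide whether the sw checkstop trigger is active (illustrative).

    proc_gen: "p8" or "p9" (hypothetical encoding).
    nvram_value: value of the opal-sw-xstop option, or None if unset.
    """
    if proc_gen == "p9":
        # Disabled by default; only an explicit "enable" turns it on.
        return nvram_value == "enable"
    if proc_gen == "p8":
        # Enabled by default; only an explicit "disable" turns it off.
        return nvram_value != "disable"
    raise ValueError("unknown processor generation: %r" % (proc_gen,))
```

An unset or unrecognised option value therefore falls back to the per-generation default in this model.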
Fast Reboot and Quiesce
We have a preliminary fast reboot implementation for POWER9 systems,
which we look to enabling by default in the next release.
The OPAL Quiesce calls are designed to improve reliability and
debuggability around reboot and error conditions. See the full API
documentation for details: opal-quiesce.
-
fast-reboot: bare bones fast reboot implementation for POWER9
This is an initial fast reboot implementation for p9 which has only
been tested on the Witherspoon platform, and without the use of
NPUs, NX/VAS, etc.
This has worked reasonably well so far, with no failures in about
100 reboots. It is hidden behind the traditional fast-reboot
experimental nvram option, until more platforms and configurations
are tested.
-
fast-reboot: move boot CPU clean-up logically together with
secondaries
Move the boot CPU clean-up and state transition to active, logically
together with secondaries. Don't release secondaries from fast
reboot hold until everyone has cleaned up and transitioned to
active.
This is cosmetic, but it is helpful to run the fast reboot state
machine the same way on all CPUs.
-
fast-reboot: improve failure error messages
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
-
fast-reboot: quiesce opal before initiating a fast reboot
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
-
core: Add support for quiescing OPAL
Quiescing is ensuring all host controlled CPUs (except the current
one) are out of OPAL and prevented from entering. This can be used in
debug and shutdown paths, particularly with system reset sequences.
This patch adds per-CPU entry and exit tracking for OPAL calls, and
adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.
An OPAL call is added, to expose the functionality to Linux, where
it can be used for shutdown, kexec, and before generating sreset
IPIs for debugging (so the debug code does not recurse into OPAL).
-
dctl: p9 increase thread quiesce timeout
We require all instructions to be completed before a thread is
considered stopped, by the dctl interface. Long running instructions
like cache misses and CI loads may take a significant amount of time
to complete, and timeouts have been observed in stress testing.
Increase the timeout significantly, to cover this. The workbook just
says to poll, but we like to have timeouts to avoid getting stuck in
firmware.
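The "core: Add support for quiescing OPAL" entry above combines two pieces: per-CPU tracking of who is inside OPAL, and an entry-time check that holds or rejects callers while a quiesce is in effect. The Python model below sketches that interaction under simplifying assumptions. The class and method names are hypothetical, "held" is reported as a status instead of actually spinning, and a quiesce request fails immediately if another CPU is still inside OPAL, whereas real firmware would presumably wait for a bounded time.

```python
class OpalQuiesceModel:
    """Toy model of OPAL entry/exit tracking with hold/reject quiescing."""

    def __init__(self, ncpus):
        self.in_opal = [False] * ncpus   # per-CPU "currently in OPAL" flag
        self.mode = None                 # None, "hold" or "reject"
        self.owner = None                # CPU that requested the quiesce

    def enter(self, cpu):
        """Called at OPAL entry; returns "entered", "held" or "rejected"."""
        if self.mode is not None and cpu != self.owner:
            return "held" if self.mode == "hold" else "rejected"
        self.in_opal[cpu] = True
        return "entered"

    def exit(self, cpu):
        self.in_opal[cpu] = False

    def quiesce(self, cpu, mode):
        """Try to quiesce on behalf of cpu; True on success."""
        if mode not in ("hold", "reject"):
            raise ValueError("mode must be 'hold' or 'reject'")
        # Block new entries first, then check that everyone else is out.
        self.mode, self.owner = mode, cpu
        others_busy = any(busy for i, busy in enumerate(self.in_opal)
                          if i != cpu)
        if others_busy:
            self.resume()
            return False
        return True

    def resume(self):
        self.mode = None
        self.owner = None
```

In this model a caller such as fast reboot would attempt `quiesce()` and skip the fast path when it returns False, which mirrors the "if firmware can not be quiesced, then fast reboot is skipped" behaviour described above.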
POWER9 power saving
There is much improved support for deeper sleep/idle (stop) states on
POWER9.
-
OCC: Increase max pstate check on P9 to 255
This has changed from P8, we can now have > 127 pstates.
This was observed on Boston during WoF bring up.
-
SLW: Add idle state stop5 for DD2.0 and above
Adding stop5 idle state with rough residency and latency numbers.
-
SLW: Add p9_stop_api calls for IMC
Add p9_stop_api for EVENT_MASK and PDBAR scoms. These scoms are
lost on wakeup from stop11.
-
SCOM restore for DARN and XIVE
While waking up from stop11, we want NCU_DARN_BAR to have enable
bit set. Without this stop_api call, the value restored is without
enable bit set. We lose NCU_SPEC_BAR when the quad goes into
stop11; stop_api will restore it while waking up from stop11.
-
SLW: Call p9_stop_api only if deep_states are enabled
All init time p9_stop_api calls have been isolated to
slw_late_init. If p9_stop_api fails, then the deep states can be
excluded from device tree.
For p9_stop_api called after device-tree for cpuidle is created,
has_deep_states will be used to check if this call is even
required.
-
Better handle errors in setting up sleep states (p9_stop_api)
We won't put affected stop states in the device tree if the wakeup
engine is not present or has failed.
-
SCOM Restore: Increased the EQ SCOM restore limit.
Commit increases the SCOM restore limit from 16 to 31.
-...
v5.10-rc4
skiboot-5.10-rc4
skiboot v5.10-rc4 was released on Wednesday February 21st 2018. It is
the fourth release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.
skiboot v5.10-rc4 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.
For how the skiboot stable releases work, see stable-rules for details.
The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.
Over skiboot-5.10-rc3, we have the following changes:
-
core: Fix mismatched names between reserved memory nodes &
properties
OPAL exposes reserved memory regions through the device tree in both
new (nodes) and old (properties) formats.
However, the names used for these don't match - we use a generated
cell address for the nodes, but the plain region name for the
properties.
This fixes a warning from FWTS.
-
sensor-groups: occ: Add support to disable/enable sensor group
This patch adds a new opal call to enable/disable a sensor group.
This call is used to select the sensor groups that need to be
copied to main memory by OCC at runtime.
-
sensors: occ: Add energy counters
Export the accumulated power values as energy sensors. The
accumulator field of power sensors is used for representing energy
counters which can be exported as energy counters in Linux hwmon
interface.
-
sensors: Support reading u64 sensor values
This patch adds support to read u64 sensor values. This also adds
changes to the core and the backend implementation code to make this
API the base call. Host can use this new API to read sensors up to
64 bits.
This adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
-
dt: add /cpus/ibm,powerpc-cpu-features device tree bindings
This is a new CPU feature advertising interface that is
fine-grained, extensible, aware of privilege levels, and gives
control of features to all levels of the stack (firmware,
hypervisor, and OS).
The design and binding specification is described in detail in doc/.
-
phb3/phb4/p7ioc: Document supported TCE sizes in DT
Add a new property, "ibm,supported-tce-sizes", to advertise to Linux
how big the available TCE sizes are. Each value is a bit shift, from
smallest to largest.
-
phb4: Fix TCE page size
The page sizes for TCEs on P9 were inaccurate and just copied from
PHB3, so correct them.
-
Revert "pci: Shared slot state synchronisation for hot reset"
An issue was found in shared slot reset where the system can be
stuck in an infinite loop. Pull the code out until there's a proper
fix.
This reverts commit 1172a6c.
-
hdata/iohub: Use only wildcard slots for pluggables
We don't want to cause a VID:DID check against pluggable devices, as
they may use multiple devids.
Narrow the condition under which VID:DID is listed in the dt, so
that we'll end up creating a wildcard slot for these instead.
-
increase log verbosity in debug builds
-
Add -debug to version on DEBUG builds
-
cpu_wait_job: Correctly report time spent waiting for job
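The "ibm,supported-tce-sizes" property described in the "phb3/phb4/p7ioc: Document supported TCE sizes in DT" entry above encodes each page size as a bit shift. As a small illustration of how a consumer might decode it, the Python sketch below converts shifts to byte sizes. The function name and the example shift values are hypothetical; the actual values advertised by a given PHB are not stated in these notes.

```python
def tce_sizes_from_shifts(shifts):
    """Convert a list of bit shifts into page sizes in bytes.

    The input is expected in the property's order, smallest to largest.
    """
    return [1 << shift for shift in shifts]


# Hypothetical property contents: 4 KiB, 64 KiB and 16 MiB pages.
example_shifts = [12, 16, 24]
example_sizes = tce_sizes_from_shifts(example_shifts)
```

Encoding sizes as shifts keeps each entry in a single cell and guarantees that every advertised size is a power of two.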
v5.10-rc3
skiboot-5.10-rc3
skiboot v5.10-rc3 was released on Thursday February 15th 2018. It is the
third release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.
skiboot v5.10-rc3 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.
For how the skiboot stable releases work, see stable-rules for details.
The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.
Over skiboot-5.10-rc2, we have the following changes:
-
vas: Disable VAS/NX-842 on some P9 revisions
VAS/NX-842 are not functional on some P9 revisions, so disable them
in hardware and skip creating their device tree nodes.
Since the intent is to prevent OS from configuring VAS/NX, we remove
only the platform device nodes but leave the VAS/NX DT nodes under
xscom (i.e. we don't skip add_vas_node() in hdata/spira.c).
-
phb4: Only escalate freezes on MMIO load where necessary
In order to work around a hardware issue, MMIO load freezes were
escalated to fences on every chip. Now that hardware no longer
requires this, restrict escalation to the chips that actually need
it.
-
pflash: Fix makefile dependency issue
-
DT: Add "version" property under ibm,firmware-versions node
First line of VERSION section in PNOR contains firmware version. Use
that to add "version" property under firmware versions dt node.
Sample output:
root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop version "witherspoon-ibm-OP9_v1.19_1.94"
-
npu2: Disable TVT range check when in bypass mode
On POWER9 the GPUs need to be able to access the MMIO memory space.
Therefore the TVT range check needs to include the MMIO address
space. As any possible range check would cover all of memory anyway
this patch just disables the TVT range check altogether when
bypassing the TCE tables.
-
hw/npu2: support creset of npu2 devices
creset calls in the hw procedure that resets the PHY; we don't take
them out of reset, just put them in reset.
This fixes a kexec issue.
-
ATTN: Enable flush instruction cache bit in HID register
In P9, we have to enable "flush the instruction cache" bit along
with "attn instruction support" bit to trigger attention.
-
capi: Enable channel tag streaming for PHB in CAPP mode
We re-enable channel tag streaming for PHB in CAPP mode as without
it PEC was waiting for cresp for each DMA write command before
sending a new DMA write command on the Powerbus. This resulted in
much lower DMA write performance than expected.
The patch updates enable_capi_mode() to remove the masking of
channel_streaming_en bit in PBCQ Hardware Configuration Register.
Also does some re-factoring of the code that updates this register
to use xscom_write_mask instead of xscom_read followed by a
xscom_write.
-
core/device.c: Fix dt_find_compatible_node
dt_find_compatible_node() and
dt_find_compatible_node_on_chip() are used to find device nodes
under a parent/root node with a given compatible property.
dt_next(root, prev) is used to walk the child nodes of the given
parent and takes two arguments - root contains the parent node to
walk whilst prev contains the previous child to search from so that
it can be used as an iterator over all children nodes.
The first iteration of dt_find_compatible_node(root, prev) calls
dt_next(root, root) which is not a well defined operation as prev
is assumed to be a child of the root node. The result is that when a
node contains no children it will start returning the parent node's
siblings until it hits the top of the tree, at which point a NULL
dereference is attempted when looking for the root node's parent.
Dereferencing NULL can result in undesirable data exceptions during
system boot and untimely non-hilarious system crashes. dt_next()
should not be called with prev == root. Instead we add a check to
dt_next() such that passing prev = NULL will cause it to start
iterating from the first child node (if any).
-
stb: Put correct label (for skiboot) into container
Hostboot will expect the label field of the stb header to contain
"PAYLOAD" for skiboot or it will fail to load and run skiboot.
The failure looks something like this:
53.40896|ISTEP 20. 1 - host_load_payload
53.65840|secure|Secureboot Failure plid = 0x90000755, rc = 0x1E07
53.65881|System shutting down with error status 0x1E07
53.67547|================================================
53.67954|Error reported by secure (0x1E00) PLID 0x90000755
53.67560| Container's component ID does not match expected component ID
53.67561| ModuleId 0x09 SECUREBOOT::MOD_SECURE_VERIFY_COMPONENT
53.67845| ReasonCode 0x1e07 SECUREBOOT::RC_ROM_VERIFY
53.67998| UserData1 : 0x0000000000000000
53.67999| UserData2 : 0x0000000000000000
53.67999|------------------------------------------------
53.68000| Callout type : Procedure Callout
53.68000| Procedure : EPUB_PRC_HB_CODE
53.68001| Priority : SRCI_PRIORITY_HIGH
53.68001|------------------------------------------------
53.68002| Callout type : Procedure Callout
53.68003| Procedure : EPUB_PRC_FW_VERIFICATION_ERR
53.68003| Priority : SRCI_PRIORITY_HIGH
53.68004|------------------------------------------------
-
hw/occ: Fix fast-reboot crash in P8 platforms.
Commit 85a1de3 ("fast-boot: occ: Re-parse the pstate table
during fast-boot") breaks fast-reboot on P8 platforms while
re-initializing the OCC pstates. On P8 platforms OPAL adds two
additional properties, #address-cells and #size-cells, under the
/ibm,opal/power-mgt DT node. During fast-reboot, adding the same
properties to the same node again results in duplicate properties,
and fast-reboot fails with the trace below:
[ 541.410373292,5] OCC: All Chip Rdy after 0 ms
[ 541.410488745,3] Duplicate property "#address-cells" in node /ibm,opal/power-mgt
[ 541.410694290,0] Aborting!
CPU 0058 Backtrace:
S: 0000000031d639d0 R: 000000003001367c .backtrace+0x48
S: 0000000031d63a60 R: 000000003001a03c ._abort+0x4c
S: 0000000031d63ae0 R: 00000000300267d8 .new_property+0xd8
S: 0000000031d63b70 R: 0000000030026a28 .__dt_add_property_cells+0x30
S: 0000000031d63c10 R: 000000003003ea3c .occ_pstates_init+0x984
S: 0000000031d63d90 R: 00000000300142d8 .load_and_boot_kernel+0x86c
S: 0000000031d63e70 R: 000000003002586c .fast_reboot_entry+0x358
S: 0000000031d63f00 R: 00000000300029f4 fast_reset_entry+0x2c
This patch fixes this issue by removing these two properties on P8
while doing OCC pstates re-init in the fast-reboot code path.
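The remove-before-re-add pattern described in the hw/occ fix can be sketched as follows. This is a minimal illustration, not skiboot's code: struct node, find_prop, add_prop, remove_prop and reinit_cells are hypothetical stand-ins for the device-tree helpers, the cell values 2 and 1 are placeholders, and where the real firmware aborts on a duplicate this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 8

/* Hypothetical, simplified stand-in for a device-tree node. */
struct node {
    const char *names[MAX_PROPS];
    unsigned int values[MAX_PROPS];
    size_t count;
};

/* Return the index of a property, or -1 if it is absent. */
int find_prop(const struct node *n, const char *name)
{
    assert(n != NULL && name != NULL);
    for (size_t i = 0; i < n->count; i++)
        if (strcmp(n->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Add a property; refuse duplicates (the firmware in the trace aborts). */
bool add_prop(struct node *n, const char *name, unsigned int value)
{
    if (find_prop(n, name) >= 0 || n->count == MAX_PROPS)
        return false;
    n->names[n->count] = name;
    n->values[n->count] = value;
    n->count++;
    return true;
}

/* Remove a property if present; removing an absent one is a no-op. */
void remove_prop(struct node *n, const char *name)
{
    int i = find_prop(n, name);

    if (i < 0)
        return;
    /* Fill the hole with the last entry; order is not preserved. */
    n->count--;
    n->names[i] = n->names[n->count];
    n->values[i] = n->values[n->count];
}

/* Re-init path: drop stale properties first so the add cannot collide. */
bool reinit_cells(struct node *n)
{
    remove_prop(n, "#address-cells");
    remove_prop(n, "#size-cells");
    return add_prop(n, "#address-cells", 2) &&
           add_prop(n, "#size-cells", 1);
}
```

Calling reinit_cells() twice on the same node succeeds both times, whereas calling add_prop() twice with the same name fails on the second call, which mirrors the duplicate-property failure in the trace.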
v5.10-rc2
skiboot-5.10-rc2
skiboot v5.10-rc2 was released on Friday February 9th 2018. It is the
second release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.
skiboot v5.10-rc2 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.
For details on how the skiboot stable releases work, see stable-rules.
The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.
Over skiboot-5.10-rc1, we have the following changes:
-
hw/npu2: Implement logging HMI actions
-
opal-prd: Fix FTBFS with -Werror=format-overflow
i2c.c fails to compile with gcc7 and -Werror=format-overflow used in
Debian Unstable and Ubuntu 18.04:
i2c.c: In function ‘i2c_init’:
i2c.c:211:15: error: ‘%s’ directive writing up to 255 bytes into a region of size 236 [-Werror=format-overflow=]
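The note above does not say how i2c.c was changed. One common way to avoid this class of warning is to format with a bounded snprintf() and check for truncation, as in the sketch below. The build_path helper and its parameters are assumptions made for illustration and are not taken from opal-prd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<dir>/<name>" into out. Returns true only if the whole path
 * fit. snprintf() never writes more than out_len bytes and returns the
 * length the full string would have needed, so truncation is detectable.
 */
bool build_path(char *out, size_t out_len, const char *dir, const char *name)
{
    int needed;

    assert(out != NULL && out_len > 0);
    needed = snprintf(out, out_len, "%s/%s", dir, name);
    return needed >= 0 && (size_t)needed < out_len;
}
```

With a destination that is too small the function returns false and leaves a NUL-terminated, truncated string in the buffer instead of overflowing it.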
-
core/exception: beautify exception handler, add MCE-involved
registers
Print DSISR and DAR, to help with deciphering machine check
exceptions, and improve the output a bit, decode NIP symbol, improve
alignment, etc. Also print a specific header for machine check,
because we do expect to see these if there is a hardware failure.
Before:
[ 0.005968779,3] ***********************************************
[ 0.005974102,3] Unexpected exception 200 !
[ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000
[ 0.005998130,3] CFAR : 00000000300b58bc
[ 0.006002769,3] CR : 40000004 XER: 20000000
[ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
[ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
[...]
After:
[ 0.003287941,3] ***********************************************
[ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24
[ 0.003579628,3] CFAR : 00000000300b5964
[ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000
[ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80
[ 0.003609930,3] CR : 40000004 XER : 20000000
[ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
[ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
[...]
-
core/init: manage MSR[ME] explicitly, always enable
The current boot sequence inherits MSR[ME] from the IPL firmware,
and never changes it. Some environments disable MSR[ME] (e.g.,
mambo), and others can enable it (hostboot).
This has two problems. First, MSR[ME] must be disabled while in
the process of taking over the interrupt vector from the previous
environment. Second, after installing our machine check handler,
MSR[ME] should be enabled to get some useful output rather than a
checkstop.
-
fast-reboot: occ: Re-parse the pstate table during fast-reboot
OCC shares the frequency list with the host by copying the pstate
table to main memory in HOMER. This table is parsed during boot to
create device-tree properties for frequency and pstate IDs. OCC can
update the pstate table to present a new set of frequencies to the
host, but the host will remain oblivious to these changes unless it
is re-inited with the updated device-tree CPU frequency properties.
So this patch re-parses the pstate table and updates the device-tree
properties during fast-reboot.
OCC updates the pstate table when asked to do so using the
pstate-table bias command. This is mainly used by the WOF team for
characterization purposes.
-
fast-reboot: move pci_reset error handling into fast-reboot code
pci_reset() currently does a platform reboot if it fails. It should
not know about fast-reboot at this level, so instead have it return
an error, and the fast reboot caller will do the platform reboot.
The code essentially does the same thing, but flexibility is
improved. Ideally the fast reboot code should perform pci_reset and
all such fail-able operations before the CPU resets itself and
destroys its own stack. That's not the case now, but that should be
the goal.
-
capi: Fix the max tlbi divider and the directory size.
Switch to 512KB mode (directory size) as we don’t use bit 48 of the
tag in addressing the array. This mode is controlled by the Snoop
CAPI Configuration Register. Also set the maximum number of data
polls received before signaling that the TLBI hang detect timer has
expired. The value of '0000' is equal to 16.
-
npu2/tce: Fix page size checking
The page size is encoded in the TVT data [59:63] as @shift+11 but
the tce_kill handler does not do the math right; this fixes it.
-
stb: Enforce secure boot if called before libstb initialized
-
stb: Correctly error out when no PCR for resource
-
core/init: move imc catalog preload init after the STB init.
To be on the safe side, move the imc catalog preload after the STB
init to make sure the imc catalog resource gets verified and measured
properly during loading when both secure and trusted boot modes are
on.
-
libstb: fix failure of calling trusted measure without STB
initialization.
When we load a flash resource during OPAL init, STB calls trusted
measure to measure the given resource. If a flash resource gets
loaded before STB initialization, trusted measure cannot measure it
properly.
So this patch fixes the issue by calling trusted measure only if
the corresponding trusted init was done.
The ideal fix is to make sure STB init is done first during init,
and only then load flash resources, so that STB can properly verify
and measure all resources.
-
libstb: fix failure of calling cvc verify without STB
initialization.
Currently, at various stages of OPAL init, we load various
PNOR partition containers from the flash device. When we load a
flash resource, STB calls the CVC verify and trusted measure (sha512)
functions. So when a flash resource gets loaded before STB
initialization, the cvc verify function fails to start the verify
and enforce the boot.
Below is an example failure where our VERSION partition gets
loaded early in the boot stage, before STB initialization is done.
This is with secure mode off:
STB: VERSION NOT VERIFIED, invalid param. buf=0x305ed930, len=4096 key-hash=0x0 hash-size=0
In the same code path when secure mode is on, the boot process will
abort.
So this patch fixes the issue by calling cvc verify only if STB init
was done.
We also need a permanent fix in the init path to ensure STB init is
done first, before loading any other flash resources.
-
libstb/tpm_chip: Add missing new line to print messages.
-
libstb: increase the log level of verify/measure messages to
PR_NOTICE.
Currently libstb logs the verify and hash calculation messages at
PR_INFO level. So when secure boot enforcement happens while loading
the last flash resource (e.g. BOOTKERNEL), the previous verify
and measure messages are not logged to the console, and it is not
clear to the end user which resources were verified and measured. So
this patch fixes this by increasing the log level to PR_NOTICE.
v5.10-rc1
skiboot-5.10-rc1
skiboot v5.10-rc1 was released on Tuesday February 6th 2018. It is the
first release candidate of skiboot 5.10, which will become the new
stable release of skiboot following the 5.9 release, first released
October 31st 2017.
skiboot v5.10-rc1 contains all bug fixes as of skiboot-5.9.8 and
skiboot-5.4.9 (the currently maintained stable releases). There may be
more 5.9.x stable releases; it will depend on demand.
For details on how the skiboot stable releases work, see stable-rules.
The current plan is to cut the final 5.10 in February, with skiboot 5.10
being for all POWER8 and POWER9 platforms in op-build v1.21. This
release will be targeted to early POWER9 systems.
Over skiboot-5.9, we have the following changes:
New Features
-
hdata: Parse IPL FW feature settings
Add parsing for the firmware feature flags in the HDAT. This
indicates the settings of various parameters which are set at IPL
time by firmware.
-
opal/xstop: Use nvram option to enable/disable sw checkstop.
Add a mechanism to enable/disable sw checkstop by looking at the
nvram option opal-sw-xstop=<enable/disable>.
For now this patch disables the sw checkstop trigger on p9 unless
explicitly enabled through the nvram option 'opal-sw-xstop=enable'.
This gives the host kernel an opportunity to enter its panic path
or xmon for unrecoverable HMIs or MCEs, so that the issue can be
debugged effectively.
To enable sw checkstop in opal, issue the following command:
nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
NOTE: This is a workaround patch to disable sw checkstop by
default to gain control in host kernel for better checkstop
debugging. Once we have most of the checkstop issues
stabilized/resolved, revisit this patch to enable sw checkstop by
default.
For the p8 platform it will remain enabled by default unless
explicitly disabled.
To disable sw checkstop on p8, issue the following command:
nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
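The defaults described above (off on p9 unless enabled, on for p8 unless disabled) can be summarised in a small decision function. This is only a sketch of the stated policy: the enum, the function name and the way the option value is passed in are assumptions, not skiboot's actual interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the two processor generations discussed. */
enum proc_gen { PROC_GEN_P8, PROC_GEN_P9 };

/*
 * Decide whether the sw checkstop trigger is enabled.
 * opt is the value of the opal-sw-xstop nvram option, or NULL when the
 * option is not set at all.
 */
bool sw_xstop_enabled(enum proc_gen gen, const char *opt)
{
    assert(gen == PROC_GEN_P8 || gen == PROC_GEN_P9);

    if (gen == PROC_GEN_P9)
        /* p9: disabled unless explicitly enabled */
        return opt != NULL && strcmp(opt, "enable") == 0;

    /* p8: enabled unless explicitly disabled */
    return opt == NULL || strcmp(opt, "disable") != 0;
}
```

Any value other than the one that flips the default is treated as "not set" here, which is a choice made for the sketch and may differ from the real parser.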
-
hdata: Parse SPD data
Parse SPD data and populate device tree.
List of properties parsed from SPD:
[root@ltc-wspoon dimm@d00f]# lsprop .
memory-id 0000000c (12) # DIMM type
product-version 00000032 (50) # Module Revision Code
device_type "memory-dimm-ddr4"
serial-number 15d9acb6 (366587062)
status "okay"
size 00004000 (16384)
phandle 000000bd (189)
ibm,loc-code "UOPWR.0000000-Node0-DIMM7"
part-number "36ASF2G72PZ-2G6B2 "
reg 0000d007 (53255)
name "dimm"
manufacturer-id 0000802c (32812) # Vendor ID, we can get vendor name from this ID
Also update documentation.
-
hdata: Add memory hierarchy under xscom node
We have a memory-to-chip mapping but do not have the complete memory
hierarchy. This patch adds the memory hierarchy under the xscom node.
This is specific to P9 systems, as the hierarchy may change between
processor generations.
It uses memory controller ID details and populates nodes like:
xscom@<addr>/mcbist@<mcbist_id>/mcs@<mcs_id>/mca@<mca_id>/dimm@<resource_id>
This patch also adds a few properties under the dimm node. Finally,
make sure xscom nodes are created before calling memory_parse().
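To make the node layout concrete, the sketch below formats one such path from a set of IDs. The struct, its field names and the choice of hexadecimal unit addresses are assumptions for illustration. The real hdata code creates device-tree nodes rather than strings.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IDs locating one DIMM; the field names are assumptions. */
struct dimm_loc {
    uint64_t xscom_addr;
    uint32_t mcbist_id;
    uint32_t mcs_id;
    uint32_t mca_id;
    uint32_t resource_id;
};

/*
 * Format the hierarchy path for one DIMM into out.
 * Returns true if the full path fit in out_len bytes.
 */
bool dimm_path(char *out, size_t out_len, const struct dimm_loc *loc)
{
    int n;

    assert(out != NULL && loc != NULL && out_len > 0);
    n = snprintf(out, out_len,
                 "xscom@%" PRIx64 "/mcbist@%" PRIx32 "/mcs@%" PRIx32
                 "/mca@%" PRIx32 "/dimm@%" PRIx32,
                 loc->xscom_addr, loc->mcbist_id, loc->mcs_id,
                 loc->mca_id, loc->resource_id);
    return n >= 0 && (size_t)n < out_len;
}
```

For example, IDs of 0x1000, 0, 1, 2 and 0xd00f produce xscom@1000/mcbist@0/mcs@1/mca@2/dimm@d00f.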
Fast Reboot and Quiesce
We have a preliminary fast reboot implementation for POWER9 systems,
which we are looking to enable by default in the next release.
The OPAL Quiesce calls are designed to improve reliability and
debuggability around reboot and error conditions. See the full API
documentation for details: opal-quiesce.
-
fast-reboot: bare bones fast reboot implementation for POWER9
This is an initial fast reboot implementation for p9 which has only
been tested on the Witherspoon platform, and without the use of
NPUs, NX/VAS, etc.
This has worked reasonably well so far, with no failures in about
100 reboots. It is hidden behind the traditional fast-reboot
experimental nvram option, until more platforms and configurations
are tested.
-
fast-reboot: move boot CPU clean-up logically together with
secondaries
Move the boot CPU clean-up and state transition to active, logically
together with secondaries. Don't release secondaries from fast
reboot hold until everyone has cleaned up and transitioned to
active.
This is cosmetic, but it is helpful to run the fast reboot state
machine the same way on all CPUs.
-
fast-reboot: improve failure error messages
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
-
fast-reboot: quiesce opal before initiating a fast reboot
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
-
core: Add support for quiescing OPAL
Quiescing is ensuring all host controlled CPUs (except the current
one) are out of OPAL and prevented from entering. This can be used in
debug and shutdown paths, particularly with system reset sequences.
This patch adds per-CPU entry and exit tracking for OPAL calls, and
adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.
An OPAL call is added, to expose the functionality to Linux, where
it can be used for shutdown, kexec, and before generating sreset
IPIs for debugging (so the debug code does not recurse into OPAL).
-
dctl: p9 increase thread quiesce timeout
We require all instructions to be completed before a thread is
considered stopped, by the dctl interface. Long running instructions
like cache misses and CI loads may take a significant amount of time
to complete, and timeouts have been observed in stress testing.
Increase the timeout significantly, to cover this. The workbook just
says to poll, but we like to have timeouts to avoid getting stuck in
firmware.
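The "poll, but with a timeout" approach mentioned for dctl can be sketched generically as below. This is not the dctl code: poll_fn, poll_with_timeout and countdown_done are hypothetical names, the timeout is expressed as a poll count rather than a time, and the simulated status source stands in for reading a hardware status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true once the thread reports stopped. */
typedef bool (*poll_fn)(void *ctx);

/*
 * Poll until done() reports success or max_polls attempts are used up.
 * Returns the number of polls taken on success, or -1 on timeout, so
 * the caller never spins forever.
 */
int poll_with_timeout(poll_fn done, void *ctx, int max_polls)
{
    assert(done != NULL);
    for (int i = 1; i <= max_polls; i++) {
        if (done(ctx))
            return i;
        /* Real firmware would delay here between polls. */
    }
    return -1;
}

/* Simulated status source: reports done after *ctx further polls. */
bool countdown_done(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

Raising max_polls corresponds to the increased timeout: a slow operation that would have hit the limit now completes, while a genuinely stuck one still returns an error.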
POWER9 power saving
There is much improved support for deeper sleep/idle (stop) states on
POWER9.
-
OCC: Increase max pstate check on P9 to 255
This has changed from P8, we can now have > 127 pstates.
This was observed on Boston during WoF bring up.
-
SLW: Add idle state stop5 for DD2.0 and above
Adding stop5 idle state with rough residency and latency numbers.
-
SLW: Add p9_stop_api calls for IMC
Add p9_stop_api for EVENT_MASK and PDBAR scoms. These scoms are
lost on wakeup from stop11.
-
SCOM restore for DARN and XIVE
While waking up from stop11, we want NCU_DARN_BAR to have the enable
bit set. Without this stop_api call, the value is restored without
the enable bit set. We lose NCU_SPEC_BAR when the quad goes into
stop11; stop_api will restore it while waking up from stop11.
-
SLW: Call p9_stop_api only if deep_states are enabled
All init time p9_stop_api calls have been isolated to
slw_late_init. If p9_stop_api fails, then the deep states can be
excluded from device tree.
For p9_stop_api calls made after the device-tree for cpuidle is
created, has_deep_states will be used to check if the call is even
required.
-
Better handle errors in setting up sleep states (p9_stop_api)
We won't put affected stop states in the device tree if the wakeup
engine is not present or has failed.
-
SCOM Restore: Increased the EQ SCOM restore limit.
This commit increases the SCOM restore limit from 16 to 31.
-
hw/dts: retry special wakeup operation if core still gated
It has been observed that in some cases the special wakeup operation
can "succeed" but the core is still in a gated/offline state.
Check for this state after attempting to wakeup a core and retry the
wakeup if necessary.
-
core/direct-controls: add function to read core gated state
-
core/direct-controls: wait for core special wkup bit cleared
When clearing the special wakeup bit on a core, wait until the bit is
actually cleared by the hardware in the status register before
returning success.
This may help avoid issues with back-to-back reads where the special
wakeup request is cleared but the firmware is still processing the
request and the next attempt to set the bit reads an immediate
success from the previous operation.
-
p9_stop_api: PM: Added support for version control in SCOM restore
entries.
- adds version info in SCOM restore entry header
- adds version specific details in SCOM restore entry header
- retains old behaviour of SGPE Hcode's base version
-
p9_stop_api: EQ SCOM Restore: Introduced version control in SCOM
restore entry.
- introduces version control in header of SCOM restore entry
- ensures backward compatibility
- introduces flexibility to handle any number of SCOM restore
entries.
Secure and Trusted Boot for POWER9
We introduce support for Secure and Trusted Boot for POWER9 systems,
with eq...