Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: Intel: hda: fix null deref on system suspend entry #5085

Merged

Conversation

kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Jun 26, 2024

When system enters suspend with an active stream, SOF core calls hw_params_upon_resume(). On Intel platforms with HDA DMA used to manage the link DMA, this leads to call chain of

hda_dsp_set_hw_params_upon_resume()
-> hda_dsp_dais_suspend()
-> hda_dai_suspend()
-> hda_ipc4_post_trigger()

A bug is hit in hda_dai_suspend() as hda_link_dma_cleanup() is run first, which clears hext_stream->link_substream, and then hda_ipc4_post_trigger() is called with a NULL snd_pcm_substream pointer.

Fixes: 2b009fa ("ASoC: SOF: Intel: hda: Unify DAI drv ops for IPC3 and IPC4")
Link: #5080

When system enters suspend with an active stream, SOF core
calls hw_params_upon_resume(). On Intel platforms with HDA DMA used
to manage the link DMA, this leads to call chain of

   hda_dsp_set_hw_params_upon_resume()
 -> hda_dsp_dais_suspend()
 -> hda_dai_suspend()
 -> hda_ipc4_post_trigger()

A bug is hit in hda_dai_suspend() as hda_link_dma_cleanup() is run first,
which clears hext_stream->link_substream, and then hda_ipc4_post_trigger()
is called with a NULL snd_pcm_substream pointer.

Fixes: 2b009fa ("ASoC: SOF: Intel: hda: Unify DAI drv ops for IPC3 and IPC4")
Link: thesofproject#5080
Signed-off-by: Kai Vehmanen <[email protected]>
@kv2019i kv2019i requested a review from bardliao as a code owner June 26, 2024 11:40
@kv2019i kv2019i requested review from plbossart and ranj063 June 26, 2024 11:42
@plbossart
Copy link
Member

humm, this looks relatively similar to f710445
but this was to deal with the suspend while paused case.

But then the original code in has a comment that "This fixes the suspend while paused case for
IPC4 because previously we weren't resetting the pipeline when suspending
the system with some paused streams."

This suggests that this piece of code is indeed controversial.

the code we're trying to be consistent with does this

static int __maybe_unused hda_dai_trigger(struct snd_pcm_substream *substream, int cmd,
					  struct snd_soc_dai *dai)
{
...

	if (ops->post_trigger) {
		ret = ops->post_trigger(sdev, dai, substream, cmd);
		if (ret < 0)
			return ret;
	}

	switch (cmd) {
	case SNDRV_PCM_TRIGGER_SUSPEND:
		ret = hda_link_dma_cleanup(substream, hext_stream, dai);
		if (ret < 0) {
			dev_err(sdev->dev, "%s: failed to clean up link DMA\n", __func__);
			return ret;
		}
		break;
	default:
		break;
	}

	return 0;
}

so it's not clear why we used the reverse order to deal with the corner case of suspend while paused.

What's also not clear is why we end-up executing this code if the stream was active. In that case, the ALSA framework would send the TRIGGER_SUSPEND command and the link_substream would be freed.

So while the change looks ok, I am not clear on why we hit this problem in the first place.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jun 26, 2024

Also now found #5054 (update: and #5058 which replaces 5054) -- will dig a bit deeper why this code gets called in this scenario.

The test results for this PR are positive though. Not a full pass, but test run 43174 versus 43138 (using stable-v2.10-rc1 as fw) shows a big improvement in system-pm tests. Still failures, but I think some variant of this PR is needed to fix the "system suspend when audio stream is active" case.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jun 26, 2024

@plbossart I did more test runs and indeed the normal case is that SUSPEND trigger is called for the streams at suspend. When the NULL deref is hit, it seems the DAI is stopped for some reason just before suspend, but hw_free is not called on the dai.

[ 4500.726240] DEBUG: hda_dai_trigger() cmd=SNDRV_PCM_TRIGGER_START dai SSP0 Pin direction 0
[ 4500.736676] PM: suspend exit
[ 4505.732214] DEBUG: hda_dai_trigger() cmd=SNDRV_PCM_TRIGGER_STOP dai SSP0 Pin direction 0
[ 4507.075488] PM: suspend entry (s2idle)
[...]
# 
[ 4507.819102] DEBUG: hda_dai_suspend(), hext_stream->link_substream=0xc908d800
# null dereference is hit without this PR

But while this PR avoids the kernel crash, I don't think the driver is working correctly anymore here and the rootcause is probably somewhere else. I'll continue to debug this aspect tomorrow.

Merging this PR might make debugging easier in CI (given there is some timing aspect here)... without this PR, you need console access to the DUT.

UPDATE: a lot more details on the condition where this bug is hit: #5080 (comment)

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with merging this, the code as it is is not consistent despite the comment that it is.

But we MUST root-cause why this oops was hit. To some extent, this helped identify a larger problem before this code is run.

@bardliao can you review/approve/merge if you are ok?

@bardliao bardliao merged commit 7f5c1e8 into thesofproject:topic/sof-dev Jun 27, 2024
10 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants