Issue observed with Cilium L4 load balancer on BCM57416 #1

Open
vills opened this issue May 30, 2024 · 6 comments

Comments

@vills

vills commented May 30, 2024

TL;DR

Hello!

I appreciate your work on reproducing the bug. I think I faced the same issue with the bnxt_en driver while using Cilium.

Did you report it somewhere, or do you know a way to fix it?

Thanks.

Expected behavior

The network card should work.

Observed behavior

The network card doesn't work as expected.

Minimal working example

No response

Log output

No response

Additional information

No response

@vills added the bug (Something isn't working) label May 30, 2024
@aibor
Collaborator

aibor commented May 30, 2024

Hi,

good to know we are not the only ones facing the issue. Thanks for the info. :)

So, you ran into the issue by using the Cilium layer 4 load balancer, right? Which NIC did you use exactly, if I may ask?

We have been in contact with Broadcom since September. First, they stated that XDP was not fully supported by the NIC we use, but they planned to add full XDP support with firmware release 229 (released in March 2024). But the issue was still present with this release.

They are still investigating the issue, and we don't have any information about when a fix can be expected. Also, we do not have a workaround and just use other vendors' NICs for now.

@vills
Author

vills commented May 30, 2024

Yep. It's Cilium's L4 LB in my lab where I observe that issue. NIC info:

product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
capabilities: pm vpd msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnxt_en driverversion=5.15.0 duplex=full firmware=227.0.134.0/pkg 22.71.11.13 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s

I have not yet tried to reproduce the issue with your code, but I will in the following days. I also have other servers with Intel NICs, and they work without problems.

I think I'll try to contact Broadcom as well. Maybe it will add some value to the issue :-).

@aibor
Collaborator

aibor commented May 30, 2024

Thanks for the info.

> product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller

Interesting. So, now we know about two different ASICs that are affected: 57414 and 57416.

> I have not yet tried to reproduce the issue with your code, but I will in the following days. I also have other servers with Intel NICs, and they work without problems.

Great, please let me know if the reproducer fails on your NIC in the same way I observed it.

> I think I'll try to contact Broadcom as well. Maybe it will add some value to the issue :-).

Nice, thanks. :)

@aibor removed the bug (Something isn't working) label May 30, 2024
@aibor changed the title from "Did you found fix for that issue?" to "Issue observed with Cilium L4 load blancer on BCM57416" May 30, 2024
@aibor changed the title from "Issue observed with Cilium L4 load blancer on BCM57416" to "Issue observed with Cilium L4 load balancer on BCM57416" May 30, 2024
@vills
Author

vills commented Jun 14, 2024

We can confirm that the issue can be reproduced with your script on BCM57416 cards.

We tested with different firmware versions. The latest available from our vendor is:

product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
capabilities: pm vpd msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnxt_en driverversion=6.1.0-21-amd64 duplex=full firmware=229.2.52.0/pkg 22.92.06.10 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
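
When comparing several hosts, it can help to pull the driver and firmware fields out of a `configuration:` line like the one above. The following is a minimal Python sketch under assumptions: the helper names `parse_configuration` and `firmware_major` are hypothetical, and the parsing only assumes the `key=value` layout seen in this output, where values may contain spaces.

```python
import re

# Sample line copied from the output above.
SAMPLE = (
    "configuration: autonegotiation=on broadcast=yes driver=bnxt_en "
    "driverversion=6.1.0-21-amd64 duplex=full "
    "firmware=229.2.52.0/pkg 22.92.06.10 latency=0 link=yes multicast=yes "
    "port=twisted pair slave=yes speed=10Gbit/s"
)


def parse_configuration(line: str) -> dict[str, str]:
    """Split a 'configuration:' line into a key -> value mapping.

    Hypothetical helper: a value runs until the next 'key=' token or the
    end of the line, so values with spaces ('twisted pair') stay intact.
    """
    body = line.strip().removeprefix("configuration:").strip()
    pattern = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")
    return {key: value for key, value in pattern.findall(body)}


def firmware_major(fields: dict[str, str]) -> int:
    """Return the leading number of the firmware string, e.g. 229."""
    return int(fields["firmware"].split(".")[0])


fields = parse_configuration(SAMPLE)
```

With the sample line, `fields["driver"]` is the driver name and `firmware_major(fields)` is the leading number of the firmware version, which is the part discussed in this thread.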

@vills
Author

vills commented Oct 22, 2024

Hi!

Did you manage to get a fixed firmware from Broadcom? :-)

@aibor
Collaborator

aibor commented Oct 22, 2024

Hi,

yes, and the fixed version 231 has been released by now. Unfortunately, the driver changes that fix this issue have not been merged into the Linux kernel tree yet. This means you need to use the out-of-tree bnxt_en driver provided by Broadcom. I still don't know what their plans are about merging it.

As of now, only a fix for a DMA issue causing some log spam has been merged. It only covers XDP in receive handling, though. For TX (when redirecting to the NIC), the DMA issue is still present (which is a separate issue on its own, I guess). Setting the IOMMU to passthrough mode removes the log spam.
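
To check whether a host was booted with IOMMU passthrough requested, here is a minimal Python sketch. It assumes passthrough was requested through one of the kernel boot parameters `iommu=pt` or `iommu.passthrough=1`; the thread does not say which method was used, and the function name `iommu_passthrough_requested` is hypothetical. It does not detect a passthrough default that was compiled into the kernel.

```python
def iommu_passthrough_requested(cmdline: str) -> bool:
    """Return True if the kernel command line asks for IOMMU passthrough.

    Assumption: passthrough is requested with 'iommu=pt' or
    'iommu.passthrough=1'. A kernel built with a passthrough default
    would not show up here.
    """
    params = cmdline.split()
    return "iommu=pt" in params or "iommu.passthrough=1" in params


# Usage on a Linux host (reads the command line of the running kernel):
#   with open("/proc/cmdline") as f:
#       print(iommu_passthrough_requested(f.read()))
```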
