Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V8 rebase+review changes #11

Open
wants to merge 62 commits into
base: v8_rebase
Choose a base branch
from
Open

Conversation

kubalewski
Copy link
Owner

No description provided.

kubalewski and others added 30 commits May 24, 2023 20:20
Add a protocol spec for DPLL.
Add code generated from the spec.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Michal Michalik <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Vadim Fedorenko <[email protected]>
DPLL framework is used to represent and configure DPLL devices
in systems. Each device that has DPLL and can configure sources
and outputs can use this framework. Netlink interface is used to
provide configuration data and to receive notification messages
about changes in the configuration or status of DPLL device.
Inputs and outputs of the DPLL device are represented as special
objects which could be dynamically added to and removed from DPLL
device.

Co-developed-by: Milena Olech <[email protected]>
Signed-off-by: Milena Olech <[email protected]>
Co-developed-by: Michal Michalik <[email protected]>
Signed-off-by: Michal Michalik <[email protected]>
Co-developed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Vadim Fedorenko <[email protected]>
Add documentation explaining common netlink interface to configure DPLL
devices and monitoring events. Common way to implement DPLL device in
a driver is also covered.

Signed-off-by: Vadim Fedorenko <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Add firmware admin command to access clock generation unit
configuration, it is required to enable Extended PTP and SyncE features
in the driver.
Add definitions of possible hardware variations of input and output pins
related to clock generation unit and functions to access the data.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Control over clock generation unit is required for further development
of Synchronous Ethernet feature. Interface provides ability to obtain
current state of a dpll, its sources and outputs which are pins, and
allows their configuration.

Co-developed-by: Milena Olech <[email protected]>
Signed-off-by: Milena Olech <[email protected]>
Co-developed-by: Michal Michalik <[email protected]>
Signed-off-by: Michal Michalik <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Implement basic DPLL operations in ptp_ocp driver as the
simplest example of using new subsystem.

Signed-off-by: Vadim Fedorenko <[email protected]>
In case netdevice represents a SyncE port, the user needs to understand
the connection between netdevice and associated DPLL pin. There might me
multiple netdevices pointing to the same pin, in case of VF/SF
implementation.

Add a IFLA Netlink attribute to nest the DPLL pin handle, similar to
how is is implemented for devlink port. Add a struct dpll_pin pointer
to netdev and protect access to it by RTNL. Expose netdev_dpll_pin_set()
and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear
the DPLL pin relationship to netdev.

Note that during the lifetime of struct dpll_pin the handle fields do not
change. Therefore it is save to access them lockless. It is drivers
responsibility to call netdev_dpll_pin_clear() before dpll_pin_put().

Signed-off-by: Jiri Pirko <[email protected]>
Implement SyncE support using newly introduced DPLL support.
Make sure that each PFs/VFs/SFs probed with appropriate capability
will spawn a dpll auxiliary device and register appropriate dpll device
and pin instances.

Signed-off-by: Jiri Pirko <[email protected]>
Policy defines max as length of enum list, if enum attribute is
defined with a value, generated netlink policy max value is wrong.
Calculate proper max value of policy for enums that are given a value.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Attributes are not used, remove them and start with a value given.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove unspec attributes after removing from dpll netlink spec.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove unspec attributes after removing from dpll netlink spec.

Fixes: 6348a82 ("ice: add admin commands to access cgu configuration")
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove unspec attributes after removing from dpll netlink spec.

Fixes: bff85d9 ("ice: implement dpll interface to control cgu")
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Add missing frequencies to the dpll ynl spec, so their defines are
properly generated.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
- remove "no holdover available" from freerun mode description
- improve description of holdover lock-status
- fix temperature typo

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
ice_dpll_pin flags field is intended to store flags that were received
after getting pin info from firmware, stop using it on set commands,
as the flags of set commands can have different meaning and values.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Use ice_dpll_pin's 'state' field to provide state to the caller,
as well as to check if state change requested shall be proceeded.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove attribute values of subset attributes, they are no longer needed
as ynl lib was fixed.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Move device 'nest' to the end of the dpll netlink attributes.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
'device' nested attribute shall be used only for receiving data
from pin-get do/dump commands. Use it this way and define list
of expected list of attributes for device-get which are not nested.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Previously the docs reffered to DPLL_MODE_FORCED mode, but its name was
changed to DPLL_MODE_MANUAL, fix it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Previously temperature could be supplied with one digit float precision,
use divider value of 1000 instead of 10 and allow three digit float
precision for users.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Enum in subset attribute definition is redundant and not used for
anything in YNL spec, remove it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Previously it was possible to have different pin-direction returned to
the user depending on which dpll was given as argument for pin-get cmd.
If pin was registered with multiple dplls each could have reported
different direction. I.e. driver exposes chained pins, where one pin is
an input for one dpll and an output for second dpll.
Fix it by enclosing pin-direction in the 'device' nested attribute.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Previously it was possible to have different pin-direction returned to
the user depending on which dpll was given as argument for pin-get cmd.
If pin was registered with multiple dplls each could have reported
different direction. I.e. driver exposes chained pins, where one pin is
an input for one dpll and an output for second dpll.
Fix it by enclosing pin-direction in the 'device' nested attribute.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
The header is already included in linux/dpll.h. Remove it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Caller shall take care of checking if arguments are valid, remove it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Propagate error return value of netlink callback mutex lock function to
the caller, a dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Replace if statements with enum values to the switch case statements.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
In case of runtime errors the dev_err macro shall be used for printing
the traces.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
kubalewski added 24 commits May 24, 2023 20:20
Previously IS_ERR_OR_NULL was used worngly as the function cannot return
NULL, use IS_ERR instead.
In case of error, return original error value.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
When source pin changes previous source state notification shall be
also requested, request it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Simplify code by locking dpll subsystem once and releasing all the
resources, instead of locking per each resource release.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Use proper way of setting bits for capabilities field.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Align function name with dpll counterpart (rename from
ice_dpll_register_pins to ice_dpll_init_pins), similarly old
ice_dpll_init_pins function renamed to ice_dpll_init_pins_info.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Change the clock_id function to return clock ID on the stack instead
of using an output variable, in order to simplify the function.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove the mode as it is not used by any of the in-kernel drivers
implementing the dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Remove the mode as it is not used by any of the in-kernel drivers
implementing the dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
DPLL_LOCK_STATUS_CALIBRATING is removed, when dpll is locked with
a source signal it shall return DPLL_LOCK_STATUS_LOCKED, if it is
capable of holdover with that signal it shall return it's state as:
DPLL_LOCK_STATUS_LOCKED_HO_ACQ

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
DPLL_LOCK_STATUS_CALIBRATING is removed, when dpll is locked with
a source signal it shall return DPLL_LOCK_STATUS_LOCKED, if it is
capable of holdover with that signal it shall return it's state as:
DPLL_LOCK_STATUS_LOCKED_HO_ACQ

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
DPLL_LOCK_STATUS_CALIBRATING was removed, return:
DPLL_LOCK_STATUS_LOCKED - if dpll was locked
DPLL_LOCK_STATUS_LOCKED_HO_ACQ - if dpll was locked and holdover is
acquired.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Propagate and use dpll_priv to obtain pf pointer in corresponding
functions.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Pointer is given back from dpll subsystem, it cannot null unless dpll
subsystem is broken.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Define notifications as commands as expected by yaml spec.
Add separated notification for pin events.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Align with new notification names.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Use separated notifications for dpll pins and devices.
Format the messages as corresponding get netlink commands.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Align with fixed notification scheme in dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Align with fixed notification scheme in dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Register a recoered clock pin of a pf in its netdevice struct.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Use full names of commands, attributes and values of dpll subsystem in
the documentation.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Rename enum attributes name from dplla to dpll_a.
Add name for dpll commands enum: dpll_cmd.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
The argument of notify functions shall be of 'enum dpll_cmd' type.
Fix it.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Fix implementation of error paths in ice_dpll_init.
Previously the implementation worked but error paths were not clear
for the developers. Fix by providing separated functions for each error
encountered during initialization.
Fix function descriptions.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Rename 'source' to 'input' as it is opposite of 'output', previous naming
'source' shall have 'sink' counterpart, but 'input'/'output' seems a
better fit for explaining the pins existence reason.
Fix function definition alignment.

Signed-off-by: Arkadiusz Kubalewski <[email protected]>
kubalewski pushed a commit that referenced this pull request Jun 5, 2023
Commit 349d03f ("crypto: s390 - add crypto library interface for
ChaCha20") added a library interface to the s390 specific ChaCha20
implementation. However no check was added to verify if the required
facilities are installed before branching into the assembler code.

If compiled into the kernel, this will lead to the following crash,
if vector instructions are not available:

data exception: 0007 ilc:3 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7+ #11
Hardware name: IBM 3931 A01 704 (KVM/Linux)
Krnl PSW : 0704e00180000000 000000001857277a (chacha20_vx+0x32/0x818)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000037f0000000a ffffffffffffff60 000000008184b000 0000000019f5c8e6
           0000000000000109 0000037fffb13c58 0000037fffb13c78 0000000019bb1780
           0000037fffb13c58 0000000019f5c8e6 000000008184b000 0000000000000109
           00000000802d8000 0000000000000109 0000000018571ebc 0000037fffb13718
Krnl Code: 000000001857276a: c07000b1f80b        larl    %r7,0000000019bb1780
           0000000018572770: a708000a            lhi     %r0,10
          #0000000018572774: e78950000c36        vlm     %v24,%v25,0(%r5),0
          >000000001857277a: e7a060000806        vl      %v26,0(%r6),0
           0000000018572780: e7bf70004c36        vlm     %v27,%v31,0(%r7),4
           0000000018572786: e70b00000456        vlr     %v0,%v27
           000000001857278c: e71800000456        vlr     %v1,%v24
           0000000018572792: e74b00000456        vlr     %v4,%v27
Call Trace:
 [<000000001857277a>] chacha20_vx+0x32/0x818
Last Breaking-Event-Address:
 [<0000000018571eb6>] chacha20_crypt_s390.constprop.0+0x6e/0xd8
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Fix this by adding a missing MACHINE_HAS_VX check.

Fixes: 349d03f ("crypto: s390 - add crypto library interface for ChaCha20")
Reported-by: Marc Hartmayer <[email protected]>
Cc: <[email protected]> # 5.19+
Reviewed-by: Harald Freudenberger <[email protected]>
[[email protected]: remove duplicates in commit message]
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
kubalewski pushed a commit that referenced this pull request Jun 5, 2023
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 vvfedorenko#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 vvfedorenko#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 vvfedorenko#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 vvfedorenko#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
kubalewski pushed a commit that referenced this pull request Jun 19, 2023
Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.

Fixes: 95637d9 ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
kubalewski pushed a commit that referenced this pull request Jul 28, 2023
Petr Machata says:

====================
mlxsw: Permit enslavement to netdevices with uppers

The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.

As an example, take a front panel port, configure an IP address: it gets a
RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
port from the bridge again, but the RIF never comes back. There is a number
of similar situations, where changing the configuration there and back
utterly breaks the offload.

Similarly, detaching a front panel port from a configured topology means
unoffloading of this whole topology -- VLAN uppers, next hops, etc.
Attaching the port back is then not permitted at all. If it were, it would
not result in a working configuration, because much of mlxsw is written to
react to changes in immediate configuration. There is nothing that would go
visit netdevices in the attached-to topology and offload existing routes
and VLAN memberships, for example.

In this patchset, introduce a number of replays to be invoked so that this
sort of post-hoc offload is supported. Then remove the vetoes that
disallowed enslavement of front panel ports to other netdevices with
uppers.

The patchset progresses as follows:

- In patch #1, fix an issue in the bridge driver. To my knowledge, the
  issue could not have resulted in a buggy behavior previously, and thus is
  packaged with this patchset instead of being sent separately to net.

- In patch #2, add a new helper to the switchdev code.

- In patch #3, drop mlxsw selftests that will not be relevant after this
  patchset anymore.

- Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother
  introduction of the rest of the code.

- Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper
  configuration when a front panel port is introduced into a topology.
  Individual patches take care of bridge and LAG RIF memberships, switchdev
  replay, nexthop and neighbors replay, and MACVLAN offload.

- Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a
  front panel port is enslaved (in which case all uppers are newly
  relevant), or, respectively, deslaved (in which case the newly-relevant
  netdevice is the one being deslaved).

- Up until this point, the introduced scaffolding was not really used,
  because mlxsw still forbids enslavement of mlxsw netdevices to uppers
  with uppers. In patch #17, this condition is finally relaxed.

A sizable selftest suite is available to test all this new code. That will
be sent in a separate patchset.
====================

Signed-off-by: David S. Miller <[email protected]>
kubalewski pushed a commit that referenced this pull request Aug 4, 2023
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

 PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
  #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
  #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
  #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
  #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
  #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
  #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
  #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
  #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
  #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
  #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
 #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
 #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
 #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
 #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
 #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
 #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143]  wait_for_completion+0x24/0x30
[1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885]  process_one_work+0x24e/0x510

Actually no need to hold encap tbl lock if there is no encap action.
Fix it by checking if encap action exists or not before holding
encap tbl lock.

Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
kubalewski pushed a commit that referenced this pull request Aug 11, 2023
…inux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-08-07

1) Few cleanups

2) Dynamic completion EQs

The driver creates completion EQs for all vectors directly on driver
load, even if those EQs will not be utilized later on.

To allow more flexibility in managing completion EQs and to reduce the
memory overhead of driver load, this series will adjust completion EQs
creation to be dynamic. Meaning, completion EQs will be created only
when needed.

Patch #1 introduces a counter for tracking the current number of
completion EQs.
Patches #2-6 refactor the existing infrastructure of managing completion
EQs and completion IRQs to be compatible with per-vector
allocation/release requests.
Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient
in case the affinity is requested but completion IRQ is not allocated yet.
Patch #9 function rename.
Patch #10 handles the corner case of SF performing an IRQ request when no
SF IRQ pool is found, and no PF IRQ exists for the same vector.
Patch #11 modify driver to use dynamically allocate completion EQs.

* tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Bridge, Only handle registered netdev bridge events
  net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl
  net/mlx5: Fix typo reminder -> remainder
  net/mlx5: remove many unnecessary NULL values
  net/mlx5: Allocate completion EQs dynamically
  net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool
  net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()
  net/mlx5: Add IRQ vector to CPU lookup function
  net/mlx5: Introduce mlx5_cpumask_default_spread
  net/mlx5: Implement single completion EQ create/destroy methods
  net/mlx5: Use xarray to store and manage completion EQs
  net/mlx5: Refactor completion IRQ request/release handlers in EQ layer
  net/mlx5: Use xarray to store and manage completion IRQs
  net/mlx5: Refactor completion IRQ request/release API
  net/mlx5: Track the current number of completion EQs
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants