Commits · 9fd6b1fa95db0e90d9c1875c9fee6116e6b29dc2 · SECO Northern Europe / Kernel / linux-imx-kuk

May 04, 2022

selftests: ocelot: tc_flower_chains: streamline test output · 9fd6b1fa

Vladimir Oltean authored 2 years ago


Bring this driver-specific selftest output in line with the other
selftests.

Before:

Testing VLAN pop..                      OK
Testing VLAN push..                     OK
Testing ingress VLAN modification..             OK
Testing egress VLAN modification..              OK
Testing frame prioritization..          OK

After:

TEST: VLAN pop                                                      [ OK ]
TEST: VLAN push                                                     [ OK ]
TEST: Ingress VLAN modification                                     [ OK ]
TEST: Egress VLAN modification                                      [ OK ]
TEST: Frame prioritization                                          [ OK ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

9fd6b1fa

selftests: forwarding: multiple instances in tcpdump helper · 7da3896b

Joachim Wiberg authored 2 years ago


Extend tcpdump_start() & C:o to handle multiple instances.  Useful when
observing bridge operation, e.g., unicast learning/flooding, and any
case of multicast distribution (to these ports but not that one ...).

This means the interface argument is now a mandatory argument to all
tcpdump_*() functions, hence the changes to the ocelot flower test.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6182c5c5)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

7da3896b

selftests: forwarding: add TCPDUMP_EXTRA_FLAGS to lib.sh · 32ef4ffc

Joachim Wiberg authored 2 years ago


For some use-cases we may want to change the tcpdump flags used in
tcpdump_start().  For instance, observing interfaces without the PROMISC
flag, e.g. to see what's really being forwarded to the bridge interface.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fe32dffd)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

32ef4ffc

selftests: forwarding: new test, verify host mdb entries · 8e8abb28

Joachim Wiberg authored 2 years ago


Boiler plate for testing static mdb entries.  This first test verifies
adding and removing host mdb entries for all supported types: IPv4,
IPv6, and MAC multicast.

Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 50fe062c)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Conflicts in tools/testing/selftests/net/forwarding/Makefile with commit
b2b681a4 ("selftests: forwarding: tests of locked port feature")
which we did not backport.

8e8abb28

selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer · eebe32b7

Vladimir Oltean authored 2 years ago

As discussed here with Ido Schimmel:
https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/



the default conform-exceed action is "reclassify", for a reason we don't
really understand.

The point is that hardware can't offload that police action, so not
specifying "conform-exceed" was always wrong, even though the command
used to work in hardware (but not in software) until the kernel started
adding validation for it.

Fix the command used by the selftest by making the policer drop on
exceed, and pass the packet to the next action (goto) on conform.

Fixes: 8cd6b020 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>

eebe32b7

selftests: net: dsa: symlink the tc_actions.sh test · 431ecb26

Vladimir Oltean authored 2 years ago

This has been validated on the Ocelot/Felix switch family (NXP LS1028A)
and should be relevant to any switch driver that offloads the tc-flower
and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has
operations.

TEST: gact drop and ok (skip_hw) [ OK ]
TEST: mirred egress flower redirect (skip_hw) [ OK ]
TEST: mirred egress flower mirror (skip_hw) [ OK ]
TEST: mirred egress matchall mirror (skip_hw) [ OK ]
TEST: mirred_egress_to_ingress (skip_hw) [ OK ]
TEST: gact drop and ok (skip_sw) [ OK ]
TEST: mirred egress flower redirect (skip_sw) [ OK ]
TEST: mirred egress flower mirror (skip_sw) [ OK ]
TEST: mirred egress matchall mirror (skip_sw) [ OK ]
TEST: trap (skip_sw) [ OK ]
TEST: mirred_egress_to_ingress (skip_sw) [ OK ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

431ecb26

selftests: forwarding: tc_actions: allow mirred egress test to run on non-offloaded h2 · 17ae891e

Vladimir Oltean authored 2 years ago


The host interfaces $h1 and $h2 don't have to be switchdev interfaces,
but due to the fact that we pass $tcflags which may have the value of
"skip_sw", we force $h2 to offload a drop rule for dst_ip, something
which it may not be able to do.

The selftest only wants to verify the hit count of this rule as a means
of figuring out whether the packet was received, so remove the $tcflags
for it and let it be done in software.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

17ae891e

net: mscc: ocelot: offload tc action "ok" using an empty action vector · 4f25fe54

Vladimir Oltean authored 2 years ago


The "ok" tc action is useful when placed in front of a more generic
filter to exclude some more specific rules from matching it.

The ocelot switches can offload this tc action by creating an empty
action vector (no _ENA fields set to 1). This makes sense for all of
VCAP IS1, IS2 and ES0 (but not for PSFP).

Add support for this action. Note that this makes the
gact_drop_and_ok_test() selftest pass, where "action ok" is used in
front of an "action drop" rule, both offloaded to VCAP IS2.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

4f25fe54

net: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD · 8098d30b

Vladimir Oltean authored 2 years ago

OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a
PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU
port from receiving packets removed from the forwarding path.

The hardcoded initialization done for it in ocelot_vcap_init() is
confusing. All we need from it is to have a rate and a burst size of 0.

Reuse qos_policer_conf_set() for that purpose.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

8098d30b

net: mscc: ocelot: drop port argument from qos_policer_conf_set · 2f10e06e

Vladimir Oltean authored 2 years ago


The "port" argument is used for nothing else except printing on the
error path. Print errors on behalf of the policer index, which is less
confusing anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

2f10e06e

net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block · 28f509e2

Vladimir Oltean authored 2 years ago


Unify the code paths for adding to an empty list and to a list with
elements by keeping a "pos" list_head element that indicates where to
insert. Initialize "pos" with the list head itself in case
list_for_each_entry() doesn't iterate over any element.

Note that list_for_each_safe() isn't needed because no element is
removed from the list while iterating.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

28f509e2

net: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block · 2bca9733

Vladimir Oltean authored 2 years ago


This makes no functional difference but helps in minimizing the delta
for a future change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

2bca9733

net: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block() · 2cdc461e

Vladimir Oltean authored 2 years ago

list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent, use
the later form to unify with the case where the list is empty later.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

2cdc461e

net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters · 1fef87f4

Vladimir Oltean authored 2 years ago


Given the following order of operations:

(1) we add filter A using tc-flower
(2) we send a packet that matches it
(3) we read the filter's statistics to find a hit count of 1
(4) we add a second filter B with a higher preference than A, and A
    moves one position to the right to make room in the TCAM for it
(5) we send another packet, and this matches the second filter B
(6) we read the filter statistics again.

When this happens, the hit count of filter A is 2 and of filter B is 1,
despite a single packet having matched each filter.

Furthermore, in an alternate history, reading the filter stats a second
time between steps (3) and (4) makes the hit count of filter A remain at
1 after step (6), as expected.

The reason why this happens has to do with the filter->stats.pkts field,
which is written to hardware through the call path below:

               vcap_entry_set
               /      |      \
              /       |       \
             /        |        \
            /         |         \
es0_entry_set   is1_entry_set   is2_entry_set
            \         |         /
             \        |        /
              \       |       /
        vcap_data_set(data.counter, ...)

The primary role of filter->stats.pkts is to transport the filter hit
counters from the last readout all the way from vcap_entry_get() ->
ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats().
The reason why vcap_entry_set() writes it to hardware is so that the
counters (saturating and having a limited bit width) are cleared
after each user space readout.

The writing of filter->stats.pkts to hardware during the TCAM entry
movement procedure is an unintentional consequence of the code design,
because the hit count isn't up to date at this point.

So at step (4), when filter A is moved by ocelot_vcap_filter_add() to
make room for filter B, the hardware hit count is 0 (no packet matched
on it in the meantime), but filter->stats.pkts is 1, because the last
readout saw the earlier packet. The movement procedure programs the old
hit count back to hardware, so this creates the impression to user space
that more packets have been matched than they really were.

The bug can be seen when running the gact_drop_and_ok_test() from the
tc_actions.sh selftest.

Fix the issue by reading back the hit count to tmp->stats.pkts before
migrating the VCAP filter. Sure, this is a best-effort technique, since
the packets that hit the rule between vcap_entry_get() and
vcap_entry_set() won't be counted, but at least it allows the counters
to be reliably used for selftests where the traffic is under control.

The vcap_entry_get() name is a bit unintuitive, but it only reads back
the counter portion of the TCAM entry, not the entire entry.

The index from which we retrieve the counter is also a bit unintuitive
(i - 1 during add, i + 1 during del), but this is the way in which TCAM
entry movement works. The "entry index" isn't a stored integer for a
TCAM filter, instead it is dynamically computed by
ocelot_vcap_block_get_filter_index() based on the entry's position in
the &block->rules list. That position (as well as block->count) is
automatically updated by ocelot_vcap_filter_add_to_block() on add, and
by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter
index, and "i - 1" or "i + 1" respectively are the old addresses of that
TCAM entry (we only support installing/deleting one filter at a time).

Fixes: b5962294 ("net: mscc: ocelot: Add support for tcam")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

1fef87f4

net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0 · 63d51d42

Vladimir Oltean authored 2 years ago


Once the CPU port was added to the destination port mask of a packet, it
can never be cleared, so even packets marked as dropped by the MASK_MODE
of a VCAP IS2 filter will still reach it. This is why we need the
OCELOT_POLICER_DISCARD to "kill dropped packets dead" and make software
stop seeing them.

We disallow policer rules from being put on any other chain than the one
for the first lookup, but we don't do this for "drop" rules, although we
should. This change is merely ascertaining that the rules dont't
(completely) work and letting the user know.

The blamed commit is the one that introduced the multi-chain architecture
in ocelot. Prior to that, we should have always offloaded the filters to
VCAP IS2 lookup 0, where they did work.

Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

63d51d42

net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups · ac15987c

Vladimir Oltean authored 2 years ago

The VCAP IS2 TCAM is looked up twice per packet, and each filter can be
configured to only match during the first, second lookup, or both, or
none.

The blamed commit wrote the code for making VCAP IS2 filters match only
on the given lookup. But right below that code, there was another line
that explicitly made the lookup a "don't care", and this is overwriting
the lookup we've selected. So the code had no effect.

Some of the more noticeable effects of having filters match on both
lookups:

- in "tc -s filter show dev swp0 ingress", we see each packet matching a
VCAP IS2 filter counted twice. This throws off scripts such as
tools/testing/selftests/net/forwarding/tc_actions.sh and makes them
fail.

- a "tc-drop" action offloaded to VCAP IS2 needs a policer as well,
because once the CPU port becomes a member of the destination port
mask of a packet, nothing removes it, not even a PERMIT/DENY mask mode
with a port mask of 0. But VCAP IS2 rules with the POLICE_ENA bit in
the action vector can only appear in the first lookup. What happens
when a filter matches both lookups is that the action vector is
combined, and this makes the POLICE_ENA bit ineffective, since the
last lookup in which it has appeared is the second one. In other
words, "tc-drop" actions do not drop packets for the CPU port, dropped
packets are still seen by software unless there was an FDB entry that
directed those packets to some other place different from the CPU.

The last bit used to work, because in the initial commit b5962294
("net: mscc: ocelot: Add support for tcam"), we were writing the FIRST
field of the VCAP IS2 half key with a 1, not with a "don't care".
The change to "don't care" was made inadvertently by me in commit
c1c3993e ("net: mscc: ocelot: generalize existing code for VCAP"),
which I just realized, and which needs a separate fix from this one,
for "stable" kernels that lack the commit blamed below.

Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

ac15987c

net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted · 0bd2abe6

Vladimir Oltean authored 2 years ago


ocelot_vcap_filter_del() works by moving the next filters over the
current one, and then deleting the last filter by calling vcap_entry_set()
with a del_filter which was specially created by memsetting its memory
to zeroes. vcap_entry_set() then programs this to the TCAM and action
RAM via the cache registers.

The problem is that vcap_entry_set() is a dispatch function which looks
at del_filter->block_id. But since del_filter is zeroized memory, the
block_id is 0, or otherwise said, VCAP_ES0. So practically, what we do
is delete the entry at the same TCAM index from VCAP ES0 instead of IS1
or IS2.

The code was not always like this. vcap_entry_set() used to simply be
is2_entry_set(), and then, the logic used to work.

Restore the functionality by populating the block_id of the del_filter
based on the VCAP block of the filter that we're deleting. This makes
vcap_entry_set() know what to do.

Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

0bd2abe6

net: mscc: ocelot: avoid use after free with deleted tc-trap rules · 8146e2b6

Vladimir Oltean authored 2 years ago


The error path of ocelot_flower_parse() removes a VCAP filter from
ocelot->traps, but the main deletion path - ocelot_vcap_filter_del() -
does not.

So functions such as felix_update_trapping_destinations() can still
access the freed VCAP filter via ocelot->traps.

Fix this bug by removing the filter from ocelot->traps when it gets
deleted.

Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

8146e2b6

net: mscc: ocelot: don't use list_empty() on non-initialized list element · 0e9c8bdf

Vladimir Oltean authored 2 years ago


Since the blamed commit, VCAP filters can appear on more than one list.
If their action is "trap", they are chained on ocelot->traps via
filter->trap_list.

Consequently, when we free a VCAP filter, we must remove it from all
lists it is a member of, including ocelot->traps.

Normally, conditionally removing an element from a list (depending on
whether it is present or not) involves traversing the list, but we
already have a reference to the element, so that isn't really necessary.
Moreover, the operation "list_del(&filter->trap_list)" operation is
fundamentally the same regardless of whether we've iterated through the
list or just happened to have the element. So I thought it would be ok
to check whether the element has been added to a list by calling
list_empty().

However, this does not do the correct thing. list_empty() checks whether
"head->next == head", but in our case, head->next == head->prev == NULL,
and head != NULL. This makes us proceed to call list_del(), which
modifies the prev pointer of the next element, and the next pointer of
the prev element. But the next and prev elements are NULL, so we
dereference those pointers and die.

It would appear that list_empty() is not the function to use to detect
that condition. But if we had previously called INIT_LIST_HEAD() on
&filter->trap_list, then we could use list_empty() to denote whether we
are members of a list (any list).

Although the more "natural" thing seems to be to iterate through
ocelot->traps and only remove the filter from the list if it was a
member of it, it seems pointless to do that.

So fix the bug by calling INIT_LIST_HEAD() on the non-head element.

Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

0e9c8bdf

net: dsa: felix: add port mirroring support · f660af36

Vladimir Oltean authored 3 years ago


Gain support for port mirroring using tc-matchall by forwarding the
calls to the ocelot switch library.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 5e497497)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

f660af36

net: dsa: pass extack to dsa_switch_ops :: port_mirror_add() · b1e5ee28

Vladimir Oltean authored 3 years ago


Drivers might have error messages to propagate to user space, most
common being that they support a single mirror port.

Propagate the netlink extack so that they can inform user space in a
verbal way of their limitations.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 0148bb50)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

b1e5ee28

net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2 · 87a59671

Vladimir Oltean authored 3 years ago


Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an
offload for tc-flower) is done by setting the MIRROR_ENA bit from the
action vector of the filter. The packet is mirrored to the port mask
configured in the ANA:ANA:MIRRORPORTS register (the same port mask as
the destinations for port-based mirroring).

Functionality was tested with:

tc qdisc add dev swp3 clsact
tc filter add dev swp3 ingress protocol ip \
	flower skip_sw ip_proto icmp \
	action mirred egress mirror dev swp1

and pinging through swp3, while seeing that the ICMP replies are
mirrored towards swp1.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit f2a0e216)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

87a59671

net: mscc: ocelot: establish functions for handling VCAP aux resources · 904c5c9c

Vladimir Oltean authored 3 years ago


Some VCAP filters utilize resources which are global to the switch, like
for example VCAP IS2 policers take an index into a global policer pool.

In commit c9a7fe12 ("net: mscc: ocelot: add action of police on
vcap_is2"), Xiaoliang expressed this by hooking into the low-level
ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter()
functions, and allocating/freeing the policers from there.

Evaluating the code, there probably isn't a better place, but we'll need
to do something similar for the mirror ports, and the code will start to
look even more hacked up than it is right now.

Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to
contain the madness, and pollute less the body of other functions such
as ocelot_vcap_filter_add_to_block().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit c3d427ea)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

904c5c9c

net: mscc: ocelot: add port mirroring support using tc-matchall · fb5dcc54

Vladimir Oltean authored 3 years ago


Ocelot switches perform port-based ingress mirroring if
ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if
the port is in ANA:ANA:EMIRRORPORTS.

Both ingress-mirrored and egress-mirrored frames are copied to the port
mask from ANA:ANA:MIRRORPORTS.

So the choice of limiting to a single mirror port via ocelot_mirror_get()
and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't
map very well to the user space model. If the user wants to mirror the
ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd
have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would
make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there
are no tc-matchall rules to describe those actions.

Now, we could offload a matchall rule with multiple mirred actions, one
per desired mirror port, and force the user to stick to the multi-action
rule format for subsequent matchall filters. But both DSA and ocelot
have the flow_offload_has_one_action() check for the matchall offload,
plus the fact that it will get cumbersome to cross-check matchall
mirrors with flower mirrors (which will be added in the next patch).

As a result, we limit the configuration to a single mirror port, with
the possibility of lifting the restriction in the future.

Frames injected from the CPU don't get egress-mirrored, since they are
sent with the BYPASS bit in the injection frame header, and this
bypasses the analyzer module (effectively also the mirroring logic).
I don't know what to do/say about this.

Functionality was tested with:

tc qdisc add dev swp3 clsact
tc filter add dev swp3 ingress \
	matchall skip_sw \
	action mirred egress mirror dev swp1

and pinging through swp3, while seeing that the ICMP replies are
mirrored towards swp1.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit ccb6ed42)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

fb5dcc54

net: mscc: ocelot: refactor policer work out of ocelot_setup_tc_cls_matchall · 99b7d2bc

Vladimir Oltean authored 3 years ago


In preparation for adding port mirroring support to the ocelot driver,
the dispatching function ocelot_setup_tc_cls_matchall() must be free of
action-specific code. Move port policer creation and deletion to
separate functions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 4fa72108)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

99b7d2bc

flow_offload: reject offload for all drivers with invalid police parameters · 967f518a

Jianbo Liu authored 3 years ago


As more police parameters are passed to flow_offload, driver can check
them to make sure hardware handles packets in the way indicated by tc.
The conform-exceed control should be drop/pipe or drop/ok. Besides,
for drop/ok, the police should be the last action. As hardware can't
configure peakrate/avrate/overhead, offload should not be supported if
any of them is configured.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d97b4b10)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Conflicts in drivers/net/ethernet/netronome/nfp/flower/qos_conf.c which
we did not update.

967f518a

net: flow_offload: add tc police action parameters · 1b0edfb5

Jianbo Liu authored 3 years ago


The current police offload action entry is missing exceed/notexceed
actions and parameters that can be configured by tc police action.
Add the missing parameters as a pre-step for offloading police actions
to hardware.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b8cd5831)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

1b0edfb5

flow_offload: allow user to offload tc action to net device · 043944aa

Baowen Zheng authored 3 years ago


Use flow_indr_dev_register/flow_indr_dev_setup_offload to
offload tc action.

We need to call tc_cleanup_flow_action to clean up tc action entry since
in tc_setup_action, some actions may hold dev refcnt, especially the mirror
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8cbfe939)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

043944aa

flow_offload: add ops to tc_action_ops for flow action setup · 1d996fc0

Baowen Zheng authored 3 years ago


Add a new ops to tc_action_ops for flow action setup.

Refactor function tc_setup_flow_action to use this new ops.

We make this change to facilitate to add standalone action module.

We will also use this ops to offload action independent of filter
in following patch.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c54e1d92)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

1d996fc0

flow_offload: rename offload functions with offload instead of flow · a1cc905a

Baowen Zheng authored 3 years ago


To improves readability, we rename offload functions with offload instead
of flow.

The term flow is related to exact matches, so we rename these functions
with offload.

We make this change to facilitate single action offload functions naming.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9c1c0e12)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

a1cc905a

flow_offload: add index to flow_action_entry structure · 7c7f4606

Baowen Zheng authored 3 years ago


Add index to flow_action_entry structure and delete index from police and
gate child structure.

We make this change to offload tc action for driver to identify a tc
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5a995900)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

7c7f4606

flow_offload: reject to offload tc actions in offload drivers · 9669b395

Baowen Zheng authored 3 years ago


A follow-up patch will allow users to offload tc actions independent of
classifier in the software datapath.

In preparation for this, teach all drivers that support offload of the flow
tables to reject such configuration as currently none of them support it.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 144d4c9e)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

9669b395

flow_offload: fill flags to action structure · e12835db

Baowen Zheng authored 3 years ago


Fill flags to action structure to allow user control if
the action should be offloaded to hardware or not.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 40bd094d)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

e12835db

selftests: forwarding: add basic QoS classification test for Ocelot switches · 80f4e8cd

Vladimir Oltean authored 2 years ago


Test basic (port-default, VLAN PCP and IP DSCP) QoS classification for
Ocelot switches. Advanced QoS classification using tc filters is covered
by tc_flower_chains.sh in the same directory.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

80f4e8cd

selftests: lib.sh: Add PING_COUNT to allow sending configurable amount of packets · ab827e56

Amit Cohen authored 3 years ago


Currently `ping_do()` and `ping6_do()` send 10 packets.

There are cases that it is not possible to catch only the interesting
packets using tc rule, so then, it is possible to send many packets and
verify that at least this amount of packets hit the rule.

Add `PING_COUNT` variable, which is set to 10 by default, to allow tests
sending more than 10 packets using the existing ping API.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 0cd0b1f7)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

ab827e56

net: mscc: ocelot: fix build error due to missing IEEE_8021QAZ_MAX_TCS · a0eb716d

Vladimir Oltean authored 3 years ago


IEEE_8021QAZ_MAX_TCS is defined in include/uapi/linux/dcbnl.h, which is
included by net/dcbnl.h. Then, linux/netdevice.h conditionally includes
net/dcbnl.h if CONFIG_DCB is enabled.

Therefore, when CONFIG_DCB is disabled, this indirect dependency is
broken.

There isn't a good reason to include net/dcbnl.h headers into the ocelot
switch library which exports low-level hardware API, so replace
IEEE_8021QAZ_MAX_TCS with OCELOT_NUM_TC which has the same value.

Fixes: 978777d0 ("net: dsa: felix: configure default-prio and dscp priorities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220315131215.273450-1-vladimir.oltean@nxp.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 72f56fdb)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

a0eb716d

net: dsa: felix: configure default-prio and dscp priorities · e8096bcf

Vladimir Oltean authored 3 years ago


Follow the established programming model for this driver and provide
shims in the felix DSA driver which call the implementations from the
ocelot switch lib. The ocelot switchdev driver wasn't integrated with
dcbnl due to lack of hardware availability.

The switch doesn't have any fancy QoS classification enabled by default.
The provided getters will create a default-prio app table entry of 0,
and no dscp entry. However, the getters have been made to actually
retrieve the hardware configuration rather than static values, to be
future proof in case DSA will need this information from more call paths.

For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG,
called QOS_DEFAULT_VAL.

DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG
(field QOS_DSCP_ENA), and individual DSCP values are configured as
trusted or not through register ANA_DSCP_CFG (replicated 64 times).
An untrusted DSCP value falls back to other QoS classification methods.
If trusted, the selected ANA_DSCP_CFG register also holds the QoS class
in the QOS_DSCP_VAL field.

The hardware also supports DSCP remapping (DSCP value X is translated to
DSCP value Y before the QoS class is determined based on the app table
entry for Y) and DSCP packet rewriting. The dcbnl framework, for being
so flexible in other useless areas, doesn't appear to support this.
So this functionality has been left out.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 978777d0)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

e8096bcf

net: dsa: report and change port dscp priority using dcbnl · 3e1975ab

Vladimir Oltean authored 3 years ago


Similar to the port-based default priority, IEEE 802.1Q-2018 allows the
Application Priority Table to define QoS classes (0 to 7) per IP DSCP
value (0 to 63).

In the absence of an app table entry for a packet with DSCP value X,
QoS classification for that packet falls back to other methods (VLAN PCP
or port-based default). The presence of an app table for DSCP value X
with priority Y makes the hardware classify the packet to QoS class Y.

As opposed to the default-prio where DSA exposes only a "set" in
dsa_switch_ops (because the port-based default is the fallback, it
always exists, either implicitly or explicitly), for DSCP priorities we
expose an "add" and a "del". The addition of a DSCP entry means trusting
that DSCP priority, the deletion means ignoring it.

Drivers that already trust (at least some) DSCP values can describe
their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is
called for each DSCP value from 0 to 63.

Again, there can be more than one dcbnl app table entry for the same
DSCP value, DSA chooses the one with the largest configured priority.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 47d75f78)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

3e1975ab

net: dsa: report and change port default priority using dcbnl · 337e425b

Vladimir Oltean authored 3 years ago

The port-based default QoS class is assigned to packets that lack a
VLAN PCP (or the port is configured to not trust the VLAN PCP),
an IP DSCP (or the port is configured to not trust IP DSCP), and packets
on which no tc-skbedit action has matched.

Similar to other drivers, this can be exposed to user space using the
DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table
D-8 - Sel field values that when the Selector is 1, the Protocol ID
value of 0 denotes the "Default application priority. For use when
application priority is not otherwise specified."

The way in which the dcbnl integration in DSA has been designed has to
do with its requirements. Andrew Lunn explains that SOHO switches are
expected to come with some sort of pre-configured QoS profile, and that
it is desirable for this to come pre-loaded into the DSA slave interfaces'
DCB application priority table.

In the dcbnl design, this is possible because calls to dcb_ieee_setapp()
can be initiated by anyone including being self-initiated by this device
driver.

However, what makes this challenging to implement in DSA is that the DSA
core manages the net_devices (effectively hiding them from drivers),
while drivers manage the hardware. The DSA core has no knowledge of what
individual drivers' QoS policies are. DSA could export to drivers a
wrapper over dcb_ieee_setapp() and these could call that function to
pre-populate the app priority table, however drivers don't have a good
moment in time to do this. The dsa_switch_ops :: setup() method gets
called before the net_devices are created (dsa_slave_create), and so is
dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops ::
port_enable(), but this gets called upon each ndo_open. If we add app
table entries on every open, we'd need to remove them on close, to avoid
duplicate entry errors. But if we delete app priority entries on close,
what we delete may not be the initial, driver pre-populated entries, but
rather user-added entries.

So it is clear that letting drivers choose the timing of the
dcb_ieee_setapp() call is inappropriate. The alternative which was
chosen is to introduce hardware-specific ops in dsa_switch_ops, and
effectively hide dcbnl details from drivers as well. For pre-populating
the application table, dsa_slave_dcbnl_init() will call
ds->ops->port_get_default_prio() which is supposed to read from
hardware. If the operation succeeds, DSA creates a default-prio app
table entry. The method is called as soon as the slave_dev is
registered, but before we release the rtnl_mutex. This is done such that
user space sees the app table entries as soon as it sees the interface
being registered.

The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer
changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to
return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is
NULL. Because there are still dcbnl-unaware DSA drivers even if they
have dcbnl_ops populated, the way to restore the behavior is to make all
dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific
dsa_switch_ops method.

The dcbnl framework absurdly allows there to be more than one app table
entry for the same selector and protocol (in other words, more than one
port-based default priority). In the iproute2 dcb program, there is a
"replace" syntactical sugar command which performs an "add" and a "del"
to hide this away. But we choose the largest configured priority when we
call ds->ops->port_set_default_prio(), using __fls(). When there is no
default-prio app table entry left, the port-default priority is restored
to 0.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d538eca8)
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

337e425b

net: dsa: felix: add tsn support for felix switch based on net/tsn · c515d87a

Xiaoliang Yang authored 4 years ago


VSC9959 has TSN capabilities on hardware. Using tsntool netlink
interface to configure the following TSN features:
- IEEE 802.1Qbv
- IEEE 802.1Qbu/802.3br
- IEEE 802.1Qci
- IEEE 802.1Qav
- IEEE 802.1CB

This patch is based on netlink adaptation layer in net/tsn/*.
Enable CONFIG_MSCC_FELIX_SWITCH_TSN config to add the TSN support.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

c515d87a