  1. May 04, 2022
    • selftests: ocelot: tc_flower_chains: streamline test output · 9fd6b1fa
      Vladimir Oltean authored
      
      Bring this driver-specific selftest output in line with the other
      selftests.
      
      Before:
      
      Testing VLAN pop..                      OK
      Testing VLAN push..                     OK
      Testing ingress VLAN modification..             OK
      Testing egress VLAN modification..              OK
      Testing frame prioritization..          OK
      
      After:
      
      TEST: VLAN pop                                                      [ OK ]
      TEST: VLAN push                                                     [ OK ]
      TEST: Ingress VLAN modification                                     [ OK ]
      TEST: Egress VLAN modification                                      [ OK ]
      TEST: Frame prioritization                                          [ OK ]
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • selftests: forwarding: multiple instances in tcpdump helper · 7da3896b
      Joachim Wiberg authored
      
      Extend tcpdump_start() and co. to handle multiple instances.  Useful when
      observing bridge operation, e.g., unicast learning/flooding, and any
      case of multicast distribution (to these ports but not that one ...).
      
      This means the interface argument is now a mandatory argument to all
      tcpdump_*() functions, hence the changes to the ocelot flower test.
      
      Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 6182c5c5)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • selftests: forwarding: add TCPDUMP_EXTRA_FLAGS to lib.sh · 32ef4ffc
      Joachim Wiberg authored
      
      For some use-cases we may want to change the tcpdump flags used in
      tcpdump_start().  For instance, observing interfaces without the PROMISC
      flag, e.g. to see what's really being forwarded to the bridge interface.
      
      Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit fe32dffd)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • selftests: forwarding: new test, verify host mdb entries · 8e8abb28
      Joachim Wiberg authored
      
      Boilerplate for testing static mdb entries.  This first test verifies
      adding and removing host mdb entries for all supported types: IPv4,
      IPv6, and MAC multicast.
      
      Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      (cherry picked from commit 50fe062c)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

      Conflicts in tools/testing/selftests/net/forwarding/Makefile with commit
      b2b681a4 ("selftests: forwarding: tests of locked port feature")
      which we did not backport.
    • selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer · eebe32b7
      Vladimir Oltean authored
      As discussed here with Ido Schimmel:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/

      the default conform-exceed action is "reclassify", for a reason we don't
      really understand.
      
      The point is that hardware can't offload that police action, so not
      specifying "conform-exceed" was always wrong, even though the command
      used to work in hardware (but not in software) until the kernel started
      adding validation for it.
      
      Fix the command used by the selftest by making the policer drop on
      exceed, and pass the packet to the next action (goto) on conform.
      
      Fixes: 8cd6b020 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    • selftests: net: dsa: symlink the tc_actions.sh test · 431ecb26
      Vladimir Oltean authored
      
      This has been validated on the Ocelot/Felix switch family (NXP LS1028A)
      and should be relevant to any switch driver that offloads the tc-flower
      and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has
      operations.
      
      TEST: gact drop and ok (skip_hw)                                    [ OK ]
      TEST: mirred egress flower redirect (skip_hw)                       [ OK ]
      TEST: mirred egress flower mirror (skip_hw)                         [ OK ]
      TEST: mirred egress matchall mirror (skip_hw)                       [ OK ]
      TEST: mirred_egress_to_ingress (skip_hw)                            [ OK ]
      TEST: gact drop and ok (skip_sw)                                    [ OK ]
      TEST: mirred egress flower redirect (skip_sw)                       [ OK ]
      TEST: mirred egress flower mirror (skip_sw)                         [ OK ]
      TEST: mirred egress matchall mirror (skip_sw)                       [ OK ]
      TEST: trap (skip_sw)                                                [ OK ]
      TEST: mirred_egress_to_ingress (skip_sw)                            [ OK ]
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • selftests: forwarding: tc_actions: allow mirred egress test to run on non-offloaded h2 · 17ae891e
      Vladimir Oltean authored
      
      The host interfaces $h1 and $h2 don't have to be switchdev interfaces,
      but due to the fact that we pass $tcflags which may have the value of
      "skip_sw", we force $h2 to offload a drop rule for dst_ip, something
      which it may not be able to do.
      
      The selftest only wants to verify the hit count of this rule as a means
      of figuring out whether the packet was received, so remove the $tcflags
      for it and let it be done in software.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: offload tc action "ok" using an empty action vector · 4f25fe54
      Vladimir Oltean authored
      
      The "ok" tc action is useful when placed in front of a more generic
      filter to exclude some more specific rules from matching it.
      
      The ocelot switches can offload this tc action by creating an empty
      action vector (no _ENA fields set to 1). This makes sense for all of
      VCAP IS1, IS2 and ES0 (but not for PSFP).
      
      Add support for this action. Note that this makes the
      gact_drop_and_ok_test() selftest pass, where "action ok" is used in
      front of an "action drop" rule, both offloaded to VCAP IS2.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD · 8098d30b
      Vladimir Oltean authored
      
      OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a
      PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU
      port from receiving packets removed from the forwarding path.
      
      The hardcoded initialization done for it in ocelot_vcap_init() is
      confusing. All we need from it is to have a rate and a burst size of 0.
      
      Reuse qos_policer_conf_set() for that purpose.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: drop port argument from qos_policer_conf_set · 2f10e06e
      Vladimir Oltean authored
      
      The "port" argument is used for nothing else except printing on the
      error path. Print errors on behalf of the policer index, which is less
      confusing anyway.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block · 28f509e2
      Vladimir Oltean authored
      
      Unify the code paths for adding to an empty list and to a list with
      elements by keeping a "pos" list_head element that indicates where to
      insert. Initialize "pos" with the list head itself in case
      list_for_each_entry() doesn't iterate over any element.
      
      Note that list_for_each_safe() isn't needed because no element is
      removed from the list while iterating.
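
      The unified insertion path described above can be sketched in
      userspace C. This is only an illustration under stated assumptions:
      the list helpers are minimal stand-ins for <linux/list.h>, while
      struct filter, filter_add_sorted() and filter_prio_at() are
      hypothetical names, and the "lower prio value goes first" ordering is
      assumed rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert "entry" immediately before "pos". */
static inline void list_add_tail(struct list_head *entry, struct list_head *pos)
{
	entry->prev = pos->prev;
	entry->next = pos;
	pos->prev->next = entry;
	pos->prev = entry;
}

/* Hypothetical filter type carrying only what this sketch needs. */
struct filter {
	struct list_head list;
	int prio;
};

/* Keep "rules" sorted by ascending prio (assumed ordering). */
static inline void filter_add_sorted(struct list_head *rules, struct filter *f)
{
	/* Default to the list head: inserting before it appends at the tail,
	 * which also covers the case where the loop never runs (empty list).
	 */
	struct list_head *pos = rules;
	struct list_head *it;

	assert(rules && f);
	for (it = rules->next; it != rules; it = it->next) {
		if (f->prio < container_of(it, struct filter, list)->prio) {
			pos = it;
			break;
		}
	}
	list_add_tail(&f->list, pos);
}

/* Inspection helper: prio of the n-th element, or -1 if out of range. */
static inline int filter_prio_at(const struct list_head *rules, int n)
{
	const struct list_head *it;

	for (it = rules->next; it != rules; it = it->next)
		if (n-- == 0)
			return container_of(it, struct filter, list)->prio;
	return -1;
}
```

      Because "pos" starts out as the list head, the empty-list case and the
      "larger than every element" case both fall through to the same
      list_add_tail() call, so no separate branch is needed.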
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block · 2bca9733
      Vladimir Oltean authored
      
      This makes no functional difference but helps in minimizing the delta
      for a future change.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block() · 2cdc461e
      Vladimir Oltean authored
      
      list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent; use
      the latter form to unify with the empty-list case in a later change.
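
      The equivalence can be checked with a small userspace sketch. The
      helpers below mirror the semantics of the kernel list API but are
      re-implemented here for illustration, and insert_before_c() is a
      hypothetical test function, not driver code.

```c
#include <assert.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline void __list_add(struct list_head *entry,
			      struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert "entry" right after "head". */
static inline void list_add(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head, head->next);
}

/* Insert "entry" right before "head". */
static inline void list_add_tail(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head->prev, head);
}

/* Build head -> a -> c, insert b before c using either form, and return 1
 * if the resulting order is a, b, c with consistent back pointers.
 */
static inline int insert_before_c(int use_tail)
{
	struct list_head head, a, b, c;

	INIT_LIST_HEAD(&head);
	list_add_tail(&a, &head);
	list_add_tail(&c, &head);

	if (use_tail)
		list_add_tail(&b, &c);
	else
		list_add(&b, c.prev);

	assert(head.next == &a);
	return a.next == &b && b.next == &c && c.next == &head &&
	       head.prev == &c && c.prev == &b && b.prev == &a &&
	       a.prev == &head;
}
```

      Both forms expand to the same __list_add(entry, pos->prev, pos) call,
      which is why the choice between them is purely cosmetic.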
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters · 1fef87f4
      Vladimir Oltean authored
      
      Given the following order of operations:
      
      (1) we add filter A using tc-flower
      (2) we send a packet that matches it
      (3) we read the filter's statistics to find a hit count of 1
      (4) we add a second filter B with a higher preference than A, and A
          moves one position to the right to make room in the TCAM for it
      (5) we send another packet, and this matches the second filter B
      (6) we read the filter statistics again.
      
      When this happens, the hit count of filter A is 2 and of filter B is 1,
      despite a single packet having matched each filter.
      
      Furthermore, in an alternate history, reading the filter stats a second
      time between steps (3) and (4) makes the hit count of filter A remain at
      1 after step (6), as expected.
      
      The reason why this happens has to do with the filter->stats.pkts field,
      which is written to hardware through the call path below:
      
                     vcap_entry_set
                     /      |      \
                    /       |       \
                   /        |        \
                  /         |         \
      es0_entry_set   is1_entry_set   is2_entry_set
                  \         |         /
                   \        |        /
                    \       |       /
              vcap_data_set(data.counter, ...)
      
      The primary role of filter->stats.pkts is to transport the filter hit
      counters from the last readout all the way from vcap_entry_get() ->
      ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats().
      The reason why vcap_entry_set() writes it to hardware is so that the
      counters (saturating and having a limited bit width) are cleared
      after each user space readout.
      
      The writing of filter->stats.pkts to hardware during the TCAM entry
      movement procedure is an unintentional consequence of the code design,
      because the hit count isn't up to date at this point.
      
      So at step (4), when filter A is moved by ocelot_vcap_filter_add() to
      make room for filter B, the hardware hit count is 0 (no packet matched
      on it in the meantime), but filter->stats.pkts is 1, because the last
      readout saw the earlier packet. The movement procedure programs the old
      hit count back to hardware, so this creates the impression to user space
      that more packets have been matched than actually were.
      
      The bug can be seen when running the gact_drop_and_ok_test() from the
      tc_actions.sh selftest.
      
      Fix the issue by reading back the hit count to tmp->stats.pkts before
      migrating the VCAP filter. Sure, this is a best-effort technique, since
      the packets that hit the rule between vcap_entry_get() and
      vcap_entry_set() won't be counted, but at least it allows the counters
      to be reliably used for selftests where the traffic is under control.
      
      The vcap_entry_get() name is a bit unintuitive, but it only reads back
      the counter portion of the TCAM entry, not the entire entry.
      
      The index from which we retrieve the counter is also a bit unintuitive
      (i - 1 during add, i + 1 during del), but this is the way in which TCAM
      entry movement works. The "entry index" isn't a stored integer for a
      TCAM filter, instead it is dynamically computed by
      ocelot_vcap_block_get_filter_index() based on the entry's position in
      the &block->rules list. That position (as well as block->count) is
      automatically updated by ocelot_vcap_filter_add_to_block() on add, and
      by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter
      index, and "i - 1" or "i + 1" respectively are the old addresses of that
      TCAM entry (we only support installing/deleting one filter at a time).
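
      The double counting and its fix can be replayed with a toy model.
      This is a sketch under simplifying assumptions, not the driver code:
      struct tcam, entry_get(), entry_set(), stats_update(), move_entry()
      and replay() are hypothetical stand-ins, the hardware is reduced to
      one counter per TCAM index, and user space is assumed to sum the
      values it reads.

```c
#include <assert.h>

#define TCAM_SIZE 8

/* Hypothetical model: one hardware hit counter per TCAM index. */
struct tcam { unsigned int counter[TCAM_SIZE]; };

/* Software view of a filter; pkts caches the last readout, in the same
 * role as filter->stats.pkts.
 */
struct filter { unsigned int pkts; };

/* Stand-in for vcap_entry_get(): read back only the counter. */
static inline void entry_get(const struct tcam *hw, int idx, struct filter *f)
{
	assert(idx >= 0 && idx < TCAM_SIZE);
	f->pkts = hw->counter[idx];
}

/* Stand-in for vcap_entry_set(): program the entry, counter included. */
static inline void entry_set(struct tcam *hw, int idx, const struct filter *f)
{
	assert(idx >= 0 && idx < TCAM_SIZE);
	hw->counter[idx] = f->pkts;
}

/* User space readout: report the hits since the last readout, then clear
 * the hardware counter while the cached value keeps the last readout.
 */
static inline unsigned int stats_update(struct tcam *hw, int idx,
					struct filter *f)
{
	struct filter tmp;

	entry_get(hw, idx, f);
	tmp = *f;
	tmp.pkts = 0;
	entry_set(hw, idx, &tmp);
	return f->pkts;
}

/* Move a filter to a new index. With read_back set, the cached hit count
 * is refreshed from the old index first, which is the fix described above.
 */
static inline void move_entry(struct tcam *hw, int from, int to,
			      struct filter *f, int read_back)
{
	if (read_back)
		entry_get(hw, from, f);
	entry_set(hw, to, f);
}

/* Replay steps (1) to (6); return the total hit count seen for filter A. */
static inline unsigned int replay(int read_back)
{
	struct tcam hw = { {0} };
	struct filter a = {0}, b = {0};
	unsigned int total_a = 0;

	entry_set(&hw, 0, &a);                /* (1) add A at index 0 */
	hw.counter[0]++;                      /* (2) a packet matches A */
	total_a += stats_update(&hw, 0, &a);  /* (3) readout */
	move_entry(&hw, 0, 1, &a, read_back); /* (4) A moves to index 1... */
	entry_set(&hw, 0, &b);                /*     ...and B takes index 0 */
	hw.counter[0]++;                      /* (5) a packet matches B */
	total_a += stats_update(&hw, 1, &a);  /* (6) readout */
	return total_a;
}
```

      In this model replay(0) yields 2 for filter A, because the stale
      cached value is written to the new index, whereas replay(1) yields
      the expected 1.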
      
      Fixes: b5962294 ("net: mscc: ocelot: Add support for tcam")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      1fef87f4
    • Vladimir Oltean's avatar
      net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0 · 63d51d42
      Vladimir Oltean authored
      
      Once the CPU port was added to the destination port mask of a packet, it
      can never be cleared, so even packets marked as dropped by the MASK_MODE
      of a VCAP IS2 filter will still reach it. This is why we need the
      OCELOT_POLICER_DISCARD to "kill dropped packets dead" and make software
      stop seeing them.
      
      We disallow policer rules from being put on any other chain than the one
      for the first lookup, but we don't do this for "drop" rules, although we
      should. This change is merely ascertaining that the rules don't
      (completely) work and letting the user know.
      
      The blamed commit is the one that introduced the multi-chain architecture
      in ocelot. Prior to that, we should have always offloaded the filters to
      VCAP IS2 lookup 0, where they did work.
      
      Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups · ac15987c
      Vladimir Oltean authored
      
      The VCAP IS2 TCAM is looked up twice per packet, and each filter can be
      configured to only match during the first, second lookup, or both, or
      none.
      
      The blamed commit wrote the code for making VCAP IS2 filters match only
      on the given lookup. But right below that code, there was another line
      that explicitly made the lookup a "don't care", and this is overwriting
      the lookup we've selected. So the code had no effect.
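
      A minimal sketch of how a later "don't care" write defeats the lookup
      selection, assuming a value/mask ternary encoding where a cleared
      mask bit means "don't care" and assuming the FIRST field reads 1
      during the first lookup and 0 during the second. struct tern and the
      key_first_*() helpers are hypothetical names for illustration.

```c
#include <assert.h>

/* Ternary key field: a cleared mask bit means "don't care". */
struct tern { unsigned int value, mask; };

static inline int tern_match(struct tern k, unsigned int field)
{
	return (field & k.mask) == (k.value & k.mask);
}

/* Buggy sequence: select a lookup, then overwrite it with "don't care". */
static inline struct tern key_first_buggy(int lookup)
{
	struct tern first;

	assert(lookup == 0 || lookup == 1);
	first.value = (lookup == 0);
	first.mask = 1;

	/* The later write that clobbers the selection made above. */
	first.value = 0;
	first.mask = 0;
	return first;
}

/* Fixed sequence: the lookup selection is the only write. */
static inline struct tern key_first_fixed(int lookup)
{
	struct tern first;

	assert(lookup == 0 || lookup == 1);
	first.value = (lookup == 0);
	first.mask = 1;
	return first;
}

/* Number of the two per-packet lookups in which the key matches. */
static inline int hits_per_packet(struct tern first)
{
	return tern_match(first, 1) + tern_match(first, 0);
}
```

      With the buggy key every packet is counted twice; with the fixed key
      it is counted once, whichever lookup was selected.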
      
      Some of the more noticeable effects of having filters match on both
      lookups:
      
      - in "tc -s filter show dev swp0 ingress", we see each packet matching a
        VCAP IS2 filter counted twice. This throws off scripts such as
        tools/testing/selftests/net/forwarding/tc_actions.sh and makes them
        fail.
      
      - a "tc-drop" action offloaded to VCAP IS2 needs a policer as well,
        because once the CPU port becomes a member of the destination port
        mask of a packet, nothing removes it, not even a PERMIT/DENY mask mode
        with a port mask of 0. But VCAP IS2 rules with the POLICE_ENA bit in
        the action vector can only appear in the first lookup. What happens
        when a filter matches both lookups is that the action vector is
        combined, and this makes the POLICE_ENA bit ineffective, since the
        last lookup in which it has appeared is the second one. In other
        words, "tc-drop" actions do not drop packets for the CPU port, dropped
        packets are still seen by software unless there was an FDB entry that
        directed those packets to some other place different from the CPU.
      
      The last bit used to work, because in the initial commit b5962294
      ("net: mscc: ocelot: Add support for tcam"), we were writing the FIRST
      field of the VCAP IS2 half key with a 1, not with a "don't care".
      The change to "don't care" was made inadvertently by me in commit
      c1c3993e ("net: mscc: ocelot: generalize existing code for VCAP"),
      which I just realized, and which needs a separate fix from this one,
      for "stable" kernels that lack the commit blamed below.
      
      Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted · 0bd2abe6
      Vladimir Oltean authored
      
      ocelot_vcap_filter_del() works by moving the next filters over the
      current one, and then deleting the last filter by calling vcap_entry_set()
      with a del_filter which was specially created by memsetting its memory
      to zeroes. vcap_entry_set() then programs this to the TCAM and action
      RAM via the cache registers.
      
      The problem is that vcap_entry_set() is a dispatch function which looks
      at del_filter->block_id. But since del_filter is zeroized memory, the
      block_id is 0, or otherwise said, VCAP_ES0. So practically, what we do
      is delete the entry at the same TCAM index from VCAP ES0 instead of IS1
      or IS2.
      
      The code was not always like this. vcap_entry_set() used to simply be
      is2_entry_set(), and then, the logic used to work.
      
      Restore the functionality by populating the block_id of the del_filter
      based on the VCAP block of the filter that we're deleting. This makes
      vcap_entry_set() know what to do.
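
      The effect of dispatching on a zeroed filter can be sketched as
      follows. The enum order beyond VCAP_ES0 == 0 is an assumption, and
      struct filter, entry_set_target() and del_target() are hypothetical
      stand-ins for the driver's structures and dispatch logic.

```c
#include <assert.h>
#include <string.h>

/* Only "0 means VCAP_ES0" comes from the text above; the order of the
 * remaining values is assumed for illustration.
 */
enum vcap_block { VCAP_ES0, VCAP_IS1, VCAP_IS2 };

/* Hypothetical filter carrying only what this sketch needs. */
struct filter {
	enum vcap_block block_id;
	int prio;
};

/* Stand-in for the dispatch in vcap_entry_set(): report which block
 * would be programmed for this filter.
 */
static inline enum vcap_block entry_set_target(const struct filter *f)
{
	return f->block_id;
}

/* Build the all-zero "delete" filter. With with_fix set, the block of the
 * filter being deleted is copied over before dispatching.
 */
static inline enum vcap_block del_target(const struct filter *victim,
					 int with_fix)
{
	struct filter del_filter;

	assert(victim);
	memset(&del_filter, 0, sizeof(del_filter));
	if (with_fix)
		del_filter.block_id = victim->block_id;
	return entry_set_target(&del_filter);
}
```

      Without the fix, deleting an IS2 filter ends up targeting ES0; with
      it, the delete reaches the block the filter actually lives in.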
      
      Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: avoid use after free with deleted tc-trap rules · 8146e2b6
      Vladimir Oltean authored
      
      The error path of ocelot_flower_parse() removes a VCAP filter from
      ocelot->traps, but the main deletion path - ocelot_vcap_filter_del() -
      does not.
      
      So functions such as felix_update_trapping_destinations() can still
      access the freed VCAP filter via ocelot->traps.
      
      Fix this bug by removing the filter from ocelot->traps when it gets
      deleted.
      
      Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: don't use list_empty() on non-initialized list element · 0e9c8bdf
      Vladimir Oltean authored
      
      Since the blamed commit, VCAP filters can appear on more than one list.
      If their action is "trap", they are chained on ocelot->traps via
      filter->trap_list.
      
      Consequently, when we free a VCAP filter, we must remove it from all
      lists it is a member of, including ocelot->traps.
      
      Normally, conditionally removing an element from a list (depending on
      whether it is present or not) involves traversing the list, but we
      already have a reference to the element, so that isn't really necessary.
      Moreover, the "list_del(&filter->trap_list)" operation is
      fundamentally the same regardless of whether we've iterated through the
      list or just happened to have the element. So I thought it would be ok
      to check whether the element has been added to a list by calling
      list_empty().
      
      However, this does not do the correct thing. list_empty() checks whether
      "head->next == head", but in our case, head->next == head->prev == NULL,
      and head != NULL. This makes us proceed to call list_del(), which
      modifies the prev pointer of the next element, and the next pointer of
      the prev element. But the next and prev elements are NULL, so we
      dereference those pointers and die.
      
      It would appear that list_empty() is not the function to use to detect
      that condition. But if we had previously called INIT_LIST_HEAD() on
      &filter->trap_list, then we could use list_empty() to denote whether we
      are members of a list (any list).
      
      Although the more "natural" thing seems to be to iterate through
      ocelot->traps and only remove the filter from the list if it was a
      member of it, it seems pointless to do that.
      
      So fix the bug by calling INIT_LIST_HEAD() on the non-head element.
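
      The difference between a zeroed and an initialized list element can
      be shown with a userspace sketch. The list helpers are minimal
      re-implementations for illustration, struct filter is reduced to the
      one member that matters here, and safe_to_skip_del() is a
      hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical filter with only the trap list linkage. */
struct filter { struct list_head trap_list; };

/* Return what list_empty() reports for a filter that was never added to
 * any trap list, with or without the INIT_LIST_HEAD() call from the fix.
 */
static inline int safe_to_skip_del(int init)
{
	struct filter f;

	/* Zeroed allocation: next and prev are both NULL. */
	memset(&f, 0, sizeof(f));
	assert(f.trap_list.next == NULL && f.trap_list.prev == NULL);

	if (init)
		INIT_LIST_HEAD(&f.trap_list);
	return list_empty(&f.trap_list);
}
```

      For the zeroed element list_empty() returns false, so a caller
      guarding list_del() with it would go on to dereference NULL
      neighbours; after INIT_LIST_HEAD() it returns true and the deletion
      can be skipped.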
      
      Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: dsa: felix: add port mirroring support · f660af36
      Vladimir Oltean authored
      
      Gain support for port mirroring using tc-matchall by forwarding the
      calls to the ocelot switch library.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit 5e497497)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: dsa: pass extack to dsa_switch_ops :: port_mirror_add() · b1e5ee28
      Vladimir Oltean authored
      
      Drivers might have error messages to propagate to user space, most
      common being that they support a single mirror port.
      
      Propagate the netlink extack so that they can inform user space in a
      verbal way of their limitations.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit 0148bb50)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2 · 87a59671
      Vladimir Oltean authored
      
      Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an
      offload for tc-flower) is done by setting the MIRROR_ENA bit from the
      action vector of the filter. The packet is mirrored to the port mask
      configured in the ANA:ANA:MIRRORPORTS register (the same port mask as
      the destinations for port-based mirroring).
      
      Functionality was tested with:
      
      tc qdisc add dev swp3 clsact
      tc filter add dev swp3 ingress protocol ip \
      	flower skip_sw ip_proto icmp \
      	action mirred egress mirror dev swp1
      
      and pinging through swp3, while seeing that the ICMP replies are
      mirrored towards swp1.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit f2a0e216)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: establish functions for handling VCAP aux resources · 904c5c9c
      Vladimir Oltean authored
      
      Some VCAP filters use resources that are global to the switch; for
      example, VCAP IS2 policers take an index into a global policer pool.
      
      In commit c9a7fe12 ("net: mscc: ocelot: add action of police on
      vcap_is2"), Xiaoliang expressed this by hooking into the low-level
      ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter()
      functions, and allocating/freeing the policers from there.
      
      Evaluating the code, there probably isn't a better place, but we'll need
      to do something similar for the mirror ports, and the code will start to
      look even more hacked up than it is right now.
      
      Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to
      contain the madness, and pollute less the body of other functions such
      as ocelot_vcap_filter_add_to_block().
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit c3d427ea)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: add port mirroring support using tc-matchall · fb5dcc54
      Vladimir Oltean authored
      
      Ocelot switches perform port-based ingress mirroring if
      ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if
      the port is in ANA:ANA:EMIRRORPORTS.
      
      Both ingress-mirrored and egress-mirrored frames are copied to the port
      mask from ANA:ANA:MIRRORPORTS.
      
      So the choice of limiting to a single mirror port via ocelot_mirror_get()
      and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't
      map very well to the user space model. If the user wants to mirror the
      ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd
      have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would
      make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there
      are no tc-matchall rules to describe those actions.
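
      The cross-mirroring problem can be illustrated with a toy model of
      the two settings involved. struct ana, two_rules() and
      ingress_mirror_dests() are hypothetical names; the model assumes only
      what is stated above, namely one per-port ingress enable bit and one
      destination mask shared by all mirrored ports.

```c
#include <assert.h>

#define BIT(n) (1u << (n))

/* Hypothetical model of the analyzer state relevant to ingress mirroring:
 * a per-port enable bit plus one destination mask shared by all ports.
 */
struct ana {
	unsigned int src_mirror_ena; /* ports with SRC_MIRROR_ENA set */
	unsigned int mirror_ports;   /* shared ANA:ANA:MIRRORPORTS mask */
};

/* Program "swp1 -> swp2" and "swp3 -> swp4" the only way the shared
 * destination mask allows.
 */
static inline struct ana two_rules(void)
{
	struct ana ana;

	ana.src_mirror_ena = BIT(1) | BIT(3);
	ana.mirror_ports = BIT(2) | BIT(4);
	return ana;
}

/* Ports that receive a mirrored copy of frames ingressing on "src". */
static inline unsigned int ingress_mirror_dests(const struct ana *ana, int src)
{
	assert(src >= 0 && src < 32);
	return (ana->src_mirror_ena & BIT(src)) ? ana->mirror_ports : 0;
}
```

      In this model both swp1 and swp3 end up mirrored to swp2 and swp4,
      which no pair of single-action tc-matchall rules describes.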
      
      Now, we could offload a matchall rule with multiple mirred actions, one
      per desired mirror port, and force the user to stick to the multi-action
      rule format for subsequent matchall filters. But both DSA and ocelot
      have the flow_offload_has_one_action() check for the matchall offload,
      plus the fact that it will get cumbersome to cross-check matchall
      mirrors with flower mirrors (which will be added in the next patch).
      
      As a result, we limit the configuration to a single mirror port, with
      the possibility of lifting the restriction in the future.
      
      Frames injected from the CPU don't get egress-mirrored, since they are
      sent with the BYPASS bit in the injection frame header, and this
      bypasses the analyzer module (effectively also the mirroring logic).
      I don't know what to do/say about this.
      
      Functionality was tested with:
      
      tc qdisc add dev swp3 clsact
      tc filter add dev swp3 ingress \
      	matchall skip_sw \
      	action mirred egress mirror dev swp1
      
      and pinging through swp3, while seeing that the ICMP replies are
      mirrored towards swp1.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit ccb6ed42)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • net: mscc: ocelot: refactor policer work out of ocelot_setup_tc_cls_matchall · 99b7d2bc
      Vladimir Oltean authored
      
      In preparation for adding port mirroring support to the ocelot driver,
      the dispatching function ocelot_setup_tc_cls_matchall() must be free of
      action-specific code. Move port policer creation and deletion to
      separate functions.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit 4fa72108)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • flow_offload: reject offload for all drivers with invalid police parameters · 967f518a
      Jianbo Liu authored
      
      As more police parameters are passed to flow_offload, drivers can check
      them to make sure hardware handles packets in the way indicated by tc.
      The conform-exceed control should be drop/pipe or drop/ok. Besides,
      for drop/ok, the police should be the last action. As hardware can't
      configure peakrate/avrate/overhead, offload should not be supported if
      any of them is configured.
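
      The rules listed above can be condensed into a validation sketch.
      The types and names (enum police_act, struct police_params,
      police_offloadable()) are hypothetical and do not reproduce the
      kernel's flow_offload structures; only the accepted combinations are
      taken from the description.

```c
#include <assert.h>

/* Hypothetical action codes for the exceed and conform outcomes. */
enum police_act { ACT_DROP, ACT_PIPE, ACT_OK, ACT_RECLASSIFY };

/* Hypothetical parameter set mirroring what the text says is checked. */
struct police_params {
	enum police_act exceed;     /* action when the rate is exceeded */
	enum police_act notexceed;  /* action when the packet conforms */
	unsigned long long peakrate;
	unsigned long long avrate;
	unsigned int overhead;
};

/* Return 1 if the policer may be offloaded under the rules above. */
static inline int police_offloadable(const struct police_params *p,
				     int is_last_action)
{
	assert(p);

	/* conform-exceed must be drop/pipe or drop/ok. */
	if (p->exceed != ACT_DROP)
		return 0;
	if (p->notexceed != ACT_PIPE && p->notexceed != ACT_OK)
		return 0;

	/* For drop/ok, the police action must be the last one. */
	if (p->notexceed == ACT_OK && !is_last_action)
		return 0;

	/* Parameters the hardware cannot be configured with. */
	if (p->peakrate || p->avrate || p->overhead)
		return 0;

	return 1;
}
```

      Under these rules a policer left at the default "reclassify" action
      is rejected, while drop/pipe followed by another action is accepted.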
      
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit d97b4b10)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

      Conflicts in drivers/net/ethernet/netronome/nfp/flower/qos_conf.c which
      we did not update.
    • net: flow_offload: add tc police action parameters · 1b0edfb5
      Jianbo Liu authored
      
      The current police offload action entry is missing exceed/notexceed
      actions and parameters that can be configured by tc police action.
      Add the missing parameters as a pre-step for offloading police actions
      to hardware.
      
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit b8cd5831)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • flow_offload: allow user to offload tc action to net device · 043944aa
      Baowen Zheng authored
      
      Use flow_indr_dev_register/flow_indr_dev_setup_offload to
      offload tc action.
      
      We need to call tc_cleanup_flow_action to clean up tc action entry since
      in tc_setup_action, some actions may hold dev refcnt, especially the mirror
      action.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 8cbfe939)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    • flow_offload: add ops to tc_action_ops for flow action setup · 1d996fc0
      Baowen Zheng authored
      
      Add a new ops to tc_action_ops for flow action setup.
      
      Refactor function tc_setup_flow_action to use this new ops.
      
      We make this change to facilitate adding standalone action modules.
      
      We will also use this ops to offload actions independently of filters
      in a following patch.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit c54e1d92)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      1d996fc0
    • Baowen Zheng's avatar
      flow_offload: rename offload functions with offload instead of flow · a1cc905a
      Baowen Zheng authored
      
      To improve readability, we rename the offload functions to use
      "offload" instead of "flow".
      
      The term "flow" is related to exact matches, so we rename these functions
      to use "offload".
      
      This change also makes it easier to name the single action offload functions.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 9c1c0e12)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      a1cc905a
    • Baowen Zheng's avatar
      flow_offload: add index to flow_action_entry structure · 7c7f4606
      Baowen Zheng authored
      
      Add an index to the flow_action_entry structure and delete the index
      from the police and gate child structures.
      
      We make this change so that, when a tc action is offloaded, the driver
      can identify that action.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 5a995900)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      7c7f4606
    • Baowen Zheng's avatar
      flow_offload: reject to offload tc actions in offload drivers · 9669b395
      Baowen Zheng authored
      
      A follow-up patch will allow users to offload tc actions independent of
      classifier in the software datapath.
      
      In preparation for this, teach all drivers that support offload of the flow
      tables to reject such configuration as currently none of them support it.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 144d4c9e)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      9669b395
    • Baowen Zheng's avatar
      flow_offload: fill flags to action structure · e12835db
      Baowen Zheng authored
      
      Fill in the flags in the action structure to allow the user to control
      whether the action should be offloaded to hardware.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 40bd094d)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      e12835db
    • Vladimir Oltean's avatar
      selftests: forwarding: add basic QoS classification test for Ocelot switches · 80f4e8cd
      Vladimir Oltean authored
      
      Test basic (port-default, VLAN PCP and IP DSCP) QoS classification for
      Ocelot switches. Advanced QoS classification using tc filters is covered
      by tc_flower_chains.sh in the same directory.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      80f4e8cd
    • Amit Cohen's avatar
      selftests: lib.sh: Add PING_COUNT to allow sending configurable amount of packets · ab827e56
      Amit Cohen authored
      
      Currently `ping_do()` and `ping6_do()` send 10 packets.
      
      In some cases it is not possible to catch only the interesting
      packets using a tc rule. A test can then send many packets and
      verify that at least that many packets hit the rule.
      
      Add a `PING_COUNT` variable, which is set to 10 by default, to allow tests
      to send more than 10 packets using the existing ping API.
      
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit 0cd0b1f7)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      ab827e56
    • Vladimir Oltean's avatar
      net: mscc: ocelot: fix build error due to missing IEEE_8021QAZ_MAX_TCS · a0eb716d
      Vladimir Oltean authored
      
      IEEE_8021QAZ_MAX_TCS is defined in include/uapi/linux/dcbnl.h, which is
      included by net/dcbnl.h. Then, linux/netdevice.h conditionally includes
      net/dcbnl.h if CONFIG_DCB is enabled.
      
      Therefore, when CONFIG_DCB is disabled, this indirect dependency is
      broken.
      
      There isn't a good reason to include net/dcbnl.h headers into the ocelot
      switch library which exports low-level hardware API, so replace
      IEEE_8021QAZ_MAX_TCS with OCELOT_NUM_TC which has the same value.
      
      Fixes: 978777d0 ("net: dsa: felix: configure default-prio and dscp priorities")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220315131215.273450-1-vladimir.oltean@nxp.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      (cherry picked from commit 72f56fdb)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      a0eb716d
    • Vladimir Oltean's avatar
      net: dsa: felix: configure default-prio and dscp priorities · e8096bcf
      Vladimir Oltean authored
      
      Follow the established programming model for this driver and provide
      shims in the felix DSA driver which call the implementations from the
      ocelot switch lib. The ocelot switchdev driver wasn't integrated with
      dcbnl due to lack of hardware availability.
      
      The switch doesn't have any fancy QoS classification enabled by default.
      The provided getters will create a default-prio app table entry of 0,
      and no dscp entry. However, the getters have been made to actually
      retrieve the hardware configuration rather than static values, to be
      future proof in case DSA will need this information from more call paths.
      
      For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG,
      called QOS_DEFAULT_VAL.
      
      DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG
      (field QOS_DSCP_ENA), and individual DSCP values are configured as
      trusted or not through register ANA_DSCP_CFG (replicated 64 times).
      An untrusted DSCP value falls back to other QoS classification methods.
      If trusted, the selected ANA_DSCP_CFG register also holds the QoS class
      in the QOS_DSCP_VAL field.
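      
      The lookup described above can be sketched as a small software model.
      This is only an illustration, not driver code: the struct and function
      names are hypothetical, the fields merely mirror the register fields
      named in the text (QOS_DSCP_ENA, QOS_DEFAULT_VAL, QOS_DSCP_VAL), no
      register layout is modeled, and the fallback is assumed to go straight
      to the port default (the VLAN PCP step is left out).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_DSCP 64 /* one entry per DSCP value, 0 to 63 */

/* Hypothetical model of one ANA_DSCP_CFG replica. */
struct sketch_dscp_cfg {
	bool trusted;    /* is this DSCP value trusted? */
	uint8_t qos_val; /* QoS class, stands in for QOS_DSCP_VAL */
};

/* Hypothetical model of the per-port ANA_PORT_QOS_CFG fields. */
struct sketch_port {
	bool dscp_ena;       /* stands in for QOS_DSCP_ENA */
	uint8_t default_qos; /* stands in for QOS_DEFAULT_VAL */
};

/*
 * Return the QoS class for a packet with the given DSCP value.
 * A trusted DSCP value on a DSCP-enabled port selects the class stored
 * in its table entry; anything else falls back to the port default
 * (a simplification of "other QoS classification methods").
 */
static inline uint8_t sketch_classify(const struct sketch_port *port,
				      const struct sketch_dscp_cfg *table,
				      uint8_t dscp)
{
	const struct sketch_dscp_cfg *cfg = &table[dscp % SKETCH_NUM_DSCP];

	if (port->dscp_ena && cfg->trusted)
		return cfg->qos_val;

	return port->default_qos;
}
```

      In this model, trusting a DSCP value corresponds to setting `trusted`
      and `qos_val` in its table entry, and ignoring it corresponds to
      clearing `trusted`.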
      
      The hardware also supports DSCP remapping (DSCP value X is translated to
      DSCP value Y before the QoS class is determined based on the app table
      entry for Y) and DSCP packet rewriting. The dcbnl framework, for being
      so flexible in other useless areas, doesn't appear to support this.
      So this functionality has been left out.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 978777d0)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      e8096bcf
    • Vladimir Oltean's avatar
      net: dsa: report and change port dscp priority using dcbnl · 3e1975ab
      Vladimir Oltean authored
      
      Similar to the port-based default priority, IEEE 802.1Q-2018 allows the
      Application Priority Table to define QoS classes (0 to 7) per IP DSCP
      value (0 to 63).
      
      In the absence of an app table entry for a packet with DSCP value X,
      QoS classification for that packet falls back to other methods (VLAN PCP
      or port-based default). The presence of an app table for DSCP value X
      with priority Y makes the hardware classify the packet to QoS class Y.
      
      As opposed to the default-prio where DSA exposes only a "set" in
      dsa_switch_ops (because the port-based default is the fallback, it
      always exists, either implicitly or explicitly), for DSCP priorities we
      expose an "add" and a "del". The addition of a DSCP entry means trusting
      that DSCP priority, the deletion means ignoring it.
      
      Drivers that already trust (at least some) DSCP values can describe
      their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is
      called for each DSCP value from 0 to 63.
      
      Again, there can be more than one dcbnl app table entry for the same
      DSCP value; DSA chooses the one with the largest configured priority.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 47d75f78)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      3e1975ab
    • Vladimir Oltean's avatar
      net: dsa: report and change port default priority using dcbnl · 337e425b
      Vladimir Oltean authored
      
      The port-based default QoS class is assigned to packets that lack a
      VLAN PCP (or the port is configured to not trust the VLAN PCP),
      an IP DSCP (or the port is configured to not trust IP DSCP), and packets
      on which no tc-skbedit action has matched.
      
      Similar to other drivers, this can be exposed to user space using the
      DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table
      D-8 - Sel field values that when the Selector is 1, the Protocol ID
      value of 0 denotes the "Default application priority. For use when
      application priority is not otherwise specified."
      
      The way in which the dcbnl integration in DSA has been designed has to
      do with its requirements. Andrew Lunn explains that SOHO switches are
      expected to come with some sort of pre-configured QoS profile, and that
      it is desirable for this to come pre-loaded into the DSA slave interfaces'
      DCB application priority table.
      
      In the dcbnl design, this is possible because calls to dcb_ieee_setapp()
      can be initiated by anyone including being self-initiated by this device
      driver.
      
      However, what makes this challenging to implement in DSA is that the DSA
      core manages the net_devices (effectively hiding them from drivers),
      while drivers manage the hardware. The DSA core has no knowledge of what
      individual drivers' QoS policies are. DSA could export to drivers a
      wrapper over dcb_ieee_setapp() and these could call that function to
      pre-populate the app priority table, however drivers don't have a good
      moment in time to do this. The dsa_switch_ops :: setup() method gets
      called before the net_devices are created (dsa_slave_create), and so is
      dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops ::
      port_enable(), but this gets called upon each ndo_open. If we add app
      table entries on every open, we'd need to remove them on close, to avoid
      duplicate entry errors. But if we delete app priority entries on close,
      what we delete may not be the initial, driver pre-populated entries, but
      rather user-added entries.
      
      So it is clear that letting drivers choose the timing of the
      dcb_ieee_setapp() call is inappropriate. The alternative which was
      chosen is to introduce hardware-specific ops in dsa_switch_ops, and
      effectively hide dcbnl details from drivers as well. For pre-populating
      the application table, dsa_slave_dcbnl_init() will call
      ds->ops->port_get_default_prio() which is supposed to read from
      hardware. If the operation succeeds, DSA creates a default-prio app
      table entry. The method is called as soon as the slave_dev is
      registered, but before we release the rtnl_mutex. This is done such that
      user space sees the app table entries as soon as it sees the interface
      being registered.
      
      The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer
      changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to
      return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is
      NULL. Because there are still dcbnl-unaware DSA drivers even if they
      have dcbnl_ops populated, the way to restore the behavior is to make all
      dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific
      dsa_switch_ops method.
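      
      A minimal sketch of that fallback, under simplifying assumptions: the
      ops structure, the wrapper and the sample callback below are
      hypothetical stand-ins with reduced signatures, not the real
      dsa_switch_ops or dcbnl_ops definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the hardware-specific ops. */
struct sketch_switch_ops {
	int (*port_get_default_prio)(int port);
	int (*port_set_default_prio)(int port, unsigned char prio);
};

/*
 * Wrapper playing the role of a dcbnl-facing operation: when the driver
 * does not implement the hardware-specific method, report -EOPNOTSUPP,
 * as if no dcbnl_ops had been populated at all.
 */
static inline int sketch_dcbnl_set_default_prio(const struct sketch_switch_ops *ops,
						int port, unsigned char prio)
{
	if (!ops->port_set_default_prio)
		return -EOPNOTSUPP;

	return ops->port_set_default_prio(port, prio);
}

/* Example callback for a dcbnl-aware driver: accepts classes 0 to 7. */
static inline int sketch_hw_set_default_prio(int port, unsigned char prio)
{
	(void)port;
	return prio < 8 ? 0 : -ERANGE;
}
```

      A driver that leaves `port_set_default_prio` as NULL gets -EOPNOTSUPP
      from the wrapper, while a driver that fills it in has the request
      forwarded to its callback.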
      
      The dcbnl framework absurdly allows there to be more than one app table
      entry for the same selector and protocol (in other words, more than one
      port-based default priority). In the iproute2 dcb program, there is a
      "replace" syntactical sugar command which performs an "add" and a "del"
      to hide this away. But we choose the largest configured priority when we
      call ds->ops->port_set_default_prio(), using __fls(). When there is no
      default-prio app table entry left, the port-default priority is restored
      to 0.
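      
      The selection rule can be illustrated in userspace as follows. This is
      a sketch that assumes the configured priorities are collected into a
      bitmask (bit N set when an entry with priority N exists); the function
      names are hypothetical, and the GCC/Clang builtin `__builtin_clzl()`
      stands in for the kernel's `__fls()`.

```c
#include <limits.h>

/*
 * Userspace stand-in for the kernel's __fls(): index of the most
 * significant set bit. Like the kernel helper, it must not be called
 * with word == 0.
 */
static inline unsigned long sketch_fls(unsigned long word)
{
	return (sizeof(word) * CHAR_BIT - 1) - (unsigned long)__builtin_clzl(word);
}

/*
 * prio_mask has bit N set when a default-prio app table entry with
 * priority N exists. Return the largest configured priority, or 0 when
 * no entry is left.
 */
static inline int sketch_pick_default_prio(unsigned long prio_mask)
{
	if (!prio_mask)
		return 0;

	return (int)sketch_fls(prio_mask);
}
```

      With entries for priorities 2 and 5 present, the mask is 0x24 and the
      function returns 5; with an empty mask it returns 0, matching the
      restore-to-0 behavior described above.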
      
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/
      
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit d538eca8)
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      337e425b
    • Xiaoliang Yang's avatar
      net: dsa: felix: add tsn support for felix switch based on net/tsn · c515d87a
      Xiaoliang Yang authored
      
      VSC9959 has TSN capabilities in hardware. Use the tsntool netlink
      interface to configure the following TSN features:
      - IEEE 802.1Qbv
      - IEEE 802.1Qbu/802.3br
      - IEEE 802.1Qci
      - IEEE 802.1Qav
      - IEEE 802.1CB
      
      This patch is based on the netlink adaptation layer in net/tsn/*.
      Enable the CONFIG_MSCC_FELIX_SWITCH_TSN config option to add TSN support.
      
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      c515d87a