  1. Mar 08, 2022
    • ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions · 85bf489c
      Randy Dunlap authored
      
      commit 7b83299e5b9385943a857d59e15cba270df20d7e upstream.
      
      early_param() handlers should return 0 on success.
      __setup() handlers should return 1 on success, i.e., the parameter
      has been handled. A return of 0 would cause the "option=value" string
      to be added to init's environment strings, polluting it.
      
      ../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy':
      ../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type]
      ../arch/arm/mm/mmu.c: In function 'test_noalign_setup':
      ../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type]
      
      Fixes: b849a60e ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: patches@armlinux.org.uk
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85bf489c
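The __setup() return convention described in the commit above can be illustrated with a small userspace sketch. This is not the kernel's implementation: `struct param_handler`, `dispatch_param()`, and the two handler functions are hypothetical names, and the model only mimics the stated rule that a handler returning 0 leaves the "option=value" string unconsumed, so it would end up in init's environment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a boot-parameter table entry (not the kernel's struct). */
struct param_handler {
	const char *prefix;             /* option name, e.g. "noalign" */
	int (*fn)(const char *value);   /* __setup()-style: 1 = handled, 0 = not handled */
};

/* Buggy style: returns 0 even though it consumed the option. */
static int buggy_setup(const char *value) { (void)value; return 0; }

/* Fixed style: returns 1 to report that the option has been handled. */
static int fixed_setup(const char *value) { (void)value; return 1; }

/*
 * Returns 1 if some handler consumed the option, 0 if it is left over
 * (in this model: it would be handed to init as an environment string).
 */
static int dispatch_param(const struct param_handler *tbl, size_t n, const char *opt)
{
	assert(opt != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].prefix);

		if (strncmp(opt, tbl[i].prefix, len) == 0 && tbl[i].fn(opt + len))
			return 1;
	}
	return 0;
}
```

With a table containing `buggy_setup`, `dispatch_param()` reports "noalign=1" as unconsumed even though the handler ran; with `fixed_setup` the option is consumed.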
    • ARM: Fix kgdb breakpoint for Thumb2 · 6b634104
      Russell King (Oracle) authored
      
      commit d920eaa4c4559f59be7b4c2d26fa0a2e1aaa3da9 upstream.
      
      The kgdb code needs to register an undef hook for the Thumb UDF
      instruction that will fault in order to be functional on Thumb2
      platforms.
      
      Reported-by: Johannes Stezenbach <js@sig21.net>
      Tested-by: Johannes Stezenbach <js@sig21.net>
      Fixes: 5cbad0eb ("kgdb: support for ARCH=arm")
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b634104
    • igc: igc_read_phy_reg_gpy: drop premature return · fefe4cb4
      Corinna Vinschen authored
      
      commit fda2635466cd26ad237e1bc5d3f6a60f97ad09b6 upstream.
      
      igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic
      and if it's not 0, returns immediately. By doing this, it leaves the HW
      semaphore in the acquired state.
      
      Drop this premature return statement; the function returns immediately
      after releasing the semaphore anyway.
      
      Fixes: 5586838f ("igc: Add code for PHY support")
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Acked-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fefe4cb4
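The semaphore leak described in the igc commit above follows a common pattern, sketched here in plain C. All names (`acquire_sem()`, `release_sem()`, `read_mdic()`, `read_phy_buggy()`, `read_phy_fixed()`) and the error value are hypothetical stand-ins, not the driver's functions; the sketch only contrasts an early return that skips the release with a path that always falls through to it.

```c
#include <assert.h>

/* Hypothetical stand-in for the hardware semaphore state. */
static int sem_held;

static int acquire_sem(void) { sem_held = 1; return 0; }

static void release_sem(void)
{
	assert(sem_held);   /* releasing a semaphore that is not held is a bug */
	sem_held = 0;
}

/* Hypothetical register read that can be made to fail. */
static int read_mdic(int fail, unsigned short *data)
{
	if (fail)
		return -5;      /* simulated I/O error */
	*data = 0x1234;
	return 0;
}

/* Buggy shape: bails out on error while still holding the semaphore. */
static int read_phy_buggy(int fail, unsigned short *data)
{
	int ret = acquire_sem();

	if (ret)
		return ret;
	ret = read_mdic(fail, data);
	if (ret)
		return ret;     /* semaphore is leaked here */
	release_sem();
	return ret;
}

/* Fixed shape: the read result is returned only after the release. */
static int read_phy_fixed(int fail, unsigned short *data)
{
	int ret = acquire_sem();

	if (ret)
		return ret;
	ret = read_mdic(fail, data);
	release_sem();
	return ret;
}
```

Both variants return the same error code; the difference is only whether `sem_held` is still set afterwards.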
    • arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output · 0632854f
      Brian Norris authored
      
      commit b5fbaf7d779f5f02b7f75b080e7707222573be2a upstream.
      
      Commit b18c6c3c ("ASoC: rockchip: cdn-dp sound output use spdif")
      switched the platform to SPDIF, but we didn't fix up the device tree.
      
      Drop the pinctrl settings, because the 'spdif_bus' pins are either:
       * unused (on kevin, bob), so the setting is ~harmless
       * used by a different function (on scarlet), which causes probe
         failures (!!)
      
      Fixes: b18c6c3c ("ASoC: rockchip: cdn-dp sound output use spdif")
      Signed-off-by: Brian Norris <briannorris@chromium.org>
      Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
      Link: https://lore.kernel.org/r/20220114150129.v2.1.I46f64b00508d9dff34abe1c3e8d2defdab4ea1e5@changeid
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0632854f
    • can: gs_usb: change active_channels's type from atomic_t to u8 · 43eaf1b1
      Vincent Mailhol authored

      commit 035b0fcf02707d3c9c2890dc1484b11aa5335eb1 upstream.
      
      The driver uses an atomic_t variable: gs_usb:active_channels to keep
      track of the number of opened channels in order to only allocate
      memory for the URBs when this count changes from zero to one.
      
      However, the driver does not decrement the counter when an error
      occurs in gs_can_open(). This issue is fixed by changing the type from
      atomic_t to u8 and by simplifying the logic accordingly.
      
      It is safe to use a u8 here because the network stack big kernel lock
      (a.k.a. rtnl_mutex) is being held. For details, please refer to [1].
      
      [1] https://lore.kernel.org/linux-can/CAMZ6Rq+sHpiw34ijPsmp7vbUpDtJwvVtdV7CvRZJsLixjAFfrg@mail.gmail.com/T/#t
      
      Fixes: d08e973a ("can: gs_usb: Added support for the GS_USB CAN devices")
      Link: https://lore.kernel.org/all/20220214234814.1321599-1-mailhol.vincent@wanadoo.fr
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      43eaf1b1
    • ASoC: cs4265: Fix the duplicated control name · daaed6ce
      Fabio Estevam authored
      
      commit c5487b9cdea5c1ede38a7ec94db0fc59963c8e86 upstream.
      
      Currently, the following error messages are seen during boot:
      
      asoc-simple-card sound: control 2:0:0:SPDIF Switch:0 is already present
      cs4265 1-004f: ASoC: failed to add widget SPDIF dapm kcontrol SPDIF Switch: -16
      
      Quoting Mark Brown:
      
      "The driver is just plain buggy, it defines both a regular SPIDF Switch
      control and a SND_SOC_DAPM_SWITCH() called SPDIF both of which will
      create an identically named control, it can never have loaded without
      error.  One or both of those has to be renamed or they need to be
      merged into one thing."
      
      Fix the duplicated control name by combining the two SPDIF controls here
      and move the register bits onto the DAPM widget and have DAPM control them.
      
      Fixes: f853d6b3 ("ASoC: cs4265: Add a S/PDIF enable switch")
      Signed-off-by: Fabio Estevam <festevam@denx.de>
      Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Link: https://lore.kernel.org/r/20220215120514.1760628-1-festevam@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      daaed6ce
    • firmware: arm_scmi: Remove space in MODULE_ALIAS name · 8b8ac465
      Alyssa Ross authored

      commit 1ba603f56568c3b4c2542dfba07afa25f21dcff3 upstream.
      
      modprobe can't handle spaces in aliases. Remove the space to fix the issue.
      
      Link: https://lore.kernel.org/r/20220211102704.128354-1-sudeep.holla@arm.com
      Fixes: aa4f886f ("firmware: arm_scmi: add basic driver infrastructure for SCMI")
      Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
      Signed-off-by: Alyssa Ross <hi@alyssa.is>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b8ac465
    • efivars: Respect "block" flag in efivar_entry_set_safe() · 667df6fe
      Jann Horn authored
      
      commit 258dd902022cb10c83671176688074879517fd21 upstream.
      
      When the "block" flag is false, the old code would sometimes still call
      check_var_size(), which wrongly tells ->query_variable_store() that it can
      block.
      
      As far as I can tell, this can't really materialize as a bug at the moment,
      because ->query_variable_store only does something on X86 with generic EFI,
      and in that configuration we always take the efivar_entry_set_nonblocking()
      path.
      
      Fixes: ca0e30dc ("efi: Add nonblocking option to efi_query_variable_store()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220218180559.1432559-1-jannh@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      667df6fe
    • ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() · 283c37e5
      Maciej Fijalkowski authored
      
      commit 6c7273a266759d9d36f7c862149f248bcdeddc0f upstream.
      
      Commit c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if
      netif is not OK") addressed the ring transient state when
      MEM_TYPE_XSK_BUFF_POOL was being configured, which in turn caused the
      interface to go through down/up. Maurice reported that when carrier is
      not ok and xsk_pool is present on the ring pair, ksoftirqd will consume
      100% CPU cycles due to the constant NAPI rescheduling, as ixgbe_poll()
      states that there is still some work to be done.

      To fix this, do not set work_done to false when !netif_carrier_ok().
      
      Fixes: c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
      Reported-by: Maurice Baijens <maurice.baijens@ellips.com>
      Tested-by: Maurice Baijens <maurice.baijens@ellips.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      283c37e5
    • net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() · 5f394102
      Zheyu Ma authored
      
      commit bd6f1fd5d33dfe5d1b4f2502d3694a7cc13f166d upstream.
      
      During driver initialization, the pointer of card info, i.e. the
      variable 'ci' is required. However, the definition of
      'com20020pci_id_table' reveals that this field is empty for some
      devices, which will cause null pointer dereference when initializing
      these devices.
      
      The following log reveals it:
      
      [    3.973806] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      [    3.973819] RIP: 0010:com20020pci_probe+0x18d/0x13e0 [com20020_pci]
      [    3.975181] Call Trace:
      [    3.976208]  local_pci_probe+0x13f/0x210
      [    3.977248]  pci_device_probe+0x34c/0x6d0
      [    3.977255]  ? pci_uevent+0x470/0x470
      [    3.978265]  really_probe+0x24c/0x8d0
      [    3.978273]  __driver_probe_device+0x1b3/0x280
      [    3.979288]  driver_probe_device+0x50/0x370
      
      Fix this by checking whether 'ci' is a null pointer first.
      
      Fixes: 8c14f9c7 ("ARCNET: add com20020 PCI IDs with metadata")
      Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f394102
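The null check described in the com20020 commit above can be sketched as follows. The structures, the table contents, and the choice of `-EINVAL` as the error code are assumptions made for illustration; they are simplified stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the card info and an ID table entry. */
struct card_info { const char *name; int devcount; };
struct id_entry { unsigned int device; const struct card_info *ci; };

static const struct card_info card_a = { "card-a", 1 };

static const struct id_entry id_table[] = {
	{ 0x0001, &card_a },
	{ 0x0002, NULL },   /* entry without card info, as described above */
};

/* Probe sketch: reject entries that carry no card info before using it. */
static int probe(const struct id_entry *id)
{
	const struct card_info *ci;

	assert(id != NULL);
	ci = id->ci;
	if (!ci)
		return -EINVAL;   /* assumed error code */
	return ci->devcount;
}
```

Without the `if (!ci)` test, probing the second table entry would dereference a null pointer.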
    • ibmvnic: register netdev after init of adapter · 92b79177
      Sukadev Bhattiprolu authored
      
      commit 570425f8c7c18b14fa8a2a58a0adb431968ad118 upstream.
      
      Finish initializing the adapter before registering netdev so state
      is consistent.
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      92b79177
    • net: sxgbe: fix return value of __setup handler · 6e0f9860
      Randy Dunlap authored
      
      commit 50e06ddceeea263f57fe92baa677c638ecd65bb6 upstream.
      
      __setup() handlers should return 1 on success, i.e., the parameter
      has been handled. A return of 0 causes the "option=value" string to be
      added to init's environment strings, polluting it.
      
      Fixes: acc18c14 ("net: sxgbe: add EEE(Energy Efficient Ethernet) for Samsung sxgbe")
      Fixes: 1edb9ca6 ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
      Cc: Siva Reddy <siva.kallam@samsung.com>
      Cc: Girish K S <ks.giri@samsung.com>
      Cc: Byungho An <bh74.an@samsung.com>
      Link: https://lore.kernel.org/r/20220224033528.24640-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e0f9860
    • iavf: Fix missing check for running netdev · e1a82db1
      Slawomir Laba authored
      
      commit d2c0f45fcceb0995f208c441d9c9a453623f9ccf upstream.
      
      The driver was queueing reset_task regardless of the netdev
      state.
      
      Do not queue the reset task in iavf_change_mtu if netdev
      is not running.
      
      Fixes: fdd4044f ("iavf: Remove timer for work triggering, use delaying work instead")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1a82db1
    • mac80211: treat some SAE auth steps as final · c9a066fe
      Johannes Berg authored
      
      commit 94d9864cc86f572f881db9b842a78e9d075493ae upstream.
      
      When we get anti-clogging token required (added by the commit
      mentioned below), or the other status codes added by the later
      commit 4e56cde1 ("mac80211: Handle special status codes in
      SAE commit") we currently just pretend (towards the internal
      state machine of authentication) that we didn't receive anything.
      
      This has the undesirable consequence of retransmitting the prior
      frame, which is not expected, because the timer is still armed.
      
      If we just disarm the timer at that point, it would result in
      the undesirable side effect of being in this state indefinitely
      if userspace crashes, or so.
      
      So to fix this, reset the timer and set a new auth_data->waiting
      in order to have no more retransmissions, but to have the data
      destroyed when the timer actually fires, which will only happen
      if userspace didn't continue (i.e. crashed or abandoned it.)
      
      Fixes: a4055e74 ("mac80211: Don't destroy auth data in case of anti-clogging")
      Reported-by: Jouni Malinen <j@w1.fi>
      Link: https://lore.kernel.org/r/20220224103932.75964e1d7932.Ia487f91556f29daae734bf61f8181404642e1eec@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9a066fe
    • net: stmmac: fix return value of __setup handler · e6d7f57f
      Randy Dunlap authored
      
      commit e01b042e580f1fbf4fd8da467442451da00c7a90 upstream.
      
      __setup() handlers should return 1 on success, i.e., the parameter
      has been handled. A return of 0 causes the "option=value" string to be
      added to init's environment strings, polluting it.
      
      Fixes: 47dd7a54 ("net: add support for STMicroelectronics Ethernet controllers.")
      Fixes: f3240e28 ("stmmac: remove warning when compile as built-in (V2)")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Jose Abreu <joabreu@synopsys.com>
      Link: https://lore.kernel.org/r/20220224033536.25056-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6d7f57f
    • mac80211: fix forwarded mesh frames AC & queue selection · fa65989a
      Nicolas Escande authored
      
      commit 859ae7018316daa4adbc496012dcbbb458d7e510 upstream.
      
      There are two problems with the current code that have been highlighted
      by the AQL feature that is now enabled by default.

      The first problem is in ieee80211_rx_h_mesh_fwding():
      ieee80211_select_queue_80211() is used on received packets to choose
      the sending AC queue of the forwarded packet, although this function
      should only be called on TX packets (it uses ieee80211_tx_info).
      This ends with forwarded mesh packets being sent on an unrelated, random
      AC queue. To fix that, the AC queue can be inferred directly from
      skb->priority, which has been extracted from the QoS info (see
      ieee80211_parse_qos()).
      
      Second problem is the value of queue_mapping set on forwarded mesh
      frames via skb_set_queue_mapping() is not the AC of the packet but a
      hardware queue index. This may or may not work depending on AC to HW
      queue mapping which is driver specific.
      
      Both of these issues lead to improper AC selection while forwarding
      mesh packets but, more importantly, to improper airtime accounting
      (which is done on a per-STA, per-AC basis), which caused traffic stalls
      with the introduction of AQL.
      
      Fixes: cf440128 ("mac80211: fix unnecessary frame drops in mesh fwding")
      Fixes: d3c1597b ("mac80211: fix forwarded mesh frame queue mapping")
      Co-developed-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
      Link: https://lore.kernel.org/r/20220214173214.368862-1-nico.escande@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa65989a
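The idea of inferring the AC directly from the frame's priority, as the mesh forwarding commit above describes, amounts to a table lookup. The sketch below assumes the usual 802.1D user-priority-to-access-category mapping (priorities 1 and 2 are background, 0 and 3 best effort, 4 and 5 video, 6 and 7 voice); `enum ac`, `up_to_ac[]`, and `ac_from_priority()` are hypothetical names chosen for this example, not the mac80211 identifiers.

```c
#include <assert.h>

/* Access categories, numbered from highest (voice) to lowest (background). */
enum ac { AC_VO = 0, AC_VI = 1, AC_BE = 2, AC_BK = 3 };

/* 802.1D user priority (0..7) to access category. */
static const enum ac up_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
};

/* The AC depends only on the priority, not on any TX-side state. */
static enum ac ac_from_priority(unsigned int priority)
{
	assert(priority < 8);
	return up_to_ac[priority];
}
```

Because the lookup needs nothing but the priority, it can be applied to a received frame that is about to be forwarded, and the resulting AC (rather than a driver-specific hardware queue index) is what the second problem in the commit message says should be stored as the queue mapping.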
    • ia64: ensure proper NUMA distance and possible map initialization · dcc3423c
      Valentin Schneider authored

      commit b22a8f7b upstream.
      
      John Paul reported a warning about bogus NUMA distance values spurred by
      commit:
      
        620a6dc4 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
      
      In this case, the afflicted machine comes up with a reported 256 possible
      nodes, all of which are 0 distance away from one another.  This was
      previously silently ignored, but is now caught by the aforementioned
      commit.
      
      The culprit is ia64's node_possible_map which remains unchanged from its
      initialization value of NODE_MASK_ALL.  In John's case, the machine
      doesn't have any SRAT nor SLIT table, but AIUI the possible map remains
      untouched regardless of what ACPI tables end up being parsed.  Thus,
      !online && possible nodes remain with a bogus distance of 0 (distances \in
      [0, 9] are "reserved and have no meaning" as per the ACPI spec).
      
      Follow x86 / drivers/base/arch_numa's example and set the possible map to
      the parsed map, which in this case seems to be the online map.
      
      Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
      Link: https://lkml.kernel.org/r/20210318130617.896309-1-valentin.schneider@arm.com
      Fixes: 620a6dc4 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: dann frazier <dann.frazier@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcc3423c
    • sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() · 1312ef5a
      Dietmar Eggemann authored
      
      commit 71e5f664 upstream.
      
      Commit "sched/topology: Make sched_init_numa() use a set for the
      deduplicating sort" allocates 'i + nr_levels (level)' instead of
      'i + nr_levels + 1' sched_domain_topology_level.
      
      This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):
      
      sched_init_domains
        build_sched_domains()
          __free_domain_allocs()
            __sdt_free() {
      	...
              for_each_sd_topology(tl)
      	  ...
                sd = *per_cpu_ptr(sdd->sd, j); <--
      	  ...
            }
      
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
      Tested-by: Barry Song <song.bao.hua@hisilicon.com>
      Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
      Signed-off-by: dann frazier <dann.frazier@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1312ef5a
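The off-by-one in the allocation commit above matters because the array is walked until a terminating empty entry is found. The sketch below shows the general pattern under simplified assumptions: `struct topo_level`, `alloc_levels()`, and `count_levels()` are hypothetical, and a NULL `name` plays the role of the sentinel that ends the iteration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced topology level: a NULL name marks the end of the array. */
struct topo_level { const char *name; };

/* Allocate room for n levels plus one zeroed sentinel entry. */
static struct topo_level *alloc_levels(size_t n)
{
	return calloc(n + 1, sizeof(struct topo_level));
}

/* Walk the array the way a sentinel-terminated iterator would. */
static size_t count_levels(const struct topo_level *tl)
{
	size_t count = 0;

	assert(tl != NULL);
	for (; tl->name; tl++)
		count++;
	return count;
}
```

Allocating only `n` entries would leave no sentinel, so the walk in `count_levels()` would read past the end of the allocation.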
    • sched/topology: Make sched_init_numa() use a set for the deduplicating sort · d753aecb
      Valentin Schneider authored
      
      commit 620a6dc4 upstream.
      
      The deduplicating sort in sched_init_numa() assumes that the first line in
      the distance table contains all unique values in the entire table. I've
      been trying to pen what this exactly means for the topology, but it's not
      straightforward. For instance, topology.c uses this example:
      
        node   0   1   2   3
          0:  10  20  20  30
          1:  20  10  20  20
          2:  20  20  10  20
          3:  30  20  20  10
      
        0 ----- 1
        |     / |
        |   /   |
        | /     |
        2 ----- 3
      
      Which works out just fine. However, if we swap nodes 0 and 1:
      
        1 ----- 0
        |     / |
        |   /   |
        | /     |
        2 ----- 3
      
      we get this distance table:
      
        node   0  1  2  3
          0:  10 20 20 20
          1:  20 10 20 30
          2:  20 20 10 20
          3:  20 30 20 10
      
      Which breaks the deduplicating sort (non-representative first line). In
      this case this would just be a renumbering exercise, but it so happens that
      we can have a deduplicating sort that goes through the whole table in O(n²)
      at the extra cost of a temporary memory allocation (i.e. any form of set).
      
      The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
      this, implement the set as a 256-bit bitmap. Should this not be
      satisfactory (i.e. we want to support 32-bit values), then we'll have to go
      for some other sparse set implementation.
      
      This has the added benefit of letting us allocate just the right amount of
      memory for sched_domains_numa_distance[], rather than an arbitrary
      (nr_node_ids + 1).
      
      Note: DT binding equivalent (distance-map) decodes distances as 32-bit
      values.
      
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
      Signed-off-by: dann frazier <dann.frazier@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d753aecb
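The set-based deduplicating sort described in the commit above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: `unique_distances()` and the macros are hypothetical names, distances are assumed to fit in 8 bits as the message says of SLIT, and a plain array of words stands in for the 256-bit bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_DISTANCES 256   /* 8-bit distances, per the SLIT encoding */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((NR_DISTANCES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/*
 * Collect the distinct distances of an n x n table (row-major) into out[],
 * in ascending order, and return how many there are. out[] must have room
 * for NR_DISTANCES entries in the worst case.
 */
static size_t unique_distances(const int *table, size_t n, int *out)
{
	unsigned long seen[NR_WORDS] = { 0 };
	size_t count = 0;

	/* Pass 1: mark every distance found anywhere in the table, O(n^2). */
	for (size_t i = 0; i < n * n; i++) {
		int d = table[i];

		assert(d >= 0 && d < NR_DISTANCES);
		seen[d / BITS_PER_WORD] |= 1UL << (d % BITS_PER_WORD);
	}

	/* Pass 2: walking the bitmap in bit order yields sorted, unique values. */
	for (int d = 0; d < NR_DISTANCES; d++)
		if (seen[d / BITS_PER_WORD] & (1UL << (d % BITS_PER_WORD)))
			out[count++] = d;

	return count;
}
```

Applied to the swapped table from the commit message, whose first row contains only 10 and 20, the function still finds the distance 30 because it scans every row rather than just the first.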
    • ice: fix concurrent reset and removal of VFs · 05ae1f0f
      Jacob Keller authored
      
      commit fadead80fe4c033b5e514fcbadd20b55c4494112 upstream.
      
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05ae1f0f
    • ice: Fix race conditions between virtchnl handling and VF ndo ops · 41edeeaa
      Brett Creeley authored
      
      commit e6ba5273 upstream.
      
      The VF can be configured via the PF's ndo ops at the same time the PF is
      receiving/handling virtchnl messages. This has many issues, with
      one of them being the ndo op could be actively resetting a VF (i.e.
      resetting it to the default state and deleting/re-adding the VF's VSI)
      while a virtchnl message is being handled. The following error was seen
      because a VF ndo op was used to change a VF's trust setting while the
      VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
      
      [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
      [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
      [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
      
      Fix this by making sure the virtchnl handling and VF ndo ops that
      trigger VF resets cannot run concurrently. This is done by adding a
      struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
      will be locked around the critical operations and VFR. Since the ndo ops
      will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
      is done because if any other thread (i.e. VF ndo op) has the mutex, then
      that means the current VF message being handled is no longer valid, so
      just ignore it.
      
      This issue can be seen using the following commands:
      
      for i in {0..50}; do
              rmmod ice
              modprobe ice
      
              sleep 1
      
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      
              sleep 2
      
              echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
              sleep 1
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      done
      
      Fixes: 7c710869 ("ice: Add handlers for VF netdevice operations")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      41edeeaa
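The trylock policy described in the commit above (the message path never waits, and a busy lock means the message is stale and is ignored) can be sketched in userspace C. This is only a model: a C11 `atomic_flag` stands in for the kernel mutex, and `struct vf`, `vf_trylock()`, `vf_unlock()`, and `handle_vf_msg()` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF state; cfg_lock is modelled with an atomic flag. */
struct vf {
	atomic_flag cfg_lock;
	int handled;   /* messages processed */
	int dropped;   /* messages ignored because the lock was busy */
};

static bool vf_trylock(struct vf *vf)
{
	/* test_and_set returns the previous value: false means we got the lock. */
	return !atomic_flag_test_and_set(&vf->cfg_lock);
}

static void vf_unlock(struct vf *vf)
{
	atomic_flag_clear(&vf->cfg_lock);
}

/* Message path: never waits; a busy lock means the message is stale. */
static void handle_vf_msg(struct vf *vf)
{
	assert(vf != NULL);
	if (!vf_trylock(vf)) {
		vf->dropped++;
		return;
	}
	vf->handled++;
	vf_unlock(vf);
}
```

In this model, a configuration path that holds the lock across its critical operations causes any message arriving in the meantime to be counted as dropped instead of being processed against a VF that is being reset.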
    • rcu/nocb: Fix missed nocb_timer requeue · 0c145262
      Frederic Weisbecker authored
      
      commit b2fcf210 upstream.
      
      This sequence of events can lead to a failure to requeue a CPU's
      ->nocb_timer:
      
      1.	There are no callbacks queued for any CPU covered by CPU 0-2's
      	->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated
      	with CPU 0.
      
      2.	CPU 1 enqueues its first callback with interrupts disabled, and
      	thus must defer awakening its ->nocb_gp_kthread.  It therefore
      	queues its rcu_data structure's ->nocb_timer.  At this point,
      	CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
      
      3.	CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
      	callback, but with interrupts enabled, allowing it to directly
      	awaken the ->nocb_gp_kthread.
      
      4.	The newly awakened ->nocb_gp_kthread associates both CPU 1's
      	and CPU 2's callbacks with a future grace period and arranges
      	for that grace period to be started.
      
      5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
      	future grace period.
      
      6.	This grace period elapses before CPU 1's timer fires.
      	This is normally improbable given that the timer is set for only
      	one jiffy, but timers can be delayed.  Besides, it is possible
      	that the kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
      
      7.	The grace period ends, so rcu_gp_kthread awakens the
      	->nocb_gp_kthread, which in turn awakens both CPU 1's and
      	CPU 2's ->nocb_cb_kthread.  Then ->nocb_gp_kthread sleeps
      	waiting for more newly queued callbacks.
      
      8.	CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
      	waiting for more invocable callbacks.
      
      9.	Note that neither kthread updated any ->nocb_timer state,
      	so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
      
      10.	CPU 1 enqueues its second callback, this time with interrupts
      	enabled so it can directly wake ->nocb_gp_kthread.
      	It does so by calling wake_nocb_gp(), which also cancels the
      	pending timer that got queued in step 2. But that doesn't reset
      	CPU 1's ->nocb_defer_wakeup, which is still set to RCU_NOCB_WAKE.
      	So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
      	desynchronized.
      
      11.	->nocb_gp_kthread associates the callback queued in 10 with a new
      	grace period, arranges for that grace period to start and sleeps
      	waiting for it to complete.
      
      12.	The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
      	which in turn wakes up CPU 1's ->nocb_cb_kthread which then
      	invokes the callback queued in 10.
      
      13.	CPU 1 enqueues its third callback, this time with interrupts
      	disabled so it must queue a timer for a deferred wakeup. However
      	the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
      	incorrectly indicates that a timer is already queued.  Instead,
      	CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails
      	to queue the ->nocb_timer.
      
      14.	CPU 1 now has a pending callback that may go unnoticed until
      	some other CPU wakes up ->nocb_gp_kthread or CPU 1 performs
      	an explicit deferred wakeup, for example, during idle entry.
      
      This commit fixes this bug by resetting rdp->nocb_defer_wakeup every time
      we delete the ->nocb_timer.
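
      The desynchronization in steps 2, 10 and 13 can be replayed in a small
      user-space model. This is only a sketch: struct toy_rdp, the TOY_WAKE
      values and the toy_* helpers are hypothetical stand-ins, not the kernel's
      rcu_data or its functions, and the "reset_flag" switch merely contrasts
      the behavior with and without resetting the flag on timer deletion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the deferred-wakeup states described above. */
enum toy_defer_wakeup { TOY_WAKE_NOT = 0, TOY_WAKE = 1 };

/* Toy per-CPU state; not the kernel's struct rcu_data. */
struct toy_rdp {
	enum toy_defer_wakeup nocb_defer_wakeup;
	bool timer_pending;
};

/* Deferred wakeup: arm the timer only if the flag says none is queued. */
static void toy_defer_wakeup(struct toy_rdp *rdp)
{
	if (rdp->nocb_defer_wakeup == TOY_WAKE_NOT) {
		rdp->nocb_defer_wakeup = TOY_WAKE;
		rdp->timer_pending = true;
	}
}

/* Direct wakeup: cancel the timer and, optionally, reset the flag too. */
static void toy_wake_gp(struct toy_rdp *rdp, bool reset_flag)
{
	rdp->timer_pending = false;
	if (reset_flag)
		rdp->nocb_defer_wakeup = TOY_WAKE_NOT;
}

/* Replays steps 2, 10 and 13; returns whether the timer ends up armed. */
static bool toy_nocb_replay(bool reset_flag)
{
	struct toy_rdp rdp = { TOY_WAKE_NOT, false };

	toy_defer_wakeup(&rdp);           /* step 2: irqs off, timer queued */
	assert(rdp.timer_pending);
	toy_wake_gp(&rdp, reset_flag);    /* step 10: direct wake cancels timer */
	toy_defer_wakeup(&rdp);           /* step 13: irqs off again */
	return rdp.timer_pending;
}
```

      In this model, toy_nocb_replay(false) leaves the third callback without
      a timer, while toy_nocb_replay(true) re-arms it.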
      
      It is quite possible that there is a similar scenario involving
      ->nocb_bypass_timer and ->nocb_defer_wakeup.  However, despite some
      effort from several people, a failure scenario has not yet been located.
      However, that by no means guarantees that no such scenario exists.
      Finding a failure scenario is left as an exercise for the reader, and the
      "Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer.
      
      Fixes: d1b222c6 (rcu/nocb: Add bypass callback queueing)
      Cc: <stable@vger.kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c145262
    • D. Wythe's avatar
      net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server · 9bb7237c
      D. Wythe authored
      
      commit 4940a1fdf31c39f0806ac831cde333134862030b upstream.
      
      The cause of SMC_CLC_DECL_ERR_REGRMB on the server is clear. Whether
      a new SMC connection can be accepted depends not only on the limit on
      the number of connections, but also on the available rtoken entries.
      The rtoken release is triggered by the peer, while the connection
      count is decreased locally, so many things can happen in the window
      between the two.

      The only thing that needs to be mentioned is that all connection
      creations are now completely protected by the smc_server_lgr_pending
      lock, so it is enough to check only the available entries in
      rtokens_used_mask.
      
      Fixes: cd6851f3 ("smc: remote memory buffers (RMBs)")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bb7237c
    • D. Wythe's avatar
      net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client · d7eb6626
      D. Wythe authored
      
      commit 0537f0a2151375dcf90c1bbfda6a0aaf57164e89 upstream.
      
      The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB on the client
      is the following execution sequence:
      
      Server Conn A:           Server Conn B:			Client Conn B:
      
      smc_lgr_unregister_conn
                              smc_lgr_register_conn
                              smc_clc_send_accept     ->
                                                              smc_rtoken_add
      smcr_buf_unuse
      		->		Client Conn A:
      				smc_rtoken_delete
      
      smc_lgr_unregister_conn() makes the current link available to be assigned
      to a new incoming connection while smcr_buf_unuse() has not executed yet,
      which means that smc_rtoken_add() may fail because no rtoken_entry is
      available. Reversing their execution order avoids this problem.
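
      The effect of the ordering can be illustrated with a toy model in which
      the client has room for a single rtoken. Everything here is a hypothetical
      simplification: struct toy_smc_client and the toy_smc_* helpers only mimic
      the roles of smc_rtoken_add() and smc_rtoken_delete() and do not reflect
      the real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy client-side rtoken table: only a used/max counter. */
struct toy_smc_client {
	int rtokens_used;
	int rtokens_max;
};

/* Mimics the role of smc_rtoken_add(): fails when the table is full. */
static bool toy_smc_rtoken_add(struct toy_smc_client *c)
{
	if (c->rtokens_used >= c->rtokens_max)
		return false;
	c->rtokens_used++;
	return true;
}

/* Mimics the role of smc_rtoken_delete(), triggered by the peer. */
static void toy_smc_rtoken_delete(struct toy_smc_client *c)
{
	assert(c->rtokens_used > 0);
	c->rtokens_used--;
}

/*
 * Conn A holds the only rtoken entry. Returns whether conn B's rtoken
 * could be added on the client for the given server-side ordering.
 */
static bool toy_smc_replay(bool unuse_before_unregister)
{
	struct toy_smc_client c = { .rtokens_used = 1, .rtokens_max = 1 };
	bool ok;

	if (unuse_before_unregister) {
		toy_smc_rtoken_delete(&c);    /* buffer released first */
		ok = toy_smc_rtoken_add(&c);  /* then conn B is accepted */
	} else {
		ok = toy_smc_rtoken_add(&c);  /* conn B arrives too early */
		toy_smc_rtoken_delete(&c);
	}
	return ok;
}
```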
      
      Fixes: 3e034725 ("net/smc: common functions for RMBs and send buffers")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7eb6626
    • D. Wythe's avatar
      net/smc: fix connection leak · 2e8d465b
      D. Wythe authored
      
      commit 9f1c50cf39167ff71dc5953a3234f3f6eeb8fcb5 upstream.
      
      There's a potential leak issue under the following execution sequence:
      
      smc_release  				smc_connect_work
      if (sk->sk_state == SMC_INIT)
      					send_clc_confirim
      	tcp_abort();
      					...
      					sk.sk_state = SMC_ACTIVE
      smc_close_active
      switch(sk->sk_state) {
      ...
      case SMC_ACTIVE:
      	smc_close_final()
      	// then wait peer closed
      
      Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
      still in the tcp send buffer, in which case our connection token cannot
      be delivered to the server side, which means that we cannot get a
      passive close message at all. Therefore, it is impossible for the
      connection to be disconnected at all.

      This patch takes a very simple approach to avoid this issue: once the
      state has changed to SMC_ACTIVE after tcp_abort(), we can actively abort
      the smc connection. Considering that the state was SMC_INIT before
      tcp_abort(), abandoning the complete disconnection process should not
      cause too much of a problem.

      In fact, this problem may exist as long as the CLC CONFIRM message is
      not received by the server. Whether a timer should be added after
      smc_close_final() needs to be discussed in the future. But even so, this
      patch provides a faster release for the connection in the above case, so
      it should also be valuable.
      
      Fixes: 39f41f36 ("net/smc: common release code for non-accepted sockets")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e8d465b
    • Vladimir Oltean's avatar
      net: dcb: flush lingering app table entries for unregistered devices · 6a8a4dc2
      Vladimir Oltean authored
      
      commit 91b0383fef06f20b847fa9e4f0e3054ead0b1a1b upstream.
      
      If I'm not mistaken (and I don't think I am), the way in which the
      dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
      the application table with dynamically allocated struct dcb_app_type
      entries that are kept in the module-global dcb_app_list.
      
      However, nobody keeps exact track of these entries, and although
      dcb_ieee_delapp() is supposed to remove them, nobody does so when the
      interface goes away (example: driver unbinds from device). So the
      dcb_app_list will contain lingering entries with an ifindex that no
      longer matches any device in dcb_app_lookup().
      
      Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
      flushing the app table entries of interfaces that are now gone.
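
      The idea of flushing per-interface entries when the interface goes away
      can be sketched in user space as follows. This is an illustration only:
      struct toy_dcb_app and the toy_dcb_* helpers are hypothetical, use a
      plain singly linked list, and ignore the locking and notifier plumbing
      the kernel needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy app-table entry keyed by interface index. */
struct toy_dcb_app {
	int ifindex;
	struct toy_dcb_app *next;
};

/* Prepend a dynamically allocated entry for the given interface. */
static struct toy_dcb_app *toy_dcb_add(struct toy_dcb_app *head, int ifindex)
{
	struct toy_dcb_app *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->ifindex = ifindex;
	e->next = head;
	return e;
}

/* Free every entry belonging to ifindex; returns how many were freed. */
static int toy_dcb_flush(struct toy_dcb_app **head, int ifindex)
{
	int freed = 0;

	while (*head) {
		struct toy_dcb_app *e = *head;

		if (e->ifindex == ifindex) {
			*head = e->next;   /* unlink, then release */
			free(e);
			freed++;
		} else {
			head = &e->next;
		}
	}
	return freed;
}

/* Count the remaining entries. */
static int toy_dcb_count(const struct toy_dcb_app *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

      A handler for an "interface unregistered" event would then call
      toy_dcb_flush() with the departing interface's index.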
      
      In fact something like this used to be done as part of the initial
      commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
      essentially at module_exit time. That became dead code after commit
      7a6b6f51 ("DCB: fix kconfig option") which essentially merged
      "tristate config DCB" and "bool config DCBNL" into a single "bool config
      DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
      
      Commit 36b9ad80 ("net/dcb: make dcbnl.c explicitly non-modular")
      recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
      leaving us with the version we have today.
      
      Since flushing application table entries can and should be done as soon
      as the netdevice disappears, fundamentally the commit that is to blame
      is the one that introduced the design of this API.
      
      Fixes: 9ab933ab ("dcbnl: add appliction tlv handlers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a8a4dc2
    • j.nixdorf@avm.de's avatar
      net: ipv6: ensure we call ipv6_mc_down() at most once · f4c63b24
      j.nixdorf@avm.de authored
      
      commit 9995b408f17ff8c7f11bc725c8aa225ba3a63b1c upstream.
      
      There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
      either the network device is actually going down, or IPv6 was disabled
      on the interface.
      
      If either of them stays down while the other is toggled, we repeatedly
      call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
      calling the corresponding ipv6_mc_up() in between. This will cause a
      new entry in idev->mc_tomb to be allocated for each multicast group
      the interface is subscribed to, which in turn leaks one struct ifmcaddr6
      per nontrivial multicast group the interface is subscribed to.
      
      The following reproducer will leak at least $n objects:
      
      ip addr add ff2e::4242/32 dev eth0 autojoin
      sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
      for i in $(seq 1 $n); do
      	ip link set up eth0; ip link set down eth0
      done
      
      Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
      sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
      can also be used to create a nontrivial idev->mc_list, which will then
      leak objects with the right up-down sequence.
      
      Based on both sources for NETDEV_DOWN events the interface IPv6 state
      should be considered:
      
       - not ready if the network interface is not ready OR IPv6 is disabled
         for it
       - ready if the network interface is ready AND IPv6 is enabled for it
      
      The functions ipv6_mc_up() and ipv6_mc_down() should only be run when
      this state changes.
      
      Implement this by remembering when the IPv6 state is ready, and only
      run ipv6_mc_down() if it actually changed from ready to not ready.
      
      The other direction (not ready -> ready) already works correctly, as:
      
       - the interface notification triggered codepath for NETDEV_UP /
         NETDEV_CHANGE returns early if ipv6 is disabled,
       - the disable_ipv6=0 triggered codepath skips fully initializing the
         interface as long as addrconf_link_ready(dev) returns false, and
       - calling ipv6_mc_up() repeatedly does not leak anything
      
      Fixes: 3ce62a84 ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
      Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4c63b24
    • Sven Eckelmann's avatar
      batman-adv: Don't expect inter-netns unique iflink indices · a9c4a74a
      Sven Eckelmann authored
      
      commit 6c1f41afc1dbe59d9d3c8bb0d80b749c119aa334 upstream.
      
      The ifindex doesn't have to be unique for multiple network namespaces on
      the same machine.
      
        $ ip netns add test1
        $ ip -net test1 link add dummy1 type dummy
        $ ip netns add test2
        $ ip -net test2 link add dummy2 type dummy
      
        $ ip -net test1 link show dev dummy1
        6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
        $ ip -net test2 link show dev dummy2
        6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
      
      But the batman-adv code to walk through the various layers of virtual
      interfaces uses this assumption because dev_get_iflink handles it
      internally and doesn't return the actual netns of the iflink. And
      dev_get_iflink only documents the situation where ifindex == iflink for
      physical devices.
      
      But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
      because ipoib_get_iflink implements it even when it sometimes returns an
      iflink != ifindex and sometimes iflink == ifindex. The caller must
      therefore itself check both netns and iflink + ifindex for equality.
      Only when they are equal has a "physical" interface been detected,
      which should stop the traversal. On the other hand, vxcan_get_iflink can
      also return 0 in case there is currently no valid peer. In this case, it
      is still necessary to stop.
      
      Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
      Fixes: 5ed4a460 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9c4a74a
    • Sven Eckelmann's avatar
      batman-adv: Request iflink once in batadv_get_real_netdevice · 3dae11d2
      Sven Eckelmann authored
      
      commit 6116ba09423f7d140f0460be6a1644dceaad00da upstream.
      
      There is no need to call dev_get_iflink multiple times for the same
      net_device in batadv_get_real_netdevice. And since some of the
      ndo_get_iflink callbacks are dynamic (for example via RCUs like in
      vxcan_get_iflink), it could easily happen that the returned values are not
      stable. The pre-checks before __dev_get_by_index are then of course bogus.
      
      Fixes: 5ed4a460 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dae11d2
    • Sven Eckelmann's avatar
      batman-adv: Request iflink once in batadv-on-batadv check · dcf10d78
      Sven Eckelmann authored
      
      commit 690bb6fb64f5dc7437317153902573ecad67593d upstream.
      
      There is no need to call dev_get_iflink multiple times for the same
      net_device in batadv_is_on_batman_iface. And since some of the
      .ndo_get_iflink callbacks are dynamic (for example via RCUs like in
      vxcan_get_iflink), it could easily happen that the returned values are not
      stable. The pre-checks before __dev_get_by_index are then of course bogus.
      
      Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dcf10d78
    • Florian Westphal's avatar
      netfilter: nf_queue: handle socket prefetch · 81f817f3
      Florian Westphal authored
      
      commit 3b836da4081fa585cf6c392f62557496f2cb0efe upstream.
      
      In case someone combines bpf socket assign and nf_queue, then we will
      queue an skb that references a struct sock that did not have its
      reference count incremented.
      
      As we leave rcu protection, there is no guarantee that skb->sk is still
      valid.
      
      For refcount-less skb->sk case, try to increment the reference count
      and then override the destructor.
      
      In case of failure we have two choices: orphan the skb and 'delete'
      preselect or let nf_queue() drop the packet.
      
      Do the latter, it should not happen during normal operation.
      
      Fixes: cf7fbe66 ("bpf: Add socket assign support")
      Acked-by: Joe Stringer <joe@cilium.io>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      81f817f3
    • Florian Westphal's avatar
      netfilter: nf_queue: fix possible use-after-free · 4d052392
      Florian Westphal authored
      
      commit c3873070247d9e3c7a6b0cf9bf9b45e8018427b1 upstream.
      
      Eric Dumazet says:
        The sock_hold() side seems suspect, because there is no guarantee
        that sk_refcnt is not already 0.
      
      On failure, we cannot queue the packet and need to indicate an
      error.  The packet will be dropped by the caller.
      
      v2: split skb prefetch hunk into separate change
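
      The concern about taking a reference when the count may already be zero
      can be illustrated with an "increment unless zero" helper built on C11
      atomics. This is a user-space sketch, not the kernel code:
      toy_ref_get_unless_zero() and toy_queue_packet() are hypothetical names,
      and a bare atomic_int stands in for the socket's reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the object is still alive (count != 0). */
static bool toy_ref_get_unless_zero(atomic_int *ref)
{
	int old;

	assert(ref != NULL);
	old = atomic_load(ref);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns 0 if the packet may be queued, -1 if it must be dropped. */
static int toy_queue_packet(atomic_int *sk_ref)
{
	if (!toy_ref_get_unless_zero(sk_ref))
		return -1;   /* object already on its way out: do not queue */
	return 0;
}
```

      An unconditional increment would instead resurrect a count of 0, which
      is the situation the quoted remark warns about.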
      
      Fixes: 271b72c7 ("udp: RCU handling for Unicast packets.")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d052392
    • Florian Westphal's avatar
      netfilter: nf_queue: don't assume sk is full socket · 3b9ba964
      Florian Westphal authored
      
      commit 747670fd9a2d1b7774030dba65ca022ba442ce71 upstream.
      
      There is no guarantee that state->sk refers to a full socket.
      
      If refcount transitions to 0, sock_put calls sk_free which then ends up
      with garbage fields.
      
      I'd like to thank Oleksandr Natalenko and Jiri Benc for considerable
      debug work and pointing out state->sk oddities.
      
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b9ba964
    • lena wang's avatar
      net: fix up skbs delta_truesize in UDP GRO frag_list · 4e178ed1
      lena wang authored
      
      commit 224102de2ff105a2c05695e66a08f4b5b6b2d19c upstream.
      
      The truesize of a UDP GRO packet is the sum of the truesize of the main
      skb and of the skbs in the main skb's frag_list:
      skb_gro_receive_list
              p->truesize += skb->truesize;
      
      The commit 53475c5d ("net: fix use-after-free when UDP GRO with
      shared fraglist") introduced a truesize increase for frag_list skbs.
      When uncloning an skb, it calls pskb_expand_head and the truesize of
      frag_list skbs may increase. This can occur when allocators use
      __netdev_alloc_skb and do not jump into __alloc_skb. This flow does not
      use ksize(len) to calculate truesize, while pskb_expand_head does.
      skb_segment_list
      err = skb_unclone(nskb, GFP_ATOMIC);
      pskb_expand_head
              if (!skb->sk || skb->destructor == sock_edemux)
                      skb->truesize += size - osize;
      
      If we use the increased truesize as delta_truesize, it will be
      larger than before and even larger than the previous total truesize
      value if skbs in frag_list are abundant. The main skb truesize will
      become smaller, and even negative, or a huge value for an unsigned int
      parameter. Then the following memory check will drop this abnormal skb.
      
      To avoid this error we should use the original truesize to segment the
      main skb.
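
      The arithmetic behind the "huge value for an unsigned int" can be
      reproduced in isolation. The numbers below are made up for illustration
      and toy_segment_truesize() is a hypothetical helper; it only subtracts
      per-fragment deltas from a total, the way the accounting above is
      described.

```c
#include <assert.h>
#include <stddef.h>

/* Subtract each fragment's delta from the aggregated truesize. */
static unsigned int toy_segment_truesize(unsigned int total,
					 const unsigned int *delta, size_t n)
{
	assert(delta != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total -= delta[i];   /* unsigned arithmetic wraps on underflow */
	return total;
}
```

      For example, with a total of 2500 made of a 1000-byte main skb and three
      500-byte fragments, subtracting the original sizes leaves 1000.
      Subtracting three enlarged sizes of 1100 each would go below zero and
      wrap around to a value far above the original total.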
      
      Fixes: 53475c5d ("net: fix use-after-free when UDP GRO with shared fraglist")
      Signed-off-by: lena wang <lena.wang@mediatek.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e178ed1
    • Sasha Neftin's avatar
      e1000e: Correct NVM checksum verification flow · eb5e444f
      Sasha Neftin authored

      commit ffd24fa2fcc76ecb2e61e7a4ef8588177bcb42a6 upstream.
      
      Update the MAC type check to e1000_pch_tgp because, for e1000_pch_cnp,
      an NVM checksum update is still possible.
      Emit a more detailed warning message.
      
      Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1191663

      Fixes: 4051f683 ("e1000e: Do not take care about recovery NVM checksum")
      Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb5e444f
    • Leon Romanovsky's avatar
      xfrm: enforce validity of offload input flags · b53d4bfd
      Leon Romanovsky authored
      
      commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream.
      
      struct xfrm_user_offload has a flags variable that receives user input,
      but the kernel didn't check whether valid bits were provided. As a
      result, unsanitized input was forwarded directly to the drivers.

      For example, the XFRM_OFFLOAD_IPV6 define that was exposed was used by
      strongswan, but not implemented in the kernel at all.
      
      As a solution, check and sanitize input flags to forward
      XFRM_OFFLOAD_INBOUND to the drivers.
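
      A check-and-sanitize step of this kind might look like the sketch below.
      It is an illustration under assumptions: the TOY_OFFLOAD_* constants and
      toy_sanitize_offload_flags() are hypothetical, and the choice to accept
      the IPv6 bit from user space while not forwarding it is inferred from
      the description above rather than taken from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; stand-ins for the user-visible defines. */
#define TOY_OFFLOAD_IPV6    0x1u
#define TOY_OFFLOAD_INBOUND 0x2u

/*
 * Reject unknown bits, then pass on only the bit that is implemented.
 * Returns 0 on success or -EINVAL for invalid input.
 */
static int toy_sanitize_offload_flags(uint8_t user_flags,
				      uint8_t *driver_flags)
{
	assert(driver_flags != NULL);

	if (user_flags & ~(TOY_OFFLOAD_IPV6 | TOY_OFFLOAD_INBOUND))
		return -EINVAL;

	/* Assumption: the IPv6 bit is tolerated but never forwarded. */
	*driver_flags = user_flags & TOY_OFFLOAD_INBOUND;
	return 0;
}
```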
      
      Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b53d4bfd
    • Antony Antony's avatar
      xfrm: fix the if_id check in changelink · 2f0e6d80
      Antony Antony authored
      
      commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream.
      
      if_id will always be 0, because it was not yet initialized.
      
      Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error")
      Reported-by: Pavel Machek <pavel@denx.de>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f0e6d80
    • Eric Dumazet's avatar
      bpf, sockmap: Do not ignore orig_len parameter · 24efaae0
      Eric Dumazet authored
      
      commit 60ce37b03917e593d8e5d8bcc7ec820773daf81d upstream.
      
      Currently, sk_psock_verdict_recv() returns skb->len.

      This is problematic because tcp_read_sock() might have
      passed orig_len < skb->len, due to the presence of TCP urgent data.

      This causes an infinite loop in tcp_read_sock().

      A followup patch will make tcp_read_sock() more robust vs bad actors.
      
      Fixes: ef565928 ("bpf, sockmap: Allow skipping sk_skb parser program")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20220302161723.3910001-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      24efaae0
    • Eric Dumazet's avatar
      netfilter: fix use-after-free in __nf_register_net_hook() · 8b0142c4
      Eric Dumazet authored
      
      commit 56763f12b0f02706576a088e85ef856deacc98a0 upstream.
      
      We must not dereference @new_hooks after nf_hook_mutex has been released,
      because other threads might have freed our allocated hooks already.
      
      BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
      BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline]
      BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
      Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430
      
      CPU: 1 PID: 4430 Comm: syz-executor237 Not tainted 5.17.0-rc5-syzkaller-00306-g2293be58d6a1 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
       hooks_validate net/netfilter/core.c:171 [inline]
       __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
       nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
       nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
       nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
       synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
       xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
       check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
       find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
       translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
       do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
       do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1024
       rawv6_setsockopt+0xd3/0x6a0 net/ipv6/raw.c:1084
       __sys_setsockopt+0x2db/0x610 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f65a1ace7d9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 71 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f65a1a7f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f65a1ace7d9
      RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
      RBP: 00007f65a1b574c8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000000 R11: 0000000000000246 R12: 00007f65a1b55130
      R13: 00007f65a1b574c0 R14: 00007f65a1b24090 R15: 0000000000022000
       </TASK>
      
      The buggy address belongs to the page:
      page:ffffea0000706a00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c1a8
      flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000000 ffffea0001c1b108 ffffea000046dd08 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as freed
      page last allocated via order 2, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 4430, ts 1061781545818, free_ts 1061791488993
       prep_new_page mm/page_alloc.c:2434 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
       __alloc_pages_node include/linux/gfp.h:572 [inline]
       alloc_pages_node include/linux/gfp.h:595 [inline]
       kmalloc_large_node+0x62/0x130 mm/slub.c:4438
       __kmalloc_node+0x35a/0x4a0 mm/slub.c:4454
       kmalloc_node include/linux/slab.h:604 [inline]
       kvmalloc_node+0x97/0x100 mm/util.c:580
       kvmalloc include/linux/slab.h:731 [inline]
       kvzalloc include/linux/slab.h:739 [inline]
       allocate_hook_entries_size net/netfilter/core.c:61 [inline]
       nf_hook_entries_grow+0x140/0x780 net/netfilter/core.c:128
       __nf_register_net_hook+0x144/0x820 net/netfilter/core.c:429
       nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
       nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
       nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
       synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
       xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
       check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
       find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
       translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
       do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
       do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1352 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
       free_unref_page_prepare mm/page_alloc.c:3325 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3404
       kvfree+0x42/0x50 mm/util.c:613
       rcu_do_batch kernel/rcu/tree.c:2527 [inline]
       rcu_core+0x7b1/0x1820 kernel/rcu/tree.c:2778
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Memory state around the buggy address:
       ffff88801c1a7f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88801c1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff88801c1a8000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff88801c1a8080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88801c1a8100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Fixes: 2420b79f ("netfilter: debug: check for sorted array")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b0142c4
    • xfrm: fix MTU regression · 4952faa7
      Jiri Bohac authored
      
      commit 6596a0229541270fb8d38d989f91b78838e5e9da upstream.
      
      Commit 749439bf ("ipv6: fix udpv6 sendmsg crash caused by too
      small MTU") breaks PMTU for xfrm.
      
      A Packet Too Big ICMPv6 message received in response to an ESP
      packet will prevent all further communication through the tunnel
      if the reported MTU minus the ESP overhead is smaller than 1280.
      
      For example, with tunnel-mode ESP using sha256/aes the overhead
      is 92 bytes. Receiving a PTB with an MTU of 1371 or less will
      result in all further packets in the tunnel being dropped. A ping
      through the tunnel fails with "ping: sendmsg: Invalid argument".
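
      The arithmetic behind the 1371 threshold can be sketched in a few
      lines of user-space C. This is only an illustration: the helper
      names inner_mtu() and tunnel_stalls() are hypothetical and do not
      appear in the kernel; the 92-byte overhead and the 1280 limit are
      the figures quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_MIN_MTU 1280u /* smallest MTU IPv6 permits (RFC 8200) */

/* Hypothetical helper: MTU left for the inner packet once the ESP
 * overhead has been subtracted from the MTU reported in the PTB. */
static unsigned int inner_mtu(unsigned int ptb_mtu, unsigned int esp_overhead)
{
	assert(esp_overhead <= ptb_mtu); /* keep the subtraction from wrapping */
	return ptb_mtu - esp_overhead;
}

/* True when a "route MTU must be at least 1280" check would reject
 * every further send through the tunnel. */
static bool tunnel_stalls(unsigned int ptb_mtu, unsigned int esp_overhead)
{
	return inner_mtu(ptb_mtu, esp_overhead) < IPV6_MIN_MTU;
}
```

      With an overhead of 92, a reported MTU of 1371 leaves 1279 bytes,
      one short of the limit, while 1372 leaves exactly 1280.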
      
      Apparently the MTU on the xfrm route is smaller than 1280 and
      fails the check inside ip6_setup_cork() added by 749439bf.
      
      We found this by debugging USGv6/ipv6ready failures. Failing
      tests are: "Phase-2 Interoperability Test Scenario IPsec" /
      5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
      
      Commit b515d263 ("xfrm: xfrm_state_mtu should return at least
      1280 for ipv6") attempted to fix this but caused another
      regression in TCP MSS calculations and had to be reverted.
      
      The patch below fixes the situation by dropping the MTU
      check and instead checking for the underflows described in the
      749439bf commit message.
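
      As a rough picture of what "checking for the underflows" could
      mean, here is a user-space sketch. It is not the code from the
      patch: mtu_usable(), FRAG_HDR_LEN and the parameter names are
      assumptions made for illustration, and the exact conditions in
      the kernel may differ.

```c
#include <stdbool.h>

/* Assumed size of the IPv6 fragment extension header, in bytes. */
#define FRAG_HDR_LEN 8u

/* Hypothetical check: instead of comparing the MTU against 1280,
 * reject only values for which the unsigned length arithmetic
 * would wrap around or leave no room for payload. */
static bool mtu_usable(unsigned int mtu, unsigned int fragheaderlen)
{
	unsigned int maxfraglen;

	/* mtu - fragheaderlen would underflow (wrap) in this case. */
	if (mtu <= fragheaderlen)
		return false;

	/* Fragment payloads are multiples of 8 bytes; after rounding
	 * down there must still be room beyond the fragment header. */
	maxfraglen = ((mtu - fragheaderlen) & ~7u) + fragheaderlen;
	return maxfraglen > FRAG_HDR_LEN;
}
```

      Under these assumptions an MTU of 1279 with 48 bytes of headers
      is accepted, whereas an MTU no larger than the headers is
      rejected.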
      
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Fixes: 749439bf ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4952faa7