  1. Mar 16, 2022
    • vhost: allow batching hint without size · f83c85ee
      Jason Wang authored
      
      commit 95932ab2ea07b79cdb33121e2f40ccda9e6a73b5 upstream.
      
      Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb
      entries") tries to reject the IOTLB message whose size is zero. But
      the size is not necessarily meaningful, one example is the batching
      hint, so the commit breaks that.
      
      Fix this by rejecting zero-size messages only if the message is used to
      update/invalidate the IOTLB.
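The rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: the enum names and values, `struct iotlb_msg`, and `iotlb_msg_size_ok()` are hypothetical stand-ins and are not copied from the kernel headers or from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative message types; names and values are assumptions. */
enum iotlb_msg_type {
    IOTLB_UPDATE = 1,
    IOTLB_INVALIDATE,
    IOTLB_BATCH_BEGIN,
    IOTLB_BATCH_END,
};

/* Hypothetical, reduced message: only the fields the check needs. */
struct iotlb_msg {
    uint64_t size;
    enum iotlb_msg_type type;
};

/* Reject a zero size only for message types where the size is meaningful. */
bool iotlb_msg_size_ok(const struct iotlb_msg *msg)
{
    bool uses_size = msg->type == IOTLB_UPDATE ||
                     msg->type == IOTLB_INVALIDATE;

    return !(uses_size && msg->size == 0);
}

/* Small self-check: a batching hint with size 0 passes, an update does not. */
void iotlb_self_check(void)
{
    struct iotlb_msg hint = { .size = 0, .type = IOTLB_BATCH_BEGIN };
    struct iotlb_msg bad = { .size = 0, .type = IOTLB_UPDATE };
    struct iotlb_msg good = { .size = 4096, .type = IOTLB_UPDATE };

    assert(iotlb_msg_size_ok(&hint));
    assert(!iotlb_msg_size_ok(&bad));
    assert(iotlb_msg_size_ok(&good));
}
```

Keeping the size test behind the type test is the point of the sketch: a check that looks at the size first would also reject the batching hint.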
      
      Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries")
      Reported-by: Eli Cohen <elic@nvidia.com>
      Cc: Anirudh Rayabharam <mail@anirudhrb.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Eli Cohen <elic@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN" · caf18e4d
      Vladimir Oltean authored
      
      This reverts commit 2566a89b which is
      commit a2614140dc0f467a83aa3bb4b6ee2d6480a76202 upstream.
      
      The above change depends on upstream commit 0faf890f ("net: dsa:
      drop rtnl_lock from dsa_slave_switchdev_event_work"), which is not
      present in linux-5.15.y. Without that change, waiting for the switchdev
      workqueue causes deadlocks on the rtnl_mutex.
      
      Backporting the dependency commit isn't trivial/desirable, since it
      requires that the following dependencies of the dependency are also
      backported:
      
      df405910 net: dsa: sja1105: wait for dynamic config command completion on writes too
      eb016afd net: dsa: sja1105: serialize access to the dynamic config interface
      2468346c net: mscc: ocelot: serialize access to the MAC table
      f7eb4a1c net: dsa: b53: serialize access to the ARL table
      cf231b43 net: dsa: lantiq_gswip: serialize access to the PCE registers
      338a3a47 net: dsa: introduce locking for the address lists on CPU and DSA ports
      
      and then this bugfix on top:
      
      8940e6b669ca ("net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held")
      
      Reported-by: Daniel Suchy <danny@danysek.cz>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • block: drop unused includes in <linux/genhd.h> · 69b80587
      Christoph Hellwig authored
      
      commit b81e0c23 upstream.
      
      Drop various includes not actually used in genhd.h itself, and
      move the remaining includes closer together.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Reported-by: "H. Nikolaus Schaller" <hns@goldelico.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
      [ resolves MIPS build failure by luck, root cause needs to be fixed in
        Linus's tree properly, but this is needed for now to fix the build - gregkh ]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • riscv: dts: k210: fix broken IRQs on hart1 · cd072bf2
      Niklas Cassel authored
      
      commit 74583f1b upstream.
      
      Commit 67d96729 ("riscv: Update Canaan Kendryte K210 device tree")
      incorrectly removed two entries from the PLIC interrupt-controller node's
      interrupts-extended property.
      
      The PLIC driver cannot know the mapping between hart contexts and hart ids,
      so this information has to be provided by device tree, as specified by the
      PLIC device tree binding.
      
      The PLIC driver uses the interrupts-extended property, and initializes the
      hart context registers in the exact same order as provided by the
      interrupts-extended property.
      
      In other words, if we don't specify the S-mode interrupts, the PLIC driver
      will simply initialize the hart0 S-mode hart context with the hart1 M-mode
      configuration. It is therefore essential to specify the S-mode IRQs even
      though the system itself will only ever be running in M-mode.
      
      Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
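The ordering problem can be illustrated with a small userspace sketch. The hardware context order used here (hart0 M, hart0 S, hart1 M, hart1 S) is inferred from the description above and is an assumption, as are all type and function names; this is not the PLIC driver's code.

```c
#include <assert.h>
#include <stddef.h>

enum mode { M_MODE, S_MODE };

struct ctx_entry {
    int hart;
    enum mode mode;
};

/* Assumed hardware context layout: index i is hardware context i. */
static const struct ctx_entry hw_contexts[] = {
    { 0, M_MODE }, { 0, S_MODE }, { 1, M_MODE }, { 1, S_MODE },
};

/*
 * Entry i of the device tree list configures hardware context i, so the
 * position of an entry in the list decides which context gets programmed.
 */
int dt_index_of(const struct ctx_entry *dt, size_t n, int hart, enum mode mode)
{
    for (size_t i = 0; i < n; i++)
        if (dt[i].hart == hart && dt[i].mode == mode)
            return (int)i;
    return -1;
}

void plic_self_check(void)
{
    /* List with the S-mode entries missing. */
    const struct ctx_entry broken[] = { { 0, M_MODE }, { 1, M_MODE } };
    int idx;

    /* hart1 M-mode lands on context 1, which is really hart0 S-mode. */
    idx = dt_index_of(broken, 2, 1, M_MODE);
    assert(idx == 1);
    assert(hw_contexts[idx].hart == 0 && hw_contexts[idx].mode == S_MODE);

    /* With all four entries present, hart1 M-mode lands on context 2. */
    idx = dt_index_of(hw_contexts, 4, 1, M_MODE);
    assert(idx == 2);
    assert(hw_contexts[idx].hart == 1 && hw_contexts[idx].mode == M_MODE);
}
```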
      
      Cc: <stable@vger.kernel.org>
      Fixes: 67d96729 ("riscv: Update Canaan Kendryte K210 device tree")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Workaround broken BIOS DBUF configuration on TGL/RKL · 074c8875
      Ville Syrjälä authored
      commit 4e6f55120c7eccf6f9323bb681632e23cbcb3f3c upstream.
      
      On TGL/RKL the BIOS likes to use some kind of bogus DBUF layout
      that doesn't match what the spec recommends. With a single active
      pipe that is not going to be a problem, but with multiple pipes
      active skl_commit_modeset_enables() goes into an infinite loop
      since it can't figure out any order in which it can commit the
      pipes without causing DBUF overlaps between the planes.
      
      We'd need some kind of extra DBUF defrag stage in between to
      make the transition possible. But that is clearly way too complex
      a solution, so in the name of simplicity let's just sanitize the
      DBUF state by simply turning off all planes when we detect a
      pipe encroaching on its neighbours' DBUF slices. We only have
      to disable the primary planes as all other planes should have
      already been disabled (if they somehow were enabled) by
      earlier sanitization steps.
      
      And for good measure let's also sanitize in case the DBUF
      allocations of the pipes already seem to overlap each other.
      
      Cc: <stable@vger.kernel.org> # v5.14+
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4762
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220204141818.1900-3-ville.syrjala@linux.intel.com
      Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      (cherry picked from commit 15512021eb3975a8c2366e3883337e252bb0eee5)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: make send work with concurrent block group relocation · a1ce40f8
      Filipe Manana authored
      
      commit d96b3424 upstream.
      
      We don't allow send and balance/relocation to run in parallel in order
      to prevent send failing or silently producing some bad stream. This is
      because while send is using an extent (especially metadata) or about to
      read a metadata extent and expecting it belongs to a specific parent
      node, relocation can run, the transaction used for the relocation is
      committed and the extent gets reallocated while send is still using the
      extent, so it ends up with a different content than expected. This can
      result in just failing to read a metadata extent due to failure of the
      validation checks (parent transid, level, etc), failure to find a
      backreference for a data extent, and other unexpected failures. Besides
      reallocation, there's also a similar problem of an extent getting
      discarded when it's unpinned after the transaction used for block group
      relocation is committed.
      
      The restriction between balance and send was added in commit 9e967495
      ("Btrfs: prevent send failures and crashes due to concurrent relocation"),
      kernel 5.3, while the more general restriction between send and relocation
      was added in commit 1cea5cf0 ("btrfs: ensure relocation never runs
      while we have send operations running"), kernel 5.14.
      
      Both send and relocation can be very long running operations. Relocation
      because it has to do a lot of IO and expensive backreference lookups in
      case there are many snapshots, and send due to read IO when operating on
      very large trees. This makes it inconvenient for users and tools to deal
      with scheduling both operations.
      
      For zoned filesystems we also have automatic block group relocation, so
      send can fail with -EAGAIN when users least expect it or send can end up
      delaying the block group relocation for too long. In the future we might
      also get automatic block group relocation for non-zoned filesystems.
      
      This change makes it possible for send and relocation to run in parallel.
      This is achieved the following way:
      
      1) For all tree searches, send acquires a read lock on the commit root
         semaphore;
      
      2) After each tree search, and before releasing the commit root semaphore,
         the leaf is cloned and placed in the search path (struct btrfs_path);
      
      3) After releasing the commit root semaphore, the changed_cb() callback
         is invoked, which operates on the leaf and writes commands to the pipe
         (or file in case send/receive is not used with a pipe). It's important
         here to not hold a lock on the commit root semaphore, because if we did
         we could deadlock when sending and receiving to the same filesystem
         using a pipe - the send task blocks on the pipe because it's full, the
         receive task, which is the only consumer of the pipe, triggers a
         transaction commit when attempting to create a subvolume or reserve
         space for a write operation for example, but the transaction commit
         blocks trying to write lock the commit root semaphore, resulting in a
         deadlock;
      
      4) Before moving to the next key, or advancing to the next change in case
         of an incremental send, check if a transaction used for relocation was
         committed (or is about to finish its commit). If so, release the search
         path(s) and restart the search, to where we were before, so that we
         don't operate on stale extent buffers. The search restarts are always
         possible because both the send and parent roots are RO, and no one can
         add, remove or update keys (change their offset) in RO trees - the
         only exception is deduplication, but that is still not allowed to run
         in parallel with send;
      
      5) Periodically check if there is contention on the commit root semaphore,
         which means there is a transaction commit trying to write lock it, and
         release the semaphore and reschedule if there is contention, so as to
         avoid causing any significant delays to transaction commits.
      
      This leaves some room for optimizations for send to have fewer path
      releases and less re-searching of the trees when there's relocation running, but
      for now it's kept simple as it performs quite well (on very large trees
      with resulting send streams in the order of a few hundred gigabytes).
      
      Test case btrfs/187, from fstests, stresses relocation, send and
      deduplication attempting to run in parallel, but without verifying if send
      succeeds and if it produces correct streams. A new test case will be added
      that exercises relocation happening in parallel with send and then checks
      that send succeeds and the resulting streams are correct.
      
      A final note is that for now this still leaves the mutual exclusion
      between send operations and deduplication on files belonging to a root
      used by send operations. A solution for that will be slightly more complex
      but it will eventually be built on top of this change.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP · 342783ba
      Thomas Zimmermann authored
      
      commit 3755d35e upstream.
      
      As reported in [1], DRM_PANEL_EDP depends on DRM_DP_HELPER. Select
      the option to fix the build failure. The error message is shown
      below.
      
        arm-linux-gnueabihf-ld: drivers/gpu/drm/panel/panel-edp.o: in function
          `panel_edp_probe': panel-edp.c:(.text+0xb74): undefined reference to
          `drm_panel_dp_aux_backlight'
        make[1]: *** [/builds/linux/Makefile:1222: vmlinux] Error 1
      
      The issue has been reported before, when DisplayPort helpers were
      hidden behind the option CONFIG_DRM_KMS_HELPER. [2]
      
      v2:
      	* fix and expand commit description (Arnd)
      
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Fixes: 9d6366e7 ("drm: fb_helper: improve CONFIG_FB dependency")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Link: https://lore.kernel.org/dri-devel/CA+G9fYvN0NyaVkRQmA1O6rX7H8PPaZrUAD7=RDy33QY9rUU-9g@mail.gmail.com/ # [1]
      Link: https://lore.kernel.org/all/20211117062704.14671-1-rdunlap@infradead.org/ # [2]
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: dri-devel@lists.freedesktop.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20220203093922.20754-1-tzimmermann@suse.de
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/traps: Mark do_int3() NOKPROBE_SYMBOL · 1fbafa9a
      Li Huafei authored
      
      commit a365a65f9ca1ceb9cf1ac29db4a4f51df7c507ad upstream.
      
      Since kprobe_int3_handler() is called in do_int3(), probing do_int3()
      can cause a breakpoint recursion and crash the kernel. Therefore,
      do_int3() should be marked as NOKPROBE_SYMBOL.
      
      Fixes: 21e28290 ("x86/traps: Split int3 handler up")
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/sgx: Free backing memory after faulting the enclave page · ce91f0f0
      Jarkko Sakkinen authored
      
      commit 08999b2489b4c9b939d7483dbd03702ee4576d96 upstream.
      
      There is a limited amount of SGX memory (EPC) on each system.  When that
      memory is used up, SGX has its own swapping mechanism which is similar
      in concept but totally separate from the core mm/* code.  Instead of
      swapping to disk, SGX swaps from EPC to normal RAM.  That normal RAM
      comes from a shared memory pseudo-file and can itself be swapped by the
      core mm code.  There is a hierarchy like this:
      
      	EPC <-> shmem <-> disk
      
      After data is swapped back in from shmem to EPC, the shmem backing
      storage needs to be freed.  Currently, the backing shmem is not freed.
      This effectively wastes the shmem while the enclave is running.  The
      memory is recovered when the enclave is destroyed and the backing
      storage freed.
      
      Sort this out by freeing memory with shmem_truncate_range(), as soon as
      a page is faulted back to the EPC.  In addition, free the memory for
      PCMD pages as soon as all PCMD's in a page have been marked as unused
      by zeroing its contents.
      
      Cc: stable@vger.kernel.org
      Fixes: 1728ab54 ("x86/sgx: Add a page reclaimer")
      Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/boot: Add setup_indirect support in early_memremap_is_setup_data() · e946556d
      Ross Philipson authored
      
      commit 445c1470 upstream.
      
      The x86 boot documentation describes the setup_indirect structures and
      how they are used. Only one of the two functions in ioremap.c that needed
      to be modified to be aware of the introduction of setup_indirect
      functionality was updated. Add comparable support to the other function
      where it was missing.
      
      Fixes: b3c72fc9 ("x86/boot: Introduce setup_indirect")
      Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/1645668456-22036-3-git-send-email-ross.philipson@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/boot: Fix memremap of setup_indirect structures · 19503d38
      Ross Philipson authored
      
      commit 7228918b upstream.
      
      As documented, the setup_indirect structure is nested inside
      the setup_data structures in the setup_data list. The code currently
      accesses the fields inside the setup_indirect structure but only
      the sizeof(struct setup_data) is being memremapped. No crash
      occurred but this is just due to how the area is remapped under the
      covers.
      
      Properly memremap both the setup_data and setup_indirect structures
      in these cases before accessing them.
      
      Fixes: b3c72fc9 ("x86/boot: Introduce setup_indirect")
      Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/1645668456-22036-2-git-send-email-ross.philipson@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Make comment about setting ->defunct more accurate · ffb8fd39
      David Howells authored
      
      commit 4edc0760412b0c4ecefc7e02cb855b310b122825 upstream.
      
      watch_queue_clear() has a comment stating that setting ->defunct to true
      prevents new additions as well as preventing notifications.  Whilst
      the latter is true, the first bit is superfluous since at the time this
      function is called, the pipe cannot be accessed to add new event
      sources.
      
      Remove the "new additions" bit from the comment.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix lack of barrier/sync/lock between post and read · eb38c2e9
      David Howells authored
      
      commit 2ed147f0 upstream.
      
      There's nothing to synchronise post_one_notification() versus
      pipe_read().  Whilst posting is done under pipe->rd_wait.lock, the
      reader only takes pipe->mutex which cannot bar notification posting as
      that may need to be made from contexts that cannot sleep.
      
      Fix this by setting pipe->head with a barrier in post_one_notification()
      and reading pipe->head with a barrier in pipe_read().
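The pairing of a barrier on the write of the head with a barrier on the read of the head can be sketched in portable C11. This is a userspace illustration only: it uses C11 release/acquire atomics rather than the kernel's own barrier primitives, `struct ring`, `ring_post()` and `ring_read()` are hypothetical names, and handling of a full ring is left out.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 8u /* must be a power of two */

struct ring {
    int slots[RING_SIZE];
    _Atomic unsigned int head; /* advanced by the poster */
    unsigned int tail;         /* owned by the reader */
};

/* Poster: fill the slot first, then publish it by advancing head. */
void ring_post(struct ring *r, int note)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);

    r->slots[head & (RING_SIZE - 1)] = note;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
}

/* Reader: returns 1 and stores a note in *out, or 0 if the ring is empty. */
int ring_read(struct ring *r, int *out)
{
    /* Acquire: pairs with the release store in ring_post(). */
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == r->tail)
        return 0;
    *out = r->slots[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return 1;
}

/* Single-threaded check of the basic post/read behaviour. */
void ring_self_check(void)
{
    struct ring r;
    int v = 0;

    r.tail = 0;
    atomic_init(&r.head, 0);
    assert(!ring_read(&r, &v));
    ring_post(&r, 42);
    assert(ring_read(&r, &v) && v == 42);
    assert(!ring_read(&r, &v));
}
```

Without the release/acquire pair, a reader could observe the advanced head before the slot contents, which is the kind of ordering gap described above.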
      
      If that's not sufficient, the rd_wait.lock will need to be taken,
      possibly in a ->confirm() op so that it only applies to notifications.
      The lock would, however, have to be dropped before copy_page_to_iter()
      is invoked.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Free the alloc bitmap when the watch_queue is torn down · 82ff8a22
      David Howells authored
      
      commit 7ea1a0124b6da246b5bc8c66cddaafd36acf3ecb upstream.
      
      Free the watch_queue note allocation bitmap when the watch_queue is
      destroyed.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix the alloc bitmap size to reflect notes allocated · d453d0e5
      David Howells authored
      
      commit 3b4c0371928c17af03e8397ac842346624017ce6 upstream.
      
      Currently, watch_queue_set_size() sets the number of notes available in
      wqueue->nr_notes according to the number of notes allocated, but sets
      the size of the bitmap to the unrounded number of notes originally asked
      for.
      
      Fix this by setting the bitmap size to the number of notes we're
      actually going to make available (ie. the number allocated).
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix to always request a pow-of-2 pipe ring size · b022b6a0
      David Howells authored
      
      commit 96a4d8912b28451cd62825fd7caa0e66e091d938 upstream.
      
      The pipe ring size must always be a power of 2 as the head and tail
      pointers are masked off by AND'ing with the size of the ring - 1.
      watch_queue_set_size(), however, lets you specify any number of notes
      between 1 and 511.  This number is passed through to pipe_resize_ring()
      without checking/forcing its alignment.
      
      Fix this by rounding the number of slots required up to the nearest
      power of two.  The request is meant to guarantee that at least that many
      notifications can be generated before the queue is full, so rounding
      down isn't an option, but, alternatively, it may be better to give an
      error if we aren't allowed to allocate that much ring space.
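Why the ring size has to be a power of two, and what rounding up does to a request, can be shown with a short userspace sketch. The helper names are hypothetical and the rounding loop is a plain illustration, not the kernel's implementation.

```c
#include <assert.h>

/* Round n up to the nearest power of two; assumes 1 <= n <= 2^31. */
unsigned int round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Ring slot for a position; only correct when ring_size is a power of two. */
unsigned int ring_index(unsigned int pos, unsigned int ring_size)
{
    return pos & (ring_size - 1);
}

void pow2_self_check(void)
{
    /* Requests are rounded up, never down. */
    assert(round_up_pow2(1) == 1);
    assert(round_up_pow2(5) == 8);
    assert(round_up_pow2(511) == 512);
    assert(round_up_pow2(512) == 512);

    /* With size 8 the mask is 0b111 and positions wrap as expected. */
    assert(ring_index(9, 8) == 1);

    /* With size 6 the mask is 0b101: position 2 collides with slot 0. */
    assert(ring_index(2, 6) == 0);
    assert(ring_index(3, 6) == 1);
}
```

With a non-power-of-two size such as 6, slots 2 and 3 are never selected and different positions alias the same slot, which is why the size cannot be passed through unchecked.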
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix to release page in ->release() · ccd03c30
      David Howells authored
      
      commit c1853fbadcba1497f4907971e7107888e0714c81 upstream.
      
      When a pipe ring descriptor points to a notification message, the
      refcount on the backing page is incremented by the generic get function,
      but the release function, which marks the bitmap, doesn't drop the page
      ref.
      
      Fix this by calling generic_pipe_buf_release() at the end of
      watch_queue_pipe_buf_release().
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue, pipe: Free watchqueue state after clearing pipe ring · 8275b669
      David Howells authored
      
      commit db8facfc9fafacefe8a835416a6b77c838088f8b upstream.
      
      In free_pipe_info(), free the watchqueue state after clearing the pipe
      ring as each pipe ring descriptor has a release function, and in the
      case of a notification message, this is watch_queue_pipe_buf_release()
      which tries to mark the allocation bitmap that was previously released.
      
      Fix this by moving the put of the pipe's ref on the watch queue to after
      the ring has been cleared.  We still need to call watch_queue_clear()
      before doing that to make sure that the pipe is disconnected from any
      notification sources first.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix filter limit check · 1b09f28f
      David Howells authored
      
      commit c993ee0f9f81caf5767a50d1faeba39a0dc82af2 upstream.
      
      In watch_queue_set_filter(), there are a couple of places where we check
      that the filter type value does not exceed what the type_filter bitmap
      can hold.  One place calculates the number of bits by:
      
         if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
      
      which is fine, but the second does:
      
         if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
      
      which is not.  This can lead to a couple of out-of-bounds writes due to
      a too-large type:
      
       (1) __set_bit() on wfilter->type_filter
       (2) Writing more elements in wfilter->filters[] than we allocated.
      
      Fix this by just using the proper WATCH_TYPE__NR instead, which is the
      number of types we actually know about.
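The difference between the two limits can be made concrete with a userspace sketch. The bitmap layout, `TYPE_NR` and the function names below are hypothetical; they only mirror the arithmetic quoted above, not the kernel's actual structure definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TYPE_NR 2u /* hypothetical count of known types */

/* Hypothetical bitmap: a single unsigned long. */
typedef unsigned long type_filter_t[1];

/* Correct capacity in bits: bytes times 8. */
size_t limit_in_bits(void)
{
    return sizeof(type_filter_t) * 8;
}

/* Buggy capacity: bytes times bits-per-long overstates the bitmap size. */
size_t limit_buggy(void)
{
    return sizeof(type_filter_t) * BITS_PER_LONG;
}

/* The fix described above: compare against the number of known types. */
int type_is_valid(unsigned int type)
{
    return type < TYPE_NR;
}

void filter_self_check(void)
{
    size_t first_bad = limit_in_bits();

    /* The buggy limit is several times larger than the real capacity. */
    assert(limit_buggy() == limit_in_bits() * (BITS_PER_LONG / 8));
    assert(limit_buggy() > limit_in_bits());

    /* A type just past the bitmap slips through the buggy check. */
    assert(first_bad < limit_buggy());
    assert(!type_is_valid((unsigned int)first_bad));
}
```

On a typical 64-bit build this is 64 bits of real capacity against a buggy limit of 512, so any type value from 64 to 511 would pass the second check and then index past the bitmap.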
      
      The bug may cause an oops looking something like:
      
        BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740
        Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611
        ...
        Call Trace:
         <TASK>
         dump_stack_lvl+0x45/0x59
         print_address_description.constprop.0+0x1f/0x150
         ...
         kasan_report.cold+0x7f/0x11b
         ...
         watch_queue_set_filter+0x659/0x740
         ...
         __x64_sys_ioctl+0x127/0x190
         do_syscall_64+0x43/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
        Allocated by task 611:
         kasan_save_stack+0x1e/0x40
         __kasan_kmalloc+0x81/0xa0
         watch_queue_set_filter+0x23a/0x740
         __x64_sys_ioctl+0x127/0x190
         do_syscall_64+0x43/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
        The buggy address belongs to the object at ffff88800d2c66a0
         which belongs to the cache kmalloc-32 of size 32
        The buggy address is located 28 bytes inside of
         32-byte region [ffff88800d2c66a0, ffff88800d2c66c0)
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: fix Thumb2 regression with Spectre BHB · 52445030
      Russell King (Oracle) authored
      
      commit 6c7cb60bff7aec24b834343ff433125f469886a3 upstream.
      
      When building for Thumb2, the vectors make use of a local label. Sadly,
      the Spectre BHB code also uses a local label with the same number which
      results in the Thumb2 reference pointing at the wrong place. Fix this
      by changing the number used for the Spectre BHB local label.
      
      Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE · 4a8e7f9d
      Dima Chumak authored
      
      commit 39bab83b119faac4bf7f07173a42ed35be95147e upstream.
      
      Only prio 1 is supported for nic mode when there is no ignore flow level
      support in firmware. But for switchdev mode, which supports fixed number
      of statically pre-allocated prios, this restriction is not relevant so
      it can be relaxed.
      
      Fixes: d671e109 ("net/mlx5: Fix tc max supported prio for nic mode")
      Signed-off-by: Dima Chumak <dchumak@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio: acknowledge all features before access · cbb726e6
      Michael S. Tsirkin authored
      
      commit 4fa59ede95195f267101a1b8916992cf3f245cdb upstream.
      
      The feature negotiation was designed in a way that
      makes it possible for devices to know which config
      fields will be accessed by drivers.
      
      This is broken since commit 404123c2 ("virtio: allow drivers to
      validate features") with fallout in at least block and net.  We have a
      partial work-around in commit 2f9a174f ("virtio: write back
      F_VERSION_1 before validate") which at least lets devices find out which
      format config space should have, but this is only a partial fix: guests
      should not access config space without acknowledging features, since
      otherwise we'll never be able to change the config space format.
      
      To fix, split finalize_features from virtio_finalize_features and
      call finalize_features with all feature bits before validation,
      and then - if validation changed any bits - once again after.
      
      Since virtio_finalize_features no longer writes out features
      rename it to virtio_features_ok - since that is what it does:
      checks that features are ok with the device.
      
      As a side effect, this also reduces the amount of hypervisor accesses -
      we now only acknowledge features once unless we are clearing any
      features when validating (which is uncommon).
      
      IIRC this was more or less always the intent in the spec, but
      unfortunately the way the spec is worded does not say this explicitly.
      I plan to address this at the spec level, too.
      
      Acked-by: Jason Wang <jasowang@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 404123c2 ("virtio: allow drivers to validate features")
      Fixes: 2f9a174f ("virtio: write back F_VERSION_1 before validate")
      Cc: "Halil Pasic" <pasic@linux.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbb726e6
    • Michael S. Tsirkin's avatar
      virtio: unexport virtio_finalize_features · 22823b1a
      Michael S. Tsirkin authored
      
      commit 838d6d3461db0fdbf33fc5f8a69c27b50b4a46da upstream.
      
      virtio_finalize_features is only used internally within virtio.
      No reason to export it.
      
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      22823b1a
    • Andrei Vagin's avatar
      KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned · a633bc01
      Andrei Vagin authored
      
      commit a7cc099f upstream.
      
      This looks like a typo in 8f32d5e5. That change was not intended to
      make any functional changes.
      
      The problem was caught by gVisor tests.
      
      Fixes: 8f32d5e5 ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Message-Id: <20211015163221.472508-1-avagin@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a633bc01
    • Halil Pasic's avatar
      swiotlb: rework "fix info leak with DMA_FROM_DEVICE" · 2c1f97af
      Halil Pasic authored
      
      commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 upstream.
      
      Unfortunately, we ended up merging an old version of the patch "fix info
      leak with DMA_FROM_DEVICE" instead of merging the latest one. After I
      pointed out the mix-up and asked for guidance, Christoph (the swiotlb
      maintainer) asked me to create an incremental fix. So here we go.
      
      The main differences between what we got and what was agreed are:
      * swiotlb_sync_single_for_device is also required to do an extra bounce
      * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
      * The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
        must take precedence over DMA_ATTR_SKIP_CPU_SYNC

      Thus this patch removes DMA_ATTR_OVERWRITE, and makes
      swiotlb_sync_single_for_device() bounce unconditionally (that is, also
      when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
      data from the swiotlb buffer.
      
      Let me note, that if the size used with dma_sync_* API is less than the
      size used with dma_[un]map_*, under certain circumstances we may still
      end up with swiotlb not being transparent. In that sense, this is no
      perfect fix either.
      
      To get this bullet proof, we would have to bounce the entire
      mapping/bounce buffer. For that we would have to figure out the starting
      address, and the size of the mapping in
      swiotlb_sync_single_for_device(). While this does seem possible, there
      seems to be no firm consensus on how things are supposed to work.
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c1f97af
    • Paul Semel's avatar
      arm64: kasan: fix include error in MTE functions · 0349c79c
      Paul Semel authored
      
      commit b859ebedd1e730bbda69142fca87af4e712649a1 upstream.
      
      Fix `error: expected string literal in 'asm'`.
      This happens when compiling an ebpf object file that includes
      `net/net_namespace.h` from linux kernel headers.
      
      Include trace:
           include/net/net_namespace.h:10
           include/linux/workqueue.h:9
           include/linux/timer.h:8
           include/linux/debugobjects.h:6
           include/linux/spinlock.h:90
           include/linux/workqueue.h:9
           arch/arm64/include/asm/spinlock.h:9
           arch/arm64/include/generated/asm/qrwlock.h:1
           include/asm-generic/qrwlock.h:14
           arch/arm64/include/asm/processor.h:33
           arch/arm64/include/asm/kasan.h:9
           arch/arm64/include/asm/mte-kasan.h:45
           arch/arm64/include/asm/mte-def.h:14
      
      Signed-off-by: Paul Semel <paul.semel@datadoghq.com>
      Fixes: 2cb34276 ("arm64: kasan: simplify and inline MTE functions")
      Cc: <stable@vger.kernel.org> # 5.12.x
      Link: https://lore.kernel.org/r/bacb5387-2992-97e4-0c48-1ed925905bee@gmail.com

      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0349c79c
    • Catalin Marinas's avatar
      arm64: Ensure execute-only permissions are not allowed without EPAN · 61d32def
      Catalin Marinas authored
      
      commit 6e2edd6371a497a6350bb735534c9bda2a31f43d upstream.
      
      Commit 18107f8a ("arm64: Support execute-only permissions with
      Enhanced PAN") re-introduced execute-only permissions when EPAN is
      available. When EPAN is not available, arch_filter_pgprot() is supposed
      to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
      if BTI or MTE are present, such check does not detect the execute-only
      pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
      allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.
      
      Remove the arch_filter_pgprot() function, change the default VM_EXEC
      permissions to PAGE_READONLY_EXEC and update the protection_map[] array
      at core_initcall() if EPAN is detected.
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Fixes: 18107f8a ("arm64: Support execute-only permissions with Enhanced PAN")
      Cc: <stable@vger.kernel.org> # 5.13.x
      Acked-by: Will Deacon <will@kernel.org>
      Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      61d32def
    • Pali Rohár's avatar
      arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0 · 72ea28d8
      Pali Rohár authored
      
      commit a1cc1697 upstream.
      
      Legacy and old PCI I/O based cards do not support 32-bit I/O addressing.
      
      Since commit 64f160e1 ("PCI: aardvark: Configure PCIe resources from
      'ranges' DT property") the kernel can, for a single A37xx address
      mapping, set a PCIe bus address that differs from the CPU address. This
      needs no firmware support as long as the bus address does not conflict
      with another A37xx mapping.
      
      So remap I/O space to the bus address 0x0 to enable support for old legacy
      I/O port based cards which have hardcoded I/O ports in low address space.
      
      Note that DDR on A37xx is mapped to bus address 0x0. And mapping of I/O
      space can be set to address 0x0 too because MEM space and I/O space are
      separate and so do not conflict.
      
      Remapping IO space on Turris Mox to a different address is not possible
      due to a bootloader bug.
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 76f6386b ("arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700")
      Cc: stable@vger.kernel.org # 64f160e1 ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property")
      Cc: stable@vger.kernel.org # 514ef1e6 ("arm64: dts: marvell: armada-37xx: Extend PCIe MEM space")
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      72ea28d8
    • Nicolas Saenz Julienne's avatar
      tracing/osnoise: Force quiescent states while tracing · 42aaf726
      Nicolas Saenz Julienne authored
      commit caf4c86bf136845982c5103b2661751b40c474c0 upstream.
      
      At the moment running osnoise on a nohz_full CPU or uncontested FIFO
      priority and a PREEMPT_RCU kernel might have the side effect of
      extending grace periods too much. This will entice RCU to force a
      context switch on the wayward CPU to end the grace period, all while
      introducing unwarranted noise into the tracer. This behaviour is
      unavoidable as overly extending grace periods might exhaust the system's
      memory.
      
      This is exactly the problem extended quiescent states (EQS) were
      created for, and rcu_momentary_dyntick_idle() emulates one by
      performing a zero duration EQS. So let's make use of it.
      
      In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
      atomically incrementing a local per-CPU counter and doing a store. So it
      shouldn't affect osnoise's measurements (which have a 1us granularity),
      and we'll call it unconditionally.

      The uncommon case involves calling rcu_momentary_dyntick_idle() after
      having the osnoise process:
      
       - Receive an expedited quiescent state IPI with preemption disabled or
         during an RCU critical section. (activates rdp->cpu_no_qs.b.exp
         code-path).

       - Be preempted within an RCU critical section and have the
         subsequent outermost rcu_read_unlock() called with interrupts
         disabled. (t->rcu_read_unlock_special.b.blocked code-path).
      
      Neither of those are possible at the moment, and are unlikely to be in
      the future given the osnoise's loop design. On top of this, the noise
      generated by the situations described above is unavoidable, and if not
      exposed by rcu_momentary_dyntick_idle() will be eventually seen in
      subsequent rcu_read_unlock() calls or schedule operations.
      
      Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com
      
      
      
      Cc: stable@vger.kernel.org
      Fixes: bce29ac9 ("trace: Add osnoise tracer")
      Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      42aaf726
    • Emil Renner Berthing's avatar
      riscv: Fix auipc+jalr relocation range checks · eae073d8
      Emil Renner Berthing authored
      
      commit 0966d385 upstream.
      
      RISC-V can do PC-relative jumps with a 32bit range using the following
      two instructions:
      
      	auipc	t0, imm20	; t0 = PC + imm20 * 2^12
      	jalr	ra, t0, imm12	; ra = PC + 4, PC = t0 + imm12
      
      Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
      are treated as two's-complement signed values. For this reason the
      immediates are usually calculated like this:
      
      	imm20 = (offset + 0x800) >> 12
      	imm12 = offset & 0xfff
      
      ..where offset is the signed offset from the auipc instruction. When
      the 11th bit of offset is 0 the addition of 0x800 doesn't change the top
      20 bits and imm12 is considered positive. When the 11th bit is 1 the
      carry of the addition by 0x800 means imm20 is one higher, but since
      imm12 is then considered negative the two's complement representation
      means it all cancels out nicely.

      However, this addition by 0x800 (2^11) means an offset greater than or
      equal to 2^31 - 2^11 would overflow, so imm20 would be considered
      negative and result in a backwards jump. Similarly the lower range of
      offset is also moved down by 2^11 and hence the true 32bit range is
      
      	[-2^31 - 2^11, 2^31 - 2^11)
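
      The immediate split and the range above can be checked with a small
      userspace sketch. This is not the kernel's relocation code: the helper
      names offset_in_range(), imm12_of() and imm20_of() are made up for
      illustration, and the offset is assumed to be held in a 64-bit signed
      integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if offset lies in [-2^31 - 2^11, 2^31 - 2^11). */
static bool offset_in_range(int64_t offset)
{
	const int64_t half = INT64_C(1) << 31;
	const int64_t bias = INT64_C(1) << 11;

	return offset >= -half - bias && offset < half - bias;
}

/* Low 12 bits of offset, sign-extended the way jalr interprets them. */
static int32_t imm12_of(int64_t offset)
{
	int32_t lo = (int32_t)(offset & 0xfff);

	return lo >= 0x800 ? lo - 0x1000 : lo;
}

/*
 * Upper immediate for auipc. Subtracting the signed imm12 and dividing
 * gives the same value as (offset + 0x800) >> 12 with an arithmetic
 * shift, without relying on right-shifting a negative number.
 */
static int32_t imm20_of(int64_t offset)
{
	assert(offset_in_range(offset));
	return (int32_t)((offset - imm12_of(offset)) / 4096);
}
```

      For example, an offset of 0x800 splits into imm20 = 1 and
      imm12 = -0x800, and 2^31 - 2^11 is the first positive offset that no
      longer fits.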
      
      Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
      Fixes: e2c0cdfb ("RISC-V: User-facing API")
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eae073d8
    • Rong Chen's avatar
      mmc: meson: Fix usage of meson_mmc_post_req() · b515552d
      Rong Chen authored
      
      commit f0d2f15362f02444c5d7ffd5a5eb03e4aa54b685 upstream.
      
      Currently meson_mmc_post_req() is called in meson_mmc_request() right
      after meson_mmc_start_cmd(). This could lead to DMA unmapping before the request
      is actually finished.
      
      To fix, don't call meson_mmc_post_req() until meson_mmc_request_done().
      
      Signed-off-by: Rong Chen <rong.chen@amlogic.com>
      Reviewed-by: Kevin Hilman <khilman@baylibre.com>
      Fixes: 79ed05e3 ("mmc: meson-gx: add support for descriptor chain mode")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220216124239.4007667-1-rong.chen@amlogic.com

      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b515552d
    • Jisheng Zhang's avatar
      riscv: alternative only works on !XIP_KERNEL · 9b3cdf5e
      Jisheng Zhang authored
      
      commit c80ee64a upstream.
      
      The alternative mechanism needs runtime code patching, so it can't work
      on XIP_KERNEL. The errata workarounds are implemented via the
      alternative mechanism as well, so add a !XIP_KERNEL dependency for
      both the alternative mechanism and the errata.
      
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Fixes: 44c92257 ("RISC-V: enable XIP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b3cdf5e
    • Robert Hancock's avatar
      net: macb: Fix lost RX packet wakeup race in NAPI receive · b5e79218
      Robert Hancock authored
      
      commit 0bf476fc3624e3a72af4ba7340d430a91c18cd67 upstream.
      
      There is an oddity in the way the RSR register flags propagate to the
      ISR register (and the actual interrupt output) on this hardware: it
      appears that RSR register bits only result in ISR being asserted if the
      interrupt was actually enabled at the time, so enabling interrupts with
      RSR bits already set doesn't trigger an interrupt to be raised. There
      was already a partial fix for this race in the macb_poll function where
      it checked for RSR bits being set and re-triggered NAPI receive.
      However, there was still a race window between checking RSR and
      actually enabling interrupts, where a lost wakeup could happen. It's
      necessary to check again after enabling interrupts to see if RSR was set
      just prior to the interrupt being enabled, and re-trigger receive in that
      case.
      
      This issue was noticed in a point-to-point UDP request-response protocol
      which periodically saw timeouts or abnormally high response times due to
      received packets not being processed in a timely fashion. In many
      applications, more packets arriving, including TCP retransmissions, would
      cause the original packet to be processed, thus masking the issue.
      
      Fixes: 02f7a34f ("net: macb: Re-enable RX interrupt only when RX is done")
      Cc: stable@vger.kernel.org
      Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
      Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5e79218
    • Dan Carpenter's avatar
      staging: gdm724x: fix use after free in gdm_lte_rx() · 1fb9dd37
      Dan Carpenter authored
      
      commit fc7f750dc9d102c1ed7bbe4591f991e770c99033 upstream.
      
      The netif_rx_ni() function frees the skb so we can't dereference it to
      save the skb->len.
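
      The ordering problem can be illustrated with a userspace toy model. It
      is only a sketch of the pattern, not the driver code: toy_skb,
      toy_stats, toy_netif_rx() and the other names are hypothetical
      stand-ins, with free() playing the role of the consuming receive call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the buffer and the interface statistics. */
struct toy_skb { unsigned int len; };
struct toy_stats { unsigned long rx_packets, rx_bytes; };

static struct toy_skb *toy_alloc(unsigned int len)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	return skb;
}

/* Stand-in for the receive call: takes ownership and frees the buffer. */
static int toy_netif_rx(struct toy_skb *skb)
{
	free(skb);
	return 0;
}

static void toy_rx(struct toy_stats *stats, struct toy_skb *skb)
{
	/* Read the length while the buffer is still owned by the caller. */
	unsigned int len = skb->len;

	if (toy_netif_rx(skb) == 0) {
		stats->rx_packets++;
		stats->rx_bytes += len;	/* skb must not be touched here */
	}
}
```

      Saving the length in a local before handing the buffer over keeps the
      statistics update free of any access to freed memory.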
      
      Fixes: 61e12104 ("staging: gdm7240: adding LTE USB driver")
      Cc: stable <stable@vger.kernel.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20220228074331.GA13685@kili

      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fb9dd37
    • Hans de Goede's avatar
      staging: rtl8723bs: Fix access-point mode deadlock · 441bc1e3
      Hans de Goede authored
      
      commit 8f4347081be32e67b0873827e0138ab0fdaaf450 upstream.
      
      Commit 54659ca0 ("staging: rtl8723bs: remove possible deadlock when
      disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
      into 2 locks in an attempt to fix a lockdep reported issue with the locking
      order of the sta_hash_lock vs pxmitpriv->lock.
      
      But in the end this turned out to not fully solve the sta_hash_lock issue
      so commit a7ac783c ("staging: rtl8723bs: remove a second possible
      deadlock") was added to fix this in another way.
      
      The original fix was kept as it was still seen as a good thing to have,
      but now it turns out that it creates a deadlock in access-point mode:
      
      [Feb20 23:47] ======================================================
      [  +0.074085] WARNING: possible circular locking dependency detected
      [  +0.074077] 5.16.0-1-amd64 #1 Tainted: G         C  E
      [  +0.064710] ------------------------------------------------------
      [  +0.074075] ksoftirqd/3/29 is trying to acquire lock:
      [  +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
      [  +0.114921]
                    but task is already holding lock:
      [  +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
      [  +0.116976]
                    which lock already depends on the new lock.
      
      [  +0.098037]
                    the existing dependency chain (in reverse order) is:
      [  +0.089704]
                    -> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
      [  +0.077232]        _raw_spin_lock_bh+0x34/0x40
      [  +0.053261]        xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
      [  +0.082572]        rtw_xmit+0x58b/0x940 [r8723bs]
      [  +0.056528]        _rtw_xmit_entry+0xba/0x350 [r8723bs]
      [  +0.062755]        dev_hard_start_xmit+0xf1/0x320
      [  +0.056381]        sch_direct_xmit+0x9e/0x360
      [  +0.052212]        __dev_queue_xmit+0xce4/0x1080
      [  +0.055334]        ip6_finish_output2+0x18f/0x6e0
      [  +0.056378]        ndisc_send_skb+0x2c8/0x870
      [  +0.052209]        ndisc_send_ns+0xd3/0x210
      [  +0.050130]        addrconf_dad_work+0x3df/0x5a0
      [  +0.055338]        process_one_work+0x274/0x5a0
      [  +0.054296]        worker_thread+0x52/0x3b0
      [  +0.050124]        kthread+0x16c/0x1a0
      [  +0.044925]        ret_from_fork+0x1f/0x30
      [  +0.049092]
                    -> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
      [  +0.074101]        __lock_acquire+0x10f5/0x1d80
      [  +0.054298]        lock_acquire+0xd7/0x300
      [  +0.049088]        _raw_spin_lock_bh+0x34/0x40
      [  +0.053248]        rtw_xmit_classifier+0x8a/0x140 [r8723bs]
      [  +0.066949]        rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
      [  +0.066946]        rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
      [  +0.078386]        wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
      [  +0.065903]        rtw_recv_entry+0xe36/0x1160 [r8723bs]
      [  +0.063809]        rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
      [  +0.071093]        tasklet_action_common.constprop.0+0xe5/0x110
      [  +0.070966]        __do_softirq+0x16f/0x50a
      [  +0.050134]        __irq_exit_rcu+0xeb/0x140
      [  +0.051172]        irq_exit_rcu+0xa/0x20
      [  +0.047006]        common_interrupt+0xb8/0xd0
      [  +0.052214]        asm_common_interrupt+0x1e/0x40
      [  +0.056381]        finish_task_switch.isra.0+0x100/0x3a0
      [  +0.063670]        __schedule+0x3ad/0xd20
      [  +0.048047]        schedule+0x4e/0xc0
      [  +0.043880]        smpboot_thread_fn+0xc4/0x220
      [  +0.054298]        kthread+0x16c/0x1a0
      [  +0.044922]        ret_from_fork+0x1f/0x30
      [  +0.049088]
                    other info that might help us debug this:
      
      [  +0.095950]  Possible unsafe locking scenario:
      
      [  +0.070952]        CPU0                    CPU1
      [  +0.054282]        ----                    ----
      [  +0.054285]   lock(&psta->sleep_q.lock);
      [  +0.047004]                                lock(&pxmitpriv->lock);
      [  +0.074082]                                lock(&psta->sleep_q.lock);
      [  +0.077209]   lock(&pxmitpriv->lock);
      [  +0.043873]
                     *** DEADLOCK ***
      
      [  +0.070950] 1 lock held by ksoftirqd/3/29:
      [  +0.049082]  #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
      
      Analysis shows that in hindsight the splitting of the lock was not
      a good idea, so revert this to fix the access-point mode deadlock.
      
      Note this is a straight-forward revert done with git revert, the commented
      out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
      code before the reverted changes.
      
      Fixes: 54659ca0 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
      Cc: stable <stable@vger.kernel.org>
      Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
      Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com

      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      441bc1e3
    • Miklos Szeredi's avatar
      fuse: fix pipe buffer lifetime for direct_io · ca62747b
      Miklos Szeredi authored
      
      commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream.
      
      In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
      fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
      imports the write buffer with fuse_get_user_pages(), which uses
      iov_iter_get_pages() to grab references to userspace pages instead of
      actually copying memory.
      
      On the filesystem device side, these pages can then either be read to
      userspace (via fuse_dev_read()), or splice()d over into a pipe using
      fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
      
      This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
      the userspace filesystem can mark the request as completed, causing write()
      to return. At that point, the userspace filesystem should no longer have
      access to the pipe buffer.
      
      Fix by copying pages coming from the user address space to new pipe
      buffers.
      
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: c3021629 ("fuse: support splice() reading from fuse device")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca62747b
    • Miklos Szeredi's avatar
      fuse: fix fileattr op failure · d60d34b4
      Miklos Szeredi authored
      
      commit a679a61520d8a7b0211a1da990404daf5cc80b72 upstream.
      
      The fileattr API conversion broke lsattr on ntfs3g.
      
      Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but
      after the conversion the error returned by the fuse filesystem was not
      propagated back to the ioctl() system call, resulting in success being
      returned with bogus values.
      
      Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic
      ioctl code does.
      
      Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
      Fixes: 72227eac ("fuse: convert to fileattr")
      Cc: <stable@vger.kernel.org> # v5.13
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d60d34b4
    • Randy Dunlap's avatar
      ARM: Spectre-BHB: provide empty stub for non-config · 64147ce8
      Randy Dunlap authored
      
      commit 68453767131a5deec1e8f9ac92a9042f929e585d upstream.
      
      When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references
      to spectre_v2_update_state() cause a build error, so provide an
      empty stub for that function when the Kconfig option is not set.
      
      Fixes this build error:
      
        arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init':
        proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state'
        arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state'
      
      Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: patches@armlinux.org.uk
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64147ce8
    • Mike Kravetz's avatar
      selftests/memfd: clean up mapping in mfd_fail_write · 5c237251
      Mike Kravetz authored
      [ Upstream commit fda153c89af344d21df281009a9d046cf587ea0f ]
      
      Running the memfd script ./run_hugetlbfs_test.sh will often end in error
      as follows:
      
          memfd-hugetlb: CREATE
          memfd-hugetlb: BASIC
          memfd-hugetlb: SEAL-WRITE
          memfd-hugetlb: SEAL-FUTURE-WRITE
          memfd-hugetlb: SEAL-SHRINK
          fallocate(ALLOC) failed: No space left on device
          ./run_hugetlbfs_test.sh: line 60: 166855 Aborted                 (core dumped) ./memfd_test hugetlbfs
          opening: ./mnt/memfd
          fuse: DONE
      
      If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
      allocate 'just enough' pages to run the test.  In the SEAL-FUTURE-WRITE
      test the mfd_fail_write routine maps the file, but does not unmap.  As a
      result, two hugetlb pages remain reserved for the mapping.  When the
      fallocate call in the SEAL-SHRINK test attempts to allocate all hugetlb
      pages, it is short by the two reserved pages.
      
      Fix by making sure to unmap in mfd_fail_write.
      
      Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
      
      
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5c237251
    • Aneesh Kumar K.V's avatar
      selftest/vm: fix map_fixed_noreplace test failure · e22807ee
      Aneesh Kumar K.V authored
      [ Upstream commit f39c58008dee7ab5fc94c3f1995a21e886801df0 ]
      
      On the latest RHEL the test fails because the executable is mapped at
      the 256MB address:
      
           # ./map_fixed_noreplace
          mmap() @ 0x10000000-0x10050000 p=0xffffffffffffffff result=File exists
          10000000-10010000 r-xp 00000000 fd:04 34905657                           /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace
          10010000-10020000 r--p 00000000 fd:04 34905657                           /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace
          10020000-10030000 rw-p 00010000 fd:04 34905657                           /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace
          10029b90000-10029bc0000 rw-p 00000000 00:00 0                            [heap]
          7fffbb510000-7fffbb750000 r-xp 00000000 fd:04 24534                      /usr/lib64/libc.so.6
          7fffbb750000-7fffbb760000 r--p 00230000 fd:04 24534                      /usr/lib64/libc.so.6
          7fffbb760000-7fffbb770000 rw-p 00240000 fd:04 24534                      /usr/lib64/libc.so.6
          7fffbb780000-7fffbb7a0000 r--p 00000000 00:00 0                          [vvar]
          7fffbb7a0000-7fffbb7b0000 r-xp 00000000 00:00 0                          [vdso]
          7fffbb7b0000-7fffbb800000 r-xp 00000000 fd:04 24514                      /usr/lib64/ld64.so.2
          7fffbb800000-7fffbb810000 r--p 00040000 fd:04 24514                      /usr/lib64/ld64.so.2
          7fffbb810000-7fffbb820000 rw-p 00050000 fd:04 24514                      /usr/lib64/ld64.so.2
          7fffd93f0000-7fffd9420000 rw-p 00000000 00:00 0                          [stack]
          Error: couldn't map the space we need for the test
      
      Fix this by finding a free address using mmap instead of hardcoding
      BASE_ADDRESS.
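
      The idea of probing for a free range can be sketched as follows. This
      is a minimal illustration rather than the selftest itself: the helper
      names find_free_base() and map_noreplace() are hypothetical, and the
      fallback value for MAP_FIXED_NOREPLACE is the asm-generic one, assumed
      here for libc headers that do not define it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* asm-generic value, assumed */
#endif

/*
 * Let the kernel pick a free range of `size` bytes, release it again and
 * return its start address. Returns NULL on failure.
 */
static void *find_free_base(size_t size)
{
	void *addr;

	assert(size > 0);
	addr = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;
	if (munmap(addr, size) != 0)
		return NULL;
	return addr;
}

/* Map `size` bytes at `addr`, refusing to replace an existing mapping. */
static void *map_noreplace(void *addr, size_t size)
{
	return mmap(addr, size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
}
```

      The range returned by find_free_base() is only known to have been free
      at the time of the probe, which is sufficient for a single-threaded
      test but not a general guarantee.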
      
      Link: https://lkml.kernel.org/r/20220217083417.373823-1-aneesh.kumar@linux.ibm.com
      
      
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jann Horn <jannh@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e22807ee