  1. Mar 08, 2022
    • hamradio: fix macro redefine warning · ea2bc310
      Huang Pei authored
      
      commit 16517829 upstream.
      
      MIPS/IA64 define END as the assembly function-ending macro, which conflicts
      with the END definition in mkiss.c, so undefine it first.
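
As a rough illustration of the pattern described above, the following C sketch undefines a conflicting macro before defining a driver-local one. The function-like END(name) definition stands in for what an architecture header might provide and is an assumption, as is the helper kiss_frame_end(); the value 0300 is the conventional KISS/SLIP frame-end byte and is used here for illustration rather than copied from the patch.

```c
#include <assert.h>

/* Stand-in for an architecture header that defines END for assembly use
 * (hypothetical definition, for illustration only). */
#define END(name) .end name

/* Drop any earlier definition so the driver-local one does not trigger
 * a macro redefinition warning. */
#ifdef END
#undef END
#endif

/* Driver-local meaning: the byte that marks the end of a frame. */
#define END 0300

/* Hypothetical helper: returns the frame-end byte the macro now resolves to. */
static unsigned char kiss_frame_end(void)
{
	return END;
}
```

Without the #undef, the second #define would redefine END with a different replacement list, which is what the compiler warns about.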
      
      Reported-by: <lkp@intel.com>
      Signed-off-by: Huang Pei <huangpei@loongson.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea2bc310
    • KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots() · 8998aa67
      Like Xu authored
      
      commit c6c937d673aaa1d603f62f134e1ca9c173eeeed3 upstream.
      
      Just like on the optional mmu_alloc_direct_roots() path, once the shadow
      path reaches "r = -EIO" somewhere, the caller needs to know the actual
      state in order to enter error handling and avoid something worse.
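
The general shape of such a bug can be sketched in C as follows. This is a simplified illustration, not the KVM code: load_pdptrs(), alloc_roots_buggy() and alloc_roots_fixed() are hypothetical names, and the only point is that an error stored in r must be what the function returns.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a step that can fail partway through. */
static int load_pdptrs(int fail)
{
	return fail ? -EIO : 0;
}

/* Buggy shape: the error is recorded in r, but success is reported anyway. */
static int alloc_roots_buggy(int fail)
{
	int r = load_pdptrs(fail);

	if (r)
		goto out;
	/* ... further setup would go here ... */
out:
	return 0;
}

/* Fixed shape: whatever ended up in r is passed up to the caller. */
static int alloc_roots_fixed(int fail)
{
	int r = load_pdptrs(fail);

	if (r)
		goto out;
	/* ... further setup would go here ... */
out:
	return r;
}
```

With the buggy shape, a caller that checks the return value never enters its error handling, even though the failure was detected internally.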
      
      Fixes: 4a38162e ("KVM: MMU: load PDPTRs outside mmu_lock")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220301124941.48412-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8998aa67
    • proc: fix documentation and description of pagemap · 416e3a0e
      Yun Zhou authored
      commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream.
      
      Since bit 57 was exported for uffd-wp write-protected (commit
      fb8e37f3: "mm/pagemap: export uffd-wp protection information"),
      fixing it can reduce some unnecessary confusion.
      
      Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
      
      
      Fixes: fb8e37f3 ("mm/pagemap: export uffd-wp protection information")
      Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
      Cc: Florian Schmidt <florian.schmidt@nutanix.com>
      Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Colin Cross <ccross@google.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      416e3a0e
    • Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6" · 8b893496
      Jiri Bohac authored
      commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.
      
      This reverts commit b515d263.
      
      Commit b515d263 ("xfrm: xfrm_state_mtu
      should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
      calculation in ipsec transport mode, resulting in complete stalls of TCP
      connections. This happens when the (P)MTU is 1280 or slightly larger.
      
      The desired formula for the MSS is:
      MSS = (MTU - ESP_overhead) - IP header - TCP header
      
      However, the above commit clamps the (MTU - ESP_overhead) to a
      minimum of 1280, turning the formula into
      MSS = max(MTU - ESP_overhead, 1280) - IP header - TCP header
      
      With the (P)MTU near 1280, the calculated MSS is too large and the
      resulting TCP packets never make it to the destination because they
      are over the actual PMTU.
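
The two formulas can be compared with a small worked example in C. This is only a sketch under stated assumptions: the header sizes are the fixed IPv6 header (40 bytes) and a TCP header without options (20 bytes), the ESP overhead is treated as a flat per-packet byte count (real overhead depends on the cipher, padding and alignment), and the function names are made up for this illustration.

```c
#include <assert.h>

#define IPV6_MIN_MTU	1280
#define IPV6_HDR_LEN	40	/* fixed IPv6 header */
#define TCP_HDR_LEN	20	/* TCP header without options */

/* Desired formula: MSS = (MTU - ESP_overhead) - IP header - TCP header */
static int mss_desired(int mtu, int esp_overhead)
{
	assert(mtu > esp_overhead);
	return (mtu - esp_overhead) - IPV6_HDR_LEN - TCP_HDR_LEN;
}

/* Clamped formula: MSS = max(MTU - ESP_overhead, 1280) - IP header - TCP header */
static int mss_clamped(int mtu, int esp_overhead)
{
	int payload_mtu = mtu - esp_overhead;

	if (payload_mtu < IPV6_MIN_MTU)
		payload_mtu = IPV6_MIN_MTU;
	return payload_mtu - IPV6_HDR_LEN - TCP_HDR_LEN;
}

/* Size of a full-MSS segment once the headers and ESP overhead are added back. */
static int wire_size(int mss, int esp_overhead)
{
	return mss + TCP_HDR_LEN + IPV6_HDR_LEN + esp_overhead;
}
```

For example, with an MTU of 1280 and an assumed ESP overhead of 36 bytes, the desired formula gives an MSS of 1184 and a full segment of exactly 1280 bytes on the wire, while the clamped formula gives an MSS of 1220 and a 1316-byte packet that exceeds the path MTU.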
      
      The above commit also causes suboptimal double fragmentation in
      xfrm tunnel mode, as described in
      https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/
      
      
      
      The original problem the above commit was trying to fix is now fixed
      by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
      regression").
      
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b893496
    • btrfs: do not start relocation until in progress drops are done · 6599d5e8
      Josef Bacik authored
      
      commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.
      
      We hit a bug with a recovering relocation on mount for one of our file
      systems in production.  I reproduced this locally by injecting errors
      into snapshot delete with balance running at the same time.  This
      presented as an error while looking up an extent item:
      
        WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
        CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
        RIP: 0010:lookup_inline_extent_backref+0x647/0x680
        RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
        RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
        RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
        R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
        R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
        FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
        Call Trace:
         <TASK>
         insert_inline_extent_backref+0x46/0xd0
         __btrfs_inc_extent_ref.isra.0+0x5f/0x200
         ? btrfs_merge_delayed_refs+0x164/0x190
         __btrfs_run_delayed_refs+0x561/0xfa0
         ? btrfs_search_slot+0x7b4/0xb30
         ? btrfs_update_root+0x1a9/0x2c0
         btrfs_run_delayed_refs+0x73/0x1f0
         ? btrfs_update_root+0x1a9/0x2c0
         btrfs_commit_transaction+0x50/0xa50
         ? btrfs_update_reloc_root+0x122/0x220
         prepare_to_merge+0x29f/0x320
         relocate_block_group+0x2b8/0x550
         btrfs_relocate_block_group+0x1a6/0x350
         btrfs_relocate_chunk+0x27/0xe0
         btrfs_balance+0x777/0xe60
         balance_kthread+0x35/0x50
         ? btrfs_balance+0xe60/0xe60
         kthread+0x16b/0x190
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x22/0x30
         </TASK>
      
      Normally snapshot deletion and relocation are excluded from running at
      the same time by the fs_info->cleaner_mutex.  However if we had a
      pending balance waiting to get the ->cleaner_mutex, and a snapshot
      deletion was running, and then the box crashed, we would come up in a
      state where we have a half deleted snapshot.
      
      Again, in the normal case the snapshot deletion needs to complete before
      relocation can start, but in this case relocation could very well start
      before the snapshot deletion completes, as we simply add the root to the
      dead roots list and wait for the next time the cleaner runs to clean up
      the snapshot.
      
      Fix this by setting a bit on the fs_info if any DEAD_ROOT had a
      pending drop_progress key.  Such a key means we were in the middle of
      the drop operation, so balance waits until this bit is cleared before
      starting up again.

      If there are DEAD_ROOTs that don't have a drop_progress set then we're
      safe to start balance right away as we'll be properly protected by the
      cleaner_mutex.
      
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6599d5e8
    • btrfs: add missing run of delayed items after unlink during log replay · 4aef4c90
      Filipe Manana authored
      
      commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream.
      
      During log replay, whenever we need to check if a name (dentry) exists in
      a directory we do searches on the subvolume tree for inode references or
      directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
      keys as well, before kernel 5.17). However, when we unlink a name during
      log replay, through btrfs_unlink_inode(), we may not delete inode
      references and dir index keys from a subvolume tree and instead just add
      the deletions to the delayed inode's delayed items, which will only be
      run when we commit the transaction used for log replay. This means that
      after an unlink operation during log replay, if we attempt to search for
      the same name during log replay, we will not see that the name was already
      deleted, since the deletion is recorded only on the delayed items.
      
      We run delayed items after every unlink operation during log replay,
      except at unlink_old_inode_refs() and at add_inode_ref(). This was an
      oversight, as delayed items should be run after every unlink, for
      the reasons stated above.
      
      So fix those two cases.
      
      Fixes: 0d836392 ("Btrfs: fix mount failure after fsync due to hard link recreation")
      Fixes: 1f250e92 ("Btrfs: fix log replay failure after unlink and link combination")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aef4c90
    • btrfs: qgroup: fix deadlock between rescan worker and remove qgroup · 34146bba
      Sidong Yang authored
      
      commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.
      
      Commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
      qgroup rescan worker") by Kawasaki resolves a deadlock between quota
      disable and the qgroup rescan worker. A similar deadlock remains,
      however, involving enabling or disabling quota while creating or
      removing a qgroup. It can be reproduced with the simple script below.
      
      for i in {1..100}
      do
          btrfs quota enable /mnt &
          btrfs qgroup create 1/0 /mnt &
          btrfs qgroup destroy 1/0 /mnt &
          btrfs quota disable /mnt &
      done
      
      Here's why the deadlock happens:
      
      1) The quota rescan task is running.
      
      2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
         mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
         the quota rescan task to complete.
      
      3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
         the qgroup_ioctl_lock mutex, because it's being held by task A. At that
         point task B is holding a transaction handle for the current transaction.
      
      4) The quota rescan task calls btrfs_commit_transaction(). This results
         in it waiting for all other tasks to release their handles on the
         transaction, but task B is blocked on the qgroup_ioctl_lock mutex
         while holding a handle on the transaction, and that mutex is being held
         by task A, which is waiting for the quota rescan task to complete,
         resulting in a deadlock between these 3 tasks.
      
      To resolve this issue, the thread disabling quota should unlock
      qgroup_ioctl_lock before waiting for rescan completion. Move
      btrfs_qgroup_wait_for_completion() after the unlock of qgroup_ioctl_lock.
      
      Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Sidong Yang <realwakka@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      34146bba
    • btrfs: do not WARN_ON() if we have PageError set · e00077aa
      Josef Bacik authored
      
      commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream.
      
      Whenever we do any extent buffer operations we call
      assert_eb_page_uptodate() to complain loudly if we're operating on a
      non-uptodate page.  Our overnight tests caught this warning earlier this
      week:
      
        WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
        CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G        W         5.17.0-rc3+ #564
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Workqueue: btrfs-cache btrfs_work_helper
        RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
        RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
        RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
        RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
        RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
        R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
        R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
        FS:  0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
        Call Trace:
      
         extent_buffer_test_bit+0x3f/0x70
         free_space_test_bit+0xa6/0xc0
         load_free_space_tree+0x1f6/0x470
         caching_thread+0x454/0x630
         ? rcu_read_lock_sched_held+0x12/0x60
         ? rcu_read_lock_sched_held+0x12/0x60
         ? rcu_read_lock_sched_held+0x12/0x60
         ? lock_release+0x1f0/0x2d0
         btrfs_work_helper+0xf2/0x3e0
         ? lock_release+0x1f0/0x2d0
         ? finish_task_switch.isra.0+0xf9/0x3a0
         process_one_work+0x26d/0x580
         ? process_one_work+0x580/0x580
         worker_thread+0x55/0x3b0
         ? process_one_work+0x580/0x580
         kthread+0xf0/0x120
         ? kthread_complete_and_exit+0x20/0x20
         ret_from_fork+0x1f/0x30
      
      This was partially fixed by c2e39305 ("btrfs: clear extent buffer
      uptodate when we fail to write it"), however all that fix did was keep
      us from finding extent buffers after a failed writeout.  It didn't keep
      us from continuing to use a buffer that we already had found.
      
      In this case we're searching the commit root to cache the block group,
      so we can start committing the transaction and switch the commit root
      and then start writing.  After the switch we can look up an extent
      buffer that hasn't been written yet and start processing that block
      group.  Then we fail to write that block out and clear Uptodate on the
      page, and then we start spewing these errors.
      
      Normally we're protected by the tree lock to a certain degree here.  If
      we read a block we have that block read locked, and we block the writer
      from locking the block before we submit it for the write.  However this
      isn't necessarily fool proof because the read could happen before we do
      the submit_bio and after we locked and unlocked the extent buffer.
      
      Also in this particular case we have path->skip_locking set, so that
      won't save us here.  We'll simply get a block that was valid when we
      read it, but became invalid while we were using it.
      
      What we really want is to catch the case where we've "read" a block but
      it's not marked Uptodate.  On read we ClearPageError(), so if we're
      !Uptodate and !Error we know we didn't do the right thing for reading
      the page.
      
      Fix this by checking !Uptodate && !Error.  This way we will not complain
      if our buffer gets invalidated while we're using it, and we'll maintain
      the spirit of the check which is to make sure we have a fully in-cache
      block while we're messing with it.
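
The difference between the old and new condition can be sketched with a small C model. The struct and function names below are hypothetical and only model the two page flags discussed above; this is not the btrfs implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two page flags the check looks at. */
struct page_state {
	bool uptodate;	/* page contents are valid */
	bool error;	/* an I/O error was recorded, e.g. a failed write */
};

/* Old condition: complain about any page that is not uptodate. */
static bool warn_before_fix(const struct page_state *p)
{
	return !p->uptodate;
}

/* New condition: complain only if the page is neither uptodate nor in error,
 * which is the state a page is in when it was never read properly. */
static bool warn_after_fix(const struct page_state *p)
{
	return !p->uptodate && !p->error;
}
```

A page invalidated by a failed write (not uptodate, error set) triggers the old check but not the new one, while a page that was never read correctly (not uptodate, no error) still triggers both.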
      
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e00077aa
    • btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() · 725a6ac3
      Omar Sandoval authored
      
      commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream.
      
      We are seeing crashes similar to the following trace:
      
      [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
      [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
      [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
      [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
      [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
      [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
      [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
      [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
      [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
      [38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
      [38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
      [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [38.992528] Call Trace:
      [38.992854]  <TASK>
      [38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
      [38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
      [38.994801]  ? vsnprintf+0x33c/0x520
      [38.995368]  ? __kmalloc_track_caller+0x351/0x440
      [38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
      [38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
      [38.997867]  ? mod_objcg_state+0xee/0x340
      [38.998552]  ? seq_release+0x24/0x30
      [38.999184]  ? proc_nr_files+0x30/0x30
      [38.999654]  ? call_rcu+0xc8/0x2f0
      [39.000228]  ? __x64_sys_ioctl+0x84/0xc0
      [39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
      [39.001973]  __x64_sys_ioctl+0x84/0xc0
      [39.002566]  do_syscall_64+0x3a/0x80
      [39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [39.003735] RIP: 0033:0x7f11c166959b
      [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
      [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
      [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
      [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
      [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
      [39.015040]  </TASK>
      [39.015418] ---[ end trace 0000000000000000 ]---
      [43.131559] ------------[ cut here ]------------
      [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
      [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
      [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
      [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
      [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
      [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
      [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
      [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
      [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
      [43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
      [43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
      [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [43.150559] Call Trace:
      [43.150904]  <TASK>
      [43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
      [43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
      [43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
      [43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
      [43.154475]  ? __check_object_size+0x150/0x170
      [43.155170]  ? preempt_count_add+0x49/0xa0
      [43.155753]  ? __x64_sys_ioctl+0x84/0xc0
      [43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
      [43.157456]  __x64_sys_ioctl+0x84/0xc0
      [43.157980]  do_syscall_64+0x3a/0x80
      [43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [43.159231] RIP: 0033:0x7f7657f1e59b
      [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
      [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
      [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
      [43.166863]  </TASK>
      [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
      [43.169552] ---[ end trace 0000000000000000 ]---
      [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
      [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
      [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
      [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
      [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
      [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
      [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
      [43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
      [43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
      [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      We first hit the WARN_ON(rc->block_group->pinned > 0) in
      btrfs_relocate_block_group() and then the BUG_ON(!cache) in
      unpin_extent_range(). This tells us that we are exiting relocation and
      removing the block group with bytes still pinned for that block group.
      This is supposed to be impossible: the last thing relocate_block_group()
      does is commit the transaction to get rid of pinned extents.
      
      Commit d0c2f4fa ("btrfs: make concurrent fsyncs wait less when
      waiting for a transaction commit") introduced an optimization so that
      commits from fsync don't have to wait for the previous commit to unpin
      extents. This was only intended to affect fsync, but it inadvertently
      made it possible for any commit to skip waiting for the previous commit
      to unpin. This is because if a call to btrfs_commit_transaction() finds
      that another thread is already committing the transaction, it waits for
      the other thread to complete the commit and then returns. If that other
      thread was in fsync, then it completes the commit without completing the
      previous commit. This makes the following sequence of events possible:
      
      Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
      btrfs_commit_transaction(N) |                                     |
        btrfs_run_delayed_refs    |                                     |
          pin extents             |                                     |
        ...                       |                                     |
        state = UNBLOCKED         |btrfs_sync_file                      |
                                  |  btrfs_start_transaction(N + 1)     |relocate_block_group
                                  |                                     |  btrfs_join_transaction(N + 1)
                                  |  btrfs_commit_transaction(N + 1)    |
        ...                       |  trans->state = COMMIT_START        |
                                  |                                     |  btrfs_commit_transaction(N + 1)
                                  |                                     |    wait_for_commit(N + 1, COMPLETED)
                                  |  wait_for_commit(N, SUPER_COMMITTED)|
        state = SUPER_COMMITTED   |  ...                                |
        btrfs_finish_extent_commit|                                     |
          unpin_extent_range()    |  trans->state = COMPLETED           |
                                  |                                     |    return
                                  |                                     |
          ...                     |                                     |Thread 1 isn't done, so pinned > 0
                                  |                                     |and we WARN
                                  |                                     |
                                  |                                     |btrfs_remove_block_group
          unpin_extent_range()    |                                     |
            Thread 3 removed the  |                                     |
            block group, so we BUG|                                     |
      
      There are other sequences involving SUPER_COMMITTED transactions that
      can cause a similar outcome.
      
      We could fix this by making relocation explicitly wait for unpinning,
      but there may be other cases that need it. Josef mentioned ENOSPC
      flushing and the free space cache inode as other potential victims.
      Rather than playing whack-a-mole, this fix is conservative and makes all
      commits not in fsync wait for all previous transactions, which is what
      the optimization intended.
      
      Fixes: d0c2f4fa ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      725a6ac3
    • btrfs: fix lost prealloc extents beyond eof after full fsync · 5342e9f3
      Filipe Manana authored
      
      commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream.
      
      When doing a full fsync, if we have prealloc extents beyond (or at) eof,
      and the leaves that contain them were not modified in the current
      transaction, we end up not logging them. This results in losing those
      extents when we replay the log after a power failure, since the inode is
      truncated to the current value of the logged i_size.
      
      Just like for the fast fsync path, we need to always log all prealloc
      extents starting at or beyond i_size. The fast fsync case was fixed in
      commit 471d557a ("Btrfs: fix loss of prealloc extents past i_size
      after fsync log replay") but it missed the full fsync path. The problem
      exists since the very early days, when the log tree was added by
      commit e02119d5 ("Btrfs: Add a write ahead tree log to optimize
      synchronous operations").
      
      Example reproducer:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
      
        # Create our test file with many file extent items, so that they span
        # several leaves of metadata, even if the node/page size is 64K. Use
        # direct IO and not fsync/O_SYNC because it's both faster and it avoids
        # clearing the full sync flag from the inode - we want the fsync below
        # to trigger the slow full sync code path.
        $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo
      
        # Now add two preallocated extents to our file without extending the
        # file's size. One right at i_size, and another further beyond, leaving
        # a gap between the two prealloc extents.
        $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
        $ xfs_io -c "falloc -k 20M 1M" /mnt/foo
      
        # Make sure everything is durably persisted and the transaction is
        # committed. This makes all created extents have a generation lower
        # than the generation of the transaction used by the next write and
        # fsync.
        $ sync
      
        # Now overwrite only the first extent, which will result in modifying
        # only the first leaf of metadata for our inode. Then fsync it. This
        # fsync will use the slow code path (inode full sync bit is set) because
        # it's the first fsync since the inode was created/loaded.
        $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo
      
        # Extent list before power failure.
        $ xfs_io -c "fiemap -v" /mnt/foo
        /mnt/foo:
         EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
           0: [0..7]:          2178048..2178055     8   0x0
           1: [8..16383]:      26632..43007     16376   0x0
           2: [16384..32767]:  2156544..2172927 16384   0x0
           3: [32768..34815]:  2172928..2174975  2048 0x800
           4: [34816..40959]:  hole              6144
           5: [40960..43007]:  2174976..2177023  2048 0x801
      
        <power fail>
      
        # Mount fs again, trigger log replay.
        $ mount /dev/sdc /mnt
      
        # Extent list after power failure and log replay.
        $ xfs_io -c "fiemap -v" /mnt/foo
        /mnt/foo:
         EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
           0: [0..7]:          2178048..2178055     8   0x0
           1: [8..16383]:      26632..43007     16376   0x0
           2: [16384..32767]:  2156544..2172927 16384   0x1
      
        # The prealloc extents at file offsets 16M and 20M are missing.
      
      So fix this by calling btrfs_log_prealloc_extents() when we are doing a
      full fsync, so that we always log all prealloc extents beyond eof.
      
      A test case for fstests will follow soon.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5342e9f3
    • Randy Dunlap's avatar
      tracing: Fix return value of __setup handlers · 80660a72
      Randy Dunlap authored
      commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream.
      
      __setup() handlers should generally return 1 to indicate that the
      boot options have been handled.
      
      Using invalid option values causes the entire kernel boot option
      string to be reported as Unknown and added to init's environment
      strings, polluting it.
      
        Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
          kprobe_event=p,syscall_any,$arg1 trace_options=quiet
          trace_clock=jiffies", will be passed to user space.
      
       Run /sbin/init as init process
         with arguments:
           /sbin/init
         with environment:
           HOME=/
           TERM=linux
           BOOT_IMAGE=/boot/bzImage-517rc6
           kprobe_event=p,syscall_any,$arg1
           trace_options=quiet
           trace_clock=jiffies
      
      Return 1 from the __setup() handlers so that init's environment is not
      polluted with kernel boot options.
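
      The effect of the handler's return value can be illustrated with a small
      user-space model. This is only a sketch in Python, not kernel code: the
      function dispatch_boot_options() and both handlers are hypothetical names,
      and the model assumes that an option counts as handled only when its
      handler returns a non-zero value.

```python
def dispatch_boot_options(cmdline, handlers):
    """Return the options that would be handed on to init (toy model)."""
    passed_to_init = []
    for option in cmdline.split():
        name, _, value = option.partition("=")
        handler = handlers.get(name)
        # No handler, or a handler returning 0, leaves the option "unknown".
        if handler is None or handler(value) == 0:
            passed_to_init.append(option)
    return passed_to_init


def trace_clock_returns_0(value):
    # Hypothetical handler showing the old behaviour: reports "not handled".
    return 0


def trace_clock_returns_1(value):
    # Hypothetical handler showing the fixed behaviour: reports "handled".
    return 1


CMDLINE = "BOOT_IMAGE=/boot/bzImage-517rc6 trace_clock=jiffies"

if __name__ == "__main__":
    print(dispatch_boot_options(CMDLINE, {"trace_clock": trace_clock_returns_0}))
    print(dispatch_boot_options(CMDLINE, {"trace_clock": trace_clock_returns_1}))
```

      In this model the option with a handler that returns 0 ends up next to
      BOOT_IMAGE in the list passed to init, while a handler that returns 1
      keeps it out.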
      
      Link: https://lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
      Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
      
      
      
      Cc: stable@vger.kernel.org
      Fixes: 7bcfaf54 ("tracing: Add trace_options kernel command line parameter")
      Fixes: e1e232ca ("tracing: Add trace_clock=<clock> kernel parameter")
      Fixes: 970988e1 ("tracing/kprobe: Add kprobe_event= boot parameter")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      80660a72
    • Steven Rostedt (Google)'s avatar
      tracing/histogram: Fix sorting on old "cpu" value · 452f64ee
      Steven Rostedt (Google) authored
      
      commit 1d1898f65616c4601208963c3376c1d828cbf2c7 upstream.
      
      When trying to add a histogram against an event with the "cpu" field, it
      was impossible due to "cpu" being a keyword to key off of the running CPU.
      So to fix this, it was changed to "common_cpu" to match the other generic
      fields (like "common_pid"). But since some scripts used "cpu" for keying
      off of the CPU (for events that did not have "cpu" as a field, which is
      most of them), a backward compatibility trick was added such that if "cpu"
      was used as a key, and the event did not have "cpu" as a field name, then
      it would fall back and switch over to "common_cpu".
      
      This fix has a couple of subtle bugs. One was that when switching over to
      "common_cpu", it did not change the field name, it just set a flag. But
      the code still found a "cpu" field. The "cpu" field is used for filtering
      and is returned when the event does not have a "cpu" field.
      
      This was found by:
      
        # cd /sys/kernel/tracing
        # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
        # cat events/sched/sched_wakeup/hist
      
      Which showed the histogram unsorted:
      
      { cpu:         19, pid:       1175 } hitcount:          1
      { cpu:          6, pid:        239 } hitcount:          2
      { cpu:         23, pid:       1186 } hitcount:         14
      { cpu:         12, pid:        249 } hitcount:          2
      { cpu:          3, pid:        994 } hitcount:          5
      
      Instead of hard coding the "cpu" checks, take advantage of the fact that
      trace_event_field_field() returns a special field for "cpu" and "CPU" if
      the event does not have "cpu" as a field. This special field has the
      "filter_type" of "FILTER_CPU". Check that to test if the returned field is
      of the CPU type instead of doing the string compare.
      
      Also, fix the sorting bug by testing for the hist_field flag of
      HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
      the special CPU field to know what compare routine to use, and since that
      special field does not have a size, it returns tracing_map_cmp_none.
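
      Why a "none" compare routine leaves the histogram unsorted can be shown
      with a short sketch. It is a Python model, not the kernel's sort code:
      cmp_none() and cmp_num() are stand-ins written for this example, and the
      entries are the (cpu, pid) pairs from the output above.

```python
from functools import cmp_to_key


def cmp_none(a, b):
    # Stand-in for a compare routine that treats every pair as equal.
    return 0


def cmp_num(a, b):
    # Stand-in for a numeric compare routine: negative, zero or positive.
    return (a > b) - (a < b)


def sort_by_cpu(entries, cmp):
    # Sort (cpu, pid) pairs on the cpu key using the given compare routine.
    return sorted(entries, key=cmp_to_key(lambda x, y: cmp(x[0], y[0])))


ENTRIES = [(19, 1175), (6, 239), (23, 1186), (12, 249), (3, 994)]

if __name__ == "__main__":
    print(sort_by_cpu(ENTRIES, cmp_none))  # order is left as it was
    print(sort_by_cpu(ENTRIES, cmp_num))   # ordered by cpu
```

      A comparator that always answers "equal" gives the sort nothing to order
      by, so the entries come back in their original order.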
      
      Cc: stable@vger.kernel.org
      Fixes: 1e3bac71 ("tracing/histogram: Rename "cpu" to "common_cpu"")
      Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      452f64ee
    • William Mahon's avatar
      HID: add mapping for KEY_ALL_APPLICATIONS · aa6d3eef
      William Mahon authored
      
      commit 327b89f0acc4c20a06ed59e4d9af7f6d804dc2e2 upstream.
      
      This patch adds a new key definition for KEY_ALL_APPLICATIONS
      and aliases KEY_DASHBOARD to it.
      
      It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.
      
      Signed-off-by: William Mahon <wmahon@chromium.org>
      Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
      
      
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa6d3eef
    • William Mahon's avatar
      HID: add mapping for KEY_DICTATE · b355d6a1
      William Mahon authored
      
      commit bfa26ba343c727e055223be04e08f2ebdd43c293 upstream.
      
      Numerous keyboards are adding dictate keys, which allow text
      messages to be dictated through a microphone.
      
      This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8
      usage code to this new keycode. Additionally hid-debug is adjusted to
      recognize this new usage code as well.
      
      Signed-off-by: William Mahon <wmahon@chromium.org>
      Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
      
      
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b355d6a1
    • David Gow's avatar
      Input: samsung-keypad - properly state IOMEM dependency · 74e9545d
      David Gow authored
      
      commit ba115adf61b36b8c167126425a62b0efc23f72c0 upstream.
      
      Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it
      calls devm_ioremap(). This prevents compile errors in some configs (e.g.,
      allyesconfig/randconfig under UML):
      
      /usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe':
      samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap'
      
      Signed-off-by: David Gow <davidgow@google.com>
      Acked-by: anton ivanov <anton.ivanov@cambridgegreys.com>
      Link: https://lore.kernel.org/r/20220225041727.1902850-1-davidgow@google.com
      
      
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      74e9545d
    • Hans de Goede's avatar
      Input: elan_i2c - fix regulator enable count imbalance after suspend/resume · cb19f03e
      Hans de Goede authored
      
      commit 04b7762e37c95d9b965d16bb0e18dbd1fa2e2861 upstream.
      
      Before these changes elan_suspend() would only disable the regulator
      when device_may_wakeup() returns false; whereas elan_resume() would
      unconditionally enable it, leading to an enable count imbalance when
      device_may_wakeup() returns true.
      
      This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
      when the elan_i2c driver gets unbound. This happens, for example, with the
      hot-pluggable dock with Elan I2C touchpad for the Asus TF103C 2-in-1.
      
      Fix this by making the regulator_enable() call also be conditional
      on device_may_wakeup() returning false.
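
      The imbalance can be reproduced with a small reference-count model. This
      is a hedged sketch in Python rather than driver code: the Regulator class
      and the suspend/resume helpers are hypothetical stand-ins, and the model
      assumes a single enable at probe time.

```python
class Regulator:
    """Toy stand-in for a regulator with an enable reference count."""

    def __init__(self):
        self.enable_count = 0

    def enable(self):
        self.enable_count += 1

    def disable(self):
        self.enable_count -= 1


def suspend(reg, may_wakeup):
    # The regulator is only switched off when the device cannot wake the system.
    if not may_wakeup:
        reg.disable()


def resume_unconditional(reg, may_wakeup):
    # Behaviour before the fix: always enable on resume.
    reg.enable()


def resume_conditional(reg, may_wakeup):
    # Behaviour after the fix: mirror the condition used in suspend().
    if not may_wakeup:
        reg.enable()


def count_after_cycles(resume, may_wakeup, cycles=3):
    reg = Regulator()
    reg.enable()  # assumed single enable at probe time
    for _ in range(cycles):
        suspend(reg, may_wakeup)
        resume(reg, may_wakeup)
    return reg.enable_count


if __name__ == "__main__":
    print(count_after_cycles(resume_unconditional, may_wakeup=True))
    print(count_after_cycles(resume_conditional, may_wakeup=True))
```

      With wakeup enabled, every suspend/resume cycle in the unconditional
      variant adds one enable that nothing later balances, so a single disable
      at unbind time can no longer bring the count back to zero.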
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com
      
      
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb19f03e
    • Hans de Goede's avatar
      Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power() · f74fc946
      Hans de Goede authored
      
      commit 81a36d8ce554b82b0a08e2b95d0bd44fcbff339b upstream.
      
      elan_disable_power() is called conditionally on suspend, whereas
      elan_enable_power() is always called on resume. This leads to
      an imbalance in the regulator's enable count.
      
      Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
      in preparation for fixing this.
      
      No functional changes intended.
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com
      
      
      [dtor: consolidate elan_[en|dis]able() into elan_set_power()]
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f74fc946
    • Lukas Bulwahn's avatar
      MAINTAINERS: adjust file entry for of_net.c after movement · 7a1ee993
      Lukas Bulwahn authored
      
      commit f6164470 upstream.
      
      Commit e330fb14 ("of: net: move of_net under net/") moves of_net.c
      to ./net/core/, but fails to adjust the reference to this file in
      MAINTAINERS.
      
      Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
      
         warning: no file matches    F:    drivers/of/of_net.c
      
      Adjust the file entry after this file movement.
      
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Link: https://lore.kernel.org/r/20211016055815.14397-1-lukas.bulwahn@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a1ee993
    • Dan Carpenter's avatar
      iavf: missing unlocks in iavf_watchdog_task() · 7626ab3a
      Dan Carpenter authored
      
      commit bc2f39a6 upstream.
      
      This code was reorganized and some unlocks are now missing.
      
      Fixes: 898ef1cb ("iavf: Combine init and watchdog state machines")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7626ab3a
    • Stefan Assmann's avatar
      iavf: do not override the adapter state in the watchdog task (again) · d7841132
      Stefan Assmann authored
      
      commit fe523d7c upstream.
      
      The watchdog task incorrectly changes the state to __IAVF_RESETTING,
      instead of letting the reset task take care of that. This was already
      resolved by commit 22c8fd71 ("iavf: do not override the adapter
      state in the watchdog task") but the problem was reintroduced by the
      recent code refactoring in commit 45eebd62 ("iavf: Refactor iavf
      state machine tracking").
      
      Fixes: 45eebd62 ("iavf: Refactor iavf state machine tracking")
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7841132
    • Ong Boon Leong's avatar
      net: stmmac: preserve TX and RX coalesce value during XDP setup · d61f3737
      Ong Boon Leong authored
      
      commit 61da6ac715700bcfeef50d187e15c6cc7c9d079b upstream.
      
      When an XDP program is loaded, it is desirable that the previous TX and RX
      coalesce values are not reinitialized to their defaults. This avoids
      unnecessarily reconfiguring coalesce values that were working fine
      before.
      
      Fixes: ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20211124114019.3949125-1-boon.leong.ong@intel.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d61f3737
    • Amit Cohen's avatar
      selftests: mlxsw: resource_scale: Fix return value · d666d336
      Amit Cohen authored
      
      [ Upstream commit 196f9bc050cbc5085b4cbb61cce2efe380bc66d0 ]
      
      The test runs several test cases and is supposed to return an error in
      case at least one of them failed.
      
      Currently, the check of the return value of each test case is in the
      wrong place, which can result in the wrong return value. For example:
      
       # TESTS='tc_police' ./resource_scale.sh
       TEST: 'tc_police' [default] 968                                     [FAIL]
               tc police offload count failed
       Error: mlxsw_spectrum: Failed to allocate policer index.
       We have an error talking to the kernel
       Command failed /tmp/tmp.i7Oc5HwmXY:969
       TEST: 'tc_police' [default] overflow 969                            [ OK ]
       ...
       TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]
      
       $ echo $?
       0
      
      Fix this by moving the check to be done after each test case.
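
      The commit message does not show the script itself, so the following is
      only a generic Python sketch of the failure mode, with hypothetical
      function names: when the final status is taken from whichever case ran
      last, an earlier failure is lost, whereas folding the status in after
      each case keeps it.

```python
def exit_status_checked_late(case_passed):
    """Status reflects only the last test case (models the misplaced check)."""
    ret = 0
    for passed in case_passed:
        ret = 0 if passed else 1  # overwritten by every later case
    return ret


def exit_status_checked_each(case_passed):
    """Status is accumulated after each test case (models the fix)."""
    ret_fin = 0
    for passed in case_passed:
        ret = 0 if passed else 1
        ret_fin = ret_fin or ret  # a failure, once seen, sticks
    return ret_fin


# First case fails, the following ones pass, as in the output above.
RESULTS = [False, True, True]

if __name__ == "__main__":
    print(exit_status_checked_late(RESULTS))
    print(exit_status_checked_each(RESULTS))
```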
      
      Fixes: 059b18e2 ("selftests: mlxsw: Return correct error code in resource scale test")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d666d336
    • Vladimir Oltean's avatar
      net: dcb: disable softirqs in dcbnl_flush_dev() · 6fe3127d
      Vladimir Oltean authored
      
      [ Upstream commit 10b6bb62ae1a49ee818fc479cf57b8900176773e ]
      
      Ido Schimmel points out that since commit 52cff74e ("dcbnl : Disable
      software interrupts before taking dcb_lock"), the DCB API can be called
      by drivers from softirq context.
      
      One such in-tree example is the chelsio cxgb4 driver:
      dcb_rpl
      -> cxgb4_dcb_handle_fw_update
         -> dcb_ieee_setapp
      
      If the firmware for this driver happened to send an event which resulted
      in a call to dcb_ieee_setapp() at the exact same time as another
      DCB-enabled interface was unregistering on the same CPU, the softirq
      would deadlock, because the interrupted process was already holding the
      dcb_lock in dcbnl_flush_dev().
      
      Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
      in the rest of the dcbnl code.
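
      The self-deadlock can be modelled in user space with a non-reentrant
      lock. This is a rough Python analogy, not kernel code: threading.Lock
      stands in for the spinlock, the "softirq" is an ordinary function call
      made while the lock is held, and a try-lock replaces the endless spin so
      that the example terminates. All function names are hypothetical.

```python
import threading

dcb_lock = threading.Lock()  # non-reentrant, like a spinlock


def softirq_setapp():
    # A real spinlock would spin forever here; try-lock reports it instead.
    acquired = dcb_lock.acquire(blocking=False)
    if acquired:
        dcb_lock.release()
    return acquired


def flush_dev(softirqs_disabled):
    """Return True if the softirq handler was able to take the lock."""
    deferred = []
    outcome = None
    with dcb_lock:
        if softirqs_disabled:
            # Models spin_lock_bh(): the softirq runs after the unlock.
            deferred.append(softirq_setapp)
        else:
            # Models spin_lock(): the softirq interrupts the lock holder.
            outcome = softirq_setapp()
    for handler in deferred:
        outcome = handler()
    return outcome


if __name__ == "__main__":
    print(flush_dev(softirqs_disabled=False))
    print(flush_dev(softirqs_disabled=True))
```

      In the first call the handler finds the lock held by the very context it
      interrupted, which is the deadlock described above. In the second call
      it runs only after the lock has been released.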
      
      Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6fe3127d
    • Qiang Yu's avatar
      drm/amdgpu: fix suspend/resume hang regression · 46eed3a3
      Qiang Yu authored
      [ Upstream commit f1ef17011c765495c876fa75435e59eecfdc1ee4 ]
      
      A regression has been reported: suspend/resume may hang with
      the previous vm ready check commit.
      
      So bring back the evicted list check as a temporary fix.
      
      Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
      
      
      Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Qiang Yu <qiang.yu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      46eed3a3
    • Jiasheng Jiang's avatar
      nl80211: Handle nla_memdup failures in handle_nan_filter · a1e603e5
      Jiasheng Jiang authored
      
      [ Upstream commit 6ad27f522cb3b210476daf63ce6ddb6568c0508b ]
      
      nla_memdup() can fail, so check its return value.
      
      Fixes: a442b761 ("cfg80211: add add_nan_func / del_nan_func")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Link: https://lore.kernel.org/r/20220301100020.3801187-1-jiasheng@iscas.ac.cn
      
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a1e603e5
    • Ilya Lipnitskiy's avatar
      MIPS: ralink: mt7621: use bitwise NOT instead of logical · a3049666
      Ilya Lipnitskiy authored
      
      [ Upstream commit 5d8965704fe5662e2e4a7e4424a2cbe53e182670 ]
      
      The intention was to invert the bits, not to make them all zero by
      using the logical NOT operator.
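
      The difference between the two operators is easy to check with a small
      sketch. It is written in Python as a model of 32-bit C arithmetic: the
      helper names and the sample value 0x12345678 are chosen for illustration
      and do not come from the driver.

```python
MASK32 = 0xFFFFFFFF


def logical_not(x):
    # Models C's !x: 1 when x is zero, otherwise 0.
    return 0 if x else 1


def bitwise_not32(x):
    # Models C's ~x on a 32-bit unsigned value: every bit is inverted.
    return ~x & MASK32


SAMPLE = 0x12345678  # arbitrary non-zero value used only for illustration

if __name__ == "__main__":
    print(hex(logical_not(SAMPLE)))
    print(hex(bitwise_not32(SAMPLE)))
```

      For any non-zero input the logical form collapses to 0, while the
      bitwise form yields the inverted pattern.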
      
      Fixes: cc19db8b312a ("MIPS: ralink: mt7621: do memory detection on KSEG1")
      Suggested-by: Chuanhong Guo <gch981213@gmail.com>
      Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
      Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a3049666
    • Sasha Neftin's avatar
      e1000e: Fix possible HW unit hang after an s0ix exit · 68c4fe2e
      Sasha Neftin authored
      [ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ]
      
      Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY
      LAN connected device (LCD) reset during power management flows. This
      fixes possible HW unit hangs on the s0ix exit on some corporate ADL
      platforms.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
      
      
      Fixes: 3e55d231 ("e1000e: Add handshake with the CSME to support S0ix")
      Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
      Suggested-by: Nir Efrati <nir.efrati@intel.com>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      68c4fe2e
    • Douglas Anderson's avatar
      drm/bridge: ti-sn65dsi86: Properly undo autosuspend · 9dfe6abc
      Douglas Anderson authored
      
      [ Upstream commit 26d3474348293dc752c55fe6d41282199f73714c ]
      
      The PM Runtime docs say:
        Drivers in ->remove() callback should undo the runtime PM changes done
        in ->probe(). Usually this means calling pm_runtime_disable(),
        pm_runtime_dont_use_autosuspend() etc.
      
      We weren't doing that for autosuspend. Let's do it.
      
      Fixes: 9bede631 ("drm/bridge: ti-sn65dsi86: Use pm_runtime autosuspend")
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220222141838.1.If784ba19e875e8ded4ec4931601ce6d255845245@changeid
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9dfe6abc
    • Vinay Belgaumkar's avatar
      drm/i915/guc/slpc: Correct the param count for unset param · d675c05b
      Vinay Belgaumkar authored
      
      [ Upstream commit 1b279f6ad467535c3b8a66b4edefaca2cdd5bdc3 ]
      
      SLPC unset param H2G only needs one parameter - the id of the
      param.
      
      Fixes: 025cb07b ("drm/i915/guc/slpc: Cache platform frequency limits")
      
      Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220216181504.7155-1-vinay.belgaumkar@intel.com
      
      
      (cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d675c05b
    • Slawomir Laba's avatar
      iavf: Fix __IAVF_RESETTING state usage · 6f62bc0f
      Slawomir Laba authored
      
      [ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ]
      
      Setting the __IAVF_RESETTING state in the watchdog task had no
      effect and could lead to slow resets in the driver, because
      the task for the __IAVF_RESETTING state only requeues the watchdog.
      Until now, the reset task interpreted __IAVF_RESETTING as a
      running state, which could lead to errors when allocating
      and disposing of resources.
      
      Make watchdog_task queue the reset task when necessary.
      Do not update the state to __IAVF_RESETTING, so that the reset task
      knows exactly what the current state of the adapter is.
      
      Fixes: 898ef1cb ("iavf: Combine init and watchdog state machines")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6f62bc0f
    • Slawomir Laba's avatar
      iavf: Fix race in init state · 598bc895
      Slawomir Laba authored
      
      [ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ]
      
      When iavf_init_version_check sends the VIRTCHNL_OP_GET_VF_RESOURCES
      message, the driver waits for the response after requeueing
      the watchdog task in the iavf_init_get_resources call stack. The
      logic is implemented such that iavf_init_get_resources has
      to be called in order to allocate adapter->vf_res. It polls
      for the AQ response in the iavf_get_vf_config function. If the
      adminq_task worker handles this message first, the kernel emits a
      call trace, because adapter->vf_res is still NULL in
      iavf_virtchnl_completion.
      
      Make the watchdog task not queue the adminq_task if the init
      process is not finished yet.
      
      Fixes: 898ef1cb ("iavf: Combine init and watchdog state machines")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      598bc895
    • Slawomir Laba's avatar
      iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS · ddc5db0b
      Slawomir Laba authored
      
      [ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ]
      
      iavf_virtchnl_completion is called under crit_lock but when
      the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
      this lock is released in order to obtain rtnl_lock to avoid
      ABBA deadlock with unregister_netdev.
      
      Along with the new way iavf_remove behaves, there are
      many risks related to releasing the lock and attempting to regrab
      it. The driver faces crashes related to races between
      unregister_netdev and netdev_update_features. Yet another
      risk is that the driver could have already obtained the crit_lock
      in order to destroy it, and iavf_virtchnl_completion could
      crash or block forever.
      
      Make iavf_virtchnl_completion never relock crit_lock in its
      call paths.
      
      Extract rtnl_lock locking logic to the driver for
      unregister_netdev in order to set the netdev_registered flag
      inside the lock.
      
      Introduce a new flag that will inform adminq_task to perform
      the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
      it finishes processing messages. Guard this code with remove
      flags so it's never called when the driver is in remove state.
      
      Fixes: 5951a2b9 ("iavf: Fix VLAN feature flags after VFR")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ddc5db0b
    • Slawomir Laba's avatar
      iavf: Fix init state closure on remove · 8c0e4da6
      Slawomir Laba authored
      
      [ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ]
      
      While the adapter is working through its init states, errors such
      as a lack of communication with the PF may occur. If such events
      occur, the driver restores previous states in order to retry
      initialization in a proper way. When the remove task kicks in,
      this situation could lead to races with unregistering the
      netdevice as well as with resource cleanup. With the commit
      that introduced waiting in remove for init to complete,
      this problem turns into an endless wait if init never
      recovers from errors.
      
      Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
      remove thread has started.
      
      Make __IAVF_COMM_FAILED adapter state respect the
      __IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
      state and return without any action instead of trying to
      recover.
      
      Make __IAVF_INIT_FAILED adapter state respect the
      __IAVF_IN_REMOVE_TASK bit and return without any further
      actions.
      
      Make the loop in the remove handler break when adapter has
      __IAVF_INIT_FAILED state set.
      
      Fixes: 898ef1cb ("iavf: Combine init and watchdog state machines")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8c0e4da6
    • Slawomir Laba's avatar
      iavf: Add waiting so the port is initialized in remove · 85aa7606
      Slawomir Laba authored
      
      [ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ]
      
      There are races when the port is being configured and remove is
      triggered.
      
      unregister_netdev is not and can't be called under the crit_lock
      mutex, since it calls ndo_stop -> iavf_close, which requires
      this lock. Depending on the init state, the netdev could still be
      unregistered, so unregister_netdev never cleans up, even though
      the device could become registered shortly after that.
      
      Make iavf_remove wait until port finishes initialization.
      All critical state changes are atomic (under crit_lock).
      Crashes that come from iavf_reset_interrupt_capability and
      iavf_free_traffic_irqs should now be solved in a graceful
      manner.
      
      Fixes: 605ca7c5 ("iavf: Fix kernel BUG in free_msi_irqs")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      85aa7606
    • Przemyslaw Patynowski's avatar
      iavf: Fix kernel BUG in free_msi_irqs · e734c794
      Przemyslaw Patynowski authored
      
      [ Upstream commit 605ca7c5 ]
      
      Fix the driver not freeing the VF's traffic irqs prior to calling
      pci_disable_msix in iavf_remove.
      There were 2 possible erroneous states in which iavf_close would
      not be called.
      One erroneous state is fixed by allowing the netdev to register when the
      state is already running. It was possible for the VF adapter to enter a
      state loop from running to resetting, where iavf_open would subsequently fail.
      If the user then unloaded the driver or removed the VF pci, iavf_close would
      not be called, as the netdev was not registered, leaving traffic pcis still
      allocated.
      Fixed this by breaking the loop, allowing the netdev to open the device when
      the adapter state is __IAVF_RUNNING and it is not explicitly downed.
      The other possibility is entering iavf_remove from the __IAVF_RESETTING state,
      where iavf_close would not free irqs, but just return 0.
      Fixed this by checking for the last adapter state and then removing irqs.
      
      Kernel panic:
      [ 2773.628585] kernel BUG at drivers/pci/msi.c:375!
      ...
      [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0
      ...
      [ 2773.640939] Call Trace:
      [ 2773.641572]  pci_disable_msix+0xf7/0x120
      [ 2773.642224]  iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf]
      [ 2773.642897]  iavf_remove+0x12e/0x500 [iavf]
      [ 2773.643578]  pci_device_remove+0x3b/0xc0
      [ 2773.644266]  device_release_driver_internal+0x103/0x1f0
      [ 2773.644948]  pci_stop_bus_device+0x69/0x90
      [ 2773.645576]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 2773.646215]  pci_iov_remove_virtfn+0xba/0x120
      [ 2773.646862]  sriov_disable+0x2f/0xe0
      [ 2773.647531]  ice_free_vfs+0x2f8/0x350 [ice]
      [ 2773.648207]  ice_sriov_configure+0x94/0x960 [ice]
      [ 2773.648883]  ? _kstrtoull+0x3b/0x90
      [ 2773.649560]  sriov_numvfs_store+0x10a/0x190
      [ 2773.650249]  kernfs_fop_write+0x116/0x190
      [ 2773.650948]  vfs_write+0xa5/0x1a0
      [ 2773.651651]  ksys_write+0x4f/0xb0
      [ 2773.652358]  do_syscall_64+0x5b/0x1a0
      [ 2773.653075]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 22ead37f ("i40evf: Add longer wait after remove module")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e734c794
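The second erroneous state described in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: the `model_*` names, the `with_fix` switch, and the simplified state enum are hypothetical, and `model_disable_msix()` returns `false` where the kernel would hit the BUG in free_msi_irqs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the adapter states. */
enum model_state { MODEL_DOWN, MODEL_RUNNING, MODEL_RESETTING };

struct model_adapter {
    enum model_state state;
    int traffic_irqs;      /* number of traffic irqs still requested */
    bool msix_enabled;
};

/* Models the close path: in the resetting state it returns 0 without
 * freeing any irqs, as the commit message describes. */
static int model_close(struct model_adapter *a)
{
    if (a->state == MODEL_RESETTING)
        return 0;
    a->traffic_irqs = 0;
    a->state = MODEL_DOWN;
    return 0;
}

/* Models pci_disable_msix(): refuses (the kernel would BUG) while
 * traffic irqs are still requested. */
static bool model_disable_msix(struct model_adapter *a)
{
    if (a->traffic_irqs != 0)
        return false;
    a->msix_enabled = false;
    return true;
}

/* Models the remove path. With the fix, the state seen on entry is
 * checked and leftover irqs are freed before MSI-X is disabled. */
static bool model_remove(struct model_adapter *a, bool with_fix)
{
    enum model_state last_state = a->state;

    model_close(a);
    if (with_fix && last_state == MODEL_RESETTING)
        a->traffic_irqs = 0;
    return model_disable_msix(a);
}
```

In this model, removing an adapter that is in the resetting state only succeeds when `with_fix` is set, which mirrors the ordering requirement the commit enforces: free the traffic irqs first, then disable MSI-X.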
    • Karen Sornek's avatar
      iavf: Add helper function to go from pci_dev to adapter · 200366d1
      Karen Sornek authored
      
      [ Upstream commit 247aa001 ]
      
      Add a helper function that goes directly from a pci_dev to the adapter
      structure, and use it for the netdev assignment, instead of having to
      go to the net_device and then to the adapter.
      
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Karen Sornek <karen.sornek@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      200366d1
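The idea behind the helper in the commit above can be sketched in user space as follows. The types and the `model_*` function names are hypothetical stand-ins, not the driver's definitions; the sketch only assumes that the pci_dev's driver data points at the net_device and that the adapter lives in the net_device's private area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct model_adapter {
    int id;
};

struct model_net_device {
    struct model_adapter priv;   /* models the net_device private area */
};

struct model_pci_dev {
    void *drvdata;               /* models the pci_dev driver data pointer */
};

/* Models the net_device -> adapter hop. */
static struct model_adapter *model_netdev_priv(struct model_net_device *netdev)
{
    return &netdev->priv;
}

/* One-step helper: pci_dev -> adapter, hiding the net_device hop. */
static struct model_adapter *model_pdev_to_adapter(struct model_pci_dev *pdev)
{
    assert(pdev != NULL && pdev->drvdata != NULL);
    return model_netdev_priv((struct model_net_device *)pdev->drvdata);
}
```

Callers that only hold a pci_dev can then reach the adapter with a single call, which is the simplification the commit message describes.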
    • Slawomir Laba's avatar
      iavf: Rework mutexes for better synchronisation · 23901462
      Slawomir Laba authored
      
      [ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ]
      
      The driver used to crash in multiple spots when put to stress testing
      of the init, reset and remove paths.
      
      The user would experience call traces or hangs when creating,
      resetting, or removing VFs. Depending on the machine, the call traces
      happen in random spots, such as the reset path restoring resources
      while racing with driver removal.
      
      Make adapter->crit_lock mutex a mandatory lock for guarding the
      operations performed on all workqueues and functions dealing with
      resource allocation and disposal.
      
      Make __IAVF_REMOVE a final state of the driver, respected by the
      workqueues: they shall not requeue when they fail to obtain the
      crit_lock.
      
      Make the IRQ handler not queue new work for adminq_task
      when the __IAVF_REMOVE state is set.
      
      Fixes: 5ac49f3c ("iavf: use mutexes for locking of critical sections")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Phani Burra <phani.r.burra@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      23901462
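The requeue rule from the commit above can be modelled in user space roughly as follows. This is a sketch under assumptions, not the driver's implementation: a C11 `atomic_flag` stands in for adapter->crit_lock, and the `model_*` and `WORK_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the adapter states. */
enum model_state { MODEL_RUNNING, MODEL_RESETTING, MODEL_REMOVE };

enum work_result { WORK_DONE, WORK_REQUEUE, WORK_DROPPED };

struct model_adapter {
    atomic_flag crit_lock;     /* stand-in for adapter->crit_lock */
    enum model_state state;
    int work_done;             /* counts completed work items */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_adapter *a)
{
    return !atomic_flag_test_and_set(&a->crit_lock);
}

static void model_unlock(struct model_adapter *a)
{
    atomic_flag_clear(&a->crit_lock);
}

/* Models one work item: it runs only under the lock. When the lock is
 * busy it asks to be requeued, unless the adapter is in its final
 * remove state, in which case the work is dropped. */
static enum work_result model_work(struct model_adapter *a)
{
    if (!model_trylock(a)) {
        if (a->state == MODEL_REMOVE)
            return WORK_DROPPED;
        return WORK_REQUEUE;
    }
    a->work_done++;
    model_unlock(a);
    return WORK_DONE;
}
```

Treating the remove state as terminal means a work item that loses the race for the lock during removal simply stops, instead of requeueing itself against resources that are being torn down.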
    • Jedrzej Jagielski's avatar
      iavf: Add trace while removing device · 9fedc4f8
      Jedrzej Jagielski authored
      
      [ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ]
      
      Add a kernel trace indicating that the device was removed.
      Currently there is no such information.
      For example, when a host admin removes a PCI device from a VM,
      the VM should log information about the event.
      
      This patch adds info log to iavf_remove function.
      
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9fedc4f8
    • Mateusz Palczewski's avatar
      iavf: Combine init and watchdog state machines · b4e0e00a
      Mateusz Palczewski authored
      
      [ Upstream commit 898ef1cb ]
      
      Use a single state machine for driver initialization and for servicing
      the initialized driver. The init state machine implemented in
      init_task() is merged into the watchdog_task(). The init_task()
      function is removed.
      
      Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
      Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b4e0e00a
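A merged state machine of the kind the commit above describes might look roughly like this in a user-space model. The state names and the `model_*` functions are hypothetical and much simpler than the driver's; the sketch only shows one step function handling both the init states and the running state.

```c
#include <assert.h>

/* Hypothetical, simplified states: three init steps, then running. */
enum model_state {
    MODEL_STARTUP,
    MODEL_INIT_VERSION_CHECK,
    MODEL_INIT_GET_RESOURCES,
    MODEL_RUNNING
};

/* One watchdog tick: advances initialization or services the running
 * device, so no separate init task is needed. */
static enum model_state model_watchdog_step(enum model_state s)
{
    switch (s) {
    case MODEL_STARTUP:
        return MODEL_INIT_VERSION_CHECK;
    case MODEL_INIT_VERSION_CHECK:
        return MODEL_INIT_GET_RESOURCES;
    case MODEL_INIT_GET_RESOURCES:
        return MODEL_RUNNING;
    case MODEL_RUNNING:
        return MODEL_RUNNING;   /* steady state: periodic servicing */
    }
    return s;
}

/* Counts the ticks needed to reach the running state, capped at 16. */
static int model_steps_to_running(enum model_state s)
{
    int steps = 0;

    while (s != MODEL_RUNNING && steps < 16) {
        s = model_watchdog_step(s);
        steps++;
    }
    return steps;
}
```

With a single step function, every state transition goes through one code path, which is the consolidation the commit message describes.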