  1. Jun 09, 2022
    • s390/stp: clock_delta should be signed · f19e2e1d
      Sven Schnelle authored
      
      commit 5ace65ebb5ce9fe1cc8fdbdd97079fb566ef0ea4 upstream.
      
      clock_delta is declared as unsigned long in various places. However,
      the clock sync delta can be negative. This would add a huge positive
      offset in clock_sync_global where clock_delta is added to clk.eitod
      which is a 72 bit integer. Declare it as signed long to fix this.
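
      The effect can be sketched in user-space C. This is only an
      illustration under stated assumptions: struct wide_clock and both
      helpers are hypothetical stand-ins for the wide clock value and are
      not the kernel's types or functions. Adding -1 through an unsigned
      delta zero-extends it, so the high word ends up one too large.

```c
#include <stdint.h>

/* Hypothetical two-word integer standing in for the wide clock value. */
struct wide_clock {
	uint64_t hi;
	uint64_t lo;
};

/* Buggy variant: the delta is unsigned, so it is zero-extended. */
static struct wide_clock add_delta_unsigned(struct wide_clock c, uint64_t delta)
{
	uint64_t lo = c.lo + delta;

	c.hi += (lo < c.lo);	/* carry out of the low word */
	c.lo = lo;
	return c;
}

/* Fixed variant: the delta is signed, so it is sign-extended. */
static struct wide_clock add_delta_signed(struct wide_clock c, int64_t delta)
{
	uint64_t ext = delta < 0 ? UINT64_MAX : 0;	/* sign-extension word */
	uint64_t lo = c.lo + (uint64_t)delta;

	c.hi += ext + (lo < c.lo);
	c.lo = lo;
	return c;
}
```

      With a negative delta the signed variant moves the clock back by one
      unit, while the unsigned variant moves it forward by 2^64 - 1.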
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/perf: obtain sie_block from the right address · 42b2f5dd
      Nico Boehr authored
      
      commit c9bfb460c3e4da2462e16b0f0b200990b36b1dd2 upstream.
      
      Since commit 1179f170 ("s390: fix fpu restore in entry.S"), the
      sie_block pointer is located at empty1[1], but in sie_block() it was
      taken from empty1[0].
      
      This leads to a random pointer being dereferenced, possibly causing a
      system crash.
      
      This problem can be observed when running a simple guest with an endless
      loop and recording the cpu-clock event:
      
        sudo perf kvm --guestvmlinux=<guestkernel> --guest top -e cpu-clock
      
      With this fix, the correct guest address is shown.
      
      Fixes: 1179f170 ("s390: fix fpu restore in entry.S")
      Cc: stable@vger.kernel.org
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, compaction: fast_find_migrateblock() should return pfn in the target zone · 20e6ec76
      Rei Yamamoto authored
      commit bbe832b9db2e1ad21522f8f0bf02775fff8a0e0e upstream.
      
      At present, pages not in the target zone are added to cc->migratepages
      list in isolate_migratepages_block().  As a result, pages may migrate
      between nodes unintentionally.
      
      This would be a serious problem for older kernels without commit
      a984226f ("mm: memcontrol: remove the pgdata parameter of
      mem_cgroup_page_lruvec"), because it can corrupt the lru list by
      handling pages in list without holding proper lru_lock.
      
      Avoid returning a pfn outside the target zone in the case that it is
      not aligned with a pageblock boundary.  Otherwise
      isolate_migratepages_block() will handle pages not in the target zone.
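
      A minimal sketch of the idea, assuming a zone that starts in the
      middle of a pageblock. The helper name and its parameters are
      hypothetical and do not reproduce the kernel function: the pfn is
      rounded down to its pageblock start and then clamped so that it never
      falls below the zone start.

```c
/* Hypothetical helper; names and parameters are illustrative only. */
static unsigned long migrate_start_pfn(unsigned long pfn,
				       unsigned long pageblock_nr_pages,
				       unsigned long zone_start_pfn)
{
	/* Round down to the start of the containing pageblock. */
	unsigned long start = pfn - (pfn % pageblock_nr_pages);

	/* The zone may begin in the middle of a pageblock: stay inside it. */
	if (start < zone_start_pfn)
		start = zone_start_pfn;
	return start;
}
```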
      
      Link: https://lkml.kernel.org/r/20220511044300.4069-1-yamamoto.rei@jp.fujitsu.com
      Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
      Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
      Cc: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • staging: r8188eu: prevent ->Ssid overflow in rtw_wx_set_scan() · ac2eab7d
      Denis Efremov authored
      
      commit bc10916e890948d8927a5c8c40fb5dc44be5e1b8 upstream.
      
      This code has a check to prevent read overflow but it needs another
      check to prevent writing beyond the end of the ->Ssid[] array.
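
      The two checks can be sketched as follows. This is a user-space
      illustration, not the driver code: struct ssid_entry, copy_ssid() and
      SSID_MAX_LEN are hypothetical names, and the 32-byte buffer size is
      an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

#define SSID_MAX_LEN 32	/* assumed buffer size for this sketch */

struct ssid_entry {
	unsigned char ssid[SSID_MAX_LEN];
	size_t ssid_len;
};

/*
 * Copy a length-prefixed SSID out of a request buffer.
 * Returns 0 on success, -1 if the request is truncated.
 */
static int copy_ssid(struct ssid_entry *dst, const unsigned char *src,
		     size_t avail, size_t claimed_len)
{
	/* Read-side check: the source must really hold claimed_len bytes. */
	if (claimed_len > avail)
		return -1;
	/* Write-side check: never copy more than the destination holds. */
	if (claimed_len > sizeof(dst->ssid))
		claimed_len = sizeof(dst->ssid);
	memcpy(dst->ssid, src, claimed_len);
	dst->ssid_len = claimed_len;
	return 0;
}
```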
      
      Fixes: 2b42bd58 ("staging: r8188eu: introduce new os_dep dir for RTL8188eu driver")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Denis Efremov <denis.e.efremov@oracle.com>
      Link: https://lore.kernel.org/r/20220518070052.108287-1-denis.e.efremov@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: qcom: Fix unbalanced PHY init on probe errors · a7daaaa8
      Johan Hovold authored
      commit 83013631f0f9961416abd812e228c8efbc2f6069 upstream.
      
      Undo the PHY initialisation (e.g. balance runtime PM) if host
      initialisation fails during probe.
      
      Link: https://lore.kernel.org/r/20220401133854.10421-3-johan+linaro@kernel.org
      Fixes: 82a82383 ("PCI: qcom: Add Qualcomm PCIe controller driver")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
      Cc: stable@vger.kernel.org      # 4.5
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: qcom: Fix runtime PM imbalance on probe errors · 4f9d6407
      Johan Hovold authored
      commit 87d83b96c8d6c6c2d2096bd0bdba73bcf42b8ef0 upstream.
      
      Drop the leftover pm_runtime_disable() calls from the late probe error
      paths that would, for example, prevent runtime PM from being reenabled
      after a probe deferral.
      
      Link: https://lore.kernel.org/r/20220401133854.10421-2-johan+linaro@kernel.org
      Fixes: 6e5da6f7 ("PCI: qcom: Fix error handling in runtime PM support")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
      Cc: stable@vger.kernel.org      # 4.20
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299 · 0db67767
      Bjorn Helgaas authored
      commit 12068bb346db5776d0ec9bb4cd073f8427a1ac92 upstream.
      
      92597f97a40b ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold") omitted
      braces around the new Elo i2 entry, so it overwrote the existing Gigabyte
      X299 entry.  Add the appropriate braces.
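
      The overwrite can be reproduced with a small table. This is a sketch
      under stated assumptions: struct board_quirk, the board names and
      find_quirk() are hypothetical and unrelated to the real DMI table.
      When two sets of designated initializers share one pair of braces, C
      lets the later designators override the earlier ones, so only the
      second board remains in the table.

```c
#include <stddef.h>
#include <string.h>

struct board_quirk {
	const char *ident;
	int quirk;
};

/*
 * Buggy: both "entries" share one pair of braces, so the later
 * designators overwrite the earlier ones and only "Board B" survives.
 */
static const struct board_quirk buggy_table[] = {
	{
		.ident = "Board A",
		.quirk = 1,
		.ident = "Board B",
		.quirk = 2,
	},
	{ 0 }	/* terminator */
};

/* Fixed: each entry has its own braces. */
static const struct board_quirk fixed_table[] = {
	{
		.ident = "Board A",
		.quirk = 1,
	},
	{
		.ident = "Board B",
		.quirk = 2,
	},
	{ 0 }	/* terminator */
};

/* Look up a board by name; returns its quirk value or -1 if absent. */
static int find_quirk(const struct board_quirk *t, const char *ident)
{
	for (; t->ident; t++)
		if (strcmp(t->ident, ident) == 0)
			return t->quirk;
	return -1;
}
```

      Compilers may warn about the buggy table, as in the W=1 output quoted
      below.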
      
      Found by:
      
        $ make W=1 drivers/pci/pci.o
          CC      drivers/pci/pci.o
        drivers/pci/pci.c:2974:12: error: initialized field overwritten [-Werror=override-init]
         2974 |   .ident = "Elo i2",
              |            ^~~~~~~~
      
      Link: https://lore.kernel.org/r/20220526221258.GA409855@bhelgaas
      Fixes: 92597f97a40b ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org  # v5.15+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: add beige goby PCI ID · 283bda02
      Alex Deucher authored
      
      commit 62e9bd20035b53ff6c679499c08546d96c6c60a7 upstream.
      
      Add a beige goby PCI ID.
      
      Reviewed-by: Guchun Chen <guchun.chen@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Initialize integer variable to prevent garbage return value · 4ef5ab53
      Gautam Menghani authored
      commit 154827f8e53d8c492b3fb0cb757fbcadb5d516b5 upstream.
      
      Initialize the integer variable to 0 to fix the clang scan warning:
      Undefined or garbage value returned to caller
      [core.uninitialized.UndefReturn]
              return ret;
      
      Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com

      Cc: stable@vger.kernel.org
      Fixes: 8993665a ("tracing/boot: Support multiple handlers for per-event histogram")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix potential double free in create_var_ref() · 37443b35
      Keita Suzuki authored
      commit 99696a2592bca641eb88cc9a80c90e591afebd0f upstream.
      
      In create_var_ref(), init_var_ref() is called to initialize the fields
      of variable ref_field, which is allocated in the previous function call
      to create_hist_field(). Function init_var_ref() allocates the
      corresponding fields such as ref_field->system, but frees these fields
      when the function encounters an error. The caller later calls
      destroy_hist_field() to conduct error handling, which frees the fields
      and the variable itself. This results in double free of the fields which
      are already freed in the previous function.
      
      Fix this by storing NULL to the corresponding fields when they are freed
      in init_var_ref().
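
      The pattern can be sketched in user-space C. The struct and function
      names below are hypothetical stand-ins, not the tracing code, and the
      simulate_failure parameter exists only to exercise the error path.
      Because the error path resets the freed pointers to NULL, the
      caller's later cleanup calls free(NULL), which is a no-op.

```c
#include <stdlib.h>
#include <string.h>

struct var_ref {
	char *system;
	char *event_name;
};

/* Duplicate a string with malloc(); returns NULL on failure. */
static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Fill in the fields; on error, free them and reset the pointers. */
static int init_ref(struct var_ref *ref, const char *system,
		    const char *event_name, int simulate_failure)
{
	ref->system = dup_str(system);
	ref->event_name = dup_str(event_name);
	if (simulate_failure || !ref->system || !ref->event_name) {
		free(ref->system);
		free(ref->event_name);
		ref->system = NULL;	/* prevents a later double free */
		ref->event_name = NULL;
		return -1;
	}
	return 0;
}

/* The caller's cleanup path; free(NULL) is a no-op. */
static void destroy_ref(struct var_ref *ref)
{
	free(ref->system);
	free(ref->event_name);
	free(ref);
}
```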
      
      Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp

      Fixes: 067fe038 ("tracing: Add variable reference handling to hist triggers")
      CC: stable@vger.kernel.org
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: goldfish: Introduce gf_ioread32()/gf_iowrite32() · 0b011b40
      Laurent Vivier authored
      
      commit 2e2ac4a3327479f7e2744cdd88a5c823f2057bad upstream.
      
      The goldfish TTY device was clearly defined as having little-endian
      registers, but the switch to __raw_{read,write}l() broke its driver
      when running on big-endian kernels (if anyone ever tried this).
      
      The m68k qemu implementation got this wrong, and assumed native-endian
      registers.  While this is a bug in qemu, it is probably impossible to
      fix that since there is no way of knowing which other operating systems
      have started relying on that bug over the years.
      
      Hence revert commit da31de35 ("tty: goldfish: use
      __raw_writel()/__raw_readl()"), and define gf_ioread32()/gf_iowrite32()
      to be able to use accessors defined by the architecture.
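
      Why the choice of accessor matters can be shown with plain byte
      buffers. This is a user-space sketch with hypothetical helper names;
      it does not use the kernel I/O accessors. The same four register
      bytes decode to different values depending on whether they are read
      as little-endian, big-endian or native-endian.

```c
#include <stdint.h>
#include <string.h>

/* Native-endian read: the result depends on the host byte order. */
static uint32_t read_native32(const unsigned char *reg)
{
	uint32_t v;

	memcpy(&v, reg, sizeof(v));
	return v;
}

/* Little-endian read: byte 0 is the least significant byte on any host. */
static uint32_t read_le32(const unsigned char *reg)
{
	return (uint32_t)reg[0] | (uint32_t)reg[1] << 8 |
	       (uint32_t)reg[2] << 16 | (uint32_t)reg[3] << 24;
}

/* Big-endian read: byte 0 is the most significant byte on any host. */
static uint32_t read_be32(const unsigned char *reg)
{
	return (uint32_t)reg[3] | (uint32_t)reg[2] << 8 |
	       (uint32_t)reg[1] << 16 | (uint32_t)reg[0] << 24;
}
```

      A native-endian read matches the little-endian result only on a
      little-endian host, which is why a fixed-endian device needs an
      explicit accessor.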
      
      Cc: stable@vger.kernel.org # v5.11+
      Fixes: da31de35 ("tty: goldfish: use __raw_writel()/__raw_readl()")
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
      Link: https://lore.kernel.org/r/20220406201523.243733-2-laurent@vivier.eu
      [geert: Add rationale based on Arnd's comments]
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI: property: Release subnode properties with data nodes · b3485d2b
      Sakari Ailus authored
      
      commit 3bd561e1572ee02a50cd1a5be339abf1a5b78d56 upstream.
      
      struct acpi_device_properties describes one source of properties present
      on either struct acpi_device or struct acpi_data_node. When properties are
      parsed, both are populated but when released, only those properties that
      are associated with the device node are freed.
      
      Fix this by also releasing memory of the data node properties.
      
      Fixes: 5f5e4890 ("ACPI / property: Allow multiple property compatible _DSD entries")
      Cc: 4.20+ <stable@vger.kernel.org> # 4.20+
      Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid cycles in directory h-tree · 3a3ce941
      Jan Kara authored
      
      commit 3ba733f879c2a88910744647e41edeefbc0d92b2 upstream.
      
      A maliciously corrupted filesystem can contain cycles in the h-tree
      stored inside a directory. That can easily lead to the kernel corrupting
      tree nodes that were already verified under its hands while doing a node
      split and consequently accessing unallocated memory. Fix the problem by
      verifying traversed block numbers are unique.
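
      One way to picture the uniqueness check is a walk that records each
      visited block and refuses to enter a block twice. This is a sketch
      only: struct walk, walk_push() and MAX_DEPTH are hypothetical, and
      the real ext4 code tracks the traversed path differently.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8	/* assumed limit for this sketch */

struct walk {
	uint32_t blocks[MAX_DEPTH];
	size_t depth;
};

/* Record blk as the next level of the walk; -1 signals a cycle or overflow. */
static int walk_push(struct walk *w, uint32_t blk)
{
	size_t i;

	if (w->depth >= MAX_DEPTH)
		return -1;
	for (i = 0; i < w->depth; i++)
		if (w->blocks[i] == blk)
			return -1;	/* cycle: block was visited before */
	w->blocks[w->depth++] = blk;
	return 0;
}
```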
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220518093332.13986-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: verify dir block before splitting it · ca17db38
      Jan Kara authored
      
      commit 46c116b920ebec58031f0a78c5ea9599b0d2a371 upstream.
      
      Before splitting a directory block, verify that its directory entries
      are sane so that the splitting code does not access memory it should not.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220518093332.13986-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix bug_on in __es_tree_search · 3c617827
      Baokun Li authored
      
      commit d36f6ed761b53933b0b4126486c10d3da7751e7f upstream.
      
      Hulk Robot reported a BUG_ON:
      ==================================================================
      kernel BUG at fs/ext4/extents_status.c:199!
      [...]
      RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
      RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
      [...]
      Call Trace:
       ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
       ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
       ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
       ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
       ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
       ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
       ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
       ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
       v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
       v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
       vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
       dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
       ext4_quota_enable fs/ext4/super.c:6137 [inline]
       ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
       ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
       mount_bdev+0x2e9/0x3b0 fs/super.c:1158
       mount_fs+0x4b/0x1e4 fs/super.c:1261
      [...]
      ==================================================================
      
      The above issue may happen as follows:
      -------------------------------------
      ext4_fill_super
       ext4_enable_quotas
        ext4_quota_enable
         ext4_iget
          __ext4_iget
           ext4_ext_check_inode
            ext4_ext_check
             __ext4_ext_check
              ext4_valid_extent_entries
               Check for overlapping extents doesn't take effect
         dquot_enable
          vfs_load_quota_inode
           v2_check_quota_file
            v2_read_header
             ext4_quota_read
              ext4_bread
               ext4_getblk
                ext4_map_blocks
                 ext4_ext_map_blocks
                  ext4_find_extent
                   ext4_cache_extents
                    ext4_es_cache_extent
                     ext4_es_cache_extent
                      __es_tree_search
                       ext4_es_end
                        BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
      
      The erroneous ext4 extents are as follows:
      0af3 0300 0400 0000 00000000    extent_header
      00000000 0100 0000 12000000     extent1
      00000000 0100 0000 18000000     extent2
      02000000 0400 0000 14000000     extent3
      
      In ext4_valid_extent_entries(), no error is returned when prev is 0,
      even if lblock <= prev. This was intended to skip the check on the
      first extent, but in the erroneous image above, prev = 0 + 1 - 1 = 0
      when checking the second extent, so even though lblock <= prev, the
      function does not return an error. As a result, the BUG_ON triggers in
      __es_tree_search and the system panics.
      
      To solve this problem, we only need to check that:
      1. The lblock of the first extent is not less than 0.
      2. The lblock of the next extent  is not less than
         the next block of the previous extent.
      The same applies to extent_idx.
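
      The difference between the two checks can be sketched as below.
      struct extent and both functions are simplified, hypothetical
      stand-ins for the ext4 code and ignore endianness and physical
      blocks. The old check is skipped whenever prev is 0; the new one
      compares each extent's start against the block following the
      previous extent.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified extent: first logical block and length in blocks. */
struct extent {
	uint32_t lblk;
	uint16_t len;
};

/* Old check: the overlap test is skipped whenever prev == 0. */
static int extents_valid_old(const struct extent *ex, size_t n)
{
	uint32_t prev = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ex[i].lblk <= prev && prev)
			return 0;
		prev = ex[i].lblk + ex[i].len - 1;
	}
	return 1;
}

/* New check: each extent must start at or after the next free block. */
static int extents_valid_new(const struct extent *ex, size_t n)
{
	uint32_t cur = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ex[i].lblk < cur)
			return 0;
		cur = ex[i].lblk + ex[i].len;
	}
	return 1;
}
```

      For the three extents dumped above (start 0 length 1, start 0 length
      1, start 2 length 4), the old check accepts the list while the new
      check rejects the second extent.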
      
      Cc: stable@kernel.org
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state · b99fd734
      Theodore Ts'o authored
      
      commit c878bea3c9d724ddfa05a813f30de3d25a0ba83f upstream.
      
      The EXT4_FC_REPLAY bit in sbi->s_mount_state is used to indicate that
      we are in the middle of replaying the fast commit journal.  This was
      actually a mistake, since sbi->s_mount_state is initialized from
      es->s_state.  Arguably s_mount_state is misleadingly named, but the
      name is historical --- s_mount_state and s_state date back to ext2.

      What should have been used is the ext4_{set,clear,test}_mount_flag()
      inline functions, which set EXT4_MF_* bits in sbi->s_mount_flags.

      The problem with using EXT4_FC_REPLAY is that a maliciously corrupted
      superblock could result in EXT4_FC_REPLAY getting set in
      s_mount_state.  This bypasses some sanity checks, and this can trigger
      a BUG() in ext4_es_cache_extent().  As an easy-to-backport fix, filter
      out the EXT4_FC_REPLAY bit for now.  We should eventually transition
      away from EXT4_FC_REPLAY to something like EXT4_MF_REPLAY.
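
      Filtering the bit amounts to masking it out when the on-disk state is
      loaded. The sketch below is illustrative: FC_REPLAY_BIT and its value
      are assumptions of the example, and sanitize_state() is a
      hypothetical helper rather than the ext4 code.

```c
#include <stdint.h>

#define FC_REPLAY_BIT 0x0020u	/* assumed value for this sketch */

/* Drop the runtime-only replay bit from a state word read from disk. */
static uint16_t sanitize_state(uint16_t on_disk_state)
{
	return (uint16_t)(on_disk_state & ~FC_REPLAY_BIT);
}
```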
      
      Cc: stable@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Link: https://lore.kernel.org/r/20220420192312.1655305-1-phind.uet@gmail.com
      Link: https://lore.kernel.org/r/20220517174028.942119-1-tytso@mit.edu
      Reported-by: <syzbot+c7358a3cd05ee786eb31@syzkaller.appspotmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix bug_on in ext4_writepages · 18a759f7
      Ye Bin authored
      
      commit ef09ed5d37b84d18562b30cf7253e57062d0db05 upstream.
      
      We got the following issue:
      EXT4-fs error (device loop0): ext4_mb_generate_buddy:1141: group 0, block bitmap and bg descriptor inconsistent: 25 vs 31513 free cls
      ------------[ cut here ]------------
      kernel BUG at fs/ext4/inode.c:2708!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 2 PID: 2147 Comm: rep Not tainted 5.18.0-rc2-next-20220413+ #155
      RIP: 0010:ext4_writepages+0x1977/0x1c10
      RSP: 0018:ffff88811d3e7880 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88811c098000
      RDX: 0000000000000000 RSI: ffff88811c098000 RDI: 0000000000000002
      RBP: ffff888128140f50 R08: ffffffffb1ff6387 R09: 0000000000000000
      R10: 0000000000000007 R11: ffffed10250281ea R12: 0000000000000001
      R13: 00000000000000a4 R14: ffff88811d3e7bb8 R15: ffff888128141028
      FS:  00007f443aed9740(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020007200 CR3: 000000011c2a4000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       do_writepages+0x130/0x3a0
       filemap_fdatawrite_wbc+0x83/0xa0
       filemap_flush+0xab/0xe0
       ext4_alloc_da_blocks+0x51/0x120
       __ext4_ioctl+0x1534/0x3210
       __x64_sys_ioctl+0x12c/0x170
       do_syscall_64+0x3b/0x90
      
      It may happen as follows:
      1. write inline_data inode
      vfs_write
        new_sync_write
          ext4_file_write_iter
            ext4_buffered_write_iter
              generic_perform_write
                ext4_da_write_begin
                  ext4_da_write_inline_data_begin -> if the inline data size is
                  too small, a block is allocated for the write, so the mapping
                  will have a dirty page
                      ext4_da_convert_inline_data_to_extent ->clear EXT4_STATE_MAY_INLINE_DATA
      2. fallocate
      do_vfs_ioctl
        ioctl_preallocate
          vfs_fallocate
            ext4_fallocate
              ext4_convert_inline_data
                ext4_convert_inline_data_nolock
                  ext4_map_blocks -> fail will goto restore data
                  ext4_restore_inline_data
                    ext4_create_inline_data
                    ext4_write_inline_data
                    ext4_set_inode_state -> set inode EXT4_STATE_MAY_INLINE_DATA
      3. writepages
      __ext4_ioctl
        ext4_alloc_da_blocks
          filemap_flush
            filemap_fdatawrite_wbc
              do_writepages
                ext4_writepages
                  if (ext4_has_inline_data(inode))
                    BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
      
      The root cause of this issue is that, under delayed allocation mode,
      the inline data is not destroyed until ext4_writepages is called, but
      by then the inode may already have been converted from inline data to
      extents.  To solve this issue, we call filemap_flush first.
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220516122634.1690462-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix warning in ext4_handle_inode_extension · b81d2ff6
      Ye Bin authored
      
      commit f4534c9fc94d22383f187b9409abb3f9df2e3db3 upstream.
      
      We got the following issue:
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:5741: Out of memory
      EXT4-fs error (device loop0): ext4_setattr:5462: inode #13: comm syz-executor.0: mark_inode_dirty error
      EXT4-fs error (device loop0) in ext4_setattr:5519: Out of memory
      EXT4-fs error (device loop0): ext4_ind_map_blocks:595: inode #13: comm syz-executor.0: Can't allocate blocks for non-extent mapped inodes with bigalloc
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 4361 at fs/ext4/file.c:301 ext4_file_write_iter+0x11c9/0x1220
      Modules linked in:
      CPU: 1 PID: 4361 Comm: syz-executor.0 Not tainted 5.10.0+ #1
      RIP: 0010:ext4_file_write_iter+0x11c9/0x1220
      RSP: 0018:ffff924d80b27c00 EFLAGS: 00010282
      RAX: ffffffff815a3379 RBX: 0000000000000000 RCX: 000000003b000000
      RDX: ffff924d81601000 RSI: 00000000000009cc RDI: 00000000000009cd
      RBP: 000000000000000d R08: ffffffffbc5a2c6b R09: 0000902e0e52a96f
      R10: ffff902e2b7c1b40 R11: ffff902e2b7c1b40 R12: 000000000000000a
      R13: 0000000000000001 R14: ffff902e0e52aa10 R15: ffffffffffffff8b
      FS:  00007f81a7f65700(0000) GS:ffff902e3bc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffff600400 CR3: 000000012db88001 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       do_iter_readv_writev+0x2e5/0x360
       do_iter_write+0x112/0x4c0
       do_pwritev+0x1e5/0x390
       __x64_sys_pwritev2+0x7e/0xa0
       do_syscall_64+0x37/0x50
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above issue may happen as follows:
      Assume
      inode.i_size=4096
      EXT4_I(inode)->i_disksize=4096
      
      step 1: set inode->i_size = 8192
      ext4_setattr
        if (attr->ia_size != inode->i_size)
          EXT4_I(inode)->i_disksize = attr->ia_size;
          rc = ext4_mark_inode_dirty
             ext4_reserve_inode_write
                ext4_get_inode_loc
                  __ext4_get_inode_loc
                    sb_getblk --> return -ENOMEM
         ...
         if (!error)  ->will not update i_size
           i_size_write(inode, attr->ia_size);
      Now:
      inode.i_size=4096
      EXT4_I(inode)->i_disksize=8192
      
      step 2: Direct write 4096 bytes
      ext4_file_write_iter
       ext4_dio_write_iter
         iomap_dio_rw ->return error
       if (extend)
         ext4_handle_inode_extension
           WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize);
      ->Then trigger warning.
      
      To solve the above issue, if marking the inode dirty fails in
      ext4_setattr, restore 'EXT4_I(inode)->i_disksize' to its old value.
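
      The rollback can be sketched with a two-field model of the inode.
      struct inode_model and set_size() are hypothetical, and the
      mark_dirty_err parameter simulates the failing call; this is not the
      ext4_setattr code.

```c
#include <stdint.h>

/* Minimal model: in-memory size and on-disk size of one inode. */
struct inode_model {
	uint64_t i_size;
	uint64_t i_disksize;
};

/*
 * Change the size. mark_dirty_err simulates the result of marking the
 * inode dirty (0 on success, a negative error code otherwise).
 */
static int set_size(struct inode_model *inode, uint64_t new_size,
		    int mark_dirty_err)
{
	uint64_t old_disksize = inode->i_disksize;

	inode->i_disksize = new_size;
	if (mark_dirty_err) {
		/* Roll back so that i_disksize never exceeds i_size. */
		inode->i_disksize = old_disksize;
		return mark_dirty_err;
	}
	inode->i_size = new_size;
	return 0;
}
```

      After a failed call both fields keep their old values, so the
      condition tested by the WARN_ON_ONCE above cannot become true through
      this path.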
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20220326065351.761952-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix race condition between ext4_write and ext4_convert_inline_data · 14602353
      Baokun Li authored
      
      commit f87c7a4b084afc13190cbb263538e444cb2b392a upstream.
      
      Hulk Robot reported a BUG_ON:
       ==================================================================
       EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
       block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
       kernel BUG at fs/ext4/ext4_jbd2.c:53!
       invalid opcode: 0000 [#1] SMP KASAN PTI
       CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
       RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
       RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
       [...]
       Call Trace:
        ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
        generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
        ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
        ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
        do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
        do_iter_write+0x107/0x430 fs/read_write.c:861
        vfs_writev fs/read_write.c:934 [inline]
        do_pwritev+0x1e5/0x380 fs/read_write.c:1031
       [...]
       ==================================================================
      
      The above issue may happen as follows:
                 cpu1                     cpu2
      __________________________|__________________________
      do_pwritev
       vfs_writev
        do_iter_write
         ext4_file_write_iter
          ext4_buffered_write_iter
           generic_perform_write
            ext4_da_write_begin
                                 vfs_fallocate
                                  ext4_fallocate
                                   ext4_convert_inline_data
                                    ext4_convert_inline_data_nolock
                                     ext4_destroy_inline_data_nolock
                                      clear EXT4_STATE_MAY_INLINE_DATA
                                     ext4_map_blocks
                                      ext4_ext_map_blocks
                                       ext4_mb_new_blocks
                                        ext4_mb_regular_allocator
                                         ext4_mb_good_group_nolock
                                          ext4_mb_init_group
                                           ext4_mb_init_cache
                                            ext4_mb_generate_buddy  --> error
             ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
                                      ext4_restore_inline_data
                                       set EXT4_STATE_MAY_INLINE_DATA
             ext4_block_write_begin
            ext4_da_write_end
             ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
             ext4_write_inline_data_end
              handle=NULL
              ext4_journal_stop(handle)
               __ext4_journal_stop
                ext4_put_nojournal(handle)
                 ref_cnt = (unsigned long)handle
                 BUG_ON(ref_cnt == 0)  ---> BUG_ON
      
      The lock held by ext4_convert_inline_data is xattr_sem, but the lock
      held by generic_perform_write is i_rwsem. Therefore, the two code paths
      can run concurrently.

      To solve the above issue, we add inode_lock() for ext4_convert_inline_data().
      At the same time, move ext4_convert_inline_data() in front of
      ext4_punch_hole() and remove the similar handling from ext4_punch_hole().
      
      Fixes: 0c8d414f ("ext4: let fallocate handle inline data correctly")
      Cc: stable@vger.kernel.org
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix use-after-free in ext4_rename_dir_prepare · 364380c0
      Ye Bin authored
      
      commit 0be698ecbe4471fcad80e81ec6a05001421041b3 upstream.
      
      We got the following issue:
      EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
      ext4_get_first_dir_block: bh->b_data=0xffff88810bee6000 len=34478
      ext4_get_first_dir_block: *parent_de=0xffff88810beee6ae bh->b_data=0xffff88810bee6000
      ext4_rename_dir_prepare: [1] parent_de=0xffff88810beee6ae
      ==================================================================
      BUG: KASAN: use-after-free in ext4_rename_dir_prepare+0x152/0x220
      Read of size 4 at addr ffff88810beee6ae by task rep/1895
      
      CPU: 13 PID: 1895 Comm: rep Not tainted 5.10.0+ #241
      Call Trace:
       dump_stack+0xbe/0xf9
       print_address_description.constprop.0+0x1e/0x220
       kasan_report.cold+0x37/0x7f
       ext4_rename_dir_prepare+0x152/0x220
       ext4_rename+0xf44/0x1ad0
       ext4_rename2+0x11c/0x170
       vfs_rename+0xa84/0x1440
       do_renameat2+0x683/0x8f0
       __x64_sys_renameat+0x53/0x60
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f45a6fc41c9
      RSP: 002b:00007ffc5a470218 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f45a6fc41c9
      RDX: 0000000000000005 RSI: 0000000020000180 RDI: 0000000000000005
      RBP: 00007ffc5a470240 R08: 00007ffc5a470160 R09: 0000000020000080
      R10: 00000000200001c0 R11: 0000000000000246 R12: 0000000000400bb0
      R13: 00007ffc5a470320 R14: 0000000000000000 R15: 0000000000000000
      
      The buggy address belongs to the page:
      page:00000000440015ce refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x10beee
      flags: 0x200000000000000()
      raw: 0200000000000000 ffffea00043ff4c8 ffffea0004325608 0000000000000000
      raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88810beee580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88810beee600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff88810beee680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                        ^
       ffff88810beee700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88810beee780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      Disabling lock debugging due to kernel taint
      ext4_rename_dir_prepare: [2] parent_de->inode=3537895424
      ext4_rename_dir_prepare: [3] dir=0xffff888124170140
      ext4_rename_dir_prepare: [4] ino=2
      ext4_rename_dir_prepare: ent->dir->i_ino=2 parent=-757071872
      
      The reason is that the first directory entry has a 'rec_len' of 34478, which
      yields an illegal parent entry. Currently, the directory entry is not checked
      after the directory block is read in 'ext4_get_first_dir_block'.
      To solve this issue, check the directory entry in 'ext4_get_first_dir_block'.
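
The kind of check described above can be sketched in user space. The following is a minimal model, not the kernel patch: it assumes a 4096-byte directory block and the usual 8-byte entry header (inode, rec_len, name_len, file_type), and the helper names are hypothetical. Note that 34478 is exactly the distance between `bh->b_data` and `parent_de` in the log above (0x86ae), far beyond the end of the block.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size for this illustration


def parent_entry_offset(block: bytes) -> int:
    """Return the offset of the '..' entry, or raise if '.' looks corrupted."""
    # Entry header: inode (le32), rec_len (le16), name_len (u8), file_type (u8)
    _inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, 0)
    if block[8:8 + name_len] != b".":
        raise ValueError("first entry is not '.'")
    # '.' needs at least 12 bytes, entries are 4-byte aligned, and the
    # 8-byte header of '..' must still fit inside the block.
    if rec_len < 12 or rec_len % 4 or rec_len + 8 > len(block):
        raise ValueError(f"bad rec_len {rec_len} in '.' entry")
    return rec_len


def make_block(rec_len: int) -> bytes:
    """Build a block whose first entry is '.' with the given rec_len."""
    dot = struct.pack("<IHBB", 2, rec_len, 1, 2) + b".\0\0\0"
    return dot.ljust(BLOCK_SIZE, b"\0")
```

With a sane `rec_len` of 12 the parent entry is found right after '.', while the corrupted value from the report is rejected before any pointer arithmetic is done on it.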
      
      [ Trigger an ext4_error() instead of just warning if the directory is
        missing a '.' or '..' entry.   Also make sure we return an error code
        if the file system is corrupted.  -TYT ]
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220414025223.4113128-1-yebin10@huawei.com
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: mark group as trimmed only if it was fully scanned · 3e4b684f
      Dmitry Monakhov authored
      
      commit d63c00ea435a5352f486c259665a4ced60399421 upstream.
      
      Otherwise, nonaligned fstrim calls work poorly for iterative scanners.
      For example:
      
      // trim [0,16MB] for group-1, but mark full group as trimmed
      fstrim  -o $((1024*1024*128)) -l $((1024*1024*16)) ./m
      // handle [16MB,16MB] for group-1, do nothing because group already has the flag.
      fstrim  -o $((1024*1024*144)) -l $((1024*1024*16)) ./m
      
      [ Update function documentation for ext4_trim_all_free -- TYT ]
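
The condition in the subject line can be illustrated with a little arithmetic. This is a sketch only: it assumes a block group spans 128 MiB (4 KiB blocks, 32768 blocks per group), which matches the offsets used in the fstrim calls above, and the function name is hypothetical.

```python
MIB = 1024 * 1024
GROUP_BYTES = 128 * MIB  # assumption: 4 KiB blocks, 32768 blocks per group


def covers_whole_group(offset: int, length: int, group: int) -> bool:
    """True if [offset, offset + length) spans all of block group `group`."""
    start = group * GROUP_BYTES
    end = start + GROUP_BYTES
    return offset <= start and offset + length >= end


# The two calls from the example: each touches only 16 MiB of group 1.
first_call = covers_whole_group(128 * MIB, 16 * MIB, 1)
second_call = covers_whole_group(144 * MIB, 16 * MIB, 1)
```

Neither call scans the whole group, so under this model neither should mark it as trimmed, and the second call still gets to do its work.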
      
      Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
      Link: https://lore.kernel.org/r/1650214995-860245-1-git-send-email-dmtrmonakhov@yandex-team.ru
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Make sure bfqg for which we are queueing requests is online · 6ee0868b
      Jan Kara authored
      
      commit 075a53b78b815301f8d3dd1ee2cd99554e34f0dd upstream.
      
      Bios queued into the BFQ IO scheduler can be associated with a cgroup that
      was already offlined. This may then cause insertion of this bfq_group
      into a service tree. But this bfq_group will get freed as soon as the last
      bio associated with it is completed, leading to use-after-free issues for
      service tree users. Fix the problem by making sure we always operate on
      an online bfq_group. If the bfq_group associated with the bio is not
      online, we pick the first online parent.
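
The "pick the first online parent" rule can be sketched as a simple walk up the hierarchy. This is a user-space model under stated assumptions: the `Group` class, its fields, and the fallback to a root group are illustrative and not taken from the BFQ sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    name: str
    online: bool
    parent: Optional["Group"] = None


def first_online(group: Group, root: Group) -> Group:
    """Walk up from `group` until an online group is found; fall back to root."""
    node: Optional[Group] = group
    while node is not None:
        if node.online:
            return node
        node = node.parent
    return root


root = Group("root", online=True)
parent = Group("parent", online=True, parent=root)
dead = Group("dead", online=False, parent=parent)
```

Here a bio associated with `dead` would be queued against `parent`, so nothing is inserted into a service tree on behalf of a group that is about to be freed.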
      
      CC: stable@vger.kernel.org
      Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support")
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Get rid of __bio_blkcg() usage · 86defc54
      Jan Kara authored
      
      commit 4e54a2493e582361adc3bfbf06c7d50d19d18837 upstream.
      
      BFQ usage of __bio_blkcg() is a relic from the past. Furthermore, if the
      bio is not associated with any blkcg, the usage of __bio_blkcg() in
      BFQ is prone to races with the task being migrated between cgroups, as
      __bio_blkcg() calls at different places could return different blkcgs.
      
      Convert BFQ to the new situation where bio->bi_blkg is initialized in
      bio_set_dev() and thus practically always valid. This allows us to save
      blkcg_gq lookup and noticeably simplify the code.
      
      CC: stable@vger.kernel.org
      Fixes: 0fe061b9 ("blkcg: fix ref count issue with bio_blkcg() using task_css")
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Track whether bfq_group is still online · 54c08ef2
      Jan Kara authored
      
      commit 09f871868080c33992cd6a9b72a5ca49582578fa upstream.
      
      Track whether bfq_group is still online. We cannot rely on
      blkcg_gq->online because that gets cleared only after all policies are
      offlined and we need something that gets updated already under
      bfqd->lock when we are cleaning up our bfq_group to be able to guarantee
      that when we see online bfq_group, it will stay online while we are
      holding bfqd->lock.
      
      CC: stable@vger.kernel.org
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Remove pointless bfq_init_rq() calls · 2b802c0c
      Jan Kara authored
      
      commit 5f550ede5edf846ecc0067be1ba80514e6fe7f8e upstream.
      
      We call bfq_init_rq() from request merging functions where requests we
      get should have already gone through bfq_init_rq() during insert and
      anyway we want to do anything only if the request is already tracked by
      BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply
      skip requests untracked by BFQ. We move bfq_init_rq() call in
      bfq_insert_request() a bit earlier to cover request merging and thus
      can transfer FIFO position in case of a merge.
      
      CC: stable@vger.kernel.org
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Drop pointless unlock-lock pair · a107df38
      Jan Kara authored
      
      commit fc84e1f941b91221092da5b3102ec82da24c5673 upstream.
      
      In bfq_insert_request() we unlock bfqd->lock only to call
      trace_block_rq_insert() and then lock bfqd->lock again. This is really
      pointless since tracing is disabled if we really care about performance
      and even if the tracepoint is enabled, it is a quick call.
      
      CC: stable@vger.kernel.org
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Update cgroup information before merging bio · e8821f45
      Jan Kara authored
      
      commit ea591cd4eb270393810e7be01feb8fde6a34fbbe upstream.
      
      When the process is migrated to a different cgroup (or in case of
      writeback just starts submitting bios associated with a different
      cgroup) bfq_merge_bio() can operate with stale cgroup information in
      bic. Thus the bio can be merged to a request from a different cgroup or
      it can result in merging of bfqqs for different cgroups or bfqqs of
      already dead cgroups and causing possible use-after-free issues. Fix the
      problem by updating cgroup information in bfq_merge_bio().
      
      CC: stable@vger.kernel.org
      Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support")
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-4-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Split shared queues on move between cgroups · 81b7d0c7
      Jan Kara authored
      
      commit 3bc5e683c67d94bd839a1da2e796c15847b51b69 upstream.
      
      When bfqq is shared by multiple processes it can happen that one of the
      processes gets moved to a different cgroup (or just starts submitting IO
      for different cgroup). In case that happens we need to split the merged
      bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
      we will just account IO time to wrong entities etc.
      
      Similarly if the bfqq is scheduled to merge with another bfqq but the
      merge didn't happen yet, cancel the merge as it need not be valid
      anymore.
      
      CC: stable@vger.kernel.org
      Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support")
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Avoid merging queues with different parents · 5ee21eda
      Jan Kara authored
      
      commit c1cee4ab36acef271be9101590756ed0c0c374d9 upstream.
      
      It can happen that the parent of a bfqq changes between the moment we
      decide two queues are worth merging (and set bic->stable_merge_bfqq)
      and the moment bfq_setup_merge() is called. This can happen e.g. because
      the process submitted IO for a different cgroup and thus bfqq got
      reparented. It can even happen that the bfqq we are merging with has
      parent cgroup that is already offline and going to be destroyed in which
      case the merge can lead to use-after-free issues such as:
      
      BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
      Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544
      
      CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G            E     5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x46/0x5a
       print_address_description.constprop.0+0x1f/0x140
       ? __bfq_deactivate_entity+0x9cb/0xa50
       kasan_report.cold+0x7f/0x11b
       ? __bfq_deactivate_entity+0x9cb/0xa50
       __bfq_deactivate_entity+0x9cb/0xa50
       ? update_curr+0x32f/0x5d0
       bfq_deactivate_entity+0xa0/0x1d0
       bfq_del_bfqq_busy+0x28a/0x420
       ? resched_curr+0x116/0x1d0
       ? bfq_requeue_bfqq+0x70/0x70
       ? check_preempt_wakeup+0x52b/0xbc0
       __bfq_bfqq_expire+0x1a2/0x270
       bfq_bfqq_expire+0xd16/0x2160
       ? try_to_wake_up+0x4ee/0x1260
       ? bfq_end_wr_async_queues+0xe0/0xe0
       ? _raw_write_unlock_bh+0x60/0x60
       ? _raw_spin_lock_irq+0x81/0xe0
       bfq_idle_slice_timer+0x109/0x280
       ? bfq_dispatch_request+0x4870/0x4870
       __hrtimer_run_queues+0x37d/0x700
       ? enqueue_hrtimer+0x1b0/0x1b0
       ? kvm_clock_get_cycles+0xd/0x10
       ? ktime_get_update_offsets_now+0x6f/0x280
       hrtimer_interrupt+0x2c8/0x740
      
      Fix the problem by checking that the parent of the two bfqqs we are
      merging in bfq_setup_merge() is the same.
      
      Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
      
      CC: stable@vger.kernel.org
      Fixes: 430a67f9 ("block, bfq: merge bursts of newly-created queues")
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bfq: Avoid false marking of bic as stably merged · d639a4c0
      Jan Kara authored
      
      commit 70456e5210f40ffdb8f6d905acfdcec5bd5fad9e upstream.
      
      bfq_setup_cooperator() can mark bic as stably merged even though it
      decides to not merge its bfqqs (when bfq_setup_merge() returns NULL).
      Make sure to mark bic as stably merged only if we are really going to
      merge bfqqs.
      
      CC: stable@vger.kernel.org
      Tested-by: "yukuai (C)" <yukuai3@huawei.com>
      Fixes: 430a67f9 ("block, bfq: merge bursts of newly-created queues")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220401102752.8599-1-jack@suse.cz
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • efi: Do not import certificates from UEFI Secure Boot for T2 Macs · 65237307
      Aditya Garg authored
      
      commit 155ca952c7ca19aa32ecfb7373a32bbc2e1ec6eb upstream.
      
      On Apple T2 Macs, when Linux attempts to read the db and dbx efi variables
      at early boot to load UEFI Secure Boot certificates, a page fault occurs
      in Apple firmware code and EFI runtime services are disabled with the
      following logs:
      
      [Firmware Bug]: Page fault caused by firmware at PA: 0xffffb1edc0068000
      WARNING: CPU: 3 PID: 104 at arch/x86/platform/efi/quirks.c:735 efi_crash_gracefully_on_page_fault+0x50/0xf0
      (Removed some logs from here)
      Call Trace:
       <TASK>
       page_fault_oops+0x4f/0x2c0
       ? search_bpf_extables+0x6b/0x80
       ? search_module_extables+0x50/0x80
       ? search_exception_tables+0x5b/0x60
       kernelmode_fixup_or_oops+0x9e/0x110
       __bad_area_nosemaphore+0x155/0x190
       bad_area_nosemaphore+0x16/0x20
       do_kern_addr_fault+0x8c/0xa0
       exc_page_fault+0xd8/0x180
       asm_exc_page_fault+0x1e/0x30
      (Removed some logs from here)
       ? __efi_call+0x28/0x30
       ? switch_mm+0x20/0x30
       ? efi_call_rts+0x19a/0x8e0
       ? process_one_work+0x222/0x3f0
       ? worker_thread+0x4a/0x3d0
       ? kthread+0x17a/0x1a0
       ? process_one_work+0x3f0/0x3f0
       ? set_kthread_struct+0x40/0x40
       ? ret_from_fork+0x22/0x30
       </TASK>
      ---[ end trace 1f82023595a5927f ]---
      efi: Froze efi_rts_wq and disabled EFI Runtime Services
      integrity: Couldn't get size: 0x8000000000000015
      integrity: MODSIGN: Couldn't get UEFI db list
      efi: EFI Runtime Services are disabled!
      integrity: Couldn't get size: 0x8000000000000015
      integrity: Couldn't get UEFI dbx list
      integrity: Couldn't get size: 0x8000000000000015
      integrity: Couldn't get mokx list
      integrity: Couldn't get size: 0x80000000
      
      So we avoid reading these UEFI variables and thus prevent the crash.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Aditya Garg <gargaditya08@live.com>
      Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs-writeback: writeback_sb_inodes: Recalculate 'wrote' according to skipped pages · 9bc601c6
      Zhihao Cheng authored
      
      commit 68f4c6eba70df70a720188bce95c85570ddfcc87 upstream.
      
      Commit 505a666e ("writeback: plug writeback in wb_writeback() and
      writeback_inodes_wb()") has us holding a plug during wb_writeback, which
      may cause a potential ABBA deadlock:
      
          wb_writeback		fat_file_fsync
      blk_start_plug(&plug)
      for (;;) {
        iter i-1: some reqs have been added into plug->mq_list  // LOCK A
        iter i:
          progress = __writeback_inodes_wb(wb, work)
          . writeback_sb_inodes // fat's bdev
          .   __writeback_single_inode
          .   . generic_writepages
          .   .   __block_write_full_page
          .   .   . . 	    __generic_file_fsync
          .   .   . . 	      sync_inode_metadata
          .   .   . . 	        writeback_single_inode
          .   .   . . 		  __writeback_single_inode
          .   .   . . 		    fat_write_inode
          .   .   . . 		      __fat_write_inode
          .   .   . . 		        sync_dirty_buffer	// fat's bdev
          .   .   . . 			  lock_buffer(bh)	// LOCK B
          .   .   . . 			    submit_bh
          .   .   . . 			      blk_mq_get_tag	// LOCK A
          .   .   . trylock_buffer(bh)  // LOCK B
          .   .   .   redirty_page_for_writepage
          .   .   .     wbc->pages_skipped++
          .   .   --wbc->nr_to_write
          .   wrote += write_chunk - wbc.nr_to_write  // wrote > 0
          .   requeue_inode
          .     redirty_tail_locked
          if (progress)    // progress > 0
            continue;
        iter i+1:
            queue_io
            // similar process with iter i, infinite for-loop !
      }
      blk_finish_plug(&plug)   // flush plug won't be called
      
      The above process triggers a hung task like:
      [  399.044861] INFO: task bb:2607 blocked for more than 30 seconds.
      [  399.046824]       Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty
      [  399.051539] task:bb              state:D stack:    0 pid: 2607 ppid: 2426 flags:0x00004000
      [  399.051556] Call Trace:
      [  399.051570]  __schedule+0x480/0x1050
      [  399.051592]  schedule+0x92/0x1a0
      [  399.051602]  io_schedule+0x22/0x50
      [  399.051613]  blk_mq_get_tag+0x1d3/0x3c0
      [  399.051640]  __blk_mq_alloc_requests+0x21d/0x3f0
      [  399.051657]  blk_mq_submit_bio+0x68d/0xca0
      [  399.051674]  __submit_bio+0x1b5/0x2d0
      [  399.051708]  submit_bio_noacct+0x34e/0x720
      [  399.051718]  submit_bio+0x3b/0x150
      [  399.051725]  submit_bh_wbc+0x161/0x230
      [  399.051734]  __sync_dirty_buffer+0xd1/0x420
      [  399.051744]  sync_dirty_buffer+0x17/0x20
      [  399.051750]  __fat_write_inode+0x289/0x310
      [  399.051766]  fat_write_inode+0x2a/0xa0
      [  399.051783]  __writeback_single_inode+0x53c/0x6f0
      [  399.051795]  writeback_single_inode+0x145/0x200
      [  399.051803]  sync_inode_metadata+0x45/0x70
      [  399.051856]  __generic_file_fsync+0xa3/0x150
      [  399.051880]  fat_file_fsync+0x1d/0x80
      [  399.051895]  vfs_fsync_range+0x40/0xb0
      [  399.051929]  __x64_sys_fsync+0x18/0x30
      
      In my test, 'need_resched()' (which was introduced by 590dca3a "fs-writeback:
      unplug before cond_resched in writeback_sb_inodes") in function
      'writeback_sb_inodes()' seldom comes true, unless cond_resched() is deleted
      from write_cache_pages().
      
      Fix it by correcting the 'wrote' count according to the number of skipped
      pages in writeback_sb_inodes().
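
The accounting change can be illustrated with plain arithmetic. This sketch follows the description above rather than the actual kernel diff: the function names are hypothetical, and clamping the result at zero is an assumption made for the model.

```python
def wrote_before_fix(write_chunk: int, nr_to_write: int) -> int:
    """Progress as computed in the diagram: budget consumed, skipped or not."""
    return write_chunk - nr_to_write


def wrote_after_fix(write_chunk: int, nr_to_write: int, pages_skipped: int) -> int:
    """Progress with skipped pages subtracted, never below zero (assumed clamp)."""
    return max(write_chunk - nr_to_write - pages_skipped, 0)


# One page was attempted but redirtied because trylock_buffer() failed:
# the budget dropped by one and pages_skipped went up by one.
write_chunk, nr_to_write, pages_skipped = 1024, 1023, 1
old_progress = wrote_before_fix(write_chunk, nr_to_write)
new_progress = wrote_after_fix(write_chunk, nr_to_write, pages_skipped)
```

With the old accounting the loop sees progress and keeps iterating without ever flushing the plug. With the skipped page subtracted, progress is zero and the loop can exit.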
      
      See the Link below for a reproducer.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837
      
      
      Cc: stable@vger.kernel.org # v4.3
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220510133805.1988292-1-chengzhihao1@huawei.com
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iwlwifi: mvm: fix assert 1F04 upon reconfig · 87737ee5
      Emmanuel Grumbach authored
      
      commit 9d096e3d3061dbf4ee10e2b59fc2c06e05bdb997 upstream.
      
      When we reconfig we must not send the MAC_POWER command that relates to
      a MAC that was not yet added to the firmware.
      
      Ignore those in the iterator.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
      Link: https://lore.kernel.org/r/20220517120044.ed2ffc8ce732.If786e19512d0da4334a6382ea6148703422c7d7b@changeid
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • wifi: mac80211: fix use-after-free in chanctx code · b79110f2
      Johannes Berg authored
      
      commit 2965c4cdf7ad9ce0796fac5e57debb9519ea721e upstream.
      
      In ieee80211_vif_use_reserved_context(), when we have an
      old context and the new context's replace_state is set to
      IEEE80211_CHANCTX_REPLACE_NONE, we free the old context
      in ieee80211_vif_use_reserved_reassign(). Therefore, we
      cannot check the old_ctx anymore, so we should set it to
      NULL after this point.
      
      However, since the new_ctx replace state is clearly not
      IEEE80211_CHANCTX_REPLACES_OTHER, we're not going to do
      anything else in this function and can just return to
      avoid accessing the freed old_ctx.
      
      Cc: stable@vger.kernel.org
      Fixes: 5bcae31d ("mac80211: implement multi-vif in-place reservations")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20220601091926.df419d91b165.I17a9b3894ff0b8323ce2afdb153b101124c821e5@changeid
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • objtool: Fix symbol creation · 4a6ca6f8
      Peter Zijlstra authored
      
      commit ead165fa1042247b033afad7be4be9b815d04ade upstream.
      
      Nathan reported objtool failing with the following messages:
      
        warning: objtool: no non-local symbols !?
        warning: objtool: gelf_update_symshndx: invalid section index
      
      The problem is due to commit 4abff6d48dbc ("objtool: Fix code relocs
      vs weak symbols") failing to consider the case where an object would
      have no non-local symbols.
      
      The problem that commit tries to address is adding a STB_LOCAL symbol
      to the symbol table in light of the ELF spec's requirement that:
      
        In each symbol table, all symbols with STB_LOCAL binding precede the
        weak and global symbols.  As ``Sections'' above describes, a symbol
        table section's sh_info section header member holds the symbol table
        index for the first non-local symbol.
      
      The approach taken is to find this first non-local symbol, move that
      to the end and then re-use the freed spot to insert a new local symbol
      and increment sh_info.
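
The approach just described can be modeled on a plain list. This is a sketch under stated assumptions, not objtool code: the symbol table is a list of (name, binding) tuples, `sh_info` is the index of the first non-local symbol, and the function name is hypothetical. The `else` branch covers an object with no non-local symbols, where there is nothing to move.

```python
LOCAL, GLOBAL = "local", "global"


def add_local_symbol(symtab: list, sh_info: int, name: str) -> int:
    """Insert a local symbol while keeping locals ahead of non-locals.

    Returns the updated sh_info (index of the first non-local symbol).
    """
    if sh_info < len(symtab):
        # Move the first non-local symbol to the end to free its slot.
        symtab.append(symtab[sh_info])
        symtab[sh_info] = (name, LOCAL)
    else:
        # No non-local symbols at all: simply append the new local.
        symtab.append((name, LOCAL))
    return sh_info + 1


symtab = [("", LOCAL), ("a", LOCAL), ("g1", GLOBAL), ("g2", GLOBAL)]
sh_info = add_local_symbol(symtab, 2, "new")

only_locals = [("", LOCAL)]
only_locals_info = add_local_symbol(only_locals, 1, "new")
```

After either call, every entry below `sh_info` is local and every entry at or above it is not, which is the ordering the ELF requirement quoted above asks for. Any index-based lookup structure would also need updating for the moved symbol, which the model leaves out.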
      
      Except it never considered the case of object files without global
      symbols and got a whole bunch of details wrong -- so many in fact that
      it is a wonder it ever worked :/
      
      Specifically:
      
       - It failed to re-hash the symbol on the new index, so a subsequent
         find_symbol_by_index() would not find it at the new location and a
         query for the old location would now return a non-deterministic
         choice between the old and new symbol.
      
       - It failed to appreciate that the GElf wrappers are not a valid disk
         format (it works because GElf is basically Elf64 and we only
         support x86_64 atm.)
      
       - It failed to fully appreciate how horrible the libelf API really is
         and got the gelf_update_symshndx() call pretty much completely
         wrong; with the direct consequence that if inserting a second
         STB_LOCAL symbol would require moving the same STB_GLOBAL symbol
         again it would completely come unstuck.
      
      Write a new elf_update_symbol() function that wraps all the magic
      required to update or create a new symbol at a given index.
      
      Specifically, gelf_update_sym*() require an @ndx argument that is
      relative to the @data argument; this means you have to manually
      iterate the section data descriptor list and update @ndx.
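
The index translation mentioned above can be sketched without libelf. This is a model only: it assumes each data descriptor of the section holds a known number of symbol entries, and the function name is hypothetical.

```python
def locate_symbol(chunk_sizes: list, index: int) -> tuple:
    """Map a section-wide symbol index to (chunk number, index within chunk).

    `chunk_sizes` holds the number of symbol entries in each data
    descriptor of the section, in order.
    """
    for chunk, count in enumerate(chunk_sizes):
        if index < count:
            return chunk, index
        index -= count
    raise IndexError("symbol index beyond the last data descriptor")
```

For a section with descriptors holding 10 and 4 symbols, section-wide index 12 lands in the second descriptor at local index 2. That local index is what an @ndx argument relative to @data would need to be.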
      
      Fixes: 4abff6d48dbc ("objtool: Fix code relocs vs weak symbols")
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YoPCTEYjoPqE4ZxB@hirez.programming.kicks-ass.net
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • objtool: Fix objtool regression on x32 systems · c4923824
      Mikulas Patocka authored
      
      commit 22682a07acc308ef78681572e19502ce8893c4d4 upstream.
      
      Commit c087c6e7b551 ("objtool: Fix type of reloc::addend") failed to
      appreciate cross building from ILP32 hosts, where 'int' == 'long' and
      the issue persists.
      
      As such, use s64/int64_t/Elf64_Sxword for this field and suffer the
      pain that is ISO C99 printf formats for it.
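
Why the field width matters can be shown by round-tripping a value through a fixed-width signed field. This is an illustration only: the addend value is hypothetical, and the helper models what happens when a 64-bit quantity is stored in a 32-bit type (such as 'long' on an ILP32 host) versus a true 64-bit type.

```python
import struct


def store_as(fmt: str, value: int) -> int:
    """Round-trip `value` through a fixed-width signed field, keeping the low bytes."""
    raw = struct.pack("<q", value)      # 64-bit little-endian source value
    width = struct.calcsize(fmt)        # 4 for "<i", 8 for "<q"
    return struct.unpack(fmt, raw[:width])[0]


addend = -(1 << 32) - 4  # hypothetical addend that needs more than 32 bits

as_32bit = store_as("<i", addend)   # what a 32-bit 'int' or 'long' keeps
as_64bit = store_as("<q", addend)   # what s64 / int64_t keeps
```

The 32-bit field silently keeps only the low half of the value, while the 64-bit field preserves it regardless of the host's data model.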
      
      Fixes: c087c6e7b551 ("objtool: Fix type of reloc::addend")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [peterz: reword changelog, s/long long/s64/]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2205161041260.11556@file01.intranet.prod.int.rdu2.redhat.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • f2fs: fix to do sanity check for inline inode · 7cfe2d43
      Chao Yu authored
      
      commit 677a82b44ebf263d4f9a0cfbd576a6ade797a07b upstream.
      
      Yanming reported a reproducible kernel bug in the kernel Bugzilla [1].
      The kernel message is shown below:
      
      kernel BUG at fs/inode.c:611!
      Call Trace:
       evict+0x282/0x4e0
       __dentry_kill+0x2b2/0x4d0
       dput+0x2dd/0x720
       do_renameat2+0x596/0x970
       __x64_sys_rename+0x78/0x90
       do_syscall_64+0x3b/0x90
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=215895
      
      The bug occurs because a fuzzed inode has both the inline_data and
      encrypted flags set.
      During f2fs_evict_inode(), as the inode was deleted by rename(), it
      will cause inline data conversion due to conflicting flags. The page
      cache will be polluted and the panic will be triggered in clear_inode().
      
      Try fixing the bug by doing more sanity checks for inline data inode in
      sanity_check_inode().
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • f2fs: fix fallocate to use file_modified to update permissions consistently · 59f42b41
      Chao Yu authored
      
      commit 958ed92922028ec67f504dcdc72bfdfd0f43936a upstream.
      
      This patch tries to fix the permission consistency issue, as was done for
      all other mainline filesystems.
      
      Since the initial introduction of (posix) fallocate back at the turn of
      the century, it has been possible to use this syscall to change the
      user-visible contents of files.  This can happen by extending the file
      size during a preallocation, or through any of the newer modes (punch,
      zero, collapse, insert range).  Because the call can be used to change
      file contents, we should treat it like we do any other modification to a
      file -- update the mtime, and drop set[ug]id privileges/capabilities.
      
      The VFS function file_modified() does all this for us if we pass it a
      locked inode, so let's make fallocate drop permissions correctly.
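
As a rough userspace model of "treat fallocate like any other modification", the sketch below bumps the mtime and clears the set[ug]id bits. The `sk_file` structure and `sk_file_modified()` are hypothetical names for this example; the real file_modified() also handles capability checks and security attributes, which are ignored here. Clearing setgid only when group-execute is set is an assumption of this sketch.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical, reduced stand-in for the inode fields of interest. */
struct sk_file {
    mode_t mode;
    time_t mtime;
};

/* Model of a content-changing operation: update mtime, drop set[ug]id. */
void sk_file_modified(struct sk_file *f, time_t now)
{
    f->mtime = now;

    /* setuid is always dropped in this model. */
    f->mode &= (mode_t)~S_ISUID;

    /* setgid is dropped only when group-execute is set (assumed rule). */
    if (f->mode & S_IXGRP)
        f->mode &= (mode_t)~S_ISGID;
}
```

A preallocation path in this model would call sk_file_modified() before changing the file size, so that a setuid binary cannot keep its privilege after its contents change.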
      
      Cc: stable@kernel.org
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59f42b41
    • Eric Biggers's avatar
      f2fs: don't use casefolded comparison for "." and ".." · 6bde47f4
      Eric Biggers authored
      
      commit b5639bb4313b9d455fc9fc4768d23a5e4ca8cb9d upstream.
      
      Trying to rename a directory that has all of the following properties
      fails with EINVAL and triggers the
      'WARN_ON_ONCE(!fscrypt_has_encryption_key(dir))' in f2fs_match_ci_name():
      
          - The directory is casefolded
          - The directory is encrypted
          - The directory's encryption key is not yet set up
          - The parent directory is *not* encrypted
      
      The problem is incorrect handling of the lookup of ".." to get the
      parent reference to update.  fscrypt_setup_filename() treats ".." (and
      ".") specially, as it's never encrypted.  It's passed through as-is, and
      setting up the directory's key is not attempted.  As the name isn't a
      no-key name, f2fs treats it as a "normal" name and attempts a casefolded
      comparison.  That breaks the assumption of the WARN_ON_ONCE() in
      f2fs_match_ci_name() which assumes that for encrypted directories,
      casefolded comparisons only happen when the directory's key is set up.
      
      We could just remove this WARN_ON_ONCE().  However, since casefolding is
      always a no-op on "." and ".." anyway, let's instead just not casefold
      these names.  This results in the standard bytewise comparison.
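
The idea of skipping casefolding for "." and ".." can be sketched in userspace as follows. This is not the f2fs implementation: the `sk_` helpers are hypothetical, and ASCII tolower() is only a stand-in for real Unicode casefolding, which can also change the length of a name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the special names "." and "..". */
bool sk_is_dot_dotdot(const char *name, size_t len)
{
    if (len == 1 && name[0] == '.')
        return true;
    if (len == 2 && name[0] == '.' && name[1] == '.')
        return true;
    return false;
}

/* ASCII-only stand-in for a casefolded comparison (assumption). */
static bool sk_equal_casefolded(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    }
    return true;
}

/* Compare two names; "." and ".." always use the bytewise comparison. */
bool sk_names_match(const char *a, size_t alen,
                    const char *b, size_t blen, bool dir_casefolded)
{
    if (alen != blen)
        return false;
    if (dir_casefolded && !sk_is_dot_dotdot(a, alen))
        return sk_equal_casefolded(a, b, alen);
    return memcmp(a, b, alen) == 0;
}
```

Because the dot names never reach the casefolded path, any precondition attached to that path (such as the key being set up) is not exercised by a lookup of "..".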
      
      Fixes: 7ad08a58 ("f2fs: Handle casefolding with Encryption")
      Cc: <stable@vger.kernel.org> # v5.11+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bde47f4
    • Chao Yu's avatar
      f2fs: fix to do sanity check on total_data_blocks · c9e4cd5b
      Chao Yu authored
      commit 6b8beca0edd32075a769bfe4178ca00c0dcd22a9 upstream.
      
      As Yanming reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=215916
      
      The kernel message is shown below:
      
      kernel BUG at fs/f2fs/segment.c:2560!
      Call Trace:
       allocate_segment_by_default+0x228/0x440
       f2fs_allocate_data_block+0x13d1/0x31f0
       do_write_page+0x18d/0x710
       f2fs_outplace_write_data+0x151/0x250
       f2fs_do_write_data_page+0xef9/0x1980
       move_data_page+0x6af/0xbc0
       do_garbage_collect+0x312f/0x46f0
       f2fs_gc+0x6b0/0x3bc0
       f2fs_balance_fs+0x921/0x2260
       f2fs_write_single_data_page+0x16be/0x2370
       f2fs_write_cache_pages+0x428/0xd00
       f2fs_write_data_pages+0x96e/0xd50
       do_writepages+0x168/0x550
       __writeback_single_inode+0x9f/0x870
       writeback_sb_inodes+0x47d/0xb20
       __writeback_inodes_wb+0xb2/0x200
       wb_writeback+0x4bd/0x660
       wb_workfn+0x5f3/0xab0
       process_one_work+0x79f/0x13e0
       worker_thread+0x89/0xf60
       kthread+0x26a/0x300
       ret_from_fork+0x22/0x30
      RIP: 0010:new_curseg+0xe8d/0x15f0
      
      The root cause is that ckpt.valid_block_count is inconsistent with the SIT
      table: stat info indicates the filesystem has free blocks, but the SIT
      table indicates the filesystem has no free segment.
      
      As a result, garbage collection triggers a panic when the LFS allocator
      fails to find a free segment.
      
      This patch tries to fix this issue by checking consistency between
      ckpt.valid_block_count and the blocks accounted for in the SIT.
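
A consistency check of this kind might look like the userspace sketch below. The `sk_sit_entry` structure and `sk_block_count_consistent()` are hypothetical, and treating "SIT accounts for more blocks than the checkpoint records" as the failure condition is an assumption; the commit message does not state the exact comparison used in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a per-segment SIT entry. */
struct sk_sit_entry {
    uint16_t valid_blocks;
};

/*
 * Sum the valid blocks recorded per segment and compare the total with
 * the checkpoint's count. Returns false when the SIT accounts for more
 * blocks than the checkpoint claims are in use (assumed failure rule).
 */
bool sk_block_count_consistent(const struct sk_sit_entry *sit, size_t nsegs,
                               uint64_t ckpt_valid_block_count)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nsegs; i++)
        total += sit[i].valid_blocks;

    return total <= ckpt_valid_block_count;
}
```

Running such a check when the SIT is loaded lets a corrupted image be rejected early, instead of failing later when the allocator cannot find a free segment.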
      
      Cc: stable@vger.kernel.org
      Reported-by: Ming Yan <yanming@tju.edu.cn>
      Signed-off-by: Chao Yu <chao.yu@oppo.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9e4cd5b