  1. Jun 27, 2024
    • Revert "mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default" · b3f75255
      Linus Torvalds authored
      
      commit 14d7c92f8df9c0964ae6f8b813c1b3ac38120825 upstream.
      
      This reverts commit 3afb76a66b5559a7b595155803ce23801558a7a9.
      
      This was a wrongheaded workaround for an issue that had already been
      fixed much better by commit 4ef9ad19 ("mm: huge_memory: don't force
      huge page alignment on 32 bit").
      
      Asking users questions at kernel compile time that they can't make sense
      of is not a viable strategy.  And the fact that even the kernel VM
      maintainers apparently didn't catch that this "fix" is not a fix any
      more pretty much proves the point that people can't be expected to
      understand the implications of the question.
      
      It may well be the case that we could improve things further, and that
      __thp_get_unmapped_area() should take the mapping randomization into
      account even for 64-bit kernels.  Maybe we should not be so eager to use
      THP mappings.
      
      But in no case should this be a kernel config option.
      
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default · 856cf330
      Rafael Aquini authored
      commit 3afb76a66b5559a7b595155803ce23801558a7a9 upstream.
      
      An ASLR regression was noticed [1] and tracked down to file-mapped areas
      being backed by THP in recent kernels.  The 21-bit alignment constraint
      for such mappings reduces the entropy for randomizing the placement of
      64-bit library mappings and breaks ASLR completely for 32-bit libraries.
      
      The reported issue is easily addressed by increasing vm.mmap_rnd_bits and
      vm.mmap_rnd_compat_bits.  This patch just provides a simple way to set
      ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values
      allowed by the architecture at build time.
      
      [1] https://zolutal.github.io/aslrnt/
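
      A hedged illustration of that runtime workaround (the maxima are
      architecture-dependent; 32/16 below are the x86_64 limits):

        # query the current randomization bits
        sysctl vm.mmap_rnd_bits vm.mmap_rnd_compat_bits
        # raise them to the architecture maximum (illustrative values)
        sysctl -w vm.mmap_rnd_bits=32
        sysctl -w vm.mmap_rnd_compat_bits=16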
      
      [akpm@linux-foundation.org: default to `y' if 32-bit, per Rafael]
      Link: https://lkml.kernel.org/r/20240606180622.102099-1-aquini@redhat.com
      
      
      Fixes: 1854bc6e ("mm/readahead: Align file mappings for non-DAX")
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. May 02, 2024
  3. Feb 23, 2024
  4. Aug 18, 2023
    • arch: enable HAS_LTO_CLANG with KASAN and KCOV · 349fde59
      Jakob Koschel authored
      Both KASAN and KCOV had issues with LTO_CLANG if DEBUG_INFO was enabled.
      With LTO, inlinable function calls are required to have debug info if
      they are inlined into a function that has debug info.

      Starting with LLVM 17 this is fixed ([1],[2]), and enabling LTO with
      KASAN/KCOV and DEBUG_INFO no longer causes linker errors.
      
      Link: https://github.com/llvm/llvm-project/commit/913f7e93dac67ecff47bade862ba42f27cb68ca9
      Link: https://github.com/llvm/llvm-project/commit/4a8b1249306ff11f229320abdeadf0c215a00400
      Link: https://lkml.kernel.org/r/20230717-enable-kasan-lto1-v3-1-650e1efc19d1@gmail.com
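
      As a hedged illustration (fragment only; the usual LTO_CLANG toolchain
      dependencies still apply), a combination along these lines used to trip
      the linker and should now build with LLVM >= 17:

        CONFIG_LTO_CLANG_THIN=y
        CONFIG_KASAN=y
        CONFIG_KCOV=y
        CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y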
      
      
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Jakob Koschel <jkl820.git@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Tom Rix <trix@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec: consolidate kexec and crash options into kernel/Kconfig.kexec · 89cde455
      Eric DeVolder authored
      Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6.
      
      The Kconfig is refactored to consolidate KEXEC and CRASH options from
      various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.
      
      The Kconfig.kexec is now a submenu titled "Kexec and crash features"
      located under "General Setup".
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_IMAGE_VERIFY_SIG
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      Over time, these options have been copied between Kconfig files and
      are very similar to one another, but with slight differences.
      
      The following architectures are impacted by the refactor (because of
      use of one or more KEXEC/CRASH options):
      
       - arm
       - arm64
       - ia64
       - loongarch
       - m68k
       - mips
       - parisc
       - powerpc
       - riscv
       - s390
       - sh
       - x86 
      
      More information:
      
      In the patch series "crash: Kernel handling of CPU and memory hot
      un/plug"
      
       https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/
      
      the new kernel feature introduces the config option CRASH_HOTPLUG.
      
      In reviewing, Thomas Gleixner requested that the new config option
      not be placed in x86 Kconfig. Rather the option needs a generic/common
      home. To Thomas' point, the KEXEC and CRASH options have largely been
      duplicated in the various arch/<arch>/Kconfig files, with minor
      differences. This kind of proliferation is to be avoided/stopped.
      
       https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/
      
      To that end, I have refactored the arch Kconfigs so as to consolidate
      the various KEXEC and CRASH options. Generally speaking, this work has
      the following themes:
      
      - KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
        - These items from arch/Kconfig:
            CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
        - These items from arch/x86/Kconfig form the common options:
            KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
            KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
        - These items from arch/arm64/Kconfig form the common options:
            KEXEC_IMAGE_VERIFY_SIG
        - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
      - The Kconfig.kexec is now a submenu titled "Kexec and crash features"
        and is now listed in "General Setup" submenu from init/Kconfig.
      - To control the common options, each has a new ARCH_SUPPORTS_<option>
        option. These gateway options determine whether the common options
        are valid for the architecture.
      - To account for the slight differences in the original architecture
        coding of the common options, each now has a corresponding
        ARCH_SELECTS_<option> which is used to elicit the same side effects
        as the original arch/<arch>/Kconfig files for KEXEC and CRASH options.
      
      An example, 'make menuconfig' illustrating the submenu:
      
        > General setup > Kexec and crash features
        [*] Enable kexec system call
        [*] Enable kexec file based system call
        [*]   Verify kernel signature during kexec_file_load() syscall
        [ ]     Require a valid signature in kexec_file_load() syscall
        [ ]     Enable bzImage signature verification support
        [*] kexec jump
        [*] kernel crash dumps
        [*]   Update the crash elfcorehdr on system configuration changes
      
      In the process of consolidating the common options, I encountered
      slight differences in the coding of these options in several of the
      architectures. As a result, I settled on the following solution:
      
      - Each of the common options has a 'depends on ARCH_SUPPORTS_<option>'
        statement. For example, the KEXEC_FILE option has a 'depends on
        ARCH_SUPPORTS_KEXEC_FILE' statement.
      
        This approach is needed on all common options so as to prevent
        options from appearing for architectures which previously did
        not allow/enable them. For example, arm supports KEXEC but not
        KEXEC_FILE. The arch/arm/Kconfig does not provide
        ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options
        are not available to arm.
      
      - The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
        determine when the feature is allowed.  Archs which don't have the
        feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
        For each arch, where there previously were KEXEC and/or CRASH
        options, these have been replaced with the corresponding boolean
        ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.
      
        For example, if the arch supports KEXEC_FILE, then the
        ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits
        the KEXEC_FILE option to be available.
      
        If the arch has a 'depends on' statement in its original coding
        of the option, then that expression becomes part of the def_bool
        expression. For example, arm64 had:
      
        config KEXEC
          depends on PM_SLEEP_SMP
      
        and in this solution, this converts to:
      
        config ARCH_SUPPORTS_KEXEC
          def_bool PM_SLEEP_SMP
      
      
      - In order to account for the architecture differences in the
        coding for the common options, the ARCH_SELECTS_<option> in the
        arch/<arch>/Kconfig is used. This option has a 'depends on
        <option>' statement to couple it to the main option, and from
        there can insert the differences between the common option and the
        arch's original coding of that option.
      
        For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
        KEXEC_FILE. These require an ARCH_SELECTS_KEXEC_FILE with
        'select CRYPTO' and 'select CRYPTO_SHA256' statements, as sketched below.
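
      A hedged sketch of that shape, pairing the gateway 'depends on' in
      kernel/Kconfig.kexec with an arch-side ARCH_SELECTS_<option> stanza
      (simplified from the full examples later in this log):

       config KEXEC_FILE                      # in kernel/Kconfig.kexec
           bool "Enable kexec file based system call"
           depends on ARCH_SUPPORTS_KEXEC_FILE
           select KEXEC_CORE

       config ARCH_SELECTS_KEXEC_FILE         # in arch/<arch>/Kconfig
           def_bool y
           depends on KEXEC_FILE
           select CRYPTO
           select CRYPTO_SHA256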
      
      Illustrating the option relationships:
      
      For each of the common KEXEC and CRASH options:
       ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option>
      
       <option>                   # in Kconfig.kexec
       ARCH_SUPPORTS_<option>     # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_<option>      # in arch/<arch>/Kconfig, as needed
      
      
      For example, KEXEC:
       ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC
      
       KEXEC                      # in Kconfig.kexec
       ARCH_SUPPORTS_KEXEC        # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_KEXEC         # in arch/<arch>/Kconfig, as needed
      
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
      
      Examples:
      A few examples to show the new strategy in action:
      
      ===== x86 (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          depends on X86_64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config KEXEC_SIG
          bool "Verify kernel signature during kexec_file_load() syscall"
          depends on KEXEC_FILE
      
       config KEXEC_SIG_FORCE
          bool "Require a valid signature in kexec_file_load() syscall"
          depends on KEXEC_SIG
      
       config KEXEC_BZIMAGE_VERIFY_SIG
          bool "Enable bzImage signature verification support"
          depends on KEXEC_SIG
          depends on SIGNED_PE_FILE_VERIFICATION
          select SYSTEM_TRUSTED_KEYRING
      
       config CRASH_DUMP
          bool "kernel crash dumps"
          depends on X86_64 || (X86_32 && HIGHMEM)
      
       config KEXEC_JUMP
          bool "kexec jump"
          depends on KEXEC && HIBERNATION
          help
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool X86_64 && CRYPTO && CRYPTO_SHA256
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SUPPORTS_KEXEC_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_SIG_FORCE
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_JUMP
          def_bool y
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool X86_64 || (X86_32 && HIGHMEM)
      
      
      ===== powerpc (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          select KEXEC_ELF
          depends on PPC64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config CRASH_DUMP
          bool "Build a dump capture kernel"
          depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select KEXEC_ELF
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
      
      config ARCH_SELECTS_CRASH_DUMP
          def_bool y
          depends on CRASH_DUMP
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      
      Testing Approach and Results
      
      There are 388 config files in the arch/<arch>/configs directories.
      For each of these config files, a .config is generated both before and
      after this Kconfig series, and checked for equivalence. This approach
      allows for a rather rapid check of all architectures and a wide
      variety of configs wrt/ KEXEC and CRASH, and avoids requiring
      compiling for all architectures and running kernels and run-time
      testing.
      
      For each config file, the olddefconfig, allnoconfig and allyesconfig
      targets are utilized. In testing, randconfig revealed problems as well,
      but it is not used in the before-and-after equivalence check since one
      cannot generate the "same" .config for before and after: even with the
      same KCONFIG_SEED, the option list is different.
      
      As such, the following script steps compare the before and after
      of 'make olddefconfig'. The new symbols introduced by this series
      are filtered out, but otherwise the config files are PASS only if
      they were equivalent, and FAIL otherwise.
      
      The script performs the test by doing the following:
      
       # Obtain the "golden" .config output for given config file
       # Reset test sandbox
       git checkout master
       git branch -D test_Kconfig
       git checkout -B test_Kconfig master
       make distclean
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, LHSB is "golden"
       scoreboard .config 
      
       # Obtain the "changed" .config output for given config file
       # Reset test sandbox
       make distclean
       # Apply this Kconfig series
       git am <this Kconfig series>
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, RHSB is "changed"
       scoreboard .config 
      
       # Determine test result
       # Filter-out new symbols introduced by this series
       # Filter-out symbol=n which not in either scoreboard
       # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL
      
      The script was instrumental during the refactoring of Kconfig as it
      continually revealed problems. The end result is that the solution
      presented in this series passes all configs as checked by the script,
      with the following exceptions:
      
      - arch/ia64/configs/zx1_config with olddefconfig
        This config file has:
        # CONFIG_KEXEC is not set
        CONFIG_CRASH_DUMP=y
        and this refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      - arch/sh/configs/* with allyesconfig
        The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU
        (which clearly is not meant to be set). This symbol is not provided,
        but with allyesconfig it is set to yes, which enables CRASH_DUMP.
        But KEXEC is coded as dependent upon MMU, and is set to no in
        arch/sh/mm/Kconfig, so KEXEC is not enabled.
        This refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      While the above exceptions are not equivalent to their original,
      the config file produced is valid (and in fact better wrt/ CRASH_DUMP
      handling).
      
      
      This patch (of 14)
      
      The config options for kexec and crash features are consolidated
      into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
      is a new submenu "Kexec and crash features". All the kexec and
      crash options that were once in the arch-dependent submenu "Processor
      type and features" are now consolidated in the new submenu.
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.
      
      Architectures specify support of certain KEXEC and CRASH features with
      similarly named new ARCH_SUPPORTS_<option> config options.
      
      Architectures can utilize the new ARCH_SELECTS_<option> config
      options to specify additional components when <option> is enabled.
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
      
      Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Juerg Haefliger <juerg.haefliger@canonical.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Marc Aurèle La France <tsi@tuyoix.net>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
      Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xin Li <xin3.li@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. Jul 28, 2023
  6. Jul 11, 2023
  7. Jun 19, 2023
    • watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific · a5fcc236
      Petr Mladek authored
      There are several hardlockup detector implementations and several Kconfig
      values which allow selection and build of the preferred one.
      
      CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d47
      ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
      It was a preparation step for introducing the new generic perf hardlockup
      detector.
      
      The existing arch-specific variants did not support the to-be-created
      generic build configurations, sysctl interface, etc. This distinction
      was made explicit by the commit 4a7863cc ("x86, nmi_watchdog:
      Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
      in v2.6.38.
      
      CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c
      ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
      the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
      by three architectures, namely blackfin, mn10300, and sparc.
      
      The support for the blackfin and mn10300 architectures was completely
      dropped some time ago. And sparc is the only architecture with the historic
      NMI watchdog at the moment.
      
      And the old sparc implementation is really special. It is always built on
      sparc64. It used to be always enabled until the commit 7a5c8b57
      ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
      in v4.10-rc1.
      
      There are only a few locations where the sparc64 NMI watchdog interacts
      with the generic hardlockup detectors code:
      
        + implements arch_touch_nmi_watchdog() which is called from the generic
          touch_nmi_watchdog()
      
        + implements watchdog_hardlockup_enable()/disable() to support
          /proc/sys/kernel/nmi_watchdog
      
        + is always preferred over other generic watchdogs, see
          CONFIG_HARDLOCKUP_DETECTOR
      
        + includes asm/nmi.h into linux/nmi.h because some sparc-specific
          functions are needed in sparc-specific code which includes
          only linux/nmi.h.
      
      The situation became more complicated after the commit 05a4a952
      ("kernel/watchdog: split up config options") and commit 2104180a
      ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
      They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for the
      powerpc-specific hardlockup detector, which was compatible with the perf
      one regarding the general boot, sysctl, and programming interfaces.
      
      HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
      HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
      detectors had some common requirements, namely:
      
        + implemented arch_touch_nmi_watchdog()
        + included asm/nmi.h into linux/nmi.h
        + defined the default value for /proc/sys/kernel/nmi_watchdog
      
      But it actually made things pretty complicated when the generic
      buddy hardlockup detector was added. Before, the generic perf detector
      was never supported together with an arch-specific one. But the buddy
      detector can work on any SMP system. It means that an architecture
      could support both the arch-specific and the buddy detector.
      
      As a result, there are a few tricky dependencies. For example,
      CONFIG_HARDLOCKUP_DETECTOR depends on:
      
        ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
      
      The problem is that the very special sparc implementation is defined as:
      
        HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
      
      Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from
      clear without understanding the history.
      
      Make the logic less tricky and more self-explanatory by making
      HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
      HAVE_HARDLOCKUP_DETECTOR_SPARC64.
      
      Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
      and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
      HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR,
      and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
      
      Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
      
      
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • watchdog/hardlockup: make the config checks more straightforward · 1356d0b9
      Petr Mladek authored
      There are four possible variants of hardlockup detectors:
      
        + buddy: available when SMP is set.
      
        + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
      
        + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
      
        + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
          and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
      
      The check for the sparc64 variant is more complicated because
      HAVE_NMI_WATCHDOG is used to #ifdef code shared by both the
      arch-specific and the sparc64-specific variants. Therefore it is
      automatically selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.
      
      This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
      It reduces the size of some checks but it makes them harder to follow.
      
      Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
      is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
      HARDLOCKUP_DETECTOR switch is enabled/disabled.
      
      Make the logic more straightforward by the following changes:
      
        + Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
          HAVE_NMI_WATCHDOG in comments.
      
        + Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
          HAVE_* for all four hardlockup detector variants.
      
          Use it in the other conditions instead of SMP. It makes it
          clear that it is about the buddy detector.
      
        + Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
          and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
          the conditions between the four hardlockup detector variants.
      
        + Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
          can be enabled. It explains the dependency on the other
          hardlockup detector variants.
      
          Also it allows removing HARDLOCKUP_DETECTOR_NON_ARCH by using "imply"
          (see the sketch after this list). The "imply" triggers re-evaluating
          HARDLOCKUP_DETECTOR_PERF/BUDDY when the global HARDLOCKUP_DETECTOR
          switch is changed.
      
        + Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
          disappear when the hardlockup detectors are disabled.
      
          Another nice side effect is that the HARDLOCKUP_DETECTOR_PREFER_BUDDY
          value is not preserved when the global switch is disabled.
          The user has to make the decision again when it gets re-enabled.
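
      A hedged sketch of the resulting Kconfig shape (simplified; the upstream
      stanzas carry additional dependencies, and HARDLOCKUP_DETECTOR_BUDDY is
      symmetric with the preference inverted):

       config HARDLOCKUP_DETECTOR
           bool "Detect Hard Lockups"
           depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
           imply HARDLOCKUP_DETECTOR_PERF
           imply HARDLOCKUP_DETECTOR_BUDDY

       config HARDLOCKUP_DETECTOR_PERF
           bool
           depends on HARDLOCKUP_DETECTOR
           depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY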
      
      Link: https://lkml.kernel.org/r/20230616150618.6073-3-pmladek@suse.com
      
      
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() · 6426e8d1
      Douglas Anderson authored
      Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
      without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH.  Because of that one
      architecture, we have some special case code in the watchdog core to
      handle the fact that watchdog_hardlockup_probe() isn't implemented.
      
      Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the
      special case.
      
      As a side effect of doing this, code inspection tells us that we could fix
      a minor bug where the system won't properly realize that NMI watchdogs are
      disabled.  Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off
      the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which
      selects CONFIG_HAVE_NMI_WATCHDOG.  Since CONFIG_PPC_WATCHDOG was off then
      nothing will override the "weak" watchdog_hardlockup_probe() and we'll
      fall back to looking at CONFIG_HAVE_NMI_WATCHDOG.
      
      Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid
      
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Suggested-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  8. Jun 16, 2023
    • init: Provide arch_cpu_finalize_init() · 7725acaa
      Thomas Gleixner authored
      
      check_bugs() has become a dumping ground for all sorts of activities to
      finalize the CPU initialization before running the rest of the init code.
      
      Most are empty, a few do actual bug checks, some do alternative patching
      and some cobble a CPU advertisement string together....
      
      Aside from that, the current implementation requires duplicated function
      declaration and mostly empty header files for them.
      
      Provide a new function arch_cpu_finalize_init(). Provide a generic
      declaration if CONFIG_ARCH_HAS_CPU_FINALIZE_INIT is selected and a stub
      inline otherwise.
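
      In sketch form, the declaration pattern described above (simplified):

       #ifdef CONFIG_ARCH_HAS_CPU_FINALIZE_INIT
       void arch_cpu_finalize_init(void);
       #else
       static inline void arch_cpu_finalize_init(void) { }
       #endif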
      
      This requires a temporary #ifdef in start_kernel() which will be removed
      along with check_bugs() once the architectures are converted over.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20230613224544.957805717@linutronix.de
  9. Jun 12, 2023
    • hostfs: Fix ephemeral inodes · 74ce793b
      Mickaël Salaün authored
      
      hostfs creates a new inode for each opened or created file, which
      causes useless inode allocations and prevents identifying a host file
      with a kernel inode.
      
      Fix this uncommon filesystem behavior by tying kernel inodes to the
      host file's inode and device IDs.  Even though host filesystem inodes
      may be recycled, this cannot happen while a file referencing them is
      open, which is the case with hostfs.  It should be noted that hostfs
      inode IDs may not be unique for the same hostfs superblock because
      multiple host (backing) superblocks may be used.
      
      Delete inodes when dropping them to force the backing host file
      descriptors to be closed.
      
      This makes it possible to entirely remove ARCH_EPHEMERAL_INODES, which
      in turn makes Landlock fully supported by UML.  This is very useful for
      testing changes.
      
      These changes also factor out and simplify some helpers thanks to the
      new hostfs_inode_update() and the hostfs_iget() revamp: read_name(),
      hostfs_create(), hostfs_lookup(), hostfs_mknod(), and
      hostfs_fill_sb_common().
      
      A following commit with new Landlock tests checks this new hostfs inode
      consistency.
      
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Richard Weinberger <richard@nod.at>
      Link: https://lore.kernel.org/r/20230612191430.339153-2-mic@digikod.net
      
      
      Signed-off-by: Mickaël Salaün <mic@digikod.net>
  10. May 15, 2023
    • cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE · 18415f33
      Thomas Gleixner authored
      
      There is often significant latency in the early stages of CPU bringup, and
      time is wasted by waking each CPU (e.g. with INIT/SIPI/SIPI on x86) and
      then waiting for it to respond before moving on to the next.
      
      Allow a platform to enable parallel setup which brings all to-be-onlined
      CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
      control CPU (BP) is single-threaded, the important part is the last state,
      CPUHP_BP_KICK_AP, which wakes the to-be-onlined CPUs up.
      
      This allows the CPUs to run up to the first synchronization point
      cpuhp_ap_sync_alive() where they wait for the control CPU to release them
      one by one for the full onlining procedure.
      
      This parallelism depends on the CPU hotplug core sync mechanism which
      ensures that the parallel brought up CPUs wait for release before touching
      any state which would make the CPU visible to anything outside the hotplug
      control mechanism.
      
      To handle the SMT constraints of X86 correctly, the bringup happens in two
      iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
      the primary SMT threads of each core first, which can load the microcode
      without the need to rendezvous with the thread siblings. Once that's
      completed it brings up the secondary SMT threads.
      
      Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Michael Kelley <mikelley@microsoft.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
      Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
    • cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism · a631be92
      Thomas Gleixner authored
      
      The bring-up logic of a to-be-onlined CPU consists of several parts, which
      are considered to be a single hotplug state:
      
        1) Control CPU issues the wake-up
      
        2) To be onlined CPU starts up, does the minimal initialization,
           reports to be alive and waits for release into the complete bring-up.
      
        3) Control CPU waits for the alive report and releases the upcoming CPU
           for the complete bring-up.
      
      Allow splitting this into two states:
      
        1) Control CPU issues the wake-up
      
           After that the to be onlined CPU starts up, does the minimal
           initialization, reports to be alive and waits for release into the
           full bring-up. As this can run after the control CPU dropped the
           hotplug locks the code which is executed on the AP before it reports
           alive has to be carefully audited to not violate any of the hotplug
           constraints, especially not modifying any of the various cpumasks.
      
           This is really only meant to avoid waiting for the AP to react on the
           wake-up. Of course an architecture can move strict CPU related setup
           functionality, e.g. microcode loading, with care before the
           synchronization point to save further pointless waiting time.
      
        2) Control CPU waits for the alive report and releases the upcoming CPU
           for the complete bring-up.
      
      This allows the two states to be split up: run all to-be-onlined
      CPUs up to state #1 on the control CPU and then, at a later point, run state
      #2. This spares some of the latencies of the fully serialized per-CPU
      bringup by avoiding the per-CPU wakeup/wait serialization. The assumption
      is that the first AP is already waiting when the last AP has been woken up.
      This obviously depends on the hardware latencies, and depending on the
      timings this might still not completely eliminate all wait scenarios.
      
      This split is just a preparatory step for enabling the parallel bringup
      later. The boot time bringup is still fully serialized. It has a separate
      config switch so that architectures which want to support parallel bringup
      can test the split of the CPUHP_BRINGUP step separately.
      
      To enable this the architecture must support the CPU hotplug core sync
      mechanism and be audited to ensure there are no implicit hotplug state
      dependencies which require a fully serialized bringup.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Michael Kelley <mikelley@microsoft.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
      Link: https://lore.kernel.org/r/20230512205257.080801387@linutronix.de
    • cpu/hotplug: Add CPU state tracking and synchronization · 6f062123
      Thomas Gleixner authored
      
      The CPU state tracking and synchronization mechanism in smpboot.c is
      completely independent of the hotplug code and all logic around it is
      implemented in architecture specific code.
      
      Except for the state reporting of the AP, there is absolutely nothing
      architecture-specific, and the synchronization and decision functions can
      be moved into the generic hotplug core code.
      
      Provide an integrated variant and add the core synchronization and decision
      points. This comes in two flavours:
      
        1) DEAD state synchronization
      
           Updated by the architecture code once the AP reaches the point where
           it is ready to be torn down by the control CPU, e.g. by removing power
           or clocks or tear down via the hypervisor.
      
           The control CPU waits for this state to be reached with a timeout. If
           the state is reached an architecture specific cleanup function is
           invoked.
      
        2) Full state synchronization
      
           This extends #1 with AP alive synchronization. This is new
           functionality, which allows replacing architecture-specific wait
           mechanisms, e.g. cpumasks, completely.

           It also prevents an AP which is in a limbo state from being brought
           up again. This can happen when an AP failed to report dead state
           during a previous off-line operation.
      
      The dead synchronization is what most architectures use. Only x86 makes a
      bringup decision based on that state at the moment.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Michael Kelley <mikelley@microsoft.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
      Link: https://lore.kernel.org/r/20230512205256.476305035@linutronix.de
  11. Mar 28, 2023
    • lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme · 2655421a
      Nicholas Piggin authored
      On big systems, the mm refcount can become highly contended when doing a
      lot of context switching with threaded applications.  user<->idle switch
      is one of the important cases.  Abandoning lazy tlb entirely slows this
      switching down quite a bit in the common uncontended case, so that is not
      viable.
      
      Implement a scheme where lazy tlb mm references do not contribute to the
      refcount, instead they get explicitly removed when the refcount reaches
      zero.
      
      The final mmdrop() sends IPIs to all CPUs in the mm_cpumask and they
      switch away from this mm to init_mm if it was being used as the lazy tlb
      mm.  Enabling the shoot lazies option therefore requires that the arch
      ensures that mm_cpumask contains all CPUs that could possibly be using mm.
      A DEBUG_VM option IPIs every CPU in the system after this to ensure there
      are no references remaining before the mm is freed.
      
      Shootdown IPI cost could be an issue, but it has not been observed to
      be a serious problem with this scheme, because short-lived processes tend
      not to migrate CPUs much, therefore they don't get much chance to leave
      lazy tlb mm references on remote CPUs.  There are a lot of options to
      reduce them if necessary, described in comments.
      
      The near-worst-case can be benchmarked with will-it-scale:
      
        context_switch1_threads -t $(($(nproc) / 2))
      
      This will create nproc threads (nproc / 2 switching pairs) all sharing the
      same mm, spread over all CPUs so that each CPU does thread->idle->thread
      switching.
      
      [ Rik came up with basically the same idea a few years ago, so credit
        to him for that. ]
      
      Link: https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
      Link: https://lore.kernel.org/all/20180728215357.3249-11-riel@surriel.com/
      Link: https://lkml.kernel.org/r/20230203071837.1136453-5-npiggin@gmail.com
      
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: allow lazy tlb mm refcounting to be configurable · 88e3009b
      Nicholas Piggin authored
      Add CONFIG_MMU_LAZY_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
      when it is context switched.  This can be disabled by architectures that
      don't require this refcounting if they clean up lazy tlb mms when the last
      refcount is dropped.  Currently this is always enabled, so the patch
      introduces no functional change.
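
      A hedged sketch of the resulting option pair (stanzas simplified; the
      shoot-lazies side is added by the companion patch listed above):

       config MMU_LAZY_TLB_REFCOUNT
           def_bool y
           depends on !MMU_LAZY_TLB_SHOOTDOWN

       config MMU_LAZY_TLB_SHOOTDOWN
           bool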
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-4-npiggin@gmail.com
      
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  12. Feb 10, 2023
  13. Dec 14, 2022
  14. Nov 09, 2022
  15. Oct 20, 2022
    • srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe() · 2e83b879
      Paul E. McKenney authored
      On strict load-store architectures, the use of this_cpu_inc() by
      srcu_read_lock() and srcu_read_unlock() is not NMI-safe in TREE SRCU.
      To see this suppose that an NMI arrives in the middle of srcu_read_lock(),
      just after it has read ->srcu_lock_count, but before it has written
      the incremented value back to memory.  If that NMI handler also does
      srcu_read_lock() and srcu_read_unlock() on that same srcu_struct structure,
      then upon return from that NMI handler, the interrupted srcu_read_lock()
      will overwrite the NMI handler's update to ->srcu_lock_count, but
      leave unchanged the NMI handler's update by srcu_read_unlock() to
      ->srcu_unlock_count.
      
      This can result in a too-short SRCU grace period, which can in turn
      result in arbitrary memory corruption.
      
      If the NMI handler instead interrupts the srcu_read_unlock(), this
      can result in eternal SRCU grace periods, which is not much better.
      
      This commit therefore creates a pair of new srcu_read_lock_nmisafe()
      and srcu_read_unlock_nmisafe() functions, which allow SRCU readers in
      both NMI handlers and in process and IRQ context.  It is bad practice
      to mix the existing and the new _nmisafe() primitives on the same
      srcu_struct structure.  Use one set or the other, not both.
      
      Just to underline that "bad practice" point, using srcu_read_lock() at
      process level and srcu_read_lock_nmisafe() in your NMI handler will not,
      repeat NOT, work.  If you do not immediately understand why this is the
      case, please review the earlier paragraphs in this commit log.
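
      A hedged usage sketch (the srcu_struct and handler names are
      illustrative):

       DEFINE_SRCU(my_srcu);

       void my_nmi_handler(void)               /* NMI context */
       {
               int idx = srcu_read_lock_nmisafe(&my_srcu);
               /* NMI-safe read-side critical section */
               srcu_read_unlock_nmisafe(&my_srcu, idx);
       }

       /* Every other reader of my_srcu -- process or IRQ context --
          must also use the _nmisafe() variants; mixing is not allowed. */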
      
      [ paulmck: Apply kernel test robot feedback. ]
      [ paulmck: Apply feedback from Randy Dunlap. ]
      [ paulmck: Apply feedback from John Ogness. ]
      [ paulmck: Apply feedback from Frederic Weisbecker. ]
      
      Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/
      
      
      
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Petr Mladek <pmladek@suse.com>
  16. Oct 17, 2022
  17. Sep 28, 2022
  18. Sep 27, 2022
  19. Sep 26, 2022
  20. Sep 05, 2022
  21. Aug 21, 2022
  22. Jul 21, 2022
    • mmu_gather: Remove per arch tlb_{start,end}_vma() · 1e9fdf21
      Peter Zijlstra authored
      
      Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
      Provide two new MMU_GATHER_* knobs to enumerate them and remove the per
      arch tlb_{start,end}_vma() implementations.
      
       - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
         but does *NOT* want to call it for each VMA.
      
       - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
         invalidate across multiple VMAs if possible.
      
      With these it is possible to capture the three forms:
      
        1) empty stubs;
           select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS
      
        2) start: flush_cache_range(), end: empty;
           select MMU_GATHER_MERGE_VMAS
      
        3) start: flush_cache_range(), end: flush_tlb_range();
           default
      
      Obviously, if the architecture does not have flush_cache_range() then
      it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
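
      A hedged sketch of how an architecture whose tlb_{start,end}_vma() were
      empty stubs (form 1) now opts in; ARCH_FOO is hypothetical:

       config ARCH_FOO
           select MMU_GATHER_NO_FLUSH_CACHE   # has flush_cache_range(), skip it per VMA
           select MMU_GATHER_MERGE_VMAS       # merge the invalidate across VMAs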
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. Jul 05, 2022
  24. Jun 30, 2022
    • context_tracking: Split user tracking Kconfig · 24a9c541
      Frederic Weisbecker authored
      
      Context tracking is going to be used not only to track user transitions
      but also idle/IRQs/NMIs. The user tracking part will then become a
      separate feature. Prepare Kconfig for that.
      
      [ frederic: Apply Max Filippov feedback. ]
      
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: Yu Liao <liaoyu15@huawei.com>
      Cc: Phil Auld <pauld@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Alex Belits <abelits@marvell.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
  25. Jun 23, 2022
    • arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic · 4510bffb
      Mark Rutland authored
      
      On most architectures, IRQ flag tracing is disabled in NMI context, and
      architectures need to define and select TRACE_IRQFLAGS_NMI_SUPPORT in
      order to enable this.
      
      Commit:
      
        859d069e ("lockdep: Prepare for NMI IRQ state tracking")
      
      Permitted IRQ flag tracing in NMI context, allowing lockdep to work in
      NMI context where an architecture had suitable entry logic. At the time,
      most architectures did not have such suitable entry logic, and this broke
      lockdep on such architectures. Thus, this was partially disabled in
      commit:
      
        ed004953 ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")
      
      ... with architectures needing to select TRACE_IRQFLAGS_NMI_SUPPORT to
      enable IRQ flag tracing in NMI context.
      
      Currently TRACE_IRQFLAGS_NMI_SUPPORT is defined under
      arch/x86/Kconfig.debug. Move it to arch/Kconfig so architectures can
      select it without having to provide their own definition.
      
      Since the regular TRACE_IRQFLAGS_SUPPORT is selected by
      arch/x86/Kconfig, the select of TRACE_IRQFLAGS_NMI_SUPPORT is moved
      there too.
      
      There should be no functional change as a result of this patch.
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20220511131733.4074499-2-mark.rutland@arm.com
      
      
      Signed-off-by: Will Deacon <will@kernel.org>
  26. Jun 15, 2022
    • lib: Add register read/write tracing support · d593d64f
      Prasad Sodagudi authored
      
      Generic MMIO read/write accessors, i.e., __raw_{read,write}{b,l,w,q},
      are typically used to read from/write to memory-mapped registers and
      can cause hangs or undefined behaviour in a few cases, such as:
      
      * If the access to the register space is unclocked, for example: if
        there is an access to multimedia(MM) block registers without MM
        clocks.
      
      * If the register space is protected and not set to be accessible from
        non-secure world, for example: only EL3 (EL: Exception level) access
        is allowed and any EL2/EL1 access is forbidden.
      
      * If xPU(memory/register protection units) is controlling access to
        certain memory/register space for specific clients.
      
      and more...
      
      Such cases usually result in instant reboots, SErrors, or NOC/interconnect
      hangs, and tracing these register accesses can be very helpful for
      debugging such issues during initial development stages and in later
      stages as well.
      
      So use ftrace trace events to log such MMIO register accesses, which
      provides a rich feature set such as early enablement of trace events,
      filtering capability, dumping ftrace logs on the console, and more.
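
      A hedged example of enabling the events at run time (event-group path
      as commonly seen under tracefs; adjust if your kernel differs):

        echo 1 > /sys/kernel/tracing/events/rwmmio/enable
        cat /sys/kernel/tracing/trace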
      
      Sample output:
      
      rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
      rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
      rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
      rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610
      
      Co-developed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
      Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
      Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  27. May 27, 2022
  28. May 22, 2022
  29. Apr 25, 2022
    • mm: Add fault_in_subpage_writeable() to probe at sub-page granularity · da32b581
      Catalin Marinas authored
      
      On hardware with features like arm64 MTE or SPARC ADI, an access fault
      can be triggered at sub-page granularity. Depending on how the
      fault_in_writeable() function is used, the caller can get into a
      live-lock by continuously retrying the fault-in on an address different
      from the one where the uaccess failed.
      
      In the majority of cases progress is ensured by the following
      conditions:
      
      1. copy_to_user_nofault() guarantees at least one byte access if the
         user address is not faulting.
      
      2. The fault_in_writeable() loop is resumed from the first address that
         could not be accessed by copy_to_user_nofault().
      
      If the loop iteration is restarted from an earlier (initial) point, the
      loop is repeated with the same conditions and it would live-lock.
      
      Introduce an arch-specific probe_subpage_writeable() and call it from
      the newly added fault_in_subpage_writeable() function. The arch code
      with sub-page faults will have to implement the specific probing
      functionality.
      
      Note that no other fault_in_subpage_*() functions are added since they
      have no callers currently susceptible to a live-lock.
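
      A hedged sketch of the retry pattern conditions 1 and 2 describe (local
      names are illustrative; fault_in_subpage_writeable(), like
      fault_in_writeable(), is assumed to return the number of bytes not
      faulted in):

       retry:
           if (copy_to_user_nofault(uaddr, kbuf, len)) {
               if (fault_in_subpage_writeable(uaddr, len) < len)
                   goto retry;     /* partial progress: retry the copy */
               return -EFAULT;     /* nothing faulted in: genuine fault */
           }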
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lore.kernel.org/r/20220423100751.1870771-2-catalin.marinas@arm.com
      
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  30. Apr 22, 2022