Commit b23c4771 authored by Linus Torvalds

Merge tag 'docs-5.8' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another
  massive set from Mauro toward the completion of the RST conversion. I
  *really* hope we are getting close to the end of this. Meanwhile,
  those patches reach pretty far afield to update document references
  around the tree; there should be no actual code changes there. There
  will be, alas, more of the usual trivial merge conflicts.

  Beyond that we have more translations, improvements to the sphinx
  scripting, a number of additions to the sysctl documentation, and lots
  of fixes"

* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
parents c2b0fc84 e35b5a4c
455 additions and 246 deletions
...@@ -152,6 +152,7 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com> ...@@ -152,6 +152,7 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Leon Romanovsky <leon@kernel.org> <leon@leon.nu> Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com> Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>
Leonid I Ananiev <leonid.i.ananiev@intel.com> Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com> Linas Vepstas <linas@austin.ibm.com>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de> Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
...@@ -234,7 +235,9 @@ Ralf Baechle <ralf@linux-mips.org> ...@@ -234,7 +235,9 @@ Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de> Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net> Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net>
Rémi Denis-Courmont <rdenis@simphalempin.com> Rémi Denis-Courmont <rdenis@simphalempin.com>
Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Ricardo Ribalda <ribalda@kernel.org> <ricardo.ribalda@gmail.com>
Ricardo Ribalda <ribalda@kernel.org> <ricardo@ribalda.com>
Ricardo Ribalda <ribalda@kernel.org> Ricardo Ribalda Delgado <ribalda@kernel.org>
Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com> Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
Rudolf Marek <R.Marek@sh.cvut.cz> Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt> Rui Saraiva <rmps@joel.ist.utl.pt>
......
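The .mailmap hunk above adds entries that fold several commit identities into one canonical identity. The following is a minimal sketch of how such entries can be read, assuming only the entry forms visible in this hunk (`Proper Name <proper@email> <commit@email>` and `Proper Name <proper@email> Commit Name <commit@email>`) plus the single-address form. The function names `parse_mailmap_line` and `resolve` are hypothetical, and this is not git's own implementation.

```python
import re

# One "Name <email>" group; the name part may be empty.
_IDENT = re.compile(r"\s*([^<>]*?)\s*<([^<>]+)>")


def parse_mailmap_line(line):
    """Return (proper_name, proper_email, commit_name, commit_email) or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and blank lines
    if not line:
        return None
    idents = _IDENT.findall(line)
    if len(idents) == 1:
        # "Proper Name <commit@email>": only the name is rewritten.
        name, email = idents[0]
        return (name or None, None, None, email.lower())
    if len(idents) == 2:
        (pname, pemail), (cname, cemail) = idents
        return (pname or None, pemail, cname or None, cemail.lower())
    return None


def resolve(entries, name, email):
    """Map a commit identity to its canonical (name, email) pair."""
    fallback = None
    for pname, pemail, cname, cemail in entries:
        if cemail != email.lower():
            continue
        if cname is not None:
            # Entries that also name the committer are the most specific.
            if cname.lower() == name.lower():
                return (pname or name, pemail or email)
            continue
        if fallback is None:
            fallback = (pname or name, pemail or email)
    return fallback or (name, email)


lines = [
    "Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>",
    "Ricardo Ribalda <ribalda@kernel.org> <ricardo.ribalda@gmail.com>",
    "Ricardo Ribalda <ribalda@kernel.org> <ricardo@ribalda.com>",
    "Ricardo Ribalda <ribalda@kernel.org> Ricardo Ribalda Delgado <ribalda@kernel.org>",
]
entries = [e for e in map(parse_mailmap_line, lines) if e is not None]
print(resolve(entries, "Ricardo Ribalda Delgado", "ricardo.ribalda@gmail.com"))
```

With the four entries taken from the hunk, both the old gmail address and the old "Delgado" spelling resolve to the same kernel.org identity, which is the purpose of the "mailmap: change email for Ricardo Ribalda" commit.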
...@@ -3104,14 +3104,16 @@ W: http://www.qsl.net/dl1bke/ ...@@ -3104,14 +3104,16 @@ W: http://www.qsl.net/dl1bke/
D: Generic Z8530 driver, AX.25 DAMA slave implementation D: Generic Z8530 driver, AX.25 DAMA slave implementation
D: Several AX.25 hacks D: Several AX.25 hacks
N: Ricardo Ribalda Delgado N: Ricardo Ribalda
E: ricardo.ribalda@gmail.com E: ribalda@kernel.org
W: http://ribalda.com W: http://ribalda.com
D: PLX USB338x driver D: PLX USB338x driver
D: PCA9634 driver D: PCA9634 driver
D: Option GTM671WFS D: Option GTM671WFS
D: Fintek F81216A D: Fintek F81216A
D: AD5761 iio driver D: AD5761 iio driver
D: TI DAC7612 driver
D: Sony IMX214 driver
D: Various kernel hacks D: Various kernel hacks
S: Qtechnology A/S S: Qtechnology A/S
S: Valby Langgade 142 S: Valby Langgade 142
......
...@@ -54,7 +54,7 @@ Date: October 2002 ...@@ -54,7 +54,7 @@ Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org> Contact: Linux Memory Management list <linux-mm@kvack.org>
Description: Description:
Provides information about the node's distribution and memory Provides information about the node's distribution and memory
utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
What: /sys/devices/system/node/nodeX/numastat What: /sys/devices/system/node/nodeX/numastat
Date: October 2002 Date: October 2002
......
...@@ -11,7 +11,7 @@ Description: ...@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps. These fields represent are not present in /proc/pid/smaps. These fields represent
the sum of the Pss field of each type (anon, file, shmem). the sum of the Pss field of each type (anon, file, shmem).
For more details, see Documentation/filesystems/proc.txt For more details, see Documentation/filesystems/proc.rst
and the procfs man page. and the procfs man page.
Typical output looks like this: Typical output looks like this:
......
...@@ -98,7 +98,11 @@ else # HAVE_PDFLATEX ...@@ -98,7 +98,11 @@ else # HAVE_PDFLATEX
pdfdocs: latexdocs pdfdocs: latexdocs
@$(srctree)/scripts/sphinx-pre-install --version-check @$(srctree)/scripts/sphinx-pre-install --version-check
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;) $(foreach var,$(SPHINXDIRS), \
$(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit; \
mkdir -p $(BUILDDIR)/$(var)/pdf; \
mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
)
endif # HAVE_PDFLATEX endif # HAVE_PDFLATEX
......
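The new `pdfdocs` recipe above builds each LaTeX directory and then moves the resulting PDFs into a sibling `pdf/` directory, one PDF per `.tex` file found by `$(wildcard ...)`. A rough Python equivalent of just the move step is sketched below to make the intent easier to follow. The function name `collect_pdfs` is hypothetical, and two behaviours differ from the Makefile on purpose: a missing PDF is skipped instead of making `mv` fail, and only the file suffix is rewritten, whereas `$(subst .tex,.pdf,...)` replaces every occurrence of `.tex` in the path.

```python
from pathlib import Path
import shutil


def collect_pdfs(build_dir, sphinx_dirs):
    """Move <build>/<dir>/latex/X.pdf to <build>/<dir>/pdf/X.pdf for each X.tex."""
    moved = []
    for name in sphinx_dirs:
        latex_dir = Path(build_dir) / name / "latex"
        pdf_dir = Path(build_dir) / name / "pdf"
        pdf_dir.mkdir(parents=True, exist_ok=True)  # mkdir -p
        for tex in sorted(latex_dir.glob("*.tex")):
            pdf = tex.with_suffix(".pdf")
            if pdf.exists():  # the Makefile's mv would fail here instead
                target = pdf_dir / pdf.name
                shutil.move(str(pdf), str(target))
                moved.append(target)
    return moved
```

PDFs that have no matching `.tex` source stay in the `latex/` directory, as they would with the Makefile rule.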
...@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as ...@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
reaches a specific count with the error "nobody cared". This disabled IRQ reaches a specific count with the error "nobody cared". This disabled IRQ
now prevents valid usage by an existing interrupt which may happen to share now prevents valid usage by an existing interrupt which may happen to share
the IRQ line. the IRQ line::
irq 19: nobody cared (try booting with the "irqpoll" option) irq 19: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1 CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020 Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
Call Trace: Call Trace:
<IRQ> <IRQ>
? dump_stack+0x46/0x5e ? dump_stack+0x46/0x5e
? __report_bad_irq+0x2e/0xb0 ? __report_bad_irq+0x2e/0xb0
...@@ -85,15 +86,18 @@ Mitigations ...@@ -85,15 +86,18 @@ Mitigations
The mitigations take the form of PCI quirks. The preference has been to The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH. first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be In such a case a quirk to disable boot interrupt generation can be
added.[1] added. [1]_
Intel® 6300ESB I/O Controller Hub Intel® 6300ESB I/O Controller Hub
Alternate Base Address Register: Alternate Base Address Register:
BIE: Boot Interrupt Enable BIE: Boot Interrupt Enable
0 = Boot interrupt is enabled.
1 = Boot interrupt is disabled.
Intel® Sandy Bridge through Sky Lake based Xeon servers: == ===========================
0 Boot interrupt is enabled.
1 Boot interrupt is disabled.
== ===========================
Intel® Sandy Bridge through Sky Lake based Xeon servers:
Coherent Interface Protocol Interrupt Control Coherent Interface Protocol Interrupt Control
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2: dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
When this bit is set. Local INTx messages received from the When this bit is set. Local INTx messages received from the
...@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be ...@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ the spurious interrupt detection which would ordinarily disable the IRQ
line due to excessive unhandled counts.[2] line due to excessive unhandled counts. [2]_
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.[3] pci=noioapicreroute. [3]_
More Documentation More Documentation
...@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets. ...@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets.
Example of disabling of the boot interrupt Example of disabling of the boot interrupt
------------------------------------------ ------------------------------------------
Intel® 6300ESB I/O Controller Hub (Document # 300641-004US) - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
5.7.3 Boot Interrupt 5.7.3 Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
Datasheet - Volume 2: Registers (Document # 330784-003) Datasheet - Volume 2: Registers (Document # 330784-003)
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control 6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Example of handler rerouting Example of handler rerouting
---------------------------- ----------------------------
Intel® 6700PXH 64-bit PCI Hub (Document # 302628) - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt 2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
...@@ -150,6 +154,6 @@ Cheers, ...@@ -150,6 +154,6 @@ Cheers,
Sean V Kelley Sean V Kelley
sean.v.kelley@linux.intel.com sean.v.kelley@linux.intel.com
[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/ .. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/ .. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/ .. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
...@@ -63,7 +63,7 @@ which can then be compiled to AML binary format:: ...@@ -63,7 +63,7 @@ which can then be compiled to AML binary format::
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
[1] http://wiki.minnowboard.org/MinnowBoard_MAX#Low_Speed_Expansion_Connector_.28Top.29 [1] https://www.elinux.org/Minnowboard:MinnowMax#Low_Speed_Expansion_.28Top.29
The resulting AML code can then be loaded by the kernel using one of the methods The resulting AML code can then be loaded by the kernel using one of the methods
below. below.
......
...@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one:: ...@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one::
Despite being an **Oops** or some other sort of stack trace, the offended Despite being an **Oops** or some other sort of stack trace, the offended
line is usually required to identify and handle the bug. Along this chapter, line is usually required to identify and handle the bug. Along this chapter,
we'll refer to "Oops" for all kinds of stack traces that need to be analized. we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.
.. note:: If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
quality of the stack trace by using file:`scripts/decode_stacktrace.sh`.
Modules linked in
-----------------
Modules that are tainted or are being loaded or unloaded are marked with
"(...)", where the taint flags are described in
file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
annotated with "+", and "being unloaded" is annotated with "-".
``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
format (from ``dmesg``, etc). Ignore any references in this or other docs to
"decoding the Oops" or "running it through ksymoops".
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
people will just tell you to repost it.
Where is the Oops message is located? Where is the Oops message is located?
------------------------------------- -------------------------------------
...@@ -71,7 +75,7 @@ by running ``journalctl`` command. ...@@ -71,7 +75,7 @@ by running ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Or you can read the data from the kernel buffers and save it. Or you can
``cat /proc/kmsg > file``, however you have to break in to stop the transfer, ``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
``kmsg`` is a "never ending file". since ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options: the disk is not available then you have three options:
...@@ -81,9 +85,9 @@ the disk is not available then you have three options: ...@@ -81,9 +85,9 @@ the disk is not available then you have three options:
planned for a crash. Alternatively, you can take a picture of planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you nothing. If the messages scroll off the top of the console, you
may find that booting with a higher resolution (eg, ``vga=791``) may find that booting with a higher resolution (e.g., ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``, will allow you to read more of the text. (Caveat: This needs ``vesafb``,
so won't help for 'early' oopses) so won't help for 'early' oopses.)
(2) Boot with a serial console (see (2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
...@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using ...@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using
gdb gdb
^^^ ^^^
The GNU debug (``gdb``) is the best way to figure out the exact file and line The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file. number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``. The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
...@@ -165,7 +169,7 @@ If you have a call trace, such as:: ...@@ -165,7 +169,7 @@ If you have a call trace, such as::
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
... ...
this shows the problem likely in the :jbd: module. You can load that module this shows the problem likely is in the :jbd: module. You can load that module
in gdb and list the relevant code:: in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko $ gdb fs/jbd/jbd.ko
...@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example:: ...@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example::
You need to be at the top level of the kernel tree for this to pick up You need to be at the top level of the kernel tree for this to pick up
your C files. your C files.
If you don't have access to the code you can also debug on some crash dumps If you don't have access to the source code you can still debug some crash
e.g. crash dump output as shown by Dave Miller:: dumps using the following method (example crash dump output as shown by
Dave Miller)::
EIP is at +0x14/0x4c0 EIP is at +0x14/0x4c0
... ...
...@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller:: ...@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller::
mov 0x8(%ebp), %ebx ! %ebx = skb->sk mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
file:`scripts/decodecode` can be used to automate most of this, depending
on what CPU architecture is being debugged.
Reporting the bug Reporting the bug
----------------- -----------------
...@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using ...@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script. the ``get_maintainer.pl`` script.
For example, if you find a bug at the gspca's sonixj.c file, you can get For example, if you find a bug at the gspca's sonixj.c file, you can get
their maintainers with:: its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c $ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%) Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
...@@ -253,16 +261,17 @@ their maintainers with:: ...@@ -253,16 +261,17 @@ their maintainers with::
Please notice that it will point to: Please notice that it will point to:
- The last developers that touched on the source code. On the above example, - The last developers that touched the source code (if this is done inside
Tejun and Bhaktipriya (in this specific case, none really envolved on the a git tree). On the above example, Tejun and Bhaktipriya (in this
development of this file); specific case, none really envolved on the development of this file);
- The driver maintainer (Hans Verkuil); - The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab); - The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org); - The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org). - the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to mailing Usually, the fastest way to have your bug fixed is to report it to mailing
list used for the development of the code (linux-media ML) copying the driver maintainer (Hans). list used for the development of the code (linux-media ML) copying the
driver maintainer (Hans).
If you are totally stumped as to whom to send the report, and If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to ``get_maintainer.pl`` didn't provide you anything useful, send it to
...@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files ...@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files
and forwarded to the kernel developers. and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation. Static static translation and the second is dynamic translation.
translation uses the System.map file in much the same manner that Static translation uses the System.map file.
ksymoops does. In order to do static translation the ``klogd`` daemon In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time. must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map See the klogd man page for information on how ``klogd`` searches for map
files. files.
......
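The bug-hunting.rst hunks above read a call-trace entry such as `[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee` as "the problem is likely in the :jbd: module". The sketch below splits an entry of that shape into its parts. It assumes the bracketed-address layout quoted in the document (other kernel versions print frames differently), and the name `parse_frame` is hypothetical.

```python
import re

# [<address>] optional :module: then symbol+0xOFFSET/0xSIZE
_FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+"
    r"(?::(?P<module>[^:\s]+):)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_frame(line):
    """Return the fields of one call-trace entry, or None if it does not match."""
    m = _FRAME.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        "module": m.group("module"),  # None for built-in code
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total function length
    }


frame = parse_frame("[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee")
print(frame["module"], frame["symbol"], hex(frame["offset"]))
```

The symbol and offset are the same pieces of information the document feeds to gdb after loading the module object file.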
...@@ -105,7 +105,7 @@ References ...@@ -105,7 +105,7 @@ References
---------- ----------
- http://lkml.org/lkml/2007/2/12/6 - http://lkml.org/lkml/2007/2/12/6
- Documentation/filesystems/proc.txt (1.8) - Documentation/filesystems/proc.rst (1.8)
Thanks Thanks
......
...@@ -268,7 +268,7 @@ Guest mitigation mechanisms ...@@ -268,7 +268,7 @@ Guest mitigation mechanisms
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at: available at:
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
.. _smt_control: .. _smt_control:
......
Explaining the dreaded "No init found." boot hang message Explaining the "No working init found." boot hang message
========================================================= =========================================================
:Authors: Andreas Mohr <andi at lisas period de>
Cristian Souza <cristianmsbr at gmail period com>
OK, so you've got this pretty unintuitive message (currently located This document provides some high-level reasons for failure
in init/main.c) and are wondering what the H*** went wrong. (listed roughly in order of execution) to load the init binary.
Some high-level reasons for failure (listed roughly in order of execution)
to load the init binary are: 1) **Unable to mount root FS**: Set "debug" kernel parameter (in bootloader
config file or CONFIG_CMDLINE) to get more detailed kernel messages.
A) Unable to mount root FS
B) init binary doesn't exist on rootfs 2) **init binary doesn't exist on rootfs**: Make sure you have the correct
C) broken console device root FS type (and ``root=`` kernel parameter points to the correct
D) binary exists but dependencies not available partition), required drivers such as storage hardware (such as SCSI or
E) binary cannot be loaded USB!) and filesystem (ext3, jffs2, etc.) are builtin (alternatively as
modules, to be pre-loaded by an initrd).
Detailed explanations:
3) **Broken console device**: Possibly a conflict in ``console= setup``
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE) --> initial console unavailable. E.g. some serial consoles are unreliable
to get more detailed kernel messages. due to serial IRQ issues (e.g. missing interrupt-based configuration).
B) make sure you have the correct root FS type
(and ``root=`` kernel parameter points to the correct partition),
required drivers such as storage hardware (such as SCSI or USB!)
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
to be pre-loaded by an initrd)
C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
missing interrupt-based configuration).
Try using a different ``console= device`` or e.g. ``netconsole=``. Try using a different ``console= device`` or e.g. ``netconsole=``.
D) e.g. required library dependencies of the init binary such as
``/lib/ld-linux.so.2`` missing or broken. Use 4) **Binary exists but dependencies not available**: E.g. required library
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required. dependencies of the init binary such as ``/lib/ld-linux.so.2`` missing or
E) make sure the binary's architecture matches your hardware. broken. Use ``readelf -d <INIT>|grep NEEDED`` to find out which libraries
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware. are required.
In case you tried loading a non-binary file here (shell script?),
you should make sure that the script specifies an interpreter in its shebang 5) **Binary cannot be loaded**: Make sure the binary's architecture matches
header line (``#!/...``) that is fully working (including its library your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM
dependencies). And before tackling scripts, better first test a simple hardware. In case you tried loading a non-binary file here (shell script?),
non-script binary such as ``/bin/sh`` and confirm its successful execution. you should make sure that the script specifies an interpreter in its
To find out more, add code ``to init/main.c`` to display kernel_execve()s shebang header line (``#!/...``) that is fully working (including its
return values. library dependencies). And before tackling scripts, better first test a
simple non-script binary such as ``/bin/sh`` and confirm its successful
execution. To find out more, add code ``to init/main.c`` to display
kernel_execve()s return values.
Please extend this explanation whenever you find new failure causes Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step (after all loading the init binary is a CRITICAL and hard transition step
which needs to be made as painless as possible), then submit patch to LKML. which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs: Further TODOs:
- Implement the various ``run_init_process()`` invocations via a struct array - Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix). log it all by iterating over **all** results (very important usability fix).
- try to make the implementation itself more helpful in general, - Try to make the implementation itself more helpful in general, e.g. by
e.g. by providing additional error messages at affected places. providing additional error messages at affected places.
Andreas Mohr <andi at lisas period de>
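Item 5 of the rewritten init document above asks the reader to check that the init binary's architecture matches the hardware, and that a script names a working interpreter in its shebang line. A minimal sketch of that check on the first bytes of a file follows. It assumes the standard ELF header layout (class at byte 4, data encoding at byte 5, `e_machine` at offset 18) and lists only a few well-known machine codes. The name `describe_init` is hypothetical, and the sketch does not verify library dependencies (item 4).

```python
import struct

# A few well-known ELF e_machine values.
ELF_MACHINES = {3: "i386", 40: "ARM", 62: "x86_64", 183: "AArch64", 243: "RISC-V"}


def describe_init(header):
    """Classify the first bytes of a candidate init file."""
    if header.startswith(b"#!"):
        # Script: report the interpreter named on the shebang line.
        words = header[2:].split(b"\n", 1)[0].split()
        return ("script", words[0].decode() if words else None)
    if len(header) >= 20 and header[:4] == b"\x7fELF":
        bits = {1: 32, 2: 64}.get(header[4])     # EI_CLASS
        endian = "<" if header[5] == 1 else ">"  # EI_DATA
        (machine,) = struct.unpack_from(endian + "H", header, 18)
        return ("elf", bits, ELF_MACHINES.get(machine, machine))
    return ("unknown",)
```

A typical use would be `describe_init(open(path, "rb").read(64))` on the root filesystem's init candidate, then comparing the reported machine with the target hardware.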
...@@ -3336,7 +3336,7 @@ ...@@ -3336,7 +3336,7 @@
See Documentation/admin-guide/sysctl/vm.rst for details. See Documentation/admin-guide/sysctl/vm.rst for details.
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver. ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
See Documentation/debugging-via-ohci1394.txt for more See Documentation/core-api/debugging-via-ohci1394.rst for more
info. info.
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
......
...@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work. ...@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
References References
========== ==========
- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. - Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs. - Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
......
...@@ -12,107 +12,107 @@ and more generally they allow userland to take control of various ...@@ -12,107 +12,107 @@ and more generally they allow userland to take control of various
memory page faults, something otherwise only the kernel code could do. memory page faults, something otherwise only the kernel code could do.
For example userfaults allows a proper and more optimal implementation For example userfaults allows a proper and more optimal implementation
of the PROT_NONE+SIGSEGV trick. of the ``PROT_NONE+SIGSEGV`` trick.
Design Design
====== ======
Userfaults are delivered and resolved through the userfaultfd syscall. Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
The userfaultfd (aside from registering and unregistering virtual The ``userfaultfd`` (aside from registering and unregistering virtual
memory ranges) provides two primary functionalities: memory ranges) provides two primary functionalities:
1) read/POLLIN protocol to notify a userland thread of the faults 1) ``read/POLLIN`` protocol to notify a userland thread of the faults
happening happening
2) various UFFDIO_* ioctls that can manage the virtual memory regions 2) various ``UFFDIO_*`` ioctls that can manage the virtual memory regions
registered in the userfaultfd that allows userland to efficiently registered in the ``userfaultfd`` that allows userland to efficiently
resolve the userfaults it receives via 1) or to manage the virtual resolve the userfaults it receives via 1) or to manage the virtual
memory in the background memory in the background
The real advantage of userfaults if compared to regular virtual memory The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the operations never involve heavyweight structures like vmas (in fact the
userfaultfd runtime load never takes the mmap_sem for writing). ``userfaultfd`` runtime load never takes the mmap_sem for writing).
Vmas are not suitable for page- (or hugepage) granular fault tracking Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span when dealing with virtual address spaces that could span
Terabytes. Too many vmas would be needed for that. Terabytes. Too many vmas would be needed for that.
The userfaultfd once opened by invoking the syscall, can also be The ``userfaultfd`` once opened by invoking the syscall, can also be
passed using unix domain sockets to a manager process, so the same passed using unix domain sockets to a manager process, so the same
manager process could handle the userfaults of a multitude of manager process could handle the userfaults of a multitude of
different processes without them being aware about what is going on different processes without them being aware about what is going on
(well of course unless they later try to use the userfaultfd (well of course unless they later try to use the ``userfaultfd``
themselves on the same region the manager is already tracking, which themselves on the same region the manager is already tracking, which
is a corner case that would currently return -EBUSY). is a corner case that would currently return ``-EBUSY``).
API API
=== ===
When first opened the userfaultfd must be enabled invoking the When first opened the ``userfaultfd`` must be enabled invoking the
UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
a later API version) which will specify the read/POLLIN protocol a later API version) which will specify the ``read/POLLIN`` protocol
userland intends to speak on the UFFD and the uffdio_api.features userland intends to speak on the ``UFFD`` and the ``uffdio_api.features``
userland requires. The UFFDIO_API ioctl if successful (i.e. if the userland requires. The ``UFFDIO_API`` ioctl if successful (i.e. if the
requested uffdio_api.api is spoken also by the running kernel and the requested ``uffdio_api.api`` is spoken also by the running kernel and the
requested features are going to be enabled) will return into requested features are going to be enabled) will return into
uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of ``uffdio_api.features`` and ``uffdio_api.ioctls`` two 64bit bitmasks of
respectively all the available features of the read(2) protocol and respectively all the available features of the read(2) protocol and
the generic ioctl available. the generic ioctl available.
The uffdio_api.features bitmask returned by the UFFDIO_API ioctl The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
defines what memory types are supported by the userfaultfd and what defines what memory types are supported by the ``userfaultfd`` and what
events, except page fault notifications, may be generated. events, except page fault notifications, may be generated.
If the kernel supports registering userfaultfd ranges on hugetlbfs If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be ``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
set if the kernel supports registering userfaultfd ranges on shared set if the kernel supports registering ``userfaultfd`` ranges on shared
memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
MAP_SHARED, memfd_create, etc). ``MAP_SHARED``, ``memfd_create``, etc).
The userland application that wants to use userfaultfd with hugetlbfs The userland application that wants to use ``userfaultfd`` with hugetlbfs
or shared memory needs to set the corresponding flag in or shared memory needs to set the corresponding flag in
uffdio_api.features to enable those features. ``uffdio_api.features`` to enable those features.
If the userland desires to receive notifications for events other than If the userland desires to receive notifications for events other than
page faults, it has to verify that uffdio_api.features has appropriate page faults, it has to verify that ``uffdio_api.features`` has appropriate
UFFD_FEATURE_EVENT_* bits set. These events are described in more ``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
detail below in "Non-cooperative userfaultfd" section. detail below in `Non-cooperative userfaultfd`_ section.
Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
be invoked (if present in the returned uffdio_api.ioctls bitmask) to be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
register a memory range in the userfaultfd by setting the register a memory range in the ``userfaultfd`` by setting the
uffdio_register structure accordingly. The uffdio_register.mode uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for bitmask will specify to the kernel which kind of faults to track for
the range (UFFDIO_REGISTER_MODE_MISSING would track missing the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
pages). The UFFDIO_REGISTER ioctl will return the pages). The ``UFFDIO_REGISTER`` ioctl will return the
uffdio_register.ioctls bitmask of ioctls that are suitable to resolve ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be userfaults on the range registered. Not all ioctls will necessarily be
supported for all memory types depending on the underlying virtual supported for all memory types depending on the underlying virtual
memory backend (anonymous memory vs tmpfs vs real filebacked memory backend (anonymous memory vs tmpfs vs real filebacked
mappings). mappings).
Userland can use the uffdio_register.ioctls to manage the virtual Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove address space in the background (to add or potentially also remove
memory from the userfaultfd registered range). This means a userfault memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the could be triggering just before userland maps in the background the
user-faulted page. user-faulted page.
The primary ioctl to resolve userfaults is UFFDIO_COPY. That The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
atomically copies a page into the userfault registered range and wakes atomically copies a page into the userfault registered range and wakes
up the blocked userfaults (unless uffdio_copy.mode & up the blocked userfaults
UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to (unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
half copied page since it'll keep userfaulting until the copy has guaranteeing that nothing can see an half copied page since it'll
finished. keep userfaulting until the copy has finished.
Notes: Notes:
- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then - If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
you must provide some kind of page in your thread after reading from you must provide some kind of page in your thread after reading from
the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE. the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
The normal behavior of the OS automatically providing a zero page on The normal behavior of the OS automatically providing a zero page on
an anonymous mapping is not in place. an anonymous mapping is not in place.
...@@ -122,13 +122,13 @@ Notes: ...@@ -122,13 +122,13 @@ Notes:
- You get the address of the access that triggered the missing page - You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the event out of a struct uffd_msg that you read in the thread from the
uffd. You can supply as many pages as you want with UFFDIO_COPY or uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
the first of any of those IOCTLs wakes up the faulting thread. the first of any of those IOCTLs wakes up the faulting thread.
- Be sure to test for all errors including (pollfd[0].revents & - Be sure to test for all errors including
POLLERR). This can happen, e.g. when ranges supplied were (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
incorrect. supplied were incorrect.
Write Protect Notifications Write Protect Notifications
--------------------------- ---------------------------
...@@ -136,41 +136,42 @@ Write Protect Notifications ...@@ -136,41 +136,42 @@ Write Protect Notifications
This is equivalent to (but faster than) using mprotect and a SIGSEGV This is equivalent to (but faster than) using mprotect and a SIGSEGV
signal handler. signal handler.
Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP. Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT, Instead of using mprotect(2) you use
struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP ``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
while ``mode = UFFDIO_WRITEPROTECT_MODE_WP``
in the struct passed in. The range does not default to and does not in the struct passed in. The range does not default to and does not
have to be identical to the range you registered with. You can write have to be identical to the range you registered with. You can write
protect as many ranges as you like (inside the registered range). protect as many ranges as you like (inside the registered range).
Then, in the thread reading from uffd the struct will have Then, in the thread reading from uffd the struct will have
msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send ``msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP`` set. Now you send
ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again ``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set. again while ``pagefault.mode`` does not have ``UFFDIO_WRITEPROTECT_MODE_WP``
This wakes up the thread which will continue to run with writes. This set. This wakes up the thread which will continue to run with writes. This
allows you to do the bookkeeping about the write in the uffd reading allows you to do the bookkeeping about the write in the uffd reading
thread before the ioctl. thread before the ioctl.
If you registered with both UFFDIO_REGISTER_MODE_MISSING and If you registered with both ``UFFDIO_REGISTER_MODE_MISSING`` and
UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in ``UFFDIO_REGISTER_MODE_WP`` then you need to think about the sequence in
which you supply a page and undo write protect. Note that there is a which you supply a page and undo write protect. Note that there is a
difference between writes into a WP area and into a !WP area. The difference between writes into a WP area and into a !WP area. The
former will have UFFD_PAGEFAULT_FLAG_WP set, the latter former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but ``UFFD_PAGEFAULT_FLAG_WRITE``. The latter did not fail on protection but
you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
used. used.
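The distinction drawn above between a write into a WP area and a write into a !WP area can be sketched as a small decision function. This is only an illustration, not part of the kernel documentation: the two flag values are assumed to match ``include/uapi/linux/userfaultfd.h`` and should be checked against the headers in use, and ``classify_fault`` and ``was_write`` are hypothetical helper names.

```python
# Flag values assumed from include/uapi/linux/userfaultfd.h; verify them
# against the headers of the kernel you target.
UFFD_PAGEFAULT_FLAG_WRITE = 1 << 0  # the faulting access was a write
UFFD_PAGEFAULT_FLAG_WP = 1 << 1     # the fault hit a write-protected page


def classify_fault(flags: int) -> str:
    """Map msg.arg.pagefault.flags to the action the handler should take."""
    if flags & UFFD_PAGEFAULT_FLAG_WP:
        # Write into a WP area: do the bookkeeping, then undo write protect.
        return "undo-write-protect"
    # The access did not fail on protection, so the page is missing and
    # must be supplied, whether the access was a read or a write.
    return "supply-page"


def was_write(flags: int) -> bool:
    """Tell whether the faulting access was a write."""
    return bool(flags & UFFD_PAGEFAULT_FLAG_WRITE)
```

A handler registered with both modes would call ``classify_fault()`` on every message it reads and pick ``UFFDIO_WRITEPROTECT`` or ``UFFDIO_COPY``/``UFFDIO_ZEROPAGE`` accordingly.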
QEMU/KVM QEMU/KVM
======== ========
QEMU/KVM is using the userfaultfd syscall to implement postcopy live QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
migration. Postcopy live migration is one form of memory migration. Postcopy live migration is one form of memory
externalization consisting of a virtual machine running with part or externalization consisting of a virtual machine running with part or
all of its memory residing on a different node in the cloud. The all of its memory residing on a different node in the cloud. The
userfaultfd abstraction is generic enough that not a single line of ``userfaultfd`` abstraction is generic enough that not a single line of
KVM kernel code had to be modified in order to add postcopy live KVM kernel code had to be modified in order to add postcopy live
migration to QEMU. migration to QEMU.
Guest async page faults, FOLL_NOWAIT and all other GUP features work Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work
just fine in combination with userfaults. Userfaults trigger async just fine in combination with userfaults. Userfaults trigger async
page faults in the guest scheduler so those guest processes that page faults in the guest scheduler so those guest processes that
aren't waiting for userfaults (i.e. network bound) can keep running in aren't waiting for userfaults (i.e. network bound) can keep running in
...@@ -183,19 +184,19 @@ generating userfaults for readonly guest regions. ...@@ -183,19 +184,19 @@ generating userfaults for readonly guest regions.
The implementation of postcopy live migration currently uses one The implementation of postcopy live migration currently uses one
single bidirectional socket but in the future two different sockets single bidirectional socket but in the future two different sockets
will be used (to reduce the latency of the userfaults to the minimum will be used (to reduce the latency of the userfaults to the minimum
possible without having to decrease /proc/sys/net/ipv4/tcp_wmem). possible without having to decrease ``/proc/sys/net/ipv4/tcp_wmem``).
The QEMU in the source node writes all pages that it knows are missing The QEMU in the source node writes all pages that it knows are missing
in the destination node, into the socket, and the migration thread of in the destination node, into the socket, and the migration thread of
the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE the QEMU running in the destination node runs ``UFFDIO_COPY|ZEROPAGE``
ioctls on the userfaultfd in order to map the received pages into the ioctls on the ``userfaultfd`` in order to map the received pages into the
guest (UFFDIO_ZEROPAGE is used if the source page was a zero page). guest (``UFFDIO_ZEROPAGE`` is used if the source page was a zero page).
A different postcopy thread in the destination node listens with A different postcopy thread in the destination node listens with
poll() to the userfaultfd in parallel. When a POLLIN event is poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
generated after a userfault triggers, the postcopy thread read() from generated after a userfault triggers, the postcopy thread read() from
the userfaultfd and receives the fault address (or -EAGAIN in case the the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
userfault was already resolved and woken by a UFFDIO_COPY|ZEROPAGE run userfault was already resolved and woken by a ``UFFDIO_COPY|ZEROPAGE`` run
by the parallel QEMU migration thread). by the parallel QEMU migration thread).
After the QEMU postcopy thread (running in the destination node) gets After the QEMU postcopy thread (running in the destination node) gets
...@@ -206,7 +207,7 @@ remaining missing pages from that new page offset. Soon after that ...@@ -206,7 +207,7 @@ remaining missing pages from that new page offset. Soon after that
(just the time to flush the tcp_wmem queue through the network) the (just the time to flush the tcp_wmem queue through the network) the
migration thread in the QEMU running in the destination node will migration thread in the QEMU running in the destination node will
receive the page that triggered the userfault and it'll map it as receive the page that triggered the userfault and it'll map it as
usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it usual with the ``UFFDIO_COPY|ZEROPAGE`` (without actually knowing if it
was spontaneously sent by the source or if it was an urgent page was spontaneously sent by the source or if it was an urgent page
requested through a userfault). requested through a userfault).
...@@ -219,74 +220,74 @@ checked to find which missing pages to send in round robin and we seek ...@@ -219,74 +220,74 @@ checked to find which missing pages to send in round robin and we seek
over it when receiving incoming userfaults. After sending each page of over it when receiving incoming userfaults. After sending each page of
course the bitmap is updated accordingly. It's also useful to avoid course the bitmap is updated accordingly. It's also useful to avoid
sending the same page twice (in case the userfault is read by the sending the same page twice (in case the userfault is read by the
postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration postcopy thread just before ``UFFDIO_COPY|ZEROPAGE`` runs in the migration
thread). thread).
Non-cooperative userfaultfd Non-cooperative userfaultfd
=========================== ===========================
When the userfaultfd is monitored by an external manager, the manager When the ``userfaultfd`` is monitored by an external manager, the manager
must be able to track changes in the process virtual memory must be able to track changes in the process virtual memory
layout. Userfaultfd can notify the manager about such changes using layout. Userfaultfd can notify the manager about such changes using
the same read(2) protocol as for the page fault notifications. The the same read(2) protocol as for the page fault notifications. The
manager has to explicitly enable these events by setting appropriate manager has to explicitly enable these events by setting appropriate
bits in uffdio_api.features passed to UFFDIO_API ioctl: bits in ``uffdio_api.features`` passed to ``UFFDIO_API`` ioctl:
UFFD_FEATURE_EVENT_FORK ``UFFD_FEATURE_EVENT_FORK``
enable userfaultfd hooks for fork(). When this feature is enable ``userfaultfd`` hooks for fork(). When this feature is
enabled, the userfaultfd context of the parent process is enabled, the ``userfaultfd`` context of the parent process is
duplicated into the newly created process. The manager duplicated into the newly created process. The manager
receives UFFD_EVENT_FORK with file descriptor of the new receives ``UFFD_EVENT_FORK`` with file descriptor of the new
userfaultfd context in the uffd_msg.fork. ``userfaultfd`` context in the ``uffd_msg.fork``.
UFFD_FEATURE_EVENT_REMAP ``UFFD_FEATURE_EVENT_REMAP``
enable notifications about mremap() calls. When the enable notifications about mremap() calls. When the
non-cooperative process moves a virtual memory area to a non-cooperative process moves a virtual memory area to a
different location, the manager will receive different location, the manager will receive
UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and ``UFFD_EVENT_REMAP``. The ``uffd_msg.remap`` will contain the old and
new addresses of the area and its original length. new addresses of the area and its original length.
UFFD_FEATURE_EVENT_REMOVE ``UFFD_FEATURE_EVENT_REMOVE``
enable notifications about madvise(MADV_REMOVE) and enable notifications about madvise(MADV_REMOVE) and
madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will madvise(MADV_DONTNEED) calls. The event ``UFFD_EVENT_REMOVE`` will
be generated upon these calls to madvise. The uffd_msg.remove be generated upon these calls to madvise(). The ``uffd_msg.remove``
will contain start and end addresses of the removed area. will contain start and end addresses of the removed area.
UFFD_FEATURE_EVENT_UNMAP ``UFFD_FEATURE_EVENT_UNMAP``
enable notifications about memory unmapping. The manager will enable notifications about memory unmapping. The manager will
get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and get ``UFFD_EVENT_UNMAP`` with ``uffd_msg.remove`` containing start and
end addresses of the unmapped area. end addresses of the unmapped area.
Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP Although the ``UFFD_FEATURE_EVENT_REMOVE`` and ``UFFD_FEATURE_EVENT_UNMAP``
are pretty similar, they quite differ in the action expected from the are pretty similar, they quite differ in the action expected from the
userfaultfd manager. In the former case, the virtual memory is ``userfaultfd`` manager. In the former case, the virtual memory is
removed, but the area is not, the area remains monitored by the removed, but the area is not, the area remains monitored by the
userfaultfd, and if a page fault occurs in that area it will be ``userfaultfd``, and if a page fault occurs in that area it will be
delivered to the manager. The proper resolution for such page fault is delivered to the manager. The proper resolution for such page fault is
to zeromap the faulting address. However, in the latter case, when an to zeromap the faulting address. However, in the latter case, when an
area is unmapped, either explicitly (with munmap() system call), or area is unmapped, either explicitly (with munmap() system call), or
implicitly (e.g. during mremap()), the area is removed and in turn the implicitly (e.g. during mremap()), the area is removed and in turn the
userfaultfd context for such area disappears too and the manager will ``userfaultfd`` context for such area disappears too and the manager will
not get further userland page faults from the removed area. Still, the not get further userland page faults from the removed area. Still, the
notification is required in order to prevent manager from using notification is required in order to prevent manager from using
UFFDIO_COPY on the unmapped area. ``UFFDIO_COPY`` on the unmapped area.
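Since the events above arrive through the same read(2) protocol as page faults, a manager has to look at the event code before interpreting the rest of the message. The sketch below decodes one message into a dictionary. It is not the kernel's definition: it assumes the 32-byte packed ``struct uffd_msg`` layout and the event codes from ``include/uapi/linux/userfaultfd.h``, and ``decode_uffd_msg`` is a hypothetical helper name.

```python
import struct

# Event codes assumed from include/uapi/linux/userfaultfd.h.
UFFD_EVENT_PAGEFAULT = 0x12
UFFD_EVENT_FORK = 0x13
UFFD_EVENT_REMAP = 0x14
UFFD_EVENT_REMOVE = 0x15
UFFD_EVENT_UNMAP = 0x16

UFFD_MSG_SIZE = 32  # assumed size of the packed struct uffd_msg
ARG_OFFSET = 8      # the union follows event, reserved1..reserved3


def decode_uffd_msg(buf: bytes) -> dict:
    """Decode one message read from a userfaultfd into a dictionary."""
    if len(buf) != UFFD_MSG_SIZE:
        raise ValueError("expected a %d byte message" % UFFD_MSG_SIZE)
    event = buf[0]
    if event == UFFD_EVENT_PAGEFAULT:
        flags, address = struct.unpack_from("=QQ", buf, ARG_OFFSET)
        return {"event": "pagefault", "flags": flags, "address": address}
    if event == UFFD_EVENT_FORK:
        (ufd,) = struct.unpack_from("=I", buf, ARG_OFFSET)
        return {"event": "fork", "ufd": ufd}
    if event == UFFD_EVENT_REMAP:
        old, new, length = struct.unpack_from("=QQQ", buf, ARG_OFFSET)
        return {"event": "remap", "from": old, "to": new, "len": length}
    if event in (UFFD_EVENT_REMOVE, UFFD_EVENT_UNMAP):
        # Both events report their range through uffd_msg.remove.
        start, end = struct.unpack_from("=QQ", buf, ARG_OFFSET)
        name = "remove" if event == UFFD_EVENT_REMOVE else "unmap"
        return {"event": name, "start": start, "end": end}
    raise ValueError("unknown event 0x%x" % event)
```

On an ``unmap`` result the manager would stop issuing ``UFFDIO_COPY`` for the reported range, whereas on ``remove`` it would keep the range and expect further faults there.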
Unlike userland page faults which have to be synchronous and require Unlike userland page faults which have to be synchronous and require
explicit or implicit wakeup, all the events are delivered explicit or implicit wakeup, all the events are delivered
asynchronously and the non-cooperative process resumes execution as asynchronously and the non-cooperative process resumes execution as
soon as manager executes read(). The userfaultfd manager should soon as manager executes read(). The ``userfaultfd`` manager should
carefully synchronize calls to UFFDIO_COPY with the events carefully synchronize calls to ``UFFDIO_COPY`` with the events
processing. To aid the synchronization, the UFFDIO_COPY ioctl will processing. To aid the synchronization, the ``UFFDIO_COPY`` ioctl will
return -ENOSPC when the monitored process exits at the time of return ``-ENOSPC`` when the monitored process exits at the time of
UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed ``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed
its virtual memory layout simultaneously with outstanding UFFDIO_COPY its virtual memory layout simultaneously with outstanding ``UFFDIO_COPY``
operation. operation.
The current asynchronous model of the event delivery is optimal for The current asynchronous model of the event delivery is optimal for
single threaded non-cooperative userfaultfd manager implementations. A single threaded non-cooperative ``userfaultfd`` manager implementations. A
synchronous event delivery model can be added later as a new synchronous event delivery model can be added later as a new
userfaultfd feature to facilitate multithreading enhancements of the ``userfaultfd`` feature to facilitate multithreading enhancements of the
non cooperative manager, for example to allow UFFDIO_COPY ioctls to non cooperative manager, for example to allow ``UFFDIO_COPY`` ioctls to
run in parallel to the event reception. Single threaded run in parallel to the event reception. Single threaded
implementations should continue to use the current async event implementations should continue to use the current async event
delivery model instead. delivery model instead.
...@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot) ...@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot)
In order to use a diskless system, such as an X-terminal or printer server for In order to use a diskless system, such as an X-terminal or printer server for
example, it is necessary for the root filesystem to be present on a non-disk example, it is necessary for the root filesystem to be present on a non-disk
device. This may be an initramfs (see device. This may be an initramfs (see
Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/filesystems/ramfs-rootfs-initramfs.rst), a ramdisk (see
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
following text describes how to use NFS for the root filesystem. For the rest following text describes how to use NFS for the root filesystem. For the rest
of this text 'client' means the diskless system, and 'server' means the NFS of this text 'client' means the diskless system, and 'server' means the NFS
......
...@@ -6,6 +6,21 @@ Numa policy hit/miss statistics ...@@ -6,6 +6,21 @@ Numa policy hit/miss statistics
All units are pages. Hugepages have separate counters. All units are pages. Hugepages have separate counters.
The numa_hit, numa_miss and numa_foreign counters reflect how well processes
are able to allocate memory from nodes they prefer. If they succeed, numa_hit
is incremented on the preferred node, otherwise numa_foreign is incremented on
the preferred node and numa_miss on the node where allocation succeeded.
Usually the preferred node is the one local to the CPU where the process executes,
but restrictions such as mempolicies can change that, so there are also two
counters based on CPU local node. local_node is similar to numa_hit and is
incremented on allocation from a node by CPU on the same node. other_node is
similar to numa_miss and is incremented on the node where allocation succeeds
from a CPU from a different node. Note there is no counter analogous to
numa_foreign.
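The accounting rules in the paragraph above can be modelled with a few lines of code. This is a toy model built only from that description, not the kernel implementation; ``account_allocation`` and its parameters are hypothetical names, and nodes are plain integers.

```python
from collections import Counter, defaultdict


def account_allocation(stats, preferred, allocated, cpu_node):
    """Update per-node counters for a single page allocation.

    preferred: node the process wanted memory from
    allocated: node the memory actually came from
    cpu_node:  node of the CPU the process was running on
    """
    if allocated == preferred:
        stats[allocated]["numa_hit"] += 1
    else:
        stats[allocated]["numa_miss"] += 1
        stats[preferred]["numa_foreign"] += 1
    # The CPU-local counters are always charged to the allocating node.
    if allocated == cpu_node:
        stats[allocated]["local_node"] += 1
    else:
        stats[allocated]["other_node"] += 1


stats = defaultdict(Counter)
# A process running on node 0 prefers node 0 but gets memory from node 1.
account_allocation(stats, preferred=0, allocated=1, cpu_node=0)
```

After this single allocation, node 1 holds one ``numa_miss`` and one ``other_node``, and node 0 holds one ``numa_foreign``; no ``numa_hit`` or ``local_node`` is recorded anywhere.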
In more detail:
=============== ============================================================ =============== ============================================================
numa_hit A process wanted to allocate memory from this node, numa_hit A process wanted to allocate memory from this node,
and succeeded. and succeeded.
...@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node, ...@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node,
but ended up with memory from this node. but ended up with memory from this node.
numa_foreign A process wanted to allocate on this node, numa_foreign A process wanted to allocate on this node,
but ended up with memory from another one. but ended up with memory from another node.
local_node A process ran on this node and got memory from it. local_node A process ran on this node's CPU,
and got memory from this node.
other_node A process ran on this node and got memory from another node. other_node A process ran on a different node's CPU
and got memory from this node.
interleave_hit Interleaving wanted to allocate from this node interleave_hit Interleaving wanted to allocate from this node
and succeeded. and succeeded.
...@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package ...@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
(http://oss.sgi.com/projects/libnuma/). Note that it only works (http://oss.sgi.com/projects/libnuma/). Note that it only works
well right now on machines with a small number of CPUs. well right now on machines with a small number of CPUs.
Note that on systems with memoryless nodes (where a node has CPUs but no
memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
heavily. In the current kernel implementation, if a process prefers a
memoryless node (e.g. because it is running on one of its local CPUs), the
implementation actually treats one of the nearest nodes with memory as the
preferred node. As a result, such allocation will not increase the numa_foreign
counter on the memoryless node, and will skew the numa_hit, numa_miss and
numa_foreign statistics of the nearest node.
...@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones. ...@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones.
ECC memory ECC memory
---------- ----------
As mentioned on the previous section, ECC memory has extra bits to be As mentioned in the previous section, ECC memory has extra bits to be
used for error correction. So, on 64 bit systems, a memory module used for error correction. In the above example, a memory module has
has 64 bits of *data width*, and 74 bits of *total width*. So, there are 64 bits of *data width*, and 72 bits of *total width*. The extra 8
8 bits extra bits to be used for the error detection and correction bits which are used for the error detection and correction mechanisms
mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
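As a worked check of the 64 + 8 = 72 figure: the text does not say which code the memory controller uses, so the sketch below assumes a classic Hamming single-error-correct, double-error-detect (SECDED) scheme, which is only one possibility. ``secded_check_bits`` is a hypothetical helper name.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SECDED code for data_bits of data."""
    r = 0
    # Single error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double error detection.
    return r + 1


data_width = 64
total_width = data_width + secded_check_bits(data_width)
```

Under that assumption a 64-bit data width needs 8 extra bits, giving a total width of 72 bits.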
So, when the cpu requests the memory controller to write a word with So, when the cpu requests the memory controller to write a word with
*data width*, the memory controller calculates the *syndrome* in real time, *data width*, the memory controller calculates the *syndrome* in real time,
...@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction ...@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction
purposes. purposes.
When the subsystem was pushed upstream for the first time, on When the subsystem was pushed upstream for the first time, on
Kernel 2.6.16, for the first time, it was renamed to ``EDAC``. Kernel 2.6.16, it was renamed to ``EDAC``.
Purpose Purpose
------- -------
...@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels: ...@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels:
+------------+-----------+-----------+ +------------+-----------+-----------+
| | ``ch0`` | ``ch1`` | | | ``ch0`` | ``ch1`` |
+============+===========+===========+ +============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 | | |**DIMM_A0**|**DIMM_B0**|
| | rank0 | rank0 | +------------+-----------+-----------+
+------------+ - | - | | ``csrow0`` | rank0 | rank0 |
+------------+-----------+-----------+
| ``csrow1`` | rank1 | rank1 | | ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+ +------------+-----------+-----------+
| ``csrow2`` | DIMM_A1 | DIMM_B1 | | |**DIMM_A1**|**DIMM_B1**|
| | rank0 | rank0 | +------------+-----------+-----------+
+------------+ - | - | | ``csrow2`` | rank0 | rank0 |
| ``csrow3`` | rank1 | rank1 | +------------+-----------+-----------+
| ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+ +------------+-----------+-----------+
In the above example, there are 4 physical slots on the motherboard In the above example, there are 4 physical slots on the motherboard
......
...@@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in ...@@ -102,6 +102,30 @@ See the ``type_of_loader`` and ``ext_loader_ver`` fields in
:doc:`/x86/boot` for additional information. :doc:`/x86/boot` for additional information.
bpf_stats_enabled
=================
Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.
= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================
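As a rough illustration of the table above, the following Python sketch reads the setting and reports what it means. The helper names are hypothetical, and the path assumes procfs is mounted on ``/proc``, the usual location for kernel sysctls.

```python
from pathlib import Path

# Assumed location of the sysctl; requires procfs mounted on /proc.
BPF_STATS_PATH = Path("/proc/sys/kernel/bpf_stats_enabled")

# Meanings copied from the table above.
MEANINGS = {
    0: "statistics are not collected (default)",
    1: "statistics are collected",
}


def describe_bpf_stats(raw: str) -> str:
    """Translate the raw file content (a digit plus a newline) into text."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unexpected value {value}")


def main() -> None:
    if BPF_STATS_PATH.exists():
        print("bpf_stats_enabled:", describe_bpf_stats(BPF_STATS_PATH.read_text()))
    else:
        print("bpf_stats_enabled is not available on this system")


if __name__ == "__main__":
    main()
```

Keeping the parsing separate from the file access makes the helper usable on captured sysctl output as well as on a live system.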
cad_pid
=======
This is the pid which will be signalled on reboot (notably, by
Ctrl-Alt-Delete). Writing a value to this file which doesn't
correspond to a running process will result in ``-ESRCH``.
See also `ctrl-alt-del`_.
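Since writing a pid that does not correspond to a running process fails with ``-ESRCH``, a script may want to check the pid first. The sketch below is one way to do that from Python; ``pid_is_running`` and ``set_cad_pid`` are hypothetical helper names, the path assumes procfs on ``/proc``, and writing the file is assumed to need root.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this pid exists.

    Signal 0 performs the existence and permission checks without
    actually delivering a signal.
    """
    if pid <= 0:
        # Non-positive values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:  # EPERM: the process exists but belongs to someone else
        return True
    return True


def set_cad_pid(pid: int, path: str = "/proc/sys/kernel/cad_pid") -> None:
    """Write pid to cad_pid, refusing pids that are not running."""
    if not pid_is_running(pid):
        raise ProcessLookupError(f"pid {pid} does not correspond to a running process")
    with open(path, "w") as f:
        f.write(f"{pid}\n")
```

The check is only advisory: the process could exit between the check and the write, in which case the kernel still rejects the value.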
cap_last_cap cap_last_cap
============ ============
...@@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion ...@@ -241,6 +265,40 @@ domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page. see the ``hostname(1)`` man page.
firmware_config
===============
See :doc:`/driver-api/firmware/fallback-mechanisms`.
The entries in this directory allow the firmware loader helper
fallback to be controlled:
* ``force_sysfs_fallback``, when set to 1, forces the use of the
fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
ftrace_dump_on_oops
===================
Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console. This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.
= ===================================================
0 Disabled (default).
1 Dump buffers of all CPUs.
2 Dump the buffer of the CPU that triggered the oops.
= ===================================================
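The three values in the table above can be modelled as an enumeration when a tool needs to interpret the setting. This is only a sketch; the class and member names are made up for the example and do not come from the kernel.

```python
from enum import IntEnum


class FtraceDumpOnOops(IntEnum):
    """Values of ftrace_dump_on_oops as listed in the table above."""

    DISABLED = 0  # default
    ALL_CPUS = 1  # dump buffers of all CPUs
    ORIG_CPU = 2  # dump only the buffer of the CPU that triggered the oops


def parse_ftrace_dump_on_oops(raw: str) -> FtraceDumpOnOops:
    """Parse the file content; raises ValueError for values not in the table."""
    return FtraceDumpOnOops(int(raw.strip()))
```

Rejecting values outside the table is a design choice of this sketch, so that a typo in a configuration script is noticed early.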
ftrace_enabled, stack_tracer_enabled
====================================
See :doc:`/trace/ftrace`.
hardlockup_all_cpu_backtrace hardlockup_all_cpu_backtrace
============================ ============================
...@@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V. ...@@ -344,6 +402,25 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
= ========================================================= = =========================================================
ignore-unaligned-usertrap
=========================
On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc`` and ``ia64``), controls whether all unaligned traps
are logged.
= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
setting.
= =============================================================
See also `unaligned-trap`_ and `unaligned-dump-stack`_. On ``ia64``,
this allows system administrators to override the
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
kexec_load_disabled kexec_load_disabled
=================== ===================
...@@ -459,6 +536,15 @@ Notes: ...@@ -459,6 +536,15 @@ Notes:
successful IPC object allocation. If an IPC object allocation syscall successful IPC object allocation. If an IPC object allocation syscall
fails, it is undefined if the value remains unmodified or is reset to -1. fails, it is undefined if the value remains unmodified or is reset to -1.
ngroups_max
===========
Maximum number of supplementary groups, *i.e.* the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
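A program that builds a supplementary group list can consult this limit before calling ``setgroups``. The sketch below prefers the sysctl file and falls back to ``sysconf`` where the file is absent; the helper names are hypothetical and the path assumes procfs on ``/proc``.

```python
import os
from pathlib import Path
from typing import Sequence

# Assumed location of the sysctl; requires procfs mounted on /proc.
NGROUPS_MAX_PATH = Path("/proc/sys/kernel/ngroups_max")


def ngroups_max() -> int:
    """Return the maximum number of supplementary groups.

    Falls back to sysconf(SC_NGROUPS_MAX) when the sysctl file does not
    exist, for example on kernels that predate this entry.
    """
    if NGROUPS_MAX_PATH.exists():
        return int(NGROUPS_MAX_PATH.read_text().strip())
    return os.sysconf("SC_NGROUPS_MAX")


def fits_setgroups(groups: Sequence[int]) -> bool:
    """Return True if the group list is short enough for setgroups()."""
    return len(groups) <= ngroups_max()
```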
nmi_watchdog nmi_watchdog
============ ============
...@@ -877,7 +963,7 @@ this sysctl interface anymore. ...@@ -877,7 +963,7 @@ this sysctl interface anymore.
pty pty
=== ===
See Documentation/filesystems/devpts.txt. See Documentation/filesystems/devpts.rst.
randomize_va_space randomize_va_space
...@@ -1173,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an ...@@ -1173,6 +1259,65 @@ If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs. ``EINVAL`` error occurs.
traceoff_on_warning
===================
When set, disables tracing (see :doc:`/trace/ftrace`) when a
``WARN()`` is hit.
tracepoint_printk
=================
When tracepoints are sent to printk() (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::
echo 0 > /proc/sys/kernel/tracepoint_printk
will stop tracepoints from being sent to printk(), and::
echo 1 > /proc/sys/kernel/tracepoint_printk
will send them to printk() again.
This only works if the kernel was booted with ``tp_printk`` enabled.
See :doc:`/admin-guide/kernel-parameters` and
:doc:`/trace/boottime-trace`.
.. _unaligned-dump-stack:
unaligned-dump-stack (ia64)
===========================
When logging unaligned accesses, controls whether the stack is
dumped.
= ===================================================
0 Do not dump the stack. This is the default setting.
1 Dump the stack.
= ===================================================
See also `ignore-unaligned-usertrap`_.
unaligned-trap
==============
On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc`` and ``parisc``), controls whether unaligned traps are caught
and emulated (instead of failing).
= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================
See also `ignore-unaligned-usertrap`_.
unknown_nmi_panic unknown_nmi_panic
================= =================
...@@ -1184,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for ...@@ -1184,6 +1329,16 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
example. If a system hangs up, try pressing the NMI switch. example. If a system hangs up, try pressing the NMI switch.
unprivileged_bpf_disabled
=========================
Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` will return
``-EPERM``.
Once set, this can't be cleared.
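The one-way behaviour can be summarised as a small transition check. This is a model of the description above, not the kernel's implementation; it covers only the values 0 and 1, and the function name is hypothetical.

```python
def write_allowed(current: int, requested: int) -> bool:
    """Return True if writing `requested` would be accepted.

    Models only the behaviour described above: the entry can go from
    0 to 1, but once it is 1 it cannot be cleared.
    """
    if requested not in (0, 1):
        return False
    # The only rejected transition is clearing a value that is already set.
    return not (current == 1 and requested == 0)
```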
watchdog watchdog
======== ========
......
...@@ -24,13 +24,13 @@ optional external memory-mapped interface. ...@@ -24,13 +24,13 @@ optional external memory-mapped interface.
Version 1 of the Activity Monitors architecture implements a counter group Version 1 of the Activity Monitors architecture implements a counter group
of four fixed and architecturally defined 64-bit event counters. of four fixed and architecturally defined 64-bit event counters.
- CPU cycle counter: increments at the frequency of the CPU. - CPU cycle counter: increments at the frequency of the CPU.
- Constant counter: increments at the fixed frequency of the system - Constant counter: increments at the fixed frequency of the system
clock. clock.
- Instructions retired: increments with every architecturally executed - Instructions retired: increments with every architecturally executed
instruction. instruction.
- Memory stall cycles: counts instruction dispatch stall cycles caused by - Memory stall cycles: counts instruction dispatch stall cycles caused by
misses in the last level cache within the clock domain. misses in the last level cache within the clock domain.
When in WFI or WFE these counters do not increment. When in WFI or WFE these counters do not increment.
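To show how the counters relate to each other, the sketch below derives two quantities from counter deltas sampled over the same interval: the average CPU frequency (CPU cycles per constant-counter tick, scaled by the constant counter's frequency) and instructions per cycle. It is an illustration only; the function names are hypothetical, and ``const_freq_hz`` is an input the caller is assumed to know for the platform.

```python
COUNTER_BITS = 64  # the counters are 64-bit, per the text above


def counter_delta(previous: int, current: int) -> int:
    """Difference between two samples, tolerating a single wrap-around."""
    return (current - previous) % (1 << COUNTER_BITS)


def average_cpu_freq_hz(core_delta: int, const_delta: int, const_freq_hz: int) -> int:
    """Average CPU frequency over the sampled interval.

    core_delta:    increase of the CPU cycle counter
    const_delta:   increase of the constant counter
    const_freq_hz: fixed frequency of the constant counter (assumed known)
    """
    if const_delta <= 0:
        raise ValueError("const_delta must be positive")
    return core_delta * const_freq_hz // const_delta


def instructions_per_cycle(inst_delta: int, core_delta: int) -> float:
    """Instructions retired per CPU cycle over the sampled interval."""
    if core_delta <= 0:
        raise ValueError("core_delta must be positive")
    return inst_delta / core_delta
```

Because the counters stop in WFI and WFE, both results describe only the time the CPU was actually running.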
...@@ -59,11 +59,11 @@ counters, only the presence of the extension. ...@@ -59,11 +59,11 @@ counters, only the presence of the extension.
Firmware (code running at higher exception levels, e.g. arm-tf) support is Firmware (code running at higher exception levels, e.g. arm-tf) support is
needed to: needed to:
- Enable access for lower exception levels (EL2 and EL1) to the AMU - Enable access for lower exception levels (EL2 and EL1) to the AMU
registers. registers.
- Enable the counters. If not enabled these will read as 0. - Enable the counters. If not enabled these will read as 0.
- Save/restore the counters before/after the CPU is being put/brought up - Save/restore the counters before/after the CPU is being put/brought up
from the 'off' power state. from the 'off' power state.
When using kernels that have this feature enabled but boot with broken When using kernels that have this feature enabled but boot with broken
firmware the user may experience panics or lockups when accessing the firmware the user may experience panics or lockups when accessing the
...@@ -81,10 +81,10 @@ are not trapped in EL2/EL3. ...@@ -81,10 +81,10 @@ are not trapped in EL2/EL3.
The fixed counters of AMUv1 are accessible through the following system The fixed counters of AMUv1 are accessible through the following system
register definitions: register definitions:
- SYS_AMEVCNTR0_CORE_EL0 - SYS_AMEVCNTR0_CORE_EL0
- SYS_AMEVCNTR0_CONST_EL0 - SYS_AMEVCNTR0_CONST_EL0
- SYS_AMEVCNTR0_INST_RET_EL0 - SYS_AMEVCNTR0_INST_RET_EL0
- SYS_AMEVCNTR0_MEM_STALL_EL0 - SYS_AMEVCNTR0_MEM_STALL_EL0
Auxiliary platform specific counters can be accessed using Auxiliary platform specific counters can be accessed using
SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
...@@ -97,9 +97,9 @@ Userspace access ...@@ -97,9 +97,9 @@ Userspace access
Currently, access from userspace to the AMU registers is disabled due to: Currently, access from userspace to the AMU registers is disabled due to:
- Security reasons: they might expose information about code executed in - Security reasons: they might expose information about code executed in
secure mode. secure mode.
- Purpose: AMU counters are intended for system management use. - Purpose: AMU counters are intended for system management use.
Also, the presence of the feature is not visible to userspace. Also, the presence of the feature is not visible to userspace.
...@@ -110,8 +110,8 @@ Virtualization ...@@ -110,8 +110,8 @@ Virtualization
Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
guest side is disabled due to: guest side is disabled due to:
- Security reasons: they might expose information about code executed - Security reasons: they might expose information about code executed
by other guests or the host. by other guests or the host.
Any attempt to access the AMU registers will result in an UNDEFINED Any attempt to access the AMU registers will result in an UNDEFINED
exception being injected into the guest. exception being injected into the guest.
...@@ -173,8 +173,10 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -173,8 +173,10 @@ Before jumping into the kernel, the following conditions must be met:
- Caches, MMUs - Caches, MMUs
The MMU must be off. The MMU must be off.
The instruction cache may be on or off, and must not hold any stale The instruction cache may be on or off, and must not hold any stale
entries corresponding to the loaded kernel image. entries corresponding to the loaded kernel image.
The address range corresponding to the loaded kernel image must be The address range corresponding to the loaded kernel image must be
cleaned to the PoC. In the presence of a system cache or other cleaned to the PoC. In the presence of a system cache or other
coherent masters with caches enabled, this will typically require coherent masters with caches enabled, this will typically require
...@@ -239,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -239,6 +241,7 @@ Before jumping into the kernel, the following conditions must be met:
- The DT or ACPI tables must describe a GICv2 interrupt controller. - The DT or ACPI tables must describe a GICv2 interrupt controller.
For CPUs with pointer authentication functionality: For CPUs with pointer authentication functionality:
- If EL3 is present: - If EL3 is present:
- SCR_EL3.APK (bit 16) must be initialised to 0b1 - SCR_EL3.APK (bit 16) must be initialised to 0b1
...@@ -250,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met: ...@@ -250,18 +253,22 @@ Before jumping into the kernel, the following conditions must be met:
- HCR_EL2.API (bit 41) must be initialised to 0b1 - HCR_EL2.API (bit 41) must be initialised to 0b1
For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
- If EL3 is present: - If EL3 is present:
CPTR_EL3.TAM (bit 30) must be initialised to 0b0
CPTR_EL2.TAM (bit 30) must be initialised to 0b0 - CPTR_EL3.TAM (bit 30) must be initialised to 0b0
AMCNTENSET0_EL0 must be initialised to 0b1111 - CPTR_EL2.TAM (bit 30) must be initialised to 0b0
AMCNTENSET1_EL0 must be initialised to a platform specific value - AMCNTENSET0_EL0 must be initialised to 0b1111
having 0b1 set for the corresponding bit for each of the auxiliary - AMCNTENSET1_EL0 must be initialised to a platform specific value
counters present. having 0b1 set for the corresponding bit for each of the auxiliary
counters present.
- If the kernel is entered at EL1: - If the kernel is entered at EL1:
AMCNTENSET0_EL0 must be initialised to 0b1111
AMCNTENSET1_EL0 must be initialised to a platform specific value - AMCNTENSET0_EL0 must be initialised to 0b1111
having 0b1 set for the corresponding bit for each of the auxiliary - AMCNTENSET1_EL0 must be initialised to a platform specific value
counters present. having 0b1 set for the corresponding bit for each of the auxiliary
counters present.
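The register values required above are plain bit patterns, so they can be computed and checked mechanically. The sketch below is illustrative only: it uses the bit positions stated in the text, assumes auxiliary counters are indexed 0 to 15, and the function names are hypothetical.

```python
CPTR_TAM_BIT = 30              # CPTR_EL3.TAM / CPTR_EL2.TAM, as stated above
AMCNTENSET0_REQUIRED = 0b1111  # enables the four fixed counters
MAX_AUX_COUNTERS = 16          # assumption: auxiliary counters are indexed 0..15


def amcntenset1_value(aux_counters_present) -> int:
    """Build the AMCNTENSET1_EL0 value with one bit set per auxiliary counter."""
    value = 0
    for n in aux_counters_present:
        if not 0 <= n < MAX_AUX_COUNTERS:
            raise ValueError(f"auxiliary counter index out of range: {n}")
        value |= 1 << n
    return value


def tam_trap_disabled(cptr: int) -> bool:
    """Return True if the TAM bit is 0b0 in a CPTR_EL3 or CPTR_EL2 value."""
    return ((cptr >> CPTR_TAM_BIT) & 1) == 0
```

For a hypothetical platform with auxiliary counters 0, 1 and 3, ``amcntenset1_value([0, 1, 3])`` gives ``0b1011``.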
The requirements described above for CPU mode, caches, MMUs, architected The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must timers, coherency and system registers apply to all CPUs. All CPUs must
...@@ -305,7 +312,8 @@ following manner: ...@@ -305,7 +312,8 @@ following manner:
Documentation/devicetree/bindings/arm/psci.yaml. Documentation/devicetree/bindings/arm/psci.yaml.
- Secondary CPU general-purpose register settings - Secondary CPU general-purpose register settings
x0 = 0 (reserved for future use)
x1 = 0 (reserved for future use) - x0 = 0 (reserved for future use)
x2 = 0 (reserved for future use) - x1 = 0 (reserved for future use)
x3 = 0 (reserved for future use) - x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)