Commit c5111f50 authored by Dave Kleikamp

Merge with /home/shaggy/git/linus-clean/

parents 69eb66d7 a488edc9
Showing 588 additions and 105 deletions
......@@ -3101,7 +3101,7 @@ S: Minto, NSW, 2566
S: Australia
N: Stephen Smalley
E: sds@epoch.ncsc.mil
E: sds@tycho.nsa.gov
D: portions of the Linux Security Module (LSM) framework and security modules
N: Chris Smith
......@@ -3643,11 +3643,9 @@ S: Cambridge. CB1 7EG
S: England
N: Chris Wright
E: chrisw@osdl.org
E: chrisw@sous-sol.org
D: hacking on LSM framework and security modules.
S: c/o OSDL
S: 12725 SW Millikan Way, Suite 400
S: Beaverton, OR 97005
S: Portland, OR
S: USA
N: Michal Wronski
......
......@@ -90,16 +90,20 @@ at OLS. The resulting abundance of RCU patches was presented the
following year [McKenney02a], and use of RCU in dcache was first
described that same year [Linder02a].
Also in 2002, Michael [Michael02b,Michael02a] presented techniques
that defer the destruction of data structures to simplify non-blocking
synchronization (wait-free synchronization, lock-free synchronization,
and obstruction-free synchronization are all examples of non-blocking
synchronization). In particular, this technique eliminates locking,
reduces contention, reduces memory latency for readers, and parallelizes
pipeline stalls and memory latency for writers. However, these
techniques still impose significant read-side overhead in the form of
memory barriers. Researchers at Sun worked along similar lines in the
same timeframe [HerlihyLM02,HerlihyLMS03].
Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
techniques that defer the destruction of data structures to simplify
non-blocking synchronization (wait-free synchronization, lock-free
synchronization, and obstruction-free synchronization are all examples of
non-blocking synchronization). In particular, this technique eliminates
locking, reduces contention, reduces memory latency for readers, and
parallelizes pipeline stalls and memory latency for writers. However,
these techniques still impose significant read-side overhead in the
form of memory barriers. Researchers at Sun worked along similar lines
in the same timeframe [HerlihyLM02,HerlihyLMS03]. These techniques
can be thought of as inside-out reference counts, where the count is
represented by the number of hazard pointers referencing a given data
structure (rather than the more conventional counter field within the
data structure itself).
In 2003, the K42 group described how RCU could be used to create
hot-pluggable implementations of operating-system functions. Later that
......@@ -113,7 +117,6 @@ number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper
describing how to make RCU safe for soft-realtime applications [Sarma04c],
and a paper describing SELinux performance with RCU [JamesMorris04b].
2005 has seen further adaptation of RCU to realtime use, permitting
preemption of RCU realtime critical sections [PaulMcKenney05a,
PaulMcKenney05b].
......
......@@ -177,3 +177,9 @@ over a rather long period of time, but improvements are always welcome!
If you want to wait for some of these other things, you might
instead need to use synchronize_irq() or synchronize_sched().
12. Any lock acquired by an RCU callback must be acquired elsewhere
with irq disabled, e.g., via spin_lock_irqsave(). Failing to
disable irq on a given acquisition of that lock will result in
deadlock as soon as the RCU callback happens to interrupt that
acquisition's critical section.
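As a minimal sketch of this rule (the names my_obj, cleanup_lock,
my_rcu_callback and my_update are hypothetical illustrations, not
part of the checklist itself):

struct my_obj {
	struct rcu_head rcu;
	/* ... */
};

static DEFINE_SPINLOCK(cleanup_lock);

static void my_rcu_callback(struct rcu_head *head)
{
	struct my_obj *p = container_of(head, struct my_obj, rcu);

	spin_lock(&cleanup_lock);	/* callback context */
	/* ... tear down bookkeeping for p ... */
	spin_unlock(&cleanup_lock);
	kfree(p);
}

static void my_update(struct my_obj *p)
{
	unsigned long flags;

	/* Process context: must disable irq, or the RCU callback
	 * can interrupt this critical section and self-deadlock. */
	spin_lock_irqsave(&cleanup_lock, flags);
	/* ... update bookkeeping ... */
	spin_unlock_irqrestore(&cleanup_lock, flags);

	call_rcu(&p->rcu, my_rcu_callback);
}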
......@@ -232,7 +232,7 @@ entry does not exist. For this to be helpful, the search function must
return holding the per-entry spinlock, as ipc_lock() does in fact do.
Quick Quiz: Why does the search function need to return holding the
per-entry lock for this deleted-flag technique to be helpful?
per-entry lock for this deleted-flag technique to be helpful?
If the system-call audit module were to ever need to reject stale data,
one way to accomplish this would be to add a "deleted" flag and a "lock"
......@@ -275,8 +275,8 @@ flag under the spinlock as follows:
{
struct audit_entry *e;
/* Do not use the _rcu iterator here, since this is the only
* deletion routine. */
/* Do not need to use the _rcu iterator here, since this
* is the only deletion routine. */
list_for_each_entry(e, list, list) {
if (!audit_compare_rule(rule, &e->rule)) {
spin_lock(&e->lock);
......@@ -304,9 +304,12 @@ function to reject newly deleted data.
Answer to Quick Quiz
If the search function drops the per-entry lock before returning, then
the caller will be processing stale data in any case. If it is really
OK to be processing stale data, then you don't need a "deleted" flag.
If processing stale data really is a problem, then you need to hold the
per-entry lock across all of the code that uses the value looked up.
Why does the search function need to return holding the per-entry
lock for this deleted-flag technique to be helpful?
If the search function drops the per-entry lock before returning,
then the caller will be processing stale data in any case. If it
is really OK to be processing stale data, then you don't need a
"deleted" flag. If processing stale data really is a problem,
then you need to hold the per-entry lock across all of the code
that uses the value that was returned.
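A minimal sketch of such a search function follows; the my_entry
structure and field names are hypothetical, loosely following the
audit example above. It returns with the per-entry lock held, so the
caller can trust the entry until it drops that lock.

struct my_entry {
	struct list_head list;
	spinlock_t lock;
	int deleted;
	int key;
};

static struct my_entry *my_search(struct list_head *head, int key)
{
	struct my_entry *e;

	rcu_read_lock();
	list_for_each_entry_rcu(e, head, list) {
		if (e->key == key) {
			spin_lock(&e->lock);
			if (e->deleted) {
				/* Lost the race with deletion: reject stale data. */
				spin_unlock(&e->lock);
				rcu_read_unlock();
				return NULL;
			}
			/* Holding e->lock blocks the deleter, so e stays valid
			 * even after we leave the read-side critical section. */
			rcu_read_unlock();
			return e;
		}
	}
	rcu_read_unlock();
	return NULL;
}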
......@@ -111,6 +111,11 @@ o What are all these files in this directory?
You are reading it!
rcuref.txt
Describes how to combine use of reference counts
with RCU.
whatisRCU.txt
Overview of how the RCU implementation works. Along
......
Refcounter design for elements of lists/arrays protected by RCU.
Reference-count design for elements of lists/arrays protected by RCU.
Refcounting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straight forward as in:
Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:
1. 2.
add() search_and_reference()
......@@ -28,12 +28,12 @@ release_referenced() delete()
...
}
If this list/array is made lock free using rcu as in changing the
write_lock in add() and delete() to spin_lock and changing read_lock
If this list/array is made lock free using RCU as in changing the
write_lock() in add() and delete() to spin_lock and changing read_lock
in search_and_reference to rcu_read_lock(), the atomic_get in
search_and_reference could potentially hold reference to an element which
has already been deleted from the list/array. atomic_inc_not_zero takes
care of this scenario. search_and_reference should look as;
has already been deleted from the list/array. Use atomic_inc_not_zero()
in this scenario as follows:
1. 2.
add() search_and_reference()
......@@ -51,17 +51,16 @@ add() search_and_reference()
release_referenced() delete()
{ {
... write_lock(&list_lock);
atomic_dec(&el->rc, relfunc) ...
... delete_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); delete_element
... write_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
call_rcu(&el->head, el_free);
...
}
Sometimes, reference to the element need to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero might be an
overkill since the spinlock serialising list updates are held. atomic_inc
is to be used in such cases.
Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.
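A minimal sketch of that update-side case, using a hypothetical
element layout echoing the pseudo-code above (list_lock, rc and key
are illustrative names): because the update-side spinlock is already
held, the element cannot reach a zero reference count here, so plain
atomic_inc() suffices.

struct element {
	struct list_head list;
	atomic_t rc;
	long key;
};

static DEFINE_SPINLOCK(list_lock);

static struct element *update_side_get(struct list_head *head, long key)
{
	struct element *el;

	spin_lock(&list_lock);		/* serializes against delete() */
	list_for_each_entry(el, head, list) {
		if (el->key == key) {
			atomic_inc(&el->rc);	/* no atomic_inc_not_zero() needed */
			spin_unlock(&list_lock);
			return el;
		}
	}
	spin_unlock(&list_lock);
	return NULL;
}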
......@@ -200,10 +200,11 @@ rcu_assign_pointer()
the new value, and also executes any memory-barrier instructions
required for a given CPU architecture.
Perhaps more important, it serves to document which pointers
are protected by RCU. That said, rcu_assign_pointer() is most
frequently used indirectly, via the _rcu list-manipulation
primitives such as list_add_rcu().
Perhaps just as important, it serves to document (1) which
pointers are protected by RCU and (2) the point at which a
given structure becomes accessible to other CPUs. That said,
rcu_assign_pointer() is most frequently used indirectly, via
the _rcu list-manipulation primitives such as list_add_rcu().
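For illustration, here is a minimal updater that publishes a new
structure with rcu_assign_pointer(); it anticipates the gbl_foo
example in Section 3 below, and error handling is omitted for brevity.

struct foo {
	int a;
};
static struct foo *gbl_foo;
static DEFINE_SPINLOCK(foo_mutex);

void foo_update_a(int new_a)
{
	struct foo *new_fp;
	struct foo *old_fp;

	new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
	spin_lock(&foo_mutex);
	old_fp = gbl_foo;
	*new_fp = *old_fp;
	new_fp->a = new_a;
	rcu_assign_pointer(gbl_foo, new_fp);	/* publish: readers now see new_fp */
	spin_unlock(&foo_mutex);
	synchronize_rcu();			/* wait for pre-existing readers */
	kfree(old_fp);
}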
rcu_dereference()
......@@ -258,9 +259,11 @@ rcu_dereference()
locking.
As with rcu_assign_pointer(), an important function of
rcu_dereference() is to document which pointers are protected
by RCU. And, again like rcu_assign_pointer(), rcu_dereference()
is typically used indirectly, via the _rcu list-manipulation
rcu_dereference() is to document which pointers are protected by
RCU, in particular, flagging a pointer that is subject to changing
at any time, including immediately after the rcu_dereference().
And, again like rcu_assign_pointer(), rcu_dereference() is
typically used indirectly, via the _rcu list-manipulation
primitives, such as list_for_each_entry_rcu().
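Correspondingly, a minimal reader (again anticipating the Section 3
example, with gbl_foo as declared there) fetches the pointer with
rcu_dereference() inside an RCU read-side critical section and must
not use it after leaving that section:

int foo_get_a(void)
{
	int retval;

	rcu_read_lock();
	retval = rcu_dereference(gbl_foo)->a;	/* gbl_foo may change at any time */
	rcu_read_unlock();			/* ...and may be freed after this */
	return retval;
}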
The following diagram shows how each API communicates among the
......@@ -327,7 +330,7 @@ for specialized uses, but are relatively uncommon.
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure. More typical
global pointer to a dynamically allocated structure. More-typical
uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
struct foo {
......@@ -410,6 +413,8 @@ o Use synchronize_rcu() -after- removing a data element from an
data item.
See checklist.txt for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.txt,
arrayRCU.txt, and NMI-RCU.txt.
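As a minimal sketch of the removal rule above for a list element
(struct el and list_lock are hypothetical, echoing rcuref.txt):

struct el {
	struct list_head list;
	long key;
};
static DEFINE_SPINLOCK(list_lock);

void remove_el(struct el *p)
{
	spin_lock(&list_lock);		/* exclude other updaters */
	list_del_rcu(&p->list);		/* readers may still hold references */
	spin_unlock(&list_lock);
	synchronize_rcu();		/* wait for all pre-existing readers */
	kfree(p);			/* now no reader can still see p */
}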
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
......@@ -513,7 +518,7 @@ production-quality implementation, and see:
for papers describing the Linux kernel RCU implementation. The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation.
more details on the current implementation as of early 2004.
5A. "TOY" IMPLEMENTATION #1: LOCKING
......@@ -768,7 +773,6 @@ RCU pointer/list traversal:
rcu_dereference
list_for_each_rcu (to be deprecated in favor of
list_for_each_entry_rcu)
list_for_each_safe_rcu (deprecated, not used)
list_for_each_entry_rcu
list_for_each_continue_rcu (to be deprecated in favor of new
list_for_each_entry_continue_rcu)
......@@ -807,7 +811,8 @@ Quick Quiz #1: Why is this argument naive? How could a deadlock
Answer: Consider the following sequence of events:
1. CPU 0 acquires some unrelated lock, call it
"problematic_lock".
"problematic_lock", disabling irq via
spin_lock_irqsave().
2. CPU 1 enters synchronize_rcu(), write-acquiring
rcu_gp_mutex.
......@@ -894,7 +899,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
ACKNOWLEDGEMENTS
My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood.
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
For more information, see http://www.rdrop.com/users/paulmck/RCU.
......@@ -11,6 +11,8 @@
Joel Schopp <jschopp@austin.ibm.com>
ia64/x86_64:
Ashok Raj <ashok.raj@intel.com>
s390:
Heiko Carstens <heiko.carstens@de.ibm.com>
Authors: Ashok Raj <ashok.raj@intel.com>
Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>,
......@@ -44,9 +46,28 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
maxcpus=2 will only boot 2. You can choose to bring the
other cpus online later; read the FAQs for more info.
additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus.
This option sets
cpu_possible_map = cpu_present_map + additional_cpus
additional_cpus*=n Use this to limit hotpluggable cpus. This option sets
cpu_possible_map = cpu_present_map + additional_cpus
(*) Option valid only for following architectures
- x86_64, ia64, s390
ia64 and x86_64 use the number of disabled local apics in the ACPI MADT
tables to determine the number of potentially hot-pluggable cpus. The
implementation should only rely on this to count the number of cpus, but
*MUST* not rely on the apicid values in those tables for disabled apics.
In the event the BIOS doesn't mark such hot-pluggable cpus as disabled
entries, one can use the "additional_cpus=x" parameter to represent those
cpus in the cpu_possible_map.
s390 uses the number of cpus it detects at IPL time also as the number of
bits in cpu_possible_map. If additional cpus are to be added at a later
time, the number should be specified using this option or the possible_cpus option.
possible_cpus=n [s390 only] use this to set the number of hotpluggable cpus.
This option sets possible_cpus bits in
cpu_possible_map, thus keeping the number of bits set
constant even if the machine gets rebooted.
This option overrides additional_cpus.
CPU maps and such
-----------------
......
Export cpu topology info by sysfs. Items (attributes) are similar
to /proc/cpuinfo.
1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
represents the physical package id of cpu X;
2) /sys/devices/system/cpu/cpuX/topology/core_id:
represents the cpu core id of cpu X;
3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
represents the thread siblings of cpu X in the same core;
4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
represents the thread siblings of cpu X in the same physical package;
To implement it in an architecture-neutral way, a new source file,
drivers/base/topology.c, exports these 4 attributes.
If one architecture wants to support this feature, it just needs to
implement 4 defines, typically in file include/asm-XXX/topology.h.
The 4 defines are:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_thread_siblings(cpu)
#define topology_core_siblings(cpu)
The type of **_id is int.
The type of siblings is cpumask_t.
To be consistent across all architectures, the 4 attributes should have
default values if their values are unavailable. The rules are as follows:
1) physical_package_id: if the cpu has no physical package id, -1 is the
default value.
2) core_id: if the cpu doesn't support multi-core, its core id is 0.
3) thread_siblings: just includes itself, if the cpu doesn't support
HT/multi-thread.
4) core_siblings: just includes itself, if the cpu doesn't support
multi-core and HT/multi-thread.
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
If an attribute isn't defined on an architecture, it won't be exported.
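As a sketch only (the per-cpu data names below are hypothetical,
loosely modeled on i386/x86_64), an architecture's
include/asm-XXX/topology.h might contain:

#define topology_physical_package_id(cpu)	(cpu_data[cpu].phys_proc_id)
#define topology_core_id(cpu)			(cpu_data[cpu].cpu_core_id)
#define topology_thread_siblings(cpu)		(cpu_sibling_map[cpu])
#define topology_core_siblings(cpu)		(cpu_core_map[cpu])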
The Linux Kernel Device Model
Patrick Mochel <mochel@osdl.org>
Patrick Mochel <mochel@digitalimplant.org>
26 August 2002
Drafted 26 August 2002
Updated 31 January 2006
Overview
~~~~~~~~
This driver model is a unification of all the current, disparate driver models
that are currently in the kernel. It is intended to augment the
The Linux Kernel Driver Model is a unification of all the disparate driver
models that were previously used in the kernel. It is intended to augment the
bus-specific drivers for bridges and devices by consolidating a set of data
and operations into globally accessible data structures.
Current driver models implement some sort of tree-like structure (sometimes
just a list) for the devices they control. But, there is no linkage between
the different bus types.
Traditional driver models implemented some sort of tree-like structure
(sometimes just a list) for the devices they control. There wasn't any
uniformity across the different bus types.
A common data structure can provide this linkage with little overhead: when a
bus driver discovers a particular device, it can insert it into the global
tree as well as its local tree. In fact, the local tree becomes just a subset
of the global tree.
Common data fields can also be moved out of the local bus models into the
global model. Some of the manipulations of these fields can also be
consolidated. Most likely, manipulation functions will become a set
of helper functions, which the bus drivers wrap around to include any
bus-specific items.
The common device and bridge interface currently reflects the goals of the
modern PC: namely the ability to do seamless Plug and Play, power management,
and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
us that any device in the system may fit any of these criteria.)
In reality, not every bus will be able to support such operations. But, most
buses will support a majority of those operations, and all future buses will.
In other words, a bus that doesn't support an operation is the exception,
instead of the other way around.
The current driver model provides a common, uniform data model for describing
a bus and the devices that can appear under the bus. The unified bus
model includes a set of common attributes which all busses carry, and a set
of common callbacks, such as device discovery during bus probing, bus
shutdown, bus power management, etc.
The common device and bridge interface reflects the goals of the modern
computer: namely the ability to do seamless device "plug and play", power
management, and hot plug. In particular, the model dictated by Intel and
Microsoft (namely ACPI) ensures that almost every device on almost any bus
on an x86-compatible system can work within this paradigm. Of course,
not every bus is able to support all such operations, although most
buses support most of those operations.
Downstream Access
~~~~~~~~~~~~~~~~~
Common data fields have been moved out of individual bus layers into a common
data structure. But, these fields must still be accessed by the bus layers,
data structure. These fields must still be accessed by the bus layers,
and sometimes by the device-specific drivers.
Other bus layers are encouraged to do what has been done for the PCI layer.
......@@ -53,7 +46,7 @@ struct pci_dev now looks like this:
struct pci_dev {
...
struct device device;
struct device dev;
};
Note first that it is statically allocated. This means only one allocation on
......@@ -64,9 +57,9 @@ the two.
The PCI bus layer freely accesses the fields of struct device. It knows about
the structure of struct pci_dev, and it should know the structure of struct
device. PCI devices that have been converted generally do not touch the fields
of struct device. More precisely, device-specific drivers should not touch
fields of struct device unless there is a strong compelling reason to do so.
device. Individual PCI device drivers that have been converted to the current
driver model generally do not and should not touch the fields of struct device,
unless there is a strong compelling reason to do so.
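For reference, the PCI layer converts between the embedded struct
device and its containing struct pci_dev with container_of();
to_pci_dev() in <linux/pci.h> is defined this way (matching the
renamed "dev" member above), and the callback below is a
hypothetical illustration:

#define to_pci_dev(d) container_of(d, struct pci_dev, dev)

static int my_bus_suspend(struct device *dev, pm_message_t state)
{
	struct pci_dev *pdev = to_pci_dev(dev);

	/* ... operate on PCI-specific state in pdev ... */
	return 0;
}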
This abstraction is meant to prevent unnecessary pain during transitional phases.
If the name of the field changes or is removed, then every downstream driver
......
......@@ -111,4 +111,8 @@ source: linux/Documentation/video4linux/CARDLIST.bttv
If you have problems with this, please do ask on the mailing list.
--
Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham
Authors: Richard Walker,
Jamie Honan,
Michael Hunold,
Manu Abraham,
Michael Krufky
......@@ -148,3 +148,44 @@ Why: The 8250 serial driver now has the ability to deal with the differences
brother on Alchemy SOCs. The loss of features is not considered an
issue.
Who: Ralf Baechle <ralf@linux-mips.org>
---------------------------
What: Legacy /proc/pci interface (PCI_LEGACY_PROC)
When: March 2006
Why: deprecated since 2.5.53 in favor of lspci(8)
Who: Adrian Bunk <bunk@stusta.de>
---------------------------
What: pci_module_init(driver)
When: January 2007
Why: Is replaced by pci_register_driver(pci_driver).
Who: Richard Knutsson <ricknu-0@student.ltu.se> and Greg Kroah-Hartman <gregkh@suse.de>
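A minimal sketch of the conversion (my_pci_driver, my_pci_ids,
my_probe and my_remove are hypothetical names, assumed to be
defined elsewhere in the driver):

static struct pci_driver my_pci_driver = {
	.name		= "mydev",
	.id_table	= my_pci_ids,
	.probe		= my_probe,
	.remove		= my_remove,
};

static int __init my_init(void)
{
	return pci_register_driver(&my_pci_driver);	/* was: pci_module_init(&my_pci_driver) */
}

static void __exit my_exit(void)
{
	pci_unregister_driver(&my_pci_driver);
}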
---------------------------
What: I2C interface of the it87 driver
When: January 2007
Why: The ISA interface is faster and should always be available. The I2C
probing is also known to cause trouble in at least one case (see
bug #5889).
Who: Jean Delvare <khali@linux-fr.org>
---------------------------
What: mount/umount uevents
When: February 2007
Why: These events are not correct, and do not properly let userspace know
when a file system has been mounted or unmounted. Userspace should
poll the /proc/mounts file instead to detect this properly.
Who: Greg Kroah-Hartman <gregkh@suse.de>
---------------------------
What: Support for NEC DDB5074 and DDB5476 evaluation boards.
When: June 2006
Why: Board-specific code hasn't built since ~2.6.0 and no users have
complained, indicating there is no more need for these boards.
This should really be considered a last call.
Who: Ralf Baechle <ralf@linux-mips.org>
......@@ -320,6 +320,7 @@ static struct config_item_type simple_children_type = {
.ct_item_ops = &simple_children_item_ops,
.ct_group_ops = &simple_children_group_ops,
.ct_attrs = simple_children_attrs,
.ct_owner = THIS_MODULE,
};
static struct configfs_subsystem simple_children_subsys = {
......@@ -403,6 +404,7 @@ static struct config_item_type group_children_type = {
.ct_item_ops = &group_children_item_ops,
.ct_group_ops = &group_children_group_ops,
.ct_attrs = group_children_attrs,
.ct_owner = THIS_MODULE,
};
static struct configfs_subsystem group_children_subsys = {
......
......@@ -457,6 +457,12 @@ ChangeLog
Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
2.1.26:
- Implement support for sector sizes above 512 bytes (up to the maximum
supported by NTFS which is 4096 bytes).
- Enhance support for NTFS volumes which were supported by Windows but
not by Linux due to invalid attribute list attribute flags.
- A few minor updates and bug fixes.
2.1.25:
- Write support is now extended with write(2) being able to both
overwrite existing file data and to extend files. Also, if a write
......
......@@ -35,6 +35,7 @@ Features which OCFS2 does not support yet:
be cluster coherent.
- quotas
- cluster aware flock
- cluster aware lockf
- Directory change notification (F_NOTIFY)
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
- POSIX ACLs
......
......@@ -79,15 +79,27 @@ that instance in a system with many cpus making intensive use of it.
tmpfs has a mount option to set the NUMA memory allocation policy for
all files in that instance:
mpol=interleave prefers to allocate memory from each node in turn
mpol=default prefers to allocate memory from the local node
mpol=bind prefers to allocate from mpol_nodelist
mpol=preferred prefers to allocate from first node in mpol_nodelist
all files in that instance (if CONFIG_NUMA is enabled) - which can be
adjusted on the fly via 'mount -o remount ...'
The following mount option is used in conjunction with mpol=interleave,
mpol=bind or mpol=preferred:
mpol_nodelist: nodelist suitable for parsing with nodelist_parse.
mpol=default prefers to allocate memory from the local node
mpol=prefer:Node prefers to allocate memory from the given Node
mpol=bind:NodeList allocates memory only from nodes in NodeList
mpol=interleave prefers to allocate from each node in turn
mpol=interleave:NodeList allocates from each node of NodeList in turn
NodeList format is a comma-separated list of decimal numbers and ranges,
a range being two hyphen-separated decimal numbers, the smallest and
largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
Note that trying to mount a tmpfs with an mpol option will fail if the
running kernel does not support NUMA; and will fail if its nodelist
specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs
being mounted, but from time to time runs a kernel built without NUMA
capability (perhaps a safe recovery kernel), or configured to support
fewer nodes, then it is advisable to omit the mpol option from automatic
mount options. It can be added later, when the tmpfs is already mounted
on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
To specify the initial root directory you can use the following mount
......@@ -109,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
Author:
Christoph Rohland <cr@sap.com>, 1.12.01
Updated:
Hugh Dickins <hugh@veritas.com>, 13 March 2005
Hugh Dickins <hugh@veritas.com>, 19 February 2006
......@@ -57,8 +57,6 @@ OPTIONS
port=n port to connect to on the remote server
timeout=n request timeouts (in ms) (default 60000ms)
noextend force legacy mode (no 9P2000.u semantics)
uid attempt to mount as a particular uid
......@@ -74,10 +72,16 @@ OPTIONS
RESOURCES
=========
The Linux version of the 9P server, along with some client-side utilities
can be found at http://v9fs.sf.net (along with a CVS repository of the
development branch of this module). There are user and developer mailing
lists here, as well as a bug-tracker.
The Linux version of the 9P server is now maintained under the npfs project
on sourceforge (http://sourceforge.net/projects/npfs).
There are user and developer mailing lists available through the v9fs project
on sourceforge (http://sourceforge.net/projects/v9fs).
News and other information is maintained on SWiK (http://swik.net/v9fs).
Bug reports may be issued through the kernel.org bugzilla
(http://bugzilla.kernel.org)
For more information on the Plan 9 Operating System check out
http://plan9.bell-labs.com/plan9
......
=================================
INTERNAL KERNEL ABI FOR FR-V ARCH
=================================
The internal FRV kernel ABI is not quite the same as the userspace ABI. A number of the registers
are used for special purposes, and the ABI is not consistent between modules and core, or between
MMU and no-MMU builds.
This partly stems from the fact that FRV CPUs do not have a separate supervisor stack pointer, and
most of them do not have any scratch registers, thus requiring at least one general purpose
register to be clobbered on kernel entry. Also, within the kernel core, it is possible to simply
jump or call directly between functions using a relative offset. This cannot be extended to modules
because the displacement is likely to be too far. Thus in modules the address of a function to call
must be calculated in a register and then used, requiring two extra instructions.
This document has the following sections:
(*) System call register ABI
(*) CPU operating modes
(*) Internal kernel-mode register ABI
(*) Internal debug-mode register ABI
(*) Virtual interrupt handling
========================
SYSTEM CALL REGISTER ABI
========================
When a system call is made, the following registers are effective:
REGISTERS CALL RETURN
=============== ======================= =======================
GR7 System call number Preserved
GR8 Syscall arg #1 Return value
GR9-GR13 Syscall arg #2-6 Preserved
===================
CPU OPERATING MODES
===================
The FR-V CPU has three basic operating modes. In order of increasing capability:
(1) User mode.
Basic userspace running mode.
(2) Kernel mode.
Normal kernel mode. There are many additional control registers available that may be
accessed in this mode, in addition to all the stuff available to user mode. This has two
submodes:
(a) Exceptions enabled (PSR.T == 1).
Exceptions will invoke the appropriate normal kernel mode handler. On entry to the
handler, the PSR.T bit will be cleared.
(b) Exceptions disabled (PSR.T == 0).
No exceptions or interrupts may happen. Any mandatory exceptions will cause the CPU to
halt unless the CPU is told to jump into debug mode instead.
(3) Debug mode.
No exceptions may happen in this mode. Memory protection and management exceptions will be
flagged for later consideration, but the exception handler won't be invoked. Debugging traps
such as hardware breakpoints and watchpoints will be ignored. This mode is entered only by
debugging events obtained from the other two modes.
All kernel mode registers may be accessed, plus a few extra debugging specific registers.
=================================
INTERNAL KERNEL-MODE REGISTER ABI
=================================
There are a number of permanent register assignments that are set up by entry.S in the exception
prologue. Note that there is a complete set of exception prologues for each of user->kernel
transition and kernel->kernel transition. There are also user->debug and kernel->debug mode
transition prologues.
REGISTER FLAVOUR USE
=============== ======= ====================================================
GR1 Supervisor stack pointer
GR15 Current thread info pointer
GR16 GP-Rel base register for small data
GR28 Current exception frame pointer (__frame)
GR29 Current task pointer (current)
GR30 Destroyed by kernel mode entry
GR31 NOMMU Destroyed by debug mode entry
GR31 MMU Destroyed by TLB miss kernel mode entry
CCR.ICC2 Virtual interrupt disablement tracking
CCCR.CC3 Cleared by exception prologue (atomic op emulation)
SCR0 MMU See mmu-layout.txt.
SCR1 MMU See mmu-layout.txt.
SCR2 MMU Save for EAR0 (destroyed by icache insns in debug mode)
SCR3 MMU Save for GR31 during debug exceptions
DAMR/IAMR NOMMU Fixed memory protection layout.
DAMR/IAMR MMU See mmu-layout.txt.
Certain registers are also used or modified across function calls:
REGISTER CALL RETURN
=============== =============================== ===============================
GR0 Fixed Zero -
GR2 Function call frame pointer
GR3 Special Preserved
GR3-GR7 - Clobbered
GR8 Function call arg #1 Return value (or clobbered)
GR9 Function call arg #2 Return value MSW (or clobbered)
GR10-GR13 Function call arg #3-#6 Clobbered
GR14 - Clobbered
GR15-GR16 Special Preserved
GR17-GR27 - Preserved
GR28-GR31 Special Only accessed explicitly
LR Return address after CALL Clobbered
CCR/CCCR - Mostly Clobbered
================================
INTERNAL DEBUG-MODE REGISTER ABI
================================
This is the same as the kernel-mode register ABI for function calls. The difference is that in
debug-mode there's a different stack and a different exception frame. Almost all the global
registers from kernel-mode (including the stack pointer) may be changed.
REGISTER FLAVOUR USE
=============== ======= ====================================================
GR1 Debug stack pointer
GR16 GP-Rel base register for small data
GR31 Current debug exception frame pointer (__debug_frame)
SCR3 MMU Saved value of GR31
Note that debug mode is able to interfere with the kernel's emulated atomic ops, so it must be
exceedingly careful not to do any that would interact with the main kernel in this regard. Hence
the debug mode code (gdbstub) is almost completely self-contained. The only external code used is
the sprintf family of functions.
Furthermore, break.S is so complicated because single-step mode does not switch off on entry to an
exception. That means that, unless manually disabled, single-stepping will blithely go on stepping into
things like interrupts. See gdbstub.txt for more information.
==========================
VIRTUAL INTERRUPT HANDLING
==========================
Because accesses to the PSR are so slow, and to disable interrupts we have to access it twice (once
to read and once to write), we don't actually disable interrupts at all if we don't have to. What
we do instead is use the ICC2 condition code flags to note virtual disablement, such that if we
then do take an interrupt, we note the flag, really disable interrupts, set another flag and resume
execution at the point the interrupt happened. Setting condition flags as a side effect of an
arithmetic or logical instruction is really fast. This use of the ICC2 only occurs within the
kernel - it does not affect userspace.
The flags we use are:
(*) CCR.ICC2.Z [Zero flag]
Set to virtually disable interrupts, clear when interrupts are virtually enabled. Can be
modified by logical instructions without affecting the Carry flag.
(*) CCR.ICC2.C [Carry flag]
Clear to indicate hardware interrupts are really disabled, set otherwise.
What happens is this:
(1) Normal kernel-mode operation.
ICC2.Z is 0, ICC2.C is 1.
(2) An interrupt occurs. The exception prologue examines ICC2.Z and determines that nothing needs
doing. This is done simply with an unlikely BEQ instruction.
(3) The interrupts are disabled (local_irq_disable)
ICC2.Z is set to 1.
(4) If interrupts were then re-enabled (local_irq_enable):
ICC2.Z would be set to 0.
A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would be used to trap if
interrupts were now virtually enabled, but physically disabled - which they're not, so the
trap isn't taken. The kernel would then be back to state (1).
(5) An interrupt occurs. The exception prologue examines ICC2.Z and determines that the interrupt
shouldn't actually have happened. It jumps aside, and there disables interrupts by setting
PSR.PIL to 14 and then it clears ICC2.C.
(6) If interrupts were then saved and disabled again (local_irq_save):
ICC2.Z would be shifted into the save variable and masked off (giving a 1).
ICC2.Z would then be set to 1 (thus unchanged), and ICC2.C would be unaffected (ie: 0).
(7) If interrupts were then restored from state (6) (local_irq_restore):
ICC2.Z would be set to indicate the result of XOR'ing the saved value (ie: 1) with 1, which
gives a result of 0 - thus leaving ICC2.Z set.
ICC2.C would remain unaffected (ie: 0).
A TIHI #2 instruction would be used to again assay the current state, but this would do
nothing as Z==1.
(8) If interrupts were then enabled (local_irq_enable):
ICC2.Z would be cleared. ICC2.C would be left unaffected. Both flags would now be 0.
A TIHI #2 instruction again issued to assay the current state would then trap as both Z==0
[interrupts virtually enabled] and C==0 [interrupts really disabled] would then be true.
(9) The trap #2 handler would simply enable hardware interrupts (set PSR.PIL to 0), set ICC2.C to
1 and return.
(10) Immediately upon returning, the pending interrupt would be taken.
(11) The interrupt handler would take the path of actually processing the interrupt (ICC2.Z is
clear, BEQ fails as per step (2)).
(12) The interrupt handler would then set ICC2.C to 1 since hardware interrupts are definitely
enabled - or else the kernel wouldn't be here.
(13) On return from the interrupt handler, things would be back to state (1).
This trap (#2) is only available in kernel mode. In user mode it will result in SIGILL.
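The following is a C model of the protocol described in steps (1)-(13)
above, not the real FR-V assembly; icc2_z, icc2_c and hw_irqs_enabled
are stand-ins for CCR.ICC2.Z, CCR.ICC2.C and the PSR.PIL setting:

static int icc2_z;		/* 1: interrupts virtually disabled */
static int icc2_c = 1;		/* 0: interrupts really disabled in hardware */
static int hw_irqs_enabled = 1;	/* models PSR.PIL == 0 */

static void model_irq_disable(void)
{
	icc2_z = 1;			/* cheap: just a flag, PSR untouched */
}

static void model_trap2_if_needed(void)
{
	if (!icc2_z && !icc2_c) {	/* TIHI #2 condition: Z==0 && C==0 */
		hw_irqs_enabled = 1;	/* trap handler: PSR.PIL = 0 */
		icc2_c = 1;
		/* the pending interrupt is then delivered */
	}
}

static void model_irq_enable(void)
{
	icc2_z = 0;
	model_trap2_if_needed();
}

/* Called by the exception prologue when a hardware interrupt arrives
 * while icc2_z is set: really disable and remember that we did. */
static void model_deferred_interrupt(void)
{
	hw_irqs_enabled = 0;		/* PSR.PIL = 14 */
	icc2_c = 0;
}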
Kernel driver f71805f
=====================
Supported chips:
* Fintek F71805F/FG
Prefix: 'f71805f'
Addresses scanned: none, address read from Super I/O config space
Datasheet: Provided by Fintek on request
Author: Jean Delvare <khali@linux-fr.org>
Thanks to Denis Kieft from Barracuda Networks for the donation of a
test system (custom Jetway K8M8MS motherboard, with CPU and RAM) and
for providing initial documentation.
Thanks to Kris Chen from Fintek for answering technical questions and
providing additional documentation.
Thanks to Chris Lin from Jetway for providing wiring schematics and
answering technical questions.
Description
-----------
The Fintek F71805F/FG Super I/O chip includes complete hardware monitoring
capabilities. It can monitor up to 9 voltages (counting its own power
source), 3 fans and 3 temperature sensors.
This chip also has fan controlling features, using either DC or PWM, in
three different modes (one manual, two automatic). The driver doesn't
support these features yet.
The driver assumes that no more than one chip is present, which seems
reasonable.
Voltage Monitoring
------------------
Voltages are sampled by an 8-bit ADC with an LSB of 8 mV. The supported
range is thus from 0 to 2.040 V. Voltage values outside of this range
need external resistors. An exception is in0, which is used to monitor
the chip's own power source (+3.3V), and is divided internally by a
factor of 2.
The two LSB of the voltage limit registers are not used (always 0), so
you can only set the limits in steps of 32 mV (before scaling).
The wirings and resistor values suggested by Fintek are as follows:
name  pin   use       R1    R2    divider  expected raw val.
in0   VCC   VCC3.3V   int.  int.  2.00     1.65 V
in1   VIN1  VTT1.2V   10K   -     1.00     1.20 V
in2   VIN2  VRAM      100K  100K  2.00     ~1.25 V (1)
in3   VIN3  VCHIPSET  47K   100K  1.47     2.24 V (2)
in4   VIN4  VCC5V     200K  47K   5.25     0.95 V
in5   VIN5  +12V      200K  20K   11.00    1.05 V
in6   VIN6  VCC1.5V   10K   -     1.00     1.50 V
in7   VIN7  VCORE     10K   -     1.00     ~1.40 V (1)
in8   VIN8  VSB5V     200K  47K   1.00     0.95 V
(1) Depends on your hardware setup.
(2) Obviously not correct, swapping R1 and R2 would make more sense.
These values can be used as hints at best, as motherboard manufacturers
are free to use a completely different setup. As a matter of fact, the
Jetway K8M8MS uses a significantly different setup. You will have to
find the documentation for your own motherboard, and edit sensors.conf
accordingly.
Each voltage measured has associated low and high limits, each of which
triggers an alarm when crossed.
Fan Monitoring
--------------
Fan rotation speeds are reported as 12-bit values from a gated clock
signal. Speeds down to 366 RPM can be measured. There is no theoretical
high limit, but values over 6000 RPM seem to cause problems. The effective
resolution is much lower than you would expect, the step between different
register values being 10 rather than 1.
The chip assumes 2 pulse-per-revolution fans.
An alarm is triggered if the rotation speed drops below a programmable
limit or is too low to be measured.
Temperature Monitoring
----------------------
Temperatures are reported in degrees Celsius. Each temperature measured
has a high limit, whose crossing triggers an alarm. There is an associated
hysteresis value, below which the temperature has to drop before the
alarm is cleared.
All temperature channels are external; there is no embedded temperature
sensor. Each channel can be used for connecting either a thermal diode
or a thermistor. The driver reports the currently selected mode, but
doesn't allow changing it. In theory, the BIOS should have configured
everything properly.
......@@ -9,7 +9,7 @@ Supported chips:
http://www.ite.com.tw/
* IT8712F
Prefix: 'it8712'
Addresses scanned: I2C 0x28 - 0x2f
Addresses scanned: I2C 0x2d
from Super I/O config space (8 I/O ports)
Datasheet: Publicly available at the ITE website
http://www.ite.com.tw/
......