- May 20, 2020
-
-
Linus Torvalds authored
commit 44720996 upstream. This is another fine warning, related to the 'zero-length-bounds' one, but hitting the same historical code in the kernel. Because C didn't historically support flexible array members, we have code that instead uses a one-sized array, the same way we have cases of zero-sized arrays. The one-sized arrays come from either not wanting to use the gcc zero-sized array extension, or from a slight convenience-feature, where particularly for strings, the size of the structure now includes the allocation for the final NUL character. So with a "char name[1];" at the end of a structure, you can do things like v = my_malloc(sizeof(struct vendor) + strlen(name)); and avoid the "+1" for the terminator. Yes, the modern way to do that is with a flexible array, and using 'offsetof()' instead of 'sizeof()', and adding the "+1" by hand. That also technically gets the size "more correct" in that it avoids any alignment (and thus padding) issues, but this is another long-term cleanup thing that will not happen for 5.7. So disable the warning for now, even though it's potentially quite useful. Having a slew of warnings that then hide more urgent new issues is not an improvement. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 5c45de21 upstream. This is a fine warning, but we still have a number of zero-length arrays in the kernel that come from the traditional gcc extension. Yes, they are getting converted to flexible arrays, but in the meantime the gcc-10 warning about zero-length bounds is very verbose, and is hiding other issues. I missed one actual build failure because it was hidden among hundreds of lines of warning. Thankfully I caught it on the second go before pushing things out, but it convinced me that I really need to disable the new warnings for now. We'll hopefully be all done with our conversion to flexible arrays in the not too distant future, and we can then re-enable this warning. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 78a5255f upstream. We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Gunthorpe authored
commit 7dba9203 upstream. Returning the error code via a 'int *ret' when the function returns a pointer is very un-kernely and causes gcc 10's static analysis to choke: net/rds/message.c: In function ‘rds_message_map_pages’: net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] 358 | return ERR_PTR(ret); Use a typical ERR_PTR return instead. Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Acked-by:
Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Gunthorpe authored
commit 01b2bafe upstream. Aside from good practice, this avoids a warning from gcc 10: ./include/linux/kernel.h:997:3: warning: array subscript -31 is outside array bounds of ‘struct list_head[1]’ [-Warray-bounds] 997 | ((type *)(__mptr - offsetof(type, member))); }) | ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/list.h:493:2: note: in expansion of macro ‘container_of’ 493 | container_of(ptr, type, member) | ^~~~~~~~~~~~ ./include/linux/pnp.h:275:30: note: in expansion of macro ‘list_entry’ 275 | #define global_to_pnp_dev(n) list_entry(n, struct pnp_dev, global_list) | ^~~~~~~~~~ ./include/linux/pnp.h:281:11: note: in expansion of macro ‘global_to_pnp_dev’ 281 | (dev) != global_to_pnp_dev(&pnp_global); \ | ^~~~~~~~~~~~~~~~~ arch/x86/kernel/rtc.c:189:2: note: in expansion of macro ‘pnp_for_each_dev’ 189 | pnp_for_each_dev(dev) { Because the common code doesn't cast the starting list_head to the containing struct. Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> [ rjw: Whitespace adjustments ] Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Olga Kornievskaia authored
[ Upstream commit 8eed292b ] Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when computing reply buffer size"), there was enough slack in the reply buffer to commodate filehandles of size 60bytes. However, the real problem was that the reply buffer size for the MOUNT operation was not correctly calculated. Received buffer size used the filehandle size for NFSv2 (32bytes) which is much smaller than the allowed filehandle size for the v3 mounts. Fix the reply buffer size (decode arguments size) for the MNT command. Fixes: 2c94b8ec ("SUNRPC: Use au_rslack when computing reply buffer size") Signed-off-by:
Olga Kornievskaia <kolga@netapp.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yafang Shao authored
[ Upstream commit 04fd61a4 ] A recent commit 9852ae3f ("mm, memcg: consider subtrees in memory.events") changed the behavior of memcg events, which will now consider subtrees in memory.events. But oom_kill event is a special one as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed in memory.oom_control. The file memory.oom_control is in both root memcg and non root memcg, that is different with memory.event as it only in non-root memcg. That commit is okay for cgroup2, but it is not okay for cgroup1 as it will cause inconsistent behavior between root memcg and non-root memcg. Here's an example on why this behavior is inconsistent in cgroup1. root memcg / memcg foo / memcg bar Suppose there's an oom_kill in memcg bar, then the oon_kill will be root memcg : memory.oom_control(oom_kill) 0 / memcg foo : memory.oom_control(oom_kill) 1 / memcg bar : memory.oom_control(oom_kill) 1 For the non-root memcg, its memory.oom_control(oom_kill) includes its descendants' oom_kill, but for root memcg, it doesn't include its descendants' oom_kill. That means, memory.oom_control(oom_kill) has different meanings in different memcgs. That is inconsistent. Then the user has to know whether the memcg is root or not. If we can't fully support it in cgroup1, for example by adding memory.events.local into cgroup1 as well, then let's don't touch its original behavior. Fixes: 9852ae3f ("mm, memcg: consider subtrees in memory.events") Reported-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Yafang Shao <laoar.shao@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Chris Down <chris@chrisdown.name> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200502141055.7378-1-laoar.shao@gmail.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Wei Yongjun authored
[ Upstream commit 29b74cb7 ] Fix to return negative error code -ENOMEM from the smcd_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 684b89bc ("s390/ism: add device driver for internal shared memory") Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by:
Ursula Braun <ubraun@linux.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Samu Nuutamo authored
[ Upstream commit 333e22db ] When tsi-as-adc is configured it is possible for in7[0123]_input read to return an incorrect value if a concurrent read to in[456]_input is performed. This is caused by a concurrent manipulation of the mux channel without proper locking as hwmon and mfd use different locks for synchronization. Switch hwmon to use the same lock as mfd when accessing the TSI channel. Fixes: 4f16cab1 ("hwmon: da9052: Add support for TSI channel") Signed-off-by:
Samu Nuutamo <samu.nuutamo@vincit.fi> [rebase to current master, reword commit message slightly] Signed-off-by:
Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Potnuri Bharat Teja authored
[ Upstream commit c8b1f340 ] While reading the TCB field in t4_tcb_get_field32() the wrong mask is passed as a parameter which leads the driver eventually to a kernel panic/app segfault from access to an illegal SRQ index while flushing the SRQ completions during connection teardown. Fixes: 11a27e21 ("iw_cxgb4: complete the cached SRQ buffers") Link: https://lore.kernel.org/r/20200511185608.5202-1-bharat@chelsio.com Signed-off-by:
Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Sasha Levin authored
[ Upstream commit 50bbe3d3 ] Do not decrease the reference count of resource tracker object twice in the error flow of res_get_common_doit. Fixes: c5dfe0ea ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/20200507062942.98305-1-leon@kernel.org Signed-off-by:
Maor Gottlieb <maorg@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jack Morgenstein authored
[ Upstream commit 1901b91f ] The IB core pkey cache is populated by procedure ib_cache_update(). Initially, the pkey cache pointer is NULL. ib_cache_update allocates a buffer and populates it with the device's pkeys, via repeated calls to procedure ib_query_pkey(). If there is a failure in populating the pkey buffer via ib_query_pkey(), ib_cache_update does not replace the old pkey buffer cache with the updated one -- it leaves the old cache as is. Since initially the pkey buffer cache is NULL, when calling ib_cache_update the first time, a failure in ib_query_pkey() will cause the pkey buffer cache pointer to remain NULL. In this situation, any calls subsequent to ib_get_cached_pkey(), ib_find_cached_pkey(), or ib_find_cached_pkey_exact() will try to dereference the NULL pkey cache pointer, causing a kernel panic. Fix this by checking the ib_cache_update() return value. Fixes: 8faea9fd ("RDMA/cache: Move the cache per-port data into the main ib_port_data") Fixes: 1da177e4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/20200507071012.100594-1-leon@kernel.org Signed-off-by:
Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jack Morgenstein authored
[ Upstream commit 6693ca95 ] In the mlx4_ib_post_send() flow, some functions call ib_get_cached_pkey() without checking its return value. If ib_get_cached_pkey() returns an error code, these functions should return failure. Fixes: 1ffeb2eb ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Fixes: 225c7b1f ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Fixes: e622f2f4 ("IB: split struct ib_send_wr") Link: https://lore.kernel.org/r/20200426075921.130074-1-leon@kernel.org Signed-off-by:
Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Sudip Mukherjee authored
[ Upstream commit bb43c8e3 ] The commit below modified rxe_create_mmap_info() to return ERR_PTR's but didn't update the callers to handle them. Modify rxe_create_mmap_info() to only return ERR_PTR and fix all error checking after rxe_create_mmap_info() is called. Ensure that all other exit paths properly set the error return. Fixes: ff23dfa1 ("IB: Pass only ib_udata in function prototypes") Link: https://lore.kernel.org/r/20200425233545.17210-1-sudipm.mukherjee@gmail.com Link: https://lore.kernel.org/r/20200511183742.GB225608@mwanda Cc: stable@vger.kernel.org [5.4+] Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Phil Sutter authored
[ Upstream commit 340eaff6 ] Expired intervals would still match and be dumped to user space until garbage collection wiped them out. Make sure they stop matching and disappear (from users' perspective) as soon as they expire. Fixes: 8d8540c4 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by:
Phil Sutter <phil@nwl.cc> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Stefano Brivio authored
[ Upstream commit 6f7c9caf ] Replace negations of nft_rbtree_interval_end() with a new helper, nft_rbtree_interval_start(), wherever this helps to visualise the problem at hand, that is, for all the occurrences except for the comparison against given flags in __nft_rbtree_get(). This gets especially useful in the next patch. Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Chuck Lever authored
[ Upstream commit ce99aa62 ] Ensure that signalled ASYNC rpc_tasks exit immediately instead of spinning until a timeout (or forever). To avoid checking for the signal flag on every scheduler iteration, the check is instead introduced in the client's finite state machine. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Fixes: ae67bd38 ("SUNRPC: Fix up task signalling") Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
J. Bruce Fields authored
[ Upstream commit 29fe8399 ] We add the new state to the nfsi->open_states list, making it potentially visible to other threads, before we've finished initializing it. That wasn't a problem when all the readers were also taking the i_lock (as we do here), but since we switched to RCU, there's now a possibility that a reader could see the partially initialized state. Symptoms observed were a crash when another thread called nfs4_get_valid_delegation() on a NULL inode, resulting in an oops like: BUG: unable to handle page fault for address: ffffffffffffffb0 ... RIP: 0010:nfs4_get_valid_delegation+0x6/0x30 [nfsv4] ... Call Trace: nfs4_open_prepare+0x80/0x1c0 [nfsv4] __rpc_execute+0x75/0x390 [sunrpc] ? finish_task_switch+0x75/0x260 rpc_async_schedule+0x29/0x40 [sunrpc] process_one_work+0x1ad/0x370 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x10c/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 Fixes: 9ae075fd "NFSv4: Convert open state lookup to use RCU" Reviewed-by:
Seiichi Ikarashi <s.ikarashi@fujitsu.com> Tested-by:
Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by:
Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christoph Hellwig authored
[ Upstream commit d51c2145 ] The second argument is the end "pointer", not the length. Fixes: d28f6df1 ("arm64/kexec: Add core kexec support") Cc: <stable@vger.kernel.org> # 4.8.x- Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Zhenyu Wang authored
[ Upstream commit 72a7a992 ] As i915 won't allocate extra PDP for current default PML4 table, so for 3-level ppgtt guest, we would hit kernel pointer access failure on extra PDP pointers. So this trys to bypass that now. It won't impact real shadow PPGTT setup, so guest context still works. This is verified on 4.15 guest kernel with i915.enable_ppgtt=1 to force on old aliasing ppgtt behavior. Fixes: 4f15665c ("drm/i915: Add ppgtt to GVT GEM context") Reviewed-by:
Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by:
Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200506095918.124913-1-zhenyuw@linux.intel.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arnd Bergmann authored
[ Upstream commit 2c407aca ] gcc-10 warns around a suspicious access to an empty struct member: net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc': net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds] 1522 | memset(&ct->__nfct_init_offset[0], 0, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from net/netfilter/nf_conntrack_core.c:37: include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset' 90 | u8 __nfct_init_offset[0]; | ^~~~~~~~~~~~~~~~~~ The code is correct but a bit unusual. Rework it slightly in a way that does not trigger the warning, using an empty struct instead of an empty array. There are probably more elegant ways to do this, but this is the smallest change. Fixes: c41884ce ("netfilter: conntrack: avoid zeroing timer") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dave Wysochanski authored
[ Upstream commit 50eaa652 ] Commit 402cb8dd ("fscache: Attach the index key and aux data to the cookie") added the aux_data and aux_data_len to parameters to fscache_acquire_cookie(), and updated the callers in the NFS client. In the process it modified the aux_data to include the change_attr, but missed adding change_attr to a couple places where aux_data was used. Specifically, when opening a file and the change_attr is not added, the following attempt to lookup an object will fail inside cachefiles_check_object_xattr() = -116 due to nfs_fscache_inode_check_aux() failing memcmp on auxdata and returning FSCACHE_CHECKAUX_OBSOLETE. Fix this by adding nfs_fscache_update_auxdata() to set the auxdata from all relevant fields in the inode, including the change_attr. Fixes: 402cb8dd ("fscache: Attach the index key and aux data to the cookie") Signed-off-by:
Dave Wysochanski <dwysocha@redhat.com> Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arnd Bergmann authored
[ Upstream commit 6e31ded6 ] nfs currently behaves differently on 32-bit and 64-bit kernels regarding the on-disk format of nfs_fscache_inode_auxdata. That format should really be the same on any kernel, and we should avoid the 'timespec' type in order to remove that from the kernel later on. Using plain 'timespec64' would not be good here, since that includes implied padding and would possibly leak kernel stack data to the on-disk format on 32-bit architectures. struct __kernel_timespec would work as a replacement, but open-coding the two struct members in nfs_fscache_inode_auxdata makes it more obvious what's going on here, and keeps the current format for 64-bit architectures. Cc: David Howells <dhowells@redhat.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dave Wysochanski authored
[ Upstream commit d9bfced1 ] Commit 402cb8dd ("fscache: Attach the index key and aux data to the cookie") added the index_key and index_key_len parameters to fscache_acquire_cookie(), and updated the callers in the NFS client. One of the callers was inside nfs_fscache_get_super_cookie() and was changed to use the full struct nfs_fscache_key as the index_key. However, a couple members of this structure contain pointers and thus will change each time the same NFS share is remounted. Since index_key is used for fscache_cookie->key_hash and this subsequently is used to compare cookies, the effectiveness of fscache with NFS is reduced to the point at which a umount occurs. Any subsequent remount of the same share will cause a unique NFS super_block index_key and key_hash to be generated for the same data, rendering any prior fscache data unable to be found. A simple reproducer demonstrates the problem. 1. Mount share with 'fsc', create a file, drop page cache systemctl start cachefilesd mount -o vers=3,fsc 127.0.0.1:/export /mnt dd if=/dev/zero of=/mnt/file1.bin bs=4096 count=1 echo 3 > /proc/sys/vm/drop_caches 2. Read file into page cache and fscache, then unmount dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1 umount /mnt 3. Remount and re-read which should come from fscache mount -o vers=3,fsc 127.0.0.1:/export /mnt echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1 4. Check for READ ops in mountstats - there should be none grep READ: /proc/self/mountstats Looking at the history and the removed function, nfs_super_get_key(), we should only use nfs_fscache_key.key plus any uniquifier, for the fscache index_key. Fixes: 402cb8dd ("fscache: Attach the index key and aux data to the cookie") Signed-off-by:
Dave Wysochanski <dwysocha@redhat.com> Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Alex Deucher authored
[ Upstream commit a6aacb2b ] We set the fb smem pointer to the offset into the BAR, so keep the fbdev bo in vram. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207581 Fixes: 6c8d74ca ("drm/amdgpu: Enable scatter gather display support") Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christian Brauner authored
[ Upstream commit 3f2c788a ] Jan reported an issue where an interaction between sign-extending clone's flag argument on ppc64le and the new CLONE_INTO_CGROUP feature causes clone() to consistently fail with EBADF. The whole story is a little longer. The legacy clone() syscall is odd in a bunch of ways and here two things interact. First, legacy clone's flag argument is word-size dependent, i.e. it's an unsigned long whereas most system calls with flag arguments use int or unsigned int. Second, legacy clone() ignores unknown and deprecated flags. The two of them taken together means that users on 64bit systems can pass garbage for the upper 32bit of the clone() syscall since forever and things would just work fine. Just try this on a 64bit kernel prior to v5.7-rc1 where this will succeed and on v5.7-rc1 where this will fail with EBADF: int main(int argc, char *argv[]) { pid_t pid; /* Note that legacy clone() has different argument ordering on * different architectures so this won't work everywhere. * * Only set the upper 32 bits. */ pid = syscall(__NR_clone, 0xffffffff00000000 | SIGCHLD, NULL, NULL, NULL, NULL); if (pid < 0) exit(EXIT_FAILURE); if (pid == 0) exit(EXIT_SUCCESS); if (wait(NULL) != pid) exit(EXIT_FAILURE); exit(EXIT_SUCCESS); } Since legacy clone() couldn't be extended this was not a problem so far and nobody really noticed or cared since nothing in the kernel ever bothered to look at the upper 32 bits. But once we introduced clone3() and expanded the flag argument in struct clone_args to 64 bit we opened this can of worms. With the first flag-based extension to clone3() making use of the upper 32 bits of the flag argument we've effectively made it possible for the legacy clone() syscall to reach clone3() only flags. The sign extension scenario is just the odd corner-case that we needed to figure this out. The reason we just realized this now and not already when we introduced CLONE_CLEAR_SIGHAND was that CLONE_INTO_CGROUP assumes that a valid cgroup file descriptor has been given. So the sign extension (or the user accidently passing garbage for the upper 32 bits) caused the CLONE_INTO_CGROUP bit to be raised and the kernel to error out when it didn't find a valid cgroup file descriptor. Let's fix this by always capping the upper 32 bits for all codepaths that are not aware of clone3() features. This ensures that we can't reach clone3() only features by accident via legacy clone as with the sign extension case and also that legacy clone() works exactly like before, i.e. ignoring any unknown flags. This solution risks no regressions and is also pretty clean. Fixes: 7f192e3c ("fork: add clone3") Fixes: ef2c41cf ("clone3: allow spawning processes into cgroups") Reported-by:
Jan Stancek <jstancek@redhat.com> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: libc-alpha@sourceware.org Cc: stable@vger.kernel.org # 5.3+ Link: https://sourceware.org/pipermail/libc-alpha/2020-May/113596.html Link: https://lore.kernel.org/r/20200507103214.77218-1-christian.brauner@ubuntu.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andreas Gruenbacher authored
[ Upstream commit aa83da7f ] It turns out that when extending an existing bio, gfs2_find_jhead fails to check if the block number is consecutive, which leads to incorrect reads for fragmented journals. In addition, limit the maximum bio size to an arbitrary value of 2 megabytes: since commit 07173c3e ("block: enable multipage bvecs"), if we just keep adding pages until bio_add_page fails, bios will grow much larger than useful, which pins more memory than necessary with barely any additional performance gains. Fixes: f4686c26 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Adrian Hunter authored
[ Upstream commit c077dc5e ] First, it should be noted that the CQE timeout (60 seconds) is substantial so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it. Commit ad73d6fe ("mmc: complete requests from ->timeout") preserved the existing functionality, to complete the request. However that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, the result being that a request that has gone will have been completed anyway. That is, the completion was unnecessary. At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0. Note for stable, this patch will not apply cleanly without patch "mmc: core: Fix recursive locking issue in CQE recovery path" Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Fixes: ad73d6fe ("mmc: complete requests from ->timeout") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200508062227.23144-1-adrian.hunter@intel.com Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Sarthak Garg authored
[ Upstream commit 39a22f73 ] Consider the following stack trace -001|raw_spin_lock_irqsave -002|mmc_blk_cqe_complete_rq -003|__blk_mq_complete_request(inline) -003|blk_mq_complete_request(rq) -004|mmc_cqe_timed_out(inline) -004|mmc_mq_timed_out mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function also tries to acquire the same queue lock resulting in recursive locking where the task is spinning for the same lock which it has already acquired leading to watchdog bark. Fix this issue with the lock only for the required critical section. Cc: <stable@vger.kernel.org> Fixes: 1e8e55b6 ("mmc: block: Add CQE support") Suggested-by:
Sahitya Tummala <stummala@codeaurora.org> Signed-off-by:
Sarthak Garg <sartgarg@codeaurora.org> Acked-by:
Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1588868135-31783-1-git-send-email-vbadigan@codeaurora.org Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Veerabhadrarao Badiganti authored
[ Upstream commit e6bfb1bf ] In the request completion path with CQE, request type is being checked after the request is getting completed. This is resulting in returning the wrong request type and leading to the IO hang issue. ASYNC request type is getting returned for DCMD type requests. Because of this mismatch, mq->cqe_busy flag is never getting cleared and the driver is not invoking blk_mq_hw_run_queue. So requests are not getting dispatched to the LLD from the block layer. All these eventually leading to IO hang issues. So, get the request type before completing the request. Cc: <stable@vger.kernel.org> Fixes: 1e8e55b6 ("mmc: block: Add CQE support") Signed-off-by:
Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Acked-by:
Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1588775643-18037-2-git-send-email-vbadigan@codeaurora.org Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ben Chuang authored
[ Upstream commit b56ff195 ] Need to clear some bits in a vendor-defined register after reboot from Windows 10. Fixes: e51df6ce ("mmc: host: sdhci-pci: Add Genesys Logic GL975x support") Reported-by:
Grzegorz Kowal <custos.mentis@gmail.com> Signed-off-by:
Ben Chuang <ben.chuang@genesyslogic.com.tw> Acked-by:
Adrian Hunter <adrian.hunter@intel.com> Tested-by:
Grzegorz Kowal <custos.mentis@gmail.com> Link: https://lore.kernel.org/r/20200504063957.6638-1-benchuanggli@gmail.com Cc: stable@vger.kernel.org Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christophe JAILLET authored
[ Upstream commit 7c277dd2 ] If devm_request_threaded_irq() fails, the allocated struct mmc_host needs to be freed via calling mmc_free_host(), so let's do that. Fixes: c5413ad8 ("mmc: add new Alcor Micro Cardreader SD/MMC driver") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20200426202355.43055-1-christophe.jaillet@wanadoo.fr Cc: stable@vger.kernel.org Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
John Fastabend authored
[ Upstream commit 81aabbb9 ] In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size which is used to track total bytes in a message. But this is not correct because apply_bytes is itself modified in the main loop doing the mem_charge. Then at the end of this we have sg.size incorrectly set and out of sync with actual sk values. Then we can get a splat if we try to cork the data later and again try to redirect the msg to ingress. To fix instead of trying to track msg.size do the easy thing and include it as part of the sk_msg_xfer logic so that when the msg is moved the sg.size is always correct. To reproduce the below users will need ingress + cork and hit an error path that will then try to 'free' the skmsg. [ 173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120 [ 173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317 [ 173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G I 5.7.0-rc1+ #43 [ 173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019 [ 173.700009] Call Trace: [ 173.700021] dump_stack+0x8e/0xcb [ 173.700029] ? sk_msg_free_elem+0xdd/0x120 [ 173.700034] ? sk_msg_free_elem+0xdd/0x120 [ 173.700042] __kasan_report+0x102/0x15f [ 173.700052] ? sk_msg_free_elem+0xdd/0x120 [ 173.700060] kasan_report+0x32/0x50 [ 173.700070] sk_msg_free_elem+0xdd/0x120 [ 173.700080] __sk_msg_free+0x87/0x150 [ 173.700094] tcp_bpf_send_verdict+0x179/0x4f0 [ 173.700109] tcp_bpf_sendpage+0x3ce/0x5d0 Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by:
John Fastabend <john.fastabend@gmail.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
Jakub Sitnicki <jakub@cloudflare.com> Acked-by:
Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
John Fastabend authored
[ Upstream commit 3e104c23 ] When sk_msg_pop() is called where the pop operation is working on the end of a sge element and there is no additional trailing data and there _is_ data in front of pop, like the following case, |____________a_____________|__pop__| We have out of order operations where we incorrectly set the pop variable so that instead of zero'ing pop we incorrectly leave it untouched, effectively. This can cause later logic to shift the buffers around believing it should pop extra space. The result is we have 'popped' more data then we expected potentially breaking program logic. It took us a while to hit this case because typically we pop headers which seem to rarely be at the end of a scatterlist elements but we can't rely on this. Fixes: 7246d8ed ("bpf: helper to pop data from messages") Signed-off-by:
John Fastabend <john.fastabend@gmail.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
Jakub Sitnicki <jakub@cloudflare.com> Acked-by:
Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/158861288359.14306.7654891716919968144.stgit@john-Precision-5820-Tower Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Sultan Alsawaf authored
[ Upstream commit 421abe20 ] In commit 5a7d202b, a logical AND was erroneously changed to an OR, causing WaIncreaseLatencyIPCEnabled to be enabled unconditionally for kabylake and coffeelake, even when IPC is disabled. Fix the logic so that WaIncreaseLatencyIPCEnabled is only used when IPC is enabled. Fixes: 5a7d202b ("drm/i915: Drop WaIncreaseLatencyIPCEnabled/1140 for cnl") Cc: stable@vger.kernel.org # 5.3.x+ Signed-off-by:
Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by:
Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200430214654.51314-1-sultan@kerneltoast.com (cherry picked from commit 690d22da) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dan Carpenter authored
[ Upstream commit 37e31d2d ] The i40iw_arp_table() function can return -EOVERFLOW if i40iw_alloc_resource() fails so we can't just test for "== -1". Fixes: 4e9042e6 ("i40iw: add hw and utils files") Link: https://lore.kernel.org/r/20200422092211.GA195357@mwanda Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Acked-by:
Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Takashi Sakamoto authored
[ Upstream commit 10348721 ] The snd-firewire-lib.ko has 'amdtp-packet' event of tracepoints. Current printk format for the event includes 'sizeof(u8)' macro expected to be extended in compilation time. However, this is not done. As a result, perf tools cannot parse the event for printing: $ mount -l -t debugfs debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime) $ cat /sys/kernel/debug/tracing/events/snd_firewire_lib/amdtp_packet/format ... print fmt: "%02u %04u %04x %04x %02d %03u %02u %03u %02u %01u %02u %s", REC->second, REC->cycle, REC->src, REC->dest, REC->channel, REC->payload_quadlets, REC->data_blocks, REC->data_block_counter, REC->packet_index, REC->irq, REC->index, __print_array(__get_dynamic_array(cip_header), __get_dynamic_array_len(cip_header), sizeof(u8)) $ sudo perf record -e snd_firewire_lib:amdtp_packet [snd_firewire_lib:amdtp_packet] function sizeof not defined Error: expected type 5 but read 0 This commit fixes it by obsoleting the macro with actual size. Cc: <stable@vger.kernel.org> Fixes: bde2bbdb ("ALSA: firewire-lib: use dynamic array for CIP header of tracing events") Signed-off-by:
Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20200503045718.86337-1-o-takashi@sakamocchi.jp Signed-off-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Wei Yongjun authored
[ Upstream commit 7f645462 ] Fix to return negative error code -EFAULT from the copy_to_user() error handling case instead of 0, as done elsewhere in this function. Fixes: bd513cd0 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by:
Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430081851.166996-1-weiyongjun1@huawei.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Grace Kao authored
[ Upstream commit 69388e15 ] According to Braswell NDA Specification Update (#557593), concurrent read accesses may result in returning 0xffffffff and write instructions may be dropped. We have an established format for the commit references, i.e. cdca06e4 ("pinctrl: baytrail: Add missing spinlock usage in byt_gpio_irq_handler") Fixes: 0bd50d71 ("pinctrl: cherryview: prevent concurrent access to GPIO controllers") Signed-off-by:
Grace Kao <grace.kao@intel.com> Reported-by:
Brian Norris <briannorris@chromium.org> Reviewed-by:
Brian Norris <briannorris@chromium.org> Acked-by:
Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ansuel Smith authored
[ Upstream commit 90bcb0c3 ] Fix a typo in the readl/writel accessor conversion where val is used instead of pol changing the behavior of the original code. Cc: stable@vger.kernel.org Fixes: 6c736989 pinctrl: qcom: Introduce readl/writel accessors Signed-off-by:
Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by:
Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20200414003726.25347-1-ansuelsmth@gmail.com Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-