- May 14, 2021
-
-
Pavel Begunkov authored
[ Upstream commit 38134ada ] Colin reported before possible overflow and sign extension problems in io_provide_buffers_prep(). As Linus pointed out previous attempt did nothing useful, see d81269fe ("io_uring: fix provide_buffers sign extension"). Do that with help of check_<op>_overflow helpers. And fix struct io_provide_buf::len type, as it doesn't make much sense to keep it signed. Reported-by:
Colin Ian King <colin.king@canonical.com> Fixes: efe68c1c ("io_uring: validate the full range of provided buffers for access") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Thadeu Lima de Souza Cascardo authored
commit d1f82808 upstream. Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by:
Billy Jheng Bing-Jhong <(@st424204)> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Apr 16, 2021
-
-
Jens Axboe authored
[ Upstream commit 4b982bd0 ] S_ISBLK is marked as unbounded work for async preparation, because it doesn't match S_ISREG. That is incorrect, as any read/write to a block device is also a bounded operation. Fix it up and ensure that S_ISBLK isn't marked unbounded. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Apr 10, 2021
-
-
Pavel Begunkov authored
[ Upstream commit 1ee4160c ] When we cancel a timeout we should emit a sensible return code, like -ECANCELED but not 0, otherwise it may trick users. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Apr 07, 2021
-
-
Stefan Metzmacher authored
[ Upstream commit 0031275d ] Without that it's not safe to use them in a linked combination with others. Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible. We already handle short reads and writes for the following opcodes: - IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE Now we have it for these as well: - IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links(). There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain. It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'. It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY. We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect. cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org Signed-off-by:
Stefan Metzmacher <metze@samba.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Stefan Metzmacher authored
[ Upstream commit 76cd979f ] We never want to generate any SIGPIPE, -EPIPE only is much better. Signed-off-by:
Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/38961085c3ec49fd21550c7788f214d1ff02d2d4.1615908477.git.metze@samba.org Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pavel Begunkov authored
[ Upstream commit efe814a4 ] It's racy to modify req->flags from a not owning context, e.g. linked timeout calling req_set_fail_links() for the master request might race with that request setting/clearing flags while being executed concurrently. Just remove req_set_fail_links(prev) from io_link_timeout_fn(), io_async_find_and_cancel() and functions down the line take care of setting the fail bit. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Mar 30, 2021
-
-
Pavel Begunkov authored
[ Upstream commit d81269fe ] io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension problems. Not a huge problem as it's only used for access_ok() and increases the checked length, but better to keep typing right. Reported-by:
Colin Ian King <colin.king@canonical.com> Fixes: efe68c1c ("io_uring: validate the full range of provided buffers for access") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Reviewed-by:
Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Mar 25, 2021
-
-
Jens Axboe authored
[ Upstream commit b5b0ecb7 ] The callback can only be armed, if we get -EIOCBQUEUED returned. It's important that we clear the WAITQ bit for other cases, otherwise we can queue for async retry and filemap will assume that we're armed and return -EAGAIN instead of just blocking for the IO. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
[ Upstream commit 7c977a58 ] If we're exiting the ring, just let the IO fail with -EAGAIN as nobody will care anyway. It's not the right context to reissue from. Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pavel Begunkov authored
[ Upstream commit 9ae1f8dd ] WARNING: inconsistent lock state inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs->lock); <Interrupt> lock(&fs->lock); *** DEADLOCK *** 1 lock held by syz-executor217/8450: #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442 stack backtrace: CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> [...] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_req_clean_work fs/io_uring.c:1398 [inline] io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046 io_free_req fs/io_uring.c:2269 [inline] io_double_put_req fs/io_uring.c:2392 [inline] io_put_req+0xf9/0x570 fs/io_uring.c:2388 io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497 __run_hrtimer kernel/time/hrtimer.c:1519 [inline] __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline] __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline] run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline] sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199 spin_unlock_irq include/linux/spinlock.h:404 [inline] io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525 __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594 io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639 io_queue_link_head fs/io_uring.c:6650 [inline] io_submit_sqe fs/io_uring.c:6697 [inline] io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Don't free requests from under hrtimer context (softirq) as it may sleep or take spinlocks improperly (e.g. non-irq versions). Cc: stable@vger.kernel.org # 5.6+ Reported-by:
<syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
commit 3ebba796 upstream. If we create it in a disabled state because IORING_SETUP_R_DISABLED is set on ring creation, we need to ensure that we've kicked the thread if we're exiting before it's been explicitly disabled. Otherwise we can run into a deadlock where exit is waiting go park the SQPOLL thread, but the SQPOLL thread itself is waiting to get a signal to start. That results in the below trace of both tasks hung, waiting on each other: INFO: task syz-executor458:8401 blocked for more than 143 seconds. Not tainted 5.11.0-next-20210226-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor458 state:D stack:27536 pid: 8401 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread_park fs/io_uring.c:7115 [inline] io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103 io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745 __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840 io_uring_files_cancel include/linux/io_uring.h:47 [inline] do_exit+0x299/0x2a60 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43e899 RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000 R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds. task:iou-sqp-8401 state:D stack:30272 pid: 8402 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds. Reported-by:
<syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Mar 09, 2021
-
-
Jens Axboe authored
commit 1c3b3e65 upstream. syzbot reports a deadlock, attempting to lock the same spinlock twice: ============================================ WARNING: possible recursive locking detected 5.11.0-syzkaller #0 Not tainted -------------------------------------------- swapper/1/0 is trying to acquire lock: ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline] ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960 but task is already holding lock: ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&runtime->sleep); lock(&runtime->sleep); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/1/0: #0: ffff888147474908 (&group->lock){..-.}-{2:2}, at: _snd_pcm_stream_lock_irqsave+0x9f/0xd0 sound/core/pcm_native.c:170 #1: ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0xfa/0x151 lib/dump_stack.c:120 print_deadlock_bug kernel/locking/lockdep.c:2829 [inline] check_deadlock kernel/locking/lockdep.c:2872 [inline] validate_chain kernel/locking/lockdep.c:3661 [inline] __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900 lock_acquire kernel/locking/lockdep.c:5510 [inline] lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960 __wake_up_common+0x147/0x650 kernel/sched/wait.c:108 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138 snd_pcm_update_state+0x46a/0x540 sound/core/pcm_lib.c:203 snd_pcm_update_hw_ptr0+0xa75/0x1a50 sound/core/pcm_lib.c:464 snd_pcm_period_elapsed+0x160/0x250 sound/core/pcm_lib.c:1805 dummy_hrtimer_callback+0x94/0x1b0 sound/drivers/dummy.c:378 __run_hrtimer kernel/time/hrtimer.c:1519 [inline] __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345 invoke_softirq kernel/softirq.c:221 [inline] __irq_exit_rcu kernel/softirq.c:422 [inline] irq_exit_rcu+0x134/0x200 kernel/softirq.c:434 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 Code: dd 38 6e f8 84 db 75 ac e8 54 32 6e f8 e8 0f 1c 74 f8 e9 0c 00 00 00 e8 45 32 6e f8 0f 00 2d 4e 4a c5 00 e8 39 32 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 14 3a 6e f8 48 85 db RSP: 0018:ffffc90000d47d18 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8880115c3780 RSI: ffffffff89052537 RDI: 0000000000000000 RBP: ffff888141127064 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff81794168 R11: 0000000000000000 R12: 0000000000000001 R13: ffff888141127000 R14: ffff888141127064 R15: ffff888143331804 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:647 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351 call_cpuidle kernel/sched/idle.c:158 [inline] cpuidle_idle_call kernel/sched/idle.c:239 [inline] do_idle+0x3e1/0x590 kernel/sched/idle.c:300 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272 secondary_startup_64_no_verify+0xb0/0xbb which is due to the driver doing poll_wait() twice on the same wait_queue_head. That is perfectly valid, but from checking the rest of the kernel tree, it's the only driver that does this. We can handle this just fine, we just need to ignore the second addition as we'll get woken just fine on the first one. Cc: stable@vger.kernel.org # 5.8+ Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users") Reported-by:
<syzbot+28abd693db9e92c160d8@syzkaller.appspotmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Mar 04, 2021
-
-
Hao Xu authored
[ Upstream commit ed670c3f ] Abaci reported follow issue: [ 30.615891] ====================================================== [ 30.616648] WARNING: possible circular locking dependency detected [ 30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted [ 30.618035] ------------------------------------------------------ [ 30.618914] a.out/1128 is trying to acquire lock: [ 30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220 [ 30.620505] [ 30.620505] but task is already holding lock: [ 30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.622349] [ 30.622349] which lock already depends on the new lock. [ 30.622349] [ 30.623289] [ 30.623289] the existing dependency chain (in reverse order) is: [ 30.624243] [ 30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}: [ 30.625263] lock_acquire+0x2c7/0x390 [ 30.625868] __mutex_lock+0xae/0x9f0 [ 30.626451] io_cqring_overflow_flush.part.95+0x6d/0x70 [ 30.627278] io_uring_poll+0xcb/0xd0 [ 30.627890] ep_item_poll.isra.14+0x4e/0x90 [ 30.628531] do_epoll_ctl+0xb7e/0x1120 [ 30.629122] __x64_sys_epoll_ctl+0x70/0xb0 [ 30.629770] do_syscall_64+0x2d/0x40 [ 30.630332] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.631187] [ 30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}: [ 30.631985] check_prevs_add+0x226/0xb00 [ 30.632584] __lock_acquire+0x1237/0x13a0 [ 30.633207] lock_acquire+0x2c7/0x390 [ 30.633740] __mutex_lock+0xae/0x9f0 [ 30.634258] __ep_eventpoll_poll+0x9f/0x220 [ 30.634879] __io_arm_poll_handler+0xbf/0x220 [ 30.635462] io_issue_sqe+0xa6b/0x13e0 [ 30.635982] __io_queue_sqe+0x10b/0x550 [ 30.636648] io_queue_sqe+0x235/0x470 [ 30.637281] io_submit_sqes+0xcce/0xf10 [ 30.637839] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.638465] do_syscall_64+0x2d/0x40 [ 30.638999] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.639643] [ 30.639643] other info that might help us debug this: [ 30.639643] [ 30.640618] Possible unsafe locking scenario: [ 30.640618] [ 30.641402] CPU0 CPU1 [ 30.641938] ---- ---- [ 30.642664] lock(&ctx->uring_lock); [ 30.643425] lock(&ep->mtx); [ 30.644498] lock(&ctx->uring_lock); [ 30.645668] lock(&ep->mtx); [ 30.646321] [ 30.646321] *** DEADLOCK *** [ 30.646321] [ 30.647642] 1 lock held by a.out/1128: [ 30.648424] #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 30.649954] [ 30.649954] stack backtrace: [ 30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1 [ 30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 30.652290] Call Trace: [ 30.652688] dump_stack+0xac/0xe3 [ 30.653164] check_noncircular+0x11e/0x130 [ 30.653747] ? check_prevs_add+0x226/0xb00 [ 30.654303] check_prevs_add+0x226/0xb00 [ 30.654845] ? add_lock_to_list.constprop.49+0xac/0x1d0 [ 30.655564] __lock_acquire+0x1237/0x13a0 [ 30.656262] lock_acquire+0x2c7/0x390 [ 30.656788] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.657379] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.658014] __mutex_lock+0xae/0x9f0 [ 30.658524] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.659112] ? mark_held_locks+0x5a/0x80 [ 30.659648] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.660229] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 30.660885] ? trace_hardirqs_on+0x46/0x110 [ 30.661471] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.662102] ? __ep_eventpoll_poll+0x9f/0x220 [ 30.662696] __ep_eventpoll_poll+0x9f/0x220 [ 30.663273] ? __ep_eventpoll_poll+0x220/0x220 [ 30.663875] __io_arm_poll_handler+0xbf/0x220 [ 30.664463] io_issue_sqe+0xa6b/0x13e0 [ 30.664984] ? __lock_acquire+0x782/0x13a0 [ 30.665544] ? __io_queue_proc.isra.88+0x180/0x180 [ 30.666170] ? __io_queue_sqe+0x10b/0x550 [ 30.666725] __io_queue_sqe+0x10b/0x550 [ 30.667252] ? __fget_files+0x131/0x260 [ 30.667791] ? io_req_prep+0xd8/0x1090 [ 30.668316] ? io_queue_sqe+0x235/0x470 [ 30.668868] io_queue_sqe+0x235/0x470 [ 30.669398] io_submit_sqes+0xcce/0xf10 [ 30.669931] ? xa_load+0xe4/0x1c0 [ 30.670425] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 30.671051] ? lockdep_hardirqs_on_prepare+0xde/0x180 [ 30.671719] ? syscall_enter_from_user_mode+0x2b/0x80 [ 30.672380] do_syscall_64+0x2d/0x40 [ 30.672901] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 30.673503] RIP: 0033:0x7fd89c813239 [ 30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa [ 30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239 [ 30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003 [ 30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000 [ 30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0 [ 30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000 This might happen if we do epoll_wait on a uring fd while reading/writing the former epoll fd in a sqe in the former uring instance. So let's don't flush cqring overflow list, just do a simple check. Reported-by:
Abaci <abaci@linux.alibaba.com> Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync") Signed-off-by:
Hao Xu <haoxu@linux.alibaba.com> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Feb 13, 2021
-
-
Pavel Begunkov authored
[ Upstream commit aec18a57 ] Since SQPOLL task can be shared and so task_work entries can be a mix of them, we need to drop mm and files before trying to issue next request. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 3a7efd1a ] What 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks") really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably. That can be achieved by io_uring_cancel_files(), but we'll miss it calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(), because it will go through __io_uring_cancel_task_requests(). Just always call io_uring_cancel_files() during cancel, it's good enough for now. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 70b2c60d ] WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042 io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042 Call Trace: io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227 filp_close+0xb4/0x170 fs/open.c:1295 close_files fs/file.c:403 [inline] put_files_struct fs/file.c:418 [inline] put_files_struct+0x1cc/0x350 fs/file.c:415 exit_files+0x7e/0xa0 fs/file.c:435 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x427/0x20f0 kernel/signal.c:2773 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Now io_uring_cancel_task_requests() can be called not through file notes but directly, remove a WARN_ONCE() there that give us false positives. That check is not very important and we catch it in other places. Fixes: 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks") Cc: stable@vger.kernel.org # 5.9+ Reported-by:
<syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit f609cbb8 ] kernel BUG at lib/list_debug.c:29! Call Trace: __list_add include/linux/list.h:67 [inline] list_add include/linux/list.h:86 [inline] io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866 io_splice_prep fs/io_uring.c:3920 [inline] io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628 io_submit_sqe fs/io_uring.c:6705 [inline] io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_file_get() may be called from splice, and so REQ_F_INFLIGHT may already be set. Fixes: 02a13674 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT") Cc: stable@vger.kernel.org # 5.9+ Reported-by:
<syzbot+6879187cf57845801267@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hao Xu authored
[ Upstream commit 6195ba09 ] Abaci reported the follow warning: [ 27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0 [ 27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0 [ 27.077604] Modules linked in: [ 27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1 [ 27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 27.080852] RIP: 0010:__might_sleep+0x80/0xa0 [ 27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff 0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7 [ 27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286 [ 27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000 [ 27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570 [ 27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001 [ 27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000 [ 27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800 [ 27.091058] FS: 00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 27.092775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0 [ 27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 27.097011] Call Trace: [ 27.097685] __mutex_lock+0x5d/0xa30 [ 27.098565] ? prepare_to_wait_exclusive+0x71/0xc0 [ 27.099412] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.100441] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.101537] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 27.102656] ? trace_hardirqs_on+0x46/0x110 [ 27.103459] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.104317] io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.105113] io_cqring_wait+0x36e/0x4d0 [ 27.105770] ? find_held_lock+0x28/0xb0 [ 27.106370] ? io_uring_remove_task_files+0xa0/0xa0 [ 27.107076] __x64_sys_io_uring_enter+0x4fb/0x640 [ 27.107801] ? rcu_read_lock_sched_held+0x59/0xa0 [ 27.108562] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.109684] ? syscall_enter_from_user_mode+0x26/0x70 [ 27.110731] do_syscall_64+0x2d/0x40 [ 27.111296] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 27.112056] RIP: 0033:0x7f7b13dc8239 [ 27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa [ 27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239 [ 27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003 [ 27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008 [ 27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480 [ 27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000 [ 27.121490] irq event stamp: 5635 [ 27.121946] hardirqs last enabled at (5643): [] console_unlock+0x5c4/0x740 [ 27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740 [ 27.125192] softirqs last enabled at (5272): [] __do_softirq+0x3c5/0x5aa [ 27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20 [ 27.127634] ---[ end trace 289d7e28fa60f928 ]--- This is caused by calling io_cqring_overflow_flush() which may sleep after calling prepare_to_wait_exclusive() which set task state to TASK_INTERRUPTIBLE Reported-by:
Abaci <abaci@linux.alibaba.com> Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync") Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Hao Xu <haoxu@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit ca70f00b ] do not call blocking ops when !TASK_RUNNING; state=2 set at [<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0 kernel/sched/wait.c:262 WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853 __might_sleep+0xed/0x100 kernel/sched/core.c:7848 RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848 Call Trace: __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935 __mutex_lock kernel/locking/mutex.c:1103 [inline] mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118 io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411 io_run_cancel fs/io-wq.c:856 [inline] io_wqe_cancel_pending_work fs/io-wq.c:990 [inline] io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027 io_uring_cancel_files fs/io_uring.c:8874 [inline] io_uring_cancel_task_requests fs/io_uring.c:8952 [inline] __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038 io_uring_files_cancel include/linux/io_uring.h:51 [inline] do_exit+0x2e6/0x2490 kernel/exit.c:780 do_group_exit+0x168/0x2d0 kernel/exit.c:922 get_signal+0x16b5/0x2030 kernel/signal.c:2770 arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s counting scheme, so it does all the heavy work before setting TASK_UNINTERRUPTIBLE. Cc: stable@vger.kernel.org # 5.9+ Reported-by:
<syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> [axboe: fix inverted task check] Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit c98de08c ] As tasks now cancel only theirs requests, and inflight_wait is awaited only in io_uring_cancel_files(), which should be called with ->in_idle set, instead of keeping a separate inflight_wait use tctx->wait. That will add some spurious wakeups but actually is safer from point of not hanging the task. e.g. task1 | IRQ | *start* io_complete_rw_common(link) | link: req1 -> req2 -> req3(with files) *cancel_files() | io_wq_cancel(), etc. | | put_req(link), adds to io-wq req2 schedule() | So, task1 will never try to cancel req2 or req3. If req2 is long-standing (e.g. read(empty_pipe)), this may hang. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit a1bb3cd5 ] If the tctx inflight number haven't changed because of cancellation, __io_uring_task_cancel() will continue leaving the task in TASK_UNINTERRUPTIBLE state, that's not expected by __io_uring_files_cancel(). Ensure we always call finish_wait() before retrying. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
[ Upstream commit 84965ff8 ] Ensure we match tasks that belong to a dead or dying task as well, as we need to reap those in addition to those belonging to the exiting task. Cc: stable@vger.kernel.org # 5.9+ Reported-by:
Josef Grieb <josef.grieb@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
[ Upstream commit 02a13674 ] We need to actively cancel anything that introduces a potential circular loop, where io_uring holds a reference to itself. If the file in question is an io_uring file, then add the request to the inflight list. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit bee749b1 ] io_uring_cancel_files()'s task check condition mistakenly got flipped. 1. There can't be a request in the inflight list without IO_WQ_WORK_FILES, kill this check to keep the whole condition simpler. 2. Also, don't call the function for files==NULL to not do such a check, all that staff is already handled well by its counter part, __io_uring_cancel_task_requests(). With that just flip the task check. Also, it iowq-cancels all request of current task there, don't forget to set right ->files into struct io_task_cancel. Fixes: c1973b38bf639 ("io_uring: cancel only requests of current task") Reported-by:
<syzbot+c0d52d0b3c0c3ffb9525@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit f6edbabb ] Instead of iterating over each request and cancelling it individually in io_uring_cancel_files(), try to cancel all matching requests and use ->inflight_list only to check if there anything left. In many cases it should be faster, and we can reuse a lot of code from task cancellation. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 6b81928d ] Make io_poll_remove_all() and io_kill_timeouts() to match against files as well. A preparation patch, effectively not used by now. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit b52fda00 ] io_uring_cancel_files() guarantees to cancel all matching requests, that's not necessary to do that in a loop. Move it up in the callchain into io_uring_cancel_task_requests(). Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 08d23634 ] Add io_match_task() that matches both task and files. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 06de5f59 ] If IORING_SETUP_SQPOLL is set all requests belong to the corresponding SQPOLL task, so skip task checking in that case and always match. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Feb 10, 2021
-
-
Xiaoguang Wang authored
commit d7e10d47 upstream. Abaci Robot reported following panic: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0 Oops: 0002 [#1] SMP PTI CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:put_files_struct+0x1b/0x120 Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff 41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c RSP: 0000:ffffc90002147d48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000 RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000 RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500 R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800 R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0 FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __io_clean_op+0x10c/0x2a0 io_dismantle_req+0x3c7/0x600 __io_free_req+0x34/0x280 io_put_req+0x63/0xb0 io_worker_handle_work+0x60e/0x830 ? io_wqe_worker+0x135/0x520 io_wqe_worker+0x158/0x520 ? __kthread_parkme+0x96/0xc0 ? io_worker_handle_work+0x830/0x830 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Modules linked in: CR2: 0000000000000000 ---[ end trace c358ca86af95b1e7 ]--- I guess case below can trigger above panic: there're two threads which operates different io_uring ctxs and share same sqthread identity, and later one thread exits, io_uring_cancel_task_requests() will clear task->io_uring->identity->files to be NULL in sqpoll mode, then another ctx that uses same identity will panic. Indeed we don't need to clear task->io_uring->identity->files here, io_grab_identity() should handle identity->files changes well, if task->io_uring->identity->files is not equal to current->files, io_cow_identity() should handle this changes well. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
Abaci Robot <abaci@linux.alibaba.com> Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Feb 03, 2021
-
-
Pavel Begunkov authored
commit 907d1df3 upstream. Joseph reports following deadlock: CPU0: ... io_kill_linked_timeout // &ctx->completion_lock io_commit_cqring __io_queue_deferred __io_queue_async_work io_wq_enqueue io_wqe_enqueue // &wqe->lock CPU1: ... __io_uring_files_cancel io_wq_cancel_cb io_wqe_cancel_pending_work // &wqe->lock io_cancel_task_cb // &ctx->completion_lock Only __io_queue_deferred() calls queue_async_work() while holding ctx->completion_lock, enqueue drained requests via io_req_task_queue() instead. Cc: stable@vger.kernel.org # 5.9+ Reported-by:
Joseph Qi <joseph.qi@linux.alibaba.com> Tested-by:
Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jan 30, 2021
-
-
Pavel Begunkov authored
[ Upstream commit 9d5c8190 ] [ 27.629441] BUG: sleeping function called from invalid context at fs/file.c:402 [ 27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0 [ 27.633220] 1 lock held by io_wqe_worker-0/1012: [ 27.634286] #0: ffff888105e26c98 (&ctx->completion_lock) {....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70 [ 27.649249] Call Trace: [ 27.649874] dump_stack+0xac/0xe3 [ 27.650666] ___might_sleep+0x284/0x2c0 [ 27.651566] put_files_struct+0xb8/0x120 [ 27.652481] __io_clean_op+0x10c/0x2a0 [ 27.653362] __io_cqring_fill_event+0x2c1/0x350 [ 27.654399] __io_req_complete.part.102+0x41/0x70 [ 27.655464] io_openat2+0x151/0x300 [ 27.656297] io_issue_sqe+0x6c/0x14e0 [ 27.660991] io_wq_submit_work+0x7f/0x240 [ 27.662890] io_worker_handle_work+0x501/0x8a0 [ 27.664836] io_wqe_worker+0x158/0x520 [ 27.667726] kthread+0x134/0x180 [ 27.669641] ret_from_fork+0x1f/0x30 Instead of cleaning files on overflow, return back overflow cancellation into io_uring_cancel_files(). Previously it was racy to clean REQ_F_OVERFLOW flag, but we got rid of it, and can do it through repetitive attempts targeting all matching requests. Cc: stable@vger.kernel.org # 5.9+ Reported-by:
Abaci <abaci@linux.alibaba.com> Reported-by:
Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 4aa84f2f ] CPU0 CPU1 ---- ---- lock(&new->fa_lock); local_irq_disable(); lock(&ctx->completion_lock); lock(&new->fa_lock); <Interrupt> lock(&ctx->completion_lock); *** DEADLOCK *** Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(), so it doesn't hold completion_lock while doing it. That saves from the reported deadlock, and it's just nice to shorten the locking time and untangle nested locks (compl_lock -> wq_head::lock). Cc: stable@vger.kernel.org # 5.5+ Reported-by:
<syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 0b5cd6c3 ] If there are no requests at the time __io_uring_task_cancel() is called, tctx_inflight() returns zero and and it terminates not getting a chance to go through __io_uring_files_cancel() and do io_disable_sqo_submit(). And we absolutely want them disabled by the time cancellation ends. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 4325cb49 ] WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096 io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 Call Trace: filp_close+0xb4/0x170 fs/open.c:1280 close_files fs/file.c:401 [inline] put_files_struct fs/file.c:416 [inline] put_files_struct+0x1cc/0x350 fs/file.c:413 exit_files+0x7e/0xa0 fs/file.c:433 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x3e9/0x20a0 kernel/signal.c:2770 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 An SQPOLL ring creator task may have gotten rid of its file note during exit and called io_disable_sqo_submit(), but the io_uring is still left referenced through fdtable, which will be put during close_files() and cause a false positive warning. First split the warning into two for more clarity when is hit, and the add sqo_dead check to handle the described case. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
<syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 6b393a1f ] WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884 io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884 Call Trace: io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099 filp_close+0xb4/0x170 fs/open.c:1280 close_fd+0x5c/0x80 fs/file.c:626 __do_sys_close fs/open.c:1299 [inline] __se_sys_close fs/open.c:1297 [inline] __x64_sys_close+0x2f/0xa0 fs/open.c:1297 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_uring's final close() may be triggered by any task not only the creator. It's well handled by io_uring_flush() including SQPOLL case, though a warning in io_disable_sqo_submit() will fallaciously fire by moving this warning out to the only call site that matters. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
<syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit 06585c49 ] WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717 io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717 Call Trace: io_uring_release+0x3e/0x50 fs/io_uring.c:8759 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 failed io_uring_install_fd() is a special case, we don't do io_ring_ctx_wait_and_kill() directly but defer it to fput, though still need to io_disable_sqo_submit() before. note: it doesn't fix any real problem, just a warning. That's because sqring won't be available to the userspace in this case and so SQPOLL won't submit anything. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
<syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com> Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit b4411616 ] general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117] RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline] RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891 Call Trace: io_uring_create fs/io_uring.c:9711 [inline] io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_disable_sqo_submit() might be called before user rings were allocated, don't do io_ring_set_wakeup_flag() in those cases. Cc: stable@vger.kernel.org # 5.5+ Reported-by:
<syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com> Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
[ Upstream commit d9d05217 ] When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want its internals like ->files and ->mm to be poked by the SQPOLL task, it have never been nice and recently got racy. That can happen when the owner undergoes destruction and SQPOLL tasks tries to submit new requests in parallel, and so calls io_sq_thread_acquire*(). That patch halts SQPOLL submissions when sqo_task dies by introducing sqo_dead flag. Once set, the SQPOLL task must not do any submission, which is synchronised by uring_lock as well as the new flag. The tricky part is to make sure that disabling always happens, that means either the ring is discovered by creator's do_exit() -> cancel, or if the final close() happens before it's done by the creator. The last is guaranteed by the fact that for SQPOLL the creator task and only it holds exactly one file note, so either it pins up to do_exit() or removed by the creator on the final put in flush. (see comments in uring_flush() around file->f_count == 2). One more place that can trigger io_sq_thread_acquire_*() is __io_req_task_submit(). Shoot off requests on sqo_dead there, even though actually we don't need to. That's because cancellation of sqo_task should wait for the request before going any further. note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the caller would enter the ring to get an error, but it still doesn't guarantee that the flag won't be cleared. note 2: if final __userspace__ close happens not from the creator task, the file note will pin the ring until the task dies. Cc: stable@vger.kernel.org # 5.5+ Fixed: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-