Skip to content
Snippets Groups Projects
  1. May 14, 2021
  2. Apr 16, 2021
  3. Apr 10, 2021
  4. Apr 07, 2021
  5. Mar 30, 2021
  6. Mar 25, 2021
    • Jens Axboe's avatar
      io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return · 76f49668
      Jens Axboe authored
      
      [ Upstream commit b5b0ecb7 ]
      
      The callback can only be armed, if we get -EIOCBQUEUED returned. It's
      important that we clear the WAITQ bit for other cases, otherwise we can
      queue for async retry and filemap will assume that we're armed and
      return -EAGAIN instead of just blocking for the IO.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      76f49668
    • Jens Axboe's avatar
      io_uring: don't attempt IO reissue from the ring exit path · 3c08f772
      Jens Axboe authored
      
      [ Upstream commit 7c977a58 ]
      
      If we're exiting the ring, just let the IO fail with -EAGAIN as nobody
      will care anyway. It's not the right context to reissue from.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3c08f772
    • Pavel Begunkov's avatar
      io_uring: fix inconsistent lock state · 1c20e904
      Pavel Begunkov authored
      
      [ Upstream commit 9ae1f8dd ]
      
      WARNING: inconsistent lock state
      
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes:
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&fs->lock);
        <Interrupt>
          lock(&fs->lock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor217/8450:
       #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442
      
      stack backtrace:
      CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
      [...]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_req_clean_work fs/io_uring.c:1398 [inline]
       io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
       __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046
       io_free_req fs/io_uring.c:2269 [inline]
       io_double_put_req fs/io_uring.c:2392 [inline]
       io_put_req+0xf9/0x570 fs/io_uring.c:2388
       io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497
       __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
       __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
       hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102
       asm_call_irq_on_stack+0xf/0x20
       </IRQ>
       __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline]
       run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline]
       sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
      RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
      RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199
       spin_unlock_irq include/linux/spinlock.h:404 [inline]
       io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525
       __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594
       io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639
       io_queue_link_head fs/io_uring.c:6650 [inline]
       io_submit_sqe fs/io_uring.c:6697 [inline]
       io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960
       __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Don't free requests from under hrtimer context (softirq) as it may sleep
      or take spinlocks improperly (e.g. non-irq versions).
      
      Cc: stable@vger.kernel.org # 5.6+
      Reported-by: default avatar <syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1c20e904
    • Jens Axboe's avatar
      io_uring: ensure that SQPOLL thread is started for exit · 6cae8095
      Jens Axboe authored
      
      commit 3ebba796 upstream.
      
      If we create it in a disabled state because IORING_SETUP_R_DISABLED is
      set on ring creation, we need to ensure that we've kicked the thread if
      we're exiting before it's been explicitly disabled. Otherwise we can run
      into a deadlock where exit is waiting go park the SQPOLL thread, but the
      SQPOLL thread itself is waiting to get a signal to start.
      
      That results in the below trace of both tasks hung, waiting on each other:
      
      INFO: task syz-executor458:8401 blocked for more than 143 seconds.
            Not tainted 5.11.0-next-20210226-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor458 state:D stack:27536 pid: 8401 ppid:  8400 flags:0x00004004
      Call Trace:
       context_switch kernel/sched/core.c:4324 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
       schedule+0xcf/0x270 kernel/sched/core.c:5154
       schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
       io_sq_thread_park fs/io_uring.c:7115 [inline]
       io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103
       io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745
       __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840
       io_uring_files_cancel include/linux/io_uring.h:47 [inline]
       do_exit+0x299/0x2a60 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       __do_sys_exit_group kernel/exit.c:933 [inline]
       __se_sys_exit_group kernel/exit.c:931 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x43e899
      RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899
      RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000
      R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
      INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds.
      task:iou-sqp-8401    state:D stack:30272 pid: 8402 ppid:  8400 flags:0x00004004
      Call Trace:
       context_switch kernel/sched/core.c:4324 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
       schedule+0xcf/0x270 kernel/sched/core.c:5154
       schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
       io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds.
      
      Reported-by: default avatar <syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cae8095
  7. Mar 09, 2021
    • Jens Axboe's avatar
      io_uring: ignore double poll add on the same waitqueue head · a2501d87
      Jens Axboe authored
      
      commit 1c3b3e65 upstream.
      
      syzbot reports a deadlock, attempting to lock the same spinlock twice:
      
      ============================================
      WARNING: possible recursive locking detected
      5.11.0-syzkaller #0 Not tainted
      --------------------------------------------
      swapper/1/0 is trying to acquire lock:
      ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
      ffff88801b2b1130 (&runtime->sleep){..-.}-{2:2}, at: io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
      
      but task is already holding lock:
      ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&runtime->sleep);
        lock(&runtime->sleep);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by swapper/1/0:
       #0: ffff888147474908 (&group->lock){..-.}-{2:2}, at: _snd_pcm_stream_lock_irqsave+0x9f/0xd0 sound/core/pcm_native.c:170
       #1: ffff88801b2b3130 (&runtime->sleep){..-.}-{2:2}, at: __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:137
      
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0xfa/0x151 lib/dump_stack.c:120
       print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
       check_deadlock kernel/locking/lockdep.c:2872 [inline]
       validate_chain kernel/locking/lockdep.c:3661 [inline]
       __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
       lock_acquire kernel/locking/lockdep.c:5510 [inline]
       lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_poll_double_wake+0x25f/0x6a0 fs/io_uring.c:4960
       __wake_up_common+0x147/0x650 kernel/sched/wait.c:108
       __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
       snd_pcm_update_state+0x46a/0x540 sound/core/pcm_lib.c:203
       snd_pcm_update_hw_ptr0+0xa75/0x1a50 sound/core/pcm_lib.c:464
       snd_pcm_period_elapsed+0x160/0x250 sound/core/pcm_lib.c:1805
       dummy_hrtimer_callback+0x94/0x1b0 sound/drivers/dummy.c:378
       __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
       __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
       hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600
       __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
       invoke_softirq kernel/softirq.c:221 [inline]
       __irq_exit_rcu kernel/softirq.c:422 [inline]
       irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
       sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
       </IRQ>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
      RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
      RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline]
      RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline]
      RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
      RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516
      Code: dd 38 6e f8 84 db 75 ac e8 54 32 6e f8 e8 0f 1c 74 f8 e9 0c 00 00 00 e8 45 32 6e f8 0f 00 2d 4e 4a c5 00 e8 39 32 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 14 3a 6e f8 48 85 db
      RSP: 0018:ffffc90000d47d18 EFLAGS: 00000293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff8880115c3780 RSI: ffffffff89052537 RDI: 0000000000000000
      RBP: ffff888141127064 R08: 0000000000000001 R09: 0000000000000001
      R10: ffffffff81794168 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff888141127000 R14: ffff888141127064 R15: ffff888143331804
       acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:647
       cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
       cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
       call_cpuidle kernel/sched/idle.c:158 [inline]
       cpuidle_idle_call kernel/sched/idle.c:239 [inline]
       do_idle+0x3e1/0x590 kernel/sched/idle.c:300
       cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397
       start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272
       secondary_startup_64_no_verify+0xb0/0xbb
      
      which is due to the driver doing poll_wait() twice on the same
      wait_queue_head. That is perfectly valid, but from checking the rest
      of the kernel tree, it's the only driver that does this.
      
      We can handle this just fine, we just need to ignore the second addition
      as we'll get woken just fine on the first one.
      
      Cc: stable@vger.kernel.org # 5.8+
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Reported-by: default avatar <syzbot+28abd693db9e92c160d8@syzkaller.appspotmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2501d87
  8. Mar 04, 2021
    • Hao Xu's avatar
      io_uring: fix possible deadlock in io_uring_poll · 81dfee47
      Hao Xu authored
      
      [ Upstream commit ed670c3f ]
      
      Abaci reported follow issue:
      
      [   30.615891] ======================================================
      [   30.616648] WARNING: possible circular locking dependency detected
      [   30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
      [   30.618035] ------------------------------------------------------
      [   30.618914] a.out/1128 is trying to acquire lock:
      [   30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
      [   30.620505]
      [   30.620505] but task is already holding lock:
      [   30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.622349]
      [   30.622349] which lock already depends on the new lock.
      [   30.622349]
      [   30.623289]
      [   30.623289] the existing dependency chain (in reverse order) is:
      [   30.624243]
      [   30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
      [   30.625263]        lock_acquire+0x2c7/0x390
      [   30.625868]        __mutex_lock+0xae/0x9f0
      [   30.626451]        io_cqring_overflow_flush.part.95+0x6d/0x70
      [   30.627278]        io_uring_poll+0xcb/0xd0
      [   30.627890]        ep_item_poll.isra.14+0x4e/0x90
      [   30.628531]        do_epoll_ctl+0xb7e/0x1120
      [   30.629122]        __x64_sys_epoll_ctl+0x70/0xb0
      [   30.629770]        do_syscall_64+0x2d/0x40
      [   30.630332]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.631187]
      [   30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
      [   30.631985]        check_prevs_add+0x226/0xb00
      [   30.632584]        __lock_acquire+0x1237/0x13a0
      [   30.633207]        lock_acquire+0x2c7/0x390
      [   30.633740]        __mutex_lock+0xae/0x9f0
      [   30.634258]        __ep_eventpoll_poll+0x9f/0x220
      [   30.634879]        __io_arm_poll_handler+0xbf/0x220
      [   30.635462]        io_issue_sqe+0xa6b/0x13e0
      [   30.635982]        __io_queue_sqe+0x10b/0x550
      [   30.636648]        io_queue_sqe+0x235/0x470
      [   30.637281]        io_submit_sqes+0xcce/0xf10
      [   30.637839]        __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.638465]        do_syscall_64+0x2d/0x40
      [   30.638999]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.639643]
      [   30.639643] other info that might help us debug this:
      [   30.639643]
      [   30.640618]  Possible unsafe locking scenario:
      [   30.640618]
      [   30.641402]        CPU0                    CPU1
      [   30.641938]        ----                    ----
      [   30.642664]   lock(&ctx->uring_lock);
      [   30.643425]                                lock(&ep->mtx);
      [   30.644498]                                lock(&ctx->uring_lock);
      [   30.645668]   lock(&ep->mtx);
      [   30.646321]
      [   30.646321]  *** DEADLOCK ***
      [   30.646321]
      [   30.647642] 1 lock held by a.out/1128:
      [   30.648424]  #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.649954]
      [   30.649954] stack backtrace:
      [   30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
      [   30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   30.652290] Call Trace:
      [   30.652688]  dump_stack+0xac/0xe3
      [   30.653164]  check_noncircular+0x11e/0x130
      [   30.653747]  ? check_prevs_add+0x226/0xb00
      [   30.654303]  check_prevs_add+0x226/0xb00
      [   30.654845]  ? add_lock_to_list.constprop.49+0xac/0x1d0
      [   30.655564]  __lock_acquire+0x1237/0x13a0
      [   30.656262]  lock_acquire+0x2c7/0x390
      [   30.656788]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.657379]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.658014]  __mutex_lock+0xae/0x9f0
      [   30.658524]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.659112]  ? mark_held_locks+0x5a/0x80
      [   30.659648]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.660229]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   30.660885]  ? trace_hardirqs_on+0x46/0x110
      [   30.661471]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.662102]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.662696]  __ep_eventpoll_poll+0x9f/0x220
      [   30.663273]  ? __ep_eventpoll_poll+0x220/0x220
      [   30.663875]  __io_arm_poll_handler+0xbf/0x220
      [   30.664463]  io_issue_sqe+0xa6b/0x13e0
      [   30.664984]  ? __lock_acquire+0x782/0x13a0
      [   30.665544]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.666170]  ? __io_queue_sqe+0x10b/0x550
      [   30.666725]  __io_queue_sqe+0x10b/0x550
      [   30.667252]  ? __fget_files+0x131/0x260
      [   30.667791]  ? io_req_prep+0xd8/0x1090
      [   30.668316]  ? io_queue_sqe+0x235/0x470
      [   30.668868]  io_queue_sqe+0x235/0x470
      [   30.669398]  io_submit_sqes+0xcce/0xf10
      [   30.669931]  ? xa_load+0xe4/0x1c0
      [   30.670425]  __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.671051]  ? lockdep_hardirqs_on_prepare+0xde/0x180
      [   30.671719]  ? syscall_enter_from_user_mode+0x2b/0x80
      [   30.672380]  do_syscall_64+0x2d/0x40
      [   30.672901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.673503] RIP: 0033:0x7fd89c813239
      [   30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
      [   30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
      [   30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
      [   30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
      [   30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
      [   30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000
      
      This might happen if we do epoll_wait on a uring fd while reading/writing
      the former epoll fd in a sqe in the former uring instance.
      So let's don't flush cqring overflow list, just do a simple check.
      
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Signed-off-by: default avatarHao Xu <haoxu@linux.alibaba.com>
      Reviewed-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      81dfee47
  9. Feb 13, 2021
  10. Feb 10, 2021
    • Xiaoguang Wang's avatar
      io_uring: don't modify identity's files uncess identity is cowed · 4f25d448
      Xiaoguang Wang authored
      
      commit d7e10d47 upstream.
      
      Abaci Robot reported following panic:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0
      Oops: 0002 [#1] SMP PTI
      CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:put_files_struct+0x1b/0x120
      Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff  41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c
      RSP: 0000:ffffc90002147d48 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000
      RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000
      RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500
      R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800
      R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0
      FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __io_clean_op+0x10c/0x2a0
       io_dismantle_req+0x3c7/0x600
       __io_free_req+0x34/0x280
       io_put_req+0x63/0xb0
       io_worker_handle_work+0x60e/0x830
       ? io_wqe_worker+0x135/0x520
       io_wqe_worker+0x158/0x520
       ? __kthread_parkme+0x96/0xc0
       ? io_worker_handle_work+0x830/0x830
       kthread+0x134/0x180
       ? kthread_create_worker_on_cpu+0x90/0x90
       ret_from_fork+0x1f/0x30
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace c358ca86af95b1e7 ]---
      
      I guess case below can trigger above panic: there're two threads which
      operates different io_uring ctxs and share same sqthread identity, and
      later one thread exits, io_uring_cancel_task_requests() will clear
      task->io_uring->identity->files to be NULL in sqpoll mode, then another
      ctx that uses same identity will panic.
      
      Indeed we don't need to clear task->io_uring->identity->files here,
      io_grab_identity() should handle identity->files changes well, if
      task->io_uring->identity->files is not equal to current->files,
      io_cow_identity() should handle this changes well.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: default avatarXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f25d448
  11. Feb 03, 2021
  12. Jan 30, 2021
    • Pavel Begunkov's avatar
      io_uring: fix sleeping under spin in __io_clean_op · d92d0086
      Pavel Begunkov authored
      
      [ Upstream commit 9d5c8190 ]
      
      [   27.629441] BUG: sleeping function called from invalid context
      	at fs/file.c:402
      [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0,
      	pid: 1012, name: io_wqe_worker-0
      [   27.633220] 1 lock held by io_wqe_worker-0/1012:
      [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock)
      	{....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
      [   27.649249] Call Trace:
      [   27.649874]  dump_stack+0xac/0xe3
      [   27.650666]  ___might_sleep+0x284/0x2c0
      [   27.651566]  put_files_struct+0xb8/0x120
      [   27.652481]  __io_clean_op+0x10c/0x2a0
      [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
      [   27.654399]  __io_req_complete.part.102+0x41/0x70
      [   27.655464]  io_openat2+0x151/0x300
      [   27.656297]  io_issue_sqe+0x6c/0x14e0
      [   27.660991]  io_wq_submit_work+0x7f/0x240
      [   27.662890]  io_worker_handle_work+0x501/0x8a0
      [   27.664836]  io_wqe_worker+0x158/0x520
      [   27.667726]  kthread+0x134/0x180
      [   27.669641]  ret_from_fork+0x1f/0x30
      
      Instead of cleaning files on overflow, return back overflow cancellation
      into io_uring_cancel_files(). Previously it was racy to clean
      REQ_F_OVERFLOW flag, but we got rid of it, and can do it through
      repetitive attempts targeting all matching requests.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Reported-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d92d0086
    • Pavel Begunkov's avatar
      io_uring: dont kill fasync under completion_lock · 7bccd1c1
      Pavel Begunkov authored
      
      [ Upstream commit 4aa84f2f ]
      
            CPU0                    CPU1
             ----                    ----
        lock(&new->fa_lock);
                                     local_irq_disable();
                                     lock(&ctx->completion_lock);
                                     lock(&new->fa_lock);
        <Interrupt>
          lock(&ctx->completion_lock);
      
       *** DEADLOCK ***
      
      Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
      so it doesn't hold completion_lock while doing it. That saves from the
      reported deadlock, and it's just nice to shorten the locking time and
      untangle nested locks (compl_lock -> wq_head::lock).
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatar <syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bccd1c1
    • Pavel Begunkov's avatar
      io_uring: fix skipping disabling sqo on exec · 186725a8
      Pavel Begunkov authored
      
      [ Upstream commit 0b5cd6c3 ]
      
      If there are no requests at the time __io_uring_task_cancel() is called,
      tctx_inflight() returns zero and and it terminates not getting a chance
      to go through __io_uring_files_cancel() and do
      io_disable_sqo_submit(). And we absolutely want them disabled by the
      time cancellation ends.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      186725a8
    • Pavel Begunkov's avatar
      io_uring: fix uring_flush in exit_files() warning · 54b4c4f9
      Pavel Begunkov authored
      
      [ Upstream commit 4325cb49 ]
      
      WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
      	io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      Call Trace:
       filp_close+0xb4/0x170 fs/open.c:1280
       close_files fs/file.c:401 [inline]
       put_files_struct fs/file.c:416 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:413
       exit_files+0x7e/0xa0 fs/file.c:433
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x3e9/0x20a0 kernel/signal.c:2770
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      An SQPOLL ring creator task may have gotten rid of its file note during
      exit and called io_disable_sqo_submit(), but the io_uring is still left
      referenced through fdtable, which will be put during close_files() and
      cause a false positive warning.
      
      First split the warning into two for more clarity when is hit, and the
      add sqo_dead check to handle the described case.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatar <syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54b4c4f9
    • Pavel Begunkov's avatar
      io_uring: fix false positive sqo warning on flush · 06827591
      Pavel Begunkov authored
      
      [ Upstream commit 6b393a1f ]
      
      WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
      	io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
      Call Trace:
       io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
       filp_close+0xb4/0x170 fs/open.c:1280
       close_fd+0x5c/0x80 fs/file.c:626
       __do_sys_close fs/open.c:1299 [inline]
       __se_sys_close fs/open.c:1297 [inline]
       __x64_sys_close+0x2f/0xa0 fs/open.c:1297
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_uring's final close() may be triggered by any task not only the
      creator. It's well handled by io_uring_flush() including SQPOLL case,
      though a warning in io_disable_sqo_submit() will fallaciously fire by
      moving this warning out to the only call site that matters.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatar <syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06827591
    • Pavel Begunkov's avatar
      io_uring: do sqo disable on install_fd error · 8cb6f4da
      Pavel Begunkov authored
      
      [ Upstream commit 06585c49 ]
      
      WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717
      	io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
      Call Trace:
       io_uring_release+0x3e/0x50 fs/io_uring.c:8759
       __fput+0x283/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x190 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      failed io_uring_install_fd() is a special case, we don't do
      io_ring_ctx_wait_and_kill() directly but defer it to fput, though still
      need to io_disable_sqo_submit() before.
      
      note: it doesn't fix any real problem, just a warning. That's because
      sqring won't be available to the userspace in this case and so SQPOLL
      won't submit anything.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatar <syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cb6f4da
    • Pavel Begunkov's avatar
      io_uring: fix null-deref in io_disable_sqo_submit · 0e3562e3
      Pavel Begunkov authored
      
      [ Upstream commit b4411616 ]
      
      general protection fault, probably for non-canonical address
      	0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref
      	in range [0x0000000000000110-0x0000000000000117]
      RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
      RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
      Call Trace:
       io_uring_create fs/io_uring.c:9711 [inline]
       io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_disable_sqo_submit() might be called before user rings were
      allocated, don't do io_ring_set_wakeup_flag() in those cases.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: default avatar <syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e3562e3
    • Pavel Begunkov's avatar
      io_uring: stop SQPOLL submit on creator's death · a63d9157
      Pavel Begunkov authored
      
      [ Upstream commit d9d05217 ]
      
      When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
      its internals like ->files and ->mm to be poked by the SQPOLL task, it
      have never been nice and recently got racy. That can happen when the
      owner undergoes destruction and SQPOLL tasks tries to submit new
      requests in parallel, and so calls io_sq_thread_acquire*().
      
      That patch halts SQPOLL submissions when sqo_task dies by introducing
      sqo_dead flag. Once set, the SQPOLL task must not do any submission,
      which is synchronised by uring_lock as well as the new flag.
      
      The tricky part is to make sure that disabling always happens, that
      means either the ring is discovered by creator's do_exit() -> cancel,
      or if the final close() happens before it's done by the creator. The
      last is guaranteed by the fact that for SQPOLL the creator task and only
      it holds exactly one file note, so either it pins up to do_exit() or
      removed by the creator on the final put in flush. (see comments in
      uring_flush() around file->f_count == 2).
      
      One more place that can trigger io_sq_thread_acquire_*() is
      __io_req_task_submit(). Shoot off requests on sqo_dead there, even
      though actually we don't need to. That's because cancellation of
      sqo_task should wait for the request before going any further.
      
      note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
      caller would enter the ring to get an error, but it still doesn't
      guarantee that the flag won't be cleared.
      
      note 2: if final __userspace__ close happens not from the creator
      task, the file note will pin the ring until the task dies.
      
      Cc: stable@vger.kernel.org # 5.5+
      Fixed: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works")
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a63d9157
Loading