Skip to content
Snippets Groups Projects
  1. Apr 12, 2021
  2. Jan 28, 2021
    • Sargun Dhillon's avatar
      ovl: implement volatile-specific fsync error behaviour · 335d3fc5
      Sargun Dhillon authored
      
      Overlayfs's volatile option allows the user to bypass all forced sync calls
      to the upperdir filesystem. This comes at the cost of safety. We can never
      ensure that the user's data is intact, but we can make a best effort to
      expose whether or not the data is likely to be in a bad state.
      
      The best way to handle this in the time being is that if an overlayfs's
      upperdir experiences an error after a volatile mount occurs, that error
      will be returned on fsync, fdatasync, sync, and syncfs. This is
      contradictory to the traditional behaviour of VFS which fails the call
      once, and only raises an error if a subsequent fsync error has occurred,
      and been raised by the filesystem.
      
      One awkward aspect of the patch is that we have to manually set the
      superblock's errseq_t after the sync_fs callback as opposed to just
      returning an error from syncfs. This is because the call chain looks
      something like this:
      
      sys_syncfs ->
      	sync_filesystem ->
      		__sync_filesystem ->
      			/* The return value is ignored here
      			sb->s_op->sync_fs(sb)
      			_sync_blockdev
      		/* Where the VFS fetches the error to raise to userspace */
      		errseq_check_and_advance
      
      Because of this we call errseq_set every time the sync_fs callback occurs.
      Due to the nature of this seen / unseen dichotomy, if the upperdir is an
      inconsistent state at the initial mount time, overlayfs will refuse to
      mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
      will increment on error until the user calls syncfs.
      
      Signed-off-by: default avatarSargun Dhillon <sargun@sargun.me>
      Suggested-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Fixes: c86243b0 ("ovl: provide a mount option "volatile"")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      335d3fc5
    • Amir Goldstein's avatar
      ovl: skip getxattr of security labels · 03fedf93
      Amir Goldstein authored
      
      When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
      calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
      intercept in inode_getxattr hooks.
      
      When selinux LSM is installed but not initialized, it will list the
      security.selinux xattr in inode_listsecurity, but will not intercept it
      in inode_getxattr.  This results in -ENODATA for a getxattr call for an
      xattr returned by listxattr.
      
      This situation was manifested as overlayfs failure to copy up lower
      files from squashfs when selinux is built-in but not initialized,
      because ovl_copy_xattr() iterates the lower inode xattrs by
      vfs_listxattr() and vfs_getxattr().
      
      ovl_copy_xattr() skips copy up of security labels that are indentified by
      inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr().
      Since we are not going to copy them, skip vfs_getxattr() of the security
      labels.
      
      Reported-by: default avatarMichael Labriola <michael.d.labriola@gmail.com>
      Tested-by: default avatarMichael Labriola <michael.d.labriola@gmail.com>
      Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
      
      
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      03fedf93
    • Liangyan's avatar
      ovl: fix dentry leak in ovl_get_redirect · e04527fe
      Liangyan authored
      
      We need to lock d_parent->d_lock before dget_dlock, or this may
      have d_lockref updated parallelly like calltrace below which will
      cause dentry->d_lockref leak and risk a crash.
      
           CPU 0                                CPU 1
      ovl_set_redirect                       lookup_fast
        ovl_get_redirect                       __d_lookup
          dget_dlock
            //no lock protection here            spin_lock(&dentry->d_lock)
            dentry->d_lockref.count++            dentry->d_lockref.count++
      
      [   49.799059] PGD 800000061fed7067 P4D 800000061fed7067 PUD 61fec5067 PMD 0
      [   49.799689] Oops: 0002 [#1] SMP PTI
      [   49.800019] CPU: 2 PID: 2332 Comm: node Not tainted 4.19.24-7.20.al7.x86_64 #1
      [   49.800678] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014
      [   49.801380] RIP: 0010:_raw_spin_lock+0xc/0x20
      [   49.803470] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246
      [   49.803949] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000
      [   49.804600] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088
      [   49.805252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040
      [   49.805898] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000
      [   49.806548] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0
      [   49.807200] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000
      [   49.807935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   49.808461] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0
      [   49.809113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   49.809758] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   49.810410] Call Trace:
      [   49.810653]  d_delete+0x2c/0xb0
      [   49.810951]  vfs_rmdir+0xfd/0x120
      [   49.811264]  do_rmdir+0x14f/0x1a0
      [   49.811573]  do_syscall_64+0x5b/0x190
      [   49.811917]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   49.812385] RIP: 0033:0x7ffbf505ffd7
      [   49.814404] RSP: 002b:00007ffbedffada8 EFLAGS: 00000297 ORIG_RAX: 0000000000000054
      [   49.815098] RAX: ffffffffffffffda RBX: 00007ffbedffb640 RCX: 00007ffbf505ffd7
      [   49.815744] RDX: 0000000004449700 RSI: 0000000000000000 RDI: 0000000006c8cd50
      [   49.816394] RBP: 00007ffbedffaea0 R08: 0000000000000000 R09: 0000000000017d0b
      [   49.817038] R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000012
      [   49.817687] R13: 00000000072823d8 R14: 00007ffbedffb700 R15: 00000000072823d8
      [   49.818338] Modules linked in: pvpanic cirrusfb button qemu_fw_cfg atkbd libps2 i8042
      [   49.819052] CR2: 0000000000000088
      [   49.819368] ---[ end trace 4e652b8aa299aa2d ]---
      [   49.819796] RIP: 0010:_raw_spin_lock+0xc/0x20
      [   49.821880] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246
      [   49.822363] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000
      [   49.823008] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088
      [   49.823658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040
      [   49.825404] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000
      [   49.827147] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0
      [   49.828890] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000
      [   49.830725] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   49.832359] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0
      [   49.834085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   49.835792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Cc: <stable@vger.kernel.org>
      Fixes: a6c60655 ("ovl: redirect on rename-dir")
      Signed-off-by: default avatarLiangyan <liangyan.peng@linux.alibaba.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Suggested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      e04527fe
    • Miklos Szeredi's avatar
      ovl: avoid deadlock on directory ioctl · b854cc65
      Miklos Szeredi authored
      
      The function ovl_dir_real_file() currently uses the inode lock to serialize
      writes to the od->upperfile field.
      
      However, this function will get called by ovl_ioctl_set_flags(), which
      utilizes the inode lock too.  In this case ovl_dir_real_file() will try to
      claim a lock that is owned by a function in its call stack, which won't get
      released before ovl_dir_real_file() returns.
      
      Fix by replacing the open coded compare and exchange by an explicit atomic
      op.
      
      Fixes: 61536bed ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
      Cc: stable@vger.kernel.org # v5.10
      Reported-by: default avatarIcenowy Zheng <icenowy@aosc.io>
      Tested-by: default avatarIcenowy Zheng <icenowy@aosc.io>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      b854cc65
    • Miklos Szeredi's avatar
      ovl: perform vfs_getxattr() with mounter creds · 554677b9
      Miklos Szeredi authored
      
      The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr
      exist on a lower layer file that is to be removed.  If the xattr does not
      exist, then no need to copy up the file.
      
      This call of vfs_getxattr() wasn't wrapped in credential override, and this
      is probably okay.  But for consitency wrap this instance as well.
      
      Reported-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      554677b9
    • Miklos Szeredi's avatar
      ovl: add warning on user_ns mismatch · 9efb069d
      Miklos Szeredi authored
      
      Currently there's no way to create an overlay filesystem outside of the
      current user namespace.  Make sure that if this assumption changes it
      doesn't go unnoticed.
      
      Reported-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      9efb069d
  3. Jan 24, 2021
  4. Dec 14, 2020
    • Miklos Szeredi's avatar
      ovl: unprivieged mounts · 459c7c56
      Miklos Szeredi authored
      
      Enable unprivileged user namespace mounts of overlayfs.  Overlayfs's
      permission model (*) ensures that the mounter itself cannot gain additional
      privileges by the act of creating an overlayfs mount.
      
      This feature request is coming from the "rootless" container crowd.
      
      (*) Documentation/filesystems/overlayfs.txt#Permission model
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      459c7c56
    • Miklos Szeredi's avatar
      ovl: do not get metacopy for userxattr · 87b2c60c
      Miklos Szeredi authored
      
      When looking up an inode on the lower layer for which the mounter lacks
      read permisison the metacopy check will fail.  This causes the lookup to
      fail as well, even though the directory is readable.
      
      So ignore EACCES for the "userxattr" case and assume no metacopy for the
      unreadable file.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      87b2c60c
    • Miklos Szeredi's avatar
      ovl: do not fail because of O_NOATIME · b6650dab
      Miklos Szeredi authored
      
      In case the file cannot be opened with O_NOATIME because of lack of
      capabilities, then clear O_NOATIME instead of failing.
      
      Remove WARN_ON(), since it would now trigger if O_NOATIME was cleared.
      Noticed by Amir Goldstein.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      b6650dab
    • Miklos Szeredi's avatar
      ovl: do not fail when setting origin xattr · 6939f977
      Miklos Szeredi authored
      
      Comment above call already says this, but only EOPNOTSUPP is ignored, other
      failures are not.
      
      For example setting "user.*" will fail with EPERM on symlink/special.
      
      Ignore this error as well.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      6939f977
    • Miklos Szeredi's avatar
      ovl: user xattr · 2d2f2d73
      Miklos Szeredi authored
      
      Optionally allow using "user.overlay." namespace instead of
      "trusted.overlay."
      
      This is necessary for overlayfs to be able to be mounted in an unprivileged
      namepsace.
      
      Make the option explicit, since it makes the filesystem format be
      incompatible.
      
      Disable redirect_dir and metacopy options, because these would allow
      privilege escalation through direct manipulation of the
      "user.overlay.redirect" or "user.overlay.metacopy" xattrs.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: default avatarAmir Goldstein <amir73il@gmail.com>
      2d2f2d73
    • Miklos Szeredi's avatar
      ovl: simplify file splice · 82a763e6
      Miklos Szeredi authored
      
      generic_file_splice_read() and iter_file_splice_write() will call back into
      f_op->iter_read() and f_op->iter_write() respectively.  These already do
      the real file lookup and cred override.  So the code in ovl_splice_read()
      and ovl_splice_write() is redundant.
      
      In addition the ovl_file_accessed() call in ovl_splice_write() is
      incorrect, though probably harmless.
      
      Fix by calling generic_file_splice_read() and iter_file_splice_write()
      directly.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      82a763e6
    • Miklos Szeredi's avatar
      ovl: make ioctl() safe · 89bdfaf9
      Miklos Szeredi authored
      
      ovl_ioctl_set_flags() does a capability check using flags, but then the
      real ioctl double-fetches flags and uses potentially different value.
      
      The "Check the capability before cred override" comment misleading: user
      can skip this check by presenting benign flags first and then overwriting
      them to non-benign flags.
      
      Just remove the cred override for now, hoping this doesn't cause a
      regression.
      
      The proper solution is to create a new setxflags i_op (patches are in the
      works).
      
      Xfstests don't show a regression.
      
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Fixes: dab5ca8f ("ovl: add lsattr/chattr support")
      Cc: <stable@vger.kernel.org> # v4.19
      89bdfaf9
    • Miklos Szeredi's avatar
      ovl: check privs before decoding file handle · c846af05
      Miklos Szeredi authored
      
      CAP_DAC_READ_SEARCH is required by open_by_handle_at(2) so check it in
      ovl_decode_real_fh() as well to prevent privilege escalation for
      unprivileged overlay mounts.
      
      [Amir] If the mounter is not capable in init ns, ovl_check_origin() and
      ovl_verify_index() will not function as expected and this will break index
      and nfs export features.  So check capability in ovl_can_decode_fh(), to
      auto disable those features.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      c846af05
  5. Nov 12, 2020
Loading