Skip to content
Snippets Groups Projects
  1. Nov 17, 2015
  2. Nov 16, 2015
  3. Nov 15, 2015
    • Hannes Frederic Sowa's avatar
      af-unix: fix use-after-free with concurrent readers while splicing · 73ed5d25
      Hannes Frederic Sowa authored
      During splicing an af-unix socket to a pipe we have to drop all
      af-unix socket locks. While doing so we allow another reader to enter
      unix_stream_read_generic which can read, copy and finally free another
      skb. If exactly this skb is just in process of being spliced we get a
      use-after-free report by kasan.
      
      First, we must make sure to not have a free while the skb is used during
      the splice operation. We simply increment its use counter before unlocking
      the reader lock.
      
      Stream sockets have the nice characteristic that we don't care about
      zero length writes and they never reach the peer socket's queue. That
      said, we can take the UNIXCB.consumed field as the indicator if the
      skb was already freed from the socket's receive queue. If the skb was
      fully consumed after we locked the reader side again we know it has been
      dropped by a second reader. We indicate a short read to user space and
      abort the current splice operation.
      
      This bug has been found with syzkaller
      (http://github.com/google/syzkaller
      
      ) by Dmitry Vyukov.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73ed5d25
  4. Oct 25, 2015
  5. Oct 05, 2015
  6. Sep 29, 2015
  7. Jun 11, 2015
  8. May 27, 2015
  9. May 25, 2015
  10. May 11, 2015
  11. Apr 15, 2015
  12. Mar 02, 2015
  13. Jan 29, 2015
    • Christoph Hellwig's avatar
      net: remove sock_iocb · 7cc05662
      Christoph Hellwig authored
      
      The sock_iocb structure is allocate on stack for each read/write-like
      operation on sockets, and contains various fields of which only the
      embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
      Get rid of the sock_iocb and put a msghdr directly on the stack and pass
      the scm_cookie explicitly to netlink_mmap_sendmsg.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cc05662
  14. Dec 09, 2014
    • Al Viro's avatar
      put iov_iter into msghdr · c0371da6
      Al Viro authored
      
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      c0371da6
  15. Nov 24, 2014
  16. Nov 05, 2014
    • David S. Miller's avatar
      net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller authored
      
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be less places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51f3d02b
  17. May 16, 2014
  18. Apr 18, 2014
  19. Apr 11, 2014
    • David S. Miller's avatar
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller authored
      
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      676d2369
  20. Mar 26, 2014
  21. Mar 06, 2014
    • Anton Blanchard's avatar
      net: unix socket code abuses csum_partial · 0a13404d
      Anton Blanchard authored
      
      The unix socket code is using the result of csum_partial to
      hash into a lookup table:
      
      	unix_hash_fold(csum_partial(sunaddr, len, 0));
      
      csum_partial is only guaranteed to produce something that can be
      folded into a checksum, as its prototype explains:
      
       * returns a 32-bit number suitable for feeding into itself
       * or csum_tcpudp_magic
      
      The 32bit value should not be used directly.
      
      Depending on the alignment, the ppc64 csum_partial will return
      different 32bit partial checksums that will fold into the same
      16bit checksum.
      
      This difference causes the following testcase (courtesy of
      Gustavo) to sometimes fail:
      
      #include <sys/socket.h>
      #include <stdio.h>
      
      int main()
      {
      	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	int i = 1;
      	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
      
      	struct sockaddr addr;
      	addr.sa_family = AF_LOCAL;
      	bind(fd, &addr, 2);
      
      	listen(fd, 128);
      
      	struct sockaddr_storage ss;
      	socklen_t sslen = (socklen_t)sizeof(ss);
      	getsockname(fd, (struct sockaddr*)&ss, &sslen);
      
      	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
      		perror(NULL);
      		return 1;
      	}
      	printf("OK\n");
      	return 0;
      }
      
      As suggested by davem, fix this by using csum_fold to fold the
      partial 32bit checksum into a 16bit checksum before using it.
      
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a13404d
  22. Jan 19, 2014
  23. Dec 17, 2013
  24. Dec 11, 2013
  25. Dec 06, 2013
  26. Nov 21, 2013
    • Hannes Frederic Sowa's avatar
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa authored
      
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3d33426
  27. Oct 19, 2013
    • Daniel Borkmann's avatar
      net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race · 90c6bd34
      Daniel Borkmann authored
      
      In the case of credentials passing in unix stream sockets (dgram
      sockets seem not affected), we get a rather sparse race after
      commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default").
      
      We have a stream server on receiver side that requests credential
      passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
      on each spawned/accepted socket on server side to 1 first (as it's
      not inherited), it can happen that in the time between accept() and
      setsockopt() we get interrupted, the sender is being scheduled and
      continues with passing data to our receiver. At that time SO_PASSCRED
      is neither set on sender nor receiver side, hence in cmsg's
      SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
      (== overflow{u,g}id) instead of what we actually would like to see.
      
      On the sender side, here nc -U, the tests in maybe_add_creds()
      invoked through unix_stream_sendmsg() would fail, as at that exact
      time, as mentioned, the sender has neither SO_PASSCRED on his side
      nor sees it on the server side, and we have a valid 'other' socket
      in place. Thus, sender believes it would just look like a normal
      connection, not needing/requesting SO_PASSCRED at that time.
      
      As reverting 16e57262 would not be an option due to the significant
      performance regression reported when having creds always passed,
      one way/trade-off to prevent that would be to set SO_PASSCRED on
      the listener socket and allow inheriting these flags to the spawned
      socket on server side in accept(). It seems also logical to do so
      if we'd tell the listener socket to pass those flags onwards, and
      would fix the race.
      
      Before, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
              msg_flags=0}, 0) = 5
      
      After, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
              msg_flags=0}, 0) = 5
      
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90c6bd34
  28. Aug 12, 2013
  29. Aug 10, 2013
    • Eric Dumazet's avatar
      net: attempt high order allocations in sock_alloc_send_pskb() · 28d64271
      Eric Dumazet authored
      
      Adding paged frags skbs to af_unix sockets introduced a performance
      regression on large sends because of additional page allocations, even
      if each skb could carry at least 100% more payload than before.
      
      We can instruct sock_alloc_send_pskb() to attempt high order
      allocations.
      
      Most of the time, it does a single page allocation instead of 8.
      
      I added an additional parameter to sock_alloc_send_pskb() to
      let other users to opt-in for this new feature on followup patches.
      
      Tested:
      
      Before patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    46861.15
      
      After patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    57981.11
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28d64271
    • Eric Dumazet's avatar
      af_unix: improve STREAM behavior with fragmented memory · e370a723
      Eric Dumazet authored
      
      unix_stream_sendmsg() currently uses order-2 allocations,
      and we had numerous reports this can fail.
      
      The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
      not helping.
      
      This patch extends the work done in commit eb6a2481
      ("af_unix: reduce high order page allocations) for
      datagram sockets.
      
      This opens the possibility of zero copy IO (splice() and
      friends)
      
      The trick is to not use skb_pull() anymore in recvmsg() path,
      and instead add a @consumed field in UNIXCB() to track amount
      of already read payload in the skb.
      
      There is a performance regression for large sends
      because of extra page allocations that will be addressed
      in a follow-up patch, allowing sock_alloc_send_pskb()
      to attempt high order page allocations.
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e370a723
  30. May 12, 2013
    • Colin Cross's avatar
      af_unix: use freezable blocking calls in read · 2b15af6f
      Colin Cross authored
      
      Avoid waking up every thread sleeping in read call on an AF_UNIX
      socket during suspend and resume by calling a freezable blocking
      call.  Previous patches modified the freezer to avoid sending
      wakeups to threads that are blocked in freezable blocking calls.
      
      This call was selected to be converted to a freezable call because
      it doesn't hold any locks or release any resources when interrupted
      that might be needed by another freezing task or a kernel driver
      during suspend, and is a common site where idle userspace tasks are
      blocked.
      
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2b15af6f
  31. Apr 30, 2013
  32. Apr 07, 2013
  33. Apr 05, 2013
  34. Apr 02, 2013
  35. Mar 31, 2013
    • Keller, Jacob E's avatar
      net: add option to enable error queue packets waking select · 7d4c04fc
      Keller, Jacob E authored
      
      Currently, when a socket receives something on the error queue it only wakes up
      the socket on select if it is in the "read" list, that is the socket has
      something to read. It is useful also to wake the socket if it is in the error
      list, which would enable software to wait on error queue packets without waking
      up for regular data on the socket. The main use case is for receiving
      timestamped transmit packets which return the timestamp to the socket via the
      error queue. This enables an application to select on the socket for the error
      queue only instead of for the regular traffic.
      
      -v2-
      * Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
      * Modified every socket poll function that checks error queue
      
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d4c04fc
  36. Mar 26, 2013
Loading