  1. Dec 11, 2014
    • net: sock: fix access via invalid file descriptor · 198bf1b0
      Alexei Starovoitov authored
      
      0day robot reported the following crash:
      [   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
      [   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2
      
      It's due to bpf_prog_get() returning ERR_PTR.
      Check it properly.
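
      The fix follows the usual kernel pattern for APIs that return
      ERR_PTR-encoded errors; a minimal sketch of that pattern (not
      necessarily the exact hunk):

          prog = bpf_prog_get(ufd);
          if (IS_ERR(prog))
                  return PTR_ERR(prog);   /* failure is ERR_PTR, never NULL */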
      
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce helper macro for_each_cmsghdr · f95b414e
      Gu Zheng authored
      
      Introduce the helper macro for_each_cmsghdr as a wrapper for
      enumerating the cmsghdr entries of a msghdr.  This is purely a cleanup.
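
      For reference, the helper wraps the usual CMSG_FIRSTHDR()/CMSG_NXTHDR()
      iteration, as added to include/linux/socket.h:

          #define for_each_cmsghdr(cmsg, msg) \
                  for (cmsg = CMSG_FIRSTHDR(msg); \
                       cmsg; \
                       cmsg = CMSG_NXTHDR(msg, cmsg))

      Call sites can then write for_each_cmsghdr(cmsg, msg) { ... } instead
      of spelling out the three-clause for loop.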
      
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: memcontrol: lockless page counters · 3e32cb2e
      Johannes Weiner authored
      
      Memory is internally accounted in bytes, using spinlock-protected 64-bit
      counters, even though the smallest accounting delta is a page.  The
      counter interface is also convoluted and does too many things.
      
      Introduce a new lockless word-sized page counter API, then change all
      memory accounting over to it.  The translation from and to bytes then only
      happens when interfacing with userspace.
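
      A rough sketch of the new counter's shape, assuming only the fields
      implied by the description (the real struct also tracks things like
      a watermark and a hierarchy parent):

          struct page_counter {
                  atomic_long_t count;    /* pages currently charged */
                  unsigned long limit;    /* hard limit, in pages */
          };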
      
      The removed locking overhead is noticeable when scaling beyond the
      per-cpu charge caches - on a 4-socket machine with 144 threads, the
      following test shows the performance differences of 288 memcgs
      concurrently running a page fault benchmark:
      
      vanilla:
      
         18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
               1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
                  24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
           1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
      50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
       1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
           1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )
      
           132.474343877 seconds time elapsed                                          ( +-  0.21% )
      
      lockless:
      
         12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
                 832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
                  15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
           1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
      32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
       9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
       2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
           1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )
      
            91.369330729 seconds time elapsed                                          ( +-  0.45% )
      
      On top of improved scalability, this also gets rid of the icky long long
      types in the very heart of memcg, which is great for 32 bit and also makes
      the code a lot more readable.
      
      Notable differences between the old and new API:
      
      - res_counter_charge() and res_counter_charge_nofail() become
        page_counter_try_charge() and page_counter_charge() respectively,
        to match the more common kernel naming scheme of try_do()/do()
      
      - res_counter_uncharge_until() is only ever used to cancel a local
        counter and never to uncharge bigger segments of a hierarchy, so
        it's replaced by the simpler page_counter_cancel()
      
      - res_counter_set_limit() is replaced by page_counter_limit(), which
        expects its callers to serialize against themselves
      
      - res_counter_memparse_write_strategy() is replaced by
        page_counter_memparse(), which rounds down to the nearest page
        size - rather than up.  This is more reasonable for explicitly
        requested hard upper limits.
      
      - to keep charging light-weight, page_counter_try_charge() charges
        speculatively, only to roll back if the result exceeds the limit.
        Because of this, a failing bigger charge can temporarily lock out
        smaller charges that would otherwise succeed.  The error is bounded
        to the difference between the smallest and the biggest possible
        charge size, so for memcg, this means that a failing THP charge can
        send base page charges into reclaim up to 2MB (4MB) before the limit
        would have been reached.  This should be acceptable.
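
      A simplified sketch of that speculative scheme (the real
      page_counter_try_charge() additionally walks the counter hierarchy
      and reports which counter failed):

          static bool try_charge_sketch(struct page_counter *counter,
                                        unsigned long nr_pages)
          {
                  long new = atomic_long_add_return(nr_pages, &counter->count);

                  if (new > (long)counter->limit) {
                          /* overran the limit: roll the charge back */
                          atomic_long_sub(nr_pages, &counter->count);
                          return false;
                  }
                  return true;
          }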
      
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
      [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Dec 09, 2014
    • tipc: avoid double lock 'spin_lock:&seq->lock' · 023160bc
      Ying Xue authored
      
      The commit fb9962f3 ("tipc: ensure all name sequences are properly
      protected with its lock") introduced the error below:
      
      net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock'
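
      As a generic illustration of this bug class (not the actual TIPC
      code), a path that re-acquires a spinlock it already holds will
      deadlock:

          static void inner(struct name_seq *seq)
          {
                  spin_lock_bh(&seq->lock);       /* second acquisition: deadlock */
                  /* ... */
                  spin_unlock_bh(&seq->lock);
          }

          static void outer(struct name_seq *seq)
          {
                  spin_lock_bh(&seq->lock);
                  inner(seq);                     /* caller already holds seq->lock */
                  spin_unlock_bh(&seq->lock);
          }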
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: remove swdev mode · 1d460b98
      Roopa Prabhu authored
      
      Remove use of 'swdev' mode in rocker.  rocker dev offloads can use
      BRIDGE_FLAGS_SELF to indicate offload to hardware.
      
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid to call skb_queue_len again · e008f3f0
      Li RongQing authored
      
      The queue length of sd->input_pkt_queue has already been stored in
      qlen, and it cannot change while the lock is held, so there is no
      need to call skb_queue_len() again.
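
      A simplified sketch of the resulting pattern in enqueue_to_backlog()
      (details trimmed):

          qlen = skb_queue_len(&sd->input_pkt_queue);
          if (qlen <= netdev_max_backlog) {
                  if (qlen) {     /* reuse qlen instead of re-reading the length */
                          __skb_queue_tail(&sd->input_pkt_queue, skb);
                          /* ... */
                  }
          }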
      
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refine TSO autosizing · 605ad7f1
      Eric Dumazet authored
      
      Commit 95bd09eb ("tcp: TSO packets automatic sizing") tried to
      control TSO size, but did so at the wrong place (sendmsg() time).
      
      At sendmsg() time, we might have a pessimistic view of flow rate,
      and we end up building very small skbs (with 2 MSS per skb).
      
      This is bad because:
      
       - It sends small TSO packets even in Slow Start where rate quickly
         increases.
       - It tends to make the socket write queue very big, increasing
         tcp_ack() processing time, but also increasing memory needs, not
         necessarily accounted for, as fast clone overhead is currently
         ignored.
       - Lower GRO efficiency and more ACK packets.
      
      Servers with a lot of short-lived connections suffer from this.
      
      Let's instead fill skbs as much as possible (64KB of payload), but
      split them at xmit time, when we have a precise idea of the flow
      rate.  The skb split is actually quite efficient.
      
      The patch looks bigger than necessary because the TCP Small Queues
      decision now has to take place after the eventual split.
      
      As Neal suggested, introduce a new tcp_tso_autosize() helper, so that
      tcp_tso_should_defer() can be synchronized on the same goal.
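
      An approximate sketch of the helper's idea (simplified from the
      patch; the constants and clamping differ in the real code):

          static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now)
          {
                  /* sk_pacing_rate is bytes/sec; >> 10 gives ~1ms of payload */
                  u32 bytes = sk->sk_pacing_rate >> 10;
                  u32 segs = max_t(u32, bytes / mss_now, 2);

                  return min_t(u32, segs, sk->sk_gso_max_segs);
          }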
      
      Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable
      contains the number of MSS-sized segments we can put in a GSO packet,
      and is not related to the autosizing goal anymore.
      
      Tested:
      
      40 ms rtt link
      
      nstat >/dev/null
      netperf -H remote -l -2000000 -- -s 1000000
      nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"
      
      Before patch:
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/s
      
       87380 2000000 2000000    0.36         44.22
      IpInReceives                    600                0.0
      IpOutRequests                   599                0.0
      TcpOutSegs                      1397               0.0
      IpExtOutOctets                  2033249            0.0
      
      After patch:
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380 2000000 2000000    0.36       44.27
      IpInReceives                    221                0.0
      IpOutRequests                   232                0.0
      TcpOutSegs                      1397               0.0
      IpExtOutOctets                  2013953            0.0
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skb_copy_datagram_iovec() can die · d3a9632f
      Al Viro authored
      
      no callers other than itself.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives · e5a4b0bb
      Al Viro authored
      
      ... making both non-draining.  That means that tcp_recvmsg() becomes
      non-draining.  And _that_ would break iscsit_do_rx_data() unless we
          a) make sure tcp_recvmsg() is uniformly non-draining (it is)
          b) make sure it copes with arbitrary (including shifted)
             iov_iter (it does, all it uses is iov_iter primitives)
          c) make iscsit_do_rx_data() initialize ->msg_iter only once.

      Fortunately, (c) is doable with minimal work, and we are rid of one
      of the two places where kernel send/recvmsg users would be unhappy
      with non-draining behaviour.
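
      For reference, the converted memcpy_to_msg() reduces to a thin
      wrapper over the iov_iter primitive, which advances msg->msg_iter
      as it copies rather than draining the original iovec:

          static inline int memcpy_to_msg(struct msghdr *msg, void *data, int len)
          {
                  return copy_to_iter(data, len, &msg->msg_iter) == len ? 0 : -EFAULT;
          }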
      
      Actually, that makes all but one of the ->recvmsg() instances
      iov_iter-clean.  The exception is skcipher_recvmsg(), and it also
      isn't hard to convert to primitives (iov_iter_get_pages() is needed
      there).  That'll wait a bit - there's some interplay with the
      ->sendmsg() path for that one.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • first fruits - kill l2cap ->memcpy_fromiovec() · 17836394
      Al Viro authored
      
      Just use copy_from_iter().  That's what this method is trying to do
      in all cases, in a very convoluted fashion.
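
      A sketch of the conversion pattern, with illustrative names (skb and
      count stand in for whatever the call site has at hand):

          if (copy_from_iter(skb_put(skb, count), count, &msg->msg_iter) != count)
                  return -EFAULT;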
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>