- Dec 17, 2014
-
Eliad Peller authored
If there are no interfaces up, there is no reason to continue the reconfig flow. The current code might end up calling driver callbacks (e.g. resume(), reconfig_complete()) while the driver is already stopped.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
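
A minimal sketch of the early-out this implies (assuming, as in mac80211, that local->open_count tracks running interfaces; the surrounding structure is illustrative):

static int reconfig_sketch(struct ieee80211_local *local)
{
	/* nothing is up: skip hw restart entirely, so the (already
	 * stopped) driver never sees resume()/reconfig_complete() */
	if (!local->open_count)
		goto wake_up;

	/* ... restart hardware, reconfigure interfaces ... */

wake_up:
	/* ... requeue pending work, wake queues ... */
	return 0;
}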
-
Luciano Coelho authored
The ht_oper variable is assigned a value, but never used in ieee80211_parse_ch_switch_ie(). Remove it.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Eliad Peller authored
Commit e1a0c6b3 ("mac80211: stop toggling IEEE80211_HT_CAP_SUP_WIDTH_20_40") mistakenly removed the actual update of sta->sta.bandwidth. Refactor ieee80211_sta_cur_vht_bw() into multiple functions (calculate the capability-based and chandef-based bandwidths separately, and take the minimum of each with cur_max_bandwidth). On an HT channel-width action frame, set only cur_max_bandwidth (according to the STA's capabilities) and recalculate the STA bandwidth.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
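
The shape of the refactor, as a sketch (the two helpers are illustrative names, not the actual functions; cur_max_bandwidth is the real sta_info field):

static enum ieee80211_sta_rx_bandwidth
sta_cur_bw_sketch(struct sta_info *sta)
{
	enum ieee80211_sta_rx_bandwidth bw;

	bw = sta_caps_bw(sta);			/* illustrative helper */
	bw = min(bw, sta->cur_max_bandwidth);	/* per-peer override */
	bw = min(bw, sta_chandef_bw(sta));	/* illustrative helper */
	return bw;
}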
-
Moshe Benji authored
In beacons, handle the Country IE even if no Power Constraint IE is present and, capability-wise, also when the Radio Measurements capability is enabled. In cases where the Country IE should be handled and the Power Constraint IE is not present, the Country IE alone will set the power limit (rather than both the Country and Power Constraint IEs together).
Signed-off-by: Moshe Benji <moshe.benji@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Chaya Rachel Ivgi authored
HT override configurations were ignored when choosing the channel (until now, the override configuration affected only the capabilities shown in the IEs). The override configurations are received only at association time, so in this case we should determine the channel again.
Signed-off-by: Chaya Rachel Ivgi <chaya.rachel.ivgi@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Arik Nemtsov authored
The custom-reg handling function can currently only add flags to a given channel. This results in stale flags being left applied. In some cases a channel was disabled and even the orig_flags were changed to reflect this. Previously the API was designed for a single invocation before wiphy registration, so this didn't matter. The previous approach doesn't scale well to self-managed regulatory devices, particularly when a more permissive regdom is applied after a restrictive one.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Arik Nemtsov authored
If a device has self-managed regulatory, insist on returning the wiphy-specific regdomain if a wiphy-idx is specified. The global regdomain is meaningless for such devices. Also add an attribute for self-managed devices, so usermode can distinguish them as such.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Jonathan Doron authored
Add a new regulatory flag that allows a driver to manage regdomain changes/updates for its own wiphy. A self-managed wiphy employs only regulatory information obtained from the FW and driver, and does not use other cfg80211 sources like beacon-hints, country-code IEs and hints from other devices on the same system. Conversely, a self-managed wiphy does not share its regulatory hints with other devices in the system. If a system contains several devices, one or more of which are self-managed, there might be contradictory regulatory settings between them. Usage of this flag is generally discouraged. Only use it if the FW/driver is incompatible with non-locally originated hints. A new API lets the driver send a complete regdomain, to be applied on its wiphy only. After a wiphy-specific regdomain change takes place, usermode will get a new type of change notification. The regulatory core also takes care to enforce regulatory restrictions, in case some interfaces are on forbidden channels.
Signed-off-by: Jonathan Doron <jonathanx.doron@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
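
A sketch of how a driver might use this (flag and function names are the ones this series introduces; error handling elided):

static int apply_fw_regd_sketch(struct wiphy *wiphy,
				struct ieee80211_regdomain *fw_regd)
{
	/* opt out of cfg80211's hint sources; set before wiphy_register() */
	wiphy->regulatory_flags |= REGULATORY_WIPHY_SELF_MANAGED;

	/* push the FW-provided regdomain to this wiphy only */
	return regulatory_set_wiphy_regd(wiphy, fw_regd);
}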
-
Arik Nemtsov authored
If a wiphy-idx is specified, the kernel will return the wiphy-specific regdomain, if such exists. Otherwise return the global regdom. When no wiphy-idx is specified, return the global regdomain as well as all wiphy-specific regulatory domains in the system, via a new nested list of attributes. Add a new attribute for each wiphy-specific regdomain, for usermode to identify it as such.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Nishikawa, Kenzoh authored
Instead of sending peer candidate events just once, send them as long as the peer remains in the LISTEN state in the peering state machine, when userspace is implementing the peering manager. Userspace may silence the events from a peer by progressing the state machine or by setting the link state to BLOCKED. This fixes the problem that a mesh peering process won't be fired again after the first peering attempt fails (due to, e.g., an air-propagation error) when the peering is managed by user space such as wpa_supplicant. This patch works with another patch for wpa_supplicant, described at the link below, which fires a peering process again when triggered by the notification from the kernel.
http://lists.shmoo.com/pipermail/hostap/2014-November/031235.html
Signed-off-by: Kenzoh Nishikawa <Kenzoh.Nishikawa@jp.sony.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Luciano Coelho authored
The call to cfg80211_ch_switch_notify() should be at the end of the ieee80211_chswitch_post_beacon() function, because it should only be sent if everything succeeded.
Fixes: d04b5ac9 ("cfg80211/mac80211: allow any interface to send channel switch notifications")
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Janusz Dziedzic authored
When using IBSS in HT mode, we always get NSS=1 in the rc_update callback. Force NSS recalculation when rates are updated, and notify the driver that the NSS changed.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- Dec 15, 2014
-
Johannes Berg authored
In order to let drivers have more dynamic U-APSD support, move the enablement flag to the virtual interface driver flags. This lets drivers not only set it up differently for different interfaces, but also enable/disable on the fly if needed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
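
Driver-side, this enables roughly the following (a sketch; the flag name is the mac80211 one, the station-only condition is illustrative):

static int add_interface_sketch(struct ieee80211_hw *hw,
				struct ieee80211_vif *vif)
{
	/* per-vif instead of a global hw flag; can also be cleared
	 * later to disable U-APSD on the fly */
	if (vif->type == NL80211_IFTYPE_STATION)
		vif->driver_flags |= IEEE80211_VIF_SUPPORTS_UAPSD;

	return 0;
}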
-
Johannes Berg authored
The power level might have been set, but as the interface was idle it might not have taken effect yet. Ask the driver to check the power level when starting up an AP so that in this case the correct power level is used in case the device/driver can only set it when the interface is actually active.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- Dec 12, 2014
-
Sujith Manoharan authored
Since multicast frames are marked as no-ack, using IEEE80211_TX_STAT_ACK to check if they have been successfully transmitted by the driver is incorrect, because a driver can choose to ignore transmission status for no-ack frames. This results in incorrect accounting for such frames. To fix this issue, this patch introduces a new flag that can be used by drivers to indicate error-free transmission of no-ack frames.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
[add a note about not setting the flag for non-no-ack frames]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
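
Driver-side reporting could then look roughly like this (a sketch; per the note above, the new flag must not be set for frames that do expect an ack):

static void report_tx_status_sketch(struct ieee80211_hw *hw,
				    struct sk_buff *skb, bool xmit_ok)
{
	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);

	/* only valid for no-ack frames */
	if (xmit_ok && (info->flags & IEEE80211_TX_CTL_NO_ACK))
		info->flags |= IEEE80211_TX_STAT_NOACK_TRANSMITTED;

	ieee80211_tx_status(hw, skb);
}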
-
Sujith Manoharan authored
Move IEEE80211_TX_CTL_PS_RESPONSE to info->control.flags since this is used only in the TX path (by ath9k). This frees up a bit which can be used for other purposes.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Vadim Kochan authored
It allows the user application to identify the wlan kind of a device, e.g.:

# ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
2: enp0s25: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff promiscuity 0
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff promiscuity 0
    wlan

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
[make wireless_link_ops const]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
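
Conceptually, exposing the kind takes little more than a named rtnl_link_ops attached to the netdev (a hypothetical sketch; registration details omitted):

static const struct rtnl_link_ops wireless_link_ops_sketch = {
	.kind = "wlan",		/* what `ip -d link` prints */
};

static void attach_kind_sketch(struct net_device *dev)
{
	dev->rtnl_link_ops = &wireless_link_ops_sketch;
}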
-
Johannes Berg authored
The code assigns a constant value (a pointer to a static variable) to an RCU pointer, which results in a sparse warning:
reg.c:112:10: warning: cast adds address space to expression (<asn:4>)
Suppress this warning by using __force.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- Dec 11, 2014
-
Alexei Starovoitov authored
0day robot reported the following crash:
[ 21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[ 21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2
It's due to bpf_prog_get() returning ERR_PTR. Check it properly.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gu Zheng authored
Introduce the helper macro for_each_cmsghdr as a wrapper for enumerating the cmsghdr entries of a msghdr; just a cleanup.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
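
The wrapper is essentially this (a sketch matching the description; the walker function is illustrative):

#define for_each_cmsghdr(cmsg, msg) \
	for (cmsg = CMSG_FIRSTHDR(msg); \
	     cmsg; \
	     cmsg = CMSG_NXTHDR(msg, cmsg))

/* typical use when walking ancillary data */
static void walk_cmsgs_sketch(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	for_each_cmsghdr(cmsg, msg) {
		if (!CMSG_OK(msg, cmsg))
			return;
		/* dispatch on cmsg->cmsg_level / cmsg->cmsg_type */
	}
}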
-
Johannes Weiner authored
Memory is internally accounted in bytes, using spinlock-protected 64-bit counters, even though the smallest accounting delta is a page. The counter interface is also convoluted and does too many things. Introduce a new lockless word-sized page counter API, then change all memory accounting over to it. The translation from and to bytes then only happens when interfacing with userspace. The removed locking overhead is noticeable when scaling beyond the per-cpu charge caches - on a 4-socket machine with 144 threads, the following test shows the performance differences of 288 memcgs concurrently running a page fault benchmark:

vanilla:

   18631648.500498      task-clock (msec)         # 140.643 CPUs utilized            ( +- 0.33% )
         1,380,638      context-switches          # 0.074 K/sec                      ( +- 0.75% )
            24,390      cpu-migrations            # 0.001 K/sec                      ( +- 8.44% )
     1,843,305,768      page-faults               # 0.099 M/sec                      ( +- 0.00% )
50,134,994,088,218      cycles                    # 2.691 GHz                        ( +- 0.33% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 8,049,712,224,651      instructions              # 0.16  insns per cycle            ( +- 0.04% )
 1,586,970,584,979      branches                  # 85.176 M/sec                     ( +- 0.05% )
     1,724,989,949      branch-misses             # 0.11% of all branches            ( +- 0.48% )

     132.474343877 seconds time elapsed           ( +- 0.21% )

lockless:

   12195979.037525      task-clock (msec)         # 133.480 CPUs utilized            ( +- 0.18% )
           832,850      context-switches          # 0.068 K/sec                      ( +- 0.54% )
            15,624      cpu-migrations            # 0.001 K/sec                      ( +- 10.17% )
     1,843,304,774      page-faults               # 0.151 M/sec                      ( +- 0.00% )
32,811,216,801,141      cycles                    # 2.690 GHz                        ( +- 0.18% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 9,999,265,091,727      instructions              # 0.30  insns per cycle            ( +- 0.10% )
 2,076,759,325,203      branches                  # 170.282 M/sec                    ( +- 0.12% )
     1,656,917,214      branch-misses             # 0.08% of all branches            ( +- 0.55% )

      91.369330729 seconds time elapsed           ( +- 0.45% )

On top of improved scalability, this also gets rid of the icky long long types in the very heart of memcg, which is great for 32 bit and also makes the code a lot more readable.

Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become page_counter_try_charge() and page_counter_charge() resp. to match the more common kernel naming scheme of try_do()/do()
- res_counter_uncharge_until() is only ever used to cancel a local counter and never to uncharge bigger segments of a hierarchy, so it's replaced by the simpler page_counter_cancel()
- res_counter_set_limit() is replaced by page_counter_limit(), which expects its callers to serialize against themselves
- res_counter_memparse_write_strategy() is replaced by page_counter_memparse(), which rounds down to the nearest page size - rather than up. This is more reasonable for explicitly requested hard upper limits.
- to keep charging light-weight, page_counter_try_charge() charges speculatively, only to roll back if the result exceeds the limit. Because of this, a failing bigger charge can temporarily lock out smaller charges that would otherwise succeed. The error is bounded to the difference between the smallest and the biggest possible charge size, so for memcg, this means that a failing THP charge can send base page charges into reclaim up to 2MB (4MB) before the limit would have been reached. This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
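
The speculative charge/roll-back scheme described above, as a standalone sketch (not the exact kernel API):

struct page_counter_sketch {
	atomic_long_t count;
	unsigned long limit;
};

static bool try_charge_sketch(struct page_counter_sketch *c,
			      unsigned long nr_pages)
{
	/* charge first, without taking any lock */
	long new = atomic_long_add_return(nr_pages, &c->count);

	if (new > (long)c->limit) {
		atomic_long_sub(nr_pages, &c->count);	/* roll back */
		return false;	/* caller reclaims and retries */
	}
	return true;
}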
-
- Dec 10, 2014
-
Joe Perches authored
Making things const is a good thing. (x86-64 defconfig with all irda)

$ size net/irda/built-in.o*
   text    data     bss     dec     hex filename
 109276    1868     244  111388   1b31c net/irda/built-in.o.new
 108828    2316     244  111388   1b31c net/irda/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
It's better when function pointer arrays aren't modifiable. Net change:

$ size net/llc/built-in.o.*
   text    data     bss     dec     hex filename
  61193   12758    1344   75295   1261f net/llc/built-in.o.new
  47113   27030    1344   75487   126df net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
It's better when function pointer arrays aren't modifiable. Net change from original:

$ size net/llc/built-in.o.*
   text    data     bss     dec     hex filename
  61065   12886    1344   75295   1261f net/llc/built-in.o.new
  47113   27030    1344   75487   126df net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
It's better when function pointer arrays aren't modifiable.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
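
The pattern across these three patches, generically (a sketch): marking the table const moves it from writable .data into .rodata, which is where the text/data shifts in the size output above come from.

struct sk_buff;

struct pkt_handler_sketch {
	int (*rcv)(struct sk_buff *skb);
};

/* before: static struct pkt_handler_sketch handlers[2];  -> .data */
static const struct pkt_handler_sketch handlers[2] = {	/* -> .rodata */
	{ .rcv = NULL },	/* fill in real handlers */
	{ .rcv = NULL },
};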
-
Daniel Borkmann authored
This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32 accelerated flow hash if available"), and other remaining arch_fast_hash() users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.") where it has been used as a hash function for bloom filtering. While we think that these users are actually not much of a concern, it has been requested to remove the arch_fast_hash() library bits that arose from [1] entirely, as per recent discussion [2]. The main argument is that using it as a hash may introduce bias due to its linearity (see the avalanche criterion) and thus makes it less clear (though we tried to document that) when this security/performance trade-off is actually acceptable for a general-purpose library function. Let's therefore avoid any further confusion on this matter and remove it, to prevent any future accidental misuse. For the time being, this is going to make hashing of flow keys a bit more expensive in the ovs case, but future work could reevaluate a different hashing discipline.

[1] https://patchwork.ozlabs.org/patch/299369/
[2] https://patchwork.ozlabs.org/patch/418756/

Cc: Neil Brown <neilb@suse.de>
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
For netlink, we shouldn't be using arch_fast_hash() as a hashing discipline, but rather jhash() instead. Since netlink sockets can be opened by any user, a local attacker would be able to easily create collisions with the DPDK-derived arch_fast_hash(), which trades off security for performance by using crc32 CPU instructions on x86_64. While it might have a legitimate use case in other places, it should be avoided in netlink context. As rhashtable's API is very flexible, we could later on still decide on other hashing disciplines, if legitimate.
Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
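
The safer discipline in miniature (a sketch): jhash is keyed and non-linear, so an attacker cannot precompute collisions without the per-table seed.

#include <linux/jhash.h>

static u32 portid_hash_sketch(u32 portid, u32 seed)
{
	/* seed comes from e.g. get_random_bytes() at table init */
	return jhash(&portid, sizeof(portid), seed);
}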
-
Richard Alpe authored
Commit 908344cd ("tipc: fix bug in multicast congestion handling") introduced a race in the broadcast link wakeup functionality. This patch eliminates the broadcast link wakeup race caused by operating on the wakeup list without proper locking. If this race were hit and corrupted the list, all subsequent wakeup messages would be lost, resulting in a considerable memory leak.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This change pulls the core functionality out of __netdev_alloc_skb and places it in a new function named __alloc_rx_skb. The reason for doing this is to make these bits accessible to a new function, __napi_alloc_skb. In addition, __alloc_rx_skb now has a new flags value that is used to determine which page frag pool to allocate from. If the SKB_ALLOC_NAPI flag is set, then the NAPI pool is used. The advantage of this is that we do not have to use local_irq_save/restore when accessing the NAPI pool from NAPI context. In my test setup I saw at least 11ns of savings using the napi_alloc_skb function versus the netdev_alloc_skb function, most of this being due to the fact that we didn't have to call local_irq_save/restore. The main use case for napi_alloc_skb would be for things such as copybreak or page-fragment-based receive paths where an skb is allocated after the data has been received instead of before.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
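
A sketch of the intended call site - a copybreak-style receive path running in NAPI context:

static struct sk_buff *rx_copybreak_sketch(struct napi_struct *napi,
					   const void *data, unsigned int len)
{
	/* no local_irq_save/restore needed: we are in softirq context */
	struct sk_buff *skb = napi_alloc_skb(napi, len);

	if (!skb)
		return NULL;

	memcpy(skb_put(skb, len), data, len);	/* copy received frame */
	return skb;
}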
-
Alexander Duyck authored
This patch splits the netdev_alloc_frag function up so that it can be used on one of two page frag pools instead of being fixed on the netdev_alloc_cache. By doing this we can add a NAPI-specific function, __napi_alloc_frag, that accesses a pool that is only used from softirq context. The advantage to this is that we do not need to call local_irq_save/restore, which can be a significant savings. I also took the opportunity to refactor the core bits that were placed in __alloc_page_frag. First I updated the allocation to do either a 32K allocation or an order-0 page. This is based on the changes in commit d9b2938a, where it was found that latencies could be reduced in case of failures. Then I also rewrote the logic to work from the end of the page to the start. By doing this the size value doesn't have to be used unless we have run out of space for page fragments. Finally I cleaned up the atomic bits so that we just do an atomic_sub_and_test, and if that returns true then we set the page->_count via an atomic_set. This way we can remove the extra conditional for the atomic_read, since it would have led to an atomic_inc in the case of success anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
To cancel nesting, this function is more convenient.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
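
Typical use of nla_nest_cancel() - unwinding a partially-filled nest on error (a generic sketch; attribute types are placeholders):

static int fill_nested_sketch(struct sk_buff *skb)
{
	struct nlattr *nest = nla_nest_start(skb, 1 /* placeholder type */);

	if (!nest)
		return -EMSGSIZE;

	if (nla_put_u32(skb, 2 /* placeholder type */, 42)) {
		/* trims the message back to before the nest */
		nla_nest_cancel(skb, nest);
		return -EMSGSIZE;
	}

	nla_nest_end(skb, nest);
	return 0;
}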
-
Valdis Kletnieks authored
commit 46e5da40 (net: qdisc: use rcu prefix and silence sparse warnings) triggers a spurious warning:
net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage!
The code should be using the _bh variant of rcu_dereference.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
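
The fix pattern, as a sketch (struct names illustrative):

struct foo;

struct holder_sketch {
	struct foo __rcu *ptr;
};

static struct foo *deref_in_softirq_sketch(struct holder_sketch *h)
{
	/* the _bh variant: lockdep checks for rcu_read_lock_bh()/softirq
	 * context instead of rcu_read_lock(), silencing the warning */
	return rcu_dereference_bh(h->ptr);
}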
-
Eric Dumazet authored
When I cooked commit c3658e8d ("tcp: fix possible NULL dereference in tcp_vX_send_reset()"), I missed other spots where we could dereference a NULL skb_dst(skb). Again, if a socket is provided, we do not need skb_dst() to get a pointer to the network namespace: sock_net(sk) is good enough.
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
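
The idea in miniature (a sketch):

/* prefer the socket to reach the netns; only fall back to the dst,
 * which may legitimately be NULL on these reset/ack paths */
static struct net *pkt_net_sketch(const struct sock *sk,
				  const struct sk_buff *skb)
{
	if (sk)
		return sock_net(sk);

	return dev_net(skb_dst(skb)->dev);	/* only safe if dst is set */
}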
-
- Dec 09, 2014
-
Ying Xue authored
Commit fb9962f3 ("tipc: ensure all name sequences are properly protected with its lock") introduced the following error:
net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock'
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Remove use of 'swdev' mode in rocker. rocker dev offloads can use the BRIDGE_FLAGS_SELF to indicate offload to hardware.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Li RongQing authored
The queue length of sd->input_pkt_queue has already been stored in qlen, and it cannot change while we hold the lock, so there is no need to read it again.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Commit 95bd09eb ("tcp: TSO packets automatic sizing") tried to control TSO size, but did this at the wrong place (sendmsg() time). At sendmsg() time, we might have a pessimistic view of flow rate, and we end up building very small skbs (with 2 MSS per skb). This is bad because:
- It sends small TSO packets even in Slow Start where rate quickly increases.
- It tends to make the socket write queue very big, increasing tcp_ack() processing time, but also increasing memory needs, not necessarily accounted for, as fast clones overhead is currently ignored.
- Lower GRO efficiency and more ACK packets.
Servers with a lot of small-lived connections suffer from this. Let's instead fill skbs as much as possible (64KB of payload), but split them at xmit time, when we have a precise idea of the flow rate. skb split is actually quite efficient. The patch looks bigger than necessary, because the TCP Small Queue decision now has to take place after the eventual split. As Neal suggested, introduce a new tcp_tso_autosize() helper, so that tcp_tso_should_defer() can be synchronized on the same goal. Rename tp->xmit_size_goal_segs to tp->gso_segs, as this variable contains the number of mss that we can put in a GSO packet, and is not related to the autosizing goal anymore.

Tested: 40 ms rtt link

nstat >/dev/null
netperf -H remote -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"

Before patch:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/s

 87380 2000000 2000000    0.36      44.22
IpInReceives                    600                0.0
IpOutRequests                   599                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2033249            0.0

After patch:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380 2000000 2000000    0.36      44.27
IpInReceives                    221                0.0
IpOutRequests                   232                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2013953            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
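
The helper's goal in a simplified sketch (constants illustrative; the real code also clamps against gso_max_size and a sysctl minimum):

/* size GSO bursts to roughly 1ms of payload at the current pacing
 * rate, never less than 2 MSS */
static u32 tso_autosize_sketch(u64 pacing_rate, u32 mss_now)
{
	u64 bytes = pacing_rate >> 10;		/* ~1/1024 sec of data */

	return max_t(u32, (u32)(bytes / mss_now), 2);
}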
-
Al Viro authored
no callers other than itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... making both non-draining. That means that tcp_recvmsg() becomes non-draining. And _that_ would break iscsit_do_rx_data() unless we a) make sure tcp_recvmsg() is uniformly non-draining (it is), b) make sure it copes with arbitrary (including shifted) iov_iter (it does, all it uses is iov_iter primitives), and c) make iscsit_do_rx_data() initialize ->msg_iter only once. Fortunately, (c) is doable with minimal work, and we are rid of one of the two places where kernel send/recvmsg users would be unhappy with non-draining behaviour. Actually, that makes all but one of the ->recvmsg() instances iov_iter-clean. The exception is skcipher_recvmsg(), and it also isn't hard to convert to primitives (iov_iter_get_pages() is needed there). That'll wait a bit - there's some interplay with the ->sendmsg() path for that one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Just use copy_from_iter(). That's what this method is trying to do in all cases, in a very convoluted fashion.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
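
What the convoluted per-method copying collapses into (a sketch):

static int copy_payload_sketch(void *dst, size_t len, struct msghdr *msg)
{
	/* copy_from_iter() advances msg->msg_iter and handles any iovec
	 * shape, including shifted/partially-consumed iterators */
	if (copy_from_iter(dst, len, &msg->msg_iter) != len)
		return -EFAULT;

	return 0;
}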
-