commit cbc890891ddaf0240ad669dd9f0e48c599ff3d63 Author: Greg Kroah-Hartman Date: Tue Sep 29 19:26:41 2015 +0200 Linux 4.1.9 commit c3a0355bddf35457563a0d64e91f9dfca7c80280 Author: Daniel Axtens Date: Fri Aug 14 17:41:24 2015 +1000 cxl: Don't remove AFUs/vPHBs in cxl_reset commit 4e1efb403c1c016ae831bd9988a7d2e5e0af41a0 upstream. If the driver doesn't participate in EEH, the AFUs will be removed by cxl_remove, which will be invoked by EEH. If the driver does particpate in EEH, the vPHB needs to stick around so that the it can particpate. In both cases, we shouldn't remove the AFU/vPHB. Reviewed-by: Cyril Bur Signed-off-by: Daniel Axtens Signed-off-by: Michael Ellerman Reported-by: Guenter Roeck Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit e55ffaf457bcc8ec4e9d9f56f955971f834d65b3 Author: Andy Whitcroft Date: Thu Aug 13 20:49:01 2015 +0100 ipv4: off-by-one in continuation handling in /proc/net/route [ Upstream commit 25b97c016b26039982daaa2c11d83979f93b71ab ] When generating /proc/net/route we emit a header followed by a line for each route. When a short read is performed we will restart this process based on the open file descriptor. When calculating the start point we fail to take into account that the 0th entry is the header. This leads us to skip the first entry when doing a continuation read. This can be easily seen with the comparison below: while read l; do echo "$l"; done A cat /proc/net/route >B diff -bu A B | grep '^[+-]' On my example machine I have approximatly 10KB of route output. There we see the very first non-title element is lost in the while read case, and an entry around the 8K mark in the cat case: +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0 -tun1 00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0 Fix up the off-by-one when reaquiring position on continuation. Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") BugLink: http://bugs.launchpad.net/bugs/1483440 Acked-by: Alexander Duyck Signed-off-by: Andy Whitcroft Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b21ee342590aa41e21aa0196bff5af592cc349d0 Author: Florian Fainelli Date: Sat Aug 8 12:58:57 2015 -0700 net: dsa: Do not override PHY interface if already configured [ Upstream commit 211c504a444710b1d8ce3431ac19f2578602ca27 ] In case we need to divert reads/writes using the slave MII bus, we may have already fetched a valid PHY interface property from Device Tree, and that mode is used by the PHY driver to make configuration decisions. If we could not fetch the "phy-mode" property, we will assign p->phy_interface to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as to whether or not we should override the interface value. Fixes: 19334920eaf7 ("net: dsa: Set valid phy interface type") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0c1122ae6107b01e50bb18fa40eb44e7fa492fbc Author: Eric Dumazet Date: Mon Aug 10 09:09:13 2015 -0700 inet: fix races with reqsk timers [ Upstream commit 2235f2ac75fd2501c251b0b699a9632e80239a6d ] reqsk_queue_destroy() and reqsk_queue_unlink() should use del_timer_sync() instead of del_timer() before calling reqsk_put(), otherwise we could free a req still used by another cpu. But before doing so, reqsk_queue_destroy() must release syn_wait_lock spinlock or risk a dead lock, as reqsk_timer_handler() might need to take this same spinlock from reqsk_queue_unlink() (called from inet_csk_reqsk_queue_drop()) Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d36f8434da8c333aa3837cf421a52f3835642759 Author: Eric Dumazet Date: Mon Aug 10 15:07:34 2015 -0700 inet: fix possible request socket leak [ Upstream commit 3257d8b12f954c462d29de6201664a846328a522 ] In commit b357a364c57c9 ("inet: fix possible panic in reqsk_queue_unlink()"), I missed fact that tcp_check_req() can return the listener socket in one case, and that we must release the request socket refcount or we leak it. Tested: Following packetdrill test template shows the issue 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 2920 +0 > S. 0:0(0) ack 1 +.002 < . 1:1(0) ack 21 win 2920 +0 > R 21:21(0) Fixes: b357a364c57c9 ("inet: fix possible panic in reqsk_queue_unlink()") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d397617f757558a18f491e41b5f95a5bcf446d70 Author: Daniel Borkmann Date: Fri Aug 7 00:26:41 2015 +0200 netlink: make sure -EBUSY won't escape from netlink_insert [ Upstream commit 4e7c1330689e27556de407d3fdadc65ffff5eb12 ] Linus reports the following deadlock on rtnl_mutex; triggered only once so far (extract): [12236.694209] NetworkManager D 0000000000013b80 0 1047 1 0x00000000 [12236.694218] ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018 [12236.694224] ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff [12236.694230] ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80 [12236.694235] Call Trace: [12236.694250] [] ? schedule_preempt_disabled+0x9/0x10 [12236.694257] [] ? schedule+0x2a/0x70 [12236.694263] [] ? schedule_preempt_disabled+0x9/0x10 [12236.694271] [] ? __mutex_lock_slowpath+0x7f/0xf0 [12236.694280] [] ? mutex_lock+0x16/0x30 [12236.694291] [] ? rtnetlink_rcv+0x10/0x30 [12236.694299] [] ? netlink_unicast+0xfb/0x180 [12236.694309] [] ? rtnl_getlink+0x113/0x190 [12236.694319] [] ? rtnetlink_rcv_msg+0x7a/0x210 [12236.694331] [] ? sock_has_perm+0x5c/0x70 [12236.694339] [] ? rtnetlink_rcv+0x30/0x30 [12236.694346] [] ? netlink_rcv_skb+0x9c/0xc0 [12236.694354] [] ? rtnetlink_rcv+0x1f/0x30 [12236.694360] [] ? netlink_unicast+0xfb/0x180 [12236.694367] [] ? netlink_sendmsg+0x484/0x5d0 [12236.694376] [] ? __wake_up+0x2f/0x50 [12236.694387] [] ? sock_sendmsg+0x33/0x40 [12236.694396] [] ? ___sys_sendmsg+0x22e/0x240 [12236.694405] [] ? ___sys_recvmsg+0x135/0x1a0 [12236.694415] [] ? eventfd_write+0x82/0x210 [12236.694423] [] ? fsnotify+0x32e/0x4c0 [12236.694429] [] ? wake_up_q+0x60/0x60 [12236.694434] [] ? __sys_sendmsg+0x39/0x70 [12236.694440] [] ? entry_SYSCALL_64_fastpath+0x12/0x6a It seems so far plausible that the recursive call into rtnetlink_rcv() looks suspicious. One way, where this could trigger is that the senders NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so the rtnl_getlink() request's answer would be sent to the kernel instead to the actual user process, thus grabbing rtnl_mutex() twice. One theory would be that netlink_autobind() triggered via netlink_sendmsg() internally overwrites the -EBUSY error to 0, but where it is wrongly originating from __netlink_insert() instead. That would reset the socket's portid to 0, which is then filled into NETLINK_CB(skb).portid later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.") also puts it, -EBUSY should not be propagated from netlink_insert(). It looks like it's very unlikely to reproduce. We need to trigger the rhashtable_insert_rehash() handler under a situation where rehashing currently occurs (one /rare/ way would be to hit ht->elasticity limits while not filled enough to expand the hashtable, but that would rather require a specifically crafted bind() sequence with knowledge about destination slots, seems unlikely). It probably makes sense to guard __netlink_insert() in any case and remap that error. It was suggested that EOVERFLOW might be better than an already overloaded ENOMEM. Reference: http://thread.gmane.org/gmane.linux.network/372676 Reported-by: Linus Torvalds Signed-off-by: Daniel Borkmann Acked-by: Herbert Xu Acked-by: Thomas Graf Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1d79bc603aec343479bc56bc770b8b13c9660523 Author: Ivan Vecera Date: Thu Aug 6 22:48:23 2015 +0200 bna: fix interrupts storm caused by erroneous packets [ Upstream commit ade4dc3e616e33c80d7e62855fe1b6f9895bc7c3 ] The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter increment from the beginning of the NAPI processing loop after the check for erroneous packets so they are never accounted. This counter is used to inform firmware about number of processed completions (packets). As these packets are never acked the firmware fires IRQs for them again and again. Fixes: e29aa33 ("bna: Enable Multi Buffer RX") Signed-off-by: Ivan Vecera Acked-by: Rasesh Mody Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b291dba31d0862c9262005ee7a8b38f2377494fa Author: Nikolay Aleksandrov Date: Tue Aug 4 19:06:33 2015 +0200 bridge: netlink: account for the IFLA_BRPORT_PROXYARP_WIFI attribute size and policy [ Upstream commit 786c2077ec8e9eab37a88fc14aac4309a8061e18 ] The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov Fixes: 842a9ae08a25 ("bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9000d23361befc59b04273c7b5ae1bfee2eaa4ee Author: Nikolay Aleksandrov Date: Tue Aug 4 19:06:32 2015 +0200 bridge: netlink: account for the IFLA_BRPORT_PROXYARP attribute size and policy [ Upstream commit 355b9f9df1f0311f20087350aee8ad96eedca8a9 ] The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 89b2791c0fd6938a1c2124589fe1b0f699d4b0d2 Author: Eric Dumazet Date: Sat Aug 1 12:14:33 2015 +0200 udp: fix dst races with multicast early demux [ Upstream commit 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a ] Multicast dst are not cached. They carry DST_NOCACHE. As mentioned in commit f8864972126899 ("ipv4: fix dst race in sk_dst_get()"), these dst need special care before caching them into a socket. Caching them is allowed only if their refcnt was not 0, ie we must use atomic_inc_not_zero() Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned in commit d0c294c53a771 ("tcp: prevent fetching dst twice in early demux code") Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Tested-by: Gregory Hoggarth Signed-off-by: Eric Dumazet Reported-by: Gregory Hoggarth Reported-by: Alex Gartrell Cc: Michal Kubeček Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c1a6ce0483bbdd9cf3489ef9efc312482344f58c Author: Dan Carpenter Date: Sat Aug 1 15:33:26 2015 +0300 rds: fix an integer overflow test in rds_info_getsockopt() [ Upstream commit 468b732b6f76b138c0926eadf38ac88467dcd271 ] "len" is a signed integer. We check that len is not negative, so it goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than INT_MAX so the condition can never be true. I don't know if this is harmful but it seems safe to limit "len" to INT_MAX - 4095. Fixes: a8c879a7ee98 ('RDS: Info and stats') Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 715bb7ae43279e24a8d4d35a9ac8288b47a75500 Author: Ido Schimmel Date: Sun Aug 2 19:29:16 2015 +0200 rocker: free netdevice during netdevice removal [ Upstream commit 1ebd47efa4e17391dfac8caa349c6a8d35f996d1 ] When removing a port's netdevice in 'rocker_remove_ports', we should also free the allocated 'net_device' structure. Do that by calling 'free_netdev' after unregistering it. Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko Fixes: 4b8ac9660af ("rocker: introduce rocker switch driver") Acked-by: Scott Feldman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 563071d6899c56701a47e945c5f39673bfcb0d38 Author: Daniel Borkmann Date: Wed Jul 29 23:35:25 2015 +0200 net: sched: fix refcount imbalance in actions [ Upstream commit 28e6b67f0b292f557468c139085303b15f1a678f ] Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action. Test case 1: FOO="1,6 0 0 4294967295," BAR="1,6 0 0 4294967294," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \ action bpf bytecode "$FOO" tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 1 bind 1 tc actions replace action bpf bytecode "$BAR" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 1 ref 2 bind 1 tc actions replace action bpf bytecode "$FOO" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 3 bind 1 Test case 2: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 2 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 3 bind 1 What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases: 1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handeled. 2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case). In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error. Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler. One possible solution would be to add a flag thus we possibly trigger the -EPERM ony in situations where it is indeed relevant. After the patch, above test cases have correct reference count again. Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann Reviewed-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e04f76d6215119252fc7d0b8033d4529f931f9f9 Author: Daniel Borkmann Date: Wed Jul 29 18:40:56 2015 +0200 act_bpf: fix memory leaks when replacing bpf programs [ Upstream commit f4eaed28c7834fc049c754f63e6988bbd73778d9 ] We currently trigger multiple memory leaks when replacing bpf actions, besides others: comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s) hex dump (first 32 bytes): 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............ backtrace: [] kmemleak_alloc+0x4e/0xb0 [] __vmalloc_node_range+0x1bd/0x2c0 [] __vmalloc+0x4a/0x50 [] bpf_prog_alloc+0x3a/0xa0 [] bpf_prog_create+0x44/0xa0 [] tcf_bpf_init+0x28b/0x3c0 [act_bpf] [] tcf_action_init_1+0x191/0x1b0 [] tcf_action_init+0x82/0xf0 [] tcf_exts_validate+0xb2/0xc0 [] cls_bpf_modify_existing+0x98/0x340 [cls_bpf] [] cls_bpf_change+0x1a6/0x274 [cls_bpf] [] tc_ctl_tfilter+0x335/0x910 [] rtnetlink_rcv_msg+0x95/0x240 [] netlink_rcv_skb+0xaf/0xc0 [] rtnetlink_rcv+0x2e/0x40 [] netlink_unicast+0xef/0x1b0 Issue is that the old content from tcf_bpf is allocated and needs to be released when we replace it. We seem to do that since the beginning of act_bpf on the filter and insns, later on the name as well. Example test case, after patch: # FOO="1,6 0 0 4294967295," # BAR="1,6 0 0 4294967294," # tc actions add action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$BAR" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions del action bpf index 2 [...] # echo "scan" > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l 0 Fixes: d23b8ad8ab23 ("tc: add BPF based action") Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2ef3d434d30f074fc744f86a956239feea922b04 Author: Alexander Drozdov Date: Tue Jul 28 13:57:01 2015 +0300 packet: tpacket_snd(): fix signed/unsigned comparison [ Upstream commit dbd46ab412b8fb395f2b0ff6f6a7eec9df311550 ] tpacket_fill_skb() can return a negative value (-errno) which is stored in tp_len variable. In that case the following condition will be (but shouldn't be) true: tp_len > dev->mtu + dev->hard_header_len as dev->mtu and dev->hard_header_len are both unsigned. That may lead to just returning an incorrect EMSGSIZE errno to the user. Fixes: 52f1454f629fa ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Alexander Drozdov Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d339d03860f21b3f27a674312df09b6ce92c0c42 Author: Lars Westerhoff Date: Tue Jul 28 01:32:21 2015 +0300 packet: missing dev_put() in packet_do_bind() [ Upstream commit 158cd4af8dedbda0d612d448c724c715d0dda649 ] When binding a PF_PACKET socket, the use count of the bound interface is always increased with dev_hold in dev_get_by_{index,name}. However, when rebound with the same protocol and device as in the previous bind the use count of the interface was not decreased. Ultimately, this caused the deletion of the interface to fail with the following message: unregister_netdevice: waiting for dummy0 to become free. Usage count = 1 This patch moves the dev_put out of the conditional part that was only executed when either the protocol or device changed on a bind. Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases') Signed-off-by: Lars Westerhoff Signed-off-by: Dan Carpenter Reviewed-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d1f56d1041b423c2c9c767b03eaade32f5463efc Author: Alexander Duyck Date: Mon Jul 27 13:08:06 2015 -0700 fib_trie: Drop unnecessary calls to leaf_pull_suffix [ Upstream commit 1513069edcf8dd86cfd8d5daef482b97d6b93df6 ] It was reported that update_suffix was taking a long time on systems where a large number of leaves were attached to a single node. As it turns out fib_table_flush was calling update_suffix for each leaf that didn't have all of the aliases stripped from it. As a result, on this large node removing one leaf would result in us calling update_suffix for every other leaf on the node. The fix is to just remove the calls to leaf_pull_suffix since they are redundant as we already have a call in resize that will go through and update the suffix length for the node before we exit out of fib_table_flush or fib_table_flush_external. Reported-by: David Ahern Signed-off-by: Alexander Duyck Tested-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f75d70aa248dad6038eb5b49b5b816ce96e8ae7e Author: Jack Morgenstein Date: Wed Jul 22 16:53:47 2015 +0300 net/mlx4_core: Fix wrong index in propagating port change event to VFs [ Upstream commit 1c1bf34951e8d17941bf708d1901c47e81b15d55 ] The port-change event processing in procedure mlx4_eq_int() uses "slave" as the vf_oper array index. Since the value of "slave" is the PF function index, the result is that the PF link state is used for deciding to propagate the event for all the VFs. The VF link state should be used, so the VF function index should be used here. Fixes: 948e306d7d64 ('net/mlx4: Add VF link state support') Signed-off-by: Jack Morgenstein Signed-off-by: Matan Barak Signed-off-by: Or Gerlitz Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 90ec7452b5804da4e0a0041e25c4ee74e06578db Author: Nikolay Aleksandrov Date: Wed Jul 22 13:03:40 2015 +0200 bridge: netlink: fix slave_changelink/br_setport race conditions [ Upstream commit 963ad94853000ab100f5ff19eea80095660d41b4 ] Since slave_changelink support was added there have been a few race conditions when using br_setport() since some of the port functions it uses require the bridge lock. It is very easy to trigger a lockup due to some internal spin_lock() usage without bh disabled, also it's possible to get the bridge into an inconsistent state. Signed-off-by: Nikolay Aleksandrov Fixes: 3ac636b8591c ("bridge: implement rtnl_link_ops->slave_changelink") Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 51677b722338da3671ef19846616cf3811253760 Author: Michael S. Tsirkin Date: Wed Jul 15 15:26:19 2015 +0300 virtio_net: don't require ANY_LAYOUT with VERSION_1 [ Upstream commit 75993300d008f418ee2569a632185fc1d7d50674 ] ANY_LAYOUT is a compatibility feature. It's implied for VERSION_1 devices, and non-transitional devices might not offer it. Change code to behave accordingly. Signed-off-by: Michael S. Tsirkin Reviewed-by: Paolo Bonzini Reviewed-by: Stefan Hajnoczi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b265c3003196a76f9b88d54e098e982e7c55647c Author: Florian Westphal Date: Tue Jul 21 16:33:50 2015 +0200 netlink: don't hold mutex in rcu callback when releasing mmapd ring [ Upstream commit 0470eb99b4721586ccac954faac3fa4472da0845 ] Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char **argv) { unsigned int block_size = 16 * 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: [] dump_stack+0x4f/0x7b [] ___might_sleep+0x16d/0x270 [] __might_sleep+0x4d/0x90 [] mutex_lock_nested+0x2f/0x430 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [] ? __this_cpu_preempt_check+0x13/0x20 [] netlink_set_ring+0x1ed/0x350 [] ? netlink_undo_bind+0x70/0x70 [] netlink_sock_destruct+0x80/0x150 [] __sk_free+0x1d/0x160 [] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" Diagnosed-by: Cong Wang Suggested-by: Thomas Graf Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1d18abda904fa7d299239c6ba17b986a43f36b3c Author: Edward Hyunkoo Jee Date: Tue Jul 21 09:43:59 2015 +0200 inet: frags: fix defragmented packet's IP header for af_packet [ Upstream commit 0848f6428ba3a2e42db124d41ac6f548655735bf ] When ip_frag_queue() computes positions, it assumes that the passed sk_buff does not contain L2 headers. However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly functions can be called on outgoing packets that contain L2 headers. Also, IPv4 checksum is not corrected after reassembly. Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Jerry Chu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dcdd14ea0bd3e1fc668c22baf7d2992a3062e33a Author: Daniel Borkmann Date: Fri Jul 17 22:38:45 2015 +0200 sched: cls_flow: fix panic on filter replace [ Upstream commit 32b2f4b196b37695fdb42b31afcbc15399d6ef91 ] The following test case causes a NULL pointer dereference in cls_flow: tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ flow hash keys mark action drop To be more precise, actually two different panics are fixed, the first occurs because tcf_exts_init() is not called on the newly allocated filter when we do a replace. And the second panic uncovered after that happens since the arguments of list_replace_rcu() are swapped, the old element needs to be the first argument and the new element the second. Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU") Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9238d30c469f797598ef6eb616c68fabe67532e6 Author: Daniel Borkmann Date: Fri Jul 17 22:38:43 2015 +0200 sched: cls_bpf: fix panic on filter replace [ Upstream commit f6bfc46da6292b630ba389592123f0dd02066172 ] The following test case causes a NULL pointer dereference in cls_bpf: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ bpf bytecode "$FOO" flowid 1:1 action drop The problem is that commit 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") accidentally swapped the arguments of list_replace_rcu(), the old element needs to be the first argument and the new element the second. Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 68c2a15f7246e095707471935ac6281386493a7f Author: dingtianhong Date: Thu Jul 16 16:30:02 2015 +0800 bonding: correct the MAC address for "follow" fail_over_mac policy [ Upstream commit a951bc1e6ba58f11df5ed5ddc41311e10f5fd20b ] The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong Tested-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a1f2fcd2d6adc73a0d40247573266d3c668b3b52 Author: Herbert Xu Date: Mon Jul 20 17:55:38 2015 +0800 Revert "sit: Add gro callbacks to sit_offload" [ Upstream commit fdbf5b097bbd9693a86c0b8bfdd071a9a2117cfc ] This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit: Add gro callbacks to sit_offload") because it generates packets that cannot be handled even by our own GSO. Reported-by: Wolfgang Walter Signed-off-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f6eda61aba0ca9e6388fcb2453541281d4f8cb14 Author: Nikolay Aleksandrov Date: Wed Jul 15 21:52:51 2015 +0200 bonding: fix destruction of bond with devices different from arphrd_ether [ Upstream commit 06f6d1094aa0992432b1e2a0920b0ee86ccd83bf ] When the bonding is being unloaded and the netdevice notifier is unregistered it executes NETDEV_UNREGISTER for each device which should remove the bond's proc entry but if the device enslaved is not of ARPHRD_ETHER type and is in front of the bonding, it may execute bond_release_and_destroy() first which would release the last slave and destroy the bond device leaving the proc entry and thus we will get the following error (with dynamic debug on for bond_netdev_event to see the events order): [ 908.963051] eql: event: 9 [ 908.963052] eql: IFF_SLAVE [ 908.963054] eql: event: 2 [ 908.963056] eql: IFF_SLAVE [ 908.963058] eql: event: 6 [ 908.963059] eql: IFF_SLAVE [ 908.963110] bond0: Releasing active interface eql [ 908.976168] bond0: Destroying bond bond0 [ 908.976266] bond0 (unregistering): Released all slaves [ 908.984097] ------------[ cut here ]------------ [ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160() [ 908.984110] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore usb_common [last unloaded: bonding] [ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O 4.2.0-rc2+ #8 [ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff8800358dfda8 [ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40 ffff88003e3a4280 [ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0 [ 908.984181] Call Trace: [ 908.984188] [] ? dump_stack+0x40/0x50 [ 908.984193] [] ? warn_slowpath_common+0x81/0xb0 [ 908.984196] [] ? warn_slowpath_fmt+0x4a/0x50 [ 908.984199] [] ? remove_proc_entry+0x112/0x160 [ 908.984205] [] ? bond_destroy_proc_dir+0x26/0x30 [bonding] [ 908.984208] [] ? bond_net_exit+0x8e/0xa0 [bonding] [ 908.984217] [] ? ops_exit_list.isra.4+0x37/0x70 [ 908.984225] [] ? unregister_pernet_operations+0x8d/0xd0 [ 908.984228] [] ? unregister_pernet_subsys+0x1d/0x30 [ 908.984232] [] ? bonding_exit+0x23/0xdba [bonding] [ 908.984236] [] ? SyS_delete_module+0x18a/0x250 [ 908.984241] [] ? task_work_run+0x89/0xc0 [ 908.984244] [] ? entry_SYSCALL_64_fastpath+0x16/0x75 [ 908.984247] ---[ end trace 7c006ed4abbef24b ]--- Thus remove the proc entry manually if bond_release_and_destroy() is used. Because of the checks in bond_remove_proc_entry() it's not a problem for a bond device to change namespaces (the bug fixed by the Fixes commit) but since commit f9399814927ad ("bonding: Don't allow bond devices to change network namespaces.") that can't happen anyway. Reported-by: Carol Soto Signed-off-by: Nikolay Aleksandrov Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from the netdev events") Tested-by: Carol L Soto Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7921f568ebdebd8862de6146d04137d0c8176ce0 Author: Eric Dumazet Date: Tue Jul 14 08:10:22 2015 +0200 ipv6: lock socket in ip6_datagram_connect() [ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ] ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet Acked-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ef84567e9484f1a4755b184f6ad888c371504245 Author: Tilman Schmidt Date: Tue Jul 14 00:37:13 2015 +0200 isdn/gigaset: reset tty->receive_room when attaching ser_gigaset [ Upstream commit fd98e9419d8d622a4de91f76b306af6aa627aa9c ] Commit 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc"), first merged in kernel release 3.10, caused the following regression in the Gigaset M101 driver: Before that commit, when closing the N_TTY line discipline in preparation to switching to N_GIGASET_M101, receive_room would be reset to a non-zero value by the call to n_tty_flush_buffer() in n_tty's close method. With the removal of that call, receive_room might be left at zero, blocking data reception on the serial line. The present patch fixes that regression by setting receive_room to an appropriate value in the ldisc open method. Fixes: 79901317ce80 ("n_tty: Don't flush buffer when closing ldisc") Signed-off-by: Tilman Schmidt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3e71447b891943cfaf8e5be7bd638c0414b3fea8 Author: WANG Cong Date: Mon Jul 13 12:30:07 2015 -0700 fq_codel: fix a use-after-free [ Upstream commit 052cbda41fdc243a8d40cce7ab3a6327b4b2887e ] Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines") Cc: John Fastabend Signed-off-by: Cong Wang Signed-off-by: Cong Wang Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1b8976fedaf00e6e06ebe850aa4c8602d55282c3 Author: Nikolay Aleksandrov Date: Mon Jul 13 06:36:19 2015 -0700 bridge: mdb: fix double add notification [ Upstream commit 5ebc784625ea68a9570d1f70557e7932988cd1b4 ] Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov Fixes: cfd567543590 ("bridge: add support of adding and deleting mdb entries") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e308dd584ec309635129444e41dd4c90abf61842 Author: Herbert Xu Date: Tue Aug 4 15:42:47 2015 +0800 net: Fix skb_set_peeked use-after-free bug [ Upstream commit a0a2a6602496a45ae838a96db8b8173794b5d398 ] The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone skb before setting peeked flag") introduced a use-after-free bug in skb_recv_datagram. This is because skb_set_peeked may create a new skb and free the existing one. As it stands the caller will continue to use the old freed skb. This patch fixes it by making skb_set_peeked return the new skb (or the old one if unchanged). Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag") Reported-by: Brenden Blanco Signed-off-by: Herbert Xu Tested-by: Brenden Blanco Reviewed-by: Konstantin Khlebnikov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8b843198d812c885a8a9aeedf0135b47e3e500f5 Author: Herbert Xu Date: Mon Jul 13 20:01:42 2015 +0800 net: Fix skb csum races when peeking [ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ] When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f21407a98826002e0274c09f1d77ab0f69b80a3d Author: Herbert Xu Date: Mon Jul 13 16:04:13 2015 +0800 net: Clone skb before setting peeked flag [ Upstream commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ] Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov Signed-off-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 316590d25ea05263eb4869fa668bf8a78fa04c1c Author: Dan Carpenter Date: Sun Jul 12 01:20:55 2015 +0300 net/xen-netback: off by one in BUG_ON() condition [ Upstream commit 50c2e4dd6749725338621fff456b26d3a592259f ] The > should be >=. I also added spaces around the '-' operations so the code is a little more consistent and matches the condition better. Fixes: f53c3fe8dad7 ('xen-netback: Introduce TX grant mapping') Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5568552ac83161a71b02f15e1d9c5b388d3da0a1 Author: Julian Anastasov Date: Thu Jul 9 09:59:10 2015 +0300 net: call rcu_read_lock early in process_backlog [ Upstream commit 2c17d27c36dcce2b6bf689f41a46b9e909877c21 ] Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman Cc: Stephen Hemminger Signed-off-by: Julian Anastasov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f75c8a3015422128ca150e81c730bc2a471b5f4a Author: Julian Anastasov Date: Thu Jul 9 09:59:09 2015 +0300 net: do not process device backlog during unregistration [ Upstream commit e9e4dd3267d0c5234c5c0f47440456b10875dec9 ] commit 381c759d9916 ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by: Vittorio Gambaletta Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman Cc: Stephen Hemminger Signed-off-by: Julian Anastasov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit af9f8e1f82574c2eb91377aa93db0a1297e92c0b Author: Eric Dumazet Date: Thu Jul 9 18:56:07 2015 +0200 bridge: fix potential crash in __netdev_pick_tx() [ Upstream commit a7d35f9d73e9ffa74a02304b817e579eec632f67 ] Commit c29390c6dfee ("xps: must clear sender_cpu before forwarding") fixed an issue in normal forward path, caused by sender_cpu & napi_id skb fields being an union. Bridge is another point where skb can be forwarded, so we need the same cure. Bug triggers if packet was received on a NIC using skb_mark_napi_id() Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices") Signed-off-by: Eric Dumazet Reported-by: Bob Liu Tested-by: Bob Liu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fd08e0ccaa649064ca0df99eaadb58199be41970 Author: Oleg Nesterov Date: Wed Jul 8 21:42:11 2015 +0200 net: pktgen: fix race between pktgen_thread_worker() and kthread_stop() [ Upstream commit fecdf8be2d91e04b0a9a4f79ff06499a36f5d14f ] pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by: Oleg Nesterov Reported-by: Jan Stancek Reported-by: Marcelo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 27463fc0ab7a97bb0f311623f846f5f7c8457be8 Author: Nikolay Aleksandrov Date: Tue Jul 7 15:55:56 2015 +0200 bridge: mdb: zero out the local br_ip variable before use [ Upstream commit f1158b74e54f2e2462ba5e2f45a118246d9d5b43 ] Since commit b0e9a30dd669 ("bridge: Add vlan id to multicast groups") there's a check in br_ip_equal() for a matching vlan id, but the mdb functions were not modified to use (or at least zero it) so when an entry was added it would have a garbage vlan id (from the local br_ip variable in __br_mdb_add/del) and this would prevent it from being matched and also deleted. So zero out the whole local ip var to protect ourselves from future changes and also to fix the current bug, since there's no vlan id support in the mdb uapi - use always vlan id 0. Example before patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent RTNETLINK answers: Invalid argument After patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb Signed-off-by: Nikolay Aleksandrov Fixes: b0e9a30dd669 ("bridge: Add vlan id to multicast groups") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bb18cdc8b40e14a36fff015e794724e989af426c Author: Stephen Smalley Date: Tue Jul 7 09:43:45 2015 -0400 net/tipc: initialize security state for new connection socket [ Upstream commit fdd75ea8df370f206a8163786e7470c1277a5064 ] Calling connect() with an AF_TIPC socket would trigger a series of error messages from SELinux along the lines of: SELinux: Invalid class 0 type=AVC msg=audit(1434126658.487:34500): avc: denied { } for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass= permissive=0 This was due to a failure to initialize the security state of the new connection sock by the tipc code, leaving it with junk in the security class field and an unlabeled secid. Add a call to security_sk_clone() to inherit the security state from the parent socket. Reported-by: Tim Shearer Signed-off-by: Stephen Smalley Acked-by: Paul Moore Acked-by: Ying Xue Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 952f21eb0cd26e07b9a9e0639c0bd9453caee441 Author: Timo Teräs Date: Tue Jul 7 08:34:13 2015 +0300 ip_tunnel: fix ipv4 pmtu check to honor inner ip header df [ Upstream commit fc24f2b2094366da8786f59f2606307e934cea17 ] Frag needed should be sent only if the inner header asked to not fragment. Currently fragmentation is broken if the tunnel has df set, but df was not asked in the original packet. The tunnel's df needs to be still checked to update internally the pmtu cache. Commit 23a3647bc4f93bac broke it, and this commit fixes the ipv4 df check back to the way it was. Fixes: 23a3647bc4f93bac ("ip_tunnels: Use skb-len to PMTU check.") Cc: Pravin B Shelar Signed-off-by: Timo Teräs Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 61d0de723e87ffcbe2350a582ecc426dbe78054c Author: Daniel Borkmann Date: Tue Jul 7 00:07:52 2015 +0200 rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver [ Upstream commit 4f7d2cdfdde71ffe962399b7020c674050329423 ] Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes anymore with respect to their policy, that is, ifla_vfinfo_policy[]. Before, they were part of ifla_policy[], but they have been nested since placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO, which is another nested attribute for the actual VF attributes such as IFLA_VF_MAC, IFLA_VF_VLAN, etc. Despite the policy being split out from ifla_policy[] in this commit, it's never applied anywhere. nla_for_each_nested() only does basic nla_ok() testing for struct nlattr, but it doesn't know about the data context and their requirements. Fix, on top of Jason's initial work, does 1) parsing of the attributes with the right policy, and 2) using the resulting parsed attribute table from 1) instead of the nla_for_each_nested() loop (just like we used to do when still part of ifla_policy[]). Reference: http://thread.gmane.org/gmane.linux.network/368913 Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric") Reported-by: Jason Gunthorpe Cc: Chris Wright Cc: Sucheta Chakraborty Cc: Greg Rose Cc: Jeff Kirsher Cc: Rony Efraim Cc: Vlad Zolotarov Cc: Nicolas Dichtel Cc: Thomas Graf Signed-off-by: Jason Gunthorpe Signed-off-by: Daniel Borkmann Acked-by: Vlad Zolotarov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c559b9ff0bcbfce18608d1e89cdc2358e234f803 Author: Nicolas Dichtel Date: Mon Jul 6 17:25:10 2015 +0200 Revert "dev: set iflink to 0 for virtual interfaces" [ Upstream commit 95ec655bc465ccb2a3329d4aff9a45e3c8188db5 ] This reverts commit e1622baf54df8cc958bf29d71de5ad545ea7d93c. The side effect of this commit is to add a '@NONE' after each virtual interface name with a 'ip link'. It may break existing scripts. Reported-by: Olivier Hartkopp Signed-off-by: Nicolas Dichtel Tested-by: Oliver Hartkopp Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bd60ae48f727ef6cb9b0524fb49b7336cbb0ca66 Author: Eric Dumazet Date: Mon Jul 6 17:13:26 2015 +0200 net: graceful exit from netif_alloc_netdev_queues() [ Upstream commit d339727c2b1a10f25e6636670ab6e1841170e328 ] User space can crash kernel with ip link add ifb10 numtxqueues 100000 type ifb We must replace a BUG_ON() by proper test and return -EINVAL for crazy values. Fixes: 60877a32bce00 ("net: allow large number of tx queues") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 56fd491a29b829fac4ffb30f1fcec2b386455ad0 Author: Phil Sutter Date: Mon Jul 6 15:51:20 2015 +0200 rhashtable: fix for resize events during table walk [ Upstream commit 142b942a75cb10ede1b42bf85368d41449ab4e3b ] If rhashtable_walk_next detects a resize operation in progress, it jumps to the new table and continues walking that one. But it misses to drop the reference to it's current item, leading it to continue traversing the new table's bucket in which the current item is sorted into, and after reaching that bucket's end continues traversing the new table's second bucket instead of the first one, thereby potentially missing items. This fixes the rhashtable runtime test for me. Bug probably introduced by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during rehash") although not explicitly tested. Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash") Signed-off-by: Phil Sutter Acked-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ba6143d6e411fb94a9d73fffbaf309c126f3c16a Author: Angga Date: Fri Jul 3 14:40:52 2015 +1200 ipv6: Make MLD packets to only be processed locally [ Upstream commit 4c938d22c88a9ddccc8c55a85e0430e9c62b1ac5 ] Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") MLD packets were only processed locally. After the change, a copy of MLD packet goes through ip6_mr_input, causing MRT6MSG_NOCACHE message to be generated to user space. Make MLD packet only processed locally. Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") Signed-off-by: Hermin Anggawijaya Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 164802e408904270d95f74de66404ba54df3fc60 Author: Jan Kara Date: Tue Jul 28 14:57:14 2015 -0400 jbd2: avoid infinite loop when destroying aborted journal commit 841df7df196237ea63233f0f9eaa41db53afd70f upstream. Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal superblock fails" changed jbd2_cleanup_journal_tail() to return EIO when the journal is aborted. That makes logic in jbd2_log_do_checkpoint() bail out which is fine, except that jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make a progress in cleaning the journal. Without it jbd2_journal_destroy() just loops in an infinite loop. Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of jbd2_log_do_checkpoint() fails with error. Reported-by: Eryu Guan Tested-by: Eryu Guan Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 76763f58c0f7d27d1aaee039a89969964d73bf7d Author: Yinghai Lu Date: Wed Sep 9 15:39:12 2015 -0700 lib/decompressors: use real out buf size for gunzip with kernel commit 2d3862d26e67a59340ba1cf1748196c76c5787de upstream. When loading x86 64bit kernel above 4GiB with patched grub2, got kernel gunzip error. | early console in decompress_kernel | decompress_kernel: | input: [0x807f2143b4-0x807ff61aee] | output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len | boot via startup_64 | KASLR using RDTSC... | new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size | decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee] | | Decompressing Linux... gz... | | uncompression error | | -- System halted the new buffer is at 0x46fe000000ULL, decompressor_gzip is using 0xffffffb901ffffff as out_len. gunzip in lib/zlib_inflate/inflate.c cap that len to 0x01ffffff and decompress fails later. We could hit this problem with crashkernel booting that uses kexec loading kernel above 4GiB. We have decompress_* support: 1. inbuf[]/outbuf[] for kernel preboot. 2. inbuf[]/flush() for initramfs 3. fill()/flush() for initrd. This bug only affect kernel preboot path that use outbuf[]. Add __decompress and take real out_buf_len for gunzip instead of guessing wrong buf size. Fixes: 1431574a1c4 (lib/decompressors: fix "no limit" output buffer length) Signed-off-by: Yinghai Lu Cc: Alexandre Courbot Cc: Jon Medhurst Cc: Stephen Warren Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit d9c410f96c332b82e4b3de03f6e5f45902f1dbfb Author: Hin-Tak Leung Date: Wed Sep 9 15:38:04 2015 -0700 hfs,hfsplus: cache pages correctly between bnode_create and bnode_free commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream. Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and hfs_bnode_find() for finding or creating pages corresponding to an inode) are immediately kmap()'ed and used (both read and write) and kunmap()'ed, and should not be page_cache_release()'ed until hfs_bnode_free(). This patch fixes a problem I first saw in July 2012: merely running "du" on a large hfsplus-mounted directory a few times on a reasonably loaded system would get the hfsplus driver all confused and complaining about B-tree inconsistencies, and generates a "BUG: Bad page state". Most recently, I can generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt" simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of the full Mac OS X 10.9: $ df -i / /home /mnt Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/fedora-root 3276800 551665 2725135 17% / /dev/mapper/fedora-home 52879360 716221 52163139 2% /home /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt After applying the patch, I was able to run "du /" (60+ times) and "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. There are many reports of the hfsplus driver getting confused under load and generating "BUG: Bad page state" or other similar issues over the years. [1] The unpatched code [2] has always been wrong since it entered the kernel tree. The only reason why it gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else, most of the time. The current RW driver appears to have followed the design and development of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a B-tree node-centric approach to read_cache_page()/page_cache_release() per bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching and releasing pages per inode extents. When the current RW code first entered the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented out code) to switch between B-node centric paging to inode-centric paging. There was a mistake with the direction of one of the REF_PAGES conditionals in __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were removed, but a page_cache_release() was mistakenly left in (propagating the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release() in bnode_release() (which should be spanned by !REF_PAGES) was never enabled. References: [1]: Michael Fox, Apr 2013 http://www.spinics.net/lists/linux-fsdevel/msg63807.html ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") Sasha Levin, Feb 2015 http://lkml.org/lkml/2015/2/20/85 ("use after free") https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 https://bugzilla.kernel.org/show_bug.cgi?id=42342 https://bugzilla.kernel.org/show_bug.cgi?id=63841 https://bugzilla.kernel.org/show_bug.cgi?id=78761 [2]: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db commit d1081202f1d0ee35ab0beb490da4b65d4bc763db Author: Andrew Morton Date: Wed Feb 25 16:17:36 2004 -0800 [PATCH] HFS rewrite http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support [3]: http://sourceforge.net/projects/linux-hfsplus/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ fs/hfsplus/bnode.c?r1=1.4&r2=1.5 Date: Thu Jun 6 09:45:14 2002 +0000 Use buffer cache instead of page cache in bnode.c. Cache inode extents. [4]: http://git.kernel.org/cgit/linux/kernel/git/\ stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d commit a5e3985fa014029eb6795664c704953720cc7f7d Author: Roman Zippel Date: Tue Sep 6 15:18:47 2005 -0700 [PATCH] hfs: remove debug code Signed-off-by: Hin-Tak Leung Signed-off-by: Sergei Antonov Reviewed-by: Anton Altaparmakov Reported-by: Sasha Levin Cc: Al Viro Cc: Christoph Hellwig Cc: Vyacheslav Dubeyko Cc: Sougata Santra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 2a0538e207c245fa649a090eaa3585302eb950e5 Author: Heiko Stübner Date: Sun Jun 21 21:52:52 2015 +0200 net: stmmac: dwmac-rk: Fix clk rate when provided by soc commit c48fa33c1fb2ccdb4bcc863a7b841f11efe0f8b0 upstream. The first iteration of the dwmac-rk support did access an intermediate clock directly below the pll selector. This was removed in a subsequent revision, but the clock and one invocation remained. This results in the driver trying to set the rate of a non-existent clock when the soc and not some external source provides the phy clock for RMII phys. So set the rate of the correct clock and remove the remaining now completely unused definition. Fixes: 436f5ae08f9d ("GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC") Signed-off-by: Heiko Stuebner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4d95fbcd38406f6747aeb86560a15a95e4527cc4 Author: Alexey Brodkin Date: Wed Jun 24 11:47:41 2015 +0300 stmmac: troubleshoot unexpected bits in des0 & des1 commit f1590670ce069eefeb93916391a67643e6ad1630 upstream. Current implementation of descriptor init procedure only takes care about setting/clearing ownership flag in "des0"/"des1" fields while it is perfectly possible to get unexpected bits set because of the following factors: [1] On driver probe underlying memory allocated with dma_alloc_coherent() might not be zeroed and so it will be filled with garbage. [2] During driver operation some bits could be set by SD/MMC controller (for example error flags etc). And unexpected and/or randomly set flags in "des0"/"des1" fields may lead to unpredictable behavior of GMAC DMA block. This change addresses both items above with: [1] Use of dma_zalloc_coherent() instead of simple dma_alloc_coherent() to make sure allocated memory is zeroed. That shouldn't affect performance because this allocation only happens once on driver probe. [2] Do explicit zeroing of both "des0" and "des1" fields of all buffer descriptors during initialization of DMA transfer. And while at it fixed identation of dma_free_coherent() counterpart as well. Signed-off-by: Alexey Brodkin Cc: Giuseppe Cavallaro Cc: arc-linux-dev@synopsys.com Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Cc: David Miller Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 79866f313766dd2e8e118053e1eb8bd5abd608ac Author: Alexey Brodkin Date: Wed Sep 9 18:01:08 2015 +0300 stmmac: fix check for phydev being open commit dfc50fcaad574e5c8c85cbc83eca1426b2413fa4 upstream. Current check of phydev with IS_ERR(phydev) may make not much sense because of_phy_connect() returns NULL on failure instead of error value. Still for checking result of phy_connect() IS_ERR() makes perfect sense. So let's use combined check IS_ERR_OR_NULL() that covers both cases. Cc: Sergei Shtylyov Cc: Giuseppe Cavallaro Cc: linux-kernel@vger.kernel.org Cc: David Miller Signed-off-by: Alexey Brodkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6a85820d32b8f908e2dd41b75977fb77768118c2 Author: Ariel Nahum Date: Sun Aug 9 11:16:27 2015 +0300 IB/mlx4: Fix incorrect cq flushing in error state commit 799cdaf8a98f13d4fba3162e21e1e63f21045010 upstream. When handling a device internal error, the driver is responsible to drain the completion queue with flush errors. In case a completion queue was assigned to multiple send queues, the driver iterates over the send queues and generates flush errors of inflight wqes. The driver must correctly pass the wc array with an offset as a result of the previous send queue iteration. Not doing so will overwrite previously set completions and return a wrong number of polled completions which includes ones which were not correctly set. Fixes: 35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs) Signed-off-by: Ariel Nahum Signed-off-by: Sagi Grimberg Cc: Yishai Hadas Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit c6632eb0bba08cb911f19ef3a96ff7df346c80ae Author: Noa Osherovich Date: Thu Jul 30 17:34:24 2015 +0300 IB/mlx4: Use correct SL on AH query under RoCE commit 5e99b139f1b68acd65e36515ca347b03856dfb5a upstream. The mlx4 IB driver implementation for ib_query_ah used a wrong offset (28 instead of 29) when link type is Ethernet. Fixed to use the correct one. Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE') Signed-off-by: Shani Michaeli Signed-off-by: Noa Osherovich Signed-off-by: Or Gerlitz Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 6c634f2d7447b3056f69c4c6442e5eeef839a441 Author: Jack Morgenstein Date: Thu Jul 30 17:34:23 2015 +0300 IB/mlx4: Forbid using sysfs to change RoCE pkeys commit 2b135db3e81301d0452e6aa107349abe67b097d6 upstream. The pkey mapping for RoCE must remain the default mapping: VFs: virtual index 0 = mapped to real index 0 (0xFFFF) All others indices: mapped to a real pkey index containing an invalid pkey. PF: virtual index i = real index i. Don't allow users to change these mappings using files found in sysfs. Fixes: c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device') Signed-off-by: Jack Morgenstein Signed-off-by: Or Gerlitz Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit fe5e52bf0b5b8a2ae1fefdb359fd8616ab7c9815 Author: Jack Morgenstein Date: Thu Jul 30 17:34:21 2015 +0300 IB/mlx4: Fix potential deadlock when sending mad to wire commit 90c1d8b6350cca9d8a234f03c77a317a7613bcee upstream. send_mad_to_wire takes the same spinlock that is taken in the interrupt context. Therefore, it needs irqsave/restore. Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Jack Morgenstein Signed-off-by: Or Gerlitz Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 26783066148258ea62996c30d9c063dd193cb7a2 Author: Haggai Eran Date: Tue Sep 1 09:56:56 2015 +0300 IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow commit 11d748045c6dadb279d1acdb6d2ea8f3f2ede85b upstream. The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in its error flow even though there is never a case where the error flow occurs with a valid MR pointer to destroy. Remove the clean_mr() call and the incorrect comment above it. Fixes: b4cfe447d47b ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Cc: Eli Cohen Signed-off-by: Haggai Eran Reviewed-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 6465d5d793ecc69c78c1ad7da2e4bbd13de9f063 Author: Sagi Grimberg Date: Thu Aug 6 18:32:50 2015 +0300 IB/iser: Fix possible bogus DMA unmapping commit 8d5944d80359e645feb2ebd069a6f4caf7825e40 upstream. If iser_initialize_task_headers() routine failed before dma mapping, we should not attempt to unmap in cleanup_task(). Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit cdb6f383011c73248db3974bc5175ba45053005d Author: Sagi Grimberg Date: Thu Aug 6 18:32:48 2015 +0300 IB/iser: Fix missing return status check in iser_send_data_out commit d16739055bd1f562ae4d83e69f7f7f1cefcfbe16 upstream. Since commit "IB/iser: Fix race between iser connection teardown..." iser_initialize_task_headers() might fail, so we need to check that. Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit ce3e4e26260af851fb0188306c8bb47748955ef8 Author: Yishai Hadas Date: Thu Aug 13 18:32:03 2015 +0300 IB/uverbs: Fix race between ib_uverbs_open and remove_one commit 35d4a0b63dc0c6d1177d4f532a9deae958f0662c upstream. Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table") Before this commit there was a device look-up table that was protected by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one as dev might be freed just after: dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but before the kref_get. In addition, this buggy patch added some dead code as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result the comment above ib_uverbs_open saying "the open method will either immediately run -ENXIO" is wrong as it can never happen. The solution follows Jason Gunthorpe suggestion from below URL: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html cdev will hold a kref on the parent (the containing structure, ib_uverbs_device) and only when that kref is released it is guaranteed that open will never be called again. In addition, fixes the active count scheme to use an atomic not a kref to prevent WARN_ON as pointed by above comment from Jason. Signed-off-by: Yishai Hadas Signed-off-by: Shachar Raindel Reviewed-by: Jason Gunthorpe Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 909246d01e085df6d6808fea35146910c0f30b09 Author: Christoph Hellwig Date: Wed Aug 26 11:00:37 2015 +0200 IB/uverbs: reject invalid or unknown opcodes commit b632ffa7cee439ba5dce3b3bc4a5cbe2b3e20133 upstream. We have many WR opcodes that are only supported in kernel space and/or require optional information to be copied into the WR structure. Reject all those not explicitly handled so that we can't pass invalid information to drivers. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 780dbff3f4b9613a36879f6defe69572a3f1c934 Author: Mike Marciniszyn Date: Tue Jul 21 08:36:07 2015 -0400 IB/qib: Change lkey table allocation to support more MRs commit d6f1c17e162b2a11e708f28fa93f2f79c164b442 upstream. The lkey table is allocated with with a get_user_pages() with an order based on a number of index bits from a module parameter. The underlying kernel code cannot allocate that many contiguous pages. There is no reason the underlying memory needs to be physically contiguous. This patch: - switches the allocation/deallocation to vmalloc/vfree - caps the number of bits to 23 to insure at least 1 generation bit o this matches the module parameter description Reviewed-by: Vinit Agnihotri Signed-off-by: Mike Marciniszyn Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit b25b17c50502368c2306a449ce217d34f8e0b287 Author: Bart Van Assche Date: Fri Aug 14 11:01:09 2015 -0700 IB/srp: Stop the scsi_eh_ and scsi_tmf_ threads if login fails commit bc44bd1d864664f3658352c6aaaa02557d49165d upstream. scsi_host_alloc() not only allocates memory for a SCSI host but also creates the scsi_eh_ kernel thread and the scsi_tmf_ workqueue. Stop these threads if login fails by calling scsi_host_put(). Reported-by: Konstantin Krotov Fixes: fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path") Signed-off-by: Bart Van Assche Cc: Sagi Grimberg Cc: Sebastian Parschauer Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit bff7d8e55354c86893324b85eed532cb0c94f965 Author: Bart Van Assche Date: Fri Jul 31 14:13:22 2015 -0700 IB/srp: Handle partial connection success correctly commit c257ea6f9f9aed0b173e0c2932bb8dac5612cdc6 upstream. Avoid that the following kernel warning is reported if the SRP target system accepts fewer channels per connection than what was requested by the initiator system: WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]() Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] srp_destroy_qp+0xb1/0x120 [ib_srp] [] srp_create_ch_ib+0x19b/0x420 [ib_srp] [] srp_create_target+0x7d7/0xa94 [ib_srp] [] dev_attr_store+0x20/0x30 [] sysfs_write_file+0xef/0x170 [] vfs_write+0xb4/0x130 [] sys_write+0x5f/0xa0 [] system_call_fastpath+0x16/0x1b Signed-off-by: Bart Van Assche Cc: Sagi Grimberg Cc: Sebastian Parschauer Cc: Christoph Hellwig Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48 Author: Hans de Goede Date: Mon Aug 10 10:45:45 2015 +0200 ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list commit fa92a31b3335478c545cdc8e79e1e9b788184e6b upstream. Like some of the other Yoga models the Lenovo Yoga 3 14 does not have a hw rfkill switch, and trying to read the hw rfkill switch through the ideapad module causes it to always reported blocking breaking wifi. This commit adds the Lenovo Yoga 3 14 to the no_hw_rfkill dmi list, fixing the wifi breakage. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1239050 Signed-off-by: Hans de Goede Signed-off-by: Darren Hart Signed-off-by: Greg Kroah-Hartman commit 919a78e3e4df2c28f8b6c1aff5d6d9d9beff0d52 Author: Hin-Tak Leung Date: Wed Sep 9 15:38:07 2015 -0700 hfs: fix B-tree corruption after insertion at position 0 commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream. Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). This is an identical change to the corresponding hfs b-tree code to Sergei Antonov's "hfsplus: fix B-tree corruption after insertion at position 0", to keep similar code paths in the hfs and hfsplus drivers in sync, where appropriate. Signed-off-by: Hin-Tak Leung Cc: Sergei Antonov Cc: Joe Perches Reviewed-by: Vyacheslav Dubeyko Cc: Anton Altaparmakov Cc: Al Viro Cc: Christoph Hellwig Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 22fda415e1316c6077277341cd5d8679f6f630fa Author: Tyler Hicks Date: Wed Aug 5 11:26:36 2015 -0500 eCryptfs: Invalidate dcache entries when lower i_nlink is zero commit 5556e7e6d30e8e9b5ee51b0e5edd526ee80e5e36 upstream. Consider eCryptfs dcache entries to be stale when the corresponding lower inode's i_nlink count is zero. This solves a problem caused by the lower inode being directly modified, without going through the eCryptfs mount, leaving stale eCryptfs dentries cached and the eCryptfs inode's i_nlink count not being cleared. Signed-off-by: Tyler Hicks Reported-by: Richard Weinberger Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit feab9b2a1002c23d415f7a2db4f4777c60d3eb52 Author: Joerg Roedel Date: Tue Aug 25 10:54:28 2015 +0200 iommu/vt-d: Really use upper context table when necessary commit 4df4eab168c1c4058603be55a3169d4a45779cc0 upstream. There is a bug in iommu_context_addr() which will always use the lower context table, even when the upper context table needs to be used. Fix this issue. Fixes: 03ecc32c5274 ("iommu/vt-d: support extended root and context entries") Reported-by: Xiao, Nan Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 1f398b31044845397b5e2fefdb7ac247180e6a7c Author: Thierry Reding Date: Thu Aug 6 14:20:31 2015 +0200 iommu/tegra-smmu: Parameterize number of TLB lines commit 11cec15bf3fb498206ef63b1fa26c27689e02d0e upstream. The number of TLB lines was increased from 16 on Tegra30 to 32 on Tegra114 and later. Parameterize the value so that the initial default can be set accordingly. On Tegra30, initializing the value to 32 would effectively disable the TLB and hence cause massive latencies for memory accesses translated through the SMMU. This is especially noticeable for isochronuous clients such as display, whose FIFOs would continuously underrun. Fixes: 891846516317 ("memory: Add NVIDIA Tegra memory controller support") Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman commit fb2908b509749657cab5f7dbda2f24add8668969 Author: Will Deacon Date: Tue Aug 11 16:48:32 2015 +0100 iommu/io-pgtable-arm: Unmap and free table when overwriting with block commit cf27ec930be906e142c752f9161197d69ca534d7 upstream. When installing a block mapping, we unconditionally overwrite a non-leaf PTE if we find one. However, this can cause a problem if the following sequence of events occur: (1) iommu_map called for a 4k (i.e. PAGE_SIZE) mapping at some address - We initialise the page table all the way down to a leaf entry - No TLB maintenance is required, because we're going from invalid to valid. (2) iommu_unmap is called on the mapping installed in (1) - We walk the page table to the final (leaf) entry and zero it - We only changed a valid leaf entry, so we invalidate leaf-only (3) iommu_map is called on the same address as (1), but this time for a 2MB (i.e. BLOCK_SIZE) mapping) - We walk the page table down to the penultimate level, where we find a table entry - We overwrite the table entry with a block mapping and return without any TLB maintenance and without freeing the memory used by the now-orphaned table. This last step can lead to a walk-cache caching the overwritten table entry, causing unexpected faults when the new mapping is accessed by a device. One way to fix this would be to collapse the page table when freeing the last page at a given level, but this would require expensive iteration on every map call. Instead, this patch detects the case when we are overwriting a table entry and explicitly unmaps the table first, which takes care of both freeing and TLB invalidation. Reported-by: Brian Starkey Tested-by: Brian Starkey Signed-off-by: Will Deacon Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 70ebd071ea1a363f859a2b2480bfe12cb4cf93b3 Author: Emil Medve Date: Wed Mar 25 00:28:48 2015 -0500 iommu/fsl: Really fix init section(s) content commit 57fb907da89977640ef183556a621336c1348fa0 upstream. '0f1fb99 iommu/fsl: Fix section mismatch' was intended to address the modpost warning and the potential crash. Crash which is actually easy to trigger with a 'unbind' followed by a 'bind' sequence. The fix is wrong as fsl_of_pamu_driver.driver gets added by bus_add_driver() to a couple of klist(s) which become invalid/corrupted as soon as the init sections are freed. Depending on when/how the init sections storage is reused various/random errors and crashes will happen 'cd70d46 iommu/fsl: Various cleanups' contains annotations that go further down the wrong path laid by '0f1fb99 iommu/fsl: Fix section mismatch' Now remove all the incorrect annotations from the above mentioned patches (not exactly a revert) and those previously existing in the code, This fixes the modpost warning(s), the unbind/bind sequence crashes and the random errors/crashes Fixes: 0f1fb99b62ce ("iommu/fsl: Fix section mismatch") Fixes: cd70d4659ff3 ("iommu/fsl: Various cleanups") Signed-off-by: Emil Medve Acked-by: Varun Sethi Tested-by: Madalin Bucur Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 18c45d9c8e9a8449f73b333eb1dd354d88ceb186 Author: NeilBrown Date: Wed Jul 22 10:20:07 2015 +1000 md: flush ->event_work before stopping array. commit ee5d004fd0591536a061451eba2b187092e9127c upstream. The 'event_work' worker used by dm-raid may still be running when the array is stopped. This can result in an oops. So flush the workqueue on which it is run after detaching and before destroying the device. Reported-by: Heinz Mauelshagen Signed-off-by: NeilBrown Fixes: 9d09e663d550 ("dm: raid456 basic support") Signed-off-by: Greg Kroah-Hartman commit ae286448cf64d37128b74c72b9b5435da7e4ba17 Author: NeilBrown Date: Mon Jul 6 17:37:49 2015 +1000 md/raid10: always set reshape_safe when initializing reshape_position. commit 299b0685e31c9f3dcc2d58ee3beca761a40b44b3 upstream. 'reshape_position' tracks where in the reshape we have reached. 'reshape_safe' tracks where in the reshape we have safely recorded in the metadata. These are compared to determine when to update the metadata. So it is important that reshape_safe is initialised properly. Currently it isn't. When starting a reshape from the beginning it usually has the correct value by luck. But when reducing the number of devices in a RAID10, it has the wrong value and this leads to the metadata not being updated correctly. This can lead to corruption if the reshape is not allowed to complete. This patch is suitable for any -stable kernel which supports RAID10 reshape, which is 3.5 and later. Fixes: 3ea7daa5d7fd ("md/raid10: add reshape support") Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit d7edf5fe979269233c9d20e5101ae004df05070e Author: NeilBrown Date: Mon Aug 3 17:09:57 2015 +1000 md/raid5: don't let shrink_slab shrink too far. commit 49895bcc7e566ba455eb2996607d6fbd3447ce16 upstream. I have a report of drop_one_stripe() called from raid5_cache_scan() apparently finding ->max_nr_stripes == 0. This should not be allowed. So add a test to keep max_nr_stripes above min_nr_stripes. Also use a 'mask' rather than a 'mod' in drop_one_stripe to ensure 'hash' is valid even if max_nr_stripes does reach zero. Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Reported-by: Tomas Papan Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit ed8b312450dae5e5a9e5bc022e1261a2e6414984 Author: NeilBrown Date: Mon Jul 6 12:49:23 2015 +1000 md/raid5: avoid races when changing cache size. commit 2d5b569b665ea6d0b15c52529ff06300de81a7ce upstream. Cache size can grow or shrink due to various pressures at any time. So when we resize the cache as part of a 'grow' operation (i.e. change the size to allow more devices) we need to blocks that automatic growing/shrinking. So introduce a mutex. auto grow/shrink uses mutex_trylock() and just doesn't bother if there is a blockage. Resizing the whole cache holds the mutex to ensure that the correct number of new stripes is allocated. This bug can result in some stripes not being freed when an array is stopped. This leads to the kmem_cache not being freed and a subsequent array can try to use the same kmem_cache and get confused. Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit cd6a4dd89b4cdcc220c255ba5fe29b677d431cbd Author: Jialing Fu Date: Fri Aug 28 11:13:09 2015 +0800 mmc: core: fix race condition in mmc_wait_data_done commit 71f8a4b81d040b3d094424197ca2f1bf811b1245 upstream. The following panic is captured in ker3.14, but the issue still exists in latest kernel. --------------------------------------------------------------------- [ 20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference at virtual address 00000578 ...... [ 20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60 [ 20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60 [ 20.740134] c0 3136 (Compiler) Call trace: [ 20.740165] c0 3136 (Compiler) [] _raw_spin_lock_irqsave+0x24/0x60 [ 20.740200] c0 3136 (Compiler) [] __wake_up+0x1c/0x54 [ 20.740230] c0 3136 (Compiler) [] mmc_wait_data_done+0x28/0x34 [ 20.740262] c0 3136 (Compiler) [] mmc_request_done+0xa4/0x220 [ 20.740314] c0 3136 (Compiler) [] sdhci_tasklet_finish+0xac/0x264 [ 20.740352] c0 3136 (Compiler) [] tasklet_action+0xa0/0x158 [ 20.740382] c0 3136 (Compiler) [] __do_softirq+0x10c/0x2e4 [ 20.740411] c0 3136 (Compiler) [] irq_exit+0x8c/0xc0 [ 20.740439] c0 3136 (Compiler) [] handle_IRQ+0x48/0xac [ 20.740469] c0 3136 (Compiler) [] gic_handle_irq+0x38/0x7c ---------------------------------------------------------------------- Because in SMP, "mrq" has race condition between below two paths: path1: CPU0: static void mmc_wait_data_done(struct mmc_request *mrq) { mrq->host->context_info.is_done_rcv = true; // // If CPU0 has just finished "is_done_rcv = true" in path1, and at // this moment, IRQ or ICache line missing happens in CPU0. // What happens in CPU1 (path2)? // // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode: // path2 would have chance to break from wait_event_interruptible // in mmc_wait_for_data_req_done and continue to run for next // mmc_request (mmc_blk_rw_rq_prep). // // Within mmc_blk_rq_prep, mrq is cleared to 0. // If below line still gets host from "mrq" as the result of // compiler, the panic happens as we traced. wake_up_interruptible(&mrq->host->context_info.wait); } path2: CPU1: static int mmc_wait_for_data_req_done(... { ... while (1) { wait_event_interruptible(context_info->wait, (context_info->is_done_rcv || context_info->is_new_req)); static void mmc_blk_rw_rq_prep(... { ... memset(brq, 0, sizeof(struct mmc_blk_request)); This issue happens very coincidentally; however adding mdelay(1) in mmc_wait_data_done as below could duplicate it easily. static void mmc_wait_data_done(struct mmc_request *mrq) { mrq->host->context_info.is_done_rcv = true; + mdelay(1); wake_up_interruptible(&mrq->host->context_info.wait); } At runtime, IRQ or ICache line missing may just happen at the same place of the mdelay(1). This patch gets the mmc_context_info at the beginning of function, it can avoid this race condition. Signed-off-by: Jialing Fu Tested-by: Shawn Lin Fixes: 2220eedfd7ae ("mmc: fix async request mechanism ....") Signed-off-by: Shawn Lin Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit cea49b29549b4eec81dad4ade3acc7fc175cfccd Author: Jisheng Zhang Date: Tue Aug 18 16:21:39 2015 +0800 mmc: sdhci: also get preset value and driver type for MMC_DDR52 commit 0dafa60eb2506617e6968b97cc5a44914a7fb1a6 upstream. commit bb8175a8aa42 ("mmc: sdhci: clarify DDR timing mode between SD-UHS and eMMC") added MMC_DDR52 as eMMC's DDR mode to be distinguished from SD-UHS, but it missed setting driver type for MMC_DDR52 timing mode. So sometimes we get the following error on Marvell BG2Q DMP board: [ 1.559598] mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb00 [ 1.569314] mmcblk0: retrying using single block read [ 1.575676] mmcblk0: error -84 transferring data, sector 2, nr 6, cmd response 0x900, card status 0x0 [ 1.585202] blk_update_request: I/O error, dev mmcblk0, sector 2 [ 1.591818] mmcblk0: error -84 transferring data, sector 3, nr 5, cmd response 0x900, card status 0x0 [ 1.601341] blk_update_request: I/O error, dev mmcblk0, sector 3 This patches fixes this by adding the missing driver type setting. Fixes: bb8175a8aa42 ("mmc: sdhci: clarify DDR timing mode ...") Signed-off-by: Jisheng Zhang Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit 2b1e7d58a82a911fae3dcb762590a68e78806434 Author: Adam Lee Date: Mon Aug 3 14:33:28 2015 +0800 mmc: sdhci-pci: set the clear transfer mode register quirk for O2Micro commit 143b648ddf1583905fa15d32be27a31442fc7933 upstream. This patch fixes MMC not working issue on O2Micro/BayHub Host, which requires transfer mode register to be cleared when sending no DMA command. Signed-off-by: Peter Guo Signed-off-by: Adam Lee Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit 2be9c8262419a2db45e7461b1eb26ead770a4438 Author: Jann Horn Date: Wed Sep 9 15:38:30 2015 -0700 fs: Don't dump core if the corefile would become world-readable. commit 40f705a736eac10e7dca7ab5dd5ed675a6df031d upstream. On a filesystem like vfat, all files are created with the same owner and mode independent of who created the file. When a vfat filesystem is mounted with root as owner of all files and read access for everyone, root's processes left world-readable coredumps on it (but other users' processes only left empty corefiles when given write access because of the uid mismatch). Given that the old behavior was inconsistent and insecure, I don't see a problem with changing it. Now, all processes refuse to dump core unless the resulting corefile will only be readable by their owner. Signed-off-by: Jann Horn Acked-by: Kees Cook Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 244d3c13db7a94ceecfb591fba716a525d53aa38 Author: Jann Horn Date: Wed Sep 9 15:38:28 2015 -0700 fs: if a coredump already exists, unlink and recreate with O_EXCL commit fbb1816942c04429e85dbf4c1a080accc534299e upstream. It was possible for an attacking user to trick root (or another user) into writing his coredumps into an attacker-readable, pre-existing file using rename() or link(), causing the disclosure of secret data from the victim process' virtual memory. Depending on the configuration, it was also possible to trick root into overwriting system files with coredumps. Fix that issue by never writing coredumps into existing files. Requirements for the attack: - The attack only applies if the victim's process has a nonzero RLIMIT_CORE and is dumpable. - The attacker can trick the victim into coredumping into an attacker-writable directory D, either because the core_pattern is relative and the victim's cwd is attacker-writable or because an absolute core_pattern pointing to a world-writable directory is used. - The attacker has one of these: A: on a system with protected_hardlinks=0: execute access to a folder containing a victim-owned, attacker-readable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (In practice, there are lots of files that fulfill this condition, e.g. entries in Debian's /var/lib/dpkg/info/.) This does not apply to most Linux systems because most distros set protected_hardlinks=1. B: on a system with protected_hardlinks=1: execute access to a folder containing a victim-owned, attacker-readable and attacker-writable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (This seems to be uncommon.) C: on any system, independent of protected_hardlinks: write access to a non-sticky folder containing a victim-owned, attacker-readable file on the same partition as D (This seems to be uncommon.) The basic idea is that the attacker moves the victim-owned file to where he expects the victim process to dump its core. The victim process dumps its core into the existing file, and the attacker reads the coredump from it. If the attacker can't move the file because he does not have write access to the containing directory, he can instead link the file to a directory he controls, then wait for the original link to the file to be deleted (because the kernel checks that the link count of the corefile is 1). A less reliable variant that requires D to be non-sticky works with link() and does not require deletion of the original link: link() the file into D, but then unlink() it directly before the kernel performs the link count check. On systems with protected_hardlinks=0, this variant allows an attacker to not only gain information from coredumps, but also clobber existing, victim-writable files with coredumps. (This could theoretically lead to a privilege escalation.) Signed-off-by: Jann Horn Cc: Kees Cook Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 154dff393c2a7b554469f626e442901ce8acc905 Author: Jaewon Kim Date: Tue Sep 8 15:02:21 2015 -0700 vmscan: fix increasing nr_isolated incurred by putback unevictable pages commit c54839a722a02818677bcabe57e957f0ce4f841d upstream. reclaim_clean_pages_from_list() assumes that shrink_page_list() returns number of pages removed from the candidate list. But shrink_page_list() puts back mlocked pages without passing it to caller and without counting as nr_reclaimed. This increases nr_isolated. To fix this, this patch changes shrink_page_list() to pass unevictable pages back to caller. Caller will take care those pages. Minchan said: It fixes two issues. 1. With unevictable page, cma_alloc will be successful. Exactly speaking, cma_alloc of current kernel will fail due to unevictable pages. 2. fix leaking of NR_ISOLATED counter of vmstat With it, too_many_isolated works. Otherwise, it could make hang until the process get SIGKILL. Signed-off-by: Jaewon Kim Acked-by: Minchan Kim Cc: Mel Gorman Acked-by: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 804a6f7fb449d20d7fa560d040c75e95d8faf3e3 Author: Helge Deller Date: Thu Sep 3 22:45:21 2015 +0200 parisc: Filter out spurious interrupts in PA-RISC irq handler commit b1b4e435e4ef7de77f07bf2a42c8380b960c2d44 upstream. When detecting a serial port on newer PA-RISC machines (with iosapic) we have a long way to go to find the right IRQ line, registering it, then registering the serial port and the irq handler for the serial port. During this phase spurious interrupts for the serial port may happen which then crashes the kernel because the action handler might not have been set up yet. So, basically it's a race condition between the serial port hardware and the CPU which sets up the necessary fields in the irq sructs. The main reason for this race is, that we unmask the serial port irqs too early without having set up everything properly before (which isn't easily possible because we need the IRQ number to register the serial ports). This patch is a work-around for this problem. It adds checks to the CPU irq handler to verify if the IRQ action field has been initialized already. If not, we just skip this interrupt (which isn't critical for a serial port at bootup). The real fix would probably involve rewriting all PA-RISC specific IRQ code (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of the irq chips and proper irq enabling along this line. This bug has been in the PA-RISC port since the beginning, but the crashes happened very rarely with currently used hardware. But on the latest machine which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900, 1GHz) and which has the largest possible L1 cache size (64MB each), the kernel crashed at every boot because of this race. So, without this patch the machine would currently be unuseable. For the record, here is the flow logic: 1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq(). 2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq. 3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq 4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!) 5. serial_init_chip() then registers the 8250 port. Problems: - In step 4 the CPU irq shouldn't have been registered yet, but after step 5 - If serial irq happens between 4 and 5 have finished, the kernel will crash Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit f39b5f920977852886e4491d354e72b95935ec0b Author: John David Anglin Date: Mon Sep 7 20:13:28 2015 -0400 parisc: Use double word condition in 64bit CAS operation commit 1b59ddfcf1678de38a1f8ca9fb8ea5eebeff1843 upstream. The attached change fixes the condition used in the "sub" instruction. A double word comparison is needed. This fixes the 64-bit LWS CAS operation on 64-bit kernels. I can now enable 64-bit atomic support in GCC. Signed-off-by: John David Anglin Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 645305df79c555470ab6728561314ea852e2612f Author: Helge Deller Date: Wed Sep 2 18:17:29 2015 +0200 PCI,parisc: Enable 64-bit bus addresses on PA-RISC commit e02a653e15d8d32e9e768fd99a3271aafe5c5d77 upstream. Commit 3a9ad0b ("PCI: Add pci_bus_addr_t") unconditionally introduced usage of 64-bit PCI bus addresses on all 64-bit platforms which broke PA-RISC. It turned out that due to enabling the 64-bit addresses, the PCI logic decided to use the GMMIO instead of the LMMIO region. This commit simply disables registering the GMMIO and thus we fall back to use the LMMIO region as before. Reverts commit 45ea2a5fed6dacb9bb0558d8b21eacc1c45d5bb4 ("PCI: Don't use 64-bit bus addresses on PA-RISC") To: linux-parisc@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas Cc: Meelis Roos Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 1a64393e4a48bb45c6b50a557bf719d0c4d01045 Author: Mitja Spes Date: Wed Sep 2 10:02:29 2015 +0200 rtc: abx80x: fix RTC write bit commit 5f1b2f77646fc0ef2f36fc554f5722a1381d0892 upstream. Fix RTC write bit as per application manual Signed-off-by: Mitja Spes Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman commit 4530473fda002bbcb567419d55f5bc9f7f6541fc Author: Joonyoung Shim Date: Fri Aug 21 18:43:41 2015 +0900 rtc: s5m: fix to update ctrl register commit ff02c0444b83201ff76cc49deccac8cf2bffc7bc upstream. According to datasheet, the S2MPS13X and S2MPS14X should update write buffer via setting WUDR bit to high after ctrl register is written. If not, ALARM interrupt of rtc-s5m doesn't happen first time when i use tools/testing/selftests/timers/rtctest.c test program and hour format is used to 12 hour mode in Odroid-XU3 board. One more issue is the RTC doesn't keep time on Odroid-XU3 board when i turn on board after power off even if RTC battery is connected. It can be solved as setting WUDR & RUDR bits to high at the same time after RTC_CTRL register is written. It's same with condition of only writing ALARM registers, so this is for only S2MPS14 and we should set WUDR & A_UDR bits to high on S2MPS13. I can't find any reasonable description about this like fix from datasheet, but can find similar codes from rtc driver source of hardkernel kernel and vendor kernel. Signed-off-by: Joonyoung Shim Reviewed-by: Krzysztof Kozlowski Tested-by: Krzysztof Kozlowski Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman commit 68912df901b9807a708cd1599b30fa56548d15e1 Author: Joonyoung Shim Date: Wed Aug 12 19:21:46 2015 +0900 rtc: s3c: fix disabled clocks for alarm commit 1fb1c35f56bb6ab4a65920c648154b0f78f634a5 upstream. The clock enable/disable codes for alarm have been removed from commit 24e1455493da ("drivers/rtc/rtc-s3c.c: delete duplicate clock control") and the clocks are disabled even if alarm is set, so alarm interrupt can't happen. The s3c_rtc_setaie function can be called several times with 'enabled' argument having same value, so it needs to check whether clocks are enabled or not. Signed-off-by: Joonyoung Shim Reviewed-by: Krzysztof Kozlowski Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman commit 85d1ba73e42e8317b21ba71ab277fb762a45e6b0 Author: Trond Myklebust Date: Fri Sep 18 15:53:24 2015 -0400 SUNRPC: Lock the transport layer on shutdown commit 79234c3db6842a3de03817211d891e0c2878f756 upstream. Avoid all races with the connect/disconnect handlers by taking the transport lock. Reported-by:"Suzuki K. Poulose" Acked-by: Jeff Layton Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 77bb3c931d577185760c59428a87e8966a126b0b Author: Trond Myklebust Date: Wed Sep 16 23:43:17 2015 -0400 SUNRPC: Ensure that we wait for connections to complete before retrying commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f upstream. Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King Reported-by: Russell King Tested-by: Russell King Tested-by: Benjamin Coddington Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit f160db25e9c4baa65871f4f7972064f1d7ecb160 Author: Trond Myklebust Date: Sat Aug 29 13:36:30 2015 -0700 SUNRPC: xs_reset_transport must mark the connection as disconnected commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream. In case the reconnection attempt fails. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit fc56e1157e720c07a5ef553f492349af547eb60c Author: Trond Myklebust Date: Thu Aug 13 15:33:51 2015 -0400 SUNRPC: Fix a thinko in xs_connect() commit 99b1a4c32ad22024ac6198a4337aaec5ea23168f upstream. It is rather pointless to test the value of transport->inet after calling xs_reset_transport(), since it will always be zero, and so we will never see any exponential back off behaviour. Also don't force early connections for SOFTCONN tasks. If the server disconnects us, we should respect the exponential backoff. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 0e592fde0adf6047581a43ca057f17e919cb082d Author: Pratyush Anand Date: Thu Aug 27 10:01:33 2015 +0530 net: sunrpc: fix tracepoint Warning: unknown op '->' commit 051ac3848a94f21cfdec899cc9c65ce7f9f116fa upstream. `perf stat -e sunrpc:svc_xprt_do_enqueue true` results in Warning: unknown op '->' Warning: [sunrpc:svc_xprt_do_enqueue] unknown op '->' Similar warning for svc_handle_xprt as well. Actually TP_printk() should never dereference an address saved in the ring buffer that points somewhere in the kernel. There's no guarantee that that object still exists (with the exception of static strings). Therefore change all the arguments for TP_printk(), so that it references values existing in the ring buffer only. While doing that, also fix another possible bug when argument xprt could be NULL and TP_fast_assign() tries to access it's elements. Signed-off-by: Pratyush Anand Reviewed-by: Jeff Layton Acked-by: Steven Rostedt Fixes: 83a712e0afef "sunrpc: add some tracepoints around ..." Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit d40d9de9d3533ef6607342ec153b426b9d2b892e Author: Trond Myklebust Date: Wed Aug 19 00:14:20 2015 -0500 Revert "NFSv4: Remove incorrect check in can_open_delegated()" commit 36319608e28701c07cad80ae3be8b0fdfb1ab40f upstream. This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115. This commit opens up a race between the recovery code and the open code. Reported-by: Olga Kornievskaia Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 3d5c6b90ed90eadcd9c66951877697d848e63ac1 Author: Trond Myklebust Date: Sun Aug 30 18:37:59 2015 -0700 NFSv4.1: Fix a protocol issue with CLOSE stateids commit 4a1e2feb9d246775dee0f78ed5b18826bae2b1c5 upstream. According to RFC5661 Section 18.2.4, CLOSE is supposed to return the zero stateid. This means that nfs_clear_open_stateid_locked() cannot assume that the result stateid will always match the 'other' field of the existing open stateid when trying to determine a race with a parallel OPEN. Instead, we look at the argument, and check for matches. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit f6384199b2592674c7e6cfcfb4c0c54eacf992d7 Author: Trond Myklebust Date: Thu Aug 27 20:37:39 2015 -0400 NFSv4.1/flexfiles: Fix a protocol error in layoutreturn commit d13549074cf066d6d5bb29903d044beffea342d3 upstream. According to the flexfiles protocol, the layoutreturn should specify an array of errors in the following format: struct ff_ioerr4 { offset4 ffie_offset; length4 ffie_length; stateid4 ffie_stateid; device_error4 ffie_errors<>; }; This patch fixes up the code to ensure that our ffie_errors is indeed encoded as an array (albeit with only a single entry). Reported-by: Tom Haynes Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit b5a6dec79b2764584e5d5105f3f49bc58bf99513 Author: Peng Tao Date: Sat Aug 22 06:40:00 2015 +0800 NFS41/flexfiles: zero out DS write wcc commit 5420401079e152ff68a8024f6a375804b1c21505 upstream. We do not want to update inode attributes with DS values. Signed-off-by: Peng Tao Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 66bfdda4b2042bca88b429a55ca010d5ab94f991 Author: Trond Myklebust Date: Thu Aug 20 18:56:07 2015 -0500 NFSv4: Force a post-op attribute update when holding a delegation commit aaae3f00d3f67f681a1f3cb7af999e976e8a24ce upstream. If the ctime or mtime or change attribute have changed because of an operation we initiated, we should make sure that we force an attribute update. However we do not want to mark the page cache for revalidation. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 73e8e7b2bbf68ddc590a0e725c2099fe0fa75b6a Author: Peng Tao Date: Thu Aug 20 01:52:59 2015 +0800 NFS41/flexfiles: update inode after write finishes commit 69f230d907e8c1ca3f9bd528993eeb98f712b0dd upstream. Otherwise we break fstest case tests/read_write/mctime.t Does files layout need the same fix as well? Cc: Anna Schumaker Signed-off-by: Peng Tao Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 8d1920be7e762a77468c5ea82aab1ac4358d8fc9 Author: Trond Myklebust Date: Mon Aug 17 12:57:07 2015 -0500 NFS: nfs_set_pgio_error sometimes misses errors commit e9ae58aeee8842a50f7e199d602a5ccb2e41a95f upstream. We should ensure that we always set the pgio_header's error field if a READ or WRITE RPC call returns an error. The current code depends on 'hdr->good_bytes' always being initialised to a large value, which is not always done correctly by callers. When this happens, applications may end up missing important errors. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 87fbed4145998fbb3960c243c70cf78fbd7e5f42 Author: Kinglong Mee Date: Sat Aug 15 21:52:10 2015 +0800 NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client commit 18e3b739fdc826481c6a1335ce0c5b19b3d415da upstream. ---Steps to Reproduce-- # cat /etc/exports /nfs/referal *(rw,insecure,no_subtree_check,no_root_squash,crossmnt) /nfs/old *(ro,insecure,subtree_check,root_squash,crossmnt) # mount -t nfs nfs-server:/nfs/ /mnt/ # ll /mnt/*/ # cat /etc/exports /nfs/referal *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server) /nfs/old *(ro,insecure,subtree_check,root_squash,crossmnt) # service nfs restart # ll /mnt/*/ --->>>>> oops here [ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at (null) [ 5123.103363] IP: [] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0 [ 5123.104131] Oops: 0000 [#1] [ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G OE 4.2.0-rc6+ #214 [ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000 [ 5123.107363] RIP: 0010:[] [] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.107909] RSP: 0018:ffff88005877fdb8 EFLAGS: 00010246 [ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240 [ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908 [ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000 [ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800 [ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240 [ 5123.111169] FS: 0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000 [ 5123.111726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0 [ 5123.112888] Stack: [ 5123.113458] ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000 [ 5123.114049] 0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6 [ 5123.114662] ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800 [ 5123.115264] Call Trace: [ 5123.115868] [] nfs4_try_migration+0xbb/0x220 [nfsv4] [ 5123.116487] [] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4] [ 5123.117104] [] ? nfs4_do_reclaim+0x510/0x510 [nfsv4] [ 5123.117813] [] kthread+0xd7/0xf0 [ 5123.118456] [] ? kthread_worker_fn+0x160/0x160 [ 5123.119108] [] ret_from_fork+0x3f/0x70 [ 5123.119723] [] ? kthread_worker_fn+0x160/0x160 [ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33 [ 5123.121643] RIP [] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.122308] RSP [ 5123.122942] CR2: 0000000000000000 Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops") Signed-off-by: Kinglong Mee Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 0bdce6a850fe41e414700193996165fd787352f4 Author: Trond Myklebust Date: Thu Aug 13 10:59:07 2015 -0400 NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked() commit 6f536936b79bd4b5cea8fb0e5b8b0bce8cd1ea4a upstream. - Switch back to using list_for_each_entry(). Fixes an incorrect test for list NULL termination. - Do not assume that lists are sorted. - Finally, consider an existing entry to match if it consists of a subset of the addresses in the new entry. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit e204275a010138f27bca6bdc24ae6b042cd568e8 Author: Trond Myklebust Date: Thu Aug 6 12:06:30 2015 -0400 NFS: Don't let the ctime override attribute barriers. commit 7c2dad99d60c86ec686b3bfdcb787c450a7ea89f upstream. Chuck reports seeing cases where a GETATTR that happens to race with an asynchronous WRITE is overriding the file size, despite the attribute barrier being set by the writeback code. The culprit turns out to be the check in nfs_ctime_need_update(), which sees that the ctime is newer than the cached ctime, and assumes that it is safe to override the attribute barrier. This patch removes that override, and ensures that attribute barriers are always respected. Reported-by: Chuck Lever Fixes: a08a8cd375db9 ("NFS: Add attribute update barriers to NFS writebacks") Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 7bc97ee9c3ffd89d6226ec82398f5fbe843164e8 Author: NeilBrown Date: Thu Jul 30 13:00:56 2015 +1000 NFSv4: don't set SETATTR for O_RDONLY|O_EXCL commit efcbc04e16dfa95fef76309f89710dd1d99a5453 upstream. It is unusual to combine the open flags O_RDONLY and O_EXCL, but it appears that libre-office does just that. [pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0 [pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL NFSv4 takes O_EXCL as a sign that a setattr command should be sent, probably to reset the timestamps. When it was an O_RDONLY open, the SETATTR command does not identify any actual attributes to change. If no delegation was provided to the open, the SETATTR uses the all-zeros stateid and the request is accepted (at least by the Linux NFS server - no harm, no foul). If a read-delegation was provided, this is used in the SETATTR request, and a Netapp filer will justifiably claim NFS4ERR_BAD_STATEID, which the Linux client takes as a sign to retry - indefinitely. So only treat O_EXCL specially if O_CREAT was also given. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit ea056d3d54709b92498af8f8b7175ca2e9e02980 Author: Jeff Layton Date: Mon Aug 24 12:41:48 2015 -0400 nfsd: ensure that delegation stateid hash references are only put once commit 3fcbbd244ed1d20dc0eb7d48d729503992fa9b7d upstream. It's possible that a DELEGRETURN could race with (e.g.) client expiry, in which case we could end up putting the delegation hash reference more than once. Have unhash_delegation_locked return a bool that indicates whether it was already unhashed. In the case of destroy_delegation we only conditionally put the hash reference if that returns true. The other callers of unhash_delegation_locked call it while walking list_heads that shouldn't yet be detached. If we find that it doesn't return true in those cases, then throw a WARN_ON as that indicates that we have a partially hashed delegation, and that something is likely very wrong. Tested-by: Andrew W Elble Tested-by: Anna Schumaker Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 0940ed480a9e75bddcfa7c2d7e42411b08514534 Author: Jeff Layton Date: Mon Aug 24 12:41:47 2015 -0400 nfsd: ensure that the ol stateid hash reference is only put once commit e85687393f3ee0a77ccca016f903d1558bb69258 upstream. When an open or lock stateid is hashed, we take an extra reference to it. When we unhash it, we drop that reference. The code however does not properly account for the case where we have two callers concurrently trying to unhash the stateid. This can lead to list corruption and the hash reference being put more than once. Fix this by having unhash_ol_stateid use list_del_init on the st_perfile list_head, and then testing to see if that list_head is empty before releasing the hash reference. This means that some of the unhashing wrappers now become bool return functions so we can test to see whether the stateid was unhashed before we put the reference. Reported-by: Andrew W Elble Tested-by: Andrew W Elble Reported-by: Anna Schumaker Tested-by: Anna Schumaker Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 8751006ab36573c2314b3ef99164eb7079a86175 Author: Kinglong Mee Date: Thu Jul 30 21:52:44 2015 +0800 nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug commit 6896f15aabde505b35888039af93d1d182a0108a upstream. Currently we'll respond correctly to a request for either FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both attributes simultaneously. Signed-off-by: Kinglong Mee Reviewed-by: Christoph Hellwig Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 9b6d61edd38cdedf81c1bbe79fc637c533df984a Author: Trond Myklebust Date: Sun Jul 5 20:06:38 2015 -0400 NFSv4/pnfs: Ensure we don't miss a file extension commit 2b83d3de4c18af49800e0b26ae013db4fcf43a4a upstream. pNFS writes don't return attributes, however that doesn't mean that we should ignore the fact that they may be extending the file. This patch ensures that if a write is seen to extend the file, then we always set an attribute barrier, and update the cached file size. Signed-off-by: Trond Myklebust Cc: Peng Tao Signed-off-by: Greg Kroah-Hartman commit fe995635458e8967dedc75ed8d47fe208e77a9be Author: Filipe Manana Date: Wed Aug 12 11:54:35 2015 +0100 Btrfs: check if previous transaction aborted to avoid fs corruption commit 1f9b8c8fbc9a4d029760b16f477b9d15500e3a34 upstream. While we are committing a transaction, it's possible the previous one is still finishing its commit and therefore we wait for it to finish first. However we were not checking if that previous transaction ended up getting aborted after we waited for it to commit, so we ended up committing the current transaction which can lead to fs corruption because the new superblock can point to trees that have had one or more nodes/leafs that were never durably persisted. The following sequence diagram exemplifies how this is possible: CPU 0 CPU 1 transaction N starts (...) btrfs_commit_transaction(N) cur_trans->state = TRANS_STATE_COMMIT_START; (...) cur_trans->state = TRANS_STATE_COMMIT_DOING; (...) cur_trans->state = TRANS_STATE_UNBLOCKED; root->fs_info->running_transaction = NULL; btrfs_start_transaction() --> starts transaction N + 1 btrfs_write_and_wait_transaction(trans, root); --> starts writing all new or COWed ebs created at transaction N creates some new ebs, COWs some existing ebs but doesn't COW or deletes eb X btrfs_commit_transaction(N + 1) (...) cur_trans->state = TRANS_STATE_COMMIT_START; (...) wait_for_commit(root, prev_trans); --> prev_trans == transaction N btrfs_write_and_wait_transaction() continues writing ebs --> fails writing eb X, we abort transaction N and set bit BTRFS_FS_STATE_ERROR on fs_info->fs_state, so no new transactions can start after setting that bit cleanup_transaction() btrfs_cleanup_one_transaction() wakes up task at CPU 1 continues, doesn't abort because cur_trans->aborted (transaction N + 1) is zero, and no checks for bit BTRFS_FS_STATE_ERROR in fs_info->fs_state are made btrfs_write_and_wait_transaction(trans, root); --> succeeds, no errors during writeback write_ctree_super(trans, root, 0); --> succeeds --> we have now a superblock that points us to some root that uses eb X, which was never written to disk In this scenario future attempts to read eb X from disk results in an error message like "parent transid verify failed on X wanted Y found Z". So fix this by aborting the current transaction if after waiting for the previous transaction we verify that it was aborted. Signed-off-by: Filipe Manana Reviewed-by: Josef Bacik Reviewed-by: Liu Bo Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman commit c7f2670397e5a1ecfeb3fe5995306d44ba911404 Author: Benoit Parrot Date: Wed Jul 15 18:00:06 2015 -0300 media: am437x-vpfe: Fix a race condition during release commit c99235fa3ef833c3c23926085f2bb68851c8460a upstream. There was a race condition where during cleanup/release operation on-going streaming would cause a kernel panic because the hardware module was disabled prematurely with IRQ still pending. Fixes: 417d2e507edc ("[media] media: platform: add VPFE capture driver support for AM437X") Signed-off-by: Benoit Parrot Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 3eaf132eca3a276733faafcdd09fbf3e05091849 Author: Benoit Parrot Date: Mon Jun 29 18:19:06 2015 -0300 media: am437x-vpfe: Requested frame size and fmt overwritten by current sensor setting commit f47c9045643f91e76d8a9030828b9fe1cf4a6bcf upstream. Upon a S_FMT the input/requested frame size and pixel format is overwritten by the current sub-device settings. Fix this so application can actually set the frame size and format. Fixes: 417d2e507edc ("[media] media: platform: add VPFE capture driver support for AM437X") Signed-off-by: Benoit Parrot Acked-by: Lad, Prabhakar Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit f82aa0e2a52daa65a7c6648c5d5d331c93d61774 Author: Sakari Ailus Date: Fri Jun 12 20:06:23 2015 -0300 v4l: omap3isp: Fix sub-device power management code commit 9d39f05490115bf145e5ea03c0b7ec9d3d015b01 upstream. Commit 813f5c0ac5cc ("media: Change media device link_notify behaviour") modified the media controller link setup notification API and updated the OMAP3 ISP driver accordingly. As a side effect it introduced a bug by turning power on after setting the link instead of before. This results in sub-devices not being powered down in some cases when they should be. Fix it. Fixes: 813f5c0ac5cc [media] media: Change media device link_notify behaviour Signed-off-by: Sakari Ailus Signed-off-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 78cc6a0aa1812329d31c3633bef364e9b15f0051 Author: David Härdeman Date: Tue May 19 19:03:12 2015 -0300 rc-core: fix remove uevent generation commit a66b0c41ad277ae62a3ae6ac430a71882f899557 upstream. The input_dev is already gone when the rc device is being unregistered so checking for its presence only means that no remove uevent will be generated. Signed-off-by: David Härdeman Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit c85ea6919f1cf092ad6418323e49faac6ac7cf3c Author: Michal Hocko Date: Fri Aug 21 14:11:51 2015 -0700 mm: make page pfmemalloc check more robust commit 2f064f3485cd29633ad1b3cfb00cc519509a3d72 upstream. Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set set page->mapping to NULL and leave page->index value alone. Due to being in union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with a NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case so the page confuses __skb_fill_page_desc which interprets the index as pfmemalloc flag and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrive. The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see the value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes. The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do). [akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko Debugged-by: Vlastimil Babka Debugged-by: Jiri Bohac Cc: Eric Dumazet Cc: David Miller Acked-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f48b9555a631b2363feb0be735f4d608b4c72e3c Author: Minfei Huang Date: Sun Jul 12 20:18:42 2015 +0800 x86/mm: Initialize pmd_idx in page_table_range_init_count() commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream. The variable pmd_idx is not initialized for the first iteration of the for loop. Assign the proper value which indexes the start address. Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once' Signed-off-by: Minfei Huang Cc: tony.luck@intel.com Cc: wangnan0@huawei.com Cc: david.vrabel@citrix.com Reviewed-by: yinghai@kernel.org Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 397995fe3a36cf86913f724ee1d1933443e86d06 Author: Yinghai Lu Date: Fri Sep 4 15:42:39 2015 -0700 mm: check if section present during memory block registering commit 04697858d89e4bf2650364f8d6956e2554e8ef88 upstream. Tony Luck found on his setup, if memory block size 512M will cause crash during booting. BUG: unable to handle kernel paging request at ffffea0074000020 IP: get_nid_for_pfn+0x17/0x40 PGD 128ffcb067 PUD 128ffc9067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1 ... Call Trace: ? register_mem_sect_under_node+0x66/0xe0 register_one_node+0x17b/0x240 ? pci_iommu_alloc+0x6e/0x6e topology_init+0x3c/0x95 do_one_initcall+0xcd/0x1f0 The system has non continuous RAM address: BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable So there are start sections in memory block not present. For example: memory block : [0x2c18000000, 0x2c20000000) 512M first three sections are not present. The current register_mem_sect_under_node() assume first section is present, but memory block section number range [start_section_nr, end_section_nr] would include not present section. For arch that support vmemmap, we don't setup memmap for struct page area within not present sections area. So skip the pfn range that belong to absent section. [akpm@linux-foundation.org: simplification] [rientjes@google.com: more simplification] Fixes: bdee237c0343 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems") Fixes: 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit") Signed-off-by: Yinghai Lu Signed-off-by: David Rientjes Reported-by: Tony Luck Tested-by: Tony Luck Cc: Greg KH Cc: Ingo Molnar Tested-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a75c632bfbd637a80832511b91ab4f46cbc88175 Author: Jeffery Miller Date: Tue Sep 1 11:23:02 2015 -0400 Add radeon suspend/resume quirk for HP Compaq dc5750. commit 09bfda10e6efd7b65bcc29237bee1765ed779657 upstream. With the radeon driver loaded the HP Compaq dc5750 Small Form Factor machine fails to resume from suspend. Adding a quirk similar to other devices avoids the problem and the system resumes properly. Signed-off-by: Jeffery Miller Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit c1589311c9e46c84aea04546b65be12481db8d81 Author: Jann Horn Date: Fri Sep 11 16:27:27 2015 +0200 CIFS: fix type confusion in copy offload ioctl commit 4c17a6d56bb0cad3066a714e94f7185a24b40f49 upstream. This might lead to local privilege escalation (code execution as kernel) for systems where the following conditions are met: - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled - a cifs filesystem is mounted where: - the mount option "vers" was used and set to a value >=2.0 - the attacker has write access to at least one file on the filesystem To attack this, an attacker would have to guess the target_tcon pointer (but guessing wrong doesn't cause a crash, it just returns an error code) and win a narrow race. Signed-off-by: Jann Horn Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit f5a73e9c4a1c5dc9f1d90abf782c5f9ac694e443 Author: Aneesh Kumar K.V Date: Tue Sep 15 12:30:08 2015 +0530 powerpc/mm: Recompute hash value after a failed update commit 36b35d5d807b7e57aff7d08e63de8b17731ee211 upstream. If we had secondary hash flag set, we ended up modifying hash value in the updatepp code path. Hence with a failed updatepp we will be using a wrong hash value for the following hash insert. Fix this by recomputing hash before insert. Without this patch we can end up with using wrong slot number in linux pte. That can result in us missing an hash pte update or invalidate which can cause memory corruption or even machine check. Fixes: 6d492ecc6489 ("powerpc/THP: Add code to handle HPTE faults for hugepages") Signed-off-by: Aneesh Kumar K.V Reviewed-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit b46f51da057f2a4728f8cc4e30e64e297151db76 Author: Benjamin Herrenschmidt Date: Tue Sep 15 11:24:17 2015 +1000 powerpc/boot: Specify ABI v2 when building an LE boot wrapper commit 655471f54c2e395ba29ae4156ba0f49928177cc1 upstream. The kernel does it, not the boot wrapper, which breaks with some cross compilers that still default to ABI v1. Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper") Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 98747a5651a63b7e9246880909013753c7b633eb Author: Leonidas Da Silva Barbosa Date: Mon Jul 13 13:51:39 2015 -0300 crypto: vmx - Adding enable_kernel_vsx() to access VSX instructions commit 2d6f0600b2cd755959527230ef5a6fba97bb762a upstream. vmx-crypto driver make use of some VSX instructions which are only available if VSX is enabled. Running in cases where VSX are not enabled vmx-crypto fails in a VSX exception. In order to fix this enable_kernel_vsx() was added to turn on VSX instructions for vmx-crypto. Signed-off-by: Leonidas S. Barbosa Signed-off-by: Herbert Xu Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit b40924367be91f86c882a6a023a50fbfd7f9c430 Author: Leonidas Da Silva Barbosa Date: Mon Jul 13 13:51:01 2015 -0300 powerpc: Uncomment and make enable_kernel_vsx() routine available commit 72cd7b44bc99376b3f3c93cedcd052663fcdf705 upstream. enable_kernel_vsx() function was commented since anything was using it. However, vmx-crypto driver uses VSX instructions which are only available if VSX is enable. Otherwise it rises an exception oops. This patch uncomment enable_kernel_vsx() routine and makes it available. Signed-off-by: Leonidas S. Barbosa Signed-off-by: Herbert Xu Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit ce813f1fef5077fb318c0802aa0be148a580e632 Author: Thomas Huth Date: Fri Jul 17 12:46:58 2015 +0200 powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream. The EPOW interrupt handler uses rtas_get_sensor(), which in turn uses rtas_busy_delay() to wait for RTAS becoming ready in case it is necessary. But rtas_busy_delay() is annotated with might_sleep() and thus may not be used by interrupts handlers like the EPOW handler! This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6 Call Trace: [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable) [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180 Fix this issue by introducing a new rtas_get_sensor_fast() function that does not use rtas_busy_delay() - and thus can only be used for sensors that do not cause a BUSY condition - known as "fast" sensors. The EPOW sensor is defined to be "fast" in sPAPR - mpe. Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code") Signed-off-by: Thomas Huth Reviewed-by: Nathan Fontenot Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit e4b334216318a348011304a8f83e8c33d9664f38 Author: Michael Ellerman Date: Fri Aug 7 16:19:43 2015 +1000 powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream. The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K PAGE_SIZE. However when built with a 4K PAGE_SIZE there is an additional config option which can be enabled, PPC_HAS_HASH_64K, which means the kernel also knows how to hash a 64K page even though the base PAGE_SIZE is 4K. This is used in one obscure configuration, to support 64K pages for SPU local store on the Cell processor when the rest of the kernel is using 4K pages. In this configuration, pte_pagesize_index() is defined to just pass through its arguments to get_slice_psize(). However pte_pagesize_index() is called for both user and kernel addresses, whereas get_slice_psize() only knows how to handle user addresses. This has been broken forever, however until recently it happened to work. That was because in get_slice_psize() the large kernel address would cause the right shift of the slice mask to return zero. However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB"), the get_slice_psize() code was changed so that instead of a right shift we do an array lookup based on the address. When passed a kernel address this means we index way off the end of the slice array and return random junk. That is only fatal if we happen to hit something non-zero, but when we do return a non-zero value we confuse the MMU code and eventually cause a check stop. This fix is ugly, but simple. When we're called for a kernel address we return 4K, which is always correct in this configuration, otherwise we use the slice mask. Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB") Reported-by: Cyril Bur Signed-off-by: Michael Ellerman Reviewed-by: Aneesh Kumar K.V Signed-off-by: Greg Kroah-Hartman commit f1ab3c0449ca09b72e6eea7eb077244bfffb7478 Author: Gavin Shan Date: Fri Aug 28 11:57:00 2015 +1000 powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail() commit 259800135c654a098d9f0adfdd3d1f20eef1f231 upstream. The config space of some PCI devices can't be accessed when their PEs are in frozen state. Otherwise, fenced PHB might be seen. Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing EEH_PE_CFG_BLOCKED is set automatically when the PE is put to frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores PCI device BARs with eeh_pe_restore_bars(), which then calls eeh_ops->restore_config() to reinitialize the PCI device in (OPAL) firmware. eeh_ops->restore_config() produces PCI config access that causes fenced PHB. The problem was reported on below adapter: 0001:01:00.0 0200: 14e4:168e (rev 10) 0001:01:00.0 Ethernet controller: Broadcom Corporation \ NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10) This fixes the issue by skipping eeh_pe_restore_bars() in eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE. Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE") Reported-by: Manvanthara B. Puttashankar Signed-off-by: Gavin Shan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 91552f87853c6e50b3968f6514ddb9084acb9f10 Author: Daniel Axtens Date: Fri Aug 14 16:03:19 2015 +1000 powerpc/eeh: Probe after unbalanced kref check commit e642d11bdbfe8eb10116ab3959a2b5d75efda832 upstream. In the complete hotplug case, EEH PEs are supposed to be released and set to NULL. Normally, this is done by eeh_remove_device(), which is called from pcibios_release_device(). However, if something is holding a kref to the device, it will not be released, and the PE will remain. eeh_add_device_late() has a check for this which will explictly destroy the PE in this case. This check in eeh_add_device_late() occurs after a call to eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(), which will exit without probing if there is an existing PE. This means that on PowerNV, devices with outstanding krefs will not be rediscovered by EEH correctly after a complete hotplug. This is affecting CXL (CAPI) devices in the field. Put the probe after the kref check so that the PE is destroyed and affected devices are correctly rediscovered by EEH. Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") Cc: Gavin Shan Signed-off-by: Daniel Axtens Acked-by: Gavin Shan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 7584b2d8dbe6f49a76d756b83682a298f33cbfb6 Author: Gavin Shan Date: Thu Aug 27 14:12:36 2015 +1000 powerpc/pseries: Fix corrupted pdn list commit 590c7567a2895f939525ead57b0334c6d47986f0 upstream. Commit cca87d30 ("powerpc/pci: Refactor pci_dn") introduced pdn list for SRIOV VFs. It means the pdn is be put into the child list of its parent pdn when the pdn is created. When doing PCI hot unplugging on pSeries, the PCI device node as well as its pdn are released through procfs entry "powerpc/ofdt". Some one else grabs the memory chunk of the pdn and update it accordingly. At the same time, the pdn is still tracked in the child list of parent pdn. It leads to corrupted child list in the parent pdn. This fixes above issue by removing the pdn from the child list of its parent pdn when the device node is detached from the system. Note the pdn is free'd when the device node is released if the device node is dynamic one. Otherwise, the device node as well as the pdn won't be released. Fixes: cca87d30 ("powerpc/pci: Refactor pci_dn") Reported-by: Santwana Samantray Signed-off-by: Gavin Shan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 56347e799e690aebf26037fb2e11325078cedeb3 Author: David Dueck Date: Tue Jul 28 09:48:16 2015 +0200 pinctrl: at91: fix null pointer dereference commit 1ab36387ea4face01aac3560b396b1e2ce07c4ff upstream. Not all gpio banks are necessarily enabled, in the current code this can lead to null pointer dereferences. [ 51.130000] Unable to handle kernel NULL pointer dereference at virtual address 00000058 [ 51.130000] pgd = dee04000 [ 51.130000] [00000058] *pgd=3f66d831, *pte=00000000, *ppte=00000000 [ 51.140000] Internal error: Oops: 17 [#1] ARM [ 51.140000] Modules linked in: [ 51.140000] CPU: 0 PID: 1664 Comm: cat Not tainted 4.1.1+ #6 [ 51.140000] Hardware name: Atmel SAMA5 [ 51.140000] task: df6dd880 ti: dec60000 task.ti: dec60000 [ 51.140000] PC is at at91_pinconf_get+0xb4/0x200 [ 51.140000] LR is at at91_pinconf_get+0xb4/0x200 [ 51.140000] pc : [] lr : [] psr: 600f0013 sp : dec61e48 ip : 600f0013 fp : df522538 [ 51.140000] r10: df52250c r9 : 00000058 r8 : 00000068 [ 51.140000] r7 : 00000000 r6 : df53c910 r5 : 00000000 r4 : dec61e7c [ 51.140000] r3 : 00000000 r2 : c06746d4 r1 : 00000000 r0 : 00000003 [ 51.140000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 51.140000] Control: 10c53c7d Table: 3ee04059 DAC: 00000015 [ 51.140000] Process cat (pid: 1664, stack limit = 0xdec60208) [ 51.140000] Stack: (0xdec61e48 to 0xdec62000) [ 51.140000] 1e40: 00000358 00000000 df522500 ded15f80 c05a9d08 ded15f80 [ 51.140000] 1e60: 0000048c 00000061 df522500 ded15f80 c05a9d08 c01e7304 ded15f80 00000000 [ 51.140000] 1e80: c01e6008 00000060 0000048c c01e6034 c01e5f6c ded15f80 dec61ec0 00000000 [ 51.140000] 1ea0: 00020000 ded6f280 dec61f80 00000001 00000001 c00ae0b8 b6e80000 ded15fb0 [ 51.140000] 1ec0: 00000000 00000000 df4bc974 00000055 00000800 ded6f280 b6e80000 ded6f280 [ 51.140000] 1ee0: ded6f280 00020000 b6e80000 00000000 00020000 c0090dec c0671e1c dec61fb0 [ 51.140000] 1f00: b6f8b510 00000001 00004201 c000924c 00000000 00000003 00000003 00000000 [ 51.140000] 1f20: df4bc940 00022000 00000022 c066e188 b6e7f000 c00836f4 000b6e7f ded6f280 [ 51.140000] 1f40: ded6f280 b6e80000 dec61f80 ded6f280 00020000 c0091508 00000000 00000003 [ 51.140000] 1f60: 00022000 00000000 00000000 ded6f280 ded6f280 00020000 b6e80000 c0091d9c [ 51.140000] 1f80: 00000000 00000000 ffffffff 00020000 00020000 b6e80000 00000003 c000f124 [ 51.140000] 1fa0: dec60000 c000efa0 00020000 00020000 00000003 b6e80000 00020000 000271c4 [ 51.140000] 1fc0: 00020000 00020000 b6e80000 00000003 7fffe000 00000000 00000000 00020000 [ 51.140000] 1fe0: 00000000 bef50b64 00013835 b6f29c76 400f0030 00000003 00000000 00000000 [ 51.140000] [] (at91_pinconf_get) from [] (at91_pinconf_dbg_show+0x18/0x2c0) [ 51.140000] [] (at91_pinconf_dbg_show) from [] (pinconf_pins_show+0xc8/0xf8) [ 51.140000] [] (pinconf_pins_show) from [] (seq_read+0x1a0/0x464) [ 51.140000] [] (seq_read) from [] (__vfs_read+0x20/0xd0) [ 51.140000] [] (__vfs_read) from [] (vfs_read+0x7c/0x108) [ 51.140000] [] (vfs_read) from [] (SyS_read+0x40/0x94) [ 51.140000] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x3c) [ 51.140000] Code: eb010ec2 e30a0d08 e34c005a eb0ae5a7 (e5993000) [ 51.150000] ---[ end trace fb3c370da3ea4794 ]--- Fixes: a0b957f306fa ("pinctrl: at91: allow to have disabled gpio bank") Signed-off-by: David Dueck Acked-by: Ludovic Desroches Acked-by: Alexandre Belloni Acked-by: Nicolas Ferre Cc: Boris Brezillon Cc: Jean-Christophe PLAGNIOL-VILLARD Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit d521521ee7c630dfe5f0036946b820f2ace80cb1 Author: Niranjan Sivakumar Date: Sat Sep 5 18:20:35 2015 +0200 ALSA: hda - Fix white noise on Dell M3800 commit 467e1436ba85f78b8c4610c4549eb255a8211c42 upstream. The M3800 is very minor workstation variant of the XPS 15 which has already been patched for this issue. I figured it's probably more important for this version of the laptop to be patched than the regular XPS as Dell sells is pre-configured with Ubuntu to be used as a Linux workstation. I have tested the patch on my the hardware on Linux 4.2.0. Signed-off-by: Niranjan Sivakumar Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 0ea955bb6792090d06e93a4822407d6d6238bf74 Author: Woodrow Shen Date: Fri Sep 4 15:08:12 2015 +0800 ALSA: hda - Add some FIXUP quirks for white noise on Dell laptop. commit 1adecc6755e1e4193b5618ddb2e107f6d6e88f4b upstream. Dell laptop has a series model to use the same codec but different subsystem ID. At the same time they happens the white noise by login screen and headphone; for fixing them together, I only can add these IDs to FIXUP function ALC292_FIXUP_DISABLE_AAMIX, then try to solve such the similar issues. Codec: Realtek ALC3235 Vendor Id: 0x10ec0293 Subsystem Id: 0x102806dd Subsystem Id: 0x102806df Subsystem Id: 0x102806e0 BugLink: https://bugs.launchpad.net/bugs/1492132 Signed-off-by: Woodrow Shen Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 68e17e9c950fcde261e286491b24d2015ea0edcb Author: Takashi Iwai Date: Thu Aug 13 18:05:06 2015 +0200 ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 commit a161574e200ae63a5042120e0d8c36830e81bde3 upstream. It turned out that the machine has a bass speaker, so take a correct fixup entry. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit cd7885f07b7cf2e44b29c64d3f79e6cb104e1b19 Author: Takashi Iwai Date: Thu Aug 13 18:02:39 2015 +0200 ALSA: hda - Enable headphone jack detect on old Fujitsu laptops commit bb148bdeb0ab16fc0ae8009799471e4d7180073b upstream. According to the bug report, FSC Amilo laptops with ALC880 can detect the headphone jack but currently the driver disables it. It's partly intentionally, as non-working jack detect was reported in the past. Let's enable now. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102501 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit ae7d175f7d461643dd6eface450704c4076cfc0d Author: Yao-Wen Mao Date: Fri Aug 28 16:33:25 2015 +0800 ALSA: usb-audio: correct the value cache check. commit 6aa6925cad06159dc6e25857991bbc4960821242 upstream. The check of cval->cached should be zero-based (including master channel). Signed-off-by: Yao-Wen Mao Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 37e7e6bf19f75255444a6a53d90a16d8c5e5a471 Author: Takashi Iwai Date: Thu Sep 3 22:20:00 2015 -0700 Input: evdev - do not report errors form flush() commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae upstream. We've got bug reports showing the old systemd-logind (at least system-210) aborting unexpectedly, and this turned out to be because of an invalid error code from close() call to evdev devices. close() is supposed to return only either EINTR or EBADFD, while the device returned ENODEV. logind was overreacting to it and decided to kill itself when an unexpected error code was received. What a tragedy. The bad error code comes from flush fops, and actually evdev_flush() returns ENODEV when device is disconnected or client's access to it is revoked. But in these cases the fact that flush did not actually happen is not an error, but rather normal behavior. For non-disconnected devices result of flush is also not that interesting as there is no potential of data loss and even if it fails application has no way of handling the error. Because of that we are better off always returning success from evdev_flush(). Also returning EINTR from flush()/close() is discouraged (as it is not clear how application should handle this error), so let's stop taking evdev->mutex interruptibly. Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834 Signed-off-by: Takashi Iwai Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit f283a20ab79c44af3e493cfbcc2550404d75c3b8 Author: Marc Zyngier Date: Wed Sep 16 16:18:59 2015 +0100 arm64: KVM: Disable virtual timer even if the guest is not using it commit c4cbba9fa078f55d9f6d081dbb4aec7cf969e7c7 upstream. When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 7dd1e0b3f23eafbbbac8b018b591c613f8468368 Author: Will Deacon Date: Mon Sep 14 16:06:03 2015 +0100 KVM: arm64: add workaround for Cortex-A57 erratum #852523 commit 43297dda0a51e4ffed0888ce727c218cfb7474b6 upstream. When restoring the system register state for an AArch32 guest at EL2, writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57, which can lead to the guest effectively running with junk in the DACR and running into unexpected domain faults. This patch works around the issue by re-ordering our restoration of the AArch32 register aliases so that they happen before the AArch64 system registers. Ensuring that the registers are restored in this order guarantees that they will be correctly synchronised by the core. Reviewed-by: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 26550ff8f2ef668a526ca8fedbcf86bbd018169b Author: Pavel Fedin Date: Wed Aug 5 11:53:57 2015 +0100 arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources commit c2f58514cfb374d5368c9da945f1765cd48eb0da upstream. Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"), kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(), and vgic_v2_map_resources() still has it. But now vm_ops are not initialized until we call kvm_vgic_create(). Therefore kvm_vgic_map_resources() can being called without a VGIC, and we die because vm_ops.map_resources is NULL. Fixing this restores QEMU's kernel-irqchip=off option to a working state, allowing to use GIC emulation in userspace. Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops") Signed-off-by: Pavel Fedin [maz: reworked commit message] Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 1feed3799f32961ccfade28cd755f93c023fa21c Author: Will Deacon Date: Tue Mar 17 12:15:02 2015 +0000 arm64: errata: add module build workaround for erratum #843419 commit df057cc7b4fa59e9b55f07ffdb6c62bf02e99a00 upstream. Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can lead to a memory access using an incorrect address in certain sequences headed by an ADRP instruction. There is a linker fix to generate veneers for ADRP instructions, but this doesn't work for kernel modules which are built as unlinked ELF objects. This patch adds a new config option for the erratum which, when enabled, builds kernel modules with the mcmodel=large flag. This uses absolute addressing for all kernel symbols, thereby removing the use of ADRP as a PC-relative form of addressing. The ADRP relocs are removed from the module loader so that we fail to load any potentially affected modules. Acked-by: Catalin Marinas Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit ad09182eee001c592b7a442fad5e7b2ca61b34aa Author: Will Deacon Date: Wed Sep 2 18:49:28 2015 +0100 arm64: head.S: initialise mdcr_el2 in el2_setup commit d10bcd473301888f957ec4b6b12aa3621be78d59 upstream. When entering the kernel at EL2, we fail to initialise the MDCR_EL2 register which controls debug access and PMU capabilities at EL1. This patch ensures that the register is initialised so that all traps are disabled and all the PMU counters are available to the host. When a guest is scheduled, KVM takes care to configure trapping appropriately. Acked-by: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit a70957a9cab26b99de59514598eaf201312f8956 Author: Will Deacon Date: Tue Sep 15 12:07:06 2015 +0100 arm64: compat: fix vfp save/restore across signal handlers in big-endian commit bdec97a855ef1e239f130f7a11584721c9a1bf04 upstream. When saving/restoring the VFP registers from a compat (AArch32) signal frame, we rely on the compat registers forming a prefix of the native register file and therefore make use of copy_{to,from}_user to transfer between the native fpsimd_state and the compat_vfp_sigframe. Unfortunately, this doesn't work so well in a big-endian environment. Our fpsimd save/restore code operates directly on 128-bit quantities (Q registers) whereas the compat_vfp_sigframe represents the registers as an array of 64-bit (D) registers. The architecture packs the compat D registers into the Q registers, with the least significant bytes holding the lower register. Consequently, we need to swap the 64-bit halves when converting between these two representations on a big-endian machine. This patch replaces the __copy_{to,from}_user invocations in our compat VFP signal handling code with explicit __put_user loops that operate on 64-bit values and swap them accordingly. Reviewed-by: Catalin Marinas Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit fe6b3cafd6a0ee59a6bd5b5a0e2fee18ea257328 Author: Ard Biesheuvel Date: Tue Aug 18 10:34:42 2015 +0100 arm64: set MAX_MEMBLOCK_ADDR according to linear region size commit 34ba2c4247e5c4b1542b1106e156af324660c4f0 upstream. The linear region size of a 39-bit VA kernel is only 256 GB, which may be insufficient to cover all of system RAM, even on platforms that have much less than 256 GB of memory but which is laid out very sparsely. So make sure we clip the memory we will not be able to map before installing it into the memblock memory table, by setting MAX_MEMBLOCK_ADDR accordingly. Reviewed-by: Catalin Marinas Tested-by: Stuart Yoder Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 01cb08b46fbcb7572db3cfa41a46b054ec8c2fc9 Author: Ard Biesheuvel Date: Tue Aug 18 10:34:41 2015 +0100 of/fdt: make memblock maximum physical address arch configurable commit 8eafeb48022816513abc4f440bdad4c350fe81a3 upstream. When parsing the memory nodes to populate the memblock memory table, we check against high and low limits and clip any memory that exceeds either one of them. However, for arm64, the high limit of (phys_addr_t)~0 is not very meaningful, since phys_addr_t is 64 bits (i.e., no limit) but there may be other constraints that limit the memory ranges that we can support. So rename MAX_PHYS_ADDR to MAX_MEMBLOCK_ADDR (for clarity) and only define it if the arch does not supply a definition of its own. Acked-by: Rob Herring Reviewed-by: Catalin Marinas Tested-by: Stuart Yoder Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 380cd983b40fb5a58d7c06a11c736ce9d531d408 Author: Ard Biesheuvel Date: Thu Aug 27 07:12:33 2015 +0100 arm64: flush FP/SIMD state correctly after execve() commit 674c242c9323d3c293fc4f9a3a3a619fe3063290 upstream. When a task calls execve(), its FP/SIMD state is flushed so that none of the original program state is observeable by the incoming program. However, since this flushing consists of setting the in-memory copy of the FP/SIMD state to all zeroes, the CPU field is set to CPU 0 as well, which indicates to the lazy FP/SIMD preserve/restore code that the FP/SIMD state does not need to be reread from memory if the task is scheduled again on CPU 0 without any other tasks having entered userland (or used the FP/SIMD in kernel mode) on the same CPU in the mean time. If this happens, the FP/SIMD state of the old program will still be present in the registers when the new program starts. So set the CPU field to the invalid value of NR_CPUS when performing the flush, by calling fpsimd_flush_task_state(). Reported-by: Chunyan Zhang Reported-by: Janet Liu Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 1bcc586eae97f5dfb9249c2d8c9406f0f8da57ae Author: Jeff Vander Stoep Date: Tue Aug 18 20:50:10 2015 +0100 arm64: kconfig: Move LIST_POISON to a safe value commit bf0c4e04732479f650ff59d1ee82de761c0071f0 upstream. Move the poison pointer offset to 0xdead000000000000, a recognized value that is not mappable by user-space exploits. Acked-by: Catalin Marinas Signed-off-by: Thierry Strudel Signed-off-by: Jeff Vander Stoep Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 2a6f41747737303b2241f0e30c6960a30102ac26 Author: Theodore Ts'o Date: Sun Aug 16 10:03:57 2015 -0400 Revert "ext4: remove block_device_ejected" commit bdfe0cbd746aa9b2509c2f6d6be17193cf7facd7 upstream. This reverts commit 08439fec266c3cc5702953b4f54bdf5649357de0. Unfortunately we still need to test for bdi->dev to avoid a crash when a USB stick is yanked out while a file system is mounted: usb 2-2: USB disconnect, device number 2 Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write JBD2: Error -5 detected when updating journal superblock for sdb1-8. BUG: unable to handle kernel paging request at 34beb000 IP: [] __percpu_counter_add+0x18/0xc0 *pdpt = 0000000023db9001 *pde = 0000000000000000 Oops: 0000 [#1] SMP CPU: 0 PID: 4083 Comm: umount Tainted: G U OE 4.1.1-040101-generic #201507011435 Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011 task: ebf06b50 ti: ebebc000 task.ti: ebebc000 EIP: 0060:[] EFLAGS: 00010082 CPU: 0 EIP is at __percpu_counter_add+0x18/0xc0 EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001 ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0 Stack: c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160 ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160 f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000 Call Trace: [] account_page_dirtied+0x74/0x110 [] __set_page_dirty+0x3f/0xb0 [] mark_buffer_dirty+0x53/0xc0 [] ext4_commit_super+0x17b/0x250 [] ext4_put_super+0xc1/0x320 [] ? fsnotify_unmount_inodes+0x1aa/0x1c0 [] ? evict_inodes+0xca/0xe0 [] generic_shutdown_super+0x6a/0xe0 [] ? prepare_to_wait_event+0xd0/0xd0 [] ? unregister_shrinker+0x40/0x50 [] kill_block_super+0x26/0x70 [] deactivate_locked_super+0x45/0x80 [] deactivate_super+0x47/0x60 [] cleanup_mnt+0x39/0x80 [] __cleanup_mnt+0x10/0x20 [] task_work_run+0x91/0xd0 [] do_notify_resume+0x7c/0x90 [] work_notify Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89 EIP: [] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40 CR2: 0000000034beb000 ---[ end trace dd564a7bea834ecd ]--- Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011 Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit c4f568f41c5ba3a16f70743240b7cad77ea250ec Author: Eric Sandeen Date: Sat Aug 15 10:45:06 2015 -0400 ext4: don't manipulate recovery flag when freezing no-journal fs commit c642dc9e1aaed953597e7092d7df329e6234096e upstream. At some point along this sequence of changes: f6e63f9 ext4: fold ext4_nojournal_sops into ext4_sops bb04457 ext4: support freezing ext2 (nojournal) file systems 9ca9238 ext4: Use separate super_operations structure for no_journal filesystems ext4 started setting needs_recovery on filesystems without journals when they are unfrozen. This makes no sense, and in fact confuses blkid to the point where it doesn't recognize the filesystem at all. (freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs, see needs_recovery set on fs w/ no journal). To fix this, don't manipulate the INCOMPAT_RECOVER feature on filesystems without journals. Reported-by: Stu Mark Reviewed-by: Jan Kara Signed-off-by: Eric Sandeen Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit a847e529661790162905169d28e9fe6e819345b7 Author: Daniel Axtens Date: Tue Sep 15 15:04:07 2015 +1000 cxl: Fix unbalanced pci_dev_get in cxl_probe commit 2925c2fdf1e0eb642482f5b30577e9435aaa8edb upstream. Currently the first thing we do in cxl_probe is to grab a reference on the pci device. Later on, we call device_register on our adapter. In our remove path, we call device_unregister, but we never call pci_dev_put. We therefore leak the device every time we do a reflash. device_register/unregister is sufficient to hold the reference. Therefore, drop the call to pci_dev_get. Here's why this is safe. The proposed cxl_probe(pdev) calls cxl_adapter_init: a) init calls cxl_adapter_alloc, which creates a struct cxl, conventionally called adapter. This struct contains a device entry, adapter->dev. b) init calls cxl_configure_adapter, where we set adapter->dev.parent = &dev->dev (here dev is the pci dev) So at this point, the cxl adapter's device's parent is the PCI device that I want to be refcounted properly. c) init calls cxl_register_adapter *) cxl_register_adapter calls device_register(&adapter->dev) So now we're in device_register, where dev is the adapter device, and we want to know if the PCI device is safe after we return. device_register(&adapter->dev) calls device_initialize() and then device_add(). device_add() does a get_device(). device_add() also explicitly grabs the device's parent, and calls get_device() on it: parent = get_device(dev->parent); So therefore, device_register() takes a lock on the parent PCI dev, which is what pci_dev_get() was guarding. pci_dev_get() can therefore be safely removed. Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access") Signed-off-by: Daniel Axtens Acked-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 348b074897b6ce7004a9b4024b2c641a1262a961 Author: Daniel Axtens Date: Fri Aug 21 17:25:15 2015 +1000 cxl: Remove racy attempt to force EEH invocation in reset commit 9d8e27673c45927fee9e7d8992ffb325a6b0b0e4 upstream. cxl_reset currently PERSTs the slot, and then repeatedly tries to read MMIO space in order to kick off EEH. There are 2 problems with this: it's unnecessary, and it's racy. It's unnecessary because the PERST will bring down the PHB link. That will be picked up by the CAPP, which will send out an HMI. Skiboot, noticing an HMI from the CAPP, will send an OPAL notification to the kernel, which will trigger EEH recovery. It's also racy: the EEH recovery triggered by the CAPP will eventually cause the MMIO space to have its mapping invalidated and the pointer NULLed out. This races with our attempt to read the MMIO space. This is causing OOPSes in testing. Simply drop all the attempts to force EEH detection, and trust that Skiboot will send the notification and that we'll act on it. The Skiboot code to send the EEH notification has been in Skiboot for as long as CAPP recovery has been supported, so we don't need to worry about breaking obscure setups with ancient firmware. Cc: Ryan Grimm Fixes: 62fa19d4b4fd ("cxl: Add ability to reset the card") Signed-off-by: Daniel Axtens Acked-by: Ian Munsie Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 1a7af1d9c5256dab6c33caf0b855d38102e79bb5 Author: Bob Copeland Date: Sat Jun 13 10:16:31 2015 -0400 mac80211: enable assoc check for mesh interfaces commit 3633ebebab2bbe88124388b7620442315c968e8f upstream. We already set a station to be associated when peering completes, both in user space and in the kernel. Thus we should always have an associated sta before sending data frames to that station. Failure to check assoc state can cause crashes in the lower-level driver due to transmitting unicast data frames before driver sta structures (e.g. ampdu state in ath9k) are initialized. This occurred when forwarding in the presence of fixed mesh paths: frames were transmitted to stations with whom we hadn't yet completed peering. Reported-by: Alexis Green Tested-by: Jesse Jones Signed-off-by: Bob Copeland Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 5242720923a0a6d3a59b62e51c053f0948b9d2a3 Author: Markos Chandras Date: Fri Jul 17 10:38:32 2015 +0100 MIPS: math-emu: Emulate missing BC1{EQ,NE}Z instructions commit c909ca718e8f50cf484ef06a8dd935e738e8e53d upstream. Commit c8a34581ec09 ("MIPS: Emulate the BC1{EQ,NE}Z FPU instructions") added support for emulating the new R6 BC1{EQ,NE}Z branches but it missed the case where the instruction that caused the exception was not on a DS. Signed-off-by: Markos Chandras Fixes: c8a34581ec09 ("MIPS: Emulate the BC1{EQ,NE}Z FPU instructions") Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10738/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 2297868856a2208eed7598446edfcc71ffb717a1 Author: Markos Chandras Date: Fri Jul 17 10:36:03 2015 +0100 MIPS: math-emu: Allow m{f,t}hc emulation on MIPS R6 commit e8f80cc1a6d80587136b015e989a12827e1fcfe5 upstream. The mfhc/mthc instructions are supported on MIPS R6 so emulate them if needed. Signed-off-by: Markos Chandras Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10737/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 4e00f05deff2b5828c0bf2a747f8b024c0b25718 Author: Jean Delvare Date: Tue Sep 1 18:07:41 2015 +0200 tg3: Fix temperature reporting commit d3d11fe08ccc9bff174fc958722b5661f0932486 upstream. The temperature registers appear to report values in degrees Celsius while the hwmon API mandates values to be exposed in millidegrees Celsius. Do the conversion so that the values reported by "sensors" are correct. Fixes: aed93e0bf493 ("tg3: Add hwmon support for temperature") Signed-off-by: Jean Delvare Cc: Prashant Sreedharan Cc: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a0e26ed623a1e1460c1a191fbc0f37bddab7851a Author: Shota Suzuki Date: Wed Jul 1 09:25:52 2015 +0900 igb: Fix oops caused by missing queue pairing commit 72ddef0506da852dc82f078f37ced8ef4d74a2bf upstream. When initializing igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is set if adapter->rss_queues exceeds half of max_rss_queues in igb_init_queue_configuration(). On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of queues exceeds half of max_combined in igb_set_channels() when changing the number of queues by "ethtool -L". In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size of adapter->msix_entries[], an overflow can occur in igb_set_interrupt_capability(), which in turn leads to an oops. Fix this problem as follows: - When changing the number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver. - When increasing the size of q_vector, reallocate it appropriately. (With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.) Another possible way to fix this problem is to cap the queues at its initial number, which is the number of the initial online cpus. But this is not the optimal way because we cannot increase queues when another cpu becomes online. Note that before commit cd14ef54d25b ("igb: Change to use statically allocated array for MSIx entries"), this problem did not cause oops but just made the number of queues become 1 because of entering msi_only mode in igb_set_interrupt_capability(). Fixes: 907b7835799f ("igb: Add ethtool support to configure number of channels") Signed-off-by: Shota Suzuki Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Greg Kroah-Hartman commit d7378111c7f7845a14792564d5168505f5309b74 Author: Larry Finger Date: Wed Jul 8 10:18:50 2015 -0500 rtlwifi: rtl8821ae: Fix an expression that is always false commit 251086f588720277a6f5782020a648ce32c4e00b upstream. In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test for AP status to always be false. Similar bugs were fixed in rtl8192cu and rtl8192de, but this instance was missed at that time. Reported-by: David Binderman Signed-off-by: Larry Finger Cc: David Binderman Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit e8ad44583d5cffa93063a2c866c6f7806dd40374 Author: Adrien Schildknecht Date: Wed Aug 19 17:33:12 2015 +0200 rtlwifi: rtl8192cu: Add new device ID commit 1642d09fb9b128e8e538b2a4179962a34f38dff9 upstream. The v2 of NetGear WNA1000M uses a different idProduct: USB ID 0846:9043 Signed-off-by: Adrien Schildknecht Acked-by: Larry Finger Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit 11e43d237c8ca2d39047c527b105366823c5f8bc Author: Eric W. Biederman Date: Mon Aug 10 17:35:07 2015 -0500 unshare: Unsharing a thread does not require unsharing a vm commit 12c641ab8270f787dfcce08b5f20ce8b65008096 upstream. In the logic in the initial commit of unshare made creating a new thread group for a process, contingent upon creating a new memory address space for that process. That is wrong. Two separate processes in different thread groups can share a memory address space and clone allows creation of such proceses. This is significant because it was observed that mm_users > 1 does not mean that a process is multi-threaded, as reading /proc/PID/maps temporarily increments mm_users, which allows other processes to (accidentally) interfere with unshare() calls. Correct the check in check_unshare_flags() to test for !thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM. For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM. For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM. By using the correct checks in unshare this removes the possibility of an accidental denial of service attack. Additionally using the correct checks in unshare ensures that only an explicit unshare(CLONE_VM) can possibly trigger the slow path of current_is_single_threaded(). As an explict unshare(CLONE_VM) is pointless it is not expected there are many applications that make that call. Fixes: b2e0d98705e60e45bbb3c0032c48824ad7ae0704 userns: Implement unshare of the user namespace Reported-by: Ricky Zhou Reported-by: Kees Cook Reviewed-by: Kees Cook Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit 1af97771b397a9ca2ee984c3277bcada3eec95cd Author: Ming Lei Date: Sun Aug 9 03:41:50 2015 -0400 blk-mq: fix buffer overflow when reading sysfs file of 'pending' commit 596f5aad2a704b72934e5abec1b1b4114c16f45b upstream. There may be lots of pending requests so that the buffer of PAGE_SIZE can't hold them at all. One typical example is scsi-mq, the queue depth(.can_queue) of scsi_host and blk-mq is quite big but scsi_device's queue_depth is a bit small(.cmd_per_lun), then it is quite easy to have lots of pending requests in hw queue. This patch fixes the following warning and the related memory destruction. [ 359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M [ 359.055595] irq event stamp: 15537^M [ 359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M [ 359.055614] Dumping ftrace buffer:^M [ 359.055660] (ftrace buffer empty)^M [ 359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M [ 359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M [ 359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M [ 359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M [ 359.055693] RIP: 0010:[] [] __kmalloc+0xe8/0x152^M Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 4fbdb442390e3eeb755d299c3dcbc777ea83f193 Author: Christophe Ricard Date: Wed Aug 19 21:26:42 2015 +0200 nfc: nci: hci: Add check on skb nci_hci_send_cmd parameter commit 5a9e0ffc0f128ecdf7c770f76c268e4f9f3c9118 upstream. skb can be NULL and may lead to a NULL pointer error. Add a check condition before setting HCI rx buffer. Signed-off-by: Christophe Ricard Signed-off-by: Samuel Ortiz Signed-off-by: Greg Kroah-Hartman commit f84e11df678f2e3c47264f9f42523d822b3fe426 Author: Christophe Ricard Date: Fri Aug 14 22:33:33 2015 +0200 NFC: st21nfca: fix use of uninitialized variables in error path commit 5a3570061a131309143a49e4bbdbce7e23f261e7 upstream. st21nfca_hci_load_session() calls kfree_skb() on unitialized variables skb_pipe_info and skb_pipe_list if the call to nfc_hci_connect_gate() failed. Reword the error path to not use these variables when they are not initialized. While at it, there seemed to be a memory leak because skb_pipe_info was only freed once, after the for-loop, even though several ones were created by nfc_hci_send_cmd. Fixes: ec03ff1a8f9a ("NFC: st21nfca: Remove skb_pipe_list and skb_pipe_info useless allocation") Acked-by: Christophe Ricard Signed-off-by: Nicolas Iooss Signed-off-by: Samuel Ortiz Signed-off-by: Greg Kroah-Hartman