This TODO list gets stale very rapidly, and so is likely not up to date.

20040909:

Brian Feldman has reported additional BPF-related races when an interface detaches while being sniffed using BPF. In the 20040908 patch, I added annotations for a class of related races in BPF. It looks like the BPF code could use some clean-up and restructuring (I've merged a change to make it use queue(3) as a starting point, in order to make the behavior clearer).

Continuing work to explore the performance impact of eliminating locking in the mbuf allocation and free paths using an experimental per-thread UMA cache model (20040907-rwatson_umaperthread.diff). It avoids locking in the common case and, for the purposes of testing, simulates the effects of using per-CPU caches without mutexes.

There appear to be a couple of races between consumers of if_afdata and the ifnet attachment code, which sets the initialized flag before the ifnet is actually fully initialized. Simply moving the flag set probably won't help, as consumers of if_afdata don't check the flag. A better strategy would appear to be to re-order the ifnet allocation and initialization code so that the ifnet is not exposed before it is fully initialized (insofar as possible).

Optimizations made to the /dev/random yarrow and entropy harvesting code in netperf (20040828) need to be benchmarked and then possibly merged to the FreeBSD CVS repository. These changes eliminate a large number of locking operations, but in some cases possibly at the cost of increased contention.

20040907:

bpf.c:1.36 on 20040909 is believed to correct the following panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x10
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc066b25e
stack pointer = 0x10:0xe93bbc2c
frame pointer = 0x10:0xe93bbc40
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44 (irq31: em3)
[thread 100042]
Stopped at catchpacket+0x12: movl 0x10(%eax),%eax
db> trace
catchpacket(c58afe00,c5a89802,5ea,60,c07a3cf4) at catchpacket+0x12
bpf_tap(c5574000,c5a89802,5ea,c5a50800,c553c800) at bpf_tap+0xb7
bpf_mtap(c5574000,c5a5ec00) at bpf_mtap+0x36
ether_input(c553c800,c5a5ec00) at ether_input+0x119
em_process_receive_interrupts(c553c800,ffffffc6,0,c5574140,4) at em_process_receive_interrupts+0x2f9
em_intr(c553c800) at em_intr+0x10a
ithread_loop(c53e0d80,e93bbd48) at ithread_loop+0x1bd
fork_exit(c05f34ec,c53e0d80,e93bbd48) at fork_exit+0x75
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe93bbd7c, ebp = 0 ---

20040805:

Some more things to do:

- Walk network interfaces and decide which need INTR_MPSAFE vs. which need IFF_NEEDSGIANT. (FIXED)
- Flip debug_mpsafenet to 1 by default. Document it. (FIXED)
- Find a link-time way to identify code linked into the kernel that requires Giant to operate correctly. (FIXED)
- The AIO socket callbacks rely on Giant to avoid corrupting the AIO work lists (etc). This needs to be fixed.

Things to think about:

- Are we still doing %fs machinery on UP for PCPU access? Can we make it cheaper by making stronger UP assumptions?
- Can we further optimize use of pcpu stuff in mi_switch()?

20040801:

Big ticket TODO items are now:

- Kqueue locking: there are two WIP patches, and one needs to be committed so that Kqueue is safe for pipes and sockets without Giant present. (jmg) (FIXED)
- Review, test, fix, etc, IFF_NEEDSGIANT support.
  Sweep network drivers to mark as IFF_NEEDSGIANT to allow non-MPSAFE drivers to run in a slightly degraded mode in the absence of Giant. (rwatson) (FIXED)
- Make more network device drivers inherently MPSAFE.
- More IPv6 locking; in6_prefix removal to simplify IPv6 locking. (gnn) (FIXED)
- Netipx and other less popular network stack pieces need more work.
- There appear to be some possible races in the select()/poll() code that trigger under load. They appear not to trigger with debug.mpsafenet=0, and may be the result of removing Giant. This needs to be debugged and fixed. (FIXED)
- Proper locking needed in ip_ctloutput(). (rwatson)
- A fair number of Netgraph nodes need review for locking requirements and locking. (glebius, bz, et al)
- The two PF_KEY implementations need to use netisrs similar to routing socket dispatch. (rwatson)
- tcp_timer.c contains a global time wait list array that appears not to be properly locked.

20040624:

TODO list is greatly stale. Here are some known TODO items:

- Portalfs locking is problematic; it reaches into the UNIX domain socket code with what is likely inadequate locking. (FIXED)
- Giant pushdown needs to occur in fdrop_locked() for fo_close(), and likely also for fo_stat() and others. (FIXED)
- There are known lock order issues between raw socket control blocks and the pfkey code in both KAME IPSEC and FAST_IPSEC. These likely need to be resolved using a netisr to defer socket delivery of pfkey messages, to avoid acquiring the raw socket pcb list lock at poor moments. Changes would be similar to the routing socket changes.
- mbuma statistics are not updated atomically in the CVS version, although they are in rwatson_netperf.
- Need to trim a lot of spls.
- soreceive() and sosend() in rwatson_netperf differ quite a bit from CVS, and the differences need to be reviewed (and maybe merged). There are cached state issues in soreceive() in both branches that need to be resolved.
  (FIXED)
- Accept filter locking needs to be merged (and probably fixed more). (FIXED)
- More locking assertions in socketvar.h macros would be useful.
- There are a number of locking changes present in rwatson_netperf for UDP and TCP usrreq protocol entry points that need to be reviewed, possibly simplified, and merged.
- ip_ctloutput() frobs inpcb state with inadequate locking for mrouter and other bits; the inpcb needs to be passed down so these functions can lock properly without holding the mutex over a blocking sleep.
- Several drivers use si_drv1 for non-atomic test-and-set.

More to come...

20040416:

Known lock order issues consist of a lock order reversal between the TCP code and the routing code as a result of calls to tcp_usr_rcvd(). The sshd panic below is now resolved as a result of locking improvements in soreceive().

Major todo items are:

Task                 Owner
----                 -----
if_ppp               rwatson, maurycy
ifnet                luigi, mlaier, maurycy
ifaddr               luigi
ifnet cloning        brooks
NFS server           rick
kqueue               jmg, feldman
KAME IPsec           -
KAME IPv6            mlaier
netipx               -
netatalk             rwatson, bob bishop
network interfaces   -
sppp                 rik
netgraph             -
sendfile/aio         -

20040408:

mlaier is picking up additional socket locking. jmg is picking up kqueue locking.

The sshd-generated panic below appears to be legitimate after enabling proper locking in if_xl. It may be an existing race opened up by locking, or an error in logic changes in UNIX domain sockets. It specifically appears to be a problem with sshd passing ancillary data. (fixed)

20040407:

Note: 4-processor system panics and problems may be due to problems with if_xl locking that I didn't see until Scott Long pointed them out to me.
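
The ip_ctloutput() item in the 20040624 list describes a general constraint: a socket option handler may need to copy data in from user space, which can sleep, so the inpcb mutex must not be held across that copy. The following is a minimal userland sketch of the pattern, not the actual netinet code. It assumes a pthread mutex as a stand-in for the inpcb lock and memcpy() as a stand-in for a sleeping copyin() or sooptcopyin(); struct fake_inpcb, fake_copyin() and set_ttl_option() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for an inpcb: a mutex protecting one option field. */
struct fake_inpcb {
	pthread_mutex_t	lock;
	int		ttl;
};

/*
 * Hypothetical stand-in for copyin(): in the kernel the real copy may
 * page-fault and sleep, so it must be called with no mutex held.
 */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/*
 * Copy the option value in first (may sleep, no lock held), validate it,
 * and only then take the pcb lock for the short, non-sleeping update.
 */
static int
set_ttl_option(struct fake_inpcb *inp, const void *uaddr, size_t len)
{
	int error, optval;

	assert(inp != NULL);
	if (len != sizeof(optval))
		return (EINVAL);
	error = fake_copyin(uaddr, &optval, sizeof(optval));
	if (error != 0)
		return (error);
	if (optval < 0 || optval > 255)
		return (EINVAL);

	pthread_mutex_lock(&inp->lock);
	inp->ttl = optval;
	pthread_mutex_unlock(&inp->lock);
	return (0);
}
```

Doing validation before taking the lock also keeps the critical section to a single store, which is the property the TODO item is after.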

Experienced the following panic during high NFS server traffic on a four CPU system:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x3bdc0259
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc05fd365
stack pointer = 0x10:0xe980ac4c
frame pointer = 0x10:0xe980ac5c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 418 (nfsd)
kernel: type 12 trap, code=0
Stopped at m_free+0x11: cmpw $0,0x10(%ebx)
db> trace
m_free(3bdc0249) at m_free+0x11
m_freem(c228ec00) at m_freem+0x12
nfsrv_getcache(c669aa00,e980aca8,0,2,2) at nfsrv_getcache+0x2ca
nfssvc_nfsd(c622e930,c0882080,8,c07e1e00,3e2) at nfssvc_nfsd+0x2ad
nfssvc(c622e930,e980ad14,2,0,292) at nfssvc+0x15f
syscall(2f,2f,2f,bfbfeec4,4) at syscall+0x217
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (155, FreeBSD ELF32, nfssvc), eip = 0x280b86d7, esp = 0xbfbfeb2c, ebp = 0xbfbfeb48 ---
db>

Experienced the following odd exit of dhclient on a four CPU system:

hippy# Apr 7 10:05:17 hippy kernel: pid 254 (dhclient), uid 0: exited on signal 11 (core dumped)

Core was generated by `dhclient'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0807f001 in getsockname ()
(gdb) bt
#0 0x0807f001 in getsockname ()
#1 0x080a1230 in ?? ()
#2 0x0807c2f8 in getsockname ()
#3 0x08050a9a in getsockname ()
#4 0x0804c679 in getsockname ()
#5 0x08049d36 in getsockname ()
(gdb)

Experienced the following panic when starting an incoming SSH session to a four logical processor box:

db> show msgbuf
msgbufp = 0xc101bfe4
magic = 63062, size = 32740, r= 9090, w = 9199, ptr = 0xc1014000, cksum= 716347
panic: m 0 so->so_rcv.sb_cc 17 at line 859 in file ../../../kern/uipc_socket.c
cpuid = 3;
Debugger("panic")
db> trace
Debugger(c07c3561) at Debugger+0x46
__panic(c07c94c2,35b,c07c954e,0,11) at __panic+0x13d
soreceive(c6718c30,ebba7c0c,ebba7c38,0,ebba7c10) at soreceive+0x1f4
recvit(c67f82a0,3,ebba7cc0,0,bfbfe410) at recvit+0x1a2
recvmsg(c67f82a0,ebba7d14,3,4,296) at recvmsg+0x9a
syscall(808002f,bfbf002f,bfbf002f,bfbfe44c,8079a70) at syscall+0x217
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (27, FreeBSD ELF32, recvmsg), eip = 0x282afff7, esp = 0xbfbfe3fc, ebp = 0xbfbfe458 ---

It appears to be a race in the socket buffer code; the receive path looks reasonable, but the socket buffer sizing code looks suspect (lots of read-modify-write on size fields without locks).

20040404:

KQueue has no locking of the majority of its state, and will require significant attention. John-Mark Gurney has agreed to look at this problem.

A number of fields in network interfaces may not be locked properly; Max Laier has agreed to look at this problem.

20040402:

Kris Kennaway has reported a hang in the NFS server code that needs to be investigated. (FIXED)

Kris Kennaway reported a mutex recursion panic in the socket code. (FIXED)

netnatm needs its pcb list locked down.

20040331:

There are some known lock order reversals reported by WITNESS; see REVERSALS.

net/if_sl.c now has global variable locking with some nits; it needs softc locking. (FIXED)

20040330:

net/if.c requires locking (cloning, lists, ifaddrs, etc).
net/if_spppsubr.c requires locking.
net/if_gif softc requires attention, especially relating to inter-layer softc use.
net/if_gre softc requires attention, especially relating to inter-layer softc use.
net/if_ppp.c requires locking.
net/if_sl.c requires locking. (Done)
netinet/accf_*.c require locking.
netinet6/*.c needs to use pcb/pcbinfo locking.
netinet6/*.c requires locking.
General unresolved issues in the use of si_flags and si_drv1 across pseudo-interfaces.
net/if_vlan.c requires review, testing.
net/if_stf.c requires review, testing.
net/raw_cb.c requires review, testing.
netinet/tcp_timer.c requires review.
netatalk/*.c needs review, locking, testing.
netatm needs review, locking, testing.
netnatm needs review, locking, testing.
netipx/*.c needs review, locking, testing.
netgraph/*.c needs review, locking, testing.
Socket locking needs to handle MAC labels.
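
The si_drv1 items above refer to drivers doing a non-atomic test-and-set on the device's private pointer. One assumed example of the pattern is lazily attaching a softc on first open with "if (dev->si_drv1 == NULL) dev->si_drv1 = sc;", where two racing openers can both observe NULL and both install a softc. The following is a minimal userland sketch using C11 atomic_compare_exchange_strong(); struct fake_cdev, struct fake_softc and get_or_create_softc() are hypothetical. In the kernel, the equivalent would be an atomic(9) primitive such as atomic_cmpset_ptr(), or simply holding the relevant softc mutex across the test and the set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cdev: only the private pointer slot. */
struct fake_cdev {
	_Atomic(void *) si_drv1;
};

/* Hypothetical per-device driver state. */
struct fake_softc {
	int unit;
};

/*
 * Return the softc attached to dev, creating one if none exists yet.
 * Compare-and-swap makes the test and the set one atomic step: exactly
 * one caller installs its softc, and any loser frees its own copy and
 * returns the winner's.
 */
static struct fake_softc *
get_or_create_softc(struct fake_cdev *dev, int unit)
{
	struct fake_softc *sc;
	void *cur, *expected;

	assert(dev != NULL);
	cur = atomic_load(&dev->si_drv1);
	if (cur != NULL)
		return (cur);

	sc = malloc(sizeof(*sc));
	if (sc == NULL)
		return (NULL);
	sc->unit = unit;

	expected = NULL;
	if (atomic_compare_exchange_strong(&dev->si_drv1, &expected, sc))
		return (sc);

	/* Another thread won the race; expected now holds its softc. */
	free(sc);
	return (expected);
}
```

A second call on the same device returns the softc installed by the first, so the unit number recorded is the one from whichever caller won.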