Contributed by Paul Richards <paul@FreeBSD.org> and Jörg Wunsch <joerg@FreeBSD.org>
Here are some instructions for getting kernel debugging working on a crash dump. They assume that you have enough swap space for a crash dump. If you have multiple swap partitions and the first one is too small to hold the dump, you can configure your kernel to use an alternate dump device (in the config kernel line), or you can specify an alternate using the dumpon(8) command. The best way to use dumpon(8) is to set the dumpdev variable in /etc/rc.conf. Typically you want to specify one of the swap devices specified in /etc/fstab. Dumps to non-swap devices, tapes for example, are currently not supported. Config your kernel using config -g. See Kernel Configuration for details on configuring the FreeBSD kernel.
Use the dumpon(8) command to tell the kernel where to dump to (note that this will have to be done after configuring the partition in question as swap space via swapon(8)). This is normally arranged via /etc/rc.conf and /etc/rc. Alternatively, you can hard-code the dump device via the dump clause in the config line of your kernel config file. This is deprecated and should be used only if you want a crash dump from a kernel that crashes during booting.
Note: In the following, the term gdb refers to the debugger gdb run in ``kernel debug mode''. This can be accomplished by starting the gdb with the option -k. In kernel debug mode, gdb changes its prompt to (kgdb).
Tip: If you are using FreeBSD 3 or earlier, you should make a stripped copy of the debug kernel, rather than installing the large debug kernel itself:
# cp kernel kernel.debug # strip -g kernelThis stage isn't necessary, but it is recommended. (In FreeBSD 4 and later releases this step is performed automatically at the end of the kernel make process.) When the kernel has been stripped, either automatically or by using the commands above, you may install it as usual by typing make install.
Note that older releases of FreeBSD (up to but not including 3.1) used a.out kernels by default, which must have their symbol tables permanently resident in physical memory. With the larger symbol table in an unstripped debug kernel, this is wasteful. Recent FreeBSD releases use ELF kernels where this is no longer a problem.
If you are testing a new kernel, for example by typing the new kernel's name at the boot prompt, but need to boot a different one in order to get your system up and running again, boot it only into single user state using the -s flag at the boot prompt, and then perform the following steps:
# fsck -p # mount -a -t ufs # so your file system for /var/crash is writable # savecore -N /kernel.panicked /var/crash # exit # ...to multi-user
This instructs savecore(8) to use another kernel for symbol name extraction. It would otherwise default to the currently running kernel and most likely not do anything at all since the crash dump and the kernel symbols differ.
Now, after a crash dump, go to /sys/compile/WHATEVER and run gdb -k. From gdb do:
symbol-file kernel.debug exec-file /var/crash/kernel.0 core-file /var/crash/vmcore.0and voila, you can debug the crash dump using the kernel sources just like you can for any other program.
Here is a script log of a gdb session illustrating the procedure. Long lines have been folded to improve readability, and the lines are numbered for reference. Despite this, it is a real-world error trace taken during the development of the pcvt console driver.
1:Script started on Fri Dec 30 23:15:22 1994 2:# cd /sys/compile/URIAH 3:# gdb -k kernel /var/crash/vmcore.1 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel ...done. 5:IdlePTD 1f3000 6:panic: because you said to! 7:current pcb at 1e3f70 8:Reading in symbols for ../../i386/i386/machdep.c...done. 9:(kgdb) where 10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767) 11:#1 0xf0115159 in panic () 12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698) 13:#3 0xf010185e in db_fncall () 14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073) 15:#5 0xf0101711 in db_command_loop () 16:#6 0xf01040a0 in db_trap () 17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723) 18:#8 0xf019d2eb in trap_fatal (...) 19:#9 0xf019ce60 in trap_pfault (...) 20:#10 0xf019cb2f in trap (...) 21:#11 0xf01932a1 in exception:calltrap () 22:#12 0xf0191503 in cnopen (...) 23:#13 0xf0132c34 in spec_open () 24:#14 0xf012d014 in vn_open () 25:#15 0xf012a183 in open () 26:#16 0xf019d4eb in syscall (...) 27:(kgdb) up 10 28:Reading in symbols for ../../i386/i386/trap.c...done. 29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\ 30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\ 31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\ 32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\ 33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\ 34:ss = -266427884}) (../../i386/i386/trap.c line 283) 35:283 (void) trap_pfault(&frame, FALSE); 36:(kgdb) frame frame->tf_ebp frame->tf_eip 37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done. 38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\ 39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403) 40:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); 41:(kgdb) list 42:398 43:399 tp->t_state |= TS_CARR_ON; 44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */ 45:401 46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200) 47:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); 48:404 #else 49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag)); 50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */ 51:407 } 52:(kgdb) print tp 53:Reading in symbols for ../../i386/i386/cons.c...done. 54:$1 = (struct tty *) 0x1bae 55:(kgdb) print tp->t_line 56:$2 = 1767990816 57:(kgdb) up 58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\ 59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126) 60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p)); 61:(kgdb) up 62:#2 0xf0132c34 in spec_open () 63:(kgdb) up 64:#3 0xf012d014 in vn_open () 65:(kgdb) up 66:#4 0xf012a183 in open () 67:(kgdb) up 68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\ 69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\ 70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \ 71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \ 72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673) 73:673 error = (*callp->sy_call)(p, args, rval); 74:(kgdb) up 75:Initial frame selected; you cannot go up. 76:(kgdb) quit 77:# exit 78:exit 79: 80:Script done on Fri Dec 30 23:18:04 1994
Comments to the above script:
This is a dump taken from within DDB (see below), hence the panic comment ``because you said to!'', and a rather long stack trace; the initial reason for going into DDB has been a page fault trap though.
This is the location of function trap() in the stack trace.
Force usage of a new stack frame; this is no longer necessary now. The stack frames are supposed to point to the right locations now, even in case of a trap. (I do not have a new core dump handy <g>, my kernel has not panicked for a rather long time.) From looking at the code in source line 403, there is a high probability that either the pointer access for ``tp'' was messed up, or the array access was out of bounds.
The pointer looks suspicious, but happens to be a valid address.
However, it obviously points to garbage, so we have found our error! (For those unfamiliar with that particular piece of code: tp->t_line refers to the line discipline of the console device here, which must be a rather small integer number.)
For questions about FreeBSD, e-mail
<questions@FreeBSD.org>.
For questions about this documentation, e-mail <doc@FreeBSD.org>.