Here's the deal. I'm running e-smith 4.0 (gateway/file server mode) on a PII 400 with 128 MB of RAM and a 13 GB Fujitsu IDE drive. Everything works great except for one big problem: every so often the server freezes completely (it can't even be rebooted normally), and the following message, or a variant of it, is logged to the console and to the messages log:
Dec 12 04:03:00 e-smith kernel: kfree: Bad obj c16afe60
Dec 12 04:03:00 e-smith kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Dec 12 04:03:00 e-smith kernel: current->tss.cr3 = 00ef7000, %cr3 = 00ef7000
Dec 12 04:03:00 e-smith kernel: *pde = 00000000
Dec 12 04:03:00 e-smith kernel: Oops: 0002
Dec 12 04:03:00 e-smith kernel: CPU: 0
Dec 12 04:03:00 e-smith kernel: EIP: 0010:[kfree+403/424]
Dec 12 04:03:00 e-smith kernel: EFLAGS: 00010286
Dec 12 04:03:00 e-smith kernel: eax: 0000001b ebx: c7fb2620 ecx: 0000001a edx: 00000021
Dec 12 04:03:00 e-smith kernel: esi: c16afe60 edi: c2e80550 ebp: 00000688 esp: c09f7e68
Dec 12 04:03:00 e-smith kernel: ds: 0018 es: 0018 ss: 0018
Dec 12 04:03:00 e-smith kernel: Process slocate (pid: 4258, process nr: 83, stackpage=c09f7000)
Dec 12 04:03:00 e-smith kernel: Stack: c7fb3060 c16afa00 c2e80550 00000688 c16afa00 c2e80550 c01317c4 c16afe60
Dec 12 04:03:00 e-smith kernel: c09f7ed0 c09f7ed0 c021ba64 00001006 c09f7ed0 00000001 00001006 c013275b
Dec 12 04:03:00 e-smith kernel: fffff682 00001006 00000000 c0258450 c021ba64 c0258450 c3db9640 c3f03cb0
Dec 12 04:03:00 e-smith kernel: Call Trace: [prune_dcache+220/300] [try_to_free_inodes+199/264] [grow_inodes+30/384] [get_new_inode+173/280] [get_new_inode+185/280] [iget+88/96] [ext2_lookup+84/124]
Dec 12 04:03:00 e-smith kernel: [real_lookup+79/160] [lookup_dentry+296/488] [__namei+40/88] [sys_newlstat+14/96] [system_call+52/56] [startup_32+43/285]
Dec 12 04:03:00 e-smith kernel: Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 08
The only way to recover is a hard reboot. This forces me into fsck, which finds numerous errors in the filesystem. I have the system fix these, and everything appears normal again. A couple of days later the same problem comes back, and it is getting worse.
I'm not really sure what is causing the error (I'm still trying to trace it). It might be happening during reasonably large (10 MB+) file transfers.
Any ideas?
Thanks