Koozali.org: home of the SME Server

Server unresponsive / kernel panic

ntblade
« on: June 04, 2009, 12:41:47 PM »
Hi all,
One of my servers became unresponsive this morning: I couldn't ping it, ssh in, or renew an address lease, and briefly pressing the power button had no effect, so I had no option but to hard power it off and restart.

Here are the contents of /var/log/messages:
Code:
Jun  3 22:41:00 sme dhcpd: DHCPREQUEST for 192.168.0.50 from 00:16:76:76:06:c4 via eth0
Jun  3 22:41:00 sme dhcpd: DHCPACK on 192.168.0.50 to 00:16:76:76:06:c4 via eth0
Jun  3 22:42:26 sme squid[4321]: sslReadServer: FD 20: read failure: (104) Connection reset by peer
Jun  3 22:46:45 sme squid[4321]: sslReadServer: FD 18: read failure: (104) Connection reset by peer
Jun  4 04:11:03 sme squid[4321]: storeDirWriteCleanLogs: Starting...
Jun  4 04:11:03 sme squid[4321]:   Finished.  Wrote 6077 entries.
Jun  4 04:11:03 sme squid[4321]:   Took 0.0 seconds (238379.2 entries/sec).
Jun  4 04:11:03 sme squid[4321]: logfileRotate: /var/log/squid/store.log
Jun  4 04:11:03 sme squid[4321]: logfileRotate: /var/log/squid/access.log
Jun  4 05:14:55 sme kernel: eip: c012044d
Jun  4 05:14:55 sme kernel: ------------[ cut here ]------------
Jun  4 05:14:55 sme kernel: kernel BUG at include/asm/spinlock.h:146!
Jun  4 05:14:55 sme kernel: invalid operand: 0000 [#1]
Jun  4 05:14:55 sme kernel: SMP
Jun  4 05:14:55 sme kernel: Modules linked in: ppp_mppe(U) ppp_async crc_ccitt ppp_generic(U) slhc appletalk(U) sk98lin ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_state ip_nat_ftp ip_conntrack_ftp iptable_mangle iptable_nat ip_conntrack iptable_filter ip_tables button battery ac uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore bonding(U) dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata sd_mod scsi_mod
Jun  4 05:14:55 sme kernel: CPU:    0
Jun  4 05:14:55 sme kernel: EIP:    0060:[_spin_lock_irqsave+32/69]    Not tainted VLI
Jun  4 05:14:55 sme kernel: EIP:    0060:[<c02df64b>]    Not tainted VLI
Jun  4 05:14:55 sme kernel: EFLAGS: 00010012   (2.6.9-78.0.8.ELsmp)
Jun  4 05:14:55 sme kernel: EIP is at _spin_lock_irqsave+0x20/0x45
Jun  4 05:14:55 sme kernel: eax: c012044d   ebx: 00000292   ecx: c02f32e1   edx: c02f32e1
Jun  4 05:14:55 sme kernel: esi: d2ed56a0   edi: d2ed56a0   ebp: 09e89468   esp: d9d70f64
Jun  4 05:14:55 sme kernel: ds: 007b   es: 007b   ss: 0068
Jun  4 05:14:55 sme kernel: Process squid (pid: 4321, threadinfo=d9d70000 task=d80d5130)
Jun  4 05:14:55 sme kernel: Stack: d26f8058 d26f8064 c012044d d26f8054 d26f8000 00000000 c016d3f7 00000000
Jun  4 05:14:55 sme kernel:        d3985788 c016dff9 000f4627 00000000 d3985780 00000000 00000000 c016d415
Jun  4 05:14:55 sme kernel:        d26f8000 00000000 c01265f5 09e89448 00945f20 00363ff4 d9d70000 c02e0a83
Jun  4 05:14:55 sme kernel: Call Trace:
Jun  4 05:14:55 sme kernel:  [remove_wait_queue+15/52] remove_wait_queue+0xf/0x34
Jun  4 05:14:55 sme kernel:  [<c012044d>] remove_wait_queue+0xf/0x34
Jun  4 05:14:55 sme kernel:  [poll_freewait+26/56] poll_freewait+0x1a/0x38
Jun  4 05:14:55 sme kernel:  [<c016d3f7>] poll_freewait+0x1a/0x38
Jun  4 05:14:55 sme kernel:  [sys_poll+618/633] sys_poll+0x26a/0x279
Jun  4 05:14:55 sme kernel:  [<c016dff9>] sys_poll+0x26a/0x279
Jun  4 05:14:55 sme kernel:  [__pollwait+0/149] __pollwait+0x0/0x95
Jun  4 05:14:55 sme kernel:  [<c016d415>] __pollwait+0x0/0x95
Jun  4 05:14:55 sme kernel:  [sys_gettimeofday+83/172] sys_gettimeofday+0x53/0xac
Jun  4 05:14:55 sme kernel:  [<c01265f5>] sys_gettimeofday+0x53/0xac
Jun  4 05:14:55 sme kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Jun  4 05:14:55 sme kernel:  [<c02e0a83>] syscall_call+0x7/0xb
Jun  4 05:14:55 sme kernel:  [__lock_text_end+2008/4221] __lock_text_end+0x7d8/0x107d
Jun  4 05:14:55 sme kernel:  [<c02e007b>] __lock_text_end+0x7d8/0x107d
Jun  4 05:14:55 sme kernel: Code: 81 00 00 00 00 01 c3 f0 ff 00 c3 56 89 c6 53 9c 5b fa 81 78 04 ad 4e ad de 74 18 ff 74 24 08 68 e1 32 2f c0 e8 f9 34 e4 ff 59 58 <0f> 0b 92 00 d1 22 2f c0 f0 fe 0e 79 13 f7 c3 00 02 00 00 74 01
Jun  4 05:14:55 sme kernel:  <0>Fatal exception: panic in 5 seconds

// Rebooted here //

Jun  4 10:17:33 sme syslogd 1.4.1: restart.
Jun  4 10:17:33 sme syslog: syslogd startup succeeded
Jun  4 10:17:33 sme kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun  4 10:17:33 sme syslog: klogd startup succeeded
Jun  4 10:17:33 sme kernel: Inspecting /boot/System.map-2.6.9-78.0.8.ELsmp
Jun  4 10:17:33 sme kernel: Loaded 25371 symbols from /boot/System.map-2.6.9-78.0.8.ELsmp.
Jun  4 10:17:33 sme kernel: Symbols match kernel version 2.6.9.
Jun  4 10:17:33 sme kernel: No module symbols loaded - kernel modules not enabled.
Jun  4 10:17:33 sme kernel: Linux version 2.6.9-78.0.8.ELsmp (mockbuild@builder16.centos.org) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Nov 19 20:05:04 EST 2008
Jun  4 10:17:33 sme kernel: BIOS-provided physical RAM map:

Can anyone help me diagnose this please?

Many thanks,
Norrie

MSmith
« Reply #1 on: June 04, 2009, 04:36:32 PM »
Assuming it's a "stock" SME (no contribs installed), I'd look at the hardware first, including a thorough RAM check, a hard disk check and a visual inspection of the motherboard for blown capacitors.  You might want to dust the machine out and check your CPU fan while you're at it, in case you have a heat problem.  If it's an older machine, remove the expansion cards and RAM and use the ancient pencil eraser trick to clean the contacts.
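
Before opening the case, it can help to boil the oops in /var/log/messages down to its key facts (which check fired, which process was running, which function faulted, and the call chain), so that any later panic can be compared against this one. The following is only a rough sketch in Python, assuming the syslog layout shown in the first post; the helper name summarise_oops is made up for illustration and is not part of SME Server or any standard tool.

```python
import re

# Matches the syslog prefix seen in the posted log,
# e.g. "Jun  4 05:14:55 sme kernel: <message>"
KERNEL_LINE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s?(.*)$")

# Matches a resolved stack frame such as "[<c012044d>] remove_wait_queue+0xf/0x34"
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)")


def summarise_oops(lines):
    """Collect the key fields of a kernel oops from syslog lines.

    Hypothetical helper: it only understands the layout of the
    2.6.9-era oops posted in this thread.
    """
    summary = {"bug": None, "process": None, "eip": None, "trace": []}
    in_trace = False
    for raw in lines:
        match = KERNEL_LINE.match(raw.rstrip("\n"))
        if not match:
            continue  # not a kernel message (dhcpd, squid, ...)
        text = match.group(1).strip()
        if text.startswith("kernel BUG at"):
            summary["bug"] = text
        elif text.startswith("Process "):
            summary["process"] = text
        elif text.startswith("EIP is at"):
            summary["eip"] = text.split("EIP is at", 1)[1].strip()
        elif text == "Call Trace:":
            in_trace = True
        elif text.startswith("Code:"):
            in_trace = False  # the hex dump marks the end of the trace
        elif in_trace:
            # The log carries each frame twice (decimal and hex forms);
            # keep only the "[<address>] symbol+offset/length" form.
            frame = FRAME.match(text)
            if frame:
                summary["trace"].append(frame.group(1))
    return summary
```

Passing the open log file, for example summarise_oops(open("/var/log/messages")), returns a dict whose "bug", "process", "eip" and "trace" entries can be printed or compared between incidents. If the log holds more than one oops, this sketch keeps only the last BUG, process and EIP lines it sees and appends every trace frame to a single list, so it is best run on an excerpt covering one incident.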
...

ntblade
« Reply #2 on: June 04, 2009, 05:53:38 PM »
Cheers,
I've a nasty feeling about this one as I had to replace a blown PSU the day before this happened.  Up until then it had been rock solid.  I'm just about to go away for a week as well.  Argggh!

N

MSmith
« Reply #3 on: June 04, 2009, 07:10:22 PM »
Well then, pull the hard drive(s) and throw them into another box; check them with the drive maker's diagnostics then fire it up.  Most likely SME will wake up, shake itself, notice the new hardware and go about its business.
...