Koozali.org: home of the SME Server

~Help~ Kernel Panic Error

Offline bandito

~Help~ Kernel Panic Error
« on: November 04, 2010, 12:57:15 AM »
I am running an SME 7.4 server on a Dell PowerEdge server with a PERC 5/i RAID card (this is for a small school I volunteer for). I'm pretty sure the hard drives and array are intact because they check out in the RAID config utility. There was a problem with the power: it went off and on a number of times, and I assume the server never shut down properly. Now I am receiving a "kernel panic - not syncing" "attempting to kill init" error message.


I tried to run "sme rescue" from the CD, but it cannot find any partitions to fix. I know a partition exists, though, because the server boots as far as grub. I have searched through the forums and didn't find any applicable solutions. I have attached pictures of the grub screen and the error. I greatly appreciate any help.

http://img440.imageshack.us/f/1103101824.jpg/

http://img801.imageshack.us/f/1103101825.jpg/
« Last Edit: November 04, 2010, 02:04:04 AM by bandito »

Offline cactus

Re: ~Help~ Kernel Panic Error
« Reply #1 on: November 04, 2010, 07:09:55 AM »
Most likely a hard drive failure. I think the drive(s) containing the root (/) partition are lost. Since you do not describe your configuration in much detail (What RAID type? What RAID configuration?), it leaves us with nothing but guesswork.
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline bandito

Re: ~Help~ Kernel Panic Error
« Reply #2 on: November 04, 2010, 03:40:17 PM »
It is a RAID 5 array on a hardware RAID card. The onboard RAID utility says the array and hard drives are functional, so I don't think it is a physical failure. If the array were broken somehow, how would the system boot to the grub menu? I think the SME boot partition is at least intact.

Offline Stefano

Re: ~Help~ Kernel Panic Error
« Reply #3 on: November 04, 2010, 03:46:57 PM »
Does the controller have cache memory? Does it have a BBU (battery backup unit)?

Offline cactus

Re: ~Help~ Kernel Panic Error
« Reply #4 on: November 04, 2010, 04:09:09 PM »
bandito wrote: "It is a raid5 array on a hardware raid card. The onboard raid utility says the array and hard drives are functional. So I don't think it is a physical failure."

That might indeed be the case. I am not sure what the RAID utility tests, but I doubt it checks for bad clusters.

bandito wrote: "If the array was broken somehow how would the system boot to the grub menu? I think the SME boot partition is at least intact."

The grub menu is on the /boot partition IIRC, which is a different partition than the root partition. I think the SME boot partition is intact too, hence my suspicion concerning the root partition.

Offline bandito

Re: ~Help~ Kernel Panic Error
« Reply #5 on: November 04, 2010, 04:33:55 PM »
Stefano wrote: "does the controller have cache memory? does it have BBU?"

Yes, it has a cache and a battery backup.

Offline Stefano

Re: ~Help~ Kernel Panic Error
« Reply #6 on: November 04, 2010, 04:49:19 PM »
bandito wrote: "yes. it has a cache and a battery backup."

Then it shouldn't be filesystem corruption.
I think something is wrong with your hardware (at least 2 disks if it is RAID 5, since RAID 5 survives the loss of a single disk). Start/stop cycles and electrical spikes are evil for hard drives.

Offline bandito

Re: ~Help~ Kernel Panic Error
« Reply #7 on: November 04, 2010, 05:05:34 PM »
Stefano wrote: "then it shouldn't be a fs corruption. I think something is wrong with your hw (at least 2 disk if raid5).. start/stop cycles and electric spikes are evil for hds"

When the RAID card's BIOS starts, there is a battery error message. It says that the battery is completely discharged and that write caching is disabled. So there was probably data in the cache when the power problems happened.

The RAID utility has several tools that I used to check the hard drives individually and to run a consistency check on the RAID array. The array and hard drives passed all of the tests (well, all the ones that wouldn't destroy the data). This is what makes me believe it is a filesystem issue. If I load the SME CD, it sees the array for an install/upgrade, but it doesn't see the partition. I'm sure I could do a clean install onto the existing array. Are there any command line options I can try? Would booting to a different kernel help?

Offline janet

Re: ~Help~ Kernel Panic Error
« Reply #8 on: November 04, 2010, 09:29:52 PM »
bandito

This page http://wiki.contribs.org/Category:Howto has a number of RAID articles that may assist you. There are also very good generic Linux RAID articles on the net, google for Linux RAID. It is a complex subject and if you do not understand RAID already, then you potentially have a steep learning curve.
Learning on a broken system is risky, and there is no guarantee that you will recover your data.

One approach would be to boot from the SME install CD in rescue mode (press F5 and type sme rescue), mount the system (follow the on-screen prompts) and determine the RAID status by running
cat /proc/mdstat
What it says will determine what you do next.
From the command prompt you may then be able to correct or re-add the partitions (refer to the RAID articles or post the output back here for suggestions).
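
As a rough illustration of the check described above, here is a minimal sketch that lists arrays reported as degraded. It assumes Linux software (md) RAID and the usual /proc/mdstat layout, where a status such as [U_] marks a missing member. The function name degraded_arrays is made up for this example, and on a hardware controller the file may show nothing relevant.

```shell
# Print the name of every md array whose member status contains "_"
# (a missing or failed member). Reads /proc/mdstat unless a file is given.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md2 : active raid1 sda2[0]"
        /^md[0-9]+ :/ { name = $1 }
        # A status block such as [U_] or [_U] means at least one member is down
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument at the rescue prompt would inspect the live /proc/mdstat; no output means no array looks degraded in this format.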

To really play it safe you should clone all hard drives and set aside the original drives, noting the port that each drive is connected to, and then "play" with the cloned drives when trying to repair the partitions. That way you won't inadvertently overwrite data on the original drives.
Find an expert in RAID 5 to help you if you are unsure.
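
The cloning step could be sketched roughly as below, assuming the source reads cleanly and the destination has enough free space. The function name clone_device is hypothetical. The sketch uses plain dd followed by a byte-for-byte cmp; for failing media, dd's conv=noerror,sync is a common addition, but the padding it introduces can make the final compare fail.

```shell
# Copy a source device (or image file) to a new image file and verify the copy.
# $1 = source, $2 = destination image; refuses to overwrite an existing file.
clone_device() {
    if [ -e "$2" ]; then
        echo "refusing to overwrite existing $2" >&2
        return 1
    fi
    # Plain block copy; 64k blocks are just a reasonable throughput choice
    dd if="$1" of="$2" bs=64k || return 1
    # Byte-for-byte comparison of source and copy
    cmp "$1" "$2"
}
```

A call might look like clone_device /dev/sda /mnt/usb/sda.img, where both paths are placeholders rather than names taken from this thread. With a hardware RAID controller the operating system normally sees only the logical volume, so this would image the array as a whole rather than the individual member disks.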
 
There have been numerous posts in these forums about rebuilding RAID arrays and adding partitions, so search the forums, go back 3 or 4 years.

Kernel panic messages are usually a sign of hardware failure or incompatibility of some sort.
Check that the individual components are OK, e.g. drives, drive controller, motherboard, power supply (all busses).

Do you have good backups to restore from?
Do you need (or at least prefer) to recover the data that is on the existing drives, if possible, or would a restore from backup be OK?


« Last Edit: November 04, 2010, 09:34:10 PM by mary »
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline Stefano

Re: ~Help~ Kernel Panic Error
« Reply #9 on: November 04, 2010, 09:53:44 PM »
Mary, he is talking about hardware RAID, not software RAID.

Offline bandito

Re: ~Help~ Kernel Panic Error
« Reply #10 on: November 04, 2010, 10:04:10 PM »
Thanks for the info, Mary.  I'll look into it tonight. 

Unfortunately, I was backing up to a USB drive that was plugged into the same UPS that garfed up the server.  The USB drive is toast.

Offline Stefano

Re: ~Help~ Kernel Panic Error
« Reply #11 on: November 04, 2010, 10:07:52 PM »
bandito wrote: "Unfortunately, I was backing up to a USB drive that was plugged into the same UPS that garfed up the server. The USB drive is toast."

Then I suspect something evil happened to the internal hard drives as well.