Koozali.org: home of the SME Server

Hardware failure SME server not accessible anymore

Offline twijtzes

« on: October 22, 2010, 02:50:06 PM »
Dear All,

We had a serious server crash. Apparently everything came together. DAR failed to complete its backups for two consecutive weeks, yet it overwrote all previous backups (stupid me :-( I should have checked).

One of the disks in the RAID array (HP 5i) died (72 GB SCSI). I replaced it and let the RAID controller rebuild it.

I cannot access the SME server anymore. Does anyone have a trick or a tip for rescuing the information in the ibays? I saw a rescue option on the installation CD. That has been running for more than a week now. However, I am still looking at the cciss driver loading message, and I see the lights of the SCSI disks flashing. Is this supposed to take so long? There was about 25 GB of data on the server.


Please please please help me
Taco

ps. A good bottle of wine/beer/whiskey/fruit juice will be sent to you if you are able to help me (I promise!)
« Last Edit: October 23, 2010, 07:04:11 AM by twijtzes »

Offline Stefano

« Reply #1 on: October 22, 2010, 04:37:42 PM »
twijtzes:

does the server have a monitor? :-)

if so, please sit in front of it and tell us what you see
is there any error message?

Offline twijtzes

« Reply #2 on: October 22, 2010, 06:15:07 PM »
It does,

there are no error messages, I see a blue screen

on the top left SME server 7.5

In the center of the screen
Loading SCSI driver
loading cciss driver

at the bottom press space bar
F12


and a lot of disk activity

Offline Stefano

« Reply #3 on: October 22, 2010, 06:26:14 PM »
ok.. can you tell me which HP server you are using? did you check whether your hw is certified for Red Hat 4.x?

Offline twijtzes

« Reply #4 on: October 22, 2010, 08:05:14 PM »
It has been running SME Server 7.5 flawlessly for some time now, so I guess it is supported by Red Hat. It is an HP DL380 G3.

Offline Stefano

« Reply #5 on: October 22, 2010, 08:33:07 PM »
Quote
It has been running SME Server 7.5 flawlessly for some time now, so I guess it is supported by Red Hat. It is an HP DL380 G3.

then open a bug in Bugzilla..
I suspect a hw failure

Offline twijtzes

« Reply #6 on: October 22, 2010, 08:38:14 PM »
In what way is it a bug then?

Can I access the system through a CD-like install to see what's left of the information on the drives? Could a reinstall work without modifying the data, or a repair maybe? What are my options?
« Last Edit: October 22, 2010, 08:41:57 PM by twijtzes »

Offline Stefano

« Reply #7 on: October 22, 2010, 09:15:07 PM »
Quote
In what way is it a bug then?

it could be a bug or a hw failure

Quote
Can I access the system through a CD-like install to see what's left of the information on the drives? Could a reinstall work without modifying the data, or a repair maybe? What are my options?

try booting from the CD with
Code:
sme rescue

at the boot prompt..

I would check the RAID status from the BIOS too

Offline twijtzes

« Reply #8 on: October 22, 2010, 09:31:27 PM »
I am a little bit afraid that if I stop what the system is doing now, the whole system may get corrupted.

Before I started the current rescue, the RAID status of all disks was OK.

What do you think: could the current state be a rescue action by the software itself, or is it hanging?

I guess it is running the rescue option; it's been a week, you know... Is this holdup normal during a rescue?

Taco

Offline Stefano

« Reply #9 on: October 22, 2010, 09:34:19 PM »
Quote
I guess it is running the rescue option; it's been a week, you know... Is this holdup normal during a rescue?

booting should take 1 or 2 minutes, not ages :-)

reboot, check the RAID status, boot from the CD in rescue mode, and let us know (try to boot from the SME 8 CD too)

Offline twijtzes

« Reply #10 on: October 22, 2010, 09:48:38 PM »
I'll try tomorrow morning;

thanks so far

Is version 8 a good idea?

I'll download it right now

Offline Stefano

« Reply #11 on: October 22, 2010, 09:53:17 PM »
Quote
Is version 8 a good idea?

it is based on CentOS 5.x, so it's newer, maybe with better hw support
let us know

Offline twijtzes

« Reply #12 on: October 22, 2010, 10:25:15 PM »
The offer still stands; tomorrow I'll give it a go.

Taco

Offline janet

« Reply #13 on: October 23, 2010, 05:32:17 AM »
twijtzes

A golden rule of data recovery is to stop using the disks that the data is on.
As you are running a RAID rebuild and other rescue/repair options that are unknown to us, who knows what is now happening with your disks and the data on them, if indeed there is any data still on them.

You should shut down the machine, remove the drives, rebuild a new machine using different drives or build another test server using a single standard drive, and then mount the old drives to see if there is any data on them.

You can do a similar thing using the SME Server install CD: boot from it in rescue mode and see what is on the drives that way, but you need to take great care that you do not inadvertently delete any remaining data.
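
One way to reduce the risk of touching the old data is to copy it off to a separate disk and only ever read from the original. The sketch below is illustrative only and assumes the old filesystem is already mounted (ideally read-only) on a working machine with Python 3; the function names `sha256_of` and `copy_tree_verified` are made up for this example and are not part of SME Server.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(source, target):
    """Copy every regular file under source into target.

    The source is only opened for reading. Files that cannot be read, or
    whose copy does not match the original checksum, are collected and
    returned instead of aborting the whole run.
    """
    source, target = Path(source), Path(target)
    failed = []
    for src in sorted(source.rglob("*")):
        # Skip directories, symlinks and special files.
        if src.is_symlink() or not src.is_file():
            continue
        dst = target / src.relative_to(source)
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            if sha256_of(src) != sha256_of(dst):
                failed.append(src)
        except OSError:
            failed.append(src)
    return failed
```

Calling `copy_tree_verified(old_mount, rescue_disk)` with two placeholder paths returns the list of files that could not be copied cleanly, so damaged files can be looked at separately once everything readable is safe on another disk.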

All the above is "critical" as you do not have a current backup, tch tch. You should always have multiple backup disks (rotated from day to day) so that you have a backup of the backup, so to speak.
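
To illustrate the "backup of the backup" idea: a new backup should never replace an older one until the new one is known to be complete. The following is a minimal sketch, not how DAR or SME Server handle rotation. It assumes the backup tool writes to a staging file first and that file names sort chronologically (for example by date); `promote_backup`, `staging` and `archive_dir` are hypothetical names, and the non-empty check merely stands in for a real integrity test.

```python
import shutil
from pathlib import Path


def promote_backup(staging, archive_dir, keep=7):
    """Move a finished backup into the archive, then prune old generations.

    Raises ValueError and leaves the archive untouched when the staged
    backup is missing or empty, so existing backups are never lost to a
    failed run.
    """
    staging, archive_dir = Path(staging), Path(archive_dir)
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not staging.is_file() or staging.stat().st_size == 0:
        raise ValueError(f"{staging} is missing or empty; old backups kept")
    archive_dir.mkdir(parents=True, exist_ok=True)
    final = archive_dir / staging.name
    shutil.move(str(staging), str(final))
    # Assumption: names sort oldest to newest, e.g. backup-2010-10-22.dar.
    generations = sorted(archive_dir.iterdir(), key=lambda p: p.name)
    for old in generations[:-keep]:
        old.unlink()
    return final
```

With `keep=7` this retains a week of daily backups; an interrupted or empty backup run raises an error instead of pushing out the last good copy.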

Offline twijtzes

« Reply #14 on: October 23, 2010, 06:34:09 AM »
Hi Mary,

You are very right. I have no clue when it comes to servers; I am merely a user. I found out that SME Server did the trick for us and went for it. I had DAR running and ran all updates. However, each time DAR made its daily backup (rotating over a one-week period), it overwrote the backup from the week before. The system rebooted during the backup job, so the backups were never finished. I should, of course, have checked the backups on a daily basis; however, as it had been making backups for more than three years, I thought there was no reason to do so. I was wrong.

I haven't done anything except put an SME install disk in the CD drive and choose rescue.

Can you please explain how I could have deleted remaining data by using the CD, as that is what I have done...

Kindest regards,
Taco
« Last Edit: October 23, 2010, 07:11:39 AM by twijtzes »