Koozali.org: home of the SME Server

Another Raid question. System will not reboot.

tviles
« on: November 01, 2008, 02:55:32 AM »
Dell PowerEdge 2500 with six SCSI drives, so SME is running RAID 5 with a spare. I noticed after the last two SME updates that the server will not reboot after installing the updates. Does anything jump out at anyone in the info below? Thanks, Tracy. Oh, and if I push the power button a few times, SME will start to load up again.
I will try to catch the error message the next time updates come in and it is time for a reboot.


[root@XXXXXXXXX ~]# fdisk -l | more
Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/dm-0 doesn't contain a valid partition table
Disk /dev/dm-1 doesn't contain a valid partition table

Disk /dev/sda: 36.4 GB, 36420075008 bytes
255 heads, 63 sectors/track, 4427 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        4427    35455455   fd  Linux raid autodetect

Disk /dev/sdb: 36.4 GB, 36420075008 bytes
255 heads, 63 sectors/track, 4427 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        4427    35455455   fd  Linux raid autodetect

Disk /dev/sdc: 36.4 GB, 36420075008 bytes
255 heads, 63 sectors/track, 4427 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdc2              14        4427    35455455   fd  Linux raid autodetect

Disk /dev/sdd: 36.4 GB, 36420075008 bytes
255 heads, 63 sectors/track, 4427 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdd2              14        4427    35455455   fd  Linux raid autodetect

Disk /dev/sde: 73.5 GB, 73543163904 bytes
255 heads, 63 sectors/track, 8941 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *           1          13      104391   fd  Linux raid autodetect
/dev/sde2              14        8941    71714160   fd  Linux raid autodetect

Disk /dev/sdf: 73.5 GB, 73543163904 bytes
255 heads, 63 sectors/track, 8941 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdf2              14        8941    71714160   fd  Linux raid autodetect

Disk /dev/md1: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes


Disk /dev/md2: 145.2 GB, 145224630272 bytes
2 heads, 4 sectors/track, 35455232 cylinders
Units = cylinders of 8 * 512 = 4096 bytes


Disk /dev/dm-0: 143.0 GB, 143076098048 bytes
2 heads, 4 sectors/track, 34930688 cylinders
Units = cylinders of 8 * 512 = 4096 bytes


Disk /dev/dm-1: 2080 MB, 2080374784 bytes
2 heads, 4 sectors/track, 507904 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
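The md sizes in this listing can be cross-checked against the partition sizes. A RAID 5 array of n members holds about (n - 1) x its smallest member, because one member's worth of space goes to parity, while RAID 1 holds a single copy of its smallest member. The Python sketch below assumes md's version 0.90 superblock (each member rounded down to 64 KiB, with the last 64 KiB reserved), a further round-down to the 256 KiB chunk for RAID 5, and the array membership shown by /proc/mdstat further down the thread; the function names are illustrative only.

```python
# Assumed md metadata: superblock format 0.90, which reserves the last 64 KiB
# of each member after rounding the member down to a 64 KiB boundary.
RESERVED_KIB = 64


def member_usable_kib(partition_kib, chunk_kib=None):
    """Usable size of one member in 1 KiB blocks (fdisk's "Blocks" unit)."""
    usable = (partition_kib // RESERVED_KIB) * RESERVED_KIB - RESERVED_KIB
    if chunk_kib:  # striped levels such as RAID 5 use whole chunks only
        usable = (usable // chunk_kib) * chunk_kib
    return usable


def raid5_kib(partition_sizes, chunk_kib):
    """RAID 5: (n - 1) x the smallest member; one member's worth is parity."""
    smallest = min(member_usable_kib(s, chunk_kib) for s in partition_sizes)
    return (len(partition_sizes) - 1) * smallest


def raid1_kib(partition_sizes):
    """RAID 1: every member holds a full copy, so capacity = smallest member."""
    return min(member_usable_kib(s) for s in partition_sizes)


# "Blocks" column from fdisk -l; membership taken from /proc/mdstat.
md2_parts = [35455455] * 4 + [71714160]   # sda2, sdb2, sdc2, sdd2, sde2
md1_parts = [104391] * 6                  # sda1 .. sdf1

md2 = raid5_kib(md2_parts, chunk_kib=256)
md1 = raid1_kib(md1_parts)
print(md2, md2 * 1024)   # 141820928 145224630272
print(md1, md1 * 1024)   # 104320 106823680
```

Under these assumptions the results match the md2 and md1 byte counts fdisk reports. They also show that sde2, although twice the size of the other md2 members, contributes only as much space to md2 as the smaller members do.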


CharlieBrady
« Reply #1 on: November 01, 2008, 03:27:24 AM »
Quote
Dell PowerEdge 2500 with six SCSI drives, so SME is running RAID 5 with a spare. I noticed after the last two SME updates that the server will not reboot after installing the updates.

What do you mean "will not reboot"? What do you see when you try to reboot? How are you trying to reboot? How/when did you generate the "fdisk -l" information? What does "cat /proc/mdstat" say?

tviles
« Reply #2 on: November 01, 2008, 11:27:00 AM »
After updates for the SME 7.3 server come in, it says to select Reconfigure to finish the updates. That is when I see it go down, and when it tries to boot back up I get a boot failure message. I need to capture that message for you after the next updates come in; the server is in a busy location that runs 24x7. How did I generate the fdisk output? I was connected over VPN from home using PuTTY, logged in as root. Is that what you are asking? When? Just last evening, Oct. 31. If I get some time this weekend I will go over there, try a reboot, and see if I can get that message. I will try a reboot from the SME server panel.

[root@XXXXXXXX ~]# cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      141820928 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      104320 blocks [5/5] [UUUUU]

unused devices: <none>
[root@XXXXXXXXX ~]#
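Reading /proc/mdstat by eye is easy to get wrong, so here is a small Python sketch that parses output in the layout shown above, lists each array's members, marks an array degraded when the [n/m] counts differ or the status string contains an underscore, and compares the members with the Linux raid autodetect partitions from the earlier fdisk -l listing. The parsing rules are assumptions based on this one sample, not a complete mdstat grammar, and the names are illustrative.

```python
import re

# /proc/mdstat output copied from the post above.
MDSTAT = """\
Personalities : [raid1] [raid5]
md2 : active raid5 sda2[0] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      141820928 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      104320 blocks [5/5] [UUUUU]

unused devices: <none>
"""

# Assumed layout: "mdN : state level member[slot] ..." followed by a status line.
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# mdstat may append (S) for a spare or (F) for a failed member; this sample has neither.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\([SF]\))?")
# "[raid disks/working disks] [U/_ map]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {"level", "members": {device: flag}, "degraded"}}."""
    arrays, current = {}, None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            name, _state, level, rest = m.groups()
            current = arrays[name] = {
                "level": level,
                "members": {dev: flag for dev, _slot, flag in MEMBER_RE.findall(rest)},
                "degraded": False,
            }
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            total, working, updown = s.groups()
            current["degraded"] = int(working) < int(total) or "_" in updown
    return arrays


# Partitions marked "fd Linux raid autodetect" in the fdisk -l listing.
RAID_PARTITIONS = [f"sd{disk}{num}" for disk in "abcdef" for num in (1, 2)]

arrays = parse_mdstat(MDSTAT)
in_use = {dev for a in arrays.values() for dev in a["members"]}
not_in_any_array = [p for p in RAID_PARTITIONS if p not in in_use]

for name, a in sorted(arrays.items()):
    state = "degraded" if a["degraded"] else "clean"
    print(name, a["level"], sorted(a["members"]), state)
print("not in any array:", not_in_any_array)   # → not in any array: ['sdf2']
```

Run against this output, it reports both arrays clean and shows that sdf2 is not a member of md2, which fits Tracy's later guess that the failing drive was the spare.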

Reinhold
« Reply #3 on: November 01, 2008, 01:13:20 PM »
tviles,

Quote
I noticed after the last two SME updates that the server will not reboot after installing the updates.

i) Your filesystem seems perfect...
ii) Your RAID system is fully operational!

Are you saying "the system hangs at reboot"? ...unlikely, since you obviously got that mdstat output...
Are you saying "the system did install new software ... but no reboot was needed"?
Are you saying "signal-event reboot" ... does not shut down and reboot your server?

...in any case, if you type "reboot" at the command line in a root console session, your SME will shut down, no doubt.
(I can't say whether it reboots ... but from the data you show that is at least likely :P)

...whether that makes sense in your present situation or not is another matter...
(Good old times department: I had an SME 6.x (?) with an uptime of 'years', applied all updates, and there was no need for a single reboot... :grin:)

Regards
Reinhold


Reinhold
« Reply #4 on: November 01, 2008, 01:30:58 PM »
Still puzzled, I only now noticed this:

Quote
... Thanks, Tracy. Oh, and if I push the power button a few times, SME will start to load up again.

Pushing power buttons is generally a rather bad thing on a Linux system... even more so on a server... (*)
...most likely you will end up having to run an fsck or two...

Note: Dell gave you a nice CD with your PowerEdge that you can boot from to "debug" your hardware.

Regards
Reinhold

(*) Luckily, SME converts your pushy action into a useful command (an orderly shutdown), unless you press and hold that button on your PowerEdge long enough to cut the power, which your filesystem and even your SCSI drives will not like at all (they will whine and shriek their protest at you).
« Last Edit: November 01, 2008, 01:33:16 PM by Reinhold »

tviles
« Reply #5 on: November 01, 2008, 02:57:32 PM »
OK, now I am curious, and I really appreciate the help. I have to run over there this morning anyway, so I will reboot it and post again here in a couple of hours. Yes, SME can start the reboot and shut things down properly; it is after the server shows the normal opening screen on reboot that I get this message. I will go look.

tviles
« Reply #6 on: November 01, 2008, 03:46:02 PM »
OK, I'm sorry, this appears to be a hardware issue. The error message comes just after the Adaptec BIOS screen appears showing all the drives and the BP, and it says "No boot device available - strike F1 to retry boot, F2 for setup utility".

Then, if I power off and on again, I get a lot of these messages once it boots past the first error message above:
Current sdf: sense key Medium Error
Additional sense: Address mark not found for data field
Additional sense: retries exhausted
end_request: I/O error, dev sdf, sector 143636973
Buffer I/O error on device sdf2, logical block

I see one drive, what I would call drive 5, that always has both green lights on. The rest of the drives show just one green light during bootup.
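The sector number in the end_request message can be mapped back to a partition using the geometry fdisk printed for /dev/sdf (255 heads, 63 sectors/track, 16065 sectors per cylinder). The Python sketch below assumes the classic DOS layout, where the first partition begins one track (63 sectors) into the disk and later partitions begin on cylinder boundaries; it reproduces fdisk's block counts for sdf1 and sdf2, which supports that assumption. The 0.90 superblock check at the end is likewise an assumption about the md metadata format, and all names are hypothetical.

```python
# Geometry printed by fdisk -l for /dev/sdf.
HEADS, SECTORS_PER_TRACK = 255, 63
SECTORS_PER_CYL = HEADS * SECTORS_PER_TRACK  # 16065, fdisk's "cylinders of 16065 * 512"

# (partition, start cylinder, end cylinder) from the sdf table.
SDF_PARTITIONS = [("sdf1", 1, 13), ("sdf2", 14, 8941)]


def partition_range(start_cyl, end_cyl):
    """First and last absolute 512-byte sector of a partition (assumed DOS layout)."""
    # The first partition starts one track in (sector 63); later ones on a cylinder boundary.
    first = SECTORS_PER_TRACK if start_cyl == 1 else (start_cyl - 1) * SECTORS_PER_CYL
    last = end_cyl * SECTORS_PER_CYL - 1
    return first, last


def locate(sector):
    """Return (partition, offset into it, sectors left before its end), or None."""
    for name, start_cyl, end_cyl in SDF_PARTITIONS:
        first, last = partition_range(start_cyl, end_cyl)
        if first <= sector <= last:
            return name, sector - first, last - sector
    return None


def md090_reserved_start(size_sectors):
    """Offset of the 64 KiB (128-sector) area an md 0.90 superblock would use."""
    return (size_sectors // 128) * 128 - 128


# Sanity check: these ranges reproduce fdisk's "Blocks" (1 KiB = 2 sectors).
for name, start_cyl, end_cyl in SDF_PARTITIONS:
    first, last = partition_range(start_cyl, end_cyl)
    print(name, (last - first + 1) // 2)   # sdf1 104391, then sdf2 71714160

name, offset, to_end = locate(143636973)   # sector from the end_request line
print(name, offset, to_end)                # sdf2 143428128 191

first, last = partition_range(14, 8941)
reserved = md090_reserved_start(last - first + 1)
print(reserved <= offset < reserved + 128)  # True
```

For sector 143636973 it gives sdf2, agreeing with the Buffer I/O error line, and places the bad spot 191 sectors before the end of the partition, inside the 64 KiB region a 0.90 superblock would reserve.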
 

Reinhold
« Reply #7 on: November 01, 2008, 06:19:13 PM »
Quote
...end_request: I/O error, dev sdf, sector 143636973
Buffer I/O error on device sdf2, logical block

I see one drive, what I would call drive 5, that always has both green lights on. The rest of the drives show just one green light during bootup.
 

tviles,

sdf, drive 5, two LEDs on...
My bet is on "interface of drive 5 stuck, drive not responding". It is the highest/hottest one, on top(?), and went bad due to thermal problems...
Switch the boot order in the BIOS ... boot that Dell hardware CD to confirm (it's on IDE, IIRC).

Do not panic ... you are in degraded mode, but the data is still there... DO A BACKUP ... go buy and install a new drive ASAP.


Regards
Reinhold

P.S.: If your system supports hddtemp ... you should check temperatures once the system is operational again.
P.P.S.: ... do NOT TURN YOUR SYSTEM ON/OFF ON/OFF until the defective drive starts spinning again. You might lose everything that's not on a backup! :shock:

tviles
« Reply #8 on: November 01, 2008, 06:45:57 PM »
Quote
Do not panic ... you are in degraded mode, but the data is still there... DO A BACKUP ... go buy and install a new drive ASAP.

I live in a state of panic; it's just how I roll. OK, I thought so but wasn't sure; that drive is also making a knocking noise. Yes, I have two backups, and I have an extra drive in the desk drawer waiting for this. So I go into the server manager and do a shutdown, and when it's down, take out the bad drive, replace it with the new drive, and then turn on the server? Does SME rebuild the new drive, or does it involve more than that? Thanks for your help.

Quote
Switch the boot order in the BIOS ... boot that Dell hardware CD to confirm (it's on IDE, IIRC).

I did switch the boot order in the BIOS and the server will now reboot. But of course I still see all the error messages once SME takes over and starts its boot process, as mentioned above.



Quote
P.S.: If your system supports hddtemp ... you should check temperatures once the system is operational again.

I'm not sure whether the Dell server does this; I don't think I have ever seen anything in the BIOS screens for it.


Quote
P.P.S.: ... do NOT TURN YOUR SYSTEM ON/OFF ON/OFF until the defective drive starts spinning again. You might lose everything that's not on a backup! :shock:

10-4

I don't have that Dell CD. I bought this server from a guy in a parking lot via Craigslist. Then we hit the end-of-year finance shutdown at work and I had to put this server into action. I bet I can download that CD from Dell; I will look.

I can head back over there Sunday morning, after I hear from you whether SME rebuilds the drive or not.

tviles
Update and results Re: Another Raid question. System will not reboot.
« Reply #9 on: November 07, 2008, 03:18:19 AM »
For newbies like me, I thought I would post the results. I was running RAID 5 with a spare; the errors above appeared while SME was booting up. If you watch a six-drive SCSI system like this, you will see the five drives in the RAID 5 blink at the same time. The spare does not. Were the results of the commands I was given above showing the RAID status? I'm not sure. Tonight I replaced the 73 GB drive with another 73 GB drive and SME fired right up: no problems, no sync wait. So I am guessing it was the spare that was failing.

Now on to the next project: I installed a Promise PCI SATA card and a 1 TB SATA drive in the server. I have a lot of reading to do on that one yet. The Dell BIOS screen showed the SATA card on boot, but that is as far as I got tonight. Thanks again for the help, people, I appreciate it. Tracy