Koozali.org: home of the SME Server

Obsolete Releases => SME Server 7.x => Topic started by: jekal on September 04, 2011, 10:47:56 AM

Title: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 04, 2011, 10:47:56 AM: Hi,

I have a faulty disk in my RAID1 array. So far, no problem.
Code:
[root@server ~]# mdadm -a
mdadm: an md device must be given in this mode
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0]
      732467520 blocks [2/1] [U_]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>
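A note on reading the /proc/mdstat output above: `[2/1]` means the array is defined with 2 devices but only 1 is active, and `[U_]` shows one member up (`U`) and one missing (`_`). Here is a minimal sketch in Python of how that check could be automated. The function name `degraded_arrays` and the `SAMPLE` text are illustrative assumptions, not part of SME Server's own monitoring.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for md arrays with missing members.

    Hypothetical helper: it only looks at the "[n/m]" counter that the
    md driver prints on the line following each "mdX :" header.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded

# Sample text copied from the post above.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sda2[0]
      732467520 blocks [2/1] [U_]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, active) in degraded_arrays(SAMPLE).items():
        print(f"{name}: {active} of {expected} members active")
```

On a live system the text would come from reading /proc/mdstat rather than from a hard-coded sample.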
I identified the faulty disk and removed it, but the server no longer boots. There is no boot partition.

So there are 2 questions:
1: I thought everything is mirrored in a RAID1. Why not the boot information?
2: How do I get the boot information copied to the working disk?

SME Version 7.5.1
Jens
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: chris burnat on September 04, 2011, 11:29:17 AM: Are you sure you did not remove the good disk? Is the server rebooting with the faulty disk connected exactly as it was before you removed it?
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 04, 2011, 12:01:35 PM: yes, I am pretty sure :-P

This is the bad disk:
Code:
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD753LJ
Serial Number:    S13UJ1KQ318196
Firmware Version: 1AA01109
User Capacity:    750,156,374,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Not recognized. Minor revision code: 0x52
Local Time is:    Sun Sep 4 11:58:38 2011 CEST

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Offline             Completed: read failure  00%        22826            1465143105
# 2  Short offline       Aborted by host          00%        13702            -
# 3  Short offline       Aborted by host          00%        5634             -
# 4  Offline             Aborted by host          00%        2091             -
# 5  Extended offline    Aborted by host          00%        608              -
this is the good one (replaced last year):
Code:
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD753LJ
Serial Number:    S13UJDWZ502170
Firmware Version: 1AA01118
User Capacity:    750,156,374,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Not recognized. Minor revision code: 0x52
Local Time is:    Sun Sep 4 12:00:18 2011 CEST

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Offline             Completed without error  00%        8290             -
# 2  Extended offline    Completed without error  00%        1687             -
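The difference between the two self-test logs above is the `Completed: read failure` entry with a recorded LBA on the first disk. Below is a rough sketch in Python of how such entries could be picked out of a self-test log. The helper name `failed_self_tests` and the `BAD_LOG` / `GOOD_LOG` constants are hypothetical, and the sketch assumes the last two whitespace-separated fields of each log line are the lifetime hours and the LBA, as in the output shown in this thread.

```python
def failed_self_tests(selftest_log):
    """Return a list of {"lifetime_hours", "lba"} for entries reporting a read failure.

    Hypothetical helper: assumes each log entry starts with "#" and ends
    with the LifeTime(hours) and LBA_of_first_error columns.
    """
    failures = []
    for line in selftest_log.splitlines():
        line = line.strip()
        if not line.startswith("#") or "read failure" not in line:
            continue
        fields = line.split()
        hours, lba = fields[-2], fields[-1]
        if hours.isdigit() and lba.isdigit():
            failures.append({"lifetime_hours": int(hours), "lba": int(lba)})
    return failures

# Log entries copied from the two smartctl outputs above.
BAD_LOG = """\
# 1  Offline             Completed: read failure  00%  22826  1465143105
# 2  Short offline       Aborted by host          00%  13702  -
# 3  Short offline       Aborted by host          00%  5634   -
"""

GOOD_LOG = """\
# 1  Offline             Completed without error  00%  8290  -
# 2  Extended offline    Completed without error  00%  1687  -
"""

if __name__ == "__main__":
    print("bad disk:", failed_self_tests(BAD_LOG))
    print("good disk:", failed_self_tests(GOOD_LOG))
```

Comparing the serial number reported by smartctl with the label on the physical drive is still the safest way to be sure which disk gets pulled.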
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 04, 2011, 12:11:40 PM: When I do a reboot the nvidia raid bios reports a degraded array.
When the bad disk is removed it reports no raid anymore.

My new disk is a Seagate 1TB, which should replace the bad disk once I get this reboot thing resolved :D
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: chris burnat on September 04, 2011, 01:09:15 PM: Quote from: jekal on September 04, 2011, 12:11:40 PM
When I do a reboot the nvidia raid bios reports a degraded array.

Are you mirroring using SME native raid or using proprietary software?
i.e. ftp://ftp.tyan.com/manuals/m_NVRAID_Users_Guide_v20.pdf
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 04, 2011, 01:13:37 PM: Hi Chris,

I had added the raid in the bios of the motherboard. Originally I thought this should be enough for mirroring. But I had to use the unix utility as well (mdadm).

I replaced the other disk last year without any problems. I just had to add the disk with mdadm --add.
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: CharlieBrady on September 04, 2011, 03:41:58 PM: Quote from: jekal on September 04, 2011, 01:13:37 PM
I had added the raid in the bios of the motherboard.

In that case you are not using SME server's built-in RAID1. You are using "fake-raid", which is not recommended.

You will need to use the BIOS to add the new drive to the mirror. Maybe your system will boot once you have done that.

SME server's raid monitoring will not be able to tell you about failed devices in a BIOS fakeraid configuration.
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: johnp on September 04, 2011, 03:45:57 PM: Using the motherboard's BIOS raid and also forcing some type of raid via commands is likely not a good thing. If the motherboard had been handling it, sme would only have seen one disk, as it should.

I think your best option would have been to just use the software raid in sme. Case in point: one of my servers was running on a nice hardware raid card with only 2 disks. When it came time to add more drive space, the tools weren't available to grow the disk. I did manage a workaround, but it would have been much easier if I had originally installed it using the built-in software raid.
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: chris burnat on September 04, 2011, 04:18:02 PM: I have no experience with the type of fake raid you are using... However, having read a little of the document I quoted, I suggest you check what sort of raid you have selected: it may be that you have set up a raid 0, not a raid 1. If this is the case, there is no disk redundancy, just faster access and improved bandwidth, which may explain why the "good" disk does not boot on its own. Hope this is not the case...
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 04, 2011, 05:40:10 PM: Well, at least I definitely have a RAID1, because I had a faulty disk replaced last year and still have all data available.

I have made a backup with the SME Admin console. Does this back up all my settings and data, including my second domain (which I need for DynDNS)?

If I remove the array from the BIOS I fear I have to re-install SME. Correct?

Jens
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: CharlieBrady on September 05, 2011, 01:48:47 AM: Quote from: jekal on September 04, 2011, 05:40:10 PM
I have made a backup with the SME Admin console. Does this backup all my settings and data?

Yes, that's exactly what it does.

Quote
If I remove the array from the BIOS I fear I have to re-install SME. Correct?

I believe so.
Title: Re: RAID1 -- no reboot after removal of faulty disk
Post by: jekal on September 12, 2011, 07:29:01 AM: Done.
I checked my system and discovered that the RAID controller is a simple one which needs a software driver. So I disabled the BIOS RAID, removed the faulty disk (which was the only one that booted), started a new install and restored my backup when the installation asked for it. This was yesterday evening at 21:00h.
I can't say I slept very well, but this morning the restore was finished and the system is running :) :) :)

Good to know that this really works.

Jens