Koozali.org: home of the SME Server

Raid (SOLVED)

Offline jumba

« on: January 16, 2008, 03:39:59 PM »
...so I just received the message:

Code: [Select]
A DegradedArray event has been detected on md device /dev/md2.
Here is the situation:

Code: [Select]
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      244035264 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>
[root@server ~]# mdadm --detail --verbose /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Tue Dec 18 15:33:31 2007
     Raid Level : raid1
     Array Size : 244035264 (232.73 GiB 249.89 GB)
    Device Size : 244035264 (232.73 GiB 249.89 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Jan 16 15:34:50 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : e9b25f43:328a09e4:d0e0305e:94051d6a
         Events : 0.862266

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
[root@server ~]# mdadm --detail --verbose /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Tue Dec 18 15:33:31 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jan 16 15:10:09 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 8b3124ee:f2c663f6:5a197366:46d9e75b
         Events : 0.1030

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
[root@server ~]#

Yes, I have searched the forum, but since there are real "experts" around here, what would be the solution in this case?

Many thanks in advance :lol:

Edit:

My own guess how to solve the problem:

Code: [Select]
mdadm /dev/md2 -a /dev/sda2
Could somebody please confirm that I'm on the right track?

Edit2:

OK, I was in a hurry, so I read http://wiki.contribs.org/Raid one more time and decided to go with my theory:

Code: [Select]
[root@server ~]# mdadm /dev/md2 -a /dev/sda2
mdadm: hot added /dev/sda2
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[2] sdb2[1]
      244035264 blocks [2/1] [_U]
      [>....................]  recovery =  0.1% (288448/244035264) finish=84.4min speed=48074K/sec
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>

It seems I was on the right track, but I'll have to keep a close eye on that drive :-?
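
For anyone who wants to check for this without waiting for the email, here is a minimal sketch. It assumes the /proc/mdstat layout shown above, where a status field such as [_U] marks a missing member with an underscore. The function name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# List md arrays whose status field (e.g. [_U]) contains an underscore,
# meaning at least one member is missing.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { array = $1 }    # remember the array this block describes
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print array }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat. Against the first listing in this post it would print md2, since md1 shows [UU].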
« Last Edit: January 16, 2008, 04:31:03 PM by jumba »

Offline raem

« Reply #1 on: January 16, 2008, 09:28:07 PM »
jumba

You should run a disk check on both drives asap as one may fail soon.
...

Offline jumba

« Reply #2 on: January 16, 2008, 10:45:14 PM »
Quote
You should run a disk check on both drives asap as one may fail soon.

Thanks, I will. The strange thing is that this is an almost brand new server (DELL), - only 1 month old :???:

Offline calisun

« Reply #3 on: September 20, 2008, 05:59:37 AM »
I have the exact same message:
A DegradedArray event has been detected on md device /dev/md2.

And both of my drives are practically new, only 3 months old, on a server that does not get a lot of usage.

Is it just a coincidence that we both have new drives and in both cases the same drive has issues?
Or could it be a case of the drives not dying but something being wrong with software RAID?

And what about the code that jumba posted?
 mdadm /dev/md2 -a /dev/sda2
SME user and community member since 2005.
Want to install Wordpress in iBay of SME Server?
See my step-by-step How-To wiki here:
http://wiki.contribs.org/Wordpress_Multisite

Offline jumba

« Reply #4 on: September 20, 2008, 09:41:32 AM »
Hi again. I've had no further problems with that server or the drives since then....

Offline calisun

« Reply #5 on: September 22, 2008, 05:27:53 PM »
What was your solution? Run the code you posted? Replaced drive??

Also what code did you use to test the drive? 

Offline jumba

« Reply #6 on: September 22, 2008, 10:21:22 PM »
Quote
What was your solution? Run the code you posted? Replaced drive??

Also what code did you use to test the drive?

Actually, I did nothing more than I wrote here.

I ran the code posted, and kept reading the logs more carefully for a couple of weeks afterwards.

The disks are still operating well in that server, with no signs of failure...

Not much help for you, but that's still the truth :?

//Jumba

Offline calisun

« Reply #7 on: December 21, 2008, 09:56:31 PM »
OK, I did:
Code: mdadm /dev/md2 -a /dev/sda2

Soon after I did that, admin got an email:
RebuildStarted event on /dev/md2

A couple of minutes after that I got an email:
RebuildFinished event on /dev/md2

and a couple of minutes after that I got another email:
FailSpare event on /dev/md2

it said:
This is an automatically generated mail message from mdadm running on
A FailSpare event has been detected on md device /dev/md2.
Device /dev/sda2 is now an active member of md device /dev/md2.

So if I am reading this right, one of the drives failed and the one that failed is sda1. Am I reading this right?
 

Offline calisun

« Reply #8 on: December 21, 2008, 10:10:08 PM »
Yup, after pulling more info, one of the drives is dead. I just want to make sure that it is sda1, I don't want to replace the wrong drive. Also, after I replace the drive, do I need to do anything, or will RAID sync the two drives by itself?


[root@x ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>

[root@x ~]# mdadm --detail --verbose /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Feb  2 10:21:21 2008
     Raid Level : raid1
     Array Size : 732467520 (698.54 GiB 750.05 GB)
    Device Size : 732467520 (698.54 GiB 750.05 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Dec 21 13:01:06 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : 738d1166:ec014667:f7dcc221:126c798f
         Events : 0.6595410

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2

       2       8        2        -      faulty   /dev/sda2
[root@x ~]#

Offline CharlieBrady

« Reply #9 on: December 21, 2008, 11:04:22 PM »
Quote
Yup, after pulling more info, one of the drives is dead. I just want to make sure that it is sda1, I don't want to replace the wrong drive.

sda1 is not a drive. sda1 is a partition. It is on drive sda.



Offline electroman00

« Reply #10 on: December 21, 2008, 11:43:33 PM »
Unplug sda (the suspected bad drive) and set the BIOS to boot from sdb. If it boots, then it's a good drive.
Don't make any changes to the server, just test things out and shutdown.

Then test the other drive the same way.

Plug one drive in at a time and set the bios to boot to that drive.

You want to be sure you have the bad drive.

sda might boot and appear OK when it is by itself; that doesn't mean it is.

Most important, mark good / bad on each drive so you don't lose track of what you're doing, and before you start any destructive command (e.g. a format), wait 15 sec. and rethink all your steps.

Sure does look like sda md2 is the bad boy.

md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]

Check sata cable or scsi jumpers and mark it sda with a felt pen.

Doesn't mean the drive is bad, just give it a mfg diag format.

Once you know which drive for sure, then remove the good drive and mfg format the bad drive, MAKE SURE the good drive is unplugged.
Then there's no chance of formatting the wrong one.

You can also run mfg diag on it to test it.

You can put it back into sda and keep the bios set to boot to sdb.

Then server admin #5 manage raid.

After about 30-60 sec. it will start a rebuild. Press Alt-F2 and log in to watch the rebuild:

watch -n .1 cat /proc/mdstat
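
If you only want the percentage rather than the whole screen, something like the following sketch would do. It assumes the "recovery = N%" line format that jumba's listing earlier in this thread shows; the function name rebuild_progress is hypothetical, and it ignores other states such as resync.

```shell
#!/bin/sh
# Print "<array> <percent>" for each array that is currently rebuilding.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }    # remember the array this block describes
        /recovery =/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print array, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

It prints nothing when no array is rebuilding, so an empty result after a rebuild was running suggests the rebuild has ended one way or another; check /proc/mdstat itself to see whether it succeeded.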

oops EDIT:If you get this far then reboot after the rebuild and make sure you set the bios to boot sda.....not sdb.

=======================================

You could also do a manual restore of the sda md2 partition from sdb, but if you make a mistake, you're screwed, like forever.
Like if you restore (bad) sda md2 to (good) sdb md2, you're screwed.

Plus you won't be able to re-format the entire drive and test it with mfg diag, the preferred method.

In short.... it's worth a try, might just be a corrupt sda md2

hth
 
« Last Edit: December 21, 2008, 11:48:58 PM by electroman00 »

Offline CharlieBrady

« Reply #11 on: December 22, 2008, 01:50:35 AM »
OP should not do anything without first scanning /var/log/messages.* for hard drive related error messages and using smartctl to query the drive error and self-test information.
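
A rough sketch of that advice, assuming the logs are plain-text /var/log/messages* files and smartmontools is installed. The two function names are hypothetical; the smartctl options used (-H, -l error, -l selftest) are standard ones.

```shell
#!/bin/sh
# Print every line in the given log files that mentions the device name.
# Usage: disk_log_lines sda /var/log/messages*
disk_log_lines() {
    dev=$1
    shift
    grep -h -- "$dev" "$@"
}

# Ask the drive for its own view of its health. Needs root and smartmontools.
# Usage: smart_report /dev/sda
smart_report() {
    smartctl -H "$1"            # overall health self-assessment
    smartctl -l error "$1"      # error log kept by the drive
    smartctl -l selftest "$1"   # results of previous self-tests
}
```

Matching on the bare device name is deliberately crude; it will also catch partition names such as sda2, which is usually what you want here.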

Offline calisun

« Reply #12 on: December 22, 2008, 08:48:59 AM »
My server is in a colocation facility, so I don't have time to sit there playing with the drive. Plus I have a couple of very active web pages hosted on that server, so I can't afford to have the site offline for an extended period of time. So I will just replace the drive with a new one, and later when I get home I will see if I can rescue the old drive. If I can, I will keep it as a spare.

Basically, what I need to know is how to figure out which drive is dead before I get there, so I don't replace the wrong drive.

Offline electroman00

« Reply #13 on: December 22, 2008, 09:43:33 AM »
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]

Offline Frank VB

« Reply #14 on: December 22, 2008, 10:50:48 AM »
Use the smartctl command to find out the serial number of your drive (hopefully, you can still retrieve this information from your dead drive). This number is also printed on the label of the hard drive:

Code: [Select]
# smartctl -i /dev/sda
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2500YS-18SHB2
Serial Number:    WD-WCANY4116755
Firmware Version: 20.06C07
User Capacity:    250,000,000,000 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Dec 22 10:43:42 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
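
To pull just the serial number out of that output, for matching against the label on the drive, a small sketch follows. The name serial_from_smartctl is a hypothetical helper, and it assumes the "Serial Number:" line format shown above.

```shell
#!/bin/sh
# Print the value of the "Serial Number:" line from `smartctl -i` output
# read on standard input.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}
```

Typical use would be `smartctl -i /dev/sda | serial_from_smartctl`, repeated for /dev/sdb, so you can write both serials down before opening the case.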

« Last Edit: December 22, 2008, 10:52:41 AM by frankvb »

Offline calisun

« Reply #15 on: December 22, 2008, 06:53:49 PM »

Correct me if I'm wrong, but looking at the email I got, which I posted in my Reply #7

Code: [Select]
Device /dev/sda2 is now an active member of md device /dev/md2.
it says that sda2 is now an active ...md2,
so it was not before.


And when one looks at my Reply#8

Code: [Select]
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

these are two identical 750 gig drives, and md1: sda1 shows some strange size.

So am I reading this right? I need to replace sda1 not sda2?

« Last Edit: December 22, 2008, 06:58:43 PM by calisun »

Offline calisun

« Reply #16 on: December 22, 2008, 07:04:32 PM »

thanks frankvb, using the code you provided, it shows me only one drive. So I guess I know which drive not to replace :)


Code: [Select]
[root@x ~]# smartctl -i /dev/sda
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS721075KLA330
Serial Number:    GTF200P8G1L7XF
Firmware Version: GK8OA70M
User Capacity:    750,156,374,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Dec 22 09:50:48 2008 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Values Read failed: Input/output error
Smartctl: SMART Read Values failed.

Error SMART Thresholds Read failed: Input/output error
Smartctl: SMART Read Thresholds failed.

[root@x ~]#

Offline Frank VB

« Reply #17 on: December 22, 2008, 07:24:25 PM »
Quote
Error SMART Values Read failed: Input/output error
Smartctl: SMART Read Values failed.
It looks like your sda has a problem. Make sure you also check your sdb drive (use the -a switch to get a full report on the drive):

Code: [Select]
# smartctl -a /dev/sdb | more

Offline pfloor

« Reply #18 on: December 22, 2008, 09:15:22 PM »
Quote
Correct me if I'm wrong, but looking at the email I got, which I posted in my Reply #7

Code: [Select]
Device /dev/sda2 is now an active member of md device /dev/md2.
it says that sda2 is now an active ...md2,
so it was not before.

It was (and possibly still is) not.  It appears from your previous posts that partition sda2 failed, rebuilt and then failed again.  What state it is in now is unknown as you keep referring to old posts.

Quote
And when one looks at my Reply#8

Code: [Select]
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]
    
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

these are two identical 750 gig drives, and md1: sda1 shows some strange size.

So am I reading this right? I need to replace sda1 not sda2?
No, and you have been told already that sda1 is NOT a drive, it is a partition.

md1 and md2 are neither drives nor partitions. They are md ("multiple device") arrays, i.e. RAID devices, which in this setup are RAID 1 mirrors.

You appear to be completely confused about drives (actual physical hardware devices), partitions (parts of each physical device) and mirror devices. Maybe this will help you:

sda1+sdb1=md1
sda2+sdb2=md2

sda1 (SCSI/SATA disk "a", partition #1) + sdb1 (SCSI/SATA disk "b", partition #1) are mirrored together to form md1 (RAID device #1). This is the small one that contains /boot.

sda2 (SCSI/SATA disk "a", partition #2) + sdb2 (SCSI/SATA disk "b", partition #2) are mirrored together to form md2 (RAID device #2). This is the large one that contains everything else.

Or perhaps an illustration (sorry, best I could come up with):
Code: [Select]
Simple 2 drive raid 1 mirror

      Drives      Mirrors
   sda     sdb
P | 1 | + | 1 | = |md1|
a |---|---|---|---|---|
r |   |   |   |   |   |
t |   |   |   |   |   |
i |   |   |   |   |   |
t |   |   |   |   |   |
i | 2 | + | 2 | = |md2|
o |   |   |   |   |   |
n |   |   |   |   |   |
s |   |   |   |   |   |

So, sda1 and sda2 are on the same drive (sda) and if any partition on a particular drive is bad (sda2 in your case) then you need to replace the entire drive (that would be sda in your case).
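
The same rule (failed partition, therefore replace the whole drive) can be sketched as a small script. It assumes the /proc/mdstat layout posted earlier in this thread, where a failed member carries an (F) flag, and sdX-style device names; all three function names are hypothetical.

```shell
#!/bin/sh
# Strip the trailing partition number: sda2 -> sda.
# Assumes sdX-style names as used in this thread.
partition_to_drive() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Print each array member flagged (F) in mdstat-formatted text, e.g. sda2.
# Reads the file given as $1 (default: /proc/mdstat).
failed_partitions() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\(F\)$/) { sub(/\[.*/, "", $i); print $i }
    }' "${1:-/proc/mdstat}"
}

# Print the physical drive behind every failed member, once per drive.
failed_drives() {
    failed_partitions "$1" | while read -r part; do
        partition_to_drive "$part"
    done | sort -u
}
```

With the mdstat output from Reply #8, failed_partitions would report sda2 and failed_drives would report sda. The script only reads the (F) flag, so a drive that has already been removed from the array will not show up.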

Make sense now?
In life, you must either "Push, Pull or Get out of the way!"

Offline calisun

« Reply #19 on: December 22, 2008, 11:03:29 PM »
thank you pfloor
yes, I was confusing physical drive with a partition. Thank you for your explanation.
I have been building my own computers since DOS 5.0, but I guess applying my pc knowledge to Linux does not always relate :)
 
sdb does not show any error messages, so yes it is sda


Code: [Select]
[root@x ~]# smartctl -i /dev/sdb
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS721075KLA330
Serial Number:    GTA300P8G4GNJA
Firmware Version: GK8OA70M
User Capacity:    750,156,374,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Dec 22 14:04:24 2008 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[root@x ~]#
« Last Edit: December 22, 2008, 11:15:14 PM by calisun »

Offline pfloor

« Reply #20 on: December 22, 2008, 11:34:39 PM »
Quote
I have been building my own computers since DOS 5.0, but I guess applying my pc knowledge to Linux does not always relate :)
Seldom do they relate, other than the box that the hardware resides in. :-)