Koozali.org: home of the SME Server

Obsolete Releases => SME Server 9.x => Topic started by: DanB35 on August 11, 2014, 03:58:12 PM

Title: RAID1 starts rebuilding for no reason?
Post by: DanB35 on August 11, 2014, 03:58:12 PM: I've been running SME 9.0 for a few weeks without significant issues. However, in the last week or so, I've had it twice start rebuilding the array for no readily-apparent reason. I didn't keep the admin emails from the previous time this happened, but the most recent instance started at 1:00 am yesterday, and the fact that it was exactly on the hour makes me a little suspicious.

Some history, in case it's relevant: I had been running SME 8.1 on this machine, on mirrored disks. When I did the upgrade, I removed one of the disks, installed SME 9 and restored from my backup onto the other disk, and made sure it was working. Once it appeared to be working OK, I reinstalled the previously-removed disk and used "manage redundancy" from the console menu to set up the mirror. Curiously, I did not receive any emails about the rebuild status as the disk was synced, but I was able to monitor it with /proc/mdstat and it finished without errors. Currently, mdstat indicates both disks are online:

Code: [Select]
[dan@e-smith ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
      255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
      1953126208 blocks super 1.1 [2/2] [UU]
      bitmap: 5/15 pages [20KB], 65536KB chunk
unused devices: <none>
Where should I start looking to see what triggered the rebuild?
Title: Re: RAID1 starts rebuilding for no reason?
Post by: stephdl on August 11, 2014, 05:21:20 PM: If the message arrives every Sunday at 1:00 AM, then it is not a RAID warning.

Please take a look at this bug report; the package is awaiting release.

http://bugs.contribs.org/show_bug.cgi?id=7748
Title: Re: RAID1 starts rebuilding for no reason?
Post by: DanB35 on August 11, 2014, 06:10:51 PM: That bug is described as sending emails on routine checks, but it looks like the raid-check actually forces a rebuild of the array:

Code: [Select]
[root@e-smith log]# /usr/sbin/raid-check
^Z
[1]+  Stopped                 /usr/sbin/raid-check
[root@e-smith log]# bg
[1]+ /usr/sbin/raid-check &
[root@e-smith log]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
      255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
      1953126208 blocks super 1.1 [2/2] [UU]
      [>....................]  check =  0.0% (1019072/1953126208) finish=638.5min speed=50953K/sec
      bitmap: 10/15 pages [40KB], 65536KB chunk
unused devices: <none>
Is this the same thing, or something different?
Title: Re: RAID1 starts rebuilding for no reason?
Post by: stephdl on August 11, 2014, 08:40:00 PM: First, if you think it is a bug, you are welcome to report it in Bugzilla.

Now I would like to explain what is occurring (of course, I could be wrong):

Code: [Select]
[root@sme9 ~]# rpm -qf /usr/sbin/raid-check
mdadm-3.2.6-7.el6_5.2.i686
This is the package the script comes from.

Code: [Select]
[root@sme9 ~]# yum info mdadm
Loaded plugins: fastestmirror, smeserver
Loading mirror speeds from cached hostfile
 * base: ftp.rezopole.net
 * smeaddons: mirror.hakkers.com
 * smeextras: mirror.hakkers.com
 * smeos: mirror.hakkers.com
 * smeupdates: mirror.hakkers.com
 * updates: ftp.rezopole.net
Installed Packages
Name        : mdadm
Arch        : i686
Version     : 3.2.6
Release     : 7.el6_5.2
Size        : 884 k
Repo        : installed
From repo   : updates
Summary     : The mdadm program controls Linux md devices (software RAID arrays)
URL         : http://www.kernel.org/pub/linux/utils/raid/mdadm/
License     : GPLv2+
Description : The mdadm program is used to create, manage, and monitor Linux MD (software
            : RAID) devices. As such, it provides similar functionality to the raidtools
            : package. However, mdadm is a single program, and it can perform
            : almost all functions without a configuration file, though a configuration
            : file can be used to help with some common tasks.
The script that runs every Sunday at 1:00 AM is not ours (it comes from mdadm). It is a stock CentOS file, and we cannot modify it if we want to stay CentOS-compatible.

Then we have an event handler, '/sbin/e-smith/mdevent', which is in charge of watching the events raised by mdadm. We need to patch that handler so that it does not send an email while the script /usr/sbin/raid-check is running.

It is important to verify the state of your RAID regularly, and that check is what you are seeing in your /proc/mdstat.
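Besides /proc/mdstat, the kernel md driver reports each array's current operation in sysfs, in /sys/block/<array>/md/sync_action. During a scheduled raid-check it should read "check", and "idle" when nothing is running. The following is only a sketch: md_sync_actions is a made-up helper name, and its optional directory argument is an assumption added so the function can be tried against a copy of the sysfs tree.

```shell
#!/bin/sh
# Sketch: print the current sync action of every md array.
# md_sync_actions is a hypothetical helper, not part of mdadm or SME Server.
# The optional argument (default /sys/block) is only there for testing.
md_sync_actions() {
    root="${1:-/sys/block}"
    for f in "$root"/md*/md/sync_action; do
        [ -r "$f" ] || continue          # no md arrays, or file unreadable
        dev="${f#"$root"/}"              # strip the leading directory
        dev="${dev%%/*}"                 # keep only the array name, e.g. md1
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling md_sync_actions with no argument on the server lists one line per array, for example "md1: check" while the weekly check is in progress.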
Title: Re: RAID1 starts rebuilding for no reason?
Post by: CharlieBrady on August 11, 2014, 11:25:40 PM: http://stackoverflow.com/questions/12114461/why-does-mdadm-keep-checking-resyncing
Title: Re: RAID1 starts rebuilding for no reason?
Post by: DanB35 on August 13, 2014, 05:17:51 PM: Certainly it's important to periodically verify the state of the array, and looking more closely at the mdstat output I posted, it does look like it's checking, rather than rebuilding, the array. Should the patch instead check the mdstat output to determine if it's a "check" event vs. a "rebuild" event?
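As a rough illustration of that idea, the progress line in /proc/mdstat names the operation in progress: "check" for a routine verification, "recovery" or "resync" when data is actually being rebuilt or resynchronised. The sketch below reads mdstat-style text on standard input and prints the operation for one array. mdstat_operation is a hypothetical name, and this is not what the mdevent patch does.

```shell
#!/bin/sh
# Sketch: report which operation, if any, md is running on one array,
# based on /proc/mdstat-style text read from standard input.
# mdstat_operation is a hypothetical helper name, not part of mdadm.
mdstat_operation() {
    # $1 = array name, e.g. md1
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { in_dev = 1; next }   # start of our array
        in_dev && /^[^ \t]/    { in_dev = 0 }          # next stanza begins
        in_dev && /(check|resync|recovery|reshape) *=/ {
            # progress line looks like: [>....]  check =  0.0% (...)
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(check|resync|recovery|reshape)$/) {
                    print $i; found = 1; exit
                }
        }
        END { if (!found) print "idle" }
    '
}
```

On the server it would be called as `mdstat_operation md1 < /proc/mdstat`. An answer of "check" would point to the scheduled raid-check rather than a real rebuild.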
Title: Re: RAID1 starts rebuilding for no reason?
Post by: stephdl on August 13, 2014, 05:34:48 PM: Code: [Select]
 print "Event: $event, Device: $device, Member: $member\n";
+if ($event =~ m#^Rebuild# && system( "ps -C raid-check" ) == 0 ) {
+    exit 0;
+}
+
 if ($event =~ m#^Rebuild|^Fail|^Degraded|^SpareActive#) {
     my $domain = $conf->get_value("DomainName") || 'localhost';
     my $user = "admin_raidreport\@$domain";
http://bugs.contribs.org/attachment.cgi?id=4664&action=diff

If '$event' starts with Rebuild AND the raid-check process is running, then mdevent exits without sending an email.

Otherwise, if '$event' starts with Rebuild OR Fail OR Degraded OR SpareActive, then mdevent sends an email to the sysadmin.
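The two rules above can be restated as a small shell function, which makes the order of the tests easier to see. This is only a sketch of the decision, not the mdevent script (which is Perl). mdevent_action and its "yes" flag are hypothetical, and the "skip" result for events outside the list is an assumption, since the patch excerpt does not show what happens to them.

```shell
#!/bin/sh
# Sketch of the decision described above, not the mdevent script itself.
# mdevent_action is a hypothetical name. Arguments:
#   $1 = event name reported by mdadm (e.g. RebuildStarted, Fail)
#   $2 = "yes" if raid-check is currently running, anything else otherwise
mdevent_action() {
    event="$1"
    checking="$2"
    case "$event" in
        Rebuild*)
            # A Rebuild event during the scheduled check is expected: stay quiet.
            if [ "$checking" = "yes" ]; then
                echo "skip"
            else
                echo "mail"
            fi
            ;;
        Fail*|Degraded*|SpareActive*)
            echo "mail"
            ;;
        *)
            # Assumption: other events do not trigger an email.
            echo "skip"
            ;;
    esac
}
```

A caller could derive the second argument the same way the patch does, for example `ps -C raid-check >/dev/null && echo yes || echo no`.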