Koozali.org: home of the SME Server
Obsolete Releases => SME Server 9.x => Topic started by: DanB35 on August 11, 2014, 03:58:12 PM
-
I've been running SME 9.0 for a few weeks without significant issues. However, in the last week or so, I've had it twice start rebuilding the array for no readily-apparent reason. I didn't keep the admin emails from the previous time this happened, but the most recent instance started at 1:00 am yesterday, and the fact that it was exactly on the hour makes me a little suspicious.
Some history, in case it's relevant: I had been running SME 8.1 on this machine, on mirrored disks. When I did the upgrade, I removed one of the disks, installed SME 9 and restored from my backup onto the other disk, and made sure it was working. Once it appeared to be working OK, I reinstalled the previously-removed disk and used "manage redundancy" from the console menu to set up the mirror. Curiously, I did not receive any emails about the rebuild status as the disk was synced, but I was able to monitor it with /proc/mdstat and it finished without errors. Currently, mdstat indicates both disks are online:
[dan@e-smith ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
1953126208 blocks super 1.1 [2/2] [UU]
bitmap: 5/15 pages [20KB], 65536KB chunk
unused devices: <none>
Where should I start looking to see what triggered the rebuild?
-
If the message arrives every Sunday at 1:00 AM, then it is not a RAID failure warning.
Please take a look at this bug report; the fixed package is awaiting release.
http://bugs.contribs.org/show_bug.cgi?id=7748
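
As a quick sanity check on the timing, the sketch below tests whether a timestamp falls on a Sunday in the 01:00 hour, which is the schedule described above. The function name `matches_weekly_check` and the sample date (August 10, 2014, the day before the opening post) are assumptions chosen for illustration; they are not taken from any SME Server or mdadm tool.

```python
from datetime import datetime


def matches_weekly_check(ts: datetime) -> bool:
    """Return True if ts falls on a Sunday during the 01:00 hour."""
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    return ts.weekday() == 6 and ts.hour == 1


# Assumed sample: the day before the opening post, at 1:00 AM.
sample = datetime(2014, 8, 10, 1, 0)
print(matches_weekly_check(sample))
```

If the admin emails consistently land in that window, the weekly check is the likely trigger rather than a real array fault.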
-
That bug is described as sending emails on routine checks, but it looks like the raid-check actually forces a rebuild of the array:
[root@e-smith log]# /usr/sbin/raid-check
^Z
[1]+ Stopped /usr/sbin/raid-check
[root@e-smith log]# bg
[1]+ /usr/sbin/raid-check &
[root@e-smith log]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
1953126208 blocks super 1.1 [2/2] [UU]
[>....................] check = 0.0% (1019072/1953126208) finish=638.5min speed=50953K/sec
bitmap: 10/15 pages [40KB], 65536KB chunk
unused devices: <none>
Is this the same thing, or something different?
-
First, if you think this is a bug, you are welcome to report it in Bugzilla.
Now I would like to explain what is happening (of course, I may be wrong):
[root@sme9 ~]# rpm -qf /usr/sbin/raid-check
mdadm-3.2.6-7.el6_5.2.i686
This is where the package comes from:
[root@sme9 ~]# yum info mdadm
Loaded plugins: fastestmirror, smeserver
Loading mirror speeds from cached hostfile
* base: ftp.rezopole.net
* smeaddons: mirror.hakkers.com
* smeextras: mirror.hakkers.com
* smeos: mirror.hakkers.com
* smeupdates: mirror.hakkers.com
* updates: ftp.rezopole.net
Installed Packages
Name : mdadm
Arch : i686
Version : 3.2.6
Release : 7.el6_5.2
Size : 884 k
Repo : installed
From repo : updates
Summary : The mdadm program controls Linux md devices (software RAID arrays)
URL : http://www.kernel.org/pub/linux/utils/raid/mdadm/
License : GPLv2+
Description : The mdadm program is used to create, manage, and monitor Linux MD (software
: RAID) devices. As such, it provides similar functionality to the raidtools
: package. However, mdadm is a single program, and it can perform
: almost all functions without a configuration file, though a configuration
: file can be used to help with some common tasks.
The script that runs every Sunday at 1:00 AM is not ours (it comes from mdadm). It ships unmodified from CentOS, and we cannot change it if we want to stay CentOS compatible.
We then have an event handler, '/sbin/e-smith/mdevent', which is responsible for handling the events raised by mdadm. We need to patch that handler so that it does not send email while the /usr/sbin/raid-check script is running.
It is important to verify the state of your RAID regularly, and that is what you are seeing in your /proc/mdstat.
-
http://stackoverflow.com/questions/12114461/why-does-mdadm-keep-checking-resyncing
-
Certainly it's important to periodically verify the state of the array, and looking more closely at the mdstat output I posted, it does look like it's checking, rather than rebuilding, the array. Should the patch instead check the mdstat output to determine if it's a "check" event vs. a "rebuild" event?
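
To make the "check" vs. "rebuild" distinction concrete, here is a minimal sketch that reads the operation keyword from the progress line in /proc/mdstat text. It is not the approach taken by the actual patch. The function name `running_operations` is hypothetical, and the keyword list (check, resync, recovery, reshape) is an assumption about what the kernel prints on that line. The sample text is the output posted earlier in this thread.

```python
import re

# Progress line printed while an operation runs, for example:
# "[>....................] check = 0.0% (1019072/1953126208) finish=..."
PROGRESS = re.compile(r"\]\s+(check|resync|recovery|reshape)\s*=\s*([\d.]+)%")
# Array header line, for example: "md1 : active raid1 sdb2[2] sda2[0]"
DEVICE = re.compile(r"^(md\d+)\s*:")


def running_operations(mdstat_text: str) -> dict:
    """Map each md device with a running operation to (operation, percent)."""
    ops = {}
    current = None
    for raw in mdstat_text.splitlines():
        line = raw.strip()
        header = DEVICE.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS.search(line)
        if progress and current:
            ops[current] = (progress.group(1), float(progress.group(2)))
    return ops


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[2] sda1[0]
255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
1953126208 blocks super 1.1 [2/2] [UU]
[>....................] check = 0.0% (1019072/1953126208) finish=638.5min speed=50953K/sec
bitmap: 10/15 pages [40KB], 65536KB chunk
unused devices: <none>
"""

print(running_operations(SAMPLE))
```

On the sample, only md1 is reported, with the operation "check". An array that was actually rebuilding onto a replaced member would be expected to show "recovery" on that line instead.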
-
print "Event: $event, Device: $device, Member: $member\n";
+if ($event =~ m#^Rebuild# && system( "ps -C raid-check" ) == 0 ) {
+ exit 0;
+}
+
if ($event =~ m#^Rebuild|^Fail|^Degraded|^SpareActive#) {
my $domain = $conf->get_value("DomainName") || 'localhost';
my $user = "admin_raidreport\@$domain";
http://bugs.contribs.org/attachment.cgi?id=4664&action=diff
If '$event' starts with Rebuild AND the raid-check process is running, mdevent exits without sending anything.
Otherwise, if '$event' starts with Rebuild, Fail, Degraded, or SpareActive, mdevent sends an email to the sysadmin.
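
The two rules above can be sketched as a small decision function. This is an illustrative Python restatement of the logic, not the mdevent code itself (which is Perl). The name `should_send_mail` and the `raid_check_running` flag are hypothetical, standing in for the `ps -C raid-check` test in the patch. The event names in the usage lines are sample values.

```python
import re

# Same anchored alternatives as the pattern in the patched handler.
ALERT_EVENTS = re.compile(r"^(Rebuild|Fail|Degraded|SpareActive)")


def should_send_mail(event: str, raid_check_running: bool) -> bool:
    """Return True if this event should produce an email to the admin."""
    # Rule 1: a Rebuild* event during a scheduled raid-check is routine, so stay quiet.
    if event.startswith("Rebuild") and raid_check_running:
        return False
    # Rule 2: otherwise, alert on any of the listed event prefixes.
    return bool(ALERT_EVENTS.match(event))


print(should_send_mail("RebuildStarted", raid_check_running=True))
print(should_send_mail("RebuildStarted", raid_check_running=False))
print(should_send_mail("Fail", raid_check_running=True))
```

Only Rebuild events are suppressed while raid-check runs, so a Fail or Degraded event during the weekly check still results in an email.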