Koozali.org: home of the SME Server

raid1 and cron.weekly and 99-raid-check

purvis
« on: July 03, 2012, 09:51:23 AM »
hi all
all of my servers run raid1.
on sunday mornings the raid1 is being rebuilt; it shows up in the /var/log/raidmonitor/current file and an email is also sent.
this happened on 2 servers running sme8beta6 as well.
as far as i can tell from searching, the 99-raid-check script in cron.weekly seems to be the cause on raid1 systems.
it has been looked at in the bug reports and seems to have been closed as NOTABUG.

i have moved 99-raid-check to another directory for now.
it is great to have a raid check, but this program seems to discover problems that do not seem to be problems.
i have not seen anybody address this recently.

am i doing the right thing by removing the raid-check program?
i do not want my raids rebuilt for no reason, or on a fluke.

the lines below are how i disabled the cron.weekly program:


Code:
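# keep a copy outside cron.weekly so the check can be put back later
mkdir -p /opt/removed
# move the script out of cron.weekly so the weekly cron run no longer finds it
mv /etc/cron.weekly/99-raid-check /opt/removed/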

Stefano
« Reply #1 on: July 03, 2012, 10:05:10 AM »
purvis..

it's a known issue, read here

anyway, I would not turn off raid check..

purvis
« Reply #2 on: July 03, 2012, 10:37:49 AM »
thanks Stefano
my servers have too many files for an unnecessary rebuild. it takes about 6 hours at one location, and i expect about another 10 hours on the server i am working on now with the update to sme8.
i believe you told me in the past that you are not running raid1 but something with many more disk drives.
i just did not pick this up on my test servers; one has been rebuilding every week since about nov of 2011.

i am worried that the file i moved will get placed back onto my servers during a future update, before i get a better handle on the issue and how to prevent it.
i did find some notes on the centos website, but i did not see a resolution for raid1 systems.
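
The worry about an update putting the script back can be checked after each update with a few lines of shell. This is only a sketch: the function name `raid_check_status` is made up for illustration, and the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# Report whether a cron script exists at the given path.
# raid_check_status is a hypothetical helper name; the default path is the
# one discussed in this thread.
raid_check_status() {
    script=${1:-/etc/cron.weekly/99-raid-check}
    if [ -e "$script" ]; then
        echo "present: $script"
    else
        echo "absent: $script"
    fi
}

# Example: run after an update to see whether the script has come back.
#   raid_check_status
```

If the file does reappear, `rpm -qf /etc/cron.weekly/99-raid-check` should show which package owns it on an RPM-based system, which would at least tell you which update brought it back.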


Stefano
« Reply #3 on: July 03, 2012, 11:09:22 AM »
strange.. AFAIK the weekly rebuild is only for the /boot array

I have several servers with raid1, and some of them are SME8..

if you are experiencing raid array rebuilding for the data array, you should take a look at the logs, because there should be something wrong that you should be aware of..
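
Alongside the logs, the kernel's own view in /proc/mdstat shows whether an array is really missing a member. The sketch below assumes the usual mdstat layout, where a status line ends in something like `[UU]` for a healthy mirror and `[U_]` for a degraded one; the function name `degraded_arrays` is made up for illustration.

```shell
#!/bin/sh
# List md arrays whose status field contains "_" (a missing member).
# Reads /proc/mdstat-style text on stdin; degraded_arrays is a hypothetical name.
degraded_arrays() {
    awk '
    /^md[0-9]+ :/ { name = $1 }                 # remember the current array
    $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }    # e.g. [U_] or [_U]
    '
}

# Example on a live server:
#   degraded_arrays < /proc/mdstat
```

No output means every array reported all of its members as up at that moment; `mdadm --detail /dev/mdN` gives more detail for any array that is listed.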

purvis
« Reply #4 on: July 03, 2012, 11:24:55 AM »
i had deleted all the "current" files that held most of the rebuild notifications.
i did have some emails, but they were not very explanatory.
if you have raid1, would you please review your "current" file:
"cat /var/log/raidmonitor/current"

if i read it right, the data side was rebuilt too.
i hope that i am wrong.
i have already disabled 99-raid-check on the production servers that i have running now and deleted the contents of "current". once i have fixed any drive problems i consider the old entries noise, so any new text in the file is a problem to be looked at and fixed, which to me usually means a drive to replace.
i have a web page on each server that displays the "current" file from the /var/log/raidmonitor directory with a php script.



purvis
« Reply #5 on: July 03, 2012, 11:37:21 AM »
i went back and read the emails from the sme8 beta test server that is still running, and from what i am seeing
it does look as if only the boot partition is being rebuilt, because it only took about 1 minute going by the email dates and times.
but i thought the current file read differently. i would like to be wrong about all this.

Stefano
« Reply #6 on: July 03, 2012, 12:00:03 PM »
Code:
[root@fileserver ~]$ cat /var/log/raidmonitor/current | tai64nlocal
2012-04-01 04:22:03.199508500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-04-01 04:22:03.556841500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-04-08 04:22:02.955305500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-04-08 04:22:03.324030500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-04-15 04:22:03.398342500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-04-15 04:22:03.753300500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-04-22 04:22:02.020038500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-04-22 04:22:06.723849500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-04-29 04:22:02.807889500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-04-29 04:22:03.335523500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-05-06 04:22:02.994157500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-05-06 04:22:03.386231500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-05-13 04:22:03.242857500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-05-13 04:22:03.890641500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-05-20 04:22:03.630331500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-05-20 04:22:04.015167500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-05-27 04:22:02.775402500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-05-27 04:22:04.094962500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-06-03 04:22:02.027454500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-06-03 04:22:04.746238500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-06-10 04:22:02.124750500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-06-10 04:22:05.718231500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-06-17 04:22:03.514358500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-06-17 04:22:03.906898500 Event: Rebuild40, Device: /dev/md1, Member:
2012-06-17 04:22:06.006501500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-06-24 04:22:03.494050500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-06-24 04:22:03.891586500 Event: RebuildFinished, Device: /dev/md1, Member:
2012-07-01 04:22:03.196314500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-07-01 04:22:03.797266500 Event: Rebuild80, Device: /dev/md1, Member:
2012-07-01 04:22:05.815311500 Event: RebuildFinished, Device: /dev/md1, Member:

as you can see, only /dev/md1 is rebuilt
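
To confirm which devices are affected without reading the whole file by eye, the events can be counted per device. This is a sketch that assumes the line layout shown in the output above (after tai64nlocal); the function name `count_rebuilds` is made up for illustration.

```shell
#!/bin/sh
# Count RebuildStarted events per md device.
# Expects raidmonitor lines already passed through tai64nlocal on stdin.
# count_rebuilds is a hypothetical helper name.
count_rebuilds() {
    awk '/Event: RebuildStarted/ {
             dev = $6
             sub(/,$/, "", dev)      # "/dev/md1," -> "/dev/md1"
             count[dev]++
         }
         END { for (d in count) print d, count[d] }' | sort
}

# Example on a live server:
#   tai64nlocal < /var/log/raidmonitor/current | count_rebuilds
```

If a data array were being rebuilt as well, it would appear as an extra line next to /dev/md1.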

purvis
« Reply #7 on: July 03, 2012, 10:22:31 PM »
thanks for posting that
that matches up with the emails sent to the admin on my machines

do you find it odd that some rebuilds took several seconds while most took less than a second?
Code:
2012-06-17 04:22:03.514358500 Event: RebuildStarted, Device: /dev/md1, Member:
2012-06-17 04:22:03.906898500 Event: Rebuild40, Device: /dev/md1, Member:
2012-06-17 04:22:06.006501500 Event: RebuildFinished, Device: /dev/md1, Member:
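
The elapsed time of each run can be worked out from the timestamps rather than estimated by eye. The sketch below pairs each RebuildStarted with the next RebuildFinished for the same device and prints the difference in seconds. It assumes the tai64nlocal line layout shown above and that a run does not cross midnight; the function name `rebuild_durations` is made up for illustration.

```shell
#!/bin/sh
# Print "date device seconds" for each RebuildStarted/RebuildFinished pair.
# Expects raidmonitor lines already passed through tai64nlocal on stdin.
# rebuild_durations is a hypothetical helper name.
rebuild_durations() {
    awk '
    function secs(t,   p) {          # "HH:MM:SS.frac" -> seconds since midnight
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    { dev = $6; sub(/,$/, "", dev) } # "/dev/md1," -> "/dev/md1"
    /Event: RebuildStarted/  { start[dev] = secs($2); next }
    /Event: RebuildFinished/ {
        if (dev in start) {
            printf "%s %s %.3f\n", $1, dev, secs($2) - start[dev]
            delete start[dev]
        }
    }'
}

# Example on a live server:
#   tai64nlocal < /var/log/raidmonitor/current | rebuild_durations
```

For the three lines quoted above this gives about 2.5 seconds for the 2012-06-17 run, so the differences between runs are in seconds.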