Koozali.org: home of the SME Server
Contribs.org Forums => Koozali SME Server 10.x => Topic started by: wdepot on September 09, 2023, 09:06:05 PM
-
With config setprop smartd status enabled set, isn't the server supposed to email the admin when a hard drive begins to fail? Our main server, running SME 10.1, recently died. When I tried the hard drives in another computer, that machine's BIOS immediately reported the SMART status of both drives as failed. I never got any emails from the server telling me that the hard drives were going bad. If I had, we would have obtained new drives and replaced the old ones before they failed entirely. Do I also need to make sure SMART is enabled in the BIOS for the SME smartd to do its job? I'm thinking that I will enable hardware RAID 1 when the replacement hard drives arrive.
-
You should get such messages, but you need to follow them. Here is an example:
SMART error (CurrentPendingSector) detected on host: x
This message was generated by the smartd daemon running on:
host name: x
DNS domain: xxxxx.com
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Device info:
CT1000MX500SSD1, S/N:xxxxxxxxxx, WWN:5-00a075-1e626ce91, FW:M3CR043, 1.00 TB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
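
The message above points to smartctl for further investigation. A minimal sketch of how that check could be scripted, assuming Python 3 and smartmontools are installed and the script runs as root; the function names parse_health and check_device are made up for illustration and are not part of SME Server:

```python
import re
import subprocess
from typing import Optional

# "smartctl -H" prints one of these lines depending on the drive type:
#   ATA/SATA: SMART overall-health self-assessment test result: PASSED
#   SCSI/SAS: SMART Health Status: OK
_HEALTH_RE = re.compile(
    r"^(?:SMART overall-health self-assessment test result"
    r"|SMART Health Status):\s*(.+)$",
    re.MULTILINE,
)


def parse_health(smartctl_output: str) -> Optional[str]:
    """Return the health verdict from 'smartctl -H' output, or None if absent."""
    match = _HEALTH_RE.search(smartctl_output)
    return match.group(1).strip() if match else None


def check_device(device: str) -> Optional[str]:
    """Run 'smartctl -H' on a device (hypothetical helper, needs root)."""
    # smartctl uses non-zero exit codes as status bit flags, so the return
    # code is deliberately not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-H", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_health(result.stdout)
```

Calling check_device("/dev/sda") should return a string such as PASSED or FAILED!, or None when smartctl prints no health line (for example when SMART is unavailable on the device). Note that a drive can report PASSED and still show pending sectors like the one in the message above, so smartctl -a is worth reading as well.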
-
Furthermore, with software RAID you should also receive an alert like this:
Fail event on /dev/md127:x
This is an automatically generated mail message from mdadm
running on x
A Fail event had been detected on md device /dev/md127.
It could be related to component device /dev/sdd1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md0 : active raid1 sde1[3] sda1[0]
      255936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sde2[2] sda2[0]
      976373760 blocks super 1.1 [2/2] [UU]
      bitmap: 6/8 pages [24KB], 65536KB chunk
md127 : active raid1 sdd1[0](F) sdc1[2]
      3906887424 blocks super 1.2 [2/1] [_U]
      bitmap: 4/30 pages [16KB], 65536KB chunk
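
If the mail alerts cannot be relied on, the same information can be read straight from /proc/mdstat. A rough sketch in Python 3, assuming the layout shown in the mail above; find_degraded and SAMPLE are names invented for this example, and the parsing is simplified rather than a complete mdstat parser:

```python
import re
from typing import Dict, List

# Array header line, e.g. "md127 : active raid1 sdd1[0](F) sdc1[2]"
_ARRAY_RE = re.compile(r"^(md\S*)\s*:\s*(.*)$")
# Status on the following line, e.g. "[2/1] [_U]" ("_" marks a missing member)
_STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Members flagged as failed, e.g. "sdd1[0](F)"
_FAILED_RE = re.compile(r"(\w+)\[\d+\]\(F\)")


def find_degraded(mdstat_text: str) -> Dict[str, List[str]]:
    """Map each degraded array to the members marked (F) on its header line."""
    degraded = {}
    current = None
    failed = []
    for raw in mdstat_text.splitlines():
        line = raw.strip()
        header = _ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            failed = _FAILED_RE.findall(header.group(2))
            continue
        status = _STATUS_RE.search(line)
        if current and status and "_" in status.group(3):
            degraded[current] = failed
    return degraded


# Excerpt taken from the mdadm mail quoted in this thread.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sde1[3] sda1[0]
255936 blocks super 1.0 [2/2] [UU]
md127 : active raid1 sdd1[0](F) sdc1[2]
3906887424 blocks super 1.2 [2/1] [_U]
bitmap: 4/30 pages [16KB], 65536KB chunk
"""

print(find_degraded(SAMPLE))  # {'md127': ['sdd1']}
```

On a live server the text would come from open("/proc/mdstat").read() instead of SAMPLE. An empty result means every array reports all members up.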
-
Quote: "you should get such messages, but you need to follow them, here an example"
That's what I thought, though I'm not sure what you mean by following them. All admin emails are forwarded to our primary email address, so I get them all, mainly messages from fail2ban and the daily backup report. I never got any messages that the hard drives were having problems, though I did get such messages in the past, prior to SME Server 10.