Koozali.org: home of the SME Server

Smartd under SME Server 10

wdepot
« on: September 09, 2023, 09:06:05 PM »
When I have "config setprop smartd status enabled" set, isn't the server supposed to email the admin when a hard drive begins to fail? Our main server, running SME 10.1, recently died. When I tried the hard drives in another computer, the BIOS on that machine immediately reported a failed SMART status for both drives. I never got any emails from the server telling me that the hard drives were going bad. If I had, we would have obtained new drives and replaced the old ones before they failed entirely. Do I also need to make sure SMART is enabled in the BIOS for the SME smartd to do its job? I'm thinking of enabling hardware RAID 1 when the replacement hard drives arrive.

Jean-Philippe Pialasse (aka Unnilennium)
« Reply #1 on: September 10, 2023, 06:44:01 AM »
You should get such messages, but you need to follow them. Here is an example:

Code:
SMART error (CurrentPendingSector) detected on host: x

This message was generated by the smartd daemon running on:
   host name:  x
   DNS domain: xxxxx.com

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

Device info:
CT1000MX500SSD1, S/N:xxxxxxxxxx, WWN:5-00a075-1e626ce91, FW:M3CR043, 1.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
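
The message above suggests using smartctl for further investigation. As a rough sketch of what that could look like, the snippet below runs "smartctl -H" on a device and extracts the overall health verdict. The function names parse_health and check_device are made up for this illustration, and the two output lines being matched are assumptions about what smartmontools prints for ATA and SCSI drives; check them against the output on your own server.

```python
import re
import subprocess

# Assumed verdict lines printed by `smartctl -H`:
#   ATA/SATA: "SMART overall-health self-assessment test result: PASSED"
#   SCSI/SAS: "SMART Health Status: OK"
HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
    r"|SMART Health Status:\s*(\S+)"
)


def parse_health(output):
    """Return the health verdict found in `smartctl -H` output, or None."""
    match = HEALTH_RE.search(output)
    if match is None:
        return None
    # Only one of the two alternatives can have matched.
    return match.group(1) or match.group(2)


def check_device(device):
    """Run `smartctl -H` on one device (hypothetical helper, needs root)."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    # /dev/sda and /dev/sdb are example device names, not taken from your system.
    for dev in ("/dev/sda", "/dev/sdb"):
        print(dev, check_device(dev))
```

A verdict of None means no health line was found, which may indicate that SMART is disabled or unsupported on that device. "smartctl -a" on the same device prints the full attribute table if you want more detail.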

Jean-Philippe Pialasse (aka Unnilennium)
« Reply #2 on: September 10, 2023, 06:47:11 AM »
Furthermore, with software RAID you should also receive an alert like this:

Quote
Fail event on /dev/md127:x
This is an automatically generated mail message from mdadm
running on x

A Fail event had been detected on md device /dev/md127.

It could be related to component device /dev/sdd1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sde1[3] sda1[0]
      255936 blocks super 1.0 [2/2] [UU]
     
md1 : active raid1 sde2[2] sda2[0]
      976373760 blocks super 1.1 [2/2] [UU]
      bitmap: 6/8 pages [24KB], 65536KB chunk

md127 : active raid1 sdd1[0](F) sdc1[2]
      3906887424 blocks super 1.2 [2/1] [_U]
      bitmap: 4/30 pages [16KB], 65536KB chunk
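
If the mail alerts cannot be trusted, the same information can be read directly from /proc/mdstat. The following is only a minimal sketch, assuming the layout shown in the quoted message (an "mdN : ..." line followed by a status line ending in something like "[2/1] [_U]"); the function name degraded_arrays is invented for this example, and other array states (inactive, resyncing) are not handled.

```python
import re

# "md127 : active raid1 sdd1[0](F) sdc1[2]" -> array name and the rest of the line
ARRAY_RE = re.compile(r"^(md\d+) : (.*)$")
# "[2/1] [_U]" -> expected/active member counts and the per-member map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
# "sdd1[0](F)" -> member flagged as failed
FAILED_RE = re.compile(r"(\w+)\[\d+\]\(F\)")


def degraded_arrays(mdstat_text):
    """Return {array: [failed members]} for arrays with a missing member."""
    problems = {}
    current = None
    members = ""
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line.strip())
        if header:
            current, members = header.group(1), header.group(2)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            # An underscore in the map marks a slot with no working device.
            if "_" in status.group(3):
                problems[current] = FAILED_RE.findall(members)
            current = None
    return problems


if __name__ == "__main__":
    with open("/proc/mdstat") as handle:
        for array, failed in degraded_arrays(handle.read()).items():
            print(array, "is degraded; failed members:", failed or "none listed")
```

Run against the mdstat contents quoted above, this would flag md127 with sdd1 as the failed member and leave md0 and md1 alone.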

wdepot
« Reply #3 on: September 12, 2023, 11:41:23 PM »
Quote
You should get such messages, but you need to follow them. Here is an example:

That's what I thought, though I'm not sure what you mean by following them. All admin emails are forwarded to our primary email address, so I get them all, mainly messages from fail2ban and the daily backup report. I never got any messages that the hard drives were having problems, though I did get such messages in the past, prior to SME Server 10.