help automating SMART drive reports with cron

wdepot

« on: September 14, 2023, 07:37:20 PM »

After the two hard drives on our primary server failed unexpectedly, without any warning that they were beginning to go bad even though we have always had smartd status enabled, I had the idea of using cron to automate a monthly check of the hard drives and email the reports to admin. I have no problem creating the templates-custom/etc/crontab file containing the cron jobs that run the tests and store the reports:

Code:

# run monthly SMART disk tests with cron
30 00 1 * * root smartctl -t long /dev/sda
30 00 1 * * root smartctl -t long /dev/sdb
40 02 1 * * root smartctl -a /dev/sda > /home/e-smith/files/ibays/Primary/files/sda-smart.txt
40 02 1 * * root smartctl -a /dev/sdb > /home/e-smith/files/ibays/Primary/files/sdb-smart.txt
45 02 1 * * root ??command to email reports to admin

The one part I don't know how to do is the last line, where the two generated report files are emailed to the admin user. I know I could just download the files from the server and examine them, but I also know I would forget to do so unless the reports are sent to me automatically. Any help with automatically creating and sending an email with the subject "Server Hard Drive Status", with the two text files either attached or included as the body, would be greatly appreciated. I suspect a separate script may be needed to create and send the email, with cron triggering the script. In that case I would need to know what to put in the script and the best place to store it so that it gets backed up whenever a standard full backup runs.
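A minimal sketch of the separate script asked about here, assuming a mailx-style mail command that accepts "mail -s SUBJECT RECIPIENT" is installed and that mail to the local admin user is delivered. The name smart-mail-reports.sh is hypothetical, and this sketch makes no claim about which directories the standard full backup covers, so store it somewhere you have confirmed is backed up. The reports go into the message body rather than as attachments, because attachment options differ between mail implementations.

```shell
#!/bin/sh
# smart-mail-reports.sh (hypothetical name): email saved smartctl reports to admin.
# Assumes a mailx-compatible "mail" command that accepts: mail -s SUBJECT RECIPIENT
# Usage: smart-mail-reports.sh /path/to/sda-smart.txt /path/to/sdb-smart.txt

SUBJECT="Server Hard Drive Status"
RECIPIENT="admin"

# Print every report file given as an argument, each preceded by a header
# line, so that all reports end up in one plain-text message body.
format_reports() {
    for report in "$@"; do
        echo "===== $(basename "$report") ====="
        if [ -r "$report" ]; then
            cat "$report"
        else
            echo "(report file missing or unreadable)"
        fi
        echo
    done
}

# Build the message body and pipe it to mail(1).
send_reports() {
    format_reports "$@" | mail -s "$SUBJECT" "$RECIPIENT"
}

# Only send when report files were passed on the command line.
if [ "$#" -gt 0 ]; then
    send_reports "$@"
fi
```

The placeholder crontab line would then become something like 45 02 1 * * root /path/to/smart-mail-reports.sh /home/e-smith/files/ibays/Primary/files/sda-smart.txt /home/e-smith/files/ibays/Primary/files/sdb-smart.txt, where /path/to is wherever you stored the script.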

Jean-Philippe Pialasse

aka Unnilennium

Re: help automating SMART drive reports with cron

« Reply #1 on: September 15, 2023, 12:01:31 AM »

Well, that wheel already exists, in /etc/smartd.conf:

Code:

DEVICESCAN -a -m admin -M diminishing -I 190 -I 194

Code:

   -a     Equivalent to turning on all of the following Directives:
              '-H' to check the SMART health status, '-f' to report failures
              of Usage (rather than Prefail) Attributes, '-t' to track
              changes in both Prefailure and Usage Attributes, '-l error' to
              report increases in the number of ATA errors, '-l selftest' to
              report increases in the number of Self-Test Log errors,
              '-l selfteststs' to report changes of Self-Test execution
              status, '-C 197' to report nonzero values of the current
              pending sector count, and '-U 198' to report nonzero values of
              the offline pending sector count.

              Note  that  -a  is the default for ATA devices.  If none of
              these other Directives is given, then -a is assumed.

Code:

    -I ID  [ATA only] Ignore device Attribute ID when tracking changes
              in the Attribute values.  ID must be a decimal integer  in
              the  range  from  1  to  255.   This Directive modifies the
              behavior of the '-p', '-u', and  '-t'  tracking  Directives
              and has no effect without one of them.

Code:

       -M TYPE
              These  Directives  modify  the behavior of the smartd email
              warnings enabled with the '-m'  email  Directive  described
              above.  These '-M' Directives only work in conjunction with
              the '-m' Directive and can not be used without it.

              Multiple -M Directives may be given.  If more than  one  of
              the  following  three  -M Directives are given (example: -M
              once -M daily) then the  final  one  (in  the  example,  -M
              daily) is used.

              The  valid  arguments  to  the -M Directive are (one of the
              following three):

              once - send only one warning email for each type of disk
              problem detected. This is the default unless state
              persistence ('-s' option) is enabled.

              daily - send additional warning reminder emails,  once  per
              day,  for  each type of disk problem detected.  This is the
              default if state persistence ('-s' option) is enabled.

              diminishing -  send  additional  warning  reminder  emails,
              after  a  one-day interval, then a two-day interval, then a
              four-day interval, and so on for each type of disk  problem
              detected.   Each  interval is twice as long as the previous
              interval.

              If a disk problem is no longer detected, the internal email
              counter  is  reset.  If the problem reappears a new warning
              email is sent immediately.

and the ignored attribute IDs are:
190 Airflow_Temperature_Cel
194 Temperature_Celsius

But you must act on the alerts; that is why I was pointing out the increasing alert interval.
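To put numbers on that increasing interval: taking the man page wording literally (reminders after 1, 2, 4, ... days, each interval twice the previous one), the shell function below lists the day offsets, counted from the first warning, on which reminders would go out for a problem that is never fixed. It only illustrates the arithmetic; smartd's real timing also depends on how often it polls the drives.

```shell
#!/bin/sh
# Print the day offsets (counted from the first warning email) at which
# "-M diminishing" would send reminders, per the man page wording:
# intervals of 1, 2, 4, ... days, each twice the previous one.
# Illustration only: real timing also depends on smartd's polling interval.
diminishing_schedule() {
    horizon=$1      # stop listing after this many days
    day=0
    interval=1
    echo "$day"     # the initial warning
    while [ $((day + interval)) -le "$horizon" ]; do
        day=$((day + interval))
        echo "$day"
        interval=$((interval * 2))
    done
}

diminishing_schedule 31
```

For a 31-day month this prints 0, 1, 3, 7, 15 and 31: the initial warning plus five reminders, whereas -M daily would send a reminder every day.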

Cron will mail any output of a command to admin, so you do not need any extra command.
However, doing it this way puts your SMART reports in the middle of the noise of other cron emails. You also have no easy way to see what changed from one month to the next, and you will get no alert if something changes 3 days after the last monthly check.

I am not saying the tests should not be done, but I am not sure a cron email will get more of your attention than one dedicated to SMART changes.
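If you do keep the monthly reports, one way to address the "what changed since last month" problem is to compare each new report with the previous one and print only the differences, so that cron mails nothing when nothing changed. This is a hypothetical sketch: it expects two saved smartctl -A (or -a) outputs, compares only lines whose first field is an attribute ID, and by default skips IDs 9 (usually Power_On_Hours), 190 and 194, whose raw values change every month regardless of drive health. The IGNORE_IDS variable is an assumption of this sketch, and none of this replaces smartd's continuous monitoring.

```shell
#!/bin/sh
# Hypothetical helper: show what changed in a drive's SMART attribute table
# since the previous saved report. Prints nothing when nothing changed, so a
# cron job running it only sends mail when there is a difference.
# Usage: compare-smart.sh OLD_REPORT NEW_REPORT

# Attribute IDs to skip because their raw values change on every run.
IGNORE_IDS=${IGNORE_IDS:-"9 190 194"}

# Keep only lines whose first field is a numeric attribute ID (1-255)
# that is not in IGNORE_IDS; header text and timestamps are dropped.
attribute_lines() {
    awk -v ignore="$IGNORE_IDS" '
        BEGIN { n = split(ignore, ids, " "); for (i = 1; i <= n; i++) skip[ids[i]] = 1 }
        $1 ~ /^[0-9]+$/ && ($1 + 0) <= 255 && !($1 in skip) { print }
    ' "$1"
}

compare_reports() {
    old=$1
    new=$2
    if [ ! -r "$old" ]; then
        echo "No previous report at $old; nothing to compare yet."
        return 0
    fi
    tmp_old=$(mktemp) || return 1
    tmp_new=$(mktemp) || return 1
    attribute_lines "$old" > "$tmp_old"
    attribute_lines "$new" > "$tmp_new"
    if ! diff "$tmp_old" "$tmp_new" > /dev/null; then
        echo "SMART attribute changes in $new since $old:"
        diff "$tmp_old" "$tmp_new"
    fi
    rm -f "$tmp_old" "$tmp_new"
}

# Only run when called with exactly two report files.
if [ "$#" -eq 2 ]; then
    compare_reports "$1" "$2"
fi
```

A cron job could run it on this month's and last month's report (file names of your choosing) and then copy the new report over the old one, ready for the next month.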

« Last Edit: September 15, 2023, 12:06:16 AM by Jean-Philippe Pialasse »

wdepot

Re: help automating SMART drive reports with cron

« Reply #2 on: September 15, 2023, 02:23:04 AM »

Okay, so if I understood you correctly, I could write the template as:

Code:

# run monthly SMART disk tests with cron
30 00 1 * * root smartctl -t long /dev/sda
30 00 1 * * root smartctl -t long /dev/sdb
40 02 1 * * root smartctl -a /dev/sda
40 02 1 * * root smartctl -a /dev/sdb

and cron will automatically email me the results of the smartctl -a lines.

One question about the directives you listed: when I run config show smartd, the directive line shows -M diminishing. Based on the directives you listed, do you think it would be a good idea to change the smartd directive to -a -M daily instead?

Jean-Philippe Pialasse

aka Unnilennium

Re: help automating SMART drive reports with cron

« Reply #3 on: September 15, 2023, 05:25:56 AM »

Quote from: wdepot on September 15, 2023, 02:23:04 AM

Okay, so if I understood you correctly I could do the template as

Code:

# run monthly SMART disk tests with cron
30 00 1 * * root smartctl -t long /dev/sda
30 00 1 * * root smartctl -t long /dev/sdb
40 02 1 * * root smartctl -a /dev/sda
40 02 1 * * root smartctl -a /dev/sdb

and cron will automatically email me the results of the smartctl -a lines.

Correct, and you should send the output of the first two lines to /dev/null to avoid noise.

Quote from: wdepot on September 15, 2023, 02:23:04 AM

One question about the directives you listed. When I do config show smartd I notice that the directive line shows -M diminishing. Based on the directives you listed do you think it would be a good idea to update the smartd directive to -a -M daily instead?

Yes, but you will get a huge number of daily messages with nothing important in them, and you will end up discarding or ignoring them.

Have you heard about airport x-ray screening machines adding fake images to check whether the person behind the screen is still watching?
The more noise you get, the less you will react.
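Along the same lines, a monthly cron job can be made quiet in the normal case. smartctl's exit status is a bitmask (documented under RETURN VALUES in the smartctl man page), and cron only sends mail when a job prints something, so the hypothetical wrapper below prints the full report only for devices where smartctl returns a nonzero status. The SMARTCTL override exists only so the sketch can be tested without real drives. One caveat: bit 6 is set whenever the drive's error log contains records, and those records persist, so a drive with old logged errors will keep triggering a mail every month.

```shell
#!/bin/sh
# Hypothetical cron wrapper: run "smartctl -a" on each device but print
# nothing when smartctl's exit status is 0. Since cron only mails when a
# job produces output, a healthy month then sends no email at all.
# SMARTCTL can be overridden (e.g. for testing); it defaults to smartctl.
SMARTCTL=${SMARTCTL:-smartctl}

# Translate smartctl's exit-status bitmask (RETURN VALUES section of the
# smartctl man page) into one short description per set bit.
describe_status() {
    status=$1
    [ $((status & 1)) -ne 0 ] && echo "bit 0: command line did not parse"
    [ $((status & 2)) -ne 0 ] && echo "bit 1: device open failed"
    [ $((status & 4)) -ne 0 ] && echo "bit 2: a SMART or ATA command failed"
    [ $((status & 8)) -ne 0 ] && echo "bit 3: SMART health check says DISK FAILING"
    [ $((status & 16)) -ne 0 ] && echo "bit 4: prefail attribute at or below threshold"
    [ $((status & 32)) -ne 0 ] && echo "bit 5: an attribute was at or below threshold in the past"
    [ $((status & 64)) -ne 0 ] && echo "bit 6: device error log contains errors"
    [ $((status & 128)) -ne 0 ] && echo "bit 7: self-test log contains errors"
    return 0
}

# Print a report only for devices whose smartctl exit status is nonzero.
check_devices() {
    for dev in "$@"; do
        status=0
        report=$("$SMARTCTL" -a "$dev" 2>&1) || status=$?
        if [ "$status" -ne 0 ]; then
            echo "===== $dev: smartctl exit status $status ====="
            describe_status "$status"
            echo
            printf '%s\n' "$report"
            echo
        fi
    done
}

if [ "$#" -gt 0 ]; then
    check_devices "$@"
fi
```

It could then be called from the crontab as, for example, 40 02 1 * * root /path/to/smart-check.sh /dev/sda /dev/sdb (script name and path hypothetical).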

wdepot

Re: help automating SMART drive reports with cron

« Reply #4 on: September 15, 2023, 06:27:31 PM »

Quote from: Jean-Philippe Pialasse on September 15, 2023, 05:25:56 AM

Correct, and you should send the output of the first two lines to /dev/null to avoid noise.

Yes, but you will get a huge number of daily messages with nothing important in them, and you will end up discarding or ignoring them.

So add > /dev/null to the end of the first two lines of the cron job, easy enough.
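With that change, the custom crontab template would read roughly as follows. Note that > /dev/null discards only standard output, and smartctl may print error messages there too, so a self-test that fails to start could go unnoticed at 00:30; the self-test log in the later smartctl -a report is where that would show up.

```
# run monthly SMART disk tests with cron
# starting a self-test only prints a confirmation, so discard stdout
30 00 1 * * root smartctl -t long /dev/sda > /dev/null
30 00 1 * * root smartctl -t long /dev/sdb > /dev/null
# the full reports go to stdout, so cron mails them
40 02 1 * * root smartctl -a /dev/sda
40 02 1 * * root smartctl -a /dev/sdb
```

Also, on large drives an extended test can take longer than the 2 hours 10 minutes between 00:30 and 02:40. smartctl -c shows the drive's recommended polling time for the extended self-test, and if the test is still running, the -a report will say so instead of showing a result.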

As for the smartd directive line, I take it I should add -a but leave -M diminishing alone. Is that correct?
