This script is designed for our environment (SOGo webmail with Thunderbird email clients; neither Horde nor Outlook is used). It teaches both spam and ham to SpamAssassin.
Please feel free to add any comments!
Requirements:
- Users are instructed that any email they wish to save must be moved to the "Saved" folder or any folder under the "Saved" folder in Thunderbird or SOGo.
- In Thunderbird, the default profile that is pushed out to the users has the Junk folder set to "junkmail", and emails flagged as Junk are marked as "read".
- Users should use the "Junk"/"Not Junk" buttons in Thunderbird to flag spam. This way both Thunderbird and SpamAssassin learn.
Once a day the script will:
- Search the "junkmail" folder for read emails less than one day old and feed them to "sa-learn" as "spam".
- Search the "Saved" folder and all folders under it for read emails less than one day old and feed them to "sa-learn" as "ham".
- Search the "Trash" folder for any emails older than 30 days and delete them.
- Email the log file to the admin.
In a shell:
nano -w learnspam.sh
Note that you must replace <your server hostname> with your server's hostname ("server" for "server.sme.org") in the following code.
#!/bin/bash
(
date
echo ''
echo ''
for userdir in $(ls -A1 /home/e-smith/files/users)
do
echo $userdir
echo "Find email in the junkmail folder that is less than 24 hours old and feed it to sa-learn"
test -d /home/e-smith/files/users/$userdir/Maildir/.junkmail/cur && find /home/e-smith/files/users/$userdir/Maildir/.junkmail/cur -iname '*.<your server hostname>*' -type f -ctime 0 -exec sa-learn --spam --no-sync '{}' \; || echo " No junkmail folder"
echo "Find email in the Saved folder and sub-folders that is less than 24 hours old and feed it to sa-learn"
test -d /home/e-smith/files/users/$userdir/Maildir/.Saved && find /home/e-smith/files/users/$userdir/Maildir -path '*/.Saved*cur*.<your server hostname>*' -type f -ctime 0 -exec sa-learn --ham --no-sync '{}' \; || echo " No Saved mail folders"
echo "Find email in the Trash folder that is more than 30 days old and delete it"
test -d /home/e-smith/files/users/$userdir/Maildir/.Trash && find /home/e-smith/files/users/$userdir/Maildir/.Trash -type f -ctime +30 -exec rm -vf '{}' \; || echo " No Trash folder"
echo
done
sa-learn --sync
date
) 1>/var/log/teach_spam.log 2>&1
sleep 10
mail -s "Learn SPAM" admin </var/log/teach_spam.log
Save the file, then make it executable:
chmod +x learnspam.sh
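The hostname placeholder can also be filled in with sed instead of by hand. This is only a sketch: the function name fill_hostname is made up for illustration, and it assumes GNU sed, that learnspam.sh is in the current directory, and that "hostname -s" returns the short name you want (check this before relying on it).

```shell
#!/bin/bash
# Hypothetical helper: replace the "<your server hostname>" placeholder in a file.
# $1 = file to edit, $2 = hostname to insert (defaults to the short hostname).
fill_hostname() {
    local file=$1
    local name=${2:-$(hostname -s)}
    # Edit the file in place; hostnames contain no characters special to sed.
    sed -i "s/<your server hostname>/${name}/g" "$file"
}

# Only touch the script if it is actually here.
if [ -f learnspam.sh ]; then
    fill_hostname learnspam.sh
fi
```

To use a name other than the one "hostname -s" reports, pass it as the second argument, for example: fill_hostname learnspam.sh server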
Configure the script to run once a day, preferably just after midnight. I have it set up to run as a pre-command to an AFFA backup.
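If you would rather use cron than a backup pre-command, an entry along these lines should run the script at five past midnight. The file name /etc/cron.d/learnspam and the script path /root/learnspam.sh are assumptions on my part; use whatever location you saved the script to.

```
# /etc/cron.d/learnspam (hypothetical file name)
# minute hour day-of-month month day-of-week user command
5 0 * * * root /root/learnspam.sh
```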
Known issues and comments:
- Emails that are caught as spam by SpamAssassin and moved to the "junkmail" folder will be taught to SpamAssassin as spam again. This is not a problem, since SpamAssassin will recognize these emails and not learn new tokens from them, but it does use more resources. A workaround is to keep the default Thunderbird setting so that it drops spam into the "Junk" folder, have the script learn only from the "Junk" folder, and then move those files to the "junkmail" folder. The downside is that users will have to look for falsely flagged email in two folders.
- Email that is not flagged as "read" is ignored. This can be changed by having the script search the listed folder itself rather than only its "cur" sub-folder, or by having it search both the "cur" and "new" sub-folders.
- Emails that sit in the "Inbox" for more than a day before the client or user processes them may be ignored. I am not sure whether the file system flags a file as "changed" after a move.
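The last two points can be tried out on a throwaway folder before changing the real script. The sketch below assumes GNU find and stat (as on a typical Linux server). The folder layout and message file names are made up for the test, and nothing under /home is touched. It searches both "cur" and "new", then checks whether moving a file updates the change time that "-ctime" looks at. The result of the second check may depend on your file system, so run it on the server itself.

```shell
#!/bin/bash
# Sketch only: builds a fake Maildir-style folder in a temp directory.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/.junkmail/new" "$tmp/.junkmail/cur"

# One unread message (new/) and one read message (cur/); names are invented.
echo "unread" > "$tmp/.junkmail/new/1.server"
echo "read"   > "$tmp/.junkmail/cur/2.server:2,S"

# Giving find both sub-folders picks up unread mail as well as read mail.
found=$(find "$tmp/.junkmail/cur" "$tmp/.junkmail/new" -type f -ctime 0 | wc -l)
echo "messages found in cur and new: $found"

# Record the change time, move the file as a mail client would, and compare.
before=$(stat -c %Z "$tmp/.junkmail/new/1.server")
sleep 2
mv "$tmp/.junkmail/new/1.server" "$tmp/.junkmail/cur/1.server:2,S"
after=$(stat -c %Z "$tmp/.junkmail/cur/1.server:2,S")

if [ "$after" -gt "$before" ]; then
    ctime_result="updated"
else
    ctime_result="unchanged"
fi
echo "ctime after move: $ctime_result"
```

If the last line reports "updated", a message that is moved out of the "Inbox" counts as less than one day old for "-ctime 0", no matter how long it sat there beforehand.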
I hope some of you find this useful.
Kevin