Koozali.org: home of the SME Server

Obsolete Releases => SME 7.x Contribs => Topic started by: kevinb on September 26, 2009, 05:54:01 PM

Title: [ANNOUNNCE] Yet another SPAM learning script
Post by: kevinb on September 26, 2009, 05:54:01 PM
This script is designed for our environment (SOGo web mail with Thunderbird email clients; neither Horde nor Outlook is used). It teaches both spam and ham to SpamAssassin.

Please feel free to add any comments!

Requirements:

Once a day the script will:

In a shell:

Code:
nano -w learnspam.sh
Note that you must replace <your server hostname> with your server's hostname ("server" for "server.sme.org") in the following code.

Code:
#!/bin/bash
# Teach SpamAssassin from each user's Maildir, then mail the log to admin.
users=/home/e-smith/files/users
(
date
echo ''
echo ''
# Iterate over the user directories (a glob avoids parsing ls output).
for userpath in "$users"/*
do
        [ -d "$userpath" ] || continue
        maildir="$userpath/Maildir"
        echo "${userpath##*/}"

        echo "Find email in the junkmail folder that is less than 24 hours old and feed it to sa-learn"
        if [ -d "$maildir/.junkmail/cur" ]; then
                find "$maildir/.junkmail/cur" -iname '*.<your server hostname>*' -type f -ctime 0 -exec sa-learn --spam --no-sync '{}' \;
        else
                echo "    No junkmail folder"
        fi

        echo "Find email in the Saved folder and sub-folders that is less than 24 hours old and feed it to sa-learn"
        if [ -d "$maildir/.Saved" ]; then
                find "$maildir" -path '*/.Saved*cur*.<your server hostname>*' -type f -ctime 0 -exec sa-learn --ham --no-sync '{}' \;
        else
                echo "    No Saved mail folders"
        fi

        echo "Find email in the Trash folder that is more than 30 days old and delete it"
        if [ -d "$maildir/.Trash" ]; then
                find "$maildir/.Trash" -type f -ctime +30 -exec rm -vf '{}' \;
        else
                echo "    No Trash folder"
        fi

        echo
done

# Sync the database once, after all the --no-sync runs above.
sa-learn --sync
date
) 1>/var/log/teach_spam.log 2>&1

sleep 10
mail -s "Learn SPAM" admin </var/log/teach_spam.log
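Before the first real run, it may help to preview which files the junkmail test would hand to sa-learn. The following is a minimal sketch, assuming the short hostname reported by `hostname -s` is what appears in your Maildir file names; `preview_junkmail` is a hypothetical helper and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: list, without learning anything, the junkmail files
# that the script's first find command would select for one user.
# $1 = user directory, $2 = short hostname (assumed default: hostname -s)
preview_junkmail() {
        local userpath=$1 host=${2:-$(hostname -s)}
        local cur="$userpath/Maildir/.junkmail/cur"
        if [ -d "$cur" ]; then
                # Same tests as the script, with -print instead of -exec.
                find "$cur" -iname "*.${host}*" -type f -ctime 0 -print
        else
                echo "    No junkmail folder"
        fi
}
```

For example, `preview_junkmail /home/e-smith/files/users/someuser server` would print the candidate files for the (hypothetical) user "someuser" on a host named "server".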

Code:
chmod +x learnspam.sh
Configure the script to run once a day, preferably just after midnight. I have it set up to run as a pre-command to an AFFA backup.

Known issues and comments:


I hope some of you find this useful.

Kevin
Title: Re: [ANNOUNNCE] Yet another SPAM learning script
Post by: cactus on September 27, 2009, 10:10:18 AM
Quote from kevinb: "Configure the script to run once a day, preferably just after midnight. I have it set up to run as a pre-command to an AFFA backup."
Which might IMHO have the undesired drawback that if the script fails, the pre-backup event fails and no backup is made. Why not just configure a separate cronjob for it?
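If the script does stay a pre-command, one way to keep a failing learning run from blocking the backup is to wrap it so that the wrapper always reports success. This is only a sketch: `run_best_effort` is a hypothetical helper, and how Affa actually reacts to a failing pre-command is not established in this thread.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, note any failure on stderr,
# and always return 0 so a caller that checks the exit status carries on.
run_best_effort() {
        local status=0
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
                echo "warning: '$*' exited with status $status" >&2
        fi
        return 0
}

# Example (the path is an assumption): run_best_effort /root/learnspam.sh
```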
Title: Re: [ANNOUNNCE] Yet another SPAM learning script
Post by: kevinb on September 27, 2009, 03:34:50 PM
Good point cactus,

I did not test this, and I do not know what happens to the affa job if the pre-command fails. Does it time out and continue?

I run it before the backup so that the "find" command looks back 24 hours from a consistent starting point and, if the method of moving files from the Junk folder to the junkmail folder is used, there is no risk of moving files during the backup.

A separate cron may be the most advisable method.
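A separate cron job could look like the sketch below. It assumes the script was saved as /root/learnspam.sh and that the system reads drop-in files from /etc/cron.d; the path and the 00:05 start time are illustrative choices, and `cron_entry` is a hypothetical helper that only prints the line.

```shell
#!/bin/bash
# Hypothetical helper: print a cron.d-style line (with a user field)
# that runs the given script daily at the given minute and hour.
cron_entry() {
        local script=$1 minute=${2:-5} hour=${3:-0}
        printf '%s %s * * * root %s\n' "$minute" "$hour" "$script"
}

# To install (as root), assuming /etc/cron.d is honoured on your system:
#   cron_entry /root/learnspam.sh > /etc/cron.d/learnspam
```

Running it from cron rather than as a backup pre-command keeps a failure in the learning run from affecting the backup job.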
Title: Re: [ANNOUNNCE] Yet another SPAM learning script
Post by: Knuddi on September 29, 2009, 09:47:10 PM
Did you consider using SpamAssassin Coach for this purpose? As far as I can see, all that is needed is for the spamd process to accept connections from hosts other than localhost. This way your users can classify mail themselves at whatever pace they like.

http://sourceforge.net/projects/soc2006spamd/

I have not tried it myself, but maybe it's time...

/Jesper
Title: Re: [ANNOUNNCE] Yet another SPAM learning script
Post by: kevinb on September 30, 2009, 01:21:19 AM
I did not know that project existed ... thanks Jesper.

There is not much there in the way of docs.

Does it only "coach" SA when the user decides whether it's spam or not? Or does it also learn from what TB flags as spam?

BTW ... the above script can learn every spam email if you have TB use its default Junk folder, run sa-learn on every email in the Junk folder, and then move the emails to junkmail. Every ham email can be taught if you use a diff command. But my thought was that we'll get enough taught with the simpler method presented above.
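The learn-then-move variant for the Junk folder might be sketched as follows. The Maildir folder names (.Junk and .junkmail) are assumptions about how the folders appear on disk, and `learn_and_move` and the `LEARN_CMD` override are hypothetical names introduced only for this illustration.

```shell
#!/bin/bash
# LEARN_CMD is a hypothetical override so the sketch can be dry-run
# (for example LEARN_CMD=true); by default it calls sa-learn.
LEARN_CMD="${LEARN_CMD:-sa-learn --spam --no-sync}"

# $1 = source folder (e.g. .../Maildir/.Junk/cur)
# $2 = destination folder (e.g. .../Maildir/.junkmail/cur)
learn_and_move() {
        local src=$1 dst=$2 msg
        [ -d "$src" ] && [ -d "$dst" ] || return 1
        for msg in "$src"/*; do
                [ -f "$msg" ] || continue
                # Move a message only after it has been learned successfully.
                if $LEARN_CMD "$msg"; then
                        mv -- "$msg" "$dst"/
                fi
        done
        return 0
}
```

As in the main script, `sa-learn --sync` would still need to run once after all users have been processed.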

Thanks for the feedback!
Title: Re: [ANNOUNNCE] Yet another SPAM learning script
Post by: Knuddi on September 30, 2009, 08:17:21 AM
As far as I can tell from reading, it performs sa-learn through SA's socket interface, based on user feedback. So if the user classifies an email as spam and presses that button, it will learn that email as spam (and remove it). As I see it, it doesn't run through the mails in the Inbox or junkmail folder.
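A rough command-line analogue of learning over the spamd socket is `spamc -L`, which only works if spamd was started with `--allow-tell`. The sketch below is not taken from SpamAssassin Coach; `tell_spamd` and the `SPAMC` and `SPAMD_HOST` variables are hypothetical names used for illustration.

```shell
#!/bin/bash
# Hypothetical helper: send one message file to spamd for learning.
# $1 = spam|ham, $2 = message file
# SPAMD_HOST (assumed default 127.0.0.1) selects the spamd host;
# SPAMC can point at a different spamc binary.
tell_spamd() {
        local class=$1 msg=$2
        case $class in
                spam|ham) ;;
                *) echo "usage: tell_spamd spam|ham FILE" >&2; return 2 ;;
        esac
        "${SPAMC:-spamc}" -d "${SPAMD_HOST:-127.0.0.1}" -L "$class" < "$msg"
}
```

For example, `tell_spamd spam /path/to/message` would ask the local spamd to learn that message as spam.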

/Jesper