Koozali.org: home of the SME Server

LearnAsNoSpam for Sonora Bayesian filtering?

TeNeCo
LearnAsNoSpam for Sonora Bayesian filtering?
« on: October 21, 2007, 12:44:23 PM »
Wouldn't it be nice to have not only a "LearnAsSpam" folder but also a "LearnAsNoSpam" folder, to hold mails that were sorted into the junk folder although they are not spam?

cactus
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #1 on: October 21, 2007, 12:51:51 PM »
Quote
Wouldn't it be nice to have not only a "LearnAsSpam" folder but also a "LearnAsNoSpam" folder, to hold mails that were sorted into the junk folder although they are not spam?

Yes, it would. This is normally called LearnAsHam, and you can easily implement it yourself:
1. Make a copy of the /usr/bin/LearnAsSpam.pl script and name it /usr/bin/LearnAsHam.pl.
2. In the copy, modify the line that reads:
Code:
my $result = `su - root -c "/usr/bin/sa-learn --spam $filetolearn"`;
to read:
Code:
my $result = `su - root -c "/usr/bin/sa-learn --ham $filetolearn"`;
3. Also create a copy of /etc/cron.d/LearnAsSpam.cron, call it /etc/cron.d/LearnAsHam.cron, and modify its content to start the LearnAsHam script.
4. The last step is to create a LearnAsHam directory in your users' mailboxes.
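
Steps 1 to 3 could be scripted roughly as follows. This is only a sketch: the function name make_ham_script is made up for illustration, and the sed patterns assume that the script and the cron file contain the strings "sa-learn --spam" and "LearnAsSpam" exactly as quoted in this thread. Check the generated files by hand before relying on them.

```shell
# make_ham_script SRC DEST (hypothetical helper)
# Copies SRC to DEST, keeping the file mode, then rewrites the copy so that
# it learns ham instead of spam.
make_ham_script() {
    src=$1
    dest=$2
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    cp -p "$src" "$dest" || return 1
    # Switch the sa-learn mode and rename every LearnAsSpam reference.
    # Assumption: both strings appear literally in the source file.
    sed -e 's/sa-learn --spam/sa-learn --ham/' \
        -e 's/LearnAsSpam/LearnAsHam/g' "$src" > "$dest"
}

# Intended use, as root (commented out so nothing runs by accident):
# make_ham_script /usr/bin/LearnAsSpam.pl /usr/bin/LearnAsHam.pl
# make_ham_script /etc/cron.d/LearnAsSpam.cron /etc/cron.d/LearnAsHam.cron
```

Because every "LearnAsSpam" is renamed, the folder name inside the script changes along with the sa-learn option. The original files are left untouched.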
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

TeNeCo
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #2 on: October 21, 2007, 01:56:26 PM »
Thank you for your quick answer.
The file LearnAsSpam.pl also contains this line:
my $dirname = sprintf "LearnAsSpam";
Do I have to change it to LearnAsHam, too?

mmccarn
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #3 on: October 21, 2007, 04:15:09 PM »
Here's a bug report with a 'LearnAsHAM.pl' attachment: http://bugs.contribs.org/show_bug.cgi?id=1701

Just make sure that your users know to *copy* their ham to the 'LearnAsHam' folder rather than *move* it there by accident: the contents of the folder are deleted at each "learning" run.


cactus
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #4 on: October 21, 2007, 05:08:15 PM »
Quote
Thank you for your quick answer.
The file LearnAsSpam.pl also contains this line:
my $dirname = sprintf "LearnAsSpam";
Do I have to change it to LearnAsHam, too?

Oops, good remark! Yes, please change the folder name to match your setup; in my example it should indeed read LearnAsHam.

Normando
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #5 on: October 22, 2007, 01:05:06 AM »
I have used this great hack to create folders automatically.

http://wiki.horde.org/ImpAutomaticDefaultFolderCreation?referrer=HowTo#

TeNeCo
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #6 on: December 14, 2007, 10:31:05 AM »
Hm, it's strange: I move certain mails into the "LearnAsHam" folder, and the mails vanish from that folder every night, but new mails from the same sender are still moved to the junkmail folder.

In the header I can find:
X-Spam-Status: Yes, hits=2.6 required=2.0   tests=HTML_MESSAGE,INVALID_MSGID,NORMAL_HTTP_TO_IP,SPF_PASS

Is there a whitelist for ClamAV somewhere?

brianr
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #7 on: December 14, 2007, 10:52:48 AM »
SpamAssassin uses more than just the sender to score emails. Perhaps the content is "spammy"; graphic signatures can often trip the scoring. Also, your threshold of 2 is VERY low. I use 3 in a few places but normally go for 4 or 5.

What you need is Darrell May's WBL contrib, which you can find here:

http://mirror.contribs.org/smeserver/contribs/dmay/smeserver/7.x/testing/smeserver-wbl/

Download the latest rpm (not the src one) and install it with:

yum localinstall *.rpm

Then go to the server manager and enter the email address in question in the email-wbl / accept panel. Don't forget to "save" and then "update" from the panel.

SpamAssassin will now give emails from the specified sender a score of -100!
Brian j Read
(retired, for a second time, still got 2 installations though)
The instrument I am playing is my favourite Melodeon.

cactus
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #8 on: December 14, 2007, 10:54:11 AM »
Quote
Hm, it's strange: I move certain mails into the "LearnAsHam" folder, and the mails vanish from that folder every night, but new mails from the same sender are still moved to the junkmail folder.

In the header I can find:
X-Spam-Status: Yes, hits=2.6 required=2.0   tests=HTML_MESSAGE,INVALID_MSGID,NORMAL_HTTP_TO_IP,SPF_PASS

Is there a whitelist for ClamAV somewhere?

This has nothing to do with ClamAV (that is the anti-virus scanner); it is the work of SpamAssassin (the spam-fighting engine on SME Server).

It is perfectly normal that this message is still classified as spam: the total score from the checks (2.6) is above the required level (2.0). Moving items to the LearnAsHam folder will not automatically whitelist the sender's address, because SpamAssassin checks more than just e-mail addresses, as you can see in the tests list: the message was in HTML format, it had an invalid message ID, it passed SPF checks... Each check adds a certain value to (or sometimes subtracts one from) the spam score.
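
To see how far a message is over or under the threshold, the two numbers can be pulled out of the header. A minimal sketch, assuming the hits=/required= layout of the X-Spam-Status line quoted above; spam_margin is a hypothetical helper name, not part of SpamAssassin.

```shell
# spam_margin HEADER (hypothetical helper)
# Prints hits minus required for one X-Spam-Status line.
# A positive result means the message is over the threshold.
spam_margin() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^hits=/)     { split($i, h, "="); hits = h[2] }
            if ($i ~ /^required=/) { split($i, r, "="); req  = r[2] }
        }
        printf "%.1f\n", hits - req
    }'
}

# Example with the header from this thread:
# spam_margin 'X-Spam-Status: Yes, hits=2.6 required=2.0   tests=HTML_MESSAGE,INVALID_MSGID,NORMAL_HTTP_TO_IP,SPF_PASS'
```

For the header in question the margin is 0.6, so the same message would pass with a required score of 3 or more.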

To whitelist senders you will have to use the whitelist feature, which is also described in the excellent howto from sonoracomm, somewhere in the middle of the page.

mmccarn
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #9 on: December 14, 2007, 03:32:09 PM »
Bayesian filtering will not affect the score of a message until the Bayesian subsystem has seen at least 200 messages of a given category (spam or ham).

So, if you have not yet "fed" LearnAsHam at least 200 messages, it will not apply any "this is ham" ruling.

You can check the status of your Bayesian database using sa-learn --dump magic; here's what the output should look like:
Code:
[root@sme ~]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       4041          0  non-token data: nspam
0.000          0       4326          0  non-token data: nham
0.000          0     137660          0  non-token data: ntokens
0.000          0 1185313443          0  non-token data: oldest atime
0.000          0 1197633835          0  non-token data: newest atime
0.000          0 1197564788          0  non-token data: last journal sync atime
0.000          0 1196372192          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0      25757          0  non-token data: last expire reduction count

"nspam" (4041 above) needs to be at least 200 before Bayesian filtering will add to the spam score of a message.

"nham" (4326 above) needs to be at least 200 before Bayesian filtering will reduce the spam score of a message.
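
The two counters can also be checked automatically. The sketch below reads the output format shown above on standard input; bayes_ready is a hypothetical helper name, and the default minimum of 200 is taken from this post. As a cautious choice it reports "ready" only when both counters have reached the minimum.

```shell
# bayes_ready [MIN] (hypothetical helper)
# Reads `sa-learn --dump magic` output on stdin, prints both counters and
# returns success only if nspam and nham are both at least MIN (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # third column holds the count
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min && nham >= min)
        }'
}

# Intended use on the server:
# sa-learn --dump magic | bayes_ready && echo "Bayes has enough data"
```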


crazybob
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #10 on: May 11, 2008, 01:09:14 PM »
Quote
I have used this great hack to create folders automatically.

http://wiki.horde.org/ImpAutomaticDefaultFolderCreation?referrer=HowTo#

Using the above script, were you able to create folders with upper-case letters in the folder name? If so, how? 8-)

Thanks

Bob
If you think you know whats going on, you obviously have no idea whats going on!

cactus
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #11 on: May 11, 2008, 01:26:12 PM »
Quote
Using the above script, were you able to create folders with upper-case letters in the folder name? If so, how? 8-)

Thanks

Bob

Why would you use that method? SME Server has a default skeleton for user folders. If you add the .LearnAsSpam and .LearnAsHam folders in the appropriate place, they will be added for every new user you create from then on; for all existing users you will have to add them manually.

The mail folder skel files are located in: /etc/e-smith/skel/user/Maildir
The user Maildirs are located at: /home/e-smith/files/users/$user/Maildir

To quickly create the directories for existing users, you can run the following code as root:

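Code:
# Run as root: create both learning folders in every existing user's Maildir.
# -mindepth 1 keeps find from returning "." (the users directory itself).
pushd /home/e-smith/files/users/; \
for u in `find . -mindepth 1 -maxdepth 1 -type d`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
done; \
popd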

To create the directories in the skel folder (so they will be created for every new user):

Code:
mkdir -p /etc/e-smith/skel/user/Maildir/{.LearnAsHam/{cur,new,tmp},.LearnAsSpam/{cur,new,tmp}}
Edit: updated create directory statements to create ./new ./tmp ./cur in every folder
« Last Edit: May 11, 2008, 01:35:13 PM by cactus »

crazybob
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #12 on: May 11, 2008, 06:16:43 PM »
Cactus,

Thanks a bunch. Will give it a try.

Bob

Works Great :-P
« Last Edit: May 11, 2008, 06:27:01 PM by crazybob »

kevinb
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #13 on: May 13, 2008, 09:25:57 PM »
I have not tried this myself yet, but at the bottom of the page at http://www.scalix.com/wiki/index.php?title=HowTos/SpamAssassin there is a script that I believe can easily be modified to work with SME.

I was planning on having the script look for emails in the ".Junk" and ".Junk E-mail" folders (these are created and used by Thunderbird and Outlook) that were more than two weeks old (giving the user plenty of time to recover any email if needed) and learn them as SPAM and delete them. This is user friendly since TB and OL have nice interfaces for flagging email as SPAM and HAM.

The users would be instructed NOT to keep emails or folders in their Inbox but move them to the "Saved" folder or sub-folders. These folders are used for the HAM learning as well as speeding up email client and web mail startup.

The LEARNASHAM script would look in the "Saved" folder and sub-folders for email that is newer than one day and learn it as HAM.

These scripts would run every night.
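
The age-based selection in this plan could be sketched with find. This is an illustration only, not the Scalix script: list_mail_by_age is a hypothetical helper name, the folder paths in the comments are assumptions, and the age argument is passed straight to find's -mtime test (+14 for "more than 14 days old", -1 for "less than one day old").

```shell
# list_mail_by_age FOLDER AGE (hypothetical helper)
# Prints the message files in FOLDER/cur and FOLDER/new whose modification
# time matches AGE, a find -mtime argument such as +14 or -1.
list_mail_by_age() {
    folder=$1
    age=$2
    for sub in cur new; do
        if [ -d "$folder/$sub" ]; then
            find "$folder/$sub" -type f -mtime "$age"
        fi
    done
}

# Possible nightly use (commented out; the spam branch deletes mail after learning):
# list_mail_by_age "$maildir/.Junk" +14 | while IFS= read -r f; do
#     sa-learn --spam "$f" && rm -f "$f"
# done
# list_mail_by_age "$maildir/.Saved" -1 | while IFS= read -r f; do
#     sa-learn --ham "$f"
# done
```

The helper only looks at one folder, so the sub-folders of "Saved" would each need their own call.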

Kevin

Amir Inbar
Re: LearnAsNoSpam for Sonora Bayesian filtering?
« Reply #14 on: October 01, 2008, 07:27:46 PM »
There is one problem with:
Quote
pushd /home/e-smith/files/users/; \
for u in `find -maxdepth 1 -type d`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
done; \
popd

Since it is run as root, the new folders are owned by user root and group root, so users can't copy mail into them.
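
One way around this is to give each new folder the same owner and group as the Maildir it sits in. The following is a sketch under a few assumptions: make_learn_folders is a hypothetical helper name, stat -c requires GNU coreutils, and account directories without an existing Maildir are skipped rather than created.

```shell
# make_learn_folders USERS_DIR (hypothetical helper)
# For every account directory under USERS_DIR that has a Maildir, create the
# .LearnAsHam and .LearnAsSpam folders and hand them to the Maildir's owner.
make_learn_folders() {
    users_dir=$1
    for home in "$users_dir"/*/; do
        maildir=${home}Maildir
        [ -d "$maildir" ] || continue
        # Numeric uid:gid of the existing Maildir (GNU stat).
        owner=$(stat -c '%u:%g' "$maildir")
        for folder in .LearnAsHam .LearnAsSpam; do
            mkdir -p "$maildir/$folder/cur" "$maildir/$folder/new" "$maildir/$folder/tmp"
            chown -R "$owner" "$maildir/$folder"
        done
    done
}

# Intended use, as root:
# make_learn_folders /home/e-smith/files/users
```

Folders that already exist with the wrong owner are corrected as well, because chown -R is applied whether or not mkdir had to create anything.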