Koozali.org: home of the SME Server

Obsolete Releases => SME 7.x Contribs => Topic started by: TeNeCo on October 21, 2007, 12:44:23 PM

Title: LearnAsNoSpam for Sonora Bayesian filtering?
Post by: TeNeCo on October 21, 2007, 12:44:23 PM
Wouldn't it be nice to have not only a "LearnAsSpam" folder but also a "LearnAsNoSpam" folder, into which you could move mails that were sorted into the junk folder even though they are not spam?
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on October 21, 2007, 12:51:51 PM
Yes it would; normally this is called LearnAsHam. You can easily implement it yourself:
1. Make a copy of the /usr/bin/LearnAsSpam.pl script and name it /usr/bin/LearnAsHam.pl
2. Modify the line that reads:
Code: [Select]
my $result = `su - root -c "/usr/bin/sa-learn --spam $filetolearn"`;
to read:
Code: [Select]
my $result = `su - root -c "/usr/bin/sa-learn --ham $filetolearn"`;
3. Also make a copy of the /etc/cron.d/LearnAsSpam.cron file, call it /etc/cron.d/LearnAsHam.cron, and modify its content so that it starts the LearnAsHam script.
4. The last step is to create a LearnAsHam directory in your users' mailboxes.
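Steps 1 to 3 can be scripted with sed. This is a minimal sketch, not the contrib's own tooling: it assumes the stock LearnAsSpam.pl contains the sa-learn --spam line quoted above plus the my $dirname = sprintf "LearnAsSpam"; line discussed further down the thread, and that the cron file refers to the script by name. The function names make_ham_script and make_ham_cron are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: derive the ham-learning script and cron file from the
# LearnAsSpam originals. Run as root when targeting /usr/bin and /etc/cron.d.

# make_ham_script SRC DST: switch sa-learn from --spam to --ham and the quoted
# folder name used for $dirname from "LearnAsSpam" to "LearnAsHam".
make_ham_script() {
    sed -e 's/sa-learn --spam/sa-learn --ham/' \
        -e 's/"LearnAsSpam"/"LearnAsHam"/' \
        "$1" > "$2" && chmod 755 "$2"
}

# make_ham_cron SRC DST: replace every LearnAsSpam reference with LearnAsHam
# (assumes the cron entry refers to the script by its file name).
make_ham_cron() {
    sed 's/LearnAsSpam/LearnAsHam/g' "$1" > "$2"
}
```

For example, run make_ham_script /usr/bin/LearnAsSpam.pl /usr/bin/LearnAsHam.pl and then make_ham_cron /etc/cron.d/LearnAsSpam.cron /etc/cron.d/LearnAsHam.cron. Review the results afterwards: only the patterns above are changed, so any other spam-specific text in the script stays as it was.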
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: TeNeCo on October 21, 2007, 01:56:26 PM
Thank you for your quick answer.
The file LearnAsSpam.pl also contains this line:
my $dirname = sprintf "LearnAsSpam";
Do I have to change it to LearnAsHam, too?
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: mmccarn on October 21, 2007, 04:15:09 PM
Here's a bug report with a 'LearnAsHAM.pl' attachment: http://bugs.contribs.org/show_bug.cgi?id=1701

Just make sure that your users know to *copy* their HAM to the 'LearnAsHam' folder and don't accidentally *move* it there: the folder's contents are deleted at each "learning" run.

Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on October 21, 2007, 05:08:15 PM
Oops, good remark about the $dirname line! Yes, please change the folder name to match your setup; in my example it should indeed read LearnAsHam.
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Normando on October 22, 2007, 01:05:06 AM
I have used this great hack to create folders automatically.

http://wiki.horde.org/ImpAutomaticDefaultFolderCreation?referrer=HowTo#
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: TeNeCo on December 14, 2007, 10:31:05 AM
Hm, it's strange: I move certain mails into the "LearnAsHam" folder and they vanish from it every night, but new mails from the same sender are still moved to the junkmail folder.

In the header I can find:
X-Spam-Status: Yes, hits=2.6 required=2.0   tests=HTML_MESSAGE,INVALID_MSGID,NORMAL_HTTP_TO_IP,SPF_PASS

Is there a whitelist for ClamAV somewhere?
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: brianr on December 14, 2007, 10:52:48 AM
SpamAssassin uses more than just the sender to score emails. Perhaps the content is "spammy"; graphic signatures often trip the scoring. Also, your threshold of 2 is VERY low. I use 3 in a few places but normally go for 4 or 5.

what you need is Darrell May's WBL contrib, which you can find here:

http://mirror.contribs.org/smeserver/contribs/dmay/smeserver/7.x/testing/smeserver-wbl/

Download the latest RPM (not the src one) and install it with:

yum localinstall *.rpm

Then go to the server manager and enter the sender's email address in the email-wbl / accept panel; don't forget to "save" and then "update" from the panel.

SpamAssassin will now give emails from the specified sender a score of -100!
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on December 14, 2007, 10:54:11 AM
The whitelist question has nothing to do with ClamAV (that is for anti-virus); this is the work of SpamAssassin (the spam-fighting engine on SME Server).

It is perfectly normal that this is still classified as spam, as the total score from the checks is above the required level. Moving items to the LearnAsHam folder will not automatically whitelist the sender's address, because SpamAssassin checks more than just e-mail addresses, as you can see in the tests list: the message was in HTML format, it had an invalid message ID, it passed SPF checks... Each check adds (or sometimes subtracts) a certain value to the spam score.
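The verdict in that header is just the summed score (hits) compared with the required level. As a minimal sketch, the hypothetical function spam_verdict below pulls the score out of an X-Spam-Status line and reports whether it reaches a given threshold, which shows that TeNeCo's message would pass at the 4 or 5 brianr suggests. It assumes SpamAssassin's rule that a message is spam when its score is at or above the required score, and accepts both the hits= form seen in this thread and the score= form used by later SpamAssassin versions.

```shell
#!/bin/sh
# spam_verdict HEADER THRESHOLD (hypothetical helper)
# Prints the score found in an X-Spam-Status line, followed by "spam" if the
# score is at or above THRESHOLD and "ham" otherwise.
spam_verdict() {
    printf '%s\n' "$1" | awk -v t="$2" '
        {
            # Look for the hits=N (older) or score=N (newer) field.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(hits|score)=/) {
                    split($i, kv, "=")
                    s = kv[2] + 0
                }
        }
        END { printf "%s %s\n", s, (s >= t + 0 ? "spam" : "ham") }'
}
```

Run against the header above, spam_verdict with a threshold of 2.0 reports spam, while a threshold of 4 reports ham for the same 2.6 score.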

To whitelist senders you will have to use the whitelist feature, which is also described in the excellent howto from sonoracomm (http://www.sonoracomm.com/index.php?option=com_content&task=view&id=49&Itemid=32), roughly in the middle of the page.
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: mmccarn on December 14, 2007, 03:32:09 PM
Bayesian filtering will not affect the score of a message until the Bayesian subsystem has learned at least 200 messages of each category (spam and ham).

So, if you have not yet "fed" LearnAsHam at least 200 messages, it will not apply any "this is ham" ruling.

You can check the status of your Bayesian database with sa-learn --dump magic; here's what the output looks like:
Code: [Select]
[root@sme ~]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       4041          0  non-token data: nspam
0.000          0       4326          0  non-token data: nham
0.000          0     137660          0  non-token data: ntokens
0.000          0 1185313443          0  non-token data: oldest atime
0.000          0 1197633835          0  non-token data: newest atime
0.000          0 1197564788          0  non-token data: last journal sync atime
0.000          0 1196372192          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0      25757          0  non-token data: last expire reduction count

Both counters have to reach the minimum before the Bayesian classifier is used at all: "nspam" (4041 above) and "nham" (4326 above) must each be at least 200 (SpamAssassin's bayes_min_spam_num and bayes_min_ham_num defaults).

Once both are there, Bayesian filtering can add to the spam score of a message as well as reduce it.
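To check those two counters from a script, a minimal sketch can read the sa-learn --dump magic output shown above. The function name bayes_ready is hypothetical; the default minimum of 200 mirrors SpamAssassin's documented bayes_min_spam_num and bayes_min_ham_num defaults, and the sketch assumes the value sits in the third column with the counter name as the last word of the line, as in the sample output.

```shell
#!/bin/sh
# bayes_ready [MIN]: read `sa-learn --dump magic` output on stdin, print the
# nspam/nham counts, and return 0 only if both are at least MIN (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # third column holds the value
        $NF == "nham"  { nham = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min + 0 && nham >= min + 0)
        }'
}
```

Typical use would be sa-learn --dump magic | bayes_ready, checking the exit status to see whether Bayes has enough data to take part in scoring.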

Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: crazybob on May 11, 2008, 01:09:14 PM

Using Normando's Horde folder-creation hack above, were you able to create folders with upper case letters in the folder name? If so, how? 8-)

Thanks

Bob
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on May 11, 2008, 01:26:12 PM
Why would you use that method? SME Server has a default skeleton for user folders. If you add the .LearnAsSpam and .LearnAsHam folders in the appropriate place, they should be added to every new user you create from then on; you will have to add them to all existing users manually.

The mail folder skel files are located in: /etc/e-smith/skel/user/Maildir
The user Maildirs are located at: /home/e-smith/files/users/$user/Maildir

To quickly create the dirs for existing users you can use the following code as the root user:

Code: [Select]
pushd /home/e-smith/files/users/; \
for u in `find -maxdepth 1 -type d`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
done; \
popd

To create the directories in the skel folder (so they will be created for every new user):

Code: [Select]
mkdir -p /etc/e-smith/skel/user/Maildir/{.LearnAsHam/{cur,new,tmp},.LearnAsSpam/{cur,new,tmp}}
Edit: updated the directory creation statements to create cur, new and tmp in every folder
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: crazybob on May 11, 2008, 06:16:43 PM
Cactus,

Thanks a bunch. Will give it a try.

Bob

Works Great :-P
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: kevinb on May 13, 2008, 09:25:57 PM
I have not tried this myself yet, but at the bottom of the page at http://www.scalix.com/wiki/index.php?title=HowTos/SpamAssassin there is a script that I believe can easily be modified to work with SME.

I was planning on having the script look for emails in the ".Junk" and ".Junk E-mail" folders (these are created and used by Thunderbird and Outlook) that are more than two weeks old (giving users plenty of time to recover any email if needed), learn them as SPAM and delete them. This is user-friendly, since TB and OL have nice interfaces for flagging email as SPAM and HAM.

The users would be instructed NOT to keep emails or folders in their Inbox but to move them to the "Saved" folder or its sub-folders. These folders would be used for HAM learning, and keeping the Inbox small also speeds up email client and webmail startup.

The LEARNASHAM script would look in the "Saved" folder and sub-folders for email that is newer than one day and learn it as HAM.

These scripts would run every night.

Kevin
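Kevin's plan could be sketched as two hypothetical shell functions that only select the messages, leaving the sa-learn call and the deletion to the caller. Assumptions: Maildir++ folder names (.Junk, ".Junk E-mail", .Saved and .Saved.<sub-folder>), only the cur/ and new/ directories are scanned, and a message file's modification time reflects when it landed in the folder. That last point depends on how the IMAP server copies messages, so check it before deleting anything. The age tests use find's -mtime, where +14 means more than 14 whole days and -1 means within the last day.

```shell
#!/bin/sh
# junk_candidates MAILDIR: print Junk messages older than about two weeks.
junk_candidates() {
    for folder in "$1/.Junk" "$1/.Junk E-mail"; do
        [ -d "$folder" ] || continue
        # Missing cur/ or new/ directories are silently ignored.
        find "$folder/cur" "$folder/new" -type f -mtime +14 2>/dev/null
    done
}

# ham_candidates MAILDIR: print messages in .Saved and its sub-folders
# (.Saved.*) modified within the last day.
ham_candidates() {
    find "$1" \( -path "$1/.Saved/*" -o -path "$1/.Saved.*" \) \
        \( -path '*/cur/*' -o -path '*/new/*' \) -type f -mtime -1
}
```

A nightly job could read junk_candidates line by line, run sa-learn --spam on each file and remove it only when learning succeeded, and feed ham_candidates to sa-learn --ham without deleting anything.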
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Amir Inbar on October 01, 2008, 07:27:46 PM
There is one problem with:
Quote
pushd /home/e-smith/files/users/; \
for u in `find -maxdepth 1 -type d`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
done; \
popd
Since it is run as root, the owner and group of the new folders are both root, so users can't copy mail into them.
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: kevinb on October 01, 2008, 07:54:28 PM
So... a chmod, chown and/or chgrp is in order.

Have you tried this or implemented something similar?
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Amir Inbar on October 01, 2008, 08:50:32 PM
Thanks Kevinb for the fast answer.

I don't know how to include it in the script, since I run it as root.
How do I use it to give each user's folders that user's ownership and group?
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: kevinb on October 01, 2008, 11:58:36 PM
I am not a scripting guru, but I am sure you can do everything from the command line.

You would have to parse the users' names and their groups and then apply the new ownership and permissions accordingly.

Another option would be to give everyone all rights to all the folders and contents ... chmod 777 etc. I can't think of any security issue with this but then again I am also not a security guru.

Sorry I couldn't be more help.
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Amir Inbar on October 02, 2008, 07:47:51 AM
Thanks for trying anyway.

I'll study it and publish my findings later...


Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on October 02, 2008, 06:48:24 PM
This should do the trick, I guess: it sets the proper ownership and also excludes the admin folder, as that does not hold a Maildir:
Code: [Select]
pushd /home/e-smith/files/users/; \
for u in `ls | grep -vx admin`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
chown -R $u:$u $u/Maildir/.LearnAsHam/; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
chown -R $u:$u $u/Maildir/.LearnAsSpam/; \
done; \
popd

Edit: excluded the admin user as it does not have a Maildir
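A slightly more defensive variant of this loop, as a sketch: instead of excluding admin by name, it skips any home directory that has no Maildir, quotes every path, and takes the folder names as arguments. Like the loop above, it assumes each SME user has a private group with the same name as the user; the function name add_learn_folders is hypothetical.

```shell
#!/bin/sh
# add_learn_folders BASE FOLDER...: for every user directory under BASE that
# has a Maildir, create Maildir/.FOLDER/{cur,new,tmp} owned by that user.
add_learn_folders() {
    base=$1
    shift
    for home in "$base"/*/; do
        home=${home%/}          # drop the trailing slash from the glob
        user=${home##*/}        # directory name doubles as the user name
        # Skip entries without a Maildir (e.g. admin) and an unmatched glob.
        [ -d "$home/Maildir" ] || continue
        for folder in "$@"; do
            dir="$home/Maildir/.$folder"
            mkdir -p "$dir/cur" "$dir/new" "$dir/tmp"
            # Assumes the user's private group has the same name as the user.
            chown -R "$user:$user" "$dir"
        done
    done
}
```

Run as root, add_learn_folders /home/e-smith/files/users LearnAsHam LearnAsSpam LearnInWL would cover all three folders used in this thread.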
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Amir Inbar on October 02, 2008, 11:00:37 PM
Cactus,
Thank you for helping, but this script gets the username as "./user" instead of "user".

Here is an explanation:
Code: [Select]
pushd /home/e-smith/files/users/; \
for u in `find  -maxdepth 1 -type d`; \
do \
echo $u
done; \
popd
That gives:
Quote
/home/e-smith/files/users /home/e-smith/files/users
.
./rutish
./anatiz
./home
./hadas
/home/e-smith/files/users


So when I try
Code: [Select]
pushd /home/e-smith/files/users/; \
for u in `find -maxdepth 1 -type d`; \
do \
mkdir -p $u/Maildir/.LearnAsHam/{cur,new,tmp}; \
chown -R $u:$u $u/Maildir/.LearnAsHam/; \
mkdir -p $u/Maildir/.LearnAsSpam/{cur,new,tmp}; \
chown -R $u:$u $u/Maildir/.LearnAsSpam/; \
mkdir -p $u/Maildir/.LearnInWL/{cur,new,tmp}; \
chown -R $u:$u $u/Maildir/.LearnInWL/; \
done; \
popd

I get

Quote
/home/e-smith/files/users /home/e-smith/files/users
chown: `.:.': invalid user
chown: `.:.': invalid user
chown: `.:.': invalid user
chown: `./rutish:./rutish': invalid user
chown: `./rutish:./rutish': invalid user
chown: `./rutish:./rutish': invalid user
chown: `./anatiz:./anatiz': invalid user
chown: `./anatiz:./anatiz': invalid user
chown: `./anatiz:./anatiz': invalid user
chown: `./home:./home': invalid user
chown: `./home:./home': invalid user
chown: `./home:./home': invalid user
chown: `./hadas:./hadas': invalid user
chown: `./hadas:./hadas': invalid user
chown: `./hadas:./hadas': invalid user
/home/e-smith/files/users
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on October 03, 2008, 09:16:33 AM

Have a look at my post (http://forums.contribs.org/index.php?topic=38891.msg198432#msg198432) again; I modified it almost immediately, as I had noticed this problem as well. The current version above should not have this issue, and the folders should now get the proper ownership. The difference is that the new version uses ls, which returns just the names, whereas the find command also returns '.' and puts './' in front of every name. The version with ls should work properly.
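The find behaviour described here is easy to work around if you prefer find over ls. As a sketch, tell find to skip the starting directory itself (-mindepth 1, supported by GNU and BSD find although not required by POSIX) and strip the leading ./ with the shell's ${u#./} expansion; unlike ls, this also returns directories only. The function name list_user_dirs is hypothetical.

```shell
#!/bin/sh
# list_user_dirs BASE: print the names of the sub-directories of BASE,
# without the "." entry and without the "./" prefix plain find produces.
list_user_dirs() {
    (
        cd "$1" || exit 1       # subshell, so the caller's cwd is untouched
        find . -mindepth 1 -maxdepth 1 -type d | while IFS= read -r u; do
            printf '%s\n' "${u#./}"
        done
    )
}
```

With this, a loop such as for u in $(list_user_dirs /home/e-smith/files/users) receives plain user names like rutish, which chown accepts as user:group arguments.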
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: Amir Inbar on October 03, 2008, 06:04:48 PM
Thank you Cactus, that did the trick.
I will try to ask the builders of the Learn contrib and the Sme-unjunkmgr contrib to include the script with the Howto.

Amir
Title: Re: LearnAsNoSpam fpr Sonora Bayesian filtering ?
Post by: cactus on October 04, 2008, 08:47:19 AM
AFAIK they do not need to; they just need to add those directories to the /etc/e-smith/skel/user/Maildir/ folder so that all new users automatically get these folders. For existing users something like the script might be necessary, but it should be tied into the event system.