Koozali.org: home of the SME Server

Learn contribs not working ?

femc
« on: April 08, 2009, 01:39:58 PM »
I have the feeling the Learn contrib is not working properly. Questions:

1) Do I need to create the LearnAsSpam folder manually via an IMAP client?

2) 'db configuration show LearnAsSpam' gives
deleteafterlearn=enabled
dir=LearnAsSpam
status=enabled
tag=[SPAM]
Question: for dir=, do I have to give the full path name, or is it OK as above?

hdmueller

cactus
« Reply #1 on: April 08, 2009, 04:05:36 PM »
This should be correct, why do you think the contrib is not working?
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

femc
« Reply #2 on: April 08, 2009, 04:33:40 PM »
Why I think the Learn contrib is not working: I keep putting the same spam emails (e.g. from newsletters@response.enterprise-alerts.com) into the LearnAsSpam mailbox, but they are still not tagged as spam the next time. The box has been running for quite some time and should have received far more than 500 LearnAsSpam emails. I just have the feeling it is not learning ...

FYI: if I check the LearnAsSpam mailbox the next day, it is empty, which means the messages are getting deleted.

hdmueller

cactus
« Reply #3 on: April 08, 2009, 04:47:48 PM »
Quote from: femc
"FYI: if I check the LearnAsSpam mailbox the next day, it is empty, which means the messages are getting deleted."

Then it is working as expected.

Quote from: femc
"Why I think the Learn contrib is not working: I keep putting the same spam emails (e.g. from newsletters@response.enterprise-alerts.com) into the LearnAsSpam mailbox, but they are still not tagged as spam the next time. The box has been running for quite some time and should have received far more than 500 LearnAsSpam emails. I just have the feeling it is not learning ..."

I think you will need to read up on how spam filtering works, especially on the topic of Bayesian filtering. In short, each message is fed to the LearnAsSpam script, which learns its tokens from the message. These tokens are then used by the Bayesian filter to test your e-mail. If the Bayesian filter considers a message a positive, it adds the score assigned to this test to the total spam score.

Normally the Bayesian filter needs a minimum of 200 learned messages (IIRC) before it will even start using this test.

In general, as long as the total spam score is not above your spam threshold, a message will not be tagged as spam. The Bayesian filter fed by the learn scripts is just one test: it does not block spam by itself, it only adds to the score, and only once the minimum number of learned messages has been reached. When the test does run, this should be visible in the mail headers, where it is listed in the header that holds the tests that were run.
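
The scoring described above can be illustrated with a small Python sketch. It is only a toy model, not SpamAssassin's actual code: the test names, the individual scores and the threshold of 5.0 are all hypothetical values chosen for illustration.

```python
# Toy model of score-based tagging: every test that fires contributes its
# score, and the message is tagged only if the total reaches the threshold.
# All names and numbers below are hypothetical, chosen for illustration.

THRESHOLD = 5.0  # hypothetical spam threshold


def total_score(hits):
    """Sum the scores of all tests that fired for one message."""
    return sum(hits.values())


def is_tagged(hits, threshold=THRESHOLD):
    """Assume a total equal to or above the threshold gets tagged."""
    return total_score(hits) >= threshold


# The Bayesian test fired, but on its own it stays below the threshold.
hits_bayes_only = {"BAYES": 3.5}

# The Bayesian test plus one other (hypothetical) test crosses the threshold.
hits_combined = {"BAYES": 3.5, "OTHER_TEST": 2.0}

print(is_tagged(hits_bayes_only))
print(is_tagged(hits_combined))
```

The point of the sketch is that a message the Bayesian filter recognises can still arrive untagged if no other test adds enough to the total.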

You can use the following command to judge whether enough messages have been learnt:
Code:
sa-learn --dump magic
which should look something like this:
Code:
[root@sme74test ~]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1635          0  non-token data: nspam
0.000          0         25          0  non-token data: nham
0.000          0     132133          0  non-token data: ntokens
0.000          0 1163576731          0  non-token data: oldest atime
0.000          0 1239124533          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
[root@sme74test ~]#

The nspam entry lists the number of spam messages learned.
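
For anyone who wants to check these counters automatically, here is a minimal Python sketch that reads the nspam and nham values out of such a dump. The sample text is copied from the output above. The helper names `parse_magic` and `bayes_ready` are made up for this example, and the minimum of 200 is an assumption: it is the figure recalled earlier in this thread, applied here to both spam and ham, so check your own SpamAssassin configuration before relying on it.

```python
import re

# Sample copied from the "sa-learn --dump magic" output shown above.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1635          0  non-token data: nspam
0.000          0         25          0  non-token data: nham
0.000          0     132133          0  non-token data: ntokens
"""

# Assumption: 200 learned messages are required, for spam and ham alike.
MIN_MESSAGES = 200

# The value of each "non-token data" line sits in the third column.
MAGIC_LINE = re.compile(r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+)$")


def parse_magic(dump_text):
    """Return a dict mapping each 'non-token data' label to its integer value."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:  # prompts and other lines are silently skipped
            counts[match.group(2).strip()] = int(match.group(1))
    return counts


def bayes_ready(counts, minimum=MIN_MESSAGES):
    """True if both the spam and the ham counters reach the assumed minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum


counts = parse_magic(SAMPLE_DUMP)
print("nspam:", counts["nspam"], "nham:", counts["nham"])
print("enough messages learned:", bayes_ready(counts))
```

To run it against a live system, the output of `sa-learn --dump magic` could be saved to a file and passed to `parse_magic` in place of the sample text.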

femc
« Reply #4 on: April 08, 2009, 06:00:39 PM »
Thanks for the answer. I get the following output:
 sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        142          0  non-token data: nspam
0.000          0         42          0  non-token data: nham
0.000          0      24245          0  non-token data: ntokens
0.000          0 1232817313          0  non-token data: oldest atime
0.000          0 1239197417          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

It is difficult for me to interpret this output. Do you think Learn is working?

hdmueller