Koozali.org: home of the SME Server

[Announce]: DSPAM integration into SpamAssassin

Offline Knuddi

  • *
  • 540
  • +0/-0
    • http://www.scanmailx.com
[Announce]: DSPAM integration into SpamAssassin
« on: January 04, 2010, 03:29:15 PM »
For a long time I have used SME's built-in SpamAssassin, with a few custom additions, to get rid of most of my spam. Recently I noticed that the DSPAM project was alive again, and I have since heard from many sources that it does a great job for them. I did not want to get rid of SpamAssassin but wanted to combine the strengths of the two spam engines. One of the "weaknesses" of DSPAM is that it requires a significant amount of training before it provides reliable results. I am using SpamAssassin's scoring to provide this training.

I have therefore made this DSPAM plug-in which works in co-operation with SpamAssassin to get rid of even more spam.

See the wiki for installation details.
http://wiki.contribs.org/DSPAM

Enjoy,
Jesper

« Last Edit: January 04, 2010, 08:50:08 PM by Knuddi »

Offline cactus

  • *
  • 4,880
  • +3/-0
    • http://www.snetram.nl
Re: [Announce]: DSPAM integration
« Reply #1 on: January 04, 2010, 03:41:22 PM »
Do you have (any pointers to) figures comparing the two? I am curious to know if there are areas where one outperforms the other.
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline Knuddi

Re: [Announce]: DSPAM integration
« Reply #2 on: January 04, 2010, 03:53:49 PM »
Just to clarify: this package does not disable or reduce the functionality of SME's SpamAssassin. It only adds functionality and should in general provide a higher spam capture rate.

DSPAM provides a much more advanced statistical system than SpamAssassin's Bayes, and what I can see so far from my own servers is that the DSPAM_SPAM and DSPAM_HAM tags are very well aligned with my expectations. I expect to see more spam being filtered by DSPAM's statistical system than by SA's Bayes.

The author of DSPAM makes some claims here (rather old, though):
http://lists.slug.org.au/archives/slug/2004/02/msg00555.html

Offline Knuddi

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #3 on: January 12, 2010, 02:01:34 PM »
The final release of DSPAM has just come out (the previous post covered RC2), and I have updated the wiki accordingly (http://wiki.contribs.org/DSPAM). If you want to track how SpamAssassin is using its various rules (incl. DSPAM), you can view statistics with this little tool.

Code: [Select]
# cd /usr/bin/
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats
It will show which rules fired for both spam and ham. Here is an example from my server:

Code: [Select]
Email:     2895  Autolearn:  2591  AvgScore:  22.54  AvgScanTime:  3.74 sec
Spam:      2165  Autolearn:  2075  AvgScore:  33.86  AvgScanTime:  3.44 sec
Ham:        730  Autolearn:   516  AvgScore: -11.05  AvgScanTime:  4.64 sec

Time Spent Running SA:         3.01 hours
Time Spent Processing Spam:    2.07 hours
Time Spent Processing Ham:     0.94 hours

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    RCVD_IN_APEWSL2                  1809    67.05   83.56   18.08
   2    RCVD_IN_BRBL                     1789    62.04   82.63    0.96
   3    RAZOR2_CHECK                     1786    61.93   82.49    0.96
   4    BAYES_99                         1780    61.49   82.22    0.00
   5    RAZOR2_CF_RANGE_51_100           1759    61.00   81.25    0.96
   6    DIGEST_MULTIPLE                  1656    57.37   76.49    0.68
   7    DCC_CHECK                        1567    56.93   72.38   11.10
   8    URIBL_BLACK                      1528    53.26   70.58    1.92
   9    RCVD_IN_XBL                      1494    51.64   69.01    0.14
  10    RAZOR2_CF_RANGE_E8_51_100        1485    51.47   68.59    0.68
  11    RCVD_IN_JMF_BL                   1484    51.68   68.55    1.64
  12    PYZOR_CHECK                      1445    50.36   66.74    1.78
  13    RCVD_IN_PBL                      1413    48.95   65.27    0.55
  14    URIBL_JP_SURBL                   1347    46.53   62.22    0.00
  15    URIBL_SBL                        1320    45.60   60.97    0.00
  16    URIBL_WS_SURBL                   1294    44.70   59.77    0.00
  17    DSPAM_SPAM_99                    1147    39.62   52.98    0.00
  18    SEM_URIRED                       1135    39.79   52.42    2.33
  19    SEM_URI                          1002    34.78   46.28    0.68
  20    HTML_MESSAGE                      981    52.92   45.31   75.48
----------------------------------------------------------------------

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    BAYES_00                          715    25.98    1.71   97.95
   2    DSPAM_HAM_99                      696    25.01    1.29   95.34
   3    HTML_MESSAGE                      551    52.92   45.31   75.48
   4    SPF_PASS                          329    13.68    3.09   45.07
   5    RCVD_IN_JMF_W                     145     5.11    0.14   19.86
   6    RCVD_IN_APEWSL2                   132    67.05   83.56   18.08
   7    MIME_HTML_ONLY                    131    14.82   13.76   17.95
   8    SPF_HELO_PASS                      96     3.52    0.28   13.15
   9    DCC_CHECK                          81    56.93   72.38   11.10
  10    RCVD_IN_DNSWL_MED                  63     2.18    0.00    8.63
  11    RCVD_IN_DNSWL_LOW                  62     2.14    0.00    8.49
  12    SARE_SUB_ENC_UTF8                  59     3.56    2.03    8.08
  13    MPART_ALT_DIFF                     55     2.63    0.97    7.53
  14    USER_IN_WHITELIST                  48     1.66    0.00    6.58
  15    MIME_HTML_MOSTLY                   43     2.00    0.69    5.89
  16    MIME_QP_LONG_LINE                  31     2.56    1.99    4.25
  17    EXTRA_MPART_TYPE                   31     1.52    0.60    4.25
  18    MIME_BASE64_BLANKS                 31     1.07    0.00    4.25
  19    HTML_IMAGE_RATIO_06                29     1.04    0.05    3.97
  20    MISSING_MID                        28     1.52    0.74    3.84
----------------------------------------------------------------------
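For anyone reading the columns above: the figures are consistent with %OFSPAM and %OFHAM being a rule's hit count divided by the number of spam or ham messages in the summary lines. The sketch below is plain Python, not part of sa-stats, and the function name is made up for illustration. It reproduces a few of the values from the tables.

```python
def percent(hits: int, total: int) -> float:
    """Share of messages that hit a rule, as a percentage with two decimals."""
    return round(100.0 * hits / total, 2)


# Totals taken from the Email/Spam/Ham summary lines of the output above.
TOTAL_MAIL, TOTAL_SPAM, TOTAL_HAM = 2895, 2165, 730

print(percent(1780, TOTAL_SPAM))  # BAYES_99, %OFSPAM       -> 82.22
print(percent(1147, TOTAL_SPAM))  # DSPAM_SPAM_99, %OFSPAM  -> 52.98
print(percent(696, TOTAL_HAM))    # DSPAM_HAM_99, %OFHAM    -> 95.34
print(percent(1780, TOTAL_MAIL))  # BAYES_99, %OFMAIL (no ham hits) -> 61.49
```

A rule that fires on over 95% of ham and about 1% of spam (DSPAM_HAM_99) or on over half of spam and no ham (DSPAM_SPAM_99) separates the two classes well in this sample.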
« Last Edit: January 12, 2010, 02:33:49 PM by Knuddi »

Offline Franco

  • *
  • 1,171
  • +0/-0
    • http://contribs.org
Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #4 on: January 12, 2010, 02:22:48 PM »
Code: [Select]
wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats.pl
--11:18:45--  http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats.pl
           => `sa-stats.pl'
Resolving sme.swerts-knudsen.dk... 93.164.10.182
Connecting to sme.swerts-knudsen.dk|93.164.10.182|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
11:18:46 ERROR 403: Forbidden
:(

Offline Knuddi

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #5 on: January 12, 2010, 02:33:15 PM »
I changed the URL; the server refused to serve .pl files.

# cd /usr/bin/
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats

Offline CharlieBrady

  • *
  • 6,918
  • +3/-0
Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #6 on: January 12, 2010, 08:00:03 PM »
Quote from: Knuddi
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats

Instead, I'd suggest that people obtain the script from its original source:

wget -O sa-stats http://www.rulesemporium.com/programs/sa-stats-1.0.txt
./sa-stats --logdir /var/log/spamd

Knuddi, do you plan to package this in an rpm? It's considered bad practice to install files in /usr/bin which are not version controlled via rpm.

Offline kevinb

  • *
  • 237
  • +0/-0
Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #7 on: January 12, 2010, 08:29:31 PM »
This is very interesting ... thanks Jesper.
 
Does DSPAM continually learn? How do/can the users train it?

Offline Knuddi

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #8 on: January 12, 2010, 10:28:58 PM »
DSPAM currently learns from the overall SpamAssassin (SA) result. This means that if DSPAM thinks a message is ham but SA gives it a score above 9 (configurable) and it is rejected, then DSPAM is re-trained. The same goes for the reverse situation. Right now there is no method for users to train the filter, but I will update sme-unjunkmgr (http://wiki.contribs.org/Sme-unjunkmgr) so that it also trains DSPAM. The LearnAsSpam contrib could also be updated to train DSPAM, but the latter is out of my hands these days.
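A minimal sketch of that re-training decision, in Python rather than the contrib's actual code. The function name, the verdict strings and the ham-side threshold of 0.0 are assumptions made for illustration; only the spam-side threshold of 9 comes from the description above.

```python
from typing import Optional


def retrain_label(dspam_verdict: str, sa_score: float,
                  spam_threshold: float = 9.0,
                  ham_threshold: float = 0.0) -> Optional[str]:
    """Return the label DSPAM should be re-trained with, or None.

    dspam_verdict is "spam" or "ham" (hypothetical encoding).
    ham_threshold is an assumed value; the post does not state it.
    """
    if dspam_verdict == "ham" and sa_score > spam_threshold:
        return "spam"  # SA rejected the message, DSPAM missed it
    if dspam_verdict == "spam" and sa_score < ham_threshold:
        return "ham"   # reverse situation: SA is confident it is ham
    return None        # the engines agree, or SA is not confident enough
```

For example, `retrain_label("ham", 12.5)` returns `"spam"`, while `retrain_label("ham", 4.0)` returns `None` because SA's score is below the rejection threshold.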


Offline kevinb

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #9 on: January 13, 2010, 03:03:33 AM »
Thanks for the response Jesper. I guess I do not understand DSPAM yet but my first impression is if it learns from SA then it will never be better than SA.
 
If users train SA via "learnasspam / learnasham" will DSPAM be taught also?

Offline Knuddi

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #10 on: January 13, 2010, 10:38:50 AM »
That is to some extent a correct observation, but since the DSPAM rules carry significant scores at 90% and 99% certainty, DSPAM has the "strength" to push the overall score in either direction. It would clearly be stronger if the user can train the filter as well and report false positives (unjunkmgr or LearnAsSpam).
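To illustrate how a single high-certainty DSPAM rule can tip a borderline message, here is a rough Python sketch. The weights and the DSPAM_SPAM_90/DSPAM_HAM_90 rule names are invented for the example (only DSPAM_SPAM_99 and DSPAM_HAM_99 appear in the sa-stats output earlier in the thread), and 5.0 is SpamAssassin's stock required_score, which may differ on your server.

```python
from typing import Optional

# Hypothetical weights, for illustration only. The real values are set in
# the contrib's SpamAssassin configuration.
DSPAM_WEIGHTS = {
    "DSPAM_SPAM_99": 3.0,
    "DSPAM_SPAM_90": 1.5,
    "DSPAM_HAM_90": -1.5,
    "DSPAM_HAM_99": -3.0,
}

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def overall_score(other_rules_score: float,
                  dspam_tag: Optional[str] = None) -> float:
    """Add the DSPAM rule's weight (if any) to the score from all other rules."""
    return other_rules_score + DSPAM_WEIGHTS.get(dspam_tag, 0.0)


def is_spam(score: float) -> bool:
    """A message is tagged as spam once it reaches the required score."""
    return score >= REQUIRED_SCORE
```

With these made-up weights, a message scoring 4.0 from the other rules stays below the threshold on its own, reaches 7.0 with DSPAM_SPAM_99, and drops to 1.0 with DSPAM_HAM_99.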

Offline cactus

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #11 on: January 13, 2010, 09:17:12 PM »
Quote from: Knuddi
It would clearly be stronger if the user can train the filter as well and report false positives (unjunkmgr or LearnAsSpam).
As far as I can tell from the snippets on the net, the learning effect is what makes DSPAM so good and superior (in the eyes of its fathers) to SpamAssassin. The way you have set it up leaves it dependent on SpamAssassin's general rules, which I think is not preferable, as DSPAM is capable of tracking spam for individual users.

See for instance the reply of UbuWu, apparently closely related to DSPAM, on UbuntuForums: http://ubuntuforums.org/archive/index.php/t-77766.html

Offline Knuddi

Re: [Announce]: DSPAM integration into SpamAssassin
« Reply #12 on: January 19, 2010, 01:19:24 PM »
DSPAM needs to be fed at least 2,500 spam and ham emails to gain a basic understanding of what is what. These emails need to be either manually selected (not a job for me) or automatically classified by SpamAssassin (a little easier). Once the basic training has completed, the filter needs to be adjusted continuously for false positives and false negatives, and here the DSPAM GUI/Quarantine could be very useful. In my opinion, the SME way would be to use LearnAsSpam or UnjunkMgr.