Koozali.org: home of the SME Server
Obsolete Releases => SME 7.x Contribs => Topic started by: Knuddi on January 04, 2010, 03:29:15 PM
-
I have for a long time used SME's built-in SpamAssassin with a few custom additions to get rid of most of my spam. Recently I noticed that the DSPAM project was alive again, and I have since heard from many sources that it does a great job for them. I did not want to get rid of SpamAssassin but wanted to combine the strengths of the two spam engines. One of the "weaknesses" of DSPAM is that it requires a significant amount of training before it provides reliable results; I am using SpamAssassin's scoring to provide this training.
I have therefore made this DSPAM plug-in which works in co-operation with SpamAssassin to get rid of even more spam.
See the wiki for details on installation.
http://wiki.contribs.org/DSPAM
Enjoy,
Jesper
-
Do you have (any pointers to) figures comparing the two? I am curious to know if there are fields where one is outperforming the other.
-
Just for clarification: this package does not disable or reduce the functionality of SME's SpamAssassin. It only adds more functionality and should in general provide a higher level of spam capture.
DSPAM provides a much more advanced statistical system than SpamAssassin's Bayes, and what I can see so far from my own servers is that the DSPAM_SPAM and DSPAM_HAM tags are very well aligned with my expectations. I expect to see more spam being filtered by DSPAM's statistical system than by SA's Bayes.
The Author of DSPAM has some statements here (rather old though):
http://lists.slug.org.au/archives/slug/2004/02/msg00555.html
-
The final release of DSPAM (previous post was RC2) has just been released and I have updated the wiki accordingly (http://wiki.contribs.org/DSPAM). If you want to track how SpamAssassin is using its various rules (incl. DSPAM) you can see stats using this little tool.
# cd /usr/bin/
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats
It will show which rules were fired for both spam and ham. Here is an example from my server:
Email: 2895 Autolearn: 2591 AvgScore: 22.54 AvgScanTime: 3.74 sec
Spam: 2165 Autolearn: 2075 AvgScore: 33.86 AvgScanTime: 3.44 sec
Ham: 730 Autolearn: 516 AvgScore: -11.05 AvgScanTime: 4.64 sec
Time Spent Running SA: 3.01 hours
Time Spent Processing Spam: 2.07 hours
Time Spent Processing Ham: 0.94 hours
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK  RULE NAME                   COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1  RCVD_IN_APEWSL2              1809    67.05    83.56    18.08
   2  RCVD_IN_BRBL                 1789    62.04    82.63     0.96
   3  RAZOR2_CHECK                 1786    61.93    82.49     0.96
   4  BAYES_99                     1780    61.49    82.22     0.00
   5  RAZOR2_CF_RANGE_51_100       1759    61.00    81.25     0.96
   6  DIGEST_MULTIPLE              1656    57.37    76.49     0.68
   7  DCC_CHECK                    1567    56.93    72.38    11.10
   8  URIBL_BLACK                  1528    53.26    70.58     1.92
   9  RCVD_IN_XBL                  1494    51.64    69.01     0.14
  10  RAZOR2_CF_RANGE_E8_51_100    1485    51.47    68.59     0.68
  11  RCVD_IN_JMF_BL               1484    51.68    68.55     1.64
  12  PYZOR_CHECK                  1445    50.36    66.74     1.78
  13  RCVD_IN_PBL                  1413    48.95    65.27     0.55
  14  URIBL_JP_SURBL               1347    46.53    62.22     0.00
  15  URIBL_SBL                    1320    45.60    60.97     0.00
  16  URIBL_WS_SURBL               1294    44.70    59.77     0.00
  17  DSPAM_SPAM_99                1147    39.62    52.98     0.00
  18  SEM_URIRED                   1135    39.79    52.42     2.33
  19  SEM_URI                      1002    34.78    46.28     0.68
  20  HTML_MESSAGE                  981    52.92    45.31    75.48
----------------------------------------------------------------------
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK  RULE NAME                   COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1  BAYES_00                      715    25.98     1.71    97.95
   2  DSPAM_HAM_99                  696    25.01     1.29    95.34
   3  HTML_MESSAGE                  551    52.92    45.31    75.48
   4  SPF_PASS                      329    13.68     3.09    45.07
   5  RCVD_IN_JMF_W                 145     5.11     0.14    19.86
   6  RCVD_IN_APEWSL2               132    67.05    83.56    18.08
   7  MIME_HTML_ONLY                131    14.82    13.76    17.95
   8  SPF_HELO_PASS                  96     3.52     0.28    13.15
   9  DCC_CHECK                      81    56.93    72.38    11.10
  10  RCVD_IN_DNSWL_MED              63     2.18     0.00     8.63
  11  RCVD_IN_DNSWL_LOW              62     2.14     0.00     8.49
  12  SARE_SUB_ENC_UTF8              59     3.56     2.03     8.08
  13  MPART_ALT_DIFF                 55     2.63     0.97     7.53
  14  USER_IN_WHITELIST              48     1.66     0.00     6.58
  15  MIME_HTML_MOSTLY               43     2.00     0.69     5.89
  16  MIME_QP_LONG_LINE              31     2.56     1.99     4.25
  17  EXTRA_MPART_TYPE               31     1.52     0.60     4.25
  18  MIME_BASE64_BLANKS             31     1.07     0.00     4.25
  19  HTML_IMAGE_RATIO_06            29     1.04     0.05     3.97
  20  MISSING_MID                    28     1.52     0.74     3.84
----------------------------------------------------------------------
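For anyone wondering how to read the percentage columns: they appear to be plain ratios of the hit count to the totals in the summary lines (2895 emails, 2165 spam, 730 ham). A minimal Python sketch, assuming that interpretation (the helper name `pct` is made up for illustration and is not part of sa-stats), reproduces the RCVD_IN_APEWSL2 row that shows up in both tables:

```python
# Totals taken from the summary lines of the sa-stats output above.
TOTAL_MAIL = 2895
TOTAL_SPAM = 2165
TOTAL_HAM = 730


def pct(count, total):
    """Percentage of `total` represented by `count`, rounded to 2 decimals.

    Hypothetical helper for illustration; not taken from sa-stats itself.
    """
    return round(100.0 * count / total, 2)


# RCVD_IN_APEWSL2 fired on 1809 spam messages and on 132 ham messages.
spam_hits = 1809
ham_hits = 132

of_spam = pct(spam_hits, TOTAL_SPAM)             # the %OFSPAM column
of_ham = pct(ham_hits, TOTAL_HAM)                # the %OFHAM column
of_mail = pct(spam_hits + ham_hits, TOTAL_MAIL)  # the %OFMAIL column

print(of_mail, of_spam, of_ham)
```

Read this way, a rule such as DSPAM_SPAM_99 with 0.00 in %OFHAM never fired on a message classified as ham in this sample.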
-
wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats.pl
--11:18:45-- http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats.pl
=> `sa-stats.pl'
Resolving sme.swerts-knudsen.dk... 93.164.10.182
Connecting to sme.swerts-knudsen.dk|93.164.10.182|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
11:18:46 ERROR 403: Forbidden
:(
-
Changed URL - server didn't want to provide .pl
# cd /usr/bin/
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats
-
# wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
# chmod +x sa-stats
# ./sa-stats
Instead, I'd suggest that people obtain the script from its original source:
wget -O sa-stats http://www.rulesemporium.com/programs/sa-stats-1.0.txt
./sa-stats --logdir /var/log/spamd
Knuddi, do you plan to package this in an rpm? It's considered bad practice to install files in /usr/bin which are not version controlled via rpm.
-
This is very interesting ... thanks Jesper.
Does DSPAM continually learn? How do/can the users train it?
-
DSPAM currently learns from the overall SpamAssassin (SA) result. This means that if DSPAM thinks a message is ham, but SA gives it a score above 9 (configurable) and it is rejected, then DSPAM is re-trained. The same goes for the reverse situation. Right now there is no method for users to train the filter, but I will update sme-unjunkmgr (http://wiki.contribs.org/Sme-unjunkmgr) so that it also trains DSPAM. The LearnAsSpam contrib could also be updated to train DSPAM, but the latter is out of my hands these days.
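The retraining rule described above can be sketched roughly as follows. This is only an illustration in Python, not the plug-in's actual code: the function name, the verdict strings and the `ham_level` threshold for the reverse case are assumptions made for the example; only the reject level of 9 comes from the description.

```python
def retrain_action(dspam_verdict, sa_score, reject_level=9.0, ham_level=0.0):
    """Decide whether DSPAM should be retrained, based on the SA result.

    dspam_verdict: "spam" or "ham", DSPAM's own classification.
    sa_score:      overall SpamAssassin score for the message.
    reject_level:  score above which SA rejects the mail (9, configurable).
    ham_level:     ASSUMED score below which SA is trusted as "ham";
                   the real threshold for the reverse case is not stated.

    Returns "spam" or "ham" (the class to retrain as), or None.
    """
    if dspam_verdict == "ham" and sa_score > reject_level:
        return "spam"  # DSPAM said ham, but SA rejected the mail as spam
    if dspam_verdict == "spam" and sa_score < ham_level:
        return "ham"   # reverse situation: DSPAM said spam, SA says ham
    return None        # the engines agree, or SA is not confident enough


print(retrain_action("ham", 12.5))
print(retrain_action("spam", -3.0))
print(retrain_action("spam", 12.5))
```

Messages scoring between the two thresholds are left alone in this sketch, so DSPAM is only corrected when SpamAssassin is confident.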
-
Thanks for the response, Jesper. I guess I do not understand DSPAM yet, but my first impression is that if it learns from SA then it will never be better than SA.
If users train SA via "learnasspam / learnasham" will DSPAM be taught also?
-
That is to some extent a correct observation, but since the scores assigned to DSPAM's 90% and 99% certainty results are significant, DSPAM has the "strength" to push the overall score in either direction. It would clearly be stronger if the user can train the filter as well and report false positives (unjunkmgr or LearnAsSpam).
-
It would clearly be stronger if the user can train the filter as well and report false positives (unjunkmgr or LearnAsSpam).
As far as I read from the snippets on the net, the learning effect is what makes DSPAM that good and superior (in the eyes of its fathers) to SpamAssassin. The way you have set it up makes it still dependent on SpamAssassin's general rules, which I think is not preferable, as DSPAM is capable of tracking spam for individual users.
See for instance the reply of UbuWu, apparently closely related to DSPAM, on UbuntuForums: http://ubuntuforums.org/archive/index.php/t-77766.html
-
DSPAM needs to be fed min. 2500 spam and ham emails to have a basic understanding of what is what. These emails need to be either manually selected (not a job for me) or automatically classified by SpamAssassin (a little easier). Once the basic training has completed, the filter needs to be adjusted/trimmed continuously for false positives/negatives, and here the DSPAM GUI/Quarantine could be very useful. The SME way would be to use LearnAsSpam or UnjunkMgr, in my opinion.