Koozali.org: home of the SME Server
Obsolete Releases => SME 7.x Contribs => Topic started by: devtay on February 23, 2009, 03:52:00 PM
-
I am running 7.4 fully updated. I already set up the learnasham and learnasspam scripts and cron jobs per the posts that were previously made. The scripts run every 12 hours and I get the admin emails just like I am supposed to. However, the emails used to show spam being learned but now they don't.
Here is what I have done to solve this problem so far.
I went to a user Maildir folder (not mine) while logged in as root and ran the sa-learn --ham in the .LearnAsHam folder. It said 8 files examined and 8 tokens learned. That means to me that the permissions on the file/folder are set correctly. So, I put some more emails to LearnAsHam in that folder and ran the script /usr/bin/LearnAsSpam.pl and there were no files examined and no tokens learned.
I checked the forum threads and read through what I had already done. As another way of troubleshooting, I chmod 777 one particular user's folder to test and then ran the LearnAsHam.pl script again with the same results. If anyone has any ideas, I would appreciate the help.
As you can probably tell by my feeble troubleshooting attempts, I have exhausted my limited skills.
-
Here is what I have done to solve this problem so far.
I went to a user Maildir folder (not mine) while logged in as root and ran the sa-learn --ham in the .LearnAsHam folder. It said 8 files examined and 8 tokens learned. That means to me that the permissions on the file/folder are set correctly.
Perhaps, but cron runs the scripts with a more limited environment than an interactive root login: scripts run that way are unaware of the path settings and other environment variables that are configured when you log in as the root user, so the check is not really watertight.
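One way to make a manual test closer to what cron does is to run the script with a stripped-down environment. This is only a minimal sketch: the helper name run_like_cron is made up for illustration, and the PATH and SHELL values are assumptions about typical cron defaults, not values taken from this server.

```shell
# Run a command with a minimal, cron-like environment.
# Assumption: cron provides roughly HOME, SHELL=/bin/sh and PATH=/usr/bin:/bin.
run_like_cron() {
    env -i HOME="${HOME:-/root}" SHELL=/bin/sh PATH=/usr/bin:/bin "$@"
}

# Example (as root):
#   run_like_cron perl /usr/bin/LearnAsHam.pl
```

If the script behaves differently under run_like_cron than it does from a normal root shell, the environment is a likely suspect.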
So, I put some more emails to LearnAsHam in that folder and ran the script /usr/bin/LearnAsSpam.pl and there were no files examined and no tokens learned.
I think that is never going to work as the LearnAsSpam script does not look at the Ham folder AFAIK.
I checked the forum threads and read through what I had already done. As another way of troubleshooting, I chmod 777 one particular user's folder to test and then ran the LearnAsHam.pl script again with the same results. If anyone has any ideas, I would appreciate the help.
Which you should never have done, as all users in that group now have full access to those files and directories. As stated many times before: do not mess with ownership and permissions.
As you can probably tell by my feeble troubleshooting attempts, I have exhausted my limited skills.
Next time, act like an antelope instead of a wild goose: stop and think. That is what antelopes do better than wild geese.
I think you need to see if the logs tell you something that might give you a clue of what might be going on, or not going on at all. Look through the /var/log/cron files at the time the script should run as well as through the /var/log/messages file and post possible clues here so we can try and help you.
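To narrow the cron log down to the relevant entries, something like the following could be used. It is a sketch only; the function name learn_cron_entries is hypothetical, and it assumes the cron jobs call the scripts by the file names LearnAsHam.pl and LearnAsSpam.pl.

```shell
# Print the cron log lines that mention either learning script.
# $1 is the log file to search; it defaults to /var/log/cron.
learn_cron_entries() {
    grep -E 'LearnAs(Ham|Spam)\.pl' "${1:-/var/log/cron}"
}

# Example:
#   learn_cron_entries /var/log/cron
```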
-
Cactus,
Thanks for the input. I should have put more in my post. I am running both LearnAsHam.pl and LearnAsSpam.pl so both the LearnAsHam and LearnAsSpam folders will be looked at for each user.
Also, I made up a fake email address to use to test with the chmod. My thinking was if I screwed up the permissions, I could just delete the user and not worry about it. I looked in the messages log, but didn't see anything that seemed to pertain to either LearnAsSpam or LearnAsHam.
I will check through the logs and see what I can find. I am not sure which one so it may take me a bit of time.
-
I will check through the logs and see what I can find. I am not sure which one so it may take me a bit of time.
That is why I gave you these pointers:
Look through the /var/log/cron files at the time the script should run as well as through the /var/log/messages file and post possible clues here so we can try and help you.
-
Cactus,
I didn't see anything in the /var/log/cron file. It is basically showing the same tasks starting over and over (what I would expect). Here is a sample of the part of the file showing the LearnAsSpam and LearnAsHam entries. They are running every 12 hours like I set them up to run.
/var/log/cron
Feb 24 12:01:01 mail crond[4354]: (root) CMD (run-parts /etc/cron.hourly)
Feb 24 12:02:01 mail crond[4428]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:04:01 mail crond[4545]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:05:01 mail crond[4573]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
Feb 24 12:06:01 mail crond[4659]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:08:01 mail crond[4760]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:10:01 mail crond[4859]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:10:01 mail crond[4863]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
Feb 24 12:12:01 mail crond[5024]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:14:01 mail crond[5142]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:15:01 mail crond[5399]: (root) CMD ( perl /usr/bin/LearnAsSpam.pl)
Feb 24 12:15:01 mail crond[5400]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
Feb 24 12:16:01 mail crond[5813]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:18:01 mail crond[7123]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:20:01 mail crond[7741]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:20:01 mail crond[7743]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
Feb 24 12:20:01 mail crond[7745]: (root) CMD ( perl /usr/bin/LearnAsHam.pl)
Feb 24 12:22:01 mail crond[9152]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:24:01 mail crond[10042]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:25:01 mail crond[10110]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
Feb 24 12:26:01 mail crond[10187]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:28:01 mail crond[10318]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:30:01 mail crond[10418]: (root) CMD (/usr/sbin/./usbdisks.sh &> /dev/null)
Feb 24 12:30:01 mail crond[10422]: (root) CMD (/bin/nice /sbin/e-smith/awstats-pp -s -n)
In the /var/log/messages file, I see a lot of stuff that I don't really understand (big surprise there, right). Anyway, during the time that the above cron job was running, I found these entries over and over:
Feb 24 12:15:04 mail su(pam_unix)[5435]: session opened for user root by (uid=0)
Feb 24 12:15:18 mail su(pam_unix)[5435]: session closed for user root
There are entries showing the template change when I enable and disable SSH access to the server as well as some slapd entries. Here is an example of the slapd entries in the /var/log/messages file:
Feb 24 10:48:39 mail slapd[4214]: conn=246 fd=7 ACCEPT from IP=192.168.2.70:3243 (IP=0.0.0.0:389)
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=0 BIND dn="" method=128
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=0 RESULT tag=97 err=0 text=
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=1 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=1 SRCH attr=objectClass defaultNamingContext
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=2 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=2 SRCH attr=objectClass supportedControl supportedCapabilities
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
Feb 24 10:48:39 mail slapd[4214]: conn=246 op=3 UNBIND
Feb 24 10:48:39 mail slapd[4214]: conn=246 fd=7 closed
This looks like an ldap client trying to connect to the ldap service on the server. I am going to look at that client email config and see why it is trying to connect. This section repeats throughout the file with the same results. We are not using the ldap portion of SME server.
This is the only section of the /var/log/messages log that shows an error:
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:get_peer_addr(1224)
Feb 23 15:15:33 mail smbd[5602]: getpeername failed. Error was Transport endpoint is not connected
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:get_peer_addr(1224)
Feb 23 15:15:33 mail smbd[5602]: getpeername failed. Error was Transport endpoint is not connected
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/access.c:check_access(327)
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:get_peer_addr(1224)
Feb 23 15:15:33 mail smbd[5602]: getpeername failed. Error was Transport endpoint is not connected
Feb 23 15:15:33 mail smbd[5602]: Denied connection from (0.0.0.0)
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:get_peer_addr(1224)
Feb 23 15:15:33 mail smbd[5602]: getpeername failed. Error was Transport endpoint is not connected
Feb 23 15:15:33 mail smbd[5602]: Connection denied from 0.0.0.0
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:write_data(562)
Feb 23 15:15:33 mail smbd[5602]: write_data: write failure in writing to client 192.168.2.89. Error Connection reset by peer
Feb 23 15:15:33 mail smbd[5602]: [2009/02/23 15:15:33, 0] lib/util_sock.c:send_smb(761)
Feb 23 15:15:33 mail smbd[5602]: Error writing 5 bytes to client. -1. (Connection reset by peer)
I don't think this is much help, sorry. Basically what I know at this point is the scripts were working and now they are not. The cron jobs are scheduled and running at their appropriate times. When the scripts run they generate the email messages to me, and the email messages show no learning of either Ham or Spam. Running the scripts as root (as you already pointed out) isn't a good test, but they don't learn anything that way either. Running sa-learn directly in the appropriate directory does allow spam/ham to be learned. Here is my sa-learn --dump magic.
0.000 0 3 0 non-token data: bayes db version
0.000 0 70608 0 non-token data: nspam
0.000 0 26429 0 non-token data: nham
0.000 0 155118 0 non-token data: ntokens
0.000 0 1169316761 0 non-token data: oldest atime
0.000 0 1235504770 0 non-token data: newest atime
0.000 0 1235514048 0 non-token data: last journal sync atime
0.000 0 1235499352 0 non-token data: last expiry atime
0.000 0 11059200 0 non-token data: last expire atime delta
0.000 0 14661 0 non-token data: last expire reduction count
The only changes that have been made to the server are the normal updates that come down through yum. There are a few waiting, but I have not had chance to take the server offline long enough to install them. I am still looking through the logs to see if another one will show me what is happening.
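The nspam and nham counters in the dump above give a direct way to check whether a run learned anything: note the value before a script runs and compare it afterwards. A small sketch for pulling one counter out of the dump; the function name bayes_counter is made up for illustration, and it assumes the line layout shown above (counter value in the third column, counter name last).

```shell
# Read `sa-learn --dump magic` output on stdin and print the value of the
# counter named in $1 (for example nspam, nham or ntokens).
bayes_counter() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# Example:
#   sa-learn --dump magic | bayes_counter nspam
```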
-
You should be able to find out more about what is happening by creating a copy of /usr/bin/LearnAsHam.pl or /usr/bin/LearnAsSpam.pl, editing the copy to un-comment the various 'printf' lines (remove the '#' from in front of any line that says 'printf ...'), then running the modified copy.
If you don't use 'vi', you can edit with 'pico', but be sure to use the '-w' argument to prevent the editor from "helpfully" wrapping the long lines in the file onto two lines in the output...
So, I'm recommending something like this:
cd /usr/bin
cp LearnAsHam.pl test.pl
pico -w test.pl
(make the changes described)
/usr/bin/test.pl
see if you get any useful output...
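If editing by hand is awkward, the un-commenting step could also be done with sed. This is a sketch under the assumption that the debug lines begin with '#' followed directly by 'printf' (optionally with spaces in between); the helper name uncomment_printf is hypothetical, and the result should be checked before it is run.

```shell
# Write a copy of a script with commented-out printf lines re-enabled.
# $1 = original script, $2 = copy to create. The original is not modified.
uncomment_printf() {
    sed 's/^\([[:space:]]*\)#[[:space:]]*printf/\1printf/' "$1" > "$2"
}

# Example:
#   uncomment_printf /usr/bin/LearnAsHam.pl /usr/bin/test.pl
#   perl /usr/bin/test.pl
```

Running the copy through perl explicitly avoids depending on whether the copy has its execute bit set.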
You may also want to open a bug in bugzilla, as we are discouraged from posting shell command instructions here in the wiki.
Finally, you may not see any learned tokens from LearnAsHam.pl unless you are moving email into the LearnAsHam folder from your SPAM folder - the bayesian autolearning already learns any message as Ham with a score below 0.1...
-
Finally, you may not see any learned tokens from LearnAsHam.pl unless you are moving email into the LearnAsHam folder from your SPAM folder - the bayesian autolearning already learns any message as Ham with a score below 0.1...
Interesting. I have been moving ham from my junkmail folder. One other thing that just popped into my mind is when logged into webmail, some users were unable to move/copy emails from junkmail to the LearnAsHam folder. There was an internal server error (sorry I didn't write it down at the time) that would not let them move it. I fixed this by deleting the LearnAsHam folder in the webmail interface and then re-creating it (also in the webmail interface). This is one reason why I thought there was a permissions problem on the folders causing this problem.
I will create the test.pl script and do as you suggest. Thanks for the help. I really appreciate it.
-
Ok, I finally figured this out (actually got time to look into it long enough to figure it out).
First, I had permissions problems with the LearnAsHam and LearnAsSpam folders on just about every user. I had used a script I found here to create the folders automatically, and evidently I used an older version of it. What would happen is a user would try to move a message to the LearnAsSpam/LearnAsHam folder and get an error. They wouldn't worry about it and would just delete the message. This was solved by logging in (webmail) as the user, deleting the folders and then re-creating them as the user.
Once the permissions were fixed, the script could see the email files, but only if they were in the cur folder (which is how the script is written). I fixed this by explaining to users how Maildir stores messages: unread messages sit in new and are moved to cur once they have been read. People were just moving the messages with Outlook into the appropriate folder without opening them, so the script would be looking in ~/cur when the file was in ~/new.
I was also having trouble with users calling a requested email (like a newsletter/update) a spam email. This whole thing was not the fault of the server software. I could have fixed this earlier if I had learned a little more Perl. Now I have a book and am working on it. Thanks for the help from those who commented.
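To see where messages have actually landed, both subfolders can be listed together. A minimal sketch; the function name learn_candidates is hypothetical, and the folder path passed in (for example a user's Maildir/.LearnAsSpam) is an assumption about the local layout.

```shell
# List message files in both cur/ and new/ of one Maildir folder.
# $1 = path to the folder, e.g. a user's Maildir/.LearnAsSpam (assumed path).
learn_candidates() {
    for sub in cur new; do
        if [ -d "$1/$sub" ]; then
            find "$1/$sub" -type f
        fi
    done
}

# Example:
#   learn_candidates /path/to/user/Maildir/.LearnAsSpam
```

Anything that shows up under new/ here would be missed by a script that only reads cur/.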
-
Please file a bug so your findings can be incorporated into the documentation or a future release.
-
I kind of have the same issue:
I use Thunderbird as my mail client.
I simply drag and drop into the learnasspam folder, and everything works as expected (it moves the message to the /cur folder).
I travel a lot so I also use webmail a lot.
I move the mails into the learnasspam folder in the Horde interface. Now it does NOT work (Horde moves the message to the /new folder).
Where do I report it as a bug if there is no entry for smeserver-learn in the bugzilla for sme-contribs?
-
I travel a lot so i also use webmail a lot.
I move the mails into the learnasspam folder in the Horde interface. Now it does NOT work (Horde moves it to the /new folder.)
It should find the message if you select it and then mark it as seen. This will get you going for now. I don't know if this is a bug or a feature request. It seems to me that the script should look in both the new and cur folders under LearnAsSpam/LearnAsHam (maybe all three?). Sometimes I read a message first and sometimes I don't. I changed the script on my server so it looks for the message in the /new folder. This is because I use webmail exclusively for email ham/spam learning. With Thunderbird, it may be changing the message to seen because of a feature like autopreview (from Outlook). I don't use Thunderbird, so this is speculation on my part. Hopefully some helpful Perl guru will mod the script. :P
-
It should find the message if you select it and then mark it as seen. This will get you going for now.
I know, that's what I do. But that is an extra task which could be done by the script.
I don't know if this is a bug or a feature request. It seems to me that the script should look in both the new and cur folder under LearnAsSpam/LearnAsHam (maybe all three?).
Yes, agree. It should look in both /cur and /new in all three mailfolders.
With Thunderbird, it may be changing the message to seen because of a feature like autopreview (from Outlook).
No, I don't open them and I never use autopreview.
Hopefully some helpful Perl guru will mod the script. :P
I looked at the scripts but I am completely lost, so I will wait for the guru too.... :|
Per
-
I have been thinking about this and I have an idea to get started. How does this sound as a script layout to everyone (my thinking is geared toward IMAP and not POP):
- Each user must have a "Saved" and a "SPAM - To be deleted" email folder created (simple enough). Maybe you could use the "junkmail" folder for the latter if your users do not use Horde.
- Users must be instructed to:
-- Move all legitimate email to the "Saved" folder or a subfolder under "Saved". This could possibly be done in part with email rules for any sender that is in the user's contact list (if the contacts from Outlook and Thunderbird or whatever could be synced with Horde). This also has the side benefit of keeping the number of emails in the Inbox to a minimum, which helps startup performance of many email and webmail clients.
-- Do not create any folders outside of the "Saved" folder.
-- Use the Junk email button in their email client (Outlook and Thunderbird or whatever) on any junk emails and the Not Junk button for legitimate emails.
- The script would run every night and look for any new email files (less than 24 hours old) in the "Saved" folder and subfolders (cur and new) and feed these emails to the "learn as ham" script.
- The script would also look for any new email files (less than 24 hours old) in the "Junk" and "Junk Email" folders (any folders that the email client dumps spam into, cur and new), feed these emails to the "learn as spam" script and then move them to the "SPAM - To be deleted/cur" folder.
- The script would also look for any email files in the "SPAM - To be deleted" folder that are older than 30 days and delete them.
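The age checks in this layout could be sketched with find. This is only an illustration of the selection step, not a finished script: the function names recent_messages and expired_messages are made up, and it assumes a message file's modification time is a good enough stand-in for when it arrived in the folder.

```shell
# Message files in cur/ and new/ of folder $1 that were modified within
# the last 24 hours (candidates for learning).
recent_messages() {
    find "$1/cur" "$1/new" -type f -mtime -1 2>/dev/null
}

# Message files in cur/ and new/ of folder $1 that were last modified more
# than 30 full days ago (candidates for deletion).
expired_messages() {
    find "$1/cur" "$1/new" -type f -mtime +30 2>/dev/null
}

# Example:
#   recent_messages "/path/to/user/Maildir/.Saved"
#   expired_messages "/path/to/user/Maildir/.SPAM - To be deleted"
```

The output of recent_messages could then be handed to the learning step, and expired_messages reviewed before anything is actually deleted.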
I believe this has the following benefits:
- Users can use their email client's "Junk" and "Not Junk" icons that they are so familiar with and fond of.
- Both the email client and the server are learning what is spam and ham.
- The users have 30 days to recover legitimate emails flagged as spam.
- If the user does not check their email (if they are on vacation, etc) no action is taken on their inbox emails.
Another option would be to have the "learn as ham" part of the script only look for emails in the "cur" subfolders of the "Inbox" and "Saved" folders. This would assume that the user would not mark a spam email as "Read" without clicking the "Junk" icon.
I found a somewhat similar script at http://www.scalix.com/wiki/index.php?title=HowTos/SpamAssassin
Please let me know your comments. Thanks.
Kevin