Koozali.org: home of the SME Server

Troubleshooting Help: SME Server Lockups

AaronG
« on: November 23, 2007, 04:02:06 AM »
Hi Everyone,

My SME Server (latest version) has locked up in the past week. We use the server primarily as a mail server, and a user noticed this morning that he could not send or receive email. The fix so far has been to force a restart with a power cycle.

I am concerned that there may be a sinister reason for my lock-ups. This is where I need some guidance. Where should I start looking to find the cause of my problems? I have looked through a couple of logs... the "MESSAGES" log looks like it may be useful.

Here is an extract from my server's "MESSAGES" log from today... as you can see, it sat mostly idle from 1:27 AM until 8:13 AM, when I believe a shutdown was initiated... Does this log show anything out of the ordinary?

Nov 23 01:27:14 linux sshd: refused connect from 202.105.179.9 (202.105.179.9)
Nov 23 04:29:03 linux squid[4031]: storeDirWriteCleanLogs: Starting...
Nov 23 04:29:03 linux squid[4031]:   Finished.  Wrote 1646 entries.
Nov 23 04:29:03 linux squid[4031]:   Took 0.0 seconds (52282.2 entries/sec).
Nov 23 04:29:03 linux squid[4031]: logfileRotate: /var/log/squid/store.log
Nov 23 04:29:03 linux squid[4031]: logfileRotate: /var/log/squid/access.log
Nov 23 08:13:37 linux shutdown: shutting down for system halt
Nov 23 08:13:37 linux init: Switching to runlevel: 0
Nov 23 08:13:38 linux smolt:  succeeded
Nov 23 08:13:38 linux haldaemon: haldaemon -TERM succeeded
Nov 23 08:13:38 linux messagebus: messagebus -TERM succeeded
Nov 23 08:13:40 linux atalk: papd shutdown succeeded
Nov 23 08:13:40 linux atalk:   Unregistering linux:Workstation: succeeded
Nov 23 08:13:40 linux atalk:   Unregistering linux:netatalk: succeeded
Nov 23 08:13:40 linux atalkd[4052]: done
Nov 23 08:13:40 linux atalk: atalkd shutdown succeeded
Nov 23 08:13:40 linux afpd[4513]: shutting down on signal 15
Nov 23 08:13:41 linux atalk: afpd shutdown succeeded
Nov 23 08:13:41 linux atalk: cnid_metad shutdown succeeded
Nov 23 08:13:41 linux acpid: acpid shutdown succeeded
Nov 23 08:14:54 linux syslogd 1.4.1: restart.
Nov 23 08:14:54 linux syslog: syslogd startup succeeded
Nov 23 08:14:54 linux kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 23 08:14:54 linux syslog: klogd startup succeeded
Nov 23 08:14:54 linux kernel: Inspecting /boot/System.map-2.6.9-55.0.12.ELsmp
Nov 23 08:14:54 linux kernel: Loaded 24631 symbols from /boot/System.map-2.6.9-55.0.12.ELsmp.
Nov 23 08:14:54 linux kernel: Symbols match kernel version 2.6.9.
Nov 23 08:14:54 linux kernel: No module symbols loaded - kernel modules not enabled.
Nov 23 08:14:54 linux kernel: Linux version 2.6.9-55.0.12.ELsmp (mockbuild@builder6.centos.org) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)) #1 SMP Fri Nov 2 11:19:08 EDT 2007
Nov 23 08:14:54 linux kernel: BIOS-provided physical RAM map:
Nov 23 08:14:54 linux kernel:  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 0000000000100000 - 000000001f7a0000 (usable)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 000000001f7a0000 - 000000001f7ae000 (ACPI data)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 000000001f7ae000 - 000000001f7e0000 (ACPI NVS)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 000000001f7e0000 - 000000001f800000 (reserved)
Nov 23 08:14:54 linux kernel:  BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
Nov 23 08:14:54 linux kernel: 0MB HIGHMEM available.



Where else can I look? Any advice is appreciated... oh, and yes, in case you didn't notice, I am somewhat of a Linux noob. But I do know how to get around in an SSH session etc.


Thanks in Advance
Aaron
...

raem
« Reply #1 on: November 23, 2007, 06:18:42 AM »
AaronG

Quote
I am concerned that there may be a sinister reason for my lock-ups.

It may be a case of not enough RAM & too much SPAM

You do not tell us anything about your server spec, configuration details, how much mail you receive, what settings are enabled for email filtering, spam, antivirus, RBL's, what other apps you have installed etc etc etc.
...

AaronG
« Reply #2 on: November 23, 2007, 06:43:05 AM »
Quote
You do not tell us anything about your server spec, configuration details, how much mail you receive, what settings are enabled for email filtering, spam, antivirus, RBL's, what other apps you have installed etc etc etc.

Hi Ray - I did not know whether this information was relevant yet. I thought that looking through the logs might be more useful at first.

CPU: Pentium 4, 2.8 to 3 GHz
RAM: 1 GB
Disk: 160 GB RAID 1 (hardware RAID)
Spam sensitivity: 5

I have the following Contribs installed:

sme7admin
Dungog Contribs for user-panel and vacation message etc


Note: This server has worked for the past 12 months without any hiccups. I was very surprised when it started playing up.

I hope this information is adequate.

I am also replacing the UPS to rule out any power issues. It locked up again this afternoon but was Powered Off. Sounds like it MAY be a hardware issue (?faulty Power Supply?)... Are there any other logs except for the messages log that I should be looking in?

If the problem persists I will try replacing the power supply.

Thanks again
Aaron

Note: Here are some extracts from the messages log from the latest lockup at 2pm today... the server turned itself off completely. As you can see, there was a webmail session about 30 minutes before, but I doubt it had anything to do with the freeze, given the 30-minute gap in between.


Nov 23 14:09:37 linux HORDE[12435]: [imp] Login success for <email>@addressreplaced.com.au [xxx.214.x.xxx] to {localhost:143} [on line 154 of "/home/httpd/html/horde/imp/redirect.php"]
Nov 23 14:09:38 linux slapd[3801]: conn=3 fd=7 ACCEPT from IP=127.0.0.1:33007 (IP=0.0.0.0:389)
Nov 23 14:09:38 linux slapd[3801]: conn=3 op=0 BIND dn="" method=128
Nov 23 14:09:38 linux slapd[3801]: conn=3 op=0 RESULT tag=97 err=0 text=
Nov 23 14:09:38 linux slapd[3801]: conn=3 op=1 UNBIND
Nov 23 14:09:38 linux slapd[3801]: conn=3 fd=7 closed
Nov 23 14:09:45 linux HORDE[12436]: [imp] Logout for <email>@addressreplaced.com.au [xxx.214.x.xxx] from {localhost:143} [on line 42 of "/home/httpd/html/horde/imp/login.php"]
Nov 23 14:45:37 linux syslogd 1.4.1: restart.
Nov 23 14:45:37 linux syslog: syslogd startup succeeded
Nov 23 14:45:37 linux kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 23 14:45:37 linux syslog: klogd startup succeeded
Nov 23 14:45:37 linux kernel: Inspecting /boot/System.map-2.6.9-55.0.12.ELsmp
Nov 23 14:45:37 linux kernel: Loaded 24631 symbols from /boot/System.map-2.6.9-55.0.12.ELsmp.
Nov 23 14:45:37 linux kernel: Symbols match kernel version 2.6.9.
Nov 23 14:45:37 linux kernel: No module symbols loaded - kernel modules not enabled.
Nov 23 14:45:37 linux kernel: Linux version 2.6.9-55.0.12.ELsmp (mockbuild@builder6.centos.org) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)) #1 SMP Fri Nov 2 11:19:08 EDT 2007
Nov 23 14:45:37 linux kernel: BIOS-provided physical RAM map:
« Last Edit: November 23, 2007, 06:54:28 AM by AaronG »
...

raem
« Reply #3 on: November 23, 2007, 06:52:43 AM »
AaronG

Quote
I did not know whether this information was relevant yet...

As we know nothing about your server I think it's relevant.

Some more answers to go yet...
configuration details ? ie gateway, server with router, network configuration ?
how much mail you receive ?
what settings are enabled for antivirus ?
what settings for RBL's ?


Quote
sme7admin

Can cause problems if not configured appropriately, check all default settings as these can be overzealous.


Quote
It locked up again this afternoon but was Powered Off.

Not sure what that means, it locked up when it was powered off ???


You need to see what is happening when it locks up, and resist powering it off.

top -i
htop
ps -aux

Does the graphing in sme7admin reveal anything, e.g. RAM usage etc.?
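
Since top and htop are interactive, they only help if someone is at the console while the lockup is happening. A rough sketch of an alternative, assuming a standard shell with `uptime`, `free` and `ps` available: append a timestamped snapshot to a file at regular intervals, so the last entries before a freeze survive the reboot. The function name `snapshot` and the example path are made up for illustration.

```shell
# snapshot FILE: append a timestamped record of load, memory and processes.
# The function name and the example path below are hypothetical.
snapshot() {
    {
        echo "=== $(date) ==="
        # Load averages over 1, 5 and 15 minutes.
        if command -v uptime >/dev/null 2>&1; then uptime; fi
        # Memory and swap usage in MB.
        if command -v free >/dev/null 2>&1; then free -m; fi
        # Full process list (BSD-style options, no leading dash).
        if command -v ps >/dev/null 2>&1; then ps aux; fi
        echo
    } >> "$1"
}

# Example call:
# snapshot /root/lockup-snapshots.log
```

Called from cron every minute or so, this leaves a trail you can read after the next lockup: look at the last few snapshots in the file for a process eating memory or a climbing load average.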
...

AaronG
« Reply #4 on: November 23, 2007, 07:07:14 AM »
Quote
Some more answers to go yet...
configuration details ? ie gateway, server with router, network configuration ?
how much mail you receive ?
what settings are enabled for antivirus ?
what settings for RBL's ?

It is configured as SERVER ONLY

Network config
Server Mode                       serveronly
Local IP address / subnet mask    192.168.100.10/255.255.255.0
Gateway                           192.168.100.40
Additional local networks         192.168.11.0/255.255.255.0 via 192.168.100.2
                                  192.168.100.0/255.255.255.0
DHCP server                       disabled


Here are the basic statistics of my mail server:


Completed messages: 49803
Recipients for completed messages: 58010
Total delivery attempts for completed messages: 58186
Average delivery attempts per completed message: 1.16832
Bytes in completed messages: 2.75438e+09
Bytes weighted by success: 3.17266e+09
Average message qtime (s): 20.6894

Total delivery attempts: 58189
  success: 57972
  failure: 39
  deferral: 178
Total ddelay (s): 427071.929716
Average ddelay per success (s): 7.366866
Total xdelay (s): 52653.169541
Average xdelay per delivery attempt (s): 0.904865
Time span (days): 25.9911
Average concurrency: 0.0234469
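
To put these totals in per-day terms, here is a quick sketch. The helper name `per_day` is made up; the inputs are the completed-message count, the total delivery attempts and the time span from the statistics above.

```shell
# per_day COUNT DAYS: average count per day, rounded to a whole number.
# (Hypothetical helper; awk does the floating-point division.)
per_day() {
    LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f\n", n / d }'
}

per_day 49803 25.9911   # completed messages per day
per_day 58189 25.9911   # delivery attempts per day
```

That works out to roughly 1,900 completed messages and 2,200 delivery attempts a day.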



Virus Scanning is enabled...
Sort Spam Mail into JUNK Folder is enabled...
Modify Spam Mail Subject is enabled.

I have not made any changes to RBL's


SME7Admin
Settings look OK: warnings are only sent if there are more than 30 incoming and 30 outgoing emails in a 5-minute period.


It locked up but was powered off
Unfortunately I am working remotely and the local users turned it off without my knowledge. Sorry about my confusing explanation.

This morning it seemed "locked up" as it was responding to PING etc but not sending mail etc. My colleague just powered it off to fix it.

BUT this afternoon it definitely turned itself off. ??Faulty Power Supply??



top -i
htop
ps -aux

Should I run these commands from the console BEFORE the system is turned off and report the results back here?


RAM Usage sme7admin
Nothing looks out of the ordinary here. CPU usage is fine too, with spikes only occurring during the virus scan at midnight each night.


THANKS!
« Last Edit: November 23, 2007, 07:10:55 AM by AaronG »
...

raem
« Reply #5 on: November 23, 2007, 07:23:55 AM »
AaronG

Quote
I have not made any changes to RBL's

SpamAssassin is processor and memory intensive.
Enabling RBLs will reduce load on the server, because mail from listed hosts is rejected during the SMTP conversation and never reaches SpamAssassin. On lower-powered systems, or even high-powered systems with a lot of incoming spam, the load caused by SpamAssassin can cause your system to lock up.
Check the email section of the FAQ for details & my old Howto.


Quote
This morning it seemed "locked up" as it was responding to PING etc but not sending mail etc. My colleague just powered it off to fix it.
BUT this afternoon it definitely turned itself off. ??Faulty Power Supply??

Check if there are any cron jobs related to this.
Check BIOS and network card settings that may be causing power down.
Faulty hardware is certainly a possibility.

The first symptom (responding to ping but not handling mail) suggests process overload, but the second (powering itself off) suggests faulty equipment.
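
One way to tell these two cases apart after the fact is to check the messages log for boots that were not preceded by a logged shutdown. A minimal sketch, assuming the syslog format shown in the extracts earlier in this thread and that the kernel banner ("Linux version ...") is logged once per boot; the function name `check_boots` is made up.

```shell
# check_boots FILE: for each boot recorded in a syslog "messages" file,
# report whether a shutdown was logged since the previous boot.
check_boots() {
    awk '
        # Track the current timestamp and the last one that differed from it.
        { ts = $1 " " $2 " " $3; if (ts != cur) { prev = cur; cur = ts } }
        # A shutdown request or a switch to runlevel 0 or 6 counts as clean.
        / shutdown: / || /init: Switching to runlevel: [06]/ { clean = 1 }
        # The kernel banner marks a boot.
        /kernel: Linux version/ {
            if (clean) print cur ": boot after a logged shutdown"
            else       print cur ": boot with no shutdown logged, previous entry " prev
            clean = 0
        }
    ' "$1"
}

# Example call:
# check_boots /var/log/messages
```

On the extracts posted in this thread, the 08:14:54 boot would show as following a logged shutdown, while the 14:45:37 boot would show no shutdown, with the previous entry at 14:09:45. That is consistent with a power loss or hard reset. Note that the first boot in a freshly rotated log may be misreported, since the shutdown lines would be in the older file.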

You really need to see what is going on in a lockup situation, what processes are running & so on.

Quote
top -i
htop
ps -aux

Should I run these commands from the console BEFORE the system is turned off and report the results back here?

Yes

Enable conservative RBLs and disable the system virus scan, and see if that makes any difference.
Temporarily disable spam filtering too, if you can tolerate spam for a day or two. That said, enabling RBLs will control most of the spam anyway, without loading the processor or memory.

...

AaronG
« Reply #6 on: November 23, 2007, 07:32:42 AM »
Thanks Ray. I will try as you suggested:
  • turn off SpamAssassin
  • enable conservative RBL's
  • try and troubleshoot the server before powering off by running those commands

If it powers itself off again I will also try replacing the power supply.

Thanks for your help Ray... It is really appreciated.

Thanks
Aaron

...

raem
« Reply #7 on: November 23, 2007, 07:41:51 AM »
AaronG

The system virus scan has been known to clash with backup jobs too, so (temporarily) disable the overnight system virus scan.
...