Koozali.org: home of the SME Server
Obsolete Releases => SME Server 9.x => Topic started by: didwedo on February 21, 2018, 08:31:18 AM
-
Hello everyone,
Our server crashes every Saturday morning for no apparent reason.
After rebooting, the BIOS reports an overheating condition.
Looking in the logs I found the following. Any ideas?
Thanks for your help.
see u - chris
IN THE MESSAGES LOG
Feb 17 00:38:14 sme-hp init: tty (/dev/tty1) main process (3296) killed by TERM signal
Feb 17 00:38:14 sme-hp init: tty (/dev/tty2) main process (3298) killed by TERM signal
Feb 17 00:38:14 sme-hp init: tty (/dev/tty3) main process (3300) killed by TERM signal
Feb 17 00:38:14 sme-hp init: tty (/dev/tty4) main process (3303) killed by TERM signal
Feb 17 00:38:14 sme-hp init: tty (/dev/tty5) main process (3306) killed by TERM signal
Feb 17 00:38:14 sme-hp init: tty (/dev/tty6) main process (3309) killed by TERM signal
Feb 17 00:38:15 sme-hp acpid: exiting
Feb 17 00:38:16 sme-hp console-kit-daemon[8639]: WARNING: no sender
Feb 17 00:38:16 sme-hp init: Disconnected from system bus
Feb 17 00:38:16 sme-hp kernel: Kernel logging (proc) stopped.
Feb 17 00:38:16 sme-hp rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="22261" x-info="http://www.rsyslog.com"] exiting on signal 15.
IN THE CRON LOG
Feb 17 00:00:01 sme-hp CROND[7540]: (root) CMD (/usr/lib/sa/sa1 1 1)
Feb 17 00:01:01 sme-hp CROND[7575]: (root) CMD (run-parts /etc/cron.hourly)
Feb 17 00:01:01 sme-hp run-parts(/etc/cron.hourly)[7575]: starting 0anacron
Feb 17 00:01:01 sme-hp anacron[7586]: Anacron started on 2018-02-17
Feb 17 00:01:01 sme-hp anacron[7586]: Jobs will be executed sequentially
Feb 17 00:01:01 sme-hp anacron[7586]: Normal exit (0 jobs run)
Feb 17 00:01:01 sme-hp run-parts(/etc/cron.hourly)[7588]: finished 0anacron
Feb 17 00:05:01 sme-hp CROND[7697]: (root) CMD (/etc/e-smith/events/actions/openvpn-bridge-update-crl 2>&1 /dev/null)
Feb 17 00:10:01 sme-hp CROND[7857]: (root) CMD (/usr/lib/sa/sa1 1 1)
Feb 17 00:12:01 sme-hp CROND[7911]: (root) CMD (/sbin/e-smith/smeserver-clamscan)
Feb 17 00:20:01 sme-hp CROND[8142]: (root) CMD (/usr/lib/sa/sa1 1 1)
Feb 17 00:30:01 sme-hp CROND[8422]: (root) CMD (/usr/lib/sa/sa1 1 1)
Feb 17 00:37:01 sme-hp CROND[8614]: (root) CMD (/sbin/e-smith/warnquota)
Feb 17 00:38:15 sme-hp crond[1856]: (CRON) INFO (Shutting down)
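As a rough aid for reading the two excerpts above, the sketch below computes how long each of the less frequent cron jobs had been started when the first "killed by TERM signal" line appeared at 00:38:14. The timestamps are copied from the excerpts; the year 2018 is taken from the anacron line, and the function names are only illustrative. It says nothing about how long each job actually ran, only when it was launched.

```python
from datetime import datetime

# First "killed by TERM signal" line in the messages excerpt.
SHUTDOWN = "Feb 17 00:38:14"

# Start times copied from the cron log excerpt (the sa1 runs are omitted).
JOB_STARTS = {
    "openvpn-bridge-update-crl": "Feb 17 00:05:01",
    "smeserver-clamscan": "Feb 17 00:12:01",
    "warnquota": "Feb 17 00:37:01",
}


def parse_stamp(stamp, year=2018):
    """Parse a classic syslog timestamp, which carries no year of its own."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_before_shutdown(job_starts=JOB_STARTS, shutdown=SHUTDOWN):
    """Return {job name: seconds between job start and the shutdown}."""
    end = parse_stamp(shutdown)
    return {
        job: int((end - parse_stamp(start)).total_seconds())
        for job, start in job_starts.items()
    }


if __name__ == "__main__":
    for job, secs in seconds_before_shutdown().items():
        print(f"{job}: started {secs // 60} min {secs % 60} s before shutdown")
```

For example, smeserver-clamscan was launched 26 minutes 13 seconds before the shutdown, and warnquota only 1 minute 13 seconds before it.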
-
Check the lines before Feb 17 00:38:14 in messages.
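One way to pull those lines out is sketched below. It assumes Python 3 is available on whatever machine reads the log, and that the file uses the classic syslog timestamp format seen in this thread; the function name, the 30-minute window, and the /var/log/messages path in the usage note are illustrative choices, not anything SME Server provides.

```python
from datetime import datetime, timedelta


def lines_before(log_lines, cutoff, window_minutes=30, year=2018):
    """Return the log lines stamped within window_minutes before cutoff."""
    start = cutoff - timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        # Classic syslog stamps look like "Feb 17 00:38:14" and omit the year,
        # so the year has to be supplied by the caller.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line; skip it
        if start <= stamp < cutoff:
            selected.append(line.rstrip("\n"))
    return selected
```

It could be used like this (path assumed): open /var/log/messages, pass the file object as log_lines, and use datetime(2018, 2, 17, 0, 38, 14) as the cutoff. Lines stamped exactly at the cutoff are excluded, so the shutdown messages themselves do not appear in the result.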
-
I don't think these are important, but have a look:
Feb 16 22:30:35 sme-hp dhcpd: DHCPACK on 192.168.3.71 to 64:b9:e8:cf:08:d2 (imac-freelance-1) via br0
Feb 17 00:31:24 sme-hp dhcpd: Wrote 20 leases to leases file.
Feb 17 00:31:24 sme-hp dhcpd: DHCPREQUEST for 192.168.3.71 from 64:b9:e8:cf:08:d2 (imac-freelance-1) via br0
Feb 17 00:31:24 sme-hp dhcpd: DHCPACK on 192.168.3.71 to 64:b9:e8:cf:08:d2 (imac-freelance-1) via br0
-
OK. Is AV scanning active? If so, since it's a heavy task for the CPU, it's likely what triggers your problem.
In any case, "CPU overheating" means that your CPU/server is not being cooled efficiently.
-
Interesting, I'll keep an eye on this.
Thanks for the idea...
many thx
-
AV scanning was running once a week. I have disabled it as a test. I also took the risk of disabling the BIOS option that restarts the server on overheating.
But I do not understand why the shutdown is signalled so badly and why the server does not simply restart.
Thanks for everything.
-
Virus scanning taxes the system but regardless, it should never overheat. Disabling it fixes the symptom but not the problem.
-
You're right, it's hard to know what causes this. The server is new, and so is the disk...
-
Did you ever find the root cause for this? My 9.2 system has been doing this at 00:30 every Saturday for some time now also. Can't find anything in the logs...
-
The first thing is to check what is running at the time.
Quite possibly it's clamscan, but it won't trip as soon as it starts. It will trip after a period, as the machine works harder and generates more heat. It might take a while to get that hot.
Have you got a system monitor installed to watch processes and temps? Sme9admin or system monitor?
That should show processes/load/temps.
If it is overheating:
Check the trip temps in the BIOS.
Check that all your fans are clean and working (including the PSU), that all the vents are clean, and that the heatsinks have no dust and hair build-up.
The CPU fan and heatsink are the most likely culprits.
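If no monitoring package is installed, a minimal sketch like the following can record load and temperature so there is something to read after the next crash. It rests on several assumptions: Python 3 is available (SME Server 9 does not ship it by default), and the kernel exposes temperatures under /sys/class/thermal in millidegrees Celsius, which depends on the hardware and drivers. If no thermal zones are present it only logs the load. The function names are illustrative.

```python
import glob
import os
import time


def read_temps_celsius(base="/sys/class/thermal"):
    """Return {zone name: degrees C} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as fh:
                # The sysfs thermal interface reports millidegrees Celsius.
                temps[zone] = int(fh.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone unreadable or not numeric; skip it
    return temps


def sample_line(base="/sys/class/thermal"):
    """Build one log line with a timestamp, the load averages and the temperatures."""
    load1, load5, load15 = os.getloadavg()
    temps = read_temps_celsius(base)
    temp_text = " ".join(f"{zone}={deg:.1f}C" for zone, deg in temps.items())
    return (
        f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
        f"load={load1:.2f}/{load5:.2f}/{load15:.2f} "
        f"{temp_text or 'no-thermal-zones'}"
    )


def log_samples(count, interval_seconds=60, out_path="temps.log"):
    """Append count samples to out_path, one every interval_seconds."""
    for _ in range(count):
        with open(out_path, "a") as fh:
            fh.write(sample_line() + "\n")  # file is closed, and so flushed, each time
        time.sleep(interval_seconds)
```

Started on Friday evening with something like log_samples(1440), it would cover 24 hours at one sample per minute. The last lines of the file then show how load and temperature were trending just before the machine went down.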
-
The server is in a cabinet without a back panel. The overheating occurs during the backup.
If I leave the doors open over the weekend, it's okay...
Everything seems normal with the fans and in the BIOS. The server is new, and it has always done this, just like the old one.
Thanks for your help...
-
Yup. Backup would do it too.
I'd take a serious look at your ventilation, which is obviously inadequate.
More fans required.....