Koozali.org: home of the SME Server

puzzling server slowdown

Offline wdepot

  • ****
  • 101
  • +0/-0
    • http://westerndepot.com
puzzling server slowdown
« on: May 29, 2013, 08:59:21 PM »
We are running into some puzzling server slowdowns ever since we upgraded to a new server and were wondering if it might be due to a couple of contribs we tried and then rejected. When we built the new server, along with the contributions we had been using previously (Sysmon, AWStats, phpMyAdmin, Diskusage, Hardware Info, and DAR2) we decided to try Remote User Access, Ajaxterm and SME8Admin. We noticed that the new server seemed to run our web site slower even though it should have more power than the old server (dual Xeon E5-2620s and 64Gb memory compared to a single quad core Xeon and 12Gb memory on the old one).

We did a Dar2 full backup on the new server and then a Dar2 disaster recovery restore back to the old server, did a totally new install of SME8 (reformatted the hard drives and installed) on the new server, this time without Ajaxterm (which we never could get to work) and SME8Admin (which we suspected of slowing the server down) and again did a Dar2 full backup on the old server with a Dar2 disaster recovery restore to the new server. It seems like we are still getting occasional slowdowns on our web site which uses highly modified osCommerce and thus is highly dependent on MySQL.

We've never noticed any slowdowns with connections to external web sites (our server is also our internet gateway) so we are pretty sure it is something on our server that is causing the periodic slowdown for our web site. Since I did a Dar2 backup without first doing a yum remove on the two contributions we didn't want I was wondering if an extraneous cron job might have been copied that is causing our slowdown. Does anyone have any ideas as to what might be causing such an occasional slowdown? Our web site normally works fast but then it suddenly slows to a crawl with pages that normally load in a couple of seconds taking like a minute to load.

I've noticed on the system monitor graph that CPU utilization normally runs between 1 and 2 percent with spikes up to about 4% at the times that our scheduled backups run and then I see some spikes that suddenly go up to 15 to 25% for unknown reasons. Likewise the System Load chart shows normal Load*30 between 10 and 20 or less but the spikes at the same time as the CPU spikes go up to 100 or 150. These spikes are occuring at the same time we've noticed our web site being extremely slow.

Any suggestions anyone can give as to why we are having such extreme occasional slowdowns would be greatly appreciated.

Offline Jean-Philippe Pialasse

  • *
  • 2,912
  • +11/-0
  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: puzzling server slowdown
« Reply #1 on: May 29, 2013, 09:27:05 PM »
Hello,

I would start with a :
Code: [Select]
/sbin/e-smith/audittools/newrpms
then if you suspected sme8admin
Code: [Select]
du -sh /var/log/messages*
 du -sh /var/log/httpd/*

you can keep a console open with htop , to see which processes seem to run when you feel the slowdown.

also I would check for the hardware configuration and hard drive. Sometime bios is configured for backward compatibility and hard drive controllers are set in a way their access is very very slow ( take hours to format a drive), and it can impact your performance

Offline wdepot

  • ****
  • 101
  • +0/-0
    • http://westerndepot.com
Re: puzzling server slowdown
« Reply #2 on: May 29, 2013, 10:22:50 PM »
I checked the BIOS settings before I ever installed SME8 so I don't think that is the problem. Besides, I think that would be more likely to cause an overall slowdown rather than one that occurs only occasionally.

I'm not sure what I'm looking for on the other tools so I'll report the results. For newrpms I get:
Code: [Select]
awstats.noarch    7.0-2.el5.rf
fping.x86_64    3.4-1.el5.rf
libmcrypt.x86_64    2.5.8-4.el5.centos
lshw.x86_64    2.15-1.el5.rf
lua.x86_64    5.1.4-4.el5
nut.x86_64    2.6.5-1geekery
nut-client.x86_64    2.6.5-1geekery
perl-Geo-IP.x86_64    1.38-1.el5.rf
perl-rrdtool.x86_64    1.4.7-1.el5.rf
php53-mcyrpt.x86_64    5.3.3-1.el5
php53-php-gettext.noarch    1.0.11-3.el5
phpMyAdmin3.noarch    3.5.4-1.el5
powerman.x86_64    2.3.5-2geekery
rrdtool.x86_64    1.4.7-1.el5.rf
smeserver-awstats.noarch    1.2-7.el4.sme
smeserver-dar2.noarch    0.0.1-27.el4.sme
smeserver-diskusage.noarch    0.0.1-13.el4.sme
smeserver-hwinfo.noarch    1.0-15.el4.sme
smeserver-phpmyadmin.noarch    3.5.2.2-2.el5.sme
smeserver-remoteuseraccess.noarch    1.2-33.el4.sme
smeserver-sysmon.noarch    6.2-2.el5.sme

du -sh /var/log/messages* listed about 30 files ranging in size from 8.0K for the one dated today up to a maximum of 5.9M for a file dated on May 11, 2013. I'm afraid I don't really know what I'm looking for there.

du -sh /var/log/httpd/*  lists something like 30 files named access_log.*, another 30 named admin_access_log.*, 30 named admin_error_log.*, 30 named error_log.* and one file each named awstats_lr and awstats_pos. Again I have no idea what I am looking for.

I've never used htop before but as near as I can tell I basically just displays a constant list of currently running processes with a display of the load on each processor core? I take it that when the slowdown occurs I should see a spike in the number of running processes and processor core usage and should look for processes other than httpd and mysqld?

Offline Jean-Philippe Pialasse

  • *
  • 2,912
  • +11/-0
  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: puzzling server slowdown
« Reply #3 on: May 29, 2013, 11:03:59 PM »
I checked the BIOS settings before I ever installed SME8 so I don't think that is the problem. Besides, I think that would be more likely to cause an overall slowdown rather than one that occurs only occasionally.

I'm not sure what I'm looking for on the other tools so I'll report the results. For newrpms I get:
Code: [Select]
awstats.noarch    7.0-2.el5.rf

[quote author=wdepot link=topic=49932.msg250260#msg250260 date=1369858970]
[code]
powerman.x86_64    2.3.5-2geekery
do not know what it is for and what ressources it use..

Code: [Select]
smeserver-awstats.noarch    1.2-7.el4.sme
depending on the size of your http logs it can slow your server at every cron

Code: [Select]
smeserver-diskusage.noarch    0.0.1-13.el4.sme
it can really slow down your server each time you consult the page (du  of every folder)




du -sh /var/log/messages* listed about 30 files ranging in size from 8.0K for the one dated today up to a maximum of 5.9M for a file dated on May 11, 2013. I'm afraid I don't really know what I'm looking for there.
huge log (>1 Mo) can slow your server when you are using log parser like sme8admin, awstat, or a lot of system that scan log for security purpose.

Huge log can also indicate a malfunctionning deamon that is sending a lot of crap to your logs in order to tell come to set me correctly. SO I would start to find why your message log is so big !


du -sh /var/log/httpd/*  lists something like 30 files named access_log.*, another 30 named admin_access_log.*, 30 named admin_error_log.*, 30 named error_log.* and one file each named awstats_lr and awstats_pos. Again I have no idea what I am looking for.

same here the size of your files would help, if your error logs are > 1 Mo then you have a lot of malfunctioning webapps that should be corrected and might be your problem. Reading these logs could help locating the problem
huge access log would mean alot of traffic, which mean a lot of work each time awstats runs. If this is the case you might want to run it more often.



I've never used htop before but as near as I can tell I basically just displays a constant list of currently running processes with a display of the load on each processor core? I take it that when the slowdown occurs I should see a spike in the number of running processes and processor core usage and should look for processes other than httpd and mysqld?

moreover you should see a process using a lot of resources if you sort by CPU or memory
« Last Edit: May 29, 2013, 11:05:37 PM by unnilennium »

Offline wdepot

  • ****
  • 101
  • +0/-0
    • http://westerndepot.com
Re: puzzling server slowdown
« Reply #4 on: May 29, 2013, 11:37:57 PM »
powermon.x86_64 goes with the updated version of nut I had to install to get it to work with our new APC SmartUPS. I suspect it only runs when the power goes out and puts the computer on battery power.

Even with the large log files awstats only seems to take a few seconds each time it runs from what I noticed using htop. And we've been using awstats for years and never noticed any major slowdowns before.

We rarely use the Disk Usage page in the SME Server Manager, maybe once or month at most. If it only uses a lot of time when we access that page I think we can safely say that it is not causing the problem.

Anything in particular I should filter for when looking at the log files? I looked at the largest of the httpd/access_logs (135M) and as near I can tell it was almost all GETs for various files in our web site.

Offline janet

  • *****
  • 4,812
  • +0/-0
Re: puzzling server slowdown
« Reply #5 on: May 30, 2013, 09:33:09 AM »
wdepot

Also try using
top -i
to see only the active processes (rather than just top whihc will display everything, similar to htop)

I'd be looking at email virus scanning & spamassassin filtering, disable them both (temporarily) & see if you stop having system slowdown.
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline compdoc

  • *
  • 226
  • +0/-0
Re: puzzling server slowdown
« Reply #6 on: June 11, 2013, 11:30:31 PM »
We are running into some puzzling server slowdowns ever since we upgraded to a new server and were wondering if it might be due to a couple of contribs we tried and then rejected. When we built the new server, along with the contributions we had been using previously (Sysmon, AWStats, phpMyAdmin, Diskusage, Hardware Info, and DAR2)

When you say 'full backup' and 'disaster recovery restore' using DAR2, do you mean all directories are restored? I think you're having the same problem I did recently.

Offline wdepot

  • ****
  • 101
  • +0/-0
    • http://westerndepot.com
Re: puzzling server slowdown
« Reply #7 on: June 12, 2013, 12:38:29 AM »
The full backup saves the following folders with any subfolders inside:

etc/e-smith/templates-custom
etc/e-smith/templates-user-custom
etc/group
etc/gshadow
etc/passwd
etc/samba/secrets.tdb
etc/samba/smbpasswd
etc/shadow
etc/smbpasswd
etc/ssh
etc/sudoers
home/e-smith
root

The disaster recovery restore restores all of these same folders. Any other folders would just have stock SMEServer files which you would be resintalling anyway.

Offline wdepot

  • ****
  • 101
  • +0/-0
    • http://westerndepot.com
Re: puzzling server slowdown
« Reply #8 on: June 12, 2013, 02:18:16 AM »
I finally managed to catch the server during one of its big slowdowns using htop. The odd thing is that the thing that was showing like 530 CPU% was /usr/libexec/mysqld. At the same time there were about four other instances of mysqld running at the same time that were showing something like 85 to 95 CPU%. The one that was showing 530% was the one highlighted in blue at the very top of the process list. So I guess the question now would be what might make one single MySQL query take such a large amount of CPU power?

I have the following settings added to the default SMEServer values in the my.cnf file:
Code: [Select]
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
key_buffer_size=2048M
preload_buffer_size=512M
query_cache_limit=16M
query_cache_size=128M
query_cache_type=1
query_prealloc_size=256M
read_buffer_size=2M
read_rnd_buffer_size=4M
sort_buffer_size=2M
thread_cache_size=100

The idea is to try and buffer as much as possible to make the MySQL server faster in responding. However we were noticing some slowdowns before I ever added these to the templates-custom for my.cnf.

The other thing I'm wondering if if might somehow be related to the apparent MySQL problem is the following that I found in the messages log file for today after I applied the latest SME updates this morning:
Code: [Select]
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 108: Duplicate column name 'File_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 114: Duplicate column name 'Grant_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 115: Duplicate column name 'Grant_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 116: Duplicate column name 'Grant_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 127: Duplicate column name 'ssl_type'
Jun 11 10:25:45 www mysql.init: ERROR 1061 (42000) at line 138: Duplicate key name 'Grantor'
Jun 11 10:25:45 www mysql.init: ERROR 1054 (42S22) at line 164: Unknown column 'Type' in 'columns_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 186: Duplicate column name 'type'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 196: Duplicate column name 'Show_db_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 213: Duplicate column name 'max_questions'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 223: Duplicate column name 'Create_tmp_table_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 226: Duplicate column name 'Create_tmp_table_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 320: Duplicate column name 'Create_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 323: Duplicate column name 'Create_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 326: Duplicate column name 'Create_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 332: Duplicate column name 'Show_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 335: Duplicate column name 'Show_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 338: Duplicate column name 'Show_view_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 355: Duplicate column name 'Create_routine_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 358: Duplicate column name 'Create_routine_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 361: Duplicate column name 'Create_routine_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 367: Duplicate column name 'Alter_routine_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 370: Duplicate column name 'Alter_routine_priv'
Jun 11 10:25:45 www mysql.init: ERROR 1060 (42S21) at line 373: Duplicate column name 'Alter_routine_priv'
Jun 11 10:25:46 www mysql.init: ERROR 1060 (42S21) at line 376: Duplicate column name 'Execute_priv'
Jun 11 10:25:46 www mysql.init: ERROR 1060 (42S21) at line 379: Duplicate column name 'Execute_priv'
Jun 11 10:25:46 www mysql.init: ERROR 1060 (42S21) at line 392: Duplicate column name 'max_user_connections'
Jun 11 10:25:46 www mysql.init: ERROR 1060 (42S21) at line 401: Duplicate column name 'Create_user_priv'
Jun 11 10:25:46 www mysql.init: ERROR 1060 (42S21) at line 423: Duplicate column name 'Routine_type'
Jun 11 10:25:46 www mysql.init: @hadGrantPriv:=1
Jun 11 10:25:46 www mysql.init: 1
Jun 11 10:25:46 www last message repeated 4 times
Jun 11 10:25:46 www mysql.init: @hadShowDbPriv:=1
Jun 11 10:25:46 www mysql.init: 1
Jun 11 10:25:46 www last message repeated 4 times
Jun 11 10:25:46 www mysql.init: @hadCreateViewPriv:=1
Jun 11 10:25:46 www mysql.init: 1
Jun 11 10:25:46 www last message repeated 4 times
Jun 11 10:25:46 www mysql.init: @hadCreateRoutinePriv:=1
Jun 11 10:25:46 www mysql.init: 1
Jun 11 10:25:46 www last message repeated 4 times
Jun 11 10:25:46 www mysql.init: @hadCreateUserPriv:=1
Jun 11 10:25:46 www mysql.init: 1
Jun 11 10:25:46 www last message repeated 4 times
Jun 11 10:25:46 www mysql.init: waiting for mysqld to restart
Is there any possibility that this might somehow be related to the slowdown?