Koozali.org: home of the SME Server

System hangs when CPU is 100% used for a couple of hours

tropicalview
« on: August 27, 2008, 01:57:22 PM »
Hi,

I noticed that when the CPU is fully used for a couple of hours (for example by the dar2 backup process), the system hangs completely.
I noticed this on 3 different hardware configs:

test machine: dual-core 2 GHz, 512 MB RAM (a Dell business desktop PC)
production machine: Core 2 Duo 3 GHz, 4 GB RAM (Dell PowerEdge 860)
second production machine: Core 2 Duo 3 GHz, 2 GB RAM (Dell PowerEdge 860)

Does anybody know why? Where should I start looking in the logs?

Is there a way to limit the dar2 process to 75% or 50% CPU?
The sky is not the limit, But when I reach the sky, for sure I will not try to go to the limit.... (donated $25,- upto now)

cactus
« Reply #1 on: August 27, 2008, 09:07:05 PM »
On Linux, 100% CPU usage is not necessarily a bad thing; processors are there to be used when jobs need to be done. A lock-up, however, is not normal.
If your system locks up during a backup using dar, we need to know a little more about your setup: for instance, are you backing up to a network share, or writing to an external disk?

You can start by running the top command in an SME Server shell during a backup job: look at processor and memory use, see which processes are using how much processor power, and check whether they are 'niced'. It could be that the priority of the dar backup is set too high, which makes it too greedy for CPU.
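A small sketch for reading the NI (nice) and %CPU columns out of top's text output, assuming the default column layout of the top version used in this thread (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND). `parse_top_processes` and `Proc` are illustrative names, not part of SME Server. In top, NI 0 means normal priority, a positive NI means the process has been niced to a lower priority, and a negative NI means a higher priority.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    user: str
    nice: int
    cpu: float
    command: str


def parse_top_processes(text: str) -> List[Proc]:
    """Parse the process rows of `top` output in the default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND."""
    procs = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["PID", "USER"]:
            in_table = True  # header row found; process rows follow
            continue
        if not in_table or len(fields) < 12:
            continue  # summary lines, blank lines or truncated rows
        procs.append(Proc(
            pid=int(fields[0]),
            user=fields[1],
            nice=int(fields[3]),
            cpu=float(fields[8]),
            command=" ".join(fields[11:]),
        ))
    return procs


# Two rows copied from the top snapshot posted in this thread.
sample = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16215 root      25   0 16940 9732 1640 R   96  0.5   0:34.66 dar
 3641 root       5 -10  374m 296m 286m S    1 14.6   8:17.38 vmware-vmx
"""

for p in parse_top_processes(sample):
    # NI 0 = normal priority; positive NI = niced (lower); negative NI = higher.
    print(f"{p.command:12} NI={p.nice:>3} %CPU={p.cpu:>5.1f}")
```

Batch mode (`top -b -n 1`) prints a single snapshot as plain text, which is a convenient input for a parser like this.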
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

tropicalview
« Reply #2 on: August 27, 2008, 09:20:59 PM »
Hi Cactus,

I back up to a network share.
Right now I do not have the chance to run top, but I will do so tonight and post the results.

But is there a way to give dar a very low priority, or even limit it to a maximum of 50% CPU use?
That would also be fine performance-wise (because of time zones, people are also connected during the night).

Thanks for your reply.

By the way, the terminal itself doesn't react when the system hangs (the screen stays black on keyboard input).

Perhaps some other useful information:
I have a virtual machine running on that server.
When it is powered down, I have a greater chance of completing the backup.
But even with the virtual machine down, I got this error at least once.

cactus
« Reply #3 on: August 27, 2008, 09:47:55 PM »
Quote
I back up to a network share.
Right now I do not have the chance to run top, but I will do so tonight and post the results.

I suspect this to be the problem, as all your bandwidth might be used by the backup, although you also state that the terminal does not react.

Quote
But is there a way to give dar a very low priority, or even limit it to a maximum of 50% CPU use?
That would also be fine performance-wise (because of time zones, people are also connected during the night).

AFAIK the backup tasks are niced to a lower priority than system processes, but we will have to see that from the output of your top command.
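The usual way to give a job "a very low priority" is niceness. A sketch, assuming Linux and Python 3.7+, of starting a job with its niceness raised to the maximum; `run_niced` is a hypothetical helper and the command below is a placeholder, not the backup command SME Server actually runs. Niceness only changes priority relative to other runnable processes; it does not cap a job at 50% CPU, so on an otherwise idle machine a niced job still uses a whole core. The shell equivalent is `nice -n 19 command`.

```python
import os
import subprocess
import sys
from typing import List


def run_niced(cmd: List[str], increment: int = 19) -> subprocess.CompletedProcess:
    """Run cmd with its niceness raised by `increment`.

    os.nice() runs in the child after fork and before exec, so only the child
    (and anything it starts) gets the lower priority; the parent is unchanged.
    On Linux the niceness is capped at 19, the lowest priority.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


# Placeholder job: a child Python process that reports its own niceness.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("child niceness:", result.stdout.strip())
```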

Quote
By the way, the terminal itself doesn't react when the system hangs (the screen stays black on keyboard input).

Perhaps some other useful information:
I have a virtual machine running on that server.
When it is powered down, I have a greater chance of completing the backup.
But even with the virtual machine down, I got this error at least once.

What are the specs of your machine? Perhaps it is too low-end and not up to the task. Do you use compression during the backup? If this lockup is the case, I suggest you start top before your backup starts, as you might not be able to do so after it has started. :-)

tropicalview
« Reply #4 on: August 27, 2008, 09:55:06 PM »

Hi Cactus,

Thanks for your help again.

Quote
What are the specs of your machine? Perhaps it is too low-end and not up to the task. Do you use compression during the backup?

The machine is a Dell PowerEdge 860 with mirrored SAS drives, a Core 2 Duo 3 GHz Intel processor and 2 or 4 GB of memory (on both machines, 1 GB is used by VMware).
I think that should be powerful enough.
I tried with and without compression; without it I have a better chance of success, but not always!

Quote
If this lockup is the case, I suggest you start top before your backup starts, as you might not be able to do so after it has started.

The lockup happens somewhere around 2 or 3 hours after the start of the backup.


tropicalview
« Reply #5 on: August 28, 2008, 03:06:50 AM »
I hope this will give some clue about what is happening.
The strange thing is that nobody else has encountered this problem yet, and I did on all 3 servers.




Code:
top - 20:53:46 up 23:08,  1 user,  load average: 1.34, 0.52, 0.27
Tasks: 218 total,   3 running, 215 sleeping,   0 stopped,   0 zombie
Cpu(s): 46.6% us,  4.2% sy,  0.0% ni, 47.4% id,  1.7% wa,  0.2% hi,  0.0% si
Mem:   2074692k total,  2052972k used,    21720k free,   192256k buffers
Swap:  2031608k total,      128k used,  2031480k free,  1175576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16215 root      25   0 16940 9732 1640 R   96  0.5   0:34.66 dar
   56 root      15   0     0    0    0 D    4  0.0   0:17.23 pdflush
 3641 root       5 -10  374m 296m 286m S    1 14.6   8:17.38 vmware-vmx
16166 root      15   0     0    0    0 S    1  0.0   0:00.21 cifsd
   57 root      15   0     0    0    0 S    0  0.0   0:09.35 kswapd0
 4579 www       15   0 33696 4668 2488 S    0  0.2   0:15.18 httpd.vmware
16120 root      16   0  3740 1096  788 R    0  0.1   0:00.39 top
    1 root      16   0  2444  616  528 S    0  0.0   0:00.64 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.05 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.80 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.04 migration/1
    5 root      34  19     0    0    0 S    0  0.0   0:01.00 ksoftirqd/1
    6 root       5 -10     0    0    0 S    0  0.0   0:00.00 events/0
    7 root       5 -10     0    0    0 S    0  0.0   0:00.00 events/1
    8 root       5 -10     0    0    0 S    0  0.0   0:00.00 khelper
    9 root      15 -10     0    0    0 S    0  0.0   0:00.00 kacpid
   36 root       5 -10     0    0    0 S    0  0.0   0:00.00 kblockd/0
   37 root       5 -10     0    0    0 S    0  0.0   0:00.00 kblockd/1
   38 root      15   0     0    0    0 S    0  0.0   0:00.05 khubd
   55 root      15   0     0    0    0 S    0  0.0   0:00.00 pdflush
   58 root      12 -10     0    0    0 S    0  0.0   0:00.00 aio/0
   59 root      12 -10     0    0    0 S    0  0.0   0:00.00 aio/1
  203 root      25   0     0    0    0 S    0  0.0   0:00.00 kseriod
  441 root      16   0     0    0    0 S    0  0.0   0:00.00 scsi_eh_0
  454 root       7 -10     0    0    0 S    0  0.0   0:00.00 ata/0
  455 root       7 -10     0    0    0 S    0  0.0   0:00.00 ata/1
  456 root       7 -10     0    0    0 S    0  0.0   0:00.00 ata_aux
  486 root      15   0     0    0    0 S    0  0.0   0:00.00 md1_raid1
  488 root      15   0     0    0    0 S    0  0.0   0:00.18 md2_raid1
  493 root      15   0     0    0    0 S    0  0.0   0:15.78 kjournald
 1035 root       6 -10     0    0    0 S    0  0.0   0:00.00 kauditd
 1656 root       6 -10  1900  448  360 S    0  0.0   0:00.00 udevd
 2072 root      15   0     0    0    0 S    0  0.0   0:00.00 kjournald
 2376 root      18   0  2012  416  360 S    0  0.0   0:00.00 mingetty
 2378 root      18   0  2964  416  360 S    0  0.0   0:00.00 mingetty
 2379 root      16   0    72   28   12 S    0  0.0   0:00.00 runsvdir
 2605 root      16   0    60   28   16 S    0  0.0   0:00.00 runsv
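The Mem line above shows only 21720k free, but on Linux much of the "used" figure is buffers and page cache that the kernel can reclaim, so free + buffers + cached is a common rough estimate of memory still available to programs. The sketch below assumes the summary-line format of this top version; `parse_kib_fields` is an illustrative name. It only describes the moment of this snapshot, early in the dar run (dar had used about 35 seconds of CPU time), not the state hours later when the lockup happened.

```python
import re
from typing import Dict


def parse_kib_fields(line: str) -> Dict[str, int]:
    """Turn a top summary line such as 'Mem:   2074692k total,  2052972k used, ...'
    into {'total': 2074692, 'used': 2052972, ...} (values in KiB)."""
    return {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", line)}


mem = parse_kib_fields("Mem:   2074692k total,  2052972k used,    21720k free,   192256k buffers")
swap = parse_kib_fields("Swap:  2031608k total,      128k used,  2031480k free,  1175576k cached")

# This top version prints the page-cache total at the end of the Swap line.
approx_available = mem["free"] + mem["buffers"] + swap["cached"]
print(f"roughly available: {approx_available} KiB of {mem['total']} KiB")
print(f"swap in use: {swap['used']} KiB")
```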

janet
« Reply #6 on: August 28, 2008, 04:18:25 AM »
tropicalview

Take a look at whether any other cron jobs run during the period dar is running, and see if they relate to the lockup time.

Also try (temporarily at least) disabling spam filtering, mail virus scanning and virus scanning of your system, and then see if dar completes OK.

Do you have RBL rejection enabled to reduce the spam load on your systems?

You may be blaming dar when dar is not really the problem.

Look at the messages log file (and other log files, for that matter) and see what was happening prior to the lockup.
« Last Edit: August 28, 2008, 04:53:59 AM by mary »
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

p-jones
« Reply #7 on: August 28, 2008, 04:47:05 AM »
Tropicalview,

You state this is happening on 3 machines. Can one also assume that the data you are backing up is different on each machine?

I am currently struggling with a restoration issue with that software, which I have narrowed down to a specific folder: either a corrupt folder, a file within that folder, or a file type the software doesn't like. I am guessing that if it can happen during restoration, it could also be happening during backup.

Also, if your system has locked up totally, I assume you have had to force a hard reboot. Have you done a filesystem integrity check as prompted during startup?

Don't forget the bug tracker!
« Last Edit: August 28, 2008, 04:48:50 AM by p-jones »
...

tropicalview
« Reply #8 on: August 28, 2008, 01:24:46 PM »
Hi p-jones,

Quote
You state this is happening on 3 machines. Can one also assume that the data you are backing up is different on each machine?

The test server (small config) has the same data as the biggest production server.
The second production server has completely different data.

Quote
Either a corrupt folder, a file within that folder, or a file type the software doesn't like.

Could it be the virtual machine files? Those are 2 GB.
Perhaps I have to change the size of the dar archive files to more than 2 GB.


Quote
Have you done a filesystem integrity check as prompted during startup?

I guess so. I do not normally have a terminal attached to the server.
When the server starts, does it do this check automatically?

How can I do this check manually?

Quote
Don't forget the bug tracker!

I want to be sure it's not a configuration issue first. A lot of people use this; how could I be the first to notice a fatal error in it?

janet
« Reply #9 on: August 28, 2008, 01:32:20 PM »
tropicalview

Did you read my suggestions in Reply #6 ?

tropicalview
« Reply #10 on: August 28, 2008, 03:07:53 PM »
Sorry Mary,

I overlooked your reply.

Indeed, I had already disabled the antivirus scan from the start.
As far as I know, sme7admin is also working during the dar backup (before, I only had it on the small test machine and got problems with the graphs; see forum post: http://forums.contribs.org/index.php?topic=41767.0)

This is not the case on the production servers; there it seems I have enough capacity to also have the graphs processed.

Perhaps I have to stop that? And how can I check which other jobs are running while the backup is running? (Sorry, but SME Server is my first step outside the stupid Microsoft products.)



See the messages log here: http://tropicalview.net/smeserver/messages.txt

Please note the fax messages: I tried to install HylaFAX but uninstalled it later.
The other machine that has the same error doesn't have HylaFAX installed.

In this log you can look at the 27th and the 26th.
On both dates the backup started at 19:00.

Code:
Aug 27 19:00:03 admin-svr esmith::event[12423]: Processing event: pre-backup 
Aug 27 19:00:03 admin-svr esmith::event[12423]: Running event handler: /etc/e-smith/events/pre-backup/S10mysql-delete-dumped-tables
Aug 27 19:00:03 admin-svr esmith::event[12423]: S10mysql-delete-dumped-tables=action|Event|pre-backup|Action|S10mysql-delete-dumped-tables|Start|1219878003 532323|End|1219878003 558427|Elapsed|0.026104
Aug 27 19:00:03 admin-svr esmith::event[12423]: Running event handler: /etc/e-smith/events/pre-backup/S20mysql-dump-tables
Aug 27 19:00:03 admin-svr esmith::event[12423]: S20mysql-dump-tables=action|Event|pre-backup|Action|S20mysql-dump-tables|Start|1219878003 558799|End|1219878003 931084|Elapsed|0.372285
Aug 27 19:00:03 admin-svr esmith::event[12423]: Running event handler: /etc/e-smith/events/pre-backup/S50rewind-tape
Aug 27 19:00:04 admin-svr esmith::event[12423]: S50rewind-tape=action|Event|pre-backup|Action|S50rewind-tape|Start|1219878003 931419|End|1219878004 42339|Elapsed|0.11092
Aug 27 19:03:15 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 27 19:03:15 admin-svr last message repeated 9 times
Aug 27 19:03:15 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 28 07:41:26 admin-svr syslogd 1.4.1: restart.
Aug 28 07:41:26 admin-svr syslog: syslogd startup succeeded
Aug 28 07:41:26 admin-svr kernel: klogd 1.4.1, log source = /proc/kmsg started.
« Last Edit: August 28, 2008, 03:10:59 PM by tropicalview »

tropicalview
« Reply #11 on: August 29, 2008, 08:18:11 PM »
Does anybody have an idea what is happening on my machine?
Does anyone want direct access to the test machine?


CharlieBrady
« Reply #12 on: August 30, 2008, 01:06:20 AM »
Quote
Does anybody have an idea what is happening on my machine?

If they are really hanging, then there is either a kernel software bug or a hardware problem, e.g. CPU overheating or an underspecified power supply. But you need to distinguish between the system just being very slow and the system having hung.

If you can log on and run top then your system hasn't hung. If top continues to update, then the system isn't hung.
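One way to tell "very slow" from "hung" after the fact is a heartbeat: a small process that appends a timestamp and the load average to a file every few seconds and forces each line to disk. If the file keeps growing through the backup window, the machine was alive; if it stops, the last line shows roughly when. This is a sketch assuming Linux (`/proc/loadavg`); `heartbeat` is an illustrative name, and the log path and interval are arbitrary choices, not SME Server settings.

```python
import os
import time
from typing import Optional


def read_loadavg(path: str = "/proc/loadavg") -> str:
    """Return the 1-, 5- and 15-minute load averages as a string."""
    with open(path) as f:
        return " ".join(f.read().split()[:3])


def heartbeat(log_path: str, interval: float = 10.0, beats: Optional[int] = None,
              loadavg_path: str = "/proc/loadavg") -> None:
    """Append a 'date time load1 load5 load15' line to log_path every `interval` seconds.

    flush() + fsync() after each line asks the kernel to put it on disk, so the
    last line written before a hard reset should survive. beats=None loops forever.
    """
    count = 0
    with open(log_path, "a") as log:
        while beats is None or count < beats:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {read_loadavg(loadavg_path)}\n")
            log.flush()
            os.fsync(log.fileno())
            count += 1
            if beats is None or count < beats:
                time.sleep(interval)
```

Started in its own shell before the backup, for example as `heartbeat("/root/heartbeat.log")` (path arbitrary), it keeps writing while the system is merely slow; a file that stops growing points to a real hang.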

If your system is multi-CPU, then (a single instance of) dar won't be using 100% of the CPU.
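The arithmetic behind this, using the top snapshot posted above: in top's default mode, a process's %CPU is measured against one CPU, while the Cpu(s) summary line is averaged over all CPUs. On a dual-core machine, dar at 96% of one core is therefore about 48% of total capacity, roughly in line with the 46.6% us in the summary line, and the 47.4% idle suggests the other core was largely free. A minimal helper (the name is illustrative):

```python
def share_of_machine(process_cpu_percent: float, n_cpus: int) -> float:
    """Convert top's per-process %CPU (100 = one CPU fully busy) into a share
    of the whole machine (100 = every CPU fully busy)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return process_cpu_percent / n_cpus


# dar at 96% in the snapshot above, on a dual-core machine:
print(f"dar uses about {share_of_machine(96, 2):.0f}% of total CPU capacity")
```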

tropicalview
« Reply #13 on: August 30, 2008, 05:18:54 AM »
Hi CharlieBrady,

The systems really do hang; even the terminal (keyboard/screen connected) doesn't react at all.
The top output was taken before the system hung.

I doubt it is a hardware problem:
all 3 machines are Dell PCs about a year old.
The 2 production machines are PowerEdge servers with plenty of capacity; the test machine could perhaps be short on memory (512 MB).

I don't think they are overheating either, because they are in an air-conditioned room, the PowerEdge fans can move a lot of air, and if they did overheat the system would power down....

Also, if it's a hardware issue, I think it's strange that all 3 of them have the same problem.

janet
« Reply #14 on: August 30, 2008, 06:20:09 AM »
tropicalview

Quote
Does anybody have an idea what is happening on my machine?

I gave you some suggestions, but you totally overlooked my post.
When I drew it to your attention again, you appeared to answer/comment on only one of the points I raised.
If you don't follow/answer all the suggestions (clearly), then there is not much point in us/me helping you!

My suggestions are really a means of eliminating possibilities.
If the problem goes away, then look at what you removed as the likely source of the problem. If the problem remains, then the things I suggested are not causing it. You would then need to get more aggressive in your troubleshooting.

I suggested:
> Take a look to see if any other cron jobs run during the period dar is running,
> and see if they relate to the lockup time.

look in
/etc/cron.d
/etc/cron.daily
/etc/cron.hourly
/etc/cron.weekly
/etc/cron.monthly

Also look in
/var/log/cron
to see what is actually happening

There could also be crontab entries that don't appear in the locations mentioned above. Read up on and search for crontab.
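A sketch of the /var/log/cron check: list what cron logged between the start of the backup (19:00 on Aug 27, per the messages log above) and the reboot (the syslogd restart at 07:41:26 on Aug 28). It assumes the usual "Mon DD HH:MM:SS host crond[pid]: (user) CMD (command)" line layout and that the year is known, since syslog does not record it; the sample lines are invented for illustration and the function names are hypothetical. A cron entry logged well after the backup start would also show the machine was still running at that time.

```python
from datetime import datetime
from typing import Iterable, Iterator


def syslog_time(line: str, year: int) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog-style line.

    Syslog does not record the year, so the caller supplies it
    (windows that cross New Year are not handled).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def entries_between(lines: Iterable[str], start: datetime, end: datetime,
                    year: int) -> Iterator[str]:
    """Yield the non-blank lines whose timestamp lies within [start, end]."""
    for line in lines:
        if line.strip() and start <= syslog_time(line, year) <= end:
            yield line


# Hypothetical /var/log/cron lines, made up for illustration.
cron_log = [
    "Aug 27 18:01:01 admin-svr crond[4001]: (root) CMD (run-parts /etc/cron.hourly)",
    "Aug 27 19:01:01 admin-svr crond[4102]: (root) CMD (run-parts /etc/cron.hourly)",
    "Aug 27 20:01:01 admin-svr crond[4150]: (root) CMD (run-parts /etc/cron.hourly)",
]

backup_start = datetime(2008, 8, 27, 19, 0, 0)  # backup start from the messages log
reboot = datetime(2008, 8, 28, 7, 41, 26)       # syslogd restart after the hang
for line in entries_between(cron_log, backup_start, reboot, year=2008):
    print(line)
```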


> Also try (temporarily at least) disabling spam filtering,

You did not answer whether spam filtering is enabled or not.


> and also mail virus scanning,

You did not mention whether virus scanning of incoming mail is enabled or not.


> and virus scanning of your system,

You did say you disabled virus scanning, and I assume you mean only the daily/weekly scanning of the system and data files.


> Do you have RBL rejection enabled to reduce spam load on your systems?

You do not answer this; find out with

config show qpsmtpd
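The output of `config show qpsmtpd` is a short key/value listing. A sketch of reading it, assuming a "key=type" first line followed by indented "Property=value" lines; the sample text is hypothetical, and the property names DNSBL and RHSBL are an assumption about how the RBL settings are named, so check them against the real output. `parse_config_show` is an illustrative name.

```python
from typing import Dict, Tuple


def parse_config_show(text: str) -> Tuple[str, str, Dict[str, str]]:
    """Parse `config show KEY`-style output: a 'key=type' line followed by
    indented 'Property=value' lines. Returns (key, type, properties)."""
    lines = [line for line in text.splitlines() if line.strip()]
    key, _, rec_type = lines[0].partition("=")
    props = {}
    for line in lines[1:]:
        name, _, value = line.strip().partition("=")
        props[name] = value
    return key, rec_type, props


# Hypothetical output; the real property names and values may differ.
sample = """\
qpsmtpd=service
    DNSBL=disabled
    RHSBL=disabled
    status=enabled
"""

key, rec_type, props = parse_config_show(sample)
for name in ("DNSBL", "RHSBL"):
    print(f"{key}: {name} = {props.get(name, '(not set)')}")
```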


Email virus scanning and spam filtering can put a big load on your system, depending on the type of mail you receive, etc.


> Look at the messages log file (and other log files for that matter) and see what was happening prior to the lockup.

You provide a copy of the messages log file but expect us to go through it with a fine-tooth comb. That's your job.
At a quick look, I see plenty of errors regarding the fax system; perhaps you have left behind various templates that should have been removed. I think you should clean up the old fax install.

I was suggesting you look for the point when your system supposedly hangs: does that show in the log, i.e. no more entries after the "hang" time? Then look prior to that time and see what else has been happening.
Also look in all the other log files around and prior to the time of the "hang"; they are accessible from the server manager.

See below excerpts from the messages log, which certainly show fax activity prior to the hang.


It looks like a "hang" here:

Aug 23 00:09:52 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:09:52 admin-svr last message repeated 9 times
Aug 23 00:09:52 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 00:14:53 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:14:53 admin-svr last message repeated 9 times
Aug 23 00:14:53 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 00:19:54 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:19:54 admin-svr last message repeated 9 times
Aug 23 00:19:54 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 09:09:59 admin-svr syslogd 1.4.1: restart.
Aug 23 09:09:59 admin-svr syslog: syslogd startup succeeded
Aug 23 09:09:59 admin-svr kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 23 09:09:59 admin-svr syslog: klogd startup succeeded


and here:

Aug 27 19:03:15 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 27 19:03:15 admin-svr last message repeated 9 times
Aug 27 19:03:15 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 28 07:41:26 admin-svr syslogd 1.4.1: restart.
Aug 28 07:41:26 admin-svr syslog: syslogd startup succeeded
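The "no more entries after the hang time" check can be automated: find each syslogd restart line in the messages log and report the last entry before it, plus how long the log was silent. A sketch assuming the syslog line format in these excerpts and a known year (syslog omits it); the function names are illustrative. A long silence does not prove the hang started at the last entry, only that nothing was logged after it.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def syslog_time(line: str, year: int) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix; syslog omits the year, so pass it in."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def silence_before_restarts(lines: Iterable[str], year: int) -> List[Tuple[str, timedelta]]:
    """For each 'syslogd ... restart' line, return the last line logged before it
    and how long the log was silent in between."""
    results = []
    previous = None
    for line in lines:
        if not line.strip():
            continue
        if "syslogd" in line and "restart" in line and previous is not None:
            results.append((previous, syslog_time(line, year) - syslog_time(previous, year)))
        previous = line
    return results


# Lines from the Aug 27/28 excerpt above.
messages = """\
Aug 27 19:03:15 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 28 07:41:26 admin-svr syslogd 1.4.1: restart.
Aug 28 07:41:26 admin-svr syslog: syslogd startup succeeded
""".splitlines()

for last_line, silence in silence_before_restarts(messages, year=2008):
    print(f"log silent for {silence} before the restart; last entry:\n  {last_line}")
```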





You should remove sme7admin; there have been many reports of inappropriate settings causing trouble.
If you totally uninstall it, then sme7admin cannot be causing the problem.

According to the wiki, do the following (although yum remove should be used carefully, because it can remove more than you really want; rpm -e packagename is safer until yum remove gets fixed [at least for SME 7]):

yum remove sysstat
yum remove hddtemp
yum remove rrdtool


To monitor what is going on use

top
top -i
htop

You also do not say what version of SME you are running, whether the servers are fully up to date, and what other contribs are installed and/or what other modifications you have made to the servers.

If you stop dar from running, does the system still hang ?


cactus
« Reply #15 on: August 30, 2008, 09:47:05 AM »
Quote
See below excerpts from the messages log, which certainly show fax activity prior to the hang.

It looks like a "hang" here:

Aug 23 00:09:52 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:09:52 admin-svr last message repeated 9 times
Aug 23 00:09:52 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 00:14:53 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:14:53 admin-svr last message repeated 9 times
Aug 23 00:14:53 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 00:19:54 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 23 00:19:54 admin-svr last message repeated 9 times
Aug 23 00:19:54 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 23 09:09:59 admin-svr syslogd 1.4.1: restart.
Aug 23 09:09:59 admin-svr syslog: syslogd startup succeeded
Aug 23 09:09:59 admin-svr kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 23 09:09:59 admin-svr syslog: klogd startup succeeded

and here:

Aug 27 19:03:15 admin-svr init: cannot execute "/usr/sbin/faxgetty"
Aug 27 19:03:15 admin-svr last message repeated 9 times
Aug 27 19:03:15 admin-svr init: Id "fax" respawning too fast: disabled for 5 minutes
Aug 28 07:41:26 admin-svr syslogd 1.4.1: restart.
Aug 28 07:41:26 admin-svr syslog: syslogd startup succeeded

You are right in stating that the lock-up is likely here...

Quote
You should remove sme7admin; there have been many reports of inappropriate settings causing trouble.
If you totally uninstall it, then sme7admin cannot be causing the problem.

According to the wiki, do the following (although yum remove should be used carefully, because it can remove more than you really want; rpm -e packagename is safer until yum remove gets fixed [at least for SME 7]):

yum remove sysstat
yum remove hddtemp
yum remove rrdtool

... but although sme7admin is dodgy, nothing here seems to indicate that it is the problem. I would start by removing HylaFAX on this machine, as that is what is generating the noise and perhaps the problems as well.

janet
« Reply #16 on: August 30, 2008, 12:06:43 PM »
cactus

Quote
I would start by removing HylaFAX on this machine, as that is what is generating the noise and perhaps the problems as well.

I'm glad you agree, as I did suggest that also.

Quote
At a quick look, I see plenty of errors regarding the fax system; perhaps you have left behind various templates that should have been removed. I think you should clean up the old fax install.

I thought tropicalview had said that HylaFAX had already been removed, but obviously a lot of "dregs" were left behind due to a less than thorough removal.

zatnikatel
« Reply #17 on: August 30, 2008, 06:07:28 PM »
No one has talked about the networking. Maybe there is a problem with the switch, seeing that 3 SME servers are doing it, or a bad cable somewhere that is causing it; sometimes it can be a simple thing. Do all the SME servers connect to the same switch? Try copying a large 4 GB file over the network to the share the dar files go to, and see if it hangs. Maybe a faulty network card cannot handle the high network bandwidth and hangs. Try an external hard disk, if you have one, to do the dar backup on, and see if you still have the same problem. Not everything has to be super complex to cause a problem; as an old boss of mine used to say, look for the simple things first.
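The 4 GB copy test can be done with a plain file copy, but a small script gives timestamps along the way. A sketch, assuming the network share is already mounted on the server; `write_test` is an illustrative name, and writing zeros only exercises the network and the share, not dar itself.

```python
import os
import time

CHUNK = 1024 * 1024  # 1 MiB


def write_test(path: str, total_mib: int, report_every_mib: int = 256) -> float:
    """Write total_mib MiB of zero bytes to path and return the rate in MiB/s.

    A timestamped progress line is printed every report_every_mib MiB, so if
    the machine locks up, the last line on screen shows roughly how far it got.
    """
    block = bytes(CHUNK)  # CHUNK zero bytes
    start = time.monotonic()
    with open(path, "wb") as f:
        for written in range(1, total_mib + 1):
            f.write(block)
            if written % report_every_mib == 0:
                print(time.strftime("%H:%M:%S"), f"{written} MiB written", flush=True)
        f.flush()
        os.fsync(f.fileno())  # push buffered data out before stopping the clock
    elapsed = max(time.monotonic() - start, 1e-9)
    return total_mib / elapsed
```

For the 4 GB test this would be something like `write_test("/mnt/backupshare/testfile.bin", 4096)`, where the mount point is hypothetical.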

cactus
« Reply #18 on: August 30, 2008, 06:34:13 PM »
Quote
one thing on has talked about the networking maybe the is a problem with the switch seeing that 3 sme server are doing it or a bad cable some were that is causing it some times it can be a simple thing that causes it do all the sme server connect to the same switch try a copy a large 4 gb file over the network and see if it hangs the network share the dar files go to maybe faulty bad network card etc can not handle the high network band width and hands try an etc haard disk if you have one to do the dar back up on and see if you still have the same problem not every thing has to be super complex to cause a problem as an old boss of mine use to say look for the simple things first
Next time, try to use punctuation, as this should not all have been one sentence. It makes reading and understanding your reply a lot easier.

zatnikatel
« Reply #19 on: August 30, 2008, 09:09:05 PM »
Sorry, I will next time; I was tired when I was writing it.

tropicalview
« Reply #20 on: August 31, 2008, 05:21:38 AM »
Hi All,

Thanks for the many replies.

And Mary, sorry if I did not understand all of your instructions and did not carry them all out.
I still do not have that much experience with Linux.

I checked the cron folders, but I'm afraid to edit anything there.
And I don't think it's a good idea to copy all the text of all the files and post it on the forum.

I just made a compressed file of the whole /etc folder and of the /var/log folder. I hope you will be able to take the time to look into those files:

http://www.tropicalview.net/smeserver/etc.tgz
http://www.tropicalview.net/smeserver/log.tgz

About the HylaFAX issue: I installed HylaFAX on the test server, and with a dar backup/restore to the production server (after a crash) it was copied along.

On the test server I installed it to test it and to decide whether to buy a modem for it,
but that didn't work out so well, and I uninstalled it following the instructions on the contrib page.

Some leftovers remained on the system, but as I already mentioned, I'm very careful about what I do on the Linux command line (certainly after the last crash, which was my own error :?).

After all, I do not think the HylaFAX leftovers are causing this problem, because my second server has the same problem and there I do not have HylaFAX at all.

As far as I can pinpoint it, it could be more related to VMware.
When VMware is active, the risk of hanging is bigger than when I shut down the VMware machines.

I hope the logs and configs I just sent will shed some more light on this error; if not, please let me know what I need to do to further pinpoint the problem.


CharlieBrady
« Reply #21 on: August 31, 2008, 05:40:38 AM »
Quote
As far as I can pinpoint it, it could be more related to VMware.

Do not install or use VMware, then. VMware is not part of SME Server.

CharlieBrady
« Reply #22 on: August 31, 2008, 05:44:26 AM »
Quote
I just made a compressed file of the whole /etc folder and of the /var/log folder. I hope you will be able to take the time to look into those files:

http://www.tropicalview.net/smeserver/etc.tgz
http://www.tropicalview.net/smeserver/log.tgz

Those files are corrupted; perhaps you uploaded them via FTP in text mode. Use binary mode, or sftp or WinSCP.
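One way to check whether a .tgz survived the transfer is to decompress the whole stream: a gzip file ends with a CRC-32 and the uncompressed length, so a file mangled by text-mode FTP will normally fail either decompression or that check. A sketch using the Python 3.8+ standard library, roughly what `gzip -t` does; `gzip_is_intact` is an illustrative name.

```python
import gzip
import zlib


def gzip_is_intact(path: str) -> bool:
    """Decompress the whole file and let the gzip module verify its trailer.

    Returns False if the file is not gzip data, is truncated, contains corrupt
    deflate data, or fails the CRC-32/length check. A missing file still
    raises FileNotFoundError, so it is not mistaken for a damaged one.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1024 * 1024):
                pass
        return True
    except (gzip.BadGzipFile, EOFError, zlib.error):
        return False
```

Called as `gzip_is_intact("etc.tgz")`, it should return True for a file that arrived intact; on the server itself, `gzip -t etc.tgz` performs a similar test.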

tropicalview
« Reply #23 on: August 31, 2008, 06:09:08 AM »
Hi CharlieBrady,

I re-uploaded the files in binary mode.
Now they work.
But I noticed that when I download them, for some reason I get .tar files instead of .tgz,
and they do not work like that, so I have to rename them.

Perhaps that will also happen on your end (but the files were definitely corrupted before).
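Whether a downloaded file is really compressed can be checked from its content instead of its name: gzip data starts with the bytes 1f 8b, and tar archives in the POSIX/GNU formats carry the text "ustar" at byte offset 257. A sketch (the function name is illustrative):

```python
def sniff_archive(path: str) -> str:
    """Guess an archive's real format from its first bytes, ignoring the file name."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head[:2] == b"\x1f\x8b":
        return "gzip"      # a .tgz / .tar.gz, whatever the file is called
    if head[257:262] == b"ustar":
        return "tar"       # uncompressed POSIX or GNU tar
    return "unknown"       # e.g. a damaged file or an old v7-format tar
```

If it reports gzip, the file is a .tgz whatever it is called and `tar -xzf` will read it; if it reports tar, the content was already decompressed and `tar -xf` is the right command.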

janet
« Reply #24 on: August 31, 2008, 06:27:50 AM »
tropicalview

Quote
I checked out the cron folders but I'm afraid to edit something there.

You were not asked to edit them, just to check (i.e. read) them, to see which cron jobs ran at what time and whether they are causing the problem. If they seemed to be a problem, then those cron jobs could be disabled temporarily (or permanently).

You can spend the time going through them!
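Checking the cron folders only needs read access. A read-only sketch that lists the files in the directories janet named; `list_cron_files` is an illustrative name, and per-user crontabs are not covered.

```python
import os
from typing import Dict, List, Sequence

CRON_DIRS = ("/etc/cron.d", "/etc/cron.hourly", "/etc/cron.daily",
             "/etc/cron.weekly", "/etc/cron.monthly")


def list_cron_files(dirs: Sequence[str] = CRON_DIRS) -> Dict[str, List[str]]:
    """Return {directory: sorted file names} for each directory that exists.

    Read-only: it only lists directory contents and never opens a file for writing.
    """
    found = {}
    for d in dirs:
        if os.path.isdir(d):
            found[d] = sorted(
                name for name in os.listdir(d)
                if os.path.isfile(os.path.join(d, name))
            )
    return found


if __name__ == "__main__":
    for directory, names in list_cron_files().items():
        print(directory)
        for name in names:
            print("   ", name)
```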

Quote
About the HylaFAX issue: I installed HylaFAX on the test server, and with a dar backup/restore to the production server (after a crash) it was copied along.
Some leftovers remained on the system, but as I already mentioned, I'm very careful about what I do on the Linux command line (certainly after the last crash, which was my own error :?).

You need to learn what you are doing, then; those errors could well be affecting things, at least on the server(s) where you are seeing them, i.e. adding to the servers' processor and memory load.


Quote
If not, please let me know what I need to do to further pinpoint the problem.

Did you do all the steps I suggested earlier? I still see the antivirus scan running as a cron job, so you have not disabled that. What about email virus scanning and spam filtering; did you disable those?
Did you enable RBL? Have you removed sme7admin?

How many times do you have to be asked/told? You give no feedback, yet you expect others to trawl through megabytes of log files!!! Stop being lazy and learn to do your own troubleshooting hack work.

CharlieBrady
« Reply #25 on: August 31, 2008, 05:15:51 PM »
Quote
Stop being lazy and learn to do your own troubleshooting hack work.

Or pay a competent consultant to help you to keep your systems running smoothly.

tropicalview
« Reply #26 on: September 04, 2008, 03:30:41 PM »
Hi All,

Sorry it took some time to get back.
Due to severe weather conditions I wasn't able to connect to the server from my location.

Yesterday evening I did a successful backup with the antivirus, anti-spam and VMware services disabled and sme7admin uninstalled.

I hope it stays that way, because before, I was also able to get through the process some of the time.
Tonight I plan to do a backup again with only the VMware service enabled and the Windows machine in it running.

Thanks for all the help; if I find out where the problem is, I will report back.

Kind regards,

tropicalview
« Reply #27 on: September 18, 2008, 10:02:54 PM »
Dear all,

I did a lot of troubleshooting on this issue, mostly by trial and error.

Finally I stripped off everything I could, and the backup only ran fine if I completely stopped the VMware service!!

Just powering down the Windows server did not help, but stopping the service with /etc/init.d/vmwared stop worked.

After that I installed another 2 GB of memory in the machine. Now everything works great, even with the Windows virtual machine running!!!!


So I would like to thank you all for the support, and if allowed, I would like to ask one little question.

As far as I can see, dar should send e-mails to the admin,
but somehow I do not get any mail. Is there any setting needed for that?

Kind regards,


PS. Please give me an indication of what a good donation amount for this issue would be, so I can handle that in our administration.....

