Koozali.org: home of the SME Server

Obsolete Releases => SME Server 8.x => Topic started by: Brave Dave on May 28, 2012, 02:09:54 PM

Title: blocked for more than 120 seconds
Post by: Brave Dave on May 28, 2012, 02:09:54 PM

Is anyone seeing this from the Kernel:

I have:
uname -a
Linux r300 2.6.18-308.4.1.el5PAE #1 SMP Tue Apr 17 17:47:38 EDT 2012 i686 i686 i386 GNU/Linux

Seems to be discussed here:

http://bugs.centos.org/view.php?id=4515

Everything is running sweet, then kabam, the whole lot freezes up, CPU goes ballistic, then everything comes back, but there are a lot of zombie tasks. It is definitely related to server load - just before this happens the server (a 4-core Xeon in this case) might be running at 4-5 in htop, then it spikes up to 20 and higher.
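For a rough sense of scale: the figure htop reports is the load average, which on Linux counts tasks in uninterruptible (D state) sleep as well as runnable ones, so a spike like this can reflect tasks stuck waiting on I/O rather than pure CPU demand. A minimal Python sketch relating the numbers above to the core count (the function name load_per_core is made up for illustration):

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    # Figures quoted above: a 4-core box running at 4-5, spiking to 20.
    print("normal:", load_per_core(5, 4))
    print("spike: ", load_per_core(20, 4))

    # Live 1-minute figure for whatever machine this runs on (Unix only).
    one_min, _, _ = os.getloadavg()
    print("now:   ", load_per_core(one_min, os.cpu_count() or 1))
```

A value well above 1 per core means more tasks are queued or blocked than the cores can service at once.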

I'm seeing it when I run VMware Server - others over at CentOS are seeing it with other tasks.

May 28 14:17:51 r300 kernel: INFO: task vmware-vmx:10685 blocked for more than 120 seconds.
May 28 14:17:51 r300 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 14:17:51 r300 kernel: vmware-vmx D 000123A4 1012 10685 1 10686 10684 (NOTLB)
May 28 14:17:51 r300 kernel: de20acfc 00003082 64c5bd40 000123a4 00000000 00000000 00000003 0000000a
May 28 14:17:51 r300 kernel: f778e000 65214ac0 000123a4 005b8d80 00000002 f778e10c c5620908 f7782040
May 28 14:17:51 r300 kernel: 00000001 00000000 c53e0d60 00000000 d20af788 c042d81f c574cc3c ffffffff

then a stack of other debugging stuff ...
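To see which tasks are being reported and how often, the warning lines can be pulled out of the log with a short script. This is only a sketch: the regular expression is modelled on the single message shown above, and the function name find_hung_tasks is an illustrative choice, not part of any tool. On a stock install the kernel messages usually end up in /var/log/messages, but that path is an assumption here.

```python
import re

# Modelled on the warning line shown above:
#   INFO: task vmware-vmx:10685 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) tuples found in log_text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "May 28 14:17:51 r300 kernel: INFO: task vmware-vmx:10685 "
        "blocked for more than 120 seconds.\n"
    )
    for name, pid, secs in find_hung_tasks(sample):
        print(name, pid, secs)
```

To run it against a real log, read the file contents into a string and pass that to find_hung_tasks.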

Title: Re: blocked for more than 120 seconds
Post by: Stefano on May 28, 2012, 02:45:36 PM

try booting with another kernel and let us know

Title: Re: blocked for more than 120 seconds
Post by: CharlieBrady on May 28, 2012, 03:00:00 PM

This appears to be a vmware issue, perhaps with the vmware driver, perhaps with the vmware server's I/O performance.

Title: Re: blocked for more than 120 seconds
Post by: Stefano on May 28, 2012, 03:06:52 PM

Quote from: CharlieBrady on May 28, 2012, 03:00:00 PM

This appears to be a vmware issue, perhaps with the vmware driver, perhaps with the vmware server's I/O performance.

I follow the Italian CentOS user forum and there are some similar issues there.. with httpd, for example..

the strange thing is that, AFAICS, only the latest CentOS kernel is affected.. but I could be wrong

Title: Re: blocked for more than 120 seconds
Post by: Brave Dave on May 28, 2012, 11:46:01 PM

I'm reading - like Stefano - that there is a wider issue bubbling, it seems wider than just VMware

I've rebooted one server with an older Kernel

Title: Re: blocked for more than 120 seconds
Post by: CharlieBrady on May 28, 2012, 11:54:05 PM

Quote from: Stefano on May 28, 2012, 03:06:52 PM

the strange thing is that, AFAICS, only the latest CentOS kernel is affected.. but I could be wrong

The referenced CentOS and RH bug tracker entries are for much older kernels.

If there is a problem with the current RH kernel, then hopefully they will find and fix the problem promptly.

Title: Re: blocked for more than 120 seconds
Post by: CharlieBrady on May 29, 2012, 12:26:35 AM

Quote from: Stefano on May 28, 2012, 03:06:52 PM

the strange thing is that, AFAICS, only the latest CentOS kernel is affected.. but I could be wrong

No hits on the RH bugzilla that I can see:

site:bugzilla.redhat.com "blocked for more than 120 seconds" 2.6.18-308.4.1.el5

Title: Re: blocked for more than 120 seconds
Post by: Brave Dave on May 29, 2012, 12:40:33 AM

I was just curious if it was being seen at all

It came about because of pretty heavy write activity in the virtual machine: a user was using it to do a backup, so it was a large sustained write.

Thanks anyway

Title: Re: blocked for more than 120 seconds
Post by: Brave Dave on May 29, 2012, 12:34:40 PM

I think I have this worked out

The kernel has a queueing mechanism (stating the obvious, I know - it always did).
It now also has a hung-task check that reports tasks blocked for longer than a timeout (it only logs a warning, it does not kill the task), with a default of 120 seconds.
If you run a VM task and copy a large file to the disk - or do anything else that involves a long, slow write - it is likely to be flagged as a hung task. In this case the task was a zip of a large file across the network (the things end users do - you would think they would ask the sysadmin first).
You can modify the behaviour by editing /etc/sysctl.conf (templated, of course). I extended the parameter to 600 and could see the behaviour change: the load average in htop got as high as 30, so there was a fair build-up.

VMware Server is end of life anyway. The way to do things used to be to run VMs inside SME, and you still can; this only exposes a limit. Probably better to run SME inside a VM ...

See here - playing with the time out:
# cat /etc/e-smith/templates-custom/etc/sysctl.conf/kernel.hung_task_timeout_secs
# To cope with extended timeouts
# with large disk writes from vmware
kernel.hung_task_timeout_secs = 600

(didn't actually test this last part - needs a reboot to check)
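One way to check the setting without waiting for a reboot is to compare the configured value with what the running kernel reports. The sketch below parses a sysctl.conf-style fragment like the one above and reads the live value from /proc (the same path the kernel log message mentions). The function names are illustrative only, and the /proc file may not exist on kernels without the hung-task check, in which case None is returned.

```python
from pathlib import Path

# Live value exposed by the running kernel (path as named in the kernel message).
PROC_PATH = Path("/proc/sys/kernel/hung_task_timeout_secs")


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def live_timeout():
    """Return the running kernel's timeout in seconds, or None if unavailable."""
    try:
        return int(PROC_PATH.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    fragment = """\
# To cope with extended timeouts
# with large disk writes from vmware
kernel.hung_task_timeout_secs = 600
"""
    wanted = int(parse_sysctl(fragment)["kernel.hung_task_timeout_secs"])
    print("configured:", wanted, "running:", live_timeout())
```

If the two values differ, running sysctl -p as root should normally reload /etc/sysctl.conf without a reboot, assuming the template has already been expanded into that file.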

Title: Re: blocked for more than 120 seconds
Post by: Brave Dave on May 29, 2012, 12:37:37 PM

Quote from: Stefano on May 28, 2012, 02:45:36 PM

try booting with another kernel and let us know

Didn't help - no change.