Koozali.org: home of the SME Server

blocked for more than 120 seconds

Offline Brave Dave

blocked for more than 120 seconds
« on: May 28, 2012, 02:09:54 PM »
Is anyone seeing this from the kernel?

I have:
uname -a
Linux r300 2.6.18-308.4.1.el5PAE #1 SMP Tue Apr 17 17:47:38 EDT 2012 i686 i686 i386 GNU/Linux


Seems to be discussed here:

http://bugs.centos.org/view.php?id=4515

Everything is running sweet, then kabam, the whole lot freezes up, the CPU goes ballistic, then everything comes back, but there are a lot of zombie tasks. It is definitely related to server load: just before this happens the server (a 4-core Xeon in this case) might be running at a load of 4-5 in htop, then it spikes up to 20 and higher.

I'm seeing it when I run VMware Server; others over at CentOS are seeing it with other tasks.
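
For anyone wanting to catch the stuck and zombie tasks while a spike is happening, here is a rough sketch. The helper name filter_blocked is made up for illustration; it just keeps the lines whose state column is D (uninterruptible sleep, the state the hung task warning is about) or Z (zombie).

```shell
# Keep only tasks in uninterruptible sleep (D) or zombie (Z) state.
# Expects "state pid command" lines on stdin, as printed by:
#   ps -eo state,pid,comm
# filter_blocked is a hypothetical helper name, not an existing tool.
filter_blocked() {
    awk '$1 ~ /^[DZ]/ { print }'
}
```

Run it as `ps -eo state,pid,comm | filter_blocked`; it prints nothing when no task is blocked or defunct.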

May 28 14:17:51 r300 kernel: INFO: task vmware-vmx:10685 blocked for more than 120 seconds.
May 28 14:17:51 r300 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 14:17:51 r300 kernel: vmware-vmx    D 000123A4  1012 10685      1         10686 10684 (NOTLB)
May 28 14:17:51 r300 kernel:        de20acfc 00003082 64c5bd40 000123a4 00000000 00000000 00000003 0000000a
May 28 14:17:51 r300 kernel:        f778e000 65214ac0 000123a4 005b8d80 00000002 f778e10c c5620908 f7782040
May 28 14:17:51 r300 kernel:        00000001 00000000 c53e0d60 00000000 d20af788 c042d81f c574cc3c ffffffff


then a stack of other debugging stuff ...
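
To see how often this fires and for which processes, something like the following could be used. It is only a sketch: hung_task_summary is a made-up helper name, and it assumes the kernel messages have the same "INFO: task NAME:PID blocked for more than N seconds" shape as the excerpt above.

```shell
# Count hung-task warnings per process name.
# Reads the named log file(s), or stdin when no file is given.
# hung_task_summary is a hypothetical helper name.
hung_task_summary() {
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2s/p' "$@" \
        | sort | uniq -c
}
```

On CentOS 5 the kernel messages normally land in /var/log/messages, so `hung_task_summary /var/log/messages` would be the usual call (as root).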


.:DB:.

Offline Stefano

Re: blocked for more than 120 seconds
« Reply #1 on: May 28, 2012, 02:45:36 PM »
Try booting with another kernel and let us know.

Offline CharlieBrady

Re: blocked for more than 120 seconds
« Reply #2 on: May 28, 2012, 03:00:00 PM »
This appears to be a VMware issue, perhaps with the VMware driver, perhaps with the VMware server's I/O performance.

Offline Stefano

Re: blocked for more than 120 seconds
« Reply #3 on: May 28, 2012, 03:06:52 PM »
Quote from: CharlieBrady
This appears to be a VMware issue, perhaps with the VMware driver, perhaps with the VMware server's I/O performance.

I follow the Italian CentOS user forum, and there are some similar issues there, with httpd for example.

The strange thing is that, AFAICS, only the latest CentOS kernel is affected, but I could be wrong.

Offline Brave Dave

Re: blocked for more than 120 seconds
« Reply #4 on: May 28, 2012, 11:46:01 PM »
Like Stefano, I'm reading that there is a wider issue bubbling; it seems to go beyond just VMware.

I've rebooted one server with an older kernel.
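
In case it helps anyone else trying an older kernel: on CentOS 5 the boot menu is GRUB legacy, and the default= line in grub.conf is the zero-based index of the title entry that boots. A small sketch for listing the entries with their index follows; list_grub_entries is a made-up helper name, and the default path is the usual CentOS 5 location.

```shell
# List GRUB legacy menu entries with their zero-based index,
# followed by the current default= line.
# list_grub_entries is a hypothetical helper name.
list_grub_entries() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '/^[ \t]*title[ \t]/ {
             sub(/^[ \t]*title[ \t]+/, "")
             printf "%d: %s\n", n++, $0
         }' "$conf"
    grep '^default=' "$conf" || echo "default= not set"
}
```

Picking the older entry by hand at the GRUB menu is enough for a one-off test; editing default= makes the choice stick across reboots.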
.:DB:.

Offline CharlieBrady

Re: blocked for more than 120 seconds
« Reply #5 on: May 28, 2012, 11:54:05 PM »
Quote from: Stefano
The strange thing is that, AFAICS, only the latest CentOS kernel is affected, but I could be wrong.

The referenced CentOS and RH bug tracker entries are for much older kernels.

If there is a problem with the current RH kernel, then hopefully they will find and fix the problem promptly.

Offline CharlieBrady

Re: blocked for more than 120 seconds
« Reply #6 on: May 29, 2012, 12:26:35 AM »
Quote from: Stefano
The strange thing is that, AFAICS, only the latest CentOS kernel is affected, but I could be wrong.

No hits on the RH bugzilla that I can see:

site:bugzilla.redhat.com "blocked for more than 120 seconds" 2.6.18-308.4.1.el5

Offline Brave Dave

Re: blocked for more than 120 seconds
« Reply #7 on: May 29, 2012, 12:40:33 AM »
I was just curious whether it was being seen at all.

It came about because of pretty heavy write activity in the VM: a user was using it to do a backup, so it was a large sustained write.

Thanks anyway
.:DB:.

Offline Brave Dave

Re: blocked for more than 120 seconds
« Reply #8 on: May 29, 2012, 12:34:40 PM »
I think I have this worked out:
  • The kernel has a queueing mechanism (stating the obvious, I know; it always did)
  • It now also watches for blocked tasks and reports any task stuck for longer than a timeout, which defaults to 120 seconds. As far as I can tell it only logs the warning and does not kill the task
  • If you run a VM task and copy a large file to the disk, or do anything else that involves a long, slow write, it is likely to be seen as a hung task. In this case the task was a zip of a large file across the network (the things end users do; you would think they would ask the sysadmin first)
  • You can change the behaviour by modifying /etc/sysctl.conf (templated, of course). I extended the parameter to 600 and could see the behaviour change: the load average in htop got as high as 30, so there was a fair build-up

VMware Server is end of life anyway. The usual way to do things used to be to run VMs inside SME, and you still can; this only exposes a limit. It is probably better to run SME inside a VM.

See here, playing with the timeout:
 # cat /etc/e-smith/templates-custom/etc/sysctl.conf/kernel.hung_task_timeout_secs
 # To cope with extended timeouts
 # with large disk writes from vmware
 kernel.hung_task_timeout_secs = 600

(I didn't actually test this last part; it needs a reboot to check.)
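
A reboot may not be strictly needed to check this: sysctl settings can normally be loaded into the running kernel, and the current value can be read back from /proc. The sketch below only reads the value; show_hung_timeout is a made-up helper name. The two commented commands are what I would expect to run as root on an SME box (expand-template to regenerate the file from the templates, then sysctl -p to load it), but treat them as an assumption rather than a tested recipe.

```shell
# Print the current hung-task timeout in seconds, or a note if the
# running kernel does not expose the setting.
# show_hung_timeout is a hypothetical helper name.
show_hung_timeout() {
    f="${1:-/proc/sys/kernel/hung_task_timeout_secs}"
    if [ -r "$f" ]; then
        echo "hung_task_timeout_secs = $(cat "$f")"
    else
        echo "hung_task_timeout_secs not available"
    fi
}

# As root on the SME box (assumed steps, not executed by this sketch):
#   expand-template /etc/sysctl.conf   # rebuild the file from the templates
#   sysctl -p                          # load /etc/sysctl.conf into the running kernel
show_hung_timeout
```

If the value reads back as 600 after the two commands, the custom template fragment is being picked up.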
.:DB:.

Offline Brave Dave

Re: blocked for more than 120 seconds
« Reply #9 on: May 29, 2012, 12:37:37 PM »
Quote from: Stefano
Try booting with another kernel and let us know.

Didn't help, no change.
.:DB:.