I think I have this worked out
- The kernel has a queueing mechanism (I know, stating the obvious; it always did)
- It now has a hung-task check that flags tasks blocked waiting on I/O for longer than a timeout (by default it logs a warning rather than killing them), and the timeout defaults to 120 seconds
- If you run a VM task and copy a large file to disk, or do anything else that involves a long, slow write, it is likely to be seen as a hung task. In my case the task was zipping a large file across the network (the things end users do; you would think they would ask the sysadmin first)
- You can change the behaviour by modifying /etc/sysctl.conf (templated, of course). I extended the parameter to 600 and could see the behaviour change: htop showed the load average climbing as high as 30, so there was a fair build-up
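To see which tasks actually tripped the check, the kernel log can be searched for the warning it prints. This is a minimal sketch, assuming the usual message format `INFO: task NAME:PID blocked for more than N seconds.`; `hung_tasks` is a hypothetical helper name, not an existing command. It reads log text on stdin and prints the task name, PID and timeout for each hit.

```shell
#!/bin/sh
# hung_tasks: hypothetical helper, not part of SME Server or the kernel tools.
# Reads kernel log text on stdin (e.g. from dmesg or /var/log/messages) and,
# for each hung-task warning such as
#   INFO: task vmware-vmx:4242 blocked for more than 120 seconds.
# prints "name pid seconds". Lines that are not hung-task warnings are ignored.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

For example, `dmesg | hung_tasks` or `hung_tasks < /var/log/messages`. Checking the log this way shows directly whether the long writes are what the kernel is flagging, rather than inferring it from the load average.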
VM Server is end of life anyway. The way to do things used to be to run VMs inside SME, and you still can; this only exposes a limit. It is probably better to run SME inside the VM...
Here is how I am playing with the timeout:
# cat /etc/e-smith/templates-custom/etc/sysctl.conf/kernel.hung_task_timeout_secs
# To cope with extended timeouts
# with large disk writes from vmware
kernel.hung_task_timeout_secs = 600
(I didn't actually test this last part; it needs a reboot to check.)
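Rather than waiting for a reboot, it should be possible to apply the fragment straight away: on SME Server, `expand-template /etc/sysctl.conf` regenerates the file from the templates and `sysctl -p` loads it (both need root; treat this sequence as an assumption and check it on your release). The live value can then be compared with the one in the fragment. A minimal sketch; `check_hung_timeout` is a hypothetical helper, and its optional second argument exists only so the check can be tried against a scratch file.

```shell
#!/bin/sh
# check_hung_timeout: hypothetical helper, not an SME Server command.
# Usage: check_hung_timeout EXPECTED [FILE]
# Compares the running kernel's hung-task timeout (or FILE, if given) with
# EXPECTED. Returns 0 on match, 1 on mismatch, 2 if the value can't be read.
check_hung_timeout() {
    expected=$1
    file=${2:-/proc/sys/kernel/hung_task_timeout_secs}
    if [ ! -r "$file" ]; then
        # The /proc entry only exists if the kernel has hung-task detection.
        echo "cannot read $file" >&2
        return 2
    fi
    current=$(cat "$file")
    if [ "$current" = "$expected" ]; then
        echo "ok: hung_task_timeout_secs=$current"
    else
        echo "mismatch: hung_task_timeout_secs=$current, expected $expected"
        return 1
    fi
}
```

After loading the new settings, `check_hung_timeout 600` should report `ok` if the template took effect.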