Koozali.org: home of the SME Server

Server only crashes when no one is watching

Offline torrestech

Server only crashes when no one is watching
« on: May 23, 2005, 06:31:44 AM »
Now I need some advice with this one.
The strangest thing happens. During normal business hours the server runs like a charm, but at night after everyone logs off the server stops working. I assume that there is a cron job stuffing things up, but I cannot locate it.
Below is a copy of the errors I find in the log files.
Please heeeelllp meeee!
Adam

May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
May 23 13:58:08 server kernel: resize_dma_pool: unknown device type -1
May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: init_module: Cannot allocate memory
May 23 13:58:09 server insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.       You may find more information in syslog or the output from dmesg
May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: insmod block-major-13 failed
May 23 13:58:09 server kernel: xd: Out of memory.
May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: init_module: Cannot allocate memory
May 23 13:58:09 server insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.       You may find more information in syslog or the output from dmesg
May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: insmod block-major-13 failed
May 23 13:58:09 server modprobe: modprobe: Can't locate module block-major-104
May 23 13:58:10 server last message repeated 7 times
May 23 13:58:10 server modprobe: modprobe: Can't locate module block-major-105
May 23 13:58:10 server last message repeated 7 times
May 23 13:58:10 server modprobe: modprobe: Can't locate module block-major-72
May 23 13:58:11 server last message repeated 7 times
May 23 13:58:11 server modprobe: modprobe: Can't locate module block-major-73
May 23 13:58:11 server last message repeated 7 times
May 23 13:58:11 server modprobe: modprobe: Can't locate module block-major-48
May 23 13:58:12 server last message repeated 7 times
May 23 13:58:12 server modprobe: modprobe: Can't locate module block-major-49
May 23 13:58:20 server crond: crond -HUP succeeded
May 23 13:58:47 server login(pam_unix)[2720]: session closed for user root
---------------------------------------------------------------------------------------------------------
May 23 01:38:02 server kernel: Out of Memory: Killed process 28702 (httpd).
May 23 01:38:05 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.111.148 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=3643 DF PROTO=TCP SPT=4278 DPT=135 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:38:12 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.156.87 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=125 ID=19701 DF PROTO=TCP SPT=3446 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:38:15 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.156.87 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=125 ID=19973 DF PROTO=TCP SPT=3446 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:38:37 server kernel: Out of Memory: Killed process 28706 (httpd).
May 23 01:38:39 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=61.151.239.123 DST=220.245.99.26 LEN=322 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=UDP SPT=59401 DPT=1028 LEN=302
May 23 01:38:39 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=61.151.239.123 DST=220.245.99.26 LEN=322 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=UDP SPT=59401 DPT=1026 LEN=302
May 23 01:38:39 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=61.151.239.123 DST=220.245.99.26 LEN=322 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=UDP SPT=59401 DPT=1029 LEN=302
May 23 01:38:44 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=61.151.239.55 DST=220.245.99.26 LEN=467 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=UDP SPT=56003 DPT=1027 LEN=447
May 23 01:38:44 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=61.151.239.55 DST=220.245.99.26 LEN=467 TOS=0x00 PREC=0x00 TTL=49 ID=0 DF PROTO=UDP SPT=56003 DPT=1029 LEN=447
May 23 01:38:56 server kernel: Out of Memory: Killed process 28708 (httpd).
May 23 01:42:58 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.174.58 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=124 ID=5022 DF PROTO=TCP SPT=2627 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:43:01 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.174.58 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=124 ID=5348 DF PROTO=TCP SPT=2627 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:43:18 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.111.148 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=60854 DF PROTO=TCP SPT=3692 DPT=135 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:43:21 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=220.245.111.148 DST=220.245.99.26 LEN=48 TOS=0x00 PREC=0x00 TTL=126 ID=61389 DF PROTO=TCP SPT=3692 DPT=135 WINDOW=65535 RES=0x00 SYN URGP=0
May 23 01:43:44 server kernel: Out of Memory: Killed process 8752 (httpd).
May 23 01:43:49 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=218.12.197.181 DST=220.245.99.26 LEN=498 TOS=0x00 PREC=0x00 TTL=45 ID=0 DF PROTO=UDP SPT=34507 DPT=1026 LEN=478
May 23 01:43:49 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=218.12.197.181 DST=220.245.99.26 LEN=498 TOS=0x00 PREC=0x00 TTL=45 ID=0 DF PROTO=UDP SPT=34507 DPT=1027 LEN=478
May 23 01:43:50 server kernel: Out of Memory: Killed process 28701 (httpd).
May 23 01:44:25 server kernel: Out of Memory: Killed process 8832 (httpd).
May 23 13:58:09 server kernel: xd: Out of memory.
...

Offline raem

Re: Server only crashes when no one is watching
« Reply #1 on: May 23, 2005, 07:56:51 AM »
torrestech

> May 23 01:38:02 server kernel: Out of Memory:  Killed process 28702 (httpd).
> May 23 01:43:50 server kernel: Out of Memory: Killed process 28701 (httpd).
> May 23 01:44:25 server kernel: Out of Memory: Killed process 8832 (httpd).
> May 23 13:58:09 server kernel: xd: Out of memory

Doesn't that tell you something?
What is your server spec (memory, etc.)?

You might want to implement some of the suggestions here:
http://mirror.contribs.org/smeserver/contribs/rmitchell/smeserver/howto/Mail%20system%20tweaks%20HOWTO%20for%20sme%20server.htm
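
To see at a glance what the kernel has been killing, the OOM lines can be tallied by process name. The following is only a rough Python sketch, not part of any SME Server tooling: it assumes the message format shown in the log excerpt earlier in this thread, and `count_oom_kills` and `sample` are made-up names. Bear in mind that the process that gets killed is not necessarily the one that used up the memory.

```python
import re
from collections import Counter

# Matches kernel OOM-killer lines of the form quoted in this thread, e.g.
# "May 23 01:38:02 server kernel: Out of Memory: Killed process 28702 (httpd)."
OOM_RE = re.compile(r"Out of Memory:\s+Killed process (\d+) \(([^)]+)\)")


def count_oom_kills(lines):
    """Return a Counter mapping process name -> number of OOM kills."""
    kills = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills[match.group(2)] += 1
    return kills


# A few lines copied from the excerpt above; in practice you would pass
# the lines of your own log file instead.
sample = [
    "May 23 01:38:02 server kernel: Out of Memory: Killed process 28702 (httpd).",
    "May 23 01:38:37 server kernel: Out of Memory: Killed process 28706 (httpd).",
    "May 23 13:58:09 server kernel: xd: Out of memory.",
]

for name, count in count_oom_kills(sample).most_common():
    print(f"{count:5d}  {name}")
```

The `xd: Out of memory.` line is deliberately not counted, since it is a driver failing to allocate memory rather than the kernel killing a process.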
...

Offline torrestech

You are a genius
« Reply #2 on: May 23, 2005, 02:02:11 PM »
Thanks for the tip. At present we have 256 MB of RAM on a 1.8 GHz CPU. I shall try to bring the RAM up to 512 MB and see if things are happier. Maybe I should also run a memtest to make sure that the existing RAM is good.

Thanks.
Adam
...

Offline m

Re: You are a genius
« Reply #3 on: May 23, 2005, 03:16:09 PM »
Quote from: "torrestech"
At present we have 256 MB of RAM on a 1.8 GHz CPU. I shall try to bring the RAM up to 512 MB and see if things are happier. Maybe I should also run a memtest to make sure that the existing RAM is good.


I don't believe that insufficient RAM is the issue. 256 MB is enough for most applications. I have had an SME server with many web-based applications (webmail, image archive, groupware, etc.) up and running in VMware with 256 MB of RAM assigned for more than a year and a half. I would most likely suspect a PHP (or other CGI) crash or bug. You should look at the httpd access and error logs to find out which web application was running when the server crashed. I experienced something similar with PHP and IMAP; see my bug report 0000196.

Michael

Offline irian

Server only crashes when no one is watching
« Reply #4 on: May 23, 2005, 08:55:33 PM »
If you want to know whether it's a cron job, try:
crontab -e

MdV
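
Note that `crontab -e` opens only the current user's crontab in an editor (`crontab -l` prints it without editing). System-wide jobs usually live elsewhere. As a rough sketch, assuming the conventional Red Hat-style cron locations (the thread does not list them, so treat the paths as an assumption), something like the following Python could list every file worth reading. `find_cron_files` and `CRON_LOCATIONS` are made-up names.

```python
from pathlib import Path

# Conventional cron locations on Red Hat-style systems. This list is an
# assumption, not something taken from the thread; adjust it to your system.
CRON_LOCATIONS = [
    "etc/crontab",
    "etc/cron.d",
    "etc/cron.hourly",
    "etc/cron.daily",
    "etc/cron.weekly",
    "etc/cron.monthly",
    "var/spool/cron",
]


def find_cron_files(root="/"):
    """Return the sorted paths of regular files found in the usual cron locations."""
    found = []
    for location in CRON_LOCATIONS:
        path = Path(root) / location
        try:
            if path.is_file():
                found.append(path)
            elif path.is_dir():
                found.extend(p for p in path.iterdir() if p.is_file())
        except OSError:
            # Unreadable location (for example when not running as root): skip it.
            continue
    return sorted(found)


for cron_file in find_cron_files():
    print(cron_file)
```

Run it as root to include the per-user crontabs under /var/spool/cron, then compare the schedules in those files with the times of the crashes.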

Offline CharlieBrady

Re: Server only crashes when no one is watching
« Reply #5 on: May 24, 2005, 01:03:52 AM »
Quote from: "torrestech"

May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
May 23 13:58:08 server kernel: resize_dma_pool: unknown device type -1
May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: init_module: Cannot allocate memory
...


You need to work out what is trying to load all those drivers, and why your system is running out of memory. It's very, very, very unlikely that you have a legitimate reason to be running the xd device driver - you don't have an XT disk controller in your system do you?

Is the first error message you showed us the very first one which might be relevant?

Is your system stock-standard 6.0.x? If not, in what ways have you modified it?

Offline CharlieBrady

Re: Out of memory problems
« Reply #6 on: May 24, 2005, 01:07:51 AM »
Quote from: "mweinber"

I would most likely suspect a PHP (or other CGI) crash or bug.


I don't see how that might cause a long list of irrelevant drivers to be loaded.

Quote

You should look at the httpd access and error logs to find out which web application was running when the server crashed. I experienced something similar with PHP and IMAP; see my bug report 0000196.


The killing of httpd could very well be coincidental: the kernel makes its own choice of which process to kill when it is out of memory.

Offline torrestech

System configuration
« Reply #7 on: May 24, 2005, 01:16:11 AM »
We do have a Kouwell KW571B controller with a software mirror of 2 x 200 GB IDE drives, 256 MB RAM and, on closer inspection, a 1.3 GHz AMD CPU. Nothing unusual. Perhaps the controller is faulty?
I will get some more information from the logs today and post it in the hope that it gives us some clues.
Regards,
Adam
...

Offline torrestech

Still unable to find the problem
« Reply #8 on: May 25, 2005, 10:11:49 PM »
Sorry for the delay. I still have not tracked down the problem, though I am starting to wonder about the IDE controller card. I have gathered some more log files in the hope that someone can help me find a likely cause before I start randomly replacing hardware.
Regards,
Adam

[Tue May 24 07:22:59 2005] [warn] child process 17303 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17386 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17388 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17389 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17391 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17297 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17392 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17393 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 17394 still did not exit, sending a SIGTERM
[Tue May 24 07:22:59 2005] [warn] child process 25135 still did not exit, sending a SIGTERM
[Tue May 24 07:25:12 2005] [notice] Apache configured -- resuming normal operations
[Tue May 24 07:25:12 2005] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue May 24 07:25:12 2005] [notice] Accept mutex: sysvsem (Default: sysvsem)
--------------------------------------------------------------------------------------------

Viewed at Wed 25 May 2005 02:16:43 PM EST.
Linux version 2.4.20-18.7 (bhcompile@bugs.devel.redhat.com) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)) #1 Thu May 29 06:51:53 EDT 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
 BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000f7f0000 (usable)
 BIOS-e820: 000000000f7f0000 - 000000000f7f3000 (ACPI NVS)
 BIOS-e820: 000000000f7f3000 - 000000000f800000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
247MB LOWMEM available.
On node 0 totalpages: 63472
zone(0): 4096 pages.
zone(1): 59376 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=SMEServer-up ro root=901 BOOT_FILE=/boot/vmlinuz-2.4.20-18.7
Initializing CPU#0
Detected 1394.007 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 2778.72 BogoMIPS
Memory: 245288k/253888k available (1160k kernel code, 6224k reserved, 983k data, 120k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 64K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383f9ff c1c3f9ff 00000000 00000000
CPU:             Common caps: 0383f9ff c1c3f9ff 00000000 00000000
CPU: AMD Duron(tm) procuws{ stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb400, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
Detected PS/2 Mouse Port.
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta3-.2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20270: IDE controller at PCI slot 00:0a.0
PCI: Found IRQ 11 for device 00:0a.0
PCI: Sharing IRQ 11 with 00:10.1
PDC20270: chipset revision 2
PDC20270: not 100% native mode: will probe irqs later
    ide2: BM-DMA at 0xd000-0xd007, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xd008-0xd00f, BIOS settings: hdg:pio, hdh:pio
VP_IDE: IDE controller at PCI slot 00:11.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
    ide0: BM-DMA at 0xe400-0xe407, BIOS settings: hda:pio, hdb:DMA
    ide1: BM-DMA at 0xe408-0xe40f, BIOS settings: hdc:pio, hdd:pio
hdb: SAMSUNG CD-ROM SC-152A, ATAPI CD/DVD-ROM drive
hde: ST3200822A, ATA DISK drive
blk: queue c03725c8, I/O limit 4095Mb (mask 0xffffffff)
hdg: ST3200822A, ATA DISK drive
blk: queue c0372a2c, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide2 at 0xc000-0xc007,0xc402 on irq 11
ide3 at 0xc800-0xc807,0xcc02 on irq 11
hde: attached ide-disk driver.
hde: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hde: task_no_data_intr: error=0x04 { DriveStatusError }
hde: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hde: task_no_data_intr: error=0x04 { DriveStatusError }
hde: 390721968 sectors (200050 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hdg: attached ide-disk driver.
hdg: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hdg: task_no_data_intr: error=0x04 { DriveStatusError }
hdg: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hdg: task_no_data_intr: error=0x04 { DriveStatusError }
hdg: 390721968 sectors (200050 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
 hde: hde1 hde2 hde3
 hdg: hdg1 hdg2 hdg3
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 000000a0]
 [events: 000000a3]
 [events: 000000a0]
 [events: 000000a0]
 [events: 000000a3]
 [events: 000000a0]
md: autorun ...
md: considering hdg3 ...
md:  adding hdg3 ...
md:  adding hde3 ...
md: created md2
md: bind<hde3,1>
md: bind<hdg3,2>
md: running: <hdg3><hde3>
md: hdg3's event counter: 000000a0
md: hde3's event counter: 000000a0
md: md2: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md2 stopped.
md: unbind<hdg3,1>
md: export_rdev(hdg3)
md: unbind<hde3,0>
md: export_rdev(hde3)
md: considering hdg2 ...
md:  adding hdg2 ...
md:  adding hde2 ...
md: created md1
md: bind<hde2,1>
md: bind<hdg2,2>
md: running: <hdg2><hde2>
md: hdg2's event counter: 000000a3
md: hde2's event counter: 000000a3
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hdg2,1>
md: export_rdev(hdg2)
md: unbind<hde2,0>
md: export_rdev(hde2)
md: considering hdg1 ...
md:  adding hdg1 ...
md:  adding hde1 ...
md: created md0
md: bind<hde1,1>
md: bind<hdg1,2>
md: running: <hdg1><hde1>
md: hdg1's event counter: 000000a0
md: hde1's event counter: 000000a0
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdg1,1>
md: export_rdev(hdg1)
md: unbind<hde1,0>
md: export_rdev(hde1)
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 130k freed
VFS: Mounted root (ext2 filesystem).
md: raid1 personality registered as nr 3
Journalled Block Device driver loaded
md: Autodetecting RAID arrays.
 [events: 000000a0]
 [events: 000000a0]
 [events: 000000a3]
 [events: 000000a3]
 [events: 000000a0]
 [events: 000000a0]
md: autorun ...
md: considering hde1 ...
md:  adding hde1 ...
md:  adding hdg1 ...
md: created md0
md: bind<hdg1,1>
md: bind<hde1,2>
md: running: <hde1><hdg1>
md: hde1's event counter: 000000a0
md: hdg1's event counter: 000000a0
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hde1 operational as mirror 0
raid1: device hdg1 operational as mirror 1
raid1: raid set md0 not clean; reconstructing mirrors
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hde1 [events: 000000a1]<6>(write) hde1's sb offset: 104320
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 124k window, over a total of 104320 blocks.
md: hdg1 [events: 000000a1]<6>(write) hdg1's sb offset: 104320
md: considering hde2 ...
md:  adding hde2 ...
md:  adding hdg2 ...
md: created md1
md: bind<hdg2,1>
md: bind<hde2,2>
md: running: <hde2><hdg2>
md: hde2's event counter: 000000a3
md: hdg2's event counter: 000000a3
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md1: max total readahead window set to 124k
md1: 1 data-disks, max readahead per data-disk: 124k
raid1: device hde2 operational as mirror 0
raid1: device hdg2 operational as mirror 1
raid1: raid set md1 not clean; reconstructing mirrors
raid1: raid set md1 active with 2 out of 2 mirrors
md: updating md1 RAID superblock on device
md: hde2 [events: 000000a4]<6>(write) hde2's sb offset: 194988864
md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
md: hdg2 [events: 000000a4]<6>(write) hdg2's sb offset: 194988864
md: considering hde3 ...
md:  adding hde3 ...
md:  adding hdg3 ...
md: created md2
md: bind<hdg3,1>
md: bind<hde3,2>
md: running: <hde3><hdg3>
md: hde3's event counter: 000000a0
md: hdg3's event counter: 000000a0
md: md2: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md2: max total readahead window set to 124k
md2: 1 data-disks, max readahead per data-disk: 124k
raid1: device hde3 operational as mirror 0
raid1: device hdg3 operational as mirror 1
raid1: raid set md2 not clean; reconstructing mirrors
raid1: raid set md2 active with 2 out of 2 mirrors
md: updating md2 RAID superblock on device
md: hde3 [events: 000000a1]<6>(write) hde3's sb offset: 264960
md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
md: hdg3 [events: 000000a1]<6>(write) hdg3's sb offset: 264960
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: md(9,1): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 213018
ext3_orphan_cleanup: deleting unreferenced inode 22528021
ext3_orphan_cleanup: deleting unreferenced inode 213009
EXT3-fs: md(9,1): 3 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 120k freed
Adding Swap: 264952k swap-space (priority -1)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 06:56:59 May 29 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 12 for device 00:10.0
PCI: Sharing IRQ 12 with 00:0d.0
PCI: Sharing IRQ 12 with 00:12.0
usb-uhci.c: USB UHCI at I/O 0xd800, IRQ 12
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Found IRQ 11 for device 00:10.1
PCI: Sharing IRQ 11 with 00:0a.0
usb-uhci.c: USB UHCI at I/O 0xdc00, IRQ 11
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Found IRQ 5 for device 00:10.2
PCI: Sharing IRQ 5 with 00:11.5
usb-uhci.c: USB UHCI at I/O 0xe000, IRQ 5
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 3
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
PCI: Found IRQ 9 for device 00:10.3
ehci-hcd 00:10.3: VIA Technologies, Inc. USB 2.0
ehci-hcd 00:10.3: irq 9, pci mem d0073000
usb.c: new USB bus registered, assigned bus number 4
PCI: 00:10.3 PCI cache line size set incorrectly (32 bytes) by BIOS/FW.
PCI: 00:10.3 PCI cache line size corrected to 64.
ehci-hcd 00:10.3: USB 2.0 enabled, EHCI 1.00, driver 2003-Jan-22
hub.c: USB hub found
hub.c: 6 ports detected
md: md0: sync done.
md: syncing RAID array md2
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 124k window, over a total of 264960 blocks.
md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
EXT3 FS 2.4-0.9.19, 19 August 2002 on md(9,1), internal journal
spurious 8259A interrupt: IRQ7.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on md(9,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
hdb: attached ide-cdrom driver.
hdb: ATAPI 52X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
SCSI subsystem driver Revision: 1.00
scsi0 : SCSI host adapter emulation for IDE ATAPI devices
hdb: DMA disabled
...

Offline raem

Re: Server only crashes when no one is watching
« Reply #9 on: May 26, 2005, 01:57:11 AM »
torrestech

> During normal business hours the server runs like a charm.
> But at night after everyone logs off the server stops working.

> May 23 01:43:49 server kernel: denylog:IN=ppp0 OUT= MAC= SRC=218.12.197.181 DST=220.245.99.26 LEN=498 TOS=0x00 PREC=0x00 TTL=45 ID=0 DF PROTO=UDP SPT=34507 DPT=1027 LEN=478


Isn't that an (unsuccessful) UDP connection attempt on port 1027?
see http://forums.contribs.org/index.php?topic=26828.0



> May 23 01:43:50 server kernel: Out of Memory: Killed process 28701 (httpd).
> May 23 01:44:25 server kernel: Out of Memory: Killed process 8832 (httpd).
> May 23 13:58:09 server kernel: xd: Out of memory

So you are being hit with many requests that use up your memory. Perhaps many virus and spam messages are also being processed and caught by ClamAV and SpamAssassin, which will use up 256 MB of RAM quite easily.

Do you have ClamAV and SpamAssassin enabled?

The out-of-memory errors occur before (date- and time-wise) the modprobe messages.
Is the memory shortage contributing to the modprobe errors, or the other way around?
As you say the system runs OK during office hours, perhaps your server is being hit hard at night?


> May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
> May 23 13:58:08 server kernel: resize_dma_pool: unknown device type -1
> May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22
> May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: init_module: Cannot allocate memory
> May 23 13:58:09 server insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.       You may find more information in syslog or the output from dmesg


If you are running clam & spamassassin you probably should increase RAM significantly and also implement the suggestions in the Mail system tweaks HOWTO link provided earlier.

You may have multiple problems so fixing one and eliminating it will then lead you to (hopefully) find the other(s).
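
The ordering question (which came first, the OOM kills or the modprobe errors) can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming the standard `Mon DD HH:MM:SS` syslog prefix seen in the excerpts; syslog lines carry no year, so the year is passed in as an assumption. `parse_stamp` and `first_match` are made-up helper names.

```python
from datetime import datetime


def parse_stamp(line, year=2005):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the earliest timestamp among lines containing needle, or None."""
    stamps = [parse_stamp(line) for line in lines if needle in line]
    return min(stamps) if stamps else None


# Lines copied from the excerpts quoted in this thread.
sample = [
    "May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22",
    "May 23 01:38:02 server kernel: Out of Memory: Killed process 28702 (httpd).",
    "May 23 01:44:25 server kernel: Out of Memory: Killed process 8832 (httpd).",
]

first_oom = first_match(sample, "Out of Memory: Killed process")
first_modprobe = first_match(sample, "Can't locate module")
print("first OOM kill:      ", first_oom)
print("first modprobe error:", first_modprobe)
print("gap:                 ", first_modprobe - first_oom)
```

This only orders the lines it is given. The excerpts posted so far may not include the earliest modprobe error in the full log, so it is worth running it over the whole file before drawing conclusions.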
...

Offline torrestech

Only crashes when no one is watching
« Reply #10 on: July 11, 2005, 08:16:55 AM »
OK, I have been away for 3 weeks and guess what! The server never crashed once. But as soon as I came back I started doing some FTP access to the website and straight away it crashed, and it has now been crashing on a daily basis. It even dropped one partition from the RAID.
I am going today to change the controller and add more RAM, but beyond that I am lost in the wilderness.
Suggestions are always welcome to help me solve this mystery.
...

Offline warren

Server crashes when no one watching..insmod block-major-13
« Reply #11 on: July 28, 2005, 01:06:09 AM »
torrestech,

This is caused by the raidmonitor program.
I found this out after searching the Red Hat site for errors loading xd.o: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=52013
The error is reproducible if you run sfdisk -l and then check /var/log/messages straight away.

Some detective work (and a hint from Charlie: "You need to work out what is trying to load all those drivers...") led me to check /usr/local/bin/raidmonitor:
line 57: /sbin/sfdisk -d > $RMDIR/sfdisk.out

Can anyone else confirm that they get the same results ?

Offline CharlieBrady

Re: Server crashes when no one watching..insmod block-major-
« Reply #12 on: July 28, 2005, 10:42:31 PM »
Quote from: "warren"


Some detective work (and a hint from Charlie: "You need to work out what is trying to load all those drivers...") led me to check /usr/local/bin/raidmonitor:
line 57: /sbin/sfdisk -d > $RMDIR/sfdisk.out


As a workaround, add:

alias block-major-13 off

to /etc/modules.conf. That'll stop the loading of the xd module. Ditto for all the other block-major-nn that it's complaining about.

But I think "xd: Out of memory" is only a symptom. Something else is using up all your memory, and causing the crashes.
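
For anyone applying the workaround above, the list of alias lines can be generated from the log rather than typed by hand. This is only a sketch in Python, assuming the two message shapes quoted in this thread ("Can't locate module block-major-NN" and "insmod block-major-NN failed"); `alias_off_lines` is a made-up name. Review the output before adding anything to /etc/modules.conf.

```python
import re

# Both message shapes quoted in this thread contain "block-major-<number>":
#   modprobe: Can't locate module block-major-22
#   insmod block-major-13 failed
BLOCK_MAJOR_RE = re.compile(r"\bblock-major-(\d+)\b")


def alias_off_lines(log_lines):
    """Return one 'alias block-major-N off' line per distinct major number seen."""
    majors = set()
    for line in log_lines:
        majors.update(int(number) for number in BLOCK_MAJOR_RE.findall(line))
    return [f"alias block-major-{number} off" for number in sorted(majors)]


# Lines copied from the first post; pass the lines of your own log instead.
sample = [
    "May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22",
    "May 23 13:58:09 server insmod: /lib/modules/2.4.20-18.7/kernel/drivers/block/xd.o: insmod block-major-13 failed",
    "May 23 13:58:09 server modprobe: modprobe: Can't locate module block-major-104",
    "May 23 13:58:08 server modprobe: modprobe: Can't locate module block-major-22",
]

for alias in alias_off_lines(sample):
    print(alias)
```

This silences the module-loading noise only. As noted above, it does not address whatever is actually consuming the memory.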