Koozali.org: home of the SME Server
Obsolete Releases => SME Server 7.x => Topic started by: bpivk on November 29, 2006, 11:13:43 PM
-
Hi,
I don't know what's wrong with my server, but I get a kernel panic every 6 days or so. I don't know if it could be caused by internet traffic or by PeerGuardian running on other computers.
Here is the error, in case someone can make sense of it:
Nov 29 19:32:07 wegeland kernel: ------------[ cut here ]------------
Nov 29 19:32:07 wegeland kernel: kernel BUG at mm/rmap.c:479!
Nov 29 19:32:07 wegeland kernel: invalid operand: 0000 [#1]
Nov 29 19:32:07 wegeland kernel: SMP
Nov 29 19:32:07 wegeland kernel: Modules linked in: appletalk(U) 8139too ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_state ipt_TOS ip_nat_ftp ip_conntrack_ftp iptable_mangle iptable_nat ip_conntrack iptable_filter ip_tables button battery ac ohci_hcd ehci_hcd mii bonding(U) floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod
Nov 29 19:32:07 wegeland kernel: CPU: 0
Nov 29 19:32:07 wegeland kernel: EIP: 0060:[page_remove_rmap+35/74] Not tainted VLI
Nov 29 19:32:07 wegeland kernel: EIP: 0060:[<c0152b91>] Not tainted VLI
Nov 29 19:32:07 wegeland kernel: EFLAGS: 00010286 (2.6.9-42.0.2.ELsmp)
Nov 29 19:32:07 wegeland kernel: EIP is at page_remove_rmap+0x23/0x4a
Nov 29 19:32:07 wegeland kernel: eax: ffffffff ebx: 000ffd80 ecx: c12041c0 edx: c10ffd80
Nov 29 19:32:07 wegeland kernel: esi: 00000000 edi: c9501248 ebp: c816abd8 esp: c3d97ec0
Nov 29 19:32:07 wegeland kernel: ds: 007b es: 007b ss: 0068
Nov 29 19:32:07 wegeland kernel: Process sysmon (pid: 3884, threadinfo=c3d97000 task=ca1fce30)
Nov 29 19:32:07 wegeland kernel: Stack: c014c830 07fec067 00000000 c10ffd80 0003c000 0933f000 c12041c0 c84ef680
Nov 29 19:32:07 wegeland kernel: c84ef680 0933f000 093bf000 c9501250 c12041c0 c014c952 00080000 00000000
Nov 29 19:32:07 wegeland kernel: 0933f000 ce9562e8 093bf000 c12041c0 c014c9b1 00080000 00000000 c3d97f78
Nov 29 19:32:07 wegeland kernel: Call Trace:
Nov 29 19:32:07 wegeland kernel: [zap_pte_range+640/841] zap_pte_range+0x280/0x349
Nov 29 19:32:07 wegeland kernel: [<c014c830>] zap_pte_range+0x280/0x349
Nov 29 19:32:07 wegeland kernel: [zap_pmd_range+89/124] zap_pmd_range+0x59/0x7c
Nov 29 19:32:07 wegeland kernel: [<c014c952>] zap_pmd_range+0x59/0x7c
Nov 29 19:32:07 wegeland kernel: [unmap_page_range+60/95] unmap_page_range+0x3c/0x5f
Nov 29 19:32:07 wegeland kernel: [<c014c9b1>] unmap_page_range+0x3c/0x5f
Nov 29 19:32:07 wegeland kernel: [unmap_vmas+241/517] unmap_vmas+0xf1/0x205
Nov 29 19:32:07 wegeland kernel: [<c014cac5>] unmap_vmas+0xf1/0x205
Nov 29 19:32:07 wegeland kernel: [exit_mmap+121/328] exit_mmap+0x79/0x148
Nov 29 19:32:07 wegeland kernel: [<c0150ebb>] exit_mmap+0x79/0x148
Nov 29 19:32:07 wegeland kernel: [mmput+78/114] mmput+0x4e/0x72
Nov 29 19:32:07 wegeland kernel: [<c012079c>] mmput+0x4e/0x72
Nov 29 19:32:07 wegeland kernel: [do_exit+527/1028] do_exit+0x20f/0x404
Nov 29 19:32:07 wegeland kernel: [<c0124739>] do_exit+0x20f/0x404
Nov 29 19:32:07 wegeland kernel: [sys_exit_group+0/13] sys_exit_group+0x0/0xd
Nov 29 19:32:07 wegeland kernel: [<c0124a19>] sys_exit_group+0x0/0xd
Nov 29 19:32:07 wegeland kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Nov 29 19:32:07 wegeland kernel: [<c02d47bf>] syscall_call+0x7/0xb
Nov 29 19:32:07 wegeland kernel: [packet_rcv+394/775] packet_rcv+0x18a/0x307
Nov 29 19:32:07 wegeland kernel: [<c02d007b>] packet_rcv+0x18a/0x307
Nov 29 19:32:07 wegeland kernel: Code: 3c c0 ff 42 10 51 9d c3 89 c2 8b 00 f6 c4 08 74 08 0f 0b dc 01 61 8c 2e c0 f0 83 42 08 ff 0f 98 c0 84 c0 74 2c 8b 42 08 40 79 08 <0f> 0b df 01 61 8c 2e c0 9c 59 fa b8 00 f0 ff ff 21 e0 8b 40 10
Nov 29 19:32:07 wegeland kernel: <0>Fatal exception: panic in 5 seconds
Nov 29 19:32:07 wegeland kernel: bad: scheduling while atomic!
Nov 29 19:32:08 wegeland kernel: [schedule+45/2267] schedule+0x2d/0x8db
Nov 29 19:32:08 wegeland kernel: [<c02d1e71>] schedule+0x2d/0x8db
Nov 29 19:32:08 wegeland kernel: [__mod_timer+257/267] __mod_timer+0x101/0x10b
Nov 29 19:32:08 wegeland kernel: [<c0129e39>] __mod_timer+0x101/0x10b
Nov 29 19:32:08 wegeland kernel: [poke_blanked_console+143/154] poke_blanked_console+0x8f/0x9a
Nov 29 19:32:08 wegeland kernel: [<c020c52c>] poke_blanked_console+0x8f/0x9a
Nov 29 19:32:08 wegeland kernel: [vt_console_print+660/677] vt_console_print+0x294/0x2a5
Nov 29 19:32:08 wegeland kernel: [<c020b8cd>] vt_console_print+0x294/0x2a5
Nov 29 19:32:08 wegeland kernel: [__mod_timer+257/267] __mod_timer+0x101/0x10b
Nov 29 19:32:08 wegeland kernel: [<c0129e39>] __mod_timer+0x101/0x10b
Nov 29 19:32:08 wegeland kernel: [schedule_timeout+313/340] schedule_timeout+0x139/0x154
Nov 29 19:32:08 wegeland kernel: [<c02d2f8d>] schedule_timeout+0x139/0x154
Nov 29 19:32:08 wegeland kernel: [process_timeout+0/5] process_timeout+0x0/0x5
Nov 29 19:32:08 wegeland kernel: [<c012a6de>] process_timeout+0x0/0x5
Nov 29 19:32:08 wegeland kernel: [printk+14/17] printk+0xe/0x11
Nov 29 19:32:08 wegeland kernel: [<c01228ac>] printk+0xe/0x11
Nov 29 19:32:08 wegeland kernel: [die+346/363] die+0x15a/0x16b
Nov 29 19:32:08 wegeland kernel: [<c01060c2>] die+0x15a/0x16b
Nov 29 19:32:08 wegeland kernel: [do_invalid_op+207/242] do_invalid_op+0xcf/0xf2
Nov 29 19:32:08 wegeland kernel: [<c0106425>] do_invalid_op+0xcf/0xf2
Nov 29 19:32:08 wegeland kernel: [page_remove_rmap+35/74] page_remove_rmap+0x23/0x4a
Nov 29 19:32:08 wegeland kernel: [<c0152b91>] page_remove_rmap+0x23/0x4a
Nov 29 19:32:08 wegeland kernel: [buffered_rmqueue+381/421] buffered_rmqueue+0x17d/0x1a5
Nov 29 19:32:08 wegeland kernel: [<c0143fa4>] buffered_rmqueue+0x17d/0x1a5
Nov 29 19:32:08 wegeland kernel: [do_IRQ+418/430] do_IRQ+0x1a2/0x1ae
Nov 29 19:32:08 wegeland kernel: [<c0107ab4>] do_IRQ+0x1a2/0x1ae
Nov 29 19:32:08 wegeland kernel: [free_pages_bulk+459/471] free_pages_bulk+0x1cb/0x1d7
Nov 29 19:32:08 wegeland kernel: [<c014399c>] free_pages_bulk+0x1cb/0x1d7
Nov 29 19:32:08 wegeland kernel: [do_invalid_op+0/242] do_invalid_op+0x0/0xf2
Nov 29 19:32:08 wegeland kernel: [<c0106356>] do_invalid_op+0x0/0xf2
Nov 29 19:32:08 wegeland kernel: [error_code+47/56] error_code+0x2f/0x38
Nov 29 19:32:08 wegeland kernel: [<c02d52b7>] error_code+0x2f/0x38
Nov 29 19:32:08 wegeland kernel: [page_remove_rmap+35/74] page_remove_rmap+0x23/0x4a
Nov 29 19:32:08 wegeland kernel: [<c0152b91>] page_remove_rmap+0x23/0x4a
Nov 29 19:32:08 wegeland kernel: [zap_pte_range+640/841] zap_pte_range+0x280/0x349
Nov 29 19:32:08 wegeland kernel: [<c014c830>] zap_pte_range+0x280/0x349
Nov 29 19:32:08 wegeland kernel: [zap_pmd_range+89/124] zap_pmd_range+0x59/0x7c
Nov 29 19:32:08 wegeland kernel: [<c014c952>] zap_pmd_range+0x59/0x7c
Nov 29 19:32:08 wegeland kernel: [unmap_page_range+60/95] unmap_page_range+0x3c/0x5f
Nov 29 19:32:08 wegeland kernel: [<c014c9b1>] unmap_page_range+0x3c/0x5f
Nov 29 19:32:08 wegeland kernel: [unmap_vmas+241/517] unmap_vmas+0xf1/0x205
Nov 29 19:32:08 wegeland kernel: [<c014cac5>] unmap_vmas+0xf1/0x205
Nov 29 19:32:08 wegeland kernel: [exit_mmap+121/328] exit_mmap+0x79/0x148
Nov 29 19:32:08 wegeland kernel: [<c0150ebb>] exit_mmap+0x79/0x148
Nov 29 19:32:08 wegeland kernel: [mmput+78/114] mmput+0x4e/0x72
Nov 29 19:32:08 wegeland kernel: [<c012079c>] mmput+0x4e/0x72
Nov 29 19:32:08 wegeland kernel: [do_exit+527/1028] do_exit+0x20f/0x404
Nov 29 19:32:08 wegeland kernel: [<c0124739>] do_exit+0x20f/0x404
Nov 29 19:32:08 wegeland kernel: [sys_exit_group+0/13] sys_exit_group+0x0/0xd
Nov 29 19:32:08 wegeland kernel: [<c0124a19>] sys_exit_group+0x0/0xd
Nov 29 19:32:08 wegeland kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Nov 29 19:32:08 wegeland kernel: [<c02d47bf>] syscall_call+0x7/0xb
Nov 29 19:32:08 wegeland kernel: [packet_rcv+394/775] packet_rcv+0x18a/0x307
Nov 29 19:32:08 wegeland kernel: [<c02d007b>] packet_rcv+0x18a/0x307
-
Try replacing the RAM.
Jon
-
I don't think it's the RAM acting up, because I didn't have any problems on the previous OS.
I googled a little and found out that it could indeed be the RAM.
So I'll change it and see what happens. The box is running on 128 MB SDRAM, so I think it's about time to upgrade to 512 MB DDR. :D
-
Hi there. I too have been having kernel panics ever since I started messing around with network bonding, using Realtek 8139 NICs. I am curious whether you are also using identical NICs, as that is what was causing my kernel panic, and your log mentions network bonding.
-
Yes, I have two Realtek NICs just like you.
But I didn't do anything to bond them.
I did notice that the system monitor contrib now shows bond info, but it just appeared; I didn't do anything to activate it. The info is empty and it just shows an empty graph.
I read somewhere that this can be turned on if the server is in gateway mode, but my server is in server and gateway mode.
Can I change this through the panel or the command line, or do I have to change one of the NICs to stop this?
-
Yes. I did not request it to bond; it just happens automatically. I read that we are supposed to get an option in the console to bond identical NICs, but so far I have not seen this option. Today I am going to try two things:
1- Install 2 identical Intel NICs.
2- Reinstall SME twice, once with the identical Realteks (to see if this bonding option is only in the installer) and again with the Intel NICs.
I will let you know what happens.
-
Well, a fresh install isn't an option in my case, so I'll have to find a way to remove this thing.
I'll wait for someone to tell me how to turn this off, or I'll buy a different NIC and just swap one of them.
-
Dual Realtek cards were the cause of my kernel panic woes. I swapped the NICs for dual Intels and everything was just fine. BTW, I was just doing a clean install on a spare box for testing, and as it turns out, the supposed option for NIC bonding did not appear in the setup console with either the dual Realteks or the dual Intels. Not sure why people talk about this option, because I have never seen it. Perhaps it was in a version 7 beta? Anyway, my advice is to just get rid of one of the Realteks and everything will be fine. Don't bother trying to remove bond0; just change one of the cards.
-
> I read that we are supposed to get an option in the console to bond identical NICs, but so far I have not seen this option.
(http://www.magicwilly.webhostingpal.com/ContribsForumPictures/bonding/bond1.png)
(http://www.magicwilly.webhostingpal.com/ContribsForumPictures/bonding/bond2.png)
Available during install (configuration) and in the server-console.
-
> I read that we are supposed to get an option in the console to bond identical NICs, but so far I have not seen this option.
> Available during install (configuration) and in the server-console.
I get the NIC bonding option too, but my servers are running in server-only mode.
John
-
> Dual Realtek cards were the cause of my kernel panic woes. I swapped the NICs for dual Intels and everything was just fine.
brentonv, would you be so kind as to file a bug report about this issue? Whilst it is not an SME issue per se, it would be good to document this in the FAQ or wherever for future reference. This sort of information tends to get left behind if it is only in the forum...
Thanks.
chris
-
I've also seen reports that moving the cards to different PCI slots may help.
-
I'll try changing the slots.
But I don't have an option to turn bonding on or off. It just turned on, and that's it; I didn't do anything. And because I don't have any option to turn it off, I can't.
P.S.: My server is running in server and gateway mode.
-
> I'll try changing the slots.
> But I don't have an option to turn bonding on or off. It just turned on, and that's it; I didn't do anything. And because I don't have any option to turn it off, I can't.
> P.S.: My server is running in server and gateway mode.
How can you bond 2 NICs in server/gateway mode? One is LAN and the other is WAN.
In server-only mode both are LAN.
[root@clean-server-only ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:AE:51:F2
          inet addr:192.168.2.111  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:10280 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4992 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9832691 (9.3 MiB)  TX bytes:350492 (342.2 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:AE:51:F2
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:8921 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5000 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9652926 (9.2 MiB)  TX bytes:351804 (343.5 KiB)
          Interrupt:177 Base address:0x1400

eth1      Link encap:Ethernet  HWaddr 00:0C:29:AE:51:F2
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1367 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:180293 (176.0 KiB)  TX bytes:0 (0.0 b)
          Interrupt:185 Base address:0x1480

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:308 errors:0 dropped:0 overruns:0 frame:0
          TX packets:308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27916 (27.2 KiB)  TX bytes:27916 (27.2 KiB)

[root@clean-server-only ~]#
-
Take a look. And again, my server is in "server and gateway" mode and everything works as it should, apart from the kernel errors. :)
bond0     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:1.1.1.1  Bcast:1.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 00:50:BF:01:0D:52
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:6238578 errors:685 dropped:3592 overruns:352 frame:0
          TX packets:6462232 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1848246777 (1.7 GiB)  TX bytes:3315664738 (3.0 GiB)
          Interrupt:5 Base address:0xdc00

eth1      Link encap:Ethernet  HWaddr 00:50:FC:3A:A5:F6
          inet addr:89.212.16.101  Bcast:89.xxx.xxx.xxx  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6514401 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6348565 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3326750078 (3.0 GiB)  TX bytes:1884561109 (1.7 GiB)
          Interrupt:11 Base address:0xd800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:20703 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20703 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4136414 (3.9 MiB)  TX bytes:4136414 (3.9 MiB)
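A quick way to compare the two outputs above: on the server-only box, eth0 and eth1 carry the SLAVE flag under a MASTER bond0, while on this server-gateway box bond0 is a MASTER with no slaves at all. The sketch below lists those roles from `ifconfig` output. It is a minimal sketch, assuming the usual net-tools `ifconfig` layout (each interface name starts a line, its detail lines are indented beneath it); `bond_roles` is a hypothetical helper name, not something shipped with SME Server.

```shell
#!/bin/sh
# bond_roles (hypothetical helper): read `ifconfig` output on stdin and print
# every interface flagged as a bonding MASTER or SLAVE.
# Assumes net-tools formatting: interface name at the start of a line,
# details (including the flags line) indented below it.
bond_roles() {
  awk '
    /^[^ \t]/  { iface = $1 }              # unindented line starts a new interface
    / MASTER / { print iface, "master" }   # flag words are space-separated
    / SLAVE /  { print iface, "slave" }
  '
}
```

Usage: `ifconfig | bond_roles`. On the server-only box above this should print bond0 as master and eth0/eth1 as slaves; on the server-gateway box only `bond0 master` would appear, i.e. a bond device exists but nothing is enslaved to it. When the bonding driver is loaded, its own status file `/proc/net/bonding/bond0` is another place to look.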
-
Just did a server/gateway install in a VM and got no bond0.
Maybe you did something non-standard.
Time for me to go read the docs again.
From the bug tracker...
http://bugs.contribs.org/show_bug.cgi?id=449#c5
> Verified. Console offers nic bonding in server-only mode when 2 nics present. Nic bonding works and ifconfig shows the bonded nics. Tried several types of connections to the server (http, https, ssl and vpn) and all worked.
> Change to server-gateway and bonding not offered and the nics are configured properly and ifconfig shows the server back to eth0 and eth1 only. http, https, ssl and vpn connections still working.
-
Like I said, I installed SME and then it appeared. Nothing non-standard: a few contribs (remote user and stuff, and DansGuardian), and I removed DansGuardian, BASE, Oink, Snort and all that stuff.
But this thing just appeared. I don't do much through the console.
I checked my uptime through the system monitor contrib and noticed bond. But I say again... I didn't do anything to install, set up or run this thing.
I don't even know how to do it, and I don't need it. And I don't just go and type commands I don't know into my console.
I read this exact part of the bug tracker. But as you can see, I have a problem (ergo the kernel panic) which shouldn't be a problem and which shouldn't even exist.
I think I have a possible solution. I'll set my server to server-only mode and shut off bonding. Then I'll set it back to server and gateway, and it should work.
-
OK, I swapped one card for another and it seems to work.
No more bond when I type ifconfig, so I think this problem is solved. We'll see in 6 days or so whether the kernel panic is solved or not.
-
> OK, I swapped one card for another and it seems to work.
> No more bond when I type ifconfig, so I think this problem is solved. We'll see in 6 days or so whether the kernel panic is solved or not.
If every time you install SME (server/gateway) with those NICs you get bond0, then maybe you should create a bug entry.
The developers will then tell you whether it's normal or not.
-
Well, I didn't notice it until the kernel errors started to show up. I'll keep my eyes open to see if this is a regular pattern with my cards.
-
Remember when I said that kernel bonding was off? Guess again...
I had a kernel error today, and when I checked my logs:
Dec 1 16:23:18 wegeland esmith::event[2496]: expanding /etc/sysconfig/network-scripts/ifcfg-bond0
Dec 1 16:23:27 wegeland esmith::event[2496]: Running event handler: /etc/e-smith/events/bootstrap-console-save/S10rmmod-bonding
Dec 1 16:23:27 wegeland kernel: divert: freeing divert_blk for bond0
Dec 1 16:23:28 wegeland net.agent[2694]: remove event not handled
Dec 1 16:23:28 wegeland net.agent[2705]: remove event not handled
Dec 1 16:23:28 wegeland esmith::event[2496]: S10rmmod-bonding=action|Event|bootstrap-console-save|Action|S10rmmod-bonding|Start|1164986607 941924|End|1164986608 142442|Elapsed|0.200518
Dec 3 11:41:51 wegeland kernel: Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)
Dec 3 11:41:33 wegeland rc.sysinit: Remounting root filesystem in read-write mode: succeeded
Dec 3 11:41:51 wegeland kernel: bonding: MII link monitoring set to 200 ms
Dec 3 11:41:34 wegeland lvm.static: 2 logical volume(s) in volume group main now active
Dec 3 11:41:51 wegeland kernel: divert: allocating divert_blk for bond0
So I'm back at square one.
Bonding started on December 1 and it's still on. I pasted the December 3 entries because that's when I noticed it, but I found the same entries on both days.
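To see on which days the bonding driver was loaded or bond0 was reconfigured, it may help to pull every bonding-related line out of the log in one pass instead of scrolling through it. A minimal sketch, assuming the default syslog file `/var/log/messages`; the match terms (`bonding`, `bond0`, case-insensitive so that "Channel Bonding Driver" is caught) come from the log excerpts above, and `bond_events` is a hypothetical helper name.

```shell
#!/bin/sh
# bond_events (hypothetical helper): print log lines mentioning the bonding
# driver or the bond0 device. Case-insensitive, so it catches
# "Ethernet Channel Bonding Driver", "bonding: MII link monitoring ...",
# "S10rmmod-bonding" and "divert_blk for bond0".
# Defaults to /var/log/messages; pass another log file as the first argument.
bond_events() {
  grep -iE 'bonding|bond0' "${1:-/var/log/messages}"
}
```

Running `bond_events` after each reboot, or against an older copy of the log, should show whether the "Ethernet Channel Bonding Driver" line reappears every time the server starts.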
-
OK, sorry for reopening my old post, but my bond problem (which disappeared after a while) just came back and I had to reboot my server twice today.
-
> OK, sorry for reopening my old post, but my bond problem (which disappeared after a while) just came back and I had to reboot my server twice today.
Bug 2130 (http://bugs.contribs.org/show_bug.cgi?id=2130)
-
Yes, this is my bug report. And I didn't get any reply. It's hard to live with this problem when your internet connection dies twice a day.
-
bpivk
> And I didn't get any reply.
You did get replies and you were asked to provide full details, which you have not done.
Charlie wants log entries pertinent to the problem you see, and entries/information that show it is a bond0 problem. Do you get any error messages? If so, quote them exactly, and also give the log file entries from the times those errors occurred.
Start by looking in the /var/log/messages file, and also look in other logs for relevant info.
Quoting:
"Please provide full details of the kernel panic, including exactly when it
occurs. What tells you that it is bond0 which causes the kernel panic?"
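As a starting point for the log entries requested above, the oops and panic lines can be extracted together with the register dump and call trace that follow them, ready to attach to the bug report. A minimal sketch, assuming syslog writes kernel messages to `/var/log/messages`; the patterns are message texts from the oops quoted at the top of this thread, and `oops_context` is a hypothetical helper name. A panic that stops the machine before syslog writes to disk may leave no entry at all, so an empty result is not proof that nothing happened.

```shell
#!/bin/sh
# oops_context (hypothetical helper): print kernel BUG / panic lines from a
# syslog file, plus one line before each match (the "cut here" marker) and
# N lines after it (registers and call trace).
# Usage: oops_context [logfile] [lines-after]
#   logfile     defaults to /var/log/messages
#   lines-after defaults to 40
oops_context() {
  grep -B 1 -A "${2:-40}" \
    -E 'kernel BUG at|invalid operand|Fatal exception' \
    "${1:-/var/log/messages}"
}
```

For example, `oops_context > /tmp/oops.txt` writes the matches to a file that can be attached in Bugzilla rather than pasted into the forum.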
-
Well, I did search my logs and I didn't find any references to my problem. Some of the logs are posted above (on page 1), and some of the errors (including the ones I had today) aren't even logged. Or I can't find them.
-
bpivk
> Some of the logs are posted above (on page 1)
You should post those log entries to the bug tracker (as an attachment). Make it easy for the developers so they don't need to waste time finding & reading forum posts.
> ...some of the errors (including the ones I had today) aren't even logged.
You should at least quote the error messages verbatim including the ones you received today (in bugzilla, not here).
If you can't find log entries or don't know where to look, then say so or ask in bugzilla (not here). The developers will guide you re the information to provide, but you must be willing to follow up your original complaint with supporting information.
You can't just say "no one answered me" when you have provided very little information, and not followed up on requested information.
The developers are waiting for your follow up information before they will answer further.
I believe it is standard policy that developers don't waste their time on bug reports that have no follow up feedback provided.
-
Have you checked the bug tracker lately?
- I didn't reply because I had problems with my modem.
- I posted two logs (I had already mentioned and linked to my logs in one of my first posts, but if you want them posted there, OK).
- I can find the messages that are logged, but some of them aren't logged.
> You can't just say "no one answered me"
Can you quote me on this statement? I didn't get any reply saying that links weren't OK. I did what you told me as soon as I fixed my modem.