Koozali.org: home of the SME Server

Database access going on and off [SOLVED] - NICE TIP!!!!

pablitobs
« on: February 19, 2009, 05:36:10 AM »
    Hello folks, I am seeing some weird behaviour on my SME servers; I hope someone can figure it out.

    First, my SME setup:
    • 1. One SME Server 7.3 as the gateway
    • 2. One SME Server 7.3 in server-only mode, acting as the web application server for the intranet
    • 3. Both servers have identical hardware and are new
    • 4. The web host (hosted externally) is running perfectly

    So, this is the problem: the web application server connects to the web host's server to retrieve and update some DB records. It used to work fine; the web server is configured to allow access only from authorized IPs.
    The problem is that the web application server sometimes connects to the web server and then, after a few seconds, just hangs.
    If I telnet to the MySQL server on the web host from the web application server, it sometimes works and sometimes fails.
    If I telnet to the MySQL server on the web host from the gateway server, it works, so there is no port blocking.
    The web server is working fine.
    I have already flushed hosts and flushed the Squid cache, but the web application server still works on and off.

    So is there any way to check why it is hanging and not connecting? Is there any firewall configuration I should check? I've been trying to follow the logs, but I can't find a log that tells me anything.

    Any help will be appreciated... thanks.


« Last Edit: February 20, 2009, 08:32:23 AM by pablitobs »

janet
Re: Database access going on and off
« Reply #1 on: February 19, 2009, 12:56:10 PM »
pablitobs

From the description you give, your problem is not clear to me.

I think you mean you are having trouble accessing mysql remotely.
See
http://wiki.contribs.org/SME_Server:Documentation:FAQ#Access_MySQL_from_the_local_network
and
http://wiki.contribs.org/SME_Server:Documentation:FAQ#Access_MySQL_from_a_remote_network

Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

pablitobs
Re: Database access going on and off
« Reply #2 on: February 19, 2009, 02:03:10 PM »
Hi, I know what you mean, but that is not the case. What I am trying to do is access a normal website's MySQL database from my server-only SME server, but it works on and off: sometimes a PHP script running on the SME server connects fine to the web server hosted outside my intranet, but after three attempts or fewer it gives me a connection error.
So at first I figured it was the web server, and I decided to run a test. I opened a PuTTY terminal, connected to my gateway server and telnetted to the MySQL server on the web host; it worked every time I tried. Then, at the same time, I opened another PuTTY terminal, connected to the other SME server (server-only mode) and telnetted to the MySQL server on the web host from there, but this time it gave a timeout error or connected only once in 5 tries. So it is not the web server, and it is not the internet connection or a blocked port, as the gateway server always connects fine to the web server. That is my problem.
What I would like to know is why this is happening, as a few days ago it was working fine.
Thanks
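The repeated telnet test described above (connect to the MySQL port several times and count how often it succeeds) can be scripted so the failure rate and timings are recorded rather than eyeballed. This is a minimal, hypothetical Python helper, not something used in this thread; the function names, the 3-second timeout and the 1-second pause are assumptions. A completed TCP handshake only shows the port is reachable, not that a MySQL login would succeed.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=3.0, pause=1.0):
    """Open a TCP connection to host:port several times and record each result.

    Mirrors the manual "telnet <host> 3306" test: a completed TCP handshake
    counts as success; a timeout, refusal or unreachable network counts as
    failure. Returns a list of (ok, elapsed_ms, detail) tuples.
    """
    results = []
    for attempt in range(attempts):
        start = time.monotonic()
        try:
            # Resolves the name, performs the TCP handshake, then closes.
            with socket.create_connection((host, port), timeout=timeout):
                ok, detail = True, "connected"
        except OSError as exc:  # socket.timeout is a subclass of OSError
            ok, detail = False, type(exc).__name__
        elapsed_ms = round((time.monotonic() - start) * 1000, 1)
        results.append((ok, elapsed_ms, detail))
        if attempt < attempts - 1:
            time.sleep(pause)  # space the attempts out like a person retrying
    return results


def summarize(results):
    """Return (successes, attempts) for a list produced by probe_tcp."""
    return sum(1 for ok, _, _ in results if ok), len(results)
```

For example, running summarize(probe_tcp("your.web.host", 3306)) once on the gateway and once on the intranet server (your.web.host is a placeholder) puts the two success rates side by side.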

David Harper
Re: Database access going on and off
« Reply #3 on: February 19, 2009, 02:35:51 PM »
Do you have anything like Dansguardian installed on the gateway server? Can you ping and tracert to the external web host from the intranet server during an outage?

pablitobs
Re: Database access going on and off
« Reply #4 on: February 19, 2009, 02:43:12 PM »
Yes, I have Dansguardian installed, but the IP and the domain are not blocked; in fact, if they were, I believe I would have no access at all, yet it is failing 4 out of 5 connections. I can ping the IP and the domain name from the gateway, and telnet works as well.

David Harper
Re: Database access going on and off
« Reply #5 on: February 19, 2009, 02:44:41 PM »
Try doing a traceroute from the affected machine to the external server while the issue is occurring. This may shed some light on where the connection is timing out.

pablitobs
Re: Database access going on and off
« Reply #6 on: February 19, 2009, 02:47:36 PM »
Thanks, I will do it and post the results here...

pablitobs
Re: Database access going on and off
« Reply #7 on: February 20, 2009, 01:38:27 AM »
Here is the traceroute:

traceroute to 69.65.40.154 (69.65.40.154), 30 hops max, 38 byte packets
 1  pc-00001 (192.168.1.1)  0.142 ms  0.091 ms  0.091 ms
 2  118.23.8.0 (118.23.8.0)  3.153 ms  3.303 ms  3.482 ms
 3  118.23.5.5 (118.23.5.5)  3.496 ms  3.080 ms  2.748 ms
 4  125.206.149.245 (125.206.149.245)  4.731 ms  4.569 ms  4.490 ms
 5  60.37.11.41 (60.37.11.41)  2.982 ms  2.843 ms  2.742 ms
 6  210.254.188.141 (210.254.188.141)  3.235 ms  3.336 ms  2.737 ms
 7  210.254.188.146 (210.254.188.146)  3.238 ms  3.311 ms  3.495 ms
 8  210.145.252.186 (210.145.252.186)  4.483 ms  3.331 ms  3.491 ms
 9  ae-5.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.11.53)  3.738 ms  3.860 ms  3.491 ms
Icmp checksum is wrong
10  as-2.r21.snjsca04.us.bb.gin.ntt.net (129.250.5.81)  131.171 msIcmp checksum is wrong
  131.474 msIcmp checksum is wrong
 as-0.r21.lsanca03.us.bb.gin.ntt.net (129.250.3.145)  118.423 ms
11  * * *
12  * * *
13  xe-0.level3.sttlwa01.us.bb.gin.ntt.net (129.250.9.162)  122.802 ms xe-1.level3.sttlwa01.us.bb.gin.ntt.net (129.250.9.210)  99.459 ms xe-0.level3.lsanca03.us.bb.gin.ntt.net (129.250.8.182)  114.283 ms
14  ae-32-52.ebr2.Seattle1.Level3.net (4.68.105.62)  104.327 ms  100.261 ms ae-92-92.ebr2.SanJose1.Level3.net (4.69.134.221)  128.142 ms
15  ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  198.953 ms  184.215 ms ae-2.ebr2.Denver1.Level3.net (4.69.132.54)  213.611 ms
16  ae-1-100.ebr2.Denver1.Level3.net (4.69.132.38)  176.742 ms ae-2.ebr3.SanJose1.Level3.net (4.69.132.9)  125.790 ms ae-3.ebr1.Chicago2.Level3.net (4.69.132.62)  179.203 ms
17  ae-3.ebr1.Chicago2.Level3.net (4.69.132.62)  160.986 ms  161.231 ms  161.128 ms
18  ae-11-53.car1.Chicago1.Level3.net (4.68.101.66)  179.390 ms ae-62-62.ebr2.SanJose1.Level3.net (4.69.134.209)  119.511 ms  135.026 ms
19  ae-11-55.car1.Chicago1.Level3.net (4.68.101.130)  161.628 ms ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  184.191 ms ae-11-55.car1.Chicago1.Level3.net (4.68.101.130)  146.227 ms
20  pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  179.715 ms  180.994 ms ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  161.619 ms
21  ae-6.ebr1.Chicago1.Level3.net (4.69.140.189)  185.471 ms houston.micfo.com (69.65.40.154)  173.252 ms pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  147.270 ms

David Harper
Re: Database access going on and off
« Reply #8 on: February 20, 2009, 02:32:40 AM »
9  ae-5.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.11.53)  3.738 ms  3.860 ms  3.491 ms
Icmp checksum is wrong
10  as-2.r21.snjsca04.us.bb.gin.ntt.net (129.250.5.81)  131.171 msIcmp checksum is wrong
  131.474 msIcmp checksum is wrong

I'm guessing that this might have something to do with your issue. Can we please see two more traceroutes for comparison:

1. From the intranet server to the external server when everything is working correctly

2. From the gateway to the external server when the error is happening

Thanks :)
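When comparing the traces requested above, a small text filter can save counting by eye. The Python sketch below is a hypothetical helper that assumes the Linux traceroute output format pasted in this thread; it reports the last hop number reached, the number of unanswered "*" probes and how often the "Icmp checksum is wrong" warning appears.

```python
import re

# A hop line starts with the hop number, e.g. " 9  ae-5.r21...".
HOP_RE = re.compile(r"^\s*(\d+)\s")
# A '*' standing alone marks a probe that got no reply.
STAR_RE = re.compile(r"(?<!\S)\*(?!\S)")
WARNING = "Icmp checksum is wrong"


def summarize_trace(text):
    """Summarize Linux traceroute output pasted as plain text.

    Returns the highest hop number seen, the number of '*' (no reply)
    probes, and how many times the 'Icmp checksum is wrong' warning appears.
    Continuation lines (extra timings or routers for the same hop) are
    counted for stars and warnings but do not start a new hop.
    """
    hops = 0
    timeouts = 0
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops = max(hops, int(match.group(1)))
        # The warning can be glued to a '*' (e.g. "*Icmp checksum is wrong"),
        # so blank it out before looking for lone stars.
        timeouts += len(STAR_RE.findall(line.replace(WARNING, " ")))
    return {
        "hops": hops,
        "timeouts": timeouts,
        "checksum_warnings": text.count(WARNING),
    }
```

Pasting each trace into summarize_trace() gives three numbers per scenario that can be lined up in a reply.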

pablitobs
Re: Database access going on and off
« Reply #9 on: February 20, 2009, 02:47:00 AM »
Hi, as per your request
----------------------------------------------------------------------
2. From the gateway to the external server when the error is happening
----------------------------------------------------------------------

traceroute to 69.65.40.154 (69.65.40.154), 30 hops max, 38 byte packets
 1  118.23.8.0 (118.23.8.0)  6.875 ms  3.708 ms  3.219 ms
 2  118.23.5.5 (118.23.5.5)  3.221 ms  3.015 ms  2.701 ms
 3  125.206.149.245 (125.206.149.245)  4.714 ms  4.449 ms  4.742 ms
 4  60.37.11.41 (60.37.11.41)  2.955 ms  3.216 ms  2.977 ms
 5  210.254.188.141 (210.254.188.141)  2.979 ms  3.261 ms  2.725 ms
 6  210.254.188.146 (210.254.188.146)  3.228 ms  3.067 ms  3.225 ms
 7  210.145.252.186 (210.145.252.186)  3.238 ms  3.286 ms  3.480 ms
 8  ae-5.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.11.53)  106.918 ms  3.487 ms  3.232 ms
 9  as-2.r21.snjsca04.us.bb.gin.ntt.net (129.250.5.81)  116.414 ms as-0.r21.lsanca03.us.bb.gin.ntt.net (129.250.3.145)  125.968 ms as-2.r21.snjsca04.us.bb.gin.ntt.net (129.250.5.81)  116.779 ms
     MPLS Label=299792 CoS=6 TTL=1 S=0
10  po-2.r01.lsanca03.us.bb.gin.ntt.net (129.250.3.162)  179.848 ms as-1.r21.sttlwa01.us.bb.gin.ntt.net (129.250.3.87)  123.275 ms *
11  xe-11-1-0.edge1.SanJose3.level3.net (4.68.111.189)  128.300 ms * xe-9-0-0.edge1.SanJose3.level3.net (4.68.110.49)  135.435 ms
12  xe-1.level3.sttlwa01.us.bb.gin.ntt.net (129.250.9.210)  183.340 ms vlan99.csw4.SanJose1.Level3.net (4.68.18.254)  130.977 ms vlan89.csw3.SanJose1.Level3.net (4.68.18.190)  119.894 ms
13  ae-62-62.ebr2.SanJose1.Level3.net (4.69.134.209)  127.100 ms ae-32-52.ebr2.Seattle1.Level3.net (4.68.105.62)  131.300 ms ae-62-62.ebr2.SanJose1.Level3.net (4.69.134.209)  132.882 ms
14  ae-2.ebr2.Denver1.Level3.net (4.69.132.54)  205.076 ms  186.827 ms  214.599 ms
15  ae-63-63.csw1.SanJose1.Level3.net (4.69.134.226)  134.404 ms ae-2.ebr3.SanJose1.Level3.net (4.69.132.9)  131.350 ms ae-63-63.csw1.SanJose1.Level3.net (4.69.134.226)  129.041 ms
16  ae-62-62.ebr2.SanJose1.Level3.net (4.69.134.209)  124.877 ms ae-3.ebr1.Chicago2.Level3.net (4.69.132.62)  161.359 ms ae-62-62.ebr2.SanJose1.Level3.net (4.69.134.209)  131.526 ms
17  ae-11-51.car1.Chicago1.Level3.net (4.68.101.2)  303.539 ms  315.791 ms  212.599 ms
18  ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  190.616 ms  174.020 ms ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  170.891 ms
19  pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  148.880 ms ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  146.141 ms  161.592 ms
20  pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  162.878 ms houston.micfo.com (69.65.40.154)  149.128 ms  149.668 ms

---------------------------------------------------------------------------------
1. From the intranet server to the external server when everything is working correctly
--------------------------------------------------------------------------------

traceroute to 69.65.40.154 (69.65.40.154), 30 hops max, 38 byte packets
 1  pc-00001 (192.168.1.1)  0.124 ms  0.090 ms  0.103 ms
 2  118.23.8.0 (118.23.8.0)  3.198 ms  3.115 ms  3.456 ms
 3  118.23.5.5 (118.23.5.5)  2.937 ms  2.626 ms  2.951 ms
 4  125.206.149.245 (125.206.149.245)  4.428 ms  4.909 ms  4.487 ms
 5  60.37.11.41 (60.37.11.41)  2.996 ms  2.617 ms  3.215 ms
 6  210.254.188.141 (210.254.188.141)  2.958 ms  3.130 ms  2.958 ms
 7  210.254.188.146 (210.254.188.146)  2.956 ms  3.156 ms  3.457 ms
 8  210.145.252.186 (210.145.252.186)  3.496 ms  3.159 ms  3.489 ms
 9  ae-5.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.11.53)  3.746 ms  3.850 ms  3.489 ms
Icmp checksum is wrong
10  ae-3.r21.osakjp01.jp.bb.gin.ntt.net (129.250.4.214)  12.499 msIcmp checksum is wrong
 as-0.r21.lsanca03.us.bb.gin.ntt.net (129.250.3.145)  119.534 msIcmp checksum is wrong
  121.023 ms
11  *Icmp checksum is wrong
 as-1.r21.sttlwa01.us.bb.gin.ntt.net (129.250.3.87)  99.660 ms *
12  xe-11-1-0.edge1.SanJose3.level3.net (4.68.111.189)  212.199 ms xe-11-0-0.edge1.SanJose3.level3.net (4.68.111.249)  111.581 ms *
13  vlan89.csw3.SanJose1.Level3.net (4.68.18.190)  136.163 ms xe-0.level3.sttlwa01.us.bb.gin.ntt.net (129.250.9.162)  206.470 ms xe-1.level3.lsanca03.us.bb.gin.ntt.net (129.250.9.86)  116.288 ms
14  ae-32-52.ebr2.Seattle1.Level3.net (4.68.105.62)  106.786 ms ae-93-93.ebr3.LosAngeles1.Level3.net (4.69.137.45)  119.816 ms ae-82-82.ebr2.SanJose1.Level3.net (4.69.134.217)  127.003 ms
15  ae-2.ebr2.Denver1.Level3.net (4.69.132.54)  194.728 ms ae-83-83.ebr3.LosAngeles1.Level3.net (4.69.137.41)  114.800 ms ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  195.191 ms
16  ae-73-73.csw2.SanJose1.Level3.net (4.69.134.230)  129.284 ms ae-1-100.ebr2.Denver1.Level3.net (4.69.132.38)  196.479 ms ae-73-73.csw2.SanJose1.Level3.net (4.69.134.230)  134.501 ms
17  ae-6.ebr1.Chicago1.Level3.net (4.69.140.189)  204.910 ms ae-3.ebr1.Chicago2.Level3.net (4.69.132.62)  161.008 ms ae-83-83.csw3.SanJose1.Level3.net (4.69.134.234)  137.044 ms
18  ae-11-51.car1.Chicago1.Level3.net (4.68.101.2)  374.343 ms ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  181.722 ms ae-6.ebr1.Chicago1.Level3.net (4.69.140.189)  190.738 ms
19  ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  188.244 ms ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  178.792 ms ae-3.ebr1.Denver1.Level3.net (4.69.132.58)  171.763 ms
20  ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  146.506 ms pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  180.022 ms ae0-40.er1.Chi1.Servernap.net (4.79.65.50)  146.757 ms
21  ae-6.ebr1.Chicago1.Level3.net (4.69.140.189)  181.494 ms ae-3.ebr1.Chicago2.Level3.net (4.69.132.62)  155.782 ms pos2-1.csr1.Chi3.Servernap.net (69.39.239.170)  162.472 ms
22  ae-6.ebr1.Chicago1.Level3.net (4.69.140.189)  186.994 ms  181.500 ms houston.micfo.com (69.65.40.154)  149.129 ms

---------------------------------------------------------------------------------------
On this last traceroute, as soon as I got a connection between the intranet server and the web server I ran the traceroute, then checked the connection from the intranet server to the web server again, and it was failing...
Googling this error, I found that some people believe it is a CentOS bug, but everything had been working fine on my servers for the past 4 months, and they are twin servers (hardware and software), so it is hard for me to believe it is a bug. I have Dansguardian; could it be the reason?...

Hope the data helps..

David Harper
Re: Database access going on and off
« Reply #10 on: February 20, 2009, 03:01:58 AM »
---------------------------------------------------------------------------------
1. From the intranet server to the external server when everything is working correctly
--------------------------------------------------------------------------------

 9  ae-5.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.11.53)  3.746 ms  3.850 ms  3.489 ms
Icmp checksum is wrong
10  ae-3.r21.osakjp01.jp.bb.gin.ntt.net (129.250.4.214)  12.499 msIcmp checksum is wrong
 as-0.r21.lsanca03.us.bb.gin.ntt.net (129.250.3.145)  119.534 msIcmp checksum is wrong
  121.023 ms
11  *Icmp checksum is wrong

The icmp checksum issue happens when the connection is working as well, so we can probably eliminate this as a potential error - likely it's just the CentOS bug manifesting itself, as you suggest.

With regard to the additional traces you have provided, you can see that the data is going via a longer and quite different route when things are not working.
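One rough way to check whether two traces really took different paths is to compare the router addresses they passed through. This Python sketch is a hypothetical helper, not a standard tool: it collects every "(a.b.c.d)" address after the header line. Because traceroute sends several probes per hop and they can be load-balanced across different routers (several of the hops above list more than one router), a difference in the address sets is a hint rather than proof.

```python
import re

# Linux traceroute prints each responding router as "name (a.b.c.d)".
ADDR_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def route_addresses(text):
    """Return router IPv4 addresses in the order they first appear.

    The 'traceroute to ...' header is skipped because its address is the
    destination being traced, not a hop.
    """
    seen = []
    for line in text.splitlines():
        if line.lstrip().startswith("traceroute to"):
            continue
        for addr in ADDR_RE.findall(line):
            if addr not in seen:
                seen.append(addr)
    return seen


def compare_routes(trace_a, trace_b):
    """Return (only_in_a, only_in_b): addresses seen in one trace but not the other."""
    a = set(route_addresses(trace_a))
    b = set(route_addresses(trace_b))
    return sorted(a - b), sorted(b - a)
```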

What I suggest is this: let's eliminate SME Server as the cause of your issue. Temporarily replace SME with a garden-variety SOHO broadband router (Netgear, DLink or similar), and see if the issue remains. If your connection problems go away, next try reintroducing a vanilla install of SME Server (no contribs etc., just the base install) on a spare box and trying again.

This may give us a better idea as to what might be behind the problem. I'm leaning towards an ISP issue, but if taking the SME gateway out of the picture solves the problem, then I stand to be corrected.

pablitobs
Re: Database access going on and off
« Reply #11 on: February 20, 2009, 03:12:28 AM »
OK, right now the servers are in production. I will find a time window today to run the tests and will get back to you as soon as I have any news....

thanks for your help.

pablitobs
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #12 on: February 20, 2009, 08:21:43 AM »
Hi guys, I finally found the problem. Here's the story:

My server is a new Dell PowerEdge T300, full of RAM and disk space and all that stuff, but it also has two NICs, which I connected using the bonding option in the configuration panel when I installed SME 7.x.
Somehow, the problem was that the kernel could not work out which of the NICs should be used when a request came in, so it took a long time to figure that out before the answer was ready, causing delays, latency and the "Icmp checksum is wrong" messages.

I googled it a little, and it is something called the ARP flux problem (http://linux-ip.net/html/ether-arp.html - scroll almost to the end). It happens when a Linux box is connected to a network segment with multiple network cards: a potential problem with the mapping of link-layer addresses to IP addresses can occur.
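Linux has per-interface ARP sysctls, arp_ignore and arp_announce under /proc/sys/net/ipv4/conf, that are often used to limit ARP flux when several NICs share a segment; that is not what was done in this thread, where the fix was removing the bond. As a minimal sketch under that assumption, the following Python reports their effective values per interface. Per the kernel documentation, the effective value is the maximum of conf/all and conf/<interface>, and arp_ignore = 0 (the default) means the host may answer ARP requests for any local address on any interface. The helper names are hypothetical.

```python
from pathlib import Path

# Standard location of the per-interface IPv4 sysctls on Linux.
CONF_DIR = Path("/proc/sys/net/ipv4/conf")


def effective_arp_settings(all_conf, iface_conf):
    """Combine conf/all and conf/<iface> values the way the kernel does (max).

    Both arguments are dicts with integer 'arp_ignore' and 'arp_announce'.
    Returns (arp_ignore, arp_announce, may_flux), where may_flux is True when
    arp_ignore is 0, i.e. ARP replies may be sent for any local address.
    """
    ignore = max(all_conf["arp_ignore"], iface_conf["arp_ignore"])
    announce = max(all_conf["arp_announce"], iface_conf["arp_announce"])
    return ignore, announce, ignore == 0


def read_conf(iface, conf_dir=CONF_DIR):
    """Read arp_ignore and arp_announce for one interface (or 'all')."""
    base = Path(conf_dir) / iface
    return {name: int((base / name).read_text().strip())
            for name in ("arp_ignore", "arp_announce")}


def arp_flux_report(conf_dir=CONF_DIR):
    """Return {iface: (arp_ignore, arp_announce, may_flux)} for real interfaces."""
    conf_dir = Path(conf_dir)
    all_conf = read_conf("all", conf_dir)
    skip = {"all", "default", "lo"}
    return {p.name: effective_arp_settings(all_conf, read_conf(p.name, conf_dir))
            for p in sorted(conf_dir.iterdir()) if p.is_dir() and p.name not in skip}
```

On a Linux box, print(arp_flux_report()) lists each interface. Values often suggested for this purpose are arp_ignore=1 and arp_announce=2, set with sysctl; any change like that should be tested before it goes into production.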

So after I unplugged one of the NICs and reconfigured the server to disable bonding, everything started working fine.
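To see what the bonding driver is actually doing before and after a change like this, Linux exposes a status file per bond, typically /proc/net/bonding/bond0 (the bond name is an assumption). It is plain "Key: value" text that includes a "Bonding Mode" line and, for each slave, a "Slave Interface" line followed by its "MII Status". A minimal Python parser for just those fields might look like this (hypothetical helper name):

```python
def parse_bonding_status(text):
    """Parse the text of /proc/net/bonding/<bond> into (mode, slaves).

    mode is the 'Bonding Mode' string (None if absent); slaves is a list of
    (interface, mii_status) tuples in file order. The bond-level MII Status,
    which appears before any 'Slave Interface' line, is ignored.
    """
    mode = None
    slaves = []
    current = None
    for raw in text.splitlines():
        # Split on the first colon only; values such as MAC addresses contain colons.
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            slaves.append((current, value))
            current = None
    return mode, slaves
```

For example, parse_bonding_status(open("/proc/net/bonding/bond0").read()). A slave whose MII status keeps changing, or a mode you did not expect, would be worth a closer look.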

I assume there is a problem with some kind of cache, because the original configuration with the two NICs was done about 4 months ago and worked fine, but after a while it started to fail a little more every day, until two days ago it was impossible to reach any server outside the box.
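If the cache in question is the ARP cache, one way to look for ARP flux from the outside is to watch a neighbouring Linux machine's ARP table (/proc/net/arp) over time and check whether the server's IP address is ever associated with more than one MAC address. This is a diagnostic sketch under that assumption, not something tried in this thread; the helper names are hypothetical, and incomplete entries (all-zero MAC) are ignored.

```python
from collections import defaultdict


def arp_entries(text):
    """Yield (ip, mac, device) from the text of /proc/net/arp.

    Skips the header line and incomplete entries whose MAC is all zeros.
    """
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, _flags, mac, _mask, device = fields[:6]
        if mac != "00:00:00:00:00:00":
            yield ip, mac.lower(), device


def flapping_ips(snapshots):
    """Return {ip: sorted MACs} for IPs seen with more than one MAC across snapshots."""
    macs = defaultdict(set)
    for text in snapshots:
        for ip, mac, _device in arp_entries(text):
            macs[ip].add(mac)
    return {ip: sorted(found) for ip, found in macs.items() if len(found) > 1}
```

For example, read /proc/net/arp on the gateway every few seconds, append each result to a list, and pass the list to flapping_ips().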

Thanks for the help and the suggestions; I hope this solution helps other people.

David Harper
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #13 on: February 20, 2009, 09:11:45 AM »
I'm glad you sorted it out!

pablitobs
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #14 on: February 20, 2009, 09:14:40 AM »
Me too, thanks for your help.

Normando
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #15 on: February 21, 2009, 04:47:43 AM »
Maybe you want to update the wiki with your experience:

http://wiki.contribs.org/KnownProblems#Problem_with_NIC_card_or_integrated_NIC.

pablitobs
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #16 on: February 21, 2009, 04:38:14 PM »
Sure, it would be an honor, but how can I update the wiki? I went to the link but did not find a way... sorry...

janet
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #17 on: February 22, 2009, 03:49:06 AM »
pablitobs

Quote
....how can I update the wiki, I went to the link but did not find a way..

If you do not already have wiki edit access, then you must request Wiki edit access by lodging a bug report. See the bugzilla link at top of forums. If you have never used bugzilla before then you will need to register as a new user.

After access has been granted, you login at the top of the wiki page using the same username and password as you use in the Forums. Then you will see the Edit tag alongside each article.
« Last Edit: February 22, 2009, 03:50:51 AM by mary »

cactus
Re: Database access going on and off [SOLVED] - NICE TIP!!!!
« Reply #18 on: February 22, 2009, 11:40:27 AM »
Quote
If you do not already have wiki edit access, then you must request Wiki edit access by lodging a bug report. See the bugzilla link at top of forums. If you have never used bugzilla before then you will need to register as a new user.

No, that is no longer necessary; the procedure has recently been improved and can be found here: http://wiki.contribs.org/Help:Contents