Koozali.org: home of the SME Server

Frequent unexpected reboots

Offline AJB

Frequent unexpected reboots
« on: July 11, 2008, 01:45:51 PM »
Hi all,

I'm having a problem with my Compaq ProLiant 1600 box running SME Server 7.3, fully updated. It keeps rebooting for no apparent reason. The interval between reboots varies from about 20 minutes to a couple of hours. As far as I know, the reboot times are not related to any of the cron jobs. I made no changes to the configuration, and I did not install, change or remove any contribs around the time this behaviour started. In short: I don't have the slightest idea where to search for a solution.

Of course I am more than happy to provide logs upon request, or other data that might help.

Thanks in advance for helping me figure this one out.
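
A quick way to test the "not related to any cron job" impression is to look at the gaps between boot times: a scheduled job would tend to produce near-constant gaps, while the 20-minute-to-two-hour spread described above would not. The following is only a minimal sketch. The function names and the tolerance are hypothetical, and the boot times have to be collected by hand (for example from the output of `last reboot`, assuming wtmp has not been rotated away).

```python
from datetime import datetime
from typing import List, Sequence


def gaps_in_minutes(boots: Sequence[datetime]) -> List[float]:
    """Return the time between consecutive boots, in minutes."""
    ordered = sorted(boots)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(gaps: Sequence[float], tolerance_minutes: float = 5.0) -> bool:
    """True if all gaps lie within tolerance_minutes of each other.

    With fewer than two gaps there is nothing to compare, so return False.
    """
    if len(gaps) < 2:
        return False
    return max(gaps) - min(gaps) <= tolerance_minutes
```

If `looks_periodic` returns False for the observed boot times, a fixed-schedule trigger becomes less likely, although it does not rule out a job that only sometimes brings the machine down.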

Offline cirkit

Re: Frequent unexpected reboots
« Reply #1 on: July 11, 2008, 01:50:27 PM »
Frequent reboots are usually caused by either RAM problems or overheating. Check your BIOS settings, the CPU and system temperatures, and the fan responses. If you have a CentOS live CD, boot from it and type memtest at the prompt to run a full RAM test. This looks more like a hardware issue than a software issue.

Offline AJB

Re: Frequent unexpected reboots
« Reply #2 on: July 11, 2008, 02:11:36 PM »
Thanks for the response. I'll check the BIOS and hardware as you suggested. One question, though: the box has ECC RAM. Would that make a difference? I mean, even if error correction does not work properly, would one not expect a log entry to be written when a memory error occurs? I'm asking because it's a production server, and while it is not business critical, running a reliable memtest takes quite some time. If possible, I would therefore like to pinpoint the problem as precisely as possible before taking the box down altogether.

Thanks.

Offline AJB

Re: Frequent unexpected reboots
« Reply #3 on: July 11, 2008, 04:47:26 PM »
OK, I've checked the hardware for errors, mechanical or otherwise. All the fans that should be running are running, temps are normal, and I ran 4 consecutive passes of the HP Server Diagnostics disk with no errors. I'm running memtest right now for more extensive memory testing, but I am becoming fairly certain that this is not a hardware issue.

So, once again, I'm fresh out of options. Any help or suggestion is greatly appreciated.

Offline cactus

Re: Frequent unexpected reboots
« Reply #4 on: July 11, 2008, 05:40:30 PM »
Quote from AJB: "So, once again, I'm fresh out of options. Any help or suggestion is greatly appreciated."

If you have some spare RAM, you could swap it into the unstable server and see whether the server runs more reliably. You could then also test the original RAM in another system that you do not need at the moment (all assuming you have spare parts and systems).
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline AJB

Re: Frequent unexpected reboots
« Reply #5 on: July 11, 2008, 05:51:33 PM »
Don't have spare parts (as in: the same hardware). (If I did, I would have switched to other hardware by now ;). I do have a backup server running Affa, but that box is considerably less well spec'd, so I am a bit reluctant to make that my production server.)

But even so, it really doesn't seem to be hardware related. Still running memtest, still no errors. I am in GMT +1, so it's almost the end of the day (and week for that matter) which means I can keep running memtest for a little while longer. Still, I am leaning more and more to a software issue.

Offline cactus

Re: Frequent unexpected reboots
« Reply #6 on: July 11, 2008, 05:56:28 PM »
Quote from AJB: "Don't have spare parts (as in: the same hardware). (If I did, I would have switched to other hardware by now ;). I do have a backup server running Affa, but that box is considerably less well spec'd, so I am a bit reluctant to make that my production server.)

But even so, it really doesn't seem to be hardware related. Still running memtest, still no errors."

I find a software issue hard to believe, as there should be clues pointing to it. If you cannot find entries in your log files at or around the time of a reboot, I tend to say it is not software related. I have very rarely, more like never, seen software issues cause frequent and unexpected reboots.

Quote from AJB: "I am in GMT +1, so it's almost the end of the day (and week for that matter) which means I can keep running memtest for a little while longer. Still, I am leaning more and more to a software issue."

I am in GMT +1 as well and am already enjoying my weekend ;-)

Offline AJB

Re: Frequent unexpected reboots
« Reply #7 on: July 11, 2008, 06:12:57 PM »
Hmm, up until this morning I was imagining myself enjoying a well-earned beer by now, but apparently fate decided otherwise today... ;).

As far as the logs are concerned: it's not that I can positively say there aren't any clues, it's just that I have no idea where to look for them. I checked the messages log which showed nothing that I consider to be out of the ordinary. I am, however, no expert in log file analysis.

That being said, I totally understand why you are reluctant to believe that the problem is not hardware-related. That's also why I spent the afternoon testing it, as it would be the most likely source of the problem. When testing the hardware doesn't result in any errors, however, troubleshooting becomes quite difficult.
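
For anyone unsure where to look in the messages log, one approach is to find every boot and check whether a normal shutdown was logged just before it. A boot with no shutdown entry ahead of it suggests the machine went down abruptly (power, hardware reset, or a crash that never reached the log). The sketch below assumes that the kernel banner `kernel: Linux version` marks a boot and that syslogd writes `exiting on signal 15` on an orderly shutdown. Both strings are assumptions about a typical sysklogd setup and should be checked against the actual log. The function name `classify_boots` and the `lookback` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Assumed marker strings: verify them against your own /var/log/messages.
BOOT_MARKER = "kernel: Linux version"
SHUTDOWN_MARKER = "exiting on signal 15"


def classify_boots(lines: Iterable[str],
                   boot_marker: str = BOOT_MARKER,
                   shutdown_marker: str = SHUTDOWN_MARKER,
                   lookback: int = 10) -> List[Tuple[str, bool, List[str]]]:
    """Return (boot_line, clean, context) for every boot marker found.

    context holds up to `lookback` lines logged immediately before the
    boot line; clean is True when one of them contains shutdown_marker.
    """
    history: Deque[str] = deque(maxlen=lookback)
    results: List[Tuple[str, bool, List[str]]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if boot_marker in line:
            context = list(history)
            clean = any(shutdown_marker in prev for prev in context)
            results.append((line, clean, context))
        history.append(line)
    return results
```

Calling `classify_boots(open("/var/log/messages", errors="replace"))` and printing the entries where `clean` is False shows the last few lines written before each unexpected reboot, which are the ones worth reading closely or posting here.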

Offline imcintyre

Re: Frequent unexpected reboots
« Reply #8 on: July 11, 2008, 06:17:52 PM »
Time to report a bug? I checked and there doesn't appear to be anything like this.

Offline cirkit

Re: Frequent unexpected reboots
« Reply #9 on: July 11, 2008, 08:15:51 PM »
Disable all USB ports on your motherboard via the BIOS, then monitor the server for 2-3 hours. I have seen such behaviour with some USB hardware.

Offline AJB

Re: Frequent unexpected reboots
« Reply #10 on: July 11, 2008, 08:22:09 PM »
The server has no USB ports, and I disabled USB support during bootup a long time ago: the thing would hang on shutdown with USB support enabled. (Don't ask, I spent several days figuring that one out last year. I know it works, but I still have no idea why :?.)

Offline imcintyre

Re: Frequent unexpected reboots
« Reply #11 on: July 11, 2008, 08:56:22 PM »
Stupid question time

Perhaps your AC or power supply is wonky. Is it plugged into the wall securely? In your BIOS, do you have (I forget the exact term) "Reboot on restore of AC" activated? You may want to turn this feature off as an experiment.

Offline AJB

Re: Frequent unexpected reboots
« Reply #12 on: July 11, 2008, 09:20:26 PM »
Hmm, I don't think the problem is with the power cables and/or outlets; a ProLiant has three PSUs and tolerates the failure of one of them. I will turn the BIOS setting off, though, as that would reveal any problems in the power supply further upstream. Thanks for the suggestion, and I'll keep you all posted.

Offline arnie25

Re: Frequent unexpected reboots
« Reply #13 on: July 15, 2008, 11:46:04 AM »
Try swapping the LAN cards used for internal and external traffic.
I remember having similar problems when, for some reason, my external LAN card hung, causing the server to freeze and reboot.
...

Offline AJB

Re: Frequent unexpected reboots
« Reply #14 on: July 15, 2008, 02:00:18 PM »
Hi,

Thanks for the suggestion. I'm starting to think, however, that the problem has somehow resolved itself. I brought the server back up last Friday after running all sorts of hardware tests, and it has been up ever since.

Not too sure if I'm entirely happy with this, as I am still at a loss as to what caused these reboots, but as long as it keeps running it is neither possible nor necessary to troubleshoot the issue.

I guess we can call this one resolved for now. Thanks for all the responses, guys!