I'm assisting a remote colleague with the recovery of an SME server. One of two disks in a hardware-managed RAID 1 set failed in a Dell PowerEdge T100. The RAID controller indicated that the disk was "degraded", and the system failed to boot. The disk was replaced under warranty, and the RAID controller dutifully synchronized the disks after the new one was installed.
When booting, SME reports that the system was not shut down cleanly and that the file system ought to be checked. Because of the apparent severity of the issue, the check is forced and my colleague has not been able to bypass it. However, the check never appears to make progress, even when allowed to run overnight.
Using an installation CD, it is possible to enter Rescue Mode and mount the SME file system. My colleague is going to ensure that his data and configuration are backed up, assuming that his USB backup drive is accessible from Rescue Mode.
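For the backup itself, I've been considering something along the lines of the sketch below, since an ordinary copy may abort at the first unreadable file if the remaining disk really is failing. It is only a rough idea, not something we have run on the server: it assumes a Python 3 interpreter is available (which may well not be true in the rescue environment), and the function name `copy_tree_tolerant` is my own invention. It copies whatever regular files can be read and returns a list of the ones that could not.

```python
import os
import shutil


def copy_tree_tolerant(src_root, dst_root):
    """Copy every regular file under src_root into dst_root.

    Returns a list of (path, error message) pairs for anything that could
    not be read, so that one bad file does not abort the whole backup.
    """
    failures = []

    def note_walk_error(err):
        # os.walk reports unreadable directories through this callback.
        failures.append((err.filename, str(err)))

    for dirpath, dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks, sockets and device nodes; copy regular files only.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                failures.append((src, str(exc)))
    return failures
```

It would be called with the mounted SME tree as the source and a directory on the USB drive as the destination, for example `copy_tree_tolerant("/mnt/sysimage/home/e-smith", "/mnt/usb/sme-backup")`. Both paths are assumptions on my part and would need adjusting to wherever Rescue Mode actually mounts things. Any entries in the returned list would at least tell us which files sit on unreadable areas of the disk.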
Further hardware diagnostics run today suggest that the other hard drive may also be experiencing errors, which casts doubt (in my mind) on the state of the file system that was synchronized to the replacement drive. (When booted alone in the server, the new drive is likewise unable to make progress on, or complete, a file system check.)
What ideas do you have about this situation? What further information could we provide to shed more light on the matter?