My hunch was correct: this method of upgrading a disk is *not* such a good idea.
The synchronisation gets about 98.4% of the way through, then hits some read errors on the original disk. It hangs everything for about three minutes, then simply jumps back to the beginning and starts synchronising from 0% again. I guess this is now an endless loop until I can do something about the bad sectors at the end of the disk.
What do I do next? Should I put the old second RAID disk back in for now and try to recover all the data? (Putting an out-of-date disk back in won't risk jumping back two days, will it?) Should I take the server down and scan the single disk that holds the data to find and fix hardware errors? Is there something else I should do now?
Any help appreciated.
Edit: in case it means anything, this error gets repeated four times as the server appears to hang:
Apr 7 11:55:56 sme kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 7 11:55:56 sme kernel: ata1.00: (BMDMA stat 0x65)
Apr 7 11:55:56 sme kernel: ata1.00: cmd 25/00:00:cd:2b:5b/00:04:39:00:00/e0 tag 0 cdb 0x0 data 524288 in
Apr 7 11:55:56 sme kernel: res 51/40:ce:ff:2d:5b/40:01:39:00:00/e9 Emask 0x9 (media error)
Apr 7 11:55:56 sme kernel: ata1.00: configured for UDMA/133
Apr 7 11:55:56 sme kernel: ata1.01: configured for UDMA/133
Apr 7 11:55:56 sme kernel: ata1: EH complete
Then a bunch of these for a while:
Apr 7 12:15:25 sme kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Apr 7 12:15:25 sme kernel: Info fld=0x4000000 (nonstd), Invalid sda: sense = 72 11
Apr 7 12:15:26 sme kernel: end_request: I/O error, dev sda, sector 962538605
Apr 7 12:15:26 sme kernel: ata1: EH complete
Apr 7 12:15:26 sme kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 7 12:15:27 sme kernel: ata1.00: (BMDMA stat 0x65)
Apr 7 12:15:31 sme kernel: ata1.00: cmd 25/00:d8:75:2c:5f/00:02:39:00:00/e0 tag 0 cdb 0x0 data 372736 in
Apr 7 12:15:31 sme kernel: res 51/40:66:e7:2d:5f/40:01:39:00:00/e9 Emask 0x9 (media error)
Apr 7 12:15:32 sme kernel: ata1.00: configured for UDMA/133
Apr 7 12:15:33 sme kernel: ata1.01: configured for UDMA/133
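In case it helps anyone reading the log above, here is a rough Python sketch for pulling the failing sector numbers out of lines like these and estimating how far into the disk they sit. The function name `failing_sectors` and the 512-byte sector size are my own assumptions, not anything reported by SME or the kernel, and the sample lines are just copied from the log above.

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev sda, sector 962538605
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def failing_sectors(log_lines):
    """Return a sorted list of unique (device, sector) pairs found in the log."""
    found = set()
    for line in log_lines:
        match = IO_ERROR.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return sorted(found)


# Sample lines copied from the log above.
sample = [
    "Apr 7 12:15:25 sme kernel: SCSI error : <0 0 0 0> return code = 0x8000002",
    "Apr 7 12:15:26 sme kernel: end_request: I/O error, dev sda, sector 962538605",
    "Apr 7 12:15:26 sme kernel: ata1: EH complete",
]

for dev, sector in failing_sectors(sample):
    offset_gb = sector * SECTOR_BYTES / 1e9
    print(f"{dev}: sector {sector} (about {offset_gb:.1f} GB into the disk)")
```

Feeding it the whole of the messages log instead of the three sample lines should show whether the errors are confined to one small region near the end of the disk or scattered more widely.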
I am guessing this is not appropriate to raise as an SME bug, so I am hoping someone here has had some similar experiences and can offer some possible solutions.
Am I right in thinking I am going to have to take the server out of service, and copy the one remaining hard drive to a new hard drive using some other bootable distro? If so, any hints? I've tried copying SME disks before and never managed to produce working disks.