Koozali.org: home of the SME Server

How can I keep my files safe?

Gert

« on: May 21, 2007, 11:41:16 PM »
I am running SME Server installed on an 80 GB IDE drive (hda). I also have 5 x 250 GB IDE drives: one connected on hdc and the other four connected to a 2-port IDE controller card (hd[efgh]). The 250 GB drives form a software RAID5 array (md3, formatted ext3) that auto-mounts at startup, and I have 2 ibays created with symlinks to the array. The machine is used purely as a file server.

I installed the OS on a separate drive to make it easy to reinstall after a crash without having to back up all of the 1 TB of data. (Previously my IDE controller lost a channel, resulting in 2 of the 5 drives dropping out of the RAID5 array. It was a nightmare to recover the data. Luckily I succeeded after many sleepless nights.)

A while ago I started noticing that random files were becoming corrupted, specifically .nrg and .iso files. Nero would crash when I mounted those images. I would then recreate the image files from the original CDs and they would work again, but eventually they would become corrupt again. Files that do CRC checks before they run, like ZIP files and installation .EXE files, would report a bad CRC and be rendered useless. Other files like videos (I do a lot of video editing), photos and MP3s seemed to be OK.

I unmounted the file system on md3 and ran fsck, which complained about 64 inodes with duplicate/bad blocks.
fsck took over 100 hours to complete, and I ended up with ALL my files in the lost+found dir. I ran fsck again and it reported no errors or problems.

Now I am left with about 900 GB of data in my lost+found, pretty much a total waste. I guess I'll search for everything that is irreplaceable and delete the rest.

I ran memtest86 to test my 1 GB of memory and it found no problems.

What can I do to prevent a disaster like this in the future? Should I run fsck and memtest every now and then?
Can anyone point me in the right direction?

Daniel B.

« Reply #1 on: May 22, 2007, 11:30:32 AM »
Hi. If the data is really important, I strongly encourage you to use hardware RAID5. I have had several problems with software RAID5 and I don't think it's a good idea to use it. Hardware RAID is no longer very expensive, and SME supports a lot of controllers (from a very old Mylex DAC960 to the latest PERC5).

Additionally, you MUST back up your data; even hardware RAID is not safe enough. I use BackupPC on all my servers, and it has already saved about 100 GB of very important data that was on hardware RAID5 (2 disks failed at the same time, and they were less than 1 year old).

And don't forget to do offsite backups too (on a remote host, with USB disks or tapes).

Cheers.
It's the end of the world!!! :lol:

Gert

« Reply #2 on: May 22, 2007, 12:12:14 PM »
Thanks for the reply.

I am not concerned about RAID for now. My concern is file system corruption. Using software RAID or hardware RAID will not make any difference in that regard.

How can I prevent my file system from becoming corrupt again?
Will it help to run fsck on a regular basis?

Daniel B.

« Reply #3 on: May 22, 2007, 12:23:18 PM »
I think the file system issue comes either from an error in the software RAID layer (as I said, I have already had problems with software RAID, where most of my files were no longer accessible) or from bad sectors on the disks. I don't think it's an ext3 issue. ext3 is very robust, and every time I had file system errors, it was because a disk had died. The only other thing that can damage an ext3 file system is an unclean shutdown. This is my personal experience, so others may have a different opinion, but to conclude, I'd say yes, choosing software or hardware RAID can make a difference, because ext3 relies on the hardware layer (which in your case is emulated by software RAID).

aimed

« Reply #4 on: May 28, 2007, 08:47:28 PM »
I had similar problems with a 2 x 160 GB software RAID-1. Suddenly all the .iso files and other files began getting corrupted. It turned out that my IDE cable was faulty. I replaced the cables and everything started working. Unfortunately the files were lost but hey.. Life isn't always fair  :wink:

Gert

« Reply #5 on: May 29, 2007, 08:28:25 AM »
aimed,

It is interesting that you experienced the same problem with RAID1 rather than RAID5 like mine. That implies that even if you had not used software RAID at all, you would still have encountered the problem because of the faulty IDE cable. That pretty much takes software RAID (or hardware RAID, for that matter) out of the equation.

On the other hand, as far as I know, the drive connected to the faulty IDE cable should have been dropped out of the array, because software RAID does CRC checking, unless both your drives were connected to the same IDE cable. I once had a faulty IDE port on a software RAID5 setup, and both drives connected to that port dropped out of the array every time I rebooted. By watching the drives' superblock checksums, I could see all the checksum errors while copying files to that array.
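
For anyone who wants to notice dropped drives without watching the console, here is a rough sketch that scans the text of /proc/mdstat for arrays with missing members. It assumes the usual status format of a header line such as `md3 : active raid5 ...` followed by a line ending in something like `[5/4] [UUUU_]`. The function name `degraded_arrays` is made up for this example. Note that it only reports members that have left the array; it does not detect silent corruption inside files.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active, flags)} for arrays missing members.

    Assumes the common /proc/mdstat layout: an "mdN : ..." header line,
    then a status line containing "[expected/active] [UU_...]".
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active, status.group(3))
            current = None  # status line consumed; wait for the next header
    return degraded
```

One way to use it would be to read /proc/mdstat from a cron job, pass the contents to `degraded_arrays`, and send a mail whenever the result is not empty.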

So the question remains: how can one prevent this from happening? Maybe use a file system with CRC error checking built in.
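
Short of a file system with built-in checksums, one user-level substitute is to keep a checksum manifest for files that are not supposed to change (such as .iso and .nrg images) and re-verify it periodically. The sketch below is only an illustration of that idea, not something from this thread: the function names are made up, SHA-256 is an arbitrary choice, and it cannot tell corruption apart from a legitimate edit, so it only suits static files. It detects damage early; it does not prevent or repair it, so backups are still needed.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def changed_files(root, manifest):
    """Return sorted relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest would be built once while the files are known to be good, stored somewhere other than the array itself, and then checked with `changed_files` from time to time so that a damaged image can be restored from backup before the good copy is rotated out.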