Hi, I've been working on very similar issues.
Here are my current plans:
1. Set up one or more physical servers (I currently have 3 servers set up for this);
2. Use OpenVZ to run my workloads in virtual containers on those servers;
3. Use Ceph (or similar) to pool the servers' local disks into shared storage;
4. Run all my services, including SME Server (or similar) in OpenVZ containers, stored on the Ceph shared storage;
5. In addition to our regular tape backup of the fileserver (only 1.5TB), I will PXE boot our 40-odd workstations each night into a Linux kernel running Ceph, and combine the spare disk capacity of all those workstations to hold disk-based, short-term (nightly, weekly and monthly incremental) backups of the fileserver.
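To make the nightly/weekly/monthly rotation in step 5 a bit more concrete, here is a minimal Python sketch of one possible grandfather-father-son retention policy. The schedule (monthly on the 1st, weekly on Sundays, nightly otherwise) and the retention counts are illustrative assumptions, not settings taken from Ceph or any backup tool:

```python
from datetime import date, timedelta


def backup_tier(day: date) -> str:
    """Classify a backup night into a tier (hypothetical schedule)."""
    if day.day == 1:        # assumption: the monthly run happens on the 1st
        return "monthly"
    if day.weekday() == 6:  # assumption: the weekly run happens on Sundays (Monday == 0)
        return "weekly"
    return "nightly"


def backups_to_keep(backups, today, nightly=7, weekly=4, monthly=3):
    """Return the backup dates to retain; the counts are illustrative defaults."""
    by_tier = {"nightly": [], "weekly": [], "monthly": []}
    # Walk newest-first so each tier's list starts with its most recent backups.
    for d in sorted(backups, reverse=True):
        if d <= today:
            by_tier[backup_tier(d)].append(d)
    limits = {"nightly": nightly, "weekly": weekly, "monthly": monthly}
    keep = set()
    for tier, dates in by_tier.items():
        keep.update(dates[: limits[tier]])
    return sorted(keep)


if __name__ == "__main__":
    # One backup per night from 1 Dec 2023 to 31 Jan 2024.
    start = date(2023, 12, 1)
    history = [start + timedelta(days=i) for i in range(62)]
    for d in backups_to_keep(history, today=date(2024, 1, 31)):
        print(d, backup_tier(d))
```

Anything not in the returned list could then be pruned from the workstation pool, so the pool only ever needs room for a bounded number of incrementals.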
* OpenVZ lets me migrate running containers from one server to another in case of hardware failure (for installations with more than 1 server);
* My current fileserver disk array is in an external SCSI RAID chassis. If the server connected to that chassis fails, I have to shut down the hardware, switch the cables over, restart, and (possibly) migrate a container. Even so, this should take less than 30 minutes in total;
* For an installation with only 1 server, the server can be rebuilt on new hardware in less than an hour simply by copying the container image across;
* If I have multiple servers, I can even migrate containers for load balancing (provided the service uses only shared storage);
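The load-balancing idea can be sketched as a simple greedy planner. This is a minimal Python illustration under stated assumptions: the `Container`/`Host` types, the single numeric load per container, the shared-storage flag and the tolerance are all hypothetical. It only plans the moves; the migrations themselves would still be done with OpenVZ's own migration tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    load: float                # hypothetical load metric, e.g. share of one host's CPU
    shared_storage_only: bool  # True if all of its data lives on the shared (Ceph) storage


@dataclass
class Host:
    name: str
    containers: list = field(default_factory=list)

    @property
    def load(self) -> float:
        return sum(c.load for c in self.containers)


def plan_migrations(hosts, tolerance=0.2):
    """Greedy rebalancing plan; returns (container, from_host, to_host) tuples."""
    moves = []
    while True:
        busiest = max(hosts, key=lambda h: h.load)
        idlest = min(hosts, key=lambda h: h.load)
        gap = busiest.load - idlest.load
        if gap <= tolerance:
            break
        # Only containers with no local storage may move, and only a move with
        # 0 < load < gap actually narrows the gap between the two hosts.
        candidates = [c for c in busiest.containers
                      if c.shared_storage_only and 0 < c.load < gap]
        if not candidates:
            break
        # Prefer the container whose move leaves the two hosts most even.
        pick = min(candidates, key=lambda c: abs(gap / 2 - c.load))
        busiest.containers.remove(pick)
        idlest.containers.append(pick)
        moves.append((pick.name, busiest.name, idlest.name))
    return moves


if __name__ == "__main__":
    a = Host("A", [Container("web", 0.5, True),
                   Container("db", 0.4, False),
                   Container("mail", 0.3, True)])
    b = Host("B", [Container("dns", 0.1, True)])
    print(plan_migrations([a, b]))  # [('web', 'A', 'B')]
```

The shared-storage check is the important part: a container whose data sits on a server's local disks (like `db` above) is never offered as a candidate, which matches the caveat in the last point.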
Backup of the large fileserver data needs to be sized to match the storage. We currently have a 16-tape LTO library, which we are about to replace or augment with a backup NAS device.
Hope this helps; I'd be interested in others' thoughts.
Cheers!
Nik