Friday, December 31, 2010

My Little NAS

I'm not disciplined enough to perform backups manually. I admire people who are. I have to automate backups or they won't happen on a sufficiently regular schedule. I've done that using rsync to copy my ~/Documents directory to my home file server. In brief, I create a new full copy on the first day of every month and then use the rsync --link-dest=DIR option to create incremental backups on the remaining days of the month. These incremental backups get overwritten a month later, but the first-of-the-month backups are preserved forever. Hard drive space is cheap and getting cheaper.
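
For illustration, the monthly rotation described above might be sketched as follows. This is only a sketch under stated assumptions: the /backup root, the monthly/ and daily/ directory names, and the choice to link each day against the previous day's copy are all hypothetical, not the layout actually used here.

```python
from datetime import date, timedelta


def plan_backup(today, root="/backup"):
    """Return (destination, link_dest) for today's run.

    link_dest is None on the first of the month, meaning a full copy.
    The directory layout is an assumption made for this sketch.
    """
    if today.day == 1:
        # Full copy, stored under a year-month name so it is never reused.
        return f"{root}/monthly/{today:%Y-%m}", None
    # Other days reuse a day-of-month slot, so it is overwritten a month later.
    dest = f"{root}/daily/{today.day:02d}"
    yesterday = today - timedelta(days=1)
    if yesterday.day == 1:
        link_dest = f"{root}/monthly/{yesterday:%Y-%m}"
    else:
        link_dest = f"{root}/daily/{yesterday.day:02d}"
    return dest, link_dest


def rsync_args(src, dest, link_dest):
    """Build an rsync argument list; unchanged files become hard links."""
    args = ["rsync", "-a", "--delete"]
    if link_dest is not None:
        args.append(f"--link-dest={link_dest}")
    return args + [src, dest]
```

The argument list could be handed to subprocess.run from a daily cron job. Note that --link-dest works most predictably when the destination directory starts out empty.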

The weak spot in this strategy was the lack of offsite backups. I decided to do something about that. The overall strategy was to colocate a backup server at our son's place so I could back up over the Internet. There are cloud-based services that do this, but I'm not sure many support Linux clients. Instead I chose to pay up-front equipment costs rather than a monthly fee. I also didn't want to have to count bytes to make sure I didn't exceed any quotas. And finally, I like the thought of using an Ubuntu-based NAS where I have full control over operation and setup rather than being limited by the APIs and services provided by a vendor. The backup to the remote is handled by an appropriate rsync command that runs over an ssh connection.

Here is the H/W I chose:
I went with low-power drives because I was more concerned about power usage than performance. Backups will run over cable Internet connections where upstream bandwidth is capped, so extra performance would be wasted. The WDEARS drive presented an additional wrinkle. It uses 4K sectors but hides that fact from the OS. (More on this later.) The overall scheme was to mirror the two drives and install Ubuntu 10.04 LTS Server. I briefly considered a BSD-based NAS package, but it didn't do something I needed. (Boot from a RAID, IIRC.)

Another nicety is that this board seems to reliably support Wake-on-LAN (WOL). In fact, it is possible to WOL across the Internet if you can get your router properly configured. That certainly gets into some arcane aspects of routing, but we were able to configure it on my DD-WRT based router as well as my son's (?) D-Link offering. If you decide to do this, I suggest you search for a forum post, wiki, etc. that details how to do it with your equipment. With that information in hand, we found it not too difficult.
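
For readers unfamiliar with how WOL works: the wake-up signal is a "magic packet" consisting of six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. A minimal sketch follows. The function names are made up for this example, the MAC address in the usage note is a placeholder, and UDP port 9 is a common convention rather than a requirement.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    # Six 0xFF bytes, then the MAC address sixteen times.
    return b"\xff" * 6 + hw * 16


def wake(mac, host="255.255.255.255", port=9):
    """Send the magic packet as a UDP datagram (broadcast by default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (host, port))
```

On the LAN, wake("00:11:22:33:44:55") would broadcast the packet. Waking a machine across the Internet additionally depends on the remote router forwarding the packet to its LAN, which is the equipment-specific part.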

To install, I downloaded the 32-bit server install CD .iso and used it to create a bootable USB flash drive. The actual installation presented a few wrinkles. As mentioned, the WDEARS lies about its physical sector size. The result is that default partitioning will not align partitions on the drive's native sectors. This degrades performance because a write that does not cover whole physical sectors forces the drive to read the sector, modify it, and write it back. Reads should suffer less. I had a go at getting the partitioning right, but it seemed that the installation tools did not provide that capability. Were I to do this again, I'd either hook up the drive to another system or boot a live CD and partition the drives with whatever tools did this best. I think that fdisk actually provides that capability now, but I could not find it in the tools on the install CD. Since I was not too worried about performance, I settled for the tools at hand and moved on.
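
The alignment question comes down to simple arithmetic: a partition is aligned when its starting byte offset is a multiple of the physical sector size. A small sketch of that check, assuming 512-byte logical sectors and 4096-byte physical sectors (the function name is made up for this example):

```python
def is_aligned(start_sector, logical_size=512, physical_size=4096):
    """True if a partition starting at start_sector (counted in logical
    sectors) begins on a physical sector boundary."""
    return (start_sector * logical_size) % physical_size == 0


# Sector 63 is the traditional DOS-style start for a first partition;
# sector 2048 (1 MiB) is the start that newer partitioning tools favor.
for start in (63, 64, 2048):
    print(start, "aligned" if is_aligned(start) else "misaligned")
```

On a running Linux system, the start sector of a partition can usually be read from /sys/block/DISK/PARTITION/start, where it is expressed in 512-byte units.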

I had found a couple of descriptions of how to install to a RAIDed boot partition, but these did not seem too helpful for 10.04 LTS. RAID is one of those corner cases and I think support for boot may be in some flux. The layout I finally wound up with was separate partitions on each drive which were then combined into RAID devices. I think I could have RAIDed the entire drives, but then I would have needed to use LVM to provide separate root and data partitions. I thought life would be simpler without LVM. What finally seemed to work was to install to a single drive whose partitions were RAIDed. There were two RAID1 partitions, which were operating in degraded mode during the install. Once the system was up and running, I added the second drive's partitions to the RAIDs and everything was fully working. I think. I did not actually try to remove either drive and verify that the system would still boot. I can't assume that should a drive fail, the system will still boot. It seems likely that the boot process would hang on the failed drive anyway. At worst, I think I might need to boot a live CD/USB to get access to my data should there be a failure. I'm comfortable with that.
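
One way to confirm that both halves of each mirror are active is to look at /proc/mdstat, where the Linux md driver marks each member with "U" (up) or "_" (missing). The sketch below parses that text and lists arrays that are still degraded. The sample text is illustrative, not captured from this system, and the function name is made up for this example.

```python
import re

SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sda2[0]
      1943000000 blocks [2/1] [U_]
md0 : active raid1 sda1[0] sdb1[1]
      9767424 blocks [2/2] [UU]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member map contains a missing slot."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The member map looks like [UU] or [U_]; counters like [2/1] do not match.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the text would come from open("/proc/mdstat").read(). This only reports what the kernel sees; it does not answer whether the machine would boot with a drive removed.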

Another wrinkle turned out to be ignorance of what my backup commands were doing. As mentioned above, I was using hard links (via --link-dest) to reduce backup space when backing up to my local drive. What I did not realize was that the rsync command I used to copy to the remote backup dereferenced the links, so each linked file arrived as a separate full copy. I discovered this when I attempted to resize the local backup LVM partition. I wiped out my local backup copy. I found I could not restore my remote backup because it was too big as a result of all of the dereferenced links. I wound up biting the bullet and upgrading my local RAID (5x 200GB RAID5) to new drives (2x 2TB RAID1). I have since added the --links option to my rsync command, which should fix the expanding storage requirements problem.
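
A note on the options involved: rsync's --link-dest creates hard links, and rsync documents -H (--hard-links) as the option that preserves hard links, while -l (--links) covers symbolic links. The sketch below shows how the two sizes of a backup tree could be compared and how a transfer command with -H might be assembled. The function names, paths, and host are hypothetical, and this is not the command actually used here.

```python
import os


def backup_sizes(root):
    """Return (deduplicated_bytes, expanded_bytes) for the tree under root.

    deduplicated counts each inode once, as hard links do on disk;
    expanded counts every directory entry, as a copy without -H would.
    """
    seen = set()
    dedup = expanded = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            expanded += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                dedup += st.st_size
    return dedup, expanded


def remote_backup_cmd(src, remote):
    """Build an rsync-over-ssh argument list that preserves hard links."""
    return ["rsync", "-a", "--hard-links", "-e", "ssh", src, remote]
```

rsync can only preserve hard links between files that are part of the same transfer, so the whole backup tree (all daily and monthly copies) would need to go in one run rather than one directory at a time.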

I still have a couple of issues to deal with. The rsync command on the local PC runs under a user account. Because of that, it encounters permission problems for other users who do not make their ~/Documents directory world readable. That could be sidestepped by running the local rsync command as root, but that further complicates credentials on the remote host to allow the ssh connection. (In other words, I could not get that to work.) I suppose the easier way is to modify the perms on the ~/Documents directories. That seems like an acceptable solution on a home LAN used by husband and wife. The other issue is setting up notification of problems. I don't have sufficient discipline to regularly check the health of various systems. I could automate that, but I need to set up mail delivery to make it work. Years ago I set up sendmail and used mail spools to handle this sort of stuff, but I have left postfix unconfigured. I need to dig into email configuration and sort out how to integrate that into our system, where I normally receive my email via Gmail.
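
As a small aid for the permissions issue, the sketch below checks whether a directory's mode bits let other users list and enter it. For a directory that requires both the "other" read and execute bits. The function name and the path in the usage note are hypothetical, and the check looks only at mode bits, so it ignores ACLs and the permissions of parent directories.

```python
import os
import stat


def world_readable_dir(path):
    """True if 'other' users have both read and execute (search) permission."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return stat.S_ISDIR(mode) and (mode & needed) == needed
```

For example, world_readable_dir("/home/someuser/Documents") would return False for a directory with mode 700 and True after chmod 755. Files inside the directory need their own read bit as well.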
