This is one of a series of computer configuration stories, in reverse chronological order. Others are accessible from the home page for this activity.
Assemble a home server which is suitable (in component reliability, power consumption, and other factors) to run continuously.
Bought a refurbished Dell PowerEdge server and began setting it up with Ubuntu Server 16.04 and a 100GB partition on the supplied 1TB disk. Word to the wise: don’t start doing this unless plugged in to a hardwired Ethernet connection, so that the installer can configure the network automatically. (Interesting prospect: powerline networking to bring a differently-wired connection downstairs?) In the initial iteration, I left /etc/network/interfaces using dhcp, with a fixed local IP assigned by the router, rather than configuring the IP statically on the server, as the latter didn’t immediately seem able to reach the broader Internet. For reference, /etc/network/interfaces on the current server (go-ng) has been running with dhcp, as per the following stanzas. I recall that the WLAN clause had worked, but that I'd intentionally commented it out to deactivate it:
#The wired Ethernet port
iface eth0 inet dhcp
#The once-primary WLAN
#iface wlan0 inet dhcp
#wpa-ssid (my WLAN ssid elided)
#wpa-psk (my WLAN psk elided)
Needed to adjust the BIOS setting so that it won’t stop booting to show a warning when the side panel has been opened. Nice to see connector labels on the inside panel, and power and data cables for additional SATA drives. Somehow amusing but accurate to have a Dell system box labeled “OS: None”. I was impressed that a wattmeter showed only about 25 watts of power consumption, though that’ll probably rise when I add more disks.
Next steps: obtain and add more disks, figure out how to set up RAID, apply the Intel management security fix, maybe change the IP address setup.
Applied Intel’s nicely-provided Linux tools for the AMT security vulnerability, carried over on a USB stick; the discovery tool concluded that my system was vulnerable, but the unprovisioning tool then returned a desirable indication that “The system is not provisioned. No further mitigation is needed.” As I understand the vulnerability as described, this means that I could still be vulnerable to malicious provisioning by a local attacker, but (since AMT isn’t now provisioned) not to a network-based attack. It seemed like a good time to update the BIOS in any case, so that I’m not just mitigating. Once I did so, the discovery tool reported “System is not vulnerable”, which is nice to see.
I was pleased to see how it’s possible to apply Dell BIOS updates without running Windows: A Dell article discusses how to do just that! And, it “just worked” to update the installed 1.0.0 to the latest 1.0.6, with only a little startup fiddling to get the right boot menu to appear and offer the needed “BIOS Flash Update” option. They’ve clearly made noticeable effort to make at least this enterprise product Ubuntu-friendly!
Decided that I’d set the initial Ubuntu partition smaller than I might like. In the course of enlarging it with gparted, found myself deleting and recreating the swap partition. It ended up on the same /dev/sda5, but of course with a different UUID, which (over a few cycles) I updated in /etc/fstab (in a line subsequently commented out) and in /etc/crypttab. Lesson for future: cleaner to move than to recreate. At one point in the process, I got repeated boot errors that asked me to enter a password for cryptswap (?!), but this has apparently been a known bug and several fixes have been posted. It would be easy to find oneself terminally stuck when trying to debug a Linux distribution that has few users, or few users posting results, or when working without Internet access to such discussions!
Recalling lessons learned from years past, it’s really inconvenient to set up an Ubuntu server without access to a wired Ethernet cable, even if there’s a wireless network available. One recommended way to install wireless requires the wireless-tools package which, of course, can’t be obtained by a system that isn’t already networked! Among other complications, a fairly recent Ubuntu version changed the names of interfaces from predictable values like eth0 and wlan0 to values that are more descriptive of the interface device involved and, hence, less easily guessable without tools. Rather than figuring out how to get around this, I may just wait till I have the additional disks installed and bring it upstairs from the lair then. I don’t expect to run it in “production” with a wireless interface active anyway. Under Ubuntu with only the initial disk, it seems to consume about 17 watts at idle. Impressive.
Getting ready to RAID: I think I’ll set up a 2TB ext4 volume, RAID1 within partitions on each disk. Thinking to put it under /srv. May need to use parted rather than fdisk to create GPT-partitioned disks. I’m hesitant to bulk-encrypt the array, fearing complications or failures if I need to recover it. Instead, will expect selective encryption of some of the data objects stored in it.
On nth thought, I’m wondering whether to encrypt /home in some form. Could use ecryptfs at the file level as I’ve been doing, or try out encrypted RAID in another volume on the soon-to-arrive and soon-to-partition disks. Or, just mitigate by creating a Veracrypt file system on my Mac for particularly sensitive items? There’s something called duplicity, which creates encrypted incremental backups using the rsync algorithm. Even with what seems like lots of hardware space, I’d rather not fragment it based on a premature guess of future usage. 2-2.5TB seems like a good number for the initial RAID on /srv. May grow the root partition on the initial disk once again to about 500GB to allow its /home to comfortably stage rsync copies from non-Linux clients.
But, what is (or should be) the threat model here? Is having sensitive bits readable and spinning on a usually-on basis actually less of a concern than physical theft of the media in my living room? It’s unlikely that an Eastern European operative would try to hack me by stealing my disks, and I should be able to overwrite them before disposing of a machine.
Received a pair of 4TB disks, plugged them in and partitioned with gparted, creating a RAID1 array spanning about 2.3TB on each of them. Very smooth to start, with few surprises: a fine DigitalOcean tutorial basically tells the story. The initial array synchronization ran for about 4 hours. Impressively, the power consumption with 2 added disks and (clearly!) lots of I/O is still only about 30 watts.
Some confusion about a directive (in another writeup) to set the partition type for the partitions where the RAID is to reside to hex FD00, but this didn’t seem to make sense when I saw that the type value within these GPT-partitioned disks was in fact a UUID, and the tools I tried didn’t seem to offer the ability to specify such a change. Now that the array’s being created, fdisk -l shows the partitions’ types as “Linux filesystem”, so hopefully all is OK. I fat-fingered and overwrote /etc/mdadm/mdadm.conf with the ARRAY line rather than appending it, but it seems that ARRAY is the only content that’s actually required in order to get the array properly started at boot time! Maybe fix this by finding the default content for the remainder of the config file, but this isn’t a critical priority.
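The overwrite-versus-append slip is easy to guard against in a script. What follows is only a minimal sketch: the function name `append_array_line` is hypothetical, and the ARRAY line is assumed to have been captured already as a string (for instance from `mdadm --detail --scan`).

```python
from pathlib import Path


def append_array_line(conf_path: str, array_line: str) -> bool:
    """Append array_line to conf_path unless an identical line is already present.

    Returns True if the file was changed. Existing content is never truncated.
    """
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    wanted = array_line.strip()
    if wanted in (line.strip() for line in text.splitlines()):
        return False  # already recorded; nothing to do
    # Mode "a" appends; mode "w" would reproduce the overwrite mistake.
    with path.open("a") as conf:
        if text and not text.endswith("\n"):
            conf.write("\n")  # keep the new entry on its own line
        conf.write(wanted + "\n")
    return True
```

Calling it twice with the same line leaves a single copy in the file, so it is safe to rerun after a reboot or a rebuild.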
So, I have a RAID array. Next up, need to figure out where and how to place it in the directory tree – under /srv and/or elsewhere. I recall seeing something about being able to set up multiple mount points or equivalent referencing different subdirectories; that may be the right way to go. Symlinks or bind mounts are apparently available alternative approaches, as discussed on Stack Exchange. As one might expect, there are tradeoffs among them...
The BIOS defaulted to (Intel) RAID mode; now that I have a pair of RAID-suitable disks installed in the system, it offers the prospect of selecting such a RAID setup at boot time with a control key, but the consensus du net seems to be that it’s OK to ignore that and use Linux software RAID even with the BIOS-initialized mode set to support Intel RAID. I guess it just remains latent. I note that with the array now present, it seems that the disk I/O LED is continuously flickering; haven’t had the machine on long enough to see if this calms down over time.
Returned from travel. Decided to take a simple route and just mount the array on /srv, then put everything to be backed up into there. Brought it upstairs and plugged it into the network. Needed to install avahi-daemon and adopt/tweak the Samba configuration file in order to find the machine via “go3.local”. Need to add and format a partition in order to hold the new Time Machine repository, so back downstairs once again to use the screen with gparted. Did that, and set up netatalk to do TM backups onto that partition on one of the NAS disks. Didn’t take the next level plunge to put the TM backup onto another RAID1 array, or to alternate Time Machine backups to partitions on each of the two disks.
So, now I’ve basically recreated the configuration I had before. After running overnight, I see that the machine’s disk I/O light isn’t flickering frequently, and the load average is 0.00, so it’s apparently settled into a deeper idle. I’m still wondering how to handle the backups of my home directories, which currently encrypt (I think; sometimes with ecryptfs it’s not easy to tell...) onto the home directory on the boot disk. There’s plenty of space there, but it’s not RAID-ed. How should I approach the encryption vs. reliability tradeoff? There are probably plenty of choices, but there’s also plenty of time to think about them... A nightly root cron job to copy the contents of /home/.ecryptfs/jlinn onto the RAID seems like a good choice; I could then verify that the tools that are supposed to be able to unwrap such a folder can actually do so. For completeness, I should probably also set up a cron job to copy the remainder of the boot drive (less the elements that shouldn’t be backed up, or that are already copied elsewhere onto /srv).
Now have the encrypted jlinn home directory files rsync’d to the RAID array on a nightly basis. Should also set up an MTA to get reports on cron jobs and RAID status, not that I hope or expect either of them to be particularly interesting. If I were even more anal than I’m currently feeling, I could set up Time Machine to alternately back up to partitions on each of the two NAS drives...
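For the record, the nightly copy could be sketched along these lines. This is an illustration, not the script actually in use: the function names and the destination directory are hypothetical, and only the widely used rsync options `-a` and `--delete` are assumed.

```python
import subprocess
from typing import List


def build_rsync_command(source: str, dest: str, delete: bool = True) -> List[str]:
    """Return an rsync argument list that mirrors source into dest."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive; keeps permissions, times, links)
    if delete:
        cmd.append("--delete")  # drop files from dest that no longer exist in source
    # A trailing slash on the source copies its contents rather than the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def run_backup(source: str, dest: str) -> int:
    """Run the mirror and return rsync's exit status (0 means success)."""
    return subprocess.run(build_rsync_command(source, dest)).returncode
```

A root cron entry could then call something like `run_backup("/home/.ecryptfs/jlinn", "/srv/backup/ecryptfs-jlinn")`, where the destination is a made-up example. Cron mails whatever a job prints to the job's owner, so any rsync error text would end up in the nightly report once an MTA is in place.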
Set up postfix, at least in part – can reach it and send mail via telnet localhost 25, but not telnet go3.local 25. Will see if tonight’s CRON’d rsync generates receivable mail – PS: it did; I got a large email notification addressed to “root”! More broadly, should generalize and rethink how to handle network naming and addressing for a server on the local LAN.
A quick benchmark: sha1sum of a large (2.1 GB) file. Timed with the “--rfc-3339=ns” option to the “date” command, go3 takes about 3.8 seconds. This compares to about 6.1 seconds on the TNG desktop system, with both running Linux. md5sum (testing again, on 12 November) is somewhat faster than sha1sum, and with what appears to be a somewhat narrower proportional divergence between the platforms: 3.4 seconds on go3 vs. 4.1 seconds on TNG. Interesting, that; both systems have quad-core Intel processors. (Of course, also note that they’re running different Linux versions, which might contain different levels of optimization!) I think my 2012 MacBook Pro may have the fastest CPU in the house, though; it was able to MD5 a 15GB VM file in 42 seconds, while it took go3 78 seconds to crunch the same result.
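A comparable timing can be taken from within one process rather than by bracketing the command with “date”. The sketch below uses Python's standard hashlib; the function name `time_hash` is hypothetical, and its absolute numbers should not be expected to match sha1sum or md5sum, only to allow like-for-like comparison between machines.

```python
import hashlib
import time


def time_hash(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20):
    """Hash the file at path and return (hex_digest, elapsed_seconds)."""
    digest = hashlib.new(algorithm)
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    elapsed = time.perf_counter() - start
    return digest.hexdigest(), elapsed
```

Running it twice on the same file is worthwhile: the first pass may be dominated by disk reads, while a second pass served from the page cache is closer to a pure CPU measurement.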
A power failure overnight reminded me to install a new UPS ostensibly capable of communicating with the APC daemon and enabling orderly shutdown. I’m not immediately sure how well this is working – oddly enough, the “apctest” program core dumps, and the USB link to the UPS seems prone to indicating dropped communication. Changing the config file entry to “UPSTYPE usb” now seems to yield much more information from “apcaccess status” - will see if it seems more stable. This also lets “apctest” work, but it doesn’t seem that my UPS supports many of the fancy testing functions there. It appears that apcupsd auto-shutdown on powerfail Might Just Work, but I haven’t worked through the testing yet.
Always a learning experience, but it’s great that devices like these have Linux support! As I read apcaccess status, it shows a LOADPCT of 3% and TIMELEFT as 196 minutes, over 3 hours, so it may be at least some time before this combination of large battery and relatively low power consumption should actually run out and attempt to trigger a shutdown. Hm: given the indicated consumption, I’d actually expect more than 3 hours. Wonder what’s up?
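To keep an eye on those two fields over time, the status text can be parsed rather than read by eye. A minimal sketch, assuming the usual “KEY : value unit” layout; the sample text is illustrative (built from the values quoted above, not captured output), the unit wording may differ between apcupsd versions, and the function names are hypothetical.

```python
def parse_apcaccess(text: str) -> dict:
    """Turn 'KEY : value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value: str) -> float:
    """Return the numeric part of a value like '196.0 Minutes'."""
    return float(value.split()[0])


# Illustrative sample only, using the figures mentioned in the text.
SAMPLE = """\
LOADPCT  : 3.0 Percent
TIMELEFT : 196.0 Minutes
"""

status = parse_apcaccess(SAMPLE)
print(leading_number(status["LOADPCT"]), leading_number(status["TIMELEFT"]))  # 3.0 196.0
```

For live data, the same parser could be fed the standard output of `apcaccess status`, for example via `subprocess.run(["apcaccess", "status"], capture_output=True, text=True).stdout`.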
In other news, it seems that the Time Machine process from a MacBook Air running El Capitan isn’t happy after the server reboots until and unless the Mac is rebooted as well. Per the syslog, it’s trying and failing to do some sort of AFP continuation operation. Should check if it’s a Mac version thing, since I haven’t been noticing it on my own Mac. Incentive to leave the server always-on?
Some more experimentation, only partly fruitful. Enabled and then retracted the ability for (my username) to log in via ssh using a public key; it got in, but then required entering the password anyway in order to decrypt the encrypted home directory. And, it stopped rsync from working. Didn’t seem very pointful; undid it by renaming the authorized keys file in /home/.ssh from (my username) to (my username)-disbl.
Turned on the ufw firewall and found lots of log entries coming from multicast traffic, which seems also to be a prerequisite to keeping Time Machine happy. Followed a hint to add a couple of lines to /etc/ufw/before.rules, allowing such traffic. Also installed Apache and figured out how to point the DocumentRoot into the RAID array, now at /srv/www. Some remnants of the old site work, but the CGIs don’t as of yet.
Trying backup to a USB 3.0 external hard drive, clearly much faster than USB 2.0. Backing up the system’s contents (about 350GB) to an almost-clear NTFS-formatted external drive took about 121 minutes. Trying again with minimal changes to the content took just 4-6 minutes on a couple of tries, so rsync is clearly good at optimizing that case. Nonetheless, it hit a high enough load average (at least when decrypting the encrypted jlinn home directory) so that Time Machine backup wasn’t then able to succeed. Once I’ve finished the timings above, should try “nice”-ing the rsync commands in the backup shell script to see if that helps the situation. With “nice”, I actually got the faster of the two timings above, but this may be a function of what other load was or was not present.
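Two small pieces of the above can be written down concretely: the average rate implied by the first full copy, and the “nice” prefix for the backup command. This is a sketch only; the function names are hypothetical and the rsync paths are placeholders.

```python
from typing import List


def average_rate_mb_s(gigabytes: float, minutes: float) -> float:
    """Average transfer rate in decimal MB/s for a copy of the given size and duration."""
    return (gigabytes * 1000.0) / (minutes * 60.0)


def with_nice(cmd: List[str], adjustment: int = 10) -> List[str]:
    """Prefix cmd so that it runs at lower CPU priority via nice(1)."""
    return ["nice", "-n", str(adjustment)] + list(cmd)


# About 350GB in about 121 minutes, as measured above.
print(round(average_rate_mb_s(350, 121), 1))  # 48.2

# Placeholder paths; the resulting list is suitable for subprocess.run().
print(with_nice(["rsync", "-a", "/src/", "/dst/"]))
```

Note that nice only lowers CPU scheduling priority. If the contention with Time Machine is mainly for disk I/O rather than CPU, it may not make much difference, which would be consistent with the inconclusive timings.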