Home Servers, Tunneling, etc.
As a follow-up to my post earlier this week, I'll discuss some other interesting things about setting up a home server.
Unfortunately, the technology here is a little opaque, and I'm not aware of any good, newbie-friendly documentation on how to set up servers. Most of the writing on the subject doesn't start from first principles, and a lot of what you'll find is aimed at super knowledgeable people, such as IT systems administrators.
There's a lot of material scattered across Internet forums, Reddit, and various people's blogs. When I'm figuring this stuff out, I do a lot of Googling and reading Reddit and Stack Overflow threads.
I've thought about writing more about how the technology works and how to set up this type of server, but I haven't done that yet.
Setting up HTTPS has become a lot easier than it used to be. Caddy, which I use as the reverse proxy on my server, handles SSL almost entirely on its own. There's also a helper tool for NGINX that takes care of much of the reverse proxy and SSL setup.
Let's Encrypt, a free certificate authority, has basically eliminated the need to buy SSL certificates from commercial certificate authorities, and it's what the tools I mentioned above rely on.
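Because Let's Encrypt certificates are short-lived (historically 90 days), tools like Caddy renew them automatically. If you want to double-check that renewal is actually happening, a few lines of Python's standard library can report how long a server's certificate has left. This is a minimal sketch; the hostname in the usage comment is a placeholder for your own domain.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443) -> str:
    """Connect over TLS and return the server certificate's notAfter field."""
    context = ssl.create_default_context()  # verifies the certificate chain and hostname
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string like "Jan  5 09:34:43 2018 GMT" into days remaining."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


# Usage ("example.com" is a placeholder; substitute your own domain):
# print(days_until_expiry(fetch_not_after("example.com")))
```

If the number keeps dropping toward zero instead of jumping back up every couple of months, automatic renewal isn't working.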
The security situation is kind of a mixed bag. I've run into some tools with very insecure default configurations, but fortunately the security of the most common software has improved a lot compared to where it used to be. Most of the big tools you'll run into, like web servers, are pretty much secure by default: you would have to actively change the configuration in undesirable ways to make them insecure.
I think container tools like Docker also help a lot with security. Basically every application running on my server has its own Docker container, and the Caddy reverse proxy acts as the glue between these containers.
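Stripped down, that glue role is host-based routing: look at which hostname a request was addressed to and forward it to the matching container. Here's a minimal Python sketch of just the lookup step, not how Caddy is implemented; the hostnames, container names, and ports are hypothetical. In a Caddyfile, the equivalent is roughly one site block per hostname containing a reverse_proxy directive that points at the container.

```python
from typing import Dict, Optional

# Hypothetical routing table: public hostname -> upstream container.
# Assumes the containers share a user-defined Docker network, so their
# container names ("notes", "photos") resolve via Docker's built-in DNS.
ROUTES: Dict[str, str] = {
    "notes.example.com": "http://notes:8080",
    "photos.example.com": "http://photos:8000",
}


def pick_upstream(host_header: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the upstream URL for a request's Host header, or None if unknown."""
    # Host headers can carry a port ("notes.example.com:443") and vary in case.
    # IPv6 literals like "[::1]:8080" aren't handled; this sketch assumes names.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return routes.get(hostname)


print(pick_upstream("Notes.Example.com:443", ROUTES))  # http://notes:8080
print(pick_upstream("unknown.example.com", ROUTES))    # None
```

Returning None for unknown hostnames matters: a proxy that only forwards names it knows about won't accidentally expose a container nobody meant to publish.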
Docker is a way of packaging software together with the libraries and dependencies it needs, and it works in a somewhat VM-like way: by default, containers are well isolated from each other, although unlike VMs they share the host's kernel. This contains security issues. If one of the services on the server gets compromised, it's hard for the attacker to escalate privileges to the rest of the server, so you can often deal with the problem by nuking that one container and starting fresh.
Additionally, since many Docker images are packaged either by the software's own developers or by someone else upstream, it's usually easy to find an image that comes configured to be reasonably secure by default.
For backups, I use Duplicati, set to make daily backups of the server. It can back up to a portable hard drive or to another, off-site server. I haven't taken either of these purist paths; instead, I've taken the less self-hosted route of uploading my data to a cloud storage provider (in this case Wasabi).
Duplicati can encrypt backups before they're sent to the cloud storage provider, a friend's server, or whatever else you use for remote backups.
There are two main ways to connect the server to the outside world.
The traditional way, and the one I used, is to get a static IP address from your ISP. AT&T, my Internet provider, sells static IP addresses in /29 blocks, which contain six usable addresses; unfortunately, they won't sell you just one. I also still have one dynamic IP address from them.
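The "six usable addresses" figure falls out of subnet arithmetic: a /29 fixes the first 29 of 32 bits, leaving 3 bits, or 2^3 = 8 addresses, and the first (network) and last (broadcast) addresses can't be assigned to hosts. Python's standard ipaddress module can confirm this. The sketch uses 203.0.113.0/29, a range reserved for documentation, as a stand-in for a real ISP allocation.

```python
import ipaddress

# 203.0.113.0/29 is a documentation-only range (RFC 5737), standing in for a real block.
block = ipaddress.IPv4Network("203.0.113.0/29")

# hosts() skips the network address (.0) and the broadcast address (.7).
usable = list(block.hosts())

print(f"total addresses: {block.num_addresses}")  # 8
print(f"usable hosts:    {len(usable)}")          # 6
for addr in usable:
    print(addr)  # 203.0.113.1 through 203.0.113.6
```

With a block like this, the gateway takes one of the six and the remaining addresses are free for servers.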
My router/gateway/modem is assigned one of the static IP addresses, the home server gets another, and basically every other device on my network sits behind the dynamic IP address.
Even for standard dynamic home IP addresses, IP geolocation (at least what you can do with publicly available data sources) isn't very accurate. The main things it gets right with near-perfect accuracy are the country and the ISP.
For a standard dynamic IP address, it can usually make a good guess at which city the user is in, and a fairly inaccurate guess at the neighborhood. Short of obtaining the ISP's IP allocation logs or customer records, you'll never be able to map the address one-to-one to a physical location.
For static IP addresses, as far as I can tell, AT&T (and, I'd guess, most other major ISPs) allocates them from one big pool without much regard to geographic area. I haven't seen any IP geolocation service accurately guess anything beyond the country I'm in.
I don't consider the privacy implications of a static IP address to be that significant. Probably the main risk is that the address is publicly known, which leaves the network it's on exposed to denial-of-service attacks.
The barrier to entry for a fairly crippling denial-of-service attack on a small server or network is pretty low. Taking down a typical home server, even on a fiber connection, is well within reach of an unskilled script kiddie.
The more newfangled way to connect your server to the Internet is to use a tunneling service.
Ngrok and PageKite are two good examples of this type of service. Your server opens an outbound connection to the tunneling service, which then gives your traffic a public IP address (or a subdomain that you can point your own domain at with a CNAME record). Visitors' requests reach the tunneling service first and are relayed to your server over that connection.
All of these tunneling services hide the server's IP address from the open Internet. They also add a security benefit: there's an extra step between spinning a service up and exposing it to the public Internet, which makes it harder to accidentally expose a service that shouldn't be public.
The one I've experimented with most is Cloudflare Tunnel. Its biggest drawback is that it adds another ISP-like intermediary between your server and your users. That's a step back in terms of avoiding overdependence on centralized services, but since the data itself lives on a server you control, it's still an improvement over standard content silos and proprietary services. This is exactly why I didn't use this type of service in my original post.
Cloudflare goes further than many other tunneling services in how deeply it integrates with your site. It doesn't just route the data; it also terminates SSL using its own certificate and does a lot of filtering and analysis on the traffic. This provides useful security features, but it means Cloudflare has access to all the traffic going in and out of your server, including HTTPS traffic in decrypted form.
Cloudflare Tunnel is probably the option I'd recommend to people without in-depth technical knowledge.
Concerns about one company controlling an ever-larger share of the Internet aside, it's a very powerful service with a generous free tier. Beyond the core tunneling service, it handles a lot of other things.
It handles reverse proxying, i.e. it can do what I use the Caddy web server for: acting as the glue between the various services running on your server.
It can put private services that aren't meant to be accessible to the whole world behind an authentication portal, which effectively adds two-factor authentication to self-hosted web apps that don't support it natively. It can also wrap SSH, allowing external access through the browser with additional authentication in front.
It provides a lot of security features. Its DDoS protection basically eliminates the threat posed by script-kiddie-style attacks, and if the rest of your server is configured correctly, it can mitigate even very powerful denial-of-service attacks. It can rate-limit bots accessing your site, which reduces some security threats. And, particularly on the paid tiers, it provides a web application firewall that tries to block known exploits from being used against your site.
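Rate limiting is usually applied per client: each source address gets a budget of requests that refills over time, and requests beyond the budget are rejected. A token bucket is one common way to implement this. The sketch below is a generic illustration of the technique, not a description of how Cloudflare implements its rate limiting, and the rate and burst values are arbitrary.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a new client can burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client IP: 1 request/second sustained, bursts of up to 5 (arbitrary values).
buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str) -> bool:
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate=1.0, capacity=5)
    return buckets[client_ip].allow()
```

A bot hammering the site from one address gets five requests through and is then throttled to about one per second, while every other client has its own bucket and is unaffected.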
It also provides a content delivery network: Cloudflare's servers cache and serve frequently accessed static content instead of forwarding every request to your server. This basically eliminates the kind of scalability issues I spent most of my original post discussing.
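The scalability win comes from caching: once an edge server has a copy of a static file, repeat requests are answered from that copy until it expires. Here's a toy Python sketch of that idea, assuming a hypothetical fetch_from_origin callback and a fixed 60-second lifetime. Real CDNs, Cloudflare included, decide what to cache and for how long based on file types, HTTP headers like Cache-Control, and their own settings, none of which this models.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge cache: serves a stored copy of each path until it is `ttl` seconds old."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin  # hypothetical call to your home server
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}  # path -> (fetched_at, body)

    def get(self, path: str) -> bytes:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: the home server never sees this request
        body = self.fetch_from_origin(path)  # miss or expired: one request to the origin
        self._store[path] = (now, body)
        return body


# Usage: 1,000 requests for the same file inside the TTL reach the origin once.
origin_hits = []


def fake_origin(path: str) -> bytes:
    origin_hits.append(path)
    return b"body of " + path.encode()


cache = EdgeCache(fake_origin, ttl=60)
for _ in range(1000):
    cache.get("/style.css")
print(len(origin_hits))  # 1
```

A traffic spike to a popular page becomes a handful of origin requests per minute rather than one per visitor, which is why a modest home connection can survive it.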