Infrastructure Overview

(The previous infrastructure overview is here)

A. Domain name

  1. The social.coop domain is registered under the https://gandi.net account of Enric of FairCoop, who should be notified and sent payment each year before the registration expires
  2. It would be good to get the domain transferred, or at least administratively delegated, to an account under social.coop control
  3. DNS is managed via https://cloudflare.com, where we have the option of turning on DDoS protection and CDN/caching functionality if necessary

B. Infrastructure

As of 2022-12-18 we have two dedicated servers on Hetzner: Runko (95.216.13.24) and Rhizome (65.109.113.162).

Runko

Our primary server at the time of writing: it hosts PostgreSQL, Redis and our Mastodon instance.

  • Hostname: runko.social.coop
    • 32GB RAM (4x RAM 8192 MB DDR3)
    • i7-4770 CPU @ 3.40GHz
    • 2x 250 GB disks (SSD, in a striped setup -- not mirrored)

Disks are arranged as follows (lsblk output; run sudo pvs, sudo vgs, sudo lvs on the server for live information):

NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda             8:0    0 223.6G  0 disk
├─sda1          8:1    0     1G  0 part /boot
└─sda2          8:2    0 222.6G  0 part
  ├─vg0-root1 253:0    0    25G  0 lvm  /
  ├─vg0-root2 253:1    0    25G  0 lvm
  └─vg0-opt   253:2    0 396.1G  0 lvm  /opt
sdb             8:16   0 223.6G  0 disk
└─sdb1          8:17   0 223.6G  0 part
  └─vg0-opt   253:2    0 396.1G  0 lvm  /opt

The LVM logical volumes are laid out as follows:

  LV    VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  opt   vg0 -wi-ao---- 396.13g
  root1 vg0 -wi-ao----  25.00g
  root2 vg0 -wi-a-----  25.00g

opt is mounted at /opt.
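
As a rough cross-check of the layout above, the logical volume sizes can be summed and compared with the two partitions that back vg0 (222.6G + 223.6G). This is only an illustrative sketch: sum_sizes is a hypothetical helper name, and it assumes one numeric size per line on stdin.

```shell
# sum_sizes (hypothetical helper): add up one numeric size per line from stdin.
sum_sizes() {
  awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Values taken from the LSize column above (opt, root1, root2).
printf '396.13\n25.00\n25.00\n' | sum_sizes
```

This prints 446.13, which is close to 222.6 + 223.6 = 446.2, consistent with vg0 spanning both disks. On the server, something like sudo lvs --noheadings --nosuffix --units g -o lv_size vg0 | sum_sizes should give the live figure, assuming the volume group is still named vg0.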

Rhizome

Currently our secondary or "hot spare" server, but we intend to migrate services here as it's newer and faster.

  • Hostname: rhizome.social.coop
    • 64GB RAM (4x RAM 8192 MB DDR3)
    • AMD Ryzen 5 3600 CPU @ 3.60GHz.
    • 2x 476GiB disks (NVMe, RAID 1/mirrored).

Third party services

  • DigitalOcean Spaces (S3 compatible, for storing Mastodon media and backups; we also use their transient VMs for development / testing)
  • Email delivery by Mailgun (10,000 emails free every month)
  • Email relaying for our social.coop domain is handled by a Mailcow server with Web Architects (no significant storage available). This is needed for git.coop registration.
  • Datadog and Pingometer for monitoring
  • Cloudflare for DNS management.

C. Monitoring

D. Regular Updates

  1. Host systems (Ubuntu LTS package upgrades)
  2. Datadog agent https://hub.docker.com/r/datadog/agent/tags/
  3. Regular Mastodon upgrades

E. Security

  1. HTTPS / SSL certificates
  2. Backups
  3. Firewalls
  4. DDoS
    • See about enabling Cloudflare

F. Documentation & Communication

  1. Document any new infrastructure / software / service / config
  2. Keep all code in shared git version control (either in sauce or ansible)
  3. Keep configuration and private keys / passwords separate, and place all config files in shared git version control
  4. Proactively communicate with Tech WG about reasons, approach and outcome of every change / update, and then add to documentation
  5. Let fellow Ops Team members know before any prolonged unavailability (as much as possible)
  6. Communicate with Ops Team during any emergency, or before doing anything that affects live services
  7. Create/use individual accounts/passwords for each admin as much as possible
  8. Use pass for storing all shared secrets (like passwords)

G. Fix unexpected issues

YMMV

Handy commands, etc

systemd services

service                   purpose
social.coop-mastodon      a service to control the mastodon installation via docker-compose
social.coop-remove-media  runs the media cleanup command to remove remote media >7 days old via a .timer
certbot                   runs the renewals via .timer

logs

command                                    purpose
systemctl list-timers                      list timers
journalctl -f                              tail ALL system logs
docker-compose logs -f web                 view and tail web logs (when in /opt/social.coop/sauce/docker/)
docker-compose logs -f db                  view and tail db logs (the same pattern works for other services)
journalctl -f -u certbot                   see when certbot was run
journalctl -f -u social.coop-mastodon      see the output from the docker-compose commands run with systemctl, but not the docker container logs themselves
journalctl -f -u social.coop-remove-media  see what the remove media command is up to

docker-compose

All of these commands must be run on runko.social.coop in the /opt/social.coop/sauce/docker/ directory.

List all Docker containers: docker-compose ps

Stop a service, e.g. docker-compose stop redis

Start a service, e.g. docker-compose start redis

Redeploy (only changed things): docker-compose up -d (or systemctl reload social.coop-mastodon, which does the same thing)

If you want to run one-off commands with docker-compose run, make sure to use the --rm argument, or the stopped containers will hang around afterwards.

e.g. docker-compose run --rm web rails console (to get a rails console)
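
Since every command has to be run from the same directory, and one-off commands should always use --rm, a pair of small shell functions can make both harder to forget. This is a sketch only: dc and dc_run are hypothetical names, and COMPOSE_DIR / COMPOSE_BIN are assumed variables defaulting to the directory and binary mentioned above.

```shell
# Assumed defaults; both can be overridden from the environment.
COMPOSE_DIR="${COMPOSE_DIR:-/opt/social.coop/sauce/docker}"
COMPOSE_BIN="${COMPOSE_BIN:-docker-compose}"

# dc (hypothetical name): run docker-compose from the compose directory.
# The subshell keeps the caller's working directory unchanged.
dc() {
  ( cd "$COMPOSE_DIR" && "$COMPOSE_BIN" "$@" )
}

# dc_run (hypothetical name): run a one-off command in a container
# that is removed again when the command exits.
dc_run() {
  dc run --rm "$@"
}
```

With these loaded in a shell on runko, dc ps would list the containers and dc_run web rails console would open a rails console.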

Location of Postgres database files: /opt/social.coop/var/lib/postgresql/data/

Mastodon upgrades

Here are some example steps that you might take when upgrading Mastodon. Please note that the process may be different every time, and that issues may arise, so make sure to have a few hours ahead of you!

Depending on the significance of the upgrade, you may want to create a database backup. We don't currently have a ready-made script for this.
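
In the absence of a ready-made script, a backup could look roughly like the sketch below. It is untested against our setup and rests on assumptions: backup_db is a hypothetical name, the database user and database name are both assumed to be postgres (check the docker-compose.yml and env file), and pg_dump is assumed to be available inside the db container.

```shell
# Assumed defaults; override from the environment as needed.
COMPOSE_DIR="${COMPOSE_DIR:-/opt/social.coop/sauce/docker}"
COMPOSE_BIN="${COMPOSE_BIN:-docker-compose}"
DB_USER="${DB_USER:-postgres}"   # assumption, not confirmed
DB_NAME="${DB_NAME:-postgres}"   # assumption, not confirmed

# backup_db (hypothetical name): dump the database to the given file
# in pg_dump's custom format (-Fc).
backup_db() {
  out="${1:-}"
  [ -n "$out" ] || { echo "usage: backup_db <output-file>" >&2; return 1; }
  # -T disables TTY allocation so the binary dump reaches the file unmangled.
  ( cd "$COMPOSE_DIR" && "$COMPOSE_BIN" exec -T db pg_dump -U "$DB_USER" -Fc "$DB_NAME" ) > "$out"
}
```

A call such as backup_db /opt/some-dir/pre-upgrade.dump (the path is just an example) would write the dump there. A dump in this format is restored with pg_restore rather than psql. Check the free space on the target volume first.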

This is an in-theory guide to doing the upgrade, so please proceed carefully and understand what you are doing. If in doubt, ask someone to assist. If this guide is not up to date, please update it afterwards :)

  • Update the mastodon docker image versions in https://git.coop/social.coop/tech/sauce/blob/rebuild/docker/docker-compose.yml
  • git pull in /opt/social.coop/sauce/
  • reload the service with systemctl reload social.coop-mastodon (it will take a bit of time to download the new image)
  • if you need to run a database migration (check the release notes):
    • docker-compose run --rm web rails db:migrate
  • if you need to update the assets (it'll say to regenerate them in release notes, but we just copy them out as they are prebuilt in the docker image), something like:
    • mkdir /tmp/updated-assets && chown 991:991 /tmp/updated-assets
    • docker run --rm -v /tmp/updated-assets:/assets tootsuite/mastodon:v2.5.0 cp -r public/. /assets
    • then empty out /opt/social.coop/var/www/mastodon and fill it with the contents of /tmp/updated-assets, keeping ownership at 991:991. It is probably safer to keep the existing directory rather than mv a new one into place, as it's mounted by docker and replacing it might screw up the mount. rsync with --delete could be a good option. Put the commands back in here once you have worked out a good way!
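
One possible way to do the asset step, following the rsync --delete idea above, is sketched here. It has not been run against the live server, so treat it as a starting point: sync_assets and run are hypothetical helper names, the paths and the 991 uid are the ones mentioned in the steps, and DRY_RUN=1 only prints the commands instead of executing them.

```shell
# Assumed defaults, taken from the steps above; override from the environment.
ASSETS_DIR="${ASSETS_DIR:-/opt/social.coop/var/www/mastodon}"
STAGING_DIR="${STAGING_DIR:-/tmp/updated-assets}"
MASTODON_UID="${MASTODON_UID:-991}"

# run (hypothetical helper): print the command when DRY_RUN=1, else execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# sync_assets (hypothetical name): copy the prebuilt assets out of the given
# image tag, then mirror them into the live directory in place.
sync_assets() {
  version="${1:-}"   # image tag, e.g. v2.5.0
  [ -n "$version" ] || { echo "usage: sync_assets <image-tag>" >&2; return 1; }
  run mkdir -p "$STAGING_DIR" &&
  run chown "$MASTODON_UID:$MASTODON_UID" "$STAGING_DIR" &&
  run docker run --rm -v "$STAGING_DIR:/assets" "tootsuite/mastodon:$version" cp -r public/. /assets &&
  # rsync updates the existing directory rather than replacing it, so the
  # docker mount keeps pointing at the same directory; --delete removes
  # files that are no longer part of the new release.
  run rsync -a --delete "$STAGING_DIR/" "$ASSETS_DIR/" &&
  run chown -R "$MASTODON_UID:$MASTODON_UID" "$ASSETS_DIR"
}
```

Running DRY_RUN=1 sync_assets v2.5.0 first shows exactly what would be executed. The real run needs root (for chown) and an empty or fresh staging directory, otherwise leftovers from a previous upgrade would be copied across as well.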