Infrastructure Overview
(Originally authored by @mayel on this cryptpad and discussed on this loomio thread)
A. Domain name
- social.coop domain is registered under the https://gandi.net account of Enric of FairCoop, who should be notified and sent payment yearly before it expires
- It would be good to get the domain transferred or at least administratively delegated to an account under social.coop control
- DNS is managed via https://cloudflare.com, where we have the option of turning on DDoS protection and CDN/caching functionality if necessary
B. Infrastructure
- Back-end server (Dedicated with 8GB RAM, 4x 2.4GHz ARM cores) trunk.social.coop
- 2x 50GB SSD volumes
- Docker Swarm manager (main config: /root/stacks/social.coop/docker-cloud.yml)
- 5x Docker containers running Mastodon (config: /root/stacks/social.coop/.env.production)
- Front-end server (VPS with 2GB RAM, 2GHz ARM core) toot.social.coop
- 1x 50GB SSD volume
- Docker Swarm worker
- Docker container with Nginx proxy serving all web requests
- Docker container with Membership Application/Invite app (PHP, uses same Postgres database server as Mastodon on back-end, code: https://github.com/socialcooperative/dataverse)
- I've also been hosting free-of-charge on other servers:
- https://status.social.coop (powered by https://cachethq.io) This needs to be on different infrastructure so it remains accessible in case the main ones go down.
- https://wiki.social.coop (powered by MediaWiki): this was initially set up as an experiment and should now be migrated to social.coop's servers.
- 3rd party services
- https://cloudvault.me (Mayel) provides the servers
- https://cloud.docker.com for managing docker swarm and deployments
- Uploaded files from Mastodon (images, etc) are on https://www.dreamhost.com DreamObjects storage, and delivered by CDN https://www.fastly.com
- Email delivery by https://www.mailgun.com (10,000 emails free every month)
- https://www.datadoghq.com and http://pingometer.com for monitoring
- https://cloudflare.com for DNS
C. Monitoring
- Web services
- Monitor downtime alerts from services like http://pingometer.com
- https://status.social.coop currently needs to be updated manually but could be hooked up to a monitoring system
- Enough free space in volumes
- Performance / resource usage / container logs (Victor has set up Docker to feed into https://www.datadoghq.com)
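The free-space check listed above could be scripted roughly as follows. This is a minimal sketch, not an existing social.coop script: the function name over_threshold and the 90% limit in the usage line are assumptions for illustration. It reads `df -P` output on standard input and prints every mount point at or above the given usage percentage. Mount points containing spaces are not handled.

```shell
# Print "<mount point> <usage>%" for every filesystem at or above a limit.
# Expects POSIX `df -P` output on stdin: column 5 is capacity, column 6 the mount point.
# over_threshold is a hypothetical helper name, not part of any existing tooling.
over_threshold() {
    limit="$1"
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit) print $6 " " use "%"
        }'
}

# Usage (assumed threshold of 90%):
#   df -P | over_threshold 90
```

The output could be mailed from cron or fed into whichever monitoring service is in use; which of those fits best is left open here.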
D. Regular Updates
- Host systems (Ubuntu LTS package upgrades)
- Dockerfiles and containers
- Regular Mastodon upgrades
- Make sure to make backups first, then check for updates and setup instructions at https://github.com/tootsuite/mastodon/releases
- Occasional updates of membership app, Mediawiki, etc.
E. Security
- HTTPS / SSL certificates
- Using https://certbot.eff.org
- Currently certbot needs to be run manually for each domain (social.coop and members.social.coop) before the certificate expires (every 3 months)
- Need to set up a better Docker-compatible way to auto-renew certificates
- Backups
- Currently manual backups are done occasionally and stored offsite by admins
- Need to create backup & recovery processes
- Need to choose/setup backup solutions & storage location
- Firewalls
- Review rules
- Monitor logs
- DDoS
- See about enabling Cloudflare
F. Documentation & Communication
- Document any new infrastructure / software / service / config
- Keep all code in shared git version control
- Keep configuration and private keys / passwords separate, and place all config files in shared git version control
- Proactively communicate with Tech WG about reasons, approach and outcome of every change / update, and then add to documentation
- Let fellow Ops Team members know before any prolonged unavailability (as much as possible)
- Communicate with Ops Team during any emergency, or before doing anything that affects live services
- Create/use individual accounts/passwords for each admin as much as possible
- Use a secure solution for storing all shared secrets (like passwords)
G. Fix unexpected issues
YMMV
Some initial documentation
All of these commands must be run on the server that is the Docker swarm manager (trunk.social.coop):
To list all Docker swarm services: docker service ls
To stop a service: docker service scale [service name]=0
For example: docker service scale mastodon_dataverse=0
To start a service: docker service scale [service name]=1
For example: docker service scale mastodon_dataverse=1
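The stop and start commands above can be combined into a small restart helper. This is a sketch under assumptions: restart_service and the DOCKER override variable are hypothetical names introduced here, and the helper assumes the service normally runs a single replica, as in the examples above.

```shell
# Restart a swarm service by scaling it to 0 and then back to 1.
# DOCKER defaults to the real CLI; set DOCKER="echo docker" for a dry run
# that only prints the commands. Both names are illustrative assumptions.
restart_service() {
    service="$1"
    ${DOCKER:-docker} service scale "${service}=0" &&
        ${DOCKER:-docker} service scale "${service}=1"
}

# Usage, on the swarm manager:
#   restart_service mastodon_dataverse
```

The second command only runs if the first succeeds, so a failed scale-down does not get masked by the scale-up.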
To re-deploy the whole swarm: cd /root/mastodon/repo/ ; docker stack deploy -c docker-cloud.yml mastodon
This command seems to be quite clever in that it only touches services that have had changes done, either to configs or updated images.
Avoid running commands using docker-compose (it will start new instances). Instead, run commands against the existing containers (you can use tab to autocomplete the container name): docker exec -it [container name] [command]
For example: docker exec -it mastodon_db[TAB] bash
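Where tab completion is not available (for example in a script), the container name can be looked up instead. The following is a minimal sketch: container_id, exec_in and the DOCKER override are hypothetical names. It relies on `docker ps --filter name=...`, which matches containers whose name contains the given string, and it only sees containers running on the node where it is executed.

```shell
# Print the ID of the first running container whose name contains the given string.
# DOCKER can be overridden for testing; it defaults to the real CLI.
container_id() {
    ${DOCKER:-docker} ps --filter "name=$1" --format '{{.ID}}' | head -n 1
}

# Run a command inside that container.
exec_in() {
    prefix="$1"
    shift
    id=$(container_id "$prefix")
    if [ -z "$id" ]; then
        echo "no running container matches $prefix" >&2
        return 1
    fi
    ${DOCKER:-docker} exec -it "$id" "$@"
}

# Usage:
#   exec_in mastodon_db bash
```

If several containers match, this picks the first one listed, so a more specific name is safer for services with multiple replicas.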
Docker Cloud is set to auto-build a container image when new code is pushed to our GitHub repos (like the members app). This takes a while, but you can then upgrade the container with: docker service update --image socialcooperative/members-dataverse mastodon_dataverse
For special social.coop customisations, the docker config mounts /var/nfs/data/www/mastodon/ over its internal volume. The mounted directory contains files that override the stock ones provided by Mastodon (homepage, bylaws, custom signup form, logo, stylesheets, etc).
Location of Postgres database files: /var/vol2/postgres/mastodon
Mastodon upgrades
Here are some example steps that you might take when upgrading Mastodon. Please note that the process may be different every time, and that issues may arise, so make sure to have a few hours ahead of you!
Put social.coop in maintenance mode:
cd /var/nfs/data/www/mastodon/public/ ; mv maintenance_off.html maintenance.html
Make a backup of the database:
(Note: [TAB] means tab autocompletion)
docker exec -it mastodon_db[TAB] bash
df -h # check there is more than ~5GB of free space in /var/lib/postgresql/data/!
pg_dumpall -U postgres -c -v -f /var/lib/postgresql/data/db-backup-$(date +%F_%R).bak
exit
Bump up all the Mastodon version numbers in /root/mastodon/repo/docker-cloud.yml (for sidekiq, web & streaming services)
Re-deploy the whole stack (will only touch what changed):
cd /root/mastodon/repo/ ; docker stack deploy -c docker-cloud.yml mastodon
Check how things are doing:
docker stack ps mastodon
Enter the main mastodon app container:
docker exec -it mastodon_web[TAB] bash
- Run all appropriate rake tasks as instructed by the Mastodon release notes (check notes from all releases between current version and new version).
- We can also view all rake tasks available:
rake -A -T
Exit the mastodon app container:
exit
Make social.coop live and check if everything is working:
cd /var/nfs/data/www/mastodon/public/ ; mv maintenance.html maintenance_off.html
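The two maintenance-mode renames bracket the whole upgrade, so it can help to wrap them in named helpers that are harder to mistype. This is a sketch only: the function names and the PUBLIC_DIR variable are assumptions introduced here, and maintenance_status assumes (as the steps above imply) that the site is in maintenance mode exactly when maintenance.html exists.

```shell
# Directory holding the maintenance page (path taken from the steps above).
# PUBLIC_DIR is an illustrative variable; override it to try this elsewhere.
PUBLIC_DIR="${PUBLIC_DIR:-/var/nfs/data/www/mastodon/public}"

# Put the site into maintenance mode.
maintenance_on() {
    mv "$PUBLIC_DIR/maintenance_off.html" "$PUBLIC_DIR/maintenance.html"
}

# Make the site live again.
maintenance_off() {
    mv "$PUBLIC_DIR/maintenance.html" "$PUBLIC_DIR/maintenance_off.html"
}

# Print "on" if the maintenance page is active, otherwise "off".
maintenance_status() {
    if [ -e "$PUBLIC_DIR/maintenance.html" ]; then
        echo on
    else
        echo off
    fi
}
```

Checking maintenance_status before and after an upgrade gives a quick confirmation that the site was not left in maintenance mode by accident.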