Operations Overview
Runbooks
| Document | Description |
|---|---|
| Deployment Guide | How to deploy changes to infrastructure |
| Emergency Access | SSM/IAP backdoor procedures |
| OOB & Physical Access | Serial console and physical access for Owl |
| Troubleshooting | Common issues and solutions |
| Disaster Recovery | Owl gateway DR procedures and drill checklist |
Quick Reference
SSH Access
# Cloud instances
ssh joe@pluto # AWS production
ssh joe@dumbo # GCE general
ssh joe@bogart # GCE PowerDNS
ssh joe@mickey # AWS dev (ephemeral)
# Gateways (via ZeroTier)
ssh joe@192.168.194.10 # Owl
ssh joe@10.15.0.1 # Blue (from Blue site)
Emergency Access
# AWS (SSM)
aws ssm start-session --target i-05e7dd5e009d6d766 --region us-west-2
# GCE (IAP)
gcloud compute ssh dumbo --zone=us-central1-a --tunnel-through-iap
Ansible
# Full deployment
ansible-playbook -i inventory/production.yml playbooks/site.yml --limit HOST
# Specific role (via its own playbook)
ansible-playbook -i inventory/production.yml playbooks/base.yml --limit HOST
Terraform
# Plan changes (always use -target)
terraform plan -target=aws_instance.pluto
# Apply changes
terraform apply -target=aws_instance.pluto
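The "always use -target" rule could be enforced with a small shell wrapper. The sketch below is an assumption, not part of the existing tooling: `tf_targeted` and the `TERRAFORM_BIN` override are hypothetical names, and the function only checks that at least one `-target=` argument is present before handing off to `terraform`.

```shell
# Hypothetical wrapper: refuse to run terraform unless a -target=... is given.
# TERRAFORM_BIN is an assumed override, handy for dry runs (e.g. TERRAFORM_BIN=echo).
tf_targeted() {
  has_target=0
  for arg in "$@"; do
    case "$arg" in
      -target=*) has_target=1 ;;
    esac
  done
  if [ "$has_target" -eq 0 ]; then
    echo "refusing to run: no -target=... argument given" >&2
    return 2
  fi
  "${TERRAFORM_BIN:-terraform}" "$@"
}
```

With this loaded in the shell, `tf_targeted plan -target=aws_instance.pluto` behaves like the plain command, while `tf_targeted plan` exits with status 2 before Terraform is invoked.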
Daily Operations
Check Host Status
# ZeroTier connectivity
zerotier-cli listnetworks
zerotier-cli listpeers
# Service status (Linux)
sudo systemctl status zerotier-one
sudo systemctl status fail2ban
sudo systemctl status cloudflared
# Service status (OPNsense)
sudo configctl service status
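For the Linux hosts, the individual `systemctl status` calls can be folded into a single pass. This is a minimal sketch, assuming the unit names listed above; `check_services` is a hypothetical helper name. It relies on `systemctl is-active`, which prints the unit state and needs no root privileges.

```shell
# Hypothetical helper: print the state of each unit and
# return non-zero if any of them is not "active".
check_services() {
  failed=0
  for unit in "$@"; do
    # is-active exits non-zero for inactive units, so guard it with || true
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    [ -n "$state" ] || state="unknown"
    echo "$unit: $state"
    [ "$state" = "active" ] || failed=1
  done
  return "$failed"
}
```

A typical call would be `check_services zerotier-one fail2ban cloudflared`. The exit status makes it usable in scripts or cron jobs.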
View Logs
# fail2ban
sudo journalctl -u fail2ban -f
# ZeroTier
sudo journalctl -u zerotier-one -f
# SSH auth (the unit is named "ssh" rather than "sshd" on Debian/Ubuntu)
sudo journalctl -u sshd -f
DNS Operations
# Test internal DNS
dig @10.10.10.10 owl.scandora.net
# Test external DNS
dig @1.1.1.1 owl.scandora.net
# Update PowerDNS record
curl -X PATCH "http://10.10.10.10:8081/api/v1/servers/localhost/zones/scandora.net." \
  -H "X-API-Key: $KEY" \
  -d '{"rrsets":[...]}'
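As an illustration of what an `rrsets` body can look like, the sketch below builds a payload that replaces a single A record, following the RRSet object shape of the PowerDNS HTTP API. The helper name `pdns_a_record_payload`, the default TTL of 300, and the example address are assumptions; the actual records used on this network are not shown here.

```shell
# Hypothetical helper: emit a PowerDNS rrsets payload that replaces one A record.
# $1 = fully qualified record name (with trailing dot)
# $2 = IPv4 address
# $3 = TTL in seconds (assumed default: 300)
pdns_a_record_payload() {
  name="$1"
  ip="$2"
  ttl="${3:-300}"
  printf '{"rrsets":[{"name":"%s","type":"A","ttl":%s,"changetype":"REPLACE","records":[{"content":"%s","disabled":false}]}]}\n' \
    "$name" "$ttl" "$ip"
}
```

The output can be passed to the PATCH call above in place of the literal body, for example `-d "$(pdns_a_record_payload owl.scandora.net. 192.0.2.10)"`, where 192.0.2.10 is a documentation placeholder address.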
Change Management
Before Making Changes
- Read existing code - Understand what you're modifying
- Test in dev - Use mickey or OPNsense dev VM
- Backup - Snapshot AMI or export config.xml
- Document - Update relevant docs
After Making Changes
- Verify - Test the change works
- Commit - Git commit with descriptive message
- Push - Push to remote after milestones
- Reboot test - Where appropriate, reboot the host to confirm the change persists
Maintenance Windows
There are no formal maintenance windows; changes are made as needed, with appropriate testing.
Recommended Order
For multi-host changes, work from lowest to highest impact:
1. mickey (dev) - Test first
2. bogart (untrusted) - Low-risk
3. dumbo (GCE) - Secondary production
4. pluto (AWS) - Primary production
5. Gateways (last) - Most impact
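The ordering above can be sketched as a loop that applies one playbook per host and stops at the first failure, so a problem on an early host never reaches the gateways. `rollout`, `ROLLOUT_ORDER`, and the `ANSIBLE_PLAYBOOK` override are hypothetical names, and the gateway inventory names `owl` and `blue` are assumptions that should be checked against `inventory/production.yml`.

```shell
# Assumed rollout order: dev first, gateways last.
# The inventory names for the gateways (owl, blue) are assumptions.
ROLLOUT_ORDER="mickey bogart dumbo pluto owl blue"

# Hypothetical helper: run one playbook against each host in order,
# stopping as soon as a host fails.
# ANSIBLE_PLAYBOOK is an assumed override for dry runs (e.g. ANSIBLE_PLAYBOOK=echo).
rollout() {
  playbook="$1"
  for host in $ROLLOUT_ORDER; do
    echo "==> $host"
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" -i inventory/production.yml "$playbook" --limit "$host" || {
      echo "rollout stopped: $host failed" >&2
      return 1
    }
  done
}
```

For example, `rollout playbooks/site.yml` would deploy to mickey first and only continue to the next host after a successful run. Pausing between hosts to verify by hand is a reasonable extension.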