Rocky Deployment Runbook

Rocky is a Meanservers bare-metal VPS in the Chicago data center. It serves as the WireGuard VPN exit node for the Denver region and hosts standard monitoring and DNS infrastructure.

Validated: 2026-03-01 — Full factory-reset → fully-configured deployment confirmed.


Quick Reference

| Property | Value |
| --- | --- |
| Public IP | 193.8.172.100 |
| ZeroTier IP | 192.168.194.103 |
| SSH user | ansible (automation) / joe (human) |
| Roles | base, dotfiles, zerotier, internal-dns, wireguard, ddns, node-exporter, blackbox-exporter |
| Docker | No (lightweight server) |
| Cloud SQL | No |
| Monitoring stack | No (scrape target only; stack lives on dumbo) |
| WireGuard | Yes — exit node for 10.99.0.0/24 (Denver) |

Script: run-rocky.sh

The deployment entry point is:

cloud/ansible/scripts/run-rocky.sh [--password 'ROOT_PASSWORD'] [ansible-playbook-args]

State Machine

The script auto-detects the server's current state and drives it to a fully configured end state with no manual intervention. It probes connection paths in order:

| State | Condition | Action |
| --- | --- | --- |
| A — Subsequent run | ZeroTier reachable at 192.168.194.103 | Full deployment via ZeroTier |
| B — First full deploy | ansible SSH works on public IP; ZeroTier not yet installed | Full deployment via public IP |
| C — Fresh OS | No ansible SSH access anywhere | Bootstrap (create user + key), then full deploy |

State C requires --password 'ROOT_PASSWORD'. States A and B need no extra flags.

Detecting State C takes roughly 60 seconds: the SSH probes must first time out against the ZeroTier IP, then against the public IP, before the script gives up and falls back to bootstrap. This delay is expected behavior, not a hang.
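The failover order can be sketched as a small shell function. This is an illustrative approximation only, not the actual logic in run-rocky.sh; the function name and SSH options are assumptions, while the IPs and user come from the Quick Reference table.

```shell
# Hypothetical sketch of the state probe (not the real run-rocky.sh code).
# Tries ZeroTier first, then the public IP, then concludes bootstrap is needed.
probe_state() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 ansible@192.168.194.103 true 2>/dev/null; then
    echo "A"   # ZeroTier reachable — subsequent run
  elif ssh -o BatchMode=yes -o ConnectTimeout=5 ansible@193.8.172.100 true 2>/dev/null; then
    echo "B"   # public IP reachable as ansible — first full deploy
  else
    echo "C"   # no ansible SSH anywhere — bootstrap needed
  fi
}
```

Each failed hop can take up to its connect timeout, which is why a fresh OS (State C) sits silent for up to a minute before bootstrap starts.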

Example Commands

# Fresh OS (State C) — factory reset or new VPS
./cloud/ansible/scripts/run-rocky.sh --prod --password 'VPS_ROOT_PASSWORD'

# Subsequent run (State A/B) — idempotent config refresh
./cloud/ansible/scripts/run-rocky.sh --prod

# Single role re-run
./cloud/ansible/scripts/run-rocky.sh --prod --tags blackbox-exporter

# Dry run
./cloud/ansible/scripts/run-rocky.sh --check

Full Factory-Reset Procedure

This is the validated end-to-end sequence for deploying rocky from a freshly reinstalled OS.

Prerequisites

  • op CLI authenticated (1Password desktop app running, Touch ID available)
  • SSH known_hosts free of stale entries for rocky's IPs (the script auto-cleans them on State C detection)
  • Root password for the fresh OS install

Step 1: Verify Connectivity

# Check public IP is reachable
ping -c 3 193.8.172.100

# Check SSH port is open
nc -zv 193.8.172.100 22

Step 2: Run the Deployment

cd /path/to/scandora.net
./cloud/ansible/scripts/run-rocky.sh --prod --password 'ROOT_PASSWORD'

The script will:

  1. Inject secrets from 1Password (.vars.rocky.yml) — Touch ID may fire here
  2. Detect State C (no ansible SSH) — auto-clean known_hosts for 193.8.172.100 and 192.168.194.103
  3. Bootstrap: run migrate-to-ansible-user.yml as root with password auth
  4. Full deployment: run site.yml as ansible user

Expected duration: ~5–10 minutes for a fresh deploy.

Step 3: Handle Transient Failures

If a role fails due to a transient network issue (e.g., GitHub download timeout), re-run just that tag:

./cloud/ansible/scripts/run-rocky.sh --prod --tags blackbox-exporter

The blackbox-exporter role downloads from GitHub releases. Transient connection resets from GitHub are common — simply re-run.

Step 4: Verify Deployment

After a successful run, the script prints verification commands. Run them:

# 1. Node Exporter metrics (use ZeroTier IP after State A)
curl -s http://192.168.194.103:9100/metrics | head

# 2. WireGuard VPN
ssh rocky 'sudo wg show'

# 3. Internal DNS
ssh rocky 'dig bogart.scandora.net'

# 4. ZeroTier network membership
ssh rocky 'sudo zerotier-cli listnetworks'

Expected ZeroTier output: OK PRIVATE with IP 192.168.194.103.


Secrets

All credentials are injected at deploy time via op inject from:

scripts/env-files/.vars.rocky.yml
| Variable | 1Password Item | Vault |
| --- | --- | --- |
| zerotier_api_token | zerotier_api_token_network_management | scandora-automation |
| cf_api_key | cloudflare_api_token_dns_automation | scandora-automation |
| pdns_api_key | powerdns_api_key_bogart_production | scandora-prd-automation |
| ansible_authorized_key | ssh_key_ansible (public_key field) | scandora-automation |

The pdns_api_key is in scandora-prd-automation — Touch ID required (production vault).
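As a sketch, the file has roughly this shape. The exact field paths inside each 1Password item are assumptions here; consult the real file for the authoritative references. Note that per the gotcha below, comments in the real file must not contain the secret-reference URI scheme.

```yaml
# Illustrative shape only — see scripts/env-files/.vars.rocky.yml for the real file.
# Field paths inside each item are assumed, not confirmed.
zerotier_api_token: "op://scandora-automation/zerotier_api_token_network_management/credential"
cf_api_key: "op://scandora-automation/cloudflare_api_token_dns_automation/credential"
pdns_api_key: "op://scandora-prd-automation/powerdns_api_key_bogart_production/credential"
ansible_authorized_key: "op://scandora-automation/ssh_key_ansible/public_key"
```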


Known Issues and Gotchas

op:// in YAML Comments Breaks op inject

Bug discovered: 2026-03-01 during rocky production deployment.

op inject scans the entire file for op:// patterns — including YAML comments. Any comment containing op:// will be parsed as a secret reference and fail.

# BROKEN — op inject tries to parse this comment:
# DO NOT put actual secret values here — op:// references only.

# CORRECT:
# DO NOT put actual secret values here — secret references only.

All .vars.*.yml files were updated to remove op:// from their comments (commit e177e80).

op inject --force Required When Output File Pre-Exists

Bug discovered: 2026-03-01 during rocky production deployment.

mktemp creates an empty file. When op inject then tries to write to it, it asks interactively: "Overwrite existing file?" There is no TTY in a background subprocess, so the prompt hangs or fails with:

cannot prompt for confirmation. Use the '-f' or '--force' flag to skip confirmation.

Fix: Always use op inject --force when writing to a pre-created temp file. All six run-*.sh scripts now use --force (commit e177e80).
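The root cause is easy to reproduce locally: mktemp creates the output file at call time, so by the time op inject runs, its destination already exists (empty) and the tool asks before clobbering it. A minimal demonstration of the pre-existing empty file:

```shell
# mktemp creates the file immediately — it exists and is zero bytes,
# which is exactly what makes op inject prompt for overwrite confirmation.
tmpfile=$(mktemp)
[ -f "$tmpfile" ] && [ ! -s "$tmpfile" ] && echo "exists and empty"
rm -f "$tmpfile"
```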

Stale known_hosts After Server Reset

When rocky is reinstalled, its SSH host key changes. The client refuses to connect with "REMOTE HOST IDENTIFICATION HAS CHANGED". The script auto-clears known_hosts for both public IP and ZeroTier IP on State C detection:

ssh-keygen -R 193.8.172.100
ssh-keygen -R 192.168.194.103
ssh-keygen -R rocky

This is handled automatically — no manual intervention needed.

ZeroTier Static IP Assignment

Rocky's ZeroTier node ID changes after a factory reset. ZeroTier Central assigns managed IPs per member (keyed by node ID), so the reset machine shows up as a brand-new member and receives a new IP unless a static ipAssignments entry is configured for that member in ZeroTier Central.

The zerotier Ansible role auto-authorizes the node using the API token. After authorization, the IP assignment 192.168.194.103 is enforced via ZeroTier's static IP assignments in the network config (configured in ZeroTier Central, not in this IaC repo).

If rocky does not come up with 192.168.194.103, check ZeroTier Central → Members → rocky and verify the managed IP assignment is set to 192.168.194.103.


Roles Deployed to Rocky

From cloud/ansible/playbooks/site.yml (with rocky's inventory vars):

| Role | Tag | Purpose | Condition |
| --- | --- | --- | --- |
| base | base | Packages, users, SSH hardening, fail2ban | Always |
| dotfiles | dotfiles | Shell config, aliases | Always |
| docker | docker | Docker CE | Skipped (docker_enabled: false) |
| zerotier | zerotier | Overlay network, auto-authorize | Always |
| internal-dns | internal-dns | Routes *.scandora.net to bogart | Always |
| wireguard | wireguard | VPN server — Denver exit node, 10.99.0.0/24 | wireguard_enabled: true |
| ddns | ddns | rocky.scandora.net → public IP via Cloudflare | cf_api_key is defined |
| node-exporter | node-exporter | Prometheus metrics, binds to ZeroTier IP | zerotier_ip is defined |
| blackbox-exporter | blackbox-exporter | ICMP/TCP probing mesh | zerotier_ip is defined |

Skipped roles (not applicable to rocky): home-disk, docker, cloudsql-client, powerdns, monitoring-stack, iac-tools, github-runner.


WireGuard Configuration

Rocky runs the WireGuard server for the Denver exit node:

| Property | Value |
| --- | --- |
| Server address | 10.99.0.1/24 |
| Listen port | 51820 |
| NAT interface | eth0 |
| Luna peer | 10.99.0.2/32 — key z2IPPElnNhvnBXH0GzYZrtPAUMWv+78xVtidfxYIEXs= |
| Rocky WireGuard public key | Y2G2hNJNj0XckPdVuxDMRYoEZpV97wAMcnGRAPE710w= |

The server's private key is generated by the wireguard Ansible role and stored in /etc/wireguard/wg0.conf on rocky (not in this repo — regenerated on fresh deploy).

After a factory reset, the server's key pair is regenerated. Follow the Post-Reset: 1Password Key Sync procedure below to update 1Password and luna's client config before testing the tunnel.
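For orientation, the generated config has roughly this shape. This is an illustrative sketch assembled from the table above, not a copy of the role's template; in particular, the PostUp/PostDown NAT rules are assumptions based on the eth0 NAT interface, and the private key is a placeholder.

```ini
# Illustrative wg0.conf sketch — the real file is generated by the wireguard role.
[Interface]
Address = 10.99.0.1/24
ListenPort = 51820
PrivateKey = <generated-on-deploy>
# NAT rules assumed, based on the eth0 NAT interface noted above:
PostUp = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

[Peer]
# luna (Denver client)
PublicKey = z2IPPElnNhvnBXH0GzYZrtPAUMWv+78xVtidfxYIEXs=
AllowedIPs = 10.99.0.2/32
```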


Prometheus Monitoring

Rocky is a scrape target on dumbo's Prometheus stack:

| Target | URL |
| --- | --- |
| Node Exporter | http://192.168.194.103:9100/metrics |
| Blackbox Exporter | http://192.168.194.103:9115/metrics |

After a factory reset, re-run the monitoring stack deployment to restore scraping:

./cloud/ansible/scripts/run-monitoring.sh --prod dumbo deploy

Post-Reset Checklist

After a full factory reset deployment, verify:

  • ZeroTier: ssh rocky 'sudo zerotier-cli listnetworks' → OK PRIVATE 192.168.194.103
  • SSH via ZeroTier: ssh ansible@192.168.194.103 echo ok
  • Node Exporter: curl -s http://192.168.194.103:9100/metrics | grep node_uname_info
  • Blackbox Exporter: curl -s http://192.168.194.103:9115/metrics | grep blackbox_exporter_build_info
  • WireGuard: ssh rocky 'sudo wg show' → shows peers and latest handshake
  • Internal DNS: ssh rocky 'dig bogart.scandora.net +short' → 192.168.194.133
  • DDNS: dig rocky.scandora.net +short → 193.8.172.100
  • Prometheus scraping: Check dumbo Grafana targets page for rocky
  • WireGuard keys synced to 1Password (see Post-Reset: 1Password Key Sync below)
  • Luna tunnel: wg-quick down rocky-denver && wg-quick up rocky-denver → no errors
  • Tunnel handshake: sudo wg show on luna shows a recent handshake with rocky peer

Post-Reset: 1Password Key Sync

After every factory reset, the wireguard Ansible role generates a fresh keypair. Run these commands to sync the new keys into 1Password and update luna's client config.

Step 1: Retrieve the new server keypair

# Get the new private key
NEW_PRIVKEY=$(ssh rocky 'sudo cat /etc/wireguard/wg0.conf' | grep PrivateKey | awk '{print $3}')

# Get the new public key (derived from the private key by WireGuard)
NEW_PUBKEY=$(ssh rocky 'sudo wg show wg0 public-key')

echo "New public key: $NEW_PUBKEY"
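The PrivateKey pipeline above assumes the standard "Key = value" layout of wg0.conf (so awk's third field is the value). It can be sanity-checked locally against a sample config without touching rocky:

```shell
# Sanity-check the grep|awk extraction against a sample wg0.conf layout.
# The key below is a made-up placeholder, not a real WireGuard key.
sample='[Interface]
Address = 10.99.0.1/24
PrivateKey = AAAAsampleprivatekeyAAAA=
ListenPort = 51820'

key=$(printf '%s\n' "$sample" | grep PrivateKey | awk '{print $3}')
echo "$key"   # → AAAAsampleprivatekeyAAAA=
```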

Step 2: Update 1Password

# Update the server item (Touch ID not required — scandora-automation vault)
op item edit wireguard_rocky_denver_server \
  --vault scandora-automation \
  "private_key=$NEW_PRIVKEY" \
  "public_key=$NEW_PUBKEY"

# Update the client item's cross-reference
op item edit wireguard_luna_denver_client \
  --vault scandora-automation \
  "server_public_key=$NEW_PUBKEY"

Step 3: Update luna's client config

# Edit the WireGuard client config on luna
# Replace the [Peer] PublicKey line with the new value
sed -i '' "s|^PublicKey = .*|PublicKey = $NEW_PUBKEY|" ~/.wireguard/rocky-denver.conf

# Verify
grep PublicKey ~/.wireguard/rocky-denver.conf

Step 4: Restart the tunnel and verify

wg-quick down rocky-denver
wg-quick up rocky-denver
sudo wg show

Expected: a handshake with the rocky peer within ~30 seconds.


| File | Purpose |
| --- | --- |
| cloud/ansible/scripts/run-rocky.sh | Deployment entry point |
| cloud/ansible/inventory/rocky.yml | Host variables |
| cloud/ansible/playbooks/site.yml | Site playbook (shared, all hosts) |
| cloud/ansible/playbooks/migrate-to-ansible-user.yml | Bootstrap playbook |
| scripts/env-files/.vars.rocky.yml | Secret references (1Password) |
| docs/operations/REBOOT-CHECKLIST.md | Post-reboot validation steps |