Slack Notifications Integration

Complete guide to Slack notifications for scandora.net infrastructure.

Overview

The #scandora-notifications Slack channel receives automated notifications from:

  1. Prometheus AlertManager - Infrastructure monitoring alerts
  2. GitHub - Repository events (commits, PRs, releases, issues)

Architecture

┌─────────────────────┐
│  Prometheus/Alert   │
│     Manager         │
│  (192.168.194.131)  │
└──────────┬──────────┘
           │ HTTPS POST
           ▼
┌─────────────────────┐      ┌─────────────────────┐
│  Slack Incoming     │      │    GitHub           │
│     Webhook         │◀─────│   Repository        │
│  (hooks.slack.com)  │      │ (scandora/...net)   │
└──────────┬──────────┘      └─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  #scandora-         │
│   notifications     │
│    (Slack)          │
└─────────────────────┘

1. Prometheus/AlertManager → Slack

Configuration Status

Deployed and Active (as of 2026-02-13)

  • AlertManager: Running on dumbo (192.168.194.131:9093)
  • Webhook URL: Stored in 1Password (slack_webhook_scandora_notifications)
  • Channel: #scandora-notifications
  • Ansible Role: monitoring-stack

Alert Types

Warning Alerts (4-hour repeat):

  • Host down (InstanceDown)
  • High CPU/memory/disk usage
  • Network issues
  • ZeroTier connectivity problems
  • Dev VM running too long (1 hour)

Critical Alerts (1-hour repeat):

  • Multiple hosts down
  • Critical resource exhaustion
  • Gateway failures
  • Dev VM running critically long (4 hours)
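
The two repeat intervals above are enforced in AlertManager's routing tree. A minimal sketch of what the relevant section of the deployed alertmanager.yml might look like (the receiver names and grouping labels are assumptions, not the actual deployed config):

```yaml
route:
  receiver: slack-warnings          # default receiver (assumed name)
  group_by: [alertname, instance]
  routes:
    - match:
        severity: warning
      receiver: slack-warnings
      repeat_interval: 4h           # warnings re-notify every 4 hours
    - match:
        severity: critical
      receiver: slack-critical
      repeat_interval: 1h           # criticals re-notify every hour
```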

Message Format

Warning Example:

FIRING 🔥 Host Down Alert
Instance dumbo:9100 is unreachable (node_exporter down)

Critical Example:

🚨 CRITICAL: Gateway Failure
Owl gateway (192.168.194.10) is unreachable - network connectivity lost

Resolved Example:

RESOLVED ✅ Host Down Alert
Instance dumbo:9100 is now reachable
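
The FIRING/RESOLVED prefixes in the examples above are typically produced by the Slack receiver's title template. A sketch of the receiver shape (the exact template text lives in alertmanager.yml.j2 and may differ):

```yaml
receivers:
  - name: slack-warnings
    slack_configs:
      - channel: "#scandora-notifications"
        send_resolved: true   # produces the RESOLVED ✅ messages
        title: '{{ if eq .Status "firing" }}FIRING 🔥{{ else }}RESOLVED ✅{{ end }} {{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```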

Testing Alerts

# Stop node_exporter on a host to trigger alert
ssh dumbo "sudo systemctl stop node-exporter"

# Check #scandora-notifications for alert (within 30s)

# Restore service
ssh dumbo "sudo systemctl start node-exporter"

# Check for resolved message (within 5m)

Configuration Files

  • Ansible defaults: cloud/ansible/roles/monitoring-stack/defaults/main.yml
  • AlertManager template: cloud/ansible/roles/monitoring-stack/templates/alertmanager.yml.j2
  • Deployment script: cloud/ansible/scripts/run-monitoring.sh
  • 1Password item: slack_webhook_scandora_notifications (scandora-automation vault)

Redeployment

cd cloud/ansible
./scripts/run-monitoring.sh dumbo deploy

2. GitHub → Slack

Configuration Options

Three options are documented below; only Options A and C work without extra infrastructure (Option B requires custom middleware):

Option A: Native Slack GitHub App (Easiest)

Setup:

# In Slack
/github subscribe scandora/scandora.net
/github subscribe scandora/scandora.net commits:all pulls issues

Pros:

  • Quick setup (one command)
  • Rich formatting with previews
  • Built-in features (unfurl, threading)

Cons:

  • Not managed via IaC
  • Requires Slack app permission
  • Less control over formatting

Option B: Manual GitHub Webhook

⚠️ Does not work with Slack incoming webhooks - requires transformation middleware

GitHub's webhook payload format is incompatible with Slack's incoming webhook format. This option only works if you have a custom endpoint (Lambda, Cloud Function) that transforms the payload.
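
As a rough illustration of what such middleware must do, the pipeline below reshapes a (heavily truncated) GitHub push payload into Slack's `{"text": ...}` format. The payload fields are real GitHub webhook fields, but the pipeline itself is a sketch, not deployed code; it assumes jq is available:

```shell
# Sketch: flatten a GitHub push payload into a one-line Slack message.
PAYLOAD='{"pusher":{"name":"alice"},"repository":{"full_name":"scandora/scandora.net"},"head_commit":{"message":"fix: typo"}}'
TEXT=$(echo "$PAYLOAD" | jq -r '"\(.pusher.name) pushed to \(.repository.full_name): \(.head_commit.message)"')
echo "$TEXT"
# Real middleware would then forward it, e.g.:
# curl -X POST -H 'Content-type: application/json' \
#   --data "$(jq -n --arg t "$TEXT" '{text: $t}')" "$WEBHOOK_URL"
```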

Option C: Terraform-Managed Webhook (Requires Middleware)

Setup:

# 1. Create GitHub PAT (admin:repo_hook scope)
#    https://github.com/settings/tokens

# 2. Store in 1Password
op item create \
  --category="API Credential" \
  --title="GitHub Personal Access Token - Terraform" \
  --vault="scandora.net" \
  credential[password]="<token>"

# 3. Load credentials
source scripts/terraform/tf-github-slack-env.sh

# 4. Deploy
cd cloud/terraform/environments/production/network/github-slack
terraform init
terraform plan
terraform apply

Pros:

  • ✅ Infrastructure as Code
  • ✅ Version controlled
  • ✅ Auditable changes
  • ✅ Reproducible

Cons:

  • More initial setup
  • Requires GitHub PAT management

Events Sent to Slack

When using webhooks (Options B or C):

  • push - Commits to any branch
  • pull_request - PRs created, merged, closed
  • release - New releases published
  • issues - Issues opened, closed
  • issue_comment - PR/issue comments
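
Under Option C, these events correspond to the events argument of the GitHub provider's github_repository_webhook resource. A sketch of what the module might contain (the variable name is an assumption):

```hcl
resource "github_repository_webhook" "slack" {
  repository = "scandora.net"

  configuration {
    url          = var.slack_webhook_url   # loaded via tf-github-slack-env.sh
    content_type = "json"
    insecure_ssl = false
  }

  active = true
  events = ["push", "pull_request", "release", "issues", "issue_comment"]
}
```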

Testing GitHub Integration

# After setup, test with empty commit
git commit --allow-empty -m "test: verify Slack notifications"
git push origin main

# Check #scandora-notifications within seconds

Webhook Management

Credentials Storage

All webhook credentials stored in 1Password:

  Item                                       Vault                 Field         Purpose
  slack_webhook_scandora_notifications       scandora-automation   webhook_url   AlertManager + GitHub
  GitHub Personal Access Token - Terraform   scandora.net          credential    Terraform GitHub provider

Rotating Webhook URL

If webhook URL needs to change (compromised, new app, etc.):

# 1. Create new webhook in Slack (see Setup section)

# 2. Update 1Password
op item edit "slack_webhook_scandora_notifications" \
  --vault scandora-automation \
  webhook_url[password]="<new-webhook-url>"

# 3. Redeploy AlertManager
cd cloud/ansible
./scripts/run-monitoring.sh dumbo deploy

# 4. Update GitHub webhook (if using Terraform)
source scripts/terraform/tf-github-slack-env.sh
cd cloud/terraform/environments/production/network/github-slack
terraform apply

# 5. Test both integrations

Webhook Security

Best Practices:

  • ✅ Never commit webhook URLs to git
  • ✅ Store in 1Password with limited access
  • ✅ Use separate webhook per environment (dev/prod) if needed
  • ✅ Rotate periodically (annually recommended)
  • ✅ Monitor Slack audit logs for unusual activity

Access Control:

  • Webhook URL allows anyone to post to channel
  • Treat as sensitive credential
  • If exposed, rotate immediately

Troubleshooting

AlertManager Not Sending Notifications

Check AlertManager status:

curl http://192.168.194.131:9093/api/v1/status

Check configuration:

ssh dumbo "grep -A 5 slack /opt/monitoring/alertmanager/alertmanager.yml"

Check logs:

ssh dumbo "docker logs monitoring-alertmanager 2>&1 | tail -50"

Test webhook manually:

WEBHOOK_URL=$(op item get "slack_webhook_scandora_notifications" \
  --vault scandora-automation --fields webhook_url --reveal)

curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test from curl"}' \
  "$WEBHOOK_URL"

GitHub Webhook Not Firing

Check webhook deliveries:

  1. Go to: https://github.com/scandora/scandora.net/settings/hooks
  2. Click on webhook
  3. View "Recent Deliveries"
  4. Check for 200 OK responses

Common issues:

  • Webhook URL incorrect (verify in 1Password)
  • Events not selected (push, pulls, etc.)
  • Webhook disabled/inactive
  • SSL verification misconfigured (leave it enabled; hooks.slack.com serves a valid certificate)

Re-trigger webhook:

  1. Find failed delivery in Recent Deliveries
  2. Click "Redeliver"
  3. Check Slack for message

Monitoring and Alerts

Notification Volume

Expected notification frequency:

  • GitHub: 5-20/day (depending on development activity)
  • Prometheus: 0-5/day normally (more during incidents)

Alert Fatigue Prevention

  • Alerts grouped by severity (warning vs critical)
  • Repeat intervals: 4hr warning, 1hr critical
  • Inhibition rules (host down suppresses other alerts)
  • Dev VM alerts separate from production
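
The "host down suppresses other alerts" behaviour above is expressed as an AlertManager inhibit_rules block. A sketch of the shape such a rule takes (the target alert names are assumptions about the deployed rules):

```yaml
inhibit_rules:
  - source_match:
      alertname: InstanceDown
    target_match_re:
      alertname: (HighCPU|HighMemory|HighDisk).*
    equal: [instance]   # only suppress alerts for the same host
```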

Notification Review

Periodically review:

  • Are alerts actionable?
  • Too many false positives?
  • Missing important events?
  • Need different channels per severity?

Future Enhancements

Potential improvements:

  1. Separate channels by severity
     • scandora-critical (critical only)
     • scandora-info (GitHub, warnings)
  2. Enhanced formatting
     • Custom Slack blocks for richer display
     • Thread replies for alert updates
     • Buttons for common actions (acknowledge, silence)
  3. Integration with other services
     • PagerDuty for on-call rotation
     • Opsgenie for escalation
     • Slack slash commands for querying status
  4. Custom webhooks for specific events
     • Backup completions/failures
     • Cost threshold alerts
     • Security scan results
  5. Bi-directional integration
     • Slack → Prometheus (silence alerts via command)
     • Slack → GitHub (create issues from alerts)

Related Files

  • AlertManager Template: cloud/ansible/roles/monitoring-stack/templates/alertmanager.yml.j2
  • GitHub Terraform Module: cloud/terraform/modules/github-slack-webhook/
  • Terraform Environment: cloud/terraform/environments/production/network/github-slack/
  • Credential Helper: scripts/terraform/tf-github-slack-env.sh

Quick Reference

Useful Commands

# View AlertManager config
ssh dumbo "cat /opt/monitoring/alertmanager/alertmanager.yml"

# Check alert status
curl -s http://192.168.194.131:9093/api/v1/alerts | jq

# Silence an alert (1 hour)
curl -X POST http://192.168.194.131:9093/api/v1/silences \
  -H 'Content-Type: application/json' \
  -d '{"matchers":[{"name":"alertname","value":"InstanceDown","isRegex":false}],"startsAt":"2026-02-13T00:00:00Z","endsAt":"2026-02-13T01:00:00Z","createdBy":"ops","comment":"Maintenance window"}'

# List GitHub webhooks (via Terraform)
cd cloud/terraform/environments/production/network/github-slack
terraform state list
terraform show

# Test Slack webhook
WEBHOOK_URL=$(op item get "slack_webhook_scandora_notifications" \
  --vault scandora-automation --fields webhook_url --reveal)
curl -X POST "$WEBHOOK_URL" \
  -H 'Content-type: application/json' \
  -d '{"text":"Test notification"}'

Last updated: 2026-02-13
Status: Active and operational
Owner: Infrastructure team
Contact: #scandora-notifications (Slack)