Dev VM Firmware Endpoint Hang

Summary

The OPNsense API endpoint /api/core/firmware/info hangs on the nested KVM dev VM (GCE n2-standard-4 host → KVM guest): the request is accepted but no response has been observed to come back. The same endpoint responds normally on production OPNsense hardware (owl, blue).

This blocks multiple tools that use the endpoint as a health check or startup probe.

Affected Systems

| Tool | How It Uses the Endpoint | Impact | Workaround |
|------|--------------------------|--------|------------|
| Ansible (packages tag) | Plugin installation checks firmware status | Tag times out | --skip-tags packages |
| Ansible (monitoring tag) | SNMP setup calls firmware info | Tag times out | --skip-tags monitoring |
| vespo92/OPNSenseMCP | testConnection() calls /api/core/firmware/info at startup | MCP server fails health check, shows "Failed to connect" | Patch client.js (see below) |
| OPNsense Web UI | Firmware page in browser | Page loads but firmware check spinner may hang | Use other pages |

Reproduction

# SSH tunnel to dev VM must be active
ssh -f -N -L 8443:10.7.0.1:443 joe@<gce-external-ip>

# This hangs (no response, curl waits forever):
curl -sk -u "$API_KEY:$API_SECRET" https://localhost:8443/api/core/firmware/info

# These work instantly:
curl -sk -u "$API_KEY:$API_SECRET" https://localhost:8443/api/diagnostics/interface/getInterfaceNames
curl -sk -u "$API_KEY:$API_SECRET" https://localhost:8443/api/core/system/status
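
Because the plain curl call above never returns, a bounded variant is easier to script. The following is a minimal sketch, assuming curl's --max-time option (curl exits with status 28 when the limit is reached). The function names probe_endpoint and describe_curl_exit and the 10-second default are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Sketch: bounded reproduction of the hang. Assumes the SSH tunnel on
# localhost:8443 is up and API_KEY / API_SECRET are exported.
# probe_endpoint and describe_curl_exit are illustrative names.

# Map a curl exit status to a short label.
describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "connection failed (tunnel down?)" ;;
        28) echo "timeout" ;;
        *)  echo "curl error $1" ;;
    esac
}

# probe_endpoint PATH [SECONDS]: request PATH, give up after SECONDS (default 10).
probe_endpoint() {
    path=$1
    limit=${2:-10}
    curl -sk --max-time "$limit" -o /dev/null \
        -u "$API_KEY:$API_SECRET" "https://localhost:8443$path"
    status=$?
    echo "$path: $(describe_curl_exit "$status")"
    return "$status"
}
```

On the dev VM, `probe_endpoint /api/core/firmware/info` would be expected to report a timeout, while `probe_endpoint /api/core/system/status` should report ok. Note that curl exits 0 for any completed HTTP exchange, so "ok" here means only that a response arrived, not that the status code was 200.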

Environment

  • Host: GCE n2-standard-4, Ubuntu 24.04, nested virtualization enabled
  • Guest: OPNsense 26.1, KVM/libvirt, 6GB RAM, 4 vCPUs, 4 NICs (virtio)
  • OPNsense version: 26.1 (FreeBSD 14-based)
  • Hypervisor: QEMU/KVM via libvirt on GCE (nested — GCE itself runs on KVM)

Hypothesis

The /api/core/firmware/info endpoint likely invokes pkg or opnsense-update under the hood to check installed package versions and available updates. On nested KVM:

  1. Double virtualization overhead: The guest runs inside a VM that's already inside GCE's hypervisor. Disk I/O and process spawning are significantly slower.
  2. Network access from guest: The firmware check may try to reach OPNsense update mirrors to check for available updates. The guest's WAN is NAT'd through the KVM host, which is NAT'd through GCE — DNS resolution or mirror connectivity may stall.
  3. pkg database lock: If the firmware subsystem holds a lock or waits for a background process (like pkg audit or opnsense-health), the API call blocks until that completes.
  4. Timer/interrupt issues: Nested virtualization can cause timer skew and interrupt delivery delays in FreeBSD, which may cause timeout logic to malfunction.

Current Workarounds

Ansible

Skip the affected tags when running against the dev VM:

./scripts/run-opnsense.sh opnsense-dev --skip-tags packages,monitoring

All other tags (system, interfaces, firewall, dhcp, dns, zerotier, ipv6-tunnel) work normally.
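
To avoid forgetting the flag, the skip list can be derived from the target host. This is only a sketch: extra_args_for and build_command are hypothetical helpers, and the only values taken from this document are the host name opnsense-dev, the script path, and the two tags.

```shell
#!/bin/sh
# Sketch: choose extra arguments for run-opnsense.sh based on the target.
# extra_args_for and build_command are illustrative helpers.

# Print the extra flags needed for HOST (empty for hosts without the hang).
extra_args_for() {
    case "$1" in
        opnsense-dev) echo "--skip-tags packages,monitoring" ;;
        *)            echo "" ;;
    esac
}

# Print the full command line that would be run for HOST.
build_command() {
    host=$1
    # The substitution is unquoted on purpose so that the flag and its
    # value split into two separate arguments.
    set -- ./scripts/run-opnsense.sh "$host" $(extra_args_for "$host")
    echo "$*"
}
```

Running `build_command opnsense-dev` prints the same command shown above, which can be reviewed before it is executed.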

MCP Server (vespo92/OPNSenseMCP)

Patch the globally installed client.js to use a different endpoint:

File: /opt/homebrew/lib/node_modules/opnsense-mcp-server/dist/api/client.js

Find:

async testConnection() {
    try {
        const result = await this.get('/core/firmware/info');
        return {
            success: true,
            version: result.product_version,
            product: result.product_name
        };

Replace with:

async testConnection() {
    try {
        const result = await this.get('/diagnostics/interface/getInterfaceNames');
        const ifCount = Object.keys(result).length;
        return {
            success: true,
            version: `unknown (${ifCount} interfaces)`,
            product: 'OPNsense'
        };

Note: This patch is overwritten by npm update -g opnsense-mcp-server. Reapply after updates.
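
Since an update silently reverts the patch, a quick check can tell whether it needs to be reapplied. The sketch below only inspects the file and changes nothing. It assumes that the string /core/firmware/info appears in client.js only inside testConnection(), which has not been verified against the package source. patch_state is an illustrative name.

```shell
#!/bin/sh
# Sketch: report whether client.js still calls the firmware endpoint.
# patch_state is an illustrative name; the function is read-only.

CLIENT_JS=${CLIENT_JS:-/opt/homebrew/lib/node_modules/opnsense-mcp-server/dist/api/client.js}

# Print "missing", "unpatched", or "patched" for the given file.
patch_state() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing"
    elif grep -qF "/core/firmware/info" "$file"; then
        # Assumption: this string occurs only in testConnection().
        echo "unpatched"
    else
        echo "patched"
    fi
}
```

For example, `patch_state "$CLIENT_JS"` after an npm update would be expected to print unpatched until the replacement above is applied again.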

Investigation Plan

When time permits, investigate the root cause:

  • SSH into the dev VM guest and run opnsense-version directly — does it hang?
  • Check if pkg info hangs (package database access)
  • Check if the firmware endpoint tries to reach update mirrors (tcpdump on guest WAN)
  • Try disabling firmware mirror checks in OPNsense settings
  • Check configd logs (/var/log/configd/latest.log) for stuck jobs
  • Test with a longer timeout (5 minutes) to see if it eventually returns
  • Compare dmesg output between dev VM and production for timer/interrupt warnings
  • Check if the issue persists after pkg update -f on the guest
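
Several of the checks above come down to asking whether a command finishes within a given time. A minimal sketch for running them on the guest under a limit follows. It relies on timeout(1), which exits with status 124 when the limit is reached. timed_run is an illustrative name, and the script is written for sh rather than the default csh login shell.

```shell
#!/bin/sh
# Sketch: run a command under a time limit and report how it ended.
# timed_run is an illustrative name. Relies on timeout(1), which exits
# with status 124 when the limit is reached.

# timed_run SECONDS COMMAND [ARGS...]
timed_run() {
    limit=$1
    shift
    start=$(date +%s)
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${elapsed}s: $*"
    else
        echo "exit $status after ${elapsed}s: $*"
    fi
    return "$status"
}
```

With the 5-minute limit suggested above, the first two checks would become `timed_run 300 opnsense-version` and `timed_run 300 pkg info`. A HUNG line for one but not the other would narrow down which layer is stalling.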

Related Documentation

  • gateways/owl/docs/DEV-WORKFLOW.md — documents the limitation for dev VM testing
  • cloud/ansible/ROLES-STATUS.md — notes packages/monitoring timeout
  • docs/operations/opnsense-mcp-servers.md — documents MCP server patch