The PostHog Deploy That Took Down Our API (And the Pre-Deploy Script That Prevents It Next Time)

We added PostHog analytics to 0Latency — tracking activation events, recall usage, plan conversions. Standard product analytics instrumentation. We deployed it and the API went down. Not degraded. Down. Nginx was returning 502s on every request to api.0latency.ai.

The fix took about 10 minutes once we knew what to look for. Getting to that point took longer than it should have, and the lesson from the whole thing wasn't about PostHog specifically — it was about what we should have been checking before every deploy, and weren't.

What Happened

PostHog's recommended setup is a reverse proxy through your own domain — rather than calling app.posthog.com directly from the browser, you route analytics through yourdomain.com/ingest. This is the right call for ad-blocker bypass and for keeping analytics traffic on your own domain.

We set it up in nginx. The PostHog proxy config worked fine in staging. In production, nginx also had an upstream block for the API service at 127.0.0.1:8000, and the new config conflicted with it. Specifically, the location /ingest block we added had a misconfigured proxy_pass directive. Nginx accepted the config on reload, but came up in a broken state where it couldn't establish the upstream connection to the API.

The error in nginx logs was clear in retrospect:

2026/04/01 18:43:12 [error] 12847#12847: *1 connect() failed
(111: Connection refused) while connecting to upstream,
client: 104.21.x.x, server: api.0latency.ai,
request: "POST /recall HTTP/1.1",
upstream: "http://127.0.0.1:8000/recall"

The API process was running. The gunicorn workers were healthy. The database was accessible. Nginx just couldn't reach the upstream. The PostHog proxy config had broken the upstream routing without producing an error at nginx reload time — it only failed on the first actual request.
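In hindsight, the fastest way to narrow this down is to compare what the API says when you hit it directly with what it says through nginx. A minimal sketch of that comparison follows. The helper names classify_outage and http_status are hypothetical, and the /health path is assumed to be the same one our later script uses.

```shell
#!/bin/bash
# Hypothetical helpers for narrowing an outage to a layer.

# Fetch an HTTP status code without aborting on failure.
# curl prints 000 for %{http_code} when no HTTP response was received.
http_status() {
  curl -s -o /dev/null -m 5 -w "%{http_code}" "$1" || true
}

# Given the status seen when hitting the API process directly and the
# status seen through nginx, name the layer to look at first.
classify_outage() {
  local direct="$1" proxied="$2"
  if [ "$direct" = "200" ] && [ "$proxied" = "200" ]; then
    echo "healthy"
  elif [ "$direct" = "200" ]; then
    echo "proxy"        # app answers directly, so suspect the nginx config
  else
    echo "application"  # app itself is not answering
  fi
}

# Usage (not run here):
#   classify_outage "$(http_status http://127.0.0.1:8000/health)" \
#                   "$(http_status https://api.0latency.ai/health)"
```

In our incident the direct check would have come back 200 and the proxied one 502, which points at the proxy layer rather than gunicorn or the database.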

The Fix

Three lines in the nginx config. The PostHog proxy_pass needed an explicit resolver, and the proxying needed to live in its own location block (/ingest/, with a trailing slash on both the location and the proxy_pass URL) so that it didn't conflict with the API routing:

# Correct PostHog proxy config
location /ingest/ {
    proxy_pass https://app.posthog.com/;
    proxy_ssl_server_name on;
    resolver 8.8.8.8;
    proxy_set_header Host app.posthog.com;
    proxy_set_header X-Real-IP $remote_addr;
}

Reloaded nginx. API recovered. PostHog started receiving events. Total downtime: about 23 minutes, most of which was diagnosis.

What We Should Have Had

The thing that bothered us afterward wasn't the nginx bug. Nginx configuration has sharp edges and anyone who's deployed a non-trivial nginx setup has hit them. What bothered us was that our pre-deploy process was: make the change, run nginx -t to check syntax, reload. That's it.

nginx -t validates syntax. It doesn't validate that the configuration actually routes requests correctly. A syntactically valid config can fail at runtime in ways that only show up on the first real request. We knew this. We didn't have a check for it.

The pre-deploy script we wrote after the incident:

#!/bin/bash
# pre-deploy-check.sh — run before every nginx config change

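set -e

# Fail early, before any reload, if the health-check key is missing
: "${HEALTH_CHECK_KEY:?HEALTH_CHECK_KEY must be set}"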

echo "=== Pre-deploy verification ==="

# 1. Nginx syntax check
echo "Checking nginx syntax..."
nginx -t

# 2. Verify API upstream is reachable
echo "Checking API upstream..."
curl -sf http://127.0.0.1:8000/health > /dev/null || {
  echo "ERROR: API upstream not reachable on :8000"
  exit 1
}

# 3. Reload nginx with the new config
echo "Reloading nginx..."
systemctl reload nginx
sleep 2

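# 4. End-to-end API check through nginx
# No -f here: with set -e, a failing curl inside $(...) would abort the
# script before the rollback below could run. If curl gets no HTTP
# response at all, it prints 000 as the status code.
echo "Verifying API through nginx..."
response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
  -H "X-API-Key: $HEALTH_CHECK_KEY" \
  https://api.0latency.ai/health) || true

if [ "$response" != "200" ]; then
  echo "ERROR: API returning $response through nginx, rolling back"
  cp /etc/nginx/nginx.conf.backup /etc/nginx/nginx.conf
  systemctl reload nginx
  exit 1
fi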

echo "=== All checks passed ==="
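The fixed sleep 2 in step 3 is a guess at how long the reload takes to settle. One alternative is to poll the end-to-end check a few times before declaring failure. This is a sketch rather than what our script does today, and the helper name retry_until_ok is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
#   $1 = max attempts, $2 = seconds between attempts, rest = command to run
retry_until_ok() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt is coming
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (not run here): up to 10 tries, 1 second apart
#   retry_until_ok 10 1 curl -sf -o /dev/null https://api.0latency.ai/health
```

The trade-off is that a real failure now takes attempts × delay seconds to detect, so the numbers should stay small.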

The backup step matters. We now keep an nginx.conf.backup that's always the last known-good config. If the post-reload check fails, the script rolls back automatically. No manual intervention required.
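The script restores from the backup but doesn't show where the backup gets refreshed. One way to keep it "last known-good" is to overwrite it only after the end-to-end check has passed. A minimal sketch, with hypothetical helper names and the paths passed in as arguments:

```shell
#!/bin/bash
# Hypothetical helpers; nothing here is hard-coded to /etc/nginx.

# Record the live config as last known-good.
# Call this only after the end-to-end check has passed.
promote_config() {
  local live="$1" backup="$2"
  # Copy to a temp name first so a failed copy never clobbers the backup
  cp -p "$live" "$backup.tmp" && mv "$backup.tmp" "$backup"
}

# Restore the last known-good config over the live one.
rollback_config() {
  local live="$1" backup="$2"
  [ -f "$backup" ] || { echo "ERROR: no backup at $backup" >&2; return 1; }
  cp -p "$backup" "$live"
}

# Usage (not run here), after "All checks passed":
#   promote_config /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
```

This only covers a single file. If the change lives in an included file, that file needs the same treatment.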

The Broader Pattern

Every infrastructure change that touches routing or proxying should have an end-to-end functional test, not just a syntax check. nginx -t is necessary but not sufficient. The same is true for: systemd service file changes (does the service actually start?), Python dependency updates (does the import graph still resolve?), database migration scripts (does the application still connect and query correctly?), and SSL certificate renewals (is the chain valid end-to-end from a client perspective?).

The pattern is the same in each case: the tool-level check tells you the artifact is well-formed. It doesn't tell you the system works. You need a functional check that actually exercises the path you changed.
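Stripped of the nginx specifics, the pattern is four steps: validate the artifact, apply it, exercise the changed path, and roll back if that fails. A generic sketch, where change_with_probe and its four arguments are hypothetical names and each argument is a command or shell function you supply:

```shell
#!/bin/bash
# Hypothetical wrapper. Each argument names a command or shell function:
#   validate: tool-level check (for nginx, the syntax check)
#   apply:    makes the change live
#   probe:    functional check that exercises the changed path
#   rollback: restores the previous state
change_with_probe() {
  local validate="$1" apply="$2" probe="$3" rollback="$4"
  "$validate" || { echo "validation failed; nothing applied" >&2; return 1; }
  "$apply"    || { echo "apply failed; rolling back" >&2; "$rollback"; return 1; }
  if ! "$probe"; then
    echo "functional check failed; rolling back" >&2
    "$rollback"
    return 1
  fi
  echo "change verified"
}
```

For a systemd unit change, for example, validate might be a unit-file lint, apply a restart, and probe a request to the service's own health endpoint.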

We run the pre-deploy script before every nginx change now. It's added maybe 45 seconds to the deploy process. If we'd had it before the PostHog deploy, it would have caught the failure right after the reload and rolled back automatically, instead of leaving production down for 23 minutes.

PostHog is working correctly, for what it's worth. The activation event tracking, the recall usage funnel, the conversion metrics — all flowing. It just took one 502 incident and a script to get there cleanly.
