feat(health): add /health/ endpoint for OpsLog monitoring #56
Reference in New Issue
Block a user
Delete Branch "feature/health-endpoint-opslog"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #55
Summary
apps.healthapp with DB, cache, celery, and backup checks plus a machine-readable/health/endpointGIT_SHAandBUILD_IDinto production builds and update deploy health checks to probe/health/Testing
pytest apps/health/ -q --tb=short -o addopts=''ruff check .mypy apps configpytest --ignore=e2eQA Review — PR #56 (health endpoint)
Tests pass (21/21 health, 180/180 full suite), ruff clean, no regressions. Code is well-structured overall. Two issues need fixing before merge.
🔴 Must Fix
1.
check_backup()permanently degrades every environmentchecks.py:60-61— WhenBACKUP_STATUS_FILEis unset,check_backup()returns{"status": "fail", ...}. Since backup is not inCRITICAL_CHECKS, this causes the health endpoint to return"degraded"(HTTP 200) in every environment where that env var is absent — which is currently all of them (not set in.env,docker-compose.prod.yml, orentrypoint.prod.sh).This is inconsistent with how
check_celery()handles the unconfigured case (returns"ok"with a detail note at line 43). For an OpsLog monitoring endpoint, permanent"degraded"defeats the purpose — OpsLog will never see a clean"ok"status, causing alert fatigue and masking real degradations.Fix: Either:
{"status": "ok", "detail": "Backup monitoring not configured: BACKUP_STATUS_FILE is unset"}matching the celery pattern, orBACKUP_STATUS_FILEto the production environment and treat missing config as a genuine failure — but then this should be documented.Option (a) is strongly preferred for consistency.
2.
test_version_fieldsis a no-op assertiontest_views.py:86-94— The test setsGIT_SHAandBUILD_IDvia monkeypatch, but then only asserts the values are truthy:Since the view defaults to
"unknown"when the env vars are unset, and"unknown"is truthy, this test passes even without the monkeypatch. It does not verify the env vars are actually read.Fix: Assert the exact values:
🟡 Minor (non-blocking)
health_view— The endpoint accepts POST, PUT, DELETE, etc. Consider adding@require_GETorrequire_http_methods(["GET", "HEAD"]).@pytest.mark.django_dbontest_db_ok/test_db_fail— These tests monkeypatch the cursor and never touch the real DB; the marker adds unnecessary transaction overhead.check_cache()skips cleanup on mismatch — Whencached_value != probe_value(line 33), the function returns withoutcache.delete(cache_key). The 5s TTL makes this harmless, but afinallyblock would be tidier.Amendment to review item #1:
On reflection, the
check_backup()behaviour is correct — backups are not optional in production the way Celery is, so returning"degraded"whenBACKUP_STATUS_FILEis unset is the right signal. The fix should be on the deployment side: configureBACKUP_STATUS_FILEin the production.env(ordocker-compose.prod.ymlenvironment block) pointing to an actual backup status file.The PR should add this env var to the production config so the check passes on deploy. The celery comparison was a false equivalence on my part.
Review item #2 (
test_version_fieldsno-op assertion) still stands.Superseded — resubmitting with corrected feedback.
QA Review — PR #56 (health endpoint)
Tests pass (21/21 health, 180/180 full suite), ruff clean, no regressions. Code is well-structured overall. Two issues need fixing before merge.
🔴 Must Fix
1.
BACKUP_STATUS_FILEnot configured in productioncheck_backup()correctly returns"fail"whenBACKUP_STATUS_FILEis unset — backups are not optional and a missing config should surface as degraded. However, this env var is not defined anywhere in the deployment config (.env,docker-compose.prod.yml, orentrypoint.prod.sh), so the health endpoint will report"degraded"immediately on deploy.Fix: Add
BACKUP_STATUS_FILEto the production environment (e.g. indocker-compose.prod.ymlor the.envtemplate) pointing to the actual backup status file path.2.
test_version_fieldsis a no-op assertiontest_views.py:86-94— The test setsGIT_SHAandBUILD_IDvia monkeypatch, but then only asserts the values are truthy:Since the view defaults to
"unknown"when the env vars are unset, and"unknown"is truthy, this test passes even without the monkeypatch. It does not verify the env vars are actually read.Fix: Assert the exact values:
🟡 Minor (non-blocking)
health_view— Consider adding@require_GETorrequire_http_methods(["GET", "HEAD"]).@pytest.mark.django_dbontest_db_ok/test_db_fail— These tests monkeypatch the cursor and never touch the real DB; the marker adds unnecessary transaction overhead.check_cache()skips cleanup on mismatch — Whencached_value != probe_value(line 33), the function returns withoutcache.delete(cache_key). The 5s TTL makes this harmless, but afinallyblock would be tidier.Both review items addressed. BACKUP_STATUS_FILE wired into prod compose with a read-only host mount, version test now asserts exact values. 180 tests pass, ruff clean. LGTM.