Files
budget/.plans/hardening_050626.md
T
Spencer Twaddle ac3dcc2f31 Security & resource hardening: eliminate CPU/disk attack surface
Addresses production CPU spike incident. Key changes:

- Guard OTel exporter behind OTEL_EXPORTER_OTLP_ENDPOINT env var; filter
  tracing to /api paths only — unconditional export was primary suspect
- Remove /healthz endpoint entirely (unauthenticated, hit DB on every call)
- Replace KnownUserMiddleware with POST /api/users/me called once on login
  from TokenSync — eliminates unconditional DB write on every request
- Add DB indexes: (BudgetId, IsDeleted) on Incomes/Outgos, OwnerUserId on
  Budgets, SharedWithUserId and (IsPending, SharedWithEmail) on BudgetShares
- Move UseRateLimiter() before UseStaticFiles() so all requests are throttled
- Replace full-array reorder with move-by-position (id + newIndex) — bounded
  input, fewer DB writes, better API design
- Lock ForwardedHeaders to 172.20.0.0/16 subnet; fixes KnownNetworks
  deprecation warning (0 warnings in build now)
- Add AsNoTracking() to all read-only queries in Summary/Incomes/OutgosController
- FrequencyCalculator returns 0 for unknown enum values instead of throwing
- Thread.Sleep → await Task.Delay in OIDC startup loop
- AllowedHosts locked to budget.stwaddle.com

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 22:17:18 -05:00

8.2 KiB

Security & Resource Hardening Plan

Date: 2026-05-06
Trigger: Production host CPU pegged >100% for several hours; Disk I/O high; network normal.
Root cause hypothesis: C-1 (unconditional OTel exporter) + H-1 (unconditional DB write per request).


Critical

C-1 — Guard UseOtlpExporter() Behind Env Var Check DONE

Resolved 2026-05-06: Entire OTel block (logging + tracing + exporter) is now gated on OTEL_EXPORTER_OTLP_ENDPOINT being set. AspNetCore instrumentation also filtered to /api paths only so static file requests never generate spans. The duplicate builder.Logging.AddOpenTelemetry() is inside the same guard, eliminating the double-pipeline issue. Build verified clean.


C-2 — Require Auth on /healthz (or Remove DB Check) DONE

Resolved 2026-05-06: Endpoint removed entirely. Not wired to any orchestrator health check; budget app is lower priority than recipes/journal. AddHealthChecks(), AddDbContextCheck<AppDbContext>(), MapHealthChecks(), and both health-check using statements removed from Program.cs. Build verified clean.


C-3 — Move UseRateLimiter() Before UseStaticFiles() DONE

File: src/Budget.Api/Program.cs
Risk: UseStaticFiles short-circuits the pipeline before UseRateLimiter, so all SPA asset requests (JS bundles, CSS, index.html fallback) bypass rate limiting. Combined with C-1, every unthrottled static file request still generates a telemetry span.

Fix: Reorder middleware so app.UseRateLimiter() appears before app.UseDefaultFiles() and app.UseStaticFiles().


High

H-1 — Throttle KnownUserMiddleware DB Writes DONE

Resolved 2026-05-06: KnownUserMiddleware removed entirely. Replaced with POST /api/users/me endpoint in new UsersController — same upsert + pending share resolution logic, but now called once on login from TokenSync.tsx rather than on every request. api.post body param made optional to support the no-body call. Backend and frontend builds verified clean.


H-2 — Reduce BudgetAuthorizationService to a Single Query

File: src/Budget.Infrastructure/Services/BudgetAuthorizationService.cs
Risk: GetAccessAsync always issues two sequential queries (budget row, then share row). Combined with controller entity loads, a single write endpoint makes 4+ DB round-trips; GET /summary makes 5.

Fix: Combine into one projected query:

var result = await db.Budgets
    .AsNoTracking()
    .Where(b => b.Id == budgetId)
    .Select(b => new {
        b.OwnerUserId,
        Share = b.Shares.FirstOrDefault(s => s.SharedWithUserId == userId && !s.IsPending)
    })
    .FirstOrDefaultAsync();

H-3 — Add Missing Database Indexes DONE

Resolved 2026-05-06: Added 5 indexes to AppDbContext.OnModelCreating. Migration 20260507024323_AddHardeningIndexes generated and verified. Replaces single-column BudgetId indexes on Incomes/Outgos with composite (BudgetId, IsDeleted) versions; adds BudgetShares.(IsPending, SharedWithEmail), BudgetShares.SharedWithUserId, and Budgets.OwnerUserId. Will be applied automatically at next startup via MigrateAsync().


H-4 — Cap Reorder Endpoint Input Size DONE

Resolved 2026-05-06: Replaced full-array reorder with a move-by-position design. ReorderIncomesRequest/ReorderOutgosRequest removed and replaced with MoveIncomeRequest(Guid Id, int NewIndex) / MoveOutgoRequest(Guid Id, int NewIndex). Server loads items sorted by SortOrder, finds the item, shifts only the affected range, sets the new position. Frontend sends { id, newIndex } from dnd-kit's existing oldIdx/newIdxarrayMove still used for optimistic local state. Input is inherently bounded to one GUID and one integer.


H-5 — Lock Down ForwardedHeaders to Known Proxy IP DONE

Resolved 2026-05-06: Replaced clear-all with explicit subnet trust. KnownIPNetworks (the .NET 10 replacement for deprecated KnownNetworks) is cleared and populated with 172.20.0.0/16 — the Docker network subnet for the nginx-proxy stack. Using the subnet rather than a specific container IP avoids breakage if nginx-proxy gets a different IP on restart. Deprecation warning eliminated; build clean with 0 warnings.


Medium

M-1 — OIDC JWKS Fetched Over Plain HTTP

File: src/Budget.Api/Program.cs
Risk: MetadataAddress points to http://auth:8080/.... JWKS (token signing keys) arrive over plain HTTP. On a compromised Docker network, an attacker could substitute keys and forge valid JWTs. Low severity on a single-host deployment; higher risk in any multi-host topology.

Fix: Enable internal TLS on the auth service, or pin the signing key in configuration instead of relying on discovery.


M-2 — Unknown Frequency Enum Crashes Entire Budget DONE

File: src/Budget.Core/Services/FrequencyCalculator.cs
Risk: An unrecognised enum integer (data corruption or future migration) throws ArgumentOutOfRangeException, making all income/outgo/summary endpoints permanently broken for that budget and generating error spans that amplify OTel export volume.

Fix: Validate enum values at the controller layer with [EnumDataType(typeof(Frequency))] on write DTOs. Return a graceful default or 422 in the calculator rather than throwing.


M-3 — Missing AsNoTracking() on Read Queries DONE

Files: src/Budget.Api/Controllers/SummaryController.cs, IncomesController.cs, OutgosController.cs
Risk: EF change tracking runs on every loaded entity even for pure-read endpoints, wasting CPU/memory. With OTel EF instrumentation active, every tracked entity event adds overhead.

Fix: Add .AsNoTracking() to all List and Get queries that do not write.


M-4 — Thread.Sleep in Async Startup Loop DONE

File: src/Budget.Api/Program.cs
Risk: Synchronous Thread.Sleep inside the async OIDC discovery retry loop blocks a thread pool thread for up to 2 minutes during container startup.

Fix:

await Task.Delay(TimeSpan.FromSeconds(5));

M-5 — AllowedHosts: "*" Disables Host Header Validation DONE

File: src/Budget.Api/appsettings.json
Risk: Allows HTTP host header injection if the app is ever behind a misconfigured proxy.

Fix: Set to the actual public hostname:

"AllowedHosts": "budget.stwaddle.com"

Low / Informational

ID File Issue Fix
L-1 appsettings.json Docker-internal MetadataAddress in base config — if the Docker service name auth is not resolvable, OIDC discovery silently fails Move MetadataAddress to a Docker Compose env override; base config default should be empty
L-2 Program.cs No timeout on startup MigrateAsync() — hangs indefinitely if Postgres is unresponsive Wrap with a CancellationTokenSource with a deadline
L-3 ErrorHandlingMiddleware.cs Full exception stack traces exported via OTel on every unhandled error — exception storms spike collector disk I/O Consider LogWarning for expected exceptions; reserve LogError for truly unexpected ones

Implementation Order

  1. C-1 — Guard UseOtlpExporter() (likely root cause; deploy ASAP) Done (2026-05-06)
  2. H-1 — Throttle KnownUserMiddleware writes (amplifies Disk I/O on every request) Done (2026-05-06)
  3. C-2 — Auth-gate or simplify /healthz Removed entirely (2026-05-06)
  4. H-3 — Add missing indexes (single migration, low risk) Done (2026-05-06)
  5. C-3 — Move rate limiter before static files Done (2026-05-06)
  6. H-4 — Cap Reorder input Done — redesigned as move-by-position (2026-05-06)
  7. H-5 — Lock down ForwardedHeaders to proxy IP Done (2026-05-06)
  8. M-3 — Add AsNoTracking() across read controllers Done (2026-05-06)
  9. M-2 — Harden FrequencyCalculator enum handling Done (2026-05-06)
  10. M-4Thread.Sleepawait Task.Delay in startup loop Done (2026-05-06)
  11. M-5AllowedHosts: "*"budget.stwaddle.com Done (2026-05-06)