Addresses production CPU spike incident. Key changes: - Guard OTel exporter behind OTEL_EXPORTER_OTLP_ENDPOINT env var; filter tracing to /api paths only — unconditional export was primary suspect - Remove /healthz endpoint entirely (unauthenticated, hit DB on every call) - Replace KnownUserMiddleware with POST /api/users/me called once on login from TokenSync — eliminates unconditional DB write on every request - Add DB indexes: (BudgetId, IsDeleted) on Incomes/Outgos, OwnerUserId on Budgets, SharedWithUserId and (IsPending, SharedWithEmail) on BudgetShares - Move UseRateLimiter() before UseStaticFiles() so all requests are throttled - Replace full-array reorder with move-by-position (id + newIndex) — bounded input, fewer DB writes, better API design - Lock ForwardedHeaders to 172.20.0.0/16 subnet; fixes KnownNetworks deprecation warning (0 warnings in build now) - Add AsNoTracking() to all read-only queries in Summary/Incomes/OutgosController - FrequencyCalculator returns 0 for unknown enum values instead of throwing - Thread.Sleep → await Task.Delay in OIDC startup loop - AllowedHosts locked to budget.stwaddle.com Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8.2 KiB
Security & Resource Hardening Plan
Date: 2026-05-06
Trigger: Production host CPU pegged >100% for several hours; Disk I/O high; network normal.
Root cause hypothesis: C-1 (unconditional OTel exporter) + H-1 (unconditional DB write per request).
Critical
C-1 — Guard UseOtlpExporter() Behind Env Var Check ✅ DONE
UseOtlpExporter() Behind Env Var CheckResolved 2026-05-06: Entire OTel block (logging + tracing + exporter) is now gated on OTEL_EXPORTER_OTLP_ENDPOINT being set. AspNetCore instrumentation also filtered to /api paths only so static file requests never generate spans. The duplicate builder.Logging.AddOpenTelemetry() is inside the same guard, eliminating the double-pipeline issue. Build verified clean.
C-2 — Require Auth on /healthz (or Remove DB Check) ✅ DONE
/healthz (or Remove DB Check)Resolved 2026-05-06: Endpoint removed entirely. Not wired to any orchestrator health check; budget app is lower priority than recipes/journal. AddHealthChecks(), AddDbContextCheck<AppDbContext>(), MapHealthChecks(), and both health-check using statements removed from Program.cs. Build verified clean.
C-3 — Move UseRateLimiter() Before UseStaticFiles() ✅ DONE
UseRateLimiter() Before UseStaticFiles()File: src/Budget.Api/Program.cs
Risk: UseStaticFiles short-circuits the pipeline before UseRateLimiter, so all SPA asset requests (JS bundles, CSS, index.html fallback) bypass rate limiting. Combined with C-1, every unthrottled static file request still generates a telemetry span.
Fix: Reorder middleware so app.UseRateLimiter() appears before app.UseDefaultFiles() and app.UseStaticFiles().
High
H-1 — Throttle KnownUserMiddleware DB Writes ✅ DONE
KnownUserMiddleware DB WritesResolved 2026-05-06: KnownUserMiddleware removed entirely. Replaced with POST /api/users/me endpoint in new UsersController — same upsert + pending share resolution logic, but now called once on login from TokenSync.tsx rather than on every request. api.post body param made optional to support the no-body call. Backend and frontend builds verified clean.
H-2 — Reduce BudgetAuthorizationService to a Single Query
File: src/Budget.Infrastructure/Services/BudgetAuthorizationService.cs
Risk: GetAccessAsync always issues two sequential queries (budget row, then share row). Combined with controller entity loads, a single write endpoint makes 4+ DB round-trips; GET /summary makes 5.
Fix: Combine into one projected query:
var result = await db.Budgets
.AsNoTracking()
.Where(b => b.Id == budgetId)
.Select(b => new {
b.OwnerUserId,
Share = b.Shares.FirstOrDefault(s => s.SharedWithUserId == userId && !s.IsPending)
})
.FirstOrDefaultAsync();
H-3 — Add Missing Database Indexes ✅ DONE
Resolved 2026-05-06: Added 5 indexes to AppDbContext.OnModelCreating. Migration 20260507024323_AddHardeningIndexes generated and verified. Replaces single-column BudgetId indexes on Incomes/Outgos with composite (BudgetId, IsDeleted) versions; adds BudgetShares.(IsPending, SharedWithEmail), BudgetShares.SharedWithUserId, and Budgets.OwnerUserId. Will be applied automatically at next startup via MigrateAsync().
H-4 — Cap Reorder Endpoint Input Size ✅ DONE
Reorder Endpoint Input SizeResolved 2026-05-06: Replaced full-array reorder with a move-by-position design. ReorderIncomesRequest/ReorderOutgosRequest removed and replaced with MoveIncomeRequest(Guid Id, int NewIndex) / MoveOutgoRequest(Guid Id, int NewIndex). Server loads items sorted by SortOrder, finds the item, shifts only the affected range, sets the new position. Frontend sends { id, newIndex } from dnd-kit's existing oldIdx/newIdx — arrayMove still used for optimistic local state. Input is inherently bounded to one GUID and one integer.
H-5 — Lock Down ForwardedHeaders to Known Proxy IP ✅ DONE
ForwardedHeaders to Known Proxy IPResolved 2026-05-06: Replaced clear-all with explicit subnet trust. KnownIPNetworks (the .NET 10 replacement for deprecated KnownNetworks) is cleared and populated with 172.20.0.0/16 — the Docker network subnet for the nginx-proxy stack. Using the subnet rather than a specific container IP avoids breakage if nginx-proxy gets a different IP on restart. Deprecation warning eliminated; build clean with 0 warnings.
Medium
M-1 — OIDC JWKS Fetched Over Plain HTTP
File: src/Budget.Api/Program.cs
Risk: MetadataAddress points to http://auth:8080/.... JWKS (token signing keys) arrive over plain HTTP. On a compromised Docker network, an attacker could substitute keys and forge valid JWTs. Low severity on a single-host deployment; higher risk in any multi-host topology.
Fix: Enable internal TLS on the auth service, or pin the signing key in configuration instead of relying on discovery.
M-2 — Unknown Frequency Enum Crashes Entire Budget ✅ DONE
Frequency Enum Crashes Entire BudgetFile: src/Budget.Core/Services/FrequencyCalculator.cs
Risk: An unrecognised enum integer (data corruption or future migration) throws ArgumentOutOfRangeException, making all income/outgo/summary endpoints permanently broken for that budget and generating error spans that amplify OTel export volume.
Fix: Validate enum values at the controller layer with [EnumDataType(typeof(Frequency))] on write DTOs. Return a graceful default or 422 in the calculator rather than throwing.
M-3 — Missing AsNoTracking() on Read Queries ✅ DONE
AsNoTracking() on Read QueriesFiles: src/Budget.Api/Controllers/SummaryController.cs, IncomesController.cs, OutgosController.cs
Risk: EF change tracking runs on every loaded entity even for pure-read endpoints, wasting CPU/memory. With OTel EF instrumentation active, every tracked entity event adds overhead.
Fix: Add .AsNoTracking() to all List and Get queries that do not write.
M-4 — Thread.Sleep in Async Startup Loop ✅ DONE
Thread.Sleep in Async Startup LoopFile: src/Budget.Api/Program.cs
Risk: Synchronous Thread.Sleep inside the async OIDC discovery retry loop blocks a thread pool thread for up to 2 minutes during container startup.
Fix:
await Task.Delay(TimeSpan.FromSeconds(5));
M-5 — AllowedHosts: "*" Disables Host Header Validation ✅ DONE
AllowedHosts: "*" Disables Host Header ValidationFile: src/Budget.Api/appsettings.json
Risk: Allows HTTP host header injection if the app is ever behind a misconfigured proxy.
Fix: Set to the actual public hostname:
"AllowedHosts": "budget.stwaddle.com"
Low / Informational
| ID | File | Issue | Fix |
|---|---|---|---|
| L-1 | appsettings.json |
Docker-internal MetadataAddress in base config — if the Docker service name auth is not resolvable, OIDC discovery silently fails |
Move MetadataAddress to a Docker Compose env override; base config default should be empty |
| L-2 | Program.cs |
No timeout on startup MigrateAsync() — hangs indefinitely if Postgres is unresponsive |
Wrap with a CancellationTokenSource with a deadline |
| L-3 | ErrorHandlingMiddleware.cs |
Full exception stack traces exported via OTel on every unhandled error — exception storms spike collector disk I/O | Consider LogWarning for expected exceptions; reserve LogError for truly unexpected ones |
Implementation Order
C-1 — Guard✅ Done (2026-05-06)UseOtlpExporter()(likely root cause; deploy ASAP)H-1 — Throttle✅ Done (2026-05-06)KnownUserMiddlewarewrites (amplifies Disk I/O on every request)C-2 — Auth-gate or simplify✅ Removed entirely (2026-05-06)/healthzH-3 — Add missing indexes (single migration, low risk)✅ Done (2026-05-06)C-3 — Move rate limiter before static files✅ Done (2026-05-06)H-4 — Cap✅ Done — redesigned as move-by-position (2026-05-06)ReorderinputH-5 — Lock down✅ Done (2026-05-06)ForwardedHeadersto proxy IPM-3 — Add✅ Done (2026-05-06)AsNoTracking()across read controllersM-2 — Harden✅ Done (2026-05-06)FrequencyCalculatorenum handlingM-4 —✅ Done (2026-05-06)Thread.Sleep→await Task.Delayin startup loopM-5 —✅ Done (2026-05-06)AllowedHosts: "*"→budget.stwaddle.com