diff --git a/REPORT.md b/REPORT.md
index 89439ed..9c224e6 100644
--- a/REPORT.md
+++ b/REPORT.md
@@ -1,138 +1,152 @@
-# Security Audit Report
+# Security Production Readiness Report
 
-Date: 2026-02-21
+Date: 2026-03-01
 Repository: /Users/bedas/Developer/GitHub/dcm
-Audit type: Static, read-only code and configuration review
+Review Type: Static security review for production readiness
 
 ## Scope
-- Backend API, worker, extraction and routing pipeline, settings handling, and storage interactions.
-- Frontend dependency posture.
-- Docker runtime and service exposure.
-
-## Method
-- File-level inspection with targeted code tracing for authn/authz, input validation, upload and archive processing, outbound network behavior, secret handling, logging, and deployment hardening.
-- No runtime penetration testing was performed.
+- Backend: FastAPI API, worker queue, settings and model runtime services
+- Frontend: React and Vite API client and document preview rendering
+- Infrastructure: docker-compose service exposure and secret configuration
 
 ## Findings
 
-### 1) Critical - Missing authentication and authorization on privileged API routes
-- Impact: Any reachable client can access document, settings, and log-management functionality.
+### Critical
+
+1. Redis queue is exposed without authentication and can be abused for worker job injection.
+- Impact: If Redis is reachable by an attacker, queued job payloads can be injected and executed by the worker process, leading to remote code execution and data compromise.
+- Exploit path: Reach Redis on port 6379, enqueue crafted RQ jobs into queue dcm, wait for worker consumption.
 - Evidence:
-  - `backend/app/main.py:29`
-  - `backend/app/api/router.py:14`
-  - `backend/app/api/routes_documents.py:464`
-  - `backend/app/api/routes_documents.py:666`
-  - `backend/app/api/routes_settings.py:148`
-  - `backend/app/api/routes_processing_logs.py:22`
-- Recommendation:
-  - Enforce authentication globally for non-health routes.
-  - Add per-endpoint authorization checks for read/update/delete/admin actions.
+  - docker-compose publishes Redis host port: `docker-compose.yml:21`
+  - worker consumes from Redis queue directly: `docker-compose.yml:77`
+  - queue connection uses bare Redis URL with no auth/TLS: `backend/app/worker/queue.py:15`, `backend/app/worker/queue.py:21`
+  - current environment binds services to all interfaces: `.env:1`
+- Remediation:
+  - Do not publish Redis externally in production.
+  - Enforce Redis authentication and TLS.
+  - Place Redis on a private network segment with strict ACLs.
+  - Treat queue producers as privileged components only.
 
-### 2) Critical - SSRF and data exfiltration risk via configurable model provider base URL
-- Impact: An attacker can redirect model calls to attacker-controlled or internal hosts and exfiltrate document-derived content.
+2. Untrusted uploaded content is previewed in an unsandboxed iframe.
+- Impact: Stored XSS and active content execution in preview context can enable account action abuse and data exfiltration in the browser.
+- Exploit path: Upload active content (for example HTML), open preview, script executes in iframe without sandbox constraints.
 - Evidence:
-  - `backend/app/api/routes_settings.py:148`
-  - `backend/app/schemas/settings.py:24`
-  - `backend/app/services/app_settings.py:249`
-  - `backend/app/services/model_runtime.py:144`
-  - `backend/app/services/model_runtime.py:170`
-  - `backend/app/worker/tasks.py:505`
-  - `backend/app/services/routing_pipeline.py:803`
-- Recommendation:
-  - Restrict provider endpoints to an allowlist.
-  - Validate URL scheme and block private/link-local destinations.
-  - Protect settings updates behind strict admin authorization.
-  - Enforce outbound egress controls at runtime.
+  - upload endpoint accepts generic uploaded files: `backend/app/api/routes_documents.py:493`
+  - MIME type is derived from bytes and persisted: `backend/app/api/routes_documents.py:530`
+  - preview endpoint returns original bytes inline with stored media type: `backend/app/api/routes_documents.py:449`, `backend/app/api/routes_documents.py:457`
+  - frontend renders preview in iframe without sandbox attribute: `frontend/src/components/DocumentViewer.tsx:486`
+  - preview source is a blob URL created from fetched content: `frontend/src/components/DocumentViewer.tsx:108`, `frontend/src/components/DocumentViewer.tsx:113`
+- Remediation:
+  - Block inline preview for script-capable MIME types.
+  - Add strict iframe sandboxing if iframe preview remains required.
+  - Prefer force-download for active formats.
+  - Serve untrusted preview content from an isolated origin with restrictive CSP.
 
-### 3) High - Unbounded upload and archive extraction can cause memory/disk denial of service
-- Impact: Oversized files or compressed archive bombs can exhaust API/worker resources.
+### High
+
+1. Frontend distributes a bearer token to all clients.
+- Impact: Any user with browser access can extract the token and replay authenticated calls, preventing per-user accountability and increasing blast radius.
+- Exploit path: Read token from frontend runtime environment or request headers, replay API requests with Authorization header.
 - Evidence:
-  - `backend/app/api/routes_documents.py:486`
-  - `backend/app/services/extractor.py:309`
-  - `backend/app/services/extractor.py:312`
-  - `backend/app/worker/tasks.py:122`
-  - `backend/app/core/config.py:20`
-- Recommendation:
-  - Enforce request and file size limits.
-  - Stream uploads and extraction where possible.
-  - Cap total uncompressed archive size and per-entry size.
+  - frontend consumes token from public Vite env: `frontend/src/lib/api.ts:24`
+  - token is attached to every request when present: `frontend/src/lib/api.ts:38`
+  - compose passes `VITE_API_TOKEN` from user token: `docker-compose.yml:115`
+  - privileged routes rely on static token role checks: `backend/app/api/router.py:19`, `backend/app/api/auth.py:47`, `backend/app/api/auth.py:51`
+- Remediation:
+  - Replace shared static token model with per-user authentication.
+  - Keep secrets server-side only.
+  - Use short-lived credentials with rotation and revocation.
 
-### 4) High - Sensitive data logging exposed through unsecured log endpoints
-- Impact: Extracted text, prompts, and model outputs may be retrievable by unauthorized callers.
+2. Default and static service secrets are present in deploy config.
+- Impact: If service ports are exposed, predictable credentials and keys allow unauthorized access to data services.
+- Exploit path: Connect to published Postgres or Typesense ports and authenticate with known static values.
 - Evidence:
-  - `backend/app/models/processing_log.py:31`
-  - `backend/app/models/processing_log.py:32`
-  - `backend/app/services/routing_pipeline.py:803`
-  - `backend/app/services/routing_pipeline.py:814`
-  - `backend/app/worker/tasks.py:479`
-  - `backend/app/schemas/processing_logs.py:21`
-  - `backend/app/api/routes_processing_logs.py:22`
-- Recommendation:
-  - Require admin authorization for log endpoints.
-  - Remove or redact sensitive payloads from logs.
-  - Reduce retention for operational logs that may include sensitive context.
+  - static Postgres credentials: `docker-compose.yml:5`, `docker-compose.yml:6`
+  - static Typesense key in compose and runtime env: `docker-compose.yml:29`, `docker-compose.yml:55`, `docker-compose.yml:93`
+  - database and Typesense ports are published to host: `docker-compose.yml:9`, `docker-compose.yml:32`
+  - current environment uses placeholder tokens: `.env:2`, `.env:3`, `.env:4`
+- Remediation:
+  - Use high-entropy secrets managed outside repository configuration.
+  - Remove unnecessary host port publications in production.
+  - Restrict service network access to trusted internal components.
 
-### 5) High - Internal services exposed with weak default posture in docker compose
-- Impact: Exposed Redis/Postgres/Typesense can enable data compromise and queue abuse.
+3. ZIP recursion depth control is not enforced across queued descendants.
+- Impact: Nested archives can create uncontrolled fan-out, causing CPU, queue, and storage exhaustion.
+- Exploit path: Upload ZIP containing ZIPs; children are queued as independent documents without inherited depth, repeating recursively.
 - Evidence:
-  - `docker-compose.yml:5`
-  - `docker-compose.yml:6`
-  - `docker-compose.yml:9`
-  - `docker-compose.yml:21`
-  - `docker-compose.yml:29`
-  - `docker-compose.yml:32`
-  - `docker-compose.yml:68`
-  - `backend/app/worker/queue.py:15`
-  - `backend/app/core/config.py:34`
-- Recommendation:
-  - Remove unnecessary host port exposure for internal services.
-  - Use strong credentials and network ACL segmentation.
-  - Enable authentication and transport protections for stateful services.
+  - configured depth limit exists: `backend/app/core/config.py:28`
+  - extractor takes a depth argument but is called without propagation: `backend/app/services/extractor.py:302`, `backend/app/services/extractor.py:306`
+  - worker invokes extractor without depth context: `backend/app/worker/tasks.py:122`
+  - worker enqueues child archive jobs recursively: `backend/app/worker/tasks.py:225`, `backend/app/worker/tasks.py:226`
+- Remediation:
+  - Persist and propagate archive depth per document lineage.
+  - Enforce absolute descendant and fan-out limits per root upload.
+  - Reject nested archives beyond configured depth.
 
-### 6) Medium - Plaintext secrets and weak defaults in configuration paths
-- Impact: Credentials and API keys can be exposed from source or storage.
+### Medium
+
+1. OCR provider path does not apply DNS revalidation equivalent to model runtime path.
+- Impact: Under permissive network flags, SSRF defenses can be weakened by DNS rebinding on OCR traffic.
+- Exploit path: Persist provider URL that passes initial checks, then rebind DNS to private target before OCR requests.
 - Evidence:
-  - `backend/app/services/app_settings.py:129`
-  - `backend/app/services/app_settings.py:257`
-  - `backend/app/services/app_settings.py:667`
-  - `backend/app/core/config.py:17`
-  - `backend/app/core/config.py:34`
-  - `backend/.env.example:15`
-- Recommendation:
-  - Use managed secrets storage and encryption at rest.
-  - Remove default credentials.
-  - Rotate exposed and default keys/credentials.
+  - task model runtime enforces `resolve_dns=True`: `backend/app/services/model_runtime.py:41`
+  - provider normalization in app settings does not pass DNS revalidation flag: `backend/app/services/app_settings.py:253`
+  - OCR runtime uses persisted URL for client base URL: `backend/app/services/app_settings.py:891`, `backend/app/services/handwriting.py:159`
+- Remediation:
+  - Apply DNS revalidation before outbound OCR requests or on every runtime load.
+  - Disallow private network egress by default and require explicit controlled exceptions.
 
-### 7) Low - Minimal HTTP hardening headers and broad CORS shape
-- Impact: Increased browser-side attack surface, especially once authentication is introduced.
+2. Provider API keys are persisted in plaintext settings on storage volume.
+- Impact: File system or backup compromise reveals upstream provider secrets.
+- Exploit path: Read persisted settings file from storage volume or backup artifact.
 - Evidence:
-  - `backend/app/main.py:23`
-  - `backend/app/main.py:25`
-  - `backend/app/main.py:26`
-  - `backend/app/main.py:27`
-- Recommendation:
-  - Add standard security headers middleware.
-  - Constrain allowed methods and headers to actual application needs.
+  - settings file location under storage root: `backend/app/services/app_settings.py:133`
+  - provider payload includes plaintext `api_key`: `backend/app/services/app_settings.py:268`
+  - settings payload is written to disk as JSON: `backend/app/services/app_settings.py:680`, `backend/app/services/app_settings.py:685`
+  - OCR settings read returns stored API key value for runtime: `backend/app/services/app_settings.py:894`
+- Remediation:
+  - Move provider secrets to dedicated secret management.
+  - If local persistence is unavoidable, encrypt sensitive fields at rest and restrict file permissions.
 
-### 8) Low - Containers appear to run as root by default
-- Impact: In-container compromise has higher blast radius.
+### Low
+
+1. Frontend dependency is floating on latest.
+- Impact: Non-deterministic installs and elevated supply chain drift risk.
+- Exploit path: Fresh install resolves a newer unreviewed dependency release.
 - Evidence:
-  - `backend/Dockerfile:1`
-  - `backend/Dockerfile:17`
-  - `frontend/Dockerfile:1`
-  - `frontend/Dockerfile:16`
-- Recommendation:
-  - Run containers as non-root users.
-  - Drop unnecessary Linux capabilities.
+  - dependency version is set to the floating `latest` tag: `frontend/package.json:13`
+- Remediation:
+  - Pin exact versions and update through controlled dependency review.
 
-## Residual Risk and Assumptions
-- This audit assumes services may be reachable beyond a strictly isolated localhost-only environment.
-- If an external auth proxy is enforced upstream, risk severity of unauthenticated routes is reduced but not eliminated unless backend also enforces trust boundaries.
-- Dependency CVE posture was not exhaustively enumerated in this static pass.
+## Validation Commands and Outcomes
+- `/Users/bedas/Developer/Python/global_venv/bin/python backend/tests/test_security_controls.py`
+  - Outcome: passed, 13 tests.
+- `/Users/bedas/Developer/Python/global_venv/bin/python -m unittest discover -s backend/tests -p 'test_*.py'`
+  - Outcome: passed, 24 tests.
 
-## Priority Remediation Order
-1. Enforce authentication and authorization across API routes.
-2. Lock down settings mutation paths, especially model provider endpoint configuration.
-3. Add strict upload/extraction resource limits.
-4. Remove sensitive logging and protect log APIs.
-5. Harden Docker/network exposure and secrets management.
+## Coverage and Residual Risk
+- Coverage:
+  - Authentication and authorization controls.
+  - Document upload and preview data flow.
+  - Worker queue and archive processing path.
+  - Provider configuration and outbound request handling.
+  - Docker service exposure and secret defaults.
+- Residual risk and limits:
+  - Static analysis only, no live penetration testing executed.
+  - Perimeter controls (reverse proxy, firewall, WAF, TLS topology) were not verifiable from repository state.
+  - Dependency CVE scanning was not executed in this review pass.
+
+## Delegation Report
+- Primary owner by package:
+  - Security findings package: `security_reviewer` subagent, consolidated and validated by main thread.
+  - Repository reconnaissance package: main thread fallback after `explorer` interruption.
+  - Report authoring package: main thread.
+- Agents invoked:
+  - `security_reviewer` (completed)
+  - `explorer` (interrupted)
+  - `awaiter` (completed validation command execution)
+- Skills activated:
+  - `secure-delivery-gates`
+  - `documentation-standards`
+- Required delegations not used and reason:
+  - `explorer` as final reconnaissance owner was required but unavailable due to a runtime interruption, so main thread performed direct source reconnaissance fallback.
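
Review note on High finding 3 (ZIP recursion depth): the remediation asks for archive depth to be persisted per document lineage and for fan-out to be capped per root upload. A minimal Python sketch of that idea follows. Every name in it (`ArchiveLineage`, `FanOutBudget`, `plan_child_jobs`) and both limit values are hypothetical illustrations, not code from the repository; the report only states that a configured depth limit exists in `backend/app/core/config.py`.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the real limits are not given in the report.
MAX_ARCHIVE_DEPTH = 2
MAX_DESCENDANTS_PER_ROOT = 100


class ArchiveLimitError(Exception):
    """Raised when a nested archive would exceed a configured limit."""


@dataclass(frozen=True)
class ArchiveLineage:
    """Hypothetical lineage record stored alongside each queued document."""

    root_id: str    # identifier of the original upload
    depth: int = 0  # 0 for the root upload, +1 for each nested archive level

    def child(self) -> "ArchiveLineage":
        """Return the lineage for a nested archive, or reject it."""
        next_depth = self.depth + 1
        if next_depth > MAX_ARCHIVE_DEPTH:
            raise ArchiveLimitError(
                f"archive depth {next_depth} exceeds limit {MAX_ARCHIVE_DEPTH}"
            )
        return ArchiveLineage(root_id=self.root_id, depth=next_depth)


class FanOutBudget:
    """Tracks how many descendants each root upload has produced."""

    def __init__(self, limit: int = MAX_DESCENDANTS_PER_ROOT) -> None:
        self._limit = limit
        self._used: dict[str, int] = {}

    def reserve(self, root_id: str, count: int = 1) -> None:
        """Reserve slots for new child documents, or raise if over budget."""
        used = self._used.get(root_id, 0)
        if used + count > self._limit:
            raise ArchiveLimitError(
                f"root {root_id} would exceed {self._limit} descendants"
            )
        self._used[root_id] = used + count


def plan_child_jobs(lineage, member_names, budget):
    """Return (name, lineage) pairs for archive members that may be queued."""
    jobs = []
    for name in member_names:
        if name.lower().endswith(".zip"):
            # Nested archives inherit the parent's depth plus one.
            jobs.append((name, lineage.child()))
        else:
            jobs.append((name, lineage))
    # Reserve only after every member passed the depth check.
    budget.reserve(lineage.root_id, len(jobs))
    return jobs
```

In this sketch the worker would pass the returned lineage along with each enqueued child job, so a descendant archive is checked against the same root budget and depth limit instead of starting again at depth zero.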