Update Report

docs: update security production readiness report
2026-03-01 12:42:52 -03:00 · 2026-03-01 12:35:57 -03:00
1 changed file with 89 additions and 118 deletions
@@ -1,138 +1,109 @@
-# Security Audit Report
+# Security Production Readiness Report

-Date: 2026-02-21
+Date: 2026-03-01
 Repository: /Users/bedas/Developer/GitHub/dcm
-Audit type: Static, read-only code and configuration review
+Review type: Static security review for production readiness

 ## Scope
- Backend API, worker, extraction and routing pipeline, settings handling, and storage interactions.
- Frontend dependency posture.
- Docker runtime and service exposure.
-
-## Method
- File-level inspection with targeted code tracing for authn/authz, input validation, upload and archive processing, outbound network behavior, secret handling, logging, and deployment hardening.
- No runtime penetration testing was performed.
+- Backend: FastAPI API, worker queue, settings and model runtime services
+- Frontend: React and Vite API client and document preview rendering
+- Infrastructure: docker-compose service exposure and secret configuration

 ## Findings

-### 1) Critical - Missing authentication and authorization on privileged API routes
- Impact: Any reachable client can access document, settings, and log-management functionality.
+### Critical
+
+1. Redis queue is exposed without authentication and can be abused for worker job injection.
+- Impact: If Redis is reachable by an attacker, queued job payloads can be injected and executed by the worker process, leading to remote code execution and data compromise.
+- Exploit path: Reach Redis on port 6379, enqueue crafted RQ jobs into the `dcm` queue, and wait for the worker to consume them.
 - Evidence:
-  - `backend/app/main.py:29`
-  - `backend/app/api/router.py:14`
-  - `backend/app/api/routes_documents.py:464`
-  - `backend/app/api/routes_documents.py:666`
-  - `backend/app/api/routes_settings.py:148`
-  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
-  - Enforce authentication globally for non-health routes.
-  - Add per-endpoint authorization checks for read/update/delete/admin actions.
+  - docker-compose publishes Redis host port: `docker-compose.yml:21`
+  - worker consumes from Redis queue directly: `docker-compose.yml:77`
+  - queue connection uses bare Redis URL with no auth/TLS: `backend/app/worker/queue.py:15`, `backend/app/worker/queue.py:21`
+  - current environment binds services to all interfaces: `.env:1`
+- Remediation:
+  - Do not publish Redis externally in production.
+  - Enforce Redis authentication and TLS.
+  - Place Redis on a private network segment with strict ACLs.
+  - Treat queue producers as privileged components only.
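As a minimal sketch of the authentication and TLS items, the API and worker could refuse to start when the queue URL is insecure. The function name `validate_redis_url` is hypothetical, because the report does not show how `backend/app/worker/queue.py` builds its URL. This guard complements keeping Redis off published ports and does not replace it.

```python
from urllib.parse import urlsplit


def validate_redis_url(url: str) -> None:
    """Fail fast if the queue's Redis URL lacks TLS or credentials.

    Hypothetical startup guard; raises ValueError for an insecure URL.
    """
    parts = urlsplit(url)
    # "rediss://" is the TLS scheme accepted by redis-py's Redis.from_url().
    if parts.scheme != "rediss":
        raise ValueError("Redis URL must use the rediss:// (TLS) scheme")
    # A password (optionally with an ACL username) must be present for AUTH.
    if not parts.password:
        raise ValueError("Redis URL must carry credentials for AUTH")
    if not parts.hostname:
        raise ValueError("Redis URL must name a host")
```

For example, `validate_redis_url("rediss://:s3cret@redis:6379/0")` passes, while the bare `redis://redis:6379/0` form raises `ValueError`.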

-### 2) Critical - SSRF and data exfiltration risk via configurable model provider base URL
- Impact: An attacker can redirect model calls to attacker-controlled or internal hosts and exfiltrate document-derived content.
+2. Untrusted uploaded content is previewed in an unsandboxed iframe.
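+- Impact: Stored XSS. The preview is a blob URL, and a blob URL inherits the origin of the page that created it. Active content in the unsandboxed iframe therefore runs with the application's origin and can abuse the user's session and exfiltrate data in the browser.
+- Exploit path: Upload active content (for example, HTML), open the preview, and the embedded script executes in the iframe without sandbox constraints.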
 - Evidence:
-  - `backend/app/api/routes_settings.py:148`
-  - `backend/app/schemas/settings.py:24`
-  - `backend/app/services/app_settings.py:249`
-  - `backend/app/services/model_runtime.py:144`
-  - `backend/app/services/model_runtime.py:170`
-  - `backend/app/worker/tasks.py:505`
-  - `backend/app/services/routing_pipeline.py:803`
- Recommendation:
-  - Restrict provider endpoints to an allowlist.
-  - Validate URL scheme and block private/link-local destinations.
-  - Protect settings updates behind strict admin authorization.
-  - Enforce outbound egress controls at runtime.
+  - upload endpoint accepts generic uploaded files: `backend/app/api/routes_documents.py:493`
+  - MIME type is derived from bytes and persisted: `backend/app/api/routes_documents.py:530`
+  - preview endpoint returns original bytes inline with stored media type: `backend/app/api/routes_documents.py:449`, `backend/app/api/routes_documents.py:457`
+  - frontend renders preview in iframe without sandbox attribute: `frontend/src/components/DocumentViewer.tsx:486`
+  - preview source is a blob URL created from fetched content: `frontend/src/components/DocumentViewer.tsx:108`, `frontend/src/components/DocumentViewer.tsx:113`
+- Remediation:
+  - Block inline preview for script-capable MIME types.
+  - Add strict iframe sandboxing if iframe preview remains required.
+  - Prefer forced download (`Content-Disposition: attachment`) for active formats.
+  - Serve untrusted preview content from an isolated origin with restrictive CSP.
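The first and third items could be sketched on the backend as a helper that the preview endpoint consults before responding. This sketch rests on several assumptions. `ACTIVE_MEDIA_TYPES` and `preview_response_headers` are hypothetical names, and the type list is not exhaustive. The sketch also assumes the frontend builds its blob from the response's `Content-Type`, so that active content served as `application/octet-stream` should not be rendered as a document. The iframe in `DocumentViewer.tsx` should still carry a `sandbox` attribute.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical, non-exhaustive set of media types that can run script when rendered.
ACTIVE_MEDIA_TYPES = frozenset({
    "text/html",
    "application/xhtml+xml",
    "image/svg+xml",
    "text/xml",
    "application/xml",
    "text/javascript",
    "application/javascript",
})


def preview_response_headers(media_type: str, filename: str) -> tuple[str, dict[str, str]]:
    """Return (media_type_to_send, headers) for a preview response."""
    base_type = media_type.split(";", 1)[0].strip().lower()
    headers = {
        # Stop browsers from sniffing a different, possibly active, type.
        "X-Content-Type-Options": "nosniff",
        # Sandboxes the response when it is opened directly as a document;
        # it does not carry over to blob URLs created by the frontend.
        "Content-Security-Policy": "sandbox",
    }
    if base_type in ACTIVE_MEDIA_TYPES:
        # Force a download and replace the active type with an inert one.
        headers["Content-Disposition"] = (
            f"attachment; filename*=UTF-8''{quote(filename, safe='')}"
        )
        return "application/octet-stream", headers
    headers["Content-Disposition"] = "inline"
    return base_type, headers
```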

-### 3) High - Unbounded upload and archive extraction can cause memory/disk denial of service
- Impact: Oversized files or compressed archive bombs can exhaust API/worker resources.
+### High
+
+1. Frontend distributes a bearer token to all clients.
+- Impact: Anyone who can load the frontend can extract the token and replay authenticated calls. Because all clients share one identity, this prevents per-user accountability and increases the blast radius of a leak.
+- Exploit path: Read the token from the client-side bundle (Vite exposes `VITE_`-prefixed variables to browser code) or from request headers, then replay API requests with the `Authorization` header.
 - Evidence:
-  - `backend/app/api/routes_documents.py:486`
-  - `backend/app/services/extractor.py:309`
-  - `backend/app/services/extractor.py:312`
-  - `backend/app/worker/tasks.py:122`
-  - `backend/app/core/config.py:20`
- Recommendation:
-  - Enforce request and file size limits.
-  - Stream uploads and extraction where possible.
-  - Cap total uncompressed archive size and per-entry size.
+  - frontend consumes token from public Vite env: `frontend/src/lib/api.ts:24`
+  - token is attached to every request when present: `frontend/src/lib/api.ts:38`
+  - compose passes `VITE_API_TOKEN` from user token: `docker-compose.yml:115`
+  - privileged routes rely on static token role checks: `backend/app/api/router.py:19`, `backend/app/api/auth.py:47`, `backend/app/api/auth.py:51`
+- Remediation:
+  - Replace shared static token model with per-user authentication.
+  - Keep secrets server-side only.
+  - Use short-lived credentials with rotation and revocation.
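For the short-lived credential item, the sketch below shows signed tokens that carry a user, a role, an expiry, and a revocable id. The signing key is held only on the server. The names `issue_token` and `verify_token` and the claim layout are hypothetical, and this is not the code in `backend/app/api/auth.py`. A production deployment would more likely use an established identity provider or session library than hand-rolled tokens.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str, key: bytes) -> str:
    # HMAC-SHA256 over the encoded claims; the key never leaves the server.
    return _b64encode(hmac.new(key, body.encode("utf-8"), hashlib.sha256).digest())


def issue_token(user_id: str, role: str, key: bytes,
                ttl_seconds: int = 900, now: float | None = None) -> str:
    """Issue a signed token for one user that expires after ttl_seconds."""
    issued_at = int(time.time() if now is None else now)
    claims = {
        "sub": user_id,
        "role": role,
        "exp": issued_at + ttl_seconds,
        "jti": secrets.token_hex(8),  # unique id so one token can be revoked
    }
    body = _b64encode(json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(body, key)}"


def verify_token(token: str, key: bytes, revoked_ids: set[str],
                 now: float | None = None) -> dict:
    """Return the token's claims, or raise PermissionError if it is invalid."""
    body, sep, signature = token.partition(".")
    expected = _sign(body, key)
    # Compare as bytes in constant time; str inputs must be ASCII-only.
    if not sep or not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        raise PermissionError("invalid token signature")
    claims = json.loads(_b64decode(body))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise PermissionError("token expired")
    if claims["jti"] in revoked_ids:
        raise PermissionError("token revoked")
    return claims
```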

-### 4) High - Sensitive data logging exposed through unsecured log endpoints
- Impact: Extracted text, prompts, and model outputs may be retrievable by unauthorized callers.
+2. Default and static service secrets are present in the deployment configuration.
+- Impact: If service ports are exposed, predictable credentials and keys allow unauthorized access to data services.
+- Exploit path: Connect to published Postgres or Typesense ports and authenticate with known static values.
 - Evidence:
-  - `backend/app/models/processing_log.py:31`
-  - `backend/app/models/processing_log.py:32`
-  - `backend/app/services/routing_pipeline.py:803`
-  - `backend/app/services/routing_pipeline.py:814`
-  - `backend/app/worker/tasks.py:479`
-  - `backend/app/schemas/processing_logs.py:21`
-  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
-  - Require admin authorization for log endpoints.
-  - Remove or redact sensitive payloads from logs.
-  - Reduce retention for operational logs that may include sensitive context.
+  - static Postgres credentials: `docker-compose.yml:5`, `docker-compose.yml:6`
+  - static Typesense key in compose and runtime env: `docker-compose.yml:29`, `docker-compose.yml:55`, `docker-compose.yml:93`
+  - database and Typesense ports are published to host: `docker-compose.yml:9`, `docker-compose.yml:32`
+  - current environment uses placeholder tokens: `.env:2`, `.env:3`, `.env:4`
+- Remediation:
+  - Use high-entropy secrets managed outside repository configuration.
+  - Remove unnecessary host port publications in production.
+  - Restrict service network access to trusted internal components.
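A startup check along these lines could reject unset, placeholder, or short secrets before services connect. The denylist values and the length threshold are assumptions, because the report does not reproduce the placeholders in `.env`. `secrets.token_urlsafe` is the standard-library way to generate a replacement value.

```python
from __future__ import annotations

import secrets

# Hypothetical denylist: replace with the placeholder values actually used in .env.
KNOWN_PLACEHOLDERS = frozenset({"changeme", "change-me", "password", "secret", "example"})
MIN_LENGTH = 32  # length is only a rough proxy for entropy


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe secret built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


def check_secret(name: str, value: str | None) -> None:
    """Raise ValueError if a configured secret is missing or obviously weak."""
    if not value:
        raise ValueError(f"{name} is not set")
    if value.strip().lower() in KNOWN_PLACEHOLDERS:
        raise ValueError(f"{name} uses a known placeholder value")
    if len(value) < MIN_LENGTH:
        raise ValueError(f"{name} is shorter than {MIN_LENGTH} characters")
```

With the default 32 bytes (256 bits of randomness), `generate_secret()` returns a 43-character string.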

-### 5) High - Internal services exposed with weak default posture in docker compose
- Impact: Exposed Redis/Postgres/Typesense can enable data compromise and queue abuse.
+3. ZIP recursion depth control is not enforced across queued descendants.
+- Impact: Nested archives can create uncontrolled fan-out, causing CPU, queue, and storage exhaustion.
+- Exploit path: Upload a ZIP that contains ZIPs. Each child archive is queued as an independent document without an inherited depth, so the expansion repeats at every nesting level.
 - Evidence:
-  - `docker-compose.yml:5`
-  - `docker-compose.yml:6`
-  - `docker-compose.yml:9`
-  - `docker-compose.yml:21`
-  - `docker-compose.yml:29`
-  - `docker-compose.yml:32`
-  - `docker-compose.yml:68`
-  - `backend/app/worker/queue.py:15`
-  - `backend/app/core/config.py:34`
- Recommendation:
-  - Remove unnecessary host port exposure for internal services.
-  - Use strong credentials and network ACL segmentation.
-  - Enable authentication and transport protections for stateful services.
+  - configured depth limit exists: `backend/app/core/config.py:28`
+  - extractor takes a depth argument but is called without propagation: `backend/app/services/extractor.py:302`, `backend/app/services/extractor.py:306`
+  - worker invokes extractor without depth context: `backend/app/worker/tasks.py:122`
+  - worker enqueues child archive jobs recursively: `backend/app/worker/tasks.py:225`, `backend/app/worker/tasks.py:226`
+- Remediation:
+  - Persist and propagate archive depth per document lineage.
+  - Enforce absolute descendant and fan-out limits per root upload.
+  - Reject nested archives beyond configured depth.
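Depth and fan-out propagation could look like the sketch below. Each queued child inherits the root upload's id and takes its parent's depth plus one, and a per-root counter caps the total number of descendants. `ArchiveJob`, `enqueue_children`, and the limit parameters are hypothetical. The configured limit at `backend/app/core/config.py:28` would supply `max_depth`. The in-memory `descendant_counts` dict stands in for state that must be persisted and updated atomically, because children are processed by separate worker jobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveJob:
    document_id: str
    root_id: str  # id of the original user upload
    depth: int    # 0 for the root upload, +1 per nested archive level


class ArchiveLimitExceeded(Exception):
    """Raised when a nested archive would exceed the depth or fan-out budget."""


def enqueue_children(
    parent: ArchiveJob,
    child_ids: list[str],
    *,
    max_depth: int,
    max_descendants: int,
    descendant_counts: dict[str, int],
) -> list[ArchiveJob]:
    """Build child jobs that inherit lineage, enforcing per-root limits."""
    child_depth = parent.depth + 1
    if child_depth > max_depth:
        raise ArchiveLimitExceeded(
            f"depth {child_depth} exceeds limit {max_depth} for root {parent.root_id}"
        )
    used = descendant_counts.get(parent.root_id, 0)
    if used + len(child_ids) > max_descendants:
        raise ArchiveLimitExceeded(
            f"root {parent.root_id} would exceed {max_descendants} descendants"
        )
    # Illustrative only: real state must be shared across workers and updated atomically.
    descendant_counts[parent.root_id] = used + len(child_ids)
    return [ArchiveJob(child_id, parent.root_id, child_depth) for child_id in child_ids]
```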

-### 6) Medium - Plaintext secrets and weak defaults in configuration paths
- Impact: Credentials and API keys can be exposed from source or storage.
+### Medium
+
+1. The OCR provider path does not apply DNS revalidation equivalent to that of the model runtime path.
+- Impact: Under permissive network flags, SSRF defenses can be weakened by DNS rebinding on OCR traffic.
+- Exploit path: Persist a provider URL that passes the initial checks, then rebind its DNS record to a private target before OCR requests are made.
 - Evidence:
-  - `backend/app/services/app_settings.py:129`
-  - `backend/app/services/app_settings.py:257`
-  - `backend/app/services/app_settings.py:667`
-  - `backend/app/core/config.py:17`
-  - `backend/app/core/config.py:34`
-  - `backend/.env.example:15`
- Recommendation:
-  - Use managed secrets storage and encryption at rest.
-  - Remove default credentials.
-  - Rotate exposed and default keys/credentials.
+  - task model runtime enforces `resolve_dns=True`: `backend/app/services/model_runtime.py:41`
+  - provider normalization in app settings does not pass DNS revalidation flag: `backend/app/services/app_settings.py:253`
+  - OCR runtime uses persisted URL for client base URL: `backend/app/services/app_settings.py:891`, `backend/app/services/handwriting.py:159`
+- Remediation:
+  - Apply DNS revalidation before outbound OCR requests or on every runtime load.
+  - Disallow private network egress by default and require explicit controlled exceptions.
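Revalidation before each OCR request could resolve the provider host and reject any non-public address, as sketched here. `resolve_public_addresses` is hypothetical. It is not the `resolve_dns` path in `model_runtime.py`, whose implementation the report does not show. If the HTTP client resolves the host again after this check, a small rebinding window remains. Connecting to the validated address, or enforcing egress rules at the network layer, closes that window.

```python
from __future__ import annotations

import ipaddress
import socket
from typing import Any, Callable
from urllib.parse import urlsplit


def resolve_public_addresses(
    url: str, resolver: Callable[..., list[Any]] = socket.getaddrinfo
) -> list[str]:
    """Resolve the URL's host and return its addresses if all of them are public.

    Raises ValueError for unsupported schemes, missing hosts, or any address that
    is not globally routable (private, loopback, link-local, reserved, ...).
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Resolve on every call so that a changed DNS answer is checked again.
    infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    if not addresses:
        raise ValueError(f"{host} did not resolve")
    for address in addresses:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(address.split("%", 1)[0])
        if not ip.is_global:
            raise ValueError(f"{host} resolves to non-public address {ip}")
    return addresses
```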

-### 7) Low - Minimal HTTP hardening headers and broad CORS shape
- Impact: Increased browser-side attack surface, especially once authentication is introduced.
+2. Provider API keys are persisted in plaintext settings on the storage volume.
+- Impact: File system or backup compromise reveals upstream provider secrets.
+- Exploit path: Read the persisted settings file from the storage volume or a backup artifact.
 - Evidence:
-  - `backend/app/main.py:23`
-  - `backend/app/main.py:25`
-  - `backend/app/main.py:26`
-  - `backend/app/main.py:27`
- Recommendation:
-  - Add standard security headers middleware.
-  - Constrain allowed methods and headers to actual application needs.
-
-### 8) Low - Containers appear to run as root by default
- Impact: In-container compromise has higher blast radius.
- Evidence:
-  - `backend/Dockerfile:1`
-  - `backend/Dockerfile:17`
-  - `frontend/Dockerfile:1`
-  - `frontend/Dockerfile:16`
- Recommendation:
-  - Run containers as non-root users.
-  - Drop unnecessary Linux capabilities.
-
-## Residual Risk and Assumptions
- This audit assumes services may be reachable beyond a strictly isolated localhost-only environment.
- If an external auth proxy is enforced upstream, risk severity of unauthenticated routes is reduced but not eliminated unless backend also enforces trust boundaries.
- Dependency CVE posture was not exhaustively enumerated in this static pass.
-
-## Priority Remediation Order
-1. Enforce authentication and authorization across API routes.
-2. Lock down settings mutation paths, especially model provider endpoint configuration.
-3. Add strict upload/extraction resource limits.
-4. Remove sensitive logging and protect log APIs.
-5. Harden Docker/network exposure and secrets management.
+  - settings file location under storage root: `backend/app/services/app_settings.py:133`
+  - provider payload includes plaintext `api_key`: `backend/app/services/app_settings.py:268`
+  - settings payload is written to disk as JSON: `backend/app/services/app_settings.py:680`, `backend/app/services/app_settings.py:685`
+  - OCR settings read returns stored API key value for runtime: `backend/app/services/app_settings.py:894`
+- Remediation:
+  - Move provider secrets to dedicated secret management.
+  - If local persistence is unavoidable, encrypt sensitive fields at rest and restrict file permissions.
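Where settings must stay on local disk, a minimal sketch is to drop `api_key` values before serialization and write the file atomically with owner-only permissions. The helper names and the recursive `api_key` match are assumptions about the settings schema. Python's standard library has no authenticated encryption, so encrypting fields at rest would need a vetted library (for example, Fernet from the `cryptography` package) with its key held outside the storage volume.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Any

SECRET_KEYS = frozenset({"api_key"})  # assumed field name, taken from the report's evidence


def strip_secret_fields(value: Any) -> Any:
    """Return a copy of value (dicts and lists rebuilt) with secret fields set to None."""
    if isinstance(value, dict):
        return {
            key: None if key in SECRET_KEYS else strip_secret_fields(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [strip_secret_fields(item) for item in value]
    return value


def write_private_json(path: Path, payload: dict[str, Any]) -> None:
    """Atomically write payload as JSON to a file readable only by its owner."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the file readable and writable only by the creating user.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".settings-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename over the target so readers never see a partially written file.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```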
Author	SHA1	Message	Date
smileBeda	da5cbc2c01	Update Report	2026-03-01 12:42:52 -03:00
smileBeda	652d7e8f25	docs: update security production readiness report	2026-03-01 12:35:57 -03:00