Replace REPORT.md with production security readiness assessment

2026-03-01 14:56:26 -03:00
parent c5423fc9c3
commit 7a19f22f41
1 changed files with 128 additions and 88 deletions
@@ -2,108 +2,148 @@
 Date: 2026-03-01
 Repository: /Users/bedas/Developer/GitHub/dcm
-Review Type: Static security review for production readiness
+Review type: Static code and configuration review (no runtime penetration testing)
 ## Scope
- Backend: FastAPI API, worker queue, settings and model runtime services
+- Backend API and worker: `backend/app`
- Frontend: React and Vite API client and document preview rendering
+- Frontend API client/auth transport: `frontend/src`
- Infrastructure: docker-compose service exposure and secret configuration
+- Compose and environment defaults: `docker-compose.yml`, `.env`
-## Findings
+## Method and Limits
 - Reviewed source and configuration files in the current checkout.
 - Verified findings with direct file evidence.
 - Did not run dynamic security testing, dependency CVE scanning, or infrastructure perimeter testing.
 ## Confirmed Product Security Findings
 ### Critical
-1. Redis queue is exposed without authentication and can be abused for worker job injection.
+1. Browser-exposed shared bearer token path (`VITE_API_TOKEN` fallback)
- Impact: If Redis is reachable by an attacker, queued job payloads can be injected and executed by the worker process, leading to remote code execution and data compromise.
+- Severity: Critical
- Exploit path: Reach Redis on port 6379, enqueue crafted RQ jobs into queue dcm, wait for worker consumption.
+- Why this is a product issue: The frontend code supports a build-time token fallback and injects it into all API requests. This creates a shared credential model in browser code.
 - Impact: Any user with browser access can recover and reuse the token, collapsing auth boundaries and auditability.
 - Exploit path: Open the app -> inspect the runtime/bundle or an intercepted request -> replay the bearer token against protected API endpoints.
 - Evidence:
-  - docker-compose publishes Redis host port: `docker-compose.yml:21`
+  - `frontend/src/lib/api.ts:39`
-  - worker consumes from Redis queue directly: `docker-compose.yml:77`
+  - `frontend/src/lib/api.ts:98`
-  - queue connection uses bare Redis URL with no auth/TLS: `backend/app/worker/queue.py:15`, `backend/app/worker/queue.py:21`
+  - `frontend/src/lib/api.ts:111`
-  - current environment binds services to all interfaces: `.env:1`
+  - `frontend/src/lib/api.ts:155`
- Remediation:
+  - `docker-compose.yml:123`
-  - Do not publish Redis externally in production.
+  - `backend/app/api/router.py:25`
-  - Enforce Redis authentication and TLS.
+  - `backend/app/api/router.py:37`
-  - Place Redis on a private network segment with strict ACLs.
+- Production recommendation:
-  - Treat queue producers as privileged components only.
+  - Remove browser-side static token fallback.
-
+  - Use per-user server-issued auth (session or short-lived JWT) with role-bound authorization.
 2. Untrusted uploaded content is previewed in an unsandboxed iframe.
 - Impact: Stored XSS and active content execution in preview context can enable account action abuse and data exfiltration in the browser.
 - Exploit path: Upload active content (for example HTML), open preview, script executes in iframe without sandbox constraints.
 - Evidence:
  - upload endpoint accepts generic uploaded files: `backend/app/api/routes_documents.py:493`
  - MIME type is derived from bytes and persisted: `backend/app/api/routes_documents.py:530`
  - preview endpoint returns original bytes inline with stored media type: `backend/app/api/routes_documents.py:449`, `backend/app/api/routes_documents.py:457`
  - frontend renders preview in iframe without sandbox attribute: `frontend/src/components/DocumentViewer.tsx:486`
  - preview source is a blob URL created from fetched content: `frontend/src/components/DocumentViewer.tsx:108`, `frontend/src/components/DocumentViewer.tsx:113`
 - Remediation:
  - Block inline preview for script-capable MIME types.
  - Add strict iframe sandboxing if iframe preview remains required.
  - Prefer force-download for active formats.
  - Serve untrusted preview content from an isolated origin with restrictive CSP.
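The "block inline preview" and "force-download" remediations above can be sketched as a header-selection helper on the preview endpoint. This is a minimal illustration, not the project's code: `INLINE_SAFE_TYPES` and `preview_headers` are hypothetical names, and the allowlist contents are an assumption that would need review per deployment. The iframe `sandbox` attribute is a separate frontend change and is not shown here.

```python
from typing import Dict

# Hypothetical allowlist of MIME types considered safe to render inline.
# Script-capable types (text/html, image/svg+xml, ...) are deliberately absent.
INLINE_SAFE_TYPES = frozenset({
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/gif",
    "text/plain",
})


def preview_headers(media_type: str, filename: str) -> Dict[str, str]:
    """Build response headers for a preview of untrusted uploaded bytes."""
    # Normalize "text/html; charset=utf-8" to "text/html" before the lookup.
    base_type = media_type.split(";", 1)[0].strip().lower()
    inline_ok = base_type in INLINE_SAFE_TYPES
    disposition = "inline" if inline_ok else "attachment"
    # Drop characters that could break out of the quoted filename parameter.
    safe_name = "".join(
        c for c in filename if c.isprintable() and c not in '"\\/'
    ) or "download"
    return {
        # Anything outside the allowlist is served as opaque bytes.
        "Content-Type": base_type if inline_ok else "application/octet-stream",
        "Content-Disposition": f'{disposition}; filename="{safe_name}"',
        # Stop browsers from sniffing a different, active content type.
        "X-Content-Type-Options": "nosniff",
    }
```

Using an allowlist rather than a blocklist means a newly introduced active format falls back to download by default.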
 ### High
-1. Frontend distributes a bearer token to all clients.
+1. CORS policy is effectively any HTTP/HTTPS origin, with credentials enabled
- Impact: Any user with browser access can extract the token and replay authenticated calls, preventing per-user accountability and increasing blast radius.
+- Severity: High
- Exploit path: Read token from frontend runtime environment or request headers, replay API requests with Authorization header.
+- Why this is a product issue: The CORS middleware sets an `allow_origin_regex` that matches any HTTP/HTTPS origin and also sets `allow_credentials=True`.
 - Impact: If credentials are present, cross-origin access risk increases and token abuse becomes easier from arbitrary origins.
 - Exploit path: Malicious origin performs cross-origin requests with available credentials and can read API responses under permissive CORS policy.
 - Evidence:
-  - frontend consumes token from public Vite env: `frontend/src/lib/api.ts:24`
+  - `backend/app/main.py:21`
-  - token is attached to every request when present: `frontend/src/lib/api.ts:38`
+  - `backend/app/main.py:41`
-  - compose passes `VITE_API_TOKEN` from user token: `docker-compose.yml:115`
+  - `backend/app/main.py:42`
-  - privileged routes rely on static token role checks: `backend/app/api/router.py:19`, `backend/app/api/auth.py:47`, `backend/app/api/auth.py:51`
+  - `backend/app/main.py:44`
- Remediation:
+- Production recommendation:
-  - Replace shared static token model with per-user authentication.
+  - Replace regex-based broad origin acceptance with explicit trusted origin allowlist.
-  - Keep secrets server-side only.
+  - Keep `allow_credentials=False` unless strictly required for cookie-based flows.
  - Use short-lived credentials with rotation and revocation.
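The difference between a broad origin regex and an explicit allowlist can be illustrated with a small, framework-free sketch. The origins and the `cors_headers` helper are hypothetical. In the actual service, the equivalent change would presumably be made through the CORS middleware's `allow_origins` list rather than hand-written headers.

```python
import re
from typing import Dict, Optional

# Hypothetical trusted frontends; replace with the deployment's real origins.
TRUSTED_ORIGINS = frozenset({
    "https://app.example.com",
    "https://admin.example.com",
})

# Illustration of the risky pattern: a regex of this shape accepts any web origin.
BROAD_ORIGIN_REGEX = re.compile(r"https?://.*")


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers for a request Origin, or {} to deny."""
    # Exact string match only: no wildcards, no suffix or regex matching.
    if origin is None or origin not in TRUSTED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        # Responses differ per Origin, so caches must key on it.
        "Vary": "Origin",
        # Access-Control-Allow-Credentials is intentionally not emitted.
    }
```

With this shape, `https://evil.example` matches `BROAD_ORIGIN_REGEX` but receives no CORS headers from `cors_headers`, so the browser blocks the cross-origin read.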
 2. Default and static service secrets are present in deploy config.
 - Impact: If service ports are exposed, predictable credentials and keys allow unauthorized access to data services.
 - Exploit path: Connect to published Postgres or Typesense ports and authenticate with known static values.
 - Evidence:
  - static Postgres credentials: `docker-compose.yml:5`, `docker-compose.yml:6`
  - static Typesense key in compose and runtime env: `docker-compose.yml:29`, `docker-compose.yml:55`, `docker-compose.yml:93`
  - database and Typesense ports are published to host: `docker-compose.yml:9`, `docker-compose.yml:32`
  - current environment uses placeholder tokens: `.env:2`, `.env:3`, `.env:4`
 - Remediation:
  - Use high-entropy secrets managed outside repository configuration.
  - Remove unnecessary host port publications in production.
  - Restrict service network access to trusted internal components.
 3. ZIP recursion depth control is not enforced across queued descendants.
 - Impact: Nested archives can create uncontrolled fan-out, causing CPU, queue, and storage exhaustion.
 - Exploit path: Upload ZIP containing ZIPs; children are queued as independent documents without inherited depth, repeating recursively.
 - Evidence:
  - configured depth limit exists: `backend/app/core/config.py:28`
  - extractor takes a depth argument but is called without propagation: `backend/app/services/extractor.py:302`, `backend/app/services/extractor.py:306`
  - worker invokes extractor without depth context: `backend/app/worker/tasks.py:122`
  - worker enqueues child archive jobs recursively: `backend/app/worker/tasks.py:225`, `backend/app/worker/tasks.py:226`
 - Remediation:
  - Persist and propagate archive depth per document lineage.
  - Enforce absolute descendant and fan-out limits per root upload.
  - Reject nested archives beyond configured depth.
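Propagating archive depth per lineage, as recommended for the ZIP recursion finding, might look like the sketch below. The limits, `ChildJob`, and `expand_archive` are hypothetical names and values, not the project's extractor API. The point is only that every child carries `depth + 1`, so a worker handling a queued child can refuse to expand it once the limit is reached. Uncompressed-size limits are out of scope for this sketch.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List

MAX_ARCHIVE_DEPTH = 2            # hypothetical limit: root and one nested level
MAX_MEMBERS_PER_ARCHIVE = 1000   # hypothetical per-archive fan-out cap


class ArchiveLimitError(ValueError):
    """Raised when an archive exceeds the depth or fan-out limits."""


@dataclass(frozen=True)
class ChildJob:
    name: str
    data: bytes
    depth: int  # lineage depth, persisted with the queued document


def expand_archive(data: bytes, depth: int) -> List[ChildJob]:
    """Expand one archive; `depth` is the lineage depth of that archive."""
    if depth >= MAX_ARCHIVE_DEPTH:
        raise ArchiveLimitError(f"archive depth {depth} exceeds limit")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        members = [m for m in archive.infolist() if not m.is_dir()]
        if len(members) > MAX_MEMBERS_PER_ARCHIVE:
            raise ArchiveLimitError("too many members in archive")
        # Each child inherits the parent's depth plus one.
        return [ChildJob(m.filename, archive.read(m), depth + 1) for m in members]
```

A root upload would be expanded with `depth=0`; a worker that later picks up a child archive passes the stored `ChildJob.depth` instead of starting again from zero.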
 ### Medium
-1. OCR provider path does not apply DNS revalidation equivalent to model runtime path.
+1. Sensitive processing content is persisted in logs by default
- Impact: Under permissive network flags, SSRF defenses can be weakened by DNS rebinding on OCR traffic.
+- Severity: Medium
- Exploit path: Persist provider URL that passes initial checks, then rebind DNS to private target before OCR requests.
+- Why this is a product issue: Pipeline logging records OCR text, extraction text, prompts, and LLM outputs into persistent processing logs.
 - Impact: Increased confidentiality risk and larger data-retention blast radius if logs are queried or exfiltrated.
 - Exploit path: Access to admin log endpoints or database allows retrieval of sensitive operational content.
 - Evidence:
-  - task model runtime enforces `resolve_dns=True`: `backend/app/services/model_runtime.py:41`
+  - `backend/app/worker/tasks.py:619`
-  - provider normalization in app settings does not pass DNS revalidation flag: `backend/app/services/app_settings.py:253`
+  - `backend/app/worker/tasks.py:638`
-  - OCR runtime uses persisted URL for client base URL: `backend/app/services/app_settings.py:891`, `backend/app/services/handwriting.py:159`
+  - `backend/app/services/routing_pipeline.py:789`
- Remediation:
+  - `backend/app/services/routing_pipeline.py:802`
-  - Apply DNS revalidation before outbound OCR requests or on every runtime load.
+  - `backend/app/services/routing_pipeline.py:814`
-  - Disallow private network egress by default and require explicit controlled exceptions.
+  - `backend/app/core/config.py:45`
 - Production recommendation:
  - Default to metadata-only logs.
  - Disable persistent storage of prompt/response/raw extracted text unless temporary debug mode is explicitly enabled with strict TTL.
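A metadata-only default for processing logs could be approximated as follows. The field names in `SENSITIVE_KEYS` and the `to_log_payload` helper are assumptions made for illustration; the actual log schema is not shown in this report. TTL enforcement for the debug mode is not covered.

```python
from typing import Any, Dict

# Hypothetical names of log fields that carry document or model content.
SENSITIVE_KEYS = frozenset({"ocr_text", "extracted_text", "prompt", "response"})


def to_log_payload(event: Dict[str, Any], debug_content: bool = False) -> Dict[str, Any]:
    """Return a copy of `event` that is safe to persist.

    Sensitive text fields are replaced by size metadata unless
    `debug_content` is explicitly enabled.
    """
    if debug_content:
        return dict(event)
    payload: Dict[str, Any] = {}
    for key, value in event.items():
        if key in SENSITIVE_KEYS and isinstance(value, str):
            # Keep only the length so operators can still spot empty or huge outputs.
            payload[key] = {"redacted": True, "chars": len(value)}
        else:
            payload[key] = value
    return payload
```

Making redaction the default and content logging the opt-in keeps a forgotten debug flag from being the normal state.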
-2. Provider API keys are persisted in plaintext settings on storage volume.
+2. Markdown export endpoint is unbounded and can exhaust memory
- Impact: File system or backup compromise reveals upstream provider secrets.
+- Severity: Medium
- Exploit path: Read persisted settings file from storage volume or backup artifact.
+- Why this is a product issue: Export loads all matching documents and builds the ZIP archive in memory with `BytesIO`, with no hard limit on selection size.
 - Impact: Authenticated users can trigger high memory use and service degradation.
 - Exploit path: Repeated wide `path_prefix` exports cause large in-memory archive construction.
 - Evidence:
-  - settings file location under storage root: `backend/app/services/app_settings.py:133`
+  - `backend/app/api/routes_documents.py:402`
-  - provider payload includes plaintext `api_key`: `backend/app/services/app_settings.py:268`
+  - `backend/app/api/routes_documents.py:412`
-  - settings payload is written to disk as JSON: `backend/app/services/app_settings.py:680`, `backend/app/services/app_settings.py:685`
+  - `backend/app/api/routes_documents.py:416`
-  - OCR settings read returns stored API key value for runtime: `backend/app/services/app_settings.py:894`
+  - `backend/app/api/routes_documents.py:418`
- Remediation:
+  - `backend/app/api/routes_documents.py:421`
-  - Move provider secrets to dedicated secret management.
+  - `backend/app/api/routes_documents.py:425`
-  - If local persistence is unavoidable, encrypt sensitive fields at rest and restrict file permissions.
+- Production recommendation:
  - Enforce max export document count and total bytes.
  - Stream archive generation to temp files.
  - Add endpoint rate limiting.
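The first two export recommendations (hard limits and temp-file archives) can be combined in one sketch. The limit values, `ExportLimitError`, and `build_export_archive` are hypothetical; the real endpoint's document source and response type are not assumed here. Rate limiting would sit in front of this and is not shown.

```python
import tempfile
import zipfile
from typing import IO, Iterable, Tuple

MAX_EXPORT_DOCUMENTS = 500             # hypothetical cap on documents per export
MAX_EXPORT_BYTES = 50 * 1024 * 1024    # hypothetical cap on total input bytes


class ExportLimitError(ValueError):
    """Raised when an export request exceeds the configured limits."""


def build_export_archive(documents: Iterable[Tuple[str, bytes]]) -> IO[bytes]:
    """Write (name, markdown_bytes) pairs into a ZIP backed by a temp file."""
    # The archive lives on disk instead of in a BytesIO buffer.
    handle = tempfile.TemporaryFile()
    total_bytes = 0
    try:
        with zipfile.ZipFile(handle, "w", zipfile.ZIP_DEFLATED) as archive:
            # Consume lazily so the full selection is never held in memory.
            for count, (name, content) in enumerate(documents, start=1):
                total_bytes += len(content)
                if count > MAX_EXPORT_DOCUMENTS or total_bytes > MAX_EXPORT_BYTES:
                    raise ExportLimitError("export exceeds configured limits")
                archive.writestr(name, content)
    except BaseException:
        handle.close()  # discard the partial archive
        raise
    handle.seek(0)
    return handle
```

The returned file object can then be sent in chunks by the web framework, and closing it removes the temporary file.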
 ## Risks Requiring Product Decision or Further Verification
 1. Authorization model appears role-based without per-document ownership boundaries
 - Evidence:
  - `backend/app/models/document.py:29`
  - `backend/app/api/router.py:19`
  - `backend/app/api/router.py:31`
 - Question: Is this intentionally single-operator, or should production support multi-user/tenant data isolation?
 2. Worker startup command uses a raw Redis URL string and bypasses the in-code URL security validator
 - Evidence:
  - `docker-compose.yml:81`
  - `backend/app/worker/queue.py:15`
 - Question: Should worker startup also enforce `validate_redis_url_security` before consuming jobs?
 3. Provider key encryption uses custom cryptographic construction
 - Evidence:
  - `backend/app/services/app_settings.py:131`
  - `backend/app/services/app_settings.py:154`
  - `backend/app/services/app_settings.py:176`
 - Question: Do compliance or internal policy requirements demand standardized AEAD primitives from vetted cryptography libraries?
 ## User-Managed Configuration Observations (Not Product Defects)
 These are deployment/operator choices and should be tracked separately from code defects.
 1. Development-mode posture in local `.env`
 - Evidence:
  - `.env:1`
  - `.env:3`
 - Notes: `APP_ENV=development` and anonymous development access are enabled.
 2. Local `.env` includes placeholder shared API token values
 - Evidence:
  - `.env:15`
  - `.env:16`
  - `.env:31`
 - Notes: If replaced with real values and reused, this increases operational risk. This is the operator's responsibility.
 3. Compose defaults allow permissive provider egress controls
 - Evidence:
  - `docker-compose.yml:51`
  - `docker-compose.yml:52`
  - `.env:21`
  - `.env:22`
  - `.env:23`
 - Notes: Allowing HTTP/private-network provider targets is a deployment policy choice.
 4. Internal service transport defaults are plaintext in local stack
 - Evidence:
  - `docker-compose.yml:56`
  - `.env:11`
 - Notes: `http`/`redis://` may be acceptable for isolated local dev, but not for exposed production networks.
 ## Production Readiness Priority Order
 1. Remove browser static token model and adopt per-user auth.
 2. Tighten CORS to explicit trusted origins only.
 3. Reduce persistent sensitive logging to metadata by default.
 4. Add hard limits and streaming behavior for markdown export.
 5. Resolve product decisions on tenant isolation, worker Redis security enforcement, and cryptography standardization.