From 7a19f22f417c7b1d4b2d777b0a1b31d319dd0cf5 Mon Sep 17 00:00:00 2001
From: Beda Schmid
Date: Sun, 1 Mar 2026 14:56:26 -0300
Subject: [PATCH] Replace REPORT.md with production security readiness assessment

---
 REPORT.md | 216 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 128 insertions(+), 88 deletions(-)

diff --git a/REPORT.md b/REPORT.md
index e85b40e..91ce7c6 100644
--- a/REPORT.md
+++ b/REPORT.md
@@ -2,108 +2,148 @@
 Date: 2026-03-01
 Repository: /Users/bedas/Developer/GitHub/dcm
-Review Type: Static security review for production readiness
+Review type: Static code and configuration review (no runtime penetration testing)
 ## Scope
-- Backend: FastAPI API, worker queue, settings and model runtime services
-- Frontend: React and Vite API client and document preview rendering
-- Infrastructure: docker-compose service exposure and secret configuration
+- Backend API and worker: `backend/app`
+- Frontend API client/auth transport: `frontend/src`
+- Compose and environment defaults: `docker-compose.yml`, `.env`
-## Findings
+## Method and Limits
+- Reviewed source and configuration files in the current checkout.
+- Verified findings with direct file evidence.
+- Did not run dynamic security testing, dependency CVE scanning, or infrastructure perimeter testing.
+
+## Confirmed Product Security Findings
 ### Critical
-1. Redis queue is exposed without authentication and can be abused for worker job injection.
-- Impact: If Redis is reachable by an attacker, queued job payloads can be injected and executed by the worker process, leading to remote code execution and data compromise.
-- Exploit path: Reach Redis on port 6379, enqueue crafted RQ jobs into queue dcm, wait for worker consumption.
+1. Browser-exposed shared bearer token path (`VITE_API_TOKEN` fallback)
+- Severity: Critical
+- Why this is a product issue: The frontend code supports a build-time token fallback and injects it into all API requests. This creates a shared credential model in browser code.
+- Impact: Any user with browser access can recover and reuse the token, collapsing auth boundaries and auditability.
+- Exploit path: Open app -> inspect runtime/bundle or intercepted request -> replay bearer token against protected API endpoints.
 - Evidence:
-  - docker-compose publishes Redis host port: `docker-compose.yml:21`
-  - worker consumes from Redis queue directly: `docker-compose.yml:77`
-  - queue connection uses bare Redis URL with no auth/TLS: `backend/app/worker/queue.py:15`, `backend/app/worker/queue.py:21`
-  - current environment binds services to all interfaces: `.env:1`
-- Remediation:
-  - Do not publish Redis externally in production.
-  - Enforce Redis authentication and TLS.
-  - Place Redis on a private network segment with strict ACLs.
-  - Treat queue producers as privileged components only.
-
-2. Untrusted uploaded content is previewed in an unsandboxed iframe.
-- Impact: Stored XSS and active content execution in preview context can enable account action abuse and data exfiltration in the browser.
-- Exploit path: Upload active content (for example HTML), open preview, script executes in iframe without sandbox constraints.
-- Evidence:
-  - upload endpoint accepts generic uploaded files: `backend/app/api/routes_documents.py:493`
-  - MIME type is derived from bytes and persisted: `backend/app/api/routes_documents.py:530`
-  - preview endpoint returns original bytes inline with stored media type: `backend/app/api/routes_documents.py:449`, `backend/app/api/routes_documents.py:457`
-  - frontend renders preview in iframe without sandbox attribute: `frontend/src/components/DocumentViewer.tsx:486`
-  - preview source is a blob URL created from fetched content: `frontend/src/components/DocumentViewer.tsx:108`, `frontend/src/components/DocumentViewer.tsx:113`
-- Remediation:
-  - Block inline preview for script-capable MIME types.
-  - Add strict iframe sandboxing if iframe preview remains required.
-  - Prefer force-download for active formats.
-  - Serve untrusted preview content from an isolated origin with restrictive CSP.
+  - `frontend/src/lib/api.ts:39`
+  - `frontend/src/lib/api.ts:98`
+  - `frontend/src/lib/api.ts:111`
+  - `frontend/src/lib/api.ts:155`
+  - `docker-compose.yml:123`
+  - `backend/app/api/router.py:25`
+  - `backend/app/api/router.py:37`
+- Production recommendation:
+  - Remove browser-side static token fallback.
+  - Use per-user server-issued auth (session or short-lived JWT) with role-bound authorization.
 ### High
-1. Frontend distributes a bearer token to all clients.
-- Impact: Any user with browser access can extract the token and replay authenticated calls, preventing per-user accountability and increasing blast radius.
-- Exploit path: Read token from frontend runtime environment or request headers, replay API requests with Authorization header.
+1. CORS policy is effectively any HTTP/HTTPS origin, with credentials enabled
+- Severity: High
+- Why this is a product issue: CORS middleware enables `allow_origin_regex` that matches broad web origins and sets `allow_credentials=True`.
+- Impact: If credentials are present, cross-origin access risk increases and token abuse becomes easier from arbitrary origins.
+- Exploit path: Malicious origin performs cross-origin requests with available credentials and can read API responses under permissive CORS policy.
 - Evidence:
-  - frontend consumes token from public Vite env: `frontend/src/lib/api.ts:24`
-  - token is attached to every request when present: `frontend/src/lib/api.ts:38`
-  - compose passes `VITE_API_TOKEN` from user token: `docker-compose.yml:115`
-  - privileged routes rely on static token role checks: `backend/app/api/router.py:19`, `backend/app/api/auth.py:47`, `backend/app/api/auth.py:51`
-- Remediation:
-  - Replace shared static token model with per-user authentication.
-  - Keep secrets server-side only.
-  - Use short-lived credentials with rotation and revocation.
-
-2. Default and static service secrets are present in deploy config.
-- Impact: If service ports are exposed, predictable credentials and keys allow unauthorized access to data services.
-- Exploit path: Connect to published Postgres or Typesense ports and authenticate with known static values.
-- Evidence:
-  - static Postgres credentials: `docker-compose.yml:5`, `docker-compose.yml:6`
-  - static Typesense key in compose and runtime env: `docker-compose.yml:29`, `docker-compose.yml:55`, `docker-compose.yml:93`
-  - database and Typesense ports are published to host: `docker-compose.yml:9`, `docker-compose.yml:32`
-  - current environment uses placeholder tokens: `.env:2`, `.env:3`, `.env:4`
-- Remediation:
-  - Use high-entropy secrets managed outside repository configuration.
-  - Remove unnecessary host port publications in production.
-  - Restrict service network access to trusted internal components.
-
-3. ZIP recursion depth control is not enforced across queued descendants.
-- Impact: Nested archives can create uncontrolled fan-out, causing CPU, queue, and storage exhaustion.
-- Exploit path: Upload ZIP containing ZIPs; children are queued as independent documents without inherited depth, repeating recursively.
-- Evidence:
-  - configured depth limit exists: `backend/app/core/config.py:28`
-  - extractor takes a depth argument but is called without propagation: `backend/app/services/extractor.py:302`, `backend/app/services/extractor.py:306`
-  - worker invokes extractor without depth context: `backend/app/worker/tasks.py:122`
-  - worker enqueues child archive jobs recursively: `backend/app/worker/tasks.py:225`, `backend/app/worker/tasks.py:226`
-- Remediation:
-  - Persist and propagate archive depth per document lineage.
-  - Enforce absolute descendant and fan-out limits per root upload.
-  - Reject nested archives beyond configured depth.
+  - `backend/app/main.py:21`
+  - `backend/app/main.py:41`
+  - `backend/app/main.py:42`
+  - `backend/app/main.py:44`
+- Production recommendation:
+  - Replace regex-based broad origin acceptance with explicit trusted origin allowlist.
+  - Keep `allow_credentials=False` unless strictly required for cookie-based flows.
 ### Medium
-1. OCR provider path does not apply DNS revalidation equivalent to model runtime path.
-- Impact: Under permissive network flags, SSRF defenses can be weakened by DNS rebinding on OCR traffic.
-- Exploit path: Persist provider URL that passes initial checks, then rebind DNS to private target before OCR requests.
+1. Sensitive processing content is persisted in logs by default
+- Severity: Medium
+- Why this is a product issue: Pipeline logging records OCR text, extraction text, prompts, and LLM outputs into persistent processing logs.
+- Impact: Increased confidentiality risk and larger data-retention blast radius if logs are queried or exfiltrated.
+- Exploit path: Access to admin log endpoints or database allows retrieval of sensitive operational content.
 - Evidence:
-  - task model runtime enforces `resolve_dns=True`: `backend/app/services/model_runtime.py:41`
-  - provider normalization in app settings does not pass DNS revalidation flag: `backend/app/services/app_settings.py:253`
-  - OCR runtime uses persisted URL for client base URL: `backend/app/services/app_settings.py:891`, `backend/app/services/handwriting.py:159`
-- Remediation:
-  - Apply DNS revalidation before outbound OCR requests or on every runtime load.
-  - Disallow private network egress by default and require explicit controlled exceptions.
+  - `backend/app/worker/tasks.py:619`
+  - `backend/app/worker/tasks.py:638`
+  - `backend/app/services/routing_pipeline.py:789`
+  - `backend/app/services/routing_pipeline.py:802`
+  - `backend/app/services/routing_pipeline.py:814`
+  - `backend/app/core/config.py:45`
+- Production recommendation:
+  - Default to metadata-only logs.
+  - Disable persistent storage of prompt/response/raw extracted text unless temporary debug mode is explicitly enabled with strict TTL.
-2. Provider API keys are persisted in plaintext settings on storage volume.
-- Impact: File system or backup compromise reveals upstream provider secrets.
-- Exploit path: Read persisted settings file from storage volume or backup artifact.
+2. Markdown export endpoint is unbounded and memory-amplifiable
+- Severity: Medium
+- Why this is a product issue: Export loads all matching documents and builds ZIP in-memory with `BytesIO`, without hard limits on selection size.
+- Impact: Authenticated users can trigger high memory use and service degradation.
+- Exploit path: Repeated wide `path_prefix` exports cause large in-memory archive construction.
 - Evidence:
-  - settings file location under storage root: `backend/app/services/app_settings.py:133`
-  - provider payload includes plaintext `api_key`: `backend/app/services/app_settings.py:268`
-  - settings payload is written to disk as JSON: `backend/app/services/app_settings.py:680`, `backend/app/services/app_settings.py:685`
-  - OCR settings read returns stored API key value for runtime: `backend/app/services/app_settings.py:894`
-- Remediation:
-  - Move provider secrets to dedicated secret management.
-  - If local persistence is unavoidable, encrypt sensitive fields at rest and restrict file permissions.
+  - `backend/app/api/routes_documents.py:402`
+  - `backend/app/api/routes_documents.py:412`
+  - `backend/app/api/routes_documents.py:416`
+  - `backend/app/api/routes_documents.py:418`
+  - `backend/app/api/routes_documents.py:421`
+  - `backend/app/api/routes_documents.py:425`
+- Production recommendation:
+  - Enforce max export document count and total bytes.
+  - Stream archive generation to temp files.
+  - Add endpoint rate limiting.
+
+## Risks Requiring Product Decision or Further Verification
+
+1. Authorization model appears role-based without per-document ownership boundaries
+- Evidence:
+  - `backend/app/models/document.py:29`
+  - `backend/app/api/router.py:19`
+  - `backend/app/api/router.py:31`
+- Question: Is this intentionally single-operator, or should production support multi-user/tenant data isolation?
+
+2. Worker startup command uses raw Redis URL string and bypasses in-code URL security validator at startup
+- Evidence:
+  - `docker-compose.yml:81`
+  - `backend/app/worker/queue.py:15`
+- Question: Should worker startup also enforce `validate_redis_url_security` before consuming jobs?
+
+3. Provider key encryption uses custom cryptographic construction
+- Evidence:
+  - `backend/app/services/app_settings.py:131`
+  - `backend/app/services/app_settings.py:154`
+  - `backend/app/services/app_settings.py:176`
+- Question: Are compliance or internal policy requirements demanding standardized AEAD primitives from vetted cryptography libraries?
+
+## User-Managed Configuration Observations (Not Product Defects)
+
+These are deployment/operator choices and should be tracked separately from code defects.
+
+1. Development-mode posture in local `.env`
+- Evidence:
+  - `.env:1`
+  - `.env:3`
+- Notes: `APP_ENV=development` and anonymous development access are enabled.
+
+2. Local `.env` includes placeholder shared API token values
+- Evidence:
+  - `.env:15`
+  - `.env:16`
+  - `.env:31`
+- Notes: If replaced with real values and reused, this increases operational risk. This is operator responsibility.
+
+3. Compose defaults allow permissive provider egress controls
+- Evidence:
+  - `docker-compose.yml:51`
+  - `docker-compose.yml:52`
+  - `.env:21`
+  - `.env:22`
+  - `.env:23`
+- Notes: Allowing HTTP/private-network provider targets is a deployment policy choice.
+
+4. Internal service transport defaults are plaintext in local stack
+- Evidence:
+  - `docker-compose.yml:56`
+  - `.env:11`
+- Notes: `http`/`redis://` may be acceptable for isolated local dev, but not for exposed production networks.
+
+## Production Readiness Priority Order
+
+1. Remove browser static token model and adopt per-user auth.
+2. Tighten CORS to explicit trusted origins only.
+3. Reduce persistent sensitive logging to metadata by default.
+4. Add hard limits and streaming behavior for markdown export.
+5. Resolve product decisions on tenant isolation, worker Redis security enforcement, and cryptography standardization.
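
Review note on priority item 4 (hard limits and streaming for markdown export): the recommendation to cap document count and total bytes and to build the archive in a temp file could look roughly like the sketch below. It is not taken from `backend/app/api/routes_documents.py`. The names `build_bounded_export` and `ExportLimitExceeded` and both limit values are hypothetical, and the real endpoint would still need to stream the returned file to the client and apply rate limiting separately.

```python
import tempfile
import zipfile
from typing import IO, Iterable, Tuple

# Hypothetical caps for illustration; real values would be product settings.
MAX_EXPORT_DOCUMENTS = 1000
MAX_EXPORT_BYTES = 50 * 1024 * 1024


class ExportLimitExceeded(Exception):
    """Raised when an export request exceeds a configured cap."""


def build_bounded_export(
    documents: Iterable[Tuple[str, bytes]],
    max_documents: int = MAX_EXPORT_DOCUMENTS,
    max_bytes: int = MAX_EXPORT_BYTES,
) -> IO[bytes]:
    """Write (archive_name, content) pairs into a ZIP held in a temp file.

    The archive is written to disk instead of a BytesIO buffer, and the
    function stops as soon as either cap is crossed.
    """
    archive = tempfile.TemporaryFile()
    total_bytes = 0
    try:
        with zipfile.ZipFile(archive, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
            for count, (name, content) in enumerate(documents, start=1):
                if count > max_documents:
                    raise ExportLimitExceeded(
                        f"export exceeds {max_documents} documents"
                    )
                total_bytes += len(content)
                if total_bytes > max_bytes:
                    raise ExportLimitExceeded(
                        f"export exceeds {max_bytes} uncompressed bytes"
                    )
                zf.writestr(name, content)
    except Exception:
        # Do not leave a partially written temp file open on failure.
        archive.close()
        raise
    archive.seek(0)  # Rewind so the caller can read the archive from the start.
    return archive
```

Because `documents` is consumed lazily, a caller can pass a generator over query results so that only one document body is held in memory at a time. The caps are checked on uncompressed sizes, which is the conservative choice for bounding work per request.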