Replace REPORT.md with production security readiness assessment
216
REPORT.md
@@ -2,108 +2,148 @@
|
|||||||
|
|
||||||
Date: 2026-03-01
|
Date: 2026-03-01
|
||||||
Repository: /Users/bedas/Developer/GitHub/dcm
|
Repository: /Users/bedas/Developer/GitHub/dcm
|
||||||
Review Type: Static security review for production readiness
|
Review type: Static code and configuration review (no runtime penetration testing)
|
||||||
|
|
||||||
## Scope
|
## Scope
|
||||||
- Backend: FastAPI API, worker queue, settings and model runtime services
|
- Backend API and worker: `backend/app`
|
||||||
- Frontend: React and Vite API client and document preview rendering
|
- Frontend API client/auth transport: `frontend/src`
|
||||||
- Infrastructure: docker-compose service exposure and secret configuration
|
- Compose and environment defaults: `docker-compose.yml`, `.env`
|
||||||
|
|
||||||
## Findings
|
## Method and Limits
|
||||||
|
- Reviewed source and configuration files in the current checkout.
|
||||||
|
- Verified findings with direct file evidence.
|
||||||
|
- Did not run dynamic security testing, dependency CVE scanning, or infrastructure perimeter testing.
|
||||||
|
|
||||||
|
## Confirmed Product Security Findings
|
||||||
|
|
||||||
### Critical
|
### Critical
|
||||||
|
|
||||||
1. Redis queue is exposed without authentication and can be abused for worker job injection.
|
1. Browser-exposed shared bearer token path (`VITE_API_TOKEN` fallback)
|
||||||
- Impact: If Redis is reachable by an attacker, queued job payloads can be injected and executed by the worker process, leading to remote code execution and data compromise.
|
- Severity: Critical
|
||||||
- Exploit path: Reach Redis on port 6379, enqueue crafted RQ jobs into queue dcm, wait for worker consumption.
|
- Why this is a product issue: The frontend code supports a build-time token fallback and injects it into all API requests. This creates a shared credential model in browser code.
|
||||||
|
- Impact: Any user with browser access can recover and reuse the token, collapsing auth boundaries and auditability.
|
||||||
|
- Exploit path: Open app -> inspect runtime/bundle or intercepted request -> replay bearer token against protected API endpoints.
|
||||||
- Evidence:
|
- Evidence:
|
||||||
- docker-compose publishes Redis host port: `docker-compose.yml:21`
|
- `frontend/src/lib/api.ts:39`
|
||||||
- worker consumes from Redis queue directly: `docker-compose.yml:77`
|
- `frontend/src/lib/api.ts:98`
|
||||||
- queue connection uses bare Redis URL with no auth/TLS: `backend/app/worker/queue.py:15`, `backend/app/worker/queue.py:21`
|
- `frontend/src/lib/api.ts:111`
|
||||||
- current environment binds services to all interfaces: `.env:1`
|
- `frontend/src/lib/api.ts:155`
|
||||||
- Remediation:
|
- `docker-compose.yml:123`
|
||||||
- Do not publish Redis externally in production.
|
- `backend/app/api/router.py:25`
|
||||||
- Enforce Redis authentication and TLS.
|
- `backend/app/api/router.py:37`
|
||||||
- Place Redis on a private network segment with strict ACLs.
|
- Production recommendation:
|
||||||
- Treat queue producers as privileged components only.
|
- Remove browser-side static token fallback.
|
||||||
|
- Use per-user server-issued auth (session or short-lived JWT) with role-bound authorization.
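A minimal sketch of what a per-user, short-lived, server-issued credential could look like, using only the Python standard library. It is an HMAC-signed token rather than a standards-compliant JWT, and the names `issue_token`, `verify_token`, and the claim layout are assumptions for illustration, not code from the repository.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    """Inverse of _b64; restores the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id, role, secret, ttl_seconds=900, now=None):
    """Issue a token bound to one user and role that expires after ttl_seconds."""
    issued = int(time.time() if now is None else now)
    claims = {"sub": user_id, "role": role, "exp": issued + ttl_seconds}
    payload = json.dumps(claims, separators=(",", ":"), sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:  # wrong shape or invalid base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        return None
    return claims
```

The signing secret stays on the server, so nothing reusable is baked into the frontend bundle. Revocation and rotation are not covered by this sketch.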
|
||||||
2. Untrusted uploaded content is previewed in an unsandboxed iframe.
|
|
||||||
- Impact: Stored XSS and active content execution in preview context can enable account action abuse and data exfiltration in the browser.
|
|
||||||
- Exploit path: Upload active content (for example HTML), open preview, script executes in iframe without sandbox constraints.
|
|
||||||
- Evidence:
|
|
||||||
- upload endpoint accepts generic uploaded files: `backend/app/api/routes_documents.py:493`
|
|
||||||
- MIME type is derived from bytes and persisted: `backend/app/api/routes_documents.py:530`
|
|
||||||
- preview endpoint returns original bytes inline with stored media type: `backend/app/api/routes_documents.py:449`, `backend/app/api/routes_documents.py:457`
|
|
||||||
- frontend renders preview in iframe without sandbox attribute: `frontend/src/components/DocumentViewer.tsx:486`
|
|
||||||
- preview source is a blob URL created from fetched content: `frontend/src/components/DocumentViewer.tsx:108`, `frontend/src/components/DocumentViewer.tsx:113`
|
|
||||||
- Remediation:
|
|
||||||
- Block inline preview for script-capable MIME types.
|
|
||||||
- Add strict iframe sandboxing if iframe preview remains required.
|
|
||||||
- Prefer force-download for active formats.
|
|
||||||
- Serve untrusted preview content from an isolated origin with restrictive CSP.
|
|
||||||
|
|
||||||
### High
|
### High
|
||||||
|
|
||||||
1. Frontend distributes a bearer token to all clients.
|
1. CORS policy is effectively any HTTP/HTTPS origin, with credentials enabled
|
||||||
- Impact: Any user with browser access can extract the token and replay authenticated calls, preventing per-user accountability and increasing blast radius.
|
- Severity: High
|
||||||
- Exploit path: Read token from frontend runtime environment or request headers, replay API requests with Authorization header.
|
- Why this is a product issue: The CORS middleware is configured with an `allow_origin_regex` that matches broad web origins, together with `allow_credentials=True`.
|
||||||
|
- Impact: With credentials allowed, any origin the regex matches can send credentialed cross-origin requests and read the responses, so token abuse becomes possible from arbitrary origins.
|
||||||
|
- Exploit path: Malicious origin performs cross-origin requests with available credentials and can read API responses under permissive CORS policy.
|
||||||
- Evidence:
|
- Evidence:
|
||||||
- frontend consumes token from public Vite env: `frontend/src/lib/api.ts:24`
|
- `backend/app/main.py:21`
|
||||||
- token is attached to every request when present: `frontend/src/lib/api.ts:38`
|
- `backend/app/main.py:41`
|
||||||
- compose passes `VITE_API_TOKEN` from user token: `docker-compose.yml:115`
|
- `backend/app/main.py:42`
|
||||||
- privileged routes rely on static token role checks: `backend/app/api/router.py:19`, `backend/app/api/auth.py:47`, `backend/app/api/auth.py:51`
|
- `backend/app/main.py:44`
|
||||||
- Remediation:
|
- Production recommendation:
|
||||||
- Replace shared static token model with per-user authentication.
|
- Replace regex-based broad origin acceptance with explicit trusted origin allowlist.
|
||||||
- Keep secrets server-side only.
|
- Keep `allow_credentials=False` unless strictly required for cookie-based flows.
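A hedged sketch of the explicit-allowlist recommendation. The helper names `parse_trusted_origins` and `cors_options` and the comma-separated input format are assumptions; the returned keys mirror the keyword arguments of Starlette's `CORSMiddleware`, which FastAPI re-exports.

```python
from urllib.parse import urlsplit


def parse_trusted_origins(raw: str) -> list[str]:
    """Parse a comma-separated origin list, rejecting wildcards and non-origins."""
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if not origin:
            continue
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"not an http(s) origin: {origin!r}")
        if "*" in origin or parts.path or parts.query or parts.fragment:
            raise ValueError(f"origin must be scheme://host[:port] only: {origin!r}")
        origins.append(origin)
    return origins


def cors_options(raw: str) -> dict:
    """Build CORSMiddleware keyword arguments from an explicit allowlist."""
    return {
        "allow_origins": parse_trusted_origins(raw),
        "allow_origin_regex": None,   # no pattern-based origin matching
        "allow_credentials": False,   # enable only for cookie-based flows
        "allow_methods": ["GET", "POST", "PUT", "PATCH", "DELETE"],
        "allow_headers": ["Authorization", "Content-Type"],
    }
```

Under these assumptions the options would be applied as `app.add_middleware(CORSMiddleware, **cors_options(value))`, where `value` comes from deployment configuration.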
|
||||||
- Use short-lived credentials with rotation and revocation.
|
|
||||||
|
|
||||||
2. Default and static service secrets are present in deploy config.
|
|
||||||
- Impact: If service ports are exposed, predictable credentials and keys allow unauthorized access to data services.
|
|
||||||
- Exploit path: Connect to published Postgres or Typesense ports and authenticate with known static values.
|
|
||||||
- Evidence:
|
|
||||||
- static Postgres credentials: `docker-compose.yml:5`, `docker-compose.yml:6`
|
|
||||||
- static Typesense key in compose and runtime env: `docker-compose.yml:29`, `docker-compose.yml:55`, `docker-compose.yml:93`
|
|
||||||
- database and Typesense ports are published to host: `docker-compose.yml:9`, `docker-compose.yml:32`
|
|
||||||
- current environment uses placeholder tokens: `.env:2`, `.env:3`, `.env:4`
|
|
||||||
- Remediation:
|
|
||||||
- Use high-entropy secrets managed outside repository configuration.
|
|
||||||
- Remove unnecessary host port publications in production.
|
|
||||||
- Restrict service network access to trusted internal components.
|
|
||||||
|
|
||||||
3. ZIP recursion depth control is not enforced across queued descendants.
|
|
||||||
- Impact: Nested archives can create uncontrolled fan-out, causing CPU, queue, and storage exhaustion.
|
|
||||||
- Exploit path: Upload ZIP containing ZIPs; children are queued as independent documents without inherited depth, repeating recursively.
|
|
||||||
- Evidence:
|
|
||||||
- configured depth limit exists: `backend/app/core/config.py:28`
|
|
||||||
- extractor takes a depth argument but is called without propagation: `backend/app/services/extractor.py:302`, `backend/app/services/extractor.py:306`
|
|
||||||
- worker invokes extractor without depth context: `backend/app/worker/tasks.py:122`
|
|
||||||
- worker enqueues child archive jobs recursively: `backend/app/worker/tasks.py:225`, `backend/app/worker/tasks.py:226`
|
|
||||||
- Remediation:
|
|
||||||
- Persist and propagate archive depth per document lineage.
|
|
||||||
- Enforce absolute descendant and fan-out limits per root upload.
|
|
||||||
- Reject nested archives beyond configured depth.
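The depth and fan-out remediation above can be sketched as a small lineage record that travels with each queued child. `ArchiveContext`, `child_context`, and `ArchiveLimitError` are hypothetical names; in a real worker the descendant counter would need shared storage (for example a database row per root upload), which this sketch does not model.

```python
from dataclasses import dataclass


class ArchiveLimitError(Exception):
    """Raised when a nested archive would exceed the configured limits."""


@dataclass(frozen=True)
class ArchiveContext:
    root_id: str       # identifier of the original upload
    depth: int         # nesting depth of this document (root is 0)
    descendants: int   # children already queued under the root


def child_context(parent, new_children, max_depth, max_descendants):
    """Return the context to attach to children, or raise if a limit is exceeded."""
    next_depth = parent.depth + 1
    if next_depth > max_depth:
        raise ArchiveLimitError(f"archive depth {next_depth} exceeds limit {max_depth}")
    total = parent.descendants + new_children
    if total > max_descendants:
        raise ArchiveLimitError(f"{total} descendants exceeds limit {max_descendants}")
    return ArchiveContext(parent.root_id, next_depth, total)
```

The worker would call `child_context` before enqueueing and pass the returned context with each child job, so the depth is inherited rather than reset.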
|
|
||||||
|
|
||||||
### Medium
|
### Medium
|
||||||
|
|
||||||
1. OCR provider path does not apply DNS revalidation equivalent to model runtime path.
|
1. Sensitive processing content is persisted in logs by default
|
||||||
- Impact: Under permissive network flags, SSRF defenses can be weakened by DNS rebinding on OCR traffic.
|
- Severity: Medium
|
||||||
- Exploit path: Persist provider URL that passes initial checks, then rebind DNS to private target before OCR requests.
|
- Why this is a product issue: Pipeline logging records OCR text, extraction text, prompts, and LLM outputs into persistent processing logs.
|
||||||
|
- Impact: Increased confidentiality risk and larger data-retention blast radius if logs are queried or exfiltrated.
|
||||||
|
- Exploit path: Access to admin log endpoints or database allows retrieval of sensitive operational content.
|
||||||
- Evidence:
|
- Evidence:
|
||||||
- task model runtime enforces `resolve_dns=True`: `backend/app/services/model_runtime.py:41`
|
- `backend/app/worker/tasks.py:619`
|
||||||
- provider normalization in app settings does not pass DNS revalidation flag: `backend/app/services/app_settings.py:253`
|
- `backend/app/worker/tasks.py:638`
|
||||||
- OCR runtime uses persisted URL for client base URL: `backend/app/services/app_settings.py:891`, `backend/app/services/handwriting.py:159`
|
- `backend/app/services/routing_pipeline.py:789`
|
||||||
- Remediation:
|
- `backend/app/services/routing_pipeline.py:802`
|
||||||
- Apply DNS revalidation before outbound OCR requests or on every runtime load.
|
- `backend/app/services/routing_pipeline.py:814`
|
||||||
- Disallow private network egress by default and require explicit controlled exceptions.
|
- `backend/app/core/config.py:45`
|
||||||
|
- Production recommendation:
|
||||||
|
- Default to metadata-only logs.
|
||||||
|
- Disable persistent storage of prompt/response/raw extracted text unless temporary debug mode is explicitly enabled with strict TTL.
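One way to read "metadata-only logs" is to record the size and a digest of each payload instead of the payload itself. The sketch below assumes a hypothetical `build_log_record` helper and a `debug_capture` flag; the TTL for debug records is not modelled here.

```python
import hashlib


def build_log_record(stage, text, *, debug_capture=False, preview_chars=200):
    """Describe a pipeline payload without persisting its content by default."""
    encoded = text.encode("utf-8")
    record = {
        "stage": stage,
        "chars": len(text),
        "sha256": hashlib.sha256(encoded).hexdigest(),
    }
    if debug_capture:
        # Only an explicitly enabled debug mode stores any raw content.
        record["preview"] = text[:preview_chars]
    return record
```

The digest still lets operators correlate identical payloads across stages without retaining OCR text, prompts, or model output.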
|
||||||
|
|
||||||
2. Provider API keys are persisted in plaintext settings on storage volume.
|
2. Markdown export endpoint is unbounded and memory-amplifiable
|
||||||
- Impact: File system or backup compromise reveals upstream provider secrets.
|
- Severity: Medium
|
||||||
- Exploit path: Read persisted settings file from storage volume or backup artifact.
|
- Why this is a product issue: Export loads all matching documents and builds ZIP in-memory with `BytesIO`, without hard limits on selection size.
|
||||||
|
- Impact: Authenticated users can trigger high memory use and service degradation.
|
||||||
|
- Exploit path: Repeated wide `path_prefix` exports cause large in-memory archive construction.
|
||||||
- Evidence:
|
- Evidence:
|
||||||
- settings file location under storage root: `backend/app/services/app_settings.py:133`
|
- `backend/app/api/routes_documents.py:402`
|
||||||
- provider payload includes plaintext `api_key`: `backend/app/services/app_settings.py:268`
|
- `backend/app/api/routes_documents.py:412`
|
||||||
- settings payload is written to disk as JSON: `backend/app/services/app_settings.py:680`, `backend/app/services/app_settings.py:685`
|
- `backend/app/api/routes_documents.py:416`
|
||||||
- OCR settings read returns stored API key value for runtime: `backend/app/services/app_settings.py:894`
|
- `backend/app/api/routes_documents.py:418`
|
||||||
- Remediation:
|
- `backend/app/api/routes_documents.py:421`
|
||||||
- Move provider secrets to dedicated secret management.
|
- `backend/app/api/routes_documents.py:425`
|
||||||
- If local persistence is unavoidable, encrypt sensitive fields at rest and restrict file permissions.
|
- Production recommendation:
|
||||||
|
- Enforce max export document count and total bytes.
|
||||||
|
- Stream archive generation to temp files.
|
||||||
|
- Add endpoint rate limiting.
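A minimal sketch of the first two recommendations, assuming the export receives an iterable of `(archive_name, markdown_text)` pairs. `build_export_archive`, `ExportLimitError`, and the default limits are illustrative assumptions; rate limiting is out of scope for this sketch.

```python
import tempfile
import zipfile


class ExportLimitError(Exception):
    """Raised when an export exceeds the configured document or byte limits."""


def build_export_archive(documents, max_documents=500, max_total_bytes=50 * 1024 * 1024):
    """Write documents into a ZIP backed by a temp file and return it rewound."""
    spool = tempfile.TemporaryFile()
    try:
        total = 0
        with zipfile.ZipFile(spool, "w", zipfile.ZIP_DEFLATED) as archive:
            for count, (name, markdown) in enumerate(documents, start=1):
                if count > max_documents:
                    raise ExportLimitError(f"more than {max_documents} documents")
                data = markdown.encode("utf-8")
                total += len(data)
                if total > max_total_bytes:
                    raise ExportLimitError(f"more than {max_total_bytes} bytes")
                archive.writestr(name, data)
    except Exception:
        spool.close()  # discard the partial archive
        raise
    spool.seek(0)
    return spool
```

Because `documents` can be a generator, rows can be pulled from the database lazily, and the archive is written to a temp file rather than accumulated in a `BytesIO` buffer.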
|
||||||
|
|
||||||
|
## Risks Requiring Product Decision or Further Verification
|
||||||
|
|
||||||
|
1. Authorization model appears role-based without per-document ownership boundaries
|
||||||
|
- Evidence:
|
||||||
|
- `backend/app/models/document.py:29`
|
||||||
|
- `backend/app/api/router.py:19`
|
||||||
|
- `backend/app/api/router.py:31`
|
||||||
|
- Question: Is this intentionally single-operator, or should production support multi-user/tenant data isolation?
|
||||||
|
|
||||||
|
2. Worker startup command uses a raw Redis URL string and bypasses the in-code URL security validator
|
||||||
|
- Evidence:
|
||||||
|
- `docker-compose.yml:81`
|
||||||
|
- `backend/app/worker/queue.py:15`
|
||||||
|
- Question: Should worker startup also enforce `validate_redis_url_security` before consuming jobs?
|
||||||
|
|
||||||
|
3. Provider key encryption uses custom cryptographic construction
|
||||||
|
- Evidence:
|
||||||
|
- `backend/app/services/app_settings.py:131`
|
||||||
|
- `backend/app/services/app_settings.py:154`
|
||||||
|
- `backend/app/services/app_settings.py:176`
|
||||||
|
- Question: Do compliance or internal policy requirements demand standardized AEAD primitives from vetted cryptography libraries?
|
||||||
|
|
||||||
|
## User-Managed Configuration Observations (Not Product Defects)
|
||||||
|
|
||||||
|
These are deployment/operator choices and should be tracked separately from code defects.
|
||||||
|
|
||||||
|
1. Development-mode posture in local `.env`
|
||||||
|
- Evidence:
|
||||||
|
- `.env:1`
|
||||||
|
- `.env:3`
|
||||||
|
- Notes: `APP_ENV=development` and anonymous development access are enabled.
|
||||||
|
|
||||||
|
2. Local `.env` includes placeholder shared API token values
|
||||||
|
- Evidence:
|
||||||
|
- `.env:15`
|
||||||
|
- `.env:16`
|
||||||
|
- `.env:31`
|
||||||
|
- Notes: If replaced with real values and reused, this increases operational risk. This is operator responsibility.
|
||||||
|
|
||||||
|
3. Compose defaults allow permissive provider egress controls
|
||||||
|
- Evidence:
|
||||||
|
- `docker-compose.yml:51`
|
||||||
|
- `docker-compose.yml:52`
|
||||||
|
- `.env:21`
|
||||||
|
- `.env:22`
|
||||||
|
- `.env:23`
|
||||||
|
- Notes: Allowing HTTP/private-network provider targets is a deployment policy choice.
|
||||||
|
|
||||||
|
4. Internal service transport defaults are plaintext in local stack
|
||||||
|
- Evidence:
|
||||||
|
- `docker-compose.yml:56`
|
||||||
|
- `.env:11`
|
||||||
|
- Notes: `http`/`redis://` may be acceptable for isolated local dev, but not for exposed production networks.
|
||||||
|
|
||||||
|
## Production Readiness Priority Order
|
||||||
|
|
||||||
|
1. Remove browser static token model and adopt per-user auth.
|
||||||
|
2. Tighten CORS to explicit trusted origins only.
|
||||||
|
3. Reduce persistent sensitive logging to metadata by default.
|
||||||
|
4. Add hard limits and streaming behavior for markdown export.
|
||||||
|
5. Resolve product decisions on tenant isolation, worker Redis security enforcement, and cryptography standardization.
|
||||||
|
|||||||