ledgerdock/REPORT.md

# Security Audit Report

Date: 2026-02-21
Repository: /Users/bedas/Developer/GitHub/dcm
Audit type: Static, read-only code and configuration review

## Scope
- Backend API, worker, extraction and routing pipeline, settings handling, and storage interactions.
- Frontend dependency posture.
- Docker runtime and service exposure.

## Method
- File-level inspection with targeted code tracing for authn/authz, input validation, upload and archive processing, outbound network behavior, secret handling, logging, and deployment hardening.
- No runtime penetration testing was performed.

## Findings

### 1) Critical - Missing authentication and authorization on privileged API routes
- Impact: Any reachable client can access document, settings, and log-management functionality.
- Evidence:
  - `backend/app/main.py:29`
  - `backend/app/api/router.py:14`
  - `backend/app/api/routes_documents.py:464`
  - `backend/app/api/routes_documents.py:666`
  - `backend/app/api/routes_settings.py:148`
  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
  - Enforce authentication globally for non-health routes.
  - Add per-endpoint authorization checks for read/update/delete/admin actions.
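
The two recommendations above can be sketched as a single deny-by-default check. This is a framework-agnostic illustration using only the standard library, not the project's implementation: `TOKEN_ROLES`, `ACTION_ROLES`, `PUBLIC_PATHS`, and `check_request` are hypothetical names, and the `/health` path and bearer-token scheme are assumptions.

```python
import hmac
from typing import Mapping, Optional

# Hypothetical token -> role table. A real deployment would load this from a
# secrets store or identity provider instead of hard-coding it.
TOKEN_ROLES = {
    "reader-token-example": "reader",
    "admin-token-example": "admin",
}

# Hypothetical mapping from action to the roles allowed to perform it.
ACTION_ROLES = {
    "read": {"reader", "admin"},
    "update": {"admin"},
    "delete": {"admin"},
    "admin": {"admin"},
}

PUBLIC_PATHS = {"/health"}  # assumed health-check path


def resolve_role(headers: Mapping[str, str]) -> Optional[str]:
    """Return the caller's role, or None if the bearer token is missing or unknown."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    for known, role in TOKEN_ROLES.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(token.encode(), known.encode()):
            return role
    return None


def check_request(path: str, action: str, headers: Mapping[str, str]) -> int:
    """Return an HTTP status: 200 allow, 401 unauthenticated, 403 forbidden."""
    if path in PUBLIC_PATHS:
        return 200
    role = resolve_role(headers)
    if role is None:
        return 401
    # Unknown actions map to an empty role set, so they are denied by default.
    if role not in ACTION_ROLES.get(action, set()):
        return 403
    return 200
```

In a real service this check would run in global middleware or a shared route dependency, so that a newly added route is protected unless it is explicitly listed as public.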

### 2) Critical - SSRF and data exfiltration risk via configurable model provider base URL
- Impact: Any caller able to change settings can point model calls at attacker-controlled or internal hosts and exfiltrate document-derived content.
- Evidence:
  - `backend/app/api/routes_settings.py:148`
  - `backend/app/schemas/settings.py:24`
  - `backend/app/services/app_settings.py:249`
  - `backend/app/services/model_runtime.py:144`
  - `backend/app/services/model_runtime.py:170`
  - `backend/app/worker/tasks.py:505`
  - `backend/app/services/routing_pipeline.py:803`
- Recommendation:
  - Restrict provider endpoints to an allowlist.
  - Validate URL scheme and block private/link-local destinations.
  - Protect settings updates behind strict admin authorization.
  - Enforce outbound egress controls at runtime.
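
A minimal sketch of the allowlist, scheme, and private-address checks listed above, using only the standard library. `ALLOWED_HOSTS`, `models.example.com`, and `validate_provider_url` are hypothetical; the `resolve` parameter exists only so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist of provider hosts; replace with the real providers.
ALLOWED_HOSTS = {"models.example.com"}


def is_public_ip(value: str) -> bool:
    """Return True only for addresses that are not private, loopback, link-local, etc."""
    ip = ipaddress.ip_address(value)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def validate_provider_url(url: str, resolve=socket.getaddrinfo) -> str:
    """Return the URL unchanged if it passes all checks, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https provider URLs are allowed")
    host = parts.hostname  # already lowercased; ignores any userinfo prefix
    if not host or host not in ALLOWED_HOSTS:
        raise ValueError("provider host is not on the allowlist")
    # Check every address the host resolves to, not just the first one.
    infos = resolve(host, parts.port or 443, type=socket.SOCK_STREAM)
    if not infos:
        raise ValueError("provider host did not resolve")
    for info in infos:
        if not is_public_ip(info[4][0]):
            raise ValueError("provider host resolves to a non-public address")
    return url
```

Validation at save time does not by itself stop DNS rebinding, because the host can resolve differently when the request is actually made. That is why the runtime egress controls above remain necessary.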

### 3) High - Unbounded upload and archive extraction can cause memory/disk denial of service
- Impact: Oversized files or compressed archive bombs can exhaust API/worker resources.
- Evidence:
  - `backend/app/api/routes_documents.py:486`
  - `backend/app/services/extractor.py:309`
  - `backend/app/services/extractor.py:312`
  - `backend/app/worker/tasks.py:122`
  - `backend/app/core/config.py:20`
- Recommendation:
  - Enforce request and file size limits.
  - Stream uploads and extraction where possible.
  - Cap total uncompressed archive size and per-entry size.
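
The limits above can be sketched as follows, assuming ZIP archives and standard-library tooling only. The limit values and the names `read_bounded` and `extract_bounded` are illustrative, not taken from the codebase. The sketch counts bytes as they are decompressed rather than trusting the sizes declared in the archive headers.

```python
import io
import zipfile

# Illustrative limits; tune to the deployment's memory and disk budget.
MAX_ENTRIES = 1000
MAX_ENTRY_BYTES = 25 * 1024 * 1024
MAX_TOTAL_BYTES = 100 * 1024 * 1024
CHUNK = 64 * 1024


def read_bounded(stream, limit: int, chunk_size: int = CHUNK) -> bytes:
    """Read a stream in chunks, raising as soon as more than `limit` bytes arrive."""
    chunks, size = [], 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        size += len(chunk)
        if size > limit:
            raise ValueError("payload exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)


def extract_bounded(
    archive,
    max_entries: int = MAX_ENTRIES,
    max_entry_bytes: int = MAX_ENTRY_BYTES,
    max_total_bytes: int = MAX_TOTAL_BYTES,
) -> dict:
    """Return {entry name: bytes}, enforcing entry-count, per-entry, and total caps."""
    results, total = {}, 0
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ValueError("archive has too many entries")
        for info in infos:
            if info.is_dir():
                continue
            # The remaining total budget also bounds each individual entry.
            limit = min(max_entry_bytes, max_total_bytes - total)
            with zf.open(info) as src:
                data = read_bounded(src, limit)
            total += len(data)
            results[info.filename] = data
    return results
```

`read_bounded` can also wrap an upload stream to enforce the request size limit. A production version would likely write chunks to disk instead of collecting them in memory.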

### 4) High - Sensitive data logging exposed through unsecured log endpoints
- Impact: Extracted text, prompts, and model outputs may be retrievable by unauthorized callers.
- Evidence:
  - `backend/app/models/processing_log.py:31`
  - `backend/app/models/processing_log.py:32`
  - `backend/app/services/routing_pipeline.py:803`
  - `backend/app/services/routing_pipeline.py:814`
  - `backend/app/worker/tasks.py:479`
  - `backend/app/schemas/processing_logs.py:21`
  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
  - Require admin authorization for log endpoints.
  - Remove or redact sensitive payloads from logs.
  - Reduce retention for operational logs that may include sensitive context.
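
As an illustration of the redaction recommendation, the sketch below replaces sensitive fields in a log payload before it is persisted. The field names in `SENSITIVE_KEYS` are guesses at what the processing logs contain, and `redact_payload` is a hypothetical helper.

```python
from typing import Any

# Hypothetical field names; align these with the actual processing-log schema.
SENSITIVE_KEYS = frozenset({"prompt", "response", "extracted_text", "api_key"})
REDACTED = "[REDACTED]"


def redact_payload(payload: Any, sensitive=SENSITIVE_KEYS) -> Any:
    """Return a copy of the payload with sensitive values replaced, recursing into containers."""
    if isinstance(payload, dict):
        return {
            key: REDACTED
            if str(key).lower() in sensitive
            else redact_payload(value, sensitive)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact_payload(item, sensitive) for item in payload]
    # Scalars under non-sensitive keys are passed through unchanged.
    return payload
```

Redacting at write time, rather than when logs are read, keeps sensitive text out of storage and backups as well as out of API responses.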

### 5) High - Internal services exposed with weak default posture in Docker Compose
- Impact: Exposed Redis/Postgres/Typesense can enable data compromise and queue abuse.
- Evidence:
  - `docker-compose.yml:5`
  - `docker-compose.yml:6`
  - `docker-compose.yml:9`
  - `docker-compose.yml:21`
  - `docker-compose.yml:29`
  - `docker-compose.yml:32`
  - `docker-compose.yml:68`
  - `backend/app/worker/queue.py:15`
  - `backend/app/core/config.py:34`
- Recommendation:
  - Remove unnecessary host port exposure for internal services.
  - Use strong credentials and segment the network with ACLs.
  - Enable authentication and transport protections for stateful services.

### 6) Medium - Plaintext secrets and weak defaults in configuration paths
- Impact: Credentials and API keys can be exposed from source or storage.
- Evidence:
  - `backend/app/services/app_settings.py:129`
  - `backend/app/services/app_settings.py:257`
  - `backend/app/services/app_settings.py:667`
  - `backend/app/core/config.py:17`
  - `backend/app/core/config.py:34`
  - `backend/.env.example:15`
- Recommendation:
  - Use managed secrets storage and encryption at rest.
  - Remove default credentials.
  - Rotate exposed and default keys/credentials.
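
One way to remove default credentials is to fail at startup when a secret is missing or weak, instead of falling back to a built-in value. A minimal sketch, in which `require_secret`, `WEAK_VALUES`, and the minimum length are all assumptions:

```python
import os
from typing import Mapping, Optional

# Illustrative placeholder values that should never be accepted as secrets.
WEAK_VALUES = {"", "changeme", "password", "secret", "postgres", "dev"}


def require_secret(
    name: str,
    env: Optional[Mapping[str, str]] = None,
    min_length: int = 16,
) -> str:
    """Return the named secret, or raise RuntimeError if it is missing or weak."""
    source = os.environ if env is None else env
    value = source.get(name, "")
    if value.strip().lower() in WEAK_VALUES:
        raise RuntimeError(f"{name} is missing or set to a known default")
    if len(value) < min_length:
        raise RuntimeError(f"{name} is shorter than {min_length} characters")
    return value
```

This covers only the loading side. Encryption at rest and rotation still depend on the secrets manager or storage layer in use.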

### 7) Low - Minimal HTTP hardening headers and overly broad CORS configuration
- Impact: Increased browser-side attack surface, especially once authentication is introduced.
- Evidence:
  - `backend/app/main.py:23`
  - `backend/app/main.py:25`
  - `backend/app/main.py:26`
  - `backend/app/main.py:27`
- Recommendation:
  - Add standard security headers middleware.
  - Constrain allowed methods and headers to actual application needs.
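
A minimal sketch of security-headers middleware, assuming the backend is an ASGI application (the audit does not name the framework). `SecurityHeadersMiddleware` and the header set are illustrative; the right values depend on how the frontend is served.

```python
# Illustrative baseline headers; extend with a Content-Security-Policy as needed.
SECURITY_HEADERS = [
    (b"x-content-type-options", b"nosniff"),
    (b"x-frame-options", b"DENY"),
    (b"referrer-policy", b"no-referrer"),
]


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that appends security headers to HTTP responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                existing = {name.lower() for name, _ in headers}
                # Do not override headers the application already set.
                headers.extend(
                    (name, value)
                    for name, value in SECURITY_HEADERS
                    if name not in existing
                )
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Wrapping the application object at the outermost layer means error responses generated by inner middleware receive the headers too.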

### 8) Low - Containers appear to run as root by default
- Impact: A compromise inside a container has a larger blast radius when the process runs as root.
- Evidence:
  - `backend/Dockerfile:1`
  - `backend/Dockerfile:17`
  - `frontend/Dockerfile:1`
  - `frontend/Dockerfile:16`
- Recommendation:
  - Run containers as non-root users.
  - Drop unnecessary Linux capabilities.

## Residual Risk and Assumptions
- This audit assumes services may be reachable beyond a strictly isolated localhost-only environment.
- If an external auth proxy is enforced upstream, the severity of the unauthenticated routes is reduced, but the risk is not eliminated unless the backend also enforces that trust boundary itself.
- Dependency CVE posture was not exhaustively enumerated in this static pass.

## Priority Remediation Order
1. Enforce authentication and authorization across API routes.
2. Lock down settings mutation paths, especially model provider endpoint configuration.
3. Add strict upload/extraction resource limits.
4. Remove sensitive logging and protect log APIs.
5. Harden Docker/network exposure and secrets management.