docs: refresh security production readiness report

2026-03-01 18:03:45 -03:00
parent 74d91eb4b1
commit aba320b617
1 changed files with 71 additions and 122 deletions
@@ -2,148 +2,97 @@

 Date: 2026-03-01
 Repository: /Users/bedas/Developer/GitHub/dcm
-Review type: Static code and configuration review (no runtime penetration testing)
+Assessment type: Static code/configuration review with local security-test execution
+
+## Verdict
+Not production ready.
+
+Reason: one blocking, code-level security issue was found.

 ## Scope
- Backend API and worker: `backend/app`
- Frontend API client/auth transport: `frontend/src`
- Compose and environment defaults: `docker-compose.yml`, `.env`
+- Backend API and worker code in `backend/app`
+- Frontend auth/API client code in `frontend/src`
+- Runtime/deployment configuration in `.env` and `docker-compose.yml`

-## Method and Limits
- Reviewed source and configuration files in the current checkout.
- Verified findings with direct file evidence.
- Did not run dynamic security testing, dependency CVE scanning, or infrastructure perimeter testing.
+## Validation Commands And Outcomes
+- `command -v git` -> pass
+- `git rev-parse --is-inside-work-tree` -> pass (`true`)
+- `git status --short` -> clean working tree before analysis
+- `/Users/bedas/Developer/Python/global_venv/bin/python backend/tests/test_security_controls.py` -> pass (24 tests)
+- `/Users/bedas/Developer/Python/global_venv/bin/python backend/tests/test_app_settings_provider_resilience.py` -> pass (6 tests)
+- `/Users/bedas/Developer/Python/global_venv/bin/python backend/tests/test_processing_log_retention_settings.py` -> pass (5 tests)
+- `/Users/bedas/Developer/Python/global_venv/bin/python backend/tests/test_upload_request_size_middleware.py` -> failed to run (import error in test setup)
+- `/Users/bedas/Developer/Python/global_venv/bin/python -m pytest ...` -> not runnable in current venv (`No module named pytest`)

-## Confirmed Product Security Findings
+## Blocking Security Findings

-### Critical
-
-1. Browser-exposed shared bearer token path (`VITE_API_TOKEN` fallback)
- Severity: Critical
- Why this is a product issue: The frontend code supports a build-time token fallback and injects it into all API requests. This creates a shared credential model in browser code.
- Impact: Any user with browser access can recover and reuse the token, collapsing auth boundaries and auditability.
- Exploit path: Open app -> inspect runtime/bundle or intercepted request -> replay bearer token against protected API endpoints.
+### High: No brute-force protection on authentication login
+- Severity: High (blocking)
+- Why this is blocking: `/api/v1/auth/login` accepts unlimited credential attempts with no per-IP or per-username throttling, no lockout, and no backoff. This leaves credential stuffing and password guessing defenses incomplete for production.
+- Impact: online account takeover risk increases substantially when passwords are weak, reused, leaked, or defaulted.
+- Exploit path: repeated automated POST requests to `/api/v1/auth/login` until valid credentials are found.
 - Evidence:
-  - `frontend/src/lib/api.ts:39`
-  - `frontend/src/lib/api.ts:98`
-  - `frontend/src/lib/api.ts:111`
-  - `frontend/src/lib/api.ts:155`
-  - `docker-compose.yml:123`
-  - `backend/app/api/router.py:25`
-  - `backend/app/api/router.py:37`
- Production recommendation:
-  - Remove browser-side static token fallback.
-  - Use per-user server-issued auth (session or short-lived JWT) with role-bound authorization.
+  - `backend/app/api/routes_auth.py:34` defines login endpoint
+  - `backend/app/api/routes_auth.py:42` to `backend/app/api/routes_auth.py:51` performs auth check and returns 401 only
+  - `backend/app/api/routes_documents.py:75` to `backend/app/api/routes_documents.py:95` shows rate limiting exists but is applied only to content export, not login
+  - `backend/app/services/rate_limiter.py:16` to `backend/app/services/rate_limiter.py:42` contains reusable limiter logic currently not used by auth routes
+- Required remediation:
+  - add login throttling keyed by username and source IP
+  - add escalating delay or temporary lockout on repeated failures
+  - return a stable error message and status on throttled attempts
+  - log and monitor failed auth attempt rates

-### High
+## Additional Code-Level Findings (Non-blocking)

-1. CORS policy is effectively any HTTP/HTTPS origin, with credentials enabled
- Severity: High
- Why this is a product issue: CORS middleware enables `allow_origin_regex` that matches broad web origins and sets `allow_credentials=True`.
- Impact: If credentials are present, cross-origin access risk increases and token abuse becomes easier from arbitrary origins.
- Exploit path: Malicious origin performs cross-origin requests with available credentials and can read API responses under permissive CORS policy.
- Evidence:
-  - `backend/app/main.py:21`
-  - `backend/app/main.py:41`
-  - `backend/app/main.py:42`
-  - `backend/app/main.py:44`
- Production recommendation:
-  - Replace regex-based broad origin acceptance with explicit trusted origin allowlist.
-  - Keep `allow_credentials=False` unless strictly required for cookie-based flows.
-
-### Medium
-
-1. Sensitive processing content is persisted in logs by default
+### Medium: Non-deterministic frontend dependency versioning
 - Severity: Medium
- Why this is a product issue: Pipeline logging records OCR text, extraction text, prompts, and LLM outputs into persistent processing logs.
- Impact: Increased confidentiality risk and larger data-retention blast radius if logs are queried or exfiltrated.
- Exploit path: Access to admin log endpoints or database allows retrieval of sensitive operational content.
+- Risk: `lucide-react` is set to `latest`, making builds time-dependent and increasing supply-chain unpredictability.
 - Evidence:
-  - `backend/app/worker/tasks.py:619`
-  - `backend/app/worker/tasks.py:638`
-  - `backend/app/services/routing_pipeline.py:789`
-  - `backend/app/services/routing_pipeline.py:802`
-  - `backend/app/services/routing_pipeline.py:814`
-  - `backend/app/core/config.py:45`
- Production recommendation:
-  - Default to metadata-only logs.
-  - Disable persistent storage of prompt/response/raw extracted text unless temporary debug mode is explicitly enabled with strict TTL.
+  - `frontend/package.json:13`
+- Recommendation:
+  - pin an explicit semver version and update intentionally via dependency review workflow

-2. Markdown export endpoint is unbounded and memory-amplifiable
- Severity: Medium
- Why this is a product issue: Export loads all matching documents and builds ZIP in-memory with `BytesIO`, without hard limits on selection size.
- Impact: Authenticated users can trigger high memory use and service degradation.
- Exploit path: Repeated wide `path_prefix` exports cause large in-memory archive construction.
+### Low: One security middleware regression test is currently not executable
+- Severity: Low
+- Risk: reduced confidence in continued enforcement of upload middleware behavior.
 - Evidence:
-  - `backend/app/api/routes_documents.py:402`
-  - `backend/app/api/routes_documents.py:412`
-  - `backend/app/api/routes_documents.py:416`
-  - `backend/app/api/routes_documents.py:418`
-  - `backend/app/api/routes_documents.py:421`
-  - `backend/app/api/routes_documents.py:425`
- Production recommendation:
-  - Enforce max export document count and total bytes.
-  - Stream archive generation to temp files.
-  - Add endpoint rate limiting.
+  - `backend/tests/test_upload_request_size_middleware.py` currently errors at import time when run directly in this environment
+- Recommendation:
+  - fix test stubs/import assumptions so this test is runnable in CI and local developer environments

-## Risks Requiring Product Decision or Further Verification
+## MUST KNOW User-Dependent Risks (Not Blocking Per Request)

-1. Authorization model appears role-based without per-document ownership boundaries
- Evidence:
-  - `backend/app/models/document.py:29`
-  - `backend/app/api/router.py:19`
-  - `backend/app/api/router.py:31`
- Question: Is this intentionally single-operator, or should production support multi-user/tenant data isolation?
+These are deployment/operator-controlled and therefore reported as must-know, not blocking findings.

-2. Worker startup command uses raw Redis URL string and bypasses in-code URL security validator at startup
- Evidence:
-  - `docker-compose.yml:81`
-  - `backend/app/worker/queue.py:15`
- Question: Should worker startup also enforce `validate_redis_url_security` before consuming jobs?
+1. Environment is set to development posture with broad host binding.
+- Evidence: `.env:1` (`APP_ENV=development`), `.env:2` (`HOST_BIND_IP=0.0.0.0`)
+- Risk: exposed surface and relaxed defaults if used outside isolated development network.

-3. Provider key encryption uses custom cryptographic construction
- Evidence:
-  - `backend/app/services/app_settings.py:131`
-  - `backend/app/services/app_settings.py:154`
-  - `backend/app/services/app_settings.py:176`
- Question: Are compliance or internal policy requirements demanding standardized AEAD primitives from vetted cryptography libraries?
+2. Bootstrap credentials appear placeholder/weak and must be replaced before live use.
+- Evidence: `.env:15`, `.env:17`
+- Risk: straightforward credential compromise if unchanged.

-## User-Managed Configuration Observations (Not Product Defects)
+3. Sensitive log text persistence is enabled in current `.env`.
+- Evidence: `.env:22`, `.env:23`; code path at `backend/app/services/processing_logs.py:47` and `backend/app/services/processing_logs.py:128`
+- Risk: OCR/model prompts, responses, and payload text can be stored and later exposed to admins or backups.

-These are deployment/operator choices and should be tracked separately from code defects.
+4. Provider outbound network controls are permissive in current `.env`.
+- Evidence: `.env:29` to `.env:31`
+- Risk: admin-configured provider endpoints can target non-HTTPS/private hosts if account compromise occurs.

-1. Development-mode posture in local `.env`
- Evidence:
-  - `.env:1`
-  - `.env:3`
- Notes: `APP_ENV=development` and anonymous development access are enabled.
+5. Public base URL is HTTP and CORS includes LAN origin.
+- Evidence: `.env:33`, `.env:34`
+- Risk: transport security and origin trust are weaker than production HTTPS allowlist posture.

-2. Local `.env` includes placeholder shared API token values
- Evidence:
-  - `.env:15`
-  - `.env:16`
-  - `.env:31`
- Notes: If replaced with real values and reused, this increases operational risk. This is operator responsibility.
+6. Redis transport is non-TLS in current env posture.
+- Evidence: `.env:10` to `.env:12`
+- Risk: acceptable for isolated local stack, unsafe for untrusted networks.

-3. Compose defaults allow permissive provider egress controls
- Evidence:
-  - `docker-compose.yml:51`
-  - `docker-compose.yml:52`
-  - `.env:21`
-  - `.env:22`
-  - `.env:23`
- Notes: Allowing HTTP/private-network provider targets is a deployment policy choice.
+7. Compose defaults are development-first for several sensitive controls.
+- Evidence: `docker-compose.yml:40`, `docker-compose.yml:52`, `docker-compose.yml:53`, `docker-compose.yml:60`, `docker-compose.yml:68`
+- Risk: if overrides are missed, deployment may run with weaker network and provider policies.

-4. Internal service transport defaults are plaintext in local stack
- Evidence:
-  - `docker-compose.yml:56`
-  - `.env:11`
- Notes: `http`/`redis://` may be acceptable for isolated local dev, but not for exposed production networks.
+## Production Decision
+Current state is not production ready due to the blocking auth brute-force gap.

-## Production Readiness Priority Order
-
-1. Remove browser static token model and adopt per-user auth.
-2. Tighten CORS to explicit trusted origins only.
-3. Reduce persistent sensitive logging to metadata by default.
-4. Add hard limits and streaming behavior for markdown export.
-5. Resolve product decisions on tenant isolation, worker Redis security enforcement, and cryptography standardization.
+After fixing the blocking issue, production readiness still depends on secure operator configuration of `.env` and runtime network perimeter.