Harden auth, redaction, upload size checks, and compose token requirements

2026-02-21 13:48:55 -03:00
parent 5792586a90
commit 3cbad053cc
21 changed files with 1168 additions and 85 deletions


@@ -6,7 +6,7 @@ This directory contains technical documentation for DMS.
- `../README.md` - project overview, setup, and quick operations
- `architecture-overview.md` - backend, frontend, and infrastructure architecture
- `api-contract.md` - API endpoint contract grouped by route module, including settings and processing-log trim defaults
- `api-contract.md` - API endpoint contract grouped by route module, including token auth roles, upload limits, and settings or processing-log security constraints
- `data-model-reference.md` - database entity definitions and lifecycle states
- `operations-and-configuration.md` - runtime operations, ports, volumes, and persisted settings configuration
- `operations-and-configuration.md` - runtime operations, hardened compose defaults, security environment variables, and persisted settings configuration
- `frontend-design-foundation.md` - frontend visual system, tokens, UI implementation rules, processing-log timeline behavior, and settings helper-copy guidance


@@ -10,6 +10,17 @@ Primary implementation modules:
- `backend/app/api/routes_processing_logs.py`
- `backend/app/api/routes_settings.py`
## Authentication And Authorization
- Protected endpoints require `Authorization: Bearer <token>`.
- `ADMIN_API_TOKEN` is required for all privileged access and acts as the fail-closed root credential.
- `USER_API_TOKEN` is optional and, when configured, grants access to document and search endpoints only.
- Authorization matrix:
- `documents/*`: `admin` or `user`
- `search/*`: `admin` or `user`
- `settings/*`: `admin` only
- `processing/logs/*`: `admin` only
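The matrix above can be sketched as a small role resolver. This is a minimal illustration rather than the code in this commit: `resolve_role`, `is_allowed`, and `ALLOWED_ROLES` are hypothetical names, and it assumes that an unset `ADMIN_API_TOKEN` disables the user token as well.

```python
import hmac
from typing import Optional

# Hypothetical table mirroring the authorization matrix in this document.
ALLOWED_ROLES = {
    "documents": {"admin", "user"},
    "search": {"admin", "user"},
    "settings": {"admin"},
    "processing/logs": {"admin"},
}


def resolve_role(authorization: Optional[str], admin_token: str, user_token: str = "") -> Optional[str]:
    """Map an Authorization header value to a role, or None when rejected."""
    if not admin_token:
        # Assumed fail-closed behavior: no admin token means nobody authenticates.
        return None
    if not authorization:
        return None
    scheme, _, presented = authorization.partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return None
    # Constant-time comparison avoids leaking token contents through timing.
    if hmac.compare_digest(presented.encode(), admin_token.encode()):
        return "admin"
    if user_token and hmac.compare_digest(presented.encode(), user_token.encode()):
        return "user"
    return None


def is_allowed(role: Optional[str], route_group: str) -> bool:
    """Unknown route groups and unauthenticated callers are denied."""
    return role is not None and role in ALLOWED_ROLES.get(route_group, set())
```

In a sketch like this, a rejected header and a missing admin token both resolve to `None`, so every deny path goes through the same branch.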
## Health
- `GET /health`
@@ -18,6 +29,8 @@ Primary implementation modules:
## Documents
- Access: admin or user token required
### Collection and metadata helpers
- `GET /documents`
@@ -76,9 +89,13 @@ Primary implementation modules:
- `ask`: returns `conflicts` if duplicate checksum is detected
- `replace`: creates new document linked to replaced document id
- `duplicate`: creates additional document record
- request rejected with `411` when `Content-Length` is missing
- request rejected with `413` when file count, per-file size, or total request size exceeds configured limits
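The two rejection rules above could be checked roughly as follows. This is a sketch under stated assumptions: `upload_rejection_status` is a hypothetical helper, the constants copy the `Settings` defaults documented for this commit (`max_upload_files_per_request`, `max_upload_file_size_bytes`, `max_upload_request_size_bytes`), and the `400` for a malformed header is an assumption the contract does not state.

```python
from typing import Optional, Sequence

MAX_FILES = 50
MAX_FILE_BYTES = 26_214_400       # 25 MiB per file
MAX_REQUEST_BYTES = 104_857_600   # 100 MiB per request


def upload_rejection_status(content_length: Optional[str], file_sizes: Sequence[int]) -> Optional[int]:
    """Return the HTTP status to reject with, or None when the upload is acceptable."""
    if content_length is None:
        return 411  # Length Required
    try:
        declared = int(content_length)
    except ValueError:
        return 400  # assumption: malformed header treated as a bad request
    if declared < 0:
        return 400
    if declared > MAX_REQUEST_BYTES:
        return 413  # declared request body is too large
    if len(file_sizes) > MAX_FILES:
        return 413  # too many files in one request
    if any(size > MAX_FILE_BYTES for size in file_sizes):
        return 413  # at least one file is too large
    if sum(file_sizes) > MAX_REQUEST_BYTES:
        return 413  # actual payload exceeds the request limit
    return None
```

Checking the declared `Content-Length` first lets an oversized request be refused before any body bytes are read.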
## Search
- Access: admin or user token required
- `GET /search`
- Query: `query` (min length 2), `offset`, `limit`, `include_trashed`, `only_trashed`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
- Response model: `SearchResponse`
@@ -86,23 +103,31 @@ Primary implementation modules:
## Processing Logs
- Access: admin token required
- `GET /processing/logs`
- Query: `offset`, `limit`, `document_id`
- Response model: `ProcessingLogListResponse`
- `limit` is capped by runtime configuration
- sensitive fields are redacted in API responses
- `POST /processing/logs/trim`
- Query: optional `keep_document_sessions`, `keep_unbound_entries`
- Behavior: omitted query values fall back to persisted `/settings.processing_log_retention`
- query values are capped by runtime retention limits
- Response: trim counters
- `POST /processing/logs/clear`
- Response: clear counters
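The trim fallback and capping rules above reduce to one small function per parameter. This is an illustrative sketch: `effective_keep` is a hypothetical name, and capping the persisted value as well as the query value is an assumption.

```python
from typing import Optional


def effective_keep(requested: Optional[int], persisted: int, cap: int) -> int:
    """Pick the retention count for one trim parameter.

    requested: query value, or None when the caller omitted it
    persisted: value stored under processing_log_retention in settings
    cap:       runtime retention limit
    """
    value = persisted if requested is None else requested
    # Clamp into [0, cap] so a caller cannot keep more than the runtime limit.
    return max(0, min(value, cap))
```

For example, with a cap of 20, an omitted query value and a persisted value of 10 yields 10, while a requested 500 is reduced to 20.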
## Settings
- Access: admin token required
- `GET /settings`
- Response model: `AppSettingsResponse`
- `PATCH /settings`
- Body model: `AppSettingsUpdateRequest`
- Response model: `AppSettingsResponse`
- rejects invalid provider base URLs with `400` when scheme, allowlist, or network safety checks fail
- `POST /settings/reset`
- Response model: `AppSettingsResponse`
- `PATCH /settings/handwriting`


@@ -3,12 +3,12 @@
## Runtime Services
`docker-compose.yml` defines the runtime stack:
- `db` (Postgres 16, port `5432`)
- `redis` (Redis 7, port `6379`)
- `typesense` (Typesense 29, port `8108`)
- `api` (FastAPI backend, port `8000`)
- `db` (Postgres 16, localhost-bound port `5432`)
- `redis` (Redis 7, localhost-bound port `6379`)
- `typesense` (Typesense 29, localhost-bound port `8108`)
- `api` (FastAPI backend, localhost-bound port `8000`)
- `worker` (RQ background worker)
- `frontend` (Vite UI, port `5173`)
- `frontend` (Vite UI, localhost-bound port `5173`)
## Named Volumes
@@ -44,6 +44,15 @@ Tail logs:
docker compose logs -f
```
Before running compose, provide explicit API tokens in your shell or project `.env` file:
```bash
export ADMIN_API_TOKEN="<random-admin-token>"
export USER_API_TOKEN="<random-user-token>"
```
Compose now fails fast if either token variable is missing.
## Backend Configuration
Settings source:
@@ -55,8 +64,13 @@ Key environment variables used by `api` and `worker` in compose:
- `DATABASE_URL`
- `REDIS_URL`
- `STORAGE_ROOT`
- `ADMIN_API_TOKEN`
- `USER_API_TOKEN`
- `PUBLIC_BASE_URL`
- `CORS_ORIGINS` (API service)
- `PROVIDER_BASE_URL_ALLOWLIST`
- `PROVIDER_BASE_URL_ALLOW_HTTP`
- `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK`
- `TYPESENSE_PROTOCOL`
- `TYPESENSE_HOST`
- `TYPESENSE_PORT`
@@ -65,9 +79,17 @@ Key environment variables used by `api` and `worker` in compose:
Selected defaults from `Settings` (`backend/app/core/config.py`):
- `upload_chunk_size = 4194304`
- `max_upload_files_per_request = 50`
- `max_upload_file_size_bytes = 26214400`
- `max_upload_request_size_bytes = 104857600`
- `max_zip_members = 250`
- `max_zip_depth = 2`
- `max_zip_member_uncompressed_bytes = 26214400`
- `max_zip_total_uncompressed_bytes = 157286400`
- `max_zip_compression_ratio = 120.0`
- `max_text_length = 500000`
- `processing_log_max_document_sessions = 20`
- `processing_log_max_unbound_entries = 400`
- `default_openai_model = "gpt-4.1-mini"`
- `default_openai_timeout_seconds = 45`
- `default_summary_model = "gpt-4.1-mini"`
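The ZIP limits among these defaults could be enforced along the following lines. This is a hedged sketch, not the guard in this commit: `zip_violation` is a hypothetical helper, it relies on the sizes declared in the archive headers, and it does not cover `max_zip_depth`.

```python
import zipfile
from typing import Optional

MAX_ZIP_MEMBERS = 250
MAX_MEMBER_BYTES = 26_214_400     # max_zip_member_uncompressed_bytes
MAX_TOTAL_BYTES = 157_286_400     # max_zip_total_uncompressed_bytes
MAX_RATIO = 120.0                 # max_zip_compression_ratio


def zip_violation(archive: zipfile.ZipFile) -> Optional[str]:
    """Return a short reason when the archive breaks a limit, else None."""
    members = [info for info in archive.infolist() if not info.is_dir()]
    if len(members) > MAX_ZIP_MEMBERS:
        return "too many members"
    total = 0
    for info in members:
        if info.file_size > MAX_MEMBER_BYTES:
            return "member too large"
        # Header sizes can be forged; a real guard should also count bytes
        # while streaming each member out of the archive.
        if info.compress_size > 0 and info.file_size / info.compress_size > MAX_RATIO:
            return "compression ratio too high"
        total += info.file_size
        if total > MAX_TOTAL_BYTES:
            return "total uncompressed size too large"
    return None
```

The ratio check is what catches a typical decompression bomb, since such members stay small on disk but declare a very large uncompressed size.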
@@ -105,6 +127,25 @@ Settings include:
Retention settings are used by worker cleanup and by `POST /api/v1/processing/logs/trim` when trim query values are not provided.
## Security Controls
- Privileged APIs are token-gated with bearer auth:
  - `documents` and `search` endpoints: user token or admin token
- `settings` and `processing/logs` endpoints: admin token only
- Authentication fails closed when `ADMIN_API_TOKEN` is not configured.
- Provider base URLs are validated on settings updates and before outbound model calls:
- allowlist enforcement (`PROVIDER_BASE_URL_ALLOWLIST`)
- scheme restrictions (`https` by default)
- local/private-network blocking and per-request DNS revalidation checks for outbound runtime calls
- Upload and archive safety guards are enforced:
- multipart upload requires `Content-Length` and enforces file-count, per-file size, and total request size limits
- ZIP member count, per-member uncompressed size, total decompressed size, and compression-ratio guards
- Processing logs redact sensitive payload and text fields, and trim endpoints enforce retention caps from runtime config.
- Compose hardening defaults:
  - host ports bind to `127.0.0.1` unless the `HOST_BIND_IP` override is set
- `api`, `worker`, and `frontend` drop all Linux capabilities and set `no-new-privileges`
- backend and frontend containers run as non-root users by default
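The provider base URL checks listed above (scheme, allowlist, private-network blocking) could be sketched as follows. Everything here is an assumption for illustration: `provider_url_error`, `resolve_host`, and `is_blocked_address` are hypothetical names, the allowlist is treated as a set of hostnames, and an empty allowlist is treated as "no host restriction".

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlsplit


def is_blocked_address(address: str) -> bool:
    """True for loopback, private, link-local, and similar non-public addresses."""
    ip = ipaddress.ip_address(address.split("%", 1)[0])  # drop any IPv6 scope id
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def resolve_host(host: str) -> List[str]:
    """Resolve a hostname to address strings; an empty list means resolution failed."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return [info[4][0] for info in infos]


def provider_url_error(
    url: str,
    allowlist: Iterable[str],
    allow_http: bool = False,
    allow_private_network: bool = False,
    resolve: Optional[Callable[[str], List[str]]] = None,
) -> Optional[str]:
    """Return a rejection reason, or None when the URL passes every check."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return "malformed url"
    allowed_schemes = {"https", "http"} if allow_http else {"https"}
    if parts.scheme not in allowed_schemes:
        return "scheme not allowed"
    if not host:
        return "missing host"
    allowed_hosts = {entry.lower() for entry in allowlist}
    if allowed_hosts and host not in allowed_hosts:
        return "host not in allowlist"
    if not allow_private_network:
        addresses = (resolve or resolve_host)(host)
        # Unresolvable hosts are rejected too, which keeps the check fail-closed.
        if not addresses or any(is_blocked_address(a) for a in addresses):
            return "host resolves to a local or private address"
    return None
```

Running the same function again immediately before each outbound call is one way to approximate per-request DNS revalidation, since a hostname that passed at settings time can later resolve to a private address.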
## Validation Checklist
After operational or configuration changes, verify: