# Architecture Overview

## System Topology

DMS runs as a multi-service application defined in `docker-compose.yml`:

- `frontend` serves the React UI on port `5173`
- `api` serves FastAPI on port `8000`
- `worker` executes asynchronous extraction and indexing jobs
- `db` provides PostgreSQL persistence on the internal compose network
- `redis` backs queueing on the internal compose network
- `typesense` stores search index and vector-adjacent metadata on the internal compose network

## Backend Architecture

Backend source root: `backend/app/`

Main boundaries:

- `api/` route handlers and HTTP contract
- `services/` domain logic (authentication, storage, extraction, routing, settings, processing logs, Typesense)
- `db/` SQLAlchemy base, engine, and session lifecycle
- `models/` persistence entities (`AppUser`, `AuthSession`, `Document`, `ProcessingLogEntry`)
- `schemas/` Pydantic response and request schemas
- `worker/` RQ queue integration and background processing tasks

Application bootstrap in `backend/app/main.py`:

- mounts routers under `/api/v1`
- configures CORS from settings
- initializes storage, database schema, bootstrap users, settings, and Typesense collection on startup

## Processing Lifecycle

1. Upload starts at `POST /api/v1/documents/upload`.
2. API stores file bytes and inserts document rows with status `queued`.
3. API enqueues `app.worker.tasks.process_document_task` into Redis.
4. Worker extracts content and metadata, handles ZIP expansion, runs OCR and routing suggestions, and writes processing logs.
5. Worker updates database fields, document status, and search index entries.
6. UI polls for documents and processing logs to reflect progress.

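
The lifecycle above can be sketched as a small in-memory simulation. This is an illustration only, not the DMS implementation: a standard-library `queue.Queue` stands in for the Redis-backed RQ queue, a dictionary stands in for PostgreSQL, and the function names `upload`, `run_worker_once`, and `poll` are hypothetical. Only the `queued` status comes from the description above; `processing` and `processed` are assumed names.

```python
"""In-memory sketch of the upload -> queue -> worker -> poll lifecycle."""
from __future__ import annotations

from dataclasses import dataclass, field
from queue import Queue  # stand-in for the Redis-backed RQ queue


@dataclass
class Document:
    id: int
    filename: str
    status: str = "queued"  # the only status named in the lifecycle above
    logs: list[str] = field(default_factory=list)


DOCUMENTS: dict[int, Document] = {}  # stand-in for PostgreSQL rows
JOBS: Queue = Queue()                # stand-in for the Redis queue


def upload(filename: str) -> Document:
    """Steps 1-3: insert the row as `queued`, then enqueue a job for it."""
    doc = Document(id=len(DOCUMENTS) + 1, filename=filename)
    DOCUMENTS[doc.id] = doc
    JOBS.put(doc.id)
    return doc


def run_worker_once() -> None:
    """Steps 4-5: take one job, record a processing log, update the status."""
    doc = DOCUMENTS[JOBS.get_nowait()]
    doc.status = "processing"  # hypothetical status name
    doc.logs.append(f"extracted content from {doc.filename}")
    doc.status = "processed"   # hypothetical status name


def poll(doc_id: int) -> tuple[str, list[str]]:
    """Step 6: what the UI would read on each polling tick."""
    doc = DOCUMENTS[doc_id]
    return doc.status, list(doc.logs)
```

Calling `upload("report.pdf")` followed by `poll(...)` reports `queued` with no logs; after `run_worker_once()` the same poll reports the final status and one log entry. The point of the split is that the API returns as soon as the job is enqueued, and the UI learns about progress only by polling.
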
## Frontend Architecture

Frontend source root: `frontend/src/`

Core structure:

- `App.tsx` orchestrates screen switching, state, polling, and action flows
- `components/` contains upload, filter, grid, viewer, modal, settings, and log panel modules
- `lib/api.ts` centralizes API client calls
- `types.ts` defines typed API contracts used by components
- `design-foundation.css` and `styles.css` define design tokens and global/component styling

Main user flows:

- Login and role-gated navigation (`admin` and `user`)
- Upload and conflict resolution
- Search and filtered document browsing
- Metadata editing and lifecycle actions (trash, restore, delete, reprocess)
- Settings management for providers, tasks, and UI defaults (admin only)
- Processing log review (admin only)

## Persistence and State

Persistent data:

- PostgreSQL stores document metadata and processing logs
- Docker volume-backed storage keeps original files, previews, and settings JSON
- Typesense stores indexed search representations

Transient runtime state:

- Redis queues processing tasks and worker execution state
- frontend local component state drives active filters, selection, and modal flows

Security-sensitive runtime behavior:

- API access is session-based with per-user server-issued bearer tokens and role checks.
- Document and search reads for `user` role are owner-scoped via `owner_user_id`; `admin` can access global scope.
- Redis connection URLs are validated by backend queue helpers with environment-aware auth and TLS policy enforcement.
- Worker startup runs through `python -m app.worker.run_worker`, which validates Redis URL policy before queue consumption.
- Inline preview is limited to safe MIME types and script-capable content is served as attachment-only.
- Archive fan-out processing propagates root and depth lineage metadata and enforces depth and per-root descendant caps.
- Markdown export applies per-user rate limits, hard document-count and total-byte caps, and spool-file streaming.
- Processing logs default to metadata-only persistence, with explicit operator toggles required to store model input/output text.
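
The owner-scoping rule for document and search reads can be illustrated with a minimal sketch. The `User` and `Document` dataclasses and the `visible_documents` helper are hypothetical names introduced here; the real backend presumably applies the equivalent filter in its SQLAlchemy and Typesense queries rather than in Python lists. Returning nothing for an unrecognized role is an assumption, chosen so the sketch fails closed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    role: str  # "admin" or "user", the two roles named in this document


@dataclass(frozen=True)
class Document:
    id: int
    owner_user_id: int


def visible_documents(user: User, documents: list[Document]) -> list[Document]:
    """Return the documents this user is allowed to read."""
    if user.role == "admin":
        return list(documents)  # admin reads at global scope
    if user.role == "user":
        # user reads are restricted to rows they own
        return [d for d in documents if d.owner_user_id == user.id]
    return []  # assumption: unknown roles see nothing (fail closed)
```

Keeping the scope decision in one function means every read path (listing, search, export) can share the same rule instead of re-implementing the role check.
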