Architecture Overview

System Topology

DMS runs as a multi-service application defined in docker-compose.yml:

  • frontend serves the React UI on port 5173
  • api serves FastAPI on port 8000
  • worker executes asynchronous extraction and indexing jobs
  • db provides PostgreSQL persistence on the internal compose network
  • redis backs queueing on the internal compose network
  • typesense stores the search index and vector-adjacent metadata on the internal compose network

Backend Architecture

Backend source root: backend/app/

Main boundaries:

  • api/ route handlers and HTTP contract
  • services/ domain logic (authentication, storage, extraction, routing, settings, processing logs, Typesense)
  • db/ SQLAlchemy base, engine, and session lifecycle
  • models/ persistence entities (AppUser, AuthSession, Document, ProcessingLogEntry)
  • schemas/ Pydantic response and request schemas
  • worker/ RQ queue integration and background processing tasks

Application bootstrap in backend/app/main.py:

  • mounts routers under /api/v1
  • configures CORS from settings
  • initializes storage, database schema, bootstrap users, settings, and Typesense collection on startup

Processing Lifecycle

  1. Upload starts at POST /api/v1/documents/upload.
  2. API stores file bytes and inserts document rows with status queued.
  3. API enqueues app.worker.tasks.process_document_task into Redis.
  4. Worker extracts content and metadata, handles ZIP expansion, runs OCR and routing suggestions, and writes processing logs.
  5. Worker updates database fields, document status, and search index entries.
  6. UI polls for documents and processing logs to reflect progress.
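
The lifecycle above can be sketched as a small in-memory model. This is an illustration only, not the project's code: the dictionary and deque stand in for PostgreSQL and the Redis queue, and every name here (`Document`, `upload`, `run_worker_once`, `poll`) is hypothetical. Only the `queued` status comes from the steps above; `processing`, `processed`, and `error` are assumed names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document row; field names are assumed."""
    doc_id: int
    filename: str
    status: str = "queued"  # the only status name taken from the text above
    logs: list = field(default_factory=list)


# In-memory stand-ins for PostgreSQL (rows) and the Redis queue (job ids).
documents = {}
jobs = deque()


def upload(filename: str) -> Document:
    """Steps 1-3: insert a row with status 'queued' and enqueue a job."""
    doc = Document(doc_id=len(documents) + 1, filename=filename)
    documents[doc.doc_id] = doc
    jobs.append(doc.doc_id)
    return doc


def run_worker_once() -> None:
    """Steps 4-5: take one job, write a processing log, update the status."""
    if not jobs:
        return
    doc = documents[jobs.popleft()]
    doc.status = "processing"  # assumed status name
    try:
        if not doc.filename:
            raise ValueError("empty filename")
        doc.logs.append(f"extracted {doc.filename}")
        doc.status = "processed"  # assumed status name
    except ValueError as exc:
        doc.logs.append(f"failed: {exc}")
        doc.status = "error"  # assumed status name


def poll(doc_id: int):
    """Step 6: the status and log lines a polling UI would read."""
    doc = documents[doc_id]
    return doc.status, list(doc.logs)
```

Calling `upload("invoice.pdf")`, then `run_worker_once()`, then `poll(1)` walks one document through the whole sequence. Because the API only enqueues work and returns, the UI has to poll to observe status changes.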

Frontend Architecture

Frontend source root: frontend/src/

Core structure:

  • App.tsx orchestrates screen switching, state, polling, and action flows
  • components/ contains upload, filter, grid, viewer, modal, settings, and log panel modules
  • lib/api.ts centralizes API client calls
  • types.ts defines typed API contracts used by components
  • design-foundation.css and styles.css define design tokens and global/component styling

Main user flows:

  • Login and role-gated navigation (admin and user)
  • Upload and conflict resolution
  • Search and filtered document browsing
  • Metadata editing and lifecycle actions (trash, restore, delete, reprocess)
  • Settings management for providers, tasks, and UI defaults (admin only)
  • Processing log review (admin only)

Persistence and State

Persistent data:

  • PostgreSQL stores document metadata and processing logs
  • Docker volume-backed storage keeps original files, previews, and settings JSON
  • Typesense stores indexed search representations

Transient runtime state:

  • Redis queues processing tasks and holds worker execution state
  • frontend local component state drives active filters, selection, and modal flows

Security-sensitive runtime behavior:

  • API access is session-based with per-user server-issued bearer tokens and role checks.
  • Document and search reads for the user role are owner-scoped via owner_user_id; the admin role has global scope.
  • Redis connection URLs are validated by backend queue helpers with environment-aware auth and TLS policy enforcement.
  • Worker startup runs through python -m app.worker.run_worker, which validates Redis URL policy before queue consumption.
  • Inline preview is limited to safe MIME types, and script-capable content is served as attachment-only.
  • Archive fan-out processing propagates root and depth lineage metadata and enforces depth and per-root descendant caps.
  • Markdown export applies per-user rate limits, hard document-count and total-byte caps, and spool-file streaming.
  • Processing logs default to metadata-only persistence, with explicit operator toggles required to store model I/O text.
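
As an illustration of the Redis URL policy check mentioned above, here is a minimal sketch using only the standard library. The actual rules live in the backend queue helpers and are not described in this document, so the policy below is an assumption: production requires TLS (`rediss://`) and a password, while other environments accept plain `redis://`. The function name `validate_redis_url` and the `environment` parameter are hypothetical.

```python
from urllib.parse import urlsplit


def validate_redis_url(url: str, environment: str) -> str:
    """Return the URL unchanged if it meets the assumed policy, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("Redis URL must include a host")
    if environment == "production":
        # Assumed policy, not confirmed by the source:
        # production needs TLS and an authenticated connection.
        if parts.scheme != "rediss":
            raise ValueError("production requires TLS (rediss://)")
        if not parts.password:
            raise ValueError("production requires a password")
    return url
```

A worker entry point could call a check like this before it starts consuming jobs, so that a misconfigured URL fails at startup rather than on the first queue operation.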