LedgerDock

LedgerDock is a self-hosted document management system (DMS) for ingesting, processing, organizing, and searching files.

Core Capabilities

  • Drag and drop upload from anywhere in the UI
  • File and folder upload with path preservation
  • Asynchronous extraction and OCR for PDF, images, DOCX, XLSX, TXT, and ZIP
  • Metadata and full-text search
  • Routing suggestions based on previous decisions
  • Original file download and extracted markdown export

Technology Stack

  • Backend: FastAPI, SQLAlchemy, RQ worker (backend/)
  • Frontend: React, Vite, TypeScript (frontend/)
  • Infrastructure: PostgreSQL, Redis, Typesense (docker-compose.yml)

Runtime Services

The default docker compose stack includes:

  • frontend - React UI (http://localhost:5173)
  • api - FastAPI backend (http://localhost:8000, docs at /docs)
  • worker - background processing jobs
  • db - PostgreSQL (internal service network)
  • redis - queue backend (internal service network)
  • typesense - search index (internal service network)

Requirements

  • Docker Engine
  • Docker Compose plugin
  • Internet access for first-time image build

Quick Start

From the repository root:

docker compose up --build -d

The stack needs the following secrets and connection values before its first run. Set them in .env (or export them in your shell) before running the command above:

  • POSTGRES_USER
  • POSTGRES_PASSWORD
  • POSTGRES_DB
  • DATABASE_URL
  • REDIS_PASSWORD
  • REDIS_URL
  • AUTH_BOOTSTRAP_ADMIN_USERNAME
  • AUTH_BOOTSTRAP_ADMIN_PASSWORD
  • AUTH_BOOTSTRAP_USER_USERNAME (optional)
  • AUTH_BOOTSTRAP_USER_PASSWORD (optional)
  • APP_SETTINGS_ENCRYPTION_KEY
  • TYPESENSE_API_KEY

Copy .env.example to .env as a starting point so that no required variable is missed.
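A quick pre-flight check can catch missing values before the first docker compose up. The following is a minimal Python sketch, not a tool shipped with LedgerDock: it assumes .env uses plain KEY=VALUE lines (no multi-line values or variable expansion), and it only inspects the file, so variables exported in your shell are not counted. The required names are copied from the list above; the two optional AUTH_BOOTSTRAP_USER_* entries are left out.

```python
"""Report required LedgerDock variables that are missing or empty in .env.

Sketch only: assumes simple KEY=VALUE lines and does not implement full
dotenv quoting or variable-expansion rules.
"""
from pathlib import Path

REQUIRED = (
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "DATABASE_URL",
    "REDIS_PASSWORD",
    "REDIS_URL",
    "AUTH_BOOTSTRAP_ADMIN_USERNAME",
    "AUTH_BOOTSTRAP_ADMIN_PASSWORD",
    "APP_SETTINGS_ENCRYPTION_KEY",
    "TYPESENSE_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_required(values: dict[str, str]) -> list[str]:
    """Return the required names that are absent or set to an empty value."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory.")
    else:
        problems = missing_required(parse_env(env_path.read_text(encoding="utf-8")))
        if problems:
            print("Missing or empty:", ", ".join(problems))
        else:
            print("All required variables are set.")
```

Run it from the repository root (for example python check_env.py; the file name is arbitrary).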

Open:

  • Frontend: http://localhost:5173
  • API docs: http://localhost:8000/docs
  • Health: http://localhost:8000/api/v1/health

Sign in on the frontend login screen with the bootstrap admin credentials (AUTH_BOOTSTRAP_ADMIN_USERNAME and AUTH_BOOTSTRAP_ADMIN_PASSWORD).

Stop the stack:

docker compose down

Security Must-Know Before Real User Deployment

The items below translate the "MUST KNOW User-Dependent Risks" section of REPORT.md into explicit operator actions.

High: Development-first defaults can be promoted to production

Prevention:

  • Set APP_ENV=production.
  • Set PROVIDER_BASE_URL_ALLOW_HTTP=false.
  • Set PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false.
  • Set a strict non-empty PROVIDER_BASE_URL_ALLOWLIST for approved provider hosts only.
  • Set PUBLIC_BASE_URL to HTTPS.
  • Restrict CORS_ORIGINS to exact production frontend origins.
  • Use REDIS_URL with rediss://.
  • Set REDIS_SECURITY_MODE=strict.
  • Set REDIS_TLS_MODE=required.
  • Keep HOST_BIND_IP=127.0.0.1 and expose services only through an HTTPS reverse proxy.

Remedy:

  • Immediately correct the values above and redeploy api and worker (docker compose up -d api worker).
  • Rotate AUTH_BOOTSTRAP_* credentials, provider API keys, and Redis credentials if insecure values were used in a reachable environment.
  • Re-check .env.example and docker-compose.yml before each production promotion.
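The production settings listed for this risk can be audited mechanically before each promotion. This Python sketch is hypothetical (LedgerDock does not ship it) and takes a plain dict such as os.environ or the output of a .env parser. Two details are assumptions: CORS_ORIGINS is treated as a comma-separated list, and any origin that is not an exact https:// origin is flagged, on the premise that production traffic arrives through the HTTPS reverse proxy.

```python
"""Flag production-unsafe LedgerDock settings (hypothetical audit sketch)."""
from urllib.parse import urlparse


def audit_production(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []

    def expect(name: str, wanted: str) -> None:
        # Compare case-insensitively after trimming whitespace.
        actual = env.get(name, "")
        if actual.strip().lower() != wanted:
            problems.append(f"{name} should be {wanted!r}, got {actual!r}")

    expect("APP_ENV", "production")
    expect("PROVIDER_BASE_URL_ALLOW_HTTP", "false")
    expect("PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK", "false")
    expect("REDIS_SECURITY_MODE", "strict")
    expect("REDIS_TLS_MODE", "required")
    expect("HOST_BIND_IP", "127.0.0.1")

    if not env.get("PROVIDER_BASE_URL_ALLOWLIST", "").strip():
        problems.append("PROVIDER_BASE_URL_ALLOWLIST must list approved provider hosts")
    if urlparse(env.get("PUBLIC_BASE_URL", "")).scheme != "https":
        problems.append("PUBLIC_BASE_URL must use https://")
    if urlparse(env.get("REDIS_URL", "")).scheme != "rediss":
        problems.append("REDIS_URL must use rediss:// (TLS)")

    # Assumption: CORS_ORIGINS is a comma-separated list of origins.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins or any(o == "*" or not o.startswith("https://") for o in origins):
        problems.append("CORS_ORIGINS must list exact https:// frontend origins")
    return problems
```

audit_production(dict(os.environ)) returns an empty list when every check passes; otherwise each entry names a value to fix before redeploying api and worker.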

Medium: Login throttle IP identity depends on proxy trust model

Current behavior:

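  • The login throttle keys its identity on request.client.host directly. Behind a reverse proxy, that value can be the proxy's address, so failed logins from different users may count against one shared identity.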

Prevention:

  • Deploy so that the backend receives real client IP addresses instead of seeing all traffic arrive from a single proxy source IP.
  • Validate lockout behavior with multiple client IPs before going live behind a proxy.

Remedy:

  • If lockouts affect many users at once, temporarily increase AUTH_LOGIN_FAILURE_LIMIT and tune lockout timings to reduce impact while mitigation is in progress.
  • Update network and proxy topology so client IP identity is preserved for the backend, then re-run lockout validation tests.
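One common way to preserve client identity behind a proxy is to trust X-Forwarded-For only when the direct peer is a known proxy. The Python sketch below illustrates that trust model; the function name and the TRUSTED_PROXIES ranges are hypothetical and are not LedgerDock's implementation. If the API runs under Uvicorn, its --proxy-headers and --forwarded-allow-ips options apply a similar rule before request.client.host is populated.

```python
"""Resolve the real client IP behind trusted reverse proxies (hypothetical sketch).

X-Forwarded-For is honored only when the direct peer is a trusted proxy;
otherwise the header is ignored, because any client can forge it.
"""
import ipaddress
from typing import Optional

# Hypothetical proxy ranges; replace with your actual proxy addresses.
TRUSTED_PROXIES = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.1/32"),
)


def _is_trusted(addr: str, trusted) -> bool:
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # Unparseable entries are never trusted.
    return any(ip in net for net in trusted)


def client_ip(peer_host: str, forwarded_for: Optional[str], trusted=TRUSTED_PROXIES) -> str:
    """Return the throttle identity for one request."""
    if not forwarded_for or not _is_trusted(peer_host, trusted):
        return peer_host
    # Walk the chain right to left, skipping hops appended by our own proxies.
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not _is_trusted(hop, trusted):
            return hop
    # Every hop was a trusted proxy; fall back to the left-most entry.
    return hops[0] if hops else peer_host
```

Whatever mechanism is used, re-run the lockout validation from multiple client IPs afterwards.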

Medium: API documentation endpoints are exposed by default

Prevention:

  • Block public access to /docs, /redoc, and /openapi.json at the reverse proxy or edge firewall.
  • Keep docs endpoints reachable only from trusted internal/admin networks.

Remedy:

  • Add deny rules for those paths immediately and reload the proxy.
  • Verify those routes return 403 or 404 from untrusted networks.
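The 403/404 verification can be scripted. This Python sketch requests each documentation path and reports any that do not answer 403 or 404; run it from an untrusted network against the public hostname. The opener parameter exists only so the check can be exercised with a stub, and network failures such as refused connections are not caught, so they surface as exceptions.

```python
"""Check that API documentation routes are blocked from this network (sketch)."""
import urllib.error
import urllib.request

DOC_PATHS = ("/docs", "/redoc", "/openapi.json")
BLOCKED = {403, 404}


def probe_docs(base_url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> dict[str, int]:
    """Return {path: HTTP status} for each documentation path."""
    results: dict[str, int] = {}
    for path in DOC_PATHS:
        url = base_url.rstrip("/") + path
        try:
            with opener(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            # urlopen raises HTTPError for 4xx/5xx responses; keep the status code.
            results[path] = err.code
    return results


def exposed_paths(results: dict[str, int]) -> list[str]:
    """Paths whose status is not one of the expected 'blocked' codes."""
    return [path for path, status in results.items() if status not in BLOCKED]
```

For example, print(exposed_paths(probe_docs("https://ledgerdock.example.com"))) prints [] when all three routes are blocked; the hostname is a placeholder.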

Medium: Bearer token is stored in browser sessionStorage

Prevention:

  • Enforce a strict Content Security Policy (CSP) that disallows inline script execution where possible.
  • Avoid rendering untrusted HTML or script-capable content in the frontend.
  • Keep dependencies patched to reduce known XSS vectors.

Remedy:

  • If XSS is suspected, revoke active sessions, rotate privileged credentials, and redeploy frontend fixes before restoring user access.
  • Treat exposed browser sessions as compromised until revocation and credential rotation are complete.
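A strict policy is usually delivered as a Content-Security-Policy response header from the reverse proxy that serves the built frontend. The Python sketch below only assembles a candidate header value; the directive set and the assumption that the API lives on a separate origin are illustrative, not taken from LedgerDock. Apply it to the production build rather than the Vite dev server, and test document preview: features that rely on iframes, object embeds, or inline styles may need the corresponding directives loosened.

```python
"""Build a candidate strict Content-Security-Policy header value (sketch)."""


def build_csp(api_origin: str) -> str:
    # Illustrative directive set; adjust to what the built frontend actually loads.
    directives = {
        "default-src": ["'self'"],
        "script-src": ["'self'"],  # no 'unsafe-inline' and no 'unsafe-eval'
        "style-src": ["'self'"],
        "img-src": ["'self'", "data:", "blob:"],
        "connect-src": ["'self'", api_origin],  # fetch/XHR only to self and the API
        "object-src": ["'none'"],
        "base-uri": ["'self'"],
        "frame-ancestors": ["'none'"],
    }
    return "; ".join(f"{name} {' '.join(values)}" for name, values in directives.items())
```

The resulting string is what the proxy would send as the Content-Security-Policy header for frontend responses.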

Low: Typesense transport defaults to HTTP on internal network

Prevention:

  • Keep Typesense on isolated internal networks only.
  • Do not expose Typesense service ports directly to untrusted networks.

Remedy:

  • For cross-host or untrusted network paths, terminate TLS in front of Typesense (or use equivalent secure service networking) and require encrypted transport for all clients.

Common Operations

Start or rebuild:

docker compose up --build -d

Stop:

docker compose down

Tail logs:

docker compose logs -f

Tail API and worker logs only:

docker compose logs -f api worker

Reset all runtime data (destructive):

docker compose down -v

Frontend-Only Local Workflow

If backend services are already running, you can run frontend tooling locally:

cd frontend && npm run dev
cd frontend && npm run build
cd frontend && npm run preview

npm run preview serves the built app on port 4173.

Configuration

Main runtime variables are defined in docker-compose.yml:

  • API and worker: DATABASE_URL, REDIS_URL, REDIS_SECURITY_MODE, REDIS_TLS_MODE, STORAGE_ROOT, PUBLIC_BASE_URL, CORS_ORIGINS, AUTH_BOOTSTRAP_*, PROCESSING_LOG_STORE_*, CONTENT_EXPORT_*, TYPESENSE_*, APP_SETTINGS_ENCRYPTION_KEY
  • Frontend: optional VITE_API_BASE

When VITE_API_BASE is unset, the frontend uses http://<current-hostname>:8000/api/v1.
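The documented fallback can be expressed as a small function. This Python sketch mirrors the rule as stated (the real logic lives in the TypeScript frontend); trimming a trailing slash is an added assumption. It also makes one consequence visible: the fallback is always plain http:// on port 8000, so a frontend served over HTTPS on a real hostname needs VITE_API_BASE set explicitly, otherwise browsers typically block the API calls as mixed content.

```python
"""Mirror of the documented VITE_API_BASE fallback rule (sketch)."""
from typing import Optional


def resolve_api_base(vite_api_base: Optional[str], current_hostname: str) -> str:
    # An explicit, non-empty VITE_API_BASE wins (trailing slash trimmed: assumption).
    if vite_api_base:
        return vite_api_base.rstrip("/")
    # Otherwise: plain HTTP, port 8000, on the host the page was loaded from.
    return f"http://{current_hostname}:8000/api/v1"
```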

Application settings saved from the UI persist at:

  • <STORAGE_ROOT>/settings.json (inside the storage volume)

Provider API keys are persisted encrypted at rest (api_key_encrypted) and are no longer written as plaintext values.

Settings endpoints:

  • GET/PATCH /api/v1/settings
  • POST /api/v1/settings/reset
  • PATCH /api/v1/settings/handwriting
  • POST /api/v1/processing/logs/trim (admin only)

Auth endpoints:

  • POST /api/v1/auth/login
  • GET /api/v1/auth/me
  • POST /api/v1/auth/logout
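A minimal standard-library client for the auth and settings endpoints is sketched below. The request and response shapes are assumptions, not confirmed by this README: the login body is taken to be JSON with username and password, and the bearer token is read from a hypothetical access_token field. Check doc/api-contract.md for the actual contract before relying on it.

```python
"""Minimal stdlib client for LedgerDock auth and settings endpoints (sketch)."""
import json
import urllib.request
from typing import Any, Optional

API = "http://localhost:8000/api/v1"  # local compose default; adjust per environment


def build_request(path: str, method: str = "GET", token: Optional[str] = None,
                  body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build a JSON request, attaching a bearer token when one is given."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API + path, data=data, headers=headers, method=method)


def call(req: urllib.request.Request) -> Any:
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login(username: str, password: str) -> str:
    payload = call(build_request("/auth/login", "POST",
                                 body={"username": username, "password": password}))
    return payload["access_token"]  # hypothetical field name
```

A typical flow is token = login(user, password), then call(build_request("/auth/me", token=token)) and call(build_request("/settings", token=token)). call() assumes a JSON response body, so an endpoint that returns an empty body (possibly logout) needs separate handling.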

Detailed DEV and LIVE environment guidance, including HTTPS reverse-proxy deployment values, is documented in doc/operations-and-configuration.md and .env.example.

Data Persistence

Docker named volumes used by the stack:

  • db-data
  • redis-data
  • dcm-storage
  • typesense-data

Validation Checklist

After setup or config changes, verify:

  • GET /api/v1/health returns {"status":"ok"}
  • Upload and processing complete successfully
  • Search returns expected results
  • Preview and download work for uploaded documents
  • docker compose logs -f api worker has no failures
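The first checklist item is easy to automate after each deploy. This Python sketch calls GET /api/v1/health and treats anything other than HTTP 200 with {"status":"ok"} as unhealthy; the default base URL matches the local compose stack and is an assumption for other environments.

```python
"""Post-deploy health check against GET /api/v1/health (sketch)."""
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the body is a JSON object whose "status" is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    url = base_url.rstrip("/") + "/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False
```

A deploy script can exit non-zero when check_health("https://ledgerdock.example.com") returns False (placeholder hostname), then inspect docker compose logs -f api worker.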

Repository Layout

  • backend/ - FastAPI API, services, models, worker
  • frontend/ - React application
  • doc/ - technical documentation for architecture, API, data model, and operations
  • docker-compose.yml - local runtime topology

Documentation Index

  • doc/README.md - technical documentation entrypoint
  • doc/architecture-overview.md - service and runtime architecture
  • doc/api-contract.md - endpoint and payload contract
  • doc/data-model-reference.md - persistence model reference
  • doc/operations-and-configuration.md - runtime operations and configuration
  • doc/frontend-design-foundation.md - frontend design rules