# LedgerDock

LedgerDock is a self-hosted document management system (DMS) for ingesting, processing, organizing, and searching files.

## Core Capabilities

- Drag and drop upload from anywhere in the UI
- File and folder upload with path preservation
- Asynchronous extraction and OCR for PDF, images, DOCX, XLSX, TXT, and ZIP
- Metadata and full-text search
- Routing suggestions based on previous decisions
- Original file download and extracted markdown export

## Technology Stack

- Backend: FastAPI, SQLAlchemy, RQ worker (`backend/`)
- Frontend: React, Vite, TypeScript (`frontend/`)
- Infrastructure: PostgreSQL, Redis, Typesense (`docker-compose.yml`)

## Runtime Services

The default `docker compose` stack includes:

- `frontend` - React UI (`http://localhost:5173`)
- `api` - FastAPI backend (`http://localhost:8000`, docs at `/docs`)
- `worker` - background processing jobs
- `db` - PostgreSQL (internal service network)
- `redis` - queue backend (internal service network)
- `typesense` - search index (internal service network)

## Requirements

- Docker Engine
- Docker Compose plugin
- Internet access for first-time image build

## Quick Start

From repository root:

```bash
docker compose up --build -d
```

Before first run, set required secrets and connection values in `.env` (or your shell):

- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `POSTGRES_DB`
- `DATABASE_URL`
- `REDIS_PASSWORD`
- `REDIS_URL`
- `AUTH_BOOTSTRAP_ADMIN_USERNAME`
- `AUTH_BOOTSTRAP_ADMIN_PASSWORD`
- optional `AUTH_BOOTSTRAP_USER_USERNAME`
- optional `AUTH_BOOTSTRAP_USER_PASSWORD`
- `APP_SETTINGS_ENCRYPTION_KEY`
- `TYPESENSE_API_KEY`

Start from `.env.example` to avoid missing required variables.

Open:

- Frontend: `http://localhost:5173`
- API docs: `http://localhost:8000/docs`
- Health: `http://localhost:8000/api/v1/health`

Use bootstrap credentials (`AUTH_BOOTSTRAP_ADMIN_USERNAME` and `AUTH_BOOTSTRAP_ADMIN_PASSWORD`) to sign in from the frontend login screen.

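
The required variables listed above can be checked before the first `docker compose up`. The following is a minimal, stdlib-only sketch and not part of the repository: the helper names `parse_env_text` and `missing_required` are hypothetical, it assumes a plain `KEY=VALUE` `.env` file in the repository root, and it does not see values that are set only in your shell.

```python
from pathlib import Path

# Required names copied from the Quick Start list; the optional
# AUTH_BOOTSTRAP_USER_* pair is deliberately left out.
REQUIRED_VARS = [
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "DATABASE_URL",
    "REDIS_PASSWORD",
    "REDIS_URL",
    "AUTH_BOOTSTRAP_ADMIN_USERNAME",
    "AUTH_BOOTSTRAP_ADMIN_PASSWORD",
    "APP_SETTINGS_ENCRYPTION_KEY",
    "TYPESENSE_API_KEY",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are stripped; no variable expansion is attempted.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_required(values: dict[str, str]) -> list[str]:
    """Return required names that are absent or empty, in list order."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: repository root
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_required(parse_env_text(text))
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root; an empty or absent `.env` reports every required name as missing.
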
Stop the stack:

```bash
docker compose down
```

## Security Must-Know Before Real User Deployment

The items below port the `MUST KNOW User-Dependent Risks` from `REPORT.md` into explicit operator actions.

### High: Development-first defaults can be promoted to production

Avoid:

- Set `APP_ENV=production`.
- Set `PROVIDER_BASE_URL_ALLOW_HTTP=false`.
- Set `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`.
- Set a strict non-empty `PROVIDER_BASE_URL_ALLOWLIST` for approved provider hosts only.
- Set `PUBLIC_BASE_URL` to HTTPS.
- Restrict `CORS_ORIGINS` to exact production frontend origins.
- Use `REDIS_URL` with `rediss://`.
- Set `REDIS_SECURITY_MODE=strict`.
- Set `REDIS_TLS_MODE=required`.
- Keep `HOST_BIND_IP=127.0.0.1` and expose services only through an HTTPS reverse proxy.

Remedy:

- Immediately correct the values above and redeploy `api` and `worker` (`docker compose up -d api worker`).
- Rotate `AUTH_BOOTSTRAP_*` credentials, provider API keys, and Redis credentials if insecure values were used in a reachable environment.
- Re-check `.env.example` and `docker-compose.yml` before each production promotion.

### Medium: Login throttle IP identity depends on proxy trust model

Current behavior:

- Login throttle identity currently uses `request.client.host` directly.

Avoid:

- Deploy so the backend receives true client IP addresses and does not collapse all traffic to one proxy source IP.
- Validate lockout behavior with multiple client IPs before going live behind a proxy.

Remedy:

- If lockouts affect many users at once, temporarily increase `AUTH_LOGIN_FAILURE_LIMIT` and tune lockout timings to reduce impact while mitigation is in progress.
- Update network and proxy topology so client IP identity is preserved for the backend, then re-run lockout validation tests.

### Medium: API documentation endpoints are exposed by default

Avoid:

- Block public access to `/docs`, `/redoc`, and `/openapi.json` at the reverse proxy or edge firewall.
- Keep docs endpoints reachable only from trusted internal/admin networks.

Remedy:

- Add deny rules for those paths immediately and reload the proxy.
- Verify those routes return `403` or `404` from untrusted networks.

### Medium: Auth session tokens are cookie-based

Avoid:

- Keep dependencies patched to reduce known XSS vectors.
- Keep frontend dependencies locked and scanned for known payload paths.
- Treat any suspected script injection as a session risk and rotate bootstrap credentials immediately.

Remedy:

- If script injection is suspected, revoke active sessions, rotate bootstrap credentials, and redeploy frontend fixes before restoring access.
- Treat exposed sessions as compromised until revocation and credential rotation are complete.
- Cookies are HttpOnly and cannot be read by JavaScript, but a session stays valid until server-side revocation or expiry ends it.

### Low: Typesense transport defaults to HTTP on internal network

Avoid:

- Keep Typesense on isolated internal networks only.
- Do not expose Typesense service ports directly to untrusted networks.

Remedy:

- For cross-host or untrusted network paths, terminate TLS in front of Typesense (or use equivalent secure service networking) and require encrypted transport for all clients.

## Common Operations

Start or rebuild:

```bash
docker compose up --build -d
```

Stop:

```bash
docker compose down
```

Tail logs:

```bash
docker compose logs -f
```

Tail API and worker logs only:

```bash
docker compose logs -f api worker
```

Reset all runtime data (destructive):

```bash
docker compose down -v
```

## Frontend-Only Local Workflow

If backend services are already running, you can run frontend tooling locally:

```bash
cd frontend && npm run dev
cd frontend && npm run build
cd frontend && npm run preview
```

`npm run preview` serves the built app on port `4173`.
## Configuration

Main runtime variables are defined in `docker-compose.yml`:

- API and worker: `DATABASE_URL`, `REDIS_URL`, `REDIS_SECURITY_MODE`, `REDIS_TLS_MODE`, `STORAGE_ROOT`, `PUBLIC_BASE_URL`, `CORS_ORIGINS`, `AUTH_BOOTSTRAP_*`, `PROCESSING_LOG_STORE_*`, `CONTENT_EXPORT_*`, `TYPESENSE_*`, `APP_SETTINGS_ENCRYPTION_KEY`
- Frontend: optional `VITE_API_BASE`

When `VITE_API_BASE` is unset, the frontend uses `http://:8000/api/v1`.

Application settings saved from the UI persist at:

- `/settings.json` (inside the storage volume)

Provider API keys are persisted encrypted at rest (`api_key_encrypted`) and are no longer written as plaintext values.

Settings endpoints:

- `GET/PATCH /api/v1/settings`
- `POST /api/v1/settings/reset`
- `PATCH /api/v1/settings/handwriting`
- `POST /api/v1/processing/logs/trim` (admin only)

Auth endpoints:

- `POST /api/v1/auth/login`
- `GET /api/v1/auth/me`
- `POST /api/v1/auth/logout`

Detailed DEV and LIVE environment guidance, including HTTPS reverse-proxy deployment values, is documented in `doc/operations-and-configuration.md` and `.env.example`.
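
The production values listed under "Security Must-Know Before Real User Deployment" can be checked mechanically before a promotion. The sketch below is illustrative only: `production_findings` is a hypothetical helper, the case-insensitive comparison and the wildcard test for `CORS_ORIGINS` are assumptions about how the values are written, and the application's own parsing may differ.

```python
import os
from typing import Mapping


def production_findings(env: Mapping[str, str]) -> list[str]:
    """Return one message per setting that differs from the hardening list."""
    findings: list[str] = []

    # Settings that must equal one exact value (compared case-insensitively,
    # which is an assumption of this sketch).
    exact = {
        "APP_ENV": "production",
        "PROVIDER_BASE_URL_ALLOW_HTTP": "false",
        "PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK": "false",
        "REDIS_SECURITY_MODE": "strict",
        "REDIS_TLS_MODE": "required",
        "HOST_BIND_IP": "127.0.0.1",
    }
    for name, expected in exact.items():
        actual = env.get(name, "").strip()
        if actual.lower() != expected:
            findings.append(f"{name} should be {expected!r}, got {actual!r}")

    # Settings that must use an encrypted URL scheme.
    prefixes = {"PUBLIC_BASE_URL": "https://", "REDIS_URL": "rediss://"}
    for name, prefix in prefixes.items():
        if not env.get(name, "").strip().lower().startswith(prefix):
            findings.append(f"{name} should start with {prefix!r}")

    if not env.get("PROVIDER_BASE_URL_ALLOWLIST", "").strip():
        findings.append("PROVIDER_BASE_URL_ALLOWLIST should be non-empty")

    # The value format is not inspected; only empty or wildcard values are flagged.
    cors = env.get("CORS_ORIGINS", "")
    if not cors.strip() or "*" in cors:
        findings.append("CORS_ORIGINS should list exact origins, no wildcard")

    return findings


if __name__ == "__main__":
    for finding in production_findings(os.environ):
        print("WARN:", finding)
```

An empty result means none of the listed development defaults were detected; it does not replace the review of `.env.example` and `docker-compose.yml`.
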
## Data Persistence

Docker named volumes used by the stack:

- `db-data`
- `redis-data`
- `dcm-storage`
- `typesense-data`

## Validation Checklist

After setup or config changes, verify:

- `GET /api/v1/health` returns `{"status":"ok"}`
- Upload and processing complete successfully
- Search returns expected results
- Preview and download work for uploaded documents
- `docker compose logs -f api worker` has no failures

## Repository Layout

- `backend/` - FastAPI API, services, models, worker
- `frontend/` - React application
- `doc/` - technical documentation for architecture, API, data model, and operations
- `docker-compose.yml` - local runtime topology

## Documentation Index

- `doc/README.md` - technical documentation entrypoint
- `doc/architecture-overview.md` - service and runtime architecture
- `doc/api-contract.md` - endpoint and payload contract
- `doc/data-model-reference.md` - persistence model reference
- `doc/operations-and-configuration.md` - runtime operations and configuration
- `doc/frontend-design-foundation.md` - frontend design rules
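
The first item of the validation checklist can be scripted. This is a minimal sketch using only the Python standard library: the function names are hypothetical, the URL is the default from the Quick Start, and the check accepts any JSON object whose `status` field is `ok`, which is slightly more lenient than an exact body match.

```python
import json
import urllib.error
import urllib.request

# Default health URL from the Quick Start; adjust for other deployments.
HEALTH_URL = "http://localhost:8000/api/v1/health"


def is_healthy_body(body: str) -> bool:
    """True when the body is a JSON object whose "status" field equals "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; network and HTTP errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy_body(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, UnicodeDecodeError):
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unhealthy")
```

The remaining checklist items (upload, search, preview, and log review) still need a manual pass or dedicated tests.
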