From 6fba5818658e8183d7f0c6c0dbf61c07cf792411 Mon Sep 17 00:00:00 2001
From: Beda Schmid
Date: Mon, 2 Mar 2026 13:40:29 -0300
Subject: [PATCH] Rewrite README for end-user Docker setup and env guidance

---
 README.md | 258 +++++++++++++++---------------------------------------
 1 file changed, 73 insertions(+), 185 deletions(-)

diff --git a/README.md b/README.md
index 710ee1b..0fd01bb 100644
--- a/README.md
+++ b/README.md
@@ -1,146 +1,90 @@
 # LedgerDock
 
-LedgerDock is a self-hosted document management system (DMS) for ingesting, processing, organizing, and searching files.
+LedgerDock is a private document workspace you can run on your own computer or server.
+It helps teams collect files, process text from documents, and find information quickly with search.
 
-## Core Capabilities
+## What LedgerDock Is For
 
-- Drag and drop upload from anywhere in the UI
-- File and folder upload with path preservation
-- Asynchronous extraction and OCR for PDF, images, DOCX, XLSX, TXT, and ZIP
-- Metadata and full-text search
-- Routing suggestions based on previous decisions
-- Original file download and extracted markdown export
+- Upload files and folders from one place
+- Keep documents organized and searchable
+- Extract text from scans and images (OCR)
+- Download originals or extracted text
 
-## Technology Stack
+## Before You Start
 
-- Backend: FastAPI, SQLAlchemy, RQ worker (`backend/`)
-- Frontend: React, Vite, TypeScript (`frontend/`)
-- Infrastructure: PostgreSQL, Redis, Typesense (`docker-compose.yml`)
+You need:
 
-## Runtime Services
+- Docker Desktop (Windows or macOS) or Docker Engine + Docker Compose (Linux)
+- A terminal app
+- The project folder on your machine
+- Internet access the first time you build containers
 
-The default `docker compose` stack includes:
+## Install With Docker Compose
 
-- `frontend` - React UI (`http://localhost:5173`)
-- `api` - FastAPI backend (`http://localhost:8000`, docs at `/docs`)
-- `worker` - background processing jobs
-- `db` - PostgreSQL (internal service network)
-- `redis` - queue backend (internal service network)
-- `typesense` - search index (internal service network)
+Follow these steps from the project folder (where `docker-compose.yml` is located).
 
-## Requirements
+1. Create your local settings file from the template.
 
-- Docker Engine
-- Docker Compose plugin
-- Internet access for first-time image build
+```bash
+cp .env.example .env
+```
 
-## Quick Start
-
-From repository root:
+2. Open `.env` in a text editor and set your own passwords and keys.
+3. Start LedgerDock.
 
 ```bash
 docker compose up --build -d
 ```
 
-Before first run, set required secrets and connection values in `.env` (or your shell):
+4. Wait until startup is complete, then open the app:
+- LedgerDock web app: `http://localhost:5173`
+- Health check: `http://localhost:8000/api/v1/health`
+5. Sign in with the admin username and password you set in `.env`.
 
-- `POSTGRES_USER`
-- `POSTGRES_PASSWORD`
-- `POSTGRES_DB`
-- `DATABASE_URL`
-- `REDIS_PASSWORD`
-- `REDIS_URL`
-- `AUTH_BOOTSTRAP_ADMIN_USERNAME`
-- `AUTH_BOOTSTRAP_ADMIN_PASSWORD`
-- optional `AUTH_BOOTSTRAP_USER_USERNAME`
-- optional `AUTH_BOOTSTRAP_USER_PASSWORD`
-- `APP_SETTINGS_ENCRYPTION_KEY`
-- `TYPESENSE_API_KEY`
+## `.env` Settings Explained In Plain Language
 
-Start from `.env.example` to avoid missing required variables.
+LedgerDock reads settings from `.env`. Some values are required and some are optional.
 
-Open:
+### Required: Change These Before First Use
 
-- Frontend: `http://localhost:5173`
-- API docs: `http://localhost:8000/docs`
-- Health: `http://localhost:8000/api/v1/health`
+- `POSTGRES_PASSWORD`: Password for the internal database.
+- `REDIS_PASSWORD`: Password for the internal queue service.
+- `AUTH_BOOTSTRAP_ADMIN_PASSWORD`: First admin login password.
+- `APP_SETTINGS_ENCRYPTION_KEY`: Secret used to protect saved app settings.
+- `TYPESENSE_API_KEY`: Secret key for the search engine.
 
-Use bootstrap credentials (`AUTH_BOOTSTRAP_ADMIN_USERNAME` and `AUTH_BOOTSTRAP_ADMIN_PASSWORD`) to sign in from the frontend login screen.
+Use long, unique values for each one. Do not reuse personal passwords.
 
-Stop the stack:
+### Required: Usually Keep Defaults Unless You Know You Need Changes
 
-```bash
-docker compose down
-```
+- `POSTGRES_USER`: Database username.
+- `POSTGRES_DB`: Database name.
+- `DATABASE_URL`: Connection string to the database service.
+- `REDIS_URL`: Connection string to the Redis service.
+- `AUTH_BOOTSTRAP_ADMIN_USERNAME`: First admin username (default `admin`).
 
-## Security Must-Know Before Real User Deployment
+If you change passwords, make sure matching URLs use the same new password.
 
-The items below port the `MUST KNOW User-Dependent Risks` from `REPORT.md` into explicit operator actions.
+### Optional User Account (Can Be Left Empty)
 
-### High: Development-first defaults can be promoted to production
+- `AUTH_BOOTSTRAP_USER_USERNAME`
+- `AUTH_BOOTSTRAP_USER_PASSWORD`
 
-Avoid:
-- Set `APP_ENV=production`.
-- Set `PROVIDER_BASE_URL_ALLOW_HTTP=false`.
-- Set `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`.
-- Set a strict non-empty `PROVIDER_BASE_URL_ALLOWLIST` for approved provider hosts only.
-- Set `PUBLIC_BASE_URL` to HTTPS.
-- Restrict `CORS_ORIGINS` to exact production frontend origins.
-- Use `REDIS_URL` with `rediss://`.
-- Set `REDIS_SECURITY_MODE=strict`.
-- Set `REDIS_TLS_MODE=required`.
-- Keep `HOST_BIND_IP=127.0.0.1` and expose services only through an HTTPS reverse proxy.
+These create an extra non-admin account on first startup.
 
-Remedy:
-- Immediately correct the values above and redeploy `api` and `worker` (`docker compose up -d api worker`).
-- Rotate `AUTH_BOOTSTRAP_*` credentials, provider API keys, and Redis credentials if insecure values were used in a reachable environment.
-- Re-check `.env.example` and `docker-compose.yml` before each production promotion.
+### Network and Access Settings
 
-### Medium: Login throttle IP identity depends on proxy trust model
+- `HOST_BIND_IP`: Where services listen. Keep `127.0.0.1` for local-only access.
+- `PUBLIC_BASE_URL`: Backend base URL. Local default is `http://localhost:8000`.
+- `CORS_ORIGINS`: Allowed frontend origins. Keep local defaults for single-machine use.
+- `VITE_API_BASE`: Frontend API URL override. Leave empty unless you know you need it.
 
-Current behavior:
-- Login throttle identity currently uses `request.client.host` directly.
+### Environment Mode
 
-Avoid:
-- Deploy so the backend receives true client IP addresses and does not collapse all traffic to one proxy source IP.
-- Validate lockout behavior with multiple client IPs before going live behind a proxy.
+- `APP_ENV=development`: Local mode (default).
+- `APP_ENV=production`: Use when running as a real shared deployment with HTTPS and tighter security settings.
 
-Remedy:
-- If lockouts affect many users at once, temporarily increase `AUTH_LOGIN_FAILURE_LIMIT` and tune lockout timings to reduce impact while mitigation is in progress.
-- Update network and proxy topology so client IP identity is preserved for the backend, then re-run lockout validation tests.
-
-### Medium: API documentation endpoints are exposed by default
-
-Avoid:
-- Block public access to `/docs`, `/redoc`, and `/openapi.json` at the reverse proxy or edge firewall.
-- Keep docs endpoints reachable only from trusted internal/admin networks.
-
-Remedy:
-- Add deny rules for those paths immediately and reload the proxy.
-- Verify those routes return `403` or `404` from untrusted networks.
-
-### Medium: Auth session tokens are cookie-based
-
-Avoid:
-- Keep dependencies patched to reduce known XSS vectors.
-- Keep frontend dependencies locked and scanned for known payload paths.
-- Treat any suspected script injection as a session risk and rotate bootstrap credentials immediately.
-
-Remedy:
-- If script injection is suspected, revoke active sessions, rotate bootstrap credentials, and redeploy frontend fixes before restoring access.
-- Treat exposed sessions as compromised until revocation and credential rotation are complete.
-- Cookies are HttpOnly and cannot be read by JavaScript, but session scope still ends on server-side revocation and expiry controls.
-
-### Low: Typesense transport defaults to HTTP on internal network
-
-Avoid:
-- Keep Typesense on isolated internal networks only.
-- Do not expose Typesense service ports directly to untrusted networks.
-
-Remedy:
-- For cross-host or untrusted network paths, terminate TLS in front of Typesense (or use equivalent secure service networking) and require encrypted transport for all clients.
-
-## Common Operations
+## Daily Use Commands
 
 Start or rebuild:
 
@@ -154,97 +98,41 @@ Stop:
 docker compose down
 ```
 
-Tail logs:
+View logs:
 
 ```bash
 docker compose logs -f
 ```
 
-Tail API and worker logs only:
+View backend logs only:
 
 ```bash
 docker compose logs -f api worker
 ```
 
-Reset all runtime data (destructive):
+## Where Your Data Is Stored
+
+LedgerDock stores data in Docker volumes so it survives container restarts:
+
+- `db-data` for PostgreSQL data
+- `redis-data` for Redis data
+- `dcm-storage` for uploaded files and app storage
+- `typesense-data` for the search index
+
+To remove everything, including data:
 
 ```bash
 docker compose down -v
 ```
 
-## Frontend-Only Local Workflow
+Warning: this permanently deletes your LedgerDock data on this machine.
 
-If backend services are already running, you can run frontend tooling locally:
+## First Checks After Install
 
-```bash
-cd frontend && npm run dev
-cd frontend && npm run build
-cd frontend && npm run preview
-```
+- Open `http://localhost:5173` and confirm the login page appears.
+- Open `http://localhost:8000/api/v1/health` and confirm you get `{"status":"ok"}`.
+- Upload one sample file and confirm it appears in search.
 
-`npm run preview` serves the built app on port `4173`.
+## Need Technical Documentation?
 
-## Configuration
-
-Main runtime variables are defined in `docker-compose.yml`:
-
-- API and worker: `DATABASE_URL`, `REDIS_URL`, `REDIS_SECURITY_MODE`, `REDIS_TLS_MODE`, `STORAGE_ROOT`, `PUBLIC_BASE_URL`, `CORS_ORIGINS`, `AUTH_BOOTSTRAP_*`, `PROCESSING_LOG_STORE_*`, `CONTENT_EXPORT_*`, `TYPESENSE_*`, `APP_SETTINGS_ENCRYPTION_KEY`
-- Frontend: optional `VITE_API_BASE`
-
-When `VITE_API_BASE` is unset, the frontend uses `http://:8000/api/v1`.
-
-Application settings saved from the UI persist at:
-
-- `/settings.json` (inside the storage volume)
-
-Provider API keys are persisted encrypted at rest (`api_key_encrypted`) and are no longer written as plaintext values.
-
-Settings endpoints:
-
-- `GET/PATCH /api/v1/settings`
-- `POST /api/v1/settings/reset`
-- `PATCH /api/v1/settings/handwriting`
-- `POST /api/v1/processing/logs/trim` (admin only)
-
-Auth endpoints:
-
-- `POST /api/v1/auth/login`
-- `GET /api/v1/auth/me`
-- `POST /api/v1/auth/logout`
-
-Detailed DEV and LIVE environment guidance, including HTTPS reverse-proxy deployment values, is documented in `doc/operations-and-configuration.md` and `.env.example`.
-
-## Data Persistence
-
-Docker named volumes used by the stack:
-
-- `db-data`
-- `redis-data`
-- `dcm-storage`
-- `typesense-data`
-
-## Validation Checklist
-
-After setup or config changes, verify:
-
-- `GET /api/v1/health` returns `{"status":"ok"}`
-- Upload and processing complete successfully
-- Search returns expected results
-- Preview and download work for uploaded documents
-- `docker compose logs -f api worker` has no failures
-
-## Repository Layout
-
-- `backend/` - FastAPI API, services, models, worker
-- `frontend/` - React application
-- `doc/` - technical documentation for architecture, API, data model, and operations
-- `docker-compose.yml` - local runtime topology
-
-## Documentation Index
-
-- `doc/README.md` - technical documentation entrypoint
-- `doc/architecture-overview.md` - service and runtime architecture
-- `doc/api-contract.md` - endpoint and payload contract
-- `doc/data-model-reference.md` - persistence model reference
-- `doc/operations-and-configuration.md` - runtime operations and configuration
-- `doc/frontend-design-foundation.md` - frontend design rules
+Developer and operator docs are in `doc/`, starting at `doc/README.md`.