docs: complete repository technical documentation refresh

2026-02-21 11:27:44 -03:00
parent 17dccbbe20
commit 6501841426
6 changed files with 454 additions and 88 deletions

README.md (161 lines changed)

@@ -1,118 +1,121 @@
# DMS
DMS is a single-user document management system with:
- drag-and-drop upload anywhere in the UI
- file and folder upload
- document processing and indexing (PDF, text, OpenAI handwriting/image transcription, DOCX, XLSX, ZIP extraction)
- fallback handling for unsupported formats
- original file preservation and download
- metadata-based and full-text search
- learning-based routing suggestions
DMS is a single-user document management system for ingesting, processing, organizing, and searching files.
Core capabilities:
- drag-and-drop upload from anywhere in the UI
- file and folder upload with path preservation
- asynchronous processing with OCR and extraction for PDF, images, DOCX, XLSX, TXT, and ZIP
- metadata and full-text search
- routing suggestions based on learned decisions
- original file download and extracted markdown export
## Stack
- Backend: FastAPI + SQLAlchemy + RQ worker (`backend/`)
- Frontend: React + Vite + TypeScript (`frontend/`)
- Infrastructure: Postgres, Redis, Typesense (`docker-compose.yml`)
## Requirements
- Docker Engine with Docker Compose plugin
- Docker Engine
- Docker Compose plugin
- Internet access for the first image build
## Install And Run With Docker Compose
## Quick Start
1. Open a terminal in this repository root.
2. Start the full stack:
1. Start the full stack from repository root:
```bash
docker compose up --build -d
```
3. Open the applications:
2. Open services:
- Frontend: `http://localhost:5173`
- Backend API docs: `http://localhost:8000/docs`
- Health check: `http://localhost:8000/api/v1/health`
- Backend OpenAPI docs: `http://localhost:8000/docs`
- Health endpoint: `http://localhost:8000/api/v1/health`
## Setup
1. Open the frontend and upload files or folders.
2. Set default destination path and tags before upload if needed.
3. Configure handwriting transcription settings in the UI:
- OpenAI compatible base URL
- model (default: `gpt-4.1-mini`)
- API key and timeout
4. Open a document in the details panel, adjust destination and tags, then save.
5. Keep `Learn from this routing decision` enabled to train future routing suggestions.
## Data Persistence On Host
All runtime data is stored on the host using bind mounts.
Default host location:
- `./data/postgres`
- `./data/redis`
- `./data/storage`
To persist under another host directory, set `DCM_DATA_DIR`:
```bash
DCM_DATA_DIR=/data docker compose up --build -d
```
This will place runtime data under `/data` on the host.
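The override can be pictured with a minimal Python sketch. It assumes the same three subdirectories (`postgres`, `redis`, `storage`) are created beneath `DCM_DATA_DIR`; the README only states that data lands under that directory, so the authoritative mapping is `docker-compose.yml`. The helper name `runtime_dirs` is hypothetical.

```python
import os
from pathlib import Path

# Subdirectory names taken from the default host locations listed above.
SUBDIRS = ("postgres", "redis", "storage")


def runtime_dirs(env=None):
    """Return the host directories that would hold runtime data.

    Assumption: when DCM_DATA_DIR is set, the same subdirectories are
    placed beneath it. Check docker-compose.yml for the real bind mounts.
    """
    env = os.environ if env is None else env
    base = Path(env.get("DCM_DATA_DIR") or "./data")
    return {name: base / name for name in SUBDIRS}


if __name__ == "__main__":
    for name, path in runtime_dirs().items():
        print(f"{name}: {path}")
```

Running the script with and without `DCM_DATA_DIR` set shows which host paths to back up or remove under this assumption.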
## Common Commands
Start:
```bash
docker compose up --build -d
```
Stop:
3. Stop when done:
```bash
docker compose down
```
View logs:
## Common Commands
Start services:
```bash
docker compose up --build -d
```
Stop services:
```bash
docker compose down
```
Stream logs:
```bash
docker compose logs -f
```
Rebuild a clean stack while keeping persisted data:
Rebuild services:
```bash
docker compose down
docker compose up --build -d
```
Reset all persisted runtime data:
Reset runtime data (destructive, removes named volumes):
```bash
docker compose down
rm -rf ./data
docker compose down -v
```
## Handwriting Transcription Notes
## Data Persistence
- Handwriting and image transcription uses an OpenAI compatible vision endpoint.
- Before transcription, images are normalized:
- EXIF rotation is corrected
- long side is resized to a maximum of 2000px
- image is sent as a base64 data URL payload
- Handwriting provider settings are persisted in host storage and survive container restarts.
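The normalization steps listed above could look roughly like the following sketch. It is not the project's implementation: the function names are hypothetical, Pillow is assumed as the imaging library, and re-encoding to JPEG is an assumption, since the notes do not name an output format.

```python
import base64
import io

MAX_LONG_SIDE = 2000  # pixel limit stated in the notes above


def target_size(width, height, max_long_side=MAX_LONG_SIDE):
    """Scale (width, height) so the long side is at most max_long_side."""
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height
    scale = max_long_side / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_data_url(payload, mime_type="image/jpeg"):
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def normalize_image(raw):
    """Apply EXIF rotation, resize, and return a JPEG data URL.

    Pillow is imported lazily so the pure helpers above work without it.
    """
    from PIL import Image, ImageOps  # third-party: pip install pillow

    image = Image.open(io.BytesIO(raw))
    image = ImageOps.exif_transpose(image)  # honor the EXIF orientation tag
    image = image.convert("RGB").resize(target_size(*image.size))
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")  # assumption: JPEG output
    return to_data_url(buffer.getvalue())
```

Keeping the size calculation separate from the Pillow call makes the 2000px rule easy to check on its own.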
Runtime state is persisted in Docker named volumes declared in `docker-compose.yml`:
- `db-data`
- `redis-data`
- `dcm-storage`
- `typesense-data`
## API Overview
The application settings file is stored under the storage volume at `/data/storage/settings.json` inside containers.
GET endpoints:
- `GET /api/v1/health`
- `GET /api/v1/documents`
- `GET /api/v1/documents/{document_id}`
- `GET /api/v1/documents/{document_id}/preview`
- `GET /api/v1/documents/{document_id}/download`
- `GET /api/v1/documents/tags`
- `GET /api/v1/search?query=...`
- `GET /api/v1/settings`
## Configuration Notes
Additional endpoints used by the UI:
- `POST /api/v1/documents/upload`
- `PATCH /api/v1/documents/{document_id}`
- `POST /api/v1/documents/{document_id}/reprocess`
- `PATCH /api/v1/settings/handwriting`
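As an illustration of how a client might address the endpoints listed above, here is a small sketch that only builds URLs. The helper names are hypothetical, the base address is the one from the run instructions, and response payloads are not modeled because the README does not describe them.

```python
from urllib.parse import quote, urlencode

# Backend address from the run instructions, plus the /api/v1 prefix.
API_BASE = "http://localhost:8000/api/v1"


def search_url(query, base=API_BASE):
    """URL for GET /search with the query string safely encoded."""
    return f"{base}/search?{urlencode({'query': query})}"


def document_url(document_id, action="", base=API_BASE):
    """URL for a document, optionally with 'preview', 'download', or 'reprocess'."""
    url = f"{base}/documents/{quote(str(document_id), safe='')}"
    return f"{url}/{action}" if action else url
```

For example, `document_url("abc", "download")` yields the download endpoint for document `abc`, and the result can be passed to any HTTP client.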
- API and worker runtime environment values are configured in `docker-compose.yml` (`DATABASE_URL`, `REDIS_URL`, `STORAGE_ROOT`, `PUBLIC_BASE_URL`, `CORS_ORIGINS`, `TYPESENSE_*`).
- Frontend API target is controlled by `VITE_API_BASE` in the `frontend` service.
- Handwriting, provider, routing, summary, display, and upload defaults are managed through the settings UI and persisted by the backend settings service.
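A service reading these environment values might validate them at startup along the lines of the sketch below. This is illustrative only: `RuntimeConfig` and `load_config` are hypothetical names, the `TYPESENSE_*` values are left out because the README does not enumerate them, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass

# Names listed in the configuration notes; TYPESENSE_* is omitted on purpose.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "STORAGE_ROOT", "PUBLIC_BASE_URL", "CORS_ORIGINS")


@dataclass(frozen=True)
class RuntimeConfig:
    database_url: str
    redis_url: str
    storage_root: str
    public_base_url: str
    cors_origins: tuple


def load_config(env=None):
    """Build a RuntimeConfig, failing fast when a required value is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment values: " + ", ".join(missing))
    # Assumption: CORS_ORIGINS is comma-separated; the real format may differ.
    origins = tuple(o.strip() for o in env["CORS_ORIGINS"].split(",") if o.strip())
    return RuntimeConfig(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        storage_root=env["STORAGE_ROOT"],
        public_base_url=env["PUBLIC_BASE_URL"],
        cors_origins=origins,
    )
```

Failing at startup with the full list of missing names tends to be easier to diagnose than a later error inside a request or worker job.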
## Manual Validation Checklist
After changes, verify:
- `GET /api/v1/health` returns `{"status":"ok"}`
- upload and processing complete successfully
- search returns expected results
- preview or download works for uploaded documents
- `docker compose logs -f` shows no API or worker failures
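The first checklist item can be automated with a short standard-library sketch. It assumes the backend is reachable at the address from the run instructions and that the body is exactly the JSON shown above; the function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8000/api/v1/health"


def is_healthy(body):
    """True when the body decodes to a JSON object whose status is "ok"."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url=HEALTH_URL, timeout=5.0):
    """Fetch the health endpoint and report whether the service looks healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:  # URLError, HTTPError, and socket timeouts derive from OSError
        return False
```

Calling `check_health()` after `docker compose up --build -d` returns `True` once the API answers. The remaining checklist items still need a manual pass.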
## API Surface Summary
Base prefix: `/api/v1`
- Health: `/health`
- Documents: `/documents` (listing, upload, metadata update, lifecycle actions, download and preview, markdown export)
- Search: `/search`
- Processing logs: `/processing/logs`
- Settings: `/settings` and `/settings/handwriting`
See `doc/api-contract.md` for the complete endpoint contract.
## Technical Documentation
- `doc/README.md` - technical documentation index
- `doc/architecture-overview.md` - service and runtime architecture
- `doc/api-contract.md` - HTTP endpoint contract and payload model map
- `doc/data-model-reference.md` - database model reference
- `doc/operations-and-configuration.md` - operations runbook and configuration reference
- `doc/frontend-design-foundation.md` - frontend design system and UI rules