docs: complete repository technical documentation refresh

2026-02-21 11:27:44 -03:00
parent 17dccbbe20
commit 6501841426
6 changed files with 454 additions and 88 deletions

# Documentation

This directory contains technical documentation for DMS.

## Core References

- `../README.md` - project overview, setup, and quick operations
- `architecture-overview.md` - backend, frontend, and infrastructure architecture
- `api-contract.md` - API endpoint contract grouped by route module
- `data-model-reference.md` - database entity definitions and lifecycle states
- `operations-and-configuration.md` - runtime operations, ports, volumes, and configuration values
- `frontend-design-foundation.md` - frontend visual system, tokens, and UI implementation rules

## Documentation Rules

- Keep this file as the documentation index and add new technical documents here.
- Update referenced documents whenever behavior, routes, models, or runtime configuration change.
- Prefer concise, implementation-backed descriptions with explicit paths to source modules.

doc/api-contract.md
# API Contract
Base URL prefix: `/api/v1`

Primary implementation modules:
- `backend/app/api/router.py`
- `backend/app/api/routes_health.py`
- `backend/app/api/routes_documents.py`
- `backend/app/api/routes_search.py`
- `backend/app/api/routes_processing_logs.py`
- `backend/app/api/routes_settings.py`
## Health
- `GET /health`
  - Purpose: liveness check
  - Response: `{ "status": "ok" }`
## Documents
### Collection and metadata helpers
- `GET /documents`
  - Query: `offset`, `limit`, `include_trashed`, `only_trashed`, `path_prefix`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `DocumentsListResponse`
- `GET /documents/tags`
  - Query: `include_trashed`
  - Response: `{ "tags": string[] }`
- `GET /documents/paths`
  - Query: `include_trashed`
  - Response: `{ "paths": string[] }`
- `GET /documents/types`
  - Query: `include_trashed`
  - Response: `{ "types": string[] }`
- `POST /documents/content-md/export`
  - Body model: `ContentExportRequest`
  - Response: ZIP stream containing one markdown file per matched document
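The collection endpoints take plain query parameters. The following is a minimal Python sketch of building a `GET /documents` URL. It assumes the API is reachable at `http://localhost:8000` (the `api` service port) and that boolean filters are sent as `true`/`false`. The `documents_list_url` helper is hypothetical and is not part of `lib/api.ts`. The filter names `path_filter`, `tag_filter`, `type_filter`, `processed_from`, and `processed_to` are also accepted by `GET /search`.

```python
from urllib.parse import urlencode

API_PREFIX = "/api/v1"


def documents_list_url(base_url: str, **filters: object) -> str:
    """Build a GET /documents URL (hypothetical helper).

    Filters set to None are omitted so server-side defaults apply.
    Assumption: booleans travel as lowercase "true"/"false".
    """
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    url = f"{base_url.rstrip('/')}{API_PREFIX}/documents"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

For example, `documents_list_url("http://localhost:8000", offset=0, limit=50, include_trashed=False)` yields `http://localhost:8000/api/v1/documents?offset=0&limit=50&include_trashed=false`.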
### Per-document operations
- `GET /documents/{document_id}`
  - Response model: `DocumentDetailResponse`
- `GET /documents/{document_id}/download`
  - Response: original file bytes
- `GET /documents/{document_id}/preview`
  - Response: inline preview stream where browser-supported
- `GET /documents/{document_id}/thumbnail`
  - Response: generated thumbnail image when available
- `GET /documents/{document_id}/content-md`
  - Response: extracted markdown content for one document
- `PATCH /documents/{document_id}`
  - Body model: `DocumentUpdateRequest`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/trash`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/restore`
  - Response model: `DocumentResponse`
- `DELETE /documents/{document_id}`
  - Behavior: permanent delete; the document must be trashed first
  - Response: deletion counters
- `POST /documents/{document_id}/reprocess`
  - Response model: `DocumentResponse`
  - Behavior: requeues the asynchronous processing task
### Upload
- `POST /documents/upload`
  - Multipart form fields:
    - `files[]` (required)
    - `relative_paths[]` (optional)
    - `logical_path` (optional, defaults to `Inbox`)
    - `tags` (optional, comma-separated)
    - `conflict_mode` (`ask`, `replace`, `duplicate`)
  - Response model: `UploadResponse`
  - Behavior:
    - `ask`: returns `conflicts` when an uploaded file's checksum matches an existing document
    - `replace`: creates a new document linked to the replaced document id
    - `duplicate`: creates an additional document record
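The three `conflict_mode` values can be read as a small decision table. Below is a hypothetical Python model of that table. It assumes two things this contract does not confirm: a checksum match is looked up before anything is stored, and `ask` stores nothing for the conflicting file. `resolve_upload` and `UploadOutcome` are illustrative names, not backend types.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"ask", "replace", "duplicate"}


@dataclass
class UploadOutcome:
    created: bool  # a new document row would be inserted
    conflict: bool  # the file would be reported under `conflicts`
    replaces_document_id: Optional[str] = None


def resolve_upload(conflict_mode: str, existing_id: Optional[str]) -> UploadOutcome:
    """Hypothetical model of the documented conflict_mode behavior.

    `existing_id` is the id of a stored document with the same checksum, or None.
    """
    if conflict_mode not in VALID_MODES:
        raise ValueError(f"unsupported conflict_mode: {conflict_mode!r}")
    if existing_id is None:
        # Assumption: without a checksum match every mode creates a document.
        return UploadOutcome(created=True, conflict=False)
    if conflict_mode == "ask":
        # Assumption: nothing is stored until the client picks a mode.
        return UploadOutcome(created=False, conflict=True)
    if conflict_mode == "replace":
        return UploadOutcome(created=True, conflict=False, replaces_document_id=existing_id)
    return UploadOutcome(created=True, conflict=False)  # "duplicate"
```

In this reading, a client sends `ask` first and re-submits the listed conflicts with `replace` or `duplicate`.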
## Search
- `GET /search`
  - Query: `query` (minimum length 2), `offset`, `limit`, `include_trashed`, `only_trashed`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `SearchResponse`
  - Behavior: PostgreSQL full-text and metadata ranking
## Processing Logs
- `GET /processing/logs`
  - Query: `offset`, `limit`, `document_id`
  - Response model: `ProcessingLogListResponse`
- `POST /processing/logs/trim`
  - Query: `keep_document_sessions`, `keep_unbound_entries`
  - Response: trim counters
- `POST /processing/logs/clear`
  - Response: clear counters
## Settings
- `GET /settings`
  - Response model: `AppSettingsResponse`
- `PATCH /settings`
  - Body model: `AppSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
- `POST /settings/reset`
  - Response model: `AppSettingsResponse`
- `PATCH /settings/handwriting`
  - Body model: `HandwritingSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
- `GET /settings/handwriting`
  - Response model: `HandwritingSettingsResponse`
## Schema Families
Document schemas in `backend/app/schemas/documents.py`:
- `DocumentResponse`
- `DocumentDetailResponse`
- `DocumentsListResponse`
- `UploadConflict`
- `UploadResponse`
- `DocumentUpdateRequest`
- `SearchResponse`
- `ContentExportRequest`

Processing log schemas in `backend/app/schemas/processing_logs.py`:
- `ProcessingLogEntryResponse`
- `ProcessingLogListResponse`

Settings schemas in `backend/app/schemas/settings.py`:
- Provider, task, upload-default, display, predefined paths or tags, handwriting-style, and legacy handwriting models grouped under `AppSettingsResponse` and `AppSettingsUpdateRequest`.

# Architecture Overview
## System Topology
DMS runs as a multi-service application defined in `docker-compose.yml`:
- `frontend` serves the React UI on port `5173`
- `api` serves FastAPI on port `8000`
- `worker` executes asynchronous extraction and indexing jobs
- `db` provides PostgreSQL persistence on port `5432`
- `redis` backs queueing on port `6379`
- `typesense` stores search index and vector-adjacent metadata on port `8108`
## Backend Architecture
Backend source root: `backend/app/`

Main boundaries:
- `api/` route handlers and HTTP contract
- `services/` domain logic (storage, extraction, routing, settings, processing logs, Typesense)
- `db/` SQLAlchemy base, engine, and session lifecycle
- `models/` persistence entities (`Document`, `ProcessingLogEntry`)
- `schemas/` Pydantic response and request schemas
- `worker/` RQ queue integration and background processing tasks

Application bootstrap in `backend/app/main.py`:
- mounts routers under `/api/v1`
- configures CORS from settings
- initializes storage, settings, database schema, and Typesense collection on startup
## Processing Lifecycle
1. Upload starts at `POST /api/v1/documents/upload`.
2. API stores file bytes and inserts document rows with status `queued`.
3. API enqueues `app.worker.tasks.process_document_task` into Redis.
4. Worker extracts content and metadata, expands ZIP archives, runs OCR, generates routing suggestions, and writes processing logs.
5. Worker updates database fields, document status, and search index entries.
6. UI polls for documents and processing logs to reflect progress.
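Step 6 can be illustrated with a minimal Python polling sketch. It assumes a caller-supplied `fetch_document` function that wraps `GET /api/v1/documents/{document_id}` and returns the decoded JSON, including `status`. The helper name, interval, and timeout are hypothetical and are not taken from `App.tsx`.

```python
import time
from typing import Callable, Mapping

# Final states written by the worker, per the data model lifecycle notes.
FINAL_STATUSES = {"processed", "unsupported", "error"}


def wait_for_processing(
    fetch_document: Callable[[str], Mapping[str, object]],
    document_id: str,
    interval_seconds: float = 2.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document reaches a final processing status (hypothetical helper).

    `fetch_document` is assumed to wrap GET /api/v1/documents/{document_id}
    and return the decoded JSON body, which includes `status`.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = str(fetch_document(document_id)["status"])
        if status in FINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {document_id} is still {status!r}")
        sleep(interval_seconds)
```

Injecting `sleep` keeps the loop testable without real delays.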
## Frontend Architecture
Frontend source root: `frontend/src/`

Core structure:
- `App.tsx` orchestrates screen switching, state, polling, and action flows
- `components/` contains upload, filter, grid, viewer, modal, settings, and log panel modules
- `lib/api.ts` centralizes API client calls
- `types.ts` defines typed API contracts used by components
- `design-foundation.css` and `styles.css` define design tokens and global/component styling

Main user flows:
- Upload and conflict resolution
- Search and filtered document browsing
- Metadata editing and lifecycle actions (trash, restore, delete, reprocess)
- Settings management for providers, tasks, and UI defaults
- Processing log review
## Persistence and State
Persistent data:
- PostgreSQL stores document metadata and processing logs
- Docker volume-backed storage keeps original files, previews, and settings JSON
- Typesense stores indexed search representations

Transient runtime state:
- Redis queues processing tasks and worker execution state
- Frontend local component state drives active filters, selection, and modal flows

# Data Model Reference
Primary SQLAlchemy models are defined in `backend/app/models/`.
## documents
Model: `Document` in `backend/app/models/document.py`

Purpose:
- Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:
- Identity and source: `id`, `original_filename`, `source_relative_path`, `stored_relative_path`
- File attributes: `mime_type`, `extension`, `sha256`, `size_bytes`
- Organization: `logical_path`, `suggested_path`, `tags`, `suggested_tags`
- Processing outputs: `extracted_text`, `image_text_type`, `handwriting_style_id`, `preview_available`
- Lifecycle and relations: `status`, `is_archive_member`, `archived_member_path`, `parent_document_id`, `replaces_document_id`
- Metadata and timestamps: `metadata_json`, `created_at`, `processed_at`, `updated_at`

Enum `DocumentStatus`:
- `queued`
- `processed`
- `unsupported`
- `error`
- `trashed`

Relationships:
- Self-referential `parent_document` relationship for archive extraction trees.
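Clients that consume `status` values may want a typed mirror of the enum. Below is a minimal Python sketch. It assumes the API serializes `DocumentStatus` as the lowercase strings listed above. The member names and the `PROCESSING_OUTCOMES` grouping are illustrative and are not copied from `backend/app/models/document.py`; the grouping takes the worker's final states from the lifecycle notes.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Client-side mirror of the documented status values (member names assumed)."""

    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


# Statuses the worker can finish with; `trashed` comes from the trash operation instead.
PROCESSING_OUTCOMES = frozenset(
    {DocumentStatus.PROCESSED, DocumentStatus.UNSUPPORTED, DocumentStatus.ERROR}
)
```

Because the enum subclasses `str`, `DocumentStatus("queued")` parses an API value and members compare equal to their raw strings.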
## processing_logs
Model: `ProcessingLogEntry` in `backend/app/models/processing_log.py`

Purpose:
- Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:
- Event identity and timing: `id`, `created_at`
- Event classification: `level`, `stage`, `event`
- Document linkage: `document_id`, `document_filename`
- Model context: `provider_id`, `model_name`
- Prompt or response traces: `prompt_text`, `response_text`
- Structured event payload: `payload_json`

Foreign keys:
- `document_id` references `documents.id` with `ON DELETE SET NULL`.
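The `ON DELETE SET NULL` rule means log rows outlive the document they describe. The snippet below demonstrates that semantic with Python's built-in `sqlite3` standing in for PostgreSQL. The table definitions are trimmed, hypothetical versions of `documents` and `processing_logs`. The sketch assumes, without confirmation from the model code, that `document_filename` is what keeps orphaned log rows readable after a permanent delete.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the tables are trimmed, hypothetical versions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        original_filename TEXT NOT NULL
    );
    CREATE TABLE processing_logs (
        id INTEGER PRIMARY KEY,
        document_id TEXT REFERENCES documents(id) ON DELETE SET NULL,
        document_filename TEXT,
        stage TEXT,
        event TEXT
    );
    """
)
conn.execute("INSERT INTO documents VALUES ('doc-1', 'invoice.pdf')")
conn.execute(
    "INSERT INTO processing_logs (document_id, document_filename, stage, event) "
    "VALUES ('doc-1', 'invoice.pdf', 'ocr', 'completed')"
)
conn.execute("DELETE FROM documents WHERE id = 'doc-1'")

row = conn.execute("SELECT document_id, document_filename FROM processing_logs").fetchone()
print(row)  # (None, 'invoice.pdf'): the link is cleared, the log row survives
```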
## Model Lifecycle Notes
- Upload inserts a `Document` row in `queued` state and enqueues background processing.
- Worker updates extraction results and final status (`processed`, `unsupported`, or `error`).
- Trash and restore operations toggle `status` while preserving source files until permanent delete.
- Permanent delete removes the document tree (including archive descendants) and associated stored files.
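Permanent delete walks the self-referential `parent_document_id` tree. Below is a breadth-first sketch that collects the ids to delete. It assumes the caller has already loaded `parent_document_id` and `status` for the candidate rows. `collect_delete_tree` is hypothetical. Only the root is checked for `trashed`, because this reference does not say whether archive members are trashed individually.

```python
from collections import deque
from typing import Dict, List, Optional


def collect_delete_tree(
    root_id: str,
    parent_of: Dict[str, Optional[str]],
    status_of: Dict[str, str],
) -> List[str]:
    """Return root_id plus all archive descendants, parents before children.

    Hypothetical helper: `parent_of` maps document id -> parent_document_id.
    """
    if status_of[root_id] != "trashed":
        raise ValueError("permanent delete requires the document to be trashed first")

    # Invert the parent links once so each level can be expanded directly.
    children: Dict[str, List[str]] = {}
    for doc_id, parent_id in parent_of.items():
        if parent_id is not None:
            children.setdefault(parent_id, []).append(doc_id)

    ordered: List[str] = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        ordered.append(current)
        queue.extend(children.get(current, []))
    return ordered
```

Returning parents before children lets a caller delete stored files in any order and then remove rows deepest-first by reversing the list.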

# Operations And Configuration
## Runtime Services
`docker-compose.yml` defines the runtime stack:
- `db` (Postgres 16, port `5432`)
- `redis` (Redis 7, port `6379`)
- `typesense` (Typesense 29, port `8108`)
- `api` (FastAPI backend, port `8000`)
- `worker` (RQ background worker)
- `frontend` (Vite UI, port `5173`)
## Named Volumes
Persistent volumes:
- `db-data`
- `redis-data`
- `dcm-storage`
- `typesense-data`

Reset all persisted runtime data by stopping the stack and removing its named volumes:
```bash
docker compose down -v
```
## Operational Commands
Start or rebuild stack:
```bash
docker compose up --build -d
```
Stop stack:
```bash
docker compose down
```
Tail logs:
```bash
docker compose logs -f
```
## Backend Configuration
Settings source:
- Runtime settings class: `backend/app/core/config.py`
- API settings persistence: `backend/app/services/app_settings.py`

Key environment variables used by the `api` and `worker` services in `docker-compose.yml`:
- `APP_ENV`
- `DATABASE_URL`
- `REDIS_URL`
- `STORAGE_ROOT`
- `PUBLIC_BASE_URL`
- `CORS_ORIGINS` (`api` service)
- `TYPESENSE_PROTOCOL`
- `TYPESENSE_HOST`
- `TYPESENSE_PORT`
- `TYPESENSE_API_KEY`
- `TYPESENSE_COLLECTION_NAME`
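The `TYPESENSE_*` variables combine into one connection target. Below is a standalone sketch of reading them. It assumes `http` and `8108` as fallbacks when `TYPESENSE_PROTOCOL` or `TYPESENSE_PORT` is unset. The real backend reads these values through the `Settings` class in `backend/app/core/config.py`; `TypesenseConfig` here is a hypothetical stand-in.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TypesenseConfig:
    """Hypothetical stand-in for the Typesense part of the backend Settings."""

    protocol: str
    host: str
    port: int
    api_key: str
    collection_name: str

    @property
    def base_url(self) -> str:
        return f"{self.protocol}://{self.host}:{self.port}"


def typesense_config_from_env(env: Mapping[str, str] = os.environ) -> TypesenseConfig:
    # Assumed fallbacks: "http" and the compose port 8108; the rest are required.
    return TypesenseConfig(
        protocol=env.get("TYPESENSE_PROTOCOL", "http"),
        host=env["TYPESENSE_HOST"],
        port=int(env.get("TYPESENSE_PORT", "8108")),
        api_key=env["TYPESENSE_API_KEY"],
        collection_name=env["TYPESENSE_COLLECTION_NAME"],
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation.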

Selected defaults from `Settings` (`backend/app/core/config.py`):
- `upload_chunk_size = 4194304` (4 MiB)
- `max_zip_members = 250`
- `max_zip_depth = 2`
- `max_text_length = 500000`
- `default_openai_model = "gpt-4.1-mini"`
- `default_openai_timeout_seconds = 45`
- `default_summary_model = "gpt-4.1-mini"`
- `default_routing_model = "gpt-4.1-mini"`
- `typesense_timeout_seconds = 120`
- `typesense_num_retries = 0`
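`upload_chunk_size` presumably bounds how much of a file is held in memory at once while it is copied to storage. Below is a sketch of a chunked copy that also yields the values kept in `sha256` and `size_bytes`. It assumes the upload path streams bytes this way. `copy_with_digest` is a hypothetical helper, not the storage service's API.

```python
import hashlib
from typing import BinaryIO, Tuple

UPLOAD_CHUNK_SIZE = 4_194_304  # mirrors the documented Settings.upload_chunk_size (4 MiB)


def copy_with_digest(
    src: BinaryIO, dst: BinaryIO, chunk_size: int = UPLOAD_CHUNK_SIZE
) -> Tuple[str, int]:
    """Copy src to dst in fixed-size chunks (hypothetical helper).

    Returns the hex SHA-256 digest and the byte count, the kind of values
    stored in `sha256` and `size_bytes`.
    """
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        digest.update(chunk)
        dst.write(chunk)
        size += len(chunk)
    return digest.hexdigest(), size
```

Hashing during the copy avoids a second read of the stored file, and the digest is the checksum that upload conflict detection compares.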
## Frontend Configuration
Frontend runtime API target:
- `VITE_API_BASE`, set on the `frontend` service in `docker-compose.yml`

Frontend local commands:
```bash
cd frontend && npm run dev
cd frontend && npm run build
cd frontend && npm run preview
```
## Settings Persistence
Application-level settings managed from the UI are persisted by the backend settings service:
- File path: `<STORAGE_ROOT>/settings.json`
- Endpoints: `/api/v1/settings`, `/api/v1/settings/reset`, `/api/v1/settings/handwriting`

Settings include:
- upload defaults
- display options
- provider configuration
- OCR, summary, and routing task settings
- predefined paths and tags
- handwriting-style clustering settings
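Settings persist as JSON at `<STORAGE_ROOT>/settings.json`. Below is a minimal sketch of loading them with defaults and saving them through a temporary file plus `os.replace`. It assumes a shallow top-level merge of stored keys over defaults. The helper names are hypothetical, and `backend/app/services/app_settings.py` may validate and merge differently.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def settings_path(storage_root: str) -> Path:
    return Path(storage_root) / "settings.json"


def load_settings(storage_root: str, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with stored values (shallow merge, assumed)."""
    path = settings_path(storage_root)
    if not path.exists():
        return dict(defaults)
    stored = json.loads(path.read_text(encoding="utf-8"))
    return {**defaults, **stored}  # stored top-level keys win


def save_settings(storage_root: str, settings: Dict[str, Any]) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    path = settings_path(storage_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(settings, handle, indent=2, sort_keys=True)
        os.replace(tmp_name, path)  # atomic on the same filesystem
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The temp-file swap means a crash mid-write leaves the previous `settings.json` intact, which matters for the "persists after restart" check below.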
## Validation Checklist
After operational or configuration changes, verify:
- `GET /api/v1/health` returns `{ "status": "ok" }`
- the frontend can list, upload, and search documents
- `worker` logs (`docker compose logs -f worker`) show successful task execution
- saving or resetting settings works and the result persists after a restart
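The health check can be scripted with the standard library. The sketch below probes `GET /api/v1/health` and accepts only an HTTP 200 response whose JSON body is `{"status": "ok"}`. The function names are hypothetical. `http://localhost:8000` is assumed as the base URL because the `api` service publishes port `8000`.

```python
import http.client
import json
from urllib.request import urlopen

HEALTH_PATH = "/api/v1/health"


def is_healthy_payload(status_code: int, body: bytes) -> bool:
    """True only for HTTP 200 with a JSON object whose `status` is "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def api_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Probe the health endpoint (hypothetical helper)."""
    url = base_url.rstrip("/") + HEALTH_PATH
    try:
        with urlopen(url, timeout=timeout) as response:
            return is_healthy_payload(response.status, response.read())
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError subclass OSError: refused connections,
        # timeouts, and non-2xx responses all count as unhealthy.
        return False
```

Usage: `api_is_healthy("http://localhost:8000")` after `docker compose up --build -d`.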