docs: complete repository technical documentation refresh
@@ -1,15 +1,18 @@
 # Documentation

-This is the documentation entrypoint for DMS.
+This directory contains technical documentation for DMS.

-## Available Documents
+## Core References

-- Project setup and operations: `../README.md`
-- Frontend visual system and compact UI rules: `frontend-design-foundation.md`
-- Handwriting style implementation plan: `../PLAN.md`
+- `../README.md` - project overview, setup, and quick operations
+- `architecture-overview.md` - backend, frontend, and infrastructure architecture
+- `api-contract.md` - API endpoint contract grouped by route module
+- `data-model-reference.md` - database entity definitions and lifecycle states
+- `operations-and-configuration.md` - runtime operations, ports, volumes, and configuration values
+- `frontend-design-foundation.md` - frontend visual system, tokens, and UI implementation rules

-## Planned Additions
+## Documentation Rules

-- Architecture overview
-- Data model reference
-- API contract details
+- Keep this file as the documentation index and add new technical documents here.
+- Update referenced documents whenever behavior, routes, models, or runtime configuration change.
+- Prefer concise, implementation-backed descriptions with explicit paths to source modules.

130
doc/api-contract.md
Normal file
@@ -0,0 +1,130 @@
# API Contract

Base URL prefix: `/api/v1`

Primary implementation modules:
- `backend/app/api/router.py`
- `backend/app/api/routes_health.py`
- `backend/app/api/routes_documents.py`
- `backend/app/api/routes_search.py`
- `backend/app/api/routes_processing_logs.py`
- `backend/app/api/routes_settings.py`

## Health

- `GET /health`
  - Purpose: liveness check
  - Response: `{ "status": "ok" }`

## Documents

### Collection and metadata helpers

- `GET /documents`
  - Query: `offset`, `limit`, `include_trashed`, `only_trashed`, `path_prefix`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `DocumentsListResponse`
- `GET /documents/tags`
  - Query: `include_trashed`
  - Response: `{ "tags": string[] }`
- `GET /documents/paths`
  - Query: `include_trashed`
  - Response: `{ "paths": string[] }`
- `GET /documents/types`
  - Query: `include_trashed`
  - Response: `{ "types": string[] }`
- `POST /documents/content-md/export`
  - Body model: `ContentExportRequest`
  - Response: ZIP stream containing one markdown file per matched document

### Per-document operations

- `GET /documents/{document_id}`
  - Response model: `DocumentDetailResponse`
- `GET /documents/{document_id}/download`
  - Response: original file bytes
- `GET /documents/{document_id}/preview`
  - Response: inline preview stream where browser-supported
- `GET /documents/{document_id}/thumbnail`
  - Response: generated thumbnail image when available
- `GET /documents/{document_id}/content-md`
  - Response: extracted markdown content for one document
- `PATCH /documents/{document_id}`
  - Body model: `DocumentUpdateRequest`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/trash`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/restore`
  - Response model: `DocumentResponse`
- `DELETE /documents/{document_id}`
  - Behavior: permanent delete; the document must be trashed first
  - Response: deletion counters
- `POST /documents/{document_id}/reprocess`
  - Response model: `DocumentResponse`
  - Behavior: requeues the asynchronous processing task

### Upload

- `POST /documents/upload`
  - Multipart form fields:
    - `files[]` (required)
    - `relative_paths[]` (optional)
    - `logical_path` (optional, defaults to `Inbox`)
    - `tags` (optional CSV)
    - `conflict_mode` (`ask`, `replace`, `duplicate`)
  - Response model: `UploadResponse`
  - Behavior:
    - `ask`: returns `conflicts` if a duplicate checksum is detected
    - `replace`: creates a new document linked to the replaced document id
    - `duplicate`: creates an additional document record
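
The three `conflict_mode` behaviors can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `StoredDocument`, `resolve_upload`, and the dictionary-backed store are hypothetical names rather than the code in `backend/app/api/routes_documents.py`, and it assumes duplicates are matched on a SHA-256 checksum and that `ask` creates the document when no duplicate exists.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredDocument:
    """Hypothetical stand-in for a persisted document row."""
    id: int
    filename: str
    sha256: str
    replaces_document_id: Optional[int] = None


def resolve_upload(store, filename, data, conflict_mode="ask"):
    """Return (created_document_or_None, conflicting_ids) for one uploaded file."""
    if conflict_mode not in ("ask", "replace", "duplicate"):
        raise ValueError(f"unknown conflict_mode: {conflict_mode}")

    checksum = hashlib.sha256(data).hexdigest()
    existing = next((d for d in store.values() if d.sha256 == checksum), None)

    if existing is not None and conflict_mode == "ask":
        # Nothing is created; the caller retries with an explicit mode.
        return None, [existing.id]

    new_id = max(store, default=0) + 1
    replaces = None
    if existing is not None and conflict_mode == "replace":
        # The new record points back at the document it supersedes.
        replaces = existing.id

    doc = StoredDocument(new_id, filename, checksum, replaces)
    store[new_id] = doc
    return doc, []
```

A client would typically upload with `ask` first and, if `conflicts` comes back non-empty, repeat the request with `replace` or `duplicate` once the user has chosen.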

## Search

- `GET /search`
  - Query: `query` (min length 2), `offset`, `limit`, `include_trashed`, `only_trashed`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `SearchResponse`
  - Behavior: PostgreSQL full-text and metadata ranking
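
As a client-side illustration, the sketch below builds a `GET /search` URL and rejects queries shorter than the documented minimum before any request is sent. The helper name `build_search_url`, the default `limit` of 20, and the choice to pass filter values through as plain strings are assumptions; the contract above does not specify defaults or filter encodings.

```python
from urllib.parse import urlencode

API_PREFIX = "/api/v1"  # base URL prefix stated in this contract


def build_search_url(query, offset=0, limit=20, **filters):
    """Build a GET /search URL, enforcing the documented minimum query length."""
    if len(query) < 2:
        raise ValueError("query must be at least 2 characters long")

    params = {"query": query, "offset": offset, "limit": limit}
    # Forward only the filters the caller actually set.
    params.update({key: value for key, value in filters.items() if value is not None})
    return f"{API_PREFIX}/search?{urlencode(params)}"
```

For example, `build_search_url("invoice", tag_filter="tax")` yields a path-relative URL that can be appended to the API host.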

## Processing Logs

- `GET /processing/logs`
  - Query: `offset`, `limit`, `document_id`
  - Response model: `ProcessingLogListResponse`
- `POST /processing/logs/trim`
  - Query: `keep_document_sessions`, `keep_unbound_entries`
  - Response: trim counters
- `POST /processing/logs/clear`
  - Response: clear counters

## Settings

- `GET /settings`
  - Response model: `AppSettingsResponse`
- `PATCH /settings`
  - Body model: `AppSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
- `POST /settings/reset`
  - Response model: `AppSettingsResponse`
- `PATCH /settings/handwriting`
  - Body model: `HandwritingSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
- `GET /settings/handwriting`
  - Response model: `HandwritingSettingsResponse`

## Schema Families

Document schemas in `backend/app/schemas/documents.py`:
- `DocumentResponse`
- `DocumentDetailResponse`
- `DocumentsListResponse`
- `UploadConflict`
- `UploadResponse`
- `DocumentUpdateRequest`
- `SearchResponse`
- `ContentExportRequest`

Processing log schemas in `backend/app/schemas/processing_logs.py`:
- `ProcessingLogEntryResponse`
- `ProcessingLogListResponse`

Settings schemas in `backend/app/schemas/settings.py`:
- Provider, task, upload-default, display, predefined paths or tags, handwriting-style, and legacy handwriting models grouped under `AppSettingsResponse` and `AppSettingsUpdateRequest`.
66
doc/architecture-overview.md
Normal file
@@ -0,0 +1,66 @@
# Architecture Overview

## System Topology

DMS runs as a multi-service application defined in `docker-compose.yml`:
- `frontend` serves the React UI on port `5173`
- `api` serves FastAPI on port `8000`
- `worker` executes asynchronous extraction and indexing jobs
- `db` provides PostgreSQL persistence on port `5432`
- `redis` backs queueing on port `6379`
- `typesense` stores the search index and vector-adjacent metadata on port `8108`

## Backend Architecture

Backend source root: `backend/app/`

Main boundaries:
- `api/` route handlers and HTTP contract
- `services/` domain logic (storage, extraction, routing, settings, processing logs, Typesense)
- `db/` SQLAlchemy base, engine, and session lifecycle
- `models/` persistence entities (`Document`, `ProcessingLogEntry`)
- `schemas/` Pydantic response and request schemas
- `worker/` RQ queue integration and background processing tasks

Application bootstrap in `backend/app/main.py`:
- mounts routers under `/api/v1`
- configures CORS from settings
- initializes storage, settings, database schema, and Typesense collection on startup

## Processing Lifecycle

1. Upload starts at `POST /api/v1/documents/upload`.
2. The API stores file bytes and inserts document rows with status `queued`.
3. The API enqueues `app.worker.tasks.process_document_task` into Redis.
4. The worker extracts content and metadata, handles ZIP expansion, runs OCR and routing suggestions, and writes processing logs.
5. The worker updates database fields, document status, and search index entries.
6. The UI polls for documents and processing logs to reflect progress.
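
The hand-off between the API and the worker in steps 2 to 5 can be sketched with in-memory stand-ins. Everything here is illustrative: a `queue.Queue` replaces the Redis-backed RQ queue, plain dictionaries and lists replace the PostgreSQL tables, UTF-8 decoding is a placeholder for real extraction and OCR, and the function bodies are not the code in `app.worker.tasks`.

```python
import queue

documents = {}               # stand-in for the documents table
processing_logs = []         # stand-in for the processing_logs table
task_queue = queue.Queue()   # stand-in for the Redis-backed RQ queue


def upload(doc_id, data):
    """Steps 2-3: persist the row as `queued`, then enqueue the task."""
    documents[doc_id] = {"bytes": data, "status": "queued", "extracted_text": None}
    task_queue.put(doc_id)


def process_document_task(doc_id):
    """Steps 4-5: extract content, write a log entry, set the final status."""
    doc = documents[doc_id]
    try:
        # Placeholder extraction: treat the payload as UTF-8 text.
        doc["extracted_text"] = doc["bytes"].decode("utf-8")
        doc["status"] = "processed"
        processing_logs.append((doc_id, "info", "extraction finished"))
    except UnicodeDecodeError as exc:
        doc["status"] = "error"
        processing_logs.append((doc_id, "error", str(exc)))


def run_worker():
    """Drain the queue the way a worker process would consume jobs."""
    while not task_queue.empty():
        process_document_task(task_queue.get())


upload(1, b"hello")
upload(2, b"\xff\xfe")   # not valid UTF-8, so this one ends in `error`
run_worker()
```

Because the API returns as soon as the row is `queued`, the UI only learns about the final status by polling, which is why step 6 exists.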

## Frontend Architecture

Frontend source root: `frontend/src/`

Core structure:
- `App.tsx` orchestrates screen switching, state, polling, and action flows
- `components/` contains upload, filter, grid, viewer, modal, settings, and log panel modules
- `lib/api.ts` centralizes API client calls
- `types.ts` defines typed API contracts used by components
- `design-foundation.css` and `styles.css` define design tokens and global/component styling

Main user flows:
- Upload and conflict resolution
- Search and filtered document browsing
- Metadata editing and lifecycle actions (trash, restore, delete, reprocess)
- Settings management for providers, tasks, and UI defaults
- Processing log review

## Persistence and State

Persistent data:
- PostgreSQL stores document metadata and processing logs
- Docker volume-backed storage keeps original files, previews, and settings JSON
- Typesense stores indexed search representations

Transient runtime state:
- Redis queues processing tasks and worker execution state
- Frontend local component state drives active filters, selection, and modal flows
53
doc/data-model-reference.md
Normal file
@@ -0,0 +1,53 @@
# Data Model Reference

Primary SQLAlchemy models are defined in `backend/app/models/`.

## documents

Model: `Document` in `backend/app/models/document.py`

Purpose:
- Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:
- Identity and source: `id`, `original_filename`, `source_relative_path`, `stored_relative_path`
- File attributes: `mime_type`, `extension`, `sha256`, `size_bytes`
- Organization: `logical_path`, `suggested_path`, `tags`, `suggested_tags`
- Processing outputs: `extracted_text`, `image_text_type`, `handwriting_style_id`, `preview_available`
- Lifecycle and relations: `status`, `is_archive_member`, `archived_member_path`, `parent_document_id`, `replaces_document_id`
- Metadata and timestamps: `metadata_json`, `created_at`, `processed_at`, `updated_at`

Enum `DocumentStatus`:
- `queued`
- `processed`
- `unsupported`
- `error`
- `trashed`

Relationships:
- Self-referential `parent_document` relationship for archive extraction trees.

## processing_logs

Model: `ProcessingLogEntry` in `backend/app/models/processing_log.py`

Purpose:
- Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:
- Event identity and timing: `id`, `created_at`
- Event classification: `level`, `stage`, `event`
- Document linkage: `document_id`, `document_filename`
- Model context: `provider_id`, `model_name`
- Prompt or response traces: `prompt_text`, `response_text`
- Structured event payload: `payload_json`

Foreign keys:
- `document_id` references `documents.id` with `ON DELETE SET NULL`.
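
The effect of `ON DELETE SET NULL` can be demonstrated with a reduced schema. This sketch uses SQLite from the Python standard library as a stand-in for PostgreSQL, integer keys instead of whatever key type the real models use, and only a few of the columns listed above; it shows the foreign-key semantics, not the project's actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        original_filename TEXT
    );
    CREATE TABLE processing_logs (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES documents(id) ON DELETE SET NULL,
        document_filename TEXT,
        event TEXT
    );
    INSERT INTO documents VALUES (1, 'scan.pdf');
    INSERT INTO processing_logs VALUES (1, 1, 'scan.pdf', 'extraction finished');
    """
)

# Deleting the document keeps the log row but clears its link.
conn.execute("DELETE FROM documents WHERE id = 1")
row = conn.execute(
    "SELECT document_id, document_filename FROM processing_logs WHERE id = 1"
).fetchone()
```

After the delete, the log entry survives with a null `document_id`, while `document_filename` still records which file the event belonged to.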

## Model Lifecycle Notes

- Upload inserts a `Document` row in `queued` state and enqueues background processing.
- Worker updates extraction results and final status (`processed`, `unsupported`, or `error`).
- Trash and restore operations toggle `status` while preserving source files until permanent delete.
- Permanent delete removes the document tree (including archive descendants) and associated stored files.
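
The status values and the trash-before-delete rule can be expressed as a small guard. This is a sketch: the enum values come from this reference, but the uppercase member names, `WORKER_FINAL_STATUSES`, and `ensure_deletable` are hypothetical and are not taken from `backend/app/models/document.py`.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


# Statuses the worker may set when it finishes a document.
WORKER_FINAL_STATUSES = frozenset(
    {DocumentStatus.PROCESSED, DocumentStatus.UNSUPPORTED, DocumentStatus.ERROR}
)


def ensure_deletable(status):
    """Raise unless the document is already trashed (required before permanent delete)."""
    if status is not DocumentStatus.TRASHED:
        raise ValueError(
            f"document must be trashed before permanent delete (status: {status.value})"
        )
```

Keeping the check in one helper means an API handler and any batch cleanup job would reject a premature delete in the same way.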
111
doc/operations-and-configuration.md
Normal file
@@ -0,0 +1,111 @@
# Operations And Configuration

## Runtime Services

`docker-compose.yml` defines the runtime stack:
- `db` (Postgres 16, port `5432`)
- `redis` (Redis 7, port `6379`)
- `typesense` (Typesense 29, port `8108`)
- `api` (FastAPI backend, port `8000`)
- `worker` (RQ background worker)
- `frontend` (Vite UI, port `5173`)

## Named Volumes

Persistent volumes:
- `db-data`
- `redis-data`
- `dcm-storage`
- `typesense-data`

Reset all persisted runtime data (this stops the stack and deletes the named volumes listed above):

```bash
docker compose down -v
```

## Operational Commands

Start or rebuild the stack:

```bash
docker compose up --build -d
```

Stop the stack:

```bash
docker compose down
```

Tail logs:

```bash
docker compose logs -f
```

## Backend Configuration

Settings source:
- Runtime settings class: `backend/app/core/config.py`
- API settings persistence: `backend/app/services/app_settings.py`

Key environment variables used by `api` and `worker` in compose:
- `APP_ENV`
- `DATABASE_URL`
- `REDIS_URL`
- `STORAGE_ROOT`
- `PUBLIC_BASE_URL`
- `CORS_ORIGINS` (API service)
- `TYPESENSE_PROTOCOL`
- `TYPESENSE_HOST`
- `TYPESENSE_PORT`
- `TYPESENSE_API_KEY`
- `TYPESENSE_COLLECTION_NAME`

Selected defaults from `Settings` (`backend/app/core/config.py`):
- `upload_chunk_size = 4194304` (4 MiB)
- `max_zip_members = 250`
- `max_zip_depth = 2`
- `max_text_length = 500000`
- `default_openai_model = "gpt-4.1-mini"`
- `default_openai_timeout_seconds = 45`
- `default_summary_model = "gpt-4.1-mini"`
- `default_routing_model = "gpt-4.1-mini"`
- `typesense_timeout_seconds = 120`
- `typesense_num_retries = 0`
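
To show how defaults like these can be combined with environment overrides, here is a minimal sketch using a frozen dataclass. It is not the `Settings` class from `backend/app/core/config.py`: the name `SettingsSketch`, the helper `load_settings_sketch`, and the convention of reading each field from its upper-cased name are assumptions, and only a subset of the fields is included.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SettingsSketch:
    """Subset of the documented defaults; values copied from the list above."""
    upload_chunk_size: int = 4194304  # 4 MiB
    max_zip_members: int = 250
    max_zip_depth: int = 2
    max_text_length: int = 500000
    default_openai_model: str = "gpt-4.1-mini"
    default_openai_timeout_seconds: int = 45
    typesense_timeout_seconds: int = 120
    typesense_num_retries: int = 0


def load_settings_sketch(environ=os.environ):
    """Override defaults from variables named after the upper-cased field (assumed convention)."""
    overrides = {}
    for field in fields(SettingsSketch):
        raw = environ.get(field.name.upper())
        if raw is not None:
            # Coerce the string from the environment to the default's type.
            overrides[field.name] = type(field.default)(raw)
    return SettingsSketch(**overrides)
```

Calling `load_settings_sketch({"MAX_ZIP_DEPTH": "3"})` returns the defaults with only `max_zip_depth` changed, which makes the effect of a single compose-level override easy to reason about.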

## Frontend Configuration

Frontend runtime API target:
- `VITE_API_BASE` in the `frontend` service of `docker-compose.yml`

Frontend local commands:

```bash
cd frontend && npm run dev
cd frontend && npm run build
cd frontend && npm run preview
```

## Settings Persistence

Application-level settings managed from the UI are persisted by the backend settings service:
- file path: `<STORAGE_ROOT>/settings.json`
- endpoints: `/api/v1/settings`, `/api/v1/settings/reset`, `/api/v1/settings/handwriting`

Settings include:
- upload defaults
- display options
- provider configuration
- OCR, summary, and routing task settings
- predefined paths and tags
- handwriting-style clustering settings
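
File-backed persistence of this kind can be sketched as a pair of read and write helpers. The functions `read_app_settings` and `write_app_settings`, the shape of `DEFAULT_SETTINGS`, and the write-then-rename strategy are assumptions for illustration; the actual behavior lives in `backend/app/services/app_settings.py`.

```python
import copy
import json
import os
import tempfile
from pathlib import Path

# Illustrative defaults only; the real settings document has more sections.
DEFAULT_SETTINGS = {"upload_defaults": {"logical_path": "Inbox"}}


def settings_path(storage_root):
    return Path(storage_root) / "settings.json"


def read_app_settings(storage_root):
    """Return persisted settings, or a copy of the defaults if no file exists yet."""
    path = settings_path(storage_root)
    if not path.exists():
        return copy.deepcopy(DEFAULT_SETTINGS)
    return json.loads(path.read_text(encoding="utf-8"))


def write_app_settings(storage_root, settings):
    """Write to a temporary file, then swap it into place in one step."""
    path = settings_path(storage_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
    os.replace(tmp_name, path)
```

Writing through a temporary file means a crash in the middle of a save leaves the previous `settings.json` intact, which matters when both `api` and `worker` read the same storage volume.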

## Validation Checklist

After operational or configuration changes, verify:
- `GET /api/v1/health` is healthy
- frontend can list, upload, and search documents
- processing worker logs show successful task execution
- settings save or reset works and persists after restart
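
The first item in the checklist is easy to script. The sketch below is one possible check, assuming the API is reachable over plain HTTP; `fetch_json` and `is_healthy` are hypothetical helper names, and the `fetch` parameter exists so the logic can be exercised without a running stack.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_healthy(base_url, fetch=fetch_json):
    """Return True when GET /api/v1/health answers with {"status": "ok"}."""
    try:
        payload = fetch(f"{base_url.rstrip('/')}/api/v1/health")
    except (OSError, ValueError):
        # Connection failures and malformed JSON both count as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

With the default compose ports, `is_healthy("http://localhost:8000")` would be the call to run after `docker compose up --build -d`.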