# API Contract

Base URL prefix: `/api/v1`

Primary implementation modules:

- `backend/app/api/router.py`
- `backend/app/api/routes_auth.py`
- `backend/app/api/routes_health.py`
- `backend/app/api/routes_documents.py`
- `backend/app/api/routes_search.py`
- `backend/app/api/routes_processing_logs.py`
- `backend/app/api/routes_settings.py`
## Authentication And Authorization
- Authentication is cookie-based session auth with a server-issued hashed session token.
- Clients authenticate with `POST /auth/login` using username and password.
- Backend issues a server-stored session token and sets `HttpOnly` `dcm_session` and readable `dcm_csrf` cookies.
- Login brute-force protection enforces Redis-backed throttle checks keyed by username and source IP.
- State-changing requests from browser clients must send `x-csrf-token: <dcm_csrf>` in request headers (double-submit pattern).
- For non-browser API clients, the optional `Authorization: Bearer <token>` path remains supported when the token is sent explicitly.
- `GET /auth/me` returns current identity and role.
- `POST /auth/logout` revokes current session token.

Role matrix:

- `documents/*`: `admin` or `user`
- `search/*`: `admin` or `user`
- `settings/*`: `admin` only
- `processing/logs/*`: `admin` only

Ownership rules:

- `user` role is restricted to its own documents.
- `admin` role can access all documents.
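
The cookie and CSRF rules above can be sketched from a client's point of view. This is a minimal illustration, not the backend's implementation: the helper name `build_request_headers`, the set of methods treated as state-changing, and the choice to skip the CSRF header on the bearer path are all assumptions.

```python
from typing import Dict, Mapping, Optional

# Assumption: which HTTP methods count as state-changing is not listed in this contract.
STATE_CHANGING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def build_request_headers(
    method: str,
    cookies: Mapping[str, str],
    bearer_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return the extra headers a client would attach to one request (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if bearer_token is not None:
        # Non-browser path: the token is sent explicitly instead of relying on cookies.
        # Assumption: no CSRF header is needed on this path.
        headers["Authorization"] = f"Bearer {bearer_token}"
        return headers
    if method.upper() in STATE_CHANGING_METHODS:
        # Double-submit pattern: echo the readable dcm_csrf cookie in x-csrf-token.
        csrf = cookies.get("dcm_csrf")
        if csrf is None:
            raise ValueError("dcm_csrf cookie is missing; log in first")
        headers["x-csrf-token"] = csrf
    return headers
```

The `dcm_session` cookie is never read here because it is `HttpOnly`; a browser attaches it automatically, and only `dcm_csrf` has to be copied into a header.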
## Auth
- `POST /auth/login`
  - Body model: `AuthLoginRequest`
  - Response model: `AuthLoginResponse`
  - Additional responses:
    - `401` for invalid credentials
    - `429` for throttled login attempts, with stable message and `Retry-After` header
    - `503` when the login rate-limiter backend is unavailable
- `GET /auth/me`
  - Response model: `AuthSessionResponse`
- `POST /auth/logout`
  - Response model: `AuthLogoutResponse`
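
As an illustration of the login status codes listed above, a client could map each response to an outcome and a wait time. The function name `login_outcome`, the outcome labels, the treatment of any 2xx status as success, and the fallback wait are assumptions; the sketch only parses `Retry-After` when it is given in whole seconds.

```python
from typing import Mapping, Tuple


def login_outcome(
    status: int,
    headers: Mapping[str, str],
    default_wait: float = 5.0,
) -> Tuple[str, float]:
    """Map a POST /auth/login response to (outcome, seconds to wait before retrying)."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    if 200 <= status < 300:
        return ("ok", 0.0)
    if status == 401:
        # Invalid credentials: retrying the same values will not help.
        return ("invalid_credentials", 0.0)
    if status == 429:
        raw = normalized.get("retry-after", "").strip()
        # Only the delay-in-seconds form is handled; anything else uses the fallback.
        wait = float(raw) if raw.isdigit() else default_wait
        return ("throttled", wait)
    if status == 503:
        # Rate-limiter backend unavailable: back off and try again later.
        return ("limiter_unavailable", default_wait)
    return ("unexpected", 0.0)
```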
## Health
- `GET /health`
  - Purpose: liveness check
  - Response: `{ "status": "ok" }`
## Documents
### Collection and metadata helpers
- `GET /documents`
  - Query: `offset`, `limit`, `include_trashed`, `only_trashed`, `path_prefix`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `DocumentsListResponse`
- `GET /documents/tags`
  - Query: `include_trashed`
  - Response: `{ "tags": string[] }`
  - Behavior:
    - all document-assigned tags visible to caller scope are included
    - predefined tags are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true`
- `GET /documents/paths`
  - Query: `include_trashed`
  - Response: `{ "paths": string[] }`
  - Behavior:
    - all document-assigned logical paths visible to caller scope are included
    - predefined paths are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true`
- `GET /documents/types`
  - Query: `include_trashed`
  - Response: `{ "types": string[] }`
- `POST /documents/content-md/export`
  - Body model: `ContentExportRequest`
  - Response: ZIP stream containing one markdown file per matched document
  - Limits:
    - hard cap on matched document count (`CONTENT_EXPORT_MAX_DOCUMENTS`)
    - hard cap on cumulative markdown bytes (`CONTENT_EXPORT_MAX_TOTAL_BYTES`)
    - per-user rate limit (`CONTENT_EXPORT_RATE_LIMIT_PER_MINUTE`)
  - Behavior: archive is streamed from spool file instead of unbounded in-memory buffer
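
The spool-file behavior of the export endpoint can be approximated with the standard library. This is a sketch under stated assumptions, not the backend code: `stream_markdown_zip`, the two limit constants, and the `<name>.md` entry naming are hypothetical, and the per-user rate limit is not modeled.

```python
import tempfile
import zipfile
from typing import Dict, Iterator

# Hypothetical stand-ins for CONTENT_EXPORT_MAX_DOCUMENTS and CONTENT_EXPORT_MAX_TOTAL_BYTES.
MAX_DOCUMENTS = 1000
MAX_TOTAL_BYTES = 50 * 1024 * 1024


def stream_markdown_zip(
    documents: Dict[str, str],
    chunk_size: int = 64 * 1024,
    spool_memory_limit: int = 1024 * 1024,
) -> Iterator[bytes]:
    """Yield a ZIP of markdown files in chunks, backed by a spooled temporary file."""
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError("matched document count exceeds cap")
    total_bytes = sum(len(text.encode("utf-8")) for text in documents.values())
    if total_bytes > MAX_TOTAL_BYTES:
        raise ValueError("cumulative markdown size exceeds cap")

    # SpooledTemporaryFile keeps small archives in memory and rolls over to disk
    # once spool_memory_limit is exceeded, so memory use stays bounded.
    with tempfile.SpooledTemporaryFile(max_size=spool_memory_limit) as spool:
        with zipfile.ZipFile(spool, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            for name, text in documents.items():
                archive.writestr(f"{name}.md", text)
        spool.seek(0)
        while True:
            chunk = spool.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Checking both caps before any bytes are written means an oversized request fails without a partial archive ever reaching the client.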
### Per-document operations
- `GET /documents/{document_id}`
  - Response model: `DocumentDetailResponse`
- `GET /documents/{document_id}/download`
  - Response: original file bytes
- `GET /documents/{document_id}/preview`
  - Response: inline preview stream only for safe MIME types
  - Behavior: script-capable MIME types are forced to attachment responses with `X-Content-Type-Options: nosniff`
- `GET /documents/{document_id}/thumbnail`
  - Response: generated thumbnail image when available
- `GET /documents/{document_id}/content-md`
  - Response: extracted markdown content for one document
- `PATCH /documents/{document_id}`
  - Body model: `DocumentUpdateRequest`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/trash`
  - Response model: `DocumentResponse`
- `POST /documents/{document_id}/restore`
  - Response model: `DocumentResponse`
- `DELETE /documents/{document_id}`
  - Behavior: permanent delete, requires document to be trashed first
  - Response: deletion counters
- `POST /documents/{document_id}/reprocess`
  - Response model: `DocumentResponse`
  - Behavior: requeues asynchronous processing task
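
The preview rule above (inline only for safe MIME types, attachment otherwise) might look roughly like the following. The allowlist contents and the helper name `preview_headers` are assumptions, since this contract does not enumerate which MIME types count as safe.

```python
from typing import Dict

# Hypothetical allowlist; the MIME types the backend treats as safe are not listed here.
INLINE_SAFE_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg", "text/plain"}


def preview_headers(mime_type: str) -> Dict[str, str]:
    """Choose response headers for a preview of a document with the given MIME type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    disposition = "inline" if normalized in INLINE_SAFE_MIME_TYPES else "attachment"
    return {
        "Content-Type": normalized,
        "Content-Disposition": disposition,
        # Sent on every response in this sketch; the contract only states it
        # for the forced-attachment case.
        "X-Content-Type-Options": "nosniff",
    }
```

An allowlist is used instead of a blocklist so that an unrecognized type, such as `text/html` or `image/svg+xml`, defaults to the attachment path.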
### Upload
- `POST /documents/upload`
  - Multipart form fields:
    - `files[]` (required)
    - `relative_paths[]` (optional)
    - `logical_path` (optional, defaults to `Inbox`)
    - `tags` (optional CSV)
    - `conflict_mode` (`ask`, `replace`, `duplicate`)
  - Response model: `UploadResponse`
  - Behavior:
    - `ask`: returns `conflicts` if duplicate checksum is detected for caller-visible documents
    - `replace`: creates new document linked to replaced document id
    - `duplicate`: creates additional document record
    - upload `POST` request rejected with `411` when `Content-Length` is missing
    - `OPTIONS /documents/upload` CORS preflight bypasses upload `Content-Length` enforcement
    - request rejected with `413` when file count, per-file size, or total request size exceeds configured limits
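
The `411` and `413` rejection rules above can be expressed as a single pre-check. This is a sketch only: the function name `upload_precheck` and the three limit parameter names are hypothetical, and the order in which the backend evaluates the checks is an assumption.

```python
from typing import Optional, Sequence


def upload_precheck(
    method: str,
    content_length: Optional[int],
    file_sizes: Sequence[int],
    *,
    max_files: int,
    max_file_bytes: int,
    max_request_bytes: int,
) -> Optional[int]:
    """Return the HTTP status to reject with, or None when the request may proceed."""
    if method.upper() == "OPTIONS":
        # CORS preflight carries no upload body, so Content-Length is not enforced.
        return None
    if content_length is None:
        return 411  # Length Required
    if content_length > max_request_bytes:
        return 413  # total request size over limit
    if len(file_sizes) > max_files:
        return 413  # too many files
    if any(size > max_file_bytes for size in file_sizes):
        return 413  # at least one file over the per-file limit
    return None
```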
## Search
- `GET /search`
  - Query: `query` (min length 2), `offset`, `limit`, `include_trashed`, `only_trashed`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
  - Response model: `SearchResponse`
  - Behavior: PostgreSQL full-text and metadata ranking with role-based ownership scope
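
A client can enforce the minimum query length before calling the endpoint. The sketch below builds a request path from a few of the parameters listed above; the helper name `build_search_path`, the default `limit`, and the `true` literal for boolean flags are assumptions, and the remaining filters are omitted because their value formats are not specified here.

```python
from urllib.parse import urlencode


def build_search_path(
    query: str,
    *,
    offset: int = 0,
    limit: int = 20,  # assumed default; the contract does not state one
    include_trashed: bool = False,
) -> str:
    """Build the path and query string for GET /api/v1/search."""
    if len(query) < 2:
        # Mirrors the documented minimum length so the request is not sent at all.
        raise ValueError("query must be at least 2 characters long")
    params = {"query": query, "offset": offset, "limit": limit}
    if include_trashed:
        params["include_trashed"] = "true"
    return "/api/v1/search?" + urlencode(params)


print(build_search_path("invoice 2024", limit=10))
# prints: /api/v1/search?query=invoice+2024&offset=0&limit=10
```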
## Processing Logs
- Access: admin only
- `GET /processing/logs`
  - Query: `offset`, `limit`, `document_id`
  - Response model: `ProcessingLogListResponse`
  - `limit` is capped by runtime configuration
  - sensitive fields are redacted in API responses
- `POST /processing/logs/trim`
  - Query: optional `keep_document_sessions`, `keep_unbound_entries`
  - Behavior: omitted query values fall back to persisted `/settings.processing_log_retention`
  - query values are capped by runtime retention limits
  - Response: trim counters
- `POST /processing/logs/clear`
  - Response: clear counters

Persistence mode:

- default is metadata-only logging (`PROCESSING_LOG_STORE_MODEL_IO_TEXT=false`, `PROCESSING_LOG_STORE_PAYLOAD_TEXT=false`)
- full prompt/response or payload content storage requires explicit operator opt-in
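
The fallback and capping rules for `POST /processing/logs/trim` reduce to a small resolution step per parameter. In this sketch the function name `resolve_trim_value` and its parameter names are hypothetical, and the cap is applied only to explicit query values because that is all the contract states.

```python
from typing import Optional


def resolve_trim_value(
    query_value: Optional[int],
    persisted_value: int,
    runtime_cap: int,
) -> int:
    """Pick the effective retention count for one trim parameter."""
    if query_value is None:
        # Omitted query value: fall back to the persisted retention setting.
        return persisted_value
    # Explicit query value: never exceed the runtime retention limit.
    return min(query_value, runtime_cap)
```

For example, with a persisted value of 50 and a cap of 200, an omitted parameter resolves to 50 and a requested 500 resolves to 200.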
## Settings
- Access: admin only
- `GET /settings`
  - Response model: `AppSettingsResponse`
  - persisted providers with invalid base URLs are ignored during read sanitization; response falls back to remaining valid providers or secure defaults
  - provider API keys are exposed only as `api_key_set` and `api_key_masked`
- `PATCH /settings`
  - Body model: `AppSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
  - rejects invalid provider base URLs with `400` when scheme, allowlist, or network safety checks fail
  - provider API keys are persisted encrypted at rest (`api_key_encrypted`) and plaintext keys are not written to storage
- `POST /settings/reset`
  - Response model: `AppSettingsResponse`
- `PATCH /settings/handwriting`
  - Body model: `HandwritingSettingsUpdateRequest`
  - Response model: `AppSettingsResponse`
- `GET /settings/handwriting`
  - Response model: `HandwritingSettingsResponse`
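
The three base URL checks named above (scheme, allowlist, network safety) could be combined as follows. Everything specific in this sketch is an assumption: the function name `provider_base_url_error`, the `https`-only rule, the example allowlist, and the error strings. It inspects IP literals only and performs no DNS resolution, so it is weaker than a real network safety check.

```python
import ipaddress
from typing import Collection, Optional
from urllib.parse import urlsplit

# Hypothetical allowlist; where the backend gets its allowlist is not described here.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def provider_base_url_error(
    url: str,
    allowed_hosts: Collection[str] = ALLOWED_HOSTS,
) -> Optional[str]:
    """Return a reason to reject the URL (which would map to a 400), or None if it passes."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return "malformed URL"
    if parts.scheme != "https":  # assumption: only https is accepted
        return "scheme must be https"
    if not host:
        return "missing host"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # a DNS name rather than an IP literal
    if address is not None and (
        address.is_private or address.is_loopback or address.is_link_local
    ):
        return "host is not publicly routable"
    if host not in allowed_hosts:
        return "host not in allowlist"
    return None
```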
## Schema Families
Auth schemas in `backend/app/schemas/auth.py`:

- `AuthLoginRequest`
- `AuthUserResponse`
- `AuthSessionResponse`
- `AuthLoginResponse`
- `AuthLogoutResponse`

Document schemas in `backend/app/schemas/documents.py`:

- `DocumentResponse`
- `DocumentDetailResponse`
- `DocumentsListResponse`
- `UploadConflict`
- `UploadResponse`
- `DocumentUpdateRequest`
- `SearchResponse`
- `ContentExportRequest`

Processing log schemas in `backend/app/schemas/processing_logs.py`:

- `ProcessingLogEntryResponse`
- `ProcessingLogListResponse`

Settings schemas in `backend/app/schemas/settings.py`:

- provider, task, upload-default, display, processing-log retention, predefined paths or tags, handwriting-style, and legacy handwriting models grouped under `AppSettingsResponse` and `AppSettingsUpdateRequest`.