docs: complete repository technical documentation refresh
doc/data-model-reference.md
@@ -0,0 +1,53 @@
# Data Model Reference

Primary SQLAlchemy models are defined in `backend/app/models/`.

## documents

Model: `Document` in `backend/app/models/document.py`

Purpose:
- Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:
- Identity and source: `id`, `original_filename`, `source_relative_path`, `stored_relative_path`
- File attributes: `mime_type`, `extension`, `sha256`, `size_bytes`
- Organization: `logical_path`, `suggested_path`, `tags`, `suggested_tags`
- Processing outputs: `extracted_text`, `image_text_type`, `handwriting_style_id`, `preview_available`
- Lifecycle and relations: `status`, `is_archive_member`, `archived_member_path`, `parent_document_id`, `replaces_document_id`
- Metadata and timestamps: `metadata_json`, `created_at`, `processed_at`, `updated_at`

Enum `DocumentStatus`:
- `queued`
- `processed`
- `unsupported`
- `error`
- `trashed`

Relationships:
- Self-referential `parent_document` relationship for archive extraction trees.
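
The enum and the self-referential link described above can be illustrated with a small standard-library sketch. It is not the project's SQLAlchemy model: the in-memory SQLite table holds only a handful of the listed columns, the `TEXT` id type and the string enum values are assumptions, and `make_db` and `descendant_ids` are hypothetical helper names.

```python
import enum
import sqlite3


class DocumentStatus(str, enum.Enum):
    """Status names from the reference; the string values are assumed."""
    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with a subset of the documented columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE documents (
            id TEXT PRIMARY KEY,               -- id type is an assumption
            original_filename TEXT NOT NULL,
            status TEXT NOT NULL,
            is_archive_member INTEGER NOT NULL DEFAULT 0,
            parent_document_id TEXT REFERENCES documents(id)
        )
        """
    )
    return conn


def descendant_ids(conn: sqlite3.Connection, root_id: str) -> list:
    """Walk parent_document_id links to collect every descendant of root_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM documents WHERE parent_document_id = ?
            UNION ALL
            SELECT d.id FROM documents AS d
            JOIN tree AS t ON d.parent_document_id = t.id
        )
        SELECT id FROM tree ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    queued = DocumentStatus.QUEUED.value
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?, ?)",
        [
            ("a", "bundle.zip", queued, 0, None),    # uploaded archive
            ("b", "inner.zip", queued, 1, "a"),      # archive inside the archive
            ("c", "letter.pdf", queued, 1, "b"),     # member of the inner archive
        ],
    )
    print(descendant_ids(conn, "a"))
```

A recursive query like this is one way to enumerate an archive extraction tree from `parent_document_id` alone; the real code may instead traverse the ORM relationship.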

## processing_logs

Model: `ProcessingLogEntry` in `backend/app/models/processing_log.py`

Purpose:
- Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:
- Event identity and timing: `id`, `created_at`
- Event classification: `level`, `stage`, `event`
- Document linkage: `document_id`, `document_filename`
- Model context: `provider_id`, `model_name`
- Prompt or response traces: `prompt_text`, `response_text`
- Structured event payload: `payload_json`

Foreign keys:
- `document_id` references `documents.id` with `ON DELETE SET NULL`.
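
The effect of `ON DELETE SET NULL` can be seen in a minimal standard-library sketch. It uses an in-memory SQLite database rather than the project's SQLAlchemy models, includes only a few of the listed columns with assumed types, and `make_db` and `delete_document` are hypothetical helpers. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create reduced documents and processing_logs tables in memory."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores foreign key actions unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, original_filename TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE processing_logs (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            level TEXT NOT NULL,
            stage TEXT NOT NULL,
            event TEXT NOT NULL,
            document_id TEXT REFERENCES documents(id) ON DELETE SET NULL,
            document_filename TEXT
        )
        """
    )
    return conn


def delete_document(conn: sqlite3.Connection, document_id: str) -> None:
    """Delete a document row; linked log rows stay, with document_id nulled."""
    conn.execute("DELETE FROM documents WHERE id = ?", (document_id,))
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO documents VALUES ('d1', 'scan.pdf')")
    conn.execute(
        "INSERT INTO processing_logs (level, stage, event, document_id, document_filename) "
        "VALUES ('info', 'upload', 'document queued', 'd1', 'scan.pdf')"
    )
    delete_document(conn, "d1")
    print(conn.execute(
        "SELECT document_id, document_filename FROM processing_logs"
    ).fetchall())
```

In this sketch the log row outlives the document, and the separately stored `document_filename` still identifies which file the event belonged to. Whether that is the intended reason for the column is an inference, not something the source states.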

## Model Lifecycle Notes

- Upload inserts a `Document` row in `queued` state and enqueues background processing.
- Worker updates extraction results and final status (`processed`, `unsupported`, or `error`).
- Trash and restore operations toggle `status` while preserving source files until permanent delete.
- Permanent delete removes the document tree (including archive descendants) and associated stored files.
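
The status changes in these notes can be sketched as a few pure functions. This is an illustration only: the function names are hypothetical, the string enum values are assumed, and the source does not say how restore chooses the status to return to, so the sketch assumes the caller remembers the status that was in effect before trashing.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


# Final statuses a worker may set, per the lifecycle notes.
WORKER_OUTCOMES = frozenset({
    DocumentStatus.PROCESSED,
    DocumentStatus.UNSUPPORTED,
    DocumentStatus.ERROR,
})


def finish_processing(current: DocumentStatus, outcome: DocumentStatus) -> DocumentStatus:
    """Move a queued document to the worker's final status."""
    if current is not DocumentStatus.QUEUED:
        raise ValueError(f"cannot finish processing from {current.value}")
    if outcome not in WORKER_OUTCOMES:
        raise ValueError(f"{outcome.value} is not a worker outcome")
    return outcome


def trash(current: DocumentStatus) -> tuple:
    """Return (new_status, remembered_status); stored files are left untouched."""
    if current is DocumentStatus.TRASHED:
        raise ValueError("document is already trashed")
    return DocumentStatus.TRASHED, current


def restore(current: DocumentStatus, remembered: DocumentStatus) -> DocumentStatus:
    """Return a trashed document to the status remembered by trash() (assumed)."""
    if current is not DocumentStatus.TRASHED:
        raise ValueError("only trashed documents can be restored")
    if remembered is DocumentStatus.TRASHED:
        raise ValueError("remembered status must not be trashed")
    return remembered


if __name__ == "__main__":
    status = finish_processing(DocumentStatus.QUEUED, DocumentStatus.PROCESSED)
    status, previous = trash(status)
    status = restore(status, previous)
    print(status.value)
```

Keeping the transitions as small functions that raise on invalid moves makes the permitted paths explicit; the actual service code may enforce them differently or not at all.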