# Data Model Reference

Primary SQLAlchemy models are defined in `backend/app/models/`.

## app_users

Model: `AppUser` in `backend/app/models/auth.py`

Purpose:

- Stores authenticatable user identities for session-based API access.

Core fields:

- Identity and credentials: `id`, `username`, `password_hash`
- Authorization and lifecycle: `role`, `is_active`
- Audit timestamps: `created_at`, `updated_at`

Enum `UserRole`:

- `admin`
- `user`

## auth_sessions

Model: `AuthSession` in `backend/app/models/auth.py`

Purpose:

- Stores issued bearer sessions linked to user identities.

Core fields:

- Identity and linkage: `id`, `user_id`, `token_hash`
- Session lifecycle: `expires_at`, `revoked_at`
- Request context: `user_agent`, `ip_address`
- Audit timestamps: `created_at`, `updated_at`

Foreign keys:

- `user_id` references `app_users.id` with `ON DELETE CASCADE`.

## documents

Model: `Document` in `backend/app/models/document.py`

Purpose:

- Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:

- Identity and source: `id`, `original_filename`, `source_relative_path`, `stored_relative_path`
- File attributes: `mime_type`, `extension`, `sha256`, `size_bytes`
- Ownership and organization: `owner_user_id`, `logical_path`, `suggested_path`, `tags`, `suggested_tags`
- Processing outputs: `extracted_text`, `image_text_type`, `handwriting_style_id`, `preview_available`
- Lifecycle and relations: `status`, `is_archive_member`, `archived_member_path`, `parent_document_id`, `replaces_document_id`
- Metadata and timestamps: `metadata_json`, `created_at`, `processed_at`, `updated_at`

Enum `DocumentStatus`:

- `queued`
- `processed`
- `unsupported`
- `error`
- `trashed`

Foreign keys:

- `owner_user_id` references `app_users.id` with `ON DELETE SET NULL`.

Relationships:

- Self-referential `parent_document` relationship for archive extraction trees.
- `owner_user` relationship to `AppUser`.
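The two user-related foreign-key rules (`ON DELETE CASCADE` on `auth_sessions.user_id`, `ON DELETE SET NULL` on `documents.owner_user_id`) can be illustrated with a small standard-library `sqlite3` sketch. This is not the project's SQLAlchemy schema: the DDL is hand-written, keeps only the columns needed to show the delete behavior, and the column types and helper names (`connect`, `delete_user_demo`) are assumptions for illustration.

```python
import sqlite3
from typing import Optional

# Hand-written, trimmed DDL (assumption): only the key and foreign-key columns
# described in this reference, not the full SQLAlchemy-generated schema.
SCHEMA = """
CREATE TABLE app_users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    role TEXT NOT NULL
);
CREATE TABLE auth_sessions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES app_users(id) ON DELETE CASCADE,
    token_hash TEXT NOT NULL
);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    original_filename TEXT NOT NULL,
    owner_user_id INTEGER REFERENCES app_users(id) ON DELETE SET NULL,
    status TEXT NOT NULL
);
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is set on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_user_demo() -> tuple[int, Optional[int]]:
    """Delete a user and report (remaining sessions, document owner id)."""
    conn = connect()
    conn.execute("INSERT INTO app_users (id, username, role) VALUES (1, 'alice', 'user')")
    conn.execute("INSERT INTO auth_sessions (user_id, token_hash) VALUES (1, 'abc123')")
    conn.execute(
        "INSERT INTO documents (id, original_filename, owner_user_id, status) "
        "VALUES (10, 'report.pdf', 1, 'processed')"
    )
    conn.execute("DELETE FROM app_users WHERE id = 1")
    sessions_left = conn.execute("SELECT COUNT(*) FROM auth_sessions").fetchone()[0]
    owner = conn.execute("SELECT owner_user_id FROM documents WHERE id = 10").fetchone()[0]
    conn.close()
    return sessions_left, owner


if __name__ == "__main__":
    print(delete_user_demo())  # (0, None)
```

Deleting the user removes its sessions outright, while the document row survives with a `NULL` owner. In this sketch the behavior depends on `PRAGMA foreign_keys = ON`, because SQLite does not enforce foreign keys on a connection without it.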
## processing_logs

Model: `ProcessingLogEntry` in `backend/app/models/processing_log.py`

Purpose:

- Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:

- Event identity and timing: `id`, `created_at`
- Event classification: `level`, `stage`, `event`
- Document linkage: `document_id`, `document_filename`
- Model context: `provider_id`, `model_name`
- Prompt or response traces: `prompt_text`, `response_text`
- Structured event payload: `payload_json`

Foreign keys:

- `document_id` references `documents.id` with `ON DELETE SET NULL`.

## Model Lifecycle Notes

- API startup initializes the schema and creates or refreshes bootstrap users from the auth environment variables.
- `POST /auth/login` validates `AppUser` credentials, creates an `AuthSession` with a hashed token, and returns the bearer token once.
- Upload inserts a `Document` row in the `queued` state, assigns `owner_user_id`, and enqueues background processing.
- The worker writes extraction results and the final status (`processed`, `unsupported`, or `error`), preserving ownership on archive descendants.
- User-role queries are owner-scoped; admin-role queries can access all documents.
- Trash and restore operations toggle `status` while preserving source files until permanent deletion.
- Permanent deletion removes the document tree (including archive descendants) and its associated stored files.
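Three of the lifecycle rules (tokens stored only as hashes and returned once, owner-scoped visibility by role, and permanent deletion covering archive descendants) can be sketched in plain Python. This is a hypothetical in-memory model, not the backend's code. The `Session`/`Doc` dataclasses, every helper name, the dict keyed by `token_hash` standing in for the `auth_sessions` table, the 12-hour TTL, and SHA-256 as the hash function are all assumptions. The reference only states that tokens are hashed, that user-role access is owner-scoped, and that permanent deletion includes archive descendants.

```python
import hashlib
import secrets
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Session:
    """Hypothetical stand-in for an auth_sessions row."""
    user_id: int
    token_hash: str
    expires_at: datetime
    revoked_at: Optional[datetime] = None


@dataclass
class Doc:
    """Hypothetical stand-in for a documents row (only the columns used here)."""
    id: int
    owner_user_id: Optional[int]
    status: str
    parent_document_id: Optional[int] = None


def hash_token(token: str) -> str:
    # Assumption: SHA-256 hex digest; the reference only says the token is stored hashed.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_session(store: Dict[str, Session], user_id: int,
                  ttl: timedelta = timedelta(hours=12)) -> str:
    """Create a session, persist only the token hash, and return the raw token once."""
    token = secrets.token_urlsafe(32)
    token_hash = hash_token(token)
    store[token_hash] = Session(user_id, token_hash, datetime.now(timezone.utc) + ttl)
    return token


def resolve_session(store: Dict[str, Session], token: str,
                    now: Optional[datetime] = None) -> Optional[Session]:
    """Look up a presented bearer token by its hash; reject revoked or expired sessions."""
    now = now or datetime.now(timezone.utc)
    session = store.get(hash_token(token))
    if session is None or session.revoked_at is not None or session.expires_at <= now:
        return None
    return session


def visible_documents(docs: List[Doc], user_id: int, role: str) -> List[Doc]:
    """Admins see every document; regular users see only documents they own."""
    if role == "admin":
        return list(docs)
    return [d for d in docs if d.owner_user_id == user_id]


def document_tree_ids(docs: List[Doc], root_id: int) -> List[int]:
    """Return root_id plus all archive descendants, walking parent_document_id breadth-first."""
    children: Dict[int, List[int]] = {}
    for d in docs:
        if d.parent_document_id is not None:
            children.setdefault(d.parent_document_id, []).append(d.id)
    ordered: List[int] = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        ordered.append(current)
        queue.extend(children.get(current, []))
    return ordered
```

Keeping only the hash means a leaked session row does not expose a usable bearer token; each request re-hashes the presented token and looks it up. A real backend would more likely gather the tree with an ORM traversal or a recursive query. The in-memory walk only shows which rows a permanent delete would cover.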