Data Model Reference

Primary SQLAlchemy models are defined in backend/app/models/.

app_users

Model: AppUser in backend/app/models/auth.py

Purpose:

  • Stores authenticatable user identities for session-based API access.

Core fields:

  • Identity and credentials: id, username, password_hash
  • Authorization and lifecycle: role, is_active
  • Audit timestamps: created_at, updated_at

Enum UserRole:

  • admin
  • user
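
As a minimal sketch, the two role values can be modeled with a standard-library enum. The str mixin and the parse_role helper are assumptions for illustration, not taken from backend/app/models/auth.py.

```python
from enum import Enum


class UserRole(str, Enum):
    """Role values listed in this reference; the str mixin is an assumption."""

    ADMIN = "admin"
    USER = "user"


def parse_role(raw: str) -> UserRole:
    # Hypothetical helper: Enum lookup by value raises ValueError
    # for anything that is not "admin" or "user".
    return UserRole(raw)
```

Looking a role up by value keeps unknown strings from silently becoming a valid role.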

auth_sessions

Model: AuthSession in backend/app/models/auth.py

Purpose:

  • Stores issued bearer sessions linked to user identities.

Core fields:

  • Identity and linkage: id, user_id, token_hash
  • Session lifecycle: expires_at, revoked_at
  • Request context: user_agent, ip_address
  • Audit timestamps: created_at, updated_at

Foreign keys:

  • user_id references app_users.id with ON DELETE CASCADE.
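
The effect of ON DELETE CASCADE can be sketched with an in-memory SQLite database. The integer keys, the reduced column set, and the sample values are assumptions for illustration; the real models are defined with SQLAlchemy and may use different types.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE app_users (
            id INTEGER PRIMARY KEY,
            username TEXT NOT NULL UNIQUE
        );
        CREATE TABLE auth_sessions (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES app_users(id) ON DELETE CASCADE,
            token_hash TEXT NOT NULL
        );
        """
    )
    return conn


def sessions_left_after_user_delete() -> int:
    conn = open_demo_db()
    conn.execute("INSERT INTO app_users (id, username) VALUES (1, 'alice')")
    conn.execute("INSERT INTO auth_sessions (user_id, token_hash) VALUES (1, 'h1')")
    conn.execute("INSERT INTO auth_sessions (user_id, token_hash) VALUES (1, 'h2')")
    # Deleting the user removes every session row that references it.
    conn.execute("DELETE FROM app_users WHERE id = 1")
    (count,) = conn.execute("SELECT COUNT(*) FROM auth_sessions").fetchone()
    conn.close()
    return count
```

With the cascade in place, removing a user cannot leave orphaned sessions behind.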

documents

Model: Document in backend/app/models/document.py

Purpose:

  • Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:

  • Identity and source: id, original_filename, source_relative_path, stored_relative_path
  • File attributes: mime_type, extension, sha256, size_bytes
  • Ownership and organization: owner_user_id, logical_path, suggested_path, tags, suggested_tags
  • Processing outputs: extracted_text, image_text_type, handwriting_style_id, preview_available
  • Lifecycle and relations: status, is_archive_member, archived_member_path, parent_document_id, replaces_document_id
  • Metadata and timestamps: metadata_json, created_at, processed_at, updated_at

Enum DocumentStatus:

  • queued
  • processed
  • unsupported
  • error
  • trashed
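
A minimal sketch of the status values as a Python enum follows. The WORKER_FINAL_STATUSES set mirrors the three worker outcomes named in the lifecycle notes; the set name and the is_worker_outcome helper are hypothetical.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


# Statuses a worker may assign when processing finishes (hypothetical grouping).
WORKER_FINAL_STATUSES = frozenset(
    {DocumentStatus.PROCESSED, DocumentStatus.UNSUPPORTED, DocumentStatus.ERROR}
)


def is_worker_outcome(status: DocumentStatus) -> bool:
    # queued and trashed are set by upload and trash operations, not by the worker.
    return status in WORKER_FINAL_STATUSES
```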

Foreign keys:

  • owner_user_id references app_users.id with ON DELETE SET NULL.

Relationships:

  • Self-referential parent_document relationship for archive extraction trees.
  • owner_user relationship to AppUser.
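
The ownership foreign key and the archive tree can be sketched together in SQLite. This is an illustration under assumptions: integer keys, a reduced column set, made-up filenames, and parent_document_id declared as a foreign key to documents.id, which this reference does not state explicitly.

```python
import sqlite3
from typing import List


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE app_users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            original_filename TEXT NOT NULL,
            owner_user_id INTEGER REFERENCES app_users(id) ON DELETE SET NULL,
            -- Assumed self-referential key for archive members.
            parent_document_id INTEGER REFERENCES documents(id)
        );
        INSERT INTO app_users VALUES (1, 'alice');
        INSERT INTO documents VALUES (1, 'bundle.zip', 1, NULL);
        INSERT INTO documents VALUES (2, 'inner.zip', 1, 1);
        INSERT INTO documents VALUES (3, 'scan.pdf', 1, 2);
        INSERT INTO documents VALUES (4, 'unrelated.txt', 1, NULL);
        """
    )
    return conn


def descendant_ids(conn: sqlite3.Connection, root_id: int) -> List[int]:
    # Walk the archive extraction tree below root_id with a recursive CTE.
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM documents WHERE parent_document_id = ?
            UNION ALL
            SELECT d.id FROM documents d JOIN tree t ON d.parent_document_id = t.id
        )
        SELECT id FROM tree ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, deleting the app_users row leaves all four documents in place with owner_user_id set to NULL, while descendant_ids(conn, 1) collects the nested archive members.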

processing_logs

Model: ProcessingLogEntry in backend/app/models/processing_log.py

Purpose:

  • Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:

  • Event identity and timing: id, created_at
  • Event classification: level, stage, event
  • Document linkage: document_id, document_filename
  • Model context: provider_id, model_name
  • Prompt or response traces: prompt_text, response_text
  • Structured event payload: payload_json

Foreign keys:

  • document_id references documents.id with ON DELETE SET NULL.
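
One consequence of ON DELETE SET NULL is that log rows outlive the document they describe. The sketch below, using SQLite with assumed column types and made-up sample values, shows a log entry keeping its document_filename and payload after the document row is deleted.

```python
import json
import sqlite3


def log_row_after_document_delete():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, original_filename TEXT NOT NULL);
        CREATE TABLE processing_logs (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            level TEXT NOT NULL,
            stage TEXT NOT NULL,
            event TEXT NOT NULL,
            document_id INTEGER REFERENCES documents(id) ON DELETE SET NULL,
            document_filename TEXT,
            payload_json TEXT
        );
        INSERT INTO documents VALUES (7, 'invoice.pdf');
        """
    )
    # Illustrative event; level, stage, and payload values are invented.
    conn.execute(
        "INSERT INTO processing_logs"
        " (level, stage, event, document_id, document_filename, payload_json)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        ("info", "ocr", "OCR finished", 7, "invoice.pdf", json.dumps({"pages": 3})),
    )
    conn.execute("DELETE FROM documents WHERE id = 7")
    doc_id, filename, payload = conn.execute(
        "SELECT document_id, document_filename, payload_json FROM processing_logs"
    ).fetchone()
    conn.close()
    return doc_id, filename, json.loads(payload)
```

The function returns a NULL document_id alongside the preserved filename, which suggests why document_filename is stored on the log row in addition to the foreign key.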

Model Lifecycle Notes

  • API startup initializes schema and creates or refreshes bootstrap users from auth environment variables.
  • POST /auth/login validates AppUser credentials, creates an AuthSession holding the token hash, and returns the plaintext bearer token once, in the login response.
  • Upload inserts Document row in queued state, assigns owner_user_id, and enqueues background processing.
  • Worker updates extraction results and final status (processed, unsupported, or error), preserving ownership on archive descendants.
  • User-role queries are owner-scoped; admin-role queries can access all documents.
  • Trash and restore operations toggle the document status and leave source files in place until permanent delete.
  • Permanent delete removes the document tree (including archive descendants) and associated stored files.
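
Two of the notes above, hashed session tokens and owner-scoped queries, can be sketched in plain Python. Everything here is an assumption for illustration: SHA-256 as the hash, an in-memory dict standing in for auth_sessions, and the function names. Expiry and revocation (expires_at, revoked_at) are omitted.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple


def hash_token(token: str) -> str:
    # Assumption: SHA-256 hex digest; the reference only says the token is hashed.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_session(sessions: Dict[str, int], user_id: int) -> str:
    """Store only the hash and hand the plaintext token back once."""
    token = secrets.token_urlsafe(32)
    sessions[hash_token(token)] = user_id
    return token


def resolve_user(sessions: Dict[str, int], presented_token: str) -> Optional[int]:
    # Later requests are matched by hashing the presented bearer token.
    return sessions.get(hash_token(presented_token))


def visible_documents(
    docs: List[Tuple[int, Optional[int]]], user_id: int, role: str
) -> List[int]:
    """docs holds (document_id, owner_user_id) pairs."""
    if role == "admin":
        return [doc_id for doc_id, _ in docs]
    return [doc_id for doc_id, owner in docs if owner == user_id]
```

Because only the hash is stored, a leaked sessions table does not reveal usable bearer tokens, and the plaintext cannot be shown again after login.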