Data Model Reference

Primary SQLAlchemy models are defined in backend/app/models/.

documents

Model: Document in backend/app/models/document.py

Purpose:

  • Stores source file identity, storage location, extracted content, lifecycle status, and classification metadata.

Core fields:

  • Identity and source: id, original_filename, source_relative_path, stored_relative_path
  • File attributes: mime_type, extension, sha256, size_bytes
  • Organization: logical_path, suggested_path, tags, suggested_tags
  • Processing outputs: extracted_text, image_text_type, handwriting_style_id, preview_available
  • Lifecycle and relations: status, is_archive_member, archived_member_path, parent_document_id, replaces_document_id
  • Metadata and timestamps: metadata_json, created_at, processed_at, updated_at

Enum DocumentStatus:

  • queued
  • processed
  • unsupported
  • error
  • trashed
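The status list above could be expressed as a Python enum along the following lines. This is a minimal sketch, not the definition in backend/app/models/document.py: the member names and the assumption that the stored string values match the listed names are illustrative.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Lifecycle states from the list above (string values are assumed)."""

    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


def parse_status(raw: str) -> DocumentStatus:
    """Convert a stored string into a DocumentStatus, rejecting unknown values."""
    try:
        return DocumentStatus(raw)
    except ValueError:
        raise ValueError(f"unknown document status: {raw!r}") from None
```

Mixing in str keeps members comparable to plain strings, which is convenient when a status is serialized to JSON or compared against a query parameter.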

Relationships:

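  • Self-referential parent_document relationship (via parent_document_id) that links each extracted archive member to the archive document it came from, forming extraction trees.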
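To illustrate how an extraction tree can be walked through parent_document_id, here is a sketch that uses the standard-library sqlite3 module rather than the project's SQLAlchemy models. The integer ids, the reduced column set, the sample rows, and the descendant_ids helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        original_filename TEXT NOT NULL,
        parent_document_id INTEGER REFERENCES documents(id)
    )
    """
)
conn.executemany(
    "INSERT INTO documents (id, original_filename, parent_document_id) VALUES (?, ?, ?)",
    [
        (1, "bundle.zip", None),     # top-level archive
        (2, "invoice.pdf", 1),       # member of bundle.zip
        (3, "nested.zip", 1),        # archive inside the archive
        (4, "receipt.png", 3),       # member of nested.zip
        (5, "unrelated.txt", None),  # separate upload, not part of the tree
    ],
)


def descendant_ids(conn: sqlite3.Connection, root_id: int) -> list:
    """Return ids of all documents extracted, directly or indirectly, from root_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM documents WHERE parent_document_id = ?
            UNION ALL
            SELECT d.id FROM documents AS d JOIN tree ON d.parent_document_id = tree.id
        )
        SELECT id FROM tree ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling descendant_ids(conn, 1) on this sample data returns the ids of both direct members and the member of the nested archive.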

processing_logs

Model: ProcessingLogEntry in backend/app/models/processing_log.py

Purpose:

  • Stores timestamped pipeline events for upload, extraction, OCR, routing, indexing, and errors.

Core fields:

  • Event identity and timing: id, created_at
  • Event classification: level, stage, event
  • Document linkage: document_id, document_filename
  • Model context: provider_id, model_name
  • Prompt or response traces: prompt_text, response_text
  • Structured event payload: payload_json

Foreign keys:

  • document_id references documents.id with ON DELETE SET NULL, so log entries remain after the referenced document is deleted.
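The effect of ON DELETE SET NULL can be seen in a small standard-library sqlite3 sketch. The table layouts are cut down to a few of the fields listed above, integer ids are assumed, and the stage and event values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, original_filename TEXT)")
conn.execute(
    """
    CREATE TABLE processing_logs (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES documents(id) ON DELETE SET NULL,
        document_filename TEXT,
        stage TEXT,
        event TEXT
    )
    """
)
conn.execute("INSERT INTO documents (id, original_filename) VALUES (1, 'invoice.pdf')")
conn.execute(
    "INSERT INTO processing_logs (document_id, document_filename, stage, event) "
    "VALUES (1, 'invoice.pdf', 'ocr', 'started')"  # illustrative stage and event values
)

# Deleting the document keeps the log row but clears its document_id.
conn.execute("DELETE FROM documents WHERE id = 1")
remaining = conn.execute(
    "SELECT document_id, document_filename FROM processing_logs"
).fetchall()
```

After the delete, the log row is still present with document_id set to NULL, while the document_filename column still records which file the event belonged to.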

Model Lifecycle Notes

  • Upload inserts a Document row in the queued state and enqueues background processing.
  • The worker writes extraction results and sets the final status (processed, unsupported, or error).
  • Trash and restore operations toggle status while preserving source files until permanent delete.
  • Permanent delete removes the document tree (including archive descendants) and associated stored files.
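The status transitions described in these notes can be sketched as a small state holder. This is an in-memory illustration under stated assumptions, not the project's code: DocumentState, its method names, and the status_before_trash attribute are hypothetical, and the notes do not say which status a restored document returns to, so the sketch assumes it returns to the status it had before being trashed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentStatus(str, Enum):
    QUEUED = "queued"
    PROCESSED = "processed"
    UNSUPPORTED = "unsupported"
    ERROR = "error"
    TRASHED = "trashed"


# Final statuses the worker may set after processing a queued document.
WORKER_OUTCOMES = {
    DocumentStatus.PROCESSED,
    DocumentStatus.UNSUPPORTED,
    DocumentStatus.ERROR,
}


@dataclass
class DocumentState:
    """Hypothetical in-memory stand-in for the status column of a Document row."""

    status: DocumentStatus = DocumentStatus.QUEUED  # upload inserts rows as queued
    status_before_trash: Optional[DocumentStatus] = None  # assumption, not a documented field

    def finish_processing(self, outcome: DocumentStatus) -> None:
        """Worker step: move a queued document to its final status."""
        if self.status is not DocumentStatus.QUEUED:
            raise ValueError(f"cannot finish processing from {self.status.value}")
        if outcome not in WORKER_OUTCOMES:
            raise ValueError(f"{outcome.value} is not a worker outcome")
        self.status = outcome

    def trash(self) -> None:
        """Mark the document as trashed, remembering the prior status."""
        if self.status is DocumentStatus.TRASHED:
            raise ValueError("already trashed")
        self.status_before_trash = self.status
        self.status = DocumentStatus.TRASHED

    def restore(self) -> None:
        """Undo trash by returning to the remembered status (assumed behavior)."""
        if self.status is not DocumentStatus.TRASHED:
            raise ValueError("only trashed documents can be restored")
        self.status = self.status_before_trash
        self.status_before_trash = None
```

Neither trash nor restore touches stored files in this sketch, mirroring the note that files are removed only on permanent delete.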