Initial commit

commit 5dfc2cbd85 (2026-02-21 09:44:18 -03:00)
65 changed files with 11989 additions and 0 deletions

.gitignore
# Python
__pycache__/
*.py[cod]
*.pyo
.mypy_cache/
.pytest_cache/

# Node / JS
node_modules/
frontend/node_modules/
frontend/dist/

# Build output (optional)
dist/
build/

# Environment
.env
*.env
!.env.example

# Data and generated artifacts (runtime only)
data/postgres/
data/redis/
data/storage/

# OS / IDE
.DS_Store
.vscode/
.idea/

AGENTS.md
# Repository Guidelines
## Project Structure & Module Organization
`backend/` contains FastAPI and worker code. Main logic is in `backend/app/`: routes in `api/`, business logic in `services/`, persistence in `db/` and `models/`, and jobs in `worker/`.
`frontend/` is a Vite + React + TypeScript app with UI code in `frontend/src/` (`components/`, `lib/api.ts`, `types.ts`, styles).
`doc/` stores project documentation, with `doc/README.md` as the entrypoint.
`docker-compose.yml` defines API, worker, frontend, Postgres, Redis, and Typesense. Runtime data is persisted in `./data/`.
## Build, Test, and Development Commands
Docker workflow (required):
- `multipass shell dcm-dev` - enter the Linux VM used for Docker.
- `cd ~/dcm` - move to the project directory inside the VM.
- `docker compose up --build -d` - build and start all services.
- `docker compose down` - stop and remove containers.
- `docker compose logs -f` - stream logs across services.
- `DCM_DATA_DIR=/data docker compose up --build -d` - override host data directory.
- `exit` - leave the VM after Docker operations.
Frontend-only workflow:
- `cd frontend && npm run dev` - start Vite dev server.
- `cd frontend && npm run build` - run TypeScript checks and build production assets.
- `cd frontend && npm run preview` - serve built frontend locally.
## Coding Style & Naming Conventions
Follow existing module boundaries; keep files focused.
Python style in `backend/`: 4-space indentation, type hints, and docstrings for modules and functions; use `snake_case` for functions/modules and `PascalCase` for classes.
TypeScript style in `frontend/`: strict compiler settings (`strict`, `noFallthroughCasesInSwitch`, ES2022 target). Use `PascalCase` for React components (`DocumentCard.tsx`) and `camelCase` for variables/functions.
## Testing Guidelines
This repository currently has no committed automated test suite. For each change, run the stack and validate impacted API/UI flows manually. Verify:
- API health at `GET /api/v1/health`
- document upload and search behavior in the frontend
- service logs for processing failures (`docker compose logs -f` inside the VM)
When introducing automated tests, place them near the relevant module and document execution commands in `README.md`.
Use `test_*.py` naming for backend tests and `*.test.tsx` for frontend tests.
## Commit & Pull Request Guidelines
Git history is not available in this workspace snapshot, so no local message pattern can be inferred. Use concise, imperative commit subjects scoped to one change.
PRs should include:
- what changed and why
- linked issue/task ID
- manual verification steps and outcomes
- screenshots or short recordings for UI changes

README.md
# DMS
DMS is a single-user document management system with:
- drag-and-drop upload anywhere in the UI
- file and folder upload
- document processing and indexing (PDF, text, OpenAI handwriting/image transcription, DOCX, XLSX, ZIP extraction)
- fallback handling for unsupported formats
- original file preservation and download
- metadata-based and full-text search
- learning-based routing suggestions
## Requirements
- Docker Engine with Docker Compose plugin
- Internet access for the first image build
## Install And Run With Docker Compose
1. Open a terminal in this repository root.
2. Start the full stack:
```bash
docker compose up --build -d
```
3. Open the applications:
- Frontend: `http://localhost:5173`
- Backend API docs: `http://localhost:8000/docs`
- Health check: `http://localhost:8000/api/v1/health`
## Setup
1. Open the frontend and upload files or folders.
2. Set default destination path and tags before upload if needed.
3. Configure handwriting transcription settings in the UI:
- OpenAI-compatible base URL
- model (default: `gpt-4.1-mini`)
- API key and timeout
4. Open a document in the details panel, adjust destination and tags, then save.
5. Keep `Learn from this routing decision` enabled to train future routing suggestions.
## Data Persistence On Host
All runtime data is stored on the host using bind mounts.
Default host location:
- `./data/postgres`
- `./data/redis`
- `./data/storage`
To persist under another host directory, set `DCM_DATA_DIR`:
```bash
DCM_DATA_DIR=/data docker compose up --build -d
```
This will place runtime data under `/data` on the host.
## Common Commands
Start:
```bash
docker compose up --build -d
```
Stop:
```bash
docker compose down
```
View logs:
```bash
docker compose logs -f
```
Rebuild a clean stack while keeping persisted data:
```bash
docker compose down
docker compose up --build -d
```
Reset all persisted runtime data:
```bash
docker compose down
rm -rf ./data
```
## Handwriting Transcription Notes
- Handwriting and image transcription uses an OpenAI-compatible vision endpoint.
- Before transcription, images are normalized:
- EXIF rotation is corrected
- long side is resized to a maximum of 2000px
- image is sent as a base64 data URL payload
- Handwriting provider settings are persisted in host storage and survive container restarts.
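The resize and payload steps above can be sketched with the standard library alone. The 2000px long-side cap and the data-URL shape come from the list above; the names `scaled_size` and `to_data_url` are illustrative, and the actual backend likely delegates EXIF correction and resizing to an imaging library such as Pillow.

```python
import base64

MAX_LONG_SIDE = 2000  # long-side cap from the normalization notes above

def scaled_size(width: int, height: int, max_long_side: int = MAX_LONG_SIDE) -> tuple[int, int]:
    """Scale (width, height) down so the long side is at most max_long_side."""
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height
    scale = max_long_side / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))

def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL payload."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, a 4000x2000 source would be scaled to 2000x1000 before encoding, while an 800x600 source passes through unchanged.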
## API Overview
GET endpoints:
- `GET /api/v1/health`
- `GET /api/v1/documents`
- `GET /api/v1/documents/{document_id}`
- `GET /api/v1/documents/{document_id}/preview`
- `GET /api/v1/documents/{document_id}/download`
- `GET /api/v1/documents/tags`
- `GET /api/v1/search?query=...`
- `GET /api/v1/settings`
Additional endpoints used by the UI:
- `POST /api/v1/documents/upload`
- `PATCH /api/v1/documents/{document_id}`
- `POST /api/v1/documents/{document_id}/reprocess`
- `PATCH /api/v1/settings/handwriting`
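As an illustration, the GET endpoints above can be exercised from Python with only the standard library. The host and port assume the default compose setup; `search_url` and `get_json` are helper names invented here, not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # default compose setup

def search_url(query: str) -> str:
    """Build the /search URL with a properly encoded query parameter."""
    return f"{BASE_URL}/search?{urllib.parse.urlencode({'query': query})}"

def get_json(path_or_url: str) -> dict:
    """GET a JSON payload from the running API."""
    url = path_or_url if path_or_url.startswith("http") else f"{BASE_URL}{path_or_url}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# With the stack running:
#   get_json("/health")              # {"status": "ok"}
#   get_json(search_url("invoice"))  # search results for "invoice"
```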

backend/.env.example
APP_ENV=development
DATABASE_URL=postgresql+psycopg://dcm:dcm@db:5432/dcm
REDIS_URL=redis://redis:6379/0
STORAGE_ROOT=/data/storage
DEFAULT_OPENAI_BASE_URL=https://api.openai.com/v1
DEFAULT_OPENAI_MODEL=gpt-4.1-mini
DEFAULT_OPENAI_TIMEOUT_SECONDS=45
DEFAULT_OPENAI_HANDWRITING_ENABLED=true
DEFAULT_OPENAI_API_KEY=
DEFAULT_SUMMARY_MODEL=gpt-4.1-mini
DEFAULT_ROUTING_MODEL=gpt-4.1-mini
TYPESENSE_PROTOCOL=http
TYPESENSE_HOST=typesense
TYPESENSE_PORT=8108
TYPESENSE_API_KEY=dcm-typesense-key
TYPESENSE_COLLECTION_NAME=documents
PUBLIC_BASE_URL=http://localhost:8000

backend/Dockerfile
FROM python:3.12-slim
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
libmagic1 \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
COPY app /app/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

backend/app/__init__.py
"""Backend application package for the DMS service."""

backend/app/api/__init__.py
"""API package containing route modules and router registration."""

backend/app/api/router.py
"""API router registration for all HTTP route modules."""
from fastapi import APIRouter
from app.api.routes_documents import router as documents_router
from app.api.routes_health import router as health_router
from app.api.routes_processing_logs import router as processing_logs_router
from app.api.routes_search import router as search_router
from app.api.routes_settings import router as settings_router
api_router = APIRouter()
api_router.include_router(health_router)
api_router.include_router(documents_router, prefix="/documents", tags=["documents"])
api_router.include_router(processing_logs_router, prefix="/processing/logs", tags=["processing-logs"])
api_router.include_router(search_router, prefix="/search", tags=["search"])
api_router.include_router(settings_router, prefix="/settings", tags=["settings"])

backend/app/api/routes_documents.py
"""Document CRUD, lifecycle, metadata, file access, and content export endpoints."""
import io
import re
import unicodedata
import zipfile
from datetime import datetime, time
from pathlib import Path
from typing import Annotated, Literal
from uuid import UUID
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from fastapi.responses import FileResponse, Response, StreamingResponse
from sqlalchemy import or_, func, select
from sqlalchemy.orm import Session
from app.services.app_settings import read_predefined_paths_settings, read_predefined_tags_settings
from app.db.base import get_session
from app.models.document import Document, DocumentStatus
from app.schemas.documents import (
ContentExportRequest,
DocumentDetailResponse,
DocumentResponse,
DocumentsListResponse,
DocumentUpdateRequest,
UploadConflict,
UploadResponse,
)
from app.services.extractor import sniff_mime
from app.services.handwriting_style import delete_many_handwriting_style_documents
from app.services.processing_logs import log_processing_event, set_processing_log_autocommit
from app.services.storage import absolute_path, compute_sha256, store_bytes
from app.services.typesense_index import delete_many_documents_index, upsert_document_index
from app.worker.queue import get_processing_queue
router = APIRouter()
def _parse_csv(value: str | None) -> list[str]:
"""Parses comma-separated query values into a normalized non-empty list."""
if not value:
return []
return [part.strip() for part in value.split(",") if part.strip()]
def _parse_date(value: str | None) -> datetime | None:
"""Parses ISO date strings into UTC-naive midnight datetimes."""
if not value:
return None
try:
parsed = datetime.fromisoformat(value)
return parsed
except ValueError:
pass
try:
date_value = datetime.strptime(value, "%Y-%m-%d").date()
return datetime.combine(date_value, time.min)
except ValueError:
return None
def _apply_discovery_filters(
statement,
*,
path_filter: str | None,
tag_filter: str | None,
type_filter: str | None,
processed_from: str | None,
processed_to: str | None,
):
"""Applies optional path/tag/type/date filters to list and search statements."""
if path_filter and path_filter.strip():
statement = statement.where(Document.logical_path.ilike(f"{path_filter.strip()}%"))
tags = _parse_csv(tag_filter)
if tags:
statement = statement.where(Document.tags.overlap(tags))
types = _parse_csv(type_filter)
if types:
type_clauses = []
for value in types:
lowered = value.lower()
type_clauses.append(Document.extension.ilike(lowered))
type_clauses.append(Document.mime_type.ilike(lowered))
type_clauses.append(Document.image_text_type.ilike(lowered))
statement = statement.where(or_(*type_clauses))
processed_from_dt = _parse_date(processed_from)
if processed_from_dt is not None:
statement = statement.where(Document.processed_at.is_not(None), Document.processed_at >= processed_from_dt)
processed_to_dt = _parse_date(processed_to)
if processed_to_dt is not None:
statement = statement.where(Document.processed_at.is_not(None), Document.processed_at <= processed_to_dt)
return statement
def _summary_for_index(document: Document) -> str:
"""Resolves best-available summary text for semantic index updates outside worker pipeline."""
candidate = document.metadata_json.get("summary_text")
if isinstance(candidate, str) and candidate.strip():
return candidate.strip()
extracted = document.extracted_text.strip()
if extracted:
return extracted[:12000]
return f"{document.original_filename}\n{document.mime_type}\n{document.logical_path}"
def _normalize_tags(raw_tags: str | None) -> list[str]:
"""Parses comma-separated tags into a cleaned unique list."""
if not raw_tags:
return []
tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
return list(dict.fromkeys(tags))[:50]
def _sanitize_filename(filename: str) -> str:
"""Normalizes user-supplied filenames while preserving readability and extensions."""
base = filename.strip().replace("\\", " ").replace("/", " ")
base = re.sub(r"\s+", " ", base)
return base[:512] or "document"
def _slugify_segment(value: str) -> str:
"""Creates a filesystem-safe slug for path segments and markdown file names."""
normalized = unicodedata.normalize("NFKD", value)
ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
cleaned = re.sub(r"[^a-zA-Z0-9._ -]+", "", ascii_text).strip()
compact = re.sub(r"\s+", "-", cleaned)
compact = compact.strip(".-_")
return compact[:120] or "document"
def _markdown_for_document(document: Document) -> str:
"""Builds a markdown representation of extracted document content and metadata."""
lines = [
f"# {document.original_filename}",
"",
f"- Document ID: `{document.id}`",
f"- Logical Path: `{document.logical_path}`",
f"- Source Path: `{document.source_relative_path}`",
f"- Tags: {', '.join(document.tags) if document.tags else '(none)'}",
"",
"## Extracted Content",
"",
]
if document.extracted_text.strip():
lines.append(document.extracted_text)
else:
lines.append("_No extracted text available for this document._")
return "\n".join(lines).strip() + "\n"
def _markdown_filename(document: Document) -> str:
"""Builds a deterministic markdown filename for a single document export."""
stem = Path(document.original_filename).stem or document.original_filename
slug = _slugify_segment(stem)
return f"{slug}-{str(document.id)[:8]}.md"
def _zip_entry_name(document: Document, used_names: set[str]) -> str:
"""Builds a unique zip entry path for a document markdown export."""
path_segments = [segment for segment in document.logical_path.split("/") if segment]
sanitized_segments = [_slugify_segment(segment) for segment in path_segments]
filename = _markdown_filename(document)
base_entry = "/".join([*sanitized_segments, filename]) if sanitized_segments else filename
entry = base_entry
suffix = 1
while entry in used_names:
stem = Path(filename).stem
ext = Path(filename).suffix
candidate = f"{stem}-{suffix}{ext}"
entry = "/".join([*sanitized_segments, candidate]) if sanitized_segments else candidate
suffix += 1
used_names.add(entry)
return entry
def _resolve_previous_status(metadata_json: dict, fallback_status: DocumentStatus) -> DocumentStatus:
"""Resolves the status to restore from trash using recorded metadata."""
raw_status = metadata_json.get("status_before_trash")
if isinstance(raw_status, str):
try:
parsed = DocumentStatus(raw_status)
if parsed != DocumentStatus.TRASHED:
return parsed
except ValueError:
pass
return fallback_status
def _build_document_list_statement(
only_trashed: bool,
include_trashed: bool,
path_prefix: str | None,
):
"""Builds a base SQLAlchemy select statement with lifecycle and path filters."""
statement = select(Document)
if only_trashed:
statement = statement.where(Document.status == DocumentStatus.TRASHED)
elif not include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
if path_prefix:
trimmed_prefix = path_prefix.strip()
if trimmed_prefix:
statement = statement.where(Document.logical_path.ilike(f"{trimmed_prefix}%"))
return statement
def _collect_document_tree(session: Session, root_document_id: UUID) -> list[tuple[int, Document]]:
"""Collects a document and all descendants for recursive permanent deletion."""
queue: list[tuple[UUID, int]] = [(root_document_id, 0)]
visited: set[UUID] = set()
collected: list[tuple[int, Document]] = []
while queue:
current_id, depth = queue.pop(0)
if current_id in visited:
continue
visited.add(current_id)
document = session.execute(select(Document).where(Document.id == current_id)).scalar_one_or_none()
if document is None:
continue
collected.append((depth, document))
child_ids = session.execute(
select(Document.id).where(Document.parent_document_id == current_id)
).scalars().all()
for child_id in child_ids:
queue.append((child_id, depth + 1))
collected.sort(key=lambda item: item[0], reverse=True)
return collected
@router.get("", response_model=DocumentsListResponse)
def list_documents(
offset: int = Query(default=0, ge=0),
limit: int = Query(default=50, ge=1, le=200),
include_trashed: bool = Query(default=False),
only_trashed: bool = Query(default=False),
path_prefix: str | None = Query(default=None),
path_filter: str | None = Query(default=None),
tag_filter: str | None = Query(default=None),
type_filter: str | None = Query(default=None),
processed_from: str | None = Query(default=None),
processed_to: str | None = Query(default=None),
session: Session = Depends(get_session),
) -> DocumentsListResponse:
"""Returns paginated documents ordered by newest upload timestamp."""
base_statement = _build_document_list_statement(
only_trashed=only_trashed,
include_trashed=include_trashed,
path_prefix=path_prefix,
)
base_statement = _apply_discovery_filters(
base_statement,
path_filter=path_filter,
tag_filter=tag_filter,
type_filter=type_filter,
processed_from=processed_from,
processed_to=processed_to,
)
statement = base_statement.order_by(Document.created_at.desc()).offset(offset).limit(limit)
items = session.execute(statement).scalars().all()
count_statement = select(func.count()).select_from(base_statement.subquery())
total = session.execute(count_statement).scalar_one()
return DocumentsListResponse(total=total, items=[DocumentResponse.model_validate(item) for item in items])
@router.get("/tags")
def list_tags(
include_trashed: bool = Query(default=False),
session: Session = Depends(get_session),
) -> dict[str, list[str]]:
"""Returns distinct tags currently assigned across all matching documents."""
statement = select(Document.tags)
if not include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
rows = session.execute(statement).scalars().all()
tags = {tag for row in rows for tag in row if tag}
tags.update(
str(item.get("value", "")).strip()
for item in read_predefined_tags_settings()
if str(item.get("value", "")).strip()
)
tags = sorted(tags)
return {"tags": tags}
@router.get("/paths")
def list_paths(
include_trashed: bool = Query(default=False),
session: Session = Depends(get_session),
) -> dict[str, list[str]]:
"""Returns distinct logical paths currently assigned across all matching documents."""
statement = select(Document.logical_path)
if not include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
rows = session.execute(statement).scalars().all()
paths = {row for row in rows if row}
paths.update(
str(item.get("value", "")).strip()
for item in read_predefined_paths_settings()
if str(item.get("value", "")).strip()
)
paths = sorted(paths)
return {"paths": paths}
@router.get("/types")
def list_types(
include_trashed: bool = Query(default=False),
session: Session = Depends(get_session),
) -> dict[str, list[str]]:
"""Returns distinct document type values from extension, MIME, and image text type."""
statement = select(Document.extension, Document.mime_type, Document.image_text_type)
if not include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
rows = session.execute(statement).all()
values: set[str] = set()
for extension, mime_type, image_text_type in rows:
for candidate in (extension, mime_type, image_text_type):
normalized = str(candidate).strip().lower() if isinstance(candidate, str) else ""
if normalized:
values.add(normalized)
return {"types": sorted(values)}
@router.post("/content-md/export")
def export_contents_markdown(
payload: ContentExportRequest,
session: Session = Depends(get_session),
) -> StreamingResponse:
"""Exports extracted contents for selected documents as individual markdown files in a ZIP archive."""
has_document_ids = len(payload.document_ids) > 0
has_path_prefix = bool(payload.path_prefix and payload.path_prefix.strip())
if not has_document_ids and not has_path_prefix:
raise HTTPException(status_code=400, detail="Provide document_ids or path_prefix for export")
statement = select(Document)
if has_document_ids:
statement = statement.where(Document.id.in_(payload.document_ids))
if has_path_prefix:
statement = statement.where(Document.logical_path.ilike(f"{payload.path_prefix.strip()}%"))
if payload.only_trashed:
statement = statement.where(Document.status == DocumentStatus.TRASHED)
elif not payload.include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
documents = session.execute(statement.order_by(Document.logical_path.asc(), Document.created_at.asc())).scalars().all()
if not documents:
raise HTTPException(status_code=404, detail="No matching documents found for export")
archive_buffer = io.BytesIO()
used_entries: set[str] = set()
with zipfile.ZipFile(archive_buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
for document in documents:
entry_name = _zip_entry_name(document, used_entries)
archive.writestr(entry_name, _markdown_for_document(document))
archive_buffer.seek(0)
headers = {"Content-Disposition": 'attachment; filename="document-contents-md.zip"'}
return StreamingResponse(archive_buffer, media_type="application/zip", headers=headers)
@router.get("/{document_id}", response_model=DocumentDetailResponse)
def get_document(document_id: UUID, session: Session = Depends(get_session)) -> DocumentDetailResponse:
"""Returns one document by unique identifier."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
return DocumentDetailResponse.model_validate(document)
@router.get("/{document_id}/download")
def download_document(document_id: UUID, session: Session = Depends(get_session)) -> FileResponse:
"""Downloads original document bytes for the requested document identifier."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
file_path = absolute_path(document.stored_relative_path)
return FileResponse(path=file_path, filename=document.original_filename, media_type=document.mime_type)
@router.get("/{document_id}/preview")
def preview_document(document_id: UUID, session: Session = Depends(get_session)) -> FileResponse:
"""Streams the original document inline when browser rendering is supported."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
original_path = absolute_path(document.stored_relative_path)
return FileResponse(path=original_path, media_type=document.mime_type)
@router.get("/{document_id}/thumbnail")
def thumbnail_document(document_id: UUID, session: Session = Depends(get_session)) -> FileResponse:
"""Returns a generated thumbnail image for dashboard card previews."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
preview_relative_path = document.metadata_json.get("preview_relative_path")
if not preview_relative_path:
raise HTTPException(status_code=404, detail="Thumbnail not available")
preview_path = absolute_path(preview_relative_path)
if not preview_path.exists():
raise HTTPException(status_code=404, detail="Thumbnail file not found")
return FileResponse(path=preview_path)
@router.get("/{document_id}/content-md")
def download_document_content_markdown(document_id: UUID, session: Session = Depends(get_session)) -> Response:
"""Downloads extracted content for one document as a markdown file."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
markdown_content = _markdown_for_document(document)
filename = _markdown_filename(document)
headers = {"Content-Disposition": f'attachment; filename="{filename}"'}
return Response(content=markdown_content, media_type="text/markdown; charset=utf-8", headers=headers)
@router.post("/upload", response_model=UploadResponse)
async def upload_documents(
files: Annotated[list[UploadFile], File(description="Files to upload")],
relative_paths: Annotated[list[str] | None, Form()] = None,
logical_path: Annotated[str, Form()] = "Inbox",
tags: Annotated[str | None, Form()] = None,
conflict_mode: Annotated[Literal["ask", "replace", "duplicate"], Form()] = "ask",
session: Session = Depends(get_session),
) -> UploadResponse:
"""Uploads files, records metadata, and enqueues asynchronous extraction tasks."""
set_processing_log_autocommit(session, True)
normalized_tags = _normalize_tags(tags)
queue = get_processing_queue()
uploaded: list[DocumentResponse] = []
conflicts: list[UploadConflict] = []
indexed_relative_paths = relative_paths or []
prepared_uploads: list[dict[str, object]] = []
for idx, file in enumerate(files):
filename = file.filename or f"uploaded_{idx}"
data = await file.read()
sha256 = compute_sha256(data)
source_relative_path = indexed_relative_paths[idx] if idx < len(indexed_relative_paths) else filename
extension = Path(filename).suffix.lower()
detected_mime = sniff_mime(data)
log_processing_event(
session=session,
stage="upload",
event="Upload request received",
level="info",
document_filename=filename,
payload_json={
"source_relative_path": source_relative_path,
"logical_path": logical_path,
"tags": normalized_tags,
"mime_type": detected_mime,
"size_bytes": len(data),
"conflict_mode": conflict_mode,
},
)
prepared_uploads.append(
{
"filename": filename,
"data": data,
"sha256": sha256,
"source_relative_path": source_relative_path,
"extension": extension,
"mime_type": detected_mime,
}
)
existing = session.execute(select(Document).where(Document.sha256 == sha256)).scalar_one_or_none()
if existing and conflict_mode == "ask":
log_processing_event(
session=session,
stage="upload",
event="Upload conflict detected",
level="warning",
document_id=existing.id,
document_filename=filename,
payload_json={
"sha256": sha256,
"existing_document_id": str(existing.id),
},
)
conflicts.append(
UploadConflict(
original_filename=filename,
sha256=sha256,
existing_document_id=existing.id,
)
)
if conflicts and conflict_mode == "ask":
session.commit()
return UploadResponse(uploaded=[], conflicts=conflicts)
for prepared in prepared_uploads:
existing = session.execute(
select(Document).where(Document.sha256 == str(prepared["sha256"]))
).scalar_one_or_none()
replaces_document_id = existing.id if existing and conflict_mode == "replace" else None
stored_relative_path = store_bytes(str(prepared["filename"]), bytes(prepared["data"]))
document = Document(
original_filename=str(prepared["filename"]),
source_relative_path=str(prepared["source_relative_path"]),
stored_relative_path=stored_relative_path,
mime_type=str(prepared["mime_type"]),
extension=str(prepared["extension"]),
sha256=str(prepared["sha256"]),
size_bytes=len(bytes(prepared["data"])),
logical_path=logical_path,
tags=list(normalized_tags),
replaces_document_id=replaces_document_id,
metadata_json={"upload": "web"},
)
session.add(document)
session.flush()
queue.enqueue("app.worker.tasks.process_document_task", str(document.id))
log_processing_event(
session=session,
stage="upload",
event="Document record created and queued",
level="info",
document=document,
payload_json={
"source_relative_path": document.source_relative_path,
"stored_relative_path": document.stored_relative_path,
"logical_path": document.logical_path,
"tags": list(document.tags),
"replaces_document_id": str(replaces_document_id) if replaces_document_id is not None else None,
},
)
uploaded.append(DocumentResponse.model_validate(document))
session.commit()
return UploadResponse(uploaded=uploaded, conflicts=conflicts)
@router.patch("/{document_id}", response_model=DocumentResponse)
def update_document(
document_id: UUID,
payload: DocumentUpdateRequest,
session: Session = Depends(get_session),
) -> DocumentResponse:
"""Updates document metadata and refreshes semantic index representation."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
if payload.original_filename is not None:
document.original_filename = _sanitize_filename(payload.original_filename)
if payload.logical_path is not None:
document.logical_path = payload.logical_path.strip() or "Inbox"
if payload.tags is not None:
document.tags = list(dict.fromkeys([tag.strip() for tag in payload.tags if tag.strip()]))[:50]
try:
upsert_document_index(document=document, summary_text=_summary_for_index(document))
except Exception:
pass
session.commit()
session.refresh(document)
return DocumentResponse.model_validate(document)
@router.post("/{document_id}/trash", response_model=DocumentResponse)
def trash_document(document_id: UUID, session: Session = Depends(get_session)) -> DocumentResponse:
"""Marks a document as trashed without deleting files from storage."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
if document.status != DocumentStatus.TRASHED:
document.metadata_json = {
**document.metadata_json,
"status_before_trash": document.status.value,
}
document.status = DocumentStatus.TRASHED
try:
upsert_document_index(document=document, summary_text=_summary_for_index(document))
except Exception:
pass
session.commit()
session.refresh(document)
return DocumentResponse.model_validate(document)
@router.post("/{document_id}/restore", response_model=DocumentResponse)
def restore_document(document_id: UUID, session: Session = Depends(get_session)) -> DocumentResponse:
"""Restores a trashed document to its previous lifecycle status."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
if document.status == DocumentStatus.TRASHED:
fallback = DocumentStatus.PROCESSED if document.processed_at else DocumentStatus.QUEUED
restored_status = _resolve_previous_status(document.metadata_json, fallback)
document.status = restored_status
metadata_json = dict(document.metadata_json)
metadata_json.pop("status_before_trash", None)
document.metadata_json = metadata_json
try:
upsert_document_index(document=document, summary_text=_summary_for_index(document))
except Exception:
pass
session.commit()
session.refresh(document)
return DocumentResponse.model_validate(document)
@router.delete("/{document_id}")
def delete_document(document_id: UUID, session: Session = Depends(get_session)) -> dict[str, int]:
"""Permanently deletes a document and all descendant archive members including stored files."""
root = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if root is None:
raise HTTPException(status_code=404, detail="Document not found")
if root.status != DocumentStatus.TRASHED:
raise HTTPException(status_code=400, detail="Move document to trash before permanent deletion")
document_tree = _collect_document_tree(session=session, root_document_id=document_id)
document_ids = [document.id for _, document in document_tree]
try:
delete_many_documents_index([str(current_id) for current_id in document_ids])
except Exception:
pass
try:
delete_many_handwriting_style_documents([str(current_id) for current_id in document_ids])
except Exception:
pass
deleted_files = 0
for _, document in document_tree:
source_path = absolute_path(document.stored_relative_path)
if source_path.exists() and source_path.is_file():
source_path.unlink(missing_ok=True)
deleted_files += 1
preview_relative_path = document.metadata_json.get("preview_relative_path")
if isinstance(preview_relative_path, str):
preview_path = absolute_path(preview_relative_path)
if preview_path.exists() and preview_path.is_file():
preview_path.unlink(missing_ok=True)
session.delete(document)
session.commit()
return {"deleted_documents": len(document_tree), "deleted_files": deleted_files}
@router.post("/{document_id}/reprocess", response_model=DocumentResponse)
def reprocess_document(document_id: UUID, session: Session = Depends(get_session)) -> DocumentResponse:
"""Re-enqueues a document for extraction and suggestion processing."""
document = session.execute(select(Document).where(Document.id == document_id)).scalar_one_or_none()
if document is None:
raise HTTPException(status_code=404, detail="Document not found")
if document.status == DocumentStatus.TRASHED:
raise HTTPException(status_code=400, detail="Restore document before reprocessing")
queue = get_processing_queue()
document.status = DocumentStatus.QUEUED
try:
upsert_document_index(document=document, summary_text=_summary_for_index(document))
    except Exception:
        # Best-effort: reprocessing should proceed even if indexing fails.
        pass
session.commit()
queue.enqueue("app.worker.tasks.process_document_task", str(document.id))
session.refresh(document)
return DocumentResponse.model_validate(document)
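The restore flow above delegates to `_resolve_previous_status`, whose body is not shown in this hunk. A minimal stand-alone sketch of the intended behavior, assuming the helper reads the same `status_before_trash` marker that the route pops from `metadata_json` (the allowed-status set mirrors `DocumentStatus` minus `TRASHED`):

```python
def resolve_restored_status(metadata_json: dict, fallback: str) -> str:
    """Pick the status a trashed document should return to.

    Prefers the status recorded when the document was trashed; falls back
    to the caller-supplied default when the marker is missing or invalid.
    """
    allowed = {"queued", "processed", "unsupported", "error"}
    previous = metadata_json.get("status_before_trash")
    if isinstance(previous, str) and previous in allowed:
        return previous
    return fallback
```

With this shape, a document trashed while `processed` restores to `processed`, and one with no marker falls back to `processed` or `queued` depending on `processed_at`, matching the route above.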


@@ -0,0 +1,13 @@
"""Health and readiness endpoints for orchestration and uptime checks."""
from fastapi import APIRouter
router = APIRouter(prefix="/health", tags=["health"])
@router.get("")
def health() -> dict[str, str]:
"""Returns service liveness status."""
return {"status": "ok"}


@@ -0,0 +1,66 @@
"""Read-only API endpoints for processing pipeline event logs."""
from uuid import UUID
from fastapi import APIRouter, Depends, Query
from sqlalchemy.orm import Session
from app.db.base import get_session
from app.schemas.processing_logs import ProcessingLogEntryResponse, ProcessingLogListResponse
from app.services.processing_logs import (
cleanup_processing_logs,
clear_processing_logs,
count_processing_logs,
list_processing_logs,
)
router = APIRouter()
@router.get("", response_model=ProcessingLogListResponse)
def get_processing_logs(
offset: int = Query(default=0, ge=0),
limit: int = Query(default=120, ge=1, le=400),
document_id: UUID | None = Query(default=None),
session: Session = Depends(get_session),
) -> ProcessingLogListResponse:
"""Returns paginated processing logs ordered from newest to oldest."""
items = list_processing_logs(
session=session,
limit=limit,
offset=offset,
document_id=document_id,
)
total = count_processing_logs(session=session, document_id=document_id)
return ProcessingLogListResponse(
total=total,
items=[ProcessingLogEntryResponse.model_validate(item) for item in items],
)
@router.post("/trim")
def trim_processing_logs(
keep_document_sessions: int = Query(default=2, ge=0, le=20),
keep_unbound_entries: int = Query(default=80, ge=0, le=400),
session: Session = Depends(get_session),
) -> dict[str, int]:
"""Deletes old processing logs while keeping recent document sessions and unbound events."""
result = cleanup_processing_logs(
session=session,
keep_document_sessions=keep_document_sessions,
keep_unbound_entries=keep_unbound_entries,
)
session.commit()
return result
@router.post("/clear")
def clear_all_processing_logs(session: Session = Depends(get_session)) -> dict[str, int]:
"""Deletes all processing logs to reset the diagnostics timeline."""
result = clear_processing_logs(session=session)
session.commit()
return result
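`cleanup_processing_logs` is imported from the services layer and not shown in this hunk. A rough sketch of the retention policy the `/trim` parameters imply — keep logs for the newest `keep_document_sessions` documents plus the newest `keep_unbound_entries` entries with no document id — treating each document as one session for brevity (the real implementation may group sessions differently):

```python
from typing import Any


def plan_log_trim(
    entries: list[dict[str, Any]],
    keep_document_sessions: int,
    keep_unbound_entries: int,
) -> list[int]:
    """Return the ids of log entries to delete, scanning newest-first input."""
    kept_documents: list[str] = []
    unbound_kept = 0
    doomed: list[int] = []
    for entry in entries:
        document_id = entry.get("document_id")
        if document_id is None:
            # Unbound events get a simple newest-N cap.
            if unbound_kept < keep_unbound_entries:
                unbound_kept += 1
            else:
                doomed.append(entry["id"])
        elif document_id in kept_documents:
            continue  # all entries of a kept document survive
        elif len(kept_documents) < keep_document_sessions:
            kept_documents.append(document_id)
        else:
            doomed.append(entry["id"])
    return doomed
```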


@@ -0,0 +1,84 @@
"""Search endpoints for full-text and metadata document discovery."""
from fastapi import APIRouter, Depends, Query
from sqlalchemy import Text, cast, func, select
from sqlalchemy.orm import Session
from app.api.routes_documents import _apply_discovery_filters
from app.db.base import get_session
from app.models.document import Document, DocumentStatus
from app.schemas.documents import DocumentResponse, SearchResponse
router = APIRouter()
@router.get("", response_model=SearchResponse)
def search_documents(
query: str = Query(min_length=2),
offset: int = Query(default=0, ge=0),
limit: int = Query(default=50, ge=1, le=200),
include_trashed: bool = Query(default=False),
only_trashed: bool = Query(default=False),
path_filter: str | None = Query(default=None),
tag_filter: str | None = Query(default=None),
type_filter: str | None = Query(default=None),
processed_from: str | None = Query(default=None),
processed_to: str | None = Query(default=None),
session: Session = Depends(get_session),
) -> SearchResponse:
"""Searches documents using PostgreSQL full-text ranking plus metadata matching."""
vector = func.to_tsvector(
"simple",
func.coalesce(Document.original_filename, "")
+ " "
+ func.coalesce(Document.logical_path, "")
+ " "
+ func.coalesce(Document.extracted_text, "")
+ " "
+ func.coalesce(cast(Document.tags, Text), ""),
)
ts_query = func.plainto_tsquery("simple", query)
rank = func.ts_rank_cd(vector, ts_query)
search_filter = (
vector.op("@@")(ts_query)
| Document.original_filename.ilike(f"%{query}%")
| Document.logical_path.ilike(f"%{query}%")
| cast(Document.tags, Text).ilike(f"%{query}%")
)
statement = select(Document).where(search_filter)
if only_trashed:
statement = statement.where(Document.status == DocumentStatus.TRASHED)
elif not include_trashed:
statement = statement.where(Document.status != DocumentStatus.TRASHED)
statement = _apply_discovery_filters(
statement,
path_filter=path_filter,
tag_filter=tag_filter,
type_filter=type_filter,
processed_from=processed_from,
processed_to=processed_to,
)
statement = statement.order_by(rank.desc(), Document.created_at.desc()).offset(offset).limit(limit)
items = session.execute(statement).scalars().all()
count_statement = select(func.count(Document.id)).where(search_filter)
if only_trashed:
count_statement = count_statement.where(Document.status == DocumentStatus.TRASHED)
elif not include_trashed:
count_statement = count_statement.where(Document.status != DocumentStatus.TRASHED)
count_statement = _apply_discovery_filters(
count_statement,
path_filter=path_filter,
tag_filter=tag_filter,
type_filter=type_filter,
processed_from=processed_from,
processed_to=processed_to,
)
total = session.execute(count_statement).scalar_one()
return SearchResponse(total=total, items=[DocumentResponse.model_validate(item) for item in items])
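The filter above ORs a PostgreSQL full-text match (`plainto_tsquery` ANDs the query words) with `ILIKE` substring fallbacks on the filename, logical path, and tags. A toy plain-Python approximation of that predicate, useful for reasoning about which documents match — it only approximates word-level `tsvector` matching and ignores ranking:

```python
def matches_query(document: dict, query: str) -> bool:
    """Approximate the endpoint's OR-ed search filter for one document."""
    haystack = " ".join(
        [
            document.get("original_filename", ""),
            document.get("logical_path", ""),
            document.get("extracted_text", ""),
            " ".join(document.get("tags", [])),
        ]
    ).lower()
    tokens = query.lower().split()
    # plainto_tsquery requires every query word; ILIKE adds substring fallbacks.
    full_text_hit = bool(tokens) and all(token in haystack.split() for token in tokens)
    substring_hit = query.lower() in haystack
    return full_text_hit or substring_hit
```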


@@ -0,0 +1,232 @@
"""API routes for managing persistent single-user application settings."""
from fastapi import APIRouter
from app.schemas.settings import (
AppSettingsUpdateRequest,
AppSettingsResponse,
DisplaySettingsResponse,
HandwritingSettingsResponse,
HandwritingStyleSettingsResponse,
HandwritingSettingsUpdateRequest,
OcrTaskSettingsResponse,
ProviderSettingsResponse,
RoutingTaskSettingsResponse,
SummaryTaskSettingsResponse,
TaskSettingsResponse,
UploadDefaultsResponse,
)
from app.services.app_settings import (
TASK_OCR_HANDWRITING,
TASK_ROUTING_CLASSIFICATION,
TASK_SUMMARY_GENERATION,
read_app_settings,
reset_app_settings,
update_app_settings,
update_handwriting_settings,
)
router = APIRouter()
def _build_response(payload: dict) -> AppSettingsResponse:
"""Converts internal settings dictionaries into API response models."""
upload_defaults_payload = payload.get("upload_defaults", {})
display_payload = payload.get("display", {})
providers_payload = payload.get("providers", [])
tasks_payload = payload.get("tasks", {})
handwriting_style_payload = payload.get("handwriting_style_clustering", {})
ocr_payload = tasks_payload.get(TASK_OCR_HANDWRITING, {})
summary_payload = tasks_payload.get(TASK_SUMMARY_GENERATION, {})
routing_payload = tasks_payload.get(TASK_ROUTING_CLASSIFICATION, {})
return AppSettingsResponse(
upload_defaults=UploadDefaultsResponse(
logical_path=str(upload_defaults_payload.get("logical_path", "Inbox")),
tags=[
str(tag).strip()
for tag in upload_defaults_payload.get("tags", [])
if isinstance(tag, str) and tag.strip()
],
),
display=DisplaySettingsResponse(
cards_per_page=int(display_payload.get("cards_per_page", 12)),
log_typing_animation_enabled=bool(display_payload.get("log_typing_animation_enabled", True)),
),
handwriting_style_clustering=HandwritingStyleSettingsResponse(
enabled=bool(handwriting_style_payload.get("enabled", True)),
embed_model=str(handwriting_style_payload.get("embed_model", "ts/clip-vit-b-p32")),
neighbor_limit=int(handwriting_style_payload.get("neighbor_limit", 8)),
match_min_similarity=float(handwriting_style_payload.get("match_min_similarity", 0.86)),
bootstrap_match_min_similarity=float(
handwriting_style_payload.get("bootstrap_match_min_similarity", 0.89)
),
bootstrap_sample_size=int(handwriting_style_payload.get("bootstrap_sample_size", 3)),
image_max_side=int(handwriting_style_payload.get("image_max_side", 1024)),
),
predefined_paths=[
{
"value": str(item.get("value", "")).strip(),
"global_shared": bool(item.get("global_shared", False)),
}
for item in payload.get("predefined_paths", [])
if isinstance(item, dict) and str(item.get("value", "")).strip()
],
predefined_tags=[
{
"value": str(item.get("value", "")).strip(),
"global_shared": bool(item.get("global_shared", False)),
}
for item in payload.get("predefined_tags", [])
if isinstance(item, dict) and str(item.get("value", "")).strip()
],
providers=[
ProviderSettingsResponse(
id=str(provider.get("id", "")),
label=str(provider.get("label", "")),
provider_type=str(provider.get("provider_type", "openai_compatible")),
base_url=str(provider.get("base_url", "https://api.openai.com/v1")),
timeout_seconds=int(provider.get("timeout_seconds", 45)),
api_key_set=bool(provider.get("api_key_set", False)),
api_key_masked=str(provider.get("api_key_masked", "")),
)
for provider in providers_payload
],
tasks=TaskSettingsResponse(
ocr_handwriting=OcrTaskSettingsResponse(
enabled=bool(ocr_payload.get("enabled", True)),
provider_id=str(ocr_payload.get("provider_id", "openai-default")),
model=str(ocr_payload.get("model", "gpt-4.1-mini")),
prompt=str(ocr_payload.get("prompt", "")),
),
summary_generation=SummaryTaskSettingsResponse(
enabled=bool(summary_payload.get("enabled", True)),
provider_id=str(summary_payload.get("provider_id", "openai-default")),
model=str(summary_payload.get("model", "gpt-4.1-mini")),
prompt=str(summary_payload.get("prompt", "")),
max_input_tokens=int(summary_payload.get("max_input_tokens", 8000)),
),
routing_classification=RoutingTaskSettingsResponse(
enabled=bool(routing_payload.get("enabled", True)),
provider_id=str(routing_payload.get("provider_id", "openai-default")),
model=str(routing_payload.get("model", "gpt-4.1-mini")),
prompt=str(routing_payload.get("prompt", "")),
neighbor_count=int(routing_payload.get("neighbor_count", 8)),
neighbor_min_similarity=float(routing_payload.get("neighbor_min_similarity", 0.84)),
auto_apply_confidence_threshold=float(routing_payload.get("auto_apply_confidence_threshold", 0.78)),
auto_apply_neighbor_similarity_threshold=float(
routing_payload.get("auto_apply_neighbor_similarity_threshold", 0.55)
),
neighbor_path_override_enabled=bool(routing_payload.get("neighbor_path_override_enabled", True)),
neighbor_path_override_min_similarity=float(
routing_payload.get("neighbor_path_override_min_similarity", 0.86)
),
neighbor_path_override_min_gap=float(routing_payload.get("neighbor_path_override_min_gap", 0.04)),
neighbor_path_override_max_confidence=float(
routing_payload.get("neighbor_path_override_max_confidence", 0.9)
),
),
),
)
@router.get("", response_model=AppSettingsResponse)
def get_app_settings() -> AppSettingsResponse:
"""Returns persisted provider and per-task settings configuration."""
return _build_response(read_app_settings())
@router.patch("", response_model=AppSettingsResponse)
def set_app_settings(payload: AppSettingsUpdateRequest) -> AppSettingsResponse:
"""Updates providers and task settings and returns resulting persisted configuration."""
providers_payload = None
if payload.providers is not None:
providers_payload = [provider.model_dump() for provider in payload.providers]
tasks_payload = None
if payload.tasks is not None:
tasks_payload = payload.tasks.model_dump(exclude_none=True)
upload_defaults_payload = None
if payload.upload_defaults is not None:
upload_defaults_payload = payload.upload_defaults.model_dump(exclude_none=True)
display_payload = None
if payload.display is not None:
display_payload = payload.display.model_dump(exclude_none=True)
handwriting_style_payload = None
if payload.handwriting_style_clustering is not None:
handwriting_style_payload = payload.handwriting_style_clustering.model_dump(exclude_none=True)
predefined_paths_payload = None
if payload.predefined_paths is not None:
predefined_paths_payload = [item.model_dump(exclude_none=True) for item in payload.predefined_paths]
predefined_tags_payload = None
if payload.predefined_tags is not None:
predefined_tags_payload = [item.model_dump(exclude_none=True) for item in payload.predefined_tags]
updated = update_app_settings(
providers=providers_payload,
tasks=tasks_payload,
upload_defaults=upload_defaults_payload,
display=display_payload,
handwriting_style=handwriting_style_payload,
predefined_paths=predefined_paths_payload,
predefined_tags=predefined_tags_payload,
)
return _build_response(updated)
@router.post("/reset", response_model=AppSettingsResponse)
def reset_settings_to_defaults() -> AppSettingsResponse:
"""Resets all persisted settings to default providers and task bindings."""
return _build_response(reset_app_settings())
@router.patch("/handwriting", response_model=AppSettingsResponse)
def set_handwriting_settings(payload: HandwritingSettingsUpdateRequest) -> AppSettingsResponse:
"""Updates handwriting transcription settings and returns the resulting configuration."""
updated = update_handwriting_settings(
enabled=payload.enabled,
openai_base_url=payload.openai_base_url,
openai_model=payload.openai_model,
openai_timeout_seconds=payload.openai_timeout_seconds,
openai_api_key=payload.openai_api_key,
clear_openai_api_key=payload.clear_openai_api_key,
)
return _build_response(updated)
@router.get("/handwriting", response_model=HandwritingSettingsResponse)
def get_handwriting_settings() -> HandwritingSettingsResponse:
"""Returns legacy handwriting response shape for compatibility with older clients."""
payload = _build_response(read_app_settings())
fallback_provider = ProviderSettingsResponse(
id="openai-default",
label="OpenAI Default",
provider_type="openai_compatible",
base_url="https://api.openai.com/v1",
timeout_seconds=45,
api_key_set=False,
api_key_masked="",
)
ocr = payload.tasks.ocr_handwriting
provider = next((item for item in payload.providers if item.id == ocr.provider_id), None)
if provider is None:
provider = payload.providers[0] if payload.providers else fallback_provider
return HandwritingSettingsResponse(
provider=provider.provider_type,
enabled=ocr.enabled,
openai_base_url=provider.base_url,
openai_model=ocr.model,
openai_timeout_seconds=provider.timeout_seconds,
openai_api_key_set=provider.api_key_set,
openai_api_key_masked=provider.api_key_masked,
)
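The PATCH endpoint above builds payloads with `model_dump(exclude_none=True)`, so omitted fields keep their persisted values. `update_app_settings` is not shown in this hunk; conceptually it overlays the partial patch onto the current settings, which can be sketched as a recursive dict merge (an assumption about its internals, not the actual implementation):

```python
def merge_settings(current: dict, patch: dict) -> dict:
    """Overlay a partial patch onto current settings; absent keys are kept."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)  # nested sections merge
        else:
            merged[key] = value  # scalars and lists replace wholesale
    return merged
```

Note that lists (providers, predefined paths/tags) replace rather than merge, which matches the request models that submit them as complete collections.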


@@ -0,0 +1 @@
"""Core settings and shared configuration package."""


@@ -0,0 +1,46 @@
"""Application settings and environment configuration."""
from functools import lru_cache
from pathlib import Path
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
"""Defines runtime configuration values loaded from environment variables."""
model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")
app_name: str = "dcm-dms"
app_env: str = "development"
database_url: str = "postgresql+psycopg://dcm:dcm@db:5432/dcm"
redis_url: str = "redis://redis:6379/0"
storage_root: Path = Path("/data/storage")
upload_chunk_size: int = 4 * 1024 * 1024
max_zip_members: int = 250
max_zip_depth: int = 2
max_text_length: int = 500_000
default_openai_base_url: str = "https://api.openai.com/v1"
default_openai_model: str = "gpt-4.1-mini"
default_openai_timeout_seconds: int = 45
default_openai_handwriting_enabled: bool = True
default_openai_api_key: str = ""
default_summary_model: str = "gpt-4.1-mini"
default_routing_model: str = "gpt-4.1-mini"
typesense_protocol: str = "http"
typesense_host: str = "typesense"
typesense_port: int = 8108
typesense_api_key: str = "dcm-typesense-key"
typesense_collection_name: str = "documents"
typesense_timeout_seconds: int = 120
typesense_num_retries: int = 0
public_base_url: str = "http://localhost:8000"
cors_origins: list[str] = Field(default_factory=lambda: ["http://localhost:5173", "http://localhost:3000"])
@lru_cache(maxsize=1)
def get_settings() -> Settings:
"""Returns a cached settings object for dependency injection and service access."""
return Settings()
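Because `get_settings` is wrapped in `lru_cache(maxsize=1)`, the environment is read exactly once per process; later changes to environment variables are not picked up unless the cache is cleared. A stdlib sketch of that behavior (`DemoSettings` and `APP_ENV` are illustrative stand-ins, not the app's `Settings` class):

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    app_env: str = "development"


@lru_cache(maxsize=1)
def get_demo_settings() -> DemoSettings:
    # The environment is read on the first call; later calls reuse the cached object.
    return DemoSettings(app_env=os.environ.get("APP_ENV", "development"))
```

In tests, call `get_demo_settings.cache_clear()` (or `get_settings.cache_clear()` in the real app) before overriding environment variables.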


@@ -0,0 +1 @@
"""Database package exposing engine and session utilities."""

backend/app/db/base.py Normal file

@@ -0,0 +1,53 @@
"""Database engine and session utilities for persistence operations."""
from collections.abc import Generator
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, declarative_base, sessionmaker
from app.core.config import get_settings
Base = declarative_base()
settings = get_settings()
engine = create_engine(settings.database_url, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False, expire_on_commit=False)
def get_session() -> Generator[Session, None, None]:
"""Provides a transactional database session for FastAPI request handling."""
session = SessionLocal()
try:
yield session
finally:
session.close()
def init_db() -> None:
"""Initializes all ORM tables and search-related database extensions/indexes."""
from app import models # noqa: F401
Base.metadata.create_all(bind=engine)
with engine.begin() as connection:
connection.execute(text("CREATE EXTENSION IF NOT EXISTS pg_trgm"))
connection.execute(
text(
"""
CREATE INDEX IF NOT EXISTS idx_documents_text_search
ON documents
USING GIN (
to_tsvector(
'simple',
coalesce(original_filename, '') || ' ' ||
coalesce(logical_path, '') || ' ' ||
coalesce(extracted_text, '')
)
)
"""
)
)
connection.execute(text("CREATE INDEX IF NOT EXISTS idx_documents_sha256 ON documents (sha256)"))

backend/app/main.py Normal file

@@ -0,0 +1,50 @@
"""FastAPI entrypoint for the DMS backend service."""
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.api.router import api_router
from app.core.config import get_settings
from app.db.base import init_db
from app.services.app_settings import ensure_app_settings
from app.services.handwriting_style import ensure_handwriting_style_collection
from app.services.storage import ensure_storage
from app.services.typesense_index import ensure_typesense_collection
settings = get_settings()
def create_app() -> FastAPI:
"""Builds and configures the FastAPI application instance."""
app = FastAPI(title="DCM DMS API", version="0.1.0")
app.add_middleware(
CORSMiddleware,
allow_origins=settings.cors_origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
app.include_router(api_router, prefix="/api/v1")
@app.on_event("startup")
def startup_event() -> None:
"""Initializes storage directories and database schema on service startup."""
ensure_storage()
ensure_app_settings()
init_db()
try:
ensure_typesense_collection()
        except Exception:
            # Startup must not fail when Typesense is unreachable.
            pass
try:
ensure_handwriting_style_collection()
        except Exception:
            # Startup must not fail when the style collection cannot be ensured.
            pass
return app
app = create_app()
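The startup handler above treats the Typesense and handwriting-style bootstraps as optional: failures are swallowed so the API still serves requests. That pattern can be factored into a small helper — a sketch, not code from this repo:

```python
def run_best_effort(steps):
    """Run optional bootstrap steps, collecting failures instead of raising.

    Mirrors the startup_event pattern above, where index bootstrap errors
    must not prevent the API from coming up.
    """
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as error:  # deliberate catch-all, as in startup_event
            failures.append((name, error))
    return failures
```

A later step still runs even when an earlier one fails, and the returned failures can be logged for diagnostics instead of being silently dropped.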


@@ -0,0 +1,6 @@
"""Model exports for ORM metadata discovery."""
from app.models.document import Document, DocumentStatus
from app.models.processing_log import ProcessingLogEntry
__all__ = ["Document", "DocumentStatus", "ProcessingLogEntry"]


@@ -0,0 +1,65 @@
"""Data model representing a stored and processed document."""
import uuid
from datetime import UTC, datetime
from enum import Enum
from sqlalchemy import Boolean, DateTime, Enum as SqlEnum, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
from sqlalchemy.orm import Mapped, mapped_column, relationship
from app.db.base import Base
class DocumentStatus(str, Enum):
"""Enumerates processing states for uploaded documents."""
QUEUED = "queued"
PROCESSED = "processed"
UNSUPPORTED = "unsupported"
ERROR = "error"
TRASHED = "trashed"
class Document(Base):
"""Stores file identity, storage paths, extracted content, and classification metadata."""
__tablename__ = "documents"
id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
original_filename: Mapped[str] = mapped_column(String(512), nullable=False)
source_relative_path: Mapped[str] = mapped_column(String(1024), nullable=False, default="")
stored_relative_path: Mapped[str] = mapped_column(String(1024), nullable=False)
mime_type: Mapped[str] = mapped_column(String(255), nullable=False, default="application/octet-stream")
extension: Mapped[str] = mapped_column(String(32), nullable=False, default="")
sha256: Mapped[str] = mapped_column(String(128), nullable=False)
size_bytes: Mapped[int] = mapped_column(Integer, nullable=False)
logical_path: Mapped[str] = mapped_column(String(1024), nullable=False, default="Inbox")
suggested_path: Mapped[str | None] = mapped_column(String(1024), nullable=True)
tags: Mapped[list[str]] = mapped_column(ARRAY(String), nullable=False, default=list)
suggested_tags: Mapped[list[str]] = mapped_column(ARRAY(String), nullable=False, default=list)
metadata_json: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
extracted_text: Mapped[str] = mapped_column(Text, nullable=False, default="")
image_text_type: Mapped[str | None] = mapped_column(String(64), nullable=True)
handwriting_style_id: Mapped[str | None] = mapped_column(String(64), nullable=True, index=True)
status: Mapped[DocumentStatus] = mapped_column(SqlEnum(DocumentStatus), nullable=False, default=DocumentStatus.QUEUED)
preview_available: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
archived_member_path: Mapped[str | None] = mapped_column(String(1024), nullable=True)
is_archive_member: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
parent_document_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("documents.id"), nullable=True)
replaces_document_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("documents.id"), nullable=True)
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False, default=lambda: datetime.now(UTC))
processed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True),
nullable=False,
default=lambda: datetime.now(UTC),
onupdate=lambda: datetime.now(UTC),
)
parent_document: Mapped["Document | None"] = relationship(
"Document",
remote_side="Document.id",
foreign_keys=[parent_document_id],
post_update=True,
)
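The `sha256` column above backs duplicate detection (see `UploadConflict` in the schemas), and `Settings.upload_chunk_size` suggests uploads are read in 4 MiB chunks. A sketch of chunked checksum computation consistent with those two pieces — the actual upload service is not shown in this hunk:

```python
import hashlib
import io


def file_sha256(stream, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a binary stream in fixed-size chunks so large uploads stay memory-bounded."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()
```

Comparing the resulting hex digest against existing `Document.sha256` values is enough to report an upload conflict without re-reading stored files.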


@@ -0,0 +1,33 @@
"""Data model representing one persisted processing pipeline log entry."""
import uuid
from datetime import UTC, datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, String, Text
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import Mapped, mapped_column
from app.db.base import Base
class ProcessingLogEntry(Base):
"""Stores a timestamped processing event with optional model prompt and response text."""
__tablename__ = "processing_logs"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False, default=lambda: datetime.now(UTC))
level: Mapped[str] = mapped_column(String(16), nullable=False, default="info")
stage: Mapped[str] = mapped_column(String(64), nullable=False)
event: Mapped[str] = mapped_column(String(256), nullable=False)
document_id: Mapped[uuid.UUID | None] = mapped_column(
UUID(as_uuid=True),
ForeignKey("documents.id", ondelete="SET NULL"),
nullable=True,
)
document_filename: Mapped[str | None] = mapped_column(String(512), nullable=True)
provider_id: Mapped[str | None] = mapped_column(String(128), nullable=True)
model_name: Mapped[str | None] = mapped_column(String(256), nullable=True)
prompt_text: Mapped[str | None] = mapped_column(Text, nullable=True)
response_text: Mapped[str | None] = mapped_column(Text, nullable=True)
payload_json: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)


@@ -0,0 +1 @@
"""Pydantic schema package for API request and response models."""


@@ -0,0 +1,92 @@
"""Pydantic schema definitions for document API payloads."""
from datetime import datetime
from uuid import UUID
from pydantic import BaseModel, Field
from app.models.document import DocumentStatus
class DocumentResponse(BaseModel):
"""Represents a document record returned by API endpoints."""
id: UUID
original_filename: str
source_relative_path: str
mime_type: str
extension: str
size_bytes: int
sha256: str
logical_path: str
suggested_path: str | None
image_text_type: str | None
handwriting_style_id: str | None
tags: list[str] = Field(default_factory=list)
suggested_tags: list[str] = Field(default_factory=list)
status: DocumentStatus
preview_available: bool
is_archive_member: bool
archived_member_path: str | None
parent_document_id: UUID | None
replaces_document_id: UUID | None
created_at: datetime
processed_at: datetime | None
class Config:
"""Enables ORM object parsing for SQLAlchemy model instances."""
from_attributes = True
class DocumentDetailResponse(DocumentResponse):
"""Represents a full document payload including extracted text content."""
extracted_text: str
metadata_json: dict
class DocumentsListResponse(BaseModel):
"""Represents a paginated document list response payload."""
total: int
items: list[DocumentResponse]
class UploadConflict(BaseModel):
"""Describes an upload conflict where a matching checksum already exists."""
original_filename: str
sha256: str
existing_document_id: UUID
class UploadResponse(BaseModel):
"""Represents the result of a batch upload request."""
uploaded: list[DocumentResponse] = Field(default_factory=list)
conflicts: list[UploadConflict] = Field(default_factory=list)
class DocumentUpdateRequest(BaseModel):
"""Captures document metadata changes."""
original_filename: str | None = None
logical_path: str | None = None
tags: list[str] | None = None
class SearchResponse(BaseModel):
"""Represents the result of a search query."""
total: int
items: list[DocumentResponse]
class ContentExportRequest(BaseModel):
"""Describes filters used to export extracted document contents as Markdown files."""
document_ids: list[UUID] = Field(default_factory=list)
path_prefix: str | None = None
include_trashed: bool = False
only_trashed: bool = False


@@ -0,0 +1,35 @@
"""Pydantic schemas for processing pipeline log API payloads."""
from datetime import datetime
from uuid import UUID
from pydantic import BaseModel, Field
class ProcessingLogEntryResponse(BaseModel):
"""Represents one persisted processing log event returned by API endpoints."""
id: int
created_at: datetime
level: str
stage: str
event: str
document_id: UUID | None
document_filename: str | None
provider_id: str | None
model_name: str | None
prompt_text: str | None
response_text: str | None
payload_json: dict
class Config:
"""Enables ORM object parsing for SQLAlchemy model instances."""
from_attributes = True
class ProcessingLogListResponse(BaseModel):
"""Represents a paginated collection of processing log records."""
total: int
items: list[ProcessingLogEntryResponse] = Field(default_factory=list)


@@ -0,0 +1,242 @@
"""Pydantic schemas for application-level runtime settings."""
from pydantic import BaseModel, Field
class ProviderSettingsResponse(BaseModel):
"""Represents a persisted model provider with non-secret connection metadata."""
id: str
label: str
provider_type: str = "openai_compatible"
base_url: str
timeout_seconds: int
api_key_set: bool
api_key_masked: str = ""
class ProviderSettingsUpdateRequest(BaseModel):
"""Represents a model provider create-or-update request."""
id: str
label: str
provider_type: str = "openai_compatible"
base_url: str
timeout_seconds: int = Field(default=45, ge=5, le=180)
api_key: str | None = None
clear_api_key: bool = False
class OcrTaskSettingsResponse(BaseModel):
"""Represents OCR task runtime settings and prompt configuration."""
enabled: bool
provider_id: str
model: str
prompt: str
class OcrTaskSettingsUpdateRequest(BaseModel):
"""Represents OCR task settings updates."""
enabled: bool | None = None
provider_id: str | None = None
model: str | None = None
prompt: str | None = None
class SummaryTaskSettingsResponse(BaseModel):
"""Represents summarization task runtime settings."""
enabled: bool
provider_id: str
model: str
prompt: str
max_input_tokens: int
class SummaryTaskSettingsUpdateRequest(BaseModel):
"""Represents summarization task settings updates."""
enabled: bool | None = None
provider_id: str | None = None
model: str | None = None
prompt: str | None = None
max_input_tokens: int | None = Field(default=None, ge=512, le=64000)
class RoutingTaskSettingsResponse(BaseModel):
"""Represents routing task runtime settings for path and tag classification."""
enabled: bool
provider_id: str
model: str
prompt: str
neighbor_count: int
neighbor_min_similarity: float
auto_apply_confidence_threshold: float
auto_apply_neighbor_similarity_threshold: float
neighbor_path_override_enabled: bool
neighbor_path_override_min_similarity: float
neighbor_path_override_min_gap: float
neighbor_path_override_max_confidence: float
class RoutingTaskSettingsUpdateRequest(BaseModel):
"""Represents routing task settings updates."""
enabled: bool | None = None
provider_id: str | None = None
model: str | None = None
prompt: str | None = None
neighbor_count: int | None = Field(default=None, ge=1, le=40)
neighbor_min_similarity: float | None = Field(default=None, ge=0.0, le=1.0)
auto_apply_confidence_threshold: float | None = Field(default=None, ge=0.0, le=1.0)
auto_apply_neighbor_similarity_threshold: float | None = Field(default=None, ge=0.0, le=1.0)
neighbor_path_override_enabled: bool | None = None
neighbor_path_override_min_similarity: float | None = Field(default=None, ge=0.0, le=1.0)
neighbor_path_override_min_gap: float | None = Field(default=None, ge=0.0, le=1.0)
neighbor_path_override_max_confidence: float | None = Field(default=None, ge=0.0, le=1.0)
class UploadDefaultsResponse(BaseModel):
"""Represents default upload destination and default tags."""
logical_path: str
tags: list[str] = Field(default_factory=list)
class UploadDefaultsUpdateRequest(BaseModel):
"""Represents updates for default upload destination and default tags."""
logical_path: str | None = None
tags: list[str] | None = None
class DisplaySettingsResponse(BaseModel):
"""Represents document-list display preferences."""
cards_per_page: int = Field(default=12, ge=1, le=200)
log_typing_animation_enabled: bool = True
class DisplaySettingsUpdateRequest(BaseModel):
"""Represents updates for document-list display preferences."""
cards_per_page: int | None = Field(default=None, ge=1, le=200)
log_typing_animation_enabled: bool | None = None
class PredefinedPathEntryResponse(BaseModel):
"""Represents one predefined logical path with global discoverability scope."""
value: str
global_shared: bool
class PredefinedPathEntryUpdateRequest(BaseModel):
"""Represents one predefined logical path create-or-update request."""
value: str
global_shared: bool = False
class PredefinedTagEntryResponse(BaseModel):
"""Represents one predefined tag with global discoverability scope."""
value: str
global_shared: bool
class PredefinedTagEntryUpdateRequest(BaseModel):
"""Represents one predefined tag create-or-update request."""
value: str
global_shared: bool = False
class HandwritingStyleSettingsResponse(BaseModel):
"""Represents handwriting-style clustering settings used by Typesense image embeddings."""
enabled: bool
embed_model: str
neighbor_limit: int
match_min_similarity: float
bootstrap_match_min_similarity: float
bootstrap_sample_size: int
image_max_side: int
class HandwritingStyleSettingsUpdateRequest(BaseModel):
"""Represents updates for handwriting-style clustering and match thresholds."""
enabled: bool | None = None
embed_model: str | None = None
neighbor_limit: int | None = Field(default=None, ge=1, le=32)
match_min_similarity: float | None = Field(default=None, ge=0.0, le=1.0)
bootstrap_match_min_similarity: float | None = Field(default=None, ge=0.0, le=1.0)
bootstrap_sample_size: int | None = Field(default=None, ge=1, le=30)
image_max_side: int | None = Field(default=None, ge=256, le=4096)
class TaskSettingsResponse(BaseModel):
"""Represents all task-level model bindings and prompt settings."""
ocr_handwriting: OcrTaskSettingsResponse
summary_generation: SummaryTaskSettingsResponse
routing_classification: RoutingTaskSettingsResponse
class TaskSettingsUpdateRequest(BaseModel):
"""Represents partial updates for task-level settings."""
ocr_handwriting: OcrTaskSettingsUpdateRequest | None = None
summary_generation: SummaryTaskSettingsUpdateRequest | None = None
routing_classification: RoutingTaskSettingsUpdateRequest | None = None
class AppSettingsResponse(BaseModel):
"""Represents all application settings exposed by the API."""
upload_defaults: UploadDefaultsResponse
display: DisplaySettingsResponse
handwriting_style_clustering: HandwritingStyleSettingsResponse
predefined_paths: list[PredefinedPathEntryResponse] = Field(default_factory=list)
predefined_tags: list[PredefinedTagEntryResponse] = Field(default_factory=list)
providers: list[ProviderSettingsResponse]
tasks: TaskSettingsResponse
class AppSettingsUpdateRequest(BaseModel):
"""Represents full settings update input for providers and task bindings."""
upload_defaults: UploadDefaultsUpdateRequest | None = None
display: DisplaySettingsUpdateRequest | None = None
handwriting_style_clustering: HandwritingStyleSettingsUpdateRequest | None = None
predefined_paths: list[PredefinedPathEntryUpdateRequest] | None = None
predefined_tags: list[PredefinedTagEntryUpdateRequest] | None = None
providers: list[ProviderSettingsUpdateRequest] | None = None
tasks: TaskSettingsUpdateRequest | None = None
class HandwritingSettingsResponse(BaseModel):
"""Represents legacy handwriting response shape kept for backward compatibility."""
provider: str = "openai_compatible"
enabled: bool
openai_base_url: str
openai_model: str
openai_timeout_seconds: int
openai_api_key_set: bool
openai_api_key_masked: str = ""
class HandwritingSettingsUpdateRequest(BaseModel):
"""Represents legacy handwriting update shape kept for backward compatibility."""
enabled: bool | None = None
openai_base_url: str | None = None
openai_model: str | None = None
openai_timeout_seconds: int | None = Field(default=None, ge=5, le=180)
openai_api_key: str | None = None
clear_openai_api_key: bool = False


@@ -0,0 +1 @@
"""Domain services package for storage, extraction, and classification logic."""


@@ -0,0 +1,885 @@
"""Persistent single-user application settings service backed by host-mounted storage."""
import json
import re
from pathlib import Path
from typing import Any
from app.core.config import get_settings
settings = get_settings()
TASK_OCR_HANDWRITING = "ocr_handwriting"
TASK_SUMMARY_GENERATION = "summary_generation"
TASK_ROUTING_CLASSIFICATION = "routing_classification"
HANDWRITING_STYLE_SETTINGS_KEY = "handwriting_style_clustering"
PREDEFINED_PATHS_SETTINGS_KEY = "predefined_paths"
PREDEFINED_TAGS_SETTINGS_KEY = "predefined_tags"
DEFAULT_HANDWRITING_STYLE_EMBED_MODEL = "ts/clip-vit-b-p32"
DEFAULT_OCR_PROMPT = (
"You are an expert at reading messy handwritten notes, including hard-to-read writing.\n"
"Task: transcribe the handwriting as exactly as possible.\n\n"
"Rules:\n"
"- Output ONLY the transcription in German, no commentary.\n"
"- Preserve original line breaks where they clearly exist.\n"
"- Do NOT translate or correct grammar or spelling.\n"
"- If a word or character is unclear, wrap your best guess in [[? ... ?]].\n"
"- If something is unreadable, write [[?unleserlich?]] in its place."
)
DEFAULT_SUMMARY_PROMPT = (
"You summarize documents for indexing and routing.\n"
"Return concise markdown with key entities, purpose, and document category hints.\n"
"Do not invent facts and do not include any explanation outside the summary."
)
DEFAULT_ROUTING_PROMPT = (
"You classify one document into an existing logical path and tags.\n"
"Prefer existing paths and tags when possible.\n"
"If the evidence is weak, keep chosen_path as null and use suggestions instead.\n"
"Return JSON only with this exact shape:\n"
"{\n"
" \"chosen_path\": string | null,\n"
" \"chosen_tags\": string[],\n"
" \"suggested_new_paths\": string[],\n"
" \"suggested_new_tags\": string[],\n"
" \"confidence\": number\n"
"}\n"
"Confidence must be between 0 and 1."
)
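The routing prompt above pins the model to an exact JSON shape. As a sketch of how a caller might parse and defensively coerce such a response (the helper name `parse_routing_response` is illustrative, not part of this module):

```python
import json


def parse_routing_response(raw: str) -> dict:
    """Parse the JSON object the routing prompt asks the model to return."""
    payload = json.loads(raw)
    confidence = float(payload.get("confidence", 0.0))
    return {
        "chosen_path": payload.get("chosen_path"),
        "chosen_tags": [str(t) for t in payload.get("chosen_tags", [])],
        "suggested_new_paths": [str(p) for p in payload.get("suggested_new_paths", [])],
        "suggested_new_tags": [str(t) for t in payload.get("suggested_new_tags", [])],
        # The prompt requires confidence in [0, 1]; clamp defensively anyway.
        "confidence": max(0.0, min(1.0, confidence)),
    }


result = parse_routing_response(
    '{"chosen_path": null, "chosen_tags": ["invoice"], '
    '"suggested_new_paths": [], "suggested_new_tags": [], "confidence": 1.4}'
)
```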
def _default_settings() -> dict[str, Any]:
"""Builds default settings including providers and model task bindings."""
return {
"upload_defaults": {
"logical_path": "Inbox",
"tags": [],
},
"display": {
"cards_per_page": 12,
"log_typing_animation_enabled": True,
},
PREDEFINED_PATHS_SETTINGS_KEY: [],
PREDEFINED_TAGS_SETTINGS_KEY: [],
HANDWRITING_STYLE_SETTINGS_KEY: {
"enabled": True,
"embed_model": DEFAULT_HANDWRITING_STYLE_EMBED_MODEL,
"neighbor_limit": 8,
"match_min_similarity": 0.86,
"bootstrap_match_min_similarity": 0.89,
"bootstrap_sample_size": 3,
"image_max_side": 1024,
},
"providers": [
{
"id": "openai-default",
"label": "OpenAI Default",
"provider_type": "openai_compatible",
"base_url": settings.default_openai_base_url,
"timeout_seconds": settings.default_openai_timeout_seconds,
"api_key": settings.default_openai_api_key,
}
],
"tasks": {
TASK_OCR_HANDWRITING: {
"enabled": settings.default_openai_handwriting_enabled,
"provider_id": "openai-default",
"model": settings.default_openai_model,
"prompt": DEFAULT_OCR_PROMPT,
},
TASK_SUMMARY_GENERATION: {
"enabled": True,
"provider_id": "openai-default",
"model": settings.default_summary_model,
"prompt": DEFAULT_SUMMARY_PROMPT,
"max_input_tokens": 8000,
},
TASK_ROUTING_CLASSIFICATION: {
"enabled": True,
"provider_id": "openai-default",
"model": settings.default_routing_model,
"prompt": DEFAULT_ROUTING_PROMPT,
"neighbor_count": 8,
"neighbor_min_similarity": 0.84,
"auto_apply_confidence_threshold": 0.78,
"auto_apply_neighbor_similarity_threshold": 0.55,
"neighbor_path_override_enabled": True,
"neighbor_path_override_min_similarity": 0.86,
"neighbor_path_override_min_gap": 0.04,
"neighbor_path_override_max_confidence": 0.9,
},
},
}
def _settings_path() -> Path:
"""Returns the absolute path of the persisted settings file."""
return settings.storage_root / "settings.json"
def _clamp_timeout(value: int) -> int:
"""Clamps timeout values to a safe and practical range."""
return max(5, min(180, value))
def _clamp_input_tokens(value: int) -> int:
"""Clamps per-request summary input token budget values to practical bounds."""
return max(512, min(64000, value))
def _clamp_neighbor_count(value: int) -> int:
"""Clamps nearest-neighbor lookup count for routing classification."""
return max(1, min(40, value))
def _clamp_cards_per_page(value: int) -> int:
"""Clamps dashboard cards-per-page display setting to practical bounds."""
return max(1, min(200, value))
def _clamp_predefined_entries_limit(value: int) -> int:
"""Clamps maximum count for predefined tag/path catalog entries."""
return max(1, min(2000, value))
def _clamp_handwriting_style_neighbor_limit(value: int) -> int:
"""Clamps handwriting-style nearest-neighbor count used for style matching."""
return max(1, min(32, value))
def _clamp_handwriting_style_sample_size(value: int) -> int:
"""Clamps handwriting-style bootstrap sample size used for stricter matching."""
return max(1, min(30, value))
def _clamp_handwriting_style_image_max_side(value: int) -> int:
"""Clamps handwriting-style image normalization max-side pixel size."""
return max(256, min(4096, value))
def _clamp_probability(value: float, fallback: float) -> float:
"""Clamps probability-like numbers to the range [0, 1]."""
try:
parsed = float(value)
except (TypeError, ValueError):
return fallback
return max(0.0, min(1.0, parsed))
def _safe_int(value: Any, fallback: int) -> int:
"""Safely converts arbitrary values to integers with fallback handling."""
try:
return int(value)
except (TypeError, ValueError):
return fallback
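A self-contained restatement of the two fallback helpers above, showing that malformed input fails closed to the caller-supplied fallback rather than raising:

```python
def clamp_probability(value, fallback):
    # Same logic as _clamp_probability: coerce to float, clamp to [0, 1].
    try:
        parsed = float(value)
    except (TypeError, ValueError):
        return fallback
    return max(0.0, min(1.0, parsed))


def safe_int(value, fallback):
    # Same logic as _safe_int: coerce to int or return the fallback.
    try:
        return int(value)
    except (TypeError, ValueError):
        return fallback


examples = [
    clamp_probability("0.5", 0.1),   # numeric string parses
    clamp_probability(1.7, 0.1),     # out of range, clamped to 1.0
    clamp_probability("oops", 0.1),  # unparsable, fallback
    safe_int("12", 0),               # numeric string parses
    safe_int(None, 7),               # None, fallback
]
```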
def _normalize_provider_id(value: str | None, fallback: str) -> str:
"""Normalizes provider identifiers into stable lowercase slug values."""
candidate = (value or "").strip().lower()
candidate = re.sub(r"[^a-z0-9_-]+", "-", candidate).strip("-")
return candidate or fallback
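The slug normalization above collapses every run of disallowed characters into a single hyphen; a standalone copy to make the behavior concrete:

```python
import re


def normalize_provider_id(value, fallback):
    # Lowercase, replace runs of non [a-z0-9_-] with "-", trim hyphens.
    candidate = (value or "").strip().lower()
    candidate = re.sub(r"[^a-z0-9_-]+", "-", candidate).strip("-")
    return candidate or fallback
```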
def _mask_api_key(value: str) -> str:
"""Masks a secret API key while retaining enough characters for identification."""
if not value:
return ""
if len(value) <= 6:
return "*" * len(value)
return f"{value[:4]}...{value[-2:]}"
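Masking keeps short keys fully starred and reveals only the first four and last two characters of longer ones; restated standalone:

```python
def mask_api_key(value: str) -> str:
    # Same logic as _mask_api_key above.
    if not value:
        return ""
    if len(value) <= 6:
        return "*" * len(value)
    return f"{value[:4]}...{value[-2:]}"
```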
def _normalize_provider(
payload: dict[str, Any],
fallback_id: str,
fallback_values: dict[str, Any],
) -> dict[str, Any]:
"""Normalizes one provider payload to a stable shape with bounds and defaults."""
defaults = _default_settings()["providers"][0]
provider_id = _normalize_provider_id(str(payload.get("id", fallback_id)), fallback_id)
provider_type = str(payload.get("provider_type", fallback_values.get("provider_type", defaults["provider_type"]))).strip()
if provider_type != "openai_compatible":
provider_type = "openai_compatible"
api_key_value = payload.get("api_key", fallback_values.get("api_key", defaults["api_key"]))
api_key = str(api_key_value).strip() if api_key_value is not None else ""
return {
"id": provider_id,
"label": str(payload.get("label", fallback_values.get("label", provider_id))).strip() or provider_id,
"provider_type": provider_type,
"base_url": str(payload.get("base_url", fallback_values.get("base_url", defaults["base_url"]))).strip()
or defaults["base_url"],
"timeout_seconds": _clamp_timeout(
_safe_int(
payload.get("timeout_seconds", fallback_values.get("timeout_seconds", defaults["timeout_seconds"])),
defaults["timeout_seconds"],
)
),
"api_key": api_key,
}
def _normalize_ocr_task(payload: dict[str, Any], provider_ids: list[str]) -> dict[str, Any]:
"""Normalizes OCR task settings while enforcing valid provider references."""
defaults = _default_settings()["tasks"][TASK_OCR_HANDWRITING]
provider_id = str(payload.get("provider_id", defaults["provider_id"])).strip()
if provider_id not in provider_ids:
provider_id = provider_ids[0]
return {
"enabled": bool(payload.get("enabled", defaults["enabled"])),
"provider_id": provider_id,
"model": str(payload.get("model", defaults["model"])).strip() or defaults["model"],
"prompt": str(payload.get("prompt", defaults["prompt"])).strip() or defaults["prompt"],
}
def _normalize_summary_task(payload: dict[str, Any], provider_ids: list[str]) -> dict[str, Any]:
"""Normalizes summary task settings while enforcing valid provider references."""
defaults = _default_settings()["tasks"][TASK_SUMMARY_GENERATION]
provider_id = str(payload.get("provider_id", defaults["provider_id"])).strip()
if provider_id not in provider_ids:
provider_id = provider_ids[0]
raw_max_tokens = payload.get("max_input_tokens")
if raw_max_tokens is None:
legacy_chars = _safe_int(payload.get("max_source_chars", 0), 0)
if legacy_chars > 0:
raw_max_tokens = max(512, legacy_chars // 4)
else:
raw_max_tokens = defaults["max_input_tokens"]
return {
"enabled": bool(payload.get("enabled", defaults["enabled"])),
"provider_id": provider_id,
"model": str(payload.get("model", defaults["model"])).strip() or defaults["model"],
"prompt": str(payload.get("prompt", defaults["prompt"])).strip() or defaults["prompt"],
"max_input_tokens": _clamp_input_tokens(
_safe_int(raw_max_tokens, defaults["max_input_tokens"])
),
}
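The legacy-field migration above converts an old character budget into a token budget at roughly four characters per token, with a 512-token floor. A minimal sketch of just that rule (`migrate_summary_budget` is a hypothetical name for illustration):

```python
def migrate_summary_budget(payload: dict, default_tokens: int = 8000) -> int:
    raw = payload.get("max_input_tokens")
    if raw is None:
        legacy_chars = int(payload.get("max_source_chars") or 0)
        if legacy_chars > 0:
            # ~4 characters per token, with a 512-token floor.
            return max(512, legacy_chars // 4)
        return default_tokens
    return int(raw)
```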
def _normalize_routing_task(payload: dict[str, Any], provider_ids: list[str]) -> dict[str, Any]:
"""Normalizes routing task settings while enforcing valid provider references."""
defaults = _default_settings()["tasks"][TASK_ROUTING_CLASSIFICATION]
provider_id = str(payload.get("provider_id", defaults["provider_id"])).strip()
if provider_id not in provider_ids:
provider_id = provider_ids[0]
return {
"enabled": bool(payload.get("enabled", defaults["enabled"])),
"provider_id": provider_id,
"model": str(payload.get("model", defaults["model"])).strip() or defaults["model"],
"prompt": str(payload.get("prompt", defaults["prompt"])).strip() or defaults["prompt"],
"neighbor_count": _clamp_neighbor_count(
_safe_int(payload.get("neighbor_count", defaults["neighbor_count"]), defaults["neighbor_count"])
),
"neighbor_min_similarity": _clamp_probability(
payload.get("neighbor_min_similarity", defaults["neighbor_min_similarity"]),
defaults["neighbor_min_similarity"],
),
"auto_apply_confidence_threshold": _clamp_probability(
payload.get("auto_apply_confidence_threshold", defaults["auto_apply_confidence_threshold"]),
defaults["auto_apply_confidence_threshold"],
),
"auto_apply_neighbor_similarity_threshold": _clamp_probability(
payload.get(
"auto_apply_neighbor_similarity_threshold",
defaults["auto_apply_neighbor_similarity_threshold"],
),
defaults["auto_apply_neighbor_similarity_threshold"],
),
"neighbor_path_override_enabled": bool(
payload.get("neighbor_path_override_enabled", defaults["neighbor_path_override_enabled"])
),
"neighbor_path_override_min_similarity": _clamp_probability(
payload.get(
"neighbor_path_override_min_similarity",
defaults["neighbor_path_override_min_similarity"],
),
defaults["neighbor_path_override_min_similarity"],
),
"neighbor_path_override_min_gap": _clamp_probability(
payload.get("neighbor_path_override_min_gap", defaults["neighbor_path_override_min_gap"]),
defaults["neighbor_path_override_min_gap"],
),
"neighbor_path_override_max_confidence": _clamp_probability(
payload.get(
"neighbor_path_override_max_confidence",
defaults["neighbor_path_override_max_confidence"],
),
defaults["neighbor_path_override_max_confidence"],
),
}
def _normalize_tasks(payload: dict[str, Any], provider_ids: list[str]) -> dict[str, Any]:
"""Normalizes task settings map for OCR, summarization, and routing tasks."""
if not isinstance(payload, dict):
payload = {}
return {
TASK_OCR_HANDWRITING: _normalize_ocr_task(payload.get(TASK_OCR_HANDWRITING, {}), provider_ids),
TASK_SUMMARY_GENERATION: _normalize_summary_task(payload.get(TASK_SUMMARY_GENERATION, {}), provider_ids),
TASK_ROUTING_CLASSIFICATION: _normalize_routing_task(payload.get(TASK_ROUTING_CLASSIFICATION, {}), provider_ids),
}
def _normalize_upload_defaults(payload: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
"""Normalizes upload default destination path and tags."""
if not isinstance(payload, dict):
payload = {}
default_path = str(defaults.get("logical_path", "Inbox")).strip() or "Inbox"
raw_path = str(payload.get("logical_path", default_path)).strip()
logical_path = raw_path or default_path
raw_tags = payload.get("tags", defaults.get("tags", []))
tags: list[str] = []
seen_lowered: set[str] = set()
if isinstance(raw_tags, list):
for raw_tag in raw_tags:
normalized = str(raw_tag).strip()
if not normalized:
continue
lowered = normalized.lower()
if lowered in seen_lowered:
continue
seen_lowered.add(lowered)
tags.append(normalized)
if len(tags) >= 50:
break
return {
"logical_path": logical_path,
"tags": tags,
}
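The tag loop above deduplicates case-insensitively while preserving the first-seen casing, and caps the list at 50 entries. Extracted as a standalone sketch:

```python
def dedupe_tags(raw_tags, limit=50):
    # Case-insensitive dedup; the first-seen casing wins.
    tags, seen = [], set()
    for raw in raw_tags:
        tag = str(raw).strip()
        if not tag or tag.lower() in seen:
            continue
        seen.add(tag.lower())
        tags.append(tag)
        if len(tags) >= limit:
            break
    return tags
```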
def _normalize_display_settings(payload: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
"""Normalizes display settings used by the document dashboard UI."""
if not isinstance(payload, dict):
payload = {}
default_cards_per_page = _safe_int(defaults.get("cards_per_page", 12), 12)
cards_per_page = _clamp_cards_per_page(
_safe_int(payload.get("cards_per_page", default_cards_per_page), default_cards_per_page)
)
return {
"cards_per_page": cards_per_page,
"log_typing_animation_enabled": bool(
payload.get("log_typing_animation_enabled", defaults.get("log_typing_animation_enabled", True))
),
}
def _normalize_predefined_paths(
payload: Any,
existing_items: list[dict[str, Any]] | None = None,
) -> list[dict[str, Any]]:
"""Normalizes predefined path entries and enforces irreversible global-sharing flag."""
existing_map: dict[str, dict[str, Any]] = {}
if isinstance(existing_items, list):
for item in existing_items:
if not isinstance(item, dict):
continue
value = str(item.get("value", "")).strip().strip("/")
if not value:
continue
existing_map[value.lower()] = {
"value": value,
"global_shared": bool(item.get("global_shared", False)),
}
if not isinstance(payload, list):
return list(existing_map.values())
normalized: list[dict[str, Any]] = []
seen: set[str] = set()
limit = _clamp_predefined_entries_limit(len(payload))
for item in payload:
if not isinstance(item, dict):
continue
value = str(item.get("value", "")).strip().strip("/")
if not value:
continue
lowered = value.lower()
if lowered in seen:
continue
seen.add(lowered)
existing = existing_map.get(lowered)
requested_global = bool(item.get("global_shared", False))
global_shared = bool(existing.get("global_shared", False) if existing else False) or requested_global
normalized.append(
{
"value": value,
"global_shared": global_shared,
}
)
if len(normalized) >= limit:
break
return normalized
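The key property of the normalization above is that `global_shared` is one-way: once an entry was shared, a later update can never clear it. A compact standalone sketch of that merge (without the entry-count limit):

```python
def normalize_entries(payload, existing):
    existing_map = {e["value"].lower(): e for e in existing}
    normalized, seen = [], set()
    for item in payload:
        value = str(item.get("value", "")).strip().strip("/")
        if not value or value.lower() in seen:
            continue
        seen.add(value.lower())
        prior = existing_map.get(value.lower())
        # Sticky OR: an existing shared flag can never be revoked.
        shared = bool(prior and prior["global_shared"]) or bool(item.get("global_shared", False))
        normalized.append({"value": value, "global_shared": shared})
    return normalized


entries = normalize_entries(
    [{"value": "/Taxes/2024/", "global_shared": False}],
    [{"value": "Taxes/2024", "global_shared": True}],
)
```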
def _normalize_predefined_tags(
payload: Any,
existing_items: list[dict[str, Any]] | None = None,
) -> list[dict[str, Any]]:
"""Normalizes predefined tag entries and enforces irreversible global-sharing flag."""
existing_map: dict[str, dict[str, Any]] = {}
if isinstance(existing_items, list):
for item in existing_items:
if not isinstance(item, dict):
continue
value = str(item.get("value", "")).strip()
if not value:
continue
existing_map[value.lower()] = {
"value": value,
"global_shared": bool(item.get("global_shared", False)),
}
if not isinstance(payload, list):
return list(existing_map.values())
normalized: list[dict[str, Any]] = []
seen: set[str] = set()
limit = _clamp_predefined_entries_limit(len(payload))
for item in payload:
if not isinstance(item, dict):
continue
value = str(item.get("value", "")).strip()
if not value:
continue
lowered = value.lower()
if lowered in seen:
continue
seen.add(lowered)
existing = existing_map.get(lowered)
requested_global = bool(item.get("global_shared", False))
global_shared = bool(existing.get("global_shared", False) if existing else False) or requested_global
normalized.append(
{
"value": value,
"global_shared": global_shared,
}
)
if len(normalized) >= limit:
break
return normalized
def _normalize_handwriting_style_settings(payload: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
"""Normalizes handwriting-style clustering settings exposed in the settings UI."""
if not isinstance(payload, dict):
payload = {}
default_enabled = bool(defaults.get("enabled", True))
default_embed_model = str(defaults.get("embed_model", DEFAULT_HANDWRITING_STYLE_EMBED_MODEL)).strip()
default_neighbor_limit = _safe_int(defaults.get("neighbor_limit", 8), 8)
default_match_min = _clamp_probability(defaults.get("match_min_similarity", 0.86), 0.86)
default_bootstrap_match_min = _clamp_probability(defaults.get("bootstrap_match_min_similarity", 0.89), 0.89)
default_bootstrap_sample_size = _safe_int(defaults.get("bootstrap_sample_size", 3), 3)
default_image_max_side = _safe_int(defaults.get("image_max_side", 1024), 1024)
return {
"enabled": bool(payload.get("enabled", default_enabled)),
"embed_model": str(payload.get("embed_model", default_embed_model)).strip() or default_embed_model,
"neighbor_limit": _clamp_handwriting_style_neighbor_limit(
_safe_int(payload.get("neighbor_limit", default_neighbor_limit), default_neighbor_limit)
),
"match_min_similarity": _clamp_probability(
payload.get("match_min_similarity", default_match_min),
default_match_min,
),
"bootstrap_match_min_similarity": _clamp_probability(
payload.get("bootstrap_match_min_similarity", default_bootstrap_match_min),
default_bootstrap_match_min,
),
"bootstrap_sample_size": _clamp_handwriting_style_sample_size(
_safe_int(payload.get("bootstrap_sample_size", default_bootstrap_sample_size), default_bootstrap_sample_size)
),
"image_max_side": _clamp_handwriting_style_image_max_side(
_safe_int(payload.get("image_max_side", default_image_max_side), default_image_max_side)
),
}
def _sanitize_settings(payload: dict[str, Any]) -> dict[str, Any]:
"""Sanitizes all persisted settings into a stable normalized structure."""
if not isinstance(payload, dict):
payload = {}
defaults = _default_settings()
providers_payload = payload.get("providers")
normalized_providers: list[dict[str, Any]] = []
seen_provider_ids: set[str] = set()
if isinstance(providers_payload, list):
for index, provider_payload in enumerate(providers_payload):
if not isinstance(provider_payload, dict):
continue
fallback = defaults["providers"][0]
candidate = _normalize_provider(provider_payload, fallback_id=f"provider-{index + 1}", fallback_values=fallback)
if candidate["id"] in seen_provider_ids:
continue
seen_provider_ids.add(candidate["id"])
normalized_providers.append(candidate)
if not normalized_providers:
normalized_providers = [dict(defaults["providers"][0])]
provider_ids = [provider["id"] for provider in normalized_providers]
tasks_payload = payload.get("tasks", {})
normalized_tasks = _normalize_tasks(tasks_payload, provider_ids)
upload_defaults = _normalize_upload_defaults(payload.get("upload_defaults", {}), defaults["upload_defaults"])
display_settings = _normalize_display_settings(payload.get("display", {}), defaults["display"])
predefined_paths = _normalize_predefined_paths(
payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
existing_items=payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
)
predefined_tags = _normalize_predefined_tags(
payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
existing_items=payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
)
handwriting_style_settings = _normalize_handwriting_style_settings(
payload.get(HANDWRITING_STYLE_SETTINGS_KEY, {}),
defaults[HANDWRITING_STYLE_SETTINGS_KEY],
)
return {
"upload_defaults": upload_defaults,
"display": display_settings,
PREDEFINED_PATHS_SETTINGS_KEY: predefined_paths,
PREDEFINED_TAGS_SETTINGS_KEY: predefined_tags,
HANDWRITING_STYLE_SETTINGS_KEY: handwriting_style_settings,
"providers": normalized_providers,
"tasks": normalized_tasks,
}
def ensure_app_settings() -> None:
"""Creates a settings file with defaults when no persisted settings are present."""
path = _settings_path()
path.parent.mkdir(parents=True, exist_ok=True)
if path.exists():
return
defaults = _sanitize_settings(_default_settings())
path.write_text(json.dumps(defaults, indent=2), encoding="utf-8")
def _read_raw_settings() -> dict[str, Any]:
"""Reads persisted settings from disk and returns normalized values."""
ensure_app_settings()
path = _settings_path()
try:
payload = json.loads(path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
payload = {}
return _sanitize_settings(payload)
def _write_settings(payload: dict[str, Any]) -> None:
"""Persists sanitized settings payload to host-mounted storage."""
path = _settings_path()
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
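The persistence helpers above follow a write-once-then-tolerant-read pattern: seed defaults only when no file exists, and fall back to an empty payload on a missing or corrupt file. A sketch of that round trip against a throwaway directory:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    # First access: no file yet, so seed it with defaults exactly once.
    if not path.exists():
        path.write_text(json.dumps({"display": {"cards_per_page": 12}}, indent=2), encoding="utf-8")
    # Reads tolerate a missing or corrupt file by falling back to {}.
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        payload = {}
```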
def read_app_settings() -> dict[str, Any]:
"""Reads settings and returns a sanitized view safe for API responses."""
payload = _read_raw_settings()
providers_response: list[dict[str, Any]] = []
for provider in payload["providers"]:
api_key = str(provider.get("api_key", ""))
providers_response.append(
{
"id": provider["id"],
"label": provider["label"],
"provider_type": provider["provider_type"],
"base_url": provider["base_url"],
"timeout_seconds": int(provider["timeout_seconds"]),
"api_key_set": bool(api_key),
"api_key_masked": _mask_api_key(api_key),
}
)
return {
"upload_defaults": payload.get("upload_defaults", {"logical_path": "Inbox", "tags": []}),
"display": payload.get("display", {"cards_per_page": 12, "log_typing_animation_enabled": True}),
PREDEFINED_PATHS_SETTINGS_KEY: payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
PREDEFINED_TAGS_SETTINGS_KEY: payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
HANDWRITING_STYLE_SETTINGS_KEY: payload.get(HANDWRITING_STYLE_SETTINGS_KEY, {}),
"providers": providers_response,
"tasks": payload["tasks"],
}
def reset_app_settings() -> dict[str, Any]:
"""Resets persisted application settings to sanitized repository defaults."""
defaults = _sanitize_settings(_default_settings())
_write_settings(defaults)
return read_app_settings()
def read_task_runtime_settings(task_name: str) -> dict[str, Any]:
"""Returns runtime task settings and resolved provider including secret values."""
payload = _read_raw_settings()
tasks = payload["tasks"]
if task_name not in tasks:
raise KeyError(f"Unknown task settings key: {task_name}")
task = dict(tasks[task_name])
provider_map = {provider["id"]: provider for provider in payload["providers"]}
provider = provider_map.get(task.get("provider_id"))
if provider is None:
provider = payload["providers"][0]
task["provider_id"] = provider["id"]
return {
"task": task,
"provider": dict(provider),
}
def update_app_settings(
providers: list[dict[str, Any]] | None = None,
tasks: dict[str, dict[str, Any]] | None = None,
upload_defaults: dict[str, Any] | None = None,
display: dict[str, Any] | None = None,
handwriting_style: dict[str, Any] | None = None,
predefined_paths: list[dict[str, Any]] | None = None,
predefined_tags: list[dict[str, Any]] | None = None,
) -> dict[str, Any]:
"""Updates app settings, persists them, and returns API-safe values."""
current_payload = _read_raw_settings()
next_payload: dict[str, Any] = {
"upload_defaults": dict(current_payload.get("upload_defaults", {"logical_path": "Inbox", "tags": []})),
"display": dict(current_payload.get("display", {"cards_per_page": 12, "log_typing_animation_enabled": True})),
PREDEFINED_PATHS_SETTINGS_KEY: list(current_payload.get(PREDEFINED_PATHS_SETTINGS_KEY, [])),
PREDEFINED_TAGS_SETTINGS_KEY: list(current_payload.get(PREDEFINED_TAGS_SETTINGS_KEY, [])),
HANDWRITING_STYLE_SETTINGS_KEY: dict(
current_payload.get(HANDWRITING_STYLE_SETTINGS_KEY, _default_settings()[HANDWRITING_STYLE_SETTINGS_KEY])
),
"providers": list(current_payload["providers"]),
"tasks": dict(current_payload["tasks"]),
}
if providers is not None:
existing_provider_map = {provider["id"]: provider for provider in current_payload["providers"]}
next_providers: list[dict[str, Any]] = []
for index, provider_payload in enumerate(providers):
if not isinstance(provider_payload, dict):
continue
provider_id = _normalize_provider_id(
str(provider_payload.get("id", "")),
fallback=f"provider-{index + 1}",
)
existing_provider = existing_provider_map.get(provider_id, {})
merged_payload = dict(provider_payload)
merged_payload["id"] = provider_id
if bool(provider_payload.get("clear_api_key", False)):
merged_payload["api_key"] = ""
elif "api_key" in provider_payload and provider_payload.get("api_key") is not None:
merged_payload["api_key"] = str(provider_payload.get("api_key")).strip()
else:
merged_payload["api_key"] = str(existing_provider.get("api_key", ""))
normalized_provider = _normalize_provider(
merged_payload,
fallback_id=provider_id,
fallback_values=existing_provider,
)
next_providers.append(normalized_provider)
if next_providers:
next_payload["providers"] = next_providers
if tasks is not None:
merged_tasks = dict(current_payload["tasks"])
for task_name, task_update in tasks.items():
if task_name not in merged_tasks or not isinstance(task_update, dict):
continue
existing_task = dict(merged_tasks[task_name])
for key, value in task_update.items():
if value is None:
continue
existing_task[key] = value
merged_tasks[task_name] = existing_task
next_payload["tasks"] = merged_tasks
if upload_defaults is not None and isinstance(upload_defaults, dict):
next_upload_defaults = dict(next_payload.get("upload_defaults", {}))
for key in ("logical_path", "tags"):
if key in upload_defaults:
next_upload_defaults[key] = upload_defaults[key]
next_payload["upload_defaults"] = next_upload_defaults
if display is not None and isinstance(display, dict):
next_display = dict(next_payload.get("display", {}))
if "cards_per_page" in display:
next_display["cards_per_page"] = display["cards_per_page"]
if "log_typing_animation_enabled" in display:
next_display["log_typing_animation_enabled"] = bool(display["log_typing_animation_enabled"])
next_payload["display"] = next_display
if handwriting_style is not None and isinstance(handwriting_style, dict):
next_handwriting_style = dict(next_payload.get(HANDWRITING_STYLE_SETTINGS_KEY, {}))
for key in (
"enabled",
"embed_model",
"neighbor_limit",
"match_min_similarity",
"bootstrap_match_min_similarity",
"bootstrap_sample_size",
"image_max_side",
):
if key in handwriting_style:
next_handwriting_style[key] = handwriting_style[key]
next_payload[HANDWRITING_STYLE_SETTINGS_KEY] = next_handwriting_style
if predefined_paths is not None:
next_payload[PREDEFINED_PATHS_SETTINGS_KEY] = _normalize_predefined_paths(
predefined_paths,
existing_items=next_payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
)
if predefined_tags is not None:
next_payload[PREDEFINED_TAGS_SETTINGS_KEY] = _normalize_predefined_tags(
predefined_tags,
existing_items=next_payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
)
sanitized = _sanitize_settings(next_payload)
_write_settings(sanitized)
return read_app_settings()
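The task-merge loop inside `update_app_settings` treats `None` as "leave unchanged", mirroring the Optional fields on the `*UpdateRequest` models. Isolated as a sketch:

```python
def merge_task_update(existing: dict, update: dict) -> dict:
    # None mirrors an omitted Optional field on the update models and
    # means "keep the current value"; only concrete values overwrite.
    merged = dict(existing)
    for key, value in update.items():
        if value is not None:
            merged[key] = value
    return merged
```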
def read_handwriting_provider_settings() -> dict[str, Any]:
"""Returns OCR settings in legacy shape for the handwriting transcription service."""
runtime = read_task_runtime_settings(TASK_OCR_HANDWRITING)
provider = runtime["provider"]
task = runtime["task"]
return {
"provider": provider["provider_type"],
"enabled": bool(task.get("enabled", True)),
"openai_base_url": str(provider.get("base_url", settings.default_openai_base_url)),
"openai_model": str(task.get("model", settings.default_openai_model)),
"openai_timeout_seconds": int(provider.get("timeout_seconds", settings.default_openai_timeout_seconds)),
"openai_api_key": str(provider.get("api_key", "")),
"prompt": str(task.get("prompt", DEFAULT_OCR_PROMPT)),
"provider_id": str(provider.get("id", "openai-default")),
}
def read_handwriting_style_settings() -> dict[str, Any]:
"""Returns handwriting-style clustering settings for Typesense style assignment logic."""
payload = _read_raw_settings()
defaults = _default_settings()[HANDWRITING_STYLE_SETTINGS_KEY]
return _normalize_handwriting_style_settings(
payload.get(HANDWRITING_STYLE_SETTINGS_KEY, {}),
defaults,
)
def read_predefined_paths_settings() -> list[dict[str, Any]]:
"""Returns normalized predefined logical path catalog entries."""
payload = _read_raw_settings()
return _normalize_predefined_paths(
payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
existing_items=payload.get(PREDEFINED_PATHS_SETTINGS_KEY, []),
)
def read_predefined_tags_settings() -> list[dict[str, Any]]:
"""Returns normalized predefined tag catalog entries."""
payload = _read_raw_settings()
return _normalize_predefined_tags(
payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
existing_items=payload.get(PREDEFINED_TAGS_SETTINGS_KEY, []),
)
def update_handwriting_settings(
enabled: bool | None = None,
openai_base_url: str | None = None,
openai_model: str | None = None,
openai_timeout_seconds: int | None = None,
openai_api_key: str | None = None,
clear_openai_api_key: bool = False,
) -> dict[str, Any]:
"""Updates OCR task and bound provider values using the legacy handwriting API contract."""
runtime = read_task_runtime_settings(TASK_OCR_HANDWRITING)
provider = runtime["provider"]
provider_update: dict[str, Any] = {
"id": provider["id"],
"label": provider["label"],
"provider_type": provider["provider_type"],
"base_url": openai_base_url if openai_base_url is not None else provider["base_url"],
"timeout_seconds": openai_timeout_seconds if openai_timeout_seconds is not None else provider["timeout_seconds"],
}
if clear_openai_api_key:
provider_update["clear_api_key"] = True
elif openai_api_key is not None:
provider_update["api_key"] = openai_api_key
tasks_update: dict[str, dict[str, Any]] = {TASK_OCR_HANDWRITING: {}}
if enabled is not None:
tasks_update[TASK_OCR_HANDWRITING]["enabled"] = enabled
if openai_model is not None:
tasks_update[TASK_OCR_HANDWRITING]["model"] = openai_model
return update_app_settings(
providers=[provider_update],
tasks=tasks_update,
)



@@ -0,0 +1,315 @@
"""Document extraction service for text indexing, previews, and archive fan-out."""
import io
import re
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
import magic
from docx import Document as DocxDocument
from openpyxl import load_workbook
from PIL import Image, ImageOps
from pypdf import PdfReader
import pymupdf
from app.core.config import get_settings
from app.services.handwriting import (
IMAGE_TEXT_TYPE_NO_TEXT,
IMAGE_TEXT_TYPE_UNKNOWN,
HandwritingTranscriptionError,
HandwritingTranscriptionNotConfiguredError,
HandwritingTranscriptionTimeoutError,
classify_image_text_bytes,
transcribe_handwriting_bytes,
)
settings = get_settings()
IMAGE_EXTENSIONS = {
".jpg",
".jpeg",
".png",
".tif",
".tiff",
".bmp",
".gif",
".webp",
".heic",
}
SUPPORTED_TEXT_EXTENSIONS = {
".txt",
".md",
".csv",
".json",
".xml",
".svg",
".pdf",
".docx",
".xlsx",
*IMAGE_EXTENSIONS,
}
@dataclass
class ExtractionResult:
"""Represents output generated during extraction for a single file."""
text: str
preview_bytes: bytes | None
preview_suffix: str | None
status: str
metadata_json: dict[str, object] = field(default_factory=dict)
@dataclass
class ArchiveMember:
"""Represents an extracted file entry from an archive."""
name: str
data: bytes
def sniff_mime(data: bytes) -> str:
"""Detects MIME type using libmagic for robust format handling."""
return magic.from_buffer(data, mime=True) or "application/octet-stream"
def is_supported_for_extraction(extension: str, mime_type: str) -> bool:
"""Determines if a file should be text-processed for indexing and classification."""
return extension in SUPPORTED_TEXT_EXTENSIONS or mime_type.startswith("text/")
def _normalize_text(text: str) -> str:
"""Normalizes extracted text by removing repeated form separators and controls."""
cleaned = text.replace("\r", "\n").replace("\x00", "")
lines: list[str] = []
for line in cleaned.split("\n"):
stripped = line.strip()
if stripped and re.fullmatch(r"[.\-_*=~\s]{4,}", stripped):
continue
lines.append(line)
normalized = "\n".join(lines)
normalized = re.sub(r"\n{3,}", "\n\n", normalized)
return normalized.strip()
def _extract_pdf_text(data: bytes) -> str:
"""Extracts text from PDF bytes using pypdf page parsing."""
reader = PdfReader(io.BytesIO(data))
pages: list[str] = []
for page in reader.pages:
pages.append(page.extract_text() or "")
return _normalize_text("\n".join(pages))
def _extract_pdf_preview(data: bytes) -> tuple[bytes | None, str | None]:
"""Creates a JPEG thumbnail preview from the first PDF page."""
try:
document = pymupdf.open(stream=data, filetype="pdf")
except Exception:
return None, None
try:
if document.page_count < 1:
return None, None
page = document.load_page(0)
pixmap = page.get_pixmap(matrix=pymupdf.Matrix(1.5, 1.5), alpha=False)
return pixmap.tobytes("jpeg"), ".jpg"
except Exception:
return None, None
finally:
document.close()
def _extract_docx_text(data: bytes) -> str:
"""Extracts paragraph text from DOCX content."""
document = DocxDocument(io.BytesIO(data))
return _normalize_text("\n".join(paragraph.text for paragraph in document.paragraphs if paragraph.text))
def _extract_xlsx_text(data: bytes) -> str:
"""Extracts cell text from XLSX workbook sheets for indexing."""
workbook = load_workbook(io.BytesIO(data), data_only=True, read_only=True)
chunks: list[str] = []
for sheet in workbook.worksheets:
chunks.append(sheet.title)
row_count = 0
for row in sheet.iter_rows(min_row=1, max_row=200):
row_values = [str(cell.value) for cell in row if cell.value is not None]
if row_values:
chunks.append(" ".join(row_values))
row_count += 1
if row_count >= 200:
break
return _normalize_text("\n".join(chunks))
def _build_image_preview(data: bytes) -> tuple[bytes | None, str | None]:
"""Builds a JPEG preview thumbnail for image files."""
try:
with Image.open(io.BytesIO(data)) as image:
preview = ImageOps.exif_transpose(image).convert("RGB")
preview.thumbnail((600, 600))
output = io.BytesIO()
preview.save(output, format="JPEG", optimize=True, quality=82)
return output.getvalue(), ".jpg"
except Exception:
return None, None
def _extract_handwriting_text(data: bytes, mime_type: str) -> ExtractionResult:
"""Extracts text from image bytes and records handwriting-vs-printed classification metadata."""
preview_bytes, preview_suffix = _build_image_preview(data)
metadata_json: dict[str, object] = {}
try:
text_type = classify_image_text_bytes(data, mime_type=mime_type)
metadata_json = {
"image_text_type": text_type.label,
"image_text_type_confidence": text_type.confidence,
"image_text_type_provider": text_type.provider,
"image_text_type_model": text_type.model,
}
except HandwritingTranscriptionNotConfiguredError as error:
return ExtractionResult(
text="",
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="unsupported",
metadata_json={"transcription_error": str(error), "image_text_type": IMAGE_TEXT_TYPE_UNKNOWN},
)
except HandwritingTranscriptionTimeoutError as error:
metadata_json = {
"image_text_type": IMAGE_TEXT_TYPE_UNKNOWN,
"image_text_type_error": str(error),
}
except HandwritingTranscriptionError as error:
metadata_json = {
"image_text_type": IMAGE_TEXT_TYPE_UNKNOWN,
"image_text_type_error": str(error),
}
if metadata_json.get("image_text_type") == IMAGE_TEXT_TYPE_NO_TEXT:
metadata_json["transcription_skipped"] = "no_text_detected"
return ExtractionResult(
text="",
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="processed",
metadata_json=metadata_json,
)
try:
transcription = transcribe_handwriting_bytes(data, mime_type=mime_type)
transcription_metadata: dict[str, object] = {
"transcription_provider": transcription.provider,
"transcription_model": transcription.model,
"transcription_uncertainties": transcription.uncertainties,
}
return ExtractionResult(
text=_normalize_text(transcription.text),
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="processed",
metadata_json={**metadata_json, **transcription_metadata},
)
except HandwritingTranscriptionNotConfiguredError as error:
return ExtractionResult(
text="",
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="unsupported",
metadata_json={**metadata_json, "transcription_error": str(error)},
)
except HandwritingTranscriptionTimeoutError as error:
return ExtractionResult(
text="",
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="error",
metadata_json={**metadata_json, "transcription_error": str(error)},
)
except HandwritingTranscriptionError as error:
return ExtractionResult(
text="",
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="error",
metadata_json={**metadata_json, "transcription_error": str(error)},
)
def extract_text_content(filename: str, data: bytes, mime_type: str) -> ExtractionResult:
"""Extracts text and optional preview bytes for supported file types."""
extension = Path(filename).suffix.lower()
text = ""
preview_bytes: bytes | None = None
preview_suffix: str | None = None
try:
if extension == ".pdf":
text = _extract_pdf_text(data)
preview_bytes, preview_suffix = _extract_pdf_preview(data)
elif extension in {".txt", ".md", ".csv", ".json", ".xml", ".svg"} or mime_type.startswith("text/"):
text = _normalize_text(data.decode("utf-8", errors="ignore"))
elif extension == ".docx":
text = _extract_docx_text(data)
elif extension == ".xlsx":
text = _extract_xlsx_text(data)
elif extension in IMAGE_EXTENSIONS:
return _extract_handwriting_text(data=data, mime_type=mime_type)
else:
return ExtractionResult(
text="",
preview_bytes=None,
preview_suffix=None,
status="unsupported",
metadata_json={"reason": "unsupported_format"},
)
except Exception as error:
return ExtractionResult(
text="",
preview_bytes=None,
preview_suffix=None,
status="error",
metadata_json={"reason": "extraction_exception", "error": str(error)},
)
return ExtractionResult(
text=text[: settings.max_text_length],
preview_bytes=preview_bytes,
preview_suffix=preview_suffix,
status="processed",
metadata_json={},
)
def extract_archive_members(data: bytes, depth: int = 0) -> list[ArchiveMember]:
"""Extracts processable members from zip archives with configurable depth limits."""
members: list[ArchiveMember] = []
if depth > settings.max_zip_depth:
return members
with zipfile.ZipFile(io.BytesIO(data)) as archive:
infos = [info for info in archive.infolist() if not info.is_dir()][: settings.max_zip_members]
for info in infos:
member_data = archive.read(info.filename)
members.append(ArchiveMember(name=info.filename, data=member_data))
return members
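The cleanup rules in `_normalize_text` above (strip carriage returns and NULs, drop separator-only lines, collapse blank-line runs) can be exercised standalone. This is a stdlib-only copy of that function for illustration:

```python
import re


def normalize_text(text: str) -> str:
    # Normalize line endings and drop NUL bytes.
    cleaned = text.replace("\r", "\n").replace("\x00", "")
    lines: list[str] = []
    for line in cleaned.split("\n"):
        stripped = line.strip()
        # Skip lines made only of 4+ separator characters, e.g. "=====" or "----".
        if stripped and re.fullmatch(r"[.\-_*=~\s]{4,}", stripped):
            continue
        lines.append(line)
    # Collapse runs of three or more newlines into a single blank line.
    normalized = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return normalized.strip()


sample = "Title\r\n=====\r\n\r\n\r\n\r\nBody text\x00 here\n----\nEnd"
print(normalize_text(sample))
```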


@@ -0,0 +1,477 @@
"""Handwriting transcription service using OpenAI-compatible vision models."""
import base64
import io
import json
import re
from dataclasses import dataclass
from typing import Any
from openai import APIConnectionError, APIError, APITimeoutError, OpenAI
from PIL import Image, ImageOps
from app.services.app_settings import DEFAULT_OCR_PROMPT, read_handwriting_provider_settings
MAX_IMAGE_SIDE = 2000
IMAGE_TEXT_TYPE_HANDWRITING = "handwriting"
IMAGE_TEXT_TYPE_PRINTED = "printed_text"
IMAGE_TEXT_TYPE_NO_TEXT = "no_text"
IMAGE_TEXT_TYPE_UNKNOWN = "unknown"
IMAGE_TEXT_CLASSIFICATION_PROMPT = (
"Classify the text content in this image.\n"
"Choose exactly one label from: handwriting, printed_text, no_text.\n"
"Definitions:\n"
"- handwriting: text exists and most readable text is handwritten.\n"
"- printed_text: text exists and most readable text is machine printed or typed.\n"
"- no_text: no readable text is present.\n"
"Return strict JSON only with shape:\n"
"{\n"
' "label": "handwriting|printed_text|no_text",\n'
' "confidence": number\n'
"}\n"
"Confidence must be between 0 and 1."
)
class HandwritingTranscriptionError(Exception):
"""Raised when handwriting transcription fails for a non-timeout reason."""
class HandwritingTranscriptionTimeoutError(HandwritingTranscriptionError):
"""Raised when handwriting transcription exceeds the configured timeout."""
class HandwritingTranscriptionNotConfiguredError(HandwritingTranscriptionError):
"""Raised when handwriting transcription is disabled or missing credentials."""
@dataclass
class HandwritingTranscription:
"""Represents transcription output and uncertainty markers."""
text: str
uncertainties: list[str]
provider: str
model: str
@dataclass
class ImageTextClassification:
"""Represents model classification of image text modality for one image."""
label: str
confidence: float
provider: str
model: str
def _extract_uncertainties(text: str) -> list[str]:
"""Extracts uncertainty markers from transcription output."""
matches = re.findall(r"\[\[\?(.*?)\?\]\]", text)
return [match.strip() for match in matches if match.strip()]
def _coerce_json_object(payload: str) -> dict[str, Any]:
"""Parses and extracts a JSON object from raw model output text."""
text = payload.strip()
if not text:
return {}
try:
parsed = json.loads(text)
if isinstance(parsed, dict):
return parsed
except json.JSONDecodeError:
pass
fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, flags=re.DOTALL | re.IGNORECASE)
if fenced:
try:
parsed = json.loads(fenced.group(1))
if isinstance(parsed, dict):
return parsed
except json.JSONDecodeError:
pass
first_brace = text.find("{")
last_brace = text.rfind("}")
if first_brace >= 0 and last_brace > first_brace:
candidate = text[first_brace : last_brace + 1]
try:
parsed = json.loads(candidate)
if isinstance(parsed, dict):
return parsed
except json.JSONDecodeError:
return {}
return {}
def _clamp_probability(value: Any, fallback: float = 0.0) -> float:
"""Clamps confidence-like values to the inclusive [0, 1] range."""
try:
parsed = float(value)
except (TypeError, ValueError):
return fallback
return max(0.0, min(1.0, parsed))
def _normalize_image_text_type(label: str) -> str:
"""Normalizes classifier labels into one supported canonical image text type."""
normalized = label.strip().lower().replace("-", "_").replace(" ", "_")
if normalized in {IMAGE_TEXT_TYPE_HANDWRITING, "handwritten", "handwritten_text"}:
return IMAGE_TEXT_TYPE_HANDWRITING
if normalized in {IMAGE_TEXT_TYPE_PRINTED, "printed", "typed", "machine_text"}:
return IMAGE_TEXT_TYPE_PRINTED
    if normalized in {IMAGE_TEXT_TYPE_NO_TEXT, "none", "no_readable_text"}:
return IMAGE_TEXT_TYPE_NO_TEXT
return IMAGE_TEXT_TYPE_UNKNOWN
def _normalize_image_bytes(image_data: bytes) -> tuple[bytes, str]:
"""Applies EXIF rotation and scales large images down for efficient transcription."""
with Image.open(io.BytesIO(image_data)) as image:
rotated = ImageOps.exif_transpose(image)
prepared = rotated.convert("RGB")
long_side = max(prepared.width, prepared.height)
if long_side > MAX_IMAGE_SIDE:
scale = MAX_IMAGE_SIDE / long_side
resized_width = max(1, int(prepared.width * scale))
resized_height = max(1, int(prepared.height * scale))
prepared = prepared.resize((resized_width, resized_height), Image.Resampling.LANCZOS)
output = io.BytesIO()
prepared.save(output, format="JPEG", quality=90, optimize=True)
return output.getvalue(), "image/jpeg"
def _create_client(provider_settings: dict[str, Any]) -> OpenAI:
"""Creates an OpenAI client configured for compatible endpoints and timeouts."""
api_key = str(provider_settings.get("openai_api_key", "")).strip() or "no-key-required"
return OpenAI(
api_key=api_key,
base_url=str(provider_settings["openai_base_url"]),
timeout=int(provider_settings["openai_timeout_seconds"]),
)
def _extract_text_from_response(response: Any) -> str:
"""Extracts plain text from responses API output objects."""
output_text = getattr(response, "output_text", None)
if isinstance(output_text, str) and output_text.strip():
return output_text.strip()
output_items = getattr(response, "output", None)
if not isinstance(output_items, list):
return ""
texts: list[str] = []
for item in output_items:
item_data = item.model_dump() if hasattr(item, "model_dump") else item
if not isinstance(item_data, dict):
continue
item_type = item_data.get("type")
if item_type == "output_text":
text = str(item_data.get("text", "")).strip()
if text:
texts.append(text)
if item_type == "message":
for content in item_data.get("content", []) or []:
if not isinstance(content, dict):
continue
if content.get("type") in {"output_text", "text"}:
text = str(content.get("text", "")).strip()
if text:
texts.append(text)
return "\n".join(texts).strip()
def _transcribe_with_responses(client: OpenAI, model: str, prompt: str, image_data_url: str) -> str:
"""Transcribes handwriting using the responses API."""
response = client.responses.create(
model=model,
input=[
{
"role": "user",
"content": [
{
"type": "input_text",
"text": prompt,
},
{
"type": "input_image",
"image_url": image_data_url,
"detail": "high",
},
],
}
],
)
return _extract_text_from_response(response)
def _transcribe_with_chat(client: OpenAI, model: str, prompt: str, image_data_url: str) -> str:
"""Transcribes handwriting using chat completions for endpoint compatibility."""
response = client.chat.completions.create(
model=model,
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": image_data_url,
"detail": "high",
},
},
],
}
],
)
message_content = response.choices[0].message.content
if isinstance(message_content, str):
return message_content.strip()
if isinstance(message_content, list):
text_parts: list[str] = []
for part in message_content:
if isinstance(part, dict):
text = str(part.get("text", "")).strip()
if text:
text_parts.append(text)
return "\n".join(text_parts).strip()
return ""
def _classify_with_responses(client: OpenAI, model: str, prompt: str, image_data_url: str) -> str:
"""Classifies image text modality using the responses API."""
response = client.responses.create(
model=model,
input=[
{
"role": "user",
"content": [
{
"type": "input_text",
"text": prompt,
},
{
"type": "input_image",
"image_url": image_data_url,
"detail": "high",
},
],
}
],
)
return _extract_text_from_response(response)
def _classify_with_chat(client: OpenAI, model: str, prompt: str, image_data_url: str) -> str:
"""Classifies image text modality using chat completions for compatibility."""
response = client.chat.completions.create(
model=model,
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": prompt,
},
{
"type": "image_url",
"image_url": {
"url": image_data_url,
"detail": "high",
},
},
],
}
],
)
message_content = response.choices[0].message.content
if isinstance(message_content, str):
return message_content.strip()
if isinstance(message_content, list):
text_parts: list[str] = []
for part in message_content:
if isinstance(part, dict):
text = str(part.get("text", "")).strip()
if text:
text_parts.append(text)
return "\n".join(text_parts).strip()
return ""
def _classify_image_text_data_url(image_data_url: str) -> ImageTextClassification:
"""Classifies an image as handwriting, printed text, or no text."""
provider_settings = read_handwriting_provider_settings()
provider_type = str(provider_settings.get("provider", "openai_compatible")).strip()
if provider_type != "openai_compatible":
raise HandwritingTranscriptionError(f"unsupported_provider_type:{provider_type}")
if not bool(provider_settings.get("enabled", True)):
raise HandwritingTranscriptionNotConfiguredError("handwriting_transcription_disabled")
model = str(provider_settings.get("openai_model", "gpt-4.1-mini")).strip() or "gpt-4.1-mini"
client = _create_client(provider_settings)
try:
output_text = _classify_with_responses(
client=client,
model=model,
prompt=IMAGE_TEXT_CLASSIFICATION_PROMPT,
image_data_url=image_data_url,
)
if not output_text:
output_text = _classify_with_chat(
client=client,
model=model,
prompt=IMAGE_TEXT_CLASSIFICATION_PROMPT,
image_data_url=image_data_url,
)
except APITimeoutError as error:
raise HandwritingTranscriptionTimeoutError("openai_request_timeout") from error
except (APIConnectionError, APIError):
try:
output_text = _classify_with_chat(
client=client,
model=model,
prompt=IMAGE_TEXT_CLASSIFICATION_PROMPT,
image_data_url=image_data_url,
)
except APITimeoutError as timeout_error:
raise HandwritingTranscriptionTimeoutError("openai_request_timeout") from timeout_error
except Exception as fallback_error:
raise HandwritingTranscriptionError(str(fallback_error)) from fallback_error
except Exception as error:
raise HandwritingTranscriptionError(str(error)) from error
parsed = _coerce_json_object(output_text)
if not parsed:
raise HandwritingTranscriptionError("image_text_classification_parse_failed")
label = _normalize_image_text_type(str(parsed.get("label", "")))
confidence = _clamp_probability(parsed.get("confidence", 0.0), fallback=0.0)
return ImageTextClassification(
label=label,
confidence=confidence,
provider="openai",
model=model,
)
def _transcribe_image_data_url(image_data_url: str) -> HandwritingTranscription:
"""Transcribes a handwriting image data URL with configured OpenAI provider settings."""
provider_settings = read_handwriting_provider_settings()
provider_type = str(provider_settings.get("provider", "openai_compatible")).strip()
if provider_type != "openai_compatible":
raise HandwritingTranscriptionError(f"unsupported_provider_type:{provider_type}")
if not bool(provider_settings.get("enabled", True)):
raise HandwritingTranscriptionNotConfiguredError("handwriting_transcription_disabled")
model = str(provider_settings.get("openai_model", "gpt-4.1-mini")).strip() or "gpt-4.1-mini"
prompt = str(provider_settings.get("prompt", DEFAULT_OCR_PROMPT)).strip() or DEFAULT_OCR_PROMPT
client = _create_client(provider_settings)
try:
text = _transcribe_with_responses(client=client, model=model, prompt=prompt, image_data_url=image_data_url)
if not text:
text = _transcribe_with_chat(client=client, model=model, prompt=prompt, image_data_url=image_data_url)
except APITimeoutError as error:
raise HandwritingTranscriptionTimeoutError("openai_request_timeout") from error
    except (APIConnectionError, APIError):
try:
text = _transcribe_with_chat(client=client, model=model, prompt=prompt, image_data_url=image_data_url)
except APITimeoutError as timeout_error:
raise HandwritingTranscriptionTimeoutError("openai_request_timeout") from timeout_error
except Exception as fallback_error:
raise HandwritingTranscriptionError(str(fallback_error)) from fallback_error
except Exception as error:
raise HandwritingTranscriptionError(str(error)) from error
final_text = text.strip()
return HandwritingTranscription(
text=final_text,
uncertainties=_extract_uncertainties(final_text),
provider="openai",
model=model,
)
def transcribe_handwriting_base64(image_base64: str, mime_type: str = "image/jpeg") -> HandwritingTranscription:
"""Transcribes handwriting from a base64 payload without data URL prefix."""
    normalized_mime = mime_type.strip().lower() or "image/jpeg"
image_data_url = f"data:{normalized_mime};base64,{image_base64}"
return _transcribe_image_data_url(image_data_url)
def transcribe_handwriting_url(image_url: str) -> HandwritingTranscription:
"""Transcribes handwriting from a direct image URL."""
return _transcribe_image_data_url(image_url)
def transcribe_handwriting_bytes(image_data: bytes, mime_type: str = "image/jpeg") -> HandwritingTranscription:
"""Transcribes handwriting from raw image bytes after normalization."""
normalized_bytes, normalized_mime = _normalize_image_bytes(image_data)
encoded = base64.b64encode(normalized_bytes).decode("ascii")
return transcribe_handwriting_base64(encoded, mime_type=normalized_mime)
def classify_image_text_base64(image_base64: str, mime_type: str = "image/jpeg") -> ImageTextClassification:
"""Classifies image text type from a base64 payload without data URL prefix."""
    normalized_mime = mime_type.strip().lower() or "image/jpeg"
image_data_url = f"data:{normalized_mime};base64,{image_base64}"
return _classify_image_text_data_url(image_data_url)
def classify_image_text_url(image_url: str) -> ImageTextClassification:
"""Classifies image text type from a direct image URL."""
return _classify_image_text_data_url(image_url)
def classify_image_text_bytes(image_data: bytes, mime_type: str = "image/jpeg") -> ImageTextClassification:
"""Classifies image text type from raw image bytes after normalization."""
normalized_bytes, normalized_mime = _normalize_image_bytes(image_data)
encoded = base64.b64encode(normalized_bytes).decode("ascii")
return classify_image_text_base64(encoded, mime_type=normalized_mime)
def transcribe_handwriting(image: bytes | str, mime_type: str = "image/jpeg") -> HandwritingTranscription:
"""Transcribes handwriting from bytes, base64 text, or URL input."""
if isinstance(image, bytes):
return transcribe_handwriting_bytes(image, mime_type=mime_type)
stripped = image.strip()
if stripped.startswith("http://") or stripped.startswith("https://"):
return transcribe_handwriting_url(stripped)
return transcribe_handwriting_base64(stripped, mime_type=mime_type)
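The three parsing tiers in `_coerce_json_object` above (whole payload, fenced block, widest brace slice) are worth seeing against real model output. This is a stdlib-only copy of that helper for illustration:

```python
import json
import re
from typing import Any


def coerce_json_object(payload: str) -> dict[str, Any]:
    text = payload.strip()
    if not text:
        return {}
    # Tier 1: the whole payload is already valid JSON.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Tier 2: a ```json fenced block somewhere in the output.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, flags=re.DOTALL | re.IGNORECASE)
    if fenced:
        try:
            parsed = json.loads(fenced.group(1))
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
    # Tier 3: widest brace-to-brace slice as a last resort.
    first_brace = text.find("{")
    last_brace = text.rfind("}")
    if first_brace >= 0 and last_brace > first_brace:
        try:
            parsed = json.loads(text[first_brace : last_brace + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            return {}
    return {}


result = coerce_json_object('```json\n{"label": "handwriting", "confidence": 0.9}\n```')
print(result)
```

Tier 3 deliberately over-captures, so prose containing two unrelated braces can defeat it; callers above treat an empty dict as a parse failure and raise.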


@@ -0,0 +1,435 @@
"""Handwriting-style clustering and style-scoped path composition for image documents."""
import base64
import io
import re
from dataclasses import dataclass
from typing import Any
from PIL import Image, ImageOps
from sqlalchemy import func, select
from sqlalchemy.orm import Session
from app.core.config import get_settings
from app.models.document import Document, DocumentStatus
from app.services.app_settings import (
DEFAULT_HANDWRITING_STYLE_EMBED_MODEL,
read_handwriting_style_settings,
)
from app.services.typesense_index import get_typesense_client
settings = get_settings()
IMAGE_TEXT_TYPE_HANDWRITING = "handwriting"
HANDWRITING_STYLE_COLLECTION_SUFFIX = "_handwriting_styles"
HANDWRITING_STYLE_EMBED_MODEL = DEFAULT_HANDWRITING_STYLE_EMBED_MODEL
HANDWRITING_STYLE_MATCH_MIN_SIMILARITY = 0.86
HANDWRITING_STYLE_BOOTSTRAP_MIN_SIMILARITY = 0.89
HANDWRITING_STYLE_BOOTSTRAP_SAMPLE_SIZE = 3
HANDWRITING_STYLE_NEIGHBOR_LIMIT = 8
HANDWRITING_STYLE_IMAGE_MAX_SIDE = 1024
HANDWRITING_STYLE_ID_PREFIX = "hw_style_"
HANDWRITING_STYLE_ID_PATTERN = re.compile(r"^hw_style_(\d+)$")
@dataclass
class HandwritingStyleNeighbor:
"""Represents one nearest handwriting-style neighbor returned from Typesense."""
document_id: str
style_cluster_id: str
vector_distance: float
similarity: float
@dataclass
class HandwritingStyleAssignment:
"""Represents the chosen handwriting-style cluster assignment for one document."""
style_cluster_id: str
matched_existing: bool
similarity: float
vector_distance: float
compared_neighbors: int
match_min_similarity: float
bootstrap_match_min_similarity: float
def _style_collection_name() -> str:
"""Builds the dedicated Typesense collection name used for handwriting-style vectors."""
return f"{settings.typesense_collection_name}{HANDWRITING_STYLE_COLLECTION_SUFFIX}"
def _style_collection() -> Any:
"""Returns the Typesense collection handle for handwriting-style indexing."""
client = get_typesense_client()
return client.collections[_style_collection_name()]
def _distance_to_similarity(vector_distance: float) -> float:
"""Converts Typesense vector distance into conservative similarity in [0, 1]."""
return max(0.0, min(1.0, 1.0 - (vector_distance / 2.0)))
def _encode_style_image_base64(image_data: bytes, image_max_side: int) -> str:
"""Normalizes and downsizes image bytes and returns a base64-encoded JPEG payload."""
with Image.open(io.BytesIO(image_data)) as image:
prepared = ImageOps.exif_transpose(image).convert("RGB")
longest_side = max(prepared.width, prepared.height)
if longest_side > image_max_side:
scale = image_max_side / longest_side
resized_width = max(1, int(prepared.width * scale))
resized_height = max(1, int(prepared.height * scale))
prepared = prepared.resize((resized_width, resized_height), Image.Resampling.LANCZOS)
output = io.BytesIO()
prepared.save(output, format="JPEG", quality=86, optimize=True)
return base64.b64encode(output.getvalue()).decode("ascii")
def ensure_handwriting_style_collection() -> None:
"""Creates the handwriting-style Typesense collection when it is not present."""
runtime_settings = read_handwriting_style_settings()
embed_model = str(runtime_settings.get("embed_model", HANDWRITING_STYLE_EMBED_MODEL)).strip() or HANDWRITING_STYLE_EMBED_MODEL
collection = _style_collection()
should_recreate_collection = False
try:
existing_schema = collection.retrieve()
if isinstance(existing_schema, dict):
existing_fields = existing_schema.get("fields", [])
if isinstance(existing_fields, list):
for field in existing_fields:
if not isinstance(field, dict):
continue
if str(field.get("name", "")).strip() != "embedding":
continue
embed_config = field.get("embed", {})
model_config = embed_config.get("model_config", {}) if isinstance(embed_config, dict) else {}
existing_model = str(model_config.get("model_name", "")).strip()
if existing_model and existing_model != embed_model:
should_recreate_collection = True
break
if not should_recreate_collection:
return
except Exception as error:
message = str(error).lower()
if "404" not in message and "not found" not in message:
raise
client = get_typesense_client()
if should_recreate_collection:
client.collections[_style_collection_name()].delete()
schema = {
"name": _style_collection_name(),
"fields": [
{
"name": "style_cluster_id",
"type": "string",
"facet": True,
},
{
"name": "image_text_type",
"type": "string",
"facet": True,
},
{
"name": "created_at",
"type": "int64",
},
{
"name": "image",
"type": "image",
"store": False,
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": ["image"],
"model_config": {
"model_name": embed_model,
},
},
},
],
"default_sorting_field": "created_at",
}
client.collections.create(schema)
def _search_style_neighbors(
image_base64: str,
limit: int,
exclude_document_id: str | None = None,
) -> list[HandwritingStyleNeighbor]:
"""Returns nearest handwriting-style neighbors for one encoded image payload."""
ensure_handwriting_style_collection()
client = get_typesense_client()
filter_clauses = [f"image_text_type:={IMAGE_TEXT_TYPE_HANDWRITING}"]
if exclude_document_id:
filter_clauses.append(f"id:!={exclude_document_id}")
search_payload = {
"q": "*",
"query_by": "embedding",
"vector_query": f"embedding:([], image:{image_base64}, k:{max(1, limit)})",
"exclude_fields": "embedding,image",
"per_page": max(1, limit),
"filter_by": " && ".join(filter_clauses),
}
response = client.multi_search.perform(
{
"searches": [
{
"collection": _style_collection_name(),
**search_payload,
}
]
},
{},
)
results = response.get("results", []) if isinstance(response, dict) else []
first_result = results[0] if isinstance(results, list) and len(results) > 0 else {}
hits = first_result.get("hits", []) if isinstance(first_result, dict) else []
neighbors: list[HandwritingStyleNeighbor] = []
for hit in hits:
if not isinstance(hit, dict):
continue
document = hit.get("document")
if not isinstance(document, dict):
continue
document_id = str(document.get("id", "")).strip()
style_cluster_id = str(document.get("style_cluster_id", "")).strip()
if not document_id or not style_cluster_id:
continue
try:
vector_distance = float(hit.get("vector_distance", 2.0))
except (TypeError, ValueError):
vector_distance = 2.0
neighbors.append(
HandwritingStyleNeighbor(
document_id=document_id,
style_cluster_id=style_cluster_id,
vector_distance=vector_distance,
similarity=_distance_to_similarity(vector_distance),
)
)
if len(neighbors) >= limit:
break
return neighbors
def _next_style_cluster_id(session: Session) -> str:
"""Allocates the next stable handwriting-style folder identifier."""
existing_ids = session.execute(
select(Document.handwriting_style_id).where(Document.handwriting_style_id.is_not(None))
).scalars().all()
max_value = 0
for existing_id in existing_ids:
candidate = str(existing_id).strip()
match = HANDWRITING_STYLE_ID_PATTERN.fullmatch(candidate)
if not match:
continue
numeric_part = int(match.group(1))
max_value = max(max_value, numeric_part)
return f"{HANDWRITING_STYLE_ID_PREFIX}{max_value + 1}"
def _style_cluster_sample_size(session: Session, style_cluster_id: str) -> int:
"""Returns the number of indexed documents currently assigned to one style cluster."""
return int(
session.execute(
select(func.count())
.select_from(Document)
.where(Document.handwriting_style_id == style_cluster_id)
.where(Document.image_text_type == IMAGE_TEXT_TYPE_HANDWRITING)
).scalar_one()
)
def assign_handwriting_style(
session: Session,
document: Document,
image_data: bytes,
) -> HandwritingStyleAssignment:
"""Assigns a document to an existing handwriting-style cluster or creates a new one."""
runtime_settings = read_handwriting_style_settings()
image_max_side = int(runtime_settings.get("image_max_side", HANDWRITING_STYLE_IMAGE_MAX_SIDE))
neighbor_limit = int(runtime_settings.get("neighbor_limit", HANDWRITING_STYLE_NEIGHBOR_LIMIT))
match_min_similarity = float(runtime_settings.get("match_min_similarity", HANDWRITING_STYLE_MATCH_MIN_SIMILARITY))
bootstrap_match_min_similarity = float(
runtime_settings.get("bootstrap_match_min_similarity", HANDWRITING_STYLE_BOOTSTRAP_MIN_SIMILARITY)
)
bootstrap_sample_size = int(runtime_settings.get("bootstrap_sample_size", HANDWRITING_STYLE_BOOTSTRAP_SAMPLE_SIZE))
image_base64 = _encode_style_image_base64(image_data, image_max_side=image_max_side)
neighbors = _search_style_neighbors(
image_base64=image_base64,
limit=neighbor_limit,
exclude_document_id=str(document.id),
)
best_neighbor = neighbors[0] if neighbors else None
similarity = best_neighbor.similarity if best_neighbor else 0.0
vector_distance = best_neighbor.vector_distance if best_neighbor else 2.0
cluster_sample_size = 0
if best_neighbor:
cluster_sample_size = _style_cluster_sample_size(
session=session,
style_cluster_id=best_neighbor.style_cluster_id,
)
required_similarity = (
bootstrap_match_min_similarity
if cluster_sample_size < bootstrap_sample_size
else match_min_similarity
)
should_match_existing = (
best_neighbor is not None and similarity >= required_similarity
)
if should_match_existing and best_neighbor:
style_cluster_id = best_neighbor.style_cluster_id
matched_existing = True
else:
existing_style_cluster_id = (document.handwriting_style_id or "").strip()
if HANDWRITING_STYLE_ID_PATTERN.fullmatch(existing_style_cluster_id):
style_cluster_id = existing_style_cluster_id
else:
style_cluster_id = _next_style_cluster_id(session=session)
matched_existing = False
ensure_handwriting_style_collection()
collection = _style_collection()
payload = {
"id": str(document.id),
"style_cluster_id": style_cluster_id,
"image_text_type": IMAGE_TEXT_TYPE_HANDWRITING,
"created_at": int(document.created_at.timestamp()),
"image": image_base64,
}
collection.documents.upsert(payload)
return HandwritingStyleAssignment(
style_cluster_id=style_cluster_id,
matched_existing=matched_existing,
similarity=similarity,
vector_distance=vector_distance,
compared_neighbors=len(neighbors),
match_min_similarity=match_min_similarity,
bootstrap_match_min_similarity=bootstrap_match_min_similarity,
)
def delete_handwriting_style_document(document_id: str) -> None:
"""Deletes one document id from the handwriting-style Typesense collection."""
collection = _style_collection()
try:
collection.documents[document_id].delete()
except Exception as error:
message = str(error).lower()
if "404" in message or "not found" in message:
return
raise
def delete_many_handwriting_style_documents(document_ids: list[str]) -> None:
"""Deletes many document ids from the handwriting-style Typesense collection."""
for document_id in document_ids:
delete_handwriting_style_document(document_id)
def apply_handwriting_style_path(style_cluster_id: str | None, path_value: str | None) -> str | None:
"""Composes style-prefixed logical paths while preventing duplicate prefix nesting."""
if path_value is None:
return None
normalized_path = path_value.strip().strip("/")
if not normalized_path:
return None
normalized_style = (style_cluster_id or "").strip().strip("/")
if not normalized_style:
return normalized_path
segments = [segment for segment in normalized_path.split("/") if segment]
while segments and HANDWRITING_STYLE_ID_PATTERN.fullmatch(segments[0]):
segments.pop(0)
if segments and segments[0].strip().lower() == normalized_style.lower():
segments.pop(0)
if len(segments) == 0:
return normalized_style
sanitized_path = "/".join(segments)
return f"{normalized_style}/{sanitized_path}"
def resolve_handwriting_style_path_prefix(
session: Session,
style_cluster_id: str | None,
*,
exclude_document_id: str | None = None,
) -> str | None:
"""Resolves a stable path prefix for one style cluster, preferring known non-style root segments."""
normalized_style = (style_cluster_id or "").strip()
if not normalized_style:
return None
statement = select(Document.logical_path).where(
Document.handwriting_style_id == normalized_style,
Document.image_text_type == IMAGE_TEXT_TYPE_HANDWRITING,
Document.status != DocumentStatus.TRASHED,
)
if exclude_document_id:
statement = statement.where(Document.id != exclude_document_id)
rows = session.execute(statement).scalars().all()
segment_counts: dict[str, int] = {}
segment_labels: dict[str, str] = {}
for raw_path in rows:
if not isinstance(raw_path, str):
continue
segments = [segment.strip() for segment in raw_path.split("/") if segment.strip()]
if not segments:
continue
first_segment = segments[0]
lowered = first_segment.lower()
if lowered == "inbox":
continue
if HANDWRITING_STYLE_ID_PATTERN.fullmatch(first_segment):
continue
segment_counts[lowered] = segment_counts.get(lowered, 0) + 1
if lowered not in segment_labels:
segment_labels[lowered] = first_segment
if not segment_counts:
return normalized_style
winner = sorted(
segment_counts.items(),
key=lambda item: (-item[1], item[0]),
)[0][0]
return segment_labels.get(winner, normalized_style)
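The prefix-deduplication behavior of `apply_handwriting_style_path` is easiest to see in isolation. Below is a standalone sketch of the same logic, using a hypothetical stand-in for `HANDWRITING_STYLE_ID_PATTERN` that matches ids like `style-12` (the real pattern is defined elsewhere in the service module):

```python
import re

# Hypothetical stand-in for HANDWRITING_STYLE_ID_PATTERN (the real pattern lives in the service module).
STYLE_ID_PATTERN = re.compile(r"style-\d+")

def apply_style_path(style_cluster_id, path_value):
    """Prepends the style id and strips any stale leading style segments first."""
    if path_value is None:
        return None
    normalized_path = path_value.strip().strip("/")
    if not normalized_path:
        return None
    normalized_style = (style_cluster_id or "").strip().strip("/")
    if not normalized_style:
        return normalized_path
    segments = [segment for segment in normalized_path.split("/") if segment]
    # Drop any leading segments that look like style ids, then a duplicate of the current style.
    while segments and STYLE_ID_PATTERN.fullmatch(segments[0]):
        segments.pop(0)
    if segments and segments[0].strip().lower() == normalized_style.lower():
        segments.pop(0)
    if not segments:
        return normalized_style
    return f"{normalized_style}/{'/'.join(segments)}"

print(apply_style_path("style-3", "style-7/notes/trip"))  # style-3/notes/trip
```

Any leading style segments are stripped before the current cluster id is prepended, so re-routing a document between clusters never nests prefixes.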

"""Model runtime utilities for provider-bound LLM task execution."""
from dataclasses import dataclass
from typing import Any
from urllib.parse import urlparse, urlunparse
from openai import APIConnectionError, APIError, APITimeoutError, OpenAI
from app.services.app_settings import read_task_runtime_settings
class ModelTaskError(Exception):
"""Raised when a model task request fails."""
class ModelTaskTimeoutError(ModelTaskError):
"""Raised when a model task request times out."""
class ModelTaskDisabledError(ModelTaskError):
"""Raised when a model task is disabled in settings."""
@dataclass
class ModelTaskRuntime:
"""Resolved runtime configuration for one task and provider."""
task_name: str
provider_id: str
provider_type: str
base_url: str
timeout_seconds: int
api_key: str
model: str
prompt: str
def _normalize_base_url(raw_value: str) -> str:
"""Normalizes provider base URL and appends /v1 for OpenAI-compatible servers."""
trimmed = raw_value.strip().rstrip("/")
if not trimmed:
return "https://api.openai.com/v1"
parsed = urlparse(trimmed)
path = parsed.path or ""
if not path.endswith("/v1"):
path = f"{path}/v1" if path else "/v1"
return urlunparse(parsed._replace(path=path))
def _should_fallback_to_chat(error: Exception) -> bool:
"""Determines whether a responses API failure should fallback to chat completions."""
status_code = getattr(error, "status_code", None)
if isinstance(status_code, int) and status_code in {400, 404, 405, 415, 422, 501}:
return True
message = str(error).lower()
fallback_markers = (
"404",
"not found",
"unknown endpoint",
"unsupported",
"invalid url",
"responses",
)
return any(marker in message for marker in fallback_markers)
def _extract_text_from_response(response: Any) -> str:
"""Extracts plain text from Responses API outputs."""
output_text = getattr(response, "output_text", None)
if isinstance(output_text, str) and output_text.strip():
return output_text.strip()
output_items = getattr(response, "output", None)
if not isinstance(output_items, list):
return ""
chunks: list[str] = []
for item in output_items:
item_data = item.model_dump() if hasattr(item, "model_dump") else item
if not isinstance(item_data, dict):
continue
item_type = item_data.get("type")
if item_type == "output_text":
text = str(item_data.get("text", "")).strip()
if text:
chunks.append(text)
if item_type == "message":
for content in item_data.get("content", []) or []:
if not isinstance(content, dict):
continue
if content.get("type") in {"output_text", "text"}:
text = str(content.get("text", "")).strip()
if text:
chunks.append(text)
return "\n".join(chunks).strip()
def _extract_text_from_chat_response(response: Any) -> str:
"""Extracts text from Chat Completions API outputs."""
choices = getattr(response, "choices", None) or []
if not choices:
return ""
message_content = choices[0].message.content
if isinstance(message_content, str):
return message_content.strip()
if not isinstance(message_content, list):
return ""
chunks: list[str] = []
for content in message_content:
if not isinstance(content, dict):
continue
text = str(content.get("text", "")).strip()
if text:
chunks.append(text)
return "\n".join(chunks).strip()
def resolve_task_runtime(task_name: str) -> ModelTaskRuntime:
"""Resolves one task runtime including provider endpoint, model, and prompt."""
runtime_payload = read_task_runtime_settings(task_name)
task_payload = runtime_payload["task"]
provider_payload = runtime_payload["provider"]
if not bool(task_payload.get("enabled", True)):
raise ModelTaskDisabledError(f"task_disabled:{task_name}")
provider_type = str(provider_payload.get("provider_type", "openai_compatible")).strip()
if provider_type != "openai_compatible":
raise ModelTaskError(f"unsupported_provider_type:{provider_type}")
return ModelTaskRuntime(
task_name=task_name,
provider_id=str(provider_payload.get("id", "")),
provider_type=provider_type,
base_url=_normalize_base_url(str(provider_payload.get("base_url", "https://api.openai.com/v1"))),
timeout_seconds=int(provider_payload.get("timeout_seconds", 45)),
api_key=str(provider_payload.get("api_key", "")).strip() or "no-key-required",
model=str(task_payload.get("model", "")).strip(),
prompt=str(task_payload.get("prompt", "")).strip(),
)
def _create_client(runtime: ModelTaskRuntime) -> OpenAI:
"""Builds an OpenAI SDK client for OpenAI-compatible provider endpoints."""
return OpenAI(
api_key=runtime.api_key,
base_url=runtime.base_url,
timeout=runtime.timeout_seconds,
)
def complete_text_task(task_name: str, user_text: str, prompt_override: str | None = None) -> str:
"""Runs a text-only task against the configured provider and returns plain output text."""
runtime = resolve_task_runtime(task_name)
client = _create_client(runtime)
prompt = (prompt_override or runtime.prompt).strip() or runtime.prompt
try:
response = client.responses.create(
model=runtime.model,
input=[
{
"role": "system",
"content": [
{
"type": "input_text",
"text": prompt,
}
],
},
{
"role": "user",
"content": [
{
"type": "input_text",
"text": user_text,
}
],
},
],
)
text = _extract_text_from_response(response)
if text:
return text
except APITimeoutError as error:
raise ModelTaskTimeoutError(f"task_timeout:{task_name}") from error
except APIConnectionError as error:
raise ModelTaskError(f"task_error:{task_name}:{error}") from error
except APIError as error:
if not _should_fallback_to_chat(error):
raise ModelTaskError(f"task_error:{task_name}:{error}") from error
except Exception as error:
if not _should_fallback_to_chat(error):
raise ModelTaskError(f"task_error:{task_name}:{error}") from error
try:
fallback = client.chat.completions.create(
model=runtime.model,
messages=[
{
"role": "system",
"content": prompt,
},
{
"role": "user",
"content": user_text,
},
],
)
return _extract_text_from_chat_response(fallback)
except APITimeoutError as error:
raise ModelTaskTimeoutError(f"task_timeout:{task_name}") from error
except (APIConnectionError, APIError) as error:
raise ModelTaskError(f"task_error:{task_name}:{error}") from error
except Exception as error:
raise ModelTaskError(f"task_error:{task_name}:{error}") from error

"""Persistence helpers for writing and querying processing pipeline log events."""
from typing import Any
from uuid import UUID
from sqlalchemy import delete, func, select
from sqlalchemy.orm import Session
from app.models.document import Document
from app.models.processing_log import ProcessingLogEntry
MAX_STAGE_LENGTH = 64
MAX_EVENT_LENGTH = 256
MAX_LEVEL_LENGTH = 16
MAX_PROVIDER_LENGTH = 128
MAX_MODEL_LENGTH = 256
MAX_DOCUMENT_FILENAME_LENGTH = 512
MAX_PROMPT_LENGTH = 200000
MAX_RESPONSE_LENGTH = 200000
DEFAULT_KEEP_DOCUMENT_SESSIONS = 2
DEFAULT_KEEP_UNBOUND_ENTRIES = 80
PROCESSING_LOG_AUTOCOMMIT_SESSION_KEY = "processing_log_autocommit"
def _trim(value: str | None, max_length: int) -> str | None:
"""Normalizes and truncates text values for safe log persistence."""
if value is None:
return None
normalized = value.strip()
if not normalized:
return None
if len(normalized) <= max_length:
return normalized
return normalized[: max_length - 3] + "..."
def _safe_payload(payload_json: dict[str, Any] | None) -> dict[str, Any]:
"""Ensures payload values are persisted as dictionaries."""
return payload_json if isinstance(payload_json, dict) else {}
def set_processing_log_autocommit(session: Session, enabled: bool) -> None:
"""Toggles per-session immediate commit behavior for processing log events."""
session.info[PROCESSING_LOG_AUTOCOMMIT_SESSION_KEY] = bool(enabled)
def is_processing_log_autocommit_enabled(session: Session) -> bool:
"""Returns whether processing logs are committed immediately for the current session."""
return bool(session.info.get(PROCESSING_LOG_AUTOCOMMIT_SESSION_KEY, False))
def log_processing_event(
session: Session,
stage: str,
event: str,
*,
level: str = "info",
document: Document | None = None,
document_id: UUID | None = None,
document_filename: str | None = None,
provider_id: str | None = None,
model_name: str | None = None,
prompt_text: str | None = None,
response_text: str | None = None,
payload_json: dict[str, Any] | None = None,
) -> None:
"""Persists one processing log entry linked to an optional document context."""
resolved_document_id = document.id if document is not None else document_id
resolved_document_filename = document.original_filename if document is not None else document_filename
entry = ProcessingLogEntry(
level=_trim(level, MAX_LEVEL_LENGTH) or "info",
stage=_trim(stage, MAX_STAGE_LENGTH) or "pipeline",
event=_trim(event, MAX_EVENT_LENGTH) or "event",
document_id=resolved_document_id,
document_filename=_trim(resolved_document_filename, MAX_DOCUMENT_FILENAME_LENGTH),
provider_id=_trim(provider_id, MAX_PROVIDER_LENGTH),
model_name=_trim(model_name, MAX_MODEL_LENGTH),
prompt_text=_trim(prompt_text, MAX_PROMPT_LENGTH),
response_text=_trim(response_text, MAX_RESPONSE_LENGTH),
payload_json=_safe_payload(payload_json),
)
session.add(entry)
if is_processing_log_autocommit_enabled(session):
session.commit()
def count_processing_logs(session: Session, document_id: UUID | None = None) -> int:
"""Counts persisted processing logs, optionally restricted to one document."""
statement = select(func.count()).select_from(ProcessingLogEntry)
if document_id is not None:
statement = statement.where(ProcessingLogEntry.document_id == document_id)
return int(session.execute(statement).scalar_one())
def list_processing_logs(
session: Session,
*,
limit: int,
offset: int,
document_id: UUID | None = None,
) -> list[ProcessingLogEntry]:
"""Lists processing logs ordered by newest-first with optional document filter."""
statement = select(ProcessingLogEntry)
if document_id is not None:
statement = statement.where(ProcessingLogEntry.document_id == document_id)
statement = statement.order_by(
ProcessingLogEntry.created_at.desc(),
ProcessingLogEntry.id.desc(),
).offset(offset).limit(limit)
return list(session.execute(statement).scalars().all())
def cleanup_processing_logs(
session: Session,
*,
keep_document_sessions: int = DEFAULT_KEEP_DOCUMENT_SESSIONS,
keep_unbound_entries: int = DEFAULT_KEEP_UNBOUND_ENTRIES,
) -> dict[str, int]:
"""Deletes old log entries while keeping recent document sessions and unbound events."""
normalized_keep_sessions = max(0, keep_document_sessions)
normalized_keep_unbound = max(0, keep_unbound_entries)
deleted_document_entries = 0
deleted_unbound_entries = 0
recent_document_rows = session.execute(
select(
ProcessingLogEntry.document_id,
func.max(ProcessingLogEntry.created_at).label("last_seen"),
)
.where(ProcessingLogEntry.document_id.is_not(None))
.group_by(ProcessingLogEntry.document_id)
.order_by(func.max(ProcessingLogEntry.created_at).desc())
.limit(normalized_keep_sessions)
).all()
keep_document_ids = [row[0] for row in recent_document_rows if row[0] is not None]
if keep_document_ids:
deleted_document_entries = int(
session.execute(
delete(ProcessingLogEntry).where(
ProcessingLogEntry.document_id.is_not(None),
ProcessingLogEntry.document_id.notin_(keep_document_ids),
)
).rowcount
or 0
)
else:
deleted_document_entries = int(
session.execute(delete(ProcessingLogEntry).where(ProcessingLogEntry.document_id.is_not(None))).rowcount or 0
)
keep_unbound_rows = session.execute(
select(ProcessingLogEntry.id)
.where(ProcessingLogEntry.document_id.is_(None))
.order_by(ProcessingLogEntry.created_at.desc(), ProcessingLogEntry.id.desc())
.limit(normalized_keep_unbound)
).all()
keep_unbound_ids = [row[0] for row in keep_unbound_rows]
if keep_unbound_ids:
deleted_unbound_entries = int(
session.execute(
delete(ProcessingLogEntry).where(
ProcessingLogEntry.document_id.is_(None),
ProcessingLogEntry.id.notin_(keep_unbound_ids),
)
).rowcount
or 0
)
else:
deleted_unbound_entries = int(
session.execute(delete(ProcessingLogEntry).where(ProcessingLogEntry.document_id.is_(None))).rowcount or 0
)
return {
"deleted_document_entries": deleted_document_entries,
"deleted_unbound_entries": deleted_unbound_entries,
}
def clear_processing_logs(session: Session) -> dict[str, int]:
"""Deletes all persisted processing log entries and returns deletion count."""
deleted_entries = int(session.execute(delete(ProcessingLogEntry)).rowcount or 0)
return {"deleted_entries": deleted_entries}

File diff suppressed because it is too large

"""File storage utilities for persistence, retrieval, and checksum calculation."""
import hashlib
import uuid
from datetime import UTC, datetime
from pathlib import Path
from app.core.config import get_settings
settings = get_settings()
def ensure_storage() -> None:
"""Ensures required storage directories exist at service startup."""
for relative in ["originals", "derived/previews", "tmp"]:
(settings.storage_root / relative).mkdir(parents=True, exist_ok=True)
def compute_sha256(data: bytes) -> str:
"""Computes a SHA-256 hex digest for raw file bytes."""
return hashlib.sha256(data).hexdigest()
def store_bytes(filename: str, data: bytes) -> str:
"""Stores file content under a unique path and returns its storage-relative location."""
stamp = datetime.now(UTC).strftime("%Y/%m/%d")
safe_ext = Path(filename).suffix.lower()
target_dir = settings.storage_root / "originals" / stamp
target_dir.mkdir(parents=True, exist_ok=True)
target_name = f"{uuid.uuid4()}{safe_ext}"
target_path = target_dir / target_name
target_path.write_bytes(data)
return str(target_path.relative_to(settings.storage_root))
def read_bytes(relative_path: str) -> bytes:
"""Reads and returns bytes from a storage-relative path."""
return (settings.storage_root / relative_path).read_bytes()
def absolute_path(relative_path: str) -> Path:
"""Returns the absolute filesystem path for a storage-relative location."""
return settings.storage_root / relative_path
def write_preview(document_id: str, data: bytes, suffix: str = ".jpg") -> str:
"""Writes preview bytes and returns the preview path relative to storage root."""
target_dir = settings.storage_root / "derived" / "previews"
target_dir.mkdir(parents=True, exist_ok=True)
target_path = target_dir / f"{document_id}{suffix}"
target_path.write_bytes(data)
return str(target_path.relative_to(settings.storage_root))
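Stored paths combine a UTC date stamp with a random UUID, so identical uploads never collide on disk while `compute_sha256` still detects duplicate content. A self-contained sketch against a temporary directory (using `timezone.utc` rather than the 3.11+ `UTC` alias, and a local `storage_root` in place of the configured one):

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path
import tempfile

storage_root = Path(tempfile.mkdtemp())

def store_bytes(filename: str, data: bytes) -> str:
    """Stores bytes under originals/YYYY/MM/DD/<uuid><ext> and returns the storage-relative path."""
    stamp = datetime.now(timezone.utc).strftime("%Y/%m/%d")
    safe_ext = Path(filename).suffix.lower()
    target_dir = storage_root / "originals" / stamp
    target_dir.mkdir(parents=True, exist_ok=True)
    target_path = target_dir / f"{uuid.uuid4()}{safe_ext}"
    target_path.write_bytes(data)
    return str(target_path.relative_to(storage_root))

relative = store_bytes("Scan.PDF", b"demo")
print(relative.endswith(".pdf"))  # True
```

Note that the extension is lowercased on the way in, so `Scan.PDF` and `scan.pdf` land under the same suffix.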

"""Typesense indexing and semantic-neighbor retrieval for document routing."""
from dataclasses import dataclass
from typing import Any
import typesense
from app.core.config import get_settings
from app.models.document import Document, DocumentStatus
settings = get_settings()
MAX_TYPESENSE_QUERY_CHARS = 600
@dataclass
class SimilarDocument:
"""Represents one nearest-neighbor document returned by Typesense semantic search."""
document_id: str
document_name: str
summary_text: str
logical_path: str
tags: list[str]
vector_distance: float
def _build_client() -> typesense.Client:
"""Builds a Typesense API client using configured host and credentials."""
return typesense.Client(
{
"nodes": [
{
"host": settings.typesense_host,
"port": str(settings.typesense_port),
"protocol": settings.typesense_protocol,
}
],
"api_key": settings.typesense_api_key,
"connection_timeout_seconds": settings.typesense_timeout_seconds,
"num_retries": settings.typesense_num_retries,
}
)
_client: typesense.Client | None = None
def get_typesense_client() -> typesense.Client:
"""Returns a cached Typesense client for repeated indexing and search operations."""
global _client
if _client is None:
_client = _build_client()
return _client
def _collection() -> Any:
"""Returns the configured Typesense collection handle."""
client = get_typesense_client()
return client.collections[settings.typesense_collection_name]
def ensure_typesense_collection() -> None:
"""Creates the document semantic collection when it does not already exist."""
collection = _collection()
try:
collection.retrieve()
return
except Exception as error:
message = str(error).lower()
if "404" not in message and "not found" not in message:
raise
schema = {
"name": settings.typesense_collection_name,
"fields": [
{
"name": "document_name",
"type": "string",
},
{
"name": "summary_text",
"type": "string",
},
{
"name": "logical_path",
"type": "string",
"facet": True,
},
{
"name": "tags",
"type": "string[]",
"facet": True,
},
{
"name": "status",
"type": "string",
"facet": True,
},
{
"name": "mime_type",
"type": "string",
"optional": True,
"facet": True,
},
{
"name": "extension",
"type": "string",
"optional": True,
"facet": True,
},
{
"name": "created_at",
"type": "int64",
},
{
"name": "has_labels",
"type": "bool",
"facet": True,
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"document_name",
"summary_text",
],
"model_config": {
"model_name": "ts/e5-small-v2",
"indexing_prefix": "passage:",
"query_prefix": "query:",
},
},
},
],
"default_sorting_field": "created_at",
}
client = get_typesense_client()
client.collections.create(schema)
def _has_labels(document: Document) -> bool:
"""Determines whether a document has usable human-assigned routing metadata."""
if document.logical_path.strip() and document.logical_path.strip().lower() != "inbox":
return True
return len([tag for tag in document.tags if tag.strip()]) > 0
def upsert_document_index(document: Document, summary_text: str) -> None:
"""Upserts one document into Typesense for semantic retrieval and routing examples."""
ensure_typesense_collection()
collection = _collection()
payload = {
"id": str(document.id),
"document_name": document.original_filename,
"summary_text": summary_text[:50000],
"logical_path": document.logical_path,
"tags": [tag for tag in document.tags if tag.strip()][:50],
"status": document.status.value,
"mime_type": document.mime_type,
"extension": document.extension,
"created_at": int(document.created_at.timestamp()),
"has_labels": _has_labels(document) and document.status != DocumentStatus.TRASHED,
}
collection.documents.upsert(payload)
def delete_document_index(document_id: str) -> None:
"""Deletes one document from Typesense by identifier."""
collection = _collection()
try:
collection.documents[document_id].delete()
except Exception as error:
message = str(error).lower()
if "404" in message or "not found" in message:
return
raise
def delete_many_documents_index(document_ids: list[str]) -> None:
"""Deletes many documents from Typesense by identifiers."""
for document_id in document_ids:
delete_document_index(document_id)
def query_similar_documents(summary_text: str, limit: int, exclude_document_id: str | None = None) -> list[SimilarDocument]:
"""Returns semantic nearest neighbors among labeled non-trashed indexed documents."""
ensure_typesense_collection()
collection = _collection()
normalized_query = " ".join(summary_text.strip().split())
query_text = normalized_query[:MAX_TYPESENSE_QUERY_CHARS] if normalized_query else "document"
search_payload = {
"q": query_text,
"query_by": "embedding",
"vector_query": f"embedding:([], k:{max(1, limit)})",
"exclude_fields": "embedding",
"per_page": max(1, limit),
"filter_by": "has_labels:=true && status:!=trashed",
}
try:
response = collection.documents.search(search_payload)
except Exception as error:
message = str(error).lower()
if "query string exceeds max allowed length" not in message:
raise
fallback_payload = dict(search_payload)
fallback_payload["q"] = "document"
response = collection.documents.search(fallback_payload)
hits = response.get("hits", []) if isinstance(response, dict) else []
neighbors: list[SimilarDocument] = []
for hit in hits:
if not isinstance(hit, dict):
continue
document = hit.get("document", {})
if not isinstance(document, dict):
continue
document_id = str(document.get("id", "")).strip()
if not document_id:
continue
if exclude_document_id and document_id == exclude_document_id:
continue
raw_tags = document.get("tags", [])
tags = [str(tag).strip() for tag in raw_tags if str(tag).strip()] if isinstance(raw_tags, list) else []
try:
distance = float(hit.get("vector_distance", 2.0))
except (TypeError, ValueError):
distance = 2.0
neighbors.append(
SimilarDocument(
document_id=document_id,
document_name=str(document.get("document_name", "")).strip(),
summary_text=str(document.get("summary_text", "")).strip(),
logical_path=str(document.get("logical_path", "")).strip(),
tags=tags,
vector_distance=distance,
)
)
if len(neighbors) >= limit:
break
return neighbors
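Query construction in `query_similar_documents` normalizes whitespace and caps the text at `MAX_TYPESENSE_QUERY_CHARS` before the vector search runs. A sketch of just the payload builder, mirroring the fields sent to Typesense:

```python
MAX_TYPESENSE_QUERY_CHARS = 600

def build_search_payload(summary_text: str, limit: int) -> dict:
    """Mirrors the search payload built by query_similar_documents before the Typesense call."""
    normalized_query = " ".join(summary_text.strip().split())
    # Fall back to a generic query so an empty summary still returns neighbors.
    query_text = normalized_query[:MAX_TYPESENSE_QUERY_CHARS] if normalized_query else "document"
    return {
        "q": query_text,
        "query_by": "embedding",
        "vector_query": f"embedding:([], k:{max(1, limit)})",
        "exclude_fields": "embedding",
        "per_page": max(1, limit),
        "filter_by": "has_labels:=true && status:!=trashed",
    }

payload = build_search_payload("  invoice\n from  ACME ", 5)
print(payload["q"])  # invoice from ACME
```

The `filter_by` clause restricts neighbors to labeled, non-trashed documents, which is what makes the results usable as routing examples.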

"""Background worker package for queueing and document processing tasks."""

"""Queue connection helpers used by API and worker processes."""
from redis import Redis
from rq import Queue
from app.core.config import get_settings
settings = get_settings()
def get_redis() -> Redis:
"""Creates a Redis connection from configured URL."""
return Redis.from_url(settings.redis_url)
def get_processing_queue() -> Queue:
"""Returns the named queue for document processing jobs."""
return Queue("dcm", connection=get_redis())

backend/app/worker/tasks.py
"""Background worker tasks for extraction, indexing, and archive fan-out."""
import uuid
from datetime import UTC, datetime
from pathlib import Path
from sqlalchemy import select
from app.db.base import SessionLocal
from app.models.document import Document, DocumentStatus
from app.services.app_settings import read_handwriting_provider_settings, read_handwriting_style_settings
from app.services.extractor import (
IMAGE_EXTENSIONS,
extract_archive_members,
extract_text_content,
is_supported_for_extraction,
sniff_mime,
)
from app.services.handwriting import IMAGE_TEXT_TYPE_HANDWRITING
from app.services.handwriting_style import (
assign_handwriting_style,
delete_handwriting_style_document,
)
from app.services.processing_logs import cleanup_processing_logs, log_processing_event, set_processing_log_autocommit
from app.services.routing_pipeline import (
apply_routing_decision,
classify_document_routing,
summarize_document,
upsert_semantic_index,
)
from app.services.storage import absolute_path, compute_sha256, store_bytes, write_preview
from app.worker.queue import get_processing_queue
def _create_archive_member_document(
parent: Document,
member_name: str,
member_data: bytes,
mime_type: str,
) -> Document:
"""Creates a child document entity for a file extracted from an uploaded archive."""
extension = Path(member_name).suffix.lower()
stored_relative_path = store_bytes(member_name, member_data)
return Document(
original_filename=Path(member_name).name,
source_relative_path=f"{parent.source_relative_path}/{member_name}".strip("/"),
stored_relative_path=stored_relative_path,
mime_type=mime_type,
extension=extension,
sha256=compute_sha256(member_data),
size_bytes=len(member_data),
logical_path=parent.logical_path,
tags=list(parent.tags),
metadata_json={"origin": "archive", "parent": str(parent.id)},
is_archive_member=True,
archived_member_path=member_name,
parent_document_id=parent.id,
)
def process_document_task(document_id: str) -> None:
"""Processes one queued document and updates extraction and suggestion fields."""
with SessionLocal() as session:
set_processing_log_autocommit(session, True)
queue = get_processing_queue()
document = session.execute(
select(Document).where(Document.id == uuid.UUID(document_id))
).scalar_one_or_none()
if document is None:
return
log_processing_event(
session=session,
stage="worker",
event="Document processing started",
level="info",
document=document,
payload_json={"status": document.status.value},
)
if document.status == DocumentStatus.TRASHED:
log_processing_event(
session=session,
stage="worker",
event="Document skipped because it is trashed",
level="warning",
document=document,
)
session.commit()
return
source_path = absolute_path(document.stored_relative_path)
data = source_path.read_bytes()
if document.extension == ".zip":
child_ids: list[str] = []
log_processing_event(
session=session,
stage="archive",
event="Archive extraction started",
level="info",
document=document,
payload_json={"size_bytes": len(data)},
)
try:
members = extract_archive_members(data)
for member in members:
mime_type = sniff_mime(member.data)
child = _create_archive_member_document(
parent=document,
member_name=member.name,
member_data=member.data,
mime_type=mime_type,
)
session.add(child)
session.flush()
child_ids.append(str(child.id))
log_processing_event(
session=session,
stage="archive",
event="Archive member extracted and queued",
level="info",
document=child,
payload_json={
"parent_document_id": str(document.id),
"member_name": member.name,
"member_size_bytes": len(member.data),
"mime_type": mime_type,
},
)
document.status = DocumentStatus.PROCESSED
document.extracted_text = f"archive with {len(members)} files"
log_processing_event(
session=session,
stage="archive",
event="Archive extraction completed",
level="info",
document=document,
payload_json={"member_count": len(members)},
)
except Exception as exc:
document.status = DocumentStatus.ERROR
document.metadata_json = {**document.metadata_json, "error": str(exc)}
log_processing_event(
session=session,
stage="archive",
event="Archive extraction failed",
level="error",
document=document,
response_text=str(exc),
)
if document.status == DocumentStatus.PROCESSED:
try:
summary_text = summarize_document(session=session, document=document)
metadata_json = dict(document.metadata_json)
metadata_json["summary_text"] = summary_text[:20000]
document.metadata_json = metadata_json
routing_decision = classify_document_routing(session=session, document=document, summary_text=summary_text)
apply_routing_decision(document=document, decision=routing_decision, session=session)
routing_metadata = document.metadata_json.get("routing", {})
log_processing_event(
session=session,
stage="routing",
event="Routing decision applied",
level="info",
document=document,
payload_json=routing_metadata if isinstance(routing_metadata, dict) else {},
)
log_processing_event(
session=session,
stage="indexing",
event="Typesense upsert started",
level="info",
document=document,
)
upsert_semantic_index(document=document, summary_text=summary_text)
log_processing_event(
session=session,
stage="indexing",
event="Typesense upsert completed",
level="info",
document=document,
)
except Exception as exc:
document.metadata_json = {
**document.metadata_json,
"routing_error": str(exc),
}
log_processing_event(
session=session,
stage="routing",
event="Routing or indexing failed for archive document",
level="error",
document=document,
response_text=str(exc),
)
document.processed_at = datetime.now(UTC)
log_processing_event(
session=session,
stage="worker",
event="Document processing completed",
level="info",
document=document,
payload_json={"status": document.status.value},
)
cleanup_processing_logs(session=session, keep_document_sessions=2, keep_unbound_entries=80)
session.commit()
for child_id in child_ids:
queue.enqueue("app.worker.tasks.process_document_task", child_id)
for child_id in child_ids:
log_processing_event(
session=session,
stage="archive",
event="Archive child job enqueued",
level="info",
document_id=uuid.UUID(child_id),
payload_json={"parent_document_id": str(document.id)},
)
session.commit()
return
if not is_supported_for_extraction(document.extension, document.mime_type):
document.status = DocumentStatus.UNSUPPORTED
document.processed_at = datetime.now(UTC)
log_processing_event(
session=session,
stage="extraction",
event="Document type unsupported for extraction",
level="warning",
document=document,
payload_json={"extension": document.extension, "mime_type": document.mime_type},
)
log_processing_event(
session=session,
stage="worker",
event="Document processing completed",
level="info",
document=document,
payload_json={"status": document.status.value},
)
cleanup_processing_logs(session=session, keep_document_sessions=2, keep_unbound_entries=80)
session.commit()
return
if document.extension in IMAGE_EXTENSIONS:
ocr_settings = read_handwriting_provider_settings()
log_processing_event(
session=session,
stage="ocr",
event="OCR request started",
level="info",
document=document,
provider_id=str(ocr_settings.get("provider_id", "")),
model_name=str(ocr_settings.get("openai_model", "")),
prompt_text=str(ocr_settings.get("prompt", "")),
payload_json={"mime_type": document.mime_type},
)
else:
log_processing_event(
session=session,
stage="extraction",
event="Text extraction started",
level="info",
document=document,
payload_json={"extension": document.extension, "mime_type": document.mime_type},
)
extraction = extract_text_content(document.original_filename, data, document.mime_type)
if extraction.preview_bytes and extraction.preview_suffix:
preview_relative_path = write_preview(str(document.id), extraction.preview_bytes, extraction.preview_suffix)
document.metadata_json = {**document.metadata_json, "preview_relative_path": preview_relative_path}
document.preview_available = True
log_processing_event(
session=session,
stage="extraction",
event="Preview generated",
level="info",
document=document,
payload_json={"preview_relative_path": preview_relative_path},
)
if extraction.metadata_json:
document.metadata_json = {**document.metadata_json, **extraction.metadata_json}
if document.extension in IMAGE_EXTENSIONS:
image_text_type = extraction.metadata_json.get("image_text_type")
if isinstance(image_text_type, str) and image_text_type.strip():
document.image_text_type = image_text_type.strip()
else:
document.image_text_type = None
else:
document.image_text_type = None
document.handwriting_style_id = None
if extraction.status == "error":
document.status = DocumentStatus.ERROR
document.metadata_json = {**document.metadata_json, "error": "extraction_failed"}
if document.extension in IMAGE_EXTENSIONS:
document.handwriting_style_id = None
metadata_json = dict(document.metadata_json)
metadata_json.pop("handwriting_style", None)
document.metadata_json = metadata_json
try:
delete_handwriting_style_document(str(document.id))
except Exception:
pass
document.processed_at = datetime.now(UTC)
log_processing_event(
session=session,
stage="extraction",
event="Extraction failed",
level="error",
document=document,
response_text=str(extraction.metadata_json.get("error", "extraction_failed")),
payload_json=extraction.metadata_json,
)
if "transcription_error" in extraction.metadata_json:
log_processing_event(
session=session,
stage="ocr",
event="OCR request failed",
level="error",
document=document,
response_text=str(extraction.metadata_json.get("transcription_error", "")),
)
log_processing_event(
session=session,
stage="worker",
event="Document processing completed",
level="info",
document=document,
payload_json={"status": document.status.value},
)
cleanup_processing_logs(session=session, keep_document_sessions=2, keep_unbound_entries=80)
session.commit()
return
if extraction.status == "unsupported":
document.status = DocumentStatus.UNSUPPORTED
if document.extension in IMAGE_EXTENSIONS:
document.handwriting_style_id = None
metadata_json = dict(document.metadata_json)
metadata_json.pop("handwriting_style", None)
document.metadata_json = metadata_json
try:
delete_handwriting_style_document(str(document.id))
except Exception:
pass
document.processed_at = datetime.now(UTC)
log_processing_event(
session=session,
stage="extraction",
event="Extraction returned unsupported",
level="warning",
document=document,
payload_json=extraction.metadata_json,
)
log_processing_event(
session=session,
stage="worker",
event="Document processing completed",
level="info",
document=document,
payload_json={"status": document.status.value},
)
cleanup_processing_logs(session=session, keep_document_sessions=2, keep_unbound_entries=80)
session.commit()
return
if document.extension in IMAGE_EXTENSIONS:
image_text_type = document.image_text_type or ""
if image_text_type == IMAGE_TEXT_TYPE_HANDWRITING:
style_settings = read_handwriting_style_settings()
if not bool(style_settings.get("enabled", True)):
document.handwriting_style_id = None
metadata_json = dict(document.metadata_json)
metadata_json.pop("handwriting_style", None)
metadata_json["handwriting_style_disabled"] = True
document.metadata_json = metadata_json
log_processing_event(
session=session,
stage="style",
event="Handwriting style clustering disabled",
level="warning",
document=document,
payload_json={
"enabled": False,
"embed_model": style_settings.get("embed_model"),
},
)
else:
try:
assignment = assign_handwriting_style(
session=session,
document=document,
image_data=data,
)
document.handwriting_style_id = assignment.style_cluster_id
metadata_json = dict(document.metadata_json)
metadata_json["handwriting_style"] = {
"style_cluster_id": assignment.style_cluster_id,
"matched_existing": assignment.matched_existing,
"similarity": assignment.similarity,
"vector_distance": assignment.vector_distance,
"compared_neighbors": assignment.compared_neighbors,
"match_min_similarity": assignment.match_min_similarity,
"bootstrap_match_min_similarity": assignment.bootstrap_match_min_similarity,
}
metadata_json.pop("handwriting_style_disabled", None)
document.metadata_json = metadata_json
log_processing_event(
session=session,
stage="style",
event="Handwriting style assigned",
level="info",
document=document,
payload_json=metadata_json["handwriting_style"],
)
except Exception as style_error:
document.handwriting_style_id = None
metadata_json = dict(document.metadata_json)
metadata_json["handwriting_style_error"] = str(style_error)
metadata_json.pop("handwriting_style", None)
metadata_json.pop("handwriting_style_disabled", None)
document.metadata_json = metadata_json
log_processing_event(
session=session,
stage="style",
event="Handwriting style assignment failed",
level="error",
document=document,
response_text=str(style_error),
)
else:
document.handwriting_style_id = None
metadata_json = dict(document.metadata_json)
metadata_json.pop("handwriting_style", None)
metadata_json.pop("handwriting_style_disabled", None)
document.metadata_json = metadata_json
try:
delete_handwriting_style_document(str(document.id))
except Exception:
pass
if document.extension in IMAGE_EXTENSIONS:
log_processing_event(
session=session,
stage="ocr",
event="OCR response received",
level="info",
document=document,
provider_id=str(
extraction.metadata_json.get(
"transcription_provider",
extraction.metadata_json.get("image_text_type_provider", ""),
)
),
model_name=str(
extraction.metadata_json.get(
"transcription_model",
extraction.metadata_json.get("image_text_type_model", ""),
)
),
response_text=extraction.text,
payload_json={
"image_text_type": document.image_text_type,
"image_text_type_confidence": extraction.metadata_json.get("image_text_type_confidence"),
"transcription_skipped": extraction.metadata_json.get("transcription_skipped"),
"uncertainty_count": len(
extraction.metadata_json.get("transcription_uncertainties", [])
if isinstance(extraction.metadata_json.get("transcription_uncertainties", []), list)
else []
)
},
)
else:
log_processing_event(
session=session,
stage="extraction",
event="Text extraction completed",
level="info",
document=document,
response_text=extraction.text,
payload_json={"text_length": len(extraction.text)},
)
document.extracted_text = extraction.text
try:
summary_text = summarize_document(session=session, document=document)
routing_decision = classify_document_routing(session=session, document=document, summary_text=summary_text)
apply_routing_decision(document=document, decision=routing_decision, session=session)
routing_metadata = document.metadata_json.get("routing", {})
log_processing_event(
session=session,
stage="routing",
event="Routing decision applied",
level="info",
document=document,
payload_json=routing_metadata if isinstance(routing_metadata, dict) else {},
)
log_processing_event(
session=session,
stage="indexing",
event="Typesense upsert started",
level="info",
document=document,
)
upsert_semantic_index(document=document, summary_text=summary_text)
log_processing_event(
session=session,
stage="indexing",
event="Typesense upsert completed",
level="info",
document=document,
)
metadata_json = dict(document.metadata_json)
metadata_json["summary_text"] = summary_text[:20000]
document.metadata_json = metadata_json
except Exception as exc:
document.metadata_json = {
**document.metadata_json,
"routing_error": str(exc),
}
log_processing_event(
session=session,
stage="routing",
event="Routing or indexing failed",
level="error",
document=document,
response_text=str(exc),
)
document.status = DocumentStatus.PROCESSED
document.processed_at = datetime.now(UTC)
log_processing_event(
session=session,
stage="worker",
event="Document processing completed",
level="info",
document=document,
payload_json={"status": document.status.value},
)
cleanup_processing_logs(session=session, keep_document_sessions=2, keep_unbound_entries=80)
session.commit()

18
backend/requirements.txt Normal file
View File

@@ -0,0 +1,18 @@
fastapi==0.116.1
uvicorn[standard]==0.35.0
sqlalchemy==2.0.39
psycopg[binary]==3.2.9
pydantic-settings==2.10.1
python-multipart==0.0.20
redis==6.4.0
rq==2.3.2
python-magic==0.4.27
pillow==11.3.0
pypdf==5.9.0
pymupdf==1.26.4
python-docx==1.2.0
openpyxl==3.1.5
orjson==3.11.3
openai==1.107.2
typesense==1.1.1
tiktoken==0.11.0

15
doc/README.md Normal file
View File

@@ -0,0 +1,15 @@
# Documentation
This is the documentation entrypoint for the DCM DMS.
## Available Documents
- Project setup and operations: `../README.md`
- Frontend visual system and compact UI rules: `frontend-design-foundation.md`
- Handwriting style implementation plan: `../PLAN.md`
## Planned Additions
- Architecture overview
- Data model reference
- API contract details

49
doc/frontend-design-foundation.md Normal file
View File

@@ -0,0 +1,49 @@
# Frontend Design Foundation
## Direction
The DCM frontend now follows a compact command-deck direction:
- dark layered surfaces with strong separation between sections
- tight spacing and small radii to maximize information density
- consistent control primitives across buttons, inputs, selects, and panels
- high-legibility typography tuned for metadata-heavy workflows
## Token Source
Use `frontend/src/design-foundation.css` as the single token source for:
- typography (`--font-display`, `--font-body`, `--font-mono`)
- color system (`--color-*`)
- spacing (`--space-*`)
- radii and shadows (`--radius-*`, `--shadow-*`)
- interaction timing (`--transition-*`)
Do not hardcode new palette or spacing values in component styles when a token already exists.
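As an illustration, the token categories above can be checked mechanically. This is a minimal sketch: the category prefixes are taken from the list above, while the helper itself and any concrete token names are assumptions, not part of the codebase.

```typescript
// Category prefixes taken from the token list above; the helper is an
// illustrative sketch for lint-style checks, not an existing utility.
const TOKEN_PREFIXES = [
  '--font-',
  '--color-',
  '--space-',
  '--radius-',
  '--shadow-',
  '--transition-',
];

// Returns true when a CSS custom property name belongs to one of the
// foundation token categories defined in design-foundation.css.
function isFoundationToken(name: string): boolean {
  return TOKEN_PREFIXES.some((prefix) => name.startsWith(prefix));
}
```

A check like this could back a stylelint or review script that rejects component styles introducing custom properties outside the foundation categories.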
## Layout Principles
- The top bar is sticky and should remain compact at all breakpoints.
- Documents and viewer operate as a two-pane layout on desktop and collapse to one pane on narrower screens.
- Toolbar rows should keep critical actions visible without forcing large vertical gaps.
- Settings sections should preserve dense form grouping while remaining keyboard friendly.
## Control Standards
- Global input, select, textarea, and button styles are defined once in `frontend/src/styles.css`.
- Variant button classes (`secondary-action`, `active-view-button`, `warning-action`, `danger-action`) are the only approved button color routes.
- Tag chips, routing pills, card chips, and icon buttons must stay within the compact radius and spacing scale.
- Focus states use `:focus-visible` and tokenized focus color to preserve keyboard discoverability.
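The variant rule above can be centralized so components never hand-pick button colors. The four class names come from the standards list; the tone keys and the helper itself are illustrative assumptions:

```typescript
// The four variant class names come from the control standards above;
// the tone keys and this helper are an illustrative sketch.
type ButtonTone = 'secondary' | 'active' | 'warning' | 'danger';

const VARIANT_CLASS: Record<ButtonTone, string> = {
  secondary: 'secondary-action',
  active: 'active-view-button',
  warning: 'warning-action',
  danger: 'danger-action',
};

// Resolves the only approved class for a given button tone, so new
// components cannot invent ad-hoc button color routes.
function buttonClass(tone: ButtonTone): string {
  return VARIANT_CLASS[tone];
}
```

Routing every `className` through a helper like this keeps the approved variants enumerable in one place.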
## Motion Rules
- Use `rise-in` for section entry and `pulse-border` for card selection emphasis.
- Keep transitions brief and functional.
- Avoid decorative animation loops outside explicit status indicators such as the terminal caret blink.
## Extension Checklist
When adding or redesigning a UI area:
1. Start from existing tokens in `frontend/src/design-foundation.css`.
2. Add missing tokens in `frontend/src/design-foundation.css`, not per-component styles.
3. Implement component styles in `frontend/src/styles.css` using existing layout and variant conventions.
4. Validate responsive behavior at `1240px`, `1040px`, `760px`, and `560px` breakpoints.
5. Verify keyboard focus visibility and text contrast before merging.
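
The breakpoint values in step 4 can be kept in one shared constant so validation and media queries stay in sync. A sketch, assuming a helper of our own naming (only the pixel values come from the checklist):

```typescript
// Breakpoint values from step 4 of the extension checklist; the helper
// and constant names are illustrative, not existing project code.
const BREAKPOINTS_PX = [1240, 1040, 760, 560];

// Builds a max-width media query string for one breakpoint.
function mediaQuery(maxWidthPx: number): string {
  return `(max-width: ${maxWidthPx}px)`;
}

// The full set of queries a responsive validation pass would cover.
const VALIDATION_QUERIES = BREAKPOINTS_PX.map((px) => mediaQuery(px));
```

The same constant could feed both `window.matchMedia` checks in components and the manual validation pass described in step 4.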

111
docker-compose.yml Normal file
View File

@@ -0,0 +1,111 @@
services:
db:
image: postgres:16-alpine
environment:
POSTGRES_USER: dcm
POSTGRES_PASSWORD: dcm
POSTGRES_DB: dcm
ports:
- "5432:5432"
volumes:
- db-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U dcm -d dcm"]
interval: 10s
timeout: 5s
retries: 10
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
typesense:
image: typesense/typesense:29.0
command:
- "--data-dir=/data"
- "--api-key=dcm-typesense-key"
- "--enable-cors"
ports:
- "8108:8108"
volumes:
- typesense-data:/data
api:
build:
context: ./backend
environment:
APP_ENV: development
DATABASE_URL: postgresql+psycopg://dcm:dcm@db:5432/dcm
REDIS_URL: redis://redis:6379/0
STORAGE_ROOT: /data/storage
OCR_LANGUAGES: eng,deu
PUBLIC_BASE_URL: http://192.168.2.5:8000
CORS_ORIGINS: '["http://localhost:5173","http://localhost:3000","http://192.168.2.5:5173"]'
TYPESENSE_PROTOCOL: http
TYPESENSE_HOST: typesense
TYPESENSE_PORT: 8108
TYPESENSE_API_KEY: dcm-typesense-key
TYPESENSE_COLLECTION_NAME: documents
ports:
- "8000:8000"
volumes:
- ./backend/app:/app/app
- dcm-storage:/data
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
typesense:
condition: service_started
worker:
build:
context: ./backend
command: ["rq", "worker", "dcm", "--url", "redis://redis:6379/0"]
environment:
APP_ENV: development
DATABASE_URL: postgresql+psycopg://dcm:dcm@db:5432/dcm
REDIS_URL: redis://redis:6379/0
STORAGE_ROOT: /data/storage
OCR_LANGUAGES: eng,deu
PUBLIC_BASE_URL: http://localhost:8000
TYPESENSE_PROTOCOL: http
TYPESENSE_HOST: typesense
TYPESENSE_PORT: 8108
TYPESENSE_API_KEY: dcm-typesense-key
TYPESENSE_COLLECTION_NAME: documents
volumes:
- ./backend/app:/app/app
- dcm-storage:/data
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
typesense:
condition: service_started
frontend:
build:
context: ./frontend
environment:
VITE_API_BASE: http://192.168.2.5:8000/api/v1
ports:
- "5173:5173"
volumes:
- ./frontend/src:/app/src
- ./frontend/index.html:/app/index.html
- ./frontend/vite.config.ts:/app/vite.config.ts
depends_on:
api:
condition: service_started
volumes:
db-data:
redis-data:
dcm-storage:
typesense-data:

16
frontend/Dockerfile Normal file
View File

@@ -0,0 +1,16 @@
FROM node:22-alpine
WORKDIR /app
COPY package.json /app/package.json
RUN npm install
COPY tsconfig.json /app/tsconfig.json
COPY tsconfig.node.json /app/tsconfig.node.json
COPY vite.config.ts /app/vite.config.ts
COPY index.html /app/index.html
COPY src /app/src
EXPOSE 5173
CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0", "--port", "5173"]

12
frontend/index.html Normal file
View File

@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>DCM DMS</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

22
frontend/package.json Normal file
View File

@@ -0,0 +1,22 @@
{
"name": "dcm-dms-frontend",
"version": "0.1.0",
"private": true,
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"preview": "vite preview --host 0.0.0.0 --port 4173"
},
"dependencies": {
"lucide-react": "latest",
"react": "19.1.1",
"react-dom": "19.1.1"
},
"devDependencies": {
"@types/react": "19.1.11",
"@types/react-dom": "19.1.7",
"typescript": "5.9.2",
"vite": "7.1.5"
}
}

795
frontend/src/App.tsx Normal file
View File

@@ -0,0 +1,795 @@
/**
* Main application layout and orchestration for document and settings workspaces.
*/
import { useCallback, useEffect, useMemo, useRef, useState } from 'react';
import type { JSX } from 'react';
import ActionModal from './components/ActionModal';
import DocumentGrid from './components/DocumentGrid';
import DocumentViewer from './components/DocumentViewer';
import PathInput from './components/PathInput';
import ProcessingLogPanel from './components/ProcessingLogPanel';
import SearchFiltersBar from './components/SearchFiltersBar';
import SettingsScreen from './components/SettingsScreen';
import UploadSurface from './components/UploadSurface';
import {
clearProcessingLogs,
deleteDocument,
exportContentsMarkdown,
getAppSettings,
listDocuments,
listPaths,
listProcessingLogs,
listTags,
listTypes,
resetAppSettings,
searchDocuments,
trashDocument,
updateAppSettings,
uploadDocuments,
} from './lib/api';
import type { AppSettings, AppSettingsUpdate, DmsDocument, ProcessingLogEntry } from './types';
type AppScreen = 'documents' | 'settings';
type DocumentView = 'active' | 'trash';
interface DialogOption {
key: string;
label: string;
tone?: 'neutral' | 'primary' | 'warning' | 'danger';
}
interface DialogState {
title: string;
message: string;
options: DialogOption[];
}
/**
* Defines the root DMS frontend component.
*/
export default function App(): JSX.Element {
const DEFAULT_PAGE_SIZE = 12;
const [screen, setScreen] = useState<AppScreen>('documents');
const [documentView, setDocumentView] = useState<DocumentView>('active');
const [documents, setDocuments] = useState<DmsDocument[]>([]);
const [totalDocuments, setTotalDocuments] = useState<number>(0);
const [currentPage, setCurrentPage] = useState<number>(1);
const [isLoading, setIsLoading] = useState<boolean>(false);
const [isUploading, setIsUploading] = useState<boolean>(false);
const [searchText, setSearchText] = useState<string>('');
const [activeSearchQuery, setActiveSearchQuery] = useState<string>('');
const [selectedDocumentId, setSelectedDocumentId] = useState<string | null>(null);
const [selectedDocumentIds, setSelectedDocumentIds] = useState<string[]>([]);
const [exportPathInput, setExportPathInput] = useState<string>('');
const [tagFilter, setTagFilter] = useState<string>('');
const [typeFilter, setTypeFilter] = useState<string>('');
const [pathFilter, setPathFilter] = useState<string>('');
const [processedFrom, setProcessedFrom] = useState<string>('');
const [processedTo, setProcessedTo] = useState<string>('');
const [knownTags, setKnownTags] = useState<string[]>([]);
const [knownPaths, setKnownPaths] = useState<string[]>([]);
const [knownTypes, setKnownTypes] = useState<string[]>([]);
const [appSettings, setAppSettings] = useState<AppSettings | null>(null);
const [settingsSaveAction, setSettingsSaveAction] = useState<(() => Promise<void>) | null>(null);
const [processingLogs, setProcessingLogs] = useState<ProcessingLogEntry[]>([]);
const [isLoadingLogs, setIsLoadingLogs] = useState<boolean>(false);
const [isClearingLogs, setIsClearingLogs] = useState<boolean>(false);
const [processingLogError, setProcessingLogError] = useState<string | null>(null);
const [isSavingSettings, setIsSavingSettings] = useState<boolean>(false);
const [isRunningBulkAction, setIsRunningBulkAction] = useState<boolean>(false);
const [error, setError] = useState<string | null>(null);
const [dialogState, setDialogState] = useState<DialogState | null>(null);
const dialogResolverRef = useRef<((value: string) => void) | null>(null);
const pageSize = useMemo(() => {
const configured = appSettings?.display?.cards_per_page;
if (!configured || Number.isNaN(configured)) {
return DEFAULT_PAGE_SIZE;
}
return Math.max(1, Math.min(200, configured));
}, [appSettings]);
const presentDialog = useCallback((title: string, message: string, options: DialogOption[]): Promise<string> => {
setDialogState({ title, message, options });
return new Promise<string>((resolve) => {
dialogResolverRef.current = resolve;
});
}, []);
const requestConfirmation = useCallback(
async (title: string, message: string, confirmLabel = 'Confirm'): Promise<boolean> => {
const choice = await presentDialog(title, message, [
{ key: 'cancel', label: 'Cancel', tone: 'neutral' },
{ key: 'confirm', label: confirmLabel, tone: 'danger' },
]);
return choice === 'confirm';
},
[presentDialog],
);
const closeDialog = useCallback((key: string): void => {
const resolver = dialogResolverRef.current;
dialogResolverRef.current = null;
setDialogState(null);
if (resolver) {
resolver(key);
}
}, []);
const downloadBlob = useCallback((blob: Blob, filename: string): void => {
const objectUrl = URL.createObjectURL(blob);
const anchor = document.createElement('a');
anchor.href = objectUrl;
anchor.download = filename;
anchor.click();
URL.revokeObjectURL(objectUrl);
}, []);
const loadCatalogs = useCallback(async (): Promise<void> => {
const [tags, paths, types] = await Promise.all([listTags(true), listPaths(true), listTypes(true)]);
setKnownTags(tags);
setKnownPaths(paths);
setKnownTypes(types);
}, []);
const loadDocuments = useCallback(async (options?: { silent?: boolean }): Promise<void> => {
const silent = options?.silent ?? false;
if (!silent) {
setIsLoading(true);
setError(null);
}
try {
const offset = (currentPage - 1) * pageSize;
const search = activeSearchQuery.trim();
const filters = {
pathFilter,
tagFilter,
typeFilter,
processedFrom,
processedTo,
};
const payload =
search.length > 0
? await searchDocuments(search, {
limit: pageSize,
offset,
onlyTrashed: documentView === 'trash',
...filters,
})
: await listDocuments({
limit: pageSize,
offset,
onlyTrashed: documentView === 'trash',
...filters,
});
setDocuments(payload.items);
setTotalDocuments(payload.total);
if (payload.items.length === 0) {
setSelectedDocumentId(null);
} else if (!payload.items.some((item) => item.id === selectedDocumentId)) {
setSelectedDocumentId(payload.items[0].id);
}
setSelectedDocumentIds((current) => current.filter((documentId) => payload.items.some((item) => item.id === documentId)));
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to load documents');
} finally {
if (!silent) {
setIsLoading(false);
}
}
}, [
activeSearchQuery,
currentPage,
documentView,
pageSize,
pathFilter,
processedFrom,
processedTo,
selectedDocumentId,
tagFilter,
typeFilter,
]);
const loadSettings = useCallback(async (): Promise<void> => {
setError(null);
try {
const payload = await getAppSettings();
setAppSettings(payload);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to load settings');
}
}, []);
const loadProcessingTimeline = useCallback(async (options?: { silent?: boolean }): Promise<void> => {
const silent = options?.silent ?? false;
if (!silent) {
setIsLoadingLogs(true);
}
try {
const payload = await listProcessingLogs({ limit: 180 });
setProcessingLogs(payload.items);
setProcessingLogError(null);
} catch (caughtError) {
setProcessingLogError(caughtError instanceof Error ? caughtError.message : 'Failed to load processing logs');
} finally {
if (!silent) {
setIsLoadingLogs(false);
}
}
}, []);
useEffect(() => {
const bootstrap = async (): Promise<void> => {
try {
await Promise.all([loadDocuments(), loadCatalogs(), loadSettings(), loadProcessingTimeline()]);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to initialize application');
}
};
void bootstrap();
}, [loadCatalogs, loadDocuments, loadProcessingTimeline, loadSettings]);
useEffect(() => {
setSelectedDocumentIds([]);
setCurrentPage(1);
}, [documentView, pageSize]);
useEffect(() => {
if (screen !== 'documents') {
return;
}
void loadDocuments();
}, [loadDocuments, screen]);
useEffect(() => {
if (screen !== 'documents') {
return;
}
const pollInterval = window.setInterval(() => {
void loadDocuments({ silent: true });
}, 3000);
return () => window.clearInterval(pollInterval);
}, [loadDocuments, screen]);
useEffect(() => {
if (screen !== 'documents') {
return;
}
void loadProcessingTimeline();
const pollInterval = window.setInterval(() => {
void loadProcessingTimeline({ silent: true });
}, 1500);
return () => window.clearInterval(pollInterval);
}, [loadProcessingTimeline, screen]);
const selectedDocument = useMemo(
() => documents.find((document) => document.id === selectedDocumentId) ?? null,
[documents, selectedDocumentId],
);
const totalPages = useMemo(() => Math.max(1, Math.ceil(totalDocuments / pageSize)), [pageSize, totalDocuments]);
const allVisibleSelected = useMemo(() => documents.length > 0 && documents.every((document) => selectedDocumentIds.includes(document.id)), [documents, selectedDocumentIds]);
const isProcessingActive = useMemo(() => documents.some((document) => document.status === 'queued'), [documents]);
const typingAnimationEnabled = appSettings?.display?.log_typing_animation_enabled ?? true;
const hasActiveSearch = Boolean(
activeSearchQuery.trim() || tagFilter || typeFilter || pathFilter || processedFrom || processedTo,
);
const handleUpload = useCallback(async (files: File[]): Promise<void> => {
if (files.length === 0) {
return;
}
setIsUploading(true);
setError(null);
try {
const uploadDefaults = appSettings?.upload_defaults ?? { logical_path: 'Inbox', tags: [] };
const tagsCsv = uploadDefaults.tags.join(',');
const firstAttempt = await uploadDocuments(files, {
logicalPath: uploadDefaults.logical_path,
tags: tagsCsv,
conflictMode: 'ask',
});
if (firstAttempt.conflicts.length > 0) {
const choice = await presentDialog(
'Upload Conflicts Detected',
`${firstAttempt.conflicts.length} file(s) already exist. Replace existing records or keep duplicates?`,
[
{ key: 'duplicate', label: 'Keep Duplicates', tone: 'neutral' },
{ key: 'replace', label: 'Replace Existing', tone: 'warning' },
],
);
await uploadDocuments(files, {
logicalPath: uploadDefaults.logical_path,
tags: tagsCsv,
conflictMode: choice === 'replace' ? 'replace' : 'duplicate',
});
}
await Promise.all([loadDocuments(), loadCatalogs(), loadProcessingTimeline()]);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Upload failed');
} finally {
setIsUploading(false);
}
}, [appSettings, loadCatalogs, loadDocuments, loadProcessingTimeline, presentDialog]);
const handleSearch = useCallback(async (): Promise<void> => {
setSelectedDocumentIds([]);
setCurrentPage(1);
setActiveSearchQuery(searchText.trim());
}, [searchText]);
const handleResetSearch = useCallback((): void => {
setSearchText('');
setActiveSearchQuery('');
setTagFilter('');
setTypeFilter('');
setPathFilter('');
setProcessedFrom('');
setProcessedTo('');
setCurrentPage(1);
setSelectedDocumentIds([]);
}, []);
const handleDocumentUpdated = useCallback((updated: DmsDocument): void => {
setDocuments((current) => {
const shouldAppear = documentView === 'trash' ? updated.status === 'trashed' : updated.status !== 'trashed';
if (!shouldAppear) {
return current.filter((document) => document.id !== updated.id);
}
const exists = current.some((document) => document.id === updated.id);
if (!exists) {
return [updated, ...current];
}
return current.map((document) => (document.id === updated.id ? updated : document));
});
if (documentView === 'trash' && updated.status !== 'trashed') {
setSelectedDocumentIds((current) => current.filter((id) => id !== updated.id));
if (selectedDocumentId === updated.id) {
setSelectedDocumentId(null);
}
}
if (documentView === 'active' && updated.status === 'trashed') {
setSelectedDocumentIds((current) => current.filter((id) => id !== updated.id));
if (selectedDocumentId === updated.id) {
setSelectedDocumentId(null);
}
}
void loadCatalogs();
}, [documentView, loadCatalogs, selectedDocumentId]);
const handleDocumentDeleted = useCallback((documentId: string): void => {
setDocuments((current) => current.filter((document) => document.id !== documentId));
setSelectedDocumentIds((current) => current.filter((id) => id !== documentId));
if (selectedDocumentId === documentId) {
setSelectedDocumentId(null);
}
void loadCatalogs();
}, [loadCatalogs, selectedDocumentId]);
const handleToggleChecked = useCallback((documentId: string, checked: boolean): void => {
setSelectedDocumentIds((current) => {
if (checked && !current.includes(documentId)) {
return [...current, documentId];
}
if (!checked) {
return current.filter((item) => item !== documentId);
}
return current;
});
}, []);
const handleToggleSelectAllVisible = useCallback((): void => {
if (documents.length === 0) {
return;
}
if (allVisibleSelected) {
setSelectedDocumentIds([]);
return;
}
setSelectedDocumentIds(documents.map((document) => document.id));
}, [allVisibleSelected, documents]);
const handleTrashSelected = useCallback(async (): Promise<void> => {
if (selectedDocumentIds.length === 0) {
return;
}
setIsRunningBulkAction(true);
setError(null);
try {
await Promise.all(selectedDocumentIds.map((documentId) => trashDocument(documentId)));
setSelectedDocumentIds([]);
await Promise.all([loadDocuments(), loadCatalogs()]);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to trash selected documents');
} finally {
setIsRunningBulkAction(false);
}
}, [loadCatalogs, loadDocuments, selectedDocumentIds]);
const handleTrashDocumentCard = useCallback(async (documentId: string): Promise<void> => {
if (documentView === 'trash') {
return;
}
setError(null);
try {
await trashDocument(documentId);
setSelectedDocumentIds((current) => current.filter((id) => id !== documentId));
if (selectedDocumentId === documentId) {
setSelectedDocumentId(null);
}
await Promise.all([loadDocuments(), loadCatalogs()]);
} catch (caughtError) {
const message = caughtError instanceof Error ? caughtError.message : 'Failed to trash document';
setError(message);
throw caughtError instanceof Error ? caughtError : new Error(message);
}
}, [documentView, loadCatalogs, loadDocuments, selectedDocumentId]);
const handleDeleteSelected = useCallback(async (): Promise<void> => {
if (selectedDocumentIds.length === 0) {
return;
}
const confirmed = await requestConfirmation(
'Delete Selected Documents Permanently',
'This removes selected documents and stored files permanently.',
'Delete Permanently',
);
if (!confirmed) {
return;
}
setIsRunningBulkAction(true);
setError(null);
try {
await Promise.all(selectedDocumentIds.map((documentId) => deleteDocument(documentId)));
setSelectedDocumentIds([]);
await Promise.all([loadDocuments(), loadCatalogs()]);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to delete selected documents');
} finally {
setIsRunningBulkAction(false);
}
}, [loadCatalogs, loadDocuments, requestConfirmation, selectedDocumentIds]);
const handleExportSelected = useCallback(async (): Promise<void> => {
if (selectedDocumentIds.length === 0) {
return;
}
setIsRunningBulkAction(true);
setError(null);
try {
const result = await exportContentsMarkdown({
document_ids: selectedDocumentIds,
only_trashed: documentView === 'trash',
include_trashed: false,
});
downloadBlob(result.blob, result.filename);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to export selected markdown files');
} finally {
setIsRunningBulkAction(false);
}
}, [documentView, downloadBlob, selectedDocumentIds]);
const handleExportPath = useCallback(async (): Promise<void> => {
const trimmedPrefix = exportPathInput.trim();
if (!trimmedPrefix) {
setError('Enter a path prefix before exporting by path');
return;
}
setIsRunningBulkAction(true);
setError(null);
try {
const result = await exportContentsMarkdown({
path_prefix: trimmedPrefix,
only_trashed: documentView === 'trash',
include_trashed: false,
});
downloadBlob(result.blob, result.filename);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to export path markdown files');
} finally {
setIsRunningBulkAction(false);
}
}, [documentView, downloadBlob, exportPathInput]);
const handleSaveSettings = useCallback(async (payload: AppSettingsUpdate): Promise<void> => {
setIsSavingSettings(true);
setError(null);
try {
const updated = await updateAppSettings(payload);
setAppSettings(updated);
await loadCatalogs();
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to save settings');
throw caughtError;
} finally {
setIsSavingSettings(false);
}
}, [loadCatalogs]);
const handleSaveSettingsFromHeader = useCallback(async (): Promise<void> => {
if (!settingsSaveAction) {
setError('Settings are still loading');
return;
}
await settingsSaveAction();
}, [settingsSaveAction]);
const handleRegisterSettingsSaveAction = useCallback((action: (() => Promise<void>) | null): void => {
setSettingsSaveAction(() => action);
}, []);
const handleResetSettings = useCallback(async (): Promise<void> => {
const confirmed = await requestConfirmation(
'Reset Settings',
'This resets all settings to defaults and overwrites current values.',
'Reset Settings',
);
if (!confirmed) {
return;
}
setIsSavingSettings(true);
setError(null);
try {
const updated = await resetAppSettings();
setAppSettings(updated);
await loadCatalogs();
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to reset settings');
} finally {
setIsSavingSettings(false);
}
}, [loadCatalogs, requestConfirmation]);
const handleClearProcessingLogs = useCallback(async (): Promise<void> => {
const confirmed = await requestConfirmation(
'Clear Processing Log',
'This clears the full diagnostics timeline.',
'Clear Logs',
);
if (!confirmed) {
return;
}
setIsClearingLogs(true);
try {
await clearProcessingLogs();
await loadProcessingTimeline();
setProcessingLogError(null);
} catch (caughtError) {
setProcessingLogError(caughtError instanceof Error ? caughtError.message : 'Failed to clear processing logs');
} finally {
setIsClearingLogs(false);
}
}, [loadProcessingTimeline, requestConfirmation]);
const handleFilterPathFromCard = useCallback((pathValue: string): void => {
setActiveSearchQuery('');
setSearchText('');
setTagFilter('');
setTypeFilter('');
setProcessedFrom('');
setProcessedTo('');
setPathFilter(pathValue);
setCurrentPage(1);
}, []);
const handleFilterTagFromCard = useCallback((tagValue: string): void => {
setActiveSearchQuery('');
setSearchText('');
setPathFilter('');
setTypeFilter('');
setProcessedFrom('');
setProcessedTo('');
setTagFilter(tagValue);
setCurrentPage(1);
}, []);
return (
<main className="app-shell">
<header className="topbar">
<div>
<h1>LedgerDock</h1>
<p>Document command deck for OCR, routing intelligence, and controlled metadata ops.</p>
</div>
<div className="topbar-controls">
<div className="topbar-nav-group">
<button
type="button"
className={screen === 'documents' && documentView === 'active' ? 'active-view-button' : 'secondary-action'}
onClick={() => {
setScreen('documents');
setDocumentView('active');
}}
>
Documents
</button>
<button
type="button"
className={screen === 'documents' && documentView === 'trash' ? 'active-view-button' : 'secondary-action'}
onClick={() => {
setScreen('documents');
setDocumentView('trash');
}}
>
Trash
</button>
<button
type="button"
className={screen === 'settings' ? 'active-view-button' : 'secondary-action'}
onClick={() => setScreen('settings')}
>
Settings
</button>
</div>
{screen === 'documents' && (
<div className="topbar-document-group">
<UploadSurface onUploadRequested={handleUpload} isUploading={isUploading} variant="inline" />
</div>
)}
{screen === 'settings' && (
<div className="topbar-settings-group">
<button type="button" className="secondary-action" onClick={() => void handleResetSettings()} disabled={isSavingSettings}>
Reset To Defaults
</button>
<button type="button" onClick={() => void handleSaveSettingsFromHeader()} disabled={isSavingSettings || !settingsSaveAction}>
{isSavingSettings ? 'Saving Settings...' : 'Save Settings'}
</button>
</div>
)}
</div>
</header>
{error && <p className="error-banner">{error}</p>}
{screen === 'settings' && (
<SettingsScreen
settings={appSettings}
isSaving={isSavingSettings}
knownTags={knownTags}
knownPaths={knownPaths}
onSave={handleSaveSettings}
onRegisterSaveAction={handleRegisterSettingsSaveAction}
/>
)}
{screen === 'documents' && (
<>
<section className="layout-grid">
<div>
<div className="panel-header document-panel-header">
<div className="document-panel-title-row">
<h2>{documentView === 'trash' ? 'Trashed Documents' : 'Documents'}</h2>
<p>{isLoading ? 'Loading...' : `${totalDocuments} document(s)`}</p>
</div>
<SearchFiltersBar
searchText={searchText}
onSearchTextChange={setSearchText}
onSearchSubmit={() => void handleSearch()}
onReset={handleResetSearch}
hasActiveSearch={hasActiveSearch}
knownTags={knownTags}
knownPaths={knownPaths}
knownTypes={knownTypes}
tagFilter={tagFilter}
onTagFilterChange={(value) => {
setTagFilter(value);
setCurrentPage(1);
}}
typeFilter={typeFilter}
onTypeFilterChange={(value) => {
setTypeFilter(value);
setCurrentPage(1);
}}
pathFilter={pathFilter}
onPathFilterChange={(value) => {
setPathFilter(value);
setCurrentPage(1);
}}
processedFrom={processedFrom}
onProcessedFromChange={(value) => {
setProcessedFrom(value);
setCurrentPage(1);
}}
processedTo={processedTo}
onProcessedToChange={(value) => {
setProcessedTo(value);
setCurrentPage(1);
}}
isLoading={isLoading}
/>
<div className="document-toolbar-row">
<div className="document-toolbar-pagination compact-pagination">
<button type="button" className="secondary-action" onClick={() => setCurrentPage(1)} disabled={isLoading || currentPage <= 1}>
First
</button>
<button type="button" className="secondary-action" onClick={() => setCurrentPage((current) => Math.max(1, current - 1))} disabled={isLoading || currentPage <= 1}>
Prev
</button>
<span className="small">Page {currentPage} / {totalPages}</span>
<button type="button" className="secondary-action" onClick={() => setCurrentPage((current) => Math.min(totalPages, current + 1))} disabled={isLoading || currentPage >= totalPages}>
Next
</button>
<button type="button" className="secondary-action" onClick={() => setCurrentPage(totalPages)} disabled={isLoading || currentPage >= totalPages}>
Last
</button>
</div>
</div>
<div className="document-toolbar-row">
<div className="document-toolbar-selection">
<span className="small">Select:</span>
<button type="button" className="secondary-action" onClick={handleToggleSelectAllVisible} disabled={documents.length === 0}>
{allVisibleSelected ? 'Unselect Page' : 'Select Page'}
</button>
<span className="small">Selected {selectedDocumentIds.length}</span>
{documentView !== 'trash' && (
<button type="button" className="warning-action" onClick={() => void handleTrashSelected()} disabled={isRunningBulkAction || selectedDocumentIds.length === 0}>
Move To Trash
</button>
)}
{documentView === 'trash' && (
<button type="button" className="danger-action" onClick={() => void handleDeleteSelected()} disabled={isRunningBulkAction || selectedDocumentIds.length === 0}>
Delete Permanently
</button>
)}
<button type="button" className="secondary-action" onClick={() => void handleExportSelected()} disabled={isRunningBulkAction || selectedDocumentIds.length === 0}>
Export Selected MD
</button>
</div>
<div className="document-toolbar-export-path">
<PathInput value={exportPathInput} onChange={setExportPathInput} placeholder="Export by path prefix" suggestions={knownPaths} />
<button type="button" className="secondary-action" onClick={() => void handleExportPath()} disabled={isRunningBulkAction}>
Export Path MD
</button>
</div>
</div>
</div>
<DocumentGrid
documents={documents}
selectedDocumentId={selectedDocumentId}
selectedDocumentIds={selectedDocumentIds}
isTrashView={documentView === 'trash'}
onSelect={(document) => setSelectedDocumentId(document.id)}
onToggleChecked={handleToggleChecked}
onTrashDocument={handleTrashDocumentCard}
onFilterPath={handleFilterPathFromCard}
onFilterTag={handleFilterTagFromCard}
/>
</div>
<DocumentViewer
document={selectedDocument}
isTrashView={documentView === 'trash'}
existingTags={knownTags}
existingPaths={knownPaths}
onDocumentUpdated={handleDocumentUpdated}
onDocumentDeleted={handleDocumentDeleted}
requestConfirmation={requestConfirmation}
/>
</section>
{processingLogError && <p className="error-banner">{processingLogError}</p>}
<ProcessingLogPanel
entries={processingLogs}
isLoading={isLoadingLogs}
isClearing={isClearingLogs}
selectedDocumentId={selectedDocumentId}
isProcessingActive={isProcessingActive}
typingAnimationEnabled={typingAnimationEnabled}
onClear={() => void handleClearProcessingLogs()}
/>
</>
)}
<ActionModal
isOpen={dialogState !== null}
title={dialogState?.title ?? ''}
message={dialogState?.message ?? ''}
options={dialogState?.options ?? []}
onSelect={closeDialog}
onDismiss={() => closeDialog('cancel')}
/>
</main>
);
}


@@ -0,0 +1,79 @@
/**
* Reusable modal for confirmations and multi-action prompts.
*/
interface ActionModalOption {
key: string;
label: string;
tone?: 'neutral' | 'primary' | 'warning' | 'danger';
}
interface ActionModalProps {
isOpen: boolean;
title: string;
message: string;
options: ActionModalOption[];
onSelect: (key: string) => void;
onDismiss: () => void;
}
/**
* Renders a centered modal dialog with configurable action buttons.
*/
export default function ActionModal({
isOpen,
title,
message,
options,
onSelect,
onDismiss,
}: ActionModalProps): JSX.Element | null {
if (!isOpen) {
return null;
}
return (
<div
className="modal-backdrop"
role="button"
tabIndex={0}
aria-label="Close dialog"
onClick={onDismiss}
onKeyDown={(event) => {
if (event.key === 'Escape') {
onDismiss();
}
}}
>
<section
className="action-modal"
role="dialog"
aria-modal="true"
aria-labelledby="action-modal-title"
onClick={(event) => event.stopPropagation()}
>
<h3 id="action-modal-title">{title}</h3>
<p>{message}</p>
<div className="action-modal-buttons">
{options.map((option) => (
<button
key={option.key}
type="button"
className={
option.tone === 'danger'
? 'danger-action'
: option.tone === 'warning'
? 'warning-action'
: option.tone === 'neutral'
? 'secondary-action'
: ''
}
onClick={() => onSelect(option.key)}
>
{option.label}
</button>
))}
</div>
</section>
</div>
);
}
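The nested ternary that maps `option.tone` to a button class can also be read as a lookup table. A minimal standalone sketch of the same mapping (the `toneClass` helper is hypothetical, not part of the component; tone values come from the `ActionModalOption` interface above):

```typescript
// Hypothetical helper mirroring ActionModal's inline ternary: resolves the
// CSS class for each button tone. An undefined tone falls back to 'primary',
// which renders with no extra class (the default button style).
type Tone = 'neutral' | 'primary' | 'warning' | 'danger';

const TONE_CLASSES: Record<Tone, string> = {
  neutral: 'secondary-action',
  primary: '',
  warning: 'warning-action',
  danger: 'danger-action',
};

function toneClass(tone: Tone = 'primary'): string {
  return TONE_CLASSES[tone];
}

console.log(toneClass('danger')); // danger-action
console.log(toneClass()); // '' (default primary styling)
```

A table keeps the tone-to-class relationship in one place if more tones are ever added.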


@@ -0,0 +1,220 @@
/**
* Card view for displaying document summary, preview, and metadata.
*/
import { useState } from 'react';
import { Download, FileText, Trash2 } from 'lucide-react';
import type { DmsDocument } from '../types';
import { contentMarkdownUrl, downloadUrl, thumbnailUrl } from '../lib/api';
/**
* Defines properties accepted by the document card component.
*/
interface DocumentCardProps {
document: DmsDocument;
isSelected: boolean;
isChecked: boolean;
isTrashView: boolean;
onSelect: (document: DmsDocument) => void;
onToggleChecked: (documentId: string, checked: boolean) => void;
onTrashDocument: (documentId: string) => Promise<void>;
onFilterPath: (path: string) => void;
onFilterTag: (tag: string) => void;
}
/**
* Defines visual processing status variants rendered in the card header indicator.
*/
type StatusTone = 'success' | 'progress' | 'failed';
/**
* Resolves status tone and tooltip text from backend document status values.
*/
function statusPresentation(status: DmsDocument['status']): { tone: StatusTone; tooltip: string } {
if (status === 'processed') {
return { tone: 'success', tooltip: 'Processing status: success' };
}
if (status === 'queued') {
return { tone: 'progress', tooltip: 'Processing status: in progress' };
}
if (status === 'error') {
return { tone: 'failed', tooltip: 'Processing status: failed' };
}
if (status === 'unsupported') {
return { tone: 'failed', tooltip: 'Processing status: failed (unsupported type)' };
}
  // Remaining known status is 'trashed'; trashed documents keep a success indicator.
  return { tone: 'success', tooltip: 'Processing status: success (moved to trash)' };
}
/**
* Limits logical-path length while preserving start and end context with middle ellipsis.
*/
function compactLogicalPath(path: string, maxChars = 180): string {
const normalized = path.trim();
if (!normalized) {
return '';
}
if (normalized.length <= maxChars) {
return normalized;
}
const keepChars = Math.max(12, maxChars - 3);
const headChars = Math.ceil(keepChars * 0.6);
const tailChars = keepChars - headChars;
return `${normalized.slice(0, headChars)}...${normalized.slice(-tailChars)}`;
}
/**
* Renders one document card with optional image preview and searchable metadata.
*/
export default function DocumentCard({
document,
isSelected,
isChecked,
isTrashView,
onSelect,
onToggleChecked,
onTrashDocument,
onFilterPath,
onFilterTag,
}: DocumentCardProps): JSX.Element {
const [isTrashing, setIsTrashing] = useState<boolean>(false);
const createdDate = new Date(document.created_at).toLocaleString();
const status = statusPresentation(document.status);
const compactPath = compactLogicalPath(document.logical_path, 180);
const trashDisabled = isTrashView || document.status === 'trashed' || isTrashing;
const trashTitle = trashDisabled ? 'Already in trash' : 'Move to trash';
return (
<article
className={`document-card ${isSelected ? 'selected' : ''}`}
role="button"
tabIndex={0}
onClick={() => onSelect(document)}
onKeyDown={(event) => {
if (event.currentTarget !== event.target) {
return;
}
if (event.key === 'Enter' || event.key === ' ') {
event.preventDefault();
onSelect(document);
}
}}
>
<header className="document-card-header">
<div
className={`card-status-indicator ${status.tone}`}
title={status.tooltip}
aria-label={status.tooltip}
/>
<label className="card-checkbox card-checkbox-compact" onClick={(event) => event.stopPropagation()}>
<input
type="checkbox"
checked={isChecked}
onChange={(event) => onToggleChecked(document.id, event.target.checked)}
onClick={(event) => event.stopPropagation()}
aria-label={`Select ${document.original_filename}`}
title="Select document"
/>
</label>
</header>
<div className="document-preview">
{document.preview_available ? (
<img src={thumbnailUrl(document.id)} alt={document.original_filename} loading="lazy" />
) : (
<div className="document-preview-fallback">{document.extension || 'file'}</div>
)}
</div>
<div className="document-content document-card-body">
<h3 title={`${document.logical_path}/${document.original_filename}`}>
<span className="document-title-path">{compactPath}/</span>
<span className="document-title-name">{document.original_filename}</span>
</h3>
<p className="document-date">{createdDate}</p>
</div>
<footer className="document-card-footer">
<div className="card-footer-discovery">
<button
type="button"
className="card-chip path-chip"
onClick={(event) => {
event.preventDefault();
event.stopPropagation();
onFilterPath(document.logical_path);
}}
title={`Filter by path: ${document.logical_path}`}
>
{document.logical_path}
</button>
<div className="card-chip-row">
{document.tags.slice(0, 4).map((tag) => (
<button
key={`${document.id}-${tag}`}
type="button"
className="card-chip"
onClick={(event) => {
event.preventDefault();
event.stopPropagation();
onFilterTag(tag);
}}
title={`Filter by tag: ${tag}`}
>
#{tag}
</button>
))}
</div>
</div>
<div className="card-action-row">
<button
type="button"
className="card-icon-button"
aria-label="Download original"
title="Download original"
onClick={(event) => {
event.preventDefault();
event.stopPropagation();
window.open(downloadUrl(document.id), '_blank', 'noopener,noreferrer');
}}
>
<Download aria-hidden="true" />
</button>
<button
type="button"
className="card-icon-button"
aria-label="Export recognized text as markdown"
title="Export recognized text as markdown"
onClick={(event) => {
event.preventDefault();
event.stopPropagation();
window.open(contentMarkdownUrl(document.id), '_blank', 'noopener,noreferrer');
}}
>
<FileText aria-hidden="true" />
</button>
<button
type="button"
className="card-icon-button danger"
aria-label={trashTitle}
title={trashTitle}
disabled={trashDisabled}
onClick={async (event) => {
event.preventDefault();
event.stopPropagation();
if (trashDisabled) {
return;
}
setIsTrashing(true);
try {
await onTrashDocument(document.id);
                } catch {
                  // Intentionally empty: onTrashDocument surfaces errors upstream.
} finally {
setIsTrashing(false);
}
}}
>
<Trash2 aria-hidden="true" />
</button>
</div>
</footer>
</article>
);
}


@@ -0,0 +1,54 @@
/**
* Grid renderer for document collections.
*/
import type { DmsDocument } from '../types';
import DocumentCard from './DocumentCard';
/**
* Defines props for document grid rendering.
*/
interface DocumentGridProps {
documents: DmsDocument[];
selectedDocumentId: string | null;
selectedDocumentIds: string[];
isTrashView: boolean;
onSelect: (document: DmsDocument) => void;
onToggleChecked: (documentId: string, checked: boolean) => void;
onTrashDocument: (documentId: string) => Promise<void>;
onFilterPath: (path: string) => void;
onFilterTag: (tag: string) => void;
}
/**
* Renders cards in a responsive grid with selection state.
*/
export default function DocumentGrid({
documents,
selectedDocumentId,
selectedDocumentIds,
isTrashView,
onSelect,
onToggleChecked,
onTrashDocument,
onFilterPath,
onFilterTag,
}: DocumentGridProps): JSX.Element {
return (
<section className="document-grid">
{documents.map((document) => (
<DocumentCard
key={document.id}
document={document}
onSelect={onSelect}
isSelected={selectedDocumentId === document.id}
isChecked={selectedDocumentIds.includes(document.id)}
onToggleChecked={onToggleChecked}
isTrashView={isTrashView}
onTrashDocument={onTrashDocument}
onFilterPath={onFilterPath}
onFilterTag={onFilterTag}
/>
))}
</section>
);
}


@@ -0,0 +1,585 @@
/**
* Embedded document viewer panel for preview, metadata updates, and lifecycle actions.
*/
import { useEffect, useMemo, useState } from 'react';
import {
contentMarkdownUrl,
deleteDocument,
getDocumentDetails,
previewUrl,
reprocessDocument,
restoreDocument,
trashDocument,
updateDocumentMetadata,
} from '../lib/api';
import type { DmsDocument, DmsDocumentDetail } from '../types';
import PathInput from './PathInput';
import TagInput from './TagInput';
/**
* Defines props for the selected document viewer panel.
*/
interface DocumentViewerProps {
document: DmsDocument | null;
isTrashView: boolean;
existingTags: string[];
existingPaths: string[];
onDocumentUpdated: (document: DmsDocument) => void;
onDocumentDeleted: (documentId: string) => void;
requestConfirmation: (title: string, message: string, confirmLabel?: string) => Promise<boolean>;
}
/**
* Renders selected document preview with editable metadata and lifecycle controls.
*/
export default function DocumentViewer({
document,
isTrashView,
existingTags,
existingPaths,
onDocumentUpdated,
onDocumentDeleted,
requestConfirmation,
}: DocumentViewerProps): JSX.Element {
const [documentDetail, setDocumentDetail] = useState<DmsDocumentDetail | null>(null);
const [isLoadingDetails, setIsLoadingDetails] = useState<boolean>(false);
const [originalFilename, setOriginalFilename] = useState<string>('');
const [logicalPath, setLogicalPath] = useState<string>('');
const [tags, setTags] = useState<string[]>([]);
const [isSaving, setIsSaving] = useState<boolean>(false);
const [isReprocessing, setIsReprocessing] = useState<boolean>(false);
const [isTrashing, setIsTrashing] = useState<boolean>(false);
const [isRestoring, setIsRestoring] = useState<boolean>(false);
const [isDeleting, setIsDeleting] = useState<boolean>(false);
const [isMetadataDirty, setIsMetadataDirty] = useState<boolean>(false);
const [error, setError] = useState<string | null>(null);
/**
* Syncs editable metadata fields whenever selection changes.
*/
useEffect(() => {
if (!document) {
setDocumentDetail(null);
setIsMetadataDirty(false);
return;
}
setOriginalFilename(document.original_filename);
setLogicalPath(document.logical_path);
setTags(document.tags);
setIsMetadataDirty(false);
setError(null);
}, [document?.id]);
/**
* Refreshes editable metadata from list updates only while form is clean.
*/
useEffect(() => {
if (!document || isMetadataDirty) {
return;
}
setOriginalFilename(document.original_filename);
setLogicalPath(document.logical_path);
setTags(document.tags);
}, [
document?.id,
document?.original_filename,
document?.logical_path,
document?.tags,
isMetadataDirty,
]);
/**
* Loads full selected-document details for extracted text and metadata display.
*/
useEffect(() => {
if (!document) {
return;
}
let cancelled = false;
const loadDocumentDetails = async (): Promise<void> => {
setIsLoadingDetails(true);
try {
const payload = await getDocumentDetails(document.id);
if (!cancelled) {
setDocumentDetail(payload);
}
} catch (caughtError) {
if (!cancelled) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to load document details');
}
} finally {
if (!cancelled) {
setIsLoadingDetails(false);
}
}
};
void loadDocumentDetails();
return () => {
cancelled = true;
};
}, [document?.id]);
/**
* Resolves whether selected document should render as an image element in preview.
*/
const isImageDocument = useMemo(() => {
if (!document) {
return false;
}
return document.mime_type.startsWith('image/');
}, [document]);
/**
* Extracts provider/transcription errors from document metadata for user visibility.
*/
const transcriptionError = useMemo(() => {
const value = documentDetail?.metadata_json?.transcription_error;
return typeof value === 'string' ? value : '';
}, [documentDetail]);
/**
* Extracts routing errors from metadata to surface classification issues.
*/
const routingError = useMemo(() => {
const value = documentDetail?.metadata_json?.routing_error;
return typeof value === 'string' ? value : '';
}, [documentDetail]);
/**
* Builds a compact routing status summary for user visibility.
*/
const routingSummary = useMemo(() => {
const value = documentDetail?.metadata_json?.routing;
if (!value || typeof value !== 'object') {
return '';
}
const routing = value as Record<string, unknown>;
const confidence = typeof routing.confidence === 'number' ? routing.confidence : null;
const similarity = typeof routing.neighbor_similarity === 'number' ? routing.neighbor_similarity : null;
const confidenceThreshold =
typeof routing.auto_apply_confidence_threshold === 'number'
? routing.auto_apply_confidence_threshold
: null;
const autoApplied = typeof routing.auto_applied === 'boolean' ? routing.auto_applied : null;
const autoAppliedPath = typeof routing.auto_applied_path === 'boolean' ? routing.auto_applied_path : null;
const autoAppliedTags = typeof routing.auto_applied_tags === 'boolean' ? routing.auto_applied_tags : null;
const blockedReasonsRaw = routing.auto_apply_blocked_reasons;
const blockedReasons =
Array.isArray(blockedReasonsRaw) && blockedReasonsRaw.length > 0
? blockedReasonsRaw
.map((reason) => String(reason))
.map((reason) => {
if (reason === 'missing_chosen_path') {
return 'no chosen path';
}
if (reason === 'confidence_below_threshold') {
return 'confidence below threshold';
}
if (reason === 'neighbor_similarity_below_threshold') {
return 'neighbor similarity below threshold';
}
return reason;
})
: [];
const parts: string[] = [];
if (autoApplied !== null) {
parts.push(`Auto Applied: ${autoApplied ? 'yes' : 'no'}`);
}
if (autoApplied) {
const appliedTargets: string[] = [];
if (autoAppliedPath) {
appliedTargets.push('path');
}
if (autoAppliedTags) {
appliedTargets.push('tags');
}
if (appliedTargets.length > 0) {
parts.push(`Applied: ${appliedTargets.join(' + ')}`);
}
}
if (confidence !== null) {
if (confidenceThreshold !== null) {
parts.push(`Confidence: ${confidence.toFixed(2)} / ${confidenceThreshold.toFixed(2)}`);
} else {
parts.push(`Confidence: ${confidence.toFixed(2)}`);
}
}
if (similarity !== null) {
parts.push(`Neighbor Similarity (info): ${similarity.toFixed(2)}`);
}
if (autoApplied === false && blockedReasons.length > 0) {
parts.push(`Blocked: ${blockedReasons.join(', ')}`);
}
return parts.join(' | ');
}, [documentDetail]);
/**
* Resolves whether routing already auto-applied path and tags.
*/
const routingAutoApplyState = useMemo(() => {
const value = documentDetail?.metadata_json?.routing;
if (!value || typeof value !== 'object') {
return {
autoAppliedPath: false,
autoAppliedTags: false,
};
}
const routing = value as Record<string, unknown>;
return {
autoAppliedPath: routing.auto_applied_path === true,
autoAppliedTags: routing.auto_applied_tags === true,
};
}, [documentDetail]);
/**
* Resolves whether any routing suggestion still needs manual application.
*/
const hasPathSuggestion = Boolean(document?.suggested_path) && !routingAutoApplyState.autoAppliedPath;
const hasTagSuggestions = (document?.suggested_tags.length ?? 0) > 0 && !routingAutoApplyState.autoAppliedTags;
const canApplyAllSuggestions = hasPathSuggestion || hasTagSuggestions;
/**
* Applies suggested path value to editable metadata field.
*/
const applySuggestedPath = (): void => {
if (!hasPathSuggestion || !document?.suggested_path) {
return;
}
setLogicalPath(document.suggested_path);
setIsMetadataDirty(true);
};
/**
* Applies one suggested tag to editable metadata field.
*/
const applySuggestedTag = (tag: string): void => {
if (!hasTagSuggestions || tags.includes(tag)) {
return;
}
setTags([...tags, tag]);
setIsMetadataDirty(true);
};
/**
* Applies all suggested routing values into editable metadata fields.
*/
const applyAllSuggestions = (): void => {
if (hasPathSuggestion && document?.suggested_path) {
setLogicalPath(document.suggested_path);
}
if (hasTagSuggestions && document?.suggested_tags.length) {
const nextTags = [...tags];
for (const tag of document.suggested_tags) {
if (!nextTags.includes(tag)) {
nextTags.push(tag);
}
}
setTags(nextTags);
}
setIsMetadataDirty(true);
};
/**
* Persists metadata changes to backend.
*/
const handleSave = async (): Promise<void> => {
if (!document) {
return;
}
setIsSaving(true);
setError(null);
try {
const updated = await updateDocumentMetadata(document.id, {
original_filename: originalFilename,
logical_path: logicalPath,
tags,
});
setOriginalFilename(updated.original_filename);
setLogicalPath(updated.logical_path);
setTags(updated.tags);
setIsMetadataDirty(false);
onDocumentUpdated(updated);
const payload = await getDocumentDetails(document.id);
setDocumentDetail(payload);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to save metadata');
} finally {
setIsSaving(false);
}
};
/**
* Re-runs extraction and routing logic for the currently selected document.
*/
const handleReprocess = async (): Promise<void> => {
if (!document) {
return;
}
setIsReprocessing(true);
setError(null);
try {
const updated = await reprocessDocument(document.id);
onDocumentUpdated(updated);
const payload = await getDocumentDetails(document.id);
setDocumentDetail(payload);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to reprocess document');
} finally {
setIsReprocessing(false);
}
};
/**
* Moves the selected document to trash state.
*/
const handleTrash = async (): Promise<void> => {
if (!document) {
return;
}
setIsTrashing(true);
setError(null);
try {
const updated = await trashDocument(document.id);
onDocumentUpdated(updated);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to trash document');
} finally {
setIsTrashing(false);
}
};
/**
* Restores the selected document from trash.
*/
const handleRestore = async (): Promise<void> => {
if (!document) {
return;
}
setIsRestoring(true);
setError(null);
try {
const updated = await restoreDocument(document.id);
onDocumentUpdated(updated);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to restore document');
} finally {
setIsRestoring(false);
}
};
/**
* Permanently deletes the selected document and associated files.
*/
const handleDelete = async (): Promise<void> => {
if (!document) {
return;
}
const confirmed = await requestConfirmation(
'Delete Document Permanently',
'This removes the document record and stored file from the system.',
'Delete Permanently',
);
if (!confirmed) {
return;
}
setIsDeleting(true);
setError(null);
try {
await deleteDocument(document.id);
onDocumentDeleted(document.id);
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to delete document');
} finally {
setIsDeleting(false);
}
};
if (!document) {
return (
<aside className="document-viewer empty">
<h2>Document Details</h2>
<p>Select a document to preview and manage metadata.</p>
</aside>
);
}
const isTrashed = document.status === 'trashed' || isTrashView;
const metadataDisabled = isTrashed || isSaving || isTrashing || isRestoring || isDeleting;
return (
<aside className="document-viewer">
<h2>{document.original_filename}</h2>
<p className="small">Status: {document.status}</p>
<div className="viewer-preview">
{isImageDocument ? (
<img src={previewUrl(document.id)} alt={document.original_filename} />
) : (
<iframe src={previewUrl(document.id)} title={document.original_filename} />
)}
</div>
<label>
File Name
<input
value={originalFilename}
onChange={(event) => {
setOriginalFilename(event.target.value);
setIsMetadataDirty(true);
}}
disabled={metadataDisabled}
/>
</label>
<label>
Destination Path
<PathInput
value={logicalPath}
onChange={(value) => {
setLogicalPath(value);
setIsMetadataDirty(true);
}}
suggestions={existingPaths}
disabled={metadataDisabled}
/>
</label>
<label>
Tags
<TagInput
value={tags}
onChange={(value) => {
setTags(value);
setIsMetadataDirty(true);
}}
suggestions={existingTags}
disabled={metadataDisabled}
/>
</label>
{(document.suggested_path || document.suggested_tags.length > 0 || routingSummary || routingError) && (
<section className="routing-suggestions-panel">
<div className="routing-suggestions-header">
<h3>Routing Suggestions</h3>
{canApplyAllSuggestions && (
<button
type="button"
className="secondary-action"
onClick={applyAllSuggestions}
disabled={metadataDisabled}
>
Apply All
</button>
)}
</div>
{routingError && <p className="small error">{routingError}</p>}
{routingSummary && <p className="small">{routingSummary}</p>}
{hasPathSuggestion && document.suggested_path && (
<div className="routing-suggestion-group">
<p className="small">Suggested Path</p>
<button
type="button"
className="routing-pill"
onClick={applySuggestedPath}
disabled={metadataDisabled}
>
{document.suggested_path}
</button>
</div>
)}
{hasTagSuggestions && document.suggested_tags.length > 0 && (
<div className="routing-suggestion-group">
<p className="small">Suggested Tags</p>
<div className="routing-pill-row">
{document.suggested_tags.map((tag) => (
<button
key={tag}
type="button"
className="routing-pill"
onClick={() => applySuggestedTag(tag)}
disabled={metadataDisabled}
>
{tag}
</button>
))}
</div>
</div>
)}
</section>
)}
<section className="extracted-text-panel">
<h3>Extracted Text</h3>
{transcriptionError && <p className="small error">{transcriptionError}</p>}
{isLoadingDetails ? (
<p className="small">Loading extracted text...</p>
) : documentDetail?.extracted_text.trim() ? (
<pre>{documentDetail.extracted_text}</pre>
) : (
<p className="small">No extracted text available for this document yet.</p>
)}
</section>
{error && <p className="error">{error}</p>}
<div className="viewer-actions">
{!isTrashed && (
<button type="button" onClick={handleSave} disabled={metadataDisabled}>
{isSaving ? 'Saving...' : 'Save Metadata'}
</button>
)}
{!isTrashed && (
<button
type="button"
className="secondary-action"
onClick={handleReprocess}
disabled={metadataDisabled || isReprocessing}
title="Re-runs OCR/extraction, summary generation, routing suggestion, and indexing for this document."
>
{isReprocessing ? 'Reprocessing...' : 'Reprocess Document'}
</button>
)}
{!isTrashed && (
<button
type="button"
className="warning-action"
onClick={handleTrash}
disabled={metadataDisabled || isTrashing}
>
{isTrashing ? 'Trashing...' : 'Move To Trash'}
</button>
)}
{isTrashed && (
<button
type="button"
className="secondary-action"
onClick={handleRestore}
disabled={isRestoring || isDeleting}
>
{isRestoring ? 'Restoring...' : 'Restore Document'}
</button>
)}
<button
type="button"
className="secondary-action"
onClick={() => window.open(contentMarkdownUrl(document.id), '_blank', 'noopener,noreferrer')}
disabled={isDeleting}
title="Downloads recognized/extracted content as markdown for this document."
>
Download Recognized MD
</button>
{isTrashed && (
<button
type="button"
className="danger-action"
onClick={handleDelete}
disabled={isDeleting || isRestoring}
>
{isDeleting ? 'Deleting...' : 'Delete Permanently'}
</button>
)}
</div>
<p className="viewer-inline-help">
Reprocess runs OCR/extraction, updates summary, refreshes routing suggestions, and re-indexes search.
</p>
</aside>
);
}
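The `routingSummary` memo above is essentially a pure formatting step over the routing metadata. A small extracted sketch of the confidence/threshold part (the `RoutingInfo` shape is an assumption inferred from the fields the memo reads; `null` stands for a value the backend did not report):

```typescript
// Hypothetical extraction of routingSummary's confidence formatting:
// shows "value / threshold" when both are present, just the value otherwise.
interface RoutingInfo {
  confidence: number | null;
  auto_apply_confidence_threshold: number | null;
}

function formatConfidence(routing: RoutingInfo): string {
  if (routing.confidence === null) {
    return '';
  }
  if (routing.auto_apply_confidence_threshold === null) {
    return `Confidence: ${routing.confidence.toFixed(2)}`;
  }
  return `Confidence: ${routing.confidence.toFixed(2)} / ${routing.auto_apply_confidence_threshold.toFixed(2)}`;
}

console.log(formatConfidence({ confidence: 0.8734, auto_apply_confidence_threshold: 0.9 }));
// Confidence: 0.87 / 0.90
```

Pulling formatting into pure functions like this would also make the memo unit-testable without rendering the component.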


@@ -0,0 +1,73 @@
/**
* Path editor with suggestion dropdown for scalable logical-path selection.
*/
import { useMemo, useState } from 'react';
/**
* Defines properties for the reusable path input component.
*/
interface PathInputProps {
value: string;
suggestions: string[];
placeholder?: string;
disabled?: boolean;
onChange: (nextValue: string) => void;
}
/**
* Renders a text input with filtered clickable path suggestions.
*/
export default function PathInput({
value,
suggestions,
placeholder = 'Destination path',
disabled = false,
onChange,
}: PathInputProps): JSX.Element {
const [isFocused, setIsFocused] = useState<boolean>(false);
/**
* Calculates filtered suggestions based on current input value.
*/
const filteredSuggestions = useMemo(() => {
const normalized = value.trim().toLowerCase();
if (!normalized) {
return suggestions.slice(0, 20);
}
return suggestions.filter((candidate) => candidate.toLowerCase().includes(normalized)).slice(0, 20);
}, [suggestions, value]);
return (
<div className={`path-input ${disabled ? 'disabled' : ''}`}>
<input
value={value}
onChange={(event) => onChange(event.target.value)}
onFocus={() => setIsFocused(true)}
onBlur={() => {
window.setTimeout(() => setIsFocused(false), 120);
}}
placeholder={placeholder}
disabled={disabled}
/>
{isFocused && filteredSuggestions.length > 0 && (
<div className="path-suggestions" role="listbox" aria-label="Path suggestions">
{filteredSuggestions.map((suggestion) => (
<button
key={suggestion}
type="button"
className="path-suggestion-item"
onMouseDown={(event) => {
event.preventDefault();
onChange(suggestion);
setIsFocused(false);
}}
disabled={disabled}
>
{suggestion}
</button>
))}
</div>
)}
</div>
);
}


@@ -0,0 +1,203 @@
/**
* Processing log timeline panel for upload, OCR, summarization, routing, and indexing events.
*/
import { useEffect, useMemo, useRef, useState } from 'react';
import type { ProcessingLogEntry } from '../types';
interface ProcessingLogPanelProps {
entries: ProcessingLogEntry[];
isLoading: boolean;
isClearing: boolean;
selectedDocumentId: string | null;
isProcessingActive: boolean;
typingAnimationEnabled: boolean;
onClear: () => void;
}
/**
* Renders processing events in a terminal-style stream with optional typed headers.
*/
export default function ProcessingLogPanel({
entries,
isLoading,
isClearing,
selectedDocumentId,
isProcessingActive,
typingAnimationEnabled,
onClear,
}: ProcessingLogPanelProps): JSX.Element {
const timeline = useMemo(() => [...entries].reverse(), [entries]);
const [typedEntryIds, setTypedEntryIds] = useState<Set<number>>(() => new Set());
const [typingEntryId, setTypingEntryId] = useState<number | null>(null);
const [typingHeader, setTypingHeader] = useState<string>('');
const [expandedIds, setExpandedIds] = useState<Set<number>>(() => new Set());
const timerRef = useRef<number | null>(null);
const formatTimestamp = (value: string): string => {
const parsed = new Date(value);
if (Number.isNaN(parsed.getTime())) {
return value;
}
return parsed.toLocaleString();
};
const payloadText = (payload: Record<string, unknown>): string => {
try {
return JSON.stringify(payload, null, 2);
} catch (error) {
return String(error);
}
};
const renderHeader = (entry: ProcessingLogEntry): string => {
const headerParts = [formatTimestamp(entry.created_at), entry.level.toUpperCase(), entry.stage];
if (entry.document_filename) {
headerParts.push(entry.document_filename);
}
if (selectedDocumentId !== null && selectedDocumentId === entry.document_id) {
headerParts.push('selected-document');
}
return `[${headerParts.join(' | ')}] ${entry.event}`;
};
useEffect(() => {
const knownIds = new Set(typedEntryIds);
if (typingEntryId !== null) {
knownIds.add(typingEntryId);
}
const nextUntyped = timeline.find((entry) => !knownIds.has(entry.id));
if (!nextUntyped) {
return;
}
if (!typingAnimationEnabled) {
setTypedEntryIds((current) => {
const next = new Set(current);
next.add(nextUntyped.id);
return next;
});
return;
}
if (typingEntryId !== null) {
return;
}
const fullHeader = renderHeader(nextUntyped);
setTypingEntryId(nextUntyped.id);
setTypingHeader('');
let cursor = 0;
timerRef.current = window.setInterval(() => {
cursor += 1;
setTypingHeader(fullHeader.slice(0, cursor));
if (cursor >= fullHeader.length) {
if (timerRef.current !== null) {
window.clearInterval(timerRef.current);
timerRef.current = null;
}
setTypedEntryIds((current) => {
const next = new Set(current);
next.add(nextUntyped.id);
return next;
});
setTypingEntryId(null);
}
}, 10);
  }, [selectedDocumentId, timeline, typedEntryIds, typingAnimationEnabled, typingEntryId]);
useEffect(() => {
return () => {
if (timerRef.current !== null) {
window.clearInterval(timerRef.current);
}
};
}, []);
return (
<section className="processing-log-panel">
<div className="panel-header">
<h2>Processing Log</h2>
<div className="processing-log-header-actions">
<p>{isLoading ? 'Refreshing...' : `${entries.length} recent event(s)`}</p>
<button type="button" className="secondary-action" onClick={onClear} disabled={isLoading || isClearing}>
{isClearing ? 'Clearing...' : 'Clear All Logs'}
</button>
</div>
</div>
<div className="processing-log-terminal-wrap">
<div className="processing-log-terminal">
{timeline.length === 0 && <p className="terminal-empty">No processing events yet.</p>}
{timeline.map((entry, index) => {
const groupKey = `${entry.document_id ?? 'unbound'}:${entry.stage}`;
const previousGroupKey = index > 0 ? `${timeline[index - 1].document_id ?? 'unbound'}:${timeline[index - 1].stage}` : null;
const showSeparator = index > 0 && groupKey !== previousGroupKey;
const isTyping = entry.id === typingEntryId;
const isTyped = typedEntryIds.has(entry.id) || (!typingAnimationEnabled && !isTyping);
const isExpanded = expandedIds.has(entry.id);
const providerModel = [entry.provider_id, entry.model_name].filter(Boolean).join(' / ');
const hasDetails =
providerModel.length > 0 ||
Object.keys(entry.payload_json).length > 0 ||
Boolean(entry.prompt_text) ||
Boolean(entry.response_text);
return (
<div key={entry.id}>
{showSeparator && <div className="terminal-separator">------</div>}
<div className="terminal-row-header">
<span>{isTyping ? typingHeader : renderHeader(entry)}</span>
{hasDetails && isTyped && (
<button
type="button"
className="terminal-unfold-button"
onClick={() =>
setExpandedIds((current) => {
const next = new Set(current);
if (next.has(entry.id)) {
next.delete(entry.id);
} else {
next.add(entry.id);
}
return next;
})
}
>
{isExpanded ? 'Fold' : 'Unfold'}
</button>
)}
</div>
{isExpanded && isTyped && (
<div className="terminal-row-details">
{providerModel && <div>provider/model: {providerModel}</div>}
{Object.keys(entry.payload_json).length > 0 && (
<>
<div>payload:</div>
<pre>{payloadText(entry.payload_json)}</pre>
</>
)}
{entry.prompt_text && (
<>
<div>prompt:</div>
<pre>{entry.prompt_text}</pre>
</>
)}
{entry.response_text && (
<>
<div>response:</div>
<pre>{entry.response_text}</pre>
</>
)}
</div>
)}
</div>
);
})}
{isProcessingActive && typingEntryId === null && (
<div className="terminal-idle-prompt">
<span className="terminal-caret">&gt;</span>
<span className="terminal-caret-blink">_</span>
</div>
)}
</div>
</div>
</section>
);
}


@@ -0,0 +1,107 @@
/**
* Compact search and filter controls for document discovery.
*/
interface SearchFiltersBarProps {
searchText: string;
onSearchTextChange: (value: string) => void;
onSearchSubmit: () => void;
onReset: () => void;
hasActiveSearch: boolean;
knownTags: string[];
knownPaths: string[];
knownTypes: string[];
tagFilter: string;
onTagFilterChange: (value: string) => void;
typeFilter: string;
onTypeFilterChange: (value: string) => void;
pathFilter: string;
onPathFilterChange: (value: string) => void;
processedFrom: string;
onProcessedFromChange: (value: string) => void;
processedTo: string;
onProcessedToChange: (value: string) => void;
isLoading: boolean;
}
/**
* Renders dense search, filter, and quick reset controls.
*/
export default function SearchFiltersBar({
searchText,
onSearchTextChange,
onSearchSubmit,
onReset,
hasActiveSearch,
knownTags,
knownPaths,
knownTypes,
tagFilter,
onTagFilterChange,
typeFilter,
onTypeFilterChange,
pathFilter,
onPathFilterChange,
processedFrom,
onProcessedFromChange,
processedTo,
onProcessedToChange,
isLoading,
}: SearchFiltersBarProps): JSX.Element {
return (
<div className="search-filters-bar">
<input
value={searchText}
onChange={(event) => onSearchTextChange(event.target.value)}
placeholder="Search across name, text, path, tags"
onKeyDown={(event) => {
if (event.key === 'Enter') {
event.preventDefault();
onSearchSubmit();
}
}}
/>
<select value={tagFilter} onChange={(event) => onTagFilterChange(event.target.value)}>
<option value="">All Tags</option>
{knownTags.map((tag) => (
<option key={tag} value={tag}>
{tag}
</option>
))}
</select>
<select value={typeFilter} onChange={(event) => onTypeFilterChange(event.target.value)}>
<option value="">All Types</option>
{knownTypes.map((typeValue) => (
<option key={typeValue} value={typeValue}>
{typeValue}
</option>
))}
</select>
<select value={pathFilter} onChange={(event) => onPathFilterChange(event.target.value)}>
<option value="">All Paths</option>
{knownPaths.map((path) => (
<option key={path} value={path}>
{path}
</option>
))}
</select>
<input
type="date"
value={processedFrom}
onChange={(event) => onProcessedFromChange(event.target.value)}
title="Processed from"
/>
<input
type="date"
value={processedTo}
onChange={(event) => onProcessedToChange(event.target.value)}
title="Processed to"
/>
<button type="button" onClick={onSearchSubmit} disabled={isLoading}>
Search
</button>
<button type="button" className="secondary-action" onClick={onReset} disabled={!hasActiveSearch || isLoading}>
Reset
</button>
</div>
);
}


@@ -0,0 +1,721 @@
/**
* Dedicated settings screen for providers, task model bindings, and catalog controls.
*/
import { useCallback, useEffect, useMemo, useState } from 'react';
import PathInput from './PathInput';
import TagInput from './TagInput';
import type {
AppSettings,
AppSettingsUpdate,
DisplaySettings,
HandwritingStyleClusteringSettings,
OcrTaskSettings,
PredefinedPathEntry,
PredefinedTagEntry,
ProviderSettings,
RoutingTaskSettings,
SummaryTaskSettings,
UploadDefaultsSettings,
} from '../types';
interface EditableProvider extends ProviderSettings {
row_id: string;
api_key: string;
clear_api_key: boolean;
}
interface SettingsScreenProps {
settings: AppSettings | null;
isSaving: boolean;
knownTags: string[];
knownPaths: string[];
onSave: (payload: AppSettingsUpdate) => Promise<void>;
onRegisterSaveAction?: (action: (() => Promise<void>) | null) => void;
}
function clampCardsPerPage(value: number): number {
return Math.max(1, Math.min(200, value));
}
function parseCardsPerPageInput(input: string, fallback: number): number {
const parsed = Number.parseInt(input, 10);
if (Number.isNaN(parsed)) {
return clampCardsPerPage(fallback);
}
return clampCardsPerPage(parsed);
}
/**
* Renders compact human-oriented settings controls.
*/
export default function SettingsScreen({
settings,
isSaving,
knownTags,
knownPaths,
onSave,
onRegisterSaveAction,
}: SettingsScreenProps): JSX.Element {
const [providers, setProviders] = useState<EditableProvider[]>([]);
const [ocrTask, setOcrTask] = useState<OcrTaskSettings | null>(null);
const [summaryTask, setSummaryTask] = useState<SummaryTaskSettings | null>(null);
const [routingTask, setRoutingTask] = useState<RoutingTaskSettings | null>(null);
const [handwritingStyle, setHandwritingStyle] = useState<HandwritingStyleClusteringSettings | null>(null);
const [predefinedPaths, setPredefinedPaths] = useState<PredefinedPathEntry[]>([]);
const [predefinedTags, setPredefinedTags] = useState<PredefinedTagEntry[]>([]);
const [newPredefinedPath, setNewPredefinedPath] = useState<string>('');
const [newPredefinedTag, setNewPredefinedTag] = useState<string>('');
const [uploadDefaults, setUploadDefaults] = useState<UploadDefaultsSettings | null>(null);
const [displaySettings, setDisplaySettings] = useState<DisplaySettings | null>(null);
const [cardsPerPageInput, setCardsPerPageInput] = useState<string>('12');
const [error, setError] = useState<string | null>(null);
useEffect(() => {
if (!settings) {
return;
}
setProviders(
settings.providers.map((provider) => ({
...provider,
row_id: `${provider.id}-${Math.random().toString(36).slice(2, 9)}`,
api_key: '',
clear_api_key: false,
})),
);
setOcrTask(settings.tasks.ocr_handwriting);
setSummaryTask(settings.tasks.summary_generation);
setRoutingTask(settings.tasks.routing_classification);
setHandwritingStyle(settings.handwriting_style_clustering);
setPredefinedPaths(settings.predefined_paths);
setPredefinedTags(settings.predefined_tags);
setUploadDefaults(settings.upload_defaults);
setDisplaySettings(settings.display);
setCardsPerPageInput(String(settings.display.cards_per_page));
setError(null);
}, [settings]);
const fallbackProviderId = useMemo(() => providers[0]?.id ?? '', [providers]);
const addProvider = (): void => {
const sequence = providers.length + 1;
setProviders((current) => [
...current,
{
row_id: `provider-row-${Date.now()}-${sequence}`,
id: `provider-${sequence}`,
label: `Provider ${sequence}`,
provider_type: 'openai_compatible',
base_url: 'http://localhost:11434/v1',
timeout_seconds: 45,
api_key_set: false,
api_key_masked: '',
api_key: '',
clear_api_key: false,
},
]);
};
const removeProvider = (rowId: string): void => {
const target = providers.find((provider) => provider.row_id === rowId);
if (!target || providers.length <= 1) {
return;
}
const remaining = providers.filter((provider) => provider.row_id !== rowId);
const fallback = remaining[0]?.id ?? '';
setProviders(remaining);
if (ocrTask?.provider_id === target.id && fallback) {
setOcrTask({ ...ocrTask, provider_id: fallback });
}
if (summaryTask?.provider_id === target.id && fallback) {
setSummaryTask({ ...summaryTask, provider_id: fallback });
}
if (routingTask?.provider_id === target.id && fallback) {
setRoutingTask({ ...routingTask, provider_id: fallback });
}
};
const addPredefinedPath = (): void => {
const value = newPredefinedPath.trim().replace(/^\/+|\/+$/g, '');
if (!value) {
return;
}
if (predefinedPaths.some((entry) => entry.value.toLowerCase() === value.toLowerCase())) {
setNewPredefinedPath('');
return;
}
setPredefinedPaths([...predefinedPaths, { value, global_shared: false }]);
setNewPredefinedPath('');
};
const addPredefinedTag = (): void => {
const value = newPredefinedTag.trim();
if (!value) {
return;
}
if (predefinedTags.some((entry) => entry.value.toLowerCase() === value.toLowerCase())) {
setNewPredefinedTag('');
return;
}
setPredefinedTags([...predefinedTags, { value, global_shared: false }]);
setNewPredefinedTag('');
};
const handleSave = useCallback(async (): Promise<void> => {
if (!ocrTask || !summaryTask || !routingTask || !handwritingStyle || !uploadDefaults || !displaySettings) {
setError('Settings are not fully loaded yet');
return;
}
if (providers.length === 0) {
setError('At least one provider is required');
return;
}
setError(null);
try {
const resolvedCardsPerPage = parseCardsPerPageInput(cardsPerPageInput, displaySettings.cards_per_page);
setDisplaySettings({ ...displaySettings, cards_per_page: resolvedCardsPerPage });
setCardsPerPageInput(String(resolvedCardsPerPage));
await onSave({
upload_defaults: {
logical_path: uploadDefaults.logical_path.trim(),
tags: uploadDefaults.tags,
},
display: {
cards_per_page: resolvedCardsPerPage,
log_typing_animation_enabled: displaySettings.log_typing_animation_enabled,
},
predefined_paths: predefinedPaths,
predefined_tags: predefinedTags,
handwriting_style_clustering: {
enabled: handwritingStyle.enabled,
embed_model: handwritingStyle.embed_model.trim(),
neighbor_limit: handwritingStyle.neighbor_limit,
match_min_similarity: handwritingStyle.match_min_similarity,
bootstrap_match_min_similarity: handwritingStyle.bootstrap_match_min_similarity,
bootstrap_sample_size: handwritingStyle.bootstrap_sample_size,
image_max_side: handwritingStyle.image_max_side,
},
providers: providers.map((provider) => ({
id: provider.id.trim(),
label: provider.label.trim(),
provider_type: provider.provider_type,
base_url: provider.base_url.trim(),
timeout_seconds: provider.timeout_seconds,
api_key: provider.api_key.trim() || undefined,
clear_api_key: provider.clear_api_key,
})),
tasks: {
ocr_handwriting: {
enabled: ocrTask.enabled,
provider_id: ocrTask.provider_id,
model: ocrTask.model.trim(),
prompt: ocrTask.prompt,
},
summary_generation: {
enabled: summaryTask.enabled,
provider_id: summaryTask.provider_id,
model: summaryTask.model.trim(),
prompt: summaryTask.prompt,
max_input_tokens: summaryTask.max_input_tokens,
},
routing_classification: {
enabled: routingTask.enabled,
provider_id: routingTask.provider_id,
model: routingTask.model.trim(),
prompt: routingTask.prompt,
neighbor_count: routingTask.neighbor_count,
neighbor_min_similarity: routingTask.neighbor_min_similarity,
auto_apply_confidence_threshold: routingTask.auto_apply_confidence_threshold,
auto_apply_neighbor_similarity_threshold: routingTask.auto_apply_neighbor_similarity_threshold,
neighbor_path_override_enabled: routingTask.neighbor_path_override_enabled,
neighbor_path_override_min_similarity: routingTask.neighbor_path_override_min_similarity,
neighbor_path_override_min_gap: routingTask.neighbor_path_override_min_gap,
neighbor_path_override_max_confidence: routingTask.neighbor_path_override_max_confidence,
},
},
});
} catch (caughtError) {
setError(caughtError instanceof Error ? caughtError.message : 'Failed to save settings');
}
}, [
cardsPerPageInput,
displaySettings,
handwritingStyle,
ocrTask,
onSave,
predefinedPaths,
predefinedTags,
providers,
routingTask,
summaryTask,
uploadDefaults,
]);
useEffect(() => {
if (!onRegisterSaveAction) {
return;
}
if (!settings || !ocrTask || !summaryTask || !routingTask || !handwritingStyle || !uploadDefaults || !displaySettings) {
onRegisterSaveAction(null);
return;
}
onRegisterSaveAction(() => handleSave());
return () => onRegisterSaveAction(null);
}, [displaySettings, handleSave, handwritingStyle, ocrTask, onRegisterSaveAction, routingTask, settings, summaryTask, uploadDefaults]);
if (!settings || !ocrTask || !summaryTask || !routingTask || !handwritingStyle || !uploadDefaults || !displaySettings) {
return (
<section className="settings-layout">
<div className="settings-card">
<h2>Settings</h2>
<p>Loading settings...</p>
</div>
</section>
);
}
return (
<section className="settings-layout">
{error && <p className="error-banner">{error}</p>}
<div className="settings-card settings-section">
<div className="settings-section-header">
<h3>Workspace</h3>
<p className="small">Defaults and display behavior for document operations.</p>
</div>
<div className="settings-field-grid">
<label className="settings-field settings-field-wide">
Default Path
<PathInput
value={uploadDefaults.logical_path}
onChange={(nextPath) => setUploadDefaults({ ...uploadDefaults, logical_path: nextPath })}
suggestions={knownPaths}
/>
</label>
<label className="settings-field settings-field-wide">
Default Tags
<TagInput
value={uploadDefaults.tags}
onChange={(nextTags) => setUploadDefaults({ ...uploadDefaults, tags: nextTags })}
suggestions={knownTags}
/>
</label>
<label className="settings-field">
Cards Per Page
<input
type="number"
min={1}
max={200}
value={cardsPerPageInput}
onChange={(event) => setCardsPerPageInput(event.target.value)}
/>
</label>
<label className="inline-checkbox settings-checkbox-field">
<input
type="checkbox"
checked={displaySettings.log_typing_animation_enabled}
onChange={(event) =>
setDisplaySettings({ ...displaySettings, log_typing_animation_enabled: event.target.checked })
}
/>
Processing log typing animation enabled
</label>
</div>
</div>
<div className="settings-card settings-section">
<div className="settings-section-header">
<h3>Catalog Presets</h3>
<p className="small">Pre-register allowed paths and tags. Global-shared is irreversible.</p>
</div>
<div className="settings-catalog-grid">
<section className="settings-catalog-card">
<h4>Predefined Paths</h4>
<div className="settings-catalog-add-row">
<input
placeholder="Add path"
value={newPredefinedPath}
onChange={(event) => setNewPredefinedPath(event.target.value)}
/>
<button type="button" className="secondary-action" onClick={addPredefinedPath}>
Add
</button>
</div>
<div className="settings-catalog-list">
{predefinedPaths.map((entry) => (
<div key={entry.value} className="settings-catalog-row">
<span>{entry.value}</span>
<label className="inline-checkbox">
<input
type="checkbox"
checked={entry.global_shared}
disabled={entry.global_shared}
onChange={(event) =>
setPredefinedPaths((current) =>
current.map((item) =>
item.value === entry.value
? { ...item, global_shared: item.global_shared || event.target.checked }
: item,
),
)
}
/>
Global
</label>
<button
type="button"
className="secondary-action"
onClick={() => setPredefinedPaths((current) => current.filter((item) => item.value !== entry.value))}
>
Remove
</button>
</div>
))}
</div>
</section>
<section className="settings-catalog-card">
<h4>Predefined Tags</h4>
<div className="settings-catalog-add-row">
<input
placeholder="Add tag"
value={newPredefinedTag}
onChange={(event) => setNewPredefinedTag(event.target.value)}
/>
<button type="button" className="secondary-action" onClick={addPredefinedTag}>
Add
</button>
</div>
<div className="settings-catalog-list">
{predefinedTags.map((entry) => (
<div key={entry.value} className="settings-catalog-row">
<span>{entry.value}</span>
<label className="inline-checkbox">
<input
type="checkbox"
checked={entry.global_shared}
disabled={entry.global_shared}
onChange={(event) =>
setPredefinedTags((current) =>
current.map((item) =>
item.value === entry.value
? { ...item, global_shared: item.global_shared || event.target.checked }
: item,
),
)
}
/>
Global
</label>
<button
type="button"
className="secondary-action"
onClick={() => setPredefinedTags((current) => current.filter((item) => item.value !== entry.value))}
>
Remove
</button>
</div>
))}
</div>
</section>
</div>
</div>
<div className="settings-card settings-section">
<div className="settings-section-header">
<h3>Providers</h3>
<p className="small">Configure OpenAI-compatible model endpoints.</p>
</div>
<div className="provider-list">
{providers.map((provider, index) => (
<div key={provider.row_id} className="provider-grid">
<div className="provider-header">
<h4>{provider.label || `Provider ${index + 1}`}</h4>
<button
type="button"
className="danger-action"
onClick={() => removeProvider(provider.row_id)}
disabled={providers.length <= 1 || isSaving}
>
Remove
</button>
</div>
<div className="settings-field-grid">
<label className="settings-field">
Provider ID
<input
value={provider.id}
onChange={(event) =>
setProviders((current) =>
current.map((item) => (item.row_id === provider.row_id ? { ...item, id: event.target.value } : item)),
)
}
/>
</label>
<label className="settings-field">
Label
<input
value={provider.label}
onChange={(event) =>
setProviders((current) =>
current.map((item) =>
item.row_id === provider.row_id ? { ...item, label: event.target.value } : item,
),
)
}
/>
</label>
<label className="settings-field">
Timeout Seconds
<input
type="number"
value={provider.timeout_seconds}
onChange={(event) => {
const nextTimeout = Number.parseInt(event.target.value, 10);
if (Number.isNaN(nextTimeout)) {
return;
}
setProviders((current) =>
current.map((item) =>
item.row_id === provider.row_id ? { ...item, timeout_seconds: nextTimeout } : item,
),
);
}}
/>
</label>
<label className="settings-field settings-field-wide">
Base URL
<input
value={provider.base_url}
onChange={(event) =>
setProviders((current) =>
current.map((item) =>
item.row_id === provider.row_id ? { ...item, base_url: event.target.value } : item,
),
)
}
/>
</label>
<label className="settings-field settings-field-wide">
API Key
<input
type="password"
placeholder={provider.api_key_set ? `Stored: ${provider.api_key_masked}` : 'Optional API key'}
value={provider.api_key}
onChange={(event) =>
setProviders((current) =>
current.map((item) =>
item.row_id === provider.row_id ? { ...item, api_key: event.target.value } : item,
),
)
}
/>
</label>
<label className="inline-checkbox settings-checkbox-field">
<input
type="checkbox"
checked={provider.clear_api_key}
onChange={(event) =>
setProviders((current) =>
current.map((item) =>
item.row_id === provider.row_id ? { ...item, clear_api_key: event.target.checked } : item,
),
)
}
/>
Clear Stored API Key
</label>
</div>
</div>
))}
</div>
<div className="settings-section-actions">
<button type="button" className="secondary-action" onClick={addProvider} disabled={isSaving}>
Add Provider
</button>
</div>
</div>
<div className="settings-card settings-section">
<div className="settings-section-header">
<h3>Task Runtime</h3>
<p className="small">Bind providers and tune OCR, summary, routing, and handwriting style behavior.</p>
</div>
<div className="task-settings-block">
<div className="task-block-header">
<h4>OCR Handwriting</h4>
<label className="inline-checkbox settings-toggle">
<input type="checkbox" checked={ocrTask.enabled} onChange={(event) => setOcrTask({ ...ocrTask, enabled: event.target.checked })} />
Enabled
</label>
</div>
<div className="settings-field-grid">
<label className="settings-field">
Provider
<select value={ocrTask.provider_id} onChange={(event) => setOcrTask({ ...ocrTask, provider_id: event.target.value || fallbackProviderId })}>
{providers.map((provider) => (
<option key={provider.row_id} value={provider.id}>
{provider.label} ({provider.id})
</option>
))}
</select>
</label>
<label className="settings-field">
Model
<input value={ocrTask.model} onChange={(event) => setOcrTask({ ...ocrTask, model: event.target.value })} />
</label>
<label className="settings-field settings-field-wide">
OCR Prompt
<textarea value={ocrTask.prompt} onChange={(event) => setOcrTask({ ...ocrTask, prompt: event.target.value })} />
</label>
</div>
</div>
<div className="task-settings-block">
<div className="task-block-header">
<h4>Summary Generation</h4>
<label className="inline-checkbox settings-toggle">
<input type="checkbox" checked={summaryTask.enabled} onChange={(event) => setSummaryTask({ ...summaryTask, enabled: event.target.checked })} />
Enabled
</label>
</div>
<div className="settings-field-grid">
<label className="settings-field">
Provider
<select value={summaryTask.provider_id} onChange={(event) => setSummaryTask({ ...summaryTask, provider_id: event.target.value || fallbackProviderId })}>
{providers.map((provider) => (
<option key={provider.row_id} value={provider.id}>
{provider.label} ({provider.id})
</option>
))}
</select>
</label>
<label className="settings-field">
Model
<input value={summaryTask.model} onChange={(event) => setSummaryTask({ ...summaryTask, model: event.target.value })} />
</label>
<label className="settings-field">
Max Input Tokens
<input
type="number"
min={512}
max={64000}
value={summaryTask.max_input_tokens}
onChange={(event) => {
const nextValue = Number.parseInt(event.target.value, 10);
if (!Number.isNaN(nextValue)) {
setSummaryTask({ ...summaryTask, max_input_tokens: nextValue });
}
}}
/>
</label>
<label className="settings-field settings-field-wide">
Summary Prompt
<textarea value={summaryTask.prompt} onChange={(event) => setSummaryTask({ ...summaryTask, prompt: event.target.value })} />
</label>
</div>
</div>
<div className="task-settings-block">
<div className="task-block-header">
<h4>Routing Classification</h4>
<label className="inline-checkbox settings-toggle">
<input type="checkbox" checked={routingTask.enabled} onChange={(event) => setRoutingTask({ ...routingTask, enabled: event.target.checked })} />
Enabled
</label>
</div>
<div className="settings-field-grid">
<label className="settings-field">
Provider
<select value={routingTask.provider_id} onChange={(event) => setRoutingTask({ ...routingTask, provider_id: event.target.value || fallbackProviderId })}>
{providers.map((provider) => (
<option key={provider.row_id} value={provider.id}>
{provider.label} ({provider.id})
</option>
))}
</select>
</label>
<label className="settings-field">
Model
<input value={routingTask.model} onChange={(event) => setRoutingTask({ ...routingTask, model: event.target.value })} />
</label>
<label className="settings-field">
Neighbor Count
              <input type="number" value={routingTask.neighbor_count} onChange={(event) => { const nextValue = Number.parseInt(event.target.value, 10); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, neighbor_count: nextValue }); }} />
</label>
<label className="settings-field">
Min Neighbor Similarity
              <input type="number" step="0.01" min="0" max="1" value={routingTask.neighbor_min_similarity} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, neighbor_min_similarity: nextValue }); }} />
</label>
<label className="settings-field">
Auto Apply Confidence
              <input type="number" step="0.01" min="0" max="1" value={routingTask.auto_apply_confidence_threshold} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, auto_apply_confidence_threshold: nextValue }); }} />
</label>
<label className="settings-field">
Auto Apply Neighbor Similarity
              <input type="number" step="0.01" min="0" max="1" value={routingTask.auto_apply_neighbor_similarity_threshold} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, auto_apply_neighbor_similarity_threshold: nextValue }); }} />
</label>
<label className="inline-checkbox settings-checkbox-field">
<input type="checkbox" checked={routingTask.neighbor_path_override_enabled} onChange={(event) => setRoutingTask({ ...routingTask, neighbor_path_override_enabled: event.target.checked })} />
Dominant neighbor path override enabled
</label>
<label className="settings-field">
Override Min Similarity
              <input type="number" step="0.01" min="0" max="1" value={routingTask.neighbor_path_override_min_similarity} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, neighbor_path_override_min_similarity: nextValue }); }} />
</label>
<label className="settings-field">
Override Min Gap
              <input type="number" step="0.01" min="0" max="1" value={routingTask.neighbor_path_override_min_gap} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, neighbor_path_override_min_gap: nextValue }); }} />
</label>
<label className="settings-field">
Override Max LLM Confidence
              <input type="number" step="0.01" min="0" max="1" value={routingTask.neighbor_path_override_max_confidence} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setRoutingTask({ ...routingTask, neighbor_path_override_max_confidence: nextValue }); }} />
</label>
<label className="settings-field settings-field-wide">
Routing Prompt
<textarea value={routingTask.prompt} onChange={(event) => setRoutingTask({ ...routingTask, prompt: event.target.value })} />
</label>
</div>
</div>
<div className="task-settings-block">
<div className="task-block-header">
<h4>Handwriting Style Clustering</h4>
<label className="inline-checkbox settings-toggle">
<input type="checkbox" checked={handwritingStyle.enabled} onChange={(event) => setHandwritingStyle({ ...handwritingStyle, enabled: event.target.checked })} />
Enabled
</label>
</div>
<div className="settings-field-grid">
<label className="settings-field settings-field-wide">
Typesense Embedding Model Slug
<input value={handwritingStyle.embed_model} onChange={(event) => setHandwritingStyle({ ...handwritingStyle, embed_model: event.target.value })} />
</label>
<label className="settings-field">
Neighbor Limit
<input type="number" min={1} max={32} value={handwritingStyle.neighbor_limit} onChange={(event) => setHandwritingStyle({ ...handwritingStyle, neighbor_limit: Number.parseInt(event.target.value, 10) || handwritingStyle.neighbor_limit })} />
</label>
<label className="settings-field">
Match Min Similarity
              <input type="number" step="0.01" min="0" max="1" value={handwritingStyle.match_min_similarity} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setHandwritingStyle({ ...handwritingStyle, match_min_similarity: nextValue }); }} />
</label>
<label className="settings-field">
Bootstrap Match Min Similarity
              <input type="number" step="0.01" min="0" max="1" value={handwritingStyle.bootstrap_match_min_similarity} onChange={(event) => { const nextValue = Number.parseFloat(event.target.value); if (!Number.isNaN(nextValue)) setHandwritingStyle({ ...handwritingStyle, bootstrap_match_min_similarity: nextValue }); }} />
</label>
<label className="settings-field">
Bootstrap Sample Size
<input type="number" min={1} max={30} value={handwritingStyle.bootstrap_sample_size} onChange={(event) => setHandwritingStyle({ ...handwritingStyle, bootstrap_sample_size: Number.parseInt(event.target.value, 10) || handwritingStyle.bootstrap_sample_size })} />
</label>
<label className="settings-field">
Max Image Side (px)
<input type="number" min={256} max={4096} value={handwritingStyle.image_max_side} onChange={(event) => setHandwritingStyle({ ...handwritingStyle, image_max_side: Number.parseInt(event.target.value, 10) || handwritingStyle.image_max_side })} />
</label>
</div>
</div>
</div>
</section>
);
}
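The numeric fields above parse free-form text input in their `onChange` handlers. A small guard like the following sketch (the `parseBounded` name is hypothetical, not part of this codebase) centralizes the NaN handling and range clamping instead of repeating it per field; note that a plain `Number.parseFloat(value) || previous` fallback would also reject a typed `0` on the 0–1 similarity fields:

```typescript
/**
 * Parses a numeric input value, keeping the previous value when the text
 * is not a number and clamping the result into [min, max].
 * Sketch only: `parseBounded` is a hypothetical helper.
 */
function parseBounded(raw: string, previous: number, min: number, max: number): number {
  const parsed = Number.parseFloat(raw);
  if (Number.isNaN(parsed)) {
    return previous; // e.g. the field is momentarily empty while the user types
  }
  return Math.min(max, Math.max(min, parsed));
}

// Illustrative use inside an onChange handler:
// setRoutingTask({ ...routingTask, neighbor_min_similarity:
//   parseBounded(event.target.value, routingTask.neighbor_min_similarity, 0, 1) });
```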


@@ -0,0 +1,123 @@
/**
* Tag editor with suggestion dropdown and keyboard-friendly chip interactions.
*/
import { useMemo, useState } from 'react';
import type { KeyboardEvent } from 'react';
/**
* Defines properties for the reusable tag input component.
*/
interface TagInputProps {
value: string[];
suggestions: string[];
placeholder?: string;
disabled?: boolean;
onChange: (tags: string[]) => void;
}
/**
* Renders a chip-based tag editor with inline suggestions.
*/
export default function TagInput({
value,
suggestions,
placeholder = 'Add tag',
disabled = false,
onChange,
}: TagInputProps): JSX.Element {
const [draft, setDraft] = useState<string>('');
/**
* Calculates filtered suggestions based on current draft and selected tags.
*/
const filteredSuggestions = useMemo(() => {
const normalized = draft.trim().toLowerCase();
return suggestions
.filter((candidate) => !value.includes(candidate))
.filter((candidate) => (normalized ? candidate.toLowerCase().includes(normalized) : false))
.slice(0, 8);
}, [draft, suggestions, value]);
/**
* Adds a tag to the selected value list when valid.
*/
const addTag = (tag: string): void => {
const normalized = tag.trim();
if (!normalized) {
return;
}
if (value.includes(normalized)) {
setDraft('');
return;
}
onChange([...value, normalized]);
setDraft('');
};
/**
* Removes one tag by value.
*/
const removeTag = (tag: string): void => {
onChange(value.filter((candidate) => candidate !== tag));
};
/**
* Handles keyboard interactions for quick tag editing.
*/
const handleKeyDown = (event: KeyboardEvent<HTMLInputElement>): void => {
if (event.key === 'Enter' || event.key === ',') {
event.preventDefault();
addTag(draft);
return;
}
if (event.key === 'Backspace' && draft.length === 0 && value.length > 0) {
event.preventDefault();
onChange(value.slice(0, -1));
}
};
return (
<div className={`tag-input ${disabled ? 'disabled' : ''}`}>
<div className="tag-chip-row">
{value.map((tag) => (
<button
key={tag}
type="button"
className="tag-chip"
onClick={() => removeTag(tag)}
disabled={disabled}
title="Remove tag"
>
{tag}
</button>
))}
</div>
<input
value={draft}
onChange={(event) => setDraft(event.target.value)}
onKeyDown={handleKeyDown}
onBlur={() => addTag(draft)}
placeholder={placeholder}
disabled={disabled}
/>
{filteredSuggestions.length > 0 && (
<div className="tag-suggestions" role="listbox" aria-label="Tag suggestions">
{filteredSuggestions.map((suggestion) => (
<button
key={suggestion}
type="button"
className="tag-suggestion-item"
onMouseDown={(event) => {
event.preventDefault();
addTag(suggestion);
}}
disabled={disabled}
>
{suggestion}
</button>
))}
</div>
)}
</div>
);
}
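The add/remove semantics of `TagInput` above (trim the draft, ignore empties and duplicates, otherwise append) can be expressed as a pure list operation. The following is an illustrative sketch — `addNormalizedTag` is a hypothetical name, and the component inlines this logic rather than calling a helper:

```typescript
/**
 * Pure sketch of TagInput's addTag semantics: trim the draft, return the
 * list unchanged for empty or duplicate values, otherwise append.
 */
function addNormalizedTag(tags: string[], draft: string): string[] {
  const normalized = draft.trim();
  if (!normalized || tags.includes(normalized)) {
    return tags;
  }
  return [...tags, normalized];
}
```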


@@ -0,0 +1,127 @@
/**
* Upload surface that supports global drag-and-drop and file/folder picking.
*/
import { useEffect, useMemo, useRef, useState } from 'react';
import type { ChangeEvent } from 'react';
/**
* Defines properties for the upload surface component, including the queued-upload callback.
*/
interface UploadSurfaceProps {
onUploadRequested: (files: File[]) => Promise<void>;
isUploading: boolean;
variant?: 'panel' | 'inline';
}
/**
* Renders upload actions and drag overlay for dropping documents anywhere.
*/
export default function UploadSurface({
onUploadRequested,
isUploading,
variant = 'panel',
}: UploadSurfaceProps): JSX.Element {
const [isDragging, setIsDragging] = useState<boolean>(false);
const fileInputRef = useRef<HTMLInputElement | null>(null);
const folderInputRef = useRef<HTMLInputElement | null>(null);
/**
* Installs folder-selection attributes unsupported by default React typings.
*/
useEffect(() => {
if (folderInputRef.current) {
folderInputRef.current.setAttribute('webkitdirectory', '');
folderInputRef.current.setAttribute('directory', '');
folderInputRef.current.setAttribute('multiple', '');
}
}, []);
/**
* Registers global drag listeners so users can drop files anywhere in the app.
*/
useEffect(() => {
const onDragOver = (event: DragEvent): void => {
event.preventDefault();
setIsDragging(true);
};
const onDragLeave = (event: DragEvent): void => {
event.preventDefault();
if (!event.relatedTarget) {
setIsDragging(false);
}
};
const onDrop = async (event: DragEvent): Promise<void> => {
event.preventDefault();
setIsDragging(false);
const droppedFiles = Array.from(event.dataTransfer?.files ?? []);
if (droppedFiles.length > 0) {
await onUploadRequested(droppedFiles);
}
};
window.addEventListener('dragover', onDragOver);
window.addEventListener('dragleave', onDragLeave);
window.addEventListener('drop', onDrop);
return () => {
window.removeEventListener('dragover', onDragOver);
window.removeEventListener('dragleave', onDragLeave);
window.removeEventListener('drop', onDrop);
};
}, [onUploadRequested]);
/**
* Provides helper text based on current upload activity.
*/
const statusLabel = useMemo(() => {
if (isUploading) {
return 'Uploading and scheduling processing...';
}
return 'Drop files anywhere or use file/folder upload.';
}, [isUploading]);
/**
* Handles manual file and folder input selection events.
*/
const handlePickedFiles = async (event: ChangeEvent<HTMLInputElement>): Promise<void> => {
const pickedFiles = Array.from(event.target.files ?? []);
if (pickedFiles.length > 0) {
await onUploadRequested(pickedFiles);
}
event.target.value = '';
};
if (variant === 'inline') {
return (
<>
{isDragging && <div className="drop-overlay">Drop to upload</div>}
<div className="upload-actions upload-actions-inline">
<button type="button" onClick={() => fileInputRef.current?.click()} disabled={isUploading}>
Upload Files
</button>
<button type="button" onClick={() => folderInputRef.current?.click()} disabled={isUploading}>
Upload Folder
</button>
</div>
<input ref={fileInputRef} type="file" multiple hidden onChange={handlePickedFiles} />
<input ref={folderInputRef} type="file" hidden onChange={handlePickedFiles} />
</>
);
}
return (
<section className="upload-surface">
{isDragging && <div className="drop-overlay">Drop to upload</div>}
<div className="upload-actions">
<button type="button" onClick={() => fileInputRef.current?.click()} disabled={isUploading}>
Upload Files
</button>
<button type="button" onClick={() => folderInputRef.current?.click()} disabled={isUploading}>
Upload Folder
</button>
</div>
<p>{statusLabel}</p>
<input ref={fileInputRef} type="file" multiple hidden onChange={handlePickedFiles} />
<input ref={folderInputRef} type="file" hidden onChange={handlePickedFiles} />
</section>
);
}


@@ -0,0 +1,119 @@
/**
* Foundational compact tokens and primitives for the LedgerDock frontend.
*/
@import url('https://fonts.googleapis.com/css2?family=Archivo:wght@500;600;700&family=IBM+Plex+Mono:wght@400;500&family=IBM+Plex+Sans:wght@400;500;600&display=swap');
:root {
--font-display: 'Archivo', sans-serif;
--font-body: 'IBM Plex Sans', sans-serif;
--font-mono: 'IBM Plex Mono', monospace;
--color-bg-0: #0b111b;
--color-bg-1: #101827;
--color-panel: #141e2f;
--color-panel-strong: #1b273a;
--color-panel-elevated: #1f2d44;
--color-border: #2f3f5a;
--color-border-strong: #46597a;
--color-text: #e4ebf7;
--color-text-muted: #9aa8c1;
--color-accent: #3f8dff;
--color-accent-strong: #2e70cf;
--color-success: #3bb07f;
--color-warning: #d89a42;
--color-danger: #d56a6a;
--color-focus: #79adff;
--radius-xs: 4px;
--radius-sm: 6px;
--radius-md: 8px;
--radius-lg: 10px;
--shadow-soft: 0 10px 24px rgba(0, 0, 0, 0.24);
--shadow-strong: 0 16px 34px rgba(0, 0, 0, 0.34);
--space-1: 0.25rem;
--space-2: 0.5rem;
--space-3: 0.75rem;
--space-4: 1rem;
--space-5: 1.5rem;
--transition-fast: 140ms ease;
--transition-base: 200ms ease;
}
* {
box-sizing: border-box;
}
html,
body,
#root {
min-height: 100%;
}
body {
margin: 0;
color: var(--color-text);
font-family: var(--font-body);
line-height: 1.45;
background:
radial-gradient(circle at 15% -5%, rgba(63, 141, 255, 0.24), transparent 38%),
radial-gradient(circle at 90% -15%, rgba(130, 166, 229, 0.15), transparent 35%),
linear-gradient(180deg, var(--color-bg-0) 0%, var(--color-bg-1) 100%);
}
body::before {
content: '';
position: fixed;
inset: 0;
pointer-events: none;
z-index: -1;
opacity: 0.35;
background-image:
linear-gradient(rgba(139, 162, 196, 0.08) 1px, transparent 1px),
linear-gradient(90deg, rgba(139, 162, 196, 0.08) 1px, transparent 1px);
background-size: 34px 34px;
}
button,
input,
select,
textarea {
font: inherit;
}
input[type='checkbox'] {
accent-color: var(--color-accent);
}
:focus-visible {
outline: 2px solid var(--color-focus);
outline-offset: 1px;
}
@keyframes rise-in {
from {
opacity: 0;
transform: translateY(8px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
@keyframes pulse-border {
from {
box-shadow: 0 0 0 0 rgba(121, 173, 255, 0.36);
}
to {
box-shadow: 0 0 0 8px rgba(121, 173, 255, 0);
}
}
@keyframes terminal-blink {
50% {
opacity: 0;
}
}

411
frontend/src/lib/api.ts Normal file

@@ -0,0 +1,411 @@
/**
* API client for backend DMS endpoints.
*/
import type {
AppSettings,
AppSettingsUpdate,
DocumentListResponse,
DmsDocument,
DmsDocumentDetail,
ProcessingLogListResponse,
SearchResponse,
TypeListResponse,
UploadResponse,
} from '../types';
/**
* Resolves backend base URL from environment with localhost fallback.
*/
const API_BASE = import.meta.env.VITE_API_BASE ?? 'http://localhost:8000/api/v1';
/**
* Encodes query parameters while skipping undefined and null values.
*/
function buildQuery(params: Record<string, string | number | boolean | undefined | null>): string {
const searchParams = new URLSearchParams();
Object.entries(params).forEach(([key, value]) => {
if (value === undefined || value === null || value === '') {
return;
}
searchParams.set(key, String(value));
});
const encoded = searchParams.toString();
return encoded ? `?${encoded}` : '';
}
/**
* Extracts a filename from content-disposition headers with fallback support.
*/
function responseFilename(response: Response, fallback: string): string {
const disposition = response.headers.get('content-disposition') ?? '';
const match = disposition.match(/filename="?([^";]+)"?/i);
if (!match || !match[1]) {
return fallback;
}
return match[1];
}
/**
* Loads documents from the backend list endpoint.
*/
export async function listDocuments(options?: {
limit?: number;
offset?: number;
includeTrashed?: boolean;
onlyTrashed?: boolean;
pathPrefix?: string;
pathFilter?: string;
tagFilter?: string;
typeFilter?: string;
processedFrom?: string;
processedTo?: string;
}): Promise<DocumentListResponse> {
const query = buildQuery({
limit: options?.limit ?? 100,
offset: options?.offset ?? 0,
include_trashed: options?.includeTrashed,
only_trashed: options?.onlyTrashed,
path_prefix: options?.pathPrefix,
path_filter: options?.pathFilter,
tag_filter: options?.tagFilter,
type_filter: options?.typeFilter,
processed_from: options?.processedFrom,
processed_to: options?.processedTo,
});
const response = await fetch(`${API_BASE}/documents${query}`);
if (!response.ok) {
throw new Error('Failed to load documents');
}
return response.json() as Promise<DocumentListResponse>;
}
/**
* Executes free-text search against backend search endpoint.
*/
export async function searchDocuments(
queryText: string,
options?: {
limit?: number;
offset?: number;
includeTrashed?: boolean;
onlyTrashed?: boolean;
pathFilter?: string;
tagFilter?: string;
typeFilter?: string;
processedFrom?: string;
processedTo?: string;
},
): Promise<SearchResponse> {
const query = buildQuery({
query: queryText,
limit: options?.limit ?? 100,
offset: options?.offset ?? 0,
include_trashed: options?.includeTrashed,
only_trashed: options?.onlyTrashed,
path_filter: options?.pathFilter,
tag_filter: options?.tagFilter,
type_filter: options?.typeFilter,
processed_from: options?.processedFrom,
processed_to: options?.processedTo,
});
const response = await fetch(`${API_BASE}/search${query}`);
if (!response.ok) {
throw new Error('Search failed');
}
return response.json() as Promise<SearchResponse>;
}
/**
* Loads processing logs for recent upload, OCR, summarization, routing, and indexing steps.
*/
export async function listProcessingLogs(options?: {
limit?: number;
offset?: number;
documentId?: string;
}): Promise<ProcessingLogListResponse> {
const query = buildQuery({
limit: options?.limit ?? 120,
offset: options?.offset ?? 0,
document_id: options?.documentId,
});
const response = await fetch(`${API_BASE}/processing/logs${query}`);
if (!response.ok) {
throw new Error('Failed to load processing logs');
}
return response.json() as Promise<ProcessingLogListResponse>;
}
/**
* Trims persisted processing logs while keeping recent document sessions.
*/
export async function trimProcessingLogs(options?: {
keepDocumentSessions?: number;
keepUnboundEntries?: number;
}): Promise<{ deleted_document_entries: number; deleted_unbound_entries: number }> {
const query = buildQuery({
keep_document_sessions: options?.keepDocumentSessions ?? 2,
keep_unbound_entries: options?.keepUnboundEntries ?? 80,
});
const response = await fetch(`${API_BASE}/processing/logs/trim${query}`, {
method: 'POST',
});
if (!response.ok) {
throw new Error('Failed to trim processing logs');
}
return response.json() as Promise<{ deleted_document_entries: number; deleted_unbound_entries: number }>;
}
/**
* Clears all persisted processing logs.
*/
export async function clearProcessingLogs(): Promise<{ deleted_entries: number }> {
const response = await fetch(`${API_BASE}/processing/logs/clear`, {
method: 'POST',
});
if (!response.ok) {
throw new Error('Failed to clear processing logs');
}
return response.json() as Promise<{ deleted_entries: number }>;
}
/**
* Returns existing tags for suggestion UIs.
*/
export async function listTags(includeTrashed = false): Promise<string[]> {
const query = buildQuery({ include_trashed: includeTrashed });
const response = await fetch(`${API_BASE}/documents/tags${query}`);
if (!response.ok) {
throw new Error('Failed to load tags');
}
const payload = (await response.json()) as { tags: string[] };
return payload.tags;
}
/**
* Returns existing logical paths for suggestion UIs.
*/
export async function listPaths(includeTrashed = false): Promise<string[]> {
const query = buildQuery({ include_trashed: includeTrashed });
const response = await fetch(`${API_BASE}/documents/paths${query}`);
if (!response.ok) {
throw new Error('Failed to load paths');
}
const payload = (await response.json()) as { paths: string[] };
return payload.paths;
}
/**
* Returns distinct type values from extension, MIME, and image text categories.
*/
export async function listTypes(includeTrashed = false): Promise<string[]> {
const query = buildQuery({ include_trashed: includeTrashed });
const response = await fetch(`${API_BASE}/documents/types${query}`);
if (!response.ok) {
throw new Error('Failed to load document types');
}
const payload = (await response.json()) as TypeListResponse;
return payload.types;
}
/**
* Uploads files with optional logical path and tags.
*/
export async function uploadDocuments(
files: File[],
options: {
logicalPath: string;
tags: string;
conflictMode: 'ask' | 'replace' | 'duplicate';
},
): Promise<UploadResponse> {
const formData = new FormData();
files.forEach((file) => {
formData.append('files', file, file.name);
const relativePath = (file as File & { webkitRelativePath?: string }).webkitRelativePath || file.name; // webkitRelativePath is '' (not undefined) for plain file picks
formData.append('relative_paths', relativePath);
});
formData.append('logical_path', options.logicalPath);
formData.append('tags', options.tags);
formData.append('conflict_mode', options.conflictMode);
const response = await fetch(`${API_BASE}/documents/upload`, {
method: 'POST',
body: formData,
});
if (!response.ok) {
throw new Error('Upload failed');
}
return response.json() as Promise<UploadResponse>;
}
/**
* Updates document metadata and optionally trains routing suggestions.
*/
export async function updateDocumentMetadata(
documentId: string,
payload: { original_filename?: string; logical_path?: string; tags?: string[] },
): Promise<DmsDocument> {
const response = await fetch(`${API_BASE}/documents/${documentId}`, {
method: 'PATCH',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(payload),
});
if (!response.ok) {
throw new Error('Failed to update document metadata');
}
return response.json() as Promise<DmsDocument>;
}
/**
* Moves a document to trash state without removing stored files.
*/
export async function trashDocument(documentId: string): Promise<DmsDocument> {
const response = await fetch(`${API_BASE}/documents/${documentId}/trash`, { method: 'POST' });
if (!response.ok) {
throw new Error('Failed to trash document');
}
return response.json() as Promise<DmsDocument>;
}
/**
* Restores a document from trash to active state.
*/
export async function restoreDocument(documentId: string): Promise<DmsDocument> {
const response = await fetch(`${API_BASE}/documents/${documentId}/restore`, { method: 'POST' });
if (!response.ok) {
throw new Error('Failed to restore document');
}
return response.json() as Promise<DmsDocument>;
}
/**
* Permanently deletes a document record and associated stored files.
*/
export async function deleteDocument(documentId: string): Promise<{ deleted_documents: number; deleted_files: number }> {
const response = await fetch(`${API_BASE}/documents/${documentId}`, { method: 'DELETE' });
if (!response.ok) {
throw new Error('Failed to delete document');
}
return response.json() as Promise<{ deleted_documents: number; deleted_files: number }>;
}
/**
* Loads full details for one document, including extracted text content.
*/
export async function getDocumentDetails(documentId: string): Promise<DmsDocumentDetail> {
const response = await fetch(`${API_BASE}/documents/${documentId}`);
if (!response.ok) {
throw new Error('Failed to load document details');
}
return response.json() as Promise<DmsDocumentDetail>;
}
/**
* Re-enqueues one document for extraction and classification processing.
*/
export async function reprocessDocument(documentId: string): Promise<DmsDocument> {
const response = await fetch(`${API_BASE}/documents/${documentId}/reprocess`, {
method: 'POST',
});
if (!response.ok) {
throw new Error('Failed to reprocess document');
}
return response.json() as Promise<DmsDocument>;
}
/**
* Builds preview URL for a specific document.
*/
export function previewUrl(documentId: string): string {
return `${API_BASE}/documents/${documentId}/preview`;
}
/**
* Builds thumbnail URL for dashboard card rendering.
*/
export function thumbnailUrl(documentId: string): string {
return `${API_BASE}/documents/${documentId}/thumbnail`;
}
/**
* Builds download URL for a specific document.
*/
export function downloadUrl(documentId: string): string {
return `${API_BASE}/documents/${documentId}/download`;
}
/**
* Builds direct markdown-content download URL for one document.
*/
export function contentMarkdownUrl(documentId: string): string {
return `${API_BASE}/documents/${documentId}/content-md`;
}
/**
* Exports extracted content markdown files for selected documents or path filters.
*/
export async function exportContentsMarkdown(payload: {
document_ids?: string[];
path_prefix?: string;
include_trashed?: boolean;
only_trashed?: boolean;
}): Promise<{ blob: Blob; filename: string }> {
const response = await fetch(`${API_BASE}/documents/content-md/export`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(payload),
});
if (!response.ok) {
throw new Error('Failed to export markdown contents');
}
const blob = await response.blob();
return {
blob,
filename: responseFilename(response, 'document-contents-md.zip'),
};
}
/**
* Retrieves persisted application settings from backend.
*/
export async function getAppSettings(): Promise<AppSettings> {
const response = await fetch(`${API_BASE}/settings`);
if (!response.ok) {
throw new Error('Failed to load application settings');
}
return response.json() as Promise<AppSettings>;
}
/**
* Updates provider and task settings for OpenAI-compatible model execution.
*/
export async function updateAppSettings(payload: AppSettingsUpdate): Promise<AppSettings> {
const response = await fetch(`${API_BASE}/settings`, {
method: 'PATCH',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(payload),
});
if (!response.ok) {
throw new Error('Failed to update settings');
}
return response.json() as Promise<AppSettings>;
}
/**
* Resets persisted provider and task settings to backend defaults.
*/
export async function resetAppSettings(): Promise<AppSettings> {
const response = await fetch(`${API_BASE}/settings/reset`, {
method: 'POST',
});
if (!response.ok) {
throw new Error('Failed to reset settings');
}
return response.json() as Promise<AppSettings>;
}
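`buildQuery` and `responseFilename` in `api.ts` are module-private, so their behavior is easiest to show with a standalone sketch. The names below are illustrative re-implementations of the same logic (`URLSearchParams` is available in both browsers and Node):

```typescript
// Standalone sketch mirroring api.ts's private query/filename helpers.
function buildQuerySketch(params: Record<string, string | number | boolean | undefined | null>): string {
  const searchParams = new URLSearchParams();
  Object.entries(params).forEach(([key, value]) => {
    if (value === undefined || value === null || value === '') {
      return; // skipped entirely rather than serialized as "undefined"
    }
    searchParams.set(key, String(value));
  });
  const encoded = searchParams.toString();
  return encoded ? `?${encoded}` : '';
}

function filenameFromDisposition(disposition: string, fallback: string): string {
  const match = disposition.match(/filename="?([^";]+)"?/i);
  return match?.[1] ?? fallback;
}
```

Note that numeric zero and `false` survive the skip check, so `offset: 0` is still serialized — only `undefined`, `null`, and the empty string are dropped.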

18
frontend/src/main.tsx Normal file

@@ -0,0 +1,18 @@
/**
* Frontend application bootstrap for React rendering.
*/
import { StrictMode } from 'react';
import { createRoot } from 'react-dom/client';
import App from './App';
import './design-foundation.css';
import './styles.css';
/**
* Mounts the root React application into the document.
*/
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
);

1244
frontend/src/styles.css Normal file

File diff suppressed because it is too large

292
frontend/src/types.ts Normal file

@@ -0,0 +1,292 @@
/**
* Shared TypeScript API contracts used by frontend components.
*/
/**
* Enumerates backend document lifecycle states.
*/
export type DocumentStatus = 'queued' | 'processed' | 'unsupported' | 'error' | 'trashed';
/**
* Represents one document row returned by backend APIs.
*/
export interface DmsDocument {
id: string;
original_filename: string;
source_relative_path: string;
mime_type: string;
extension: string;
size_bytes: number;
sha256: string;
logical_path: string;
suggested_path: string | null;
image_text_type: string | null;
handwriting_style_id: string | null;
tags: string[];
suggested_tags: string[];
status: DocumentStatus;
preview_available: boolean;
is_archive_member: boolean;
archived_member_path: string | null;
parent_document_id: string | null;
replaces_document_id: string | null;
created_at: string;
processed_at: string | null;
}
/**
* Represents full document detail payload including extracted text and metadata.
*/
export interface DmsDocumentDetail extends DmsDocument {
extracted_text: string;
metadata_json: Record<string, unknown>;
}
/**
* Represents paginated document list payload.
*/
export interface DocumentListResponse {
total: number;
items: DmsDocument[];
}
/**
* Represents search result payload.
*/
export interface SearchResponse {
total: number;
items: DmsDocument[];
}
/**
* Represents distinct document type values available for filter controls.
*/
export interface TypeListResponse {
types: string[];
}
/**
* Represents one processing pipeline event entry returned by the backend.
*/
export interface ProcessingLogEntry {
id: number;
created_at: string;
level: string;
stage: string;
event: string;
document_id: string | null;
document_filename: string | null;
provider_id: string | null;
model_name: string | null;
prompt_text: string | null;
response_text: string | null;
payload_json: Record<string, unknown>;
}
/**
* Represents paginated processing log response payload.
*/
export interface ProcessingLogListResponse {
total: number;
items: ProcessingLogEntry[];
}
/**
* Represents upload conflict information.
*/
export interface UploadConflict {
original_filename: string;
sha256: string;
existing_document_id: string;
}
/**
* Represents upload response payload.
*/
export interface UploadResponse {
uploaded: DmsDocument[];
conflicts: UploadConflict[];
}
/**
* Represents one model provider binding served by the backend.
*/
export interface ProviderSettings {
id: string;
label: string;
provider_type: string;
base_url: string;
timeout_seconds: number;
api_key_set: boolean;
api_key_masked: string;
}
/**
* Represents OCR task settings served by the backend.
*/
export interface OcrTaskSettings {
enabled: boolean;
provider_id: string;
model: string;
prompt: string;
}
/**
* Represents summarization task settings served by the backend.
*/
export interface SummaryTaskSettings {
enabled: boolean;
provider_id: string;
model: string;
prompt: string;
max_input_tokens: number;
}
/**
* Represents routing task settings served by the backend.
*/
export interface RoutingTaskSettings {
enabled: boolean;
provider_id: string;
model: string;
prompt: string;
neighbor_count: number;
neighbor_min_similarity: number;
auto_apply_confidence_threshold: number;
auto_apply_neighbor_similarity_threshold: number;
neighbor_path_override_enabled: boolean;
neighbor_path_override_min_similarity: number;
neighbor_path_override_min_gap: number;
neighbor_path_override_max_confidence: number;
}
/**
* Represents default upload destination and tags.
*/
export interface UploadDefaultsSettings {
logical_path: string;
tags: string[];
}
/**
* Represents display preferences for document listings.
*/
export interface DisplaySettings {
cards_per_page: number;
log_typing_animation_enabled: boolean;
}
/**
* Represents one predefined logical path and discoverability scope.
*/
export interface PredefinedPathEntry {
value: string;
global_shared: boolean;
}
/**
* Represents one predefined tag and discoverability scope.
*/
export interface PredefinedTagEntry {
value: string;
global_shared: boolean;
}
/**
* Represents handwriting-style clustering settings for Typesense image embeddings.
*/
export interface HandwritingStyleClusteringSettings {
enabled: boolean;
embed_model: string;
neighbor_limit: number;
match_min_similarity: number;
bootstrap_match_min_similarity: number;
bootstrap_sample_size: number;
image_max_side: number;
}
/**
* Represents all task-level settings served by the backend.
*/
export interface TaskSettings {
ocr_handwriting: OcrTaskSettings;
summary_generation: SummaryTaskSettings;
routing_classification: RoutingTaskSettings;
}
/**
* Represents runtime settings served by the backend.
*/
export interface AppSettings {
upload_defaults: UploadDefaultsSettings;
display: DisplaySettings;
handwriting_style_clustering: HandwritingStyleClusteringSettings;
predefined_paths: PredefinedPathEntry[];
predefined_tags: PredefinedTagEntry[];
providers: ProviderSettings[];
tasks: TaskSettings;
}
/**
* Represents provider settings update input payload.
*/
export interface ProviderSettingsUpdate {
id: string;
label: string;
provider_type: string;
base_url: string;
timeout_seconds: number;
api_key?: string;
clear_api_key?: boolean;
}
/**
* Represents task settings update input payload.
*/
export interface TaskSettingsUpdate {
ocr_handwriting?: Partial<OcrTaskSettings>;
summary_generation?: Partial<SummaryTaskSettings>;
routing_classification?: Partial<RoutingTaskSettings>;
}
/**
* Represents upload defaults update input payload.
*/
export interface UploadDefaultsSettingsUpdate {
logical_path?: string;
tags?: string[];
}
/**
* Represents display settings update input payload.
*/
export interface DisplaySettingsUpdate {
cards_per_page?: number;
log_typing_animation_enabled?: boolean;
}
/**
* Represents handwriting-style clustering settings update payload.
*/
export interface HandwritingStyleClusteringSettingsUpdate {
enabled?: boolean;
embed_model?: string;
neighbor_limit?: number;
match_min_similarity?: number;
bootstrap_match_min_similarity?: number;
bootstrap_sample_size?: number;
image_max_side?: number;
}
/**
* Represents app settings update payload sent to backend.
*/
export interface AppSettingsUpdate {
upload_defaults?: UploadDefaultsSettingsUpdate;
display?: DisplaySettingsUpdate;
handwriting_style_clustering?: HandwritingStyleClusteringSettingsUpdate;
predefined_paths?: PredefinedPathEntry[];
predefined_tags?: PredefinedTagEntry[];
providers?: ProviderSettingsUpdate[];
tasks?: TaskSettingsUpdate;
}
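`DocumentStatus` above is a closed string union, which exists only at compile time. When a status arrives from an untrusted source (query string, persisted state), a runtime guard lets it narrow safely. A sketch with a hypothetical helper name:

```typescript
type DocumentStatus = 'queued' | 'processed' | 'unsupported' | 'error' | 'trashed';

// Single runtime source of truth for the union's members.
const DOCUMENT_STATUSES: readonly DocumentStatus[] = ['queued', 'processed', 'unsupported', 'error', 'trashed'];

/** Sketch of a user-defined type guard for DocumentStatus (hypothetical helper). */
function isDocumentStatus(value: string): value is DocumentStatus {
  return (DOCUMENT_STATUSES as readonly string[]).includes(value);
}
```

After `if (isDocumentStatus(raw))`, the compiler treats `raw` as `DocumentStatus` inside the branch, so downstream code needs no casts.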

19
frontend/tsconfig.json Normal file

@@ -0,0 +1,19 @@
{
"compilerOptions": {
"target": "ES2022",
"useDefineForClassFields": true,
"lib": ["ES2022", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "Bundler",
"allowImportingTsExtensions": false,
"resolveJsonModule": true,
"isolatedModules": true,
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noFallthroughCasesInSwitch": true,
"types": ["vite/client"]
},
"include": ["src"]
}


@@ -0,0 +1,9 @@
{
"compilerOptions": {
"composite": true,
"module": "ESNext",
"moduleResolution": "Bundler",
"allowSyntheticDefaultImports": true
},
"include": ["vite.config.ts"]
}

14
frontend/vite.config.ts Normal file

@@ -0,0 +1,14 @@
/**
* Vite configuration for the DMS frontend application.
*/
import { defineConfig } from 'vite';
/**
* Exports frontend build and dev-server settings.
*/
export default defineConfig({
server: {
host: '0.0.0.0',
port: 5173,
},
});

BIN
search.png Normal file

Binary file not shown.

Size: 108 KiB