Compare commits: 5792586a90 ... c3f34b38b4 (7 commits)

Commits: c3f34b38b4, 1cb6bfee58, a69702f099, c1a7011d71, b25e508a00, 74a6551237, 3cbad053cc

**REPORT.md** (new file, 138 lines)

# Security Audit Report

Date: 2026-02-21
Repository: /Users/bedas/Developer/GitHub/dcm
Audit type: Static, read-only code and configuration review

## Scope

- Backend API, worker, extraction and routing pipeline, settings handling, and storage interactions.
- Frontend dependency posture.
- Docker runtime and service exposure.

## Method

- File-level inspection with targeted code tracing for authn/authz, input validation, upload and archive processing, outbound network behavior, secret handling, logging, and deployment hardening.
- No runtime penetration testing was performed.

## Findings
### 1) Critical - Missing authentication and authorization on privileged API routes

- Impact: Any reachable client can access document, settings, and log-management functionality.
- Evidence:
  - `backend/app/main.py:29`
  - `backend/app/api/router.py:14`
  - `backend/app/api/routes_documents.py:464`
  - `backend/app/api/routes_documents.py:666`
  - `backend/app/api/routes_settings.py:148`
  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
  - Enforce authentication globally for non-health routes.
  - Add per-endpoint authorization checks for read/update/delete/admin actions.
### 2) Critical - SSRF and data exfiltration risk via configurable model provider base URL

- Impact: An attacker can redirect model calls to attacker-controlled or internal hosts and exfiltrate document-derived content.
- Evidence:
  - `backend/app/api/routes_settings.py:148`
  - `backend/app/schemas/settings.py:24`
  - `backend/app/services/app_settings.py:249`
  - `backend/app/services/model_runtime.py:144`
  - `backend/app/services/model_runtime.py:170`
  - `backend/app/worker/tasks.py:505`
  - `backend/app/services/routing_pipeline.py:803`
- Recommendation:
  - Restrict provider endpoints to an allowlist.
  - Validate URL scheme and block private/link-local destinations.
  - Protect settings updates behind strict admin authorization.
  - Enforce outbound egress controls at runtime.
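The scheme and allowlist checks can be sketched roughly as follows (the allowlist value is illustrative, not the project's actual setting; a complete fix must also validate DNS-resolved addresses, as the diffs below do):

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.openai.com"}  # illustrative allowlist, not the project's config


def is_allowed_provider_url(url: str) -> bool:
    """Accepts only HTTPS URLs whose host is on the allowlist; rejects IP literals."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname.lower().rstrip(".")
    try:
        ipaddress.ip_address(host)
        return False  # IP literals bypass hostname allowlisting, so refuse them
    except ValueError:
        pass  # host is a DNS name, proceed to allowlist match
    return host in ALLOWED_HOSTS or any(host.endswith("." + a) for a in ALLOWED_HOSTS)
```

Note this static check alone is insufficient: a DNS name can still resolve to an internal address, which is why resolved-IP validation and egress controls are listed alongside it.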
### 3) High - Unbounded upload and archive extraction can cause memory/disk denial of service

- Impact: Oversized files or compressed archive bombs can exhaust API/worker resources.
- Evidence:
  - `backend/app/api/routes_documents.py:486`
  - `backend/app/services/extractor.py:309`
  - `backend/app/services/extractor.py:312`
  - `backend/app/worker/tasks.py:122`
  - `backend/app/core/config.py:20`
- Recommendation:
  - Enforce request and file size limits.
  - Stream uploads and extraction where possible.
  - Cap total uncompressed archive size and per-entry size.
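One way to enforce the archive caps is to check each member's declared sizes and compression ratio before extraction. The limits below are illustrative stand-ins, not the application's configured values:

```python
import io
import zipfile

# Illustrative caps mirroring the recommendation, not the project's settings.
MAX_MEMBER_BYTES = 25 * 1024 * 1024
MAX_TOTAL_BYTES = 150 * 1024 * 1024
MAX_RATIO = 120.0


def check_zip_limits(data: bytes) -> None:
    """Rejects archives whose declared sizes exceed per-entry, total, or ratio caps."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.file_size > MAX_MEMBER_BYTES:
                raise ValueError(f"member {info.filename} exceeds per-entry limit")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds total uncompressed limit")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"member {info.filename} exceeds compression ratio limit")
```

Declared sizes in the central directory can be forged, so extraction should additionally count bytes as they are actually decompressed.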
### 4) High - Sensitive data logging exposed through unsecured log endpoints

- Impact: Extracted text, prompts, and model outputs may be retrievable by unauthorized callers.
- Evidence:
  - `backend/app/models/processing_log.py:31`
  - `backend/app/models/processing_log.py:32`
  - `backend/app/services/routing_pipeline.py:803`
  - `backend/app/services/routing_pipeline.py:814`
  - `backend/app/worker/tasks.py:479`
  - `backend/app/schemas/processing_logs.py:21`
  - `backend/app/api/routes_processing_logs.py:22`
- Recommendation:
  - Require admin authorization for log endpoints.
  - Remove or redact sensitive payloads from logs.
  - Reduce retention for operational logs that may include sensitive context.
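Redaction can be applied at the logging boundary before payloads are persisted. The patterns below are hypothetical examples of secret shapes, not an exhaustive or project-specific list:

```python
import re

# Hypothetical secret shapes; real redaction should match the project's formats.
_SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{8,}"),            # provider API keys
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),  # bearer tokens
]


def redact(text: str, max_chars: int = 4096) -> str:
    """Masks known secret shapes and truncates payloads before log storage."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text[:max_chars]
```

Truncation doubles as a retention control: even if a payload slips past the patterns, only a bounded prefix reaches storage.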
### 5) High - Internal services exposed with weak default posture in docker compose

- Impact: Exposed Redis/Postgres/Typesense can enable data compromise and queue abuse.
- Evidence:
  - `docker-compose.yml:5`
  - `docker-compose.yml:6`
  - `docker-compose.yml:9`
  - `docker-compose.yml:21`
  - `docker-compose.yml:29`
  - `docker-compose.yml:32`
  - `docker-compose.yml:68`
  - `backend/app/worker/queue.py:15`
  - `backend/app/core/config.py:34`
- Recommendation:
  - Remove unnecessary host port exposure for internal services.
  - Use strong credentials and network ACL segmentation.
  - Enable authentication and transport protections for stateful services.
### 6) Medium - Plaintext secrets and weak defaults in configuration paths

- Impact: Credentials and API keys can be exposed from source or storage.
- Evidence:
  - `backend/app/services/app_settings.py:129`
  - `backend/app/services/app_settings.py:257`
  - `backend/app/services/app_settings.py:667`
  - `backend/app/core/config.py:17`
  - `backend/app/core/config.py:34`
  - `backend/.env.example:15`
- Recommendation:
  - Use managed secrets storage and encryption at rest.
  - Remove default credentials.
  - Rotate exposed and default keys/credentials.
### 7) Low - Minimal HTTP hardening headers and broad CORS shape

- Impact: Increased browser-side attack surface, especially once authentication is introduced.
- Evidence:
  - `backend/app/main.py:23`
  - `backend/app/main.py:25`
  - `backend/app/main.py:26`
  - `backend/app/main.py:27`
- Recommendation:
  - Add standard security headers middleware.
  - Constrain allowed methods and headers to actual application needs.
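A framework-neutral sketch of the header baseline follows; the values are a conservative starting point for illustration, not a prescription for this application:

```python
# Conservative illustrative baseline; tune values to the application's needs.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "no-referrer",
    "Content-Security-Policy": "default-src 'none'",
}


def with_security_headers(headers: dict[str, str]) -> dict[str, str]:
    """Returns response headers augmented with a hardening baseline.

    Existing headers win, so route-specific values are never overwritten.
    """
    merged = dict(headers)
    for name, value in SECURITY_HEADERS.items():
        merged.setdefault(name, value)
    return merged
```

In a FastAPI app this logic would typically live in an HTTP middleware that rewrites each response's headers.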
### 8) Low - Containers appear to run as root by default

- Impact: In-container compromise has a higher blast radius.
- Evidence:
  - `backend/Dockerfile:1`
  - `backend/Dockerfile:17`
  - `frontend/Dockerfile:1`
  - `frontend/Dockerfile:16`
- Recommendation:
  - Run containers as non-root users.
  - Drop unnecessary Linux capabilities.
## Residual Risk and Assumptions

- This audit assumes services may be reachable beyond a strictly isolated localhost-only environment.
- If an external auth proxy is enforced upstream, the severity of unauthenticated routes is reduced but not eliminated unless the backend also enforces its own trust boundaries.
- Dependency CVE posture was not exhaustively enumerated in this static pass.
## Priority Remediation Order

1. Enforce authentication and authorization across API routes.
2. Lock down settings mutation paths, especially model provider endpoint configuration.
3. Add strict upload/extraction resource limits.
4. Remove sensitive logging and protect log APIs.
5. Harden Docker/network exposure and secrets management.

**backend/.env.example**

```diff
@@ -2,6 +2,17 @@ APP_ENV=development
 DATABASE_URL=postgresql+psycopg://dcm:dcm@db:5432/dcm
 REDIS_URL=redis://redis:6379/0
 STORAGE_ROOT=/data/storage
+ADMIN_API_TOKEN=replace-with-random-admin-token
+USER_API_TOKEN=replace-with-random-user-token
+MAX_UPLOAD_FILES_PER_REQUEST=50
+MAX_UPLOAD_FILE_SIZE_BYTES=26214400
+MAX_UPLOAD_REQUEST_SIZE_BYTES=104857600
+MAX_ZIP_MEMBER_UNCOMPRESSED_BYTES=26214400
+MAX_ZIP_TOTAL_UNCOMPRESSED_BYTES=157286400
+MAX_ZIP_COMPRESSION_RATIO=120
+PROVIDER_BASE_URL_ALLOWLIST=["api.openai.com"]
+PROVIDER_BASE_URL_ALLOW_HTTP=false
+PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false
 DEFAULT_OPENAI_BASE_URL=https://api.openai.com/v1
 DEFAULT_OPENAI_MODEL=gpt-4.1-mini
 DEFAULT_OPENAI_TIMEOUT_SECONDS=45
```

**backend/Dockerfile**

```diff
@@ -12,6 +12,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
 
-COPY app /app/app
+RUN addgroup --system appgroup && adduser --system --ingroup appgroup --uid 10001 appuser
+RUN mkdir -p /data/storage && chown -R appuser:appgroup /app /data
+
+COPY --chown=appuser:appgroup app /app/app
+
+USER appuser
 
 CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

**backend/app/api/auth.py** (new file, 87 lines)

```python
"""Token-based authentication and authorization dependencies for privileged API routes."""

import hmac
from typing import Annotated

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

from app.core.config import Settings, get_settings


bearer_auth = HTTPBearer(auto_error=False)


class AuthRole:
    """Declares supported authorization roles for privileged API operations."""

    ADMIN = "admin"
    USER = "user"


def _raise_unauthorized() -> None:
    """Raises an HTTP 401 response with bearer authentication challenge headers."""

    raise HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid or missing API token",
        headers={"WWW-Authenticate": "Bearer"},
    )


def _configured_admin_token(settings: Settings) -> str:
    """Returns required admin token or raises configuration error when unset."""

    token = settings.admin_api_token.strip()
    if token:
        return token
    raise HTTPException(
        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        detail="Admin API token is not configured",
    )


def _resolve_token_role(token: str, settings: Settings) -> str:
    """Resolves role from a bearer token using constant-time comparisons."""

    admin_token = _configured_admin_token(settings)
    if hmac.compare_digest(token, admin_token):
        return AuthRole.ADMIN

    user_token = settings.user_api_token.strip()
    if user_token and hmac.compare_digest(token, user_token):
        return AuthRole.USER

    _raise_unauthorized()


def get_request_role(
    credentials: Annotated[HTTPAuthorizationCredentials | None, Depends(bearer_auth)],
    settings: Annotated[Settings, Depends(get_settings)],
) -> str:
    """Authenticates request token and returns its authorization role."""

    if credentials is None:
        _raise_unauthorized()

    token = credentials.credentials.strip()
    if not token:
        _raise_unauthorized()
    return _resolve_token_role(token=token, settings=settings)


def require_user_or_admin(role: Annotated[str, Depends(get_request_role)]) -> str:
    """Requires a valid user or admin token and returns resolved role."""

    return role


def require_admin(role: Annotated[str, Depends(get_request_role)]) -> str:
    """Requires admin role and rejects requests authenticated as regular users."""

    if role != AuthRole.ADMIN:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Admin token required",
        )
    return role
```

**backend/app/api/router.py**

```diff
@@ -1,7 +1,8 @@
 """API router registration for all HTTP route modules."""
 
-from fastapi import APIRouter
+from fastapi import APIRouter, Depends
 
+from app.api.auth import require_admin, require_user_or_admin
 from app.api.routes_documents import router as documents_router
 from app.api.routes_health import router as health_router
 from app.api.routes_processing_logs import router as processing_logs_router
@@ -11,7 +12,27 @@ from app.api.routes_settings import router as settings_router
 
 api_router = APIRouter()
 api_router.include_router(health_router)
-api_router.include_router(documents_router, prefix="/documents", tags=["documents"])
-api_router.include_router(processing_logs_router, prefix="/processing/logs", tags=["processing-logs"])
-api_router.include_router(search_router, prefix="/search", tags=["search"])
-api_router.include_router(settings_router, prefix="/settings", tags=["settings"])
+api_router.include_router(
+    documents_router,
+    prefix="/documents",
+    tags=["documents"],
+    dependencies=[Depends(require_user_or_admin)],
+)
+api_router.include_router(
+    processing_logs_router,
+    prefix="/processing/logs",
+    tags=["processing-logs"],
+    dependencies=[Depends(require_admin)],
+)
+api_router.include_router(
+    search_router,
+    prefix="/search",
+    tags=["search"],
+    dependencies=[Depends(require_user_or_admin)],
+)
+api_router.include_router(
+    settings_router,
+    prefix="/settings",
+    tags=["settings"],
+    dependencies=[Depends(require_admin)],
+)
```

**backend/app/api/routes_documents.py**

```diff
@@ -1,4 +1,4 @@
-"""Document CRUD, lifecycle, metadata, file access, and content export endpoints."""
+"""Authenticated document CRUD, lifecycle, metadata, file access, and content export endpoints."""
 
 import io
 import re
@@ -14,7 +14,7 @@ from fastapi.responses import FileResponse, Response, StreamingResponse
 from sqlalchemy import or_, func, select
 from sqlalchemy.orm import Session
 
-from app.services.app_settings import read_predefined_paths_settings, read_predefined_tags_settings
+from app.core.config import get_settings
 from app.db.base import get_session
 from app.models.document import Document, DocumentStatus
 from app.schemas.documents import (
@@ -26,6 +26,7 @@ from app.schemas.documents import (
     UploadConflict,
     UploadResponse,
 )
+from app.services.app_settings import read_predefined_paths_settings, read_predefined_tags_settings
 from app.services.extractor import sniff_mime
 from app.services.handwriting_style import delete_many_handwriting_style_documents
 from app.services.processing_logs import log_processing_event, set_processing_log_autocommit
@@ -35,6 +36,7 @@ from app.worker.queue import get_processing_queue
 
 
 router = APIRouter()
+settings = get_settings()
 
 
 def _parse_csv(value: str | None) -> list[str]:
@@ -227,6 +229,33 @@ def _build_document_list_statement(
     return statement
 
 
+def _enforce_upload_shape(files: list[UploadFile]) -> None:
+    """Validates upload request shape against configured file-count bounds."""
+
+    if not files:
+        raise HTTPException(status_code=400, detail="Upload request must include at least one file")
+    if len(files) > settings.max_upload_files_per_request:
+        raise HTTPException(
+            status_code=413,
+            detail=(
+                "Upload request exceeds file count limit "
+                f"({len(files)} > {settings.max_upload_files_per_request})"
+            ),
+        )
+
+
+async def _read_upload_bytes(file: UploadFile, max_bytes: int) -> bytes:
+    """Reads one upload file while enforcing per-file byte limits."""
+
+    data = await file.read(max_bytes + 1)
+    if len(data) > max_bytes:
+        raise HTTPException(
+            status_code=413,
+            detail=f"File '{file.filename or 'upload'}' exceeds per-file limit of {max_bytes} bytes",
+        )
+    return data
+
+
 def _collect_document_tree(session: Session, root_document_id: UUID) -> list[tuple[int, Document]]:
     """Collects a document and all descendants for recursive permanent deletion."""
 
@@ -472,18 +501,29 @@ async def upload_documents(
 ) -> UploadResponse:
     """Uploads files, records metadata, and enqueues asynchronous extraction tasks."""
 
+    _enforce_upload_shape(files)
     set_processing_log_autocommit(session, True)
     normalized_tags = _normalize_tags(tags)
     queue = get_processing_queue()
     uploaded: list[DocumentResponse] = []
    conflicts: list[UploadConflict] = []
+    total_request_bytes = 0
 
     indexed_relative_paths = relative_paths or []
     prepared_uploads: list[dict[str, object]] = []
 
     for idx, file in enumerate(files):
         filename = file.filename or f"uploaded_{idx}"
-        data = await file.read()
+        data = await _read_upload_bytes(file, settings.max_upload_file_size_bytes)
+        total_request_bytes += len(data)
+        if total_request_bytes > settings.max_upload_request_size_bytes:
+            raise HTTPException(
+                status_code=413,
+                detail=(
+                    "Upload request exceeds total size limit "
+                    f"({total_request_bytes} > {settings.max_upload_request_size_bytes} bytes)"
+                ),
+            )
         sha256 = compute_sha256(data)
         source_relative_path = indexed_relative_paths[idx] if idx < len(indexed_relative_paths) else filename
         extension = Path(filename).suffix.lower()
```

**backend/app/api/routes_processing_logs.py**

```diff
@@ -1,10 +1,11 @@
-"""Read-only API endpoints for processing pipeline event logs."""
+"""Admin-only API endpoints for processing pipeline event logs."""
 
 from uuid import UUID
 
 from fastapi import APIRouter, Depends, Query
 from sqlalchemy.orm import Session
 
+from app.core.config import get_settings
 from app.db.base import get_session
 from app.schemas.processing_logs import ProcessingLogEntryResponse, ProcessingLogListResponse
 from app.services.app_settings import read_processing_log_retention_settings
@@ -17,12 +18,13 @@ from app.services.processing_logs import (
 
 
 router = APIRouter()
+settings = get_settings()
 
 
 @router.get("", response_model=ProcessingLogListResponse)
 def get_processing_logs(
     offset: int = Query(default=0, ge=0),
-    limit: int = Query(default=120, ge=1, le=400),
+    limit: int = Query(default=120, ge=1, le=settings.processing_log_max_unbound_entries),
     document_id: UUID | None = Query(default=None),
     session: Session = Depends(get_session),
 ) -> ProcessingLogListResponse:
@@ -43,8 +45,8 @@
 
 @router.post("/trim")
 def trim_processing_logs(
-    keep_document_sessions: int | None = Query(default=None, ge=0, le=20),
-    keep_unbound_entries: int | None = Query(default=None, ge=0, le=400),
+    keep_document_sessions: int | None = Query(default=None, ge=0, le=settings.processing_log_max_document_sessions),
+    keep_unbound_entries: int | None = Query(default=None, ge=0, le=settings.processing_log_max_unbound_entries),
     session: Session = Depends(get_session),
 ) -> dict[str, int]:
     """Deletes old processing logs using query values or persisted retention defaults."""
@@ -61,10 +63,19 @@
         else int(retention_defaults.get("keep_unbound_entries", 80))
     )
 
+    capped_keep_document_sessions = min(
+        settings.processing_log_max_document_sessions,
+        max(0, int(resolved_keep_document_sessions)),
+    )
+    capped_keep_unbound_entries = min(
+        settings.processing_log_max_unbound_entries,
+        max(0, int(resolved_keep_unbound_entries)),
+    )
+
     result = cleanup_processing_logs(
         session=session,
-        keep_document_sessions=resolved_keep_document_sessions,
-        keep_unbound_entries=resolved_keep_unbound_entries,
+        keep_document_sessions=capped_keep_document_sessions,
+        keep_unbound_entries=capped_keep_unbound_entries,
     )
     session.commit()
     return result
```

**backend/app/api/routes_settings.py**

```diff
@@ -1,6 +1,6 @@
-"""API routes for managing persistent single-user application settings."""
+"""Admin-only API routes for managing persistent single-user application settings."""
 
-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
 
 from app.schemas.settings import (
     AppSettingsUpdateRequest,
@@ -18,6 +18,7 @@ from app.schemas.settings import (
     UploadDefaultsResponse,
 )
 from app.services.app_settings import (
+    AppSettingsValidationError,
     TASK_OCR_HANDWRITING,
     TASK_ROUTING_CLASSIFICATION,
     TASK_SUMMARY_GENERATION,
@@ -179,6 +180,7 @@ def set_app_settings(payload: AppSettingsUpdateRequest) -> AppSettingsResponse:
     if payload.predefined_tags is not None:
         predefined_tags_payload = [item.model_dump(exclude_none=True) for item in payload.predefined_tags]
 
+    try:
         updated = update_app_settings(
             providers=providers_payload,
             tasks=tasks_payload,
@@ -189,6 +191,8 @@ def set_app_settings(payload: AppSettingsUpdateRequest) -> AppSettingsResponse:
             predefined_paths=predefined_paths_payload,
             predefined_tags=predefined_tags_payload,
         )
+    except AppSettingsValidationError as error:
+        raise HTTPException(status_code=400, detail=str(error)) from error
     return _build_response(updated)
@@ -203,6 +207,7 @@ def reset_settings_to_defaults() -> AppSettingsResponse:
 def set_handwriting_settings(payload: HandwritingSettingsUpdateRequest) -> AppSettingsResponse:
     """Updates handwriting transcription settings and returns the resulting configuration."""
 
+    try:
         updated = update_handwriting_settings(
             enabled=payload.enabled,
             openai_base_url=payload.openai_base_url,
@@ -211,6 +216,8 @@ def set_handwriting_settings(payload: HandwritingSettingsUpdateRequest) -> AppSe
             openai_api_key=payload.openai_api_key,
             clear_openai_api_key=payload.clear_openai_api_key,
         )
+    except AppSettingsValidationError as error:
+        raise HTTPException(status_code=400, detail=str(error)) from error
     return _build_response(updated)
```

@@ -1,7 +1,10 @@
|
|||||||
"""Application settings and environment configuration."""
|
"""Application settings and environment configuration."""
|
||||||
|
|
||||||
from functools import lru_cache
|
from functools import lru_cache
|
||||||
|
import ipaddress
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
import socket
|
||||||
|
from urllib.parse import urlparse, urlunparse
|
||||||
|
|
||||||
from pydantic import Field
|
from pydantic import Field
|
||||||
from pydantic_settings import BaseSettings, SettingsConfigDict
|
from pydantic_settings import BaseSettings, SettingsConfigDict
|
||||||
@@ -18,9 +21,24 @@ class Settings(BaseSettings):
|
|||||||
redis_url: str = "redis://redis:6379/0"
|
redis_url: str = "redis://redis:6379/0"
|
||||||
storage_root: Path = Path("/data/storage")
|
storage_root: Path = Path("/data/storage")
|
||||||
upload_chunk_size: int = 4 * 1024 * 1024
|
upload_chunk_size: int = 4 * 1024 * 1024
|
||||||
|
max_upload_files_per_request: int = 50
|
||||||
|
max_upload_file_size_bytes: int = 25 * 1024 * 1024
|
||||||
|
max_upload_request_size_bytes: int = 100 * 1024 * 1024
|
||||||
max_zip_members: int = 250
|
max_zip_members: int = 250
|
||||||
max_zip_depth: int = 2
|
max_zip_depth: int = 2
|
||||||
|
max_zip_member_uncompressed_bytes: int = 25 * 1024 * 1024
|
||||||
|
max_zip_total_uncompressed_bytes: int = 150 * 1024 * 1024
|
||||||
|
max_zip_compression_ratio: float = 120.0
|
||||||
max_text_length: int = 500_000
|
max_text_length: int = 500_000
|
||||||
|
admin_api_token: str = ""
|
||||||
|
user_api_token: str = ""
|
||||||
|
provider_base_url_allowlist: list[str] = Field(default_factory=lambda: ["api.openai.com"])
|
||||||
|
provider_base_url_allow_http: bool = False
|
||||||
|
provider_base_url_allow_private_network: bool = False
|
||||||
|
processing_log_max_document_sessions: int = 20
|
||||||
|
processing_log_max_unbound_entries: int = 400
|
||||||
|
processing_log_max_payload_chars: int = 4096
|
||||||
|
processing_log_max_text_chars: int = 12000
|
||||||
default_openai_base_url: str = "https://api.openai.com/v1"
|
default_openai_base_url: str = "https://api.openai.com/v1"
|
||||||
default_openai_model: str = "gpt-4.1-mini"
|
default_openai_model: str = "gpt-4.1-mini"
|
||||||
default_openai_timeout_seconds: int = 45
|
default_openai_timeout_seconds: int = 45
|
||||||
@@ -39,6 +57,187 @@ class Settings(BaseSettings):
     cors_origins: list[str] = Field(default_factory=lambda: ["http://localhost:5173", "http://localhost:3000"])
 
 
+LOCAL_HOSTNAME_SUFFIXES = (".local", ".internal", ".home.arpa")
+
+
+def _normalize_allowlist(allowlist: object) -> tuple[str, ...]:
+    """Normalizes host allowlist entries to lowercase DNS labels."""
+
+    if not isinstance(allowlist, (list, tuple, set)):
+        return ()
+    normalized = {
+        candidate.strip().lower().rstrip(".")
+        for candidate in allowlist
+        if isinstance(candidate, str) and candidate.strip()
+    }
+    return tuple(sorted(normalized))
+
+
+def _host_matches_allowlist(hostname: str, allowlist: tuple[str, ...]) -> bool:
+    """Returns whether a hostname is included by an exact or subdomain allowlist rule."""
+
+    if not allowlist:
+        return False
+    candidate = hostname.lower().rstrip(".")
+    for allowed_host in allowlist:
+        if candidate == allowed_host or candidate.endswith(f".{allowed_host}"):
+            return True
+    return False
+
+
+def _is_private_or_special_ip(value: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
+    """Returns whether an IP belongs to private, loopback, link-local, or reserved ranges."""
+
+    return (
+        value.is_private
+        or value.is_loopback
+        or value.is_link_local
+        or value.is_multicast
+        or value.is_reserved
+        or value.is_unspecified
+    )
+
+
+def _validate_resolved_host_ips(hostname: str, port: int, allow_private_network: bool) -> None:
+    """Resolves hostnames and rejects private or special addresses when private network access is disabled."""
+
+    try:
+        addresses = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
+    except socket.gaierror as error:
+        raise ValueError(f"Provider base URL host cannot be resolved: {hostname}") from error
+
+    resolved_ips: set[ipaddress.IPv4Address | ipaddress.IPv6Address] = set()
+    for entry in addresses:
+        sockaddr = entry[4]
+        if not sockaddr:
+            continue
+        ip_text = sockaddr[0]
+        try:
+            resolved_ips.add(ipaddress.ip_address(ip_text))
+        except ValueError:
+            continue
+
+    if not resolved_ips:
+        raise ValueError(f"Provider base URL host resolved without usable IP addresses: {hostname}")
+
+    if allow_private_network:
+        return
+
+    blocked = [ip for ip in resolved_ips if _is_private_or_special_ip(ip)]
+    if blocked:
+        blocked_text = ", ".join(str(ip) for ip in blocked)
+        raise ValueError(f"Provider base URL resolves to private or special IP addresses: {blocked_text}")
+
+
+def _normalize_and_validate_provider_base_url(
+    raw_value: str,
+    allowlist: tuple[str, ...],
+    allow_http: bool,
+    allow_private_network: bool,
+    resolve_dns: bool,
+) -> str:
+    """Normalizes and validates provider base URLs with SSRF-safe scheme and host checks."""
+
+    trimmed = raw_value.strip().rstrip("/")
+    if not trimmed:
+        raise ValueError("Provider base URL must not be empty")
+
+    parsed = urlparse(trimmed)
+    scheme = parsed.scheme.lower()
+    if scheme not in {"http", "https"}:
+        raise ValueError("Provider base URL must use http or https")
+    if scheme == "http" and not allow_http:
+        raise ValueError("Provider base URL must use https")
+    if parsed.query or parsed.fragment:
+        raise ValueError("Provider base URL must not include query strings or fragments")
+    if parsed.username or parsed.password:
+        raise ValueError("Provider base URL must not include embedded credentials")
+
+    hostname = (parsed.hostname or "").lower().rstrip(".")
+    if not hostname:
+        raise ValueError("Provider base URL must include a hostname")
+    if allowlist and not _host_matches_allowlist(hostname, allowlist):
+        allowed_hosts = ", ".join(allowlist)
+        raise ValueError(f"Provider base URL host is not in allowlist: {hostname}. Allowed hosts: {allowed_hosts}")
+
+    if hostname == "localhost" or hostname.endswith(LOCAL_HOSTNAME_SUFFIXES):
+        if not allow_private_network:
+            raise ValueError("Provider base URL must not target local or internal hostnames")
+
+    try:
+        ip_host = ipaddress.ip_address(hostname)
+    except ValueError:
+        ip_host = None
+
+    if ip_host is not None:
+        if not allow_private_network and _is_private_or_special_ip(ip_host):
+            raise ValueError("Provider base URL must not target private or special IP addresses")
+    elif resolve_dns:
+        resolved_port = parsed.port
+        if resolved_port is None:
+            resolved_port = 443 if scheme == "https" else 80
+        _validate_resolved_host_ips(
+            hostname=hostname,
+            port=resolved_port,
+            allow_private_network=allow_private_network,
+        )
+
+    path = (parsed.path or "").rstrip("/")
+    if not path.endswith("/v1"):
+        path = f"{path}/v1" if path else "/v1"
+
+    normalized_hostname = hostname
+    if ":" in normalized_hostname and not normalized_hostname.startswith("["):
+        normalized_hostname = f"[{normalized_hostname}]"
+    netloc = f"{normalized_hostname}:{parsed.port}" if parsed.port is not None else normalized_hostname
+    return urlunparse((scheme, netloc, path, "", "", ""))
+
+
+@lru_cache(maxsize=256)
+def _normalize_and_validate_provider_base_url_cached(
+    raw_value: str,
+    allowlist: tuple[str, ...],
+    allow_http: bool,
+    allow_private_network: bool,
+) -> str:
+    """Caches provider URL validation results for non-DNS-resolved checks."""
+
+    return _normalize_and_validate_provider_base_url(
+        raw_value=raw_value,
+        allowlist=allowlist,
+        allow_http=allow_http,
+        allow_private_network=allow_private_network,
+        resolve_dns=False,
+    )
+
+
+def normalize_and_validate_provider_base_url(raw_value: str, *, resolve_dns: bool = False) -> str:
+    """Validates and normalizes provider base URL values using configured SSRF protections."""
+
+    settings = get_settings()
+    allowlist = _normalize_allowlist(settings.provider_base_url_allowlist)
+    allow_http = settings.provider_base_url_allow_http if isinstance(settings.provider_base_url_allow_http, bool) else False
+    allow_private_network = (
+        settings.provider_base_url_allow_private_network
+        if isinstance(settings.provider_base_url_allow_private_network, bool)
+        else False
+    )
+    if resolve_dns:
+        return _normalize_and_validate_provider_base_url(
+            raw_value=raw_value,
+            allowlist=allowlist,
+            allow_http=allow_http,
+            allow_private_network=allow_private_network,
+            resolve_dns=True,
+        )
+    return _normalize_and_validate_provider_base_url_cached(
+        raw_value=raw_value,
+        allowlist=allowlist,
+        allow_http=allow_http,
+        allow_private_network=allow_private_network,
+    )
+
+
 @lru_cache(maxsize=1)
 def get_settings() -> Settings:
     """Returns a cached settings object for dependency injection and service access."""
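The accept/reject behavior of the URL validation above can be exercised in isolation. Below is a minimal, standalone sketch of the same scheme and host checks — the function name `is_safe_base_url` and the hard-coded suffix list are illustrative, not the repository's API:

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_base_url(raw: str) -> bool:
    """Minimal SSRF gate: https only, no local suffixes, no private/special IP literals."""
    parsed = urlparse(raw.strip())
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower().rstrip(".")
    if not host or host == "localhost" or host.endswith((".local", ".internal", ".home.arpa")):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # plain hostname; DNS-resolution checks would run separately, as in the hunk
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)

print(is_safe_base_url("https://api.openai.com/v1"))  # True
print(is_safe_base_url("http://api.openai.com/v1"))   # False: plaintext scheme
print(is_safe_base_url("https://169.254.169.254"))    # False: link-local metadata range
```

Note that the real implementation also resolves hostnames before trusting them; a literal-IP check alone would still be bypassable via DNS.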
@@ -1,7 +1,10 @@
 """FastAPI entrypoint for the DMS backend service."""
 
-from fastapi import FastAPI
+from typing import Awaitable, Callable
+
+from fastapi import FastAPI, Request, Response
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse
 
 from app.api.router import api_router
 from app.core.config import get_settings
@@ -13,6 +16,18 @@ from app.services.typesense_index import ensure_typesense_collection
 
 settings = get_settings()
 
+UPLOAD_ENDPOINT_PATH = "/api/v1/documents/upload"
+UPLOAD_ENDPOINT_METHOD = "POST"
+
+
+def _is_upload_size_guard_target(request: Request) -> bool:
+    """Returns whether upload request-size enforcement applies to this request.
+
+    Upload-size validation is intentionally scoped to the upload POST endpoint so CORS
+    preflight OPTIONS requests can pass through CORSMiddleware.
+    """
+
+    return request.method.upper() == UPLOAD_ENDPOINT_METHOD and request.url.path == UPLOAD_ENDPOINT_PATH
+
+
 def create_app() -> FastAPI:
@@ -28,6 +43,38 @@ def create_app() -> FastAPI:
     )
     app.include_router(api_router, prefix="/api/v1")
 
+    @app.middleware("http")
+    async def enforce_upload_request_size(
+        request: Request,
+        call_next: Callable[[Request], Awaitable[Response]],
+    ) -> Response:
+        """Rejects only POST upload bodies without deterministic length or with oversized request totals."""
+
+        if _is_upload_size_guard_target(request):
+            content_length = request.headers.get("content-length", "").strip()
+            if not content_length:
+                return JSONResponse(
+                    status_code=411,
+                    content={"detail": "Content-Length header is required for document uploads"},
+                )
+            try:
+                content_length_value = int(content_length)
+            except ValueError:
+                return JSONResponse(status_code=400, content={"detail": "Invalid Content-Length header"})
+            if content_length_value <= 0:
+                return JSONResponse(status_code=400, content={"detail": "Content-Length must be a positive integer"})
+            if content_length_value > settings.max_upload_request_size_bytes:
+                return JSONResponse(
+                    status_code=413,
+                    content={
+                        "detail": (
+                            "Upload request exceeds total size limit "
+                            f"({content_length_value} > {settings.max_upload_request_size_bytes} bytes)"
+                        )
+                    },
+                )
+        return await call_next(request)
+
     @app.on_event("startup")
     def startup_event() -> None:
         """Initializes storage directories and database schema on service startup."""
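The middleware's decision table (411 for a missing length, 400 for a malformed or non-positive one, 413 over the limit) can be sketched framework-free. The helper name `check_upload_size` and the limit value are illustrative:

```python
def check_upload_size(headers: dict[str, str], limit: int) -> tuple[int, str]:
    """Returns an (HTTP status, detail) pair for an upload request; 200 means accepted."""
    raw = headers.get("content-length", "").strip()
    if not raw:
        return 411, "Content-Length header is required"
    try:
        value = int(raw)
    except ValueError:
        return 400, "Invalid Content-Length header"
    if value <= 0:
        return 400, "Content-Length must be a positive integer"
    if value > limit:
        return 413, "Upload request exceeds total size limit"
    return 200, "ok"

print(check_upload_size({"content-length": "1024"}, 10_000_000))  # (200, 'ok')
print(check_upload_size({}, 10_000_000))                          # (411, 'Content-Length header is required')
print(check_upload_size({"content-length": "999999999999"}, 10_000_000))  # (413, ...)
```

Requiring a deterministic `Content-Length` up front lets the guard reject oversized requests before any body bytes are buffered.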
@@ -2,14 +2,121 @@
 
 import uuid
 from datetime import UTC, datetime
+import re
+from typing import Any
 
 from sqlalchemy import BigInteger, DateTime, ForeignKey, String, Text
 from sqlalchemy.dialects.postgresql import JSONB, UUID
-from sqlalchemy.orm import Mapped, mapped_column
+from sqlalchemy.orm import Mapped, mapped_column, validates
 
+from app.core.config import get_settings
 from app.db.base import Base
 
+
+settings = get_settings()
+
+
+SENSITIVE_KEY_MARKERS = (
+    "api_key",
+    "apikey",
+    "authorization",
+    "bearer",
+    "token",
+    "secret",
+    "password",
+    "credential",
+    "cookie",
+)
+SENSITIVE_TEXT_PATTERNS = (
+    re.compile(r"(?i)[\"']authorization[\"']\s*:\s*[\"']bearer\s+[^\"']+[\"']"),
+    re.compile(r"(?i)[\"']bearer[\"']\s*:\s*[\"'][^\"']+[\"']"),
+    re.compile(r"(?i)[\"'](?:api[_-]?key|token|secret|password)[\"']\s*:\s*[\"'][^\"']+[\"']"),
+    re.compile(r"(?i)\bauthorization\b\s*[:=]\s*bearer\s+[a-z0-9._~+/\-]+=*"),
+    re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/\-]+=*"),
+    re.compile(r"\b[a-z0-9_-]{8,}\.[a-z0-9_-]{8,}\.[a-z0-9_-]{8,}\b", flags=re.IGNORECASE),
+    re.compile(r"(?i)\bsk-[a-z0-9]{16,}\b"),
+    re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*['\"]?[^\s,'\";]+['\"]?"),
+)
+REDACTED_TEXT = "[REDACTED]"
+MAX_PAYLOAD_KEYS = 80
+MAX_PAYLOAD_LIST_ITEMS = 80
+
+
+def _truncate(value: str, limit: int) -> str:
+    """Truncates long log fields to configured bounds with stable suffix marker."""
+
+    normalized = value.strip()
+    if len(normalized) <= limit:
+        return normalized
+    return normalized[: max(0, limit - 3)] + "..."
+
+
+def _is_sensitive_key(key: str) -> bool:
+    """Returns whether a payload key likely contains sensitive credential data."""
+
+    normalized = key.strip().lower()
+    return any(marker in normalized for marker in SENSITIVE_KEY_MARKERS)
+
+
+def _redact_sensitive_text(value: str) -> str:
+    """Redacts token-like segments from log text while retaining non-sensitive context."""
+
+    redacted = value
+    for pattern in SENSITIVE_TEXT_PATTERNS:
+        redacted = pattern.sub(lambda _: REDACTED_TEXT, redacted)
+    return redacted
+
+
+def sanitize_processing_log_payload_value(value: Any, *, parent_key: str | None = None) -> Any:
+    """Sanitizes payload structures by redacting sensitive fields and bounding size."""
+
+    if parent_key and _is_sensitive_key(parent_key):
+        return REDACTED_TEXT
+
+    if isinstance(value, dict):
+        sanitized: dict[str, Any] = {}
+        for index, (raw_key, raw_value) in enumerate(value.items()):
+            if index >= MAX_PAYLOAD_KEYS:
+                break
+            key = str(raw_key)
+            sanitized[key] = sanitize_processing_log_payload_value(raw_value, parent_key=key)
+        return sanitized
+
+    if isinstance(value, list):
+        return [
+            sanitize_processing_log_payload_value(item, parent_key=parent_key)
+            for item in value[:MAX_PAYLOAD_LIST_ITEMS]
+        ]
+
+    if isinstance(value, tuple):
+        return [
+            sanitize_processing_log_payload_value(item, parent_key=parent_key)
+            for item in list(value)[:MAX_PAYLOAD_LIST_ITEMS]
+        ]
+
+    if isinstance(value, str):
+        redacted = _redact_sensitive_text(value)
+        return _truncate(redacted, settings.processing_log_max_payload_chars)
+
+    if isinstance(value, (int, float, bool)) or value is None:
+        return value
+
+    as_text = _truncate(str(value), settings.processing_log_max_payload_chars)
+    return _redact_sensitive_text(as_text)
+
+
+def sanitize_processing_log_text(value: str | None) -> str | None:
+    """Sanitizes prompt and response fields by redacting credentials and clamping length."""
+
+    if value is None:
+        return None
+    normalized = value.strip()
+    if not normalized:
+        return None
+    redacted = _redact_sensitive_text(normalized)
+    return _truncate(redacted, settings.processing_log_max_text_chars)
+
+
 class ProcessingLogEntry(Base):
     """Stores a timestamped processing event with optional model prompt and response text."""
 
@@ -31,3 +138,17 @@ class ProcessingLogEntry(Base):
     prompt_text: Mapped[str | None] = mapped_column(Text, nullable=True)
     response_text: Mapped[str | None] = mapped_column(Text, nullable=True)
     payload_json: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
+
+    @validates("prompt_text", "response_text")
+    def _validate_text_fields(self, key: str, value: str | None) -> str | None:
+        """Redacts and bounds free-text log fields before persistence."""
+
+        return sanitize_processing_log_text(value)
+
+    @validates("payload_json")
+    def _validate_payload_json(self, key: str, value: dict[str, Any] | None) -> dict[str, Any]:
+        """Redacts and bounds structured payload fields before persistence."""
+
+        if not isinstance(value, dict):
+            return {}
+        return sanitize_processing_log_payload_value(value)
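Two of the redaction patterns from the hunk above are enough to demonstrate the pass: each compiled pattern is substituted in turn, replacing token-like spans with a stable marker. This is a reduced sketch (only two patterns, simplified names), not the full production set:

```python
import re

# Reduced subset of the sensitive-text patterns shown in the diff above.
PATTERNS = (
    re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/\-]+=*"),
    re.compile(r"(?i)\bsk-[a-z0-9]{16,}\b"),
)

def redact(text: str) -> str:
    """Replaces token-like substrings with a stable [REDACTED] marker."""
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("header was Bearer abc123def456token"))   # token replaced, context kept
print(redact("calling with sk-aaaaaaaaaaaaaaaa1111"))  # provider-style key replaced
```

Substituting a constant marker (rather than dropping the field) keeps the surrounding log context useful for debugging while removing the credential itself.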
@@ -1,13 +1,16 @@
 """Pydantic schemas for processing pipeline log API payloads."""
 
 from datetime import datetime
+from typing import Any
 from uuid import UUID
 
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator
+
+from app.models.processing_log import sanitize_processing_log_payload_value, sanitize_processing_log_text
 
 
 class ProcessingLogEntryResponse(BaseModel):
-    """Represents one persisted processing log event returned by API endpoints."""
+    """Represents one persisted processing log event with already-redacted sensitive fields."""
 
     id: int
     created_at: datetime
@@ -20,7 +23,26 @@ class ProcessingLogEntryResponse(BaseModel):
     model_name: str | None
     prompt_text: str | None
     response_text: str | None
-    payload_json: dict
+    payload_json: dict[str, Any]
+
+    @field_validator("prompt_text", "response_text", mode="before")
+    @classmethod
+    def _sanitize_text_fields(cls, value: Any) -> str | None:
+        """Ensures log text fields are redacted in API responses."""
+
+        if value is None:
+            return None
+        return sanitize_processing_log_text(str(value))
+
+    @field_validator("payload_json", mode="before")
+    @classmethod
+    def _sanitize_payload_field(cls, value: Any) -> dict[str, Any]:
+        """Ensures payload fields are redacted in API responses."""
+
+        if not isinstance(value, dict):
+            return {}
+        sanitized = sanitize_processing_log_payload_value(value)
+        return sanitized if isinstance(sanitized, dict) else {}
 
     class Config:
         """Enables ORM object parsing for SQLAlchemy model instances."""
@@ -5,12 +5,16 @@ import re
 from pathlib import Path
 from typing import Any
 
-from app.core.config import get_settings
+from app.core.config import get_settings, normalize_and_validate_provider_base_url
 
 
 settings = get_settings()
 
+
+class AppSettingsValidationError(ValueError):
+    """Raised when user-provided settings values fail security or contract validation."""
+
+
 TASK_OCR_HANDWRITING = "ocr_handwriting"
 TASK_SUMMARY_GENERATION = "summary_generation"
 TASK_ROUTING_CLASSIFICATION = "routing_classification"
@@ -156,13 +160,13 @@ def _clamp_cards_per_page(value: int) -> int:
 def _clamp_processing_log_document_sessions(value: int) -> int:
     """Clamps the number of recent document log sessions kept during cleanup."""
 
-    return max(0, min(20, value))
+    return max(0, min(settings.processing_log_max_document_sessions, value))
 
 
 def _clamp_processing_log_unbound_entries(value: int) -> int:
     """Clamps retained unbound processing log events kept during cleanup."""
 
-    return max(0, min(400, value))
+    return max(0, min(settings.processing_log_max_unbound_entries, value))
 
 
 def _clamp_predefined_entries_limit(value: int) -> int:
@@ -242,12 +246,19 @@ def _normalize_provider(
     api_key_value = payload.get("api_key", fallback_values.get("api_key", defaults["api_key"]))
     api_key = str(api_key_value).strip() if api_key_value is not None else ""
+
+    raw_base_url = str(payload.get("base_url", fallback_values.get("base_url", defaults["base_url"]))).strip()
+    if not raw_base_url:
+        raw_base_url = str(defaults["base_url"]).strip()
+    try:
+        normalized_base_url = normalize_and_validate_provider_base_url(raw_base_url)
+    except ValueError as error:
+        raise AppSettingsValidationError(str(error)) from error
+
     return {
         "id": provider_id,
         "label": str(payload.get("label", fallback_values.get("label", provider_id))).strip() or provider_id,
         "provider_type": provider_type,
-        "base_url": str(payload.get("base_url", fallback_values.get("base_url", defaults["base_url"]))).strip()
-        or defaults["base_url"],
+        "base_url": normalized_base_url,
         "timeout_seconds": _clamp_timeout(
             _safe_int(
                 payload.get("timeout_seconds", fallback_values.get("timeout_seconds", defaults["timeout_seconds"])),
@@ -576,7 +587,7 @@ def _normalize_handwriting_style_settings(payload: dict[str, Any], defaults: dic
 
 
 def _sanitize_settings(payload: dict[str, Any]) -> dict[str, Any]:
-    """Sanitizes all persisted settings into a stable normalized structure."""
+    """Sanitizes persisted settings into a stable structure while tolerating corrupt provider rows."""
 
     if not isinstance(payload, dict):
         payload = {}
@@ -592,7 +603,14 @@ def _sanitize_settings(payload: dict[str, Any]) -> dict[str, Any]:
         if not isinstance(provider_payload, dict):
             continue
         fallback = defaults["providers"][0]
-        candidate = _normalize_provider(provider_payload, fallback_id=f"provider-{index + 1}", fallback_values=fallback)
+        try:
+            candidate = _normalize_provider(
+                provider_payload,
+                fallback_id=f"provider-{index + 1}",
+                fallback_values=fallback,
+            )
+        except AppSettingsValidationError:
+            continue
         if candidate["id"] in seen_provider_ids:
             continue
         seen_provider_ids.add(candidate["id"])
@@ -300,16 +300,39 @@ def extract_text_content(filename: str, data: bytes, mime_type: str) -> Extracti
 
 
 def extract_archive_members(data: bytes, depth: int = 0) -> list[ArchiveMember]:
-    """Extracts processable members from zip archives with configurable depth limits."""
+    """Extracts processable ZIP members within configured decompression safety budgets."""
 
     members: list[ArchiveMember] = []
     if depth > settings.max_zip_depth:
         return members
+
+    total_uncompressed_bytes = 0
+    try:
     with zipfile.ZipFile(io.BytesIO(data)) as archive:
         infos = [info for info in archive.infolist() if not info.is_dir()][: settings.max_zip_members]
         for info in infos:
-            member_data = archive.read(info.filename)
+            if info.file_size <= 0:
+                continue
+            if info.file_size > settings.max_zip_member_uncompressed_bytes:
+                continue
+            if total_uncompressed_bytes + info.file_size > settings.max_zip_total_uncompressed_bytes:
+                continue
+
+            compressed_size = max(1, int(info.compress_size))
+            compression_ratio = float(info.file_size) / float(compressed_size)
+            if compression_ratio > settings.max_zip_compression_ratio:
+                continue
+
+            with archive.open(info, mode="r") as archive_member:
+                member_data = archive_member.read(settings.max_zip_member_uncompressed_bytes + 1)
+            if len(member_data) > settings.max_zip_member_uncompressed_bytes:
+                continue
+            if total_uncompressed_bytes + len(member_data) > settings.max_zip_total_uncompressed_bytes:
+                continue
+
+            total_uncompressed_bytes += len(member_data)
             members.append(ArchiveMember(name=info.filename, data=member_data))
+    except zipfile.BadZipFile:
+        return []
+
     return members
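The per-member admission logic above — size caps, a running total budget, and a compression-ratio ceiling against zip bombs — can be condensed into one predicate. The limit values below are illustrative assumptions; the real code reads them from settings:

```python
def passes_zip_budgets(
    file_size: int,
    compress_size: int,
    total_so_far: int,
    *,
    member_limit: int = 50_000_000,   # assumed per-member cap
    total_limit: int = 200_000_000,   # assumed archive-wide cap
    max_ratio: float = 100.0,         # assumed compression-ratio ceiling
) -> bool:
    """Mirrors the per-member checks: declared size caps plus a ratio guard."""
    if file_size <= 0 or file_size > member_limit:
        return False
    if total_so_far + file_size > total_limit:
        return False
    ratio = float(file_size) / float(max(1, compress_size))
    return ratio <= max_ratio

print(passes_zip_budgets(1_000_000, 500_000, 0))     # True: ordinary 2:1 member
print(passes_zip_budgets(1_000_000_000, 10_000, 0))  # False: over the member cap
print(passes_zip_budgets(5_000_000, 100, 0))         # False: 50000:1 bomb ratio
```

Note the diff checks `ZipInfo.file_size` first, then re-checks the actual bytes read, since a crafted central directory can understate the declared size.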
@@ -2,10 +2,10 @@
 
 from dataclasses import dataclass
 from typing import Any
-from urllib.parse import urlparse, urlunparse
 
 from openai import APIConnectionError, APIError, APITimeoutError, OpenAI
 
+from app.core.config import normalize_and_validate_provider_base_url
 from app.services.app_settings import read_task_runtime_settings
 
@@ -36,18 +36,9 @@ class ModelTaskRuntime:
 
 
 def _normalize_base_url(raw_value: str) -> str:
-    """Normalizes provider base URL and appends /v1 for OpenAI-compatible servers."""
-
-    trimmed = raw_value.strip().rstrip("/")
-    if not trimmed:
-        return "https://api.openai.com/v1"
-    parsed = urlparse(trimmed)
-    path = parsed.path or ""
-    if not path.endswith("/v1"):
-        path = f"{path}/v1" if path else "/v1"
-    return urlunparse(parsed._replace(path=path))
+    """Normalizes provider base URL and enforces SSRF protections before outbound calls."""
+
+    return normalize_and_validate_provider_base_url(raw_value, resolve_dns=True)
 
 
 def _should_fallback_to_chat(error: Exception) -> bool:
@@ -137,11 +128,16 @@ def resolve_task_runtime(task_name: str) -> ModelTaskRuntime:
     if provider_type != "openai_compatible":
         raise ModelTaskError(f"unsupported_provider_type:{provider_type}")
+
+    try:
+        normalized_base_url = _normalize_base_url(str(provider_payload.get("base_url", "https://api.openai.com/v1")))
+    except ValueError as error:
+        raise ModelTaskError(f"invalid_provider_base_url:{error}") from error
+
     return ModelTaskRuntime(
         task_name=task_name,
         provider_id=str(provider_payload.get("id", "")),
         provider_type=provider_type,
-        base_url=_normalize_base_url(str(provider_payload.get("base_url", "https://api.openai.com/v1"))),
+        base_url=normalized_base_url,
         timeout_seconds=int(provider_payload.get("timeout_seconds", 45)),
         api_key=str(provider_payload.get("api_key", "")).strip() or "no-key-required",
         model=str(task_payload.get("model", "")).strip(),
149
backend/tests/test_app_settings_provider_resilience.py
Normal file
149
backend/tests/test_app_settings_provider_resilience.py
Normal file
@@ -0,0 +1,149 @@
"""Unit coverage for resilient provider sanitization in persisted app settings."""

from __future__ import annotations

import sys
import unittest
from pathlib import Path
from types import ModuleType
from typing import Any
from unittest.mock import patch


BACKEND_ROOT = Path(__file__).resolve().parents[1]
if str(BACKEND_ROOT) not in sys.path:
    sys.path.insert(0, str(BACKEND_ROOT))

if "pydantic_settings" not in sys.modules:
    pydantic_settings_stub = ModuleType("pydantic_settings")

    class _BaseSettings:
        """Minimal BaseSettings replacement for dependency-light unit test execution."""

        def __init__(self, **kwargs: object) -> None:
            for key, value in kwargs.items():
                setattr(self, key, value)

    def _settings_config_dict(**kwargs: object) -> dict[str, object]:
        """Returns configuration values using dict semantics expected by settings module."""

        return kwargs

    pydantic_settings_stub.BaseSettings = _BaseSettings
    pydantic_settings_stub.SettingsConfigDict = _settings_config_dict
    sys.modules["pydantic_settings"] = pydantic_settings_stub

from app.services import app_settings


def _sample_current_payload() -> dict[str, Any]:
    """Builds a sanitized payload used as in-memory persistence fixture for update tests."""

    return app_settings._sanitize_settings(app_settings._default_settings())


class AppSettingsProviderResilienceTests(unittest.TestCase):
    """Verifies read-path resilience for corrupt persisted providers without weakening writes."""

    def test_sanitize_settings_skips_invalid_persisted_provider_entries(self) -> None:
        """Invalid persisted providers are skipped and tasks rebind to remaining valid providers."""

        payload = {
            "providers": [
                {
                    "id": "insecure-provider",
                    "label": "Insecure Provider",
                    "provider_type": "openai_compatible",
                    "base_url": "http://api.openai.com/v1",
                    "timeout_seconds": 45,
                    "api_key": "",
                },
                {
                    "id": "secure-provider",
                    "label": "Secure Provider",
                    "provider_type": "openai_compatible",
                    "base_url": "https://api.openai.com/v1",
                    "timeout_seconds": 45,
                    "api_key": "",
                },
            ],
            "tasks": {
                app_settings.TASK_OCR_HANDWRITING: {"provider_id": "insecure-provider"},
                app_settings.TASK_SUMMARY_GENERATION: {"provider_id": "insecure-provider"},
                app_settings.TASK_ROUTING_CLASSIFICATION: {"provider_id": "insecure-provider"},
            },
        }

        sanitized = app_settings._sanitize_settings(payload)
        self.assertEqual([provider["id"] for provider in sanitized["providers"]], ["secure-provider"])
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_OCR_HANDWRITING]["provider_id"],
            "secure-provider",
        )
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_SUMMARY_GENERATION]["provider_id"],
            "secure-provider",
        )
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_ROUTING_CLASSIFICATION]["provider_id"],
            "secure-provider",
        )

    def test_sanitize_settings_uses_default_provider_when_all_persisted_entries_are_invalid(self) -> None:
        """Default provider is restored when all persisted provider rows are invalid."""

        payload = {
            "providers": [
                {
                    "id": "insecure-provider",
                    "label": "Insecure Provider",
                    "provider_type": "openai_compatible",
                    "base_url": "http://api.openai.com/v1",
                    "timeout_seconds": 45,
                    "api_key": "",
                }
            ]
        }

        sanitized = app_settings._sanitize_settings(payload)
        defaults = app_settings._default_settings()
        default_provider_id = defaults["providers"][0]["id"]
        self.assertEqual(sanitized["providers"][0]["id"], default_provider_id)
        self.assertEqual(sanitized["providers"][0]["base_url"], defaults["providers"][0]["base_url"])
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_OCR_HANDWRITING]["provider_id"],
            default_provider_id,
        )
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_SUMMARY_GENERATION]["provider_id"],
            default_provider_id,
        )
        self.assertEqual(
            sanitized["tasks"][app_settings.TASK_ROUTING_CLASSIFICATION]["provider_id"],
            default_provider_id,
        )

    def test_update_app_settings_keeps_provider_base_url_validation_strict(self) -> None:
        """Provider write updates still reject invalid base URLs instead of silently sanitizing."""

        current_payload = _sample_current_payload()
        current_provider = current_payload["providers"][0]
        provider_update = {
            "id": current_provider["id"],
            "label": current_provider["label"],
            "provider_type": current_provider["provider_type"],
            "base_url": "http://api.openai.com/v1",
            "timeout_seconds": current_provider["timeout_seconds"],
        }

        with (
            patch.object(app_settings, "_read_raw_settings", return_value=current_payload),
            patch.object(app_settings, "_write_settings") as write_settings_mock,
        ):
            with self.assertRaises(app_settings.AppSettingsValidationError):
                app_settings.update_app_settings(providers=[provider_update])
            write_settings_mock.assert_not_called()


if __name__ == "__main__":
    unittest.main()
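The read-path resilience these tests pin down can be sketched as a standalone function. This is a hypothetical simplification for illustration only — the real logic lives in `app.services.app_settings._sanitize_settings` and handles more fields (labels, timeouts, API keys) than shown here; the `DEFAULT_PROVIDER` constant and function name are invented for this sketch.

```python
# Hypothetical distillation of the read-path sanitization the tests above
# exercise: drop providers with invalid (non-HTTPS) base URLs, fall back to a
# known-good default when nothing survives, and rebind task assignments that
# point at a dropped provider. Not the actual app_settings implementation.
DEFAULT_PROVIDER = {"id": "default-provider", "base_url": "https://api.openai.com/v1"}


def sanitize_settings(payload: dict) -> dict:
    """Return a payload whose providers are valid and whose tasks reference them."""
    valid = [
        p for p in payload.get("providers", [])
        if str(p.get("base_url", "")).startswith("https://")
    ]
    if not valid:
        # Fail open on reads (restore a usable default) so a corrupt settings
        # file cannot brick the app; writes stay strict and raise instead.
        valid = [dict(DEFAULT_PROVIDER)]
    valid_ids = {p["id"] for p in valid}
    fallback = valid[0]["id"]
    tasks = {
        task: {
            "provider_id": cfg.get("provider_id")
            if cfg.get("provider_id") in valid_ids
            else fallback
        }
        for task, cfg in payload.get("tasks", {}).items()
    }
    return {"providers": valid, "tasks": tasks}
```

The asymmetry is the point of the third test: reads sanitize silently, while `update_app_settings` rejects the same invalid data with a validation error.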
341
backend/tests/test_security_controls.py
Normal file
@@ -0,0 +1,341 @@
"""Unit coverage for API auth, SSRF validation, and processing-log redaction controls."""

from __future__ import annotations

from datetime import UTC, datetime
import socket
import sys
from pathlib import Path
from types import ModuleType, SimpleNamespace
import unittest
from unittest.mock import patch


BACKEND_ROOT = Path(__file__).resolve().parents[1]
if str(BACKEND_ROOT) not in sys.path:
    sys.path.insert(0, str(BACKEND_ROOT))

if "pydantic_settings" not in sys.modules:
    pydantic_settings_stub = ModuleType("pydantic_settings")

    class _BaseSettings:
        """Minimal BaseSettings replacement for dependency-light unit test execution."""

        def __init__(self, **kwargs: object) -> None:
            for key, value in kwargs.items():
                setattr(self, key, value)

    def _settings_config_dict(**kwargs: object) -> dict[str, object]:
        """Returns configuration values using dict semantics expected by settings module."""

        return kwargs

    pydantic_settings_stub.BaseSettings = _BaseSettings
    pydantic_settings_stub.SettingsConfigDict = _settings_config_dict
    sys.modules["pydantic_settings"] = pydantic_settings_stub

if "fastapi" not in sys.modules:
    fastapi_stub = ModuleType("fastapi")

    class _HTTPException(Exception):
        """Minimal HTTPException compatible with route dependency tests."""

        def __init__(self, status_code: int, detail: str, headers: dict[str, str] | None = None) -> None:
            super().__init__(detail)
            self.status_code = status_code
            self.detail = detail
            self.headers = headers or {}

    class _Status:
        """Minimal status namespace for auth unit tests."""

        HTTP_401_UNAUTHORIZED = 401
        HTTP_403_FORBIDDEN = 403
        HTTP_503_SERVICE_UNAVAILABLE = 503

    def _depends(dependency):  # type: ignore[no-untyped-def]
        """Returns provided dependency unchanged for unit testing."""

        return dependency

    fastapi_stub.Depends = _depends
    fastapi_stub.HTTPException = _HTTPException
    fastapi_stub.status = _Status()
    sys.modules["fastapi"] = fastapi_stub

if "fastapi.security" not in sys.modules:
    fastapi_security_stub = ModuleType("fastapi.security")

    class _HTTPAuthorizationCredentials:
        """Minimal bearer credential object used by auth dependency tests."""

        def __init__(self, *, scheme: str, credentials: str) -> None:
            self.scheme = scheme
            self.credentials = credentials

    class _HTTPBearer:
        """Minimal HTTPBearer stand-in for dependency construction."""

        def __init__(self, auto_error: bool = True) -> None:
            self.auto_error = auto_error

    fastapi_security_stub.HTTPAuthorizationCredentials = _HTTPAuthorizationCredentials
    fastapi_security_stub.HTTPBearer = _HTTPBearer
    sys.modules["fastapi.security"] = fastapi_security_stub

from fastapi import HTTPException
from fastapi.security import HTTPAuthorizationCredentials

from app.api.auth import AuthRole, get_request_role, require_admin
from app.core import config as config_module
from app.models.processing_log import sanitize_processing_log_payload_value, sanitize_processing_log_text
from app.schemas.processing_logs import ProcessingLogEntryResponse


def _security_settings(
    *,
    allowlist: list[str] | None = None,
    allow_http: bool = False,
    allow_private_network: bool = False,
) -> SimpleNamespace:
    """Builds lightweight settings object for provider URL validation tests."""

    return SimpleNamespace(
        provider_base_url_allowlist=allowlist if allowlist is not None else ["api.openai.com"],
        provider_base_url_allow_http=allow_http,
        provider_base_url_allow_private_network=allow_private_network,
    )


class AuthDependencyTests(unittest.TestCase):
    """Verifies token authentication and admin authorization behavior."""

    def test_get_request_role_accepts_admin_token(self) -> None:
        """Admin token resolves admin role."""

        settings = SimpleNamespace(admin_api_token="admin-token", user_api_token="user-token")
        credentials = HTTPAuthorizationCredentials(scheme="Bearer", credentials="admin-token")
        role = get_request_role(credentials=credentials, settings=settings)
        self.assertEqual(role, AuthRole.ADMIN)

    def test_get_request_role_rejects_missing_credentials(self) -> None:
        """Missing bearer credentials return 401."""

        settings = SimpleNamespace(admin_api_token="admin-token", user_api_token="user-token")
        with self.assertRaises(HTTPException) as context:
            get_request_role(credentials=None, settings=settings)
        self.assertEqual(context.exception.status_code, 401)

    def test_require_admin_rejects_user_role(self) -> None:
        """User role cannot access admin-only endpoints."""

        with self.assertRaises(HTTPException) as context:
            require_admin(role=AuthRole.USER)
        self.assertEqual(context.exception.status_code, 403)


class ProviderBaseUrlValidationTests(unittest.TestCase):
    """Verifies allowlist, scheme, and private-network SSRF protections."""

    def setUp(self) -> None:
        """Clears URL validation cache to keep tests independent."""

        config_module._normalize_and_validate_provider_base_url_cached.cache_clear()

    def test_validation_accepts_allowlisted_https_url(self) -> None:
        """Allowlisted HTTPS URLs are normalized with /v1 suffix."""

        with patch.object(config_module, "get_settings", return_value=_security_settings(allowlist=["api.openai.com"])):
            normalized = config_module.normalize_and_validate_provider_base_url("https://api.openai.com")
        self.assertEqual(normalized, "https://api.openai.com/v1")

    def test_validation_rejects_non_allowlisted_host(self) -> None:
        """Hosts outside configured allowlist are rejected."""

        with patch.object(config_module, "get_settings", return_value=_security_settings(allowlist=["api.openai.com"])):
            with self.assertRaises(ValueError):
                config_module.normalize_and_validate_provider_base_url("https://example.org/v1")

    def test_validation_rejects_private_ip_literal(self) -> None:
        """Private and loopback IP literals are blocked."""

        with patch.object(config_module, "get_settings", return_value=_security_settings(allowlist=[])):
            with self.assertRaises(ValueError):
                config_module.normalize_and_validate_provider_base_url("https://127.0.0.1/v1")

    def test_validation_rejects_private_ip_after_dns_resolution(self) -> None:
        """DNS rebind protection blocks public hostnames resolving to private addresses."""

        mocked_dns_response = [
            (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("127.0.0.1", 443)),
        ]
        with (
            patch.object(config_module, "get_settings", return_value=_security_settings(allowlist=["api.openai.com"])),
            patch.object(config_module.socket, "getaddrinfo", return_value=mocked_dns_response),
        ):
            with self.assertRaises(ValueError):
                config_module.normalize_and_validate_provider_base_url(
                    "https://api.openai.com/v1",
                    resolve_dns=True,
                )

    def test_resolve_dns_validation_revalidates_each_call(self) -> None:
        """DNS-resolved validation is not cached and re-checks host resolution each call."""

        mocked_dns_response = [
            (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", ("8.8.8.8", 443)),
        ]
        with (
            patch.object(config_module, "get_settings", return_value=_security_settings(allowlist=["api.openai.com"])),
            patch.object(config_module.socket, "getaddrinfo", return_value=mocked_dns_response) as getaddrinfo_mock,
        ):
            first = config_module.normalize_and_validate_provider_base_url(
                "https://api.openai.com/v1",
                resolve_dns=True,
            )
            second = config_module.normalize_and_validate_provider_base_url(
                "https://api.openai.com/v1",
                resolve_dns=True,
            )
        self.assertEqual(first, "https://api.openai.com/v1")
        self.assertEqual(second, "https://api.openai.com/v1")
        self.assertEqual(getaddrinfo_mock.call_count, 2)


class ProcessingLogRedactionTests(unittest.TestCase):
    """Verifies sensitive processing-log values are redacted for persistence and responses."""

    def test_payload_redacts_sensitive_keys(self) -> None:
        """Sensitive payload keys are replaced with redaction marker."""

        sanitized = sanitize_processing_log_payload_value(
            {
                "api_key": "secret-value",
                "nested": {
                    "authorization": "Bearer sample-token",
                },
            }
        )
        self.assertEqual(sanitized["api_key"], "[REDACTED]")
        self.assertEqual(sanitized["nested"]["authorization"], "[REDACTED]")

    def test_text_redaction_removes_bearer_and_jwt_values(self) -> None:
        """Bearer and JWT token substrings are fully removed from log text."""

        bearer_token = "super-secret-token-123"
        jwt_token = (
            "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
            "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4ifQ."
            "signaturevalue123456789"
        )
        sanitized = sanitize_processing_log_text(
            f"Authorization: Bearer {bearer_token}\nraw_jwt={jwt_token}"
        )
        self.assertIsNotNone(sanitized)
        sanitized_text = sanitized or ""
        self.assertIn("[REDACTED]", sanitized_text)
        self.assertNotIn(bearer_token, sanitized_text)
        self.assertNotIn(jwt_token, sanitized_text)

    def test_text_redaction_removes_json_formatted_secret_values(self) -> None:
        """JSON-formatted quoted secrets are fully removed from redacted log text."""

        api_key_secret = "json-api-key-secret"
        token_secret = "json-token-secret"
        authorization_secret = "json-auth-secret"
        bearer_secret = "json-bearer-secret"
        json_text = (
            "{"
            f"\"api_key\":\"{api_key_secret}\","
            f"\"token\":\"{token_secret}\","
            f"\"authorization\":\"Bearer {authorization_secret}\","
            f"\"bearer\":\"{bearer_secret}\""
            "}"
        )
        sanitized = sanitize_processing_log_text(json_text)
        self.assertIsNotNone(sanitized)
        sanitized_text = sanitized or ""
        self.assertIn("[REDACTED]", sanitized_text)
        self.assertNotIn(api_key_secret, sanitized_text)
        self.assertNotIn(token_secret, sanitized_text)
        self.assertNotIn(authorization_secret, sanitized_text)
        self.assertNotIn(bearer_secret, sanitized_text)

    def test_response_schema_applies_redaction_to_existing_entries(self) -> None:
        """API schema validators redact sensitive fields from legacy stored rows."""

        bearer_token = "abc123token"
        jwt_token = (
            "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
            "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4ifQ."
            "signaturevalue123456789"
        )
        response = ProcessingLogEntryResponse.model_validate(
            {
                "id": 1,
                "created_at": datetime.now(UTC),
                "level": "info",
                "stage": "summary",
                "event": "response",
                "document_id": None,
                "document_filename": "sample.txt",
                "provider_id": "provider",
                "model_name": "model",
                "prompt_text": f"Authorization: Bearer {bearer_token}",
                "response_text": f"token={jwt_token}",
                "payload_json": {"password": "secret", "trace_id": "trace-1"},
            }
        )
        self.assertEqual(response.payload_json["password"], "[REDACTED]")
        self.assertIn("[REDACTED]", response.prompt_text or "")
        self.assertIn("[REDACTED]", response.response_text or "")
        self.assertNotIn(bearer_token, response.prompt_text or "")
        self.assertNotIn(jwt_token, response.response_text or "")

    def test_response_schema_redacts_json_formatted_secret_values(self) -> None:
        """Response schema redacts quoted JSON secret forms from legacy text fields."""

        api_key_secret = "legacy-json-api-key"
        token_secret = "legacy-json-token"
        authorization_secret = "legacy-json-auth"
        bearer_secret = "legacy-json-bearer"
        prompt_text = (
            "{"
            f"\"api_key\":\"{api_key_secret}\","
            f"\"token\":\"{token_secret}\""
            "}"
        )
        response_text = (
            "{"
            f"\"authorization\":\"Bearer {authorization_secret}\","
            f"\"bearer\":\"{bearer_secret}\""
            "}"
        )

        response = ProcessingLogEntryResponse.model_validate(
            {
                "id": 2,
                "created_at": datetime.now(UTC),
                "level": "info",
                "stage": "summary",
                "event": "response",
                "document_id": None,
                "document_filename": "sample-json.txt",
                "provider_id": "provider",
                "model_name": "model",
                "prompt_text": prompt_text,
                "response_text": response_text,
                "payload_json": {"trace_id": "trace-2"},
            }
        )

        self.assertIn("[REDACTED]", response.prompt_text or "")
        self.assertIn("[REDACTED]", response.response_text or "")
        self.assertNotIn(api_key_secret, response.prompt_text or "")
        self.assertNotIn(token_secret, response.prompt_text or "")
        self.assertNotIn(authorization_secret, response.response_text or "")
        self.assertNotIn(bearer_secret, response.response_text or "")


if __name__ == "__main__":
    unittest.main()
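The base-URL checks these tests exercise (HTTPS-only scheme, host allowlist, private-IP literal rejection, DNS-rebind re-resolution, `/v1` normalization) can be sketched as one standalone validator. This is a minimal sketch, not the implementation in `app.core.config` — the function name, signature, and omission of port/IPv6-bracket handling are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_provider_base_url(url: str, allowlist: set[str], resolve_dns: bool = False) -> str:
    """Reject non-HTTPS schemes, non-allowlisted hosts, and private/loopback targets."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("provider base URL must use https")
    host = parts.hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal
    if ip is not None:
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            raise ValueError("IP literal targets a private network")
    elif allowlist and host not in allowlist:
        raise ValueError(f"host {host!r} is not allowlisted")
    if resolve_dns and ip is None:
        # DNS-rebind guard: re-resolve on every call (no caching) and reject
        # public hostnames that currently resolve to private addresses.
        for *_family_type_proto, sockaddr in socket.getaddrinfo(
            host, parts.port or 443, proto=socket.IPPROTO_TCP
        ):
            resolved = ipaddress.ip_address(sockaddr[0])
            if resolved.is_private or resolved.is_loopback:
                raise ValueError("hostname resolves to a private address")
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path = f"{path}/v1"
    return f"https://{host}{path}"
```

The `resolve_dns` split mirrors the caching behavior the last test asserts: purely syntactic validation can be cached, but resolution-based validation must re-run per call or a rebinding host would be checked only once.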
270
backend/tests/test_upload_request_size_middleware.py
Normal file
@@ -0,0 +1,270 @@
"""Regression tests for upload request-size middleware scope and preflight handling."""

from __future__ import annotations

import importlib
import sys
import unittest
from pathlib import Path
from types import ModuleType, SimpleNamespace
from typing import Any, Awaitable, Callable


BACKEND_ROOT = Path(__file__).resolve().parents[1]
if str(BACKEND_ROOT) not in sys.path:
    sys.path.insert(0, str(BACKEND_ROOT))


def _install_main_import_stubs() -> dict[str, ModuleType | None]:
    """Installs lightweight module stubs required for importing app.main in isolation."""

    previous_modules: dict[str, ModuleType | None] = {
        name: sys.modules.get(name)
        for name in [
            "fastapi",
            "fastapi.middleware",
            "fastapi.middleware.cors",
            "fastapi.responses",
            "app.api.router",
            "app.core.config",
            "app.db.base",
            "app.services.app_settings",
            "app.services.handwriting_style",
            "app.services.storage",
            "app.services.typesense_index",
        ]
    }

    fastapi_stub = ModuleType("fastapi")

    class _Response:
        """Minimal response base class for middleware typing compatibility."""

    class _FastAPI:
        """Captures middleware registration behavior used by app.main tests."""

        def __init__(self, *_args: object, **_kwargs: object) -> None:
            self.http_middlewares: list[Any] = []

        def add_middleware(self, *_args: object, **_kwargs: object) -> None:
            """Accepts middleware registrations without side effects."""

        def include_router(self, *_args: object, **_kwargs: object) -> None:
            """Accepts router registration without side effects."""

        def middleware(
            self,
            middleware_type: str,
        ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
            """Registers request middleware functions for later invocation in tests."""

            def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
                if middleware_type == "http":
                    self.http_middlewares.append(func)
                return func

            return decorator

        def on_event(
            self,
            *_args: object,
            **_kwargs: object,
        ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
            """Returns no-op startup and shutdown decorators."""

            def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
                return func

            return decorator

    fastapi_stub.FastAPI = _FastAPI
    fastapi_stub.Request = object
    fastapi_stub.Response = _Response
    sys.modules["fastapi"] = fastapi_stub

    fastapi_middleware_stub = ModuleType("fastapi.middleware")
    sys.modules["fastapi.middleware"] = fastapi_middleware_stub

    fastapi_middleware_cors_stub = ModuleType("fastapi.middleware.cors")

    class _CORSMiddleware:
        """Placeholder CORS middleware class accepted by FastAPI.add_middleware."""

    fastapi_middleware_cors_stub.CORSMiddleware = _CORSMiddleware
    sys.modules["fastapi.middleware.cors"] = fastapi_middleware_cors_stub

    fastapi_responses_stub = ModuleType("fastapi.responses")

    class _JSONResponse:
        """Simple JSONResponse stand-in exposing status code and payload fields."""

        def __init__(self, *, status_code: int, content: dict[str, Any]) -> None:
            self.status_code = status_code
            self.content = content

    fastapi_responses_stub.JSONResponse = _JSONResponse
    sys.modules["fastapi.responses"] = fastapi_responses_stub

    api_router_stub = ModuleType("app.api.router")
    api_router_stub.api_router = object()
    sys.modules["app.api.router"] = api_router_stub

    config_stub = ModuleType("app.core.config")

    def get_settings() -> SimpleNamespace:
        """Returns minimal settings consumed by app.main during test import."""

        return SimpleNamespace(
            cors_origins=["http://localhost:5173"],
            max_upload_request_size_bytes=1024,
        )

    config_stub.get_settings = get_settings
    sys.modules["app.core.config"] = config_stub

    db_base_stub = ModuleType("app.db.base")

    def init_db() -> None:
        """No-op database initializer for middleware scope tests."""

    db_base_stub.init_db = init_db
    sys.modules["app.db.base"] = db_base_stub

    app_settings_stub = ModuleType("app.services.app_settings")

    def ensure_app_settings() -> None:
        """No-op settings initializer for middleware scope tests."""

    app_settings_stub.ensure_app_settings = ensure_app_settings
    sys.modules["app.services.app_settings"] = app_settings_stub

    handwriting_style_stub = ModuleType("app.services.handwriting_style")

    def ensure_handwriting_style_collection() -> None:
        """No-op handwriting collection initializer for middleware scope tests."""

    handwriting_style_stub.ensure_handwriting_style_collection = ensure_handwriting_style_collection
    sys.modules["app.services.handwriting_style"] = handwriting_style_stub

    storage_stub = ModuleType("app.services.storage")

    def ensure_storage() -> None:
        """No-op storage initializer for middleware scope tests."""

    storage_stub.ensure_storage = ensure_storage
    sys.modules["app.services.storage"] = storage_stub

    typesense_stub = ModuleType("app.services.typesense_index")

    def ensure_typesense_collection() -> None:
        """No-op Typesense collection initializer for middleware scope tests."""

    typesense_stub.ensure_typesense_collection = ensure_typesense_collection
    sys.modules["app.services.typesense_index"] = typesense_stub

    return previous_modules


def _restore_main_import_stubs(previous_modules: dict[str, ModuleType | None]) -> None:
    """Restores module table entries captured before installing app.main test stubs."""

    for module_name, previous in previous_modules.items():
        if previous is None:
            sys.modules.pop(module_name, None)
        else:
            sys.modules[module_name] = previous


class UploadRequestSizeMiddlewareTests(unittest.IsolatedAsyncioTestCase):
    """Verifies upload request-size middleware ignores preflight and guards only upload POST."""

    @classmethod
    def setUpClass(cls) -> None:
        """Installs import stubs and imports app.main once for middleware extraction."""

        cls._previous_modules = _install_main_import_stubs()
        cls.main_module = importlib.import_module("app.main")

    @classmethod
    def tearDownClass(cls) -> None:
        """Removes imported module and restores pre-existing module table entries."""

        sys.modules.pop("app.main", None)
        _restore_main_import_stubs(cls._previous_modules)

    def _http_middleware(
        self,
    ) -> Callable[[object, Callable[[object], Awaitable[object]]], Awaitable[object]]:
        """Returns the registered HTTP middleware callable from the stubbed FastAPI app."""

        return self.main_module.app.http_middlewares[0]

    async def test_options_preflight_skips_upload_content_length_guard(self) -> None:
        """OPTIONS preflight requests for upload endpoint continue without Content-Length enforcement."""

        request = SimpleNamespace(
            method="OPTIONS",
            url=SimpleNamespace(path="/api/v1/documents/upload"),
            headers={},
        )
        expected_response = object()
        call_next_count = 0

        async def call_next(_request: object) -> object:
            nonlocal call_next_count
            call_next_count += 1
            return expected_response

        response = await self._http_middleware()(request, call_next)

        self.assertIs(response, expected_response)
        self.assertEqual(call_next_count, 1)

    async def test_post_upload_without_content_length_is_rejected(self) -> None:
        """Upload POST requests remain blocked when Content-Length is absent."""

        request = SimpleNamespace(
            method="POST",
            url=SimpleNamespace(path="/api/v1/documents/upload"),
            headers={},
        )
        call_next_count = 0

        async def call_next(_request: object) -> object:
            nonlocal call_next_count
            call_next_count += 1
            return object()

        response = await self._http_middleware()(request, call_next)

        self.assertEqual(response.status_code, 411)
        self.assertEqual(
            response.content,
            {"detail": "Content-Length header is required for document uploads"},
        )
        self.assertEqual(call_next_count, 0)

    async def test_post_non_upload_path_skips_upload_content_length_guard(self) -> None:
        """Content-Length enforcement does not run for non-upload POST requests."""

        request = SimpleNamespace(
            method="POST",
            url=SimpleNamespace(path="/api/v1/documents"),
            headers={},
        )
        expected_response = object()
        call_next_count = 0

        async def call_next(_request: object) -> object:
            nonlocal call_next_count
            call_next_count += 1
            return expected_response

        response = await self._http_middleware()(request, call_next)

        self.assertIs(response, expected_response)
        self.assertEqual(call_next_count, 1)


if __name__ == "__main__":
    unittest.main()
@@ -6,7 +6,7 @@ This directory contains technical documentation for DMS.
 
 - `../README.md` - project overview, setup, and quick operations
 - `architecture-overview.md` - backend, frontend, and infrastructure architecture
-- `api-contract.md` - API endpoint contract grouped by route module, including settings and processing-log trim defaults
+- `api-contract.md` - API endpoint contract grouped by route module, including token auth roles, upload limits, and settings or processing-log security constraints
 - `data-model-reference.md` - database entity definitions and lifecycle states
-- `operations-and-configuration.md` - runtime operations, ports, volumes, and persisted settings configuration
+- `operations-and-configuration.md` - runtime operations, hardened compose defaults, security environment variables, and persisted settings configuration and read-sanitization behavior
-- `frontend-design-foundation.md` - frontend visual system, tokens, UI implementation rules, processing-log timeline behavior, and settings helper-copy guidance
+- `frontend-design-foundation.md` - frontend visual system, tokens, UI implementation rules, authenticated media delivery under API token auth, processing-log timeline behavior, and settings helper-copy guidance
```diff
@@ -10,6 +10,17 @@ Primary implementation modules:
 - `backend/app/api/routes_processing_logs.py`
 - `backend/app/api/routes_settings.py`
 
+## Authentication And Authorization
+
+- Protected endpoints require `Authorization: Bearer <token>`.
+- `ADMIN_API_TOKEN` is required for all privileged access and acts as a fail-closed root credential.
+- `USER_API_TOKEN` is optional and, when configured, grants access to document endpoints only.
+- Authorization matrix:
+  - `documents/*`: `admin` or `user`
+  - `search/*`: `admin` or `user`
+  - `settings/*`: `admin` only
+  - `processing/logs/*`: `admin` only
+
 ## Health
 
 - `GET /health`
@@ -18,6 +29,8 @@ Primary implementation modules:
 
 ## Documents
 
+- Access: admin or user token required
+
 ### Collection and metadata helpers
 
 - `GET /documents`
@@ -76,9 +89,14 @@ Primary implementation modules:
 - `ask`: returns `conflicts` if duplicate checksum is detected
 - `replace`: creates new document linked to replaced document id
 - `duplicate`: creates additional document record
+- upload `POST` request rejected with `411` when `Content-Length` is missing
+- `OPTIONS /documents/upload` CORS preflight bypasses upload `Content-Length` enforcement
+- request rejected with `413` when file count, per-file size, or total request size exceeds configured limits
 
 ## Search
 
+- Access: admin or user token required
+
 - `GET /search`
 - Query: `query` (min length 2), `offset`, `limit`, `include_trashed`, `only_trashed`, `path_filter`, `tag_filter`, `type_filter`, `processed_from`, `processed_to`
 - Response model: `SearchResponse`
@@ -86,23 +104,32 @@ Primary implementation modules:
 
 ## Processing Logs
 
+- Access: admin token required
+
 - `GET /processing/logs`
 - Query: `offset`, `limit`, `document_id`
 - Response model: `ProcessingLogListResponse`
+- `limit` is capped by runtime configuration
+- sensitive fields are redacted in API responses
 - `POST /processing/logs/trim`
 - Query: optional `keep_document_sessions`, `keep_unbound_entries`
 - Behavior: omitted query values fall back to persisted `/settings.processing_log_retention`
+- query values are capped by runtime retention limits
 - Response: trim counters
 - `POST /processing/logs/clear`
 - Response: clear counters
 
 ## Settings
 
+- Access: admin token required
+
 - `GET /settings`
 - Response model: `AppSettingsResponse`
+- persisted providers with invalid base URLs are ignored during read sanitization; response falls back to remaining valid providers or secure defaults
 - `PATCH /settings`
 - Body model: `AppSettingsUpdateRequest`
 - Response model: `AppSettingsResponse`
+- rejects invalid provider base URLs with `400` when scheme, allowlist, or network safety checks fail
 - `POST /settings/reset`
 - Response model: `AppSettingsResponse`
 - `PATCH /settings/handwriting`
```
```diff
@@ -49,6 +49,13 @@ Do not hardcode new palette or spacing values in component styles when a token a
 - Do not render queued headers before their animation starts, even when polling returns batched updates.
 - Preserve existing header content format and fold/unfold detail behavior as lines are revealed.
 
+## Authenticated Media Delivery
+
+- Document previews and thumbnails must load through authenticated fetch flows in `frontend/src/lib/api.ts`, then render via temporary object URLs.
+- Direct `window.open` calls for protected media endpoints are not allowed because browser navigation requests do not include the API token header.
+- Download actions for original files and markdown exports must use authenticated blob fetches plus controlled browser download triggers.
+- Revoke all temporary object URLs after replacement, unmount, or completion to prevent browser memory leaks.
+
 ## Extension Checklist
 
 When adding or redesigning a UI area:
```
````diff
@@ -3,12 +3,12 @@
 ## Runtime Services
 
 `docker-compose.yml` defines the runtime stack:
-- `db` (Postgres 16, port `5432`)
+- `db` (Postgres 16, localhost-bound port `5432`)
-- `redis` (Redis 7, port `6379`)
+- `redis` (Redis 7, localhost-bound port `6379`)
-- `typesense` (Typesense 29, port `8108`)
+- `typesense` (Typesense 29, localhost-bound port `8108`)
-- `api` (FastAPI backend, port `8000`)
+- `api` (FastAPI backend, localhost-bound port `8000`)
 - `worker` (RQ background worker)
-- `frontend` (Vite UI, port `5173`)
+- `frontend` (Vite UI, localhost-bound port `5173`)
 
 ## Named Volumes
@@ -44,6 +44,15 @@ Tail logs:
 docker compose logs -f
 ```
 
+Before running compose, provide explicit API tokens in your shell or project `.env` file:
+
+```bash
+export ADMIN_API_TOKEN="<random-admin-token>"
+export USER_API_TOKEN="<random-user-token>"
+```
+
+Compose now fails fast if either token variable is missing.
+
 ## Backend Configuration
 
 Settings source:
@@ -55,8 +64,13 @@ Key environment variables used by `api` and `worker` in compose:
 - `DATABASE_URL`
 - `REDIS_URL`
 - `STORAGE_ROOT`
+- `ADMIN_API_TOKEN`
+- `USER_API_TOKEN`
 - `PUBLIC_BASE_URL`
 - `CORS_ORIGINS` (API service)
+- `PROVIDER_BASE_URL_ALLOWLIST`
+- `PROVIDER_BASE_URL_ALLOW_HTTP`
+- `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK`
 - `TYPESENSE_PROTOCOL`
 - `TYPESENSE_HOST`
 - `TYPESENSE_PORT`
@@ -65,9 +79,17 @@ Key environment variables used by `api` and `worker` in compose:
 
 Selected defaults from `Settings` (`backend/app/core/config.py`):
 - `upload_chunk_size = 4194304`
+- `max_upload_files_per_request = 50`
+- `max_upload_file_size_bytes = 26214400`
+- `max_upload_request_size_bytes = 104857600`
 - `max_zip_members = 250`
 - `max_zip_depth = 2`
+- `max_zip_member_uncompressed_bytes = 26214400`
+- `max_zip_total_uncompressed_bytes = 157286400`
+- `max_zip_compression_ratio = 120.0`
 - `max_text_length = 500000`
+- `processing_log_max_document_sessions = 20`
+- `processing_log_max_unbound_entries = 400`
 - `default_openai_model = "gpt-4.1-mini"`
 - `default_openai_timeout_seconds = 45`
 - `default_summary_model = "gpt-4.1-mini"`
@@ -79,6 +101,15 @@ Selected defaults from `Settings` (`backend/app/core/config.py`):
 
 Frontend runtime API target:
 - `VITE_API_BASE` in `docker-compose.yml` frontend service
+- `VITE_API_TOKEN` in `docker-compose.yml` frontend service (defaults to `USER_API_TOKEN` in compose, override to `ADMIN_API_TOKEN` when admin-only routes are needed)
+
+Frontend API authentication behavior:
+- `frontend/src/lib/api.ts` adds `Authorization: Bearer <VITE_API_TOKEN>` for all API requests only when `VITE_API_TOKEN` is non-empty
+- requests are still sent without authorization when `VITE_API_TOKEN` is unset, which keeps unauthenticated endpoints such as `/api/v1/health` backward-compatible
+
+Frontend container runtime behavior:
+- the container runs as non-root `node`
+- `/app` is owned by `node` in `frontend/Dockerfile` so Vite can create runtime temp config files under `/app`
 
 Frontend local commands:
 
@@ -103,8 +134,30 @@ Settings include:
 - predefined paths and tags
 - handwriting-style clustering settings
 
+Read sanitization is resilient to corrupt persisted provider rows. If a persisted provider entry fails URL validation, the entry is skipped and defaults are used when no valid provider remains. This prevents unrelated read endpoints from failing due to stale invalid provider data.
+
 Retention settings are used by worker cleanup and by `POST /api/v1/processing/logs/trim` when trim query values are not provided.
 
+## Security Controls
+
+- Privileged APIs are token-gated with bearer auth:
+  - `documents` endpoints: user token or admin token
+  - `settings` and `processing/logs` endpoints: admin token only
+- Authentication fails closed when `ADMIN_API_TOKEN` is not configured.
+- Provider base URLs are validated on settings updates and before outbound model calls:
+  - allowlist enforcement (`PROVIDER_BASE_URL_ALLOWLIST`)
+  - scheme restrictions (`https` by default)
+  - local/private-network blocking and per-request DNS revalidation checks for outbound runtime calls
+- Upload and archive safety guards are enforced:
+  - `POST /api/v1/documents/upload` requires `Content-Length` and enforces file-count, per-file size, and total request size limits
+  - `OPTIONS /api/v1/documents/upload` CORS preflight is excluded from `Content-Length` enforcement
+  - ZIP member count, per-member uncompressed size, total decompressed size, and compression-ratio guards
+- Processing logs redact sensitive payload and text fields, and trim endpoints enforce retention caps from runtime config.
+- Compose hardening defaults:
+  - host ports bind to `127.0.0.1` unless `HOST_BIND_IP` override is set
+  - `api`, `worker`, and `frontend` drop all Linux capabilities and set `no-new-privileges`
+  - backend and frontend containers run as non-root users by default
+
 ## Validation Checklist
 
 After operational or configuration changes, verify:
````
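The provider base URL checks described above (scheme restriction, allowlist, private-network blocking) can be sketched roughly as follows. This is a simplified illustration with assumed names; unlike the documented behavior, it resolves DNS once at validation time rather than revalidating per outbound request.

```python
# Rough sketch of provider base URL validation: scheme, allowlist, and
# private-network blocking. Names and structure are assumptions, and DNS
# is checked only once here rather than per request.
import ipaddress
import socket
from urllib.parse import urlparse


def validate_provider_base_url(
    url: str,
    allowlist: set[str],
    allow_http: bool = False,
    allow_private_network: bool = False,
) -> bool:
    parsed = urlparse(url)
    allowed_schemes = {"https", "http"} if allow_http else {"https"}
    if parsed.scheme not in allowed_schemes:
        return False
    host = parsed.hostname or ""
    if host not in allowlist:
        return False
    if not allow_private_network:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return False  # unresolvable hosts are rejected
        for info in infos:
            addr = ipaddress.ip_address(info[4][0])
            if addr.is_private or addr.is_loopback or addr.is_link_local:
                return False
    return True
```

Rejecting loopback and private ranges is what closes the SSRF path flagged in the audit report: an attacker-supplied base URL cannot redirect model calls at internal services.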
```diff
@@ -6,7 +6,7 @@ services:
       POSTGRES_PASSWORD: dcm
       POSTGRES_DB: dcm
     ports:
-      - "5432:5432"
+      - "${HOST_BIND_IP:-127.0.0.1}:5432:5432"
     volumes:
       - db-data:/var/lib/postgresql/data
     healthcheck:
@@ -18,7 +18,7 @@ services:
   redis:
     image: redis:7-alpine
     ports:
-      - "6379:6379"
+      - "${HOST_BIND_IP:-127.0.0.1}:6379:6379"
    volumes:
      - redis-data:/data
 
@@ -29,7 +29,7 @@ services:
       - "--api-key=dcm-typesense-key"
       - "--enable-cors"
     ports:
-      - "8108:8108"
+      - "${HOST_BIND_IP:-127.0.0.1}:8108:8108"
     volumes:
       - typesense-data:/data
 
@@ -41,16 +41,25 @@ services:
       DATABASE_URL: postgresql+psycopg://dcm:dcm@db:5432/dcm
       REDIS_URL: redis://redis:6379/0
       STORAGE_ROOT: /data/storage
+      ADMIN_API_TOKEN: ${ADMIN_API_TOKEN:?ADMIN_API_TOKEN must be set}
+      USER_API_TOKEN: ${USER_API_TOKEN:?USER_API_TOKEN must be set}
+      PROVIDER_BASE_URL_ALLOWLIST: '${PROVIDER_BASE_URL_ALLOWLIST:-["api.openai.com"]}'
+      PROVIDER_BASE_URL_ALLOW_HTTP: ${PROVIDER_BASE_URL_ALLOW_HTTP:-false}
+      PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK: ${PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK:-false}
       OCR_LANGUAGES: eng,deu
-      PUBLIC_BASE_URL: http://192.168.2.5:8000
+      PUBLIC_BASE_URL: ${PUBLIC_BASE_URL:-http://localhost:8000}
-      CORS_ORIGINS: '["http://localhost:5173","http://localhost:3000","http://192.168.2.5:5173"]'
+      CORS_ORIGINS: '${CORS_ORIGINS:-["http://localhost:5173","http://localhost:3000"]}'
       TYPESENSE_PROTOCOL: http
       TYPESENSE_HOST: typesense
       TYPESENSE_PORT: 8108
       TYPESENSE_API_KEY: dcm-typesense-key
       TYPESENSE_COLLECTION_NAME: documents
     ports:
-      - "8000:8000"
+      - "${HOST_BIND_IP:-127.0.0.1}:8000:8000"
+    security_opt:
+      - no-new-privileges:true
+    cap_drop:
+      - ALL
     volumes:
       - ./backend/app:/app/app
       - dcm-storage:/data
@@ -71,6 +80,11 @@ services:
       DATABASE_URL: postgresql+psycopg://dcm:dcm@db:5432/dcm
       REDIS_URL: redis://redis:6379/0
       STORAGE_ROOT: /data/storage
+      ADMIN_API_TOKEN: ${ADMIN_API_TOKEN:?ADMIN_API_TOKEN must be set}
+      USER_API_TOKEN: ${USER_API_TOKEN:?USER_API_TOKEN must be set}
+      PROVIDER_BASE_URL_ALLOWLIST: '${PROVIDER_BASE_URL_ALLOWLIST:-["api.openai.com"]}'
+      PROVIDER_BASE_URL_ALLOW_HTTP: ${PROVIDER_BASE_URL_ALLOW_HTTP:-false}
+      PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK: ${PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK:-false}
       OCR_LANGUAGES: eng,deu
       PUBLIC_BASE_URL: http://localhost:8000
       TYPESENSE_PROTOCOL: http
@@ -81,6 +95,10 @@ services:
     volumes:
       - ./backend/app:/app/app
       - dcm-storage:/data
+    security_opt:
+      - no-new-privileges:true
+    cap_drop:
+      - ALL
     depends_on:
       db:
         condition: service_healthy
@@ -93,9 +111,10 @@ services:
     build:
       context: ./frontend
     environment:
-      VITE_API_BASE: http://192.168.2.5:8000/api/v1
+      VITE_API_BASE: ${VITE_API_BASE:-http://localhost:8000/api/v1}
+      VITE_API_TOKEN: ${VITE_API_TOKEN:-${USER_API_TOKEN:-}}
     ports:
-      - "5173:5173"
+      - "${HOST_BIND_IP:-127.0.0.1}:5173:5173"
     volumes:
       - ./frontend/src:/app/src
       - ./frontend/index.html:/app/index.html
@@ -103,6 +122,10 @@ services:
     depends_on:
       api:
         condition: service_started
+    security_opt:
+      - no-new-privileges:true
+    cap_drop:
+      - ALL
 
 volumes:
   db-data:
```
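The ZIP guards listed among the configuration defaults earlier (member count, per-member and total uncompressed sizes, compression ratio) can be sketched like this. It is a simplified illustration with assumed names and is not the extraction pipeline's actual code.

```python
# Illustrative sketch of ZIP safety guards using the documented defaults.
# Function name is an assumption; limits mirror the config values listed earlier.
import zipfile

MAX_ZIP_MEMBERS = 250
MAX_MEMBER_UNCOMPRESSED = 26_214_400
MAX_TOTAL_UNCOMPRESSED = 157_286_400
MAX_COMPRESSION_RATIO = 120.0


def check_zip_safety(archive: zipfile.ZipFile) -> None:
    """Raise ValueError when an archive exceeds any configured guard."""
    infos = archive.infolist()
    if len(infos) > MAX_ZIP_MEMBERS:
        raise ValueError("too many archive members")
    total = 0
    for info in infos:
        if info.file_size > MAX_MEMBER_UNCOMPRESSED:
            raise ValueError(f"member too large: {info.filename}")
        total += info.file_size
        if total > MAX_TOTAL_UNCOMPRESSED:
            raise ValueError("total uncompressed size too large")
        # A very high uncompressed-to-compressed ratio suggests a zip bomb.
        if info.compress_size > 0 and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
            raise ValueError(f"suspicious compression ratio: {info.filename}")
```

Checking declared sizes from the central directory before extraction keeps decompression bombs from ever touching disk.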
```diff
@@ -3,14 +3,18 @@ FROM node:22-alpine
 WORKDIR /app
 
 COPY package.json /app/package.json
-RUN npm install
+COPY package-lock.json /app/package-lock.json
+RUN npm ci
+RUN chown -R node:node /app
 
-COPY tsconfig.json /app/tsconfig.json
+COPY --chown=node:node tsconfig.json /app/tsconfig.json
-COPY tsconfig.node.json /app/tsconfig.node.json
+COPY --chown=node:node tsconfig.node.json /app/tsconfig.node.json
-COPY vite.config.ts /app/vite.config.ts
+COPY --chown=node:node vite.config.ts /app/vite.config.ts
-COPY index.html /app/index.html
+COPY --chown=node:node index.html /app/index.html
-COPY src /app/src
+COPY --chown=node:node src /app/src
 
 EXPOSE 5173
 
+USER node
+
 CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0", "--port", "5173"]
```
```diff
@@ -5,6 +5,7 @@
   "type": "module",
   "scripts": {
     "dev": "vite",
+    "test": "node --experimental-strip-types src/lib/api.test.ts",
     "build": "tsc -b && vite build",
     "preview": "vite preview --host 0.0.0.0 --port 4173"
   },
```
```diff
@@ -14,6 +14,7 @@ import SettingsScreen from './components/SettingsScreen';
 import UploadSurface from './components/UploadSurface';
 import {
   clearProcessingLogs,
+  downloadBlobFile,
   deleteDocument,
   exportContentsMarkdown,
   getAppSettings,
@@ -117,15 +118,6 @@ export default function App(): JSX.Element {
     }
   }, []);
 
-  const downloadBlob = useCallback((blob: Blob, filename: string): void => {
-    const objectUrl = URL.createObjectURL(blob);
-    const anchor = document.createElement('a');
-    anchor.href = objectUrl;
-    anchor.download = filename;
-    anchor.click();
-    URL.revokeObjectURL(objectUrl);
-  }, []);
-
   const loadCatalogs = useCallback(async (): Promise<void> => {
     const [tags, paths, types] = await Promise.all([listTags(true), listPaths(true), listTypes(true)]);
     setKnownTags(tags);
@@ -465,13 +457,13 @@ export default function App(): JSX.Element {
         only_trashed: documentView === 'trash',
         include_trashed: false,
       });
-      downloadBlob(result.blob, result.filename);
+      downloadBlobFile(result.blob, result.filename);
     } catch (caughtError) {
       setError(caughtError instanceof Error ? caughtError.message : 'Failed to export selected markdown files');
     } finally {
       setIsRunningBulkAction(false);
     }
-  }, [documentView, downloadBlob, selectedDocumentIds]);
+  }, [documentView, selectedDocumentIds]);
 
   const handleExportPath = useCallback(async (): Promise<void> => {
     const trimmedPrefix = exportPathInput.trim();
@@ -487,13 +479,13 @@ export default function App(): JSX.Element {
         only_trashed: documentView === 'trash',
         include_trashed: false,
       });
-      downloadBlob(result.blob, result.filename);
+      downloadBlobFile(result.blob, result.filename);
     } catch (caughtError) {
       setError(caughtError instanceof Error ? caughtError.message : 'Failed to export path markdown files');
     } finally {
       setIsRunningBulkAction(false);
     }
-  }, [documentView, downloadBlob, exportPathInput]);
+  }, [documentView, exportPathInput]);
 
   const handleSaveSettings = useCallback(async (payload: AppSettingsUpdate): Promise<void> => {
     setIsSavingSettings(true);
```
```diff
@@ -1,12 +1,17 @@
 /**
  * Card view for displaying document summary, preview, and metadata.
  */
-import { useState } from 'react';
+import { useEffect, useRef, useState } from 'react';
 import type { JSX } from 'react';
 import { Download, FileText, Trash2 } from 'lucide-react';
 
 import type { DmsDocument } from '../types';
-import { contentMarkdownUrl, downloadUrl, thumbnailUrl } from '../lib/api';
+import {
+  downloadBlobFile,
+  downloadDocumentContentMarkdown,
+  downloadDocumentFile,
+  getDocumentThumbnailBlob,
+} from '../lib/api';
 
 /**
  * Defines properties accepted by the document card component.
@@ -79,12 +84,59 @@ export default function DocumentCard({
   onFilterTag,
 }: DocumentCardProps): JSX.Element {
   const [isTrashing, setIsTrashing] = useState<boolean>(false);
+  const [thumbnailObjectUrl, setThumbnailObjectUrl] = useState<string | null>(null);
+  const thumbnailObjectUrlRef = useRef<string | null>(null);
   const createdDate = new Date(document.created_at).toLocaleString();
   const status = statusPresentation(document.status);
   const compactPath = compactLogicalPath(document.logical_path, 180);
   const trashDisabled = isTrashView || document.status === 'trashed' || isTrashing;
   const trashTitle = trashDisabled ? 'Already in trash' : 'Move to trash';
 
+  /**
+   * Loads thumbnail preview through authenticated fetch and revokes replaced object URLs.
+   */
+  useEffect(() => {
+    const revokeThumbnailObjectUrl = (): void => {
+      if (!thumbnailObjectUrlRef.current) {
+        return;
+      }
+      URL.revokeObjectURL(thumbnailObjectUrlRef.current);
+      thumbnailObjectUrlRef.current = null;
+    };
+
+    if (!document.preview_available) {
+      revokeThumbnailObjectUrl();
+      setThumbnailObjectUrl(null);
+      return;
+    }
+
+    let cancelled = false;
+    const loadThumbnail = async (): Promise<void> => {
+      try {
+        const blob = await getDocumentThumbnailBlob(document.id);
+        if (cancelled) {
+          return;
+        }
+        revokeThumbnailObjectUrl();
+        const objectUrl = URL.createObjectURL(blob);
+        thumbnailObjectUrlRef.current = objectUrl;
+        setThumbnailObjectUrl(objectUrl);
+      } catch {
+        if (cancelled) {
+          return;
+        }
+        revokeThumbnailObjectUrl();
+        setThumbnailObjectUrl(null);
+      }
+    };
+
+    void loadThumbnail();
+    return () => {
+      cancelled = true;
+      revokeThumbnailObjectUrl();
+    };
+  }, [document.id, document.preview_available]);
+
   return (
     <article
       className={`document-card ${isSelected ? 'selected' : ''}`}
@@ -119,8 +171,8 @@ export default function DocumentCard({
         </label>
       </header>
       <div className="document-preview">
-        {document.preview_available ? (
+        {document.preview_available && thumbnailObjectUrl ? (
-          <img src={thumbnailUrl(document.id)} alt={document.original_filename} loading="lazy" />
+          <img src={thumbnailObjectUrl} alt={document.original_filename} loading="lazy" />
         ) : (
           <div className="document-preview-fallback">{document.extension || 'file'}</div>
         )}
@@ -173,7 +225,13 @@ export default function DocumentCard({
           onClick={(event) => {
             event.preventDefault();
             event.stopPropagation();
-            window.open(downloadUrl(document.id), '_blank', 'noopener,noreferrer');
+            void (async (): Promise<void> => {
+              try {
+                const payload = await downloadDocumentFile(document.id);
+                downloadBlobFile(payload.blob, payload.filename);
+              } catch {
+              }
+            })();
           }}
         >
           <Download aria-hidden="true" />
@@ -186,7 +244,13 @@ export default function DocumentCard({
           onClick={(event) => {
             event.preventDefault();
             event.stopPropagation();
-            window.open(contentMarkdownUrl(document.id), '_blank', 'noopener,noreferrer');
+            void (async (): Promise<void> => {
+              try {
+                const payload = await downloadDocumentContentMarkdown(document.id);
+                downloadBlobFile(payload.blob, payload.filename);
+              } catch {
+              }
+            })();
           }}
         >
           <FileText aria-hidden="true" />
```
```diff
@@ -1,14 +1,15 @@
 /**
  * Embedded document viewer panel for preview, metadata updates, and lifecycle actions.
  */
-import { useEffect, useMemo, useState } from 'react';
+import { useEffect, useMemo, useRef, useState } from 'react';
 import type { JSX } from 'react';
 
 import {
-  contentMarkdownUrl,
+  downloadBlobFile,
+  downloadDocumentContentMarkdown,
   deleteDocument,
   getDocumentDetails,
-  previewUrl,
+  getDocumentPreviewBlob,
   reprocessDocument,
   restoreDocument,
   trashDocument,
@@ -44,6 +45,8 @@ export default function DocumentViewer({
   requestConfirmation,
 }: DocumentViewerProps): JSX.Element {
   const [documentDetail, setDocumentDetail] = useState<DmsDocumentDetail | null>(null);
+  const [previewObjectUrl, setPreviewObjectUrl] = useState<string | null>(null);
+  const [isLoadingPreview, setIsLoadingPreview] = useState<boolean>(false);
   const [isLoadingDetails, setIsLoadingDetails] = useState<boolean>(false);
   const [originalFilename, setOriginalFilename] = useState<string>('');
   const [logicalPath, setLogicalPath] = useState<string>('');
@@ -55,6 +58,7 @@ export default function DocumentViewer({
   const [isDeleting, setIsDeleting] = useState<boolean>(false);
   const [isMetadataDirty, setIsMetadataDirty] = useState<boolean>(false);
   const [error, setError] = useState<string | null>(null);
+  const previewObjectUrlRef = useRef<string | null>(null);
 
   /**
    * Syncs editable metadata fields whenever selection changes.
@@ -62,6 +66,12 @@ export default function DocumentViewer({
   useEffect(() => {
     if (!document) {
       setDocumentDetail(null);
+      if (previewObjectUrlRef.current) {
+        URL.revokeObjectURL(previewObjectUrlRef.current);
+        previewObjectUrlRef.current = null;
+      }
+      setPreviewObjectUrl(null);
+      setIsLoadingPreview(false);
       setIsMetadataDirty(false);
       return;
     }
@@ -72,6 +82,57 @@ export default function DocumentViewer({
     setError(null);
   }, [document?.id]);
 
+  /**
+   * Loads authenticated preview bytes and exposes a temporary object URL for iframe or image rendering.
+   */
+  useEffect(() => {
+    const revokePreviewObjectUrl = (): void => {
+      if (!previewObjectUrlRef.current) {
+        return;
+      }
+      URL.revokeObjectURL(previewObjectUrlRef.current);
+      previewObjectUrlRef.current = null;
+    };
+
+    if (!document) {
+      revokePreviewObjectUrl();
+      setPreviewObjectUrl(null);
+      setIsLoadingPreview(false);
+      return;
+    }
+
+    let cancelled = false;
+    setIsLoadingPreview(true);
+    const loadPreview = async (): Promise<void> => {
+      try {
+        const blob = await getDocumentPreviewBlob(document.id);
```
|
||||||
|
if (cancelled) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
revokePreviewObjectUrl();
|
||||||
|
const objectUrl = URL.createObjectURL(blob);
|
||||||
|
previewObjectUrlRef.current = objectUrl;
|
||||||
|
setPreviewObjectUrl(objectUrl);
|
||||||
|
} catch {
|
||||||
|
if (cancelled) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
revokePreviewObjectUrl();
|
||||||
|
setPreviewObjectUrl(null);
|
||||||
|
} finally {
|
||||||
|
if (!cancelled) {
|
||||||
|
setIsLoadingPreview(false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
void loadPreview();
|
||||||
|
return () => {
|
||||||
|
cancelled = true;
|
||||||
|
revokePreviewObjectUrl();
|
||||||
|
};
|
||||||
|
}, [document?.id]);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Refreshes editable metadata from list updates only while form is clean.
|
* Refreshes editable metadata from list updates only while form is clean.
|
||||||
*/
|
*/
|
||||||
@@ -418,10 +479,16 @@ export default function DocumentViewer({
|
|||||||
<h2>{document.original_filename}</h2>
|
<h2>{document.original_filename}</h2>
|
||||||
<p className="small">Status: {document.status}</p>
|
<p className="small">Status: {document.status}</p>
|
||||||
<div className="viewer-preview">
|
<div className="viewer-preview">
|
||||||
{isImageDocument ? (
|
{previewObjectUrl ? (
|
||||||
<img src={previewUrl(document.id)} alt={document.original_filename} />
|
isImageDocument ? (
|
||||||
|
<img src={previewObjectUrl} alt={document.original_filename} />
|
||||||
) : (
|
) : (
|
||||||
<iframe src={previewUrl(document.id)} title={document.original_filename} />
|
<iframe src={previewObjectUrl} title={document.original_filename} />
|
||||||
|
)
|
||||||
|
) : isLoadingPreview ? (
|
||||||
|
<p className="small">Loading preview...</p>
|
||||||
|
) : (
|
||||||
|
<p className="small">Preview unavailable for this document.</p>
|
||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
<label>
|
<label>
|
||||||
@@ -561,7 +628,16 @@ export default function DocumentViewer({
|
|||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
className="secondary-action"
|
className="secondary-action"
|
||||||
onClick={() => window.open(contentMarkdownUrl(document.id), '_blank', 'noopener,noreferrer')}
|
onClick={() => {
|
||||||
|
void (async (): Promise<void> => {
|
||||||
|
try {
|
||||||
|
const payload = await downloadDocumentContentMarkdown(document.id);
|
||||||
|
downloadBlobFile(payload.blob, payload.filename);
|
||||||
|
} catch (caughtError) {
|
||||||
|
setError(caughtError instanceof Error ? caughtError.message : 'Failed to download markdown');
|
||||||
|
}
|
||||||
|
})();
|
||||||
|
}}
|
||||||
disabled={isDeleting}
|
disabled={isDeleting}
|
||||||
title="Downloads recognized/extracted content as markdown for this document."
|
title="Downloads recognized/extracted content as markdown for this document."
|
||||||
>
|
>
|
||||||
|
|||||||
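The preview effect above interleaves three concerns: fetching bytes with auth headers, publishing an object URL, and discarding stale responses after the selection changes. The stale-response part can be isolated with a generation counter; the sketch below is illustrative only, and the `loadBlob`/`publish` names are hypothetical stand-ins for the component's fetch and setState calls, not repository code:

```typescript
type Publish = (value: string | null) => void;

// Each call to load() bumps the generation; a response is published only if
// no newer load has started since, so a slow earlier request can never
// overwrite the result of a later one.
function createPreviewLoader(
  loadBlob: (id: string) => Promise<string>,
  publish: Publish,
) {
  let generation = 0;
  return {
    async load(id: string): Promise<void> {
      const myGeneration = ++generation;
      try {
        const value = await loadBlob(id);
        if (myGeneration === generation) {
          publish(value); // still the latest request
        }
      } catch {
        if (myGeneration === generation) {
          publish(null); // report failure only for the latest request
        }
      }
    },
    cancel(): void {
      generation += 1; // invalidate any in-flight load (effect cleanup)
    },
  };
}
```

The component achieves the same effect with a per-effect `cancelled` flag; a shared counter is simply one way to express the pattern outside React.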
85  frontend/src/lib/api.test.ts  Normal file

@@ -0,0 +1,85 @@
+// @ts-expect-error Node strip-types runtime requires explicit .ts extension in ESM imports.
+import { downloadDocumentContentMarkdown, downloadDocumentFile, getDocumentPreviewBlob, getDocumentThumbnailBlob } from './api.ts';
+
+/**
+ * Throws when a test condition is false.
+ */
+function assert(condition: boolean, message: string): void {
+  if (!condition) {
+    throw new Error(message);
+  }
+}
+
+/**
+ * Verifies that async functions reject with an expected message fragment.
+ */
+async function assertRejects(action: () => Promise<unknown>, expectedMessage: string): Promise<void> {
+  try {
+    await action();
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    assert(message.includes(expectedMessage), `Expected error containing "${expectedMessage}" but received "${message}"`);
+    return;
+  }
+  throw new Error(`Expected rejection containing "${expectedMessage}"`);
+}
+
+/**
+ * Runs API helper tests for authenticated media and download flows.
+ */
+async function runApiTests(): Promise<void> {
+  const originalFetch = globalThis.fetch;
+
+  try {
+    const requestUrls: string[] = [];
+    globalThis.fetch = (async (input: RequestInfo | URL): Promise<Response> => {
+      requestUrls.push(typeof input === 'string' ? input : input.toString());
+      return new Response('preview-bytes', { status: 200 });
+    }) as typeof fetch;
+
+    const thumbnail = await getDocumentThumbnailBlob('doc-1');
+    const preview = await getDocumentPreviewBlob('doc-1');
+
+    assert(await thumbnail.text() === 'preview-bytes', 'Thumbnail blob bytes mismatch');
+    assert(await preview.text() === 'preview-bytes', 'Preview blob bytes mismatch');
+    assert(
+      requestUrls[0] === 'http://localhost:8000/api/v1/documents/doc-1/thumbnail',
+      `Unexpected thumbnail URL ${requestUrls[0]}`,
+    );
+    assert(
+      requestUrls[1] === 'http://localhost:8000/api/v1/documents/doc-1/preview',
+      `Unexpected preview URL ${requestUrls[1]}`,
+    );
+
+    globalThis.fetch = (async (): Promise<Response> => {
+      return new Response('file-bytes', {
+        status: 200,
+        headers: {
+          'content-disposition': 'attachment; filename="invoice.pdf"',
+        },
+      });
+    }) as typeof fetch;
+
+    const fileResult = await downloadDocumentFile('doc-2');
+    assert(fileResult.filename === 'invoice.pdf', `Unexpected download filename ${fileResult.filename}`);
+    assert((await fileResult.blob.text()) === 'file-bytes', 'Original download bytes mismatch');
+
+    globalThis.fetch = (async (): Promise<Response> => {
+      return new Response('# markdown', { status: 200 });
+    }) as typeof fetch;
+
+    const markdownResult = await downloadDocumentContentMarkdown('doc-3');
+    assert(markdownResult.filename === 'document-content.md', `Unexpected markdown filename ${markdownResult.filename}`);
+    assert((await markdownResult.blob.text()) === '# markdown', 'Markdown bytes mismatch');
+
+    globalThis.fetch = (async (): Promise<Response> => {
+      return new Response('forbidden', { status: 401 });
+    }) as typeof fetch;
+
+    await assertRejects(async () => downloadDocumentContentMarkdown('doc-4'), 'Failed to download document markdown');
+  } finally {
+    globalThis.fetch = originalFetch;
+  }
+}
+
+await runApiTests();
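The download tests above expect `invoice.pdf` to be recovered from a `Content-Disposition` header and `document-content.md` to be used as a fallback when no header is present. The repository's actual `responseFilename` helper is not shown in this diff; a minimal sketch consistent with those expectations (an assumption, not the shipped code) could look like:

```typescript
// Hypothetical reimplementation of the filename extraction the tests rely on.
// Parses `attachment; filename="..."` and falls back when the header is
// missing or unquoted differently than expected.
function parseAttachmentFilename(response: Response, fallback: string): string {
  const header = response.headers.get('content-disposition') ?? '';
  const match = /filename="([^"]+)"/.exec(header);
  return match ? match[1] : fallback;
}
```

A production parser would also handle the RFC 5987 `filename*=` form and unquoted tokens, which this sketch deliberately omits.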
frontend/src/lib/api.ts

@@ -16,7 +16,40 @@ import type {
 /**
  * Resolves backend base URL from environment with localhost fallback.
  */
-const API_BASE = import.meta.env.VITE_API_BASE ?? 'http://localhost:8000/api/v1';
+const API_BASE = import.meta.env?.VITE_API_BASE ?? 'http://localhost:8000/api/v1';
+
+/**
+ * Optional bearer token used for authenticated backend routes.
+ */
+const API_TOKEN = import.meta.env?.VITE_API_TOKEN?.trim();
+
+type ApiRequestInit = Omit<RequestInit, 'headers'> & { headers?: HeadersInit };
+
+/**
+ * Merges request headers and appends bearer authorization when configured.
+ */
+function buildRequestHeaders(headers?: HeadersInit): Headers | undefined {
+  if (!API_TOKEN && !headers) {
+    return undefined;
+  }
+
+  const requestHeaders = new Headers(headers);
+  if (API_TOKEN) {
+    requestHeaders.set('Authorization', `Bearer ${API_TOKEN}`);
+  }
+  return requestHeaders;
+}
+
+/**
+ * Executes an API request with centralized auth-header handling.
+ */
+function apiRequest(input: string, init: ApiRequestInit = {}): Promise<Response> {
+  const headers = buildRequestHeaders(init.headers);
+  return fetch(input, {
+    ...init,
+    ...(headers ? { headers } : {}),
+  });
+}
+
 /**
  * Encodes query parameters while skipping undefined and null values.
  */

@@ -45,6 +78,22 @@ function responseFilename(response: Response, fallback: string): string {
   return match[1];
 }
+
+/**
+ * Triggers a browser file download for blob payloads and releases temporary object URLs.
+ */
+export function downloadBlobFile(blob: Blob, filename: string): void {
+  const objectUrl = URL.createObjectURL(blob);
+  const anchor = document.createElement('a');
+  anchor.href = objectUrl;
+  anchor.download = filename;
+  document.body.appendChild(anchor);
+  anchor.click();
+  anchor.remove();
+  window.setTimeout(() => {
+    URL.revokeObjectURL(objectUrl);
+  }, 0);
+}

 /**
  * Loads documents from the backend list endpoint.
  */

@@ -72,7 +121,7 @@ export async function listDocuments(options?: {
     processed_from: options?.processedFrom,
     processed_to: options?.processedTo,
   });
-  const response = await fetch(`${API_BASE}/documents${query}`);
+  const response = await apiRequest(`${API_BASE}/documents${query}`);
   if (!response.ok) {
     throw new Error('Failed to load documents');
   }

@@ -108,7 +157,7 @@ export async function searchDocuments(
     processed_from: options?.processedFrom,
     processed_to: options?.processedTo,
   });
-  const response = await fetch(`${API_BASE}/search${query}`);
+  const response = await apiRequest(`${API_BASE}/search${query}`);
   if (!response.ok) {
     throw new Error('Search failed');
   }

@@ -128,7 +177,7 @@ export async function listProcessingLogs(options?: {
     offset: options?.offset ?? 0,
     document_id: options?.documentId,
   });
-  const response = await fetch(`${API_BASE}/processing/logs${query}`);
+  const response = await apiRequest(`${API_BASE}/processing/logs${query}`);
   if (!response.ok) {
     throw new Error('Failed to load processing logs');
   }

@@ -146,7 +195,7 @@ export async function trimProcessingLogs(options?: {
     keep_document_sessions: options?.keepDocumentSessions ?? 2,
     keep_unbound_entries: options?.keepUnboundEntries ?? 80,
   });
-  const response = await fetch(`${API_BASE}/processing/logs/trim${query}`, {
+  const response = await apiRequest(`${API_BASE}/processing/logs/trim${query}`, {
     method: 'POST',
   });
   if (!response.ok) {

@@ -159,7 +208,7 @@ export async function trimProcessingLogs(options?: {
  * Clears all persisted processing logs.
  */
 export async function clearProcessingLogs(): Promise<{ deleted_entries: number }> {
-  const response = await fetch(`${API_BASE}/processing/logs/clear`, {
+  const response = await apiRequest(`${API_BASE}/processing/logs/clear`, {
     method: 'POST',
   });
   if (!response.ok) {

@@ -173,7 +222,7 @@ export async function clearProcessingLogs(): Promise<{ deleted_entries: number }
  */
 export async function listTags(includeTrashed = false): Promise<string[]> {
   const query = buildQuery({ include_trashed: includeTrashed });
-  const response = await fetch(`${API_BASE}/documents/tags${query}`);
+  const response = await apiRequest(`${API_BASE}/documents/tags${query}`);
   if (!response.ok) {
     throw new Error('Failed to load tags');
   }

@@ -186,7 +235,7 @@ export async function listTags(includeTrashed = false): Promise<string[]> {
  */
 export async function listPaths(includeTrashed = false): Promise<string[]> {
   const query = buildQuery({ include_trashed: includeTrashed });
-  const response = await fetch(`${API_BASE}/documents/paths${query}`);
+  const response = await apiRequest(`${API_BASE}/documents/paths${query}`);
   if (!response.ok) {
     throw new Error('Failed to load paths');
   }

@@ -199,7 +248,7 @@ export async function listPaths(includeTrashed = false): Promise<string[]> {
  */
 export async function listTypes(includeTrashed = false): Promise<string[]> {
   const query = buildQuery({ include_trashed: includeTrashed });
-  const response = await fetch(`${API_BASE}/documents/types${query}`);
+  const response = await apiRequest(`${API_BASE}/documents/types${query}`);
   if (!response.ok) {
     throw new Error('Failed to load document types');
   }

@@ -228,7 +277,7 @@ export async function uploadDocuments(
   formData.append('tags', options.tags);
   formData.append('conflict_mode', options.conflictMode);

-  const response = await fetch(`${API_BASE}/documents/upload`, {
+  const response = await apiRequest(`${API_BASE}/documents/upload`, {
     method: 'POST',
     body: formData,
   });

@@ -245,7 +294,7 @@ export async function updateDocumentMetadata(
   documentId: string,
   payload: { original_filename?: string; logical_path?: string; tags?: string[] },
 ): Promise<DmsDocument> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}`, {
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}`, {
     method: 'PATCH',
     headers: {
       'Content-Type': 'application/json',

@@ -262,7 +311,7 @@ export async function updateDocumentMetadata(
  * Moves a document to trash state without removing stored files.
  */
 export async function trashDocument(documentId: string): Promise<DmsDocument> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}/trash`, { method: 'POST' });
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}/trash`, { method: 'POST' });
   if (!response.ok) {
     throw new Error('Failed to trash document');
   }

@@ -273,7 +322,7 @@ export async function trashDocument(documentId: string): Promise<DmsDocument> {
  * Restores a document from trash to active state.
  */
 export async function restoreDocument(documentId: string): Promise<DmsDocument> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}/restore`, { method: 'POST' });
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}/restore`, { method: 'POST' });
   if (!response.ok) {
     throw new Error('Failed to restore document');
   }

@@ -284,7 +333,7 @@ export async function restoreDocument(documentId: string): Promise<DmsDocument>
  * Permanently deletes a document record and associated stored files.
  */
 export async function deleteDocument(documentId: string): Promise<{ deleted_documents: number; deleted_files: number }> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}`, { method: 'DELETE' });
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}`, { method: 'DELETE' });
   if (!response.ok) {
     throw new Error('Failed to delete document');
   }

@@ -295,7 +344,7 @@ export async function deleteDocument(documentId: string): Promise<{ deleted_docu
  * Loads full details for one document, including extracted text content.
  */
 export async function getDocumentDetails(documentId: string): Promise<DmsDocumentDetail> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}`);
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}`);
   if (!response.ok) {
     throw new Error('Failed to load document details');
   }

@@ -306,7 +355,7 @@ export async function getDocumentDetails(documentId: string): Promise<DmsDocumen
  * Re-enqueues one document for extraction and classification processing.
  */
 export async function reprocessDocument(documentId: string): Promise<DmsDocument> {
-  const response = await fetch(`${API_BASE}/documents/${documentId}/reprocess`, {
+  const response = await apiRequest(`${API_BASE}/documents/${documentId}/reprocess`, {
     method: 'POST',
   });
   if (!response.ok) {

@@ -343,6 +392,60 @@ export function contentMarkdownUrl(documentId: string): string {
   return `${API_BASE}/documents/${documentId}/content-md`;
 }
+
+/**
+ * Downloads preview bytes for one document using centralized auth headers.
+ */
+export async function getDocumentPreviewBlob(documentId: string): Promise<Blob> {
+  const response = await apiRequest(previewUrl(documentId));
+  if (!response.ok) {
+    throw new Error('Failed to load document preview');
+  }
+  return response.blob();
+}
+
+/**
+ * Downloads thumbnail bytes for one document using centralized auth headers.
+ */
+export async function getDocumentThumbnailBlob(documentId: string): Promise<Blob> {
+  const response = await apiRequest(thumbnailUrl(documentId));
+  if (!response.ok) {
+    throw new Error('Failed to load document thumbnail');
+  }
+  return response.blob();
+}
+
+/**
+ * Downloads the original document payload with backend-provided filename fallback.
+ */
+export async function downloadDocumentFile(documentId: string): Promise<{ blob: Blob; filename: string }> {
+  const response = await apiRequest(downloadUrl(documentId));
+  if (!response.ok) {
+    throw new Error('Failed to download document');
+  }
+  const blob = await response.blob();
+  return {
+    blob,
+    filename: responseFilename(response, 'document-download'),
+  };
+}
+
+/**
+ * Downloads extracted markdown content for one document with backend-provided filename fallback.
+ */
+export async function downloadDocumentContentMarkdown(
+  documentId: string,
+): Promise<{ blob: Blob; filename: string }> {
+  const response = await apiRequest(contentMarkdownUrl(documentId));
+  if (!response.ok) {
+    throw new Error('Failed to download document markdown');
+  }
+  const blob = await response.blob();
+  return {
+    blob,
+    filename: responseFilename(response, 'document-content.md'),
+  };
+}

 /**
  * Exports extracted content markdown files for selected documents or path filters.
  */

@@ -352,7 +455,7 @@ export async function exportContentsMarkdown(payload: {
   include_trashed?: boolean;
   only_trashed?: boolean;
 }): Promise<{ blob: Blob; filename: string }> {
-  const response = await fetch(`${API_BASE}/documents/content-md/export`, {
+  const response = await apiRequest(`${API_BASE}/documents/content-md/export`, {
     method: 'POST',
     headers: {
       'Content-Type': 'application/json',

@@ -373,7 +476,7 @@ export async function exportContentsMarkdown(payload: {
  * Retrieves persisted application settings from backend.
  */
 export async function getAppSettings(): Promise<AppSettings> {
-  const response = await fetch(`${API_BASE}/settings`);
+  const response = await apiRequest(`${API_BASE}/settings`);
   if (!response.ok) {
     throw new Error('Failed to load application settings');
   }

@@ -384,7 +487,7 @@ export async function getAppSettings(): Promise<AppSettings> {
  * Updates provider and task settings for OpenAI-compatible model execution.
  */
 export async function updateAppSettings(payload: AppSettingsUpdate): Promise<AppSettings> {
-  const response = await fetch(`${API_BASE}/settings`, {
+  const response = await apiRequest(`${API_BASE}/settings`, {
     method: 'PATCH',
     headers: {
       'Content-Type': 'application/json',

@@ -401,7 +504,7 @@ export async function updateAppSettings(payload: AppSettingsUpdate): Promise<App
  * Resets persisted provider and task settings to backend defaults.
  */
 export async function resetAppSettings(): Promise<AppSettings> {
-  const response = await fetch(`${API_BASE}/settings/reset`, {
+  const response = await apiRequest(`${API_BASE}/settings/reset`, {
     method: 'POST',
   });
   if (!response.ok) {
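The `apiRequest` wrapper in this diff leans on WHATWG `Headers` merge semantics: `new Headers(init)` copies any caller-supplied headers, and a subsequent `set` overwrites (rather than appends to) a caller-provided `Authorization`. A standalone check of that behavior, with names mirroring the diff but written as a demonstration rather than the shipped module:

```typescript
// Demonstrates the merge order apiRequest depends on: a configured token
// always wins over a caller-provided Authorization header, while other
// caller headers pass through untouched.
function mergeAuthHeaders(token: string | undefined, headers?: HeadersInit): Headers | undefined {
  if (!token && !headers) {
    return undefined; // nothing to merge; let fetch apply its defaults
  }
  const merged = new Headers(headers); // copies caller headers, if any
  if (token) {
    merged.set('Authorization', `Bearer ${token}`); // set() overwrites, never appends
  }
  return merged;
}
```

Returning `undefined` in the no-token, no-headers case matters: spreading `{ headers: undefined }` into the fetch init is harmless, but the diff avoids even that by conditionally spreading the headers entry.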