Fix predefined catalog visibility and port security must-know guidance

This commit is contained in:
2026-03-01 21:15:12 -03:00
parent 32b4589b28
commit 874597e40b
4 changed files with 275 additions and 26 deletions

View File

@@ -75,34 +75,69 @@ docker compose down
## Security Must-Know Before Real User Deployment ## Security Must-Know Before Real User Deployment
This repository starts in a development-friendly mode. Before exposing it to real users or untrusted networks, verify these controls: The items below port the `MUST KNOW User-Dependent Risks` from `REPORT.md` into explicit operator actions.
1. Environment mode and host binding: ### High: Development-first defaults can be promoted to production
Avoid:
- Set `APP_ENV=production`. - Set `APP_ENV=production`.
- Keep `HOST_BIND_IP=127.0.0.1` and publish through an HTTPS reverse proxy instead of broad host bind. - Set `PROVIDER_BASE_URL_ALLOW_HTTP=false`.
- Set `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`.
- Set a strict non-empty `PROVIDER_BASE_URL_ALLOWLIST` for approved provider hosts only.
- Set `PUBLIC_BASE_URL` to HTTPS.
- Restrict `CORS_ORIGINS` to exact production frontend origins.
- Use `REDIS_URL` with `rediss://`.
- Set `REDIS_SECURITY_MODE=strict`.
- Set `REDIS_TLS_MODE=required`.
- Keep `HOST_BIND_IP=127.0.0.1` and expose services only through an HTTPS reverse proxy.
2. Bootstrap credentials: Remedy:
- Replace all `AUTH_BOOTSTRAP_*` values with strong unique passwords before first public deployment. - Immediately correct the values above and redeploy `api` and `worker` (`docker compose up -d api worker`).
- Disable optional bootstrap user credentials unless they are needed. - Rotate `AUTH_BOOTSTRAP_*` credentials, provider API keys, and Redis credentials if insecure values were used in a reachable environment.
- Re-check `.env.example` and `docker-compose.yml` before each production promotion.
3. Processing log text persistence: ### Medium: Login throttle IP identity depends on proxy trust model
- Keep `PROCESSING_LOG_STORE_MODEL_IO_TEXT=false` and `PROCESSING_LOG_STORE_PAYLOAD_TEXT=false` unless temporary debugging is required.
- Enabling these values can store sensitive prompt, response, and payload text.
4. Provider outbound restrictions: Current behavior:
- Keep `PROVIDER_BASE_URL_ALLOW_HTTP=false` and `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`. - Login throttle identity currently uses `request.client.host` directly.
- Set a strict `PROVIDER_BASE_URL_ALLOWLIST` containing only approved provider hosts.
5. Public URL and CORS posture: Avoid:
- Use HTTPS in `PUBLIC_BASE_URL`. - Deploy so the backend receives true client IP addresses and does not collapse all traffic to one proxy source IP.
- Restrict `CORS_ORIGINS` to exact production frontend origins only. - Validate lockout behavior with multiple client IPs before going live behind a proxy.
6. Redis transport security: Remedy:
- For live deployments, use `REDIS_URL` with `rediss://`, set `REDIS_SECURITY_MODE=strict`, and set `REDIS_TLS_MODE=required`. - If lockouts affect many users at once, temporarily increase `AUTH_LOGIN_FAILURE_LIMIT` and tune lockout timings to reduce impact while mitigation is in progress.
- Update network and proxy topology so client IP identity is preserved for the backend, then re-run lockout validation tests.
7. Development compose defaults: ### Medium: API documentation endpoints are exposed by default
- Review `.env.example` and `docker-compose.yml` security-related defaults before deployment.
- Do not promote development defaults unchanged into production. Avoid:
- Block public access to `/docs`, `/redoc`, and `/openapi.json` at the reverse proxy or edge firewall.
- Keep docs endpoints reachable only from trusted internal/admin networks.
Remedy:
- Add deny rules for those paths immediately and reload the proxy.
- Verify those routes return `403` or `404` from untrusted networks.
### Medium: Bearer token is stored in browser `sessionStorage`
Avoid:
- Enforce strict CSP and disallow inline script execution where possible.
- Avoid rendering untrusted HTML or script-capable content in the frontend.
- Keep dependencies patched to reduce known XSS vectors.
Remedy:
- If XSS is suspected, revoke active sessions, rotate privileged credentials, and redeploy frontend fixes before restoring user access.
- Treat exposed browser sessions as compromised until revocation and credential rotation are complete.
### Low: Typesense transport defaults to HTTP on internal network
Avoid:
- Keep Typesense on isolated internal networks only.
- Do not expose Typesense service ports directly to untrusted networks.
Remedy:
- For cross-host or untrusted network paths, terminate TLS in front of Typesense (or use equivalent secure service networking) and require encrypted transport for all clients.
## Common Operations ## Common Operations

View File

@@ -50,6 +50,31 @@ def _scope_document_statement_for_auth_context(statement, auth_context: AuthCont
return statement.where(Document.owner_user_id == auth_context.user_id) return statement.where(Document.owner_user_id == auth_context.user_id)
def _is_predefined_entry_visible_to_auth_context(entry: dict[str, object], auth_context: AuthContext) -> bool:
"""Returns whether one predefined catalog entry is visible to the active caller role."""
if auth_context.role == UserRole.ADMIN:
return True
return bool(entry.get("global_shared", False))
def _collect_visible_predefined_values(
entries: list[dict[str, object]],
*,
auth_context: AuthContext,
) -> set[str]:
"""Collects normalized predefined values visible for the active caller role."""
visible_values: set[str] = set()
for entry in entries:
if not _is_predefined_entry_visible_to_auth_context(entry, auth_context):
continue
normalized = str(entry.get("value", "")).strip()
if normalized:
visible_values.add(normalized)
return visible_values
def _ensure_document_access(document: Document, auth_context: AuthContext) -> None: def _ensure_document_access(document: Document, auth_context: AuthContext) -> None:
"""Enforces owner-level access for non-admin users and raises not-found on violations.""" """Enforces owner-level access for non-admin users and raises not-found on violations."""
@@ -397,9 +422,10 @@ def list_tags(
rows = session.execute(statement).scalars().all() rows = session.execute(statement).scalars().all()
tags = {tag for row in rows for tag in row if tag} tags = {tag for row in rows for tag in row if tag}
tags.update( tags.update(
str(item.get("value", "")).strip() _collect_visible_predefined_values(
for item in read_predefined_tags_settings() read_predefined_tags_settings(),
if str(item.get("value", "")).strip() auth_context=auth_context,
)
) )
tags = sorted(tags) tags = sorted(tags)
return {"tags": tags} return {"tags": tags}
@@ -421,9 +447,10 @@ def list_paths(
rows = session.execute(statement).scalars().all() rows = session.execute(statement).scalars().all()
paths = {row for row in rows if row} paths = {row for row in rows if row}
paths.update( paths.update(
str(item.get("value", "")).strip() _collect_visible_predefined_values(
for item in read_predefined_paths_settings() read_predefined_paths_settings(),
if str(item.get("value", "")).strip() auth_context=auth_context,
)
) )
paths = sorted(paths) paths = sorted(paths)
return {"paths": paths} return {"paths": paths}

View File

@@ -63,6 +63,22 @@ if "fastapi" not in sys.modules:
return decorator return decorator
def patch(self, *_args: object, **_kwargs: object): # type: ignore[no-untyped-def]
"""Returns no-op decorator for PATCH route declarations."""
def decorator(func): # type: ignore[no-untyped-def]
return func
return decorator
def delete(self, *_args: object, **_kwargs: object): # type: ignore[no-untyped-def]
"""Returns no-op decorator for DELETE route declarations."""
def decorator(func): # type: ignore[no-untyped-def]
return func
return decorator
class _Request: class _Request:
"""Minimal request placeholder for route function import compatibility.""" """Minimal request placeholder for route function import compatibility."""
@@ -88,13 +104,52 @@ if "fastapi" not in sys.modules:
return dependency return dependency
def _query(default=None, **_kwargs): # type: ignore[no-untyped-def]
"""Returns FastAPI-like query defaults for dependency-light route imports."""
return default
def _file(default=None, **_kwargs): # type: ignore[no-untyped-def]
"""Returns FastAPI-like file defaults for dependency-light route imports."""
return default
def _form(default=None, **_kwargs): # type: ignore[no-untyped-def]
"""Returns FastAPI-like form defaults for dependency-light route imports."""
return default
class _UploadFile:
"""Minimal UploadFile placeholder for route import compatibility."""
fastapi_stub.APIRouter = _APIRouter fastapi_stub.APIRouter = _APIRouter
fastapi_stub.Depends = _depends fastapi_stub.Depends = _depends
fastapi_stub.File = _file
fastapi_stub.Form = _form
fastapi_stub.HTTPException = _HTTPException fastapi_stub.HTTPException = _HTTPException
fastapi_stub.Query = _query
fastapi_stub.Request = _Request fastapi_stub.Request = _Request
fastapi_stub.UploadFile = _UploadFile
fastapi_stub.status = _Status() fastapi_stub.status = _Status()
sys.modules["fastapi"] = fastapi_stub sys.modules["fastapi"] = fastapi_stub
if "fastapi.responses" not in sys.modules:
fastapi_responses_stub = ModuleType("fastapi.responses")
class _Response:
"""Minimal response placeholder for route import compatibility."""
class _FileResponse(_Response):
"""Minimal file response placeholder for route import compatibility."""
class _StreamingResponse(_Response):
"""Minimal streaming response placeholder for route import compatibility."""
fastapi_responses_stub.Response = _Response
fastapi_responses_stub.FileResponse = _FileResponse
fastapi_responses_stub.StreamingResponse = _StreamingResponse
sys.modules["fastapi.responses"] = fastapi_responses_stub
if "fastapi.security" not in sys.modules: if "fastapi.security" not in sys.modules:
fastapi_security_stub = ModuleType("fastapi.security") fastapi_security_stub = ModuleType("fastapi.security")
@@ -267,8 +322,14 @@ if "app.services.handwriting_style" not in sys.modules:
return None return None
def _delete_many_handwriting_style_documents(*_args: object, **_kwargs: object) -> None:
"""No-op bulk style document delete stub for route import compatibility."""
return None
handwriting_style_stub.assign_handwriting_style = _assign_handwriting_style handwriting_style_stub.assign_handwriting_style = _assign_handwriting_style
handwriting_style_stub.delete_handwriting_style_document = _delete_handwriting_style_document handwriting_style_stub.delete_handwriting_style_document = _delete_handwriting_style_document
handwriting_style_stub.delete_many_handwriting_style_documents = _delete_many_handwriting_style_documents
sys.modules["app.services.handwriting_style"] = handwriting_style_stub sys.modules["app.services.handwriting_style"] = handwriting_style_stub
if "app.services.routing_pipeline" not in sys.modules: if "app.services.routing_pipeline" not in sys.modules:
@@ -304,6 +365,7 @@ from fastapi import HTTPException
from app.api.auth import AuthContext, require_admin from app.api.auth import AuthContext, require_admin
from app.api import routes_auth as auth_routes_module from app.api import routes_auth as auth_routes_module
from app.api import routes_documents as documents_routes_module
from app.core import config as config_module from app.core import config as config_module
from app.models.auth import UserRole from app.models.auth import UserRole
from app.models.processing_log import sanitize_processing_log_payload_value, sanitize_processing_log_text from app.models.processing_log import sanitize_processing_log_payload_value, sanitize_processing_log_text
@@ -359,6 +421,125 @@ class AuthDependencyTests(unittest.TestCase):
self.assertEqual(resolved.role, UserRole.ADMIN) self.assertEqual(resolved.role, UserRole.ADMIN)
class DocumentCatalogVisibilityTests(unittest.TestCase):
"""Verifies predefined tag and path discovery visibility by caller role."""
class _ScalarSequence:
"""Provides SQLAlchemy-like scalar result chaining for route unit tests."""
def __init__(self, values: list[object]) -> None:
self._values = values
def scalars(self) -> "DocumentCatalogVisibilityTests._ScalarSequence":
"""Returns self to emulate `.scalars().all()` call chains."""
return self
def all(self) -> list[object]:
"""Returns deterministic sequence values for route helper queries."""
return list(self._values)
class _SessionStub:
"""Returns a fixed scalar sequence for route metadata queries."""
def __init__(self, values: list[object]) -> None:
self._values = values
def execute(self, _statement: object) -> "DocumentCatalogVisibilityTests._ScalarSequence":
"""Ignores query details and returns deterministic scalar sequence results."""
return DocumentCatalogVisibilityTests._ScalarSequence(self._values)
@staticmethod
def _auth_context(role: UserRole) -> AuthContext:
"""Builds deterministic auth context fixtures for document discovery tests."""
return AuthContext(
user_id=uuid.uuid4(),
username=f"{role.value}-user",
role=role,
session_id=uuid.uuid4(),
expires_at=datetime.now(UTC),
)
def test_non_admin_only_receives_global_shared_predefined_tags_and_paths(self) -> None:
"""User role receives only globally shared predefined values in discovery responses."""
session = self._SessionStub(
values=[
["owner-tag", ""],
["owner-only-tag"],
]
)
predefined_tags = [
{"value": "SharedTag", "global_shared": True},
{"value": "InternalTag", "global_shared": False},
{"value": "ImplicitPrivateTag"},
]
predefined_paths = [
{"value": "Shared/Path", "global_shared": True},
{"value": "Internal/Path", "global_shared": False},
{"value": "Implicit/Private"},
]
with (
patch.object(documents_routes_module, "read_predefined_tags_settings", return_value=predefined_tags),
patch.object(documents_routes_module, "read_predefined_paths_settings", return_value=predefined_paths),
):
tags_response = documents_routes_module.list_tags(
include_trashed=False,
auth_context=self._auth_context(UserRole.USER),
session=session,
)
paths_response = documents_routes_module.list_paths(
include_trashed=False,
auth_context=self._auth_context(UserRole.USER),
session=self._SessionStub(values=["Owner/Path"]),
)
self.assertEqual(tags_response["tags"], ["SharedTag", "owner-only-tag", "owner-tag"])
self.assertEqual(paths_response["paths"], ["Owner/Path", "Shared/Path"])
def test_admin_receives_full_predefined_tags_and_paths_catalog(self) -> None:
"""Admin role receives full predefined values regardless of global-sharing scope."""
predefined_tags = [
{"value": "SharedTag", "global_shared": True},
{"value": "InternalTag", "global_shared": False},
{"value": "ImplicitPrivateTag"},
]
predefined_paths = [
{"value": "Shared/Path", "global_shared": True},
{"value": "Internal/Path", "global_shared": False},
{"value": "Implicit/Private"},
]
with (
patch.object(documents_routes_module, "read_predefined_tags_settings", return_value=predefined_tags),
patch.object(documents_routes_module, "read_predefined_paths_settings", return_value=predefined_paths),
):
tags_response = documents_routes_module.list_tags(
include_trashed=False,
auth_context=self._auth_context(UserRole.ADMIN),
session=self._SessionStub(values=[["admin-tag"]]),
)
paths_response = documents_routes_module.list_paths(
include_trashed=False,
auth_context=self._auth_context(UserRole.ADMIN),
session=self._SessionStub(values=["Admin/Path"]),
)
self.assertEqual(
tags_response["tags"],
["ImplicitPrivateTag", "InternalTag", "SharedTag", "admin-tag"],
)
self.assertEqual(
paths_response["paths"],
["Admin/Path", "Implicit/Private", "Internal/Path", "Shared/Path"],
)
class _FakeRedisPipeline: class _FakeRedisPipeline:
"""Provides deterministic Redis pipeline behavior for login throttle tests.""" """Provides deterministic Redis pipeline behavior for login throttle tests."""

View File

@@ -61,9 +61,15 @@ Ownership rules:
- `GET /documents/tags` - `GET /documents/tags`
- Query: `include_trashed` - Query: `include_trashed`
- Response: `{ "tags": string[] }` - Response: `{ "tags": string[] }`
- Behavior:
- all document-assigned tags visible to caller scope are included
- predefined tags are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true`
- `GET /documents/paths` - `GET /documents/paths`
- Query: `include_trashed` - Query: `include_trashed`
- Response: `{ "paths": string[] }` - Response: `{ "paths": string[] }`
- Behavior:
- all document-assigned logical paths visible to caller scope are included
- predefined paths are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true`
- `GET /documents/types` - `GET /documents/types`
- Query: `include_trashed` - Query: `include_trashed`
- Response: `{ "types": string[] }` - Response: `{ "types": string[] }`