diff --git a/README.md b/README.md index a8407b0..812d808 100644 --- a/README.md +++ b/README.md @@ -75,34 +75,69 @@ docker compose down ## Security Must-Know Before Real User Deployment -This repository starts in a development-friendly mode. Before exposing it to real users or untrusted networks, verify these controls: +The items below port the `MUST KNOW User-Dependent Risks` from `REPORT.md` into explicit operator actions. -1. Environment mode and host binding: +### High: Development-first defaults can be promoted to production + +Avoid: - Set `APP_ENV=production`. -- Keep `HOST_BIND_IP=127.0.0.1` and publish through an HTTPS reverse proxy instead of broad host bind. +- Set `PROVIDER_BASE_URL_ALLOW_HTTP=false`. +- Set `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`. +- Set a strict non-empty `PROVIDER_BASE_URL_ALLOWLIST` for approved provider hosts only. +- Set `PUBLIC_BASE_URL` to HTTPS. +- Restrict `CORS_ORIGINS` to exact production frontend origins. +- Use `REDIS_URL` with `rediss://`. +- Set `REDIS_SECURITY_MODE=strict`. +- Set `REDIS_TLS_MODE=required`. +- Keep `HOST_BIND_IP=127.0.0.1` and expose services only through an HTTPS reverse proxy. -2. Bootstrap credentials: -- Replace all `AUTH_BOOTSTRAP_*` values with strong unique passwords before first public deployment. -- Disable optional bootstrap user credentials unless they are needed. +Remedy: +- Immediately correct the values above and redeploy `api` and `worker` (`docker compose up -d api worker`). +- Rotate `AUTH_BOOTSTRAP_*` credentials, provider API keys, and Redis credentials if insecure values were used in a reachable environment. +- Re-check `.env.example` and `docker-compose.yml` before each production promotion. -3. Processing log text persistence: -- Keep `PROCESSING_LOG_STORE_MODEL_IO_TEXT=false` and `PROCESSING_LOG_STORE_PAYLOAD_TEXT=false` unless temporary debugging is required. -- Enabling these values can store sensitive prompt, response, and payload text. +### Medium: Login throttle IP identity depends on proxy trust model -4. Provider outbound restrictions: -- Keep `PROVIDER_BASE_URL_ALLOW_HTTP=false` and `PROVIDER_BASE_URL_ALLOW_PRIVATE_NETWORK=false`. -- Set a strict `PROVIDER_BASE_URL_ALLOWLIST` containing only approved provider hosts. +Current behavior: +- Login throttle identity currently uses `request.client.host` directly. -5. Public URL and CORS posture: -- Use HTTPS in `PUBLIC_BASE_URL`. -- Restrict `CORS_ORIGINS` to exact production frontend origins only. +Avoid: +- Deploy so the backend receives true client IP addresses and does not collapse all traffic to one proxy source IP. +- Validate lockout behavior with multiple client IPs before going live behind a proxy. -6. Redis transport security: -- For live deployments, use `REDIS_URL` with `rediss://`, set `REDIS_SECURITY_MODE=strict`, and set `REDIS_TLS_MODE=required`. +Remedy: +- If lockouts affect many users at once, temporarily increase `AUTH_LOGIN_FAILURE_LIMIT` and tune lockout timings to reduce impact while mitigation is in progress. +- Update network and proxy topology so client IP identity is preserved for the backend, then re-run lockout validation tests. -7. Development compose defaults: -- Review `.env.example` and `docker-compose.yml` security-related defaults before deployment. -- Do not promote development defaults unchanged into production. +### Medium: API documentation endpoints are exposed by default + +Avoid: +- Block public access to `/docs`, `/redoc`, and `/openapi.json` at the reverse proxy or edge firewall. +- Keep docs endpoints reachable only from trusted internal/admin networks. + +Remedy: +- Add deny rules for those paths immediately and reload the proxy. +- Verify those routes return `403` or `404` from untrusted networks. + +### Medium: Bearer token is stored in browser `sessionStorage` + +Avoid: +- Enforce strict CSP and disallow inline script execution where possible. +- Avoid rendering untrusted HTML or script-capable content in the frontend. +- Keep dependencies patched to reduce known XSS vectors. + +Remedy: +- If XSS is suspected, revoke active sessions, rotate privileged credentials, and redeploy frontend fixes before restoring user access. +- Treat exposed browser sessions as compromised until revocation and credential rotation are complete. + +### Low: Typesense transport defaults to HTTP on internal network + +Avoid: +- Keep Typesense on isolated internal networks only. +- Do not expose Typesense service ports directly to untrusted networks. + +Remedy: +- For cross-host or untrusted network paths, terminate TLS in front of Typesense (or use equivalent secure service networking) and require encrypted transport for all clients. ## Common Operations diff --git a/backend/app/api/routes_documents.py b/backend/app/api/routes_documents.py index 8fd0a5a..f120802 100644 --- a/backend/app/api/routes_documents.py +++ b/backend/app/api/routes_documents.py @@ -50,6 +50,31 @@ def _scope_document_statement_for_auth_context(statement, auth_context: AuthCont return statement.where(Document.owner_user_id == auth_context.user_id) +def _is_predefined_entry_visible_to_auth_context(entry: dict[str, object], auth_context: AuthContext) -> bool: + """Returns whether one predefined catalog entry is visible to the active caller role.""" + + if auth_context.role == UserRole.ADMIN: + return True + return bool(entry.get("global_shared", False)) + + +def _collect_visible_predefined_values( + entries: list[dict[str, object]], + *, + auth_context: AuthContext, +) -> set[str]: + """Collects normalized predefined values visible for the active caller role.""" + + visible_values: set[str] = set() + for entry in entries: + if not _is_predefined_entry_visible_to_auth_context(entry, auth_context): + continue + normalized = str(entry.get("value", "")).strip() + if normalized: + visible_values.add(normalized) + return visible_values + + def _ensure_document_access(document: Document, auth_context: AuthContext) -> None: """Enforces owner-level access for non-admin users and raises not-found on violations.""" @@ -397,9 +422,10 @@ def list_tags( rows = session.execute(statement).scalars().all() tags = {tag for row in rows for tag in row if tag} tags.update( - str(item.get("value", "")).strip() - for item in read_predefined_tags_settings() - if str(item.get("value", "")).strip() + _collect_visible_predefined_values( + read_predefined_tags_settings(), + auth_context=auth_context, + ) ) tags = sorted(tags) return {"tags": tags} @@ -421,9 +447,10 @@ def list_paths( rows = session.execute(statement).scalars().all() paths = {row for row in rows if row} paths.update( - str(item.get("value", "")).strip() - for item in read_predefined_paths_settings() - if str(item.get("value", "")).strip() + _collect_visible_predefined_values( + read_predefined_paths_settings(), + auth_context=auth_context, + ) ) paths = sorted(paths) return {"paths": paths} diff --git a/backend/tests/test_security_controls.py b/backend/tests/test_security_controls.py index f12de50..c99c834 100644 --- a/backend/tests/test_security_controls.py +++ b/backend/tests/test_security_controls.py @@ -63,6 +63,22 @@ if "fastapi" not in sys.modules: return decorator + def patch(self, *_args: object, **_kwargs: object): # type: ignore[no-untyped-def] + """Returns no-op decorator for PATCH route declarations.""" + + def decorator(func): # type: ignore[no-untyped-def] + return func + + return decorator + + def delete(self, *_args: object, **_kwargs: object): # type: ignore[no-untyped-def] + """Returns no-op decorator for DELETE route declarations.""" + + def decorator(func): # type: ignore[no-untyped-def] + return func + + return decorator + class _Request: """Minimal request placeholder for route function import compatibility.""" @@ -88,13 +104,52 @@ if "fastapi" not in sys.modules: return dependency + def _query(default=None, **_kwargs): # type: ignore[no-untyped-def] + """Returns FastAPI-like query defaults for dependency-light route imports.""" + + return default + + def _file(default=None, **_kwargs): # type: ignore[no-untyped-def] + """Returns FastAPI-like file defaults for dependency-light route imports.""" + + return default + + def _form(default=None, **_kwargs): # type: ignore[no-untyped-def] + """Returns FastAPI-like form defaults for dependency-light route imports.""" + + return default + + class _UploadFile: + """Minimal UploadFile placeholder for route import compatibility.""" + fastapi_stub.APIRouter = _APIRouter fastapi_stub.Depends = _depends + fastapi_stub.File = _file + fastapi_stub.Form = _form fastapi_stub.HTTPException = _HTTPException + fastapi_stub.Query = _query fastapi_stub.Request = _Request + fastapi_stub.UploadFile = _UploadFile fastapi_stub.status = _Status() sys.modules["fastapi"] = fastapi_stub +if "fastapi.responses" not in sys.modules: + fastapi_responses_stub = ModuleType("fastapi.responses") + + class _Response: + """Minimal response placeholder for route import compatibility.""" + + class _FileResponse(_Response): + """Minimal file response placeholder for route import compatibility.""" + + class _StreamingResponse(_Response): + """Minimal streaming response placeholder for route import compatibility.""" + + fastapi_responses_stub.Response = _Response + fastapi_responses_stub.FileResponse = _FileResponse + fastapi_responses_stub.StreamingResponse = _StreamingResponse + sys.modules["fastapi.responses"] = fastapi_responses_stub + if "fastapi.security" not in sys.modules: fastapi_security_stub = ModuleType("fastapi.security") @@ -267,8 +322,14 @@ if "app.services.handwriting_style" not in sys.modules: return None + def _delete_many_handwriting_style_documents(*_args: object, **_kwargs: object) -> None: + """No-op bulk style document delete stub for route import compatibility.""" + + return None + handwriting_style_stub.assign_handwriting_style = _assign_handwriting_style handwriting_style_stub.delete_handwriting_style_document = _delete_handwriting_style_document + handwriting_style_stub.delete_many_handwriting_style_documents = _delete_many_handwriting_style_documents sys.modules["app.services.handwriting_style"] = handwriting_style_stub if "app.services.routing_pipeline" not in sys.modules: @@ -304,6 +365,7 @@ from fastapi import HTTPException from app.api.auth import AuthContext, require_admin from app.api import routes_auth as auth_routes_module +from app.api import routes_documents as documents_routes_module from app.core import config as config_module from app.models.auth import UserRole from app.models.processing_log import sanitize_processing_log_payload_value, sanitize_processing_log_text @@ -359,6 +421,125 @@ class AuthDependencyTests(unittest.TestCase): self.assertEqual(resolved.role, UserRole.ADMIN) +class DocumentCatalogVisibilityTests(unittest.TestCase): + """Verifies predefined tag and path discovery visibility by caller role.""" + + class _ScalarSequence: + """Provides SQLAlchemy-like scalar result chaining for route unit tests.""" + + def __init__(self, values: list[object]) -> None: + self._values = values + + def scalars(self) -> "DocumentCatalogVisibilityTests._ScalarSequence": + """Returns self to emulate `.scalars().all()` call chains.""" + + return self + + def all(self) -> list[object]: + """Returns deterministic sequence values for route helper queries.""" + + return list(self._values) + + class _SessionStub: + """Returns a fixed scalar sequence for route metadata queries.""" + + def __init__(self, values: list[object]) -> None: + self._values = values + + def execute(self, _statement: object) -> "DocumentCatalogVisibilityTests._ScalarSequence": + """Ignores query details and returns deterministic scalar sequence results.""" + + return DocumentCatalogVisibilityTests._ScalarSequence(self._values) + + @staticmethod + def _auth_context(role: UserRole) -> AuthContext: + """Builds deterministic auth context fixtures for document discovery tests.""" + + return AuthContext( + user_id=uuid.uuid4(), + username=f"{role.value}-user", + role=role, + session_id=uuid.uuid4(), + expires_at=datetime.now(UTC), + ) + + def test_non_admin_only_receives_global_shared_predefined_tags_and_paths(self) -> None: + """User role receives only globally shared predefined values in discovery responses.""" + + session = self._SessionStub( + values=[ + ["owner-tag", ""], + ["owner-only-tag"], + ] + ) + predefined_tags = [ + {"value": "SharedTag", "global_shared": True}, + {"value": "InternalTag", "global_shared": False}, + {"value": "ImplicitPrivateTag"}, + ] + predefined_paths = [ + {"value": "Shared/Path", "global_shared": True}, + {"value": "Internal/Path", "global_shared": False}, + {"value": "Implicit/Private"}, + ] + + with ( + patch.object(documents_routes_module, "read_predefined_tags_settings", return_value=predefined_tags), + patch.object(documents_routes_module, "read_predefined_paths_settings", return_value=predefined_paths), + ): + tags_response = documents_routes_module.list_tags( + include_trashed=False, + auth_context=self._auth_context(UserRole.USER), + session=session, + ) + paths_response = documents_routes_module.list_paths( + include_trashed=False, + auth_context=self._auth_context(UserRole.USER), + session=self._SessionStub(values=["Owner/Path"]), + ) + + self.assertEqual(tags_response["tags"], ["SharedTag", "owner-only-tag", "owner-tag"]) + self.assertEqual(paths_response["paths"], ["Owner/Path", "Shared/Path"]) + + def test_admin_receives_full_predefined_tags_and_paths_catalog(self) -> None: + """Admin role receives full predefined values regardless of global-sharing scope.""" + + predefined_tags = [ + {"value": "SharedTag", "global_shared": True}, + {"value": "InternalTag", "global_shared": False}, + {"value": "ImplicitPrivateTag"}, + ] + predefined_paths = [ + {"value": "Shared/Path", "global_shared": True}, + {"value": "Internal/Path", "global_shared": False}, + {"value": "Implicit/Private"}, + ] + + with ( + patch.object(documents_routes_module, "read_predefined_tags_settings", return_value=predefined_tags), + patch.object(documents_routes_module, "read_predefined_paths_settings", return_value=predefined_paths), + ): + tags_response = documents_routes_module.list_tags( + include_trashed=False, + auth_context=self._auth_context(UserRole.ADMIN), + session=self._SessionStub(values=[["admin-tag"]]), + ) + paths_response = documents_routes_module.list_paths( + include_trashed=False, + auth_context=self._auth_context(UserRole.ADMIN), + session=self._SessionStub(values=["Admin/Path"]), + ) + + self.assertEqual( + tags_response["tags"], + ["ImplicitPrivateTag", "InternalTag", "SharedTag", "admin-tag"], + ) + self.assertEqual( + paths_response["paths"], + ["Admin/Path", "Implicit/Private", "Internal/Path", "Shared/Path"], + ) + + class _FakeRedisPipeline: """Provides deterministic Redis pipeline behavior for login throttle tests.""" diff --git a/doc/api-contract.md b/doc/api-contract.md index db4379e..64fcedc 100644 --- a/doc/api-contract.md +++ b/doc/api-contract.md @@ -61,9 +61,15 @@ Ownership rules: - `GET /documents/tags` - Query: `include_trashed` - Response: `{ "tags": string[] }` + - Behavior: + - all document-assigned tags visible to caller scope are included + - predefined tags are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true` - `GET /documents/paths` - Query: `include_trashed` - Response: `{ "paths": string[] }` + - Behavior: + - all document-assigned logical paths visible to caller scope are included + - predefined paths are role-filtered: `admin` receives full catalog, `user` receives only entries with `global_shared=true` - `GET /documents/types` - Query: `include_trashed` - Response: `{ "types": string[] }`