Initial commit
@@ -0,0 +1,16 @@
# DMARC Watch Developer Docs

This folder documents the implemented repository for developers and LLM agents. It is a code-navigation aid, not end-user product documentation.

## Index

- [Repository structure](repository.md): top-level files, application package layout, tests, design references, and generated runtime paths.
- [Runtime and development workflow](runtime.md): configuration loading, local test commands, Docker entry points, scheduler jobs, and CLI/API processing paths.
- [Code map](code-map.md): major modules, data flow, persistence models, alerting, LLM usage, and dashboard/API surfaces.
- [Agent notes](agent-notes.md): scoped guidance for future code changes and documentation work in this repository.

## Scope Notes

- The repository is named DMARC Watch in the orchestration context, while the implemented FastAPI application currently identifies itself as `DMARC Sentinel`.
- Settings are loaded from host-controlled files and environment variables. The implemented settings page is read-only.
- These docs describe current files only. They do not define migration, compatibility, or product roadmap decisions.
@@ -0,0 +1,26 @@
# Agent Notes

Use this file as a quick orientation guide before editing the repository.

## Ground Rules

- Keep documentation and code claims aligned with implemented files.
- Do not describe settings as editable through the web application. Runtime settings come from `config/config.yml` or `DMARC_SENTINEL_CONFIG`, plus environment variables. `config/config.example.yml` is only a template.
- Treat `data/` and `logs/` as runtime output locations. The repository ignores SQLite databases and log files.
- Do not add compatibility, migration, or legacy-support behavior unless a task explicitly asks for an implemented change.
- UI work should follow the mockups in `design/`. If documenting UI icons, respect the project policy that icons must be material icons.

## Fast Navigation

- Start at `app/main.py` for HTTP routes, template rendering, API auth dependencies, and admin processing endpoints.
- Read `app/config.py` before changing runtime behavior because settings are Pydantic models loaded at import time by `get_settings()`.
- Follow ingestion through `app/message_processor.py`, then `app/attachment_extractor.py`, `app/dmarc_parser.py`, `app/known_senders.py`, and `app/analyzer.py`.
- Use `app/models.py` for database table fields and relationships. Tables are created by `app.db.init_db()` through SQLAlchemy metadata.
- Use `tests/` for examples of expected parser, extractor, analyzer, storage, homepage, and LLM fallback behavior.

## Documentation Change Checklist

1. Inspect the code path being documented.
2. Keep filenames in `doc/` lowercase, except `README.md`.
3. Prefer links to existing files and commands that are present in this repository.
4. Validate with lightweight checks appropriate to the change. For docs-only changes, at minimum verify the Markdown filenames, and run the test suite when the environment has the dependencies installed.
@@ -0,0 +1,84 @@
# Code Map

This map describes the implemented Python modules and their main responsibilities.

## Application Entry Points

- `app/main.py` creates the FastAPI app, loads settings, configures logging, initializes the database, mounts static assets, starts the scheduler on startup, and defines HTML/API routes.
- `app/cli.py` exposes `python -m app.cli backlog` for backlog processing using the same `process_inbox()` pipeline as the admin API.
- `Dockerfile` runs `uvicorn app.main:app --host 0.0.0.0 --port 8000`.

## Configuration and Auth

- `app/config.py` defines Pydantic models for `app`, `security`, `llm`, `inboxes`, `known_senders`, and `alerts`.
- `load_settings()` reads `DMARC_SENTINEL_CONFIG` when set, otherwise `config/config.yml`; missing runtime config is a startup error.
- `validate_llm_environment()` requires the configured OpenAI API key when `llm.provider` is `openai`, except when `DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true`.
- `app/auth.py` protects dashboard/admin routes with Basic Auth when enabled and protects homepage API routes with bearer token auth when required.
- `app/templates/settings.html` renders runtime settings and environment-variable presence as read-only information.
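
The Basic Auth protection attributed to `app/auth.py` above can be pictured with a standard-library sketch. This is not the project's code: `check_basic_auth` and its parameters are hypothetical, and the real module is described as using FastAPI dependencies whose details are not shown here. The sketch only illustrates decoding an `Authorization` header and comparing credentials in constant time.

```python
import base64
import binascii
import secrets
from typing import Optional


def check_basic_auth(header: Optional[str], username: str, password: str) -> bool:
    """Return True when an Authorization header carries the expected credentials.

    Hypothetical helper for illustration; not the signature used in app/auth.py.
    """
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest takes the same time wherever the first mismatch occurs.
    user_ok = secrets.compare_digest(supplied_user.encode(), username.encode())
    password_ok = secrets.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and password_ok
```

Both comparisons run before the result is combined, so a wrong username does not return earlier than a wrong password.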

## Persistence

- `app/db.py` creates a SQLAlchemy engine from `settings.app.database_url`, provides session helpers, and creates tables with `Base.metadata.create_all()`.
- `app/models.py` defines tables for inbox status, mail messages, parsed reports, records, auth results, alerts, daily stats, and stored LLM reports.
- Duplicate report detection is based on the unique `Report.raw_xml_sha256` field.
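
To make the duplicate-detection idea concrete, here is a minimal sketch that keys stored payloads by their SHA-256 digest. It assumes the digest is taken over the raw XML bytes, which the field name suggests but this document does not confirm. `ReportStore` is a hypothetical in-memory stand-in for a table with a unique `raw_xml_sha256` column, not the SQLAlchemy model.

```python
import hashlib


def raw_xml_sha256(raw_xml: bytes) -> str:
    """Hex SHA-256 digest of the report payload, used here as the dedup key."""
    return hashlib.sha256(raw_xml).hexdigest()


class ReportStore:
    """Hypothetical in-memory stand-in for a table with a unique digest column."""

    def __init__(self) -> None:
        self._by_digest: dict[str, bytes] = {}

    def add(self, raw_xml: bytes) -> bool:
        """Store the payload; return False when identical bytes were stored before."""
        digest = raw_xml_sha256(raw_xml)
        if digest in self._by_digest:
            return False
        self._by_digest[digest] = raw_xml
        return True
```

A digest over raw bytes only catches byte-identical payloads; the same report re-serialized with different whitespace would hash differently.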

## Ingestion Pipeline

The main ingestion path is implemented in `app/message_processor.py`:

1. `process_inbox()` opens an IMAP connection using `app/imap_client.py`.
2. It searches UIDs in either new-message mode or backlog mode.
3. It stores or reuses a `MailMessage` row for each fetched message.
4. `is_candidate_message()` checks recipients, subject text, and attachment hints.
5. `app/attachment_extractor.py` extracts `.xml`, `.gz`, and `.zip` report payloads, rejects unsafe ZIP paths, skips nested archives inside ZIPs, and enforces the decompressed size limit.
6. `app/dmarc_parser.py` parses DMARC aggregate XML with `defusedxml`.
7. `app/known_senders.py` classifies each record using configured CIDR allowlists, DKIM domains, SPF domains, or aligned DKIM evidence.
8. `app/analyzer.py` creates or updates deterministic alerts.
9. Email notifications are sent through `app/alerts.py` when configured and when a new alert is created or severity increases.
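
Step 5 is the part of the pipeline with the most safety rules, so a standard-library sketch may help. Everything named here (`extract_reports`, `ExtractionError`, the 10 MiB limit) is an assumption for illustration, not the contents of `app/attachment_extractor.py`. It shows the three documented behaviors: rejecting unsafe ZIP paths, skipping nested archives, and capping decompressed size.

```python
import gzip
import io
import zipfile
from pathlib import PurePosixPath

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # assumed limit, not the project's value
ARCHIVE_SUFFIXES = (".zip", ".gz")


class ExtractionError(ValueError):
    """Raised when a payload is unsafe or exceeds the size limit."""


def _read_limited(stream, limit: int) -> bytes:
    # Read one byte past the limit so oversize payloads are detected
    # without decompressing them completely.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ExtractionError("decompressed payload exceeds size limit")
    return data


def extract_reports(filename: str, payload: bytes,
                    limit: int = MAX_DECOMPRESSED_BYTES) -> list[bytes]:
    """Return the XML payloads found in one attachment (limit applies per payload)."""
    name = filename.lower()
    if name.endswith(".xml"):
        return [_read_limited(io.BytesIO(payload), limit)]
    if name.endswith(".gz"):
        with gzip.open(io.BytesIO(payload)) as stream:
            return [_read_limited(stream, limit)]
    if name.endswith(".zip"):
        reports = []
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                member = PurePosixPath(info.filename)
                if member.is_absolute() or ".." in member.parts:
                    raise ExtractionError(f"unsafe ZIP path: {info.filename}")
                if info.filename.lower().endswith(ARCHIVE_SUFFIXES):
                    continue  # nested archives are skipped, not unpacked
                with archive.open(info) as stream:
                    reports.append(_read_limited(stream, limit))
        return reports
    return []  # unsupported attachment type
```

Reading `limit + 1` bytes from the decompression stream bounds memory use even for a small archive that expands to a very large payload.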

## Deterministic Analysis

`app/analyzer.py` produces alerts from stored report records. Implemented alert paths include:

- unknown source failed both SPF and DKIM alignment;
- known sender DMARC failure;
- quarantine or reject disposition;
- first observed failing source;
- first observed passing but unclassified source;
- high SPF or DKIM alignment failure rate;
- sudden unknown failure spike;
- new reporter;
- first observed policy for a domain;
- missing expected reporter based on recent report history.

Open alerts are deduplicated by a fingerprint composed from domain, alert type, and key. Existing open alerts are updated rather than duplicated.
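
The fingerprint-based deduplication can be sketched as an upsert keyed by a hash of the three parts. The names below (`Alert`, `upsert_alert`, the severity levels, the `new_reporter` type string in the usage note) are hypothetical, and the hashing scheme is an assumption; the document only states which fields make up the fingerprint.

```python
import hashlib
from dataclasses import dataclass

SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}  # assumed levels


@dataclass
class Alert:
    domain: str
    alert_type: str
    key: str
    severity: str
    occurrences: int = 1


def fingerprint(domain: str, alert_type: str, key: str) -> str:
    # The unit-separator character is unlikely to occur in the parts,
    # so ("a", "bc") and ("ab", "c") produce different fingerprints.
    joined = "\x1f".join((domain.lower(), alert_type, key))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def upsert_alert(open_alerts: dict[str, Alert], candidate: Alert) -> tuple[Alert, bool]:
    """Return (alert, should_notify); notify on creation or severity increase."""
    fp = fingerprint(candidate.domain, candidate.alert_type, candidate.key)
    existing = open_alerts.get(fp)
    if existing is None:
        open_alerts[fp] = candidate
        return candidate, True
    # Same fingerprint: update the open alert instead of adding a second one.
    existing.occurrences += 1
    escalated = SEVERITY_ORDER[candidate.severity] > SEVERITY_ORDER[existing.severity]
    if escalated:
        existing.severity = candidate.severity
    return existing, escalated
```

Calling `upsert_alert` twice with `Alert("example.com", "new_reporter", "google.com", "info")` leaves one entry in `open_alerts`, and only the first call asks for a notification.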

## LLM Usage

- `app/llm.py` wraps the OpenAI client for JSON-only alert explanations and daily/weekly summaries.
- Alert existence and severity are deterministic in `app/analyzer.py`; LLM output is stored as explanation fields on alerts or in `LLMReport` rows.
- Fallback outputs are returned when LLM calls fail or validation fails.
- Config flags for raw XML and raw email are present in settings. The implemented LLM payloads use derived alert or summary facts.
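
The fallback behavior can be illustrated without an OpenAI client. In this sketch `call_llm` is any callable that returns a JSON string, and the required keys are a hypothetical schema; neither reflects the actual `app/llm.py` interface. The point is that a failed call and an invalid response take the same path to a deterministic fallback.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("summary", "recommended_action")  # hypothetical schema


def fallback_explanation(facts: dict) -> dict:
    """Deterministic text built only from facts the analyzer already produced."""
    return {
        "summary": f"{facts['alert_type']} alert for {facts['domain']}",
        "recommended_action": "Review the alert details in the dashboard.",
        "source": "fallback",
    }


def explain_alert(facts: dict, call_llm: Callable[[dict], str]) -> dict:
    """Return a validated LLM explanation, or the fallback when anything goes wrong."""
    try:
        parsed = json.loads(call_llm(facts))
    except Exception:
        # Network errors, timeouts, and invalid JSON all end up here.
        return fallback_explanation(facts)
    if not isinstance(parsed, dict):
        return fallback_explanation(facts)
    if any(not isinstance(parsed.get(key), str) for key in REQUIRED_KEYS):
        return fallback_explanation(facts)
    explanation = {key: parsed[key] for key in REQUIRED_KEYS}
    explanation["source"] = "llm"
    return explanation
```

Because the alert itself is created deterministically, a fallback only changes the explanation text, never whether the alert exists or how severe it is.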

## Scheduler

`app/scheduler.py` starts an APScheduler `BackgroundScheduler` with these jobs:

- `poll`: interval job using `settings.app.poll_interval_minutes`;
- `daily`: cron job at 07:00 in `settings.app.timezone`;
- `weekly`: cron job on Monday at 07:30 in `settings.app.timezone`.

Daily and weekly summary jobs aggregate stored data, call `LLMClient`, store `LLMReport` rows, and attempt digest email delivery through `app/alerts.py`.

## HTTP and Template Surfaces

HTML routes in `app/main.py` render templates from `app/templates/`:

- `/`: overview dashboard;
- `/domains/{domain}`: domain detail;
- `/reports/{report_id}`: report detail;
- `/alerts`: alert list and actions;
- `/inboxes`: inbox status and manual processing controls;
- `/settings`: read-only runtime configuration.

JSON routes include `/health`, homepage widget endpoints, domain/report/alert APIs, and admin processing endpoints. Authentication behavior is defined by route dependencies in `app/main.py`.
@@ -0,0 +1,61 @@
# Repository Structure

The repository contains a FastAPI application for monitoring DMARC aggregate reports. The orchestration context names the repository DMARC Watch; the implemented application title and many runtime names currently use `DMARC Sentinel`.

## Top-Level Files

- `README.md`: current user-facing setup and operational notes.
- `requirements.txt`: Python runtime and test dependencies.
- `pytest.ini`: adds the repository root to `pythonpath`.
- `Dockerfile`: Python 3.12 image that installs requirements, copies `app/`, creates runtime directories, and starts Uvicorn.
- `docker-compose.yml`: builds the app service, mounts `config/` read-only, mounts `data/` and `logs/`, publishes port `8000`, and attaches to the external `npm_proxy` Docker network.
- `config/config.example.yml`: host-controlled runtime configuration template.
- `.env.example`: environment variable template for IMAP, dashboard auth, homepage token, OpenAI, and SMTP settings.
- `dmarc-sentinel-codex-implementation-spec.md`: implementation specification document present in the repository.

## Application Package

`app/` is the main Python package.

- `main.py`: FastAPI app, startup, HTML routes, JSON routes, and admin endpoints.
- `config.py`: settings models and configuration loading.
- `db.py`: SQLAlchemy engine, session helpers, table initialization, and health check.
- `models.py`: SQLAlchemy ORM models.
- `message_processor.py`: IMAP message scanning, report import, analysis, message status updates, and processing summaries.
- `imap_client.py`: IMAP connection, UID search/fetch, mark-seen, and move operations.
- `attachment_extractor.py`: XML, gzip, and ZIP attachment extraction.
- `dmarc_parser.py`: DMARC aggregate XML parser.
- `known_senders.py`: deterministic sender classification.
- `analyzer.py`: deterministic alert creation and update logic.
- `alerts.py`: SMTP alert and digest delivery.
- `llm.py`: OpenAI JSON calls and fallback summary/explanation outputs.
- `scheduler.py`: polling, daily aggregation, and weekly summary jobs.
- `homepage.py`: homepage widget and domain summary payload builders.
- `schemas.py`: Pydantic request models for admin processing endpoints.
- `auth.py`: dashboard Basic Auth and homepage bearer token dependencies.
- `smoke_data.py`: helper for seeding smoke data.
- `templates/`: Jinja2 templates for the server-rendered dashboard.
- `static/app.css`: dashboard styles.

## Tests and Fixtures

`tests/` contains pytest coverage for current behavior:

- parser behavior: `tests/test_parser.py`;
- attachment extraction and message attachment detection: `tests/test_attachment_extractor.py`;
- duplicate storage behavior: `tests/test_storage.py`;
- alert analysis behavior: `tests/test_analyzer.py`;
- homepage summary behavior: `tests/test_homepage.py`;
- LLM fallback validation behavior: `tests/test_llm.py`;
- test environment defaults: `tests/conftest.py`;
- sample config and XML fixtures: `tests/fixtures/`.

## Design References

`design/` contains mockups and design reference files. These are reference artifacts for UI work, not runtime code.

## Runtime Output Paths

- `data/` is used for SQLite database files.
- `logs/` is used for `logs/dmarc-sentinel.log`.
- `.gitignore` excludes SQLite databases, log files, bytecode, virtual environments, and pytest cache output.
@@ -0,0 +1,97 @@
# Runtime and Development Workflow

This file summarizes how the implemented repository runs and how developers can validate changes.

## Configuration Loading

`app/config.py` picks the settings file by precedence:

1. the path in `DMARC_SENTINEL_CONFIG`, when set;
2. `config/config.yml` otherwise.

If neither path exists, startup fails with an instruction to create `config/config.yml` from `config/config.example.yml`.
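
As a rough illustration of that precedence, the sketch below resolves the path and fails early when the file is missing. `resolve_config_path` is a hypothetical helper, not the project's `load_settings()`. It also assumes that a set but non-existent `DMARC_SENTINEL_CONFIG` path is an error rather than a reason to fall back to the default; the docs do not say which the implementation does.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = Path("config/config.yml")


def resolve_config_path(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the settings file: the environment override first, then the default."""
    env = os.environ if environ is None else environ
    override = env.get("DMARC_SENTINEL_CONFIG")
    path = Path(override) if override else DEFAULT_CONFIG_PATH
    if not path.is_file():
        # Assumed behavior: no silent fallback when the chosen path is missing.
        raise FileNotFoundError(
            f"Config file {path} not found; create config/config.yml "
            "from config/config.example.yml."
        )
    return path
```

Passing the environment as a parameter keeps the function testable without mutating `os.environ`.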

Secrets are read from environment variables named by the loaded settings. The settings page renders loaded values and environment-variable presence as read-only status; the application does not implement settings writes through the dashboard.

When `llm.provider` is `openai`, `validate_llm_environment()` requires the configured API key environment variable unless `DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true`.
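
The rule in the previous paragraph amounts to a three-way check, sketched here with a hypothetical `check_llm_environment` function. Its signature, the `OPENAI_API_KEY` variable name used in the usage note, and the exact-match comparison on `"true"` are assumptions; the real `validate_llm_environment()` reads the variable name from the loaded settings.

```python
import os
from typing import Mapping, Optional


def check_llm_environment(provider: str, api_key_env: str,
                          environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise when the OpenAI provider is selected but its API key is absent."""
    env = os.environ if environ is None else environ
    if provider != "openai":
        return  # other providers need no OpenAI key
    if env.get("DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS") == "true":
        return  # test bypass documented above
    if not env.get(api_key_env):
        raise RuntimeError(
            f"{api_key_env} must be set when llm.provider is 'openai'"
        )
```

For example, `check_llm_environment("openai", "OPENAI_API_KEY", {})` raises, while the same call with `{"DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS": "true"}` returns without error.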

## Local Test Workflow

Install dependencies in a Python environment, then run:

```bash
DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true python3 -m pytest
```

`tests/conftest.py` also sets test defaults for the LLM bypass, auth credentials, and homepage token.

The repository has no Ruff configuration, and `requirements.txt` does not list Ruff. If linting is added, document the exact command together with the added tooling.

## Docker Runtime

The Docker image starts the FastAPI app with:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

The compose service:

- loads environment variables from `.env`;
- mounts `./config` at `/app/config` as read-only;
- mounts `./data` at `/app/data`;
- mounts `./logs` at `/app/logs`;
- publishes `8000:8000`;
- attaches to the external `npm_proxy` network with static address `192.168.99.18`.

Database tables are created when `app.main` is imported, because the module calls `init_db()`.

## CLI Backlog Processing

Backlog processing is implemented by `app/cli.py`:

```bash
python -m app.cli backlog --inbox <inbox-id>
```

Available arguments are:

- `--folder`;
- `--since YYYY-MM-DD`;
- `--before YYYY-MM-DD`;
- `--limit`;
- `--dry-run`;
- `--reprocess`;
- `--mark-seen`.

The CLI calls `process_inbox()` in backlog mode and prints the resulting `ProcessingSummary` counters.
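
For readers who want to see how such a command line fits together, here is an `argparse` sketch that accepts the flags listed above. It is not a copy of `app/cli.py`: the argument types, the required `--inbox`, and the treatment of `--dry-run`, `--reprocess`, and `--mark-seen` as boolean switches are assumptions inferred from the flag names.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented `backlog` flags (types are assumed)."""
    parser = argparse.ArgumentParser(prog="python -m app.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    backlog = sub.add_parser("backlog", help="process historical messages")
    backlog.add_argument("--inbox", required=True, help="inbox id from the config")
    backlog.add_argument("--folder", help="IMAP folder to scan")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD.
    backlog.add_argument("--since", type=date.fromisoformat, metavar="YYYY-MM-DD")
    backlog.add_argument("--before", type=date.fromisoformat, metavar="YYYY-MM-DD")
    backlog.add_argument("--limit", type=int)
    backlog.add_argument("--dry-run", action="store_true")
    backlog.add_argument("--reprocess", action="store_true")
    backlog.add_argument("--mark-seen", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing dates at the argument boundary means a malformed `--since` value fails with a usage error before any IMAP connection is opened.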

## HTTP Processing Paths

`app/main.py` exposes admin endpoints that call the same processing pipeline:

- `POST /api/admin/process-now`: processes the configured inbox using request fields from `ProcessNowRequest`.
- `POST /api/admin/backlog`: runs backlog processing using request fields from `BacklogRequest`.

Both endpoints use dashboard Basic Auth dependencies.

## Scheduled Work

On FastAPI startup, `start_scheduler(settings)` registers:

- polling for all enabled inboxes every `settings.app.poll_interval_minutes`;
- daily summaries at 07:00 in `settings.app.timezone`;
- weekly summaries on Monday at 07:30 in `settings.app.timezone`.

Polling uses `process_inbox()` with `mode="new"`. Summary jobs aggregate stored records, call the LLM wrapper, store `LLMReport` rows, and send digest email when SMTP settings allow it.

## Lightweight Validation Commands

For documentation-only changes, useful checks are:

```bash
# -maxdepth goes before -type; GNU find warns when a global option follows a test.
find doc -maxdepth 1 -type f -print
python3 -m pytest
```

The filename rule for this docs folder is: every Markdown file under `doc/` must be lowercase, except `README.md`.
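
The filename rule is easy to check mechanically. The sketch below is one way to do it; `find_bad_doc_names` is a hypothetical helper that does not exist in the repository, and it assumes the rule applies to every `.md` file at any depth under `doc/`.

```python
import sys
from pathlib import Path


def find_bad_doc_names(doc_dir: Path) -> list[str]:
    """Return Markdown files under doc_dir whose names break the lowercase rule."""
    bad = []
    for path in sorted(doc_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".md":
            continue
        if path.name == "README.md":
            continue  # the single documented exception
        if path.name != path.name.lower():
            bad.append(str(path.relative_to(doc_dir)))
    return bad


if __name__ == "__main__":
    offenders = find_bad_doc_names(Path("doc"))
    for name in offenders:
        print(f"not lowercase: {name}")
    sys.exit(1 if offenders else 0)
```

Run from the repository root, the script exits non-zero when any file breaks the rule, which makes it usable as a pre-commit or CI step.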