Initial commit

2026-05-16 12:05:36 -03:00
parent 0ce972a361
commit e82cee97a7
65 changed files with 9051 additions and 5 deletions
+16
@@ -0,0 +1,16 @@
# DMARC Watch Developer Docs
This folder documents the implemented repository for developers and LLM agents. It is a code-navigation aid, not end-user product documentation.
## Index
- [Repository structure](repository.md): top-level files, application package layout, tests, design references, and generated runtime paths.
- [Runtime and development workflow](runtime.md): configuration loading, local test commands, Docker entry points, scheduler jobs, and CLI/API processing paths.
- [Code map](code-map.md): major modules, data flow, persistence models, alerting, LLM usage, and dashboard/API surfaces.
- [Agent notes](agent-notes.md): scoped guidance for future code changes and documentation work in this repository.
## Scope Notes
- The repository is named DMARC Watch in the orchestration context, while the implemented FastAPI application currently identifies itself as `DMARC Sentinel`.
- Settings are loaded from host-controlled files and environment variables. The implemented settings page is read-only.
- These docs describe current files only. They do not define migration, compatibility, or product roadmap decisions.
+26
@@ -0,0 +1,26 @@
# Agent Notes
Use this file as a quick orientation guide before editing the repository.
## Ground Rules
- Keep documentation and code claims aligned with implemented files.
- Do not describe settings as editable through the web application. Runtime settings come from `config/config.yml` or `DMARC_SENTINEL_CONFIG`, plus environment variables. `config/config.example.yml` is only a template.
- Treat `data/` and `logs/` as runtime output locations. The repository ignores SQLite databases and log files.
- Do not add compatibility, migration, or legacy-support behavior unless a task explicitly asks for an implemented change.
- UI work should follow the mockups in `design/`. When documenting UI icons, respect the project policy that all icons must be Material Icons.
## Fast Navigation
- Start at `app/main.py` for HTTP routes, template rendering, API auth dependencies, and admin processing endpoints.
- Read `app/config.py` before changing runtime behavior because settings are Pydantic models loaded at import time by `get_settings()`.
- Follow ingestion through `app/message_processor.py`, then `app/attachment_extractor.py`, `app/dmarc_parser.py`, `app/known_senders.py`, and `app/analyzer.py`.
- Use `app/models.py` for database table fields and relationships. Tables are created by `app.db.init_db()` through SQLAlchemy metadata.
- Use `tests/` for examples of expected parser, extractor, analyzer, storage, homepage, and LLM fallback behavior.
## Documentation Change Checklist
1. Inspect the code path being documented.
2. Keep filenames in `doc/` lowercase, except `README.md`.
3. Prefer links to existing files and commands that are present in this repository.
4. Validate with lightweight checks appropriate to the change. For docs-only changes, at minimum verify Markdown filenames and run the test suite when the environment has dependencies installed.
+84
@@ -0,0 +1,84 @@
# Code Map
This map describes the implemented Python modules and their main responsibilities.
## Application Entry Points
- `app/main.py` creates the FastAPI app, loads settings, configures logging, initializes the database, mounts static assets, starts the scheduler on startup, and defines HTML/API routes.
- `app/cli.py` exposes `python -m app.cli backlog` for backlog processing using the same `process_inbox()` pipeline as the admin API.
- `Dockerfile` runs `uvicorn app.main:app --host 0.0.0.0 --port 8000`.
## Configuration and Auth
- `app/config.py` defines Pydantic models for `app`, `security`, `llm`, `inboxes`, `known_senders`, and `alerts`.
- `load_settings()` reads `DMARC_SENTINEL_CONFIG` when set, otherwise `config/config.yml`; missing runtime config is a startup error.
- `validate_llm_environment()` requires the configured OpenAI API key when `llm.provider` is `openai`, except when `DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true`.
- `app/auth.py` protects dashboard/admin routes with Basic Auth when enabled and protects homepage API routes with bearer token auth when required.
- `app/templates/settings.html` renders runtime settings and environment-variable presence as read-only information.
## Persistence
- `app/db.py` creates a SQLAlchemy engine from `settings.app.database_url`, provides session helpers, and creates tables with `Base.metadata.create_all()`.
- `app/models.py` defines tables for inbox status, mail messages, parsed reports, records, auth results, alerts, daily stats, and stored LLM reports.
- Duplicate report detection is based on the unique `Report.raw_xml_sha256` field.
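
The duplicate check can be illustrated with a minimal sketch. It assumes the digest is taken over the raw XML bytes exactly as extracted; the in-memory store is a hypothetical stand-in for the unique constraint on `Report.raw_xml_sha256`, not the project's storage code.

```python
import hashlib
from typing import Set


def raw_xml_sha256(raw_xml: bytes) -> str:
    """Return the hex SHA-256 digest of the raw report bytes."""
    return hashlib.sha256(raw_xml).hexdigest()


class InMemoryReportStore:
    """Hypothetical stand-in for a table with a unique raw_xml_sha256 column."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def add(self, raw_xml: bytes) -> bool:
        """Store a report; return False when identical bytes were stored before."""
        digest = raw_xml_sha256(raw_xml)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Because the digest covers the exact bytes, two reports that differ only in whitespace would count as distinct under this sketch.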
## Ingestion Pipeline
The main ingestion path is implemented in `app/message_processor.py`:
1. `process_inbox()` opens an IMAP connection using `app/imap_client.py`.
2. It searches UIDs in either new-message mode or backlog mode.
3. It stores or reuses a `MailMessage` row for each fetched message.
4. `is_candidate_message()` checks recipients, subject text, and attachment hints.
5. `app/attachment_extractor.py` extracts `.xml`, `.gz`, and `.zip` report payloads, rejects unsafe ZIP paths, skips nested archives inside ZIPs, and enforces the decompressed size limit.
6. `app/dmarc_parser.py` parses DMARC aggregate XML with `defusedxml`.
7. `app/known_senders.py` classifies each record using configured CIDR allowlists, DKIM domains, SPF domains, or aligned DKIM evidence.
8. `app/analyzer.py` creates or updates deterministic alerts.
9. Email notifications are sent through `app/alerts.py` when configured and when a new alert is created or severity increases.
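
The ZIP handling in step 5 can be sketched with the standard library alone. This is an illustration of the three safeguards, not the code in `app/attachment_extractor.py`; the function names, the 10 MiB limit, and the choice to raise `ValueError` are assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # assumed limit, not the project's value
ARCHIVE_SUFFIXES = (".zip", ".gz")


def is_unsafe_member(name: str) -> bool:
    """Flag absolute paths and any path that contains a '..' component."""
    path = PurePosixPath(name.replace("\\", "/"))
    return path.is_absolute() or ".." in path.parts


def extract_xml_from_zip(payload: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> List[bytes]:
    """Return XML members, rejecting unsafe paths and skipping nested archives."""
    documents: List[bytes] = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if is_unsafe_member(info.filename):
                raise ValueError(f"unsafe ZIP path: {info.filename!r}")
            lowered = info.filename.lower()
            if lowered.endswith(ARCHIVE_SUFFIXES):
                continue  # nested archives are skipped, not unpacked
            if not lowered.endswith(".xml"):
                continue
            with archive.open(info) as member:
                # Read one byte past the remaining budget to detect overflow
                data = member.read(limit - total + 1)
            total += len(data)
            if total > limit:
                raise ValueError("decompressed size limit exceeded")
            documents.append(data)
    return documents
```

Reading at most one byte beyond the remaining budget keeps memory bounded even when an archive member declares a misleading uncompressed size.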
## Deterministic Analysis
`app/analyzer.py` produces alerts from stored report records. Implemented alert paths include:
- unknown source failed both SPF and DKIM alignment;
- known sender DMARC failure;
- quarantine or reject disposition;
- first observed failing source;
- first observed passing but unclassified source;
- high SPF or DKIM alignment failure rate;
- sudden unknown failure spike;
- new reporter;
- first observed policy for a domain;
- missing expected reporter based on recent report history.
Open alerts are deduplicated by a fingerprint composed from domain, alert type, and key. Existing open alerts are updated rather than duplicated.
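
A minimal sketch of fingerprint-based deduplication follows. The hashing scheme, the lowercase normalization of the domain, the integer severity, and the `Alert` dataclass are all assumptions made for illustration; only the three fingerprint inputs (domain, alert type, key) come from the description above.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


def alert_fingerprint(domain: str, alert_type: str, key: str) -> str:
    """Combine domain, alert type, and key into one stable identifier."""
    joined = "|".join((domain.lower(), alert_type, key))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass
class Alert:
    """Hypothetical in-memory alert; the real model lives in app/models.py."""
    domain: str
    alert_type: str
    key: str
    severity: int
    occurrences: int = 1


def upsert_open_alert(open_alerts: Dict[str, Alert], candidate: Alert) -> Tuple[Alert, bool]:
    """Return (alert, notify); notify is True for a new alert or raised severity."""
    fingerprint = alert_fingerprint(candidate.domain, candidate.alert_type, candidate.key)
    existing = open_alerts.get(fingerprint)
    if existing is None:
        open_alerts[fingerprint] = candidate
        return candidate, True
    # Same fingerprint: update the open alert instead of creating a duplicate
    existing.occurrences += 1
    escalated = candidate.severity > existing.severity
    if escalated:
        existing.severity = candidate.severity
    return existing, escalated
```

The boolean in the return value mirrors the notification rule from the ingestion pipeline: email is sent only for a new alert or a severity increase.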
## LLM Usage
- `app/llm.py` wraps the OpenAI client for JSON-only alert explanations and daily/weekly summaries.
- Alert existence and severity are deterministic in `app/analyzer.py`; LLM output is stored as explanation fields on alerts or in `LLMReport` rows.
- Fallback outputs are returned when LLM calls fail or validation fails.
- Config flags for raw XML and raw email are present in settings; the implemented LLM payloads, however, are built from derived alert or summary facts.
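
The fallback behavior can be sketched as follows. The JSON field names, the fallback wording, and the injected `call_llm` callable are hypothetical; the sketch only shows the pattern of returning a deterministic result when the call fails or the response does not validate.

```python
import json
from typing import Callable, Dict

REQUIRED_KEYS = ("summary", "recommended_action")  # hypothetical field names


def fallback_explanation(alert_type: str) -> Dict[str, str]:
    """Deterministic text used when no valid LLM output is available."""
    return {
        "summary": f"Alert of type '{alert_type}' was raised by deterministic analysis.",
        "recommended_action": "Review the alert details in the dashboard.",
    }


def explain_alert(alert_type: str, call_llm: Callable[[str], str]) -> Dict[str, str]:
    """Return validated LLM output, or the fallback on any failure."""
    try:
        parsed = json.loads(call_llm(alert_type))
    except Exception:
        # Covers errors raised by the callable as well as invalid JSON
        return fallback_explanation(alert_type)
    if not isinstance(parsed, dict):
        return fallback_explanation(alert_type)
    values = {key: parsed.get(key) for key in REQUIRED_KEYS}
    if not all(isinstance(value, str) and value.strip() for value in values.values()):
        return fallback_explanation(alert_type)
    return values
```

Passing the LLM call in as a function keeps the validation logic testable without network access.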
## Scheduler
`app/scheduler.py` starts an APScheduler `BackgroundScheduler` with these jobs:
- `poll`: interval job using `settings.app.poll_interval_minutes`;
- `daily`: cron job at 07:00 in `settings.app.timezone`;
- `weekly`: cron job on Monday at 07:30 in `settings.app.timezone`.
Daily and weekly summary jobs aggregate stored data, call `LLMClient`, store `LLMReport` rows, and attempt digest email delivery through `app/alerts.py`.
## HTTP and Template Surfaces
HTML routes in `app/main.py` render templates from `app/templates/`:
- `/`: overview dashboard;
- `/domains/{domain}`: domain detail;
- `/reports/{report_id}`: report detail;
- `/alerts`: alert list and actions;
- `/inboxes`: inbox status and manual processing controls;
- `/settings`: read-only runtime configuration.
JSON routes include `/health`, homepage widget endpoints, domain/report/alert APIs, and admin processing endpoints. Authentication behavior is defined by route dependencies in `app/main.py`.
+61
@@ -0,0 +1,61 @@
# Repository Structure
The repository contains a FastAPI application for monitoring DMARC aggregate reports. The orchestration context names the repository DMARC Watch; the implemented application title and many runtime names currently use `DMARC Sentinel`.
## Top-Level Files
- `README.md`: current user-facing setup and operational notes.
- `requirements.txt`: Python runtime and test dependencies.
- `pytest.ini`: adds the repository root to `pythonpath`.
- `Dockerfile`: Python 3.12 image that installs requirements, copies `app/`, creates runtime directories, and starts Uvicorn.
- `docker-compose.yml`: builds the app service, mounts `config/` read-only, mounts `data/` and `logs/`, publishes port `8000`, and attaches to the external `npm_proxy` Docker network.
- `config/config.example.yml`: host-controlled runtime configuration template.
- `.env.example`: environment variable template for IMAP, dashboard auth, homepage token, OpenAI, and SMTP settings.
- `dmarc-sentinel-codex-implementation-spec.md`: implementation specification document present in the repository.
## Application Package
`app/` is the main Python package.
- `main.py`: FastAPI app, startup, HTML routes, JSON routes, and admin endpoints.
- `config.py`: settings models and configuration loading.
- `db.py`: SQLAlchemy engine, session helpers, table initialization, and health check.
- `models.py`: SQLAlchemy ORM models.
- `message_processor.py`: IMAP message scanning, report import, analysis, message status updates, and processing summaries.
- `imap_client.py`: IMAP connection, UID search/fetch, mark-seen, and move operations.
- `attachment_extractor.py`: XML, gzip, and ZIP attachment extraction.
- `dmarc_parser.py`: DMARC aggregate XML parser.
- `known_senders.py`: deterministic sender classification.
- `analyzer.py`: deterministic alert creation and update logic.
- `alerts.py`: SMTP alert and digest delivery.
- `llm.py`: OpenAI JSON calls and fallback summary/explanation outputs.
- `scheduler.py`: polling, daily aggregation, and weekly summary jobs.
- `homepage.py`: homepage widget and domain summary payload builders.
- `schemas.py`: Pydantic request models for admin processing endpoints.
- `auth.py`: dashboard Basic Auth and homepage bearer token dependencies.
- `smoke_data.py`: helper for seeding smoke data.
- `templates/`: Jinja2 templates for the server-rendered dashboard.
- `static/app.css`: dashboard styles.
## Tests and Fixtures
`tests/` contains pytest coverage for current behavior:
- parser behavior: `tests/test_parser.py`;
- attachment extraction and message attachment detection: `tests/test_attachment_extractor.py`;
- duplicate storage behavior: `tests/test_storage.py`;
- alert analysis behavior: `tests/test_analyzer.py`;
- homepage summary behavior: `tests/test_homepage.py`;
- LLM fallback validation behavior: `tests/test_llm.py`;
- test environment defaults: `tests/conftest.py`;
- sample config and XML fixtures: `tests/fixtures/`.
## Design References
`design/` contains mockups and design reference files. These are reference artifacts for UI work, not runtime code.
## Runtime Output Paths
- `data/` is used for SQLite database files.
- `logs/` is used for `logs/dmarc-sentinel.log`.
- `.gitignore` excludes SQLite databases, log files, bytecode, virtual environments, and pytest cache output.
+97
@@ -0,0 +1,97 @@
# Runtime and Development Workflow
This file summarizes how the implemented repository runs and how developers can validate changes.
## Configuration Loading
`app/config.py` loads settings in this order:
1. `DMARC_SENTINEL_CONFIG`, when set;
2. `config/config.yml`.
If neither path exists, startup fails with an instruction to create `config/config.yml` from `config/config.example.yml`.
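
The lookup order can be sketched as below. The function name is hypothetical, and the sketch assumes that a set `DMARC_SENTINEL_CONFIG` is used as-is without falling back to the default path; the real `load_settings()` may differ on that point.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_ENV_VAR = "DMARC_SENTINEL_CONFIG"
DEFAULT_CONFIG_PATH = Path("config/config.yml")


def resolve_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the config file path, failing early when the file is missing."""
    env = os.environ if env is None else env
    configured = env.get(CONFIG_ENV_VAR)
    path = Path(configured) if configured else DEFAULT_CONFIG_PATH
    if not path.is_file():
        raise FileNotFoundError(
            f"Runtime config not found at {path}; "
            "create config/config.yml from config/config.example.yml"
        )
    return path
```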
Secrets are read from environment variables named by the loaded settings. The settings page renders loaded values and environment-variable presence as read-only status; the application does not implement settings writes through the dashboard.
When `llm.provider` is `openai`, `validate_llm_environment()` requires the configured API key environment variable unless `DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true`.
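
A rough sketch of that check is shown here. The function signature, the example variable name `OPENAI_API_KEY`, the case-insensitive comparison with `true`, and the `RuntimeError` are assumptions; only the rule itself comes from the description above.

```python
import os
from typing import Mapping, Optional

ALLOW_NO_LLM_FLAG = "DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS"


def check_llm_environment(
    provider: str,
    api_key_env: str,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Raise when the OpenAI provider is configured without its API key."""
    env = os.environ if env is None else env
    if provider != "openai":
        return
    if env.get(ALLOW_NO_LLM_FLAG, "").lower() == "true":
        return  # test bypass: the key is not required
    if not env.get(api_key_env):
        raise RuntimeError(f"{api_key_env} must be set when llm.provider is 'openai'")
```

For example, `check_llm_environment("openai", "OPENAI_API_KEY", {})` raises, while the same call with the bypass flag set to `true` returns without error.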
## Local Test Workflow
Install dependencies in a Python environment, then run:
```bash
DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true python3 -m pytest
```
`tests/conftest.py` also sets test defaults for the LLM bypass, auth credentials, and homepage token.
The repository has no Ruff configuration, and `requirements.txt` does not list Ruff. If linting is added, document the exact command alongside the added tooling.
## Docker Runtime
The Docker image starts the FastAPI app with:
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
The compose service:
- loads environment variables from `.env`;
- mounts `./config` at `/app/config` as read-only;
- mounts `./data` at `/app/data`;
- mounts `./logs` at `/app/logs`;
- publishes `8000:8000`;
- attaches to the external `npm_proxy` network with static address `192.168.99.18`.
Database tables are initialized when `app.main` is imported, because the module calls `init_db()` at import time.
## CLI Backlog Processing
Backlog processing is implemented by `app/cli.py`:
```bash
python -m app.cli backlog --inbox <inbox-id>
```
Available arguments are:
- `--folder`;
- `--since YYYY-MM-DD`;
- `--before YYYY-MM-DD`;
- `--limit`;
- `--dry-run`;
- `--reprocess`;
- `--mark-seen`.
The CLI calls `process_inbox()` in backlog mode and prints the resulting `ProcessingSummary` counters.
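
The argument surface listed above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of `app/cli.py`: the argument types, the required `--inbox`, and the help text are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the documented backlog arguments."""
    parser = argparse.ArgumentParser(prog="python -m app.cli")
    subcommands = parser.add_subparsers(dest="command", required=True)

    backlog = subcommands.add_parser("backlog", help="process older messages")
    backlog.add_argument("--inbox", required=True, help="configured inbox id")
    backlog.add_argument("--folder")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD
    backlog.add_argument("--since", type=date.fromisoformat, metavar="YYYY-MM-DD")
    backlog.add_argument("--before", type=date.fromisoformat, metavar="YYYY-MM-DD")
    backlog.add_argument("--limit", type=int)
    backlog.add_argument("--dry-run", action="store_true")
    backlog.add_argument("--reprocess", action="store_true")
    backlog.add_argument("--mark-seen", action="store_true")
    return parser
```

Parsing `["backlog", "--inbox", "main", "--since", "2026-01-01", "--dry-run"]` with this parser yields `args.since` as a `date` object and `args.dry_run` set to `True`.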
## HTTP Processing Paths
`app/main.py` exposes admin endpoints that call the same processing pipeline:
- `POST /api/admin/process-now`: processes the configured inbox using request fields from `ProcessNowRequest`.
- `POST /api/admin/backlog`: runs backlog processing using request fields from `BacklogRequest`.
Both endpoints use dashboard Basic Auth dependencies.
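
A client call to one of these endpoints might be prepared as in the sketch below, using only the standard library. The base URL, credentials, and the `inbox_id` payload field are placeholders; the actual request fields are defined by `ProcessNowRequest` and `BacklogRequest` in `app/schemas.py`.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_admin_request(
    base_url: str,
    path: str,
    username: str,
    password: str,
    payload: Dict[str, Any],
) -> urllib.request.Request:
    """Prepare a JSON POST carrying an HTTP Basic Auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; "inbox_id" is a hypothetical field name
request = build_admin_request(
    "http://localhost:8000",
    "/api/admin/process-now",
    "admin",
    "secret",
    {"inbox_id": "main"},
)
```

Passing the prepared object to `urllib.request.urlopen(request)` would send it; the sketch stops before that step so it runs without a server.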
## Scheduled Work
On FastAPI startup, `start_scheduler(settings)` registers:
- polling for all enabled inboxes every `settings.app.poll_interval_minutes`;
- daily summaries at 07:00 in `settings.app.timezone`;
- weekly summaries on Monday at 07:30 in `settings.app.timezone`.
Polling uses `process_inbox()` with `mode="new"`. Summary jobs aggregate stored records, call the LLM wrapper, store `LLMReport` rows, and send digest email when SMTP settings allow it.
## Lightweight Validation Commands
For documentation-only changes, useful checks are:
```bash
find doc -maxdepth 1 -type f -print
python3 -m pytest
```
The filename rule for this docs folder is: every Markdown file under `doc/` must be lowercase, except `README.md`.