Add developer documentation

2026-05-15 13:59:30 -03:00
commit 0ce972a361
5 changed files with 283 additions and 0 deletions
@@ -0,0 +1,84 @@
# Code Map
This map describes the implemented Python modules and their main responsibilities.
## Application Entry Points
- `app/main.py` creates the FastAPI app, loads settings, configures logging, initializes the database, mounts static assets, starts the scheduler on startup, and defines HTML/API routes.
- `app/cli.py` exposes `python -m app.cli backlog` for backlog processing using the same `process_inbox()` pipeline as the admin API.
- `Dockerfile` runs `uvicorn app.main:app --host 0.0.0.0 --port 8000`.
## Configuration and Auth
- `app/config.py` defines Pydantic models for `app`, `security`, `llm`, `inboxes`, `known_senders`, and `alerts`.
- `load_settings()` reads `DMARC_SENTINEL_CONFIG` when set, otherwise `config/config.yml` if it exists, otherwise `config/config.example.yml`.
- `validate_llm_environment()` requires the configured OpenAI API key when `llm.provider` is `openai`, except when `DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true`.
- `app/auth.py` protects dashboard/admin routes with Basic Auth when enabled and protects homepage API routes with bearer token auth when required.
- `app/templates/settings.html` renders runtime settings and environment-variable presence as read-only information.
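
The lookup order described for `load_settings()` can be sketched as follows. This is a minimal illustration, not the code in `app/config.py`: the helper name `resolve_config_path` and its `env` and `base_dir` parameters are hypothetical, and treating an empty `DMARC_SENTINEL_CONFIG` as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_config_path(env=None, base_dir="."):
    """Return the config file path using the documented precedence.

    Hypothetical helper: the real load_settings() may differ in detail.
    """
    env = os.environ if env is None else env

    # 1. An explicit override via the environment wins.
    #    Assumption: an empty value is treated the same as unset.
    override = env.get("DMARC_SENTINEL_CONFIG")
    if override:
        return Path(override)

    # 2. Otherwise prefer the local config file if it exists.
    primary = Path(base_dir) / "config" / "config.yml"
    if primary.exists():
        return primary

    # 3. Fall back to the example config shipped with the repository.
    return Path(base_dir) / "config" / "config.example.yml"
```

Passing `env` and `base_dir` explicitly keeps the sketch testable without touching the real process environment or working directory.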
## Persistence
- `app/db.py` creates a SQLAlchemy engine from `settings.app.database_url`, provides session helpers, and creates tables with `Base.metadata.create_all()`.
- `app/models.py` defines tables for inbox status, mail messages, parsed reports, records, auth results, alerts, daily stats, and stored LLM reports.
- Duplicate report detection is based on the unique `Report.raw_xml_sha256` field.
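
As a rough illustration of hash-based duplicate detection, the sketch below computes a SHA-256 digest of the raw XML and refuses to store the same digest twice. `InMemoryReportStore` is a hypothetical stand-in for the unique `Report.raw_xml_sha256` column; the real code relies on the database constraint rather than a Python set.

```python
import hashlib


def raw_xml_sha256(raw_xml: bytes) -> str:
    """Hex digest used as the deduplication key for a report payload."""
    return hashlib.sha256(raw_xml).hexdigest()


class InMemoryReportStore:
    """Hypothetical stand-in for a table with a unique raw_xml_sha256 column."""

    def __init__(self):
        self._seen = set()

    def add_if_new(self, raw_xml: bytes) -> bool:
        """Return True if the report was stored, False if it is a duplicate."""
        digest = raw_xml_sha256(raw_xml)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Because the digest is taken over the raw bytes, two reports that differ only in whitespace count as distinct in this sketch.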
## Ingestion Pipeline
The main ingestion path is implemented in `app/message_processor.py`:
1. `process_inbox()` opens an IMAP connection using `app/imap_client.py`.
2. It searches UIDs in either new-message mode or backlog mode.
3. It stores or reuses a `MailMessage` row for each fetched message.
4. `is_candidate_message()` checks recipients, subject text, and attachment hints.
5. `app/attachment_extractor.py` extracts `.xml`, `.gz`, and `.zip` report payloads, rejects unsafe ZIP paths, skips nested archives inside ZIPs, and enforces the decompressed size limit.
6. `app/dmarc_parser.py` parses DMARC aggregate XML with `defusedxml`.
7. `app/known_senders.py` classifies each record using configured CIDR allowlists, DKIM domains, SPF domains, or aligned DKIM evidence.
8. `app/analyzer.py` creates or updates deterministic alerts.
9. Email notifications are sent through `app/alerts.py` when email delivery is configured and either a new alert is created or an existing alert's severity increases.
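
The ZIP handling in step 5 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `app/attachment_extractor.py`: the function names, the 10 MiB `MAX_DECOMPRESSED_BYTES` value, and the choice to raise `ValueError` on an unsafe path or an oversized payload are all hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # hypothetical limit
ARCHIVE_SUFFIXES = (".zip", ".gz")


def is_unsafe_member(name: str) -> bool:
    """Flag absolute paths and paths that climb out with '..'."""
    path = PurePosixPath(name.replace("\\", "/"))
    return path.is_absolute() or ".." in path.parts


def extract_xml_from_zip(data: bytes, limit: int = MAX_DECOMPRESSED_BYTES):
    """Return the XML payloads found in a ZIP archive held in memory."""
    payloads = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if is_unsafe_member(info.filename):
                raise ValueError(f"unsafe ZIP path: {info.filename!r}")
            lowered = info.filename.lower()
            if lowered.endswith(ARCHIVE_SUFFIXES):
                continue  # nested archives are skipped, not unpacked
            if not lowered.endswith(".xml"):
                continue
            # Read at most one byte past the remaining budget so an
            # oversized member is detected without reading all of it.
            with archive.open(info) as member:
                chunk = member.read(limit - total + 1)
            total += len(chunk)
            if total > limit:
                raise ValueError("decompressed size limit exceeded")
            payloads.append(chunk)
    return payloads
```

Checking the size while reading, rather than trusting the sizes declared in the archive header, is the design choice this sketch is meant to highlight.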
## Deterministic Analysis
`app/analyzer.py` produces alerts from stored report records. Implemented alert paths include:
- unknown source failed both SPF and DKIM alignment;
- known sender DMARC failure;
- quarantine or reject disposition;
- first observed failing source;
- first observed passing but unclassified source;
- high SPF or DKIM alignment failure rate;
- sudden unknown failure spike;
- new reporter;
- first observed policy for a domain;
- missing expected reporter based on recent report history.
Open alerts are deduplicated by a fingerprint composed of the domain, alert type, and key. An existing open alert with the same fingerprint is updated rather than duplicated.
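
A minimal sketch of fingerprint-based deduplication follows. It is an assumption-laden illustration rather than the logic in `app/analyzer.py`: hashing the three parts with SHA-256, lowercasing the domain, the `Alert` dataclass, and the `occurrences` counter are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


def alert_fingerprint(domain: str, alert_type: str, key: str) -> str:
    """Combine domain, alert type, and key into one stable identifier.

    Assumption: the parts are joined and hashed; the real format may differ.
    """
    raw = "|".join((domain.lower(), alert_type, key))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass
class Alert:
    fingerprint: str
    severity: str
    occurrences: int = 1  # hypothetical field, for illustration only


def upsert_open_alert(open_alerts: dict, domain, alert_type, key, severity):
    """Update the open alert with this fingerprint, or create a new one.

    Returns (alert, created).
    """
    fingerprint = alert_fingerprint(domain, alert_type, key)
    existing = open_alerts.get(fingerprint)
    if existing is None:
        alert = Alert(fingerprint=fingerprint, severity=severity)
        open_alerts[fingerprint] = alert
        return alert, True
    existing.severity = severity
    existing.occurrences += 1
    return existing, False
```

Here `open_alerts` is a plain dictionary keyed by fingerprint, standing in for a query against open alert rows.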
## LLM Usage
- `app/llm.py` wraps the OpenAI client for JSON-only alert explanations and daily/weekly summaries.
- Alert existence and severity are deterministic in `app/analyzer.py`; LLM output is stored as explanation fields on alerts or in `LLMReport` rows.
- Fallback outputs are returned when LLM calls fail or validation fails.
- Settings include config flags for raw XML and raw email; the implemented LLM payloads use derived alert or summary facts.
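
The fallback behaviour can be sketched as a wrapper that accepts only valid JSON with the expected keys and otherwise returns a deterministic explanation. Everything named here is hypothetical: `explain_alert`, `fallback_explanation`, the injected `call_llm` callable, and the `REQUIRED_KEYS` schema are illustrative and not the interface of `app/llm.py`.

```python
import json

REQUIRED_KEYS = {"summary", "recommended_action"}  # hypothetical schema


def fallback_explanation(facts: dict) -> dict:
    """Deterministic explanation built only from derived alert facts."""
    alert_type = facts.get("alert_type", "unknown")
    domain = facts.get("domain", "unknown domain")
    return {
        "summary": f"Alert {alert_type} for {domain}.",
        "recommended_action": "Review the alert details manually.",
        "fallback": True,
    }


def explain_alert(facts: dict, call_llm) -> dict:
    """Ask the model for a JSON explanation; fall back on any failure.

    call_llm is any callable taking the facts and returning a JSON string.
    """
    try:
        parsed = json.loads(call_llm(facts))
    except Exception:
        # Covers transport errors raised by the client and malformed JSON.
        return fallback_explanation(facts)
    if not isinstance(parsed, dict) or not REQUIRED_KEYS <= parsed.keys():
        return fallback_explanation(facts)  # validation failed
    return {**parsed, "fallback": False}
```

Injecting `call_llm` keeps the sketch independent of any particular client library and makes both failure paths easy to exercise.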
## Scheduler
`app/scheduler.py` starts an APScheduler `BackgroundScheduler` with these jobs:
- `poll`: interval job using `settings.app.poll_interval_minutes`;
- `daily`: cron job at 07:00 in `settings.app.timezone`;
- `weekly`: cron job on Monday at 07:30 in `settings.app.timezone`.
Daily and weekly summary jobs aggregate stored data, call `LLMClient`, store `LLMReport` rows, and attempt digest email delivery through `app/alerts.py`.
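
The job layout above could be registered roughly as shown below, assuming APScheduler 3.x. The helper names `build_job_specs` and `start_scheduler` and the `jobs` mapping of callables are hypothetical and not taken from `app/scheduler.py`; the trigger arguments mirror the schedule listed above.

```python
def build_job_specs(poll_interval_minutes: int) -> list:
    """Describe the three jobs as plain trigger arguments."""
    return [
        {"id": "poll", "trigger": "interval", "minutes": poll_interval_minutes},
        {"id": "daily", "trigger": "cron", "hour": 7, "minute": 0},
        {"id": "weekly", "trigger": "cron", "day_of_week": "mon",
         "hour": 7, "minute": 30},
    ]


def start_scheduler(jobs: dict, poll_interval_minutes: int, timezone: str):
    """Register the jobs on a BackgroundScheduler and start it.

    jobs maps each job id ("poll", "daily", "weekly") to a callable.
    APScheduler is imported lazily so the specs can be inspected without it.
    """
    from apscheduler.schedulers.background import BackgroundScheduler

    # Cron triggers inherit the scheduler's timezone by default.
    scheduler = BackgroundScheduler(timezone=timezone)
    for spec in build_job_specs(poll_interval_minutes):
        trigger_args = dict(spec)
        job_id = trigger_args.pop("id")
        trigger = trigger_args.pop("trigger")
        scheduler.add_job(jobs[job_id], trigger, id=job_id, **trigger_args)
    scheduler.start()
    return scheduler
```

Keeping the schedule as data separates what runs when from the scheduler wiring, so the timing can be checked without starting background threads.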
## HTTP and Template Surfaces
HTML routes in `app/main.py` render templates from `app/templates/`:
- `/`: overview dashboard;
- `/domains/{domain}`: domain detail;
- `/reports/{report_id}`: report detail;
- `/alerts`: alert list and actions;
- `/inboxes`: inbox status and manual processing controls;
- `/settings`: read-only runtime configuration.
JSON routes include `/health`, homepage widget endpoints, domain/report/alert APIs, and admin processing endpoints. Authentication behavior is defined by route dependencies in `app/main.py`.