Code Map

This map describes the implemented Python modules and their main responsibilities.

Application Entry Points

  • app/main.py creates the FastAPI app, loads settings, configures logging, initializes the database, mounts static assets, starts the scheduler on startup, and defines HTML/API routes.
  • app/cli.py exposes python -m app.cli backlog for backlog processing using the same process_inbox() pipeline as the admin API.
  • Dockerfile runs uvicorn app.main:app --host 0.0.0.0 --port 8000.

Configuration and Auth

  • app/config.py defines Pydantic models for app, security, llm, inboxes, known_senders, and alerts.
  • load_settings() reads DMARC_SENTINEL_CONFIG when set, otherwise config/config.yml; missing runtime config is a startup error.
  • validate_llm_environment() requires the configured OpenAI API key when llm.provider is openai, except when DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS=true.
  • app/auth.py protects dashboard/admin routes with Basic Auth when enabled and protects homepage API routes with bearer token auth when required.
  • app/templates/settings.html renders runtime settings and environment-variable presence as read-only information.
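
The configuration rules above can be illustrated with a minimal sketch. It is not the code in app/config.py: the helper names `resolve_config_path`, `check_llm_environment`, and `ConfigError` are hypothetical, and `OPENAI_API_KEY` is only an assumed default for the configured key variable.

```python
import os
from pathlib import Path

# Default location named in the code map.
DEFAULT_CONFIG_PATH = Path("config/config.yml")


class ConfigError(RuntimeError):
    """Hypothetical error type for missing runtime configuration."""


def resolve_config_path(environ=None) -> Path:
    """Prefer DMARC_SENTINEL_CONFIG when set, else fall back to the default path."""
    environ = os.environ if environ is None else environ
    override = environ.get("DMARC_SENTINEL_CONFIG")
    path = Path(override) if override else DEFAULT_CONFIG_PATH
    if not path.is_file():
        # Missing runtime config is treated as a startup error.
        raise ConfigError(f"config file not found: {path}")
    return path


def check_llm_environment(provider, environ=None, key_var="OPENAI_API_KEY"):
    """Require the API key for the openai provider unless the test override is set.

    key_var is an assumption; the real variable name comes from configuration.
    """
    environ = os.environ if environ is None else environ
    if environ.get("DMARC_SENTINEL_ALLOW_NO_LLM_FOR_TESTS", "").lower() == "true":
        return
    if provider == "openai" and not environ.get(key_var):
        raise ConfigError(f"{key_var} must be set when llm.provider is openai")
```

Passing `environ` explicitly keeps both checks easy to exercise in tests without mutating the process environment.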

Persistence

  • app/db.py creates a SQLAlchemy engine from settings.app.database_url, provides session helpers, and creates tables with Base.metadata.create_all().
  • app/models.py defines tables for inbox status, mail messages, parsed reports, records, auth results, alerts, daily stats, and stored LLM reports.
  • Duplicate report detection is based on the unique Report.raw_xml_sha256 field.
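
As an illustration of hash-based duplicate detection, the sketch below uses the standard-library `sqlite3` module in place of the project's SQLAlchemy models. The table layout and the `store_report` helper are hypothetical, and hashing the raw bytes without any normalization is an assumption.

```python
import hashlib
import sqlite3


def raw_xml_sha256(raw_xml: bytes) -> str:
    """Hex SHA-256 digest of the report payload (assumed: hashed as-is)."""
    return hashlib.sha256(raw_xml).hexdigest()


def store_report(conn: sqlite3.Connection, raw_xml: bytes) -> bool:
    """Insert a report row; return False when the same XML was already stored."""
    digest = raw_xml_sha256(raw_xml)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO report (raw_xml_sha256) VALUES (?)", (digest,)
            )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate digest.
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report ("
    "id INTEGER PRIMARY KEY, raw_xml_sha256 TEXT NOT NULL UNIQUE)"
)
```

Letting the database enforce uniqueness means two workers that process the same report at the same time cannot both insert it.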

Ingestion Pipeline

The main ingestion path is implemented in app/message_processor.py:

  1. process_inbox() opens an IMAP connection using app/imap_client.py.
  2. It searches UIDs in either new-message mode or backlog mode.
  3. It stores or reuses a MailMessage row for each fetched message.
  4. is_candidate_message() checks recipients, subject text, and attachment hints.
  5. app/attachment_extractor.py extracts .xml, .gz, and .zip report payloads, rejects unsafe ZIP paths, skips nested archives inside ZIPs, and enforces the decompressed size limit.
  6. app/dmarc_parser.py parses DMARC aggregate XML with defusedxml.
  7. app/known_senders.py classifies each record using configured CIDR allowlists, DKIM domains, SPF domains, or aligned DKIM evidence.
  8. app/analyzer.py creates or updates deterministic alerts.
  9. Email notifications are sent through app/alerts.py when configured and when a new alert is created or severity increases.
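
Step 5 is the most security-sensitive part of the pipeline, so a minimal sketch of the ZIP checks follows. It is not the code in app/attachment_extractor.py: the function name, the 10 MiB limit, and the choice to raise on an unsafe path (rather than skip the entry) are all assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # assumed limit, not the project's value
NESTED_ARCHIVE_SUFFIXES = (".zip", ".gz")


class UnsafeArchiveError(ValueError):
    """Hypothetical error for archives that fail a safety check."""


def extract_xml_from_zip(data: bytes, max_bytes: int = MAX_DECOMPRESSED_BYTES) -> list[bytes]:
    """Return the XML payloads in a ZIP, enforcing path and size checks."""
    payloads = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename.replace("\\", "/")
            path = PurePosixPath(name)
            # Reject absolute paths and parent-directory traversal.
            if path.is_absolute() or ".." in path.parts:
                raise UnsafeArchiveError(f"unsafe path in archive: {name!r}")
            lower = name.lower()
            if lower.endswith(NESTED_ARCHIVE_SUFFIXES):
                continue  # nested archives are skipped, not unpacked
            if not lower.endswith(".xml"):
                continue
            with archive.open(info) as member:
                # Read one byte past the remaining budget so an oversized
                # member is detected without trusting the header's size field.
                chunk = member.read(max_bytes - total + 1)
            total += len(chunk)
            if total > max_bytes:
                raise UnsafeArchiveError("decompressed size limit exceeded")
            payloads.append(chunk)
    return payloads
```

Bounding the actual read, instead of checking the declared size in the ZIP header, guards against archives whose headers understate the decompressed size.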

Deterministic Analysis

app/analyzer.py produces alerts from stored report records. Implemented alert paths include:

  • unknown source failed both SPF and DKIM alignment;
  • known sender DMARC failure;
  • quarantine or reject disposition;
  • first observed failing source;
  • first observed passing but unclassified source;
  • high SPF or DKIM alignment failure rate;
  • sudden unknown failure spike;
  • new reporter;
  • first observed policy for a domain;
  • missing expected reporter based on recent report history.

Open alerts are deduplicated by a fingerprint built from the domain, alert type, and key. When a matching open alert already exists, it is updated rather than duplicated.
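
The deduplication rule can be sketched as an upsert keyed by the fingerprint. Everything here is illustrative rather than taken from app/analyzer.py: the `Alert` dataclass, the severity names and ordering, the use of SHA-256, and the lower-casing of the domain are assumptions.

```python
import hashlib
from dataclasses import dataclass

# Assumed severity scale; the project's actual levels may differ.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Alert:
    domain: str
    alert_type: str
    key: str
    severity: str
    occurrences: int = 1


def alert_fingerprint(domain: str, alert_type: str, key: str) -> str:
    """Stable fingerprint from domain, alert type, and key."""
    raw = "\x1f".join((domain.lower(), alert_type, key))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def upsert_alert(open_alerts: dict, domain: str, alert_type: str,
                 key: str, severity: str) -> tuple[Alert, bool]:
    """Create or update an open alert; return (alert, should_notify)."""
    fingerprint = alert_fingerprint(domain, alert_type, key)
    existing = open_alerts.get(fingerprint)
    if existing is None:
        alert = Alert(domain, alert_type, key, severity)
        open_alerts[fingerprint] = alert
        return alert, True  # new alert: notify
    existing.occurrences += 1
    escalated = SEVERITY_ORDER[severity] > SEVERITY_ORDER[existing.severity]
    if escalated:
        existing.severity = severity
    return existing, escalated  # notify only when severity increases
```

The second return value mirrors the notification rule in the ingestion pipeline: an email is sent only for a new alert or a severity increase.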

LLM Usage

  • app/llm.py wraps the OpenAI client for JSON-only alert explanations and daily/weekly summaries.
  • Alert existence and severity are deterministic in app/analyzer.py; LLM output is stored as explanation fields on alerts or in LLMReport rows.
  • Fallback outputs are returned when LLM calls fail or validation fails.
  • Settings include config flags for raw XML and raw email, but the implemented LLM payloads are built from derived alert or summary facts.
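
The fallback behavior can be sketched without the OpenAI client by treating the model call as an injected function. The required keys, the fallback text, and the `explain_alert` name are hypothetical; app/llm.py may validate a different schema.

```python
import json
from typing import Callable

# Hypothetical schema for an alert explanation.
REQUIRED_KEYS = ("summary", "recommended_action")

FALLBACK = {
    "summary": "Explanation unavailable; see the deterministic alert details.",
    "recommended_action": "Review the alert manually.",
}


def explain_alert(facts: dict, call_llm: Callable[[dict], str]) -> dict:
    """Return a validated explanation, or the fallback on any failure."""
    try:
        parsed = json.loads(call_llm(facts))
    except Exception:
        # Broad on purpose in this sketch: network errors, timeouts, and
        # malformed JSON all lead to the same fallback.
        return dict(FALLBACK)
    if not isinstance(parsed, dict):
        return dict(FALLBACK)
    if any(not isinstance(parsed.get(k), str) for k in REQUIRED_KEYS):
        return dict(FALLBACK)
    # Keep only the validated fields.
    return {k: parsed[k] for k in REQUIRED_KEYS}
```

Because the alert itself is created deterministically, a fallback here degrades only the explanation text and never whether an alert exists.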

Scheduler

app/scheduler.py starts an APScheduler BackgroundScheduler with these jobs:

  • poll: interval job using settings.app.poll_interval_minutes;
  • daily: cron job at 07:00 in settings.app.timezone;
  • weekly: cron job on Monday at 07:30 in settings.app.timezone.

Daily and weekly summary jobs aggregate stored data, call LLMClient, store LLMReport rows, and attempt digest email delivery through app/alerts.py.
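
The job schedule described above could be expressed roughly as follows. This is a sketch assuming APScheduler 3.x, not the contents of app/scheduler.py; `job_specs` and `build_scheduler` are hypothetical helpers, and the job callables are supplied by the caller.

```python
from typing import Callable


def job_specs(poll_interval_minutes: int) -> list[dict]:
    """Trigger arguments for the three jobs listed in the code map."""
    return [
        {"id": "poll", "trigger": "interval", "minutes": poll_interval_minutes},
        {"id": "daily", "trigger": "cron", "hour": 7, "minute": 0},
        {"id": "weekly", "trigger": "cron", "day_of_week": "mon", "hour": 7, "minute": 30},
    ]


def build_scheduler(jobs: dict[str, Callable[[], None]],
                    poll_interval_minutes: int, timezone: str):
    """Create a BackgroundScheduler with the jobs registered (not yet started)."""
    # Imported lazily so job_specs() can be inspected without APScheduler installed.
    from apscheduler.schedulers.background import BackgroundScheduler

    scheduler = BackgroundScheduler(timezone=timezone)
    for spec in job_specs(poll_interval_minutes):
        trigger_args = dict(spec)
        job_id = trigger_args.pop("id")
        trigger = trigger_args.pop("trigger")
        # The remaining keys are passed through as trigger arguments.
        scheduler.add_job(jobs[job_id], trigger, id=job_id, **trigger_args)
    return scheduler
```

The caller would then invoke `scheduler.start()` during application startup, and cron times are interpreted in the timezone passed to the scheduler.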

HTTP and Template Surfaces

HTML routes in app/main.py render templates from app/templates/:

  • /: overview dashboard;
  • /domains/{domain}: domain detail;
  • /reports/{report_id}: report detail;
  • /alerts: alert list and actions;
  • /inboxes: inbox status and manual processing controls;
  • /settings: read-only runtime configuration.

JSON routes include /health, homepage widget endpoints, domain/report/alert APIs, and admin processing endpoints. Authentication behavior is defined by route dependencies in app/main.py.
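
To illustrate the kind of check a bearer-token route dependency might perform, here is a framework-independent sketch. The function name and header handling are assumptions; the actual dependencies live in app/auth.py and app/main.py and may behave differently.

```python
import hmac
from typing import Optional


def bearer_token_ok(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True when the header carries the expected bearer token."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

A FastAPI dependency would typically wrap a check like this and raise a 401 response when it returns False.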