How Provero Works

This section is a guided tour through what actually happens when you type `provero run`. It walks the pipeline end to end, from the YAML file on disk to the final `CheckResult` list, stopping at every component that matters.

If you want to know "where is the parser?", "how does the source string turn into a live connection?", or "what does the optimizer actually batch?", you are in the right place.

For the user-facing configuration reference, see Configuration. For the target architecture (including features on the roadmap), see Architecture.

Pipeline at a Glance

provero.yaml                                    Return
    │                                             ▲
    │ 1. compile_file()                           │ 7. SuiteResult
    ▼                                             │
ProveroConfig                                     │
    │                                     ┌───────┴────────┐
    │ 2. create_connector()               │ compute_status │
    ▼                                     │ quality_score  │
Connector  ─────── connect() ──► Connection
    │                                     ▲
    │ 3. _expand_multi_column_checks()    │
    ▼                                     │
Expanded CheckConfig list                 │
    │                                     │
    │ 4. plan_batch()                     │
    ▼                                     │
BatchPlan (metrics + non_batchable)       │
    │                                     │
    │ 5. execute_batch()                  │ 6. _run_single_check()
    ▼                                     │    (sequential or parallel)
Batched CheckResults ─────────────────────┘
                                  ▲
                                  │
                      Non-batchable CheckResults

Every stage maps to one file in provero-core/src/provero/:

Stage	Component	Entry point
1. Parse YAML	Parser	`core/compiler.py::compile_file()`
2. Resolve source	Connectors	`connectors/factory.py::create_connector()`
3, 4, 5. Plan and batch SQL	Optimizer	`core/optimizer.py::plan_batch()`
6. Run remainder, orchestrate	Engine	`core/engine.py::run_suite()`
Register checks	Check Registry	`checks/registry.py::register_check()`
7. Aggregate	Results	`core/results.py::SuiteResult.compute_status()`
Defense in depth	SQL Safety	`core/sql.py::quote_identifier()`
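To make stages 3 to 5 more concrete, here is a minimal sketch of the batching idea: several simple checks against one table are folded into a single `SELECT`, so the database is hit once instead of once per check. It is not Provero's actual optimizer. The names `quote_ident` and `build_batch_query`, the check kinds, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier for SQLite by doubling embedded double quotes.

    Hypothetical helper, standing in for whatever the real SQL safety layer does.
    """
    return '"' + name.replace('"', '""') + '"'


def build_batch_query(table: str, checks: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Fold several (kind, column) checks into one SELECT statement.

    Returns the SQL text and the result aliases, in the same order as `checks`.
    Each expression counts violations, so 0 means the check passed.
    """
    select_list = ["COUNT(*) AS row_count"]
    aliases = []
    for i, (kind, column) in enumerate(checks):
        col = quote_ident(column)
        alias = f"m{i}"
        if kind == "not_null":
            select_list.append(f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) AS {alias}")
        elif kind == "unique":
            # COUNT(col) ignores NULLs, so this counts duplicated non-NULL values.
            select_list.append(f"COUNT({col}) - COUNT(DISTINCT {col}) AS {alias}")
        else:
            # Anything else would go down a separate, non-batched path.
            raise ValueError(f"not batchable: {kind}")
        aliases.append(alias)
    sql = f"SELECT {', '.join(select_list)} FROM {quote_ident(table)}"
    return sql, aliases


# Toy data: one NULL email and one duplicated id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a@x.io"), (2, None), (2, "b@x.io")],
)

checks = [("not_null", "email"), ("unique", "id")]
sql, aliases = build_batch_query("orders", checks)
row = conn.execute(sql).fetchone()  # one round trip for both checks
violations = dict(zip(aliases, row[1:]))
```

Both checks are answered by a single query. A check kind the builder does not recognize raises, which mirrors the split between batched metrics and the non-batchable remainder shown in the diagram.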

Read in Order

If this is your first time reading the internals, follow the pages in this order:

1. Parser: how provero.yaml becomes a typed ProveroConfig.
2. Connectors: how a SourceConfig becomes a live database connection, with plugin support via entry_points.
3. Optimizer: how multiple checks are compiled into a single SQL query to minimize round trips.
4. Engine: how a full suite is executed, including the non-batchable runner path and parallel mode.
5. Check Registry: how check runners are registered and discovered (built-ins + third-party plugins).
6. Results: how individual CheckResult objects are aggregated into a SuiteResult with a severity-weighted quality score.
7. SQL Safety: the defenses that keep user-supplied identifiers and values from becoming a SQL injection vector.

Each page is self-contained: it explains what the component does, where it lives, its public API, and the design decisions behind it.
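As an illustration of the "severity-weighted quality score" mentioned for the Results page, the sketch below shows one plausible way such a score could be computed. The weights, the severity names, the `Result` class, and the formula are assumptions. This page does not state the actual ones.

```python
from dataclasses import dataclass

# Hypothetical weights: a failed critical check costs more than a failed warning.
SEVERITY_WEIGHTS = {"info": 1, "warning": 2, "critical": 4}


@dataclass(frozen=True)
class Result:
    """Stand-in for a single check outcome (not the real CheckResult)."""
    name: str
    passed: bool
    severity: str


def quality_score(results: list[Result]) -> float:
    """Return the weighted share of passing checks on a 0 to 100 scale."""
    total = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    if total == 0:
        return 100.0  # nothing was checked, so nothing failed
    earned = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    return 100.0 * earned / total


results = [
    Result("not_null(email)", True, "critical"),
    Result("unique(id)", False, "warning"),
    Result("row_count", True, "info"),
]
score = quality_score(results)  # (4 + 1) / (4 + 2 + 1) of 100
```

With this weighting, the same failure lowers the score more when it is marked critical than when it is marked as a warning, which is the point of weighting by severity rather than counting passes.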

The `Engine` Class Shortcut

If you just want to use Provero programmatically, the Engine class wraps every stage:

from provero import Engine

engine = Engine("provero.yaml")
results = engine.run(optimize=True, parallel=False)

See the Engine page for the full API, including Engine.from_dict() for in-memory configs and run_suites() for richer output.
