Provero¶
provero (Esperanto): to test, to put to proof.
A vendor-neutral, declarative data quality engine.
Why Provero?¶
- 16 check types covering completeness, uniqueness, validity, freshness, volume, anomaly detection, email validation, referential integrity, and custom SQL
- 3 connectors: DuckDB (files + in-memory), PostgreSQL, Pandas/Polars DataFrame
- SQL batch optimizer compiles N checks into 1 query
- Data contracts with schema validation, SLA enforcement, and contract diff
- Anomaly detection using Z-Score, MAD, IQR (stdlib only, no scipy needed)
- HTML reports via provero run --report html
- Webhook alerts for Slack, PagerDuty, or any HTTP endpoint
- Result store with SQLite time-series metrics and provero history
- Data profiling with provero profile --suggest to auto-generate checks
- Configurable severity: info, warning, critical, blocker per check
- JSON Schema validation for provero.yaml
- Airflow provider: ProveroCheckOperator + @provero_check decorator
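To make the anomaly-detection bullet concrete, here is a minimal sketch of the three named methods (Z-Score, MAD, IQR) using only Python's standard library. It is not Provero's implementation: the function names, the thresholds (3.0, 3.5, 1.5), and the sample row counts are assumptions chosen for illustration.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing can be anomalous
    return [v for v in values if abs(v - mean) / stdev > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median absolute deviation) is large."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []
    # 0.6745 rescales the MAD so it is comparable to a standard deviation on normal data.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical daily row counts: 19 ordinary days and one spike.
daily_row_counts = [
    100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
    100, 102, 98, 101, 99, 103, 97, 100, 100, 500,
]

for detect in (zscore_outliers, mad_outliers, iqr_outliers):
    print(detect.__name__, detect(daily_row_counts))
```

The Z-Score method is the most sensitive to sample size: with n values, no single point can sit more than sqrt(n - 1) population standard deviations from the mean, so short histories may never cross a threshold of 3. MAD and IQR are based on medians and quartiles and do not have that limitation.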
Quick Install¶
Minimal Example¶
source:
  type: duckdb
  table: orders

checks:
  - not_null: [order_id, customer_id, amount]
  - unique: order_id
  - accepted_values:
      column: status
      values: [pending, shipped, delivered, cancelled]
  - range:
      column: amount
      min: 0
      max: 100000
  - row_count:
      min: 1

Running provero run against this configuration prints a result table like the following:

┌─────────────────┬──────────────┬──────────┬──────────────────┬──────────────────┐
│ Check           │ Column       │ Status   │ Observed         │ Expected         │
├─────────────────┼──────────────┼──────────┼──────────────────┼──────────────────┤
│ not_null        │ order_id     │ ✓ PASS   │ 0 nulls          │ 0 nulls          │
│ not_null        │ customer_id  │ ✓ PASS   │ 0 nulls          │ 0 nulls          │
│ not_null        │ amount       │ ✓ PASS   │ 0 nulls          │ 0 nulls          │
│ unique          │ order_id     │ ✓ PASS   │ 0 duplicates     │ 0 duplicates     │
│ accepted_values │ status       │ ✓ PASS   │ 0 invalid values │ only [pending..] │
│ range           │ amount       │ ✓ PASS   │ min=45, max=999  │ min=0, max=100k  │
│ row_count       │ -            │ ✓ PASS   │ 5                │ >= 1             │
└─────────────────┴──────────────┴──────────┴──────────────────┴──────────────────┘
Score: 100/100 | 7 passed, 0 failed | 22ms
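
The "N checks into 1 query" idea from the feature list can be illustrated with the checks in this example. The sketch below is not the SQL Provero generates. It only shows the general technique of turning each check into one aggregate expression so that a single table scan answers all of them. It uses Python's built-in sqlite3 in place of DuckDB to stay dependency-free, and the five sample rows and all column aliases are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, status TEXT)"
)
# Hypothetical sample data, not taken from the Provero docs.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 45.0, "pending"),
        (2, 11, 120.0, "shipped"),
        (3, 10, 999.0, "delivered"),
        (4, 12, 300.0, "cancelled"),
        (5, 13, 75.5, "shipped"),
    ],
)

# One aggregate per check: null counts, duplicate count, invalid-value count,
# min/max for the range check, and the total row count.
BATCHED_SQL = """
SELECT
    COUNT(*)                                             AS row_count,
    SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END)    AS order_id_nulls,
    SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS customer_id_nulls,
    SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END)      AS amount_nulls,
    COUNT(order_id) - COUNT(DISTINCT order_id)           AS order_id_duplicates,
    SUM(CASE WHEN status NOT IN ('pending', 'shipped', 'delivered', 'cancelled')
             THEN 1 ELSE 0 END)                          AS invalid_status,
    MIN(amount)                                          AS amount_min,
    MAX(amount)                                          AS amount_max
FROM orders
"""

cursor = conn.execute(BATCHED_SQL)
columns = [description[0] for description in cursor.description]
metrics = dict(zip(columns, cursor.fetchone()))

# Evaluate every check against the single row of metrics.
passed = {
    "not_null": all(
        metrics[f"{col}_nulls"] == 0 for col in ("order_id", "customer_id", "amount")
    ),
    "unique": metrics["order_id_duplicates"] == 0,
    "accepted_values": metrics["invalid_status"] == 0,
    "range": metrics["amount_min"] >= 0 and metrics["amount_max"] <= 100000,
    "row_count": metrics["row_count"] >= 1,
}

for check, ok in passed.items():
    print(f"{check}: {'PASS' if ok else 'FAIL'}")
```

In this sketch a NULL status is not counted as an invalid value, because NOT IN evaluates to NULL for it and the CASE falls through to 0. Whether Provero treats NULLs the same way is not stated here.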
Next Steps¶
- Getting Started for a full walkthrough
- Configuration for all config options
- Check Types for the complete check reference
- Connectors for database setup
- CLI Reference for all commands and flags