Check Types¶
Provero includes 16 built-in check types organized by category. Each check can be written in shorthand or expanded YAML form.
Completeness¶
not_null¶
Verifies that column(s) contain no null values.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column(s) | string or list | Yes | Column name or list of column names |
Default severity: critical
completeness¶
Checks that the share of non-null values in a column meets a minimum ratio.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| column | string | Yes | | Column to check |
| min | number | No | 0.95 | Minimum completeness ratio (0 to 1) |
Default severity: critical
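The ratio behind this check can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not Provero's implementation; the function names and the choice to treat an empty column as fully complete are assumptions.

```python
from typing import Optional, Sequence


def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Return the fraction of entries that are not None."""
    if not values:
        # Assumption: an empty column counts as fully complete.
        return 1.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def passes_completeness(values: Sequence[Optional[object]], min_ratio: float = 0.95) -> bool:
    """Pass when the non-null ratio is at least min_ratio (0 to 1)."""
    return completeness_ratio(values) >= min_ratio
```

With the default `min` of 0.95, a column of 20 rows tolerates at most one null.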
Uniqueness¶
unique¶
Verifies that a column has no duplicate values.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column name |
Default severity: critical
unique_combination¶
Checks that a combination of columns is unique (composite key).
| Parameter | Type | Required | Description |
|---|---|---|---|
| columns | list | Yes | List of column names (minimum 2) |
Default severity: critical
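As a rough illustration of what a composite-key check looks for, the sketch below counts repeated value tuples over in-memory rows. The helper name `duplicate_combinations` is hypothetical; the real check presumably runs as SQL against the source.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple


def duplicate_combinations(
    rows: Sequence[Mapping[str, Hashable]], columns: List[str]
) -> Dict[Tuple[Hashable, ...], int]:
    """Return each column-value combination that appears more than once, with its count."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


rows = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 1, "line_no": 1},  # repeats the first combination
]
duplicates = duplicate_combinations(rows, ["order_id", "line_no"])
```

An empty result means the combination is unique, so the check would pass.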
Validity¶
accepted_values¶
Ensures column values are within an allowed set.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column to check |
| values | list | Yes | Allowed values (strings, numbers, or booleans) |
Default severity: critical
range¶
Verifies that numeric values fall within min/max bounds.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column to check |
| min | number | No | Minimum allowed value |
| max | number | No | Maximum allowed value |
At least one of min or max should be specified.
Default severity: critical
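The bound logic can be sketched as follows. Two points here are assumptions rather than documented behavior: the bounds are treated as inclusive, and omitting both bounds raises an error.

```python
from typing import Optional


def in_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Return True when value lies within the given bounds (inclusive, by assumption)."""
    if min_value is None and max_value is None:
        # Assumption: a range check with no bounds is a configuration error.
        raise ValueError("specify at least one of min_value or max_value")
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True
```

Leaving one bound unset makes the check one-sided, for example `in_range(amount, min_value=0)` for non-negative amounts.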
regex¶
Validates that column values match a regular expression pattern.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column to check |
| pattern | string | Yes | Regular expression pattern |
Works across databases: uses regexp_matches() on DuckDB, ~ on PostgreSQL, and REGEXP on MySQL/SQLite.
Default severity: warning
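The per-database operators listed above could be selected with a small dispatch function like the one below. This is a sketch, not Provero's code: the function names, the dialect strings, and the placeholder handling are all assumptions, and identifiers are interpolated without quoting.

```python
def regex_predicate(dialect: str, column: str, placeholder: str = "?") -> str:
    """Build a SQL predicate that is true when `column` matches a bound pattern."""
    if dialect == "duckdb":
        return f"regexp_matches({column}, {placeholder})"
    if dialect == "postgresql":
        return f"{column} ~ {placeholder}"
    if dialect in ("mysql", "sqlite"):
        return f"{column} REGEXP {placeholder}"
    raise ValueError(f"unsupported dialect: {dialect}")


def failing_rows_query(dialect: str, table: str, column: str, placeholder: str = "?") -> str:
    """Build a query counting rows whose value does not match the pattern."""
    predicate = regex_predicate(dialect, column, placeholder)
    return f"SELECT COUNT(*) FROM {table} WHERE NOT ({predicate})"
```

The pattern itself is passed as a bound parameter, so the placeholder style should match the database driver in use.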
email_validation¶
Validates that column values are valid email addresses using a standard email regex pattern.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column to check |
Works across databases: uses regexp_matches() on DuckDB, ~ on PostgreSQL, and REGEXP on MySQL/SQLite.
Default severity: warning
type¶
Checks that a column's data type matches the expected type.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | Column to check |
| expected | string | Yes | Expected type: integer, float, string, boolean, date, timestamp |
Type names are normalized across databases. For example, int, int4, bigint, and smallint all match integer.
Default severity: critical
Freshness¶
freshness¶
Checks that the most recent row is within a time threshold.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| column | string | Yes | | Timestamp column |
| max_age | string | Yes | 24h | Maximum age. Format: 30m, 24h, 7d |
Default severity: critical
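The `max_age` format can be read as a number followed by a unit. The sketch below parses the three units shown above (`m`, `h`, `d`) and compares the newest timestamp against the threshold. Whether Provero accepts other units or compound values is not stated here, so the parser rejects anything else; `parse_age` and `is_fresh` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_age(text: str) -> timedelta:
    """Parse strings such as '30m', '24h', or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_fresh(latest: datetime, max_age: str = "24h", now: Optional[datetime] = None) -> bool:
    """Return True when the most recent timestamp is no older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest <= parse_age(max_age)
```

Passing `now` explicitly keeps the comparison deterministic in tests; `latest` and `now` must both be timezone-aware or both naive.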
latency¶
Measures the time difference between two timestamp columns. Useful for detecting pipeline delays.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source_column | string | Yes | | Start timestamp (e.g., event time) |
| target_column | string | Yes | | End timestamp (e.g., load time) |
| max_latency | string | No | 1h | Maximum acceptable latency |
Default severity: warning
Volume¶
row_count¶
Validates that the table row count is within expected bounds.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| min | integer | No | 0 | Minimum row count |
| max | integer | No | | Maximum row count |
Default severity: critical
row_count_change¶
Compares the current row count against the previous run. Requires the result store, which is enabled by default.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| max_decrease | string | No | 50% | Maximum allowed decrease percentage |
| max_increase | string | No | 500% | Maximum allowed increase percentage |
Default severity: warning
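The comparison can be sketched as a relative change measured against the previous count. The function names and the handling of a zero baseline are assumptions; the defaults mirror the table above.

```python
def parse_percent(text: str) -> float:
    """Convert a string such as '50%' into a fraction (0.5)."""
    return float(text.strip().rstrip("%")) / 100


def row_count_change_ok(
    previous: int,
    current: int,
    max_decrease: str = "50%",
    max_increase: str = "500%",
) -> bool:
    """Return True when the relative change stays within the allowed band."""
    if previous == 0:
        # Assumption: with an empty baseline there is nothing to compare against.
        return True
    change = (current - previous) / previous
    return -parse_percent(max_decrease) <= change <= parse_percent(max_increase)
```

With the defaults, a table that had 100 rows on the previous run passes with anywhere from 50 to 600 rows.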
Anomaly Detection¶
anomaly¶
Statistical anomaly detection on historical metrics. Compares the current value against stored history using one of three methods.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| metric | string | Yes | | Metric to check: row_count, null_count, null_rate, distinct_count, mean, min, max |
| column | string | No | | Column for column-level metrics |
| method | string | No | mad | Detection method: zscore, mad, iqr |
| sensitivity | string | No | medium | Sensitivity level |
| threshold | float | No | | Direct threshold override |
Default severity: warning
Detection methods:
| Method | Description | Best for |
|---|---|---|
| zscore | Standard Z-Score | Normally distributed metrics |
| mad | Median Absolute Deviation | Robust to outliers |
| iqr | Interquartile Range | Skewed distributions |
All methods are implemented using the Python standard library only (no scipy dependency). Anomaly detection uses the result store to compare the current value against historical data, so run `provero run` regularly to build up the baseline.
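To make the three methods concrete, here is a minimal stdlib-only sketch of each. It is not Provero's implementation. The default thresholds (3.0 for z-score, 3.5 for the modified z-score based on MAD, 1.5 for the IQR fence multiplier) are common textbook choices and are assumptions here, as is the function name `is_anomalous`. How the `sensitivity` levels map to thresholds is not documented on this page.

```python
import statistics
from typing import Optional, Sequence


def is_anomalous(
    history: Sequence[float],
    current: float,
    method: str = "mad",
    threshold: Optional[float] = None,
) -> bool:
    """Flag `current` as anomalous relative to `history` using zscore, mad, or iqr."""
    if method == "zscore":
        limit = 3.0 if threshold is None else threshold
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            return current != mean
        return abs(current - mean) / stdev > limit

    if method == "mad":
        limit = 3.5 if threshold is None else threshold
        median = statistics.median(history)
        mad = statistics.median([abs(x - median) for x in history])
        if mad == 0:
            return current != median
        # 0.6745 rescales MAD so the score is comparable to a z-score.
        return 0.6745 * abs(current - median) / mad > limit

    if method == "iqr":
        k = 1.5 if threshold is None else threshold
        q1, _, q3 = statistics.quantiles(history, n=4)
        iqr = q3 - q1
        return current < q1 - k * iqr or current > q3 + k * iqr

    raise ValueError(f"unknown method: {method}")


history = [100, 102, 98, 101, 99, 100, 103, 97]
```

Each method needs at least two historical values.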
Referential Integrity¶
referential_integrity¶
Validates that all non-null values in a column exist in a reference table's column (foreign key validation). Orphaned rows where the FK value does not exist in the referenced table cause the check to fail. NULL values are excluded, as they represent optional relationships.
| Parameter | Type | Required | Description |
|---|---|---|---|
| column | string | Yes | FK column in the source table |
| reference_table | string | Yes | Referenced table name |
| reference_column | string | Yes | Referenced column (usually the primary key) |
Default severity: critical
checks:
  - referential_integrity:
      column: customer_id
      reference_table: customers
      reference_column: id
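The orphan rule described above (non-null FK values with no match in the referenced column) can be illustrated with an anti-join. The sketch below uses an in-memory SQLite database and hypothetical `orders` and `customers` tables. The SQL Provero actually generates may differ, and the identifiers here are interpolated without quoting, which is acceptable only for a sketch.

```python
import sqlite3


def count_orphans(
    conn: sqlite3.Connection,
    table: str,
    column: str,
    reference_table: str,
    reference_column: str,
) -> int:
    """Count rows whose non-null FK value has no match in the referenced column."""
    query = f"""
        SELECT COUNT(*)
        FROM {table} AS src
        LEFT JOIN {reference_table} AS ref
          ON src.{column} = ref.{reference_column}
        WHERE src.{column} IS NOT NULL
          AND ref.{reference_column} IS NULL
    """
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
    [(10, 1), (11, 2), (12, None), (13, 99)],  # 99 has no matching customer
)
orphans = count_orphans(conn, "orders", "customer_id", "customers", "id")
```

The row with a NULL `customer_id` is ignored, so only the row pointing at customer 99 counts as an orphan.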
Custom¶
custom_sql¶
Runs a custom SQL query that must return a truthy value to pass.
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | SQL query returning a single value |
| name | string | No | Custom name for the check |
Default severity: critical
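A pass/fail rule of this kind can be sketched by reading the first column of the first row and testing its truthiness. The example uses SQLite and a hypothetical helper name; treating an empty result set as a failure is an assumption, since this page does not say how that case is handled.

```python
import sqlite3


def custom_sql_passes(conn: sqlite3.Connection, query: str) -> bool:
    """Run the query and pass when its single returned value is truthy."""
    row = conn.execute(query).fetchone()
    # Assumption: no rows, NULL, 0, and empty strings all count as failure.
    return bool(row and row[0])
```

A query such as `SELECT COUNT(*) = 0 FROM orders WHERE amount < 0` fits this shape: it returns 1 when no negative amounts exist and 0 otherwise.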