Checks
Built-in check types and the check registry.
Registry
Check runner registry with plugin discovery via entry_points.
Built-in checks register via the @register_check decorator.
Third-party checks register in their pyproject.toml:
    [project.entry-points."provero.checks"]
    pii_detection = "provero_pii:check_pii"
The registry discovers them automatically at runtime.
register_check(name)
Decorator to register a check runner.
Used by built-in checks and can be used by plugins that are imported directly (not via entry_points).
get_check_runner(name)
Get a check runner by name.
Resolution order:
1. Built-in checks (via @register_check decorator)
2. Plugin checks (via entry_points, provero.checks group)
Built-ins load first. Plugins can add new checks but cannot override a built-in name, so an installed package cannot silently replace a built-in check (a supply-chain risk).
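The resolution order above might look roughly like the sketch below. Only the `provero.checks` group name comes from this page. The `BUILTINS` dictionary, the `load_plugins` helper, the extra `builtins` and `plugins` parameters, and the `KeyError` on unknown names are assumptions for illustration.

```python
from importlib.metadata import entry_points

# Stand-in for the built-in registry (assumption; real runners do actual work).
BUILTINS = {"not_null": lambda connection, table, check_config: True}


def load_plugins(group="provero.checks"):
    """Collect plugin runners from an entry-point group.

    Uses the selectable entry_points(group=...) form from Python 3.10+.
    """
    return {ep.name: ep.load() for ep in entry_points(group=group)}


def get_check_runner(name, builtins=BUILTINS, plugins=None):
    # 1. Built-ins win, so a plugin with the same name is never consulted.
    if name in builtins:
        return builtins[name]
    # 2. Fall back to plugins discovered through entry points.
    plugins = load_plugins() if plugins is None else plugins
    if name in plugins:
        return plugins[name]
    raise KeyError(f"unknown check type: {name}")
```

Checking the built-in table first means a plugin that reuses a built-in name is simply ignored rather than shadowing it.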
list_checks()
List all registered check types (built-in + plugins).
Completeness
Uniqueness
Validity
Validity checks: accepted_values, range, regex, email_validation, type.
check_accepted_values(connection, table, check_config)
Check that a column contains only accepted values.
NULLs are excluded from validation (filtered with a WHERE <column> IS NOT NULL clause).
Use the not_null check separately if NULL values should be flagged.
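To make the NULL handling concrete, here is a small sketch using the standard-library sqlite3 module. The helper name `count_unaccepted`, the `orders` table, and the query shape are assumptions for illustration. The query provero actually generates may differ.

```python
import sqlite3


def count_unaccepted(conn, table, column, accepted):
    """Count non-NULL values in `column` that are not in `accepted`.

    `table` and `column` are interpolated directly, so they must come from
    trusted configuration. The accepted values are bound as parameters.
    """
    placeholders = ", ".join("?" for _ in accepted)
    sql = (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL AND {column} NOT IN ({placeholders})"
    )
    return conn.execute(sql, list(accepted)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("paid",), ("shipped",), (None,), ("unknown",)],
)

# Only "unknown" is counted; the NULL row is skipped, not flagged.
bad = count_unaccepted(conn, "orders", "status", ["paid", "shipped"])
```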
check_range(connection, table, check_config)
Check that column values fall within a range.
check_regex(connection, table, check_config)
Check that column values match a regex pattern.
Uses regexp_matches() on DuckDB and falls back to col ~ 'pattern' (PostgreSQL) or REGEXP (MySQL/SQLite) for cross-database compatibility.
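The dialect fallback can be pictured as a small dispatch function. This is a sketch only: the `regex_predicate` helper and the dialect strings are hypothetical, and how provero detects the backend is not shown here.

```python
def regex_predicate(dialect, column, pattern):
    """Build a SQL predicate that is true when `column` matches `pattern`."""
    # Double single quotes so the pattern is a valid SQL string literal.
    escaped = pattern.replace("'", "''")
    if dialect == "duckdb":
        return f"regexp_matches({column}, '{escaped}')"
    if dialect == "postgresql":
        return f"{column} ~ '{escaped}'"
    if dialect in ("mysql", "sqlite"):
        # SQLite only supports REGEXP when a regexp() function is registered.
        return f"{column} REGEXP '{escaped}'"
    raise ValueError(f"unsupported dialect: {dialect}")


predicate = regex_predicate("postgresql", "sku", "^[A-Z]{3}-[0-9]+$")
```

Note that the three engines do not share one regex flavor, so a pattern that relies on engine-specific syntax may behave differently across backends.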
check_email_validation(connection, table, check_config)
Check that column values are valid email addresses.
Uses the same cross-database regex approach as the regex check.
NULLs are excluded from validation (filtered with a WHERE <column> IS NOT NULL clause).
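The same idea can be shown in plain Python. The pattern below is an assumed, deliberately simple one. The exact expression provero uses is not given on this page, and no short regex covers every valid address.

```python
import re

# Assumed pattern for illustration: local part, "@", domain, dot, 2+ letter TLD.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def invalid_emails(values):
    """Return the non-NULL values that fail the pattern; None is skipped."""
    return [v for v in values if v is not None and not EMAIL_RE.match(v)]


sample = ["a@example.com", None, "not-an-email", "b@sub.example.org"]
invalid = invalid_emails(sample)
```

Here `invalid` contains only `"not-an-email"`. The `None` entry is ignored, mirroring the NULL exclusion described above.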
check_type(connection, table, check_config)
Check that a column has the expected data type.
Freshness
Freshness checks: freshness, latency.
check_freshness(connection, table, check_config)
Check that data is fresh (the most recent row is within max_age).
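The comparison behind a freshness check is simple date arithmetic. The sketch below assumes max_age has already been parsed into a `timedelta` and that the most recent timestamp has been fetched (for example with a `MAX(...)` query). The `is_fresh` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest, max_age, now=None):
    """True when the newest row is no older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    return (now - latest) <= max_age


# Fixed timestamps so the result does not depend on the wall clock.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)

fresh_12h = is_fresh(latest, timedelta(hours=12), now=now)  # 6h old, within 12h
fresh_1h = is_fresh(latest, timedelta(hours=1), now=now)    # 6h old, exceeds 1h
```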
check_latency(connection, table, check_config)
Check that the latency between two timestamp columns is within bounds.
Measures the time difference between a source timestamp (e.g., event_time) and a target timestamp (e.g., loaded_at). Useful for detecting pipeline delays.
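As a rough illustration of the measurement, the sketch below computes the largest gap between paired timestamps and compares it with a bound. Whether provero aggregates by maximum, average, or something else is not stated here, so the use of `max` and the `worst_latency` helper are assumptions.

```python
from datetime import datetime, timedelta


def worst_latency(pairs):
    """Largest (target - source) gap among pairs where both values are set."""
    gaps = [target - source for source, target in pairs if source and target]
    return max(gaps) if gaps else None


# (event_time, loaded_at) pairs, matching the example columns above.
pairs = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 45)),
]

# The second row arrived 45 minutes late, so a 30-minute bound is exceeded.
within_bounds = worst_latency(pairs) <= timedelta(minutes=30)
```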
Volume
Volume checks: row_count.
check_row_count(connection, table, check_config)
Check that the table has the expected number of rows.
Custom
Custom SQL checks.
check_custom_sql(connection, table, check_config)
Execute a custom SQL check. The query must return a single boolean value.
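A query that satisfies the single-boolean contract might look like the one in this sketch, which uses sqlite3 for a self-contained demonstration. The `run_boolean_query` helper, the `payments` table, and the error raised on a malformed result are assumptions. How provero reports a bad result shape is not documented here.

```python
import sqlite3


def run_boolean_query(conn, sql):
    """Run `sql` and interpret its single cell as a pass/fail flag."""
    rows = conn.execute(sql).fetchall()
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("query must return exactly one row with one column")
    return bool(rows[0][0])  # SQLite returns 0 or 1 for boolean expressions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?)", [(10.0,), (25.5,)])

# Passes when no payment has a negative amount.
passed = run_boolean_query(
    conn, "SELECT COUNT(*) = 0 FROM payments WHERE amount < 0"
)
```

Wrapping the condition in an aggregate such as `COUNT(*) = 0` guarantees one row and one column regardless of table size.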