Checks

Built-in check types and the check registry.

Registry

Check runner registry with plugin discovery via entry_points.

Built-in checks register via the @register_check decorator. Third-party checks register in their pyproject.toml:

[project.entry-points."provero.checks"]
pii_detection = "provero_pii:check_pii"

The registry discovers them automatically at runtime.

register_check(name)

Decorator to register a check runner.

Used by built-in checks and can be used by plugins that are imported directly (not via entry_points).

get_check_runner(name)

Get a check runner by name.

Resolution order:

1. Built-in checks (via the @register_check decorator)
2. Plugin checks (via entry_points, provero.checks group)

Built-ins load first. Plugins can add new checks but cannot override built-ins (to prevent supply-chain attacks).
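The resolution order can be sketched as a lookup that consults built-ins before it ever looks at entry points, so a plugin with a colliding name is never reached. This is a sketch under assumptions, not the library's code: `BUILTIN_CHECKS` and `resolve_check` are hypothetical names, and only the `provero.checks` group name comes from this page.

```python
from importlib.metadata import entry_points

# Hypothetical built-in table; real runners would be registered functions.
BUILTIN_CHECKS = {"not_null": lambda connection, table, check_config: True}


def resolve_check(name, group="provero.checks"):
    # 1. Built-ins are consulted first, so a plugin cannot shadow them.
    if name in BUILTIN_CHECKS:
        return BUILTIN_CHECKS[name]

    # 2. Fall back to plugins advertised under the entry-point group.
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        candidates = eps.select(group=group)
    else:                           # Python 3.8 / 3.9 return a dict
        candidates = eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep.load()        # imports the plugin's callable

    raise KeyError(f"Unknown check type: {name}")
```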

list_checks()

List all registered check types (built-in + plugins).

Completeness

Completeness checks: not_null, completeness.

check_not_null(connection, table, check_config)

Check that column(s) have no null values.

check_completeness(connection, table, check_config)

Check that a column meets a minimum completeness threshold.
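One way to compute a completeness ratio is to divide the non-null count by the total row count in a single query. The sketch below uses an in-memory SQLite database; the helper name `completeness_ratio`, the sample table, and the 0.9 threshold are assumptions for illustration and not Provero's implementation or configuration keys.

```python
import sqlite3


def completeness_ratio(conn, table, column):
    # COUNT(column) skips NULLs, COUNT(*) does not.
    # Identifiers are interpolated directly; validate them in real code.
    sql = f'SELECT COUNT("{column}"), COUNT(*) FROM "{table}"'
    non_null, total = conn.execute(sql).fetchone()
    return 1.0 if total == 0 else non_null / total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@x.io"), (2, None), (3, "c@x.io"), (4, "d@x.io")],
)

ratio = completeness_ratio(conn, "users", "email")
passed = ratio >= 0.9  # hypothetical minimum threshold
```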

Uniqueness

Uniqueness checks.

check_unique(connection, table, check_config)

Check that column values are unique.

check_unique_combination(connection, table, check_config)

Check that a combination of columns is unique.
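A common way to test a column combination for uniqueness is to group by those columns and count groups that occur more than once. The following is a sketch of that idea against SQLite, with a hypothetical helper name and sample data; it does not claim to reproduce the query Provero issues.

```python
import sqlite3


def count_duplicate_groups(conn, table, columns):
    # Identifiers are interpolated directly; validate them in real code.
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = (
        "SELECT COUNT(*) FROM ("
        f'SELECT 1 FROM "{table}" GROUP BY {cols} HAVING COUNT(*) > 1)'
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-01"), (1, "2024-01-01"), (1, "2024-01-02"), (2, "2024-01-01")],
)

duplicates = count_duplicate_groups(conn, "orders", ["customer_id", "order_date"])
passed = duplicates == 0
```

Each column is non-unique on its own here, yet only one (customer_id, order_date) pair repeats, which is the distinction between a single-column check and a combination check.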

Validity

Validity checks: accepted_values, range, regex, email_validation, type.

check_accepted_values(connection, table, check_config)

Check that a column contains only accepted values.

NULLs are excluded from validation (filtered via WHERE IS NOT NULL). Use the not_null check separately if NULL values should be flagged.
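The NULL handling described above can be sketched as a query that filters NULLs out before comparing against the accepted set. The helper name, table, and values below are illustrative assumptions, and the query is not taken from Provero's source.

```python
import sqlite3


def count_invalid(conn, table, column, accepted):
    placeholders = ", ".join("?" for _ in accepted)
    # NULLs are filtered out explicitly, so they never count as invalid.
    sql = (
        f'SELECT COUNT(*) FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL AND "{column}" NOT IN ({placeholders})'
    )
    return conn.execute(sql, list(accepted)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (status TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [("active",), ("inactive",), ("banned",), (None,)],
)

invalid = count_invalid(conn, "accounts", "status", ["active", "inactive"])
```

Only `banned` is counted; the NULL row passes silently, which is why a separate not_null check is needed to flag it.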

check_range(connection, table, check_config)

Check that column values fall within a range.

check_regex(connection, table, check_config)

Check that column values match a regex pattern.

Uses regexp_matches() on DuckDB and falls back to col ~ 'pattern' on PostgreSQL and to REGEXP on MySQL/SQLite for cross-database compatibility.
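The dialect fallback can be pictured as a small function that picks the SQL predicate for each backend. This is a sketch only: the function name and the dialect strings (`duckdb`, `postgresql`, `mysql`, `sqlite`) are assumptions, and how Provero actually detects the backend is not stated on this page.

```python
def regex_predicate(dialect, column, pattern):
    """Build a SQL predicate that is true when `column` matches `pattern`."""
    escaped = pattern.replace("'", "''")  # escape single quotes for a SQL literal
    if dialect == "duckdb":
        return f"regexp_matches({column}, '{escaped}')"
    if dialect == "postgresql":
        return f"{column} ~ '{escaped}'"
    if dialect in ("mysql", "sqlite"):
        return f"{column} REGEXP '{escaped}'"
    raise ValueError(f"Unsupported dialect: {dialect}")
```

Note that SQLite parses the REGEXP operator but does not ship an implementation by default, so the connection needs a user-defined `regexp` function for the predicate to run.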

check_email_validation(connection, table, check_config)

Check that column values are valid email addresses.

Uses the same cross-database regex approach as the regex check. NULLs are excluded from validation (filtered via WHERE IS NOT NULL).

check_type(connection, table, check_config)

Check that a column has the expected data type.

Freshness

Freshness checks: freshness, latency.

check_freshness(connection, table, check_config)

Check that data is fresh (most recent row within max_age).
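Conceptually, a freshness check compares the age of the newest row against max_age. The sketch below shows that comparison in isolation; the helper name `is_fresh` and the use of `timedelta` for max_age are assumptions, since this page does not say how max_age is expressed in the check configuration.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest, max_age, now=None):
    """Return True when the newest timestamp is no older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest <= max_age


# Fixed clock so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)  # e.g. MAX(updated_at)

fresh_6h = is_fresh(latest, timedelta(hours=6), now)  # age is 2h30m
fresh_1h = is_fresh(latest, timedelta(hours=1), now)
```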

check_latency(connection, table, check_config)

Check that the latency between two timestamp columns is within bounds.

Measures the time difference between a source timestamp (e.g., event_time) and a target timestamp (e.g., loaded_at). Useful for detecting pipeline delays.
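As a rough illustration of the measurement, the worst-case gap between the two timestamp columns can be computed in SQL and compared with a bound. The example uses SQLite's `strftime('%s', ...)` to convert timestamps to Unix seconds; the helper name, table, and the 120-second bound are assumptions, and other databases would need their own date arithmetic.

```python
import sqlite3


def max_latency_seconds(conn, table, source_col, target_col):
    # strftime('%s', ts) yields Unix seconds in SQLite.
    # Identifiers are interpolated directly; validate them in real code.
    sql = (
        f"SELECT MAX(CAST(strftime('%s', {target_col}) AS INTEGER)"
        f" - CAST(strftime('%s', {source_col}) AS INTEGER)) FROM {table}"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_time TEXT, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-01-01 10:00:00", "2024-01-01 10:00:30"),  # 30 s delay
        ("2024-01-01 11:00:00", "2024-01-01 11:05:00"),  # 300 s delay
    ],
)

worst = max_latency_seconds(conn, "events", "event_time", "loaded_at")
within_bounds = worst <= 120  # hypothetical maximum latency in seconds
```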

Volume

Volume checks: row_count.

check_row_count(connection, table, check_config)

Check that the table has the expected number of rows.

Custom

Custom SQL checks.

check_custom_sql(connection, table, check_config)

Execute a custom SQL check. The query must return a single boolean value.
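The single-boolean contract can be sketched as a helper that rejects any result that is not exactly one row with one column and then interprets that value as pass or fail. The helper name and the sample queries are hypothetical; how Provero reports a malformed result is not documented here.

```python
import sqlite3


def run_boolean_query(conn, query):
    rows = conn.execute(query).fetchall()
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("query must return exactly one row with one column")
    return bool(rows[0][0])  # SQLite represents booleans as 0 / 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?)", [(10.0,), (25.5,)])

no_negatives = run_boolean_query(
    conn, "SELECT COUNT(*) = 0 FROM payments WHERE amount < 0"
)
many_rows = run_boolean_query(conn, "SELECT COUNT(*) > 5 FROM payments")
```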