Connectors

Data source connectors provide the interface between Provero and your databases or DataFrames.

Base

Base connector protocol.

Connection

Bases: Protocol

A database connection that can execute SQL.

get_columns(table)

Return column metadata: [{name, type, nullable}, ...].
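As a rough illustration of the protocol described above, the sketch below defines a structurally typed Connection and a stand-in implementation backed by the standard library's sqlite3. The method signatures, the list-of-dicts return shape of execute, and the SQLiteConnection class are assumptions made for this example, not Provero's actual code; only the get_columns result shape ({name, type, nullable}) comes from the description above.

```python
import sqlite3
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Connection(Protocol):
    """Assumed shape of the Connection protocol (signatures are guesses)."""

    def execute(self, query: str) -> list[dict[str, Any]]: ...

    def get_columns(self, table: str) -> list[dict[str, Any]]: ...


class SQLiteConnection:
    """Hypothetical stand-in implementation, used only for illustration."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def execute(self, query: str) -> list[dict[str, Any]]:
        cursor = self._conn.execute(query)
        if cursor.description is None:  # statement produced no result set
            return []
        names = [col[0] for col in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]

    def get_columns(self, table: str) -> list[dict[str, Any]]:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        rows = self._conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        return [
            {"name": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in rows
        ]


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE orders (id INTEGER NOT NULL, note TEXT)")
conn = SQLiteConnection(raw)
columns = conn.get_columns("orders")
```

Because the protocol is structural, SQLiteConnection satisfies it without inheriting from Connection; any object with matching methods would do.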

Connector

Bases: Protocol

Interface for data source connectors.

Every connector implements at least connect, disconnect, and execute. The get_profile and get_schema methods have default SQL-based implementations, which connectors may override with database-specific optimizations.

connect()

Establish connection to the data source.

disconnect(connection)

Close the connection.

get_schema(connection, table)

Return schema info for a table.

Default implementation uses get_columns(). Connectors may override with native INFORMATION_SCHEMA queries for richer metadata.

get_profile(connection, table, columns=None, sample_size=None)

Return statistical profile of a table.

Default implementation delegates to the profiler module. Connectors may override with database-specific profiling (e.g., Snowflake DESCRIBE TABLE EXTENDED).
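The default-plus-override pattern described for get_schema might look roughly like the following. This is a minimal sketch under stated assumptions: BaseConnector, SketchConnection, SQLiteConnector, and the dictionary shape returned by get_schema are all hypothetical names and structures chosen for the example, with sqlite3 standing in for a real data source.

```python
import sqlite3
from typing import Any


class SketchConnection:
    """Hypothetical connection exposing get_columns(), backed by sqlite3."""

    def __init__(self, raw: sqlite3.Connection) -> None:
        self.raw = raw

    def get_columns(self, table: str) -> list[dict[str, Any]]:
        rows = self.raw.execute(f'PRAGMA table_info("{table}")').fetchall()
        return [{"name": r[1], "type": r[2], "nullable": not r[3]} for r in rows]


class BaseConnector:
    """Assumed base class carrying the generic get_schema() default."""

    def connect(self) -> SketchConnection:
        raise NotImplementedError

    def disconnect(self, connection: SketchConnection) -> None:
        connection.raw.close()

    def get_schema(self, connection: SketchConnection, table: str) -> dict[str, Any]:
        # Default path: build the schema purely from get_columns().
        return {"table": table, "columns": connection.get_columns(table)}


class SQLiteConnector(BaseConnector):
    """Stand-in connector that overrides get_schema() with native metadata."""

    def __init__(self, database: str = ":memory:") -> None:
        self.database = database

    def connect(self) -> SketchConnection:
        return SketchConnection(sqlite3.connect(self.database))

    def get_schema(self, connection: SketchConnection, table: str) -> dict[str, Any]:
        # Override: start from the default, then add primary-key information
        # that only the database-specific catalog query provides.
        schema = super().get_schema(connection, table)
        rows = connection.raw.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema["primary_key"] = [r[1] for r in rows if r[5]]
        return schema


connector = SQLiteConnector()
connection = connector.connect()
connection.raw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
schema = connector.get_schema(connection, "orders")
connector.disconnect(connection)
```

The override calls the default first and then enriches the result, so callers that rely only on the generic fields keep working regardless of which connector they receive.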

DuckDB

DuckDB connector for local files (Parquet, CSV, JSON).

DuckDBConnection(conn)

DuckDB connection wrapper.

DuckDBConnector(database=':memory:')

Connector for DuckDB (local files and in-memory).

PostgreSQL

PostgreSQL connector via SQLAlchemy.

SQLAlchemyConnection(engine)

SQLAlchemy-based connection wrapper.

PostgresConnector(connection_string)

Connector for PostgreSQL databases.

SQLAlchemyConnector(connection_string)

Generic connector for any SQLAlchemy-supported database.

DataFrame

DataFrame connector for Pandas and Polars DataFrames via DuckDB.

DataFrameConnection(conn, table_name)

Bases: DuckDBConnection

Connection that wraps a DataFrame registered in DuckDB.

DataFrameConnector(dataframe, table_name='df')

Connector for Pandas and Polars DataFrames.

Registers the DataFrame as a virtual table in an in-memory DuckDB instance, allowing full SQL execution against it. Supports both Pandas and Polars DataFrames transparently.

Usage::

    import pandas as pd
    df = pd.read_csv("orders.csv")
    connector = DataFrameConnector(df, table_name="orders")
    conn = connector.connect()
    result = conn.execute("SELECT COUNT(*) as cnt FROM orders")

Factory

Connector factory with plugin discovery via entry_points.

Third-party connectors register themselves in their pyproject.toml::

    [project.entry-points."provero.connectors"]
    mysql = "provero_mysql:MySQLConnector"

The factory discovers them automatically at runtime.

create_connector(source)

Create a connector based on source type.

Resolution order:

1. entry_points plugins (provero.connectors group)
2. Built-in connectors (DuckDB, Postgres, SQLAlchemy-based)

Plugins take priority so users can override built-ins.

list_connectors()

List all available connector types (built-in + plugins).