Connectors¶
Data source connectors provide the interface between Provero and your databases or DataFrames.
Base¶
Base connector protocol.
Connection¶
Bases: Protocol
A database connection that can execute SQL.
get_columns(table)¶
Return column metadata: [{name, type, nullable}, ...].
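As a rough illustration of what satisfying this protocol could look like, the sketch below declares a hypothetical `Connection` protocol with `execute` and `get_columns` and implements it on top of the standard-library `sqlite3` module. The method signatures, the return types, and the `SQLiteConnection` class are assumptions made for the example, not Provero's actual definitions.

```python
import sqlite3
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Connection(Protocol):
    """Hypothetical shape of the protocol; the real signatures may differ."""

    def execute(self, sql: str) -> list[dict[str, Any]]: ...

    def get_columns(self, table: str) -> list[dict[str, Any]]: ...


class SQLiteConnection:
    """Illustrative implementation backed by sqlite3 (not part of Provero)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def execute(self, sql: str) -> list[dict[str, Any]]:
        cursor = self._conn.execute(sql)
        if cursor.description is None:  # statement produced no result set
            return []
        names = [col[0] for col in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]

    def get_columns(self, table: str) -> list[dict[str, Any]]:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        quoted = table.replace('"', '""')
        rows = self._conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        return [
            {"name": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in rows
        ]


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE orders (id INTEGER NOT NULL, note TEXT)")
conn = SQLiteConnection(raw)
columns = conn.get_columns("orders")
```

Because the protocol is structural, `SQLiteConnection` does not need to inherit from `Connection`; it only has to provide matching methods.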
Connector¶
Bases: Protocol
Interface for data source connectors.
Every connector implements at minimum connect/disconnect/execute. The get_profile and get_schema methods have default implementations that work via SQL, but connectors may override them with database-specific optimizations.
connect()¶
Establish connection to the data source.
disconnect(connection)¶
Close the connection.
get_schema(connection, table)¶
Return schema info for a table.
Default implementation uses get_columns(). Connectors may override with native INFORMATION_SCHEMA queries for richer metadata.
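The default-plus-override pattern described here might look roughly like the following. The class names (`BaseConnector`, `CatalogConnector`, `FakeConnection`), the shape of the returned schema dict, and the `primary_key` field are all hypothetical; only the idea that the default wraps `get_columns()` comes from the text above.

```python
from typing import Any


class FakeConnection:
    """Stand-in connection that returns fixed column metadata."""

    def get_columns(self, table: str) -> list[dict[str, Any]]:
        return [
            {"name": "id", "type": "INTEGER", "nullable": False},
            {"name": "note", "type": "TEXT", "nullable": True},
        ]


class BaseConnector:
    """Hypothetical base class carrying the default get_schema behaviour."""

    def get_schema(self, connection: Any, table: str) -> dict[str, Any]:
        # Default: build the schema purely from get_columns().
        return {"table": table, "columns": connection.get_columns(table)}


class CatalogConnector(BaseConnector):
    """Hypothetical override that enriches the default schema."""

    def get_schema(self, connection: Any, table: str) -> dict[str, Any]:
        schema = super().get_schema(connection, table)
        # A real connector would query its catalog (e.g. INFORMATION_SCHEMA)
        # here; the value below is hard-coded purely for illustration.
        schema["primary_key"] = ["id"]
        return schema


default_schema = BaseConnector().get_schema(FakeConnection(), "orders")
rich_schema = CatalogConnector().get_schema(FakeConnection(), "orders")
```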
get_profile(connection, table, columns=None, sample_size=None)¶
Return statistical profile of a table.
Default implementation delegates to the profiler module. Connectors may override with database-specific profiling (e.g., DESCRIBE TABLE EXTENDED on engines that support it, such as Spark SQL).
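To give a feel for what a SQL-based profile can compute, here is a minimal sketch using `sqlite3`. The function `profile_table`, the statistics it returns, and the use of `LIMIT` for `sample_size` are assumptions for illustration; the actual profiler module may collect different metrics and sample differently.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def profile_table(conn, table, columns=None, sample_size=None):
    """Return per-column row, null, and distinct counts (hypothetical shape)."""
    table_sql = quote_ident(table)
    if columns is None:
        # Column name is the second field of each PRAGMA table_info row.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table_sql})")]
    source = table_sql
    if sample_size is not None:
        # Takes the first N rows returned, which is not a random sample.
        source = f"(SELECT * FROM {table_sql} LIMIT {int(sample_size)})"
    profile = {}
    for column in columns:
        col_sql = quote_ident(column)
        rows, nulls, distinct = conn.execute(
            f"SELECT COUNT(*), COUNT(*) - COUNT({col_sql}), "
            f"COUNT(DISTINCT {col_sql}) FROM {source}"
        ).fetchone()
        profile[column] = {
            "row_count": rows,
            "null_count": nulls,
            "distinct_count": distinct,
        }
    return profile


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, note TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, None), (3, "a")])
profile = profile_table(conn, "orders")
```

A database-specific override would typically replace the per-column queries with whatever statistics the engine already maintains.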
DuckDB¶
PostgreSQL¶
DataFrame¶
DataFrame connector for Pandas and Polars DataFrames via DuckDB.
DataFrameConnection(conn, table_name)¶
DataFrameConnector(dataframe, table_name='df')¶
Connector for Pandas and Polars DataFrames.
Registers the DataFrame as a virtual table in an in-memory DuckDB instance, allowing full SQL execution against it. Supports both Pandas and Polars DataFrames transparently.
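Usage::

    import pandas as pd

    # Load any Pandas (or Polars) DataFrame.
    df = pd.read_csv("orders.csv")

    # Register it as the virtual table "orders" and query it with SQL.
    connector = DataFrameConnector(df, table_name="orders")
    conn = connector.connect()
    result = conn.execute("SELECT COUNT(*) AS cnt FROM orders")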
Factory¶
Connector factory with plugin discovery via entry_points.
Third-party connectors register themselves in their pyproject.toml::

    [project.entry-points."provero.connectors"]
    mysql = "provero_mysql:MySQLConnector"

The factory discovers them automatically at runtime.
create_connector(source)¶
Create a connector based on source type.
Resolution order:
1. entry_points plugins (provero.connectors group)
2. Built-in connectors (DuckDB, Postgres, SQLAlchemy-based)
Plugins take priority so users can override built-ins.
list_connectors()¶
List all available connector types (built-in + plugins).
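The resolution order and listing behaviour described in this section can be sketched with the standard-library `importlib.metadata`. Everything below other than the `provero.connectors` group name is hypothetical: the stub classes, the `BUILTINS` registry, and the helper names `discover_plugins`, `resolve_connector_class`, and `available_types` are stand-ins, and the real factory takes a source object rather than a bare type string.

```python
from importlib.metadata import entry_points

GROUP = "provero.connectors"


class DuckDBStub:
    """Placeholder for a built-in connector class."""


class PostgresStub:
    """Placeholder for a built-in connector class."""


# Hypothetical built-in registry keyed by source type.
BUILTINS = {"duckdb": DuckDBStub, "postgres": PostgresStub}


def discover_plugins(group=GROUP):
    """Map entry point names to loader callables without importing them yet."""
    eps = entry_points()
    # Newer Pythons expose .select(); older ones return a dict keyed by group.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load for ep in selected}


def resolve_connector_class(source_type, plugins=None):
    """Look up plugins first, then built-ins, so plugins can override."""
    plugins = discover_plugins() if plugins is None else plugins
    if source_type in plugins:
        return plugins[source_type]()  # calling the loader imports the class
    if source_type in BUILTINS:
        return BUILTINS[source_type]
    raise ValueError(f"Unknown connector type: {source_type!r}")


def available_types(plugins=None):
    """Return the sorted union of built-in and plugin connector types."""
    plugins = discover_plugins() if plugins is None else plugins
    return sorted(set(BUILTINS) | set(plugins))


class CustomDuckDB:
    """Pretend third-party connector that overrides the built-in one."""


# Inject fake plugins so the example does not depend on installed packages.
fake_plugins = {"duckdb": lambda: CustomDuckDB, "mysql": lambda: object}
chosen = resolve_connector_class("duckdb", fake_plugins)
```

Passing the plugin mapping as a parameter is only a convenience for the example; it keeps the lookup logic testable without installing any packages.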