GitHub - pavel-v-sobolev/dbmerge: A database-agnostic Python library for simple and fast UPSERT (Insert/Update/Delete) operations via SQLAlchemy.

dbmerge logo

DBMerge is a Python library that provides a simplified interface for performing UPSERT (Insert/Update/Delete) operations.
Built on top of SQLAlchemy, it abstracts away engine-specific SQL MERGE or ON CONFLICT statements.

Overview

Common problems solved

Efficient bulk upsert with conflict resolution - Insert or update records in bulk without writing complex SQL. DBMerge automatically handles primary key conflicts: existing rows are updated, new rows are inserted.
Update rows only when values change - Automatically compares source data against the target table and skips writes for unchanged rows, reducing database load and I/O.
Materialize complex views - Persist results of heavy computations to a physical table for fast querying, supporting also partial data updates.

DBMerge accepts multiple data sources as input:

Pandas DataFrame
Lists of dictionaries
Database table or view

DBMerge automates data update process by comparing your source data against the target table and automatically performing the required operations.

Insert new records that do not exist in the target table.
Update existing records only if their values have changed.
Delete (or mark) existing records in the target table that are no longer present in the source data.

To ensure optimal performance, the library loads your data into a temporary table first, and then executes bulk synchronization queries.

Supported Databases

Tested and verified with:

PostgreSQL
MariaDB / MySQL
SQLite
MS SQL Server

Installation

pip install dbmerge

Quick Start Example

The library uses a context manager to handle database connections and ensure resources are safely released.

from sqlalchemy import create_engine
from datetime import date
from dbmerge import dbmerge

# 1. Initialize DB engine
engine = create_engine("sqlite://")

# 2. Prepare your source data
data = [
    {'Shop': '123', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 2, 'Price': 50.10},
    {'Shop': '124', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 1, 'Price': 100.50}
]

# 3. Execute the merge operation
# The table will be created automatically if it doesn't exist.
with dbmerge(engine=engine, data=data, table_name="Facts", 
             key=['Shop', 'Product', 'Date']) as merge:
    merge.exec()

Key Features

Database Agnostic: Write your synchronization logic once and run it across different SQL databases without modifying the code.
High Performance: Uses temporary staging tables for fast bulk operations rather than slow row-by-row changes.
Smart Deletion: Supports scoped deletion. You can pass a SQLAlchemy logical expression to delete missing data only within a specific timeframe or subset (e.g., updating only a single month).
Auto-Schema Management: Automatically creates missing tables or columns in the database.
Audit: Optional parameters to automatically add merged_on and inserted_on timestamps to track when rows were created or modified.
Statistics: Measures number of updated/inserted/deleted rows, total time and time for each operation step.

Benchmark

DBMerge handles the entire reconciliation process (staging, comparing, updating, inserting) with solid performance, scaling well even for larger datasets.

Here is a rough performance comparison for synchronizing data of different sizes using DBMerge (measured on a standard developer laptop):

Database	DBMerge (100k rows)	DBMerge (1mil rows)
PostgreSQL	~2.0s	~19.8s
MySQL / MariaDB	~1.0s	~11.1s
SQLite	~0.7s	~7.6s
MS SQL Server*	~22.4s	~4m 23s

* Note: MS SQL Server bulk operations take longer due to inherent limitations in the pyodbc driver

Database-Specific Notes & Limitations

PostgreSQL:
- Temporary tables are created as UNLOGGED.
- JSONB type is supported, but not JSON (as it cannot be compared to detect changes).
MariaDB / MySQL:
- Does not detect changes in uppercase vs. lowercase or space padding by default (e.g., 'test' == ' Test'). If this is important, you need to change the collation settings in your database.
- The schema is treated the same as the database, but schema settings are still supported by this library.
- Does not allow strings with unlimited size. You must explicitly define data_types if you want to create a table or field automatically (e.g., data_types={'Your Field': String(100)}).
SQLite: Does not support schemas. If a schema setting is provided, it is automatically reset to None with a warning.
MS SQL Server: Bulk insert operations may have lower performance due to specific pyodbc driver limitations.
Oracle: Currently not supported (missing support for JOIN operations in UPDATE statements within the oracledb module).
DuckDB: Currently not supported (due to a bug in duckdb_engine regarding table definition loading).

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.github/workflows		.github/workflows
assets		assets
db		db
src/dbmerge		src/dbmerge
tests		tests
.gitignore		.gitignore
DOCUMENTATION.md		DOCUMENTATION.md
LICENSE		LICENSE
README.md		README.md
llms.txt		llms.txt
pyproject.toml		pyproject.toml
quick_start.py		quick_start.py
user_guide.py		user_guide.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Supported Databases

Installation

Quick Start Example

Key Features

Benchmark

Database-Specific Notes & Limitations

Documentation

About

Uh oh!

Releases 11

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Overview

Supported Databases

Installation

Quick Start Example

Key Features

Benchmark

Database-Specific Notes & Limitations

Documentation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages