
Add skipDuplicates() scope guard to createDocuments#852

Open
premtsd-code wants to merge 26 commits into main from csv-import-upsert-v2

Conversation

@premtsd-code
Contributor

@premtsd-code premtsd-code commented Apr 8, 2026

Summary

Adds a Database::skipDuplicates() scope guard so callers can wrap a createDocuments() call and have duplicate-by-id rows silently skipped instead of throwing DuplicateException. Used by CSV import / re-import flows and database migration workers where re-running the same batch should be idempotent rather than fatal.

The shape follows the existing silent() / skipRelationships() pattern — instance-level scope guard, not a flag parameter — so callers compose naturally:

$database->skipDuplicates(fn () => $database->createDocuments($collection, $batch));
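The save-and-restore shape of such a scope guard can be sketched as follows (a minimal, hypothetical class; the real Database carries far more state, and isSkippingDuplicates() is an illustrative accessor, not the library's API):

```php
<?php
// Minimal sketch of an instance-level scope guard, following the
// silent()/skipRelationships() pattern described above.
class Database
{
    private bool $skipDuplicates = false;

    // Run $callback with duplicate-skipping enabled, restoring the
    // previous value afterwards so nested scopes compose correctly.
    public function skipDuplicates(callable $callback): mixed
    {
        $previous = $this->skipDuplicates;
        $this->skipDuplicates = true;
        try {
            return $callback();
        } finally {
            $this->skipDuplicates = $previous; // restored even on exception
        }
    }

    public function isSkippingDuplicates(): bool
    {
        return $this->skipDuplicates;
    }
}

$db = new Database();
$inside = $db->skipDuplicates(fn () => $db->isSkippingDuplicates());
$after = $db->isSkippingDuplicates();
```

The try/finally restore (rather than a plain reset to false) is what makes the nested-scope test below pass: an inner guard exiting must not clear an outer guard's flag.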

Design

Duplicate handling happens at the adapter layer via dialect-specific no-op inserts. Database::createDocuments sets a scope-guarded flag on the adapter before each chunk's INSERT; the adapter emits the right SQL (or Mongo equivalent) and returns the batch. No pre-filter, no intra-batch dedup — adapter-level row-level dedup is sufficient and avoids redundant queries.

Per-adapter implementation

Adapter          | Strategy
MariaDB / MySQL  | INSERT IGNORE INTO
Postgres         | INSERT INTO ... ON CONFLICT (...) DO NOTHING
SQLite           | INSERT OR IGNORE INTO
Mongo            | upsert + $setOnInsert, bypassing the transaction wrap to avoid txn-abort-on-duplicate
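The SQL side of this table can be illustrated with a small sketch (hypothetical function; the real adapters split this across getInsertKeyword()/getInsertSuffix() hooks, and the Postgres conflict target shown here is an assumption):

```php
<?php
// Illustrative only: assemble the dialect-specific no-op insert
// for a fixed two-column statement.
function buildInsert(string $dialect, string $table, bool $skipDuplicates): string
{
    $keyword = 'INSERT INTO';
    $suffix  = '';
    if ($skipDuplicates) {
        switch ($dialect) {
            case 'mariadb':
            case 'mysql':
                $keyword = 'INSERT IGNORE INTO';
                break;
            case 'sqlite':
                $keyword = 'INSERT OR IGNORE INTO';
                break;
            case 'postgres':
                // Conflict target assumed; real code derives it from the schema.
                $suffix = ' ON CONFLICT (_uid) DO NOTHING';
                break;
        }
    }
    return "{$keyword} `{$table}` (_uid, name) VALUES (:uid, :name){$suffix}";
}
```

Note the asymmetry: MySQL-family dialects change the keyword, Postgres appends a clause, which is why two separate hooks are needed rather than one.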

Hot-path cost

One property read + one boolean branch per chunk. Callers that don't use skipDuplicates pay no closure allocation and no scope-guard setup — the feature cost is paid only when the feature is actually used. This addresses review comment #1 (don't penalize the common case with an $enable flag pattern).

Mirror dual-write correctness

Mirror::createDocuments pre-filters against source to identify which input ids already exist, then after the source insert forwards only the genuinely-new docs to destination. Pre-existing ids are silently no-op'd by the adapter's INSERT IGNORE — they represent no user write, so nothing propagates to destination.

This replaces an earlier captureOnNext-based approach flagged by Greptile (#3084293974): that approach wrapped $onNext to capture "what source inserted", but SQL adapters return the full input batch from createDocuments regardless of rowCount(), causing $onNext to fire for every doc — including skipped duplicates — so Mirror forwarded would-be values to destination, diverging from source. Pre-filter-then-insert matches the idempotent-write semantics: a skipped duplicate is not a user write, so it does not mirror.

Cost: one extra SELECT against source per batch when skipDuplicates is active and Mirror has an upgraded destination. Zero cost on the non-skip path. Skip and non-skip paths are unified into a single method body with branches only at the three points where behavior actually differs (source call, forward filter, destination call).
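The forward-filter step can be sketched in isolation (hypothetical function and array-shaped docs; the real code works on Document objects and chunks the pre-filter SELECT by maxQueryValues):

```php
<?php
// Sketch: given the ids that existed in source before the insert,
// keep only the docs that represent genuine new writes.
/**
 * @param string[] $existingSourceIds ids found by the pre-filter SELECT
 * @param array<array{id:string}> $batch caller input
 * @return array<array{id:string}> docs to forward to destination
 */
function filterForDestination(array $existingSourceIds, array $batch): array
{
    $existing = array_flip($existingSourceIds); // O(1) membership checks
    return array_values(array_filter(
        $batch,
        fn (array $doc) => !isset($existing[$doc['id']])
    ));
}

// 'dup' was already in source, so only 'fresh' propagates.
$forward = filterForDestination(['dup'], [['id' => 'dup'], ['id' => 'fresh']]);
```

This is the property the Mirror divergence test asserts: destination.dup stays at its original value while destination.fresh receives the new row.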

upsertDocumentsWithIncrease tenant grouping (regression fix)

Restored per-tenant grouping for the existing-doc lookup in sharedTables + tenantPerDocument mode. An earlier review round (#2 — inline find() instead of a helper) was correct for single-tenant use cases but ran under the session tenant context (null for platform workers), silently missing rows from other tenants. This broke appwrite's StatsUsage worker, which accumulates stats across many projects and flushes via $dbForLogs->setTenant(null)->setTenantPerDocument(true) + upsertDocumentsWithIncrease() — every input doc looked "new" and went down the create path instead of the update path.

The fix: branch on tenantPerDocument mode and run one find() per unique tenant wrapped in withTenant($tenant, ...). K queries total (K = unique tenants in the batch). Single-tenant callers (the common case, including CSV import and database migration) still hit the fast one-query batched path.

Helper: tenantKey(Document) — used only within upsertDocumentsWithIncrease to keep the in-memory lookup map and the intra-batch dedup check consistent. Returns "<tenant>:<id>" in tenantPerDocument mode, just <id> otherwise. Without it, a legitimate cross-tenant batch like [{id: x, tenant: 1}, {id: x, tenant: 2}] would falsely fail with DuplicateException.
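The helper's behavior is easy to show standalone (free-function sketch with scalar arguments; the real helper takes a Document and reads mode from instance state):

```php
<?php
// Sketch of tenantKey: tenant-qualify the lookup key only in
// tenantPerDocument mode, so cross-tenant batches sharing an id
// are not misreported as intra-batch duplicates.
function tenantKey(string $id, ?int $tenant, bool $tenantPerDocument): string
{
    return $tenantPerDocument ? "{$tenant}:{$id}" : $id;
}

// A legitimate cross-tenant batch: same id, different tenants.
$a = tenantKey('x', 1, true);
$b = tenantKey('x', 2, true);
```

Keyed by id alone, $a and $b would collide and the second doc would falsely trip the DuplicateException path; tenant-qualified, they are distinct.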

Tests

End-to-end coverage across all adapter variants (MariaDB / MySQL / Postgres / SQLite / MongoDB × Pool / Mirror / SharedTables):

  • testCreateDocumentsIgnoreDuplicates — mix of existing + new docs, non-existing inserted, existing no-op'd
  • testCreateDocumentsIgnoreAllDuplicates — every doc already exists, nothing newly inserted
  • testCreateDocumentsSkipDuplicatesEmptyBatch — empty batch is a no-op
  • testCreateDocumentsSkipDuplicatesNestedScope — scope guard saves & restores through nested calls
  • testCreateDocumentsSkipDuplicatesLargeBatch — 300 docs / 50 pre-existing, exercises the chunk loop
  • testCreateDocumentsSkipDuplicatesSecondCallSkipsAll — idempotent re-invocation preserves first-write values
  • testCreateDocumentsSkipDuplicatesRelationships — relationship children aren't duplicated on re-run
  • testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination (Mirror) — source-seeded dup=Original with a dup=WouldBe caller input, asserts destination.dup stays empty and destination.fresh gets the newly-inserted row (Greptile's exact assertion from #3084293974)

Test plan

  • skipDuplicates tests pass on every adapter variant
  • Full Mirror suite: 635 tests / 11,035 assertions green
  • Greptile divergence assertion holds
  • composer analyze (PHPStan) clean at the project's configured level
  • No regressions in existing testCreateDocuments* and testUpsert* suites

Summary by CodeRabbit

  • New Features

    • Add scoped "skip duplicates" mode for batch document creation that safely ignores duplicates, preserves existing attributes, and avoids aborting operations.
  • Refactor

    • Batch lookups and tenant-aware identity handling improve upsert/dedup detection and large-batch performance; duplicate-skipping integrates across storage backends and mirror flows.
  • Tests

    • Added comprehensive end-to-end tests covering nested scopes, large batches, empty batches, mirrors, and relationship propagation.

@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

Warning

Rate limit exceeded

@premtsd-code has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 43 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 30 minutes and 43 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5fe84cc5-10e1-46d9-9f87-37a1ce5e5c7d

📥 Commits

Reviewing files that changed from the base of the PR and between fa0e373 and 52b189b.

📒 Files selected for processing (2)
  • src/Database/Database.php
  • src/Database/Mirror.php
📝 Walkthrough

Walkthrough

Adds a scoped skipDuplicates(callable $callback) mode across Database, Pool, Mirror, and adapters that changes create/insert flows to avoid overwriting existing documents by using adapter-specific INSERT-IGNORE/ON CONFLICT or conditional upserts, and adds e2e tests for the behavior.

Changes

Cohort / File(s) Summary
Core adapter & Database state
src/Database/Adapter.php, src/Database/Database.php
Introduce protected skipDuplicates flag and public skipDuplicates(callable $callback): mixed. Database adds tenantKey(...), batches existing-document lookups, and conditionally routes create flows through adapter skip scope.
Pool delegation
src/Database/Adapter/Pool.php
Pool now wraps delegated adapter calls (pinned or borrowed) and transactional pinned-adapter withTransaction in the adapter's skipDuplicates(...) when pool skip flag is set.
Mongo adapter
src/Database/Adapter/Mongo.php
withTransaction short-circuits when skipping duplicates. createDocuments() gains a skip-duplicates path implemented via conditional upserts ($setOnInsert, _uid/_tenant filters) instead of insertMany.
SQL / Postgres / SQLite adapters
src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php
Added protected hooks getInsertKeyword(), getInsertSuffix(string $table), getInsertPermissionsSuffix(); SQL generation now emits INSERT IGNORE / ON CONFLICT DO NOTHING when skip mode is enabled.
Mirror behavior
src/Database/Mirror.php
When skipping duplicates, Mirror prefetches existing IDs from source, uses adapter skipDuplicates(...) for source/destination writes, and filters out source-existing docs before forwarding to destination.
Tests (e2e)
tests/e2e/Adapter/Scopes/DocumentTests.php, tests/e2e/Adapter/MirrorTest.php
Added comprehensive e2e tests covering duplicate-skipping semantics: mixed/all-duplicate/empty batches, nested scopes, large chunked batches, repeated calls, relationships, and mirror divergence prevention.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(100,180,120,0.5)
    Note over Client,Storage: New flow with skipDuplicates enabled
    Client->>Database: createDocuments(docs) within skipDuplicates scope
    Database->>Database: compute tenantKey, dedupe, chunk ids
    Database->>Adapter: find(existing ids) [chunked]
    Adapter->>DB: SELECT/_find by _uid/_id
    DB-->>Adapter: existing ids
    Adapter-->>Database: existing ids
    Database->>Database: filter out pre-existing docs, prepare inserts
    Database->>Adapter: adapter->skipDuplicates(fn => createDocuments(filtered))
    Adapter->>DB: INSERT IGNORE / ON CONFLICT DO NOTHING or conditional upserts (Mongo)
    DB-->>Adapter: success (duplicates skipped)
    Adapter-->>Database: inserted result info
    Database->>Database: create relationships for actually inserted docs
    Database-->>Client: return inserted count / docs
    end
sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(200,100,150,0.5)
    Note over Client,Storage: Regular flow without skipDuplicates
    Client->>Database: createDocuments(docs)
    Database->>Adapter: createDocuments(docs)
    Adapter->>DB: INSERT INTO ...
    DB-->>Adapter: may error on duplicates
    Adapter-->>Database: propagate exception or handle normally
    Database-->>Client: result or exception
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Feat mongodb #721: Overlaps Mongo adapter transaction and createDocuments changes that touch skip-duplicate behavior.
  • Fix update documents #646: Modifies Database::createDocuments and SQL insert handling; relevant to the insert-path changes.
  • Feat mongo tmp #647: Related Mongo adapter changes and upsert/transaction behavior affecting the same area.

Suggested reviewers

  • abnegate
  • fogelito
  • ArnabChatterjee20k

Poem

🐇 I hopped through rows and checked each id,
I nudged the duplicates softly to the side,
Scoped my skip so nested hops restore,
Fresh carrots stay, old roots keep core,
A tiny rabbit guard — no overwrites tonight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 47.06%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main change: adding a skipDuplicates() scope guard to createDocuments functionality across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@greptile-apps

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds a skipDuplicates() scope guard to createDocuments across all adapter backends (MySQL INSERT IGNORE, Postgres ON CONFLICT DO NOTHING, SQLite INSERT OR IGNORE, Mongo upsert + $setOnInsert), fixes per-tenant batching in upsertDocumentsWithIncrease, and hardens the Mirror flow to avoid forwarding skipped duplicates to the destination.

  • P1 — M2M junction duplication: createDocumentRelationships runs unconditionally for every input document before the adapter's dedup step. For MANY_TO_MANY relationships, relateDocuments always calls createDocument on the junction collection without checking for an existing entry; the junction table has no composite UNIQUE constraint, so each re-run silently appends a duplicate row for the same (key, twoWayKey) pair. The existing relationship test uses ONE_TO_MANY only and does not catch this.
  • P2 — Mirror pre-filter not tenant-scoped: In tenantPerDocument mode, the find() pre-filter in Mirror::createDocuments runs under the session tenant and misses documents from other tenants, potentially forwarding them to destination as if they were new writes.

Confidence Score: 3/5

One P1 defect (M2M junction duplication on re-run) should be resolved before merging for any re-import use cases involving MANY_TO_MANY relationships.

The core mechanism — adapter flags, Pool propagation, Mongo transaction bypass, Mirror divergence prevention — is well-designed and tests are thorough for covered cases. However, createDocumentRelationships runs before the dedup step with no guard for known duplicates, meaning MANY_TO_MANY junction entries accumulate silently on every re-run of a batch containing existing parent documents — directly undermining the idempotent-re-import guarantee the PR was built to deliver.

src/Database/Database.php (createDocuments pre-processing loop around line 5720), src/Database/Mirror.php (skipDuplicates pre-filter around line 614)

Important Files Changed

Filename Overview
src/Database/Database.php Adds skipDuplicates scope guard and tenantKey helper; upsertDocumentsWithIncrease batch-fetch is well structured, but createDocumentRelationships is called unconditionally for all documents — including known duplicates — causing MANY_TO_MANY junction entries to accumulate silently on re-runs.
src/Database/Mirror.php Pre-filter correctly chunks by maxQueryValues and prevents duplicate forwarding to destination; find is not tenant-scoped in tenantPerDocument mode, which can leak documents to destination in cross-tenant batches.
src/Database/Adapter/SQL.php Pluggable getInsertKeyword / getInsertSuffix pattern is clean; returns the full input batch regardless of actual row insertions, a previously-discussed known trade-off.
src/Database/Adapter/Postgres.php Overrides getInsertSuffix and getInsertPermissionsSuffix with correct ON CONFLICT … DO NOTHING clauses; conflict targets are tenant-aware.
src/Database/Adapter/SQLite.php Minimal override — provides INSERT OR IGNORE INTO for the skip-duplicates path.
src/Database/Adapter/Mongo.php Bypasses transaction and uses upsert + $setOnInsert to avoid write-conflict errors; return value of upsert() is discarded (full input batch returned), a trade-off explicitly acknowledged in a prior thread reply.
src/Database/Adapter/Pool.php Correctly propagates skipDuplicates to the pinned and pooled underlying adapters, including the withTransaction path.
src/Database/Adapter.php Adds $skipDuplicates flag and skipDuplicates(callable) scope guard with save/restore; clean implementation following the existing silent/skipRelationships pattern.
tests/e2e/Adapter/Scopes/DocumentTests.php Good breadth of new tests covering empty batch, nested scopes, large batches, and Mirror divergence; relationship test covers ONE_TO_MANY only, leaving MANY_TO_MANY junction duplication untested.
tests/e2e/Adapter/MirrorTest.php Adds testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination covering the Mirror divergence scenario from the prior review discussion; assertions are precise.

Comments Outside Diff (1)

  1. src/Database/Database.php, line 5720-5722 (link)

    P1 MANY_TO_MANY junction entries duplicated on re-run with skipDuplicates

    createDocumentRelationships is called unconditionally for every document in the pre-processing loop — including documents whose IDs are known duplicates and will be silently skipped by the adapter. For MANY_TO_MANY relationships, relateDocuments always creates a new junction entry via createDocument without checking whether one already exists. The junction collection is created with plain INDEX_KEY indexes, not a composite UNIQUE constraint, so each re-run silently appends a second (third, …) row for the same (key, twoWayKey) pair.

    Concrete failure: call skipDuplicates + createDocuments with a batch that contains existingParent having M2M children [existingChild]. The first call creates the junction row {parentId: existingParent, childId: existingChild}. The second call skips the parent insert but still calls createDocumentRelationships → relateDocuments → createDocument(junction, …) → a second identical junction row is inserted. After two calls existingChild appears twice in every query on the relationship.

    The existing test testCreateDocumentsSkipDuplicatesRelationships uses ONE_TO_MANY only (child stores FK, no junction table), so it does not exercise this path.

    A minimal guard is to skip createDocumentRelationships for documents whose ID is already in $preExistingIds when $this->skipDuplicates is true, which requires fetching pre-existing IDs before this loop (similar to what upsertDocumentsWithIncrease does at line 7153–7193).
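The proposed guard can be sketched as follows (variable names like $preExistingIds come from the review text; the surrounding Database code and createDocumentRelationships itself are not reproduced, so this is shape only):

```php
<?php
// Sketch of the minimal guard: when skipDuplicates is active, skip the
// relationship-creation step for documents whose ids are known to
// pre-exist, since the adapter will no-op their insert anyway.
$preExistingIds = ['existingParent' => true]; // from a lookup before this loop
$skipDuplicates = true;

$batch = [['id' => 'existingParent'], ['id' => 'newParent']];
$related = [];
foreach ($batch as $doc) {
    if ($skipDuplicates && isset($preExistingIds[$doc['id']])) {
        continue; // don't re-create M2M junction rows for a skipped duplicate
    }
    $related[] = $doc['id']; // stand-in for createDocumentRelationships(...)
}
```

Under this guard, re-running the batch leaves the existing parent's junction rows untouched and only the genuinely new parent gets relationship writes.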

Reviews (29): Last reviewed commit: "Chunk id lookups by \$this->maxQueryValu..." | Re-trigger Greptile

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/e2e/Adapter/Scopes/DocumentTests.php (1)

7815-7818: Prefer __FUNCTION__ for collection naming consistency.

Using __FUNCTION__ here (as in adjacent tests) improves isolation and avoids accidental name reuse across reruns.

♻️ Suggested tweak
-        $col = 'createDocsIgnoreIntraBatch';
+        $col = __FUNCTION__;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7815 - 7818, The
collection name is hard-coded as 'createDocsIgnoreIntraBatch' which can collide
across test runs; change the $col assignment to use the current function name
(replace the literal with __FUNCTION__) so the calls to
$database->createCollection($col) and $database->createAttribute($col, 'name',
Database::VAR_STRING, 128, true) use a unique, test-local collection name
consistent with adjacent tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1501-1524: The current dedupe prefetch uses only _uid and a single
tenant filter, causing cross-tenant collisions for shared tables; update the
logic that builds existingUids (and the prefetch loop around
$this->client->find/$this->client->getMore) to be tenant-aware by using a
composite key (e.g., "$tenant:$uid") when populating $existingUids, include the
tenant in the $filters returned by $this->getTenantFilters($collection->getId())
per chunk, and prefetch in per-tenant chunks using array_chunk($uidValues,
max(1, $this->maxQueryValues)) (use the chunk size when calling getMore instead
of count($uidValues)); then when filtering $records check the composite key
($record['_tenant'] . ':' . ($record['_uid'] ?? '')) against $existingUids so
duplicate detection respects tenant boundaries.
- Around line 1504-1516: The prefetch block calls $this->client->find() and
$this->client->getMore() without exception mapping; wrap each driver call in a
try/catch that catches \Throwable and rethrows the normalized exception via
processException($e) (same pattern used elsewhere in this class), i.e. surround
the $this->client->find(...) call and the $this->client->getMore(...) call with
try { ... } catch (\Throwable $e) { throw $this->processException($e); } so any
raw driver errors from the $response/$more handling are mapped consistently.

In `@src/Database/Database.php`:
- Around line 7210-7255: The upsertDocumentsWithIncrease() flow builds
$existingDocs keyed as tenant:id when $upsertTenantPerDocument is true, but
later duplicate-input detection still uses only $document->getId(), causing
false DuplicateException across tenants; update the duplicate-input check and
any intermediate maps (e.g., $seenIds/$inputKeys) to use the same composite
lookup key (use the existing $lookupKey logic: $tenant . ':' .
$document->getId() when $upsertTenantPerDocument is true) so the tenant-aware
key is reused consistently across the whole upsert path.

---

Nitpick comments:
In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 7815-7818: The collection name is hard-coded as
'createDocsIgnoreIntraBatch' which can collide across test runs; change the $col
assignment to use the current function name (replace the literal with
__FUNCTION__) so the calls to $database->createCollection($col) and
$database->createAttribute($col, 'name', Database::VAR_STRING, 128, true) use a
unique, test-local collection name consistent with adjacent tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9f1924e-3ed0-43a8-8859-6e6c299aadb5

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and b9c7f4d.

📒 Files selected for processing (9)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
src/Database/Database.php (2)

5682-5726: ⚠️ Potential issue | 🟠 Major

Don’t let the outer prefetch snapshot suppress writes.

The adapters already do the authoritative duplicate filtering inside the transaction/session, but this code still drops $chunk entries from a pre-transaction find(). If a matching row is deleted after that snapshot but before the insert transaction starts, ignore: true will incorrectly skip a document that should now insert, and $modified / $onNext will be low.

Also applies to: 5805-5815

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5682 - 5726, The current prefetch
logic builds $preExistingIds (when $ignore is true, in the tenantPerDocument and
non-tenant branches using find()/authorization->skip/etc.) and later uses that
snapshot to skip writes, which can wrongly suppress inserts if a matching row
was deleted after the prefetch; remove or stop using this prefetch-based
suppression. Instead, eliminate reliance on $preExistingIds to skip document
inserts/relationship writes (and the related checks that reference it around the
relationship-resolution code and at the later block noted ~5805-5815), and
either rely on the adapters’ authoritative duplicate filtering inside the
transaction/session or re-check existence inside the transaction. Keep the
prefetch only if it’s used for optimizations that do not affect whether an
insert occurs, or remove the prefetch entirely if it’s unused for
non-authoritative optimizations.

5788-5790: ⚠️ Potential issue | 🟠 Major

Use the same tenant-aware key for deferred relationship writes.

In tenant-per-document mode, two inserted documents can legitimately share the same $id. This map is keyed only by $id, so the later payload overwrites the earlier one, and unset() after the first insert prevents the second tenant’s relationship write entirely.

♻️ Proposed fix
                 if (!empty($relationshipData)) {
-                    $deferredRelationships[$document->getId()] = $relationshipData;
+                    $relationshipKey = $tenantPerDocument
+                        ? $document->getTenant() . ':' . $document->getId()
+                        : $document->getId();
+                    $deferredRelationships[$relationshipKey] = $relationshipData;
                 }
@@
                 foreach ($batch as $insertedDoc) {
-                    $docId = $insertedDoc->getId();
-                    if (\array_key_exists($docId, $deferredRelationships)) {
+                    $relationshipKey = $tenantPerDocument
+                        ? $insertedDoc->getTenant() . ':' . $insertedDoc->getId()
+                        : $insertedDoc->getId();
+                    if (\array_key_exists($relationshipKey, $deferredRelationships)) {
                         $relDoc = clone $insertedDoc;
-                        foreach ($deferredRelationships[$docId] as $key => $value) {
+                        foreach ($deferredRelationships[$relationshipKey] as $key => $value) {
                             $relDoc->setAttribute($key, $value);
                         }
                         $this->silent(fn () => $this->createDocumentRelationships($collection, $relDoc));
-                        unset($deferredRelationships[$docId]);
+                        unset($deferredRelationships[$relationshipKey]);
                     }
                 }

Also applies to: 5823-5832

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5788 - 5790, The deferred
relationship map ($deferredRelationships) is keyed only by $document->getId(),
which causes collisions in tenant-per-document mode; change the key to be
tenant-aware (e.g., combine tenant id and document id or use the document's
tenant identifier method) wherever entries are set and removed so each
tenant/document pair is unique—update the code that assigns
$deferredRelationships[$document->getId()] = $relationshipData; and the matching
unset() logic (and the same pattern around the related block at lines referenced
5823-5832) to use the composite tenant-aware key instead of just getId().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2482-2527: The pre-filter SELECT that checks for existing _uid
rows (the block building $sql, preparing via $this->getPDO()->prepare, binding
values and calling $stmt->execute()/fetchAll()) can throw a raw PDOException
because it runs outside the method's main try/catch; wrap the entire
prepare/execute/fetch block in the adapter's processException(...) wrapper so
exceptions are mapped consistently (i.e., call $this->processException(...)
around the closure that prepares, binds, executes and fetches, returning $rows),
keeping the logic inside createDocuments() and referencing the existing $sql,
$binds, $stmt and $rows variables.
- Around line 2665-2713: The reconciliation currently keys timestamps only by
_uid and uses a single :_vfy_tenant bind when $this->sharedTables is true, which
corrupts mixed-tenant batches; update the verification to be tenant-aware by
binding the tenant per document and keying by a composite tenant+uid.
Concretely, in the block that builds $verifyPlaceholders/$verifyBinds and
$verifySql (the code using $verifyPlaceholders, $verifyBinds, $verifySql and
$verifyStmt), include the tenant value for each document (use the document's
tenant when tenant-per-document applies), change the WHERE to match pairs (e.g.
use row-value tuples or explicit (_uid = :uX AND _tenant = :tX) clauses), and
then build $actualTimestamps keyed by "$tenant|$uid" (and compare against
$expectedTimestamps keyed the same way) so the subsequent loop that populates
$insertedDocs/$insertedUids only matches rows with the correct tenant+uid.
- Around line 2650-2652: Replace the use of "INSERT IGNORE" (returned by
getInsertKeyword) with a plain "INSERT" and remove any IGNORE-specific behavior
in getInsertSuffix so that constraint violations are not globally suppressed;
instead let the DB raise errors and update the insert/execute path that catches
SQL exceptions to inspect the database error/constraint name (e.g., index or key
name such as _uid) and map errors to DuplicateException only when the violated
constraint corresponds to the _uid duplicate, otherwise map to UniqueException
(or other specific exceptions) for PRIMARY/UNIQUE/NOT NULL/truncation
violations—use the existing functions getInsertKeyword and getInsertSuffix to
locate changes and adjust the exception-mapping in the code that performs the
insert and error handling.

---

Duplicate comments:
In `@src/Database/Database.php`:
- Around line 5682-5726: The current prefetch logic builds $preExistingIds (when
$ignore is true, in the tenantPerDocument and non-tenant branches using
find()/authorization->skip/etc.) and later uses that snapshot to skip writes,
which can wrongly suppress inserts if a matching row was deleted after the
prefetch; remove or stop using this prefetch-based suppression. Instead,
eliminate reliance on $preExistingIds to skip document inserts/relationship
writes (and the related checks that reference it around the
relationship-resolution code and at the later block noted ~5805-5815), and
either rely on the adapters’ authoritative duplicate filtering inside the
transaction/session or re-check existence inside the transaction. Keep the
prefetch only if it’s used for optimizations that do not affect whether an
insert occurs, or remove the prefetch entirely if it’s unused for
non-authoritative optimizations.
- Around line 5788-5790: The deferred relationship map ($deferredRelationships)
is keyed only by $document->getId(), which causes collisions in
tenant-per-document mode; change the key to be tenant-aware (e.g., combine
tenant id and document id or use the document's tenant identifier method)
wherever entries are set and removed so each tenant/document pair is
unique—update the code that assigns $deferredRelationships[$document->getId()] =
$relationshipData; and the matching unset() logic (and the same pattern around
the related block at lines referenced 5823-5832) to use the composite
tenant-aware key instead of just getId().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d2ab610b-bb64-4dcb-979c-f3880d9e99df

📥 Commits

Reviewing files that changed from the base of the PR and between 20846d3 and 242c9a8.

📒 Files selected for processing (3)
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Adapter/Mongo.php

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (3)
src/Database/Adapter/SQL.php (3)

2482-2527: ⚠️ Potential issue | 🟠 Major

Wrap the ignore-mode pre-filter in processException() handling.

This query runs before the method’s main try/catch, so SQL or connection failures here still escape as raw PDOException while the rest of createDocuments() is adapter-mapped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2482 - 2527, The pre-filter block
in createDocuments() that prepares and executes the SELECT (using getSQLTable(),
getPDO(), getPDOType(), preparing $sql and executing $stmt) should be wrapped
inside a call to processException() so any PDOException is adapter-mapped
instead of escaping; move the existing logic that builds $placeholders/$binds,
$tenantFilter, $sql, $stmt->bindValue(), $stmt->execute(), $rows =
$stmt->fetchAll() and $stmt->closeCursor() into a closure passed to
processException(), and return/assign the result from that closure (or rethrow
via processException) so behavior matches the rest of createDocuments().

2782-2785: ⚠️ Potential issue | 🟠 Major

Don’t use INSERT IGNORE as the base implementation for ignore=true.

For adapters that inherit this default, ignore=true stops meaning “skip duplicate document IDs” and starts suppressing a broader class of insert failures on both the document and permission insert paths. The base hook should stay plain INSERT INTO, and adapter-specific duplicate handling should be expressed with targeted conflict clauses or constraint-aware exception mapping.

Does MariaDB/MySQL `INSERT IGNORE` suppress constraint violations beyond duplicate-key conflicts, such as PRIMARY KEY, other UNIQUE, NOT NULL, truncation, or conversion errors?

Based on learnings, in src/Database/Adapter/MariaDB.php only duplicate _uid violations should throw DuplicateException; other unique or PRIMARY collisions should still throw UniqueException.
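
A quick dialect comparison of why the blanket keyword is broader than a targeted conflict clause (table and column names invented for the example):

```sql
-- MariaDB/MySQL: IGNORE downgrades a whole class of errors to warnings,
-- not just duplicate keys — e.g. inserting NULL into a NOT NULL column
-- proceeds with the column's implicit default instead of failing.
INSERT IGNORE INTO my_collection (_uid, name) VALUES ('doc1', NULL);

-- Postgres: the conflict target names the exact columns, so only a
-- duplicate-key conflict on that unique index is suppressed; NOT NULL,
-- check, and other unique violations still raise errors.
INSERT INTO my_collection (_uid, name)
VALUES ('doc1', 'first')
ON CONFLICT (_uid) DO NOTHING;
```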

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2782 - 2785, The base method
getInsertKeyword in class SQL should not return 'INSERT IGNORE INTO' for
ignore=true; change it to always return the plain 'INSERT INTO' and move any
ignore semantics into adapter-specific code (e.g., Adapter/MariaDB.php) by using
targeted conflict clauses or by mapping constraint/SQLSTATE errors to the
correct exceptions; specifically, in MariaDB adapter ensure only duplicate
violations on the document _uid map to DuplicateException while other
UNIQUE/PRIMARY/constraint failures map to UniqueException (and do not rely on
INSERT IGNORE in SQL::getInsertKeyword).

2664-2728: ⚠️ Potential issue | 🔴 Critical

Keep expectedTimestamps tenant-aware too.

$actualTimestamps is keyed by tenant:uid now, but $expectedTimestamps is still keyed only by _uid. In tenantPerDocument mode, two documents with the same _uid in different tenants can overwrite each other here, causing an inserted document to be dropped from $documents and miss its permission insert.

💡 Proposed fix
-                $expectedTimestamps = [];
-                foreach ($documents as $doc) {
-                    $expectedTimestamps[$doc->getId()] = $doc->getCreatedAt();
-                }
+                $expectedTimestamps = [];
+                foreach ($documents as $doc) {
+                    $key = ($this->sharedTables && $this->tenantPerDocument)
+                        ? $doc->getTenant() . ':' . $doc->getId()
+                        : $doc->getId();
+                    $expectedTimestamps[$key] = $doc->getCreatedAt();
+                }
@@
-                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$doc->getId()]) {
+                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$key]) {
                         $insertedDocs[] = $doc;
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2664 - 2728, The
expectedTimestamps map is not tenant-aware while actualTimestamps is, so when
sharedTables && tenantPerDocument you must key expectedTimestamps the same way
as actualTimestamps (tenant + ':' + uid) during the initial foreach over
$documents and also use that key when comparing later (instead of
$expectedTimestamps[$doc->getId()]); update the loop that builds
$expectedTimestamps and the comparison that checks equality to construct/look up
keys as ($doc->getTenant() . ':' . $doc->getId()) when $this->sharedTables &&
$this->tenantPerDocument so entries don't collide across tenants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2482-2527: The pre-filter block in createDocuments() that prepares
and executes the SELECT (using getSQLTable(), getPDO(), getPDOType(), preparing
$sql and executing $stmt) should be wrapped inside a call to processException()
so any PDOException is adapter-mapped instead of escaping; move the existing
logic that builds $placeholders/$binds, $tenantFilter, $sql, $stmt->bindValue(),
$stmt->execute(), $rows = $stmt->fetchAll() and $stmt->closeCursor() into a
closure passed to processException(), and return/assign the result from that
closure (or rethrow via processException) so behavior matches the rest of
createDocuments().
- Around line 2782-2785: The base method getInsertKeyword in class SQL should
not return 'INSERT IGNORE INTO' for ignore=true; change it to always return the
plain 'INSERT INTO' and move any ignore semantics into adapter-specific code
(e.g., Adapter/MariaDB.php) by using targeted conflict clauses or by mapping
constraint/SQLSTATE errors to the correct exceptions; specifically, in MariaDB
adapter ensure only duplicate violations on the document _uid map to
DuplicateException while other UNIQUE/PRIMARY/constraint failures map to
UniqueException (and do not rely on INSERT IGNORE in SQL::getInsertKeyword).
- Around line 2664-2728: The expectedTimestamps map is not tenant-aware while
actualTimestamps is, so when sharedTables && tenantPerDocument you must key
expectedTimestamps the same way as actualTimestamps (tenant + ':' + uid) during
the initial foreach over $documents and also use that key when comparing later
(instead of $expectedTimestamps[$doc->getId()]); update the loop that builds
$expectedTimestamps and the comparison that checks equality to construct/look up
keys as ($doc->getTenant() . ':' . $doc->getId()) when $this->sharedTables &&
$this->tenantPerDocument so entries don't collide across tenants.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 892638bd-ac20-414b-9059-7f7e20295028

📥 Commits

Reviewing files that changed from the base of the PR and between 242c9a8 and 455da70.

📒 Files selected for processing (2)
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Database.php

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2676-2682: The verification query is binding composite keys from
$expectedTimestamps into $verifyBinds/placeholders and then comparing them
against the _uid column, but when tenantPerDocument is enabled _uid was set to
the raw document ID ($attributes['_uid'] = $document->getId()), so the WHERE
_uid IN (...) must use the document IDs not the composite keys. Fix by
extracting the document ID portion from each composite key before creating $ph
and adding to $verifyBinds (or if keys are already plain IDs, leave unchanged);
update the loop that builds $verifyPlaceholders/$verifyBinds (the foreach over
array_values(array_unique(array_keys($expectedTimestamps)))) to derive $uid =
<documentIdFromComposite($uid)> and bind that instead so the verification query
matches rows by actual _uid values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43af96a2-d479-4354-86c9-7d68abad7b56

📥 Commits

Reviewing files that changed from the base of the PR and between 455da70 and deedfda.

📒 Files selected for processing (1)
  • src/Database/Adapter/SQL.php

Adapters:
- MariaDB/MySQL: INSERT IGNORE INTO
- PostgreSQL: ON CONFLICT DO NOTHING
- SQLite: INSERT OR IGNORE INTO
- MongoDB: session-scoped pre-filter before insertMany

Database.php:
- Intra-batch dedup by ID (tenant-aware, first occurrence wins)
- Pre-fetch existing IDs to skip known duplicates
- Deferred relationship creation for ignore mode (no orphans)
- Race-condition reconciliation via _createdAt timestamp verification

upsertDocuments:
- Batch-fetch existing docs with find() instead of per-row getDocument()
- Tenant-aware composite keys for seenIds duplicate check

All paths are tenant-per-document aware with null-safe array_filter.
- testCreateDocumentsIgnoreDuplicates: mixed batch with onNext assertions
- testCreateDocumentsIgnoreIntraBatchDuplicates: first-wins, no ACL drift
- testCreateDocumentsIgnoreAllDuplicates: zero inserts, empty onNext
@premtsd-code premtsd-code force-pushed the csv-import-upsert-v2 branch from f2b08dc to bee42d4 Compare April 9, 2026 10:28
…x _uid mapping

- Pass ignoreDuplicates option to insertMany for race-condition safety
- Extract _uid from raw array before replaceChars (excluded from transformation)
- Map inserted records back to original Document objects by _uid
- Add skipDuplicates(callable) scope guard on Database, following
  existing pattern (skipRelationships, skipValidation, etc.)
- Remove bool $ignore parameter from createDocuments signature
- Mirror propagates skipDuplicates state to source and destination
- Update tests to use $database->skipDuplicates(function() { ... })
- Add $skipDuplicates property + skipDuplicates() scope guard to Adapter
- Remove bool $ignore parameter from all adapter createDocuments signatures
  (Adapter, Pool, Mongo, SQL)
- Remove bool $ignore from SQL helper methods (getInsertKeyword,
  getInsertSuffix, getInsertPermissionsSuffix) and Postgres/SQLite overrides
- Pool delegate() and withTransaction() propagate skipDuplicates state to
  pooled/pinned adapter via the scope guard
- Database::createDocuments() wraps adapter call in adapter->skipDuplicates()
  when the flag is set, drops the local $ignore variable
- skipDuplicates() takes optional $enable so Pool/Mirror/Database can
  forward state in a single call instead of branching
- Database: tenantKey(Document) helper replaces ~8 inline ternaries
- Database: fetchExistingByIds() helper replaces the tenant-grouped
  batched find() in createDocuments and upsertDocuments pre-fetch
- Database: drop dead guard on non-defer relationship path
- Mongo: findExistingUids() helper replaces the duplicated cursor walk
  in the skipDuplicates pre-filter (tenant and non-tenant variants)
- SQL: buildUidTenantLookup() helper replaces the duplicated
  placeholder-and-binds builder shared by pre-filter and reconciliation
When a batch has only null-ID documents but non-null tenants, the
previous code emitted WHERE _uid IN () AND _tenant IN (...) — invalid
SQL that the !empty(binds) call-site guard failed to catch because the
tenant binds kept the binds array non-empty.

Return 'where' => '1=0' with empty binds when there are no UIDs, so
the query matches nothing and leaves no dangling placeholders.
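
The shape of the guard, as a hedged sketch (the helper was later removed in this PR; names and array structure are illustrative, not the actual code):

```php
<?php
// Illustrative sketch of the fixed lookup builder: with no UIDs there is
// nothing to match, so emit an always-false predicate rather than the
// invalid "_uid IN ()" fragment.
function buildUidTenantLookup(array $uids, array $tenants): array
{
    if (empty($uids)) {
        return ['where' => '1=0', 'binds' => []];
    }

    $where = '_uid IN (' . \implode(',', \array_fill(0, \count($uids), '?')) . ')';
    $binds = \array_values($uids);

    if (!empty($tenants)) {
        $where .= ' AND _tenant IN (' . \implode(',', \array_fill(0, \count($tenants), '?')) . ')';
        $binds = \array_merge($binds, \array_values($tenants));
    }

    return ['where' => $where, 'binds' => $binds];
}
```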
- testCreateDocumentsSkipDuplicatesEmptyBatch: no-op with empty array
- testCreateDocumentsSkipDuplicatesNestedScope: scope guard save/restore
  through nested skipDuplicates() calls
- testCreateDocumentsSkipDuplicatesLargeBatch: 300-doc batch with 50
  pre-existing, exercising the chunk loop
- testCreateDocumentsSkipDuplicatesSecondCallSkipsAll: second identical
  batch should preserve First values, not overwrite with Second
- testCreateDocumentsSkipDuplicatesRelationships: skipDuplicates with a
  one-to-many relationship — verifies deferred relationship logic does
  not create orphan child rows for ignored parents
INSERT IGNORE / ON CONFLICT DO NOTHING / INSERT OR IGNORE handle
duplicates at the DB level, and the existing reconciliation block
identifies actually-inserted rows via the _createdAt timestamp. The
pre-filter was a wire-data optimization that ran a SELECT before the
INSERT, then trimmed the batch — but the same SELECT (with timestamp
comparison) already happens in the reconciliation path on demand.

Net: ~45 lines removed, single code path for skipDuplicates instead
of pre-filter + fallback. Race-condition handling is unchanged
(reconciliation path covers it).
Previously the Database orchestrator carried a pre-fetch + chunk-filter
pass to compensate for Mongo's plain upsert() returning matched +
upserted combined ('n') — without that compensation, $modified would
over-count the number of pre-existing docs in the batch.

This pushes the responsibility into the adapter, where it belongs:
- composer bump to utopia-php/mongo dev-upsert-return-upserted-count
  (PR #37, adds Client::upsertWithCounts() returning matched/modified/
  upserted[] separately)
- Mongo adapter calls upsertWithCounts() and maps the response's
  upserted[].index entries back to the originating Document objects,
  returning ONLY the actually-inserted docs (race-safe)
- Database::createDocuments drops $preExistingIds pre-fetch and the
  chunk filter; the adapter is now responsible for accurate counts

Net effect on Database.php: ~17 lines removed. Net effect on Mongo.php:
~80 lines net deletion (replaced the inline pre-filter with a single
upsertWithCounts call + index mapping).
The SQL adapter previously carried a pre-filter SELECT plus a _createdAt
reconciliation pass to compensate for INSERT IGNORE not telling us which
rows were actually inserted vs skipped. The Mongo adapter pulled the same
trick via Client::upsertWithCounts() (mongo PR #37). Both layers were
solving the same problem twice.

Move the pre-filter back to Database::createDocuments where it ran
originally, and let each adapter's INSERT IGNORE / ON CONFLICT DO NOTHING
/ upsert + $setOnInsert serve only as a race-safety net. The adapter just
returns its input as the inserted set.

- Database.php: restore $preExistingIds pre-fetch + chunk filter
- SQL.php: drop pre-filter, drop _createdAt reconciliation, drop
  buildUidTenantLookup helper (-173 lines)
- Mongo.php: revert to plain Client::upsert(), return input docs
  (withTransaction skipDuplicates bypass stays — still needed to avoid
  WriteConflict under snapshot isolation)
- composer.json/lock: back to stable utopia-php/mongo 1.*

Trade-off: in the race window between the orchestrator's pre-fetch
SELECT and the adapter INSERT, $modified over-reports by N and onNext
fires for N skipped docs. Common case stays correct.
…en comments

The deferred-relationships mechanism was useful when the adapter could
report which docs were actually inserted (via reconciliation/
upsertWithCounts). With those gone, $batch always equals $chunk, so the
defer-then-match dance is dead weight. The orchestrator already pre-
filters known duplicates upfront, so any survivor is expected to insert
and relationships can be created inline like the non-skipDuplicates path.

- Database.php: pre-filter $documents once, before encode/validate,
  instead of filtering each chunk inline. Drop $deferredRelationships
  scaffolding and the per-chunk match loop. Collapse if/elseif into
  one inline createDocumentRelationships call.
- Mongo.php: tighten withTransaction skipDuplicates comment (the
  "relationship writes are deferred" rationale is no longer accurate).
- Mirror.php: tighten the captureOnNext comment.
- composer.lock: revert unrelated symfony-polyfill-php85 / mongo 1.0.0→
  1.0.2 bumps that crept in from an earlier composer update.

Race window unchanged: a doc that passes pre-filter but loses to a
concurrent insert still produces an orphan relationship and an
over-counted $modified — same as before this commit.
 * @param bool $enable
 * @return T
 */
public function skipDuplicates(callable $callback, bool $enable = true): mixed
Member

Let's remove the $enable flag and check if skipDuplicates instead; otherwise we add the overhead of calling an extra callable when it's false

 * @param bool $idsOnly When true, uses Query::select(['$id']) for a lighter fetch
 * @return array<string, Document>
 */
private function fetchExistingByIds(Document $collection, array $documents, bool $idsOnly = false): array
Member

We can use find

Comment on lines +5751 to +5776
// skipDuplicates: dedupe intra-batch (first wins) then drop already-existing rows.
// Adapter INSERT IGNORE / ON CONFLICT / upsert is the race-safety net.
if ($this->skipDuplicates) {
    $seenIds = [];
    $deduplicated = [];
    foreach ($documents as $document) {
        if ($document->getId() !== '') {
            $dedupeKey = $this->tenantKey($document);
            if (isset($seenIds[$dedupeKey])) {
                continue;
            }
            $seenIds[$dedupeKey] = true;
        }
        $deduplicated[] = $document;
    }
    $documents = $deduplicated;

    $preExistingIds = $this->fetchExistingByIds($collection, $documents, idsOnly: true);
    if (!empty($preExistingIds)) {
        $documents = \array_values(\array_filter(
            $documents,
            fn (Document $doc) => !isset($preExistingIds[$this->tenantKey($doc)])
        ));
    }
}

Member

I don't think we need to do this if using insert ignore

Contributor Author

@abnegate The child collection is getting orphan rows when a batch has multiple rows with the same parent but different children: the parent is ignored while the child is not.

if ($this->resolveRelationships) {
    $document = $this->silent(fn () => $this->createDocumentRelationships($collection, $document));
}

Comment on lines +607 to +613
$insertedFromSource = [];
$captureOnNext = function (Document $doc) use (&$insertedFromSource, $onNext): void {
    $insertedFromSource[] = $doc;
    if ($onNext !== null) {
        $onNext($doc);
    }
};
Member

We can never capture like this internally; if there were many docs, this loop could easily OOM

Comment on lines +2614 to +2640
/**
 * Returns the INSERT keyword, optionally with IGNORE for duplicate handling.
 * Override in adapter subclasses for DB-specific syntax.
 */
protected function getInsertKeyword(): string
{
    return $this->skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO';
}

/**
 * Returns a suffix appended after the VALUES clause for duplicate handling.
 * Override in adapter subclasses (e.g., Postgres uses ON CONFLICT DO NOTHING).
 */
protected function getInsertSuffix(string $table): string
{
    return '';
}

/**
 * Returns a suffix for the permissions INSERT statement when ignoring duplicates.
 * Override in adapter subclasses for DB-specific syntax.
 */
protected function getInsertPermissionsSuffix(): string
{
    return '';
}

Member

Let's group logically with other getter methods

#1 Drop $enable flag on skipDuplicates() scope guard
The $enable param made every non-skipDuplicates createDocuments call pay
for a closure allocation + extra function call. Branch at the call site
instead so the cost only applies when the flag is actually set.

  - Adapter::skipDuplicates(callable, bool) → skipDuplicates(callable)
  - Database::skipDuplicates(callable, bool) → skipDuplicates(callable)
  - Database::createDocuments, Mirror::createDocuments, Pool::delegate,
    Pool::withTransaction now branch inline.

#2 Drop fetchExistingByIds helper, inline find()
The helper's per-tenant grouping defended a hypothetical multi-tenant
batching scenario that no caller exercises (relationships are intra-
tenant, callers always batch per tenant). Existing patterns in the same
file (refetchDocuments, relationship loading) just call find() directly.
Match that idiom and drop ~70 lines.

#4 Mirror: only capture inserted docs in skipDuplicates mode
The captureOnNext accumulator paid the cost (closure + per-doc array
push) on every createDocuments call, including the common non-skip path.
Branch at the entry of Mirror::createDocuments so the capture only
happens when skipDuplicates is set; the non-skip path passes through
to source/destination unchanged.

#5 Move getInsertKeyword/Suffix/PermissionsSuffix to getters cluster
Were sitting next to createDocuments(); moved to the getSupport*
cluster around line 1030 where other adapter-capability shims live.

Not addressed:
- #2 partial: the existing patterns (refetchDocuments etc.) don't handle
  tenant-per-document multi-tenant batches either, so this is consistent.
- #3 (drop the pre-filter): rejected. createDocumentRelationships runs
  in the encoding loop BEFORE the adapter's INSERT IGNORE no-ops the
  parent, so dropping the pre-filter would deterministically duplicate
  child rows on every CSV re-import of a collection with relationships
  (not a race window — every call). The relationships test verifies
  this. Reverting would require reintroducing the deferred-relationships
  scaffolding we just removed, and the adapter still couldn't tell us
  which parents were actually inserted (SQL INSERT IGNORE has no per-row
  reporting). Pre-filter stays.
Per Jake's #4 review comment, the captureOnNext accumulator could grow
unbounded with the full inserted set. Replace it with a bounded buffer
that flushes to the destination every $batchSize docs as the source
streams them through onNext, plus a tail flush after the source returns.

Memory peak in skipDuplicates mode is now O($batchSize) (~1000 docs)
regardless of how many docs the source inserts. Same destination batching
efficiency — destination still gets multi-doc createDocuments calls,
just one per chunk instead of one giant call at the end.

Eligibility checks (SOURCE_ONLY_COLLECTIONS, upgrade status) moved
upfront so the flush closure can capture the decision and short-circuit
cleanly. Non-skip path is unchanged: direct passthrough, no buffer.

The flush-on-fill is load-bearing for the memory bound — see the
in-method comment.
…erDocument

When inlining find() per Jake's review, the per-tenant grouping the
deleted fetchExistingByIds helper used to do was lost. In tenantPerDocument
mode with a multi-tenant batch, the inlined find() runs under the session
tenant context (often null for platform workers) and silently misses rows
belonging to other tenants — every input doc looks "new" and goes down the
create path instead of the update path.

This actively breaks appwrite's StatsUsage worker, which accumulates stats
across many projects (each tagged with its own $tenant) and flushes them
via $dbForLogs->setTenant(null)->setTenantPerDocument(true) followed by
upsertDocumentsWithIncrease(...). With the old per-doc getDocument loop,
each lookup ran under withTenant($doc->getTenant()) and was correct. The
batch find() inlining lost that scoping.

Fix per Greptile's suggestion: branch on tenantPerDocument mode and run
one find() per unique tenant value under withTenant, merging results into
the same lookup hash. K queries (K = unique tenants in the batch) instead
of N (per-doc) or 1 (broken). Common single-tenant case still hits the
fast batched path.

Applied at both call sites:
- Database::createDocuments skipDuplicates pre-filter
- Database::upsertDocumentsWithIncrease existing-docs lookup

Fixes greptile flag at PR #852 discussion r3077481595.
Per Jake's review feedback, the pre-filter in Database::createDocuments
was redundant with the adapter's INSERT IGNORE / ON CONFLICT DO NOTHING
/ upsert + \$setOnInsert path. The adapter already handles duplicates
atomically; the upfront SELECT duplicated that work.

The semantic is now row-level dedup:
- pre-existing rows are silently no-op'd by the adapter
- new rows go through
- retries are idempotent
- source data is never overwritten

Tests updated to match:
- 5 skipDuplicates tests adjusted to expect full-chunk counts, with
  data-level assertions unchanged (pre-existing values preserved, new
  rows inserted).
- testCreateDocumentsSkipDuplicatesRelationships encodes the retry
  scenario: missing nested child gets created via relateDocuments'
  idempotent path when the parent already exists.
- testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination relaxed
  to source-authoritative semantics only. Destination may transiently
  hold would-be values during the migration window; backfill reconciles.

All 105 skipDuplicates tests across 11 adapter variants pass, full
Mirror suite (636 tests) passes, PHPStan level 7 clean.
Follow-up to Jake's #5 review comment — the fix was only applied to
SQL.php. Propagate the same grouping to the other adapters and fix one
cosmetic byproduct from the Mirror refactor.

- Postgres.php: move getInsertKeyword/Suffix/PermissionsSuffix from
  between createDocuments() and getUpsertStatement() down to the
  getSupportFor* cluster (matches SQL.php:1036 layout).
- SQLite.php: move getInsertKeyword override from the top of the class
  to after the getSupportFor* cluster.
- Adapter.php: skipDuplicates() method was inserted between protected
  property declarations, breaking the property cluster. Move the method
  down next to withTransaction() — the other callable-taking scope
  guard. $skipDuplicates property stays with its peers.
- Mirror.php: revert the incidental fn () => collapse on the non-skip
  destination createDocuments call back to the original two-line form,
  minimizing unrelated diff noise.

Pure relocation / style. No behavior change.
Addresses Greptile #3084293974. The captureOnNext-based forwarding
was structurally broken post-pre-filter-removal: SQL adapters return
the full input batch from createDocuments regardless of how many rows
INSERT IGNORE actually inserted, so onNext fired for every document
including skipped duplicates. captureOnNext buffered all of them and
flushed the full buffer (carrying the caller's would-be values) to
destination, diverging destination from source whenever source had a
pre-existing row the caller tried to overwrite.

Fix: instead of trying to distinguish inserted vs skipped at insert
time, re-fetch source's authoritative state via find() after the
adapter write settles, then forward that to destination. Race-free
regardless of concurrent writers — destination always receives a
faithful snapshot of whatever source holds, because the query runs
after source's write has resolved.

Cost: one extra SELECT per batch against source when skipDuplicates
is active AND Mirror has an upgraded destination. Zero cost on the
non-skip path. Memory remains O(batch_size).

Net ~47 lines smaller than the previous capture-based approach:
- captureOnNext closure removed
- flushBuffer closure removed
- bounded-buffer machinery removed
- shouldMirror upfront-snapshot removed (inlined as eligibility check)

Test: testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination now
asserts destination holds source's 'Original' value rather than the
caller's 'WouldBe' input, per Greptile's suggested assertion.
…p paths

Addresses Greptile #3084293974. The captureOnNext-based forwarding
was structurally broken: SQL adapters return the full input batch from
createDocuments regardless of what INSERT IGNORE actually inserted, so
onNext fired for every doc including skipped duplicates. captureOnNext
then forwarded the caller's would-be values to destination, diverging
destination from source whenever source had a pre-existing row the
caller tried to re-insert.

Fix: pre-filter against source before the insert to identify which
input ids already exist, then forward only the genuinely-new docs to
destination. A source-skipped duplicate is a no-op (no user write
happened), so nothing should propagate. This matches Greptile's
semantics: destination.dup.isEmpty() after skipping a duplicate.

While the logic was touched, the skip and non-skip paths (which shared
eligibility check, upgrade check, clone loop, destination write, and
after-filter loop) are unified into a single flow with $this->
skipDuplicates branches only at the points where behavior actually
differs:

  - source call: wrapped in source->skipDuplicates when active
  - $toForward: filter out pre-existing ids when active
  - destination call: wrapped in destination->skipDuplicates when active

Non-skip hot path pays zero new closure allocations; the pre-filter
block is gated behind if ($this->skipDuplicates) and short-circuits
to an empty $existingIds map.

Net: -46 lines in createDocuments (~170 → ~95), same test coverage,
Greptile's assertion now holds.
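
The pre-filter-then-forward shape can be sketched as follows (a minimal sketch assuming single-tenant batches; the find/Query usage mirrors patterns quoted elsewhere in this thread, not the exact Mirror code):

```php
<?php
// Before the source insert: snapshot which input ids already exist in source.
$existingIds = [];
if ($this->skipDuplicates) {
    $ids = \array_map(fn (Document $doc) => $doc->getId(), $documents);
    foreach ($this->source->find($collection->getId(), [Query::equal('$id', $ids)]) as $existing) {
        $existingIds[$existing->getId()] = true;
    }
}

// After the source insert: pre-existing ids were no-op'd by the adapter's
// INSERT IGNORE — no user write happened, so forward only the new docs.
$toForward = \array_values(\array_filter(
    $documents,
    fn (Document $doc) => !isset($existingIds[$doc->getId()])
));
```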
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Mirror.php (1)

604-656: ⚠️ Potential issue | 🟠 Major

Pre-filter lookup is not tenant-aware for tenantPerDocument mode.

When tenantPerDocument is enabled, documents with the same $id but different tenants are distinct. The current implementation:

  1. Calls find() with all IDs at once without grouping by tenant (line 616)
  2. Keys $existingIds by $id alone (line 622)
  3. Filters $toForward by $id alone (line 650)

If the batch contains documents for multiple tenants, this incorrectly excludes new documents whose ID exists for a different tenant, or fails to detect true duplicates for the correct tenant.

Use the same tenant-aware pattern as Database::upsertDocumentsWithIncrease(): when tenantPerDocument is enabled, group documents by tenant and call find() per-tenant using withTenant(), then use composite keys ($tenant:$id) for tracking existing IDs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Mirror.php` around lines 604 - 656, The pre-filter that builds
$existingIds and filters $toForward must be made tenant-aware when
tenantPerDocument is enabled: instead of calling $this->source->find(...) once
for all IDs and keying by $id, group the $documents by tenant (use the same
pattern as Database::upsertDocumentsWithIncrease()), and for each tenant call
$this->source->withTenant($tenant)->silent(fn() => $this->source->find(...)) to fetch only
that tenant's existing IDs; store keys in $existingIds as composite
"$tenant:$id" and update the filtering lambda for $toForward (and the earlier
array_map/array_filter that builds $ids) to use the same composite key so only
documents with matching tenant+id are considered duplicates.
♻️ Duplicate comments (1)
src/Database/Database.php (1)

5728-5732: ⚠️ Potential issue | 🔴 Critical

skipDuplicates() still leaves relationship side effects behind.

By the time execution reaches Line 5728, every input document has already gone through createDocumentRelationships(). If the adapter then ignores a duplicate parent row, those child/junction writes are still committed. That can orphan rows or mutate relationships on an existing document even though the parent create was skipped. Relationship writes need to be driven from the successfully inserted $batch, not from the pre-insert input list.
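The reordering the reviewer asks for can be shown with a toy flow. The adapter call here is a stand-in closure that drops a pretend duplicate, mimicking INSERT IGNORE; none of these names come from the actual codebase:

```php
<?php
// Sketch: insert first, then drive relationship writes from the rows the
// adapter actually accepted, so skipped duplicates leave no side effects.
$input = [['id' => 'new1'], ['id' => 'dup'], ['id' => 'new2']];

// Stand-in for the adapter: pretend 'dup' already exists and is ignored.
$adapterInsert = fn (array $chunk): array =>
    array_values(array_filter($chunk, fn ($d) => $d['id'] !== 'dup'));

$relationshipWrites = [];
$writeRelationships = function (array $doc) use (&$relationshipWrites): void {
    $relationshipWrites[] = $doc['id'];
};

// 1) Insert first; $batch is the authoritative list of created rows.
$batch = $adapterInsert($input);

// 2) Only then write relationships, and only for inserted rows.
foreach ($batch as $doc) {
    $writeRelationships($doc);
}

print_r($relationshipWrites); // 'dup' never triggers a relationship write
```

Running relationships before the insert (the flow being criticized) would have recorded all three IDs, including the skipped one.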

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, The current flow runs
createDocumentRelationships() before calling the adapter and then uses
adapter->skipDuplicates() which can ignore parent inserts while relationship
writes have already been applied; change the flow so the adapter call (the
$insert closure -> $this->withTransaction(fn () =>
$this->adapter->createDocuments($collection, $chunk))) executes first to produce
the actual inserted $batch (ensure adapter->skipDuplicates(...) returns the real
inserted batch when used), and only after $batch is obtained call
createDocumentRelationships() and any child/junction writes against $batch (not
the original input). Adjust skipDuplicates() usage so it returns the inserted
documents batch and ensure relationship-writing code references $batch instead
of the pre-insert input list.
🧹 Nitpick comments (2)
src/Database/Adapter/Postgres.php (1)

2358-2358: Silence the unused override parameter warning.

$table is intentionally unused here. If the parent signature must stay as-is, rename it to $_table or add a local suppression so PHPMD stops flagging this override.

♻️ Small cleanup
-    protected function getInsertSuffix(string $table): string
+    protected function getInsertSuffix(string $_table): string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Postgres.php` at line 2358, The override of
getInsertSuffix(string $table) triggers an unused-parameter warning; rename the
unused parameter to $_table (or prefix it with an underscore) in the
getInsertSuffix method signature or add a local PHPMD suppression comment inside
that method so the linter stops flagging it; update the method signature in the
Postgres class (getInsertSuffix) and keep the parent signature unchanged.
tests/e2e/Adapter/Scopes/DocumentTests.php (1)

7930-7933: Clarify whether these assertions validate “processed” vs “created” semantics.

At Line 7930, Line 7984, Line 8110, and Line 8155, the tests assert counts and onNext emissions include duplicate IDs. That encodes a “processed inputs” contract, not “newly created rows.” Please make this explicit (e.g., rename to $processedCount / add a short comment) or add before/after count assertions to separately validate created-row deltas.

Also applies to: 7984-7986, 8110-8112, 8155-8157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7930 - 7933, The
assertions using $count and $emittedIds currently validate “processed inputs”
semantics (they expect duplicate IDs for repeated processing) rather than newly
created rows; update the test to make this explicit by renaming variables (e.g.,
$processedCount, $processedEmittedIds) or adding a short comment above the block
clarifying “processed inputs may include duplicates,” and if you also need to
assert created-row deltas add before/after row-count checks (capture $beforeRows
= ... then run operation then assert $afterRows - $beforeRows equals expected
new creations) around the assertions that reference $count and $emittedIds (the
same pattern appears for the other occurrences currently asserting duplicates).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Database.php`:
- Around line 7153-7184: The pre-read ID queries in the branches that build
$idsByTenant/$docIds must be chunked to respect the Query::equal('$id', ...)
limit controlled by $this->maxQueryValues; update the tenant loop (where you
build $idsByTenant and call $this->find(... Query::equal('$id', $tenantIds)
...)) and the else branch (where $docIds is used in $this->find(...
Query::equal('$id', $docIds) ...)) to split each array with array_chunk(...,
max(1, $this->maxQueryValues)) and call find for each chunk (using the same
authorization->skip/withTenant/silent wrappers), then merge all returned
documents into $existingDocs (use the same key format $tenant . ':' .
$doc->getId() in the tenant branch) or $existing as before so downstream code
sees the combined results. Ensure you preserve unique IDs before chunking and
maintain limit counts per chunk.

In `@src/Database/Mirror.php`:
- Around line 614-624: The prefetch call in Mirror.php uses Query::equal('$id',
$ids) with an unchunked $ids array which can exceed DocumentsValidator limits;
change the logic that builds $existing by chunking $ids with array_chunk($ids,
max(1, $this->maxQueryValues)) (same pattern used in createDocuments() and
upsertDocumentsWithIncrease()), call $this->source->silent(fn() =>
$this->source->find($collection, [Query::equal('$id', $chunk),
Query::limit(count($chunk))])) for each chunk, and merge results into
$existingIds[$doc->getId()] = true to preserve current behavior. Ensure you
reference $this->maxQueryValues and use Query::equal('$id', $chunk) and
Query::limit(count($chunk)) per chunk.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06d46ba1-a7eb-478b-ad90-a27f372a0327

📥 Commits

Reviewing files that changed from the base of the PR and between 455da70 and 934ec04.

📒 Files selected for processing (10)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/MirrorTest.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/Database/Adapter.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Adapter/Mongo.php

Addresses CodeRabbit #3084994641 and #3084994657 (+ Greptile #3084466265).
Three call sites were passing unchunked id arrays into Query::equal('$id',
...), which throws QueryException from DocumentsValidator once the array
exceeds 5000 values.

Mirror::createDocuments (skipDuplicates pre-filter): defensive — Mirror's
public API accepts arbitrary-size input, and while current callers pass
batches below the limit, the code shouldn't throw on a theoretical
direct caller with >5000 input ids.

Database::upsertDocumentsWithIncrease (both tenantPerDocument and single-
tenant branches): real regression. The batched pre-read was introduced
in ae929db to restore per-tenant grouping for StatsUsage. It replaced
a per-doc getDocument() loop on main (which is safe from validator
limits because each call queries one id). Passing the full input array
to Query::equal now breaks at >5000 input docs — a path that previously
worked slowly on main.

All three sites use self::RELATION_QUERY_CHUNK_SIZE (5000) and
Query::limit(PHP_INT_MAX), matching the existing pattern used by the
five populateDocumentsRelationshipsBatch chunking sites. Accumulator
shape is adapted to our map-only use case: we build the existence set /
lookup map directly inside the chunk loop instead of carrying a flat
list we don't need.

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/Database/Database.php (2)

7160-7187: ⚠️ Potential issue | 🟠 Major

Respect maxQueryValues when chunking the pre-read lookups.

These find() calls still validate Query::equal('$id', $chunk) against $this->maxQueryValues. If a caller lowers that limit below 5000, chunking by RELATION_QUERY_CHUNK_SIZE will still fail before the upsert starts.

✂️ Suggested change
-                foreach (\array_chunk($tenantIds, self::RELATION_QUERY_CHUNK_SIZE) as $chunk) {
+                foreach (\array_chunk($tenantIds, \max(1, $this->maxQueryValues)) as $chunk) {
@@
-                foreach (\array_chunk($docIds, self::RELATION_QUERY_CHUNK_SIZE) as $chunk) {
+                foreach (\array_chunk($docIds, \max(1, $this->maxQueryValues)) as $chunk) {

Based on learnings, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) to satisfy DocumentsValidator limits on Query::equal('$id', ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 7160 - 7187, The pre-read find()
calls chunk IDs using self::RELATION_QUERY_CHUNK_SIZE but Query::equal('$id',
$chunk) is validated against $this->maxQueryValues, so change the chunking in
both loops (the tenant branch that iterates over $idsByTenant and the else
branch that iterates over $docIds) to use array_chunk(..., max(1,
$this->maxQueryValues)) instead of self::RELATION_QUERY_CHUNK_SIZE so the chunk
size never exceeds the DocumentsValidator limit when calling find().

5728-5732: ⚠️ Potential issue | 🔴 Critical

skipDuplicates() still lets skipped rows mutate relationship state.

By the time this guard runs, every input document has already gone through createDocumentRelationships() at Lines 5720-5722. A row ignored here can still create/update children or junction rows, and duplicate child/junction inserts triggered during that earlier relationship pass still go through the normal createDocument() path. The insert result needs to be authoritative for relationship work, or those writes need to be deferred until after you know which rows were actually inserted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, Summary:
duplicate-skipping allows skipped rows to still mutate relationships because
createDocumentRelationships runs before the adapter decides which rows are
inserted. Fix: change the control flow so relationship work is driven by the
authoritative insert result — either defer createDocumentRelationships until
after $batch is computed or make adapter->skipDuplicates return per-row
insertion outcomes and use that to gate relationship writes; specifically update
the sequence around the $insert closure, withTransaction,
adapter->skipDuplicates and createDocumentRelationships (and any calls to
createDocument) so relationship/junction creation only runs for rows confirmed
inserted by createDocuments (or is skipped per the adapter-provided insertion
map).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c7a16723-e265-42e8-8fd4-3a629176caf0

📥 Commits

Reviewing files that changed from the base of the PR and between 934ec04 and fa0e373.

📒 Files selected for processing (2)
  • src/Database/Database.php
  • src/Database/Mirror.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Mirror.php

…SIZE

CodeRabbit #3084994641 + #3084994657 follow-up: the earlier chunking
fix used self::RELATION_QUERY_CHUNK_SIZE (5000), which matches the
existing populateDocumentsRelationshipsBatch pattern but doesn't
respect the user-configurable validator ceiling.

Query::equal('$id', $chunk) is validated by DocumentsValidator
against $this->maxQueryValues, not RELATION_QUERY_CHUNK_SIZE. A
caller doing $db->setMaxQueryValues(100) followed by any of the
three chunked pre-reads would still throw QueryException because
chunks of 5000 exceed the configured 100.

Switch to array_chunk(..., max(1, $this->maxQueryValues)) at all
three sites so chunk size always respects the actual validator
limit. In the default case (maxQueryValues = 5000) this produces
identical chunks to the old code; under customization it correctly
scales down.

Note: the 5 existing relationship-population chunking sites still
use RELATION_QUERY_CHUNK_SIZE and have the same latent bug under
customized maxQueryValues. Leaving that for a follow-up since
fixing the pattern globally is out of scope for this PR.
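The chunked pre-read pattern this commit settles on is easy to demonstrate standalone. The find() call is replaced by a stand-in lookup, and the lowered limit is invented for illustration:

```php
<?php
// Sketch of the chunked pre-read: chunk size tracks the configurable
// validator ceiling, never a fixed constant like RELATION_QUERY_CHUNK_SIZE.
$maxQueryValues = 3; // imagine a caller lowered it via setMaxQueryValues()
$ids = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];

$existingIds = [];
foreach (array_chunk($ids, max(1, $maxQueryValues)) as $chunk) {
    // Stand-in for: $this->find($collection, [Query::equal('$id', $chunk),
    //                                          Query::limit(count($chunk))])
    $found = array_intersect($chunk, ['b', 'e', 'g']); // pretend these exist
    foreach ($found as $id) {
        // Build the existence map directly inside the chunk loop.
        $existingIds[$id] = true;
    }
}

print_r(array_keys($existingIds)); // ['b', 'e', 'g'] across 3 small chunks
```

No chunk ever exceeds `$maxQueryValues`, so the DocumentsValidator limit holds regardless of input size; `max(1, ...)` guards against a pathological limit of zero.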
premtsd-code added a commit to utopia-php/migration that referenced this pull request Apr 15, 2026
CSV / JSON / appwrite-to-appwrite imports that re-run on the same
batch (e.g. user re-uploads the same file, or a worker retries a
failed chunk) currently throw DuplicateException and abort the whole
batch. Wrap the row-buffer flush in the new skipDuplicates() scope
guard so duplicate-by-id rows are silently no-op'd at the adapter
layer (INSERT IGNORE / ON CONFLICT DO NOTHING / $setOnInsert), letting
the rest of the batch proceed.

The existing skipRelationshipsExistCheck() wrapper is preserved
(FK-target guard); skipDuplicates composes around it.
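The composition claim above can be sketched with two toy guards sharing one mechanism. Guard internals here are invented for illustration; the real methods live in utopia-php/database and utopia-php/migration:

```php
<?php
// Sketch: two scope guards composing, as in the row-buffer flush described
// above. Each guard sets a flag for the duration of its callback only.
class Db
{
    public array $flags = [];

    private function scoped(string $flag, callable $cb): mixed
    {
        $this->flags[$flag] = true;
        try {
            return $cb();
        } finally {
            unset($this->flags[$flag]);
        }
    }

    public function skipRelationshipsExistCheck(callable $cb): mixed
    {
        return $this->scoped('skipRelCheck', $cb);
    }

    public function skipDuplicates(callable $cb): mixed
    {
        return $this->scoped('skipDuplicates', $cb);
    }
}

$db = new Db();
// skipDuplicates wraps around the existing FK-target guard; both flags
// are active inside the innermost callback, neither afterwards.
$active = $db->skipDuplicates(
    fn () => $db->skipRelationshipsExistCheck(
        fn () => array_keys($db->flags)
    )
);
print_r($active);    // ['skipDuplicates', 'skipRelCheck']
print_r($db->flags); // [] — both guards unwound
```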

Feature-branch note: depends on utopia-php/database's skipDuplicates()
scope guard from PR utopia-php/database#852. composer.json is
temporarily pinned to dev-csv-import-upsert-v2 with a 5.99.0 alias so
composer can resolve the 5.* constraint transitively. Must be reset
to the proper release version (^5.X.Y) once PR #852 merges and
utopia-php/database ships.
premtsd-code added a commit to appwrite/appwrite that referenced this pull request Apr 15, 2026
Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.

Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports

Parameter semantics:
- overwrite=true  -> destination uses upsertDocuments instead of
                     createDocuments; existing rows are replaced
                     with imported values
- skip=true       -> destination wraps createDocuments in
                     skipDuplicates; existing rows are preserved
                     unchanged, duplicate-id rows silently no-op
- both false      -> default; fails fast on DuplicateException
                     (original behavior, unchanged)
- both true       -> overwrite wins (upsert subsumes skip)
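The precedence table above reduces to a small resolver. The function name is hypothetical; only the stated semantics are taken from the commit message:

```php
<?php
// Sketch of the overwrite/skip precedence: overwrite wins when both are set,
// skip is next, and the default remains fail-fast on DuplicateException.
function resolveImportMode(bool $overwrite, bool $skip): string
{
    if ($overwrite) {
        return 'upsert';         // upsertDocuments; subsumes skip
    }
    if ($skip) {
        return 'skipDuplicates'; // createDocuments inside skipDuplicates()
    }
    return 'failFast';           // original behavior, unchanged
}

var_dump(
    resolveImportMode(true, true),   // "upsert" — overwrite wins
    resolveImportMode(false, true),  // "skipDuplicates"
    resolveImportMode(false, false), // "failFast"
);
```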

Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.

Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
