Migrates Quickwit's metastore from file-backed (S3) to PostgreSQL.
The file-backed metastore stores one `metastore.json` per index on S3. At scale this causes gRPC timeouts on `list_splits`, slow GC, and a full JSON download/parse/rewrite on every metadata operation.
The PostgreSQL metastore fixes all of this with indexed queries, but there's no built-in migration path; this tool fills that gap.
- Reads from the file-backed metastore using Quickwit's own `FileBackedMetastore` (handles all JSON versioning, manifest loading, and S3 access)
- Writes to PostgreSQL using raw `sqlx` with the same schema Quickwit uses (runs upstream migrations, UNNEST-based batch inserts)
- Per-index transactions: if anything fails, that index rolls back cleanly
- Post-migration verification compares counts between source and target
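The UNNEST-based batch insert works by decomposing N rows into parallel per-column arrays and binding one array per column, so a whole batch lands in a single round trip. A minimal sketch of that decomposition (field names here are illustrative, not the real schema):

```rust
// Illustrative split record; the real schema has more columns.
struct SplitRow {
    split_id: String,
    split_state: String,
    delete_opstamp: i64,
}

/// Turn rows into per-column vectors, ready to bind as Postgres arrays in
/// `INSERT ... SELECT * FROM UNNEST($1::text[], $2::text[], $3::bigint[])`.
fn to_columns(rows: &[SplitRow]) -> (Vec<String>, Vec<String>, Vec<i64>) {
    let mut ids = Vec::with_capacity(rows.len());
    let mut states = Vec::with_capacity(rows.len());
    let mut opstamps = Vec::with_capacity(rows.len());
    for row in rows {
        ids.push(row.split_id.clone());
        states.push(row.split_state.clone());
        opstamps.push(row.delete_opstamp);
    }
    (ids, states, opstamps)
}

fn main() {
    let rows = vec![
        SplitRow { split_id: "split-a".into(), split_state: "Published".into(), delete_opstamp: 0 },
        SplitRow { split_id: "split-b".into(), split_state: "Staged".into(), delete_opstamp: 3 },
    ];
    let (ids, states, opstamps) = to_columns(&rows);
    println!("{} rows -> arrays of len {}/{}/{}", rows.len(), ids.len(), states.len(), opstamps.len());
}
```

One query per batch instead of one per row is what keeps the insert phase fast even with many thousands of splits.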
Delete task opstamp remapping is the trickiest part. The file-backed metastore has per-index delete task
opstamp sequences (1, 2, 3...) and splits reference them via delete_opstamp.
PostgreSQL uses a global BIGSERIAL, so we insert delete tasks one-by-one to get
the new auto-assigned opstamps, then remap split delete_opstamp values using
a mapping table. If a split has delete_opstamp=2, we find the highest new
opstamp whose original was <= 2.
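That "highest new opstamp whose original was <= 2" lookup can be sketched with a sorted map (a minimal sketch; the mapping-table representation is an assumption):

```rust
use std::collections::BTreeMap;

/// Map an old per-index delete_opstamp to the new global opstamp:
/// take the highest new opstamp whose original opstamp was <= `old`.
/// `mapping` is old opstamp -> new (BIGSERIAL-assigned) opstamp.
fn remap_opstamp(old: u64, mapping: &BTreeMap<u64, u64>) -> u64 {
    mapping
        .range(..=old)
        .next_back()
        .map(|(_, new)| *new)
        // No delete task at or below `old`: the split predates all delete tasks.
        .unwrap_or(0)
}

fn main() {
    // Old per-index opstamps 1, 2, 3 were re-inserted and got
    // global opstamps 101, 105, 109.
    let mapping: BTreeMap<u64, u64> = [(1, 101), (2, 105), (3, 109)].into();
    assert_eq!(remap_opstamp(2, &mapping), 105);
    assert_eq!(remap_opstamp(0, &mapping), 0);
}
```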
Maturity timestamps map as: mature -> timestamp 0, immature -> create_timestamp + maturation_period.
This is the same logic as upstream `quickwit-metastore/src/metastore/postgres/utils.rs`.
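A sketch of that mapping, assuming an immature split is one that still carries a maturation period (the real types live in the upstream utils module):

```rust
/// Compute the Postgres maturity timestamp for a split.
/// Mature splits get 0; an immature split matures at
/// create_timestamp + maturation_period.
fn maturity_timestamp(create_timestamp: i64, maturation_period_secs: Option<i64>) -> i64 {
    match maturation_period_secs {
        None => 0, // already mature
        Some(period) => create_timestamp + period,
    }
}

fn main() {
    assert_eq!(maturity_timestamp(1_700_000_000, None), 0);
    assert_eq!(maturity_timestamp(1_700_000_000, Some(3_600)), 1_700_003_600);
}
```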
The migration tool only reads from the file-backed metastore (never modifies it), so your source data is always safe. The risk is new data arriving during the migration window that ends up only in the old metastore. Here's the full path:
Provision a Postgres instance. Quickwit is not heavy on metastore queries, so nothing fancy needed — the same instance you'd use for any small-to-medium service. Make sure your Quickwit nodes can reach it.
Run the migration with --dry-run first to see what would be migrated and
catch any config issues:
```
quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://quickwit:pass@pg-host:5432/quickwit_metastore \
  --dry-run
```
You need to stop everything that writes to the metastore. That means:
| Component | Why stop it | What happens if you don't |
|---|---|---|
| Indexers | Create new splits, run merges | New splits appear in file-backed metastore but not in Postgres — data loss after switchover |
| Control plane | Schedules indexing plans, assigns shards | Could trigger new indexing work during migration |
| Janitor | Runs GC, deletes MarkedForDeletion splits from S3 | Could delete split files that the migration tool is about to reference |
Searchers can stay running during migration for read availability. They're read-only against the metastore. Users can keep querying while you migrate.
If you're using Kafka sources: Quickwit tracks consumer offsets (checkpoints) in the metastore. After migration, Quickwit will resume from the last committed checkpoint — no data loss, it just reprocesses from where it left off.
If you're using the ingest API: wait for any in-flight data to be committed to
splits before stopping indexers. Check that no splits are in Staged state
(meaning they haven't been published yet). You can check via:
```
curl http://quickwit:7280/api/v1/indexes/{index_id}/splits?split_states=Staged
```
```
quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://quickwit:pass@pg-host:5432/quickwit_metastore
```
The tool will:
- Run Postgres schema migrations (creates tables, indexes, triggers)
- Migrate each index in its own transaction
- Print progress and verification results
- Exit 0 if everything matches
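The verification step boils down to a per-(index, split_state) count comparison, with any mismatch failing the run. A sketch (the real tool queries both metastores for these counts):

```rust
use std::collections::BTreeMap;

/// Compare per-(index_id, split_state) counts between source and target.
/// Returns the keys whose counts differ; empty means verification passed.
fn diff_counts(
    source: &BTreeMap<(String, String), u64>,
    target: &BTreeMap<(String, String), u64>,
) -> Vec<(String, String)> {
    let mut mismatches = Vec::new();
    // Walk keys from both sides so missing rows count as mismatches too.
    for key in source.keys().chain(target.keys()) {
        if source.get(key) != target.get(key) && !mismatches.contains(key) {
            mismatches.push(key.clone());
        }
    }
    mismatches
}

fn main() {
    let mut source = BTreeMap::new();
    source.insert(("logs".to_string(), "Published".to_string()), 10u64);
    let mut target = source.clone();
    assert!(diff_counts(&source, &target).is_empty());
    // An extra Staged row on the target side is flagged.
    target.insert(("logs".to_string(), "Staged".to_string()), 1);
    assert_eq!(diff_counts(&source, &target).len(), 1);
}
```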
For large clusters, you can migrate a single index first to test:
```
quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://... \
  --index my-important-index
```
The tool runs verification automatically, but you can also sanity-check directly:
```sql
-- Check index count
SELECT COUNT(*) FROM indexes;

-- Check splits per index
SELECT i.index_id, s.split_state, COUNT(*)
FROM splits s JOIN indexes i ON s.index_uid = i.index_uid
GROUP BY i.index_id, s.split_state ORDER BY 1, 2;
```

Change `metastore_uri` in your Quickwit config (node.yaml, Helm values, env vars — however you deploy) from the S3 path to the Postgres URL:
```yaml
# Before
metastore_uri: s3://my-bucket/indexes

# After
metastore_uri: postgresql://quickwit:pass@pg-host:5432/quickwit_metastore
```

Everything else stays the same — `default_index_root_uri` still points to S3 because that's where the actual split data files live. Only the metastore (index metadata, split registry, delete tasks) moves to Postgres.
Start indexers, control plane, janitor. Searchers that were still running will need a restart to pick up the new metastore URI.
Kafka sources will resume from their last checkpoint automatically.
After you're confident everything works, you can remove the old
metastore.json files from S3. They're not needed anymore. Don't delete
the split data files — those are still referenced by the Postgres metastore.
If anything goes wrong after switchover, just point metastore_uri back to the
S3 path and restart. The file-backed metastore was never modified. You'll lose
any data that was ingested after the switchover (it's only in Postgres), but
everything before the migration is intact.
Only write downtime — no new data ingested during the migration window. Read downtime is zero if you keep searchers running. The migration itself takes seconds to minutes depending on how many splits you have (it's just reading JSON from S3 and inserting rows into Postgres).
```
quickwit-metastore-migration \
  --source-config <path-to-node.yaml> \
  --target-postgres-url postgresql://user:pass@host:5432/quickwit_metastore
```
| Flag | Default | Description |
|---|---|---|
| `--source-config` | required | Path to Quickwit node.yaml (metastore_uri + S3 config) |
| `--target-postgres-url` | required | PostgreSQL connection string |
| `--dry-run` | false | Show what would be migrated, don't write |
| `--index <id>` | all | Only migrate this one index |
| `--batch-size <n>` | 500 | Splits per INSERT batch |
| `--skip-schema-setup` | false | Skip running PostgreSQL migrations |
Example source config (node.yaml):

```yaml
version: 0.8
metastore_uri: s3://my-bucket/indexes
default_index_root_uri: s3://my-bucket/indexes
storage:
  s3:
    region: us-east-1
```

Building needs protoc and cmake:

```
apt-get install -y protobuf-compiler cmake   # or: brew install protobuf cmake
RUSTFLAGS="--cfg tokio_unstable" cargo build --release
```

Or with Docker (recommended, avoids local toolchain issues):

```
docker build -t quickwit-metastore-migration .
```
The migrations/ directory contains copies of the upstream Quickwit PostgreSQL
migrations from quickwit-metastore/migrations/postgresql/ at commit
3bfdbbbbf. They're copied (not symlinked) so this tool is self-contained.
These create all the tables Quickwit expects: indexes, splits,
delete_tasks, shards, index_templates, etc.
```
src/
  main.rs      CLI entry, glues everything together
  reader.rs    Reads from FileBackedMetastore via MetastoreService RPCs
  writer.rs    Writes to PostgreSQL via sqlx (schema setup + inserts)
  migrator.rs  Per-index orchestration: read -> transform -> write in a tx
  verify.rs    Post-migration count comparison (source vs target)
```
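The per-index orchestration in migrator.rs can be sketched as an all-or-nothing pipeline: every record must transform cleanly or the whole index fails (and, in the real tool, the surrounding sqlx transaction rolls back). A generic sketch:

```rust
/// All-or-nothing per-index pipeline: transform every record, or fail the
/// whole index on the first error. `collect()` over an iterator of Results
/// stops at the first Err, which models the rollback behavior.
fn migrate_index<T, U>(
    records: Vec<T>,
    transform: impl Fn(T) -> Result<U, String>,
) -> Result<Vec<U>, String> {
    records.into_iter().map(transform).collect()
}

fn main() {
    // All records transform cleanly: the index "commits".
    let ok = migrate_index(vec![1, 2, 3], |n| Ok::<_, String>(n * 10));
    assert_eq!(ok, Ok(vec![10, 20, 30]));

    // One bad record: the whole index fails, nothing partial survives.
    let err = migrate_index(vec![1, -1], |n| {
        if n > 0 { Ok(n) } else { Err("negative".to_string()) }
    });
    assert!(err.is_err());
}
```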
There are two test setups, both using docker-compose.
Tests that the migration tool correctly reads hand-crafted metastore.json
files and inserts the data into PostgreSQL with correct row counts and states.
No Quickwit instance involved - just MinIO + Postgres + the migration tool.
Test data (`test/test_data/`):
- `test-index-clean`: 5 Published splits, 0 delete tasks, simple
- `test-index-dirty`: 20 splits (10 Published, 5 MarkedForDeletion, 5 Staged), 2 delete tasks, tags on splits
Run:

```
docker compose up -d minio postgres
docker compose run --rm minio-setup
docker compose run --rm migration
# Then verify:
./test/verify.sh postgresql://quickwit:quickwit@localhost:15432/quickwit_metastore
docker compose down -v
```

`verify.sh` checks Postgres tables directly: index counts, split counts per state, delete task counts, tags stored as TEXT[].
The real deal. Spins up actual Quickwit instances, indexes 5000 real documents, migrates the metastore, then verifies every single document matches between the old and new metastore backends.
What it does:
- Starts MinIO (S3) + PostgreSQL
- Starts Quickwit with file-backed metastore on MinIO
- Creates an index, ingests 5000 docs in 10 batches (varied log messages, timestamps spanning ~10 hours, 4 log levels, 14 services)
- Runs baseline queries (total count, per-level counts, text search, tag filter, time range filter), dumps all 5000 docs
- Stops file-backed Quickwit
- Runs the migration tool (file-backed -> PostgreSQL)
- Starts Quickwit with PostgreSQL metastore (same S3 for split storage)
- Runs the exact same queries and compares every result
- Dumps all 5000 docs again and compares doc-by-doc (sorted by timestamp+message, field-level comparison)
- Ingests 100 more docs to prove the postgres metastore is writable
- Checks split metadata integrity (time ranges, doc count sums)
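The split metadata integrity check in the last step boils down to summing per-split doc counts and comparing the combined time range, roughly like this (field names are assumptions, not the real API):

```rust
/// Illustrative split metadata; field names are assumptions.
struct SplitMeta {
    num_docs: u64,
    time_start: i64,
    time_end: i64,
}

/// Integrity check after migration: per-split doc counts must sum to the
/// expected total, and the splits' combined time range must match.
fn check_integrity(splits: &[SplitMeta], expected_docs: u64, expected_range: (i64, i64)) -> bool {
    let total: u64 = splits.iter().map(|s| s.num_docs).sum();
    let min_ts = splits.iter().map(|s| s.time_start).min();
    let max_ts = splits.iter().map(|s| s.time_end).max();
    total == expected_docs
        && min_ts == Some(expected_range.0)
        && max_ts == Some(expected_range.1)
}

fn main() {
    let splits = vec![
        SplitMeta { num_docs: 3000, time_start: 100, time_end: 180 },
        SplitMeta { num_docs: 2000, time_start: 150, time_end: 200 },
    ];
    assert!(check_integrity(&splits, 5000, (100, 200)));
    assert!(!check_integrity(&splits, 5100, (100, 200)));
}
```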
Run:

```
./test/e2e-test.sh
```

Takes ~3-4 minutes. Uses the quickwit/quickwit:edge image. Output shows PASS/FAIL for each check.
Generate test data (already committed, but if you want to regenerate):

```
python3 test/generate-docs.py
```

Creates test/batches/batch_001.ndjson through batch_010.ndjson (500 docs each, seeded random so reproducible).
Results are saved to `test/results/` after a run:
- `baseline_docs.jsonl` / `postgres_docs.jsonl` - full doc dumps
- `baseline_sorted.jsonl` / `postgres_sorted.jsonl` - sorted for diff
- `e2e-output.log` - full test output
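The doc-by-doc comparison over the sorted dumps can be sketched as follows (a minimal sketch; real docs carry more fields than timestamp and message):

```rust
/// Sort both dumps by (timestamp, message) and compare element-wise.
/// Returns the number of differing positions; 0 means the dumps match.
fn diff_docs(mut a: Vec<(i64, String)>, mut b: Vec<(i64, String)>) -> usize {
    a.sort();
    b.sort();
    // Mismatched pairs plus any length difference both count as diffs.
    let mismatched = a.iter().zip(b.iter()).filter(|(x, y)| x != y).count();
    mismatched + a.len().abs_diff(b.len())
}

fn main() {
    // Same docs, different dump order: zero differences after sorting.
    let baseline = vec![(2, "b".to_string()), (1, "a".to_string())];
    let postgres = vec![(1, "a".to_string()), (2, "b".to_string())];
    assert_eq!(diff_docs(baseline, postgres), 0);
}
```

Sorting first makes the comparison independent of the order in which each backend returns hits.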
- All 5000 documents are searchable after migration (not just metadata counts)
- Text search, tag filtering, and time range filtering all work identically
- First and last document content matches exactly
- Doc-by-doc comparison of all 5000 docs: zero differences
- Post-migration ingest works (metastore is fully writable)
- Split metadata (time ranges, doc counts) is preserved