quickwit-metastore-migration

A one-way migration tool that moves Quickwit's metastore from file-backed (S3) to PostgreSQL.

Why

The file-backed metastore stores one metastore.json per index on S3. At scale this causes gRPC timeouts on list_splits, slow garbage collection, and a full JSON download/parse/rewrite on every metadata operation. The PostgreSQL metastore fixes all of this with indexed queries, but there is no built-in migration path upstream, so this tool fills that gap.

How it works

  1. Reads from file-backed metastore using Quickwit's own FileBackedMetastore (handles all JSON versioning, manifest loading, S3 access)
  2. Writes to PostgreSQL using raw sqlx with the same schema Quickwit uses (runs upstream migrations, UNNEST-based batch inserts)
  3. Per-index transaction: if anything fails, that index rolls back cleanly
  4. Post-migration verification compares counts between source and target
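The UNNEST-based batch insert in step 2 works by turning a batch of rows into one column array per bound parameter, so the whole batch goes to Postgres in a single statement. A minimal sketch of the array-building side, with illustrative field and column names (the real tool mirrors Quickwit's actual schema):

```rust
// Sketch of preparing a batch for a PostgreSQL UNNEST insert.
// Field/column names are illustrative, not the tool's actual schema.
struct SplitRow {
    split_id: String,
    index_uid: String,
    split_state: String,
}

/// Build one column vector per bound parameter, so a single statement like
///   INSERT INTO splits (split_id, index_uid, split_state)
///   SELECT * FROM UNNEST($1::text[], $2::text[], $3::text[])
/// can insert the whole batch in one round trip.
fn to_unnest_arrays(batch: &[SplitRow]) -> (Vec<String>, Vec<String>, Vec<String>) {
    let mut ids = Vec::with_capacity(batch.len());
    let mut uids = Vec::with_capacity(batch.len());
    let mut states = Vec::with_capacity(batch.len());
    for row in batch {
        ids.push(row.split_id.clone());
        uids.push(row.index_uid.clone());
        states.push(row.split_state.clone());
    }
    (ids, uids, states)
}
```

With --batch-size 500, each transaction issues one such statement per 500 splits instead of 500 individual INSERTs.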

Delete task opstamp remapping

This is the trickiest part. File-backed metastore has per-index delete task opstamp sequences (1, 2, 3...) and splits reference them via delete_opstamp. PostgreSQL uses a global BIGSERIAL, so we insert delete tasks one-by-one to get the new auto-assigned opstamps, then remap split delete_opstamp values using a mapping table. If a split has delete_opstamp=2, we find the highest new opstamp whose original was <= 2.
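The remapping step described above can be sketched with a sorted map from old (per-index) opstamp to new (global) opstamp; names here are illustrative and the tool's internals may differ:

```rust
use std::collections::BTreeMap;

/// Remap a split's old per-index delete_opstamp to the new global opstamp.
///
/// `mapping` maps each old opstamp to the opstamp PostgreSQL auto-assigned
/// when the corresponding delete task was re-inserted. A split whose old
/// delete_opstamp is `old` should now reference the highest new opstamp
/// whose original opstamp was <= `old`.
fn remap_delete_opstamp(mapping: &BTreeMap<u64, u64>, old: u64) -> u64 {
    mapping
        .range(..=old)          // entries with original opstamp <= `old`
        .next_back()            // the highest such original opstamp
        .map(|(_, &new)| new)
        .unwrap_or(0)           // opstamp 0 = no delete tasks applied yet
}
```

For example, if old tasks 1 and 3 were re-inserted as global opstamps 41 and 42, a split with delete_opstamp = 2 is remapped to 41.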

Split maturity

Mature -> timestamp 0, Immature -> create_timestamp + maturation_period. Same logic as upstream quickwit-metastore/src/metastore/postgres/utils.rs.
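The conversion is a one-liner per case; a sketch with an illustrative enum (the upstream type stores a Duration, not raw seconds):

```rust
/// Illustrative mirror of the maturity-to-timestamp conversion in
/// quickwit-metastore/src/metastore/postgres/utils.rs.
enum SplitMaturity {
    Mature,
    Immature { maturation_period_secs: i64 },
}

fn maturity_timestamp(create_timestamp: i64, maturity: &SplitMaturity) -> i64 {
    match maturity {
        // Already mature: stored as timestamp 0.
        SplitMaturity::Mature => 0,
        // Immature: matures at creation time + maturation period.
        SplitMaturity::Immature { maturation_period_secs } => {
            create_timestamp + maturation_period_secs
        }
    }
}
```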

Production migration path

The migration tool only reads from the file-backed metastore (never modifies it), so your source data is always safe. The risk is new data arriving during the migration window that ends up only in the old metastore. Here's the full path:

1. Prepare PostgreSQL

Provision a Postgres instance. Quickwit is not heavy on metastore queries, so nothing fancy needed — the same instance you'd use for any small-to-medium service. Make sure your Quickwit nodes can reach it.

2. Dry run

Run the migration with --dry-run first to see what would be migrated and catch any config issues:

quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://quickwit:pass@pg-host:5432/quickwit_metastore \
  --dry-run

3. Stop writers (indexers, control plane, janitor)

You need to stop everything that writes to the metastore. That means:

| Component | Why stop it | What happens if you don't |
| --- | --- | --- |
| Indexers | Create new splits, run merges | New splits appear in the file-backed metastore but not in Postgres — data loss after switchover |
| Control plane | Schedules indexing plans, assigns shards | Could trigger new indexing work during migration |
| Janitor | Runs GC, deletes MarkedForDeletion splits from S3 | Could delete split files that the migration tool is about to reference |

Searchers can stay running during migration for read availability. They're read-only against the metastore. Users can keep querying while you migrate.

If you're using Kafka sources: Quickwit tracks consumer offsets (checkpoints) in the metastore. After migration, Quickwit will resume from the last committed checkpoint — no data loss, it just reprocesses from where it left off.

If you're using the ingest API: wait for any in-flight data to be committed to splits before stopping indexers. Check that no splits are in Staged state (meaning they haven't been published yet). You can check via:

curl "http://quickwit:7280/api/v1/indexes/{index_id}/splits?split_states=Staged"

4. Run the migration

quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://quickwit:pass@pg-host:5432/quickwit_metastore

The tool will:

  • Run Postgres schema migrations (creates tables, indexes, triggers)
  • Migrate each index in its own transaction
  • Print progress and verification results
  • Exit 0 if everything matches

For large clusters, you can migrate a single index first to test:

quickwit-metastore-migration \
  --source-config node.yaml \
  --target-postgres-url postgresql://...  \
  --index my-important-index

5. Verify

The tool runs verification automatically, but you can also sanity-check directly:

-- Check index count
SELECT COUNT(*) FROM indexes;

-- Check splits per index
SELECT i.index_id, s.split_state, COUNT(*)
FROM splits s JOIN indexes i ON s.index_uid = i.index_uid
GROUP BY i.index_id, s.split_state ORDER BY 1, 2;

6. Reconfigure Quickwit to use PostgreSQL metastore

Change metastore_uri in your Quickwit config (node.yaml, Helm values, env vars — however you deploy) from the S3 path to the Postgres URL:

# Before
metastore_uri: s3://my-bucket/indexes

# After
metastore_uri: postgresql://quickwit:pass@pg-host:5432/quickwit_metastore

Everything else stays the same — default_index_root_uri still points to S3 because that's where the actual split data files live. Only the metastore (index metadata, split registry, delete tasks) moves to Postgres.

7. Start everything back up

Start indexers, control plane, janitor. Searchers that were still running will need a restart to pick up the new metastore URI.

Kafka sources will resume from their last checkpoint automatically.

8. (Optional) Clean up old metastore files

After you're confident everything works, you can remove the old metastore.json files from S3. They're not needed anymore. Don't delete the split data files — those are still referenced by the Postgres metastore.

Rollback

If anything goes wrong after switchover, just point metastore_uri back to the S3 path and restart. The file-backed metastore was never modified. You'll lose any data that was ingested after the switchover (it's only in Postgres), but everything before the migration is intact.

Total downtime

Only write downtime — no new data ingested during the migration window. Read downtime is zero if you keep searchers running. The migration itself takes seconds to minutes depending on how many splits you have (it's just reading JSON from S3 and inserting rows into Postgres).


Usage

quickwit-metastore-migration \
  --source-config <path-to-node.yaml> \
  --target-postgres-url postgresql://user:pass@host:5432/quickwit_metastore

Flags

| Flag | Default | Description |
| --- | --- | --- |
| --source-config | required | Path to Quickwit node.yaml (metastore_uri + S3 config) |
| --target-postgres-url | required | PostgreSQL connection string |
| --dry-run | false | Show what would be migrated, don't write |
| --index <id> | all | Only migrate this one index |
| --batch-size <n> | 500 | Splits per INSERT batch |
| --skip-schema-setup | false | Skip running PostgreSQL migrations |

Example node.yaml (source)

version: 0.8
metastore_uri: s3://my-bucket/indexes
default_index_root_uri: s3://my-bucket/indexes
storage:
  s3:
    region: us-east-1

Building

Needs protoc and cmake:

apt-get install -y protobuf-compiler cmake   # or brew install protobuf cmake
RUSTFLAGS="--cfg tokio_unstable" cargo build --release

Or with Docker (recommended, avoids local toolchain issues):

docker build -t quickwit-metastore-migration .

PostgreSQL migrations

The migrations/ directory contains copies of the upstream Quickwit PostgreSQL migrations from quickwit-metastore/migrations/postgresql/ at commit 3bfdbbbbf. They're copied (not symlinked) so this tool is self-contained. These create all the tables Quickwit expects: indexes, splits, delete_tasks, shards, index_templates, etc.

Source layout

src/
  main.rs       CLI entry, glues everything together
  reader.rs     Reads from FileBackedMetastore via MetastoreService RPCs
  writer.rs     Writes to PostgreSQL via sqlx (schema setup + inserts)
  migrator.rs   Per-index orchestration: read -> transform -> write in a tx
  verify.rs     Post-migration count comparison (source vs target)

Tests

There are two test setups, both using docker-compose.

Quick structural test (docker-compose.yml)

Tests that the migration tool correctly reads hand-crafted metastore.json files and inserts the data into PostgreSQL with correct row counts and states. No Quickwit instance involved: just MinIO + Postgres + the migration tool.

Test data (test/test_data/):

  • test-index-clean: 5 Published splits, 0 delete tasks, simple
  • test-index-dirty: 20 splits (10 Published, 5 MarkedForDeletion, 5 Staged), 2 delete tasks, tags on splits

Run:

docker compose up -d minio postgres
docker compose run --rm minio-setup
docker compose run --rm migration
# Then verify:
./test/verify.sh postgresql://quickwit:quickwit@localhost:15432/quickwit_metastore
docker compose down -v

verify.sh checks Postgres tables directly: index counts, split counts per state, delete task counts, tags stored as TEXT[].

Full E2E test (docker-compose.e2e.yml)

The real deal. Spins up actual Quickwit instances, indexes 5000 real documents, migrates the metastore, then verifies every single document matches between the old and new metastore backends.

What it does:

  1. Starts MinIO (S3) + PostgreSQL
  2. Starts Quickwit with file-backed metastore on MinIO
  3. Creates an index, ingests 5000 docs in 10 batches (varied log messages, timestamps spanning ~10 hours, 4 log levels, 14 services)
  4. Runs baseline queries (total count, per-level counts, text search, tag filter, time range filter), dumps all 5000 docs
  5. Stops file-backed Quickwit
  6. Runs the migration tool (file-backed -> PostgreSQL)
  7. Starts Quickwit with PostgreSQL metastore (same S3 for split storage)
  8. Runs the exact same queries and compares every result
  9. Dumps all 5000 docs again and compares doc-by-doc (sorted by timestamp+message, field-level comparison)
  10. Ingests 100 more docs to prove the postgres metastore is writable
  11. Checks split metadata integrity (time ranges, doc count sums)

Run:

./test/e2e-test.sh

Takes ~3-4 minutes. Uses quickwit/quickwit:edge image. Output shows PASS/FAIL for each check.

Generate test data (already committed, but if you want to regenerate):

python3 test/generate-docs.py

Creates test/batches/batch_001.ndjson through batch_010.ndjson (500 docs each, seeded random so reproducible).

Results are saved to test/results/ after a run:

  • baseline_docs.jsonl / postgres_docs.jsonl - full doc dumps
  • baseline_sorted.jsonl / postgres_sorted.jsonl - sorted for diff
  • e2e-output.log - full test output

What the E2E test proves

  • All 5000 documents are searchable after migration (not just metadata counts)
  • Text search, tag filtering, and time range filtering all work identically
  • First and last document content matches exactly
  • Doc-by-doc comparison of all 5000 docs: zero differences
  • Post-migration ingest works (metastore is fully writable)
  • Split metadata (time ranges, doc counts) is preserved
