The “best” approach depends on what kind of data you’re seeding. The senior answer: don’t treat all seed data the same. Split it into categories and choose the mechanism that matches each category’s lifecycle.
1) Classify seed data first
A) Reference / canonical data (good for migrations)
Examples:
- roles: ADMIN, USER
- countries, currencies
- feature flags defaults
- permission matrices (if stable)
- system config keys required for startup
✅ This belongs in migration tooling because:
- app may not start without it
- it must be consistent across environments
- it’s part of “schema contract”
B) Demo / test / local-only data (don’t put in prod migrations)
Examples:
- fake users, sample orders
- mock catalog entries
- dev fixtures
❌ This should be injected via:
- separate scripts
- test setup
- profiles
- Testcontainers init scripts
- dedicated “dev seed” task
C) Environment-specific operational data (avoid in migrations)
Examples:
- prod API keys
- customer-specific config
- partner identifiers
❌ Keep in secrets/config management, not DB migrations.
2) Golden rule: seed data must be idempotent
Your seed should be safe if it runs:
- again
- on partially seeded environments
- after rollback/restore
- during blue/green deploys
Postgres example
INSERT INTO role(code)
VALUES ('ADMIN'), ('USER')
ON CONFLICT (code) DO NOTHING;
MySQL example
INSERT IGNORE INTO role(code) VALUES ('ADMIN'), ('USER');
Or explicit upsert with unique key.
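For example, an explicit MySQL upsert keyed on a unique `code` column (the `description` column here is illustrative):

```sql
-- MySQL: explicit upsert, keyed on the unique index over `code`.
-- Re-running this refreshes descriptions instead of failing or duplicating.
INSERT INTO role (code, description)
VALUES ('ADMIN', 'Administrator'),
       ('USER',  'Standard user')
ON DUPLICATE KEY UPDATE description = VALUES(description);
```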
📌 Interview phrase:
“Seed data should be idempotent and based on natural keys, not generated IDs.”
3) Prefer “natural keys” and stable identifiers
Bad:
- insert a role, then have later code refer to role_id = 1
Good:
- refer by code = 'ADMIN'
- or use UUIDs you control
- or insert with fixed IDs only if you own the full lifecycle
This avoids drift when different environments generate IDs differently.
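For instance, a later migration can resolve the role by its natural key instead of a hard-coded ID. A sketch in Postgres syntax; `app_user`, `user_role`, and the email are illustrative names:

```sql
-- Grant ADMIN to a known user by natural keys, not generated IDs.
-- Assumes a unique constraint on user_role(user_id, role_id) so the
-- bare ON CONFLICT clause makes re-runs a no-op.
INSERT INTO user_role (user_id, role_id)
SELECT u.id, r.id
FROM app_user u, role r
WHERE u.email = 'ops@example.com'
  AND r.code  = 'ADMIN'
ON CONFLICT DO NOTHING;
```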
4) Keep seed operations small and fast
Avoid:
- inserting 100k rows in a migration
- heavy transforms
- long transactions
For big datasets:
- use batch jobs / ETL
- or preloaded snapshots
Migration tools are part of the deploy path—keep them predictable.
5) Tool-specific best practices
Flyway
Best practice for reference seed data:
- separate files:
  - V10__create_tables.sql
  - V11__seed_reference_data.sql
If data needs periodic refresh:
- consider repeatable migrations carefully:
  - R__views.sql ✅
  - R__seed_data.sql ⚠️ usually no, unless it’s fully idempotent and “authoritative”
In Flyway, repeatables rerun on checksum change—this can surprise you.
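If you do put seed data in a repeatable, every statement must be a true authoritative upsert so a checksum-triggered rerun is harmless. A sketch in Postgres syntax (table and columns illustrative):

```sql
-- R__seed_data.sql: Flyway re-runs this whenever its checksum changes,
-- so a plain INSERT would fail or duplicate. A full upsert is safe:
INSERT INTO role (code, description)
VALUES ('ADMIN', 'Administrator'),
       ('USER',  'Standard user')
ON CONFLICT (code) DO UPDATE
  SET description = EXCLUDED.description;
```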
Liquibase
Liquibase gives you more control:
- Use <loadData> for CSV-based seed data (nice for reference tables)
- Use contexts/labels:
  - context="dev" for dev-only seeds
  - no context for prod-required reference data
Also:
- Use preconditions if necessary (but don’t hide required data behind conditions unless you really mean it)
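The same split can be expressed in Liquibase’s SQL-formatted changelog instead of XML. A sketch, assuming Postgres and illustrative author/table names:

```sql
--liquibase formatted sql

--changeset alice:seed-roles runOnChange:true
-- No context: required reference data, applied in every environment.
INSERT INTO role (code) VALUES ('ADMIN'), ('USER')
ON CONFLICT (code) DO NOTHING;

--changeset alice:seed-dev-users context:dev
-- Runs only when Liquibase is invoked with the "dev" context.
INSERT INTO app_user (email, display_name)
VALUES ('dev@example.com', 'Dev User')
ON CONFLICT (email) DO NOTHING;
```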
6) Recommended pattern: reference seed in migrations + fixtures elsewhere
What I’d do in a real project
- Reference data (roles, currencies) → migrations (idempotent)
- Dev/demo → separate “seed task”:
- Gradle task / Maven profile
- docker-compose init scripts
- Testcontainers init
- Test data → test setup (JUnit / integration test bootstrap)
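For the docker-compose route, a minimal sketch: the official Postgres image runs any `.sql` file mounted into `/docker-entrypoint-initdb.d/` on first container initialization, which keeps demo data entirely out of the migration history (names and values illustrative):

```sql
-- dev-seed.sql: mount into the Postgres container's
-- /docker-entrypoint-initdb.d/ so it runs only when the data
-- volume is first created — never in prod deploys.
INSERT INTO app_user (email, display_name)
VALUES ('demo@example.com', 'Demo User')
ON CONFLICT (email) DO NOTHING;
```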
7) Interview-ready answer (30 seconds)
“I seed only canonical reference data through the migration tool, and I keep it idempotent using natural keys and upserts so reruns are safe. I avoid putting demo or environment-specific data into migrations; those go into separate dev/test seed mechanisms or config management. For Flyway I prefer versioned seed migrations; for Liquibase I often use contexts/labels and loadData for reference tables.”