SQL.What guarantees does the relational model provide, and what does SQL not guarantee?

What the relational model guarantees

The relational model defines what states of data are valid.

These guarantees are logical, not performance-related.


1️⃣ Set-based data (no duplicates by default)

  • Relations are sets, not lists
  • Rows have no inherent order
  • Duplicates are eliminated conceptually

Example:

SELECT a FROM t;

Logically:

  • Result is an unordered set

➡️ Order only exists if explicitly defined.

2️⃣ Entity identity via keys

The relational model guarantees:

  • Every row can be uniquely identified
  • Keys uniquely identify tuples

This is why:

  • Candidate keys exist
  • Primary keys exist

Without keys → relation is not well-defined.

3️⃣ Referential integrity

Relationships between relations are:

  • Explicit
  • Enforceable
  • Consistent

A foreign key guarantees:

  • Child rows reference existing parent rows
  • Orphan rows are impossible

This is a core relational guarantee.

4️⃣ Integrity constraints

The model guarantees enforcement of:

  • Domain constraints (NOT NULL, type rules)
  • Key constraints (PRIMARY KEY, UNIQUE)
  • Referential constraints (FOREIGN KEY)

Invalid states are unrepresentable.

5️⃣ Declarative semantics

You specify:

  • What result you want
  • Not how to get it

This enables:

  • Query optimization
  • Execution plan changes
  • Storage evolution

6️⃣ Closure (important theory point)

Relational operations:

  • Take relations as input
  • Produce relations as output

This allows:

  • Query composition
  • Subqueries
  • Views

What SQL does NOT guarantee

SQL is a language, not a correctness oracle.

Many things developers assume are explicitly not guaranteed.


❌ 1️⃣ Row order (unless ORDER BY)

SELECT * FROM users;

No guaranteed order
✔ Only ORDER BY defines order

Indexes do not guarantee order.

❌ 2️⃣ Execution order of clauses

Logical SQL order ≠ execution order:

SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...

You cannot rely on evaluation order inside WHERE.


❌ 3️⃣ Deterministic evaluation of expressions

WHERE f(x) AND g(x)

SQL does not guarantee:

  • Left-to-right evaluation
  • Short-circuiting

The optimizer may reorder conditions.

❌ 4️⃣ Absence of duplicates (unless enforced)

SELECT col FROM table;
  • Duplicates may appear
  • SQL works on bags (multisets)

Only DISTINCT or constraints remove duplicates.

❌ 5️⃣ Consistency without constraints

SQL does not guarantee:

  • Uniqueness without UNIQUE
  • Referential integrity without FOREIGN KEY
  • Valid ranges without CHECK

If you don’t declare it, SQL won’t enforce it.

❌ 6️⃣ Stable execution plans

The same query:

  • May use different indexes tomorrow
  • May switch join algorithms
  • May behave differently after stats change

Execution plans are not stable contracts.

❌ 7️⃣ Isolation without transactions

SQL does not guarantee:

  • Atomicity
  • Isolation
  • Consistency

Unless:

BEGIN;
...
COMMIT;

Even then — isolation level matters.

Big mental model (interview gold)

Relational ModelSQL
Defines valid data statesDefines how to query data
Guarantees correctnessMakes no implicit promises
Based on setsOperates on multisets
OrderlessOrder must be declared
DeclarativeOptimizer-controlled

One-sentence interview answer (perfect)

The relational model guarantees data correctness through keys, constraints, referential integrity, and declarative semantics, while SQL does not guarantee row order, execution order, uniqueness, consistency, or stable plans unless they are explicitly specified.

Common interview traps ❌

  • Assuming index order = result order
  • Relying on WHERE clause evaluation order
  • Assuming SQL enforces business rules automatically
  • Confusing logical correctness with physical execution
Posted in Без рубрики | Comments Off on SQL.What guarantees does the relational model provide, and what does SQL not guarantee?

SQL.How logical and physical schemas implemented on practice (in Postgres for ex.) ?

High-level idea (keep this in mind)

In PostgreSQL, you define the logical schema explicitly,
and PostgreSQL derives and manages the physical schema implicitly, with optional tuning knobs.

You declare intent → Postgres decides storage.

1️⃣ Logical schema in PostgreSQL (what you define)

This is everything you normally write in DDL.

Logical schema includes

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  created_at TIMESTAMP NOT NULL
);

CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  created_at TIMESTAMP NOT NULL,
  FOREIGN KEY (user_id) REFERENCES users(id)
);

What this expresses (logically)

  • Entities: users, orders
  • Relationships: orders → users
  • Constraints:
    • Primary keys
    • Foreign keys
    • Uniqueness
  • Column meaning & types

💡 No storage details yet.

2️⃣ What PostgreSQL creates physically (automatically)

From the logical schema above, PostgreSQL creates:

For each table

  • A heap file on disk
  • Internal row identifiers (ctid)
  • System catalog entries

For each constraint

  • Indexes (e.g. PK, UNIQUE)
  • Internal dependency metadata

Example:

users
├── heap file (rows)
├── PK index (btree on id)
├── UNIQUE index (btree on email)

You did not specify:

  • File layout
  • Page size
  • Index structure

Postgres handled it.

3️⃣ Physical schema in PostgreSQL (how data is stored & accessed)

This is where performance tuning lives.

Physical schema includes

Indexes

CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC);

Index types

USING btree
USING hash
USING gin
USING gist

Table partitioning

CREATE TABLE orders (
  ...
) PARTITION BY RANGE (created_at);

Storage parameters

ALTER TABLE orders SET (fillfactor = 70);

Tablespaces

CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd';

All of this:

  • Does not change correctness
  • Only affects performance & storage

4️⃣ Logical vs physical: same data, different behavior

Logical query (unchanged)





SELECT *
FROM orders
WHERE user_id = 42
  AND created_at >= now() - interval '7 days';

Physical execution (can change!)

  • Seq scan → Index scan
  • Index scan → Bitmap scan
  • Bitmap scan → Partition pruning

Same query.
Different physical plan.

This is the power of declarative SQL.

5️⃣ PostgreSQL catalogs: where logic meets physics

Postgres stores everything in system tables:

  • pg_class → tables & indexes
  • pg_attribute → columns
  • pg_constraint → PK / FK / UNIQUE
  • pg_index → index metadata

Logical schema:

  • What relationships exist

Physical schema:

  • How those relationships are implemented internally

You rarely touch this directly — but tools do.

6️⃣ Migrations: how this works in real teams

Typical workflow

  1. Logical change
ALTER TABLE orders ADD COLUMN status TEXT NOT NULL;

2. Physical optimization (optional, later)

CREATE INDEX idx_orders_status ON orders(status);

Important:

  • Logical migrations are mandatory
  • Physical migrations are optional & iterative

Senior teams separate them consciously.

7️⃣ Expand–Migrate–Contract pattern (production reality)

Example: adding a new constraint safely

Expand (logical)

ALTER TABLE users ADD COLUMN email_new TEXT;

Migrate (data)

UPDATE users SET email_new = email;

Contract (logical + physical)

ALTER TABLE users DROP COLUMN email;
ALTER TABLE users RENAME COLUMN email_new TO email;
CREATE UNIQUE INDEX idx_users_email ON users(email);

Logical rules first → physical enforcement last.

8️⃣ Mental model (very important)

Think in layers:





Application
  ↓
Logical schema (tables, constraints, relations)
  ↓
Query planner
  ↓
Physical schema (indexes, partitions, files)
  ↓
Disk / Memory

You control:

  • Logical schema directly
  • Physical schema indirectly

Postgres controls:

  • Execution plans
  • Storage layout
  • Optimization decisions

Common interview mistakes ❌

  • Saying indexes are part of logical schema
  • Mixing constraints with performance tuning
  • Thinking partitioning changes correctness
  • Optimizing physical schema before modeling logic

One-sentence interview answer (perfect)

In PostgreSQL, the logical schema is defined explicitly using tables, constraints, and relationships, while the physical schema consists of indexes, partitions, and storage parameters that PostgreSQL uses to store and access that logical model efficiently.

Posted in Без рубрики | Comments Off on SQL.How logical and physical schemas implemented on practice (in Postgres for ex.) ?

SQL.What is the difference between a logical schema and a physical schema?

Logical schema vs Physical schema

Logical schema describes what the data model is.
Physical schema describes how it is stored and accessed.

That sentence alone already scores points.


1️⃣ Logical schema

What it is

The logical schema defines:

  • Entities and attributes
  • Relationships
  • Constraints
  • Business rules

It is DB-engine independent.

Focus

  • Correctness
  • Meaning
  • Domain modeling

Example (logical)

User
- id
- email (unique)
- has many Orders

Order
- id
- user_id
- created_at

This says nothing about:

  • Indexes
  • Storage
  • Performance
  • Tablespaces

Typical elements

  • Tables (as concepts)
  • Columns and data types (abstract)
  • Primary keys
  • Foreign keys
  • UNIQUE constraints
  • Cardinality

2️⃣ Physical schema

What it is

The physical schema defines:

  • How data is stored on disk
  • How it is indexed
  • How it is partitioned
  • How it is optimized

It is DB-engine specific.

Focus

  • Performance
  • Scalability
  • Storage efficiency

Example (physical)

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL
);

CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_created
  ON orders(user_id, created_at DESC);

PARTITION BY RANGE (created_at);

Typical elements

  • Index types (B-tree, hash)
  • Index order
  • Partitioning
  • Tablespaces
  • Storage parameters
  • Fillfactor, compression

Key differences (interview gold)

AspectLogical schemaPhysical schema
PurposeData meaningData storage
LevelConceptualImplementation
DB-specificNoYes
Changes when scalingRarelyOften
Concerned withCorrectnessPerformance

Why this distinction matters

1️⃣ You can change physical schema without changing logic

  • Add indexes
  • Change partitioning
  • Tune storage

➡️ Application logic remains untouched.

2️⃣ Logical mistakes are fatal; physical mistakes are fixable

  • Wrong relationships → broken domain
  • Missing constraints → corrupted data
  • Bad index → slow query (but correct)

3️⃣ Interview insight

Senior engineers think:

  1. Model correctness first (logical)
  2. Optimize later (physical)

Real-world example

Logical decision

“An order must belong to a user.”

Physical decisions

  • Index on orders.user_id
  • Partition orders by date
  • Cluster table by PK

The rule doesn’t change — the implementation does.

Common interview mistakes ❌

  • Mixing logical and physical concepts
  • Saying indexes are part of logical schema
  • Ignoring constraints as “physical”
  • Designing for performance before correctness

One-sentence interview answer (perfect)

The logical schema defines the structure and rules of the data model independently of storage, while the physical schema defines how that model is implemented, stored, and optimized in a specific database system.


Bonus follow-up (very common)

“Can indexes change logical behavior?”

Answer:

No — they only affect performance, not correctness.

Posted in Без рубрики | Comments Off on SQL.What is the difference between a logical schema and a physical schema?

SQL.Why are foreign keys needed, and what problems do they solve beyond JOINs?

Why are foreign keys needed (beyond JOINs)?

Foreign keys are about data correctness and lifecycle guarantees — not query syntax.

JOINs are just a query feature.
Foreign keys are integrity constraints.


What a foreign key actually guarantees

A foreign key enforces a parent–child relationship at the database level.

orders.user_id → users.id

This means:

  • A child cannot exist without a parent
  • Parent–child lifecycle is enforced
  • Invalid states are impossible

JOINs alone cannot do this.


1️⃣ Prevent orphaned data (core reason)

Without FK

INSERT INTO orders (user_id) VALUES (999);

✔ Succeeds
❌ But user 999 does not exist

Now you have:

  • Orphan rows
  • Broken business logic
  • Silent data corruption

With FK

FOREIGN KEY (user_id) REFERENCES users(id)

Result:
❌ Insert fails immediately
✔ Data stays correct

2️⃣ Enforce invariants under concurrency (huge)

Imagine two concurrent operations:

  • Tx A deletes user
  • Tx B inserts order for that user

Without FK

Race condition → orphan order

With FK

DB enforces correct ordering:

  • One transaction blocks or fails
  • Invariant preserved

No amount of Java code can guarantee this globally.

3️⃣ Make invalid states unrepresentable

This is a DDD-level benefit.

Foreign keys encode domain rules:

“An order must belong to an existing user.”

Once enforced:

  • You don’t need defensive checks everywhere
  • You reduce system complexity
  • Bugs fail fast

4️⃣ Control delete & update behavior explicitly

Foreign keys define lifecycle semantics:

ON DELETE RESTRICT
ON DELETE CASCADE
ON DELETE SET NULL

Examples:

  • Prevent deleting a user with active orders
  • Automatically clean up child rows
  • Explicitly model ownership

Without FKs:

  • Deletions become dangerous
  • Cleanup logic leaks into application code

5️⃣ Protect against bad migrations & scripts

Production reality:

  • Hotfix SQL
  • Manual backfills
  • Data imports
  • One-off scripts

Foreign keys are the last line of defense.

They stop:

  • Accidental deletes
  • Broken inserts
  • Partial migrations

This is not theoretical — this saves companies.

6️⃣ Improve query planner correctness (bonus)

Some databases use FKs to:

  • Optimize joins
  • Infer cardinality
  • Eliminate unnecessary joins

Not the main reason — but a real benefit.

7️⃣ Documentation & schema self-descriptiveness

A schema with FKs:

  • Explains itself
  • Is easier to review
  • Prevents wrong assumptions

Without FKs:

  • You must trust the code
  • New engineers make mistakes

Common myths ❌

❌ “FKs are only for joins”
❌ “We enforce it in the application”
❌ “FKs slow down the system”
❌ “Microservices shouldn’t use FKs”

Correct framing:

FKs protect data, not code paths.

When people avoid foreign keys (and why it’s risky)

Reasons given:

  • “Performance”
  • “Microservices”
  • “Flexibility”

Reality:

  • Usually premature optimization
  • Or organizational fear, not technical necessity

One-sentence interview answer (perfect)

Foreign keys enforce referential integrity by preventing invalid parent–child relationships, protecting data under concurrency, controlling lifecycle behavior, and making invalid states impossible — which JOINs alone cannot do.

Strong follow-up answer (bonus)

JOINs answer how to read data, foreign keys define which data states are allowed to exist.

Posted in Без рубрики | Comments Off on SQL.Why are foreign keys needed, and what problems do they solve beyond JOINs?

SQL.What are the problems of composite keys to equals / hash code in ORM ?

Why composite keys break equals() / hashCode() in ORM

ORMs rely on equals() / hashCode() to track entity identity.
Composite keys make identity mutable, incomplete, or unstable during an entity’s lifecycle.

That’s the root problem.


First: how ORM uses equals() / hashCode()

Hibernate uses them in:

  • Persistence Context (1st-level cache)
  • Set / Map collections
  • Dirty checking
  • Association management
  • Proxy resolution

Hibernate assumes:

  • Identity is stable
  • Identity exists from creation to deletion
  • equals() does not change over time

Composite keys violate these assumptions.

Problem 1️⃣: null values before persist (very common)

Composite key example

@Embeddable
class UserRoleId {
    Long userId;
    Long roleId;
}

Before persist():

userId = null
roleId = null

But equals() / hashCode() must already work.

What happens

  • hashCode() changes after IDs are assigned
  • Entity moves between hash buckets
  • ORM loses track of the entity

What happens

  • hashCode() changes after IDs are assigned
  • Entity moves between hash buckets
  • ORM loses track of the entity

Result

  • Entity “disappears” from Set
  • Duplicate entries appear
  • Updates silently fail

This bug is nightmare-level to debug.

Problem 2️⃣: Mutable identity (identity must be immutable)

Composite keys are built from:

  • Foreign keys
  • Business columns

These can change.

Example:

(user_id, role_id)

What if:

  • Role is reassigned
  • User is merged
  • FK updated

Now:

  • Primary key changes
  • equals() / hashCode() changes
  • ORM breaks identity tracking

Surrogate IDs never change.

Problem 3️⃣: Partial equality before flush

Entity lifecycle:

  1. New entity created
  2. Added to a Set
  3. Persisted
  4. Flushed → ID assigned

With composite key:

  • Step 2: equality based on nulls
  • Step 4: equality based on real values

This violates the Java contract:

If an object is in a HashSet, its hashCode() must not change.

Composite keys almost guarantee violation.

Problem 4️⃣: Proxy vs real object comparison

Hibernate often compares:

  • Proxy instance
  • Real entity instance

With composite PK:

  • Multiple fields to compare
  • More chances of mismatch
  • Broken lazy-loading equality

Example bug:

proxy.equals(entity) == false

This breaks collections and caches.

Problem 5️⃣: Developers implement equals/hashCode wrong

Typical mistakes:

  • Forgetting one key field
  • Including non-key fields
  • Using mutable fields
  • Using Lombok @Data blindly

With composite keys:

  • More fields = more risk
  • One mistake = silent corruption

Problem 6️⃣: Identity vs business equality confusion

Composite keys blur the line:

  • Is (userId, roleId) identity?
  • Or just a uniqueness rule?

ORM wants:

  • Identity = immutable technical key

Composite keys encode business rules into identity.

That’s a conceptual mismatch.


Concrete failure example (realistic)

Set<UserRole> roles = new HashSet<>();

UserRole ur = new UserRole();
ur.setUser(user);
ur.setRole(role);

roles.add(ur);   // hashCode based on nulls

entityManager.persist(ur);
entityManager.flush();  // IDs assigned → hashCode changes

roles.contains(ur); // false ❌

Why surrogate keys avoid all of this

@Id
Long id;
  • Assigned once
  • Immutable
  • Simple equality
  • Stable hash code

ORMs are optimized for this model.

One-sentence interview answer (perfect)

Composite keys make entity identity depend on multiple mutable or initially-null fields, which causes equals() and hashCode() to change during the entity lifecycle and breaks ORM caching and collections.

Final mental model (remember this)

ORM identity must be boring.
Composite keys are interesting — and that’s the problem.

Posted in Без рубрики | Comments Off on SQL.What are the problems of composite keys to equals / hash code in ORM ?

SQL.What problems do we have with ORM when we use composite keys?

Why ORMs (JPA / Hibernate) struggle with composite primary keys

First: what ORM expects

An ORM assumes:

  • One column = one identity
  • Identity is simple, immutable, comparable
  • Identity is used for:
    • Persistence context (1st-level cache)
    • Equality checks
    • Lazy loading
    • Proxies
    • Hash-based collections

A composite key breaks all of these assumptions.


1️⃣ @EmbeddedId / @IdClass — what’s the problem?

With a composite PK, you must create a separate ID class

@Embeddable
class UserRoleId {
    Long userId;
    Long roleId;
}
@Entity
class UserRole {
    @EmbeddedId
    UserRoleId id;
}

Option B: @IdClass

class UserRoleId {
    Long userId;
    Long roleId;
}
@Entity
@IdClass(UserRoleId.class)
class UserRole {
    @Id Long userId;
    @Id Long roleId;
}

👉 Already:

  • Extra class
  • Extra boilerplate
  • Extra cognitive load

Now the real problems start.


2️⃣ equals() / hashCode() — this is the killer

Hibernate uses equals() and hashCode() to manage entities in:

  • Persistence context
  • Sets
  • Maps
  • Dirty checking

With surrogate key

equals → id
hashCode → id

Simple. Stable.

With composite key

You must implement:

equals(userId, roleId)
hashCode(userId, roleId)

Problems:

  • What if one field is null before persist?
  • What if entities are put into a Set before flush?
  • What if ID fields are mutable?

Result:

  • Entities disappear from Sets
  • Duplicate entities appear
  • Cache behaves incorrectly

These bugs are subtle and brutal.

3️⃣ First-level cache confusion (Persistence Context)

Hibernate cache key:

(EntityClass, PrimaryKey)

With composite PK:

(UserRole, (userId, roleId))

Every lookup must:

  • Instantiate composite ID
  • Compare multiple fields
  • Rely on correct equals/hashCode

This:

  • Slows down lookups
  • Makes bugs harder to trace
  • Breaks identity guarantees if equals/hashCode is wrong

4️⃣ Lazy loading & proxies get awkward

Example:

UserRole role = em.getReference(
    UserRole.class,
    new UserRoleId(1L, 2L)
);

Compared to:

em.getReference(UserRole.class, 42L);

You now:

  • Allocate objects just to reference entities
  • Leak DB structure into service code
  • Increase verbosity everywhere

5️⃣ Repositories become ugly

Spring Data repository:

Surrogate PK

interface UserRoleRepo extends JpaRepository<UserRole, Long> {}

Composite PK

interface UserRoleRepo extends JpaRepository<UserRole, UserRoleId> {}

Every call now requires:

repo.findById(new UserRoleId(userId, roleId));

This is not wrong, but it’s noisy and error-prone.

6️⃣ Associations are harder to map

Mapping relations:

@ManyToOne
@JoinColumns({
    @JoinColumn(name = "user_id"),
    @JoinColumn(name = "role_id")
})

Instead of:

@ManyToOne
@JoinColumn(name = "user_id")

ore annotations → more bugs → harder refactors.


7️⃣ Why this matters in real teams

Composite PK issues:

  • Slow down development
  • Confuse mid-level devs
  • Increase code review overhead
  • Cause production-only bugs

This is why experienced teams avoid them unless forced.

Mental model (important)

ORMs are optimized for object identity, not relational purity.

Composite PKs are relationally elegant
Surrogate PKs are operationally practical


Clean alternative (best of both worlds)

user_roles (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  role_id BIGINT NOT NULL,
  UNIQUE (user_id, role_id)
)

ORM:

@Entity
class UserRole {
    @Id Long id;
    Long userId;
    Long roleId;
}

✔ Simple identity
✔ Business rule enforced
✔ ORM-friendly
✔ Evolvable

One-sentence interview explanation (perfect)

ORMs rely on a simple, stable single-column identity, and composite primary keys complicate equality, caching, associations, and repository usage, which is why teams usually avoid them in ORM-based systems.

Posted in Без рубрики | Comments Off on SQL.What problems do we have with ORM when we use composite keys?

SQL.Why is it bad to use composite key on many to many table ?

Short answer (interview-ready)

Using a composite primary key in a many-to-many table is often problematic because it tightly couples identity to business columns, complicates foreign keys, hurts ORM usability, and makes future schema evolution harder — even though it looks “relationally pure”.

That framing already sounds senior. Now let’s justify it.

The canonical many-to-many table

user_roles (
  user_id BIGINT,
  role_id BIGINT,
  PRIMARY KEY (user_id, role_id)
)

This is theoretically correct.
But theory ≠ production reality.

Why composite PKs hurt in practice

1️⃣ Identity is not stable

A primary key should represent row identity.

With a composite PK:

(user_id, role_id) = identity

But what if:

  • Roles are merged
  • Role IDs change
  • You add versioning
  • You add validity periods

Now the “identity” of the row changes.

Primary keys should not change.

2️⃣ Foreign keys become painful

The moment someone wants to reference this table:

audit_log (
  user_id BIGINT,
  role_id BIGINT,
  ...
)

You now need:

FOREIGN KEY (user_id, role_id)
REFERENCES user_roles(user_id, role_id)

Problems:

  • Verbose
  • Easy to break
  • Harder to index
  • Ugly in joins

Single-column FKs scale better organizationally.

3️⃣ ORMs hate composite primary keys (huge!)

In JPA/Hibernate:

  • @EmbeddedId
  • @IdClass
  • Broken equals/hashCode
  • Painful caching
  • Awkward repositories

Example smell:

UserRoleId(userId, roleId)

Senior teams avoid this unless absolutely necessary.

4️⃣ Harder to evolve the table

Sooner or later someone asks:

  • “Can we add assigned_at?”
  • “Can we soft-delete roles?”
  • “Can we add metadata?”

Now you want:

  • Multiple rows per (user_id, role_id)
  • History
  • Flags

With composite PK:
❌ Impossible without redesign

With surrogate PK:
✔ Easy extension

5️⃣ Composite PK ≠ business rule clarity

The real business rule is:

A user cannot have the same role twice

That is a uniqueness rule, not an identity rule.

So this is conceptually cleaner:

id BIGSERIAL PRIMARY KEY,
UNIQUE (user_id, role_id)

Identity ≠ constraint.

6️⃣ Indexing & performance nuance

Composite PK creates:

PRIMARY KEY (user_id, role_id)

That index:

  • Is ordered by user_id first
  • May not fit your query patterns
  • Forces additional indexes anyway

So you often end up with:

  • Composite PK index
  • Extra indexes

No real win.

The preferred production pattern ✅

user_roles (
  id BIGSERIAL PRIMARY KEY,      -- surrogate identity
  user_id BIGINT NOT NULL,
  role_id BIGINT NOT NULL,
  UNIQUE (user_id, role_id)      -- business rule
)

Benefits:

  • Stable identity
  • Simple foreign keys
  • ORM-friendly
  • Easy to extend
  • Business correctness enforced

This is not junior laziness — it’s deliberate.


When composite PK is acceptable (be honest)

Composite primary keys are OK when:

  • Table is purely associative
  • No other table will reference it
  • No extra attributes ever
  • No ORM (or you’re brave)

Example:

  • Pure link table in a reporting schema

Even then — many teams still avoid it.


Interview traps ❌

  • “Composite keys are always bad” ❌
  • “Surrogate keys are always better” ❌
  • “ORMs can handle it fine” ❌

Correct stance:

Composite keys are relationally valid, but often operationally expensive.


One-sentence senior answer (perfect)

Composite keys in many-to-many tables couple identity to business columns, complicate foreign keys and ORM usage, and make future evolution harder, so in production we usually prefer a surrogate key with a composite UNIQUE constraint.

Posted in Без рубрики | Comments Off on SQL.Why is it bad to use composite key on many to many table ?

SQL.What pros with having surrogate key and unique business fields. Give examples ?

Pros of having surrogate key + unique business fields

This pattern separates technical identity from business rules — which is the core reason it scales in real systems.


1️⃣ Stable identity + flexible business rules

Why it matters

Business data changes. Primary keys should not.

Example

users (
  id BIGSERIAL PRIMARY KEY,   -- surrogate key
  email TEXT NOT NULL,
  UNIQUE (email)
)

Scenario:

  • User changes email
  • email is updated
  • No foreign keys break
  • id remains stable

If email were the PK:
❌ FK cascade pain
❌ Cache invalidation chaos
❌ API contracts break

2️⃣ Strong data integrity (DB-enforced)

Why it matters

Only the database can guarantee global correctness.

Example

UNIQUE (email)

Two concurrent requests try to create the same user:

  • App-level check → race condition
  • DB constraint → atomic guarantee

This is critical in:

  • Distributed systems
  • Microservices
  • Multi-node deployments

This is critical in:

  • Distributed systems
  • Microservices
  • Multi-node deployments

3️⃣ Better performance & smaller indexes

Why it matters

Surrogate keys are:

  • Small
  • Fixed-size
  • Cache-friendly

Example

orders.user_id → users.id

vs

orders.user_email → users.email

Benefits:

  • Faster joins
  • Smaller B-tree indexes
  • Less memory & IO

This matters at scale.

4️⃣ Cleaner foreign keys & schema evolution

Why it matters

Foreign keys define coupling.

Example

orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT REFERENCES users(id)
)

If business rules change:

  • Email format changes
  • Multiple emails per user
  • Email becomes nullable

➡️ No FK refactor required.

5️⃣ ORM & tooling friendliness

Why it matters

Most ORMs expect:

  • Single-column PK
  • Immutable identity

Example (JPA mindset)

@Entity
class User {
  @Id
  Long id;

  @Column(unique = true)
  String email;
}

Composite or natural PKs:

  • Complicate mappings
  • Break lazy loading
  • Hurt caching

6️⃣ Clear separation of concerns (conceptual clarity)

LayerResponsibility
Surrogate keyTechnical identity
UNIQUE fieldsBusiness rules
FK relationsData consistency

This makes schemas:

  • Easier to reason about
  • Easier to review
  • Easier to evolve

Senior engineers think this way.

7️⃣ Easier debugging & auditing

Example

User(id=91238123, email=alex@example.com)

IDs are immutable

Business data can change

Logs stay consistent

Audits stay meaningful

Real-world examples (very interview-friendly)

Example 1: Users

id (PK)        -- surrogate
email (UNIQUE) -- natural key

Example 2: Orders

id (PK)
order_number (UNIQUE)  -- shown to customers

Example 3: Accounts

id (PK)
iban (UNIQUE)

Example 4: Products

id (PK)
sku (UNIQUE)

When this pattern is NOT ideal (be honest)

  • Truly immutable natural keys
    • ISO country codes
    • Currency codes

Even then — teams often still use surrogate IDs for consistency.


One-sentence interview answer (perfect)

Using a surrogate key with unique business fields gives us a stable technical identity while still enforcing business-level uniqueness and correctness at the database level.

What interviewers hear when you say this ✅

“This person has designed real production schemas.”

Posted in Без рубрики | Comments Off on SQL.What pros with having surrogate key and unique business fields. Give examples ?

SQL.Why do we need enforce naural keys when we have surrogate ones ?

Short answer (interview-ready)

Surrogate keys provide technical identity, but natural keys enforce business correctness.
Without enforcing natural keys, a relational database cannot protect the domain from invalid or duplicate business data.

That’s the essence. Now let’s break it down.


What surrogate keys actually guarantee (and what they don’t)

A surrogate key (id) guarantees only:

  • Each row is technically unique
  • Rows can be referenced efficiently

It does NOT guarantee:

  • One user per email
  • One order per business number
  • One country per ISO code

Example:

users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT
)

This table allows:

(id=1, email=a@test.com)
(id=2, email=a@test.com)  ❌

From the database point of view — everything is valid.

From the business point of view — data is corrupted.


Why natural keys must still be enforced

1️⃣ Business rules live in the database

Natural keys represent real-world uniqueness:

  • Email
  • Passport number
  • Account number
  • Order number

If the DB does not enforce them:

  • You rely on application code
  • You risk bugs, race conditions, partial failures
UNIQUE (email)

➡️ DB becomes the final authority, not your code.


2️⃣ Concurrency safety (huge interview point)

Two concurrent requests:

Request A: check email exists → false
Request B: check email exists → false

Both insert.

Without a DB constraint:
❌ Duplicate data

With UNIQUE (email):
✔️ One fails atomically

No amount of Java synchronized fixes this across nodes.

3️⃣ Defense in depth

Production reality:

  • Multiple services
  • Migrations
  • Manual scripts
  • Hotfixes
  • Data imports

Only DB constraints protect against:

  • Human error
  • Script mistakes
  • Partial deployments

4️⃣ Surrogate keys hide data problems

Surrogate keys make duplicates invisible:

Two users with same email → different IDs → looks fine

This leads to:

  • Broken reports
  • Incorrect billing
  • Ghost users

Enforcing natural keys prevents silent corruption.

5️⃣ Easier reasoning & debugging

With enforced natural keys:

  • Errors surface early
  • Failures are explicit
  • Logs show constraint violations

Without them:

  • Bugs surface months later
  • Data cleanup becomes painful
  • Trust in data is lost

Correct real-world pattern (gold standard)

users (
  id BIGSERIAL PRIMARY KEY,      -- surrogate key
  email TEXT NOT NULL,
  UNIQUE (email)                 -- natural key enforced
)

This gives you:

  • Stable technical identity
  • Guaranteed business correctness

Both are needed.

Interview traps ❌

  • “IDs are enough” ❌
  • “We enforce it in code” ❌
  • “We don’t need UNIQUE constraints” ❌
  • “Surrogate keys prevent duplicates” ❌

All of these are red flags at senior interviews.

One-sentence senior-level answer (perfect)

Surrogate keys identify rows technically, but natural keys must still be enforced to guarantee business-level uniqueness, data integrity, and concurrency safety.


Bonus follow-up (very common)

“What happens if business rules change?”

Answer:

You change or migrate the natural constraint, not the surrogate key — which remains stable.

Posted in Без рубрики | Comments Off on SQL.Why do we need enforce naural keys when we have surrogate ones ?

SQL.What is surrogate key and why is it called so ?

What is a Surrogate Key?

Definition

A surrogate key is an artificial, system-generated identifier used as a primary key that has no business meaning.

Typical forms:

  • Auto-increment integer (SERIAL, IDENTITY)
  • Sequence-based BIGINT
  • UUID

Example:

users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT UNIQUE NOT NULL
)

Why is it called surrogate?

Surrogate literally means:

a substitute or replacement

In databases:

  • It replaces a natural/business key
  • It stands in for real-world identity
  • It exists only for technical purposes

You are saying:

“I don’t want my business data to be the identity of this row — I’ll use a surrogate instead.”

That’s the core idea.


Surrogate Key vs Natural Key

AspectSurrogate KeyNatural Key
OriginSystem-generatedBusiness data
MeaningNoneReal-world meaning
StabilityStableCan change
SizeSmall & fixedOften large
PK changesNeverPossible
ORM-friendlyYesOften painful

Why do we use surrogate keys? (important)

1️⃣ Stability

Business data changes:

  • Email can change
  • Passport can be reissued
  • Phone number changes

Changing a primary key is expensive and dangerous.

2️⃣ Performance

  • Small integer keys
  • Smaller indexes
  • Faster joins

Compare:

JOIN ON user_id BIGINT
vs
JOIN ON email TEXT

No contest.

3️⃣ Simpler foreign keys

orders.user_id → users.id

vs

orders.user_email → users.email

Less coupling, less pain.


4️⃣ ORM & microservices friendliness

  • JPA/Hibernate expect stable single-column PKs
  • IDs are easier to pass between services
  • Easier caching and serialization

Typical usage pattern (best practice)

users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL,
  UNIQUE (email)
)

This gives you:

  • Technical identity (surrogate key)
  • Business uniqueness (natural key enforced)

This is the most common real-world design.


Interview traps ❌

  • “Surrogate key is faster” (too shallow)
  • Using surrogate keys without business constraints
  • Thinking surrogate = UUID only
  • Forgetting to enforce natural uniqueness

One-sentence interview answer (perfect)

A surrogate key is a system-generated identifier with no business meaning that substitutes a natural key to provide a stable, efficient primary key.


Bonus follow-up interview question 🚨

“When would you avoid surrogate keys?”

Strong answer:

When the natural key is short, stable, and truly immutable, such as ISO country codes.

Posted in Без рубрики | Comments Off on SQL.What is surrogate key and why is it called so ?