This is a schema design / data integrity question. Interviewers ask it to see whether you push correctness into the database, not only into application code.
Domain types and constraints restrict what values a column can hold, encoding business rules directly in the schema so invalid data cannot be stored, regardless of the application.
A domain is a restricted version of a base data type.
Think of it as:
“Not just an INT — an INT with meaning and rules.”
Instead of repeating rules everywhere, you define them once and reuse them.
Rules applied per column:

```sql
amount DECIMAL(10,2) NOT NULL CHECK (amount >= 0)
```
Rules applied to a reusable type:

```sql
CREATE DOMAIN positive_amount AS DECIMAL(10,2) CHECK (VALUE >= 0);
```
Then used like:

```sql
amount positive_amount NOT NULL
```
Same rule, centralized.
NOT NULL

```sql
email TEXT NOT NULL
```
Prevents missing mandatory data.
CHECK
```sql
CHECK (age >= 18)
```
Encodes business rules.
UNIQUE
```sql
UNIQUE (email)
```
Prevents duplicates.
FOREIGN KEY

```sql
FOREIGN KEY (user_id) REFERENCES users(id)
```
Enforces referential integrity.
```sql
CHECK (status IN ('NEW', 'PAID', 'CANCELLED'))
```
Without schema-level constraints, rules are duplicated across every service that touches the data. Eventually one of them drifts or is skipped.
The database becomes the last line of defense.
Example 1: Email

```sql
email TEXT CHECK (position('@' in email) > 1)
```
(Not perfect, but enforces basic sanity.)
Example 2: Money
```sql
CREATE DOMAIN money_amount AS BIGINT CHECK (VALUE >= 0);
```
Stores cents, guarantees non-negative.
Example 3: Order status

```sql
CREATE DOMAIN order_status AS TEXT CHECK (VALUE IN ('NEW', 'PAID', 'CANCELLED'));
```
No accidental "new" or "PAYED".
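Putting the pieces together, a sketch of how these domains might be used in a table (the table and column names are illustrative, and a `users` table is assumed to exist):

```sql
CREATE TABLE orders (
    id      BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    amount  money_amount NOT NULL,              -- BIGINT cents, guaranteed >= 0
    status  order_status NOT NULL DEFAULT 'NEW' -- only 'NEW' / 'PAID' / 'CANCELLED'
);
```

Every insert or update now passes through the domain checks, no matter which application wrote it.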
Sooner or later, something bypasses application-level validation.
⚠️ Constraints can add friction: writes fail loudly instead of storing bad data.
Senior answer:
Correctness > convenience.
“Domain types and constraints restrict allowed values for columns, encoding business rules directly into the database schema.
They prevent invalid data from being stored regardless of how the database is accessed, improving correctness, consistency, and long-term maintainability.
They move validation from application code into the data model itself.”
If a candidate says validation belongs only in application code:
👉 That’s not senior-level thinking.
This is a semantic + performance trap question. Interviewers ask it to see whether you understand implicit casting, three-valued logic, and index usage.
SQL implicitly casts values to make comparisons possible, but this can change semantics and disable indexes, leading to wrong results and full table scans.
That’s why comparing different data types is dangerous and should be avoided.
When you compare different data types, SQL tries to coerce one side to the other:
WHERE user_id = '42'
If user_id is INTEGER, the engine rewrites it (conceptually) as:
WHERE user_id = CAST('42' AS INTEGER)
This is implicit type casting.
Which side gets cast depends on the database engine and its type-precedence rules.
WHERE amount > '100'
Possible outcomes depend on which side is cast. If the column is cast to text, the comparison becomes lexical:
'20' > '100' -- TRUE (string comparison!)
That’s silent data corruption.
🔴 Example 2: invalid values
WHERE user_id = 'abc'
Some DBs throw an error
Others silently fail or filter everything
Behavior may change after upgrades
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT *
FROM orders
WHERE CAST(user_id AS TEXT) = '42';
Plan:
Seq Scan on orders
❌ Index cannot be used because the column is wrapped in a function.
Indexes work only when the column appears “as-is” in the predicate.
Safe:
WHERE user_id = 42
Unsafe:
WHERE CAST(user_id AS TEXT) = '42'
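When the text form of a column genuinely must be searched, one option in PostgreSQL is an expression index, which matches a predicate that casts the column (a sketch; the `orders` table and index name are illustrative):

```sql
-- Expression index on the casted column (note the double parentheses)
CREATE INDEX idx_orders_user_id_text ON orders ((CAST(user_id AS TEXT)));

-- This predicate can now use the expression index instead of a seq scan:
SELECT * FROM orders WHERE CAST(user_id AS TEXT) = '42';
```

Prefer fixing the types; the expression index is a workaround, not a cure.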
JOIN users u ON o.user_id = u.external_id
If:
user_id is INTEGER
external_id is VARCHAR
then one side must be cast for every row, the index on the cast side becomes unusable, and the join degrades to scans.
This turns OLTP joins into OLAP-style work.
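The durable fix is aligning the column types rather than casting in every query. A sketch in PostgreSQL syntax (names are illustrative, and this assumes every existing external_id value is numeric):

```sql
-- USING tells PostgreSQL how to convert the existing values
ALTER TABLE users
    ALTER COLUMN external_id TYPE INTEGER
    USING external_id::integer;
```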
WHERE user_id = NULL
This predicate evaluates to UNKNOWN, so it matches no rows. Combined with casts, this becomes extremely hard to debug.
❌ Avoid INT ↔ VARCHAR joins
✅ Cast parameters, not columns:
WHERE user_id = CAST(? AS INTEGER)
❌ Never:
WHERE CAST(user_id AS TEXT) = ?
✅ Use typed literals
DATE '2026-01-28'
TIMESTAMP '2026-01-28 10:00:00'
Errors > silent coercion.
“SQL allows comparisons between different data types by applying implicit casts, but this is dangerous because it can change comparison semantics and prevent index usage.
When a cast is applied to a column, indexes often become unusable, leading to full scans and poor performance.
To avoid this, schemas and queries must use consistent data types and apply casts only to constants, not columns.”
`DATE` stores a calendar day, `TIME` stores a time of day, `TIMESTAMP` stores a date and time without timezone, and `TIMESTAMPTZ` stores an absolute moment in time.
For business events, `TIMESTAMPTZ` should almost always be used.
DATE
DATE '2026-01-28'
❌ Events with ordering or exact time
TIME
TIME '14:30:00'
❌ Events
❌ Logging
❌ Auditing
TIMESTAMP (without timezone)
TIMESTAMP '2026-01-28 14:30:00'
Example:
"2026-03-29 02:30" -- may not exist in some timezones
TIMESTAMPTZ (timestamp with time zone)
An absolute point in time stored internally in UTC.
TIMESTAMPTZ '2026-01-28 14:30:00+03'
Stored as:
2026-01-28 11:30:00 UTC
✅ Business events
✅ Logs
✅ Audits
✅ Payments
✅ Distributed systems
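One way to see the “absolute instant” property: only the display changes with the session timezone, never the stored value (a PostgreSQL sketch):

```sql
SET TIME ZONE 'UTC';
SELECT TIMESTAMPTZ '2026-01-28 14:30:00+03';
-- displayed as 2026-01-28 11:30:00+00

SET TIME ZONE 'Europe/Berlin';
SELECT TIMESTAMPTZ '2026-01-28 14:30:00+03';
-- the same instant, displayed as 2026-01-28 12:30:00+01
```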
| Type | Date | Time | Timezone | Meaning |
|---|---|---|---|---|
| DATE | ✅ | ❌ | ❌ | Calendar day |
| TIME | ❌ | ✅ | ❌ | Time of day |
| TIMESTAMP | ✅ | ✅ | ❌ | Local datetime |
| TIMESTAMPTZ | ✅ | ✅ | ✅ | Absolute instant |
Use TIMESTAMPTZ.
Why: it stores an unambiguous instant, independent of server, client, and DST settings.
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
If you need a local calendar day, still use TIMESTAMPTZ and derive it:
DATE(created_at AT TIME ZONE 'Europe/Berlin')
Never store the derived value as truth.
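For example, a daily report can derive the local day at query time instead of persisting it (a sketch; the `orders` table and `created_at` column are illustrative):

```sql
SELECT DATE(created_at AT TIME ZONE 'Europe/Berlin') AS local_day,
       COUNT(*) AS orders_count
FROM orders
GROUP BY local_day
ORDER BY local_day;
```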
“`DATE` and `TIME` store partial temporal information, `TIMESTAMP` stores a local date-time without timezone, and `TIMESTAMPTZ` represents an absolute instant normalized to UTC.
For business events and distributed systems, `TIMESTAMPTZ` is the correct choice because it avoids ambiguity and timezone-related bugs.”
If someone says “we only operate in one timezone, plain TIMESTAMP is fine” — that assumption rarely survives growth.
`INTEGER` and `BIGINT` are exact whole numbers, `DECIMAL` is exact for fractional values, and `FLOAT` is approximate and should never be used for money.
The choice affects correctness, not just storage.
INTEGER / BIGINT — exact whole numbers
INTEGER -- usually 32-bit
BIGINT  -- usually 64-bit
INTEGER: ~ ±2 billion
BIGINT: ~ ±9 quintillion
✅ IDs, counters, quantities
✅ Status codes
✅ Amounts in minor units (cents)
balance_cents BIGINT
If money has no fractions → integers are perfect.
DECIMAL(p, s) / NUMERIC — exact decimals
DECIMAL(10, 2)
p = total digits
s = digits after decimal point
✅ Money
✅ Financial calculations
✅ Rates, percentages, measurements where correctness matters
price DECIMAL(10,2)
Why it matters
0.1 + 0.2 = 0.3 -- TRUE with DECIMAL
FLOAT / REAL / DOUBLE — approximate numbers 🚨
REAL
DOUBLE PRECISION
The classic trap
0.1 + 0.2 = 0.30000000000000004
✅ Scientific data
✅ Metrics
✅ Sensor data
✅ ML / statistics
❌ Money
Never use FLOAT for financial data.
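The difference is easy to demonstrate in PostgreSQL; casting the literals forces approximate float arithmetic versus exact numeric arithmetic:

```sql
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;    -- false: float math is approximate
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric; -- true: numeric math is exact
```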
4️⃣ Precision comparison (must remember)
| Type | Exact? | Fractional? | Typical use |
|---|---|---|---|
| INTEGER | ✅ | ❌ | IDs, counters |
| BIGINT | ✅ | ❌ | Large counters, money in cents |
| DECIMAL | ✅ | ✅ | Money, finance |
| FLOAT | ❌ | ✅ | Scientific, analytics |
INTEGER / BIGINT → fastest, exact
DECIMAL → slower, exact
FLOAT → fastest for math, inexact
Senior mindset:
Correctness first, performance second.
❌ Using FLOAT for prices
❌ Mixing DECIMAL and FLOAT in calculations
❌ Using DECIMAL without defining scale
❌ Storing money as DOUBLE “because Java double”
NUMERIC == DECIMAL
“`INTEGER` and `BIGINT` store exact whole numbers and are ideal for IDs and counters. `DECIMAL` stores exact fractional values and is the correct choice for money and financial data. `FLOAT` uses approximate representation and should be used only where small rounding errors are acceptable, such as scientific or statistical data.”
Implicit type casting can change query semantics, hide bugs, and—most importantly—prevent index usage by forcing the database to apply functions to indexed columns, leading to full scans instead of index scans.
Implicit casting happens when SQL automatically converts one type to another so a comparison can be made.
-- column is INTEGER
WHERE user_id = '42' -- string literal
The DB silently inserts a cast.
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT *
FROM orders
WHERE user_id = '42';
What really happens (simplified):
WHERE user_id = CAST('42' AS INTEGER)
✅ Index can still be used in some DBs.
But this one is deadly:
SELECT *
FROM orders
WHERE CAST(user_id AS TEXT) = '42';
Execution plan:
Seq Scan on orders
❌ Index is not usable because the column is wrapped in a function.
If the indexed column is on the left side of a function or cast, the index cannot be used.
❌ Comparing DATE to TIMESTAMP
WHERE created_at = '2024-01-01'
Becomes:
WHERE created_at = TIMESTAMP '2024-01-01 00:00:00'
Almost never matches what you expect.
Better:
WHERE created_at >= DATE '2024-01-01'
AND created_at < DATE '2024-01-02'
WHERE amount = '100'
Works
But hides schema bugs
Can break if locale or format changes
JOIN users u ON o.user_id = u.external_id
Where:
user_id is INT
external_id is VARCHAR
Result: one side is cast on every row and the join loses index access.
WHERE price > '100'
Depending on the DB, the comparison may become lexical (where '100' < '20'!).
This is data corruption waiting to happen.
Implicit casting can silently change semantics and disable index usage. So a tiny type mismatch can turn a millisecond OLTP query into a seconds-long query.
WHERE user_id = 42
DATE '2024-01-01'
TIMESTAMP '2024-01-01 10:00:00'
❌ Avoid INT ↔ VARCHAR joins
✅ Cast parameters, not columns:
WHERE user_id = CAST(? AS INTEGER)
❌ Never:
WHERE CAST(user_id AS TEXT) = ?
“Implicit type casting can hide bugs and cause serious performance issues by preventing index usage.
When a cast or function is applied to an indexed column, the optimizer can’t use the index, often resulting in full scans or suboptimal join algorithms.
To avoid this, schemas and comparisons must use consistent data types and casts should be applied to constants, not columns.”
If someone says “the database will just convert it”:
👉 That’s mid-level thinking.
`NULL` means “unknown or not applicable”, while `0`, empty string (`''`), and `FALSE` are known, concrete values. `NULL` propagates through expressions and comparisons, whereas the others behave like normal values.
NULL → absence of a value; comparing it yields UNKNOWN
0, '' (empty string), FALSE → concrete, known values

NULL = NULL → UNKNOWN
NULL <> NULL → UNKNOWN
NULL = 0 → UNKNOWN
NULL = '' → UNKNOWN
but
0 = 0 → TRUE
'' = '' → TRUE
FALSE = FALSE → TRUE
Consequence
WHERE column = NULL -- always UNKNOWN, matches nothing
Correct:
WHERE column IS NULL
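When a NULL-safe equality check is genuinely needed, the standard IS [NOT] DISTINCT FROM predicate (supported by PostgreSQL) treats two NULLs as equal:

```sql
SELECT NULL = NULL;                     -- NULL (UNKNOWN)
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
SELECT 1 IS DISTINCT FROM NULL;         -- true
```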
Logical expressions (three-valued logic)
TRUE AND NULL → UNKNOWN
FALSE AND NULL → FALSE
TRUE OR NULL → TRUE
FALSE OR NULL → UNKNOWN
NOT NULL → UNKNOWN
WHERE keeps only TRUE, so UNKNOWN rows are filtered out.
COUNT(column) -- ignores NULL
COUNT(*) -- counts rows
SUM(column) -- ignores NULL
But:
AVG(NULL) → NULL
This surprises many candidates.
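A small sketch makes the aggregate behavior concrete (the `payments` table is hypothetical, holding amounts 100, NULL, 300):

```sql
SELECT COUNT(*)      FROM payments;  -- 3: counts rows
SELECT COUNT(amount) FROM payments;  -- 2: NULL ignored
SELECT SUM(amount)   FROM payments;  -- 400: NULL ignored
SELECT AVG(amount)   FROM payments;  -- 200: divides by 2, not 3
```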
UNIQUE allows multiple NULLs (in most DBs)
NOT NULL enforces presence
CHECK (col > 0) → passes if col is NULL (a CHECK is violated only when it evaluates to FALSE, and UNKNOWN is not FALSE)
balance INT DEFAULT 0
Does 0 mean “the balance is zero” or “the balance is unknown”?
Correct modeling: use NULL for “unknown” and 0 only for a genuine zero.
“`NULL` represents the absence of a value, not a value itself, while `0`, empty string, and `FALSE` are concrete values.
Comparisons with `NULL` evaluate to `UNKNOWN` due to SQL’s three-valued logic, which is why `IS NULL` must be used instead of `=`.
Misusing `NULL` leads to subtle filtering and aggregation bugs.”
If a candidate says NULL and 0 are interchangeable, that’s a red flag.
`CHAR` is fixed-length and space-padded, `VARCHAR` is variable-length with a limit, and `TEXT` is variable-length without a defined limit.
In practice, `VARCHAR` is the default choice, `CHAR` is for truly fixed-size values, and `TEXT` is for large or unbounded content.
CHAR(n)
Always stores exactly n characters:
CHAR(10)
'abc' → 'abc       '
Suitable for truly fixed-size codes (e.g. country codes in CHAR(2)).
✅ Predictable size
❌ Wastes space
❌ Easy to misuse
`CHAR` rarely improves performance in modern DBs.
VARCHAR(n)
VARCHAR(255)
'abc' → 'abc'
✅ No wasted space
✅ Length constraint protects data
❌ Slight length overhead (negligible)
Default choice for application data.
TEXT
✅ No artificial limit
❌ Harder to reason about size
❌ Sometimes discouraged in strict schemas
`TEXT` and `VARCHAR` have similar performance in PostgreSQL.
| Type | Fixed | Max length | Padding | Typical use |
|---|---|---|---|---|
| CHAR(n) | ✅ | ✅ | ✅ | Country codes, legacy |
| VARCHAR(n) | ❌ | ✅ | ❌ | Most app fields |
| TEXT | ❌ | ❌ | ❌ | Large free text |
PostgreSQL treats TEXT and VARCHAR almost identically.
Bad myth:
“`VARCHAR(255)` is faster than `TEXT`”
❌ False in Postgres.
❌ Don’t use CHAR(n) for names or emails
❌ Don’t use TEXT when business rules require a max length
❌ Don’t pick VARCHAR(255) blindly “because everyone does”
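When a maximum length is a real business rule, one common pattern keeps TEXT but enforces the rule explicitly (a sketch; the 320 limit is illustrative):

```sql
email TEXT NOT NULL CHECK (char_length(email) <= 320)
```

Unlike a silent VARCHAR(255), the CHECK documents the business rule and can be changed without rewriting the column type.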
“`CHAR` is fixed-length and space-padded, suitable only for truly fixed-size values. `VARCHAR` is variable-length with a maximum and is the default choice for most application fields. `TEXT` has no declared limit and is used for large or unbounded content; in PostgreSQL it performs similarly to `VARCHAR`.”
In well-designed schemas, the chosen string type reflects a real business rule, not habit.
MVCC is a concurrency control mechanism where the database keeps multiple versions of a row, allowing readers and writers to proceed without blocking each other.
To avoid read locks blocking writes (and vice versa) while preserving transactional isolation.
So:
No read locks.
T1: SELECT balance FROM accounts; → sees version V1
T2: UPDATE accounts SET balance=90; → creates version V2
T1: still sees V1
T3: sees V2
✅ Readers don’t block writers
✅ Writers don’t block readers
✅ High concurrency (OLTP-friendly)
✅ Consistent reads inside a transaction
❌ Dead tuples accumulate
❌ Requires VACUUM / AutoVacuum
❌ Long-running transactions prevent cleanup
❌ Storage bloat if mismanaged
This is why autovacuum is critical.
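Dead-tuple buildup can be observed directly through PostgreSQL’s statistics views (a monitoring sketch):

```sql
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```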
READ COMMITTED → snapshot per statement
REPEATABLE READ → snapshot per transaction
SERIALIZABLE → MVCC + conflict detection

MVCC allows multiple versions of rows so readers and writers don’t block each other, using transaction snapshots to decide which version is visible.
Autovacuum is PostgreSQL’s background process that cleans up dead tuples created by MVCC and refreshes statistics, so tables don’t bloat and the optimizer stays accurate.
Postgres uses MVCC, so updates/deletes create dead rows; autovacuum removes them and updates stats automatically.
Autovacuum triggers per table when changes exceed thresholds:
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * table_size
autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * table_size
(Defaults work for many cases; hot tables often need tuning.)
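For a hot table, the per-table knobs can be tightened so vacuum runs after roughly 1% of rows change instead of the default ~20% (a sketch; `orders` and the exact numbers are illustrative):

```sql
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);
```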
Autovacuum is PostgreSQL’s background worker that reclaims space from dead rows and keeps optimizer statistics fresh to maintain performance and prevent bloat.