Database.Middle.What are surroget keys ?

Surrogate keys are artificial, system-generated keys used to uniquely identify rows in a database table. They are not derived from application data and have no business meaning — typically auto-increment integers or UUIDs.


🧠 Definition

A surrogate key is a standalone identifier created solely for the purpose of being a primary key, usually implemented as an ID column.


✅ Example

Table: users

id (surrogate key)emailusername
1john@example.comjohnny
2alice@example.comalice88

Here, id is a surrogate key — it’s:

  • Unique
  • System-generated
  • Has no meaning to the user

🆚 Surrogate Key vs Natural Key

FeatureSurrogate KeyNatural Key
Based on data?❌ No (synthetic)✅ Yes (e.g. email, SSN, VIN)
Stability✅ Very stable❌ Can change over time
Performance✅ Fast joins/indexes❌ Slower if long or complex
Human meaning❌ None✅ Usually has semantic meaning
Exampleuser_id SERIALemail, SSN, username

🧰 Why Use Surrogate Keys?

  • Immutable: won’t change even if business data does
  • Uniform type: predictable size (e.g., INT, UUID)
  • Simpler relationships: easier to define foreign keys
  • Avoid composite keys: simplifies joins and indexing

⚠️ When Not to Use

  • When the natural key is compact and immutable (e.g., ISO country codes, VINs, UUIDs)
  • When you want to prevent duplicates naturally with meaningful columns

🧾 Example in SQL

CREATE TABLE users (
  id SERIAL PRIMARY KEY,      -- surrogate key
  email VARCHAR(100) UNIQUE,  -- natural candidate key
  name TEXT
);

Or with UUID:

CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id INT,
  total_amount NUMERIC
);

🧠 Summary

TermMeaning
Surrogate KeyArtificial primary key with no business meaning
PurposeStable, consistent, unique row identifier
Common FormsSERIAL, BIGSERIAL, UUID
Best Use CaseGeneral-purpose primary key in most tables
Posted in Без рубрики | Leave a comment

Database.Middle.Explain the difference between OLTP and OLAP.

OLTP and OLAP are two fundamental types of database systems, each optimized for different workloads and purposes.


🧾 Definitions

TermStands ForPurpose
OLTPOnline Transaction ProcessingFast, real-time operations (e.g. purchases, signups)
OLAPOnline Analytical ProcessingComplex analysis, reports, trends

🔍 Core Differences

FeatureOLTPOLAP
📌 PurposeHandle day-to-day operationsPerform analytical queries and reporting
🧑‍💻 UsersFront-end apps, customers, clerksData analysts, decision-makers
💾 DataCurrent, operational dataHistorical, aggregated data
🧠 QueriesSimple and frequent (CRUD)Complex, long-running (GROUP BY, JOIN, etc.)
⚡ Speed FocusFast reads/writes for single rowsHigh-throughput reads across many rows
📊 ExamplesBanking system, e-commerce checkoutSales forecasting, business dashboards
🛠 Schema DesignNormalized (3NF) to reduce redundancyDenormalized (star/snowflake) for speed
🧪 TransactionsShort, ACID-compliantRead-heavy, fewer updates
🔁 Data UpdatesConstant inserts/updatesPeriodic bulk loads (ETL)

🏦 Example Systems

OLTP SystemOLAP System
Banking apps, ticketing, e-commerceData warehouses, BI dashboards
PostgreSQL, MySQL, SQL Server (OLTP mode)Amazon Redshift, Snowflake, BigQuery

🔧 Example Use Case

OLTP:

-- User places an order
INSERT INTO orders (user_id, product_id, quantity) VALUES (123, 45, 1);

OLAP:

-- Monthly sales by region
SELECT region, SUM(amount)
FROM sales
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region;

🧠 Summary Table

ConceptOLTPOLAP
FocusSpeed + consistency for transactionsSpeed + depth for analysis
Data modelNormalizedDenormalized
Query typeShort + frequentComplex + infrequent
Typical DBPostgreSQL, MySQLSnowflake, Redshift, BigQuery
Best forApps, transactionsReports, trends, dashboards
Posted in Без рубрики | Leave a comment

Database.Middle.Partitions and sharding, differenes

Partitioning and sharding are both techniques for splitting large datasets into smaller parts — but they solve different problems and work at different levels.

Here’s a clear breakdown:


⚖️ Partitioning vs Sharding: Core Differences

FeaturePartitioningSharding
📍 LevelWithin a single database/serverAcross multiple databases/servers
🔧 Managed byDatabase engine internally (e.g., PostgreSQL)Application logic or sharding middleware
🎯 GoalImprove performance, manageabilityAchieve horizontal scalability and distribution
🔄 Transparent?Yes — user queries one logical tableNo — app often needs to know which shard to use
🗃️ Data locationAll data is on the same database instanceData is split across many machines
🔗 Joins/aggregatesEasy (same DB instance)Hard (needs cross-shard coordination)

📦 What Is Partitioning?

Partitioning divides a single large table into smaller, logical pieces called partitions, but all remain part of the same table and database.

Example:

-- sales table partitioned by year
sales_2022
sales_2023
sales_2024
  • You query sales, and the DB decides which partitions to access
  • Often used for time-series, log data, etc.

🌐 What Is Sharding?

Sharding splits the entire dataset across multiple databases or servers (called shards), each storing only a part of the data.

Example:

  • Shard 1 → users with ID 1–1M
  • Shard 2 → users with ID 1M–2M
  • Shard 3 → users with ID 2M–3M
  • The application or a middleware decides which shard to query
  • Common in big web apps (e.g., Twitter, Instagram)

🧠 Analogy

ConceptAnalogy
PartitioningLike cutting a pizza into slices 🍕
ShardingLike putting each slice on a different plate, in different rooms 🧩

🧰 When to Use Each

ScenarioUse This
Single-node performance tuningPartitioning
Handling billions of users/recordsSharding
Simplify index and query managementPartitioning
Scale beyond one serverSharding

🧠 Summary Table

AspectPartitioningSharding
Physical locationSame serverDifferent servers/databases
Managed byDB engineApplication or middleware
GoalQuery performance & maintainabilityScalability & high availability
Use caseTime-series, logs, big tablesMulti-tenant, global-scale apps
Posted in Без рубрики | Leave a comment

Database.Middle.do i see different tables in some sql explorer like dataGrip, for example ?

🔍 In DataGrip (or pgAdmin, MySQL Workbench, etc.)

When you’re using partitioning, you typically see:

  • The main table (logical parent)
  • Its child tables (actual partitions)

Example (PostgreSQL):

In your table list, you’ll see:

📁 Tables
   ├── sales          -- 👈 Parent table
   ├── sales_2023_q1  -- 👶 Partition
   ├── sales_2023_q2  -- 👶 Partition
  • You query the parent table (sales)
  • The data is stored in the partition tables (sales_2023_q1, etc.)
  • Each partition can be opened individually, showing its subset of data

🧠 What’s Actually Happening

  • The parent table is empty by itself (in most partitioning implementations)
  • When you SELECT * FROM sales, the DB internally runs:
SELECT * FROM sales_2023_q1
UNION ALL
SELECT * FROM sales_2023_q2
...

If partition pruning applies, only the relevant partition(s) are queried.

🧰 Database-Specific Behavior

DatabasePartition Table Appearance in Tools
PostgreSQLShows parent + child tables (visible)
MySQL (InnoDB)Shows only 1 logical table (partitions hidden)
OraclePartitions may not show as separate tables
SQL ServerPartitioning done via partition schemes (not separate tables)

✅ Summary

What You See in DataGripNotes
Parent tableThe table you query directly
Child/partition tables (PostgreSQL)Can be viewed, queried directly
One unified table (MySQL)Partitioning happens under the hood
Posted in Без рубрики | Leave a comment

Database.Middle.How does database partitioning work?

Database partitioning is a technique used to split a large table into smaller, more manageable pieces, called partitions, while keeping them logically part of the same table. This improves performance, scalability, and sometimes maintenance.


🧠 Why Partition?

  • Query performance: Only scan relevant partition(s)
  • Manageability: Easier to archive, purge, or back up
  • Parallelism: Different partitions can be read/written in parallel
  • Maintenance: Indexes and constraints apply per partition

📦 Types of Partitioning

TypeHow it worksExample
RangeSplit by a range of valuesDate ranges, ID ranges
ListSplit by discrete valuesCountry = US, UK, FR
HashSplit using a hash functionEven data distribution
CompositeCombines two types (e.g., range + hash)Date → then hash within each month

🔍 Example: Range Partitioning (PostgreSQL)

CREATE TABLE sales (
  id SERIAL,
  sale_date DATE NOT NULL,
  amount NUMERIC
) PARTITION BY RANGE (sale_date);

Then define partitions:

CREATE TABLE sales_2023_q1 PARTITION OF sales
  FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE sales_2023_q2 PARTITION OF sales
  FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

Now, when you query by date:

SELECT * FROM sales WHERE sale_date = '2023-02-10';

PostgreSQL automatically routes the query to sales_2023_q1 only → faster.

🧪 Other Partitioning Examples

✅ List Partitioning (MySQL, PostgreSQL 11+)

CREATE TABLE users (
  id INT,
  country_code TEXT
) PARTITION BY LIST (country_code);

CREATE TABLE users_us PARTITION OF users
  FOR VALUES IN ('US');

CREATE TABLE users_uk PARTITION OF users
  FOR VALUES IN ('UK');

🧪 Other Partitioning Examples

✅ List Partitioning (MySQL, PostgreSQL 11+)

CREATE TABLE users (
  id INT,
  country_code TEXT
) PARTITION BY LIST (country_code);

CREATE TABLE users_us PARTITION OF users
  FOR VALUES IN ('US');

CREATE TABLE users_uk PARTITION OF users
  FOR VALUES IN ('UK');

✅ Hash Partitioning (PostgreSQL / MySQL)

Useful for spreading rows evenly when ranges/lists aren’t practical.

CREATE TABLE logs (
  id INT,
  message TEXT
) PARTITION BY HASH (id);

CREATE TABLE logs_p0 PARTITION OF logs FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE logs_p1 PARTITION OF logs FOR VALUES WITH (modulus 4, remainder 1);
-- etc.

🚦 Partition Pruning

Modern databases (PostgreSQL, MySQL, Oracle, etc.) prune unneeded partitions automatically during query execution, so only relevant partitions are scanned.

This boosts performance dramatically for filtered queries like:

SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31';

⚠️ Downsides / Gotchas

ConcernExplanation
❌ ComplexityMore complex DDL and indexing
❌ Not always fasterEspecially if queries span many partitions
❌ Insert routing overheadDatabase must decide where to insert
❌ Feature limitationsSome constraints or foreign keys may be restricted

✅ Summary

ConceptExplanation
PartitioningBreak large table into smaller parts
BenefitBetter performance, easier maintenance
TypesRange, List, Hash, Composite
Use CasesTime-series, multi-region, log data, huge tables
Posted in Без рубрики | Leave a comment

Database.Middle.What is a trigger? How and when would you use it?

A trigger in SQL is a database object that automatically executes (or “fires”) a specified block of code in response to certain events (like INSERT, UPDATE, or DELETE) on a table or view.


🔧 What Is a Trigger?

A trigger is like a hidden “watchdog” that runs custom logic automatically when data changes in a table.


🧠 When Would You Use a Trigger?

✅ Common Use Cases:

Use CaseDescription
Audit loggingLog who changed what and when
Enforcing business rulesBlock or transform certain changes
Auto-updating fieldsE.g., update updated_at on row change
Cascading actionsAutomatically affect related records
Preventing invalid changesValidate complex conditions before changes

🧾 Basic Trigger Syntax (PostgreSQL)

-- 1. Create a trigger function
CREATE OR REPLACE FUNCTION update_timestamp()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at := NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- 2. Attach trigger to a table
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_timestamp();

✅ Now every time a users row is updated, updated_at is set to NOW() automatically.

🚦 Trigger Timing Options

KeywordFires When?
BEFORE INSERTBefore inserting a row
AFTER INSERTAfter inserting a row
BEFORE UPDATEBefore an update
AFTER UPDATEAfter an update
BEFORE DELETEBefore deleting a row
AFTER DELETEAfter deleting a row

🧮 FOR EACH ROW vs FOR EACH STATEMENT

OptionDescription
FOR EACH ROWFires once per affected row
FOR EACH STATEMENTFires once per SQL statement

🐬 Example: MySQL Trigger

CREATE TRIGGER log_deletes
AFTER DELETE ON users
FOR EACH ROW
INSERT INTO deleted_users_log(user_id, deleted_at)
VALUES (OLD.id, NOW());

⚠️ Trigger Considerations

AdvantageDisadvantage
✅ Automates integrity logic❌ Hidden logic → harder to debug
✅ Keeps schema DRY❌ Can affect performance
✅ Useful for audits or sync❌ Not portable across DB engines

🧠 Summary

ConceptTrigger
What is it?Code that runs automatically on events
When?INSERT, UPDATE, DELETE
Why?Automate logic, enforce rules, audit
How?CREATE TRIGGER ... with custom logic
Posted in Без рубрики | Leave a comment

Database.Middle.What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?

Great question! ROW_NUMBER(), RANK(), and DENSE_RANK() are ranking window functions in SQL that assign a number to each row based on some ordering criteria.

They look similar but behave differently when there are ties in the data (i.e., rows with the same value in the ORDER BY column).

📊 Quick Comparison

FunctionHandles Ties?Gaps in Rank?Always Unique?
ROW_NUMBER()❌ No✅ Yes✅ Yes
RANK()✅ Yes✅ Yes❌ No (may repeat)
DENSE_RANK()✅ Yes❌ No❌ No (may repeat)

🔍 Visual Example

Let’s say we have this table ordered by score DESC:

namescore
Alice100
Bob90
Carol90
Dave80

Query:

SELECT
  name,
  score,
  ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
  RANK()       OVER (ORDER BY score DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM scores;

📈 Output:

namescorerow_numrankdense_rank
Alice100111
Bob90222
Carol90322
Dave80443

🧠 Key Differences Explained

ROW_NUMBER()

  • Ignores ties.
  • Simply numbers rows sequentially.
  • Always unique.

RANK()

  • Gives same rank to tied rows.
  • Skips the next rank(s) to maintain total ordering.

Two rows tied at rank 2 → next row gets rank 4 (gap).

DENSE_RANK()

  • Also gives same rank to tied rows.
  • Does not skip the next rank.

Two rows tied at rank 2 → next row gets rank 3.


📦 When to Use Each

Use CaseRecommended Function
Need unique row indexROW_NUMBER()
Ranking with gaps (like sports)RANK()
Ranking without gapsDENSE_RANK()
Pagination with orderingROW_NUMBER()
Posted in Без рубрики | Leave a comment

Database.Middle.What are CTEs (Common Table Expressions)?

CTEs (Common Table Expressions) are a way to write temporary, reusable result sets (like named subqueries) in SQL, using the WITH keyword. They’re especially useful for making complex queries more readable, modular, and recursive.

📌 Basic Syntax

WITH cte_name AS (
  SELECT ...
)
SELECT * FROM cte_name;

Think of a CTE as a temporary view that only exists during the execution of the main query.


🧾 Example 1: Simple CTE

WITH recent_orders AS (
  SELECT * FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT customer_id, COUNT(*) 
FROM recent_orders
GROUP BY customer_id;

What it does:

  • Creates a CTE called recent_orders
  • Then uses it as if it’s a table in the main SELECT

✅ Benefits of CTEs

FeatureDescription
🔍 ReadabilityBreaks large queries into understandable parts
🧩 ReusabilityCan refer to the same logic multiple times
🧠 Recursion supportEnables recursive queries like hierarchies
🔄 Avoids duplicationDon’t repeat long subqueries

🧾 Example 2: Recursive CTE (Hierarchy)

Suppose you have an employees table with a manager_id column:

WITH RECURSIVE org_chart AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL  -- start from the CEO

  UNION ALL

  SELECT e.id, e.name, e.manager_id, oc.level + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;

✅ This will return the full hierarchy of employees, with levels!

🆚 CTE vs Subquery vs Temp Table

FeatureCTESubqueryTemp Table
Readability✅ Clear names❌ Can get messy✅ Clear names
Reusability✅ Yes❌ Not reusable✅ Yes
Performance🔄 Usually similar✅ Sometimes better✅ Good for huge datasets
Recursive support✅ Yes❌ No❌ No

🧠 Summary

  • A CTE is a named temporary result set used in a query.
  • Declared with the WITH clause
  • Can be recursive for things like org charts or tree traversal
  • Improves query clarity and modularity
Posted in Без рубрики | Leave a comment

Database.Middle.If i don’t use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so i take all rows ?

Excellent question — and the answer depends on which window function you’re using, and whether you’re using ORDER BY in the OVER() clause.

Let’s clarify it fully:


✅ If You Use ORDER BY Without ROWS BETWEEN

When you write something like:

SUM(sale_amount) OVER (
  PARTITION BY employee_id
  ORDER BY sale_date
)

You are implicitly using:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

✅ So YES — it’s the same as using it explicitly.

🧪 What Happens Under the Hood

The default window frame depends on what you specify:

Clause UsedInterpreted As
ORDER BY (without ROWS BETWEEN)ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
No ORDER BYRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire partition)

🧾 Example 1: With ORDER BY

SUM(sale_amount) OVER (
  PARTITION BY employee_id
  ORDER BY sale_date
)

→ Behaves like a running total


🧾 Example 2: Without ORDER BY

SUM(sale_amount) OVER (
  PARTITION BY employee_id
)

→ Behaves like SUM() per group (same result for every row in the partition), because it considers all rows in the partition.

🧠 Summary

SituationWindow Frame Used
PARTITION BY + ORDER BYROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
PARTITION BY only (no ORDER BY)RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire group)
Explicit ROWS BETWEEN clauseAs specified
Posted in Без рубрики | Leave a comment

Database.Middle.WindowFunctions.Explain ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Great question! The clause:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

is used within a window function to define the range of rows (called the window frame) that the function operates on relative to the current row.


🔍 What It Means

  • UNBOUNDED PRECEDING: Start from the first row in the partition (group)
  • CURRENT ROW: End at the current row (inclusive)

So you’re saying:

“Include all rows from the beginning of the group up to and including this one.”

✅ Used For: Running Totals, Running Averages, Running Max/Min

🧾 Example

SELECT
  employee_id,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (
    PARTITION BY employee_id
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;

This computes a running total for each employee’s sales in order of sale_date.


📊 Visualization

Let’s say we have this:

employee_idsale_datesale_amount
1012024-06-01100
1012024-06-02150
1012024-06-03200

Here’s what each row sees:

Current RowRows in FrameRunning SUM
Row 1[100]100
Row 2[100, 150]250
Row 3[100, 150, 200]450

🔁 Variants

Frame ClauseWhat It Includes
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAll rows up to and including the current row
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGAll rows from current to end
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWINGA moving window of 3 rows (prev, current, next)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGThe entire partition

🧠 Summary

Clause PartMeaning
ROWSOperate on physical rows, not values
UNBOUNDED PRECEDINGStart from first row in group
CURRENT ROWEnd at current row
ResultRolling calculation up to this row
Posted in Без рубрики | Leave a comment