Database.What problems can arise with concurrent access using transactions?

When multiple transactions access the same data concurrently, various data anomalies can occur, especially at low isolation levels.

These problems can cause inconsistent, incorrect, or even lost data if not handled properly.

🔍 Common Problems with Concurrent Transactions:


1. 🧼 Dirty Read

One transaction reads uncommitted changes made by another.

Example:

  • T1 updates balance = 500 → 600 but hasn’t committed.
  • T2 reads balance = 600.
  • T1 rolls back → T2 used invalid data.

🔒 Prevented by: READ COMMITTED and higher.

2. 🔁 Non-Repeatable Read

A row is read twice, and the value changes between reads within the same transaction.

Example:

  • T1 reads user email = alice@mail.com.
  • T2 updates it to alice@new.com and commits.
  • T1 reads again: value changed.

🔒 Prevented by: REPEATABLE READ and SERIALIZABLE.

3. 👻 Phantom Read

A query returns different sets of rows when run twice in the same transaction due to new rows being inserted.

Example:

  • T1: SELECT * FROM Orders WHERE total > 100;
  • T2: inserts a new matching order and commits.
  • T1 runs the same query again — now sees a new row.

🔒 Prevented by: SERIALIZABLE.

4. ❌ Lost Update

Two transactions read the same row, modify it, and write back, overwriting each other’s changes.

Example:

  • T1 reads balance = 500, adds 100 → sets to 600
  • T2 also reads 500, adds 200 → sets to 700
  • Final result = 700 ❌ (T1’s change is lost)

🔒 Prevented by: proper locking, SERIALIZABLE, or optimistic concurrency control.
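A sketch of the lost update and the optimistic fix, using Python's sqlite3 on a single connection to interleave the reads and writes of two hypothetical transactions T1 and T2 by hand. The version column is the standard optimistic-concurrency trick, an assumption added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts "
             "(id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500, 0)")

def read():
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = 1").fetchone()

# Naive read-modify-write: both "transactions" start from the same value.
b1, _ = read()                     # T1 reads 500
b2, _ = read()                     # T2 also reads 500
conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (b1 + 100,))  # T1 writes 600
conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (b2 + 200,))  # T2 writes 700
lost_result = read()[0]            # 700: T1's +100 has vanished

# Optimistic fix: a write only lands if the version it read is still current.
conn.execute("UPDATE accounts SET balance = 500, version = 0 WHERE id = 1")  # reset
b1, v1 = read()
b2, v2 = read()
cas = ("UPDATE accounts SET balance = ?, version = version + 1 "
       "WHERE id = 1 AND version = ?")
t1_rows = conn.execute(cas, (b1 + 100, v1)).rowcount   # 1: T1's write lands
t2_rows = conn.execute(cas, (b2 + 200, v2)).rowcount   # 0: T2 must re-read and retry
safe_result = read()[0]            # 600: nothing was silently overwritten
print(lost_result, t1_rows, t2_rows, safe_result)
```

The compare-and-swap in the WHERE clause turns the silent overwrite into a detectable conflict.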

🧠 Summary Table:

| Problem | Description | Prevented by |
|---|---|---|
| Dirty Read | Reading uncommitted data | READ COMMITTED or higher |
| Non-Repeatable Read | Same row read twice gives different data | REPEATABLE READ or higher |
| Phantom Read | Re-executed query returns new rows | SERIALIZABLE |
| Lost Update | Concurrent writes overwrite each other | Locks, optimistic concurrency |

Database.What are the levels of transaction isolation?

Transaction isolation levels define how isolated a transaction is from others — i.e., how much one transaction can “see” the uncommitted changes of another.

They are defined by the ANSI SQL standard and help balance data consistency vs performance.


🔒 The 4 Standard Isolation Levels (from lowest to highest):

| Isolation Level | Prevents | Allows | Performance | Consistency |
|---|---|---|---|---|
| READ UNCOMMITTED | Nothing | All anomalies | 🔥 Fastest | 😵 Weakest |
| READ COMMITTED | Dirty reads | Non-repeatable reads, phantom reads | ⚡ Fast | 😐 Fair |
| REPEATABLE READ | Dirty + non-repeatable reads | Phantom reads | ⏳ Moderate | 😊 Strong |
| SERIALIZABLE | All anomalies | Nothing | 🐢 Slowest | 💎 Strongest |

🧪 What Does Each Level Do?

1. READ UNCOMMITTED

  • Transactions can see uncommitted changes of others.
  • Prone to dirty reads (reading data that may be rolled back).
  • Rarely used due to danger of corrupt logic.

2. READ COMMITTED (default in many DBs)

  • Can only read committed data.
  • Prevents dirty reads, but still allows:
    • Non-repeatable reads (same query returns different results)
    • Phantom reads (rows appear/disappear in repeated queries)

3. REPEATABLE READ

  • Prevents dirty reads and non-repeatable reads.
  • Ensures same query in same transaction always sees same rows.
  • Still allows phantom reads (new rows matching the WHERE clause can appear on re-execution)

4. SERIALIZABLE

  • Strictest level: transactions are executed as if they were serial (one at a time).
  • Prevents:
    • Dirty reads
    • Non-repeatable reads
    • Phantom reads
  • Ensures full isolation but with lower concurrency and higher locking overhead.
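The levels are easiest to feel in code. SQLite does not expose the four ANSI levels by name, but in WAL journal mode every transaction reads from a fixed snapshot, which illustrates REPEATABLE READ behavior. A sketch with hypothetical data:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
init = sqlite3.connect(path)
init.execute("PRAGMA journal_mode=WAL")  # writers no longer block readers
init.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
init.execute("INSERT INTO users VALUES (1, 'alice@mail.com')")
init.commit()
init.close()

# T1 opens a transaction; its first SELECT pins a snapshot.
t1 = sqlite3.connect(path, isolation_level=None)
t1.execute("BEGIN")
first = t1.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]

# T2 changes the row and commits while T1 is still open.
t2 = sqlite3.connect(path)
t2.execute("UPDATE users SET email = 'alice@new.com' WHERE id = 1")
t2.commit()

# T1 re-reads inside the same transaction: the value has NOT changed.
second = t1.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]
t1.execute("COMMIT")

# A new read after T1 ends sees the committed update.
third = t1.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]
print(first, second, third)  # alice@mail.com alice@mail.com alice@new.com
t1.close()
t2.close()
```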

💡 Summary Diagram:

READ UNCOMMITTED < READ COMMITTED < REPEATABLE READ < SERIALIZABLE
       (More concurrency)                         (More isolation)

Database.What are the main properties of a transaction?

The main properties of a transaction are summarized by the acronym ACID, which ensures reliability and consistency in database operations.


🔐 ACID Properties:

| Property | Description |
|---|---|
| A – Atomicity | A transaction is all-or-nothing: either every operation succeeds, or none do. If any part fails, the entire transaction is rolled back. |
| C – Consistency | A transaction takes the database from one valid state to another, preserving all defined rules, constraints, and relationships. |
| I – Isolation | Each transaction is executed as if it were the only one running. Concurrent transactions don’t interfere with each other’s data. |
| D – Durability | Once a transaction is committed, its changes are permanent — even if the system crashes immediately afterward. |

🧠 Example: Bank Transfer

BEGIN;

UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;

If the system crashes after the first update:

  • Atomicity ensures both updates are rolled back
  • Consistency ensures no money is lost or created
  • Isolation prevents interference from other transfers
  • Durability ensures that if COMMIT runs, the transfer is saved
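Atomicity can be seen directly with Python's sqlite3: the `with conn:` block commits on success and rolls back on any exception, so a simulated crash between the two updates leaves both balances untouched. Account values are hypothetical, mirroring the transfer example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        raise RuntimeError("simulated crash before the second UPDATE")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 500, 2: 300}: the partial UPDATE was rolled back
```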

Database.What is a “transaction”?

💼 What is a Transaction in a Database?

A transaction is a sequence of one or more SQL operations (such as INSERT, UPDATE, DELETE) that are executed as a single logical unit of work.

A transaction must either complete fully or have no effect at all — there is no in-between.


🔒 Transactions follow the ACID properties:

| Property | Meaning |
|---|---|
| A – Atomicity | All operations succeed or none do — it’s all-or-nothing |
| C – Consistency | The database remains in a valid state before and after the transaction |
| I – Isolation | Transactions are isolated from each other while running |
| D – Durability | Once committed, the changes are permanent (even after a crash) |

🛠️ Example:

BEGIN;

UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;

✅ If both updates succeed: COMMIT makes them permanent

❌ If any fails: ROLLBACK undoes all changes

🔄 Commands:

| Command | Description |
|---|---|
| BEGIN / START TRANSACTION | Starts a new transaction |
| COMMIT | Saves all changes |
| ROLLBACK | Cancels all changes since BEGIN |

🔍 Why Use Transactions?

  • To transfer money safely between accounts
  • To batch multiple operations that must succeed together
  • To maintain data integrity in case of errors or crashes
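The commands above, driven from Python's sqlite3 with manual transaction control (account rows are hypothetical, matching the earlier example):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 500), (2, 300)])

conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 100 WHERE id = 1")
conn.execute("UPDATE Accounts SET balance = balance + 100 WHERE id = 2")
conn.execute("COMMIT")

rows = dict(conn.execute("SELECT id, balance FROM Accounts"))
print(rows)  # {1: 400, 2: 400}: both updates applied together
```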

Database.What is the Low Selectivity problem when speaking about indexes?

Selectivity = the percentage of rows that match a given condition.

  • High selectivity: few rows match
    → e.g. WHERE email = 'alice@example.com'
    → ✅ index is efficient
  • Low selectivity: many rows match
    → e.g. WHERE gender = 'M' in a table where 60% are 'M'
    → ❌ index might be inefficient

🧠 Why Is Low Selectivity Bad for Index Use?

When many rows match, using an index means:

  1. The database finds matching row IDs in the index
  2. Then it has to jump to each row individually (called random I/O)
  3. This is slower than just reading the whole table sequentially in memory or disk

✅ Example:

Assume this table has 1 million rows:

SELECT * FROM Users WHERE is_active = true;

If 950,000 rows have is_active = true, that’s very low selectivity

The index on is_active matches most of the table

The database says:
❌ “Too many matches — don’t waste time jumping around.”
✅ “Just scan the whole table — it’s faster.”

Visual Analogy:

  • 📇 Index access is like flipping through a card catalog to find books, but running back and forth in the library for each one.
  • 📚 Full table scan is like walking down every shelf and grabbing all relevant books in order — more efficient if you’re taking most of them.

🧪 Rule of Thumb:

  • If a condition matches >10–20% of the table, many databases prefer a full scan over index access.
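The planner's choice can be observed with SQLite's EXPLAIN QUERY PLAN (other engines have EXPLAIN equivalents). In this sketch, ANALYZE gives the planner selectivity statistics; the table and data are made up, and the exact plan wording varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users "
             "(id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users (email, is_active) VALUES (?, ?)",
    [(f"user{i}@example.com", 1 if i % 20 else 0) for i in range(10_000)])
conn.execute("CREATE INDEX idx_email ON users(email)")
conn.execute("CREATE INDEX idx_active ON users(is_active)")
conn.execute("ANALYZE")  # give the planner selectivity statistics

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# High selectivity: one row matches, the index is used.
selective = plan("SELECT * FROM users WHERE email = 'user42@example.com'")
print(selective)

# Low selectivity: ~95% of rows match; the planner may prefer a full scan.
unselective = plan("SELECT * FROM users WHERE is_active = 1")
print(unselective)
```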

Datasource.When is a full scan of a data set more advantageous than index access?

Despite indexes often improving performance, there are important cases where a full table scan is actually better than using an index.


When Is a Full Table Scan More Efficient?


1. When a Large Portion of the Table Matches

  • If a query needs to scan a high percentage of rows (e.g., >10–20%), a sequential scan is often faster than jumping around via an index.

✅ Example:

SELECT * FROM Orders WHERE order_date > '2000-01-01';

If 90% of the table matches, the index offers little benefit — the DB just scans the whole table efficiently.

2. Low Selectivity (Few Distinct Values)

  • If you filter on a low-cardinality column, the index doesn’t help much.

✅ Example:

SELECT * FROM Users WHERE gender = 'M';

If half of all rows are 'M', the DB will often choose a full scan.

3. Fetching All or Most Columns

  • Indexes store only some columns.
  • If you’re selecting all columns, the database may do an index lookup + row fetch for each row, which is slower than just reading the table directly.

✅ Example:

SELECT * FROM Products WHERE price > 10;

Even if there’s an index on price, if all columns are needed, the index might cause extra disk reads (called a “bookmark lookup”).

4. Small Tables

  • For small tables, it’s often faster to scan the whole thing than to use an index.
  • Index overhead may not be worth it for a 10-row table.

5. Outdated or Non-existent Statistics

  • If the database query planner lacks good stats, it might default to a full scan.
  • Also true if no suitable index exists.

6. Queries with Functions or Calculations on Indexed Columns

  • Expressions disable index use unless functional indexes exist.

❌ Example:

SELECT * FROM Users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';
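This is easy to reproduce in SQLite, which supports expression (functional) indexes. The table and index names here are hypothetical, and plan wording varies by version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO Users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])
conn.execute("CREATE INDEX idx_email ON Users(email)")

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The plain index cannot serve a filter on UPPER(email): full scan.
before = plan("SELECT * FROM Users WHERE UPPER(email) = 'USER1@EXAMPLE.COM'")
print(before)   # e.g. "SCAN Users"

# An expression (functional) index on UPPER(email) makes it indexable again.
conn.execute("CREATE INDEX idx_email_upper ON Users(UPPER(email))")
after = plan("SELECT * FROM Users WHERE UPPER(email) = 'USER1@EXAMPLE.COM'")
print(after)
```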

📊 Summary Table

| Scenario | Index? | Full Scan? |
|---|---|---|
| Small table | ❌ | ✅ |
| Most rows match (low selectivity) | ❌ | ✅ |
| Query returns many/all columns | ⚠️ Extra lookups | ✅ Often |
| High selectivity (few rows match) | ✅ | ❌ |
| WHERE on low-cardinality column | ❌ | ✅ |
| Complex expressions in WHERE clause | ❌ (unless a function index exists) | ✅ |

Database.Does it make sense to index data that has a small number of possible values?

Usually, no: it does not make sense to index columns with a small number of distinct values (i.e., low cardinality), unless there is a very specific reason.


📉 Why Not?

Columns like:

  • gender = 'M' or 'F'
  • status = 'active' or 'inactive'
  • is_deleted = true or false

have very few possible values, so:

  • Each index entry matches many rows
  • The database will likely do a full table scan anyway, because the index isn’t selective
  • Using the index can be slower than just scanning the table

🔍 When Might It Still Make Sense?

✅ 1. In combination with other columns (composite index)

CREATE INDEX idx_status_created ON Users(status, created_at);

Useful if you often query:

SELECT * FROM Users
WHERE status = 'active' AND created_at > '2024-01-01';

Here, status helps narrow the data before applying the range on created_at.
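A quick check of this pattern with SQLite's EXPLAIN QUERY PLAN (the data and its distribution are made up; plan wording varies by version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users "
             "(id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO Users (status, created_at) VALUES (?, ?)",
    [("active" if i % 2 else "inactive", f"2024-{1 + i % 12:02d}-01")
     for i in range(1000)])
conn.execute("CREATE INDEX idx_status_created ON Users(status, created_at)")

rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Users "
    "WHERE status = 'active' AND created_at > '2024-06-01'").fetchall()
detail = " | ".join(r[3] for r in rows)
print(detail)  # equality on status, then a range seek on created_at
```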

✅ 2. In large, filtered queries or partitions

If:

  • The table is huge, and
  • The low-cardinality column is highly skewed (e.g., 90% 'active', 10% 'inactive'), and
  • You often query the rare value:
SELECT * FROM Users WHERE status = 'inactive';

Then an index might help — but it depends on the query planner and row distribution.

✅ 3. Bitmap Indexes (in data warehouses)

In OLAP databases (e.g., Oracle, Redshift), low-cardinality columns are perfect for bitmap indexes, which compress well and are efficient for complex filtering.

🧠 Rule of Thumb:

| Cardinality | Should You Index? |
|---|---|
| 🔴 Very low (e.g., true/false) | ❌ No — often worse performance |
| 🟡 Medium | ⚠️ Depends on query patterns and size |
| 🟢 High (e.g., email, ID) | ✅ Yes — ideal for indexing |

Database.How do I define my clustered index?

1. In MySQL (InnoDB)

🔹 By default:

  • The primary key becomes the clustered index.
  • If no primary key is defined, InnoDB picks:
    • the first unique NOT NULL index, or
    • creates a hidden row_id.

🔹 To define your own clustered index:

You just declare a primary key, and it becomes clustered:

CREATE TABLE Users (
    user_id INT NOT NULL,
    name VARCHAR(100),
    email VARCHAR(100),
    PRIMARY KEY (user_id)  -- this is the clustered index
);

To use a different column:

CREATE TABLE Orders (
    order_number INT,
    customer_id INT,
    order_date DATE,
    PRIMARY KEY (order_date, order_number)  -- clustered index on (order_date, order_number)
);

🧠 MySQL does not let you create a clustered index directly by name — it’s always tied to the primary key in InnoDB.

2. In SQL Server

Here you have full control and can explicitly create a clustered index:

CREATE CLUSTERED INDEX idx_order_date
ON Orders(order_date);

You can also specify it during table creation:

CREATE TABLE Products (
    product_id INT,
    name VARCHAR(100),
    price DECIMAL(10, 2),
    CONSTRAINT pk_product PRIMARY KEY CLUSTERED (product_id)
);

3. In PostgreSQL

PostgreSQL does not support true clustered indexes — but it allows clustering once, and you must re-cluster manually when data changes.

CREATE INDEX idx_order_date ON Orders(order_date);
CLUSTER Orders USING idx_order_date;

⚠️ PostgreSQL won’t maintain the clustering automatically.
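SQLite is not in the list above, but it offers the same idea: ordinary tables are clustered on the hidden rowid, and WITHOUT ROWID tables store the whole row in the primary-key b-tree, making the PK effectively the clustered index. A sketch reusing the Orders example; since a full scan walks the b-tree, rows come back in key order even when inserted out of order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: the whole row lives in the primary-key b-tree,
# i.e. the table IS the clustered index on (order_date, order_number).
conn.execute("""
    CREATE TABLE Orders (
        order_date TEXT,
        order_number INTEGER,
        customer_id INTEGER,
        PRIMARY KEY (order_date, order_number)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [("2024-03-01", 7, 1), ("2024-01-15", 2, 2),
                  ("2024-02-20", 5, 3)])

# A bare full scan walks the primary-key b-tree, so rows come back sorted.
scan_order = [tuple(r[:2]) for r in conn.execute("SELECT * FROM Orders")]
print(scan_order)
```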

🧠 Summary Table

| DBMS | Clustered Index Defined By |
|---|---|
| MySQL | Automatically via primary key (InnoDB) |
| SQL Server | Explicit with CREATE CLUSTERED INDEX |
| PostgreSQL | Not automatic; use CLUSTER manually |

🔑 In simple terms:

A clustered index determines the default physical order of rows in the table — the table is stored on disk sorted by this index.

So yes:

  • When you define a clustered index, the table rows are physically arranged in that order.
  • It’s like saying: “This is the natural sort order of the table.”

📌 Key Points Recap:

| Concept | Description |
|---|---|
| 📚 Clustered Index | Defines the physical order of rows in the table |
| 📦 Table = Index | The data is stored within the index itself (not separately) |
| 🧱 Only One Per Table | Because you can’t store the rows in multiple physical orders |
| ⚡ Fast for Range Queries | Since rows are already sorted, range scans (BETWEEN, <, >) are efficient |

Database.What is a clustered index?

📘 What is a Clustered Index?

A clustered index is a special type of index that determines the physical order of rows in a table.

In other words: the table’s data is stored in the order of the clustered index.


🔑 Key Characteristics:

  1. Only one per table
    Because the data can be physically ordered only one way.
  2. Data is stored with the index
    The clustered index contains the actual table data — there is no separate storage.
  3. Primary Key = Clustered Index (by default)
    In many databases (e.g., MySQL InnoDB, SQL Server), the primary key automatically becomes the clustered index unless you specify otherwise.

📊 Example:

CREATE TABLE Users (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

id is the clustered index

Rows in the Users table are physically stored sorted by id

So:

SELECT * FROM Users WHERE id = 5;

➡️ The database can go directly to the row, without any extra lookup.

🧠 Think of it Like:

A clustered index is like a phonebook where the pages are sorted by last name.
You flip directly to the page you need — no extra lookup required.


🔄 Clustered vs Non-Clustered:

| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Data in index? | ✅ Yes, contains table data | ❌ No, points to data |
| Affects row order? | ✅ Yes | ❌ No |
| Number per table | Only 1 | Many allowed |
| Lookup speed | 🔥 Very fast for indexed column | Good, but needs an extra lookup |

🛠 When to Use a Clustered Index:

Use a clustered index on:

  • Columns frequently used in WHERE, JOIN, ORDER BY
  • Columns with high selectivity (many unique values)
  • Primary keys (almost always clustered)

Datasource.What is the difference between clustered and non-clustered indexes?

🗃️ 1. Clustered Index

A clustered index determines the physical order of rows in a table.

  • ✅ There can be only one clustered index per table.
  • ✅ The data is stored directly in the index.
  • ✅ The table is the index.

Example:

In a table with a clustered index on id, the rows are physically sorted by id.
Accessing a row via this index goes directly to the data — no lookup needed.

📂 2. Non-Clustered Index

A non-clustered index is a separate structure that points to the actual data rows.

  • ✅ You can create multiple non-clustered indexes per table.
  • ❌ Does not change the physical order of the data.
  • 📎 Contains pointers (row IDs or clustered index keys) to access the actual data.

Example:

A non-clustered index on email will contain sorted email values with pointers to the actual row location (which is elsewhere).


🔍 Visual Analogy:

| Type | Analogy |
|---|---|
| Clustered | A phonebook sorted by last name — the data is in the index |
| Non-Clustered | A book’s index — points you to the page where the data lives |

🧱 Example (SQL Server or MySQL InnoDB):

-- Clustered index (by default on PRIMARY KEY)
CREATE TABLE Users (
    id INT PRIMARY KEY,      -- clustered index
    email VARCHAR(100),
    name VARCHAR(100)
);

-- Non-clustered index on email
CREATE INDEX idx_email ON Users(email);
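Both lookup paths show up in the query plans of this example, observable with Python's sqlite3. In SQLite, an INTEGER PRIMARY KEY is the rowid, i.e., the table's own clustered key; exact plan wording varies by version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Users (
        id INTEGER PRIMARY KEY,   -- rowid alias: the table b-tree is keyed on id
        email TEXT,
        name TEXT
    )
""")
conn.execute("CREATE INDEX idx_email ON Users(email)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(i, f"user{i}@example.com", f"User {i}") for i in range(100)])

def plan(sql):
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Lookup by primary key: goes straight into the table's own b-tree.
pk_plan = plan("SELECT * FROM Users WHERE id = 5")
print(pk_plan)

# Lookup by email: first the secondary index, then a hop to the row.
email_plan = plan("SELECT * FROM Users WHERE email = 'user5@example.com'")
print(email_plan)
```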

⚖️ Comparison Table

| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Physical row order | ✅ Yes (matches index order) | ❌ No (separate structure) |
| Data stored in index | ✅ Yes | ❌ No (index points to data) |
| Number per table | Only 1 | Many allowed |
| Speed (lookup) | ✅ Fastest for range/equality on key | Slightly slower (requires an extra lookup) |
| Extra storage | None — the index is the table | A separate structure (keys + pointers) |

✅ When to Use:

| Scenario | Use This Index Type |
|---|---|
| Primary key, sorting, fast retrieval | Clustered Index |
| Frequently searched non-key columns | Non-Clustered Index |
| Multiple searchable fields | Multiple Non-Clustered |