The GROUP BY clause is used to:
Group rows that have the same values in specified columns into summary rows — like totals, counts, averages, etc.
✅ It’s most commonly used with aggregate functions like:
SUM()COUNT()AVG()MAX()MIN()
Why Use GROUP BY?
Without GROUP BY, aggregate functions summarize the entire table.
With GROUP BY, you can:
- Summarize data by categories.
- Get subtotals for each group.
- Analyze data in a more meaningful way.
Simple Example
Table: Sales
| sale_id | customer_id | amount |
|---|---|---|
| 1 | 101 | 100 |
| 2 | 101 | 150 |
| 3 | 102 | 200 |
| 4 | 103 | 250 |
Without GROUP BY
Total sales:
SELECT SUM(amount) FROM Sales;
➡️ 100 + 150 + 200 + 250 = 700
With GROUP BY
Total sales per customer:
SELECT customer_id, SUM(amount)
FROM Sales
GROUP BY customer_id;
Result:
| customer_id | SUM(amount) |
|---|---|
| 101 | 250 |
| 102 | 200 |
| 103 | 250 |
✅ Here:
- It groups rows by
customer_id. - For each group, it calculates the sum of
amount.
Key Points
| Feature | Description |
|---|---|
| Groups data | Rows with the same value in the group-by columns are combined. |
| Works with aggregates | Calculates SUM(), COUNT(), etc., per group. |
| Reduces number of rows | Returns one row per group, not per original row. |
| Columns rule | All selected columns must either be grouped or aggregated. |
What Happens Internally?
Step-by-step:
- Scan all the rows.
- Group rows with the same
customer_id. - Apply the
SUM(amount)function to each group. - Return one row per group.
⚡ Golden Rule
When you use GROUP BY, every column in the SELECT must be:
- In the
GROUP BYlist OR - Be an aggregate function.
❌ This is wrong:
SELECT customer_id, amount FROM Sales GROUP BY customer_id;
Because amount is not aggregated.
✅ Correct:
SELECT customer_id, SUM(amount) FROM Sales GROUP BY customer_id;
Real-Life Analogy
Imagine you have a list of purchases:
GROUP BYis like grouping all purchases by customer before totaling them.- It’s like summarizing all receipts per customer.
In Short
GROUP BY = Group similar rows together and apply aggregation on each group.