🔎 The root cause: how SQL JOINs work with one-to-many
Imagine you have:
Author
(id=1) with 3 Books (Book A, Book B, Book C).
When you do a JOIN to get authors and their books:
SELECT a.*, b.* FROM authors a
JOIN books b ON a.id = b.author_id
Your SQL result will look like:
| author_id | author_name | book_id | book_title |
|----------|-------------|---------|------------|
| 1 | John Smith | 101 | Book A |
| 1 | John Smith | 102 | Book B |
| 1 | John Smith | 103 | Book C |
Notice what happens:
- The same author appears 3 times — once for each book — because that’s how SQL JOINs work: they produce a row per combination of parent-child.
🚨 Without DISTINCT
in JPQL
Hibernate reads those rows and reconstructs objects.
But if your query is:
SELECT a FROM Author a JOIN FETCH a.books
Hibernate sees 3 rows, each with an Author, and by default adds all of them to the result list → so your List<Author>
will have 3 elements, each seemingly the same Author (but possibly different object instances pointing to the same DB row).
That’s why you get duplicates.
✅ With DISTINCT
in JPQL
When you write:
SELECT DISTINCT a FROM Author a JOIN FETCH a.books
Hibernate collects all Authors from the result rows, but then deduplicates them in-memory based on their IDs, so you only get one Author
instance in your List<Author>
, no matter how many rows the JOIN produced.
🔎 Key point:
- The duplication isn’t a bug — it’s the nature of SQL’s one-to-many joins.
- But it becomes a problem when you want a clean list of unique parents in your Java collection.
✅ Always use DISTINCT
when JOIN FETCHing a collection
It avoids surprises like:
- lists containing dozens or hundreds of duplicates of the same parent entity,
- bugs where
List.size()
is misleading, - unexpected UI issues (e.g., table rows showing duplicates).