Introduction
In JPA and Hibernate, repository methods can return either List<DTO> or Stream<DTO>.
At first glance, this looks like a minor API choice. In reality, it reflects two fundamentally different data processing models with serious implications for memory usage, GC behavior, latency, and scalability.
This article explains:
- how
Stream<DTO>works at the JDBC and SQL level - why it does not execute multiple SQL queries
- where the real savings come from
- when streaming is useful — and when it is pointless or dangerous
List<DTO>: Eager Materialization
List<UserDto> users = repository.findAllDtos();
What happens internally
- Hibernate executes one SQL query
- JDBC reads the entire ResultSet
- Hibernate:
- creates all DTO objects
- stores them in a
List
- The full result stays in memory until the list is released
Characteristics
- Memory usage: O(N)
- GC pressure: high
- Simple and safe
- Suitable for:
- UI requests
- pagination
- small or medium result sets
Key limitation
Even if you later process the list with streams or filters, all rows are already loaded into memory.
Stream<DTO>: Cursor-Based Processing
@Transactional(readOnly = true)
try (Stream<UserDto> stream = repository.streamAllDtos()) {
stream.forEach(this::process);
}
What happens internally
- Hibernate executes one SQL query
- JDBC opens a ResultSet backed by a database cursor
- Rows are read incrementally, using
ResultSet.next() - DTOs are created on demand
- Previously processed rows can be garbage-collected
Important clarification
- SQL is executed once
- Hibernate does not re-query the table
- Fetching happens at the JDBC protocol level, not SQL level
Does ResultSet Hold All Rows in Memory?
Not necessarily.
A ResultSet is a cursor, not a collection.
Correct streaming behavior requires:
autoCommit = falsefetchSize > 0- an active transaction
- a JDBC driver that supports cursor-based fetching
Example (PostgreSQL):
spring.jpa.properties.hibernate.jdbc.fetch_size=100
With this configuration:
- the driver fetches rows in batches (e.g. 100 at a time)
- memory usage on the client stays bounded
- previously read rows are discarded
Without these settings
The JDBC driver may:
- read the entire result into memory
- turn your “stream” into a fake stream
Why Is Streaming Useful If We Still Read All Rows?
This is the key conceptual point.
The benefit is not reading fewer rows
The benefit is not holding all rows at the same time.
Real advantages
1. Stable memory usage
- Heap does not grow with result size
- No massive object retention
- Predictable GC behavior
2. Reduced GC pressure
- Objects become eligible for GC immediately after processing
- No long-lived lists
3. Early termination
repository.streamAllDtos()
.filter(...)
.findFirst();
Only the necessary rows are read from the ResultSet.
4. Back-pressure
Streaming naturally matches the speed of:
- file writes
- network calls
- Kafka producers
- external APIs
What Streaming Does Not Solve
- It does not make billion-row scans fast
- It does not reduce I/O cost
- It does not remove the need for proper data partitioning
For very large datasets, better approaches include:
- keyset pagination
- range-based processing
- table partitioning
- database-native export tools
- distributed processing systems
Critical Pitfall: Streaming Entities
Stream<User> streamAll(); // dangerous
Hibernate stores entities in the persistence context.
Result:
- memory grows anyway
- streaming loses its benefit
- potential memory leak
Correct approach
Stream<UserDto> streamAllDtos(); // safe
When Stream<DTO> Is Worth Using
| Scenario | Recommended |
|---|---|
| UI / REST responses | ❌ No |
| Small datasets | ❌ No |
| Batch processing | ✅ Yes |
| Data export | ✅ Yes |
| ETL jobs | ✅ Yes |
| Early-stop queries | ✅ Yes |
| Limited heap | ✅ Yes |
Conclusion
Stream<DTO> in JPA does not execute multiple SQL queries and does not magically reduce I/O.
Its value lies in controlling memory usage, GC pressure, and processing flow.
Used correctly, it enables stable, scalable batch processing.
Used blindly, it provides no benefit and can even cause production issues.
Interview Question Based on This Topic
Question:
In JPA, what is the difference between returningList<DTO>andStream<DTO>from a repository method?
DoesStreamexecute multiple SQL queries, and where does the actual performance and memory benefit come from?
What a strong answer should mention
- single SQL execution
- JDBC
ResultSetand cursor behavior - fetch size and transaction requirements
- memory and GC impact
- DTO vs Entity streaming
- when streaming is appropriate