Database.JPA Stream vs List: What Actually Happens Under the Hood

Introduction

In JPA and Hibernate, repository methods can return either List<DTO> or Stream<DTO>.
At first glance, this looks like a minor API choice. In reality, it reflects two fundamentally different data processing models with serious implications for memory usage, GC behavior, latency, and scalability.

This article explains:

  • how Stream<DTO> works at the JDBC and SQL level
  • why it does not execute multiple SQL queries
  • where the real savings come from
  • when streaming is useful — and when it is pointless or dangerous

List<DTO>: Eager Materialization





List<UserDto> users = repository.findAllDtos();

What happens internally

  1. Hibernate executes one SQL query
  2. JDBC reads the entire ResultSet
  3. Hibernate:
    • creates all DTO objects
    • stores them in a List
  4. The full result stays in memory until the list is released

Characteristics

  • Memory usage: O(N)
  • GC pressure: high
  • Simple and safe
  • Suitable for:
    • UI requests
    • pagination
    • small or medium result sets

Key limitation

Even if you later process the list with streams or filters, all rows are already loaded into memory.

Stream<DTO>: Cursor-Based Processing

@Transactional(readOnly = true)
try (Stream<UserDto> stream = repository.streamAllDtos()) {
    stream.forEach(this::process);
}

What happens internally

  1. Hibernate executes one SQL query
  2. JDBC opens a ResultSet backed by a database cursor
  3. Rows are read incrementally, using ResultSet.next()
  4. DTOs are created on demand
  5. Previously processed rows can be garbage-collected

Important clarification

  • SQL is executed once
  • Hibernate does not re-query the table
  • Fetching happens at the JDBC protocol level, not SQL level

Does ResultSet Hold All Rows in Memory?

Not necessarily.
A ResultSet is a cursor, not a collection.

Correct streaming behavior requires:

  • autoCommit = false
  • fetchSize > 0
  • an active transaction
  • a JDBC driver that supports cursor-based fetching

Example (PostgreSQL):

spring.jpa.properties.hibernate.jdbc.fetch_size=100

With this configuration:

  • the driver fetches rows in batches (e.g. 100 at a time)
  • memory usage on the client stays bounded
  • previously read rows are discarded

Without these settings

The JDBC driver may:

  • read the entire result into memory
  • turn your “stream” into a fake stream

Why Is Streaming Useful If We Still Read All Rows?

This is the key conceptual point.

The benefit is not reading fewer rows

The benefit is not holding all rows at the same time.

Real advantages

1. Stable memory usage

  • Heap does not grow with result size
  • No massive object retention
  • Predictable GC behavior

2. Reduced GC pressure

  • Objects become eligible for GC immediately after processing
  • No long-lived lists

3. Early termination

repository.streamAllDtos()
    .filter(...)
    .findFirst();

Only the necessary rows are read from the ResultSet.

4. Back-pressure

Streaming naturally matches the speed of:

  • file writes
  • network calls
  • Kafka producers
  • external APIs

What Streaming Does Not Solve

  • It does not make billion-row scans fast
  • It does not reduce I/O cost
  • It does not remove the need for proper data partitioning

For very large datasets, better approaches include:

  • keyset pagination
  • range-based processing
  • table partitioning
  • database-native export tools
  • distributed processing systems

Critical Pitfall: Streaming Entities

Stream<User> streamAll(); // dangerous

Hibernate stores entities in the persistence context.

Result:

  • memory grows anyway
  • streaming loses its benefit
  • potential memory leak

Correct approach

Stream<UserDto> streamAllDtos(); // safe

When Stream<DTO> Is Worth Using

ScenarioRecommended
UI / REST responses❌ No
Small datasets❌ No
Batch processing✅ Yes
Data export✅ Yes
ETL jobs✅ Yes
Early-stop queries✅ Yes
Limited heap✅ Yes

Conclusion

Stream<DTO> in JPA does not execute multiple SQL queries and does not magically reduce I/O.
Its value lies in controlling memory usage, GC pressure, and processing flow.

Used correctly, it enables stable, scalable batch processing.
Used blindly, it provides no benefit and can even cause production issues.


Interview Question Based on This Topic

Question:
In JPA, what is the difference between returning List<DTO> and Stream<DTO> from a repository method?
Does Stream execute multiple SQL queries, and where does the actual performance and memory benefit come from?

What a strong answer should mention

  • single SQL execution
  • JDBC ResultSet and cursor behavior
  • fetch size and transaction requirements
  • memory and GC impact
  • DTO vs Entity streaming
  • when streaming is appropriate
This entry was posted in Без рубрики. Bookmark the permalink.