Module 19: Advanced Capstone Portfolio

Capstone 3: Multi-threaded Data Collector

Build a multi-threaded data collector that fetches or simulates data from several sources, merges the results, and reports failures clearly. This capstone is where your concurrency work becomes a real workflow with limits, timeouts, and aggregation.

Author

Java Learner Editorial Team

Reviewer

Technical review by Java Learner

Last reviewed

2026-04-17

Java version

Java 25 LTS

How this lesson was prepared: AI-assisted draft, manually expanded into a full lesson guide, and checked against current official Java, Spring, testing, and delivery documentation.

Learning goals

Design a bounded concurrent pipeline that collects results from multiple tasks
Handle timeouts, partial failures, and result aggregation without corrupting state
Produce a concurrency project that demonstrates design maturity, not just parallelism

Before you start

You have completed the earlier advanced modules or can already build small Java applications independently

Lesson roadmap: Start with the mental model, then follow the design choices, common pitfalls, and the practical workflow you should apply in a real project.

Project goal: Collect data from several sources concurrently, then store or summarize the results.

What to practice: Bounded concurrency, retries, progress reporting, and safe result aggregation.

You do not need to build a public web scraper: The core lesson is managing concurrent external or simulated tasks cleanly.

Success check: The project should stay understandable and correct even when many tasks run at once.

Project goal: Collect data from several sources in parallel, normalize it into one format, and produce a final report. You can simulate remote calls if you do not want real HTTP dependencies yet.

Suggested design: One task per source, one executor to run them, one result object per task, and one aggregator that merges successful results while recording failures separately.
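The suggested design can be sketched as follows. This is a minimal illustration, not a reference solution: the names CollectorSketch, SourceResult, Report, and fetch are hypothetical, and the "remote call" is simulated so that one source always fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CollectorSketch {

    // One result object per task: either a payload or an error message.
    record SourceResult(String source, String payload, String error) {}

    // The aggregator's output keeps successes and failures apart.
    record Report(List<SourceResult> successes, List<SourceResult> failures) {}

    // Simulated fetch (assumption): the source named "broken" always fails.
    static String fetch(String source) {
        if (source.equals("broken")) {
            throw new IllegalStateException("source unavailable");
        }
        return "data from " + source;
    }

    static Report collect(List<String> sources, int maxThreads) throws InterruptedException {
        // One executor runs every task, bounded by maxThreads.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // One task per source.
            List<Future<String>> futures = new ArrayList<>();
            for (String source : sources) {
                Callable<String> task = () -> fetch(source);
                futures.add(pool.submit(task));
            }
            // Aggregation happens on the calling thread only.
            List<SourceResult> ok = new ArrayList<>();
            List<SourceResult> failed = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                String source = sources.get(i);
                try {
                    ok.add(new SourceResult(source, futures.get(i).get(), null));
                } catch (ExecutionException e) {
                    failed.add(new SourceResult(source, null, e.getCause().getMessage()));
                }
            }
            return new Report(ok, failed);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Report report = collect(List.of("alpha", "broken", "beta"), 2);
        System.out.println("successes: " + report.successes().size()); // prints "successes: 2"
        System.out.println("failures: " + report.failures().size());   // prints "failures: 1"
    }
}
```

Worker threads never touch the result lists here. They only return values through their Future, which keeps the aggregation step single-threaded and easy to reason about.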

Milestone plan: Start with sequential collection, then parallel collection, then timeouts, then retries or failure reporting, then result aggregation and summary output.
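For the retry milestone, one possible starting point is a small helper that re-runs a task a bounded number of times. The sketch below assumes immediate retries with no backoff. RetrySketch and withRetries are hypothetical names, and the flaky task in main is simulated.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class RetrySketch {

    // Runs the task up to maxAttempts times and rethrows the last failure.
    static <T> T withRetries(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // an interruption is a request to stop, so do not retry it
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky source: fails twice, then succeeds.
        String value = withRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("flaky");
            }
            return "ok";
        }, 5);
        System.out.println(value + " after " + calls.get() + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Keeping the attempt limit explicit means a broken source cannot hold a pool thread forever. A real collector would likely add a delay between attempts.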

Stretch ideas: Add cancellation, configurable concurrency limits, or a small persistence layer for saved results. Keep the first milestone focused on reliable task orchestration.

How to use the capstones: Build them like portfolio pieces, not toy exercises. Write down the scope first, finish a thin vertical slice, and only then add polish, tests, extra features, or deployment steps.

Project review mindset: A strong capstone shows design choices, not just code volume. Clear boundaries, naming, validation, error handling, and a short README often matter more than adding ten extra features.

Delivery habit: Each finished project should include a problem statement, setup instructions, example input and output, and at least one obvious next improvement for future iteration.

Runnable examples

A fixed pool keeps concurrency bounded

import java.util.concurrent.*; // ExecutorService, Executors
ExecutorService pool = Executors.newFixedThreadPool(4); // at most 4 worker threads

Expected output

The collector can run several tasks in parallel without spawning unbounded threads.

A fixed pool keeps data collection bounded

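import java.util.*;            // List, ArrayList
import java.util.concurrent.*; // ExecutorService, Executors, Future
ExecutorService pool = Executors.newFixedThreadPool(4);  // at most 4 tasks run at once
List<Future<String>> results = new ArrayList<>();        // one Future per submitted task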

Expected output

The collector limits how many tasks run at once instead of creating unbounded threads.
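One way to see the bound in action is to measure how many tasks are running at the same moment. The sketch below is an assumption-laden demonstration rather than part of the collector itself: BoundedPoolDemo and peakConcurrency are hypothetical names, and each task only sleeps briefly to simulate work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolDemo {

    // Submits taskCount simulated tasks and returns the highest number
    // that were observed running at the same time.
    static int peakConcurrency(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                int id = i;
                results.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // simulated work
                        return "result-" + id;
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<String> result : results) {
                result.get(); // wait for every task to finish
            }
            return peak.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int peak = peakConcurrency(4, 20);
        System.out.println("peak concurrent tasks: " + peak + " (never more than 4)");
    }
}
```

Twenty tasks are submitted, but the observed peak cannot exceed the pool size because the remaining tasks wait in the executor's queue.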

Common mistakes

Updating one shared mutable result structure directly from every worker thread

Return per-task results and merge them in a controlled aggregation step.
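The fix above can be sketched with a word-count example: each task builds its own private map, and only the calling thread merges them. MergeSketch, countWords, and the sample inputs are hypothetical and stand in for whatever your sources return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MergeSketch {

    // Each task works on its own map, so no locking is needed here.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> collect(List<String> documents) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (String document : documents) {
                tasks.add(() -> countWords(document));
            }
            // Controlled aggregation step: only this thread writes to total.
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> future : pool.invokeAll(tasks)) {
                future.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

Because total is confined to one thread, a plain TreeMap is enough and no concurrent collection or synchronized block is required.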

Treating timeouts and partial failure as edge cases instead of core requirements

Write down how the collector behaves when one source is slow or broken before you start coding.
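As one possible way to make that behavior explicit in code, the sketch below uses ExecutorService.invokeAll with a deadline, which cancels any task that has not finished in time. TimeoutSketch, the source names, and the status strings are assumptions for illustration, and all three sources are simulated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TimeoutSketch {

    // Runs every source with one shared deadline and records a status per source.
    static Map<String, String> collect(Map<String, Callable<String>> sources, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> names = new ArrayList<>(sources.keySet());
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : names) {
                tasks.add(sources.get(name));
            }
            // Tasks still unfinished at the deadline are cancelled by invokeAll.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            Map<String, String> outcome = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                try {
                    outcome.put(names.get(i), "OK: " + futures.get(i).get());
                } catch (CancellationException e) {
                    outcome.put(names.get(i), "TIMEOUT");
                } catch (ExecutionException e) {
                    outcome.put(names.get(i), "FAILED: " + e.getCause().getMessage());
                }
            }
            return outcome;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Callable<String>> sources = new LinkedHashMap<>();
        sources.put("fast", () -> "42 rows");
        sources.put("slow", () -> { Thread.sleep(5_000); return "too late"; });
        sources.put("broken", () -> { throw new IllegalStateException("connection refused"); });
        collect(sources, 300).forEach((name, status) -> System.out.println(name + " -> " + status));
    }
}
```

With these simulated sources, the fast one should report OK, the slow one TIMEOUT, and the broken one FAILED, so a single slow or broken source does not hide the others. Note that with more tasks than pool threads, a task still waiting in the queue at the deadline is also reported as a timeout.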

Mini exercise

List the states one source task can end in: success, timeout, failure, cancelled. Then decide how the final report should present each state to the user.
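If it helps to start from something concrete, the four states can be modeled as an enum with one report line per state. This is only one possible shape: OutcomeReport, Outcome, describe, and the report wording are hypothetical, and your own presentation may differ.

```java
import java.util.List;

class OutcomeReport {

    // The four end states named in the exercise.
    enum State { SUCCESS, TIMEOUT, FAILURE, CANCELLED }

    // detail carries a summary on success or a reason on failure; it may be null otherwise.
    record Outcome(String source, State state, String detail) {}

    // The switch has no default branch, so adding a new state without
    // handling it here becomes a compile error.
    static String describe(Outcome outcome) {
        return switch (outcome.state()) {
            case SUCCESS -> outcome.source() + ": ok (" + outcome.detail() + ")";
            case TIMEOUT -> outcome.source() + ": timed out, no data collected";
            case FAILURE -> outcome.source() + ": failed (" + outcome.detail() + ")";
            case CANCELLED -> outcome.source() + ": cancelled before completion";
        };
    }

    public static void main(String[] args) {
        List<Outcome> outcomes = List.of(
                new Outcome("alpha", State.SUCCESS, "42 rows"),
                new Outcome("beta", State.TIMEOUT, null),
                new Outcome("gamma", State.FAILURE, "connection refused"),
                new Outcome("delta", State.CANCELLED, null));
        outcomes.forEach(outcome -> System.out.println(describe(outcome)));
    }
}
```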

Summary

This capstone is about coordinated concurrent work.
Bounded pools and safe aggregation matter more than maximum thread count.
Retries and failure handling are part of the real design.
This capstone proves whether you can make concurrency observable and controlled.
Parallel collection is useful only when failure handling and aggregation are just as well designed.

Next step

The fourth capstone builds a full API backed by persistence.
