Module 19: Advanced Capstone Portfolio
Capstone 3: Multi-threaded Data Collector
Build a multi-threaded data collector that fetches or simulates data from several sources, merges the results, and reports failures clearly. This capstone is where your concurrency work becomes a real workflow with limits, timeouts, and aggregation.
Author
Java Learner Editorial Team
Reviewer
Technical review by Java Learner
Last reviewed
2026-04-17
Java version
Java 25 LTS
Learning goals
- Design a bounded concurrent pipeline that collects results from multiple tasks
- Handle timeouts, partial failures, and result aggregation without corrupting state
- Produce a concurrency project that demonstrates design maturity, not just parallelism
Before you start
- You completed the earlier advanced modules or can already build small Java applications independently
Lesson roadmap: Start with the mental model, then follow the design choices, common pitfalls, and the practical workflow you should apply in a real project.
Project goal: Collect data from several sources concurrently, then store or summarize the results.
What to practice: Bounded concurrency, retries, progress reporting, and safe result aggregation.
You do not need to build a public web scraper: The core lesson is managing concurrent tasks cleanly, whether they call external services or simulate them.
Success check: The project should stay understandable and correct even when many tasks run at once.
Project goal: Collect data from several sources in parallel, normalize it into one format, and produce a final report. You can simulate remote calls if you do not want real HTTP dependencies yet.
Suggested design: One task per source, one executor to run them, one result object per task, and one aggregator that merges successful results while recording failures separately.
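The result object and aggregator from the suggested design could be sketched as follows. This is a minimal illustration, not a prescribed structure: the names DesignSketch, SourceResult, and Aggregator are hypothetical, and the executor and tasks are left out so the data shapes stay in focus.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the capstone does not prescribe these types.
class DesignSketch {

    // One result object per task: it carries either data or an error message.
    record SourceResult(String source, String data, String error) {
        static SourceResult success(String source, String data) {
            return new SourceResult(source, data, null);
        }

        static SourceResult failure(String source, String error) {
            return new SourceResult(source, null, error);
        }

        boolean isSuccess() {
            return error == null;
        }
    }

    // The aggregator keeps successes and failures in separate lists.
    // It is meant to be called from one thread, after the tasks have finished.
    static final class Aggregator {
        private final List<String> merged = new ArrayList<>();
        private final List<String> failures = new ArrayList<>();

        void accept(SourceResult result) {
            if (result.isSuccess()) {
                merged.add(result.source() + "=" + result.data());
            } else {
                failures.add(result.source() + ": " + result.error());
            }
        }

        List<String> merged() {
            return List.copyOf(merged);
        }

        List<String> failures() {
            return List.copyOf(failures);
        }
    }

    public static void main(String[] args) {
        Aggregator aggregator = new Aggregator();
        aggregator.accept(SourceResult.success("weather", "21C"));
        aggregator.accept(SourceResult.failure("stocks", "connection refused"));
        System.out.println("merged:   " + aggregator.merged());
        System.out.println("failures: " + aggregator.failures());
    }
}
```

Keeping failures in their own list means a broken source never disappears silently and never pollutes the merged data.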
Milestone plan: Start with sequential collection, then parallel collection, then timeouts, then retries or failure reporting, then result aggregation and summary output.
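For the retry milestone, one simple starting point is a helper that re-runs a task a fixed number of times. The sketch below assumes a hypothetical helper named callWithRetry and a simulated flaky source; it deliberately omits backoff delays, which a real collector would probably want.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper; the name and the fixed-attempt policy are assumptions.
class RetrySketch {

    // Runs the task up to maxAttempts times and rethrows the last failure
    // if every attempt throws.
    static <T> T callWithRetry(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky source: fails twice, then succeeds.
        String value = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated failure");
            }
            return "payload";
        }, 5);
        System.out.println(value + " after " + calls.get() + " attempts");
        // prints "payload after 3 attempts"
    }
}
```

Capping the attempt count keeps a permanently broken source from occupying a pool thread forever.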
Stretch ideas: Add cancellation, configurable concurrency limits, or a small persistence layer for saved results. Keep the first milestone focused on reliable task orchestration.
How to use the capstones: Build them like portfolio pieces, not toy exercises. Write down the scope first, finish a thin vertical slice, and only then add polish, tests, extra features, or deployment steps.
Project review mindset: A strong capstone shows design choices, not just code volume. Clear boundaries, naming, validation, error handling, and a short README often matter more than adding ten extra features.
Delivery habit: Each finished project should include a problem statement, setup instructions, example input and output, and at least one obvious next improvement for future iteration.
Runnable examples
A fixed pool keeps concurrency bounded
ExecutorService pool = Executors.newFixedThreadPool(4);
Expected output
The collector can run several tasks in parallel without spawning unbounded threads.
A fixed pool keeps data collection bounded
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<String>> results = new ArrayList<>();
Expected output
The collector limits how many tasks run at once instead of creating unbounded threads.
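To see the bound in action, the fragments above can be expanded into a small program that counts how many tasks run at the same time. This is a sketch under assumptions: the class name BoundedPoolSketch, the method maxObservedConcurrency, and the 50 ms sleep standing in for a remote call are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the sleep simulates a slow source.
class BoundedPoolSketch {

    // Runs taskCount simulated fetches on a pool of poolSize threads and
    // returns the highest number of tasks observed running at the same time.
    static int maxObservedConcurrency(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<String>> results = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                int id = i;
                results.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(50); // stands in for a slow remote call
                        return "source-" + id;
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<String> result : results) {
                result.get(); // wait for every task to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("simulated task failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // With a pool of 4, the peak can never exceed 4 even with 12 tasks queued.
        System.out.println("peak concurrency: " + maxObservedConcurrency(4, 12));
    }
}
```

The extra tasks wait in the executor's queue until a worker thread becomes free, which is what keeps the collector from spawning one thread per source.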
Common mistakes
Updating one shared mutable result structure directly from every worker thread
Return per-task results and merge them in a controlled aggregation step.
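A minimal sketch of that fix, assuming each source is just a line of text and the "collection" is a word count: every worker returns its own map, and only the calling thread writes to the merged total. MergeSketch, countWords, and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example; word counting stands in for real data collection.
class MergeSketch {

    // Each worker builds and returns its own map; nothing is shared while tasks run.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> collect(List<String> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String source : sources) {
                futures.add(pool.submit(() -> countWords(source)));
            }
            // Controlled aggregation step: only this thread writes to 'total'.
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                future.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a source task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("a b a", "b c", "a")));
        // prints {a=3, b=2, c=1}
    }
}
```

Because no map is ever written by two threads, plain TreeMap instances are enough and no locks or concurrent collections are needed.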
Treating timeouts and partial failure as edge cases instead of core requirements
Write down how the collector behaves when one source is slow or broken before you start coding.
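One way to make that behavior explicit in code is to wait for each source with a deadline and turn every outcome into a report line. The sketch below is an assumption-laden illustration: fetchWithTimeout and the report wording are hypothetical, and the slow and broken sources are simulated.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; names and message formats are illustrative only.
class TimeoutSketch {

    // Waits at most timeoutMillis for the source and returns a report line either way.
    static String fetchWithTimeout(ExecutorService pool, String name,
                                   Callable<String> source, long timeoutMillis) {
        Future<String> future = pool.submit(source);
        try {
            return name + ": " + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it frees its pool thread
            return name + ": timed out after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            return name + ": failed (" + e.getCause().getMessage() + ")";
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return name + ": interrupted";
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(fetchWithTimeout(pool, "fast", () -> "ok", 500));
            System.out.println(fetchWithTimeout(pool, "slow", () -> {
                Thread.sleep(5_000); // simulated slow source
                return "too late";
            }, 100));
            System.out.println(fetchWithTimeout(pool, "broken", () -> {
                throw new IllegalStateException("simulated outage");
            }, 500));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that Future.get with a timeout only stops the waiting; calling cancel(true) afterwards is what asks the slow task itself to stop.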
Mini exercise
List the states one source task can end in: success, timeout, failure, cancelled. Then decide how the final report should present each state to the user.
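As a starting point for the exercise, the four end states can be modeled as an enum and derived from a Future. This is one possible encoding, not the expected answer: OutcomeSketch, classify, and the report wording are hypothetical, and the demo uses CompletableFuture only to construct a future in each state without real work.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical encoding of the four end states named in the exercise.
class OutcomeSketch {

    enum Outcome { SUCCESS, TIMEOUT, FAILURE, CANCELLED }

    // Waits up to timeoutMillis and maps what happened to one of the four states.
    static Outcome classify(Future<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (TimeoutException e) {
            future.cancel(true); // stop the work, but still report it as a timeout
            return Outcome.TIMEOUT;
        } catch (CancellationException e) {
            return Outcome.CANCELLED;
        } catch (ExecutionException e) {
            return Outcome.FAILURE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            // Assumption: an interrupted wait is reported as cancelled.
            return Outcome.CANCELLED;
        }
    }

    // Illustrative wording for how the final report might present each state.
    static String reportLine(String source, Outcome outcome) {
        return switch (outcome) {
            case SUCCESS -> source + ": collected";
            case TIMEOUT -> source + ": no answer within the time limit";
            case FAILURE -> source + ": source reported an error";
            case CANCELLED -> source + ": skipped (cancelled)";
        };
    }

    public static void main(String[] args) {
        CompletableFuture<String> cancelled = new CompletableFuture<>();
        cancelled.cancel(true);

        System.out.println(reportLine("a", classify(CompletableFuture.completedFuture("data"), 100)));
        System.out.println(reportLine("b", classify(new CompletableFuture<String>(), 50)));
        System.out.println(reportLine("c", classify(
                CompletableFuture.failedFuture(new IllegalStateException("boom")), 100)));
        System.out.println(reportLine("d", classify(cancelled, 100)));
    }
}
```

Separating the classification from the wording makes it easy to change how each state is presented to the user without touching the concurrency code.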
Summary
- This capstone is about coordinated concurrent work.
- Bounded pools and safe aggregation matter more than maximum thread count.
- Retries and failure handling are part of the real design.
- This capstone shows whether you can make concurrency observable and controlled.
- Parallel collection is useful only when failure handling and aggregation are just as well designed.
Next step
The fourth capstone builds a full API backed by persistence.
Sources used