CityPulse: Real-Time Traffic Data Analytics and Congestion Prediction
Abstract: CityPulse is a proof-of-concept big data pipeline designed to enable real-time urban mobility analytics using scalable, containerized components, without reliance on physical sensor infrastructure. The system simulates the ingestion of 11 million traffic-related records representing urban phenomena such as vehicle congestion, GPS coordinates, and weather conditions. Data is ingested through a Dockerized Apache Kafka cluster, coordinated by ZooKeeper, and processed in real time using Apache Spark Structured Streaming. To ensure robustness under load, the architecture introduces a temporary data storage layer that buffers Spark output before committing it to a centralized data warehouse. This design improves write efficiency and fault tolerance, and it enables batch processing of intermediate results. The refined data feeds into a lightweight machine learning module and is served through a Flask backend with a React-based frontend for visualization and interaction. Stress testing shows that the system maintains a throughput of over 300,000 records per minute with only a 10% increase in latency under full load. With its modular Docker-based deployment, CityPulse offers a cost-effective and reproducible analytics solution for traffic congestion monitoring in resource-constrained environments, particularly in developing regions such as Cameroon.
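The buffering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `StagingBuffer`, the `commit` callback, and the `batch_size` parameter are all hypothetical, and a plain in-memory list stands in for whatever temporary store the system actually uses. The sketch only shows the general pattern of accumulating processed records and committing them in batches, so that a failed warehouse write leaves the records pending instead of losing them.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class StagingBuffer:
    """Hypothetical staging layer: holds processed records and commits them in batches."""

    def __init__(self, commit: Callable[[List[Record]], None], batch_size: int = 1000) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._commit = commit          # assumed warehouse writer (illustrative only)
        self._batch_size = batch_size
        self._pending: List[Record] = []

    def add(self, record: Record) -> None:
        """Buffer one record and flush once a full batch has accumulated."""
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Commit all pending records as a single batch write."""
        if not self._pending:
            return
        # If the commit raises, the records stay pending and can be retried.
        self._commit(self._pending)
        self._pending = []

    @property
    def pending(self) -> int:
        return len(self._pending)


def minutes_to_ingest(total_records: int, records_per_minute: int) -> float:
    """Time needed to process a workload at a constant throughput."""
    return total_records / records_per_minute
```

As a rough check on the reported figures, 300,000 records per minute is 5,000 records per second, so `minutes_to_ingest(11_000_000, 300_000)` gives about 36.7 minutes for the full simulated workload at that floor rate.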