Big data project for
Telecommunications
Why?
The Client is a major provider of streaming services in Southern Asia. They decided to migrate their Vertica-based clickstream analytics to an open-source solution and to improve it in the process. The new platform also had to integrate additional applications, expand analytics and add machine learning, primarily for recommendation algorithms. The main challenge was the volume of data, about 15k requests per second (RPS): the team had to establish how to store it and how to make it available for charts and reports in near real time.
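To give a feel for the scale, the sketch below converts the stated 15k RPS into a daily event count and a rough raw storage figure. The 1,000-byte average event size is an assumption made only for illustration; the case study does not state event sizes, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(rps: int, avg_event_bytes: int) -> tuple[int, float]:
    """Return (events per day, terabytes per day) for a steady request rate."""
    events = rps * SECONDS_PER_DAY
    terabytes = events * avg_event_bytes / 1e12  # decimal terabytes
    return events, terabytes


# 15k RPS comes from the case study; the event size is an assumed value.
events, terabytes = daily_volume(rps=15_000, avg_event_bytes=1_000)
print(f"{events:,} events/day, ~{terabytes:.2f} TB/day before compression")
```

Under these assumptions the load is roughly 1.3 billion events per day, which is why storage layout and near-real-time querying were the central design questions.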
What & how?
The platform was deployed from scratch. It is set up on-premise using Apache Kafka, Kafka Connect, Presto, Hive, Ceph, Apache Druid, Apache Spark (standalone) and Apache Airflow. Deployment is done with Terraform, Ansible and Docker. A fully open-source stack was chosen because, in addition to the absence of licensing costs, a highly skilled monitoring and support team was available. Data from applications is sent to Apache Kafka. Kafka Connect streams data from Apache Kafka to an on-premise S3 implementation (Ceph). Both streaming and batch data are enriched by Apache Spark and ingested into the Apache Druid data warehouse. Apache Airflow is used for orchestration. Data for analytics and machine learning algorithms is available through Presto. Data is visualised in UIs integrated with Apache Druid (Apache Superset and Turnilo).
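The Kafka-to-Ceph step could look roughly like the sketch below, which builds a connector definition in the JSON shape accepted by the Kafka Connect REST API (`POST /connectors`). It assumes the Confluent S3 sink connector, which the case study does not name; the topic, bucket, endpoint URL, task count and flush size are all hypothetical values chosen for illustration.

```python
import json


def s3_sink_config(topic: str, bucket: str, endpoint: str) -> dict:
    """Build a Kafka Connect connector definition for an S3-compatible store."""
    return {
        "name": f"{topic}-s3-sink",
        "config": {
            # Assumption: Confluent S3 sink connector (not stated in the source).
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            "tasks.max": "4",          # assumed parallelism
            "topics": topic,
            "s3.bucket.name": bucket,
            # store.url points the connector at a non-AWS endpoint such as Ceph RGW.
            "store.url": endpoint,
            "flush.size": "10000",     # assumed records per object written
        },
    }


# All three argument values are hypothetical.
connector = s3_sink_config(
    topic="clickstream",
    bucket="raw-events",
    endpoint="http://ceph-rgw.internal:7480",
)
print(json.dumps(connector, indent=2))
```

The resulting JSON would be submitted to a Connect worker; Spark jobs could then read the written objects from the same bucket for batch enrichment.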
Benefits
- migration of the Vertica-based solution, keeping all existing functionality and adding new features
- sizing the hardware to handle the data volume: our solution can process about 50k RPS with a maximum data delay of one minute
- unification of multiple data sources
- creation of a pipeline that starts at Apache Kafka and ends with visualization and data preparation for machine learning
- creation and management of Apache Airflow DAGs
- creation of Spark jobs for data enrichment and reporting
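The capacity figures above can be put in context with a little arithmetic. The sketch below compares the stated 50k RPS capacity with the stated 15k RPS load and estimates how long the platform would need to drain a backlog after an interruption. It assumes steady rates and that all spare capacity is available for catching up; the 10-minute outage and the function names are hypothetical.

```python
INGEST_RPS = 15_000    # load stated in the case study
CAPACITY_RPS = 50_000  # processing capacity stated in the case study
MAX_DELAY_S = 60       # maximum data delay stated in the case study


def headroom(capacity_rps: int, ingest_rps: int) -> float:
    """Ratio of processing capacity to incoming load."""
    return capacity_rps / ingest_rps


def catch_up_seconds(outage_s: int, ingest_rps: int, capacity_rps: int) -> float:
    """Time to drain the backlog from an outage while new data keeps arriving."""
    spare_rps = capacity_rps - ingest_rps
    if spare_rps <= 0:
        raise ValueError("capacity must exceed the ingest rate to catch up")
    backlog = outage_s * ingest_rps
    return backlog / spare_rps


in_flight = INGEST_RPS * MAX_DELAY_S  # events not yet visible at worst-case delay
print(f"headroom: {headroom(CAPACITY_RPS, INGEST_RPS):.1f}x")
print(f"events within the delay window: {in_flight:,}")
print(f"catch-up after a 10-minute outage: "
      f"{catch_up_seconds(600, INGEST_RPS, CAPACITY_RPS):.0f} s")
```

Under these assumptions the platform has a bit more than three times the capacity it needs for the current load, which leaves room both for growth and for recovering from short interruptions.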