Big Data Project for Telecommunications
About the Client
The Client is a leading provider of streaming services in Southern Asia. With a user base of millions, their platform offers a broad range of personalized content, including a wide array of live video channels that cater to various preferences. The Client also goes beyond traditional streaming by introducing innovative, interactive features, such as shopping experiences.
Complication
The Client's primary objective is to enhance their video platform to seamlessly accommodate a larger number of users, particularly during prime time. Additionally, they aim to expand their analytics capabilities to deliver more accurate content recommendations based on individual user history. Lastly, the Client seeks cost reduction measures to optimize their operations.
The challenges faced by the Client involve consolidating multiple data sources, maintaining existing system functionality, and handling substantial data volumes, with peaks of up to 40,000 requests per second. The previous implementation, built on an enterprise solution, cannot efficiently extract value from the vast number of data records generated by users. The Client wants to prioritize real-time processing, but the existing solution is suboptimal because of its high latency and associated costs. Furthermore, platform costs have already exceeded the Client's planned budget, while the demand for new features and the growing traffic require the solution to expand. Together, these requirements call for substantial architectural and design changes to the Client's platform.
The Value We Delivered
- Seamless User Experience: The new platform implemented by Datumo experts can process approximately 50,000 requests per second, ensuring smooth and uninterrupted streaming even during peak usage times and leaving room for future traffic growth. This enhances customer satisfaction and retention, leading to improved KPIs.
- Advanced Recommendation Algorithms: Datumo uses machine learning techniques to design, develop and train advanced recommendation models for the Client. Drawing on user history, these models deliver more accurate and personalized content, which improves user engagement on the Client's platform and ultimately increases viewership and profitability. Such accurate recommendations depend on improvements to the Client's data layer. The introduction of separated data cubes has significantly sped up data analysis tasks. As a result, the recommendation models can process user data efficiently, and the daily work of the Client's analysts is considerably easier. Datumo has also implemented CI/CD and deployment solutions for the ML models, so that the recommendation algorithms can be put to use with minimal effort.
- Reliable Platform and Cost Reduction through Open-Source Solutions: The platform built with Datumo is composed of robust and respected services, and the Client avoids licensing costs because the implemented system is fully open-source. Apache Spark is used for powerful data processing jobs, while Ceph provides a reliable and resilient storage layer. With Apache Kafka as the messaging service, the data continuously produced by users is received and distributed efficiently. Apache Druid enables performant real-time data analysis with near-zero latency. Automation and orchestration of all these services are handled by Apache Airflow. Thanks to the optimized implementation by Datumo experts, resource allocation is kept to a minimum, which noticeably reduces maintenance costs.
- Streamlined Analytics and Enhanced Data Visualization: The solution built on top of the Turnilo BI tool presents data in user-friendly charts and reports, streamlining the work of the Client's analytics teams. This reduces manual processes, enhances data analysis capabilities, and enables faster decision-making based on actionable insights. Whenever a new dashboard or report is needed, the generic and efficient implementation of the platform allows it to be delivered with minimal effort.
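The throughput figures quoted above (a required peak of 40,000 requests per second and roughly 50,000 requests per second delivered) imply a certain amount of spare capacity. The following minimal Python sketch works out that headroom; the constant and function names are illustrative assumptions, not part of the Client's platform.

```python
# Figures taken from the case study text; names are hypothetical.
REQUIRED_PEAK_RPS = 40_000   # maximum load the Client needed to handle
DELIVERED_RPS = 50_000       # approximate capacity of the new platform


def headroom(required_rps: int, delivered_rps: int) -> float:
    """Return spare capacity as a fraction of the required peak load."""
    if required_rps <= 0:
        raise ValueError("required_rps must be positive")
    return (delivered_rps - required_rps) / required_rps


spare = headroom(REQUIRED_PEAK_RPS, DELIVERED_RPS)
print(f"Spare capacity over the required peak: {spare:.0%}")
```

Under these figures the platform has about a quarter more capacity than the stated peak requirement, which is the margin available for future traffic growth.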
Innovative solutions and advanced technologies
The platform is set up on-premises using Apache Kafka, Kafka Connect, Presto, Hive, Ceph, Apache Druid, Apache Spark (standalone cluster) and Apache Airflow. Deployment and infrastructure automation are based on Terraform, Ansible and Docker.
Data produced by Client applications is sent to Apache Kafka, and Kafka Connect streams the data records from the Apache Kafka brokers to Ceph, which serves as the on-premises S3-compatible storage. Both real-time and batch data are enriched by Apache Spark jobs and ingested into an Apache Druid data warehouse. Apache Airflow handles orchestration and automation of the pipelines. Data for analytics and machine learning algorithms is available through Presto. Visualization is provided by UIs integrated with Apache Druid (Apache Superset and Turnilo).