Big Data project for e-commerce
Why?
Private data centres have already begun to impede the dynamic growth of companies. Many major new initiatives require resource planning well in advance and waiting months for shipments of new hardware.
The situation is even worse for e-commerce platforms, where traffic spikes significantly during events like Black Friday. Such companies have to invest in data centres in advance to handle peak traffic that lasts only a short period. As a result, most resources sit underutilised for the rest of the year.
Cloud migration is the best solution to this problem. Cloud providers offer pay-as-you-go pricing models, which allow organisations to pay only for the resources they use. Cloud-based solutions are also highly scalable, which is especially important for organisations that experience spikes in data volume.
This is why one of the largest e-commerce platforms in Europe decided to cooperate with us to migrate its Big Data and AI platform to the Google Cloud Platform. The Datumo team of experts helped the Client smooth their path to the Cloud.
What & how?
The Client’s Big Data and AI on-premise platform was based on a common Hadoop stack. HDFS and Hive served as the data lake. Data was transformed by Apache Spark applications running on a single large Apache YARN cluster. Processing workflows were defined and orchestrated by Apache Airflow.
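To illustrate that setup (not the Client’s actual code), a typical workflow on such a stack is an Airflow DAG that submits a Spark application to the YARN cluster. The DAG id, file paths and connection name below are assumptions for the sketch.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Illustrative on-premise workflow: one Spark application submitted to YARN,
# scheduled daily by Airflow. Names and paths are assumptions for the sketch.
with DAG(
    dag_id="onprem_sales_etl",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform_sales = SparkSubmitOperator(
        task_id="transform_sales",
        application="hdfs:///apps/etl/transform_sales.py",  # hypothetical job file on HDFS
        conn_id="spark_yarn",  # Airflow connection pointing at the YARN cluster
        conf={"spark.submit.deployMode": "cluster"},
    )
```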
In this improve-and-move migration, platform components were replaced by their managed or serverless GCP equivalents. BigQuery became the new serverless data lake. The single long-lived enterprise Spark cluster was replaced by ephemeral Dataproc clusters created only for the duration of processing. Cloud Composer took over as a managed Apache Airflow environment.
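A rough sketch of the ephemeral-cluster pattern, assuming hypothetical project, region, bucket and cluster names: a Cloud Composer DAG creates a Dataproc cluster for a single Spark job and tears it down as soon as the job finishes, so no cluster is paid for between runs.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = "example-project"        # assumed GCP project
REGION = "europe-west1"               # assumed region
CLUSTER_NAME = "ephemeral-sales-etl"  # assumed name; in practice often templated per run

with DAG(
    dag_id="gcp_sales_etl",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Cluster exists only for the duration of this DAG run.
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        },
    )

    run_spark_job = DataprocSubmitJobOperator(
        task_id="run_spark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "reference": {"project_id": PROJECT_ID},
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {
                "main_python_file_uri": "gs://example-bucket/jobs/transform_sales.py"
            },
        },
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule=TriggerRule.ALL_DONE,  # tear down even if the Spark job fails
    )

    create_cluster >> run_spark_job >> delete_cluster
```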
This may sound straightforward, but the petabyte data scale, more than 1,000 workflows and the complexity of adapting the Hadoop stack to the Cloud made it a demanding migration. To take full advantage of Cloud-native capabilities, workloads had to be refactored or re-architected: SQL-based processing was replaced by serverless BigQuery jobs, and the heavy computational parts of Spark processing were offloaded to BigQuery.
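As an illustration of that refactoring direction, a Hive or Spark SQL step can become a serverless BigQuery job triggered from the same orchestration layer. The dataset, table names and query below are hypothetical, not the Client’s workloads.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="bq_sales_aggregation",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Serverless aggregation: BigQuery executes the SQL, no cluster is provisioned.
    aggregate_orders = BigQueryInsertJobOperator(
        task_id="aggregate_orders",
        configuration={
            "query": {
                "query": """
                    SELECT customer_id, DATE(order_ts) AS order_date, SUM(amount) AS total
                    FROM `example-project.sales.orders`
                    GROUP BY customer_id, order_date
                """,
                "useLegacySql": False,
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "sales",
                    "tableId": "daily_order_totals",
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    )
```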
Datumo took a holistic approach to this project. Not only did our work accelerate the migration, it also helped with performance optimization and FinOps. In many cases the implemented improvements greatly reduced processing runtime while using the same processing power, resulting in cost reduction. The team also improved data quality through precise validation and conducted multiple knowledge transfer sessions with the Client.
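One simple form such validation can take, shown here only as an assumed example rather than the Client’s actual checks, is a check task that fails the workflow when a migrated table looks wrong, for instance when it falls below an expected row count.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryCheckOperator

with DAG(
    dag_id="bq_data_quality",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Fails the DAG run if the single result cell is not truthy,
    # i.e. if the migrated table has fewer rows than expected.
    validate_orders = BigQueryCheckOperator(
        task_id="validate_orders",
        sql="""
            SELECT COUNT(*) >= 1000000  -- assumed minimum expected row count
            FROM `example-project.sales.orders`
        """,
        use_legacy_sql=False,
    )
```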
Benefits
Datumo’s work accelerated the Big Data platform’s migration to the Cloud. The Client can now take full advantage of the Cloud’s capabilities, and managers no longer have to purchase hardware for new projects. Instead, they can focus on the business itself. The new platform dynamically scales up to handle Black Friday and holiday data volumes.
The project also paid off financially. With the technical improvements introduced by Datumo experts, an average month of FinOps work saved the Client 400 thousand USD per year on Cloud costs.