How Datumo Transformed merXu's Data Management with the Google Cloud Platform
August 21, 2023
In the rapidly evolving world of e-commerce, data has become the lifeblood of businesses seeking to gain a competitive edge. As a premium online trading platform for B2B companies across the European Union, merXu is no stranger to the challenges of managing an ever-growing volume of data. With a presence in multiple countries and a diverse range of industrial sectors, merXu's platform serves as a vital hub for businesses of all sizes, connecting verified buyers and sellers to facilitate direct transactions or inquiries.
Due to the rapid growth of merXu’s platform, the company faced an increasingly complex data landscape. Raw pieces of valuable information were continuously flowing into its servers, sparking numerous ideas on how the company's Analysis and Strategy team could use this data. However, the rapid influx of data and frequent changes to the platform had led to a reliance on ad-hoc solutions, and the company needed automation, standardization, and proper monitoring.
Recognizing the need for a more robust and scalable approach to data management, merXu sought the expertise of Datumo. The challenge was to stabilize the data environment, share cloud and data engineering best practices, and create generic data pipelines that would seamlessly handle merXu's future analysis needs. By leveraging the Google Cloud Platform and drawing on its extensive experience in data engineering, Datumo aimed to revolutionize merXu's data management and set the stage for a bright, data-driven future.
Transforming Data Management with Apache Airflow
As merXu's data volume grew, the company needed a robust solution to manage and process its data effectively. Datumo turned to Apache Airflow, a powerful open-source platform for orchestrating complex data workflows. Deploying Airflow on Google Kubernetes Engine (GKE) kept it consistent with merXu's other services, which already ran on GKE, while benefiting from GKE's autoscaling and rolling update capabilities.
Implementing Airflow gave merXu three main advantages: greater resilience to errors, tighter control over dependencies, and easier orchestration. Datumo's experts began by designing custom Directed Acyclic Graphs (DAGs) in Airflow, defining the data pipelines and the flow of processes specific to merXu's needs. This gave merXu deterministic, well-organized dependencies tailored to its business requirements, and ensured that transformations were executed efficiently, with proper handling of failures originating either from the processes themselves or from upstream dependencies.
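To make "deterministic dependencies with proper failure handling" concrete, here is a toy, dependency-free model of what a scheduler such as Airflow does with a DAG. It is not Airflow's API and not Datumo's code: the `run_pipeline` function, the task names and the retry policy are hypothetical, and only the state names mirror Airflow's. Each task runs after its upstream tasks, is retried a fixed number of times, and if it still fails, everything downstream is marked `upstream_failed` instead of running on bad or missing data.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    retries: int = 2,
) -> Dict[str, str]:
    """Run each task after its upstream tasks and return a final state per task.

    Every upstream name must also be a key in `tasks`. A failing task is retried
    up to `retries` more times; if it still fails, everything downstream of it
    is marked "upstream_failed" and never runs.
    """
    # Every task becomes a graph node, even if it has no upstream dependencies.
    graph = {name: upstream.get(name, set()) for name in tasks}
    states: Dict[str, str] = {}
    # static_order() yields a task only after all of its upstream tasks
    # (and raises graphlib.CycleError if the graph is not acyclic).
    for name in TopologicalSorter(graph).static_order():
        if any(states[dep] != "success" for dep in graph[name]):
            states[name] = "upstream_failed"
            continue
        states[name] = "failed"
        for _ in range(retries + 1):
            try:
                tasks[name]()
            except Exception:
                continue  # try again until the retry budget is spent
            states[name] = "success"
            break
    return states


if __name__ == "__main__":
    def load_to_bigquery() -> None:
        raise RuntimeError("simulated load failure")

    print(run_pipeline(
        {"wait_for_files": lambda: None, "load": load_to_bigquery, "build_report": lambda: None},
        {"load": {"wait_for_files"}, "build_report": {"load"}},
    ))
    # prints {'wait_for_files': 'success', 'load': 'failed', 'build_report': 'upstream_failed'}
```

The point of the model is the ordering guarantee: because the report task never runs after a failed load, a failure surfaces once, at its source, rather than as a confusing downstream error.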
merXu had data pipelines that transferred raw data, delivered irregularly to a Google Cloud Storage (GCS) bucket, into BigQuery tables. These pipelines relied on intricate, loosely defined processes to handle various scenarios. By implementing DAGs that use facilities from the apache-airflow-providers-google package, such as Sensors that monitor the bucket for new data, Datumo created a lightweight, well-structured process and eliminated the need for complex, convoluted solutions.
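Airflow sensors follow a simple "poke" pattern: check a condition, wait `poke_interval` seconds, and give up after `timeout`. The Google provider's GCS sensors (for example `GCSObjectExistenceSensor`) run this check against a real bucket. The sketch below is a stdlib-only model of that loop, not the provider's implementation; the `wait_for` function and the simulated bucket listing are hypothetical.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poke `condition` every `poke_interval` seconds until it holds or `timeout` expires.

    Returns True as soon as the condition holds, False if the deadline passes first.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            return False
        sleep(poke_interval)
    return True


if __name__ == "__main__":
    # Simulated landing bucket: the export "arrives" on the third poke.
    listings = iter([[], [], ["exports/orders_2023-08-21.csv"]])
    print(wait_for(lambda: bool(next(listings)), poke_interval=0.1, timeout=5))
```

Because the downstream load only starts once the sensor succeeds, irregular deliveries no longer need special-case handling in the load logic itself.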
Datumo also integrated merXu's Airflow with the company's Slack workspace, so monitoring alerts are sent to dedicated channels and potential issues get a quick response. This integration made it easier for merXu to manage its data workflows and keep the environment running smoothly and efficiently.
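The article does not describe how the Slack integration was wired, so the following is only a minimal sketch, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field). The function names, message layout and URLs are hypothetical.

```python
import json
import urllib.request


def build_slack_alert(dag_id: str, task_id: str, run_id: str, error: str, log_url: str) -> dict:
    """Format a failed-task notice as a Slack incoming-webhook payload."""
    text = (
        f":red_circle: Task *{task_id}* in DAG *{dag_id}* failed\n"
        f"Run: `{run_id}`\n"
        f"Error: {error}\n"
        f"<{log_url}|Open task logs>"  # Slack link syntax: <url|label>
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    payload = build_slack_alert(
        dag_id="gcs_to_bigquery_orders",
        task_id="load_orders",
        run_id="scheduled__2023-08-21T00:00:00+00:00",
        error="simulated failure",
        log_url="https://airflow.example.com/log",
    )
    print(json.dumps(payload, indent=2))
    # post_to_slack("https://hooks.slack.com/services/...", payload)
```

In Airflow, a function like this is typically called from a task's `on_failure_callback`, which receives the task context; the DAG id, task id, run id and log URL can be read from `context["task_instance"]`. Putting the log link directly in the message is what lets the on-call engineer jump straight to the cause.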
merXu's previous backup method involved daily data dumps to a GCS bucket, which generated unnecessary storage costs. Datumo introduced BigQuery table snapshots as a cost-effective alternative. A snapshot only requires storage for data that has changed or been removed in the base table since the snapshot was taken, so unchanged data generates no extra cost and backup expenses drop significantly.
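A daily backup task can simply submit a `CREATE SNAPSHOT TABLE` statement. The helper below is a hedged sketch that only builds the DDL string; the project, dataset and naming scheme (`<table>_YYYYMMDD` in a separate backup dataset) are assumptions, not merXu's setup.

```python
from datetime import date


def snapshot_ddl(
    project: str,
    dataset: str,
    table: str,
    snapshot_dataset: str,
    day: date,
    retention_days: int = 30,
) -> str:
    """Build BigQuery DDL that snapshots one table and lets the snapshot expire."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    source = f"`{project}.{dataset}.{table}`"
    # One snapshot per table per day, e.g. backups.orders_20230821.
    target = f"`{project}.{snapshot_dataset}.{table}_{day:%Y%m%d}`"
    return (
        f"CREATE SNAPSHOT TABLE IF NOT EXISTS {target}\n"
        f"CLONE {source}\n"
        f"OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {retention_days} DAY))"
    )


if __name__ == "__main__":
    print(snapshot_ddl("example-project", "analytics", "orders", "backups", date(2023, 8, 21), 14))
```

The statement could be submitted from a DAG with, for example, the Google provider's `BigQueryInsertJobOperator`. `IF NOT EXISTS` makes a rerun of the same day's task harmless, and the expiration option lets BigQuery remove old snapshots without a separate job.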
In addition to implementing Apache Airflow, Datumo shared best practices with merXu for using the Google Cloud Platform as a Big Data platform. These included optimizing BigQuery for access and backups and configuring Google Cloud Storage storage classes for different types of data, which reduced costs further.
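Storage classes are usually assigned automatically with a bucket lifecycle policy. The sketch below builds such a policy as JSON; the 30- and 90-day thresholds are hypothetical defaults, and the right values depend on how often the data is read. Note that Nearline and Coldline have minimum storage durations (30 and 90 days) and per-GB retrieval fees.

```python
import json


def lifecycle_config(nearline_after_days: int = 30, coldline_after_days: int = 90) -> dict:
    """Build a GCS lifecycle policy that moves ageing objects to colder storage classes."""
    if not 0 < nearline_after_days < coldline_after_days:
        raise ValueError("expected 0 < nearline_after_days < coldline_after_days")
    return {
        "rule": [
            {   # Objects still in STANDARD move to NEARLINE after N days.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days, "matchesStorageClass": ["STANDARD"]},
            },
            {   # NEARLINE objects move on to COLDLINE once they are older still.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days, "matchesStorageClass": ["NEARLINE"]},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_config(), indent=2))
```

Redirecting the output to `lifecycle.json` and running `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME` applies it to a bucket. The `matchesStorageClass` conditions keep each rule from touching objects that are already in a colder class.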
With the implementation of Airflow, merXu gained the ability to efficiently execute data processing jobs, manage dependencies better, and easily orchestrate data workflows. Datumo also developed DAGs for data backup and maintenance tasks, ensuring that merXu's environment remained clean and efficient.
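The article does not say exactly what the maintenance DAGs do. One common housekeeping job is pruning old backups; the sketch below shows a hedged version of that selection logic, with hypothetical names and defaults. Besides the age-based retention window, it always spares the newest few backups.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_backups(
    backups: Dict[str, date],
    today: date,
    retention_days: int = 30,
    keep_latest: int = 3,
) -> List[str]:
    """Pick backups older than the retention window, always sparing the newest few."""
    newest_first = sorted(backups, key=lambda name: backups[name], reverse=True)
    protected = set(newest_first[:keep_latest])
    cutoff = today - timedelta(days=retention_days)
    return sorted(
        name for name in backups
        if name not in protected and backups[name] < cutoff
    )


if __name__ == "__main__":
    existing = {
        "orders_20230701": date(2023, 7, 1),
        "orders_20230801": date(2023, 8, 1),
        "orders_20230821": date(2023, 8, 21),
    }
    print(stale_backups(existing, today=date(2023, 8, 21), keep_latest=1))
```

The `keep_latest` guard matters: if backups silently stop being produced, a pure age rule would eventually delete every copy, whereas this rule always leaves the most recent ones in place.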
By leveraging the capabilities of Apache Airflow and the Google Cloud Platform, Datumo transformed merXu’s data management processes, optimizing costs and improving overall efficiency. As a result, the fast-growing e-commerce platform was better equipped to handle its increasing data volume and make data-driven decisions.
A Reliable, Easy-to-Manage Data Pipeline
With Datumo's improvements in place, merXu experienced a transformative shift in the way it managed its data pipelines. Once time-consuming and complex processes were streamlined, making it easier for the team to create, monitor, and maintain core data workflows.
Datumo built generic, DAG-based pipelines in Airflow, which let merXu launch new data processing workflows that integrate multiple Google Cloud Platform services into a single, reliable pipeline simply by providing a straightforward configuration. This greatly simplified setting up and managing pipelines and significantly reduced the time and effort needed to build and maintain them, freeing the team to focus on other business-critical tasks.
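The "pipeline from configuration" idea can be sketched as a small factory that expands one declarative entry into an ordered list of task specifications. The `PipelineConfig` fields, the task kinds and the bucket names below are hypothetical; the actual configuration schema is not published.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical per-pipeline settings; the real configuration schema is not published."""
    name: str
    bucket: str
    prefix: str
    dataset: str
    table: str
    schedule: str = "@daily"


def build_tasks(cfg: PipelineConfig) -> List[Dict[str, str]]:
    """Expand one config entry into ordered task specs: wait, then load, then back up."""
    table_ref = f"{cfg.dataset}.{cfg.table}"
    return [
        {"task_id": f"{cfg.name}_wait_for_files", "kind": "gcs_sensor",
         "bucket": cfg.bucket, "prefix": cfg.prefix},
        {"task_id": f"{cfg.name}_load", "kind": "gcs_to_bigquery",
         "source": f"gs://{cfg.bucket}/{cfg.prefix}*", "destination": table_ref},
        {"task_id": f"{cfg.name}_snapshot", "kind": "bigquery_snapshot",
         "table": table_ref},
    ]


if __name__ == "__main__":
    entries = [
        {"name": "orders", "bucket": "example-landing-bucket", "prefix": "orders/",
         "dataset": "raw", "table": "orders"},
        {"name": "offers", "bucket": "example-landing-bucket", "prefix": "offers/",
         "dataset": "raw", "table": "offers", "schedule": "@hourly"},
    ]
    for entry in entries:
        cfg = PipelineConfig(**entry)
        print(cfg.name, cfg.schedule, [t["task_id"] for t in build_tasks(cfg)])
```

In a real Airflow deployment, a factory along these lines would turn each spec into operators (for instance the Google provider's `GCSToBigQueryOperator` for the load step) and register one DAG per entry. Keeping the entries declarative means adding a pipeline is a configuration change rather than new code.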
Because Airflow integrates closely with BigQuery, Google Cloud Storage, and other Google Cloud Platform services, merXu's data now flows through the system without interruption, giving the company's analysts access to up-to-date, accurate information. This has greatly enhanced their ability to perform complex business intelligence tasks, empowering merXu to make data-driven decisions and identify new opportunities for growth and expansion.
Furthermore, monitored orchestration based on Apache Airflow has significantly improved the reliability and stability of merXu's data processes. Pipeline tasks run reliably, and when an issue does occur, the support team receives clear, transparent information that helps them quickly find the cause and implement a fix. This proactive approach to monitoring has not only minimized the risk of data-related disruptions but also fostered a culture of continuous improvement within the organization.
Overall, the partnership with Datumo and the utilization of the Google Cloud Platform have transformed merXu's data management landscape, resulting in more reliable, efficient, and easy-to-manage data pipelines. This newfound agility has positioned merXu for continued success, enabling them to harness the power of their data to drive innovation and fuel growth.
A Bright, Data-Driven Future for merXu
The collaboration between merXu and Datumo has set the stage for a new era in data management for the thriving e-commerce platform. By utilizing the power of Apache Airflow and the Google Cloud Platform, merXu has successfully transformed its data analytics, enabling the company to fully capitalize on the wealth of information generated by its rapid growth.
The new, reliable, and easy-to-manage data pipeline has not only streamlined merXu's data processes but also empowered the company's Analysis and Strategy team to make informed, data-driven decisions. This has, in turn, allowed merXu to identify new opportunities for growth, expansion, and innovation across its diverse range of industrial sectors.
Furthermore, the partnership with Datumo has had a lasting impact on the organization as a whole. By sharing cloud and data engineering best practices, Datumo has effectively upskilled merXu's team, equipping them with the knowledge and expertise necessary to maintain and further develop the new data infrastructure. This transfer of knowledge has fostered a culture of continuous improvement and innovation within the company, ensuring that merXu remains at the forefront of data-driven decision-making in the competitive e-commerce landscape.
As merXu looks to the future, the scalability and adaptability of the data pipeline solutions implemented by Datumo will prove invaluable. With the ability to easily introduce new data pipelines to address evolving business requirements, merXu's data infrastructure is well-equipped to support the company's continued growth and success. Moreover, the enhanced reliability and stability of the data processes have given merXu the confidence to embrace new challenges and opportunities, knowing that their data foundation is secure and robust.
Ultimately, the partnership between merXu and Datumo has paved the way for a bright, data-driven future. By transforming the way merXu manages and leverages its data, the company is now well-positioned to unlock its full potential, harnessing the power of information to drive innovation, growth, and long-term success in the ever-evolving world of e-commerce.
Data engineer at Datumo, enthusiast of things that are automated, simple and reliable (and cloud-based). Runner.