Big data project for Healthcare
About the Client
The Client, a global pharmaceutical company, outsources its digital and traditional marketing campaigns to various providers worldwide. Recognizing the power of data collaboration, the company has formed strategic partnerships with leading tech providers and research institutions to drive innovation in healthcare analytics. They've established a centralized data platform to store vast amounts of clinical trial data, real-world evidence, and market research, enabling rapid analysis and insight generation.
Complication
The Client's primary goal is to develop a comprehensive omnichannel marketing dashboard and calculate marketing return on investment (MROI). Achieving this requires the design and development of a unified data model.
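For reference, one widely used definition of MROI is the incremental revenue attributed to marketing, minus marketing spend, divided by marketing spend. The sketch below assumes that definition. The case study does not state which formula or attribution method the Client uses, and the function name and figures here are purely illustrative.

```python
def marketing_roi(incremental_revenue: float, marketing_spend: float) -> float:
    """Return MROI as a ratio, assuming the common definition:
    (incremental revenue - marketing spend) / marketing spend.
    """
    if marketing_spend <= 0:
        raise ValueError("marketing_spend must be positive")
    return (incremental_revenue - marketing_spend) / marketing_spend


# Illustrative numbers only, not taken from the project.
example_mroi = marketing_roi(incremental_revenue=150_000.0, marketing_spend=100_000.0)
```

Computing this per channel or per campaign requires spend and outcome data to share the same keys, which is where a unified data model becomes necessary.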
Datumo took on the challenge of integrating marketing interaction data from a wide range of providers: millions of rows per day, containing both aggregated and non-aggregated metrics. The data covers a broad spectrum of channels and arrives through several delivery methods, including Snowflake Shares, APIs, and S3 buckets. Several core challenges followed. Datumo had to integrate a legacy solution into the new system while managing data distributed across multiple Snowflake instances. Data pipelines had to be built from scratch to load this data reliably into Snowflake. Finally, to streamline analysis, Datumo focused on generating pre-aggregated views optimized for consumption by external BI tools.
The Value We Delivered
Unified Data Model and Standards: Datumo eliminated data silos and discrepancies, establishing a single source of truth for analytics and reporting across business units. The standardized structure reduces the time spent deciphering data, so analysts and data scientists can focus on actionable insights. The model is adaptable: it accommodates new data sources and evolving business requirements, which supports longevity and scalability. Clear documentation makes the data accessible to non-technical stakeholders and fosters collaboration.
Single Source for Transformations, Tests, and Documentation: Using dbt and its integration with Snowflake, Datumo streamlined model development, quality assurance, and documentation. This approach reduced the time needed to build and deploy robust data pipelines. Embedded tests and documentation improve data reliability and lineage, which in turn makes the resulting insights more trustworthy. Version control provides clear tracking of changes and makes the platform easier to maintain.
Layered Data Model: The platform architecture comprises three layers: raw data, common data model, and consumption. The raw data layer archives incoming data for historical analysis and compliance. The common data model layer standardizes and cleanses data, reducing repetitive tasks and ensuring consistency. The consumption layer provides tailored insights for different domains, accelerating time-to-insight for business users. This layered approach promotes flexibility and strong governance practices.
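As a rough illustration of the layering described above, the sketch below passes a few made-up provider records through raw, common, and consumption steps in plain Python. All field names (`provider`, `channel`, `ts`, `clicks`) and function names are hypothetical. In the project itself this work is done with dbt models in Snowflake, so this only shows the idea, not the actual implementation.

```python
from collections import defaultdict
from datetime import date

# Raw layer: records are kept exactly as delivered (hypothetical sample rows
# with inconsistent field names, casing, and types across providers).
RAW_EVENTS = [
    {"provider": "A", "Channel": "Email ", "ts": "2024-01-05", "clicks": "12"},
    {"provider": "B", "channel": "email", "date": "2024-01-05", "clicks": 8},
    {"provider": "B", "channel": "web", "date": "2024-01-06", "clicks": 5},
]


def to_common(record):
    """Common data model layer: standardize field names, types, and casing."""
    channel = record.get("channel") or record.get("Channel")
    day = record.get("date") or record.get("ts")
    return {
        "provider": record["provider"],
        "channel": channel.strip().lower(),
        "event_date": date.fromisoformat(day),
        "clicks": int(record["clicks"]),
    }


def to_consumption(common_rows):
    """Consumption layer: pre-aggregate clicks per channel and day."""
    totals = defaultdict(int)
    for row in common_rows:
        totals[(row["channel"], row["event_date"])] += row["clicks"]
    return dict(totals)


common = [to_common(r) for r in RAW_EVENTS]
daily_clicks = to_consumption(common)
```

Keeping the raw rows untouched means the standardization rules in `to_common` can be changed later and replayed over historical data without asking providers to resend anything.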
Interactive Dashboards for Quality Assurance: Streamlit in Snowflake enables dynamic, interactive dashboards, allowing users to thoroughly explore data quality patterns and anomalies. This facilitates effective communication of data quality health to both technical and non-technical stakeholders. Integrating Streamlit within Snowflake streamlines workflows, enhancing overall efficiency.
Innovative solutions and advanced technologies
The Consumer Data Platform (CDP) builds on the strengths of the Snowflake Data Cloud to maximize operational efficiency and data integrity. It keeps data ingestion flexible by combining APIs, S3 buckets, and Snowflake Shares. Snowflake's staging areas optimize loading, while tables and views provide structured data storage. dbt and its packages are the backbone of the CDP, handling data transformations, documentation governance, and quality checks. For quality assurance and the presentation of insights, Streamlit's integration with Snowflake offers interactive visualizations of test results. Snowflake alerts send notifications of potential data issues so they can be resolved quickly. Additionally, SqlDBM is used for data model visualization, and Alation serves as the data catalog.