Datumo: A Journey From a Product Company to Experts in Data Analytics
Asia: Hi Michał, thank you for taking the time for our interview today.
Michał: Hi Asia, thank you for the invitation. I'm excited about the revival of our blog!
A: Indeed, speaking of reactivating the Datumo blog, this interview is our first publication after a while. Why was the blog reactivated?
M: Our priority is to share knowledge - technical, business and employer branding expertise. We want to introduce our readers to what we do and the capabilities we possess. We've worked extensively on our new website to give our blog sufficient exposure. I'm glad we're returning to regular publications!
A: Moving on to our topic today, let's start with a standard question - who are you and what do you do at Datumo on a daily basis?
M: Let me start with an interesting fact - I was the first Datumo employee, involved before it was even established.
I have been working at Datumo for 6 years, and for the past 3 I have been serving as the CTO. In a company like Datumo, that means helping customers by architecting solutions that maximize added value, while also overseeing the company's technological development. During my 6 years at Datumo, I've had the pleasure of working on data platform and migration projects for organizations of various sizes - from scale-ups to companies with tens of thousands of employees. Looking back, working at Datumo has allowed me to gain a vast amount of experience in a relatively short period. My most recent success has been the migration of one of the largest data platforms in Central and Eastern Europe to Google Cloud. In this project, I led a 7-person team of Cloud and Data specialists from Datumo, collaborating with client teams on data processing migration.
In addition to project work, I oversee an academy of future Datumo leaders. Within the initiative, we carry out internal projects that streamline our daily duties. An important aspect of my work is mentoring my coworkers. Last but not least, I continuously strive to develop myself professionally. I try to keep up to date with technology, which is not easy considering the rapid development of our industry (Data & AI).
Outside of work, I have a real passion for running and I can proudly say that I have run a 5 km distance in under 15 minutes. Running has taught me determination in pursuing goals, perseverance, and discipline - skills that can be utilized in my professional career. Apart from that, I also manage to find some time for gaming.
A: Let's rewind a bit. As the first Datumo employee, you have extensive knowledge and are essentially part of the company's history. Datumo is currently a service company, but it wasn't always that way. Can you tell me how Datumo got to where it is today?
M: Datumo started out as a product provider. Our product, Storyteller, was a cloud-based analytics platform built on open-source technologies and deployed on the Google Cloud Platform. At the core of the platform was Apache Druid, a high-performance data store optimized for time series data analysis. In building the platform, we also used other Apache technologies: Spark, Airflow and Kafka, as well as BigQuery, and BI tools like Apache Superset and Turnilo.
The product was quite successful, as we onboarded 8 clients to the platform. The main challenge, and at the same time the main advantage, of our product was Druid, which is a highly specialized technology for time series data analysis. Its potential is evident in handling large-scale data. Readers can learn more about Druid from my presentation at Devoxx. However, specializing in Druid significantly limited our potential client base, as it wasn't the go-to choice for most data analysis needs. Another challenge we faced was the reluctance of companies to switch to cloud storage. They had concerns as to how our product would store and manage client data. More recently, attitudes towards cloud storage have been steadily changing for the better.
Since 2020, we have overseen a smooth transition of Datumo to a service-based model to address our clients' needs. Increasingly, customers enquired about Big Data services not related to the platform. We saw huge potential in this and decided that it was worth looking into as it fell under our area of expertise. We treated Storyteller as a product comprised of components that could be independently deployed. We began assisting our clients in data platform design, cloud migration, and data pipeline implementation. In one particularly innovative project, we even managed to deploy a customized version of Storyteller adapted for a streaming platform in Asia.
In hindsight, I believe that focusing on Druid and building a product based on it was a wise decision, considering the invaluable technical and business experience we gained during the three years of developing and selling the product. We've learned that creating a good system is only half the battle. You also need to learn to highlight and talk up the advantages of what you offer.
A: So, what is Datumo today?
M: We are a team of 40 Big Data and Cloud experts with extensive experience and a fantastic track record. Based on the experience we gained during product development, we build data and ML platforms tailored to our clients' needs. We take an active role in all stages of data lifecycle management: modeling, transformation, integration and cataloging. We're breaking down data silos by organizing customer data into a lakehouse architecture. We love to create platforms that follow Data Mesh principles. Our second specialization is data platform migration to the cloud, and we particularly enjoy Hadoop migrations. The migration I mentioned earlier was a year-and-a-half-long migration of a large Hadoop platform, in which my team of 7 coworkers played a key role. We also migrate classic enterprise data warehouses and relational databases to cloud-native analytics solutions. Not only do we carry out migrations, but we also assist clients in choosing the optimal migration approach based on their needs. Another specialization of Datumo is finding data-based solutions to business problems. We implement data processing systems that handle tens of terabytes daily. We create ML models that can bring huge advantages to our clients. It is also worth mentioning that we're Google Cloud and Databricks partners.
A: What attracts clients to Datumo?
M: Clients value Datumo primarily for the quality of our services and our understanding of their business. For me, as the CTO, it is paramount that we give equal focus to both of these aspects, upholding high standards in Data & AI system creation and tailoring these systems to address specific business needs. Most of our projects begin with a series of workshops where we get to know our clients and their requirements. Building partnerships is essential in our business. Based on our experience, we define project requirements, architecture, and implementation plans. Our goal is to build solutions that maximize value for the client. A compliment we often hear from our clients is that we function as a cohesive team. We promote a healthy working environment, we exchange project knowledge internally, and we have a low employee turnover. I'm pleased that we have been able to maintain this from the very beginning.
A: What technologies does Datumo currently specialize in?
M: We specialize and excel in Big Data and cloud technologies. We have a strong team that is able to handle any challenges that may arise. Technology is rapidly changing, and we value our team's ability to adapt to new trends. We've seen the second generation of Big Data and Cloud technologies emerge in the last few years. What was cutting-edge when Datumo was established is now considered the standard. This shows that we made the right decision in choosing our technological stack at the beginning. However, we can't take our foot off the gas - new technologies are constantly emerging, opening up new possibilities for us. To answer your question - for our clients, we undertake projects on all three major cloud providers: Google Cloud, Azure, and AWS. Regarding open-source technologies, we mostly use Apache Spark, Airflow, Kafka, dbt, Scala, Python, and Docker. We are also very adept with cloud products such as Databricks and Snowflake. The technologies I mentioned are just a small part of our stack.
A: Michał, thank you very much for your time and for sharing your knowledge.
M: You are very welcome and thank you.