Big Data project for insurance
Why?
Making accurate decisions is a challenge for every investor. You always want to maximise profits and outperform your competitors. Even though the best moment to buy or sell a security is virtually impossible to predict, historical data can help you spot an opportunity. Because many complex factors influence share prices, you need large volumes of data from a variety of sources, along with heavy computational transformations, to get the most out of these statistics.
Our Client, a company investing capital generated by its insurance services, manages an asset portfolio of €900 billion. It has an existing Big Data platform running in the Azure Cloud, built largely around Databricks. Dozens of jobs are executed daily to produce reports required for regulatory purposes and for the analytics team - the company is heavily focused on ESG investing. The computations are extensive, with rapidly changing requirements and specifications. Datumo experts were brought in to apply their Big Data and Cloud experience to improve the performance of existing jobs and to build new features in an optimised way.
What & how?
There are multiple steps in the process of turning data from various sources into valuable investment insights. Numerous API fetchers periodically acquire data about issuers and securities from a diverse range of providers. Datumo developers significantly improved these fetchers by introducing proven techniques such as asynchronous HTTP requests and better error and retry handling, and by making the code more generic, so new integrations can be added easily.
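A minimal sketch of such a fetcher is shown below. It is illustrative only, not the client's actual code: the object name, endpoints and retry policy are assumptions, and it simply combines Java's non-blocking HttpClient with Scala Futures to issue concurrent requests and retry failed ones.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.jdk.FutureConverters._

// Illustrative only: object name, endpoints and retry policy are assumptions.
object SecurityApiFetcher {
  implicit private val ec: ExecutionContext = ExecutionContext.global
  private val client = HttpClient.newHttpClient()

  /** Sends a non-blocking GET request and retries a few times on failure. */
  def fetchWithRetry(url: String, retries: Int = 3): Future[String] = {
    val request = HttpRequest.newBuilder(URI.create(url)).GET().build()
    client
      .sendAsync(request, HttpResponse.BodyHandlers.ofString())
      .asScala
      .map { response =>
        if (response.statusCode() / 100 == 2) response.body()
        else throw new RuntimeException(s"HTTP ${response.statusCode()} for $url")
      }
      .recoverWith {
        // Retry on any failure; a production fetcher would add exponential backoff.
        case _ if retries > 0 => fetchWithRetry(url, retries - 1)
      }
  }

  def main(args: Array[String]): Unit = {
    // Endpoints are fetched concurrently instead of one after another.
    val urls    = Seq("https://example.com/issuers", "https://example.com/securities")
    val results = Future.sequence(urls.map(fetchWithRetry(_)))
    println(Await.result(results, 60.seconds).map(_.length))
  }
}
```

The asynchronous composition is what delivers the speed-up: instead of waiting for each provider in turn, all requests are in flight at once and only the failed ones are retried.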
With the data loaded into the data lake, Spark jobs perform multiple computations. Since Databricks is the platform's main service and Delta Lake its storage layer, the medallion architecture is used as the design pattern for dividing the data into layers. Many periodic ETL jobs transform the data, while other applications produce reports required for business intelligence and regulatory compliance. All of these processes are implemented as Spark jobs written in Scala. As Datumo experts are Databricks aficionados, we optimised both the computations and the storage performance, and our Spark experience allowed us to refactor the code to make it more generic and reliable.
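The sketch below shows what a bronze-to-silver step in a medallion architecture can look like. The table paths, column names and cleaning rules are assumptions made for illustration, not the client's actual jobs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch of a bronze-to-silver step; paths and column names are illustrative.
object SecuritiesBronzeToSilver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("securities-bronze-to-silver")
      .getOrCreate()

    // Bronze layer: raw API payloads landed as-is in the data lake.
    val bronze = spark.read.format("delta").load("/mnt/lake/bronze/securities")

    // Silver layer: cleaned, typed, deduplicated records ready for reporting jobs.
    val silver = bronze
      .filter(col("isin").isNotNull)
      .withColumn("price", col("price").cast("decimal(18,4)"))
      .dropDuplicates("isin", "as_of_date")

    silver.write
      .format("delta")
      .mode("overwrite")
      .save("/mnt/lake/silver/securities")

    spark.stop()
  }
}
```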
Benefits
Datumo's contribution resulted in significant improvements across the board. Redesigning the API fetchers cut the execution time of crucial downloads from as much as 6 hours to a few minutes. Spark jobs were greatly optimised by refactoring core components and adjusting the code to better exploit parallelism. Using built-in Databricks features, we improved Delta Lake storage and many table operations. Together, these upgrades made the platform more reliable and simplified the introduction of new features, while minimising the number of errors and platform issues. Last but not least, each of these steps reduced the cost of Cloud services.
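The case study does not name the specific built-in Databricks features that were used; as an illustration, OPTIMIZE with Z-ORDER and VACUUM are standard Delta Lake maintenance commands for compacting small files and removing obsolete ones. The table and column names below are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hedged illustration of built-in Delta Lake maintenance on Databricks.
// Table and column names are hypothetical.
object DeltaMaintenance {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-maintenance").getOrCreate()

    // Compact small files and co-locate rows that are frequently filtered together.
    spark.sql("OPTIMIZE silver.securities ZORDER BY (isin, as_of_date)")

    // Remove data files no longer referenced by the table (default retention applies).
    spark.sql("VACUUM silver.securities")

    spark.stop()
  }
}
```

Regular maintenance of this kind typically reduces both query times and the amount of data scanned, which is one way such optimisations translate directly into lower Cloud costs.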