Big data project for banking
Why?
Due to global expansion, the Client wishes to create a centralised data platform that ensures smooth data flow and makes data more accessible to various teams. The key requirements are data integrity, compliance, accessibility and watertight security. The platform needs to be resilient and capable of handling a constantly growing transaction volume. It also has to generate dozens of reports for various clients, in both real-time and batch mode, in an optimised way, as the reports must adhere to strict delivery schedules.
What & how?
Datumo focuses on implementing and improving Hadoop-based solutions that collect data from various systems. Data is received through Apache Kafka and stored on HDFS, with data transfer managed by Apache Flume and Spark. Stored data is exposed as tables with complex schemas in Hive, Impala and HBase. Since the incoming data covers different domain areas, different strategies for storing and updating it are implemented, and Avro schemas are used to ensure data compatibility and smooth schema upgrades.

The report-generating code uses Apache Spark (on YARN) to query the data and produce various reports, which are then made available in both batch and real-time. Automation and orchestration of this process are handled in UC4, in a flow consisting of over 500 jobs that also supports late events and transactions. In addition, the platform shares data within the required security norms, allowing users to analyse it and make better business decisions. Great emphasis was placed on code quality, through detailed code reviews and test coverage of at least 90%.
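As an illustration of this report-generation path, the sketch below shows a minimal Spark batch job of the kind described above: it reads a Hive-registered table, aggregates transactions for a given business date, and writes the result to HDFS. The table, column and path names (raw.transactions, client_id, /data/reports/...) are hypothetical placeholders rather than the Client's actual schema, and the logic is only a simplified example of the approach.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyClientReport {
  def main(args: Array[String]): Unit = {
    // Typically submitted to YARN, e.g. spark-submit --master yarn --deploy-mode cluster ...
    val spark = SparkSession.builder()
      .appName("daily-client-report")   // hypothetical job name
      .enableHiveSupport()              // read tables registered in the Hive metastore
      .getOrCreate()

    val businessDate = args(0)          // e.g. "2023-10-01"

    // Hypothetical Avro-backed Hive table holding transaction events
    val transactions = spark.table("raw.transactions")

    // Filtering on the business date (not arrival time) lets late events
    // be picked up by rerunning the same date later
    val report = transactions
      .filter(col("business_date") === businessDate)
      .groupBy("business_date", "client_id", "product")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("tx_count"))

    // Hypothetical output location; one directory per business date keeps deliveries incremental
    report.write
      .mode("overwrite")
      .parquet(s"/data/reports/daily_client/business_date=$businessDate")

    spark.stop()
  }
}
```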
Datumo is responsible for:
- setting up environments with Ansible
- making code improvements and refactoring
- optimising Spark jobs (an example sketch follows this list)
- unifying data
- migrating to the Cloudera platform
- expanding the solution with new products and reports
- setting up Oozie workflows
- gathering and analysing logs with Splunk
- providing platform support within the agreed SLAs.
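As an example of the kind of Spark job optimisation mentioned in the list above, the sketch below demonstrates two common techniques: broadcasting a small reference table to avoid shuffling the large fact table during a join, and repartitioning by the aggregation key before a wide aggregation. All table, column and path names are hypothetical; the specific optimisations applied in the project are not shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EnrichedBranchTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("enriched-branch-totals")   // hypothetical job name
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical fact table and small reference (dimension) table
    val transactions = spark.table("raw.transactions")
    val branches     = spark.table("ref.branches")

    // Broadcasting the small reference table avoids shuffling the large fact table for the join
    val enriched = transactions.join(broadcast(branches), Seq("branch_id"))

    // Repartitioning by the aggregation key before the groupBy keeps the shuffle balanced
    val totals = enriched
      .repartition(col("branch_id"))
      .groupBy("branch_id", "branch_name")
      .agg(sum("amount").as("total_amount"))

    // Hypothetical output path
    totals.write.mode("overwrite").parquet("/data/reports/branch_totals")

    spark.stop()
  }
}
```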
Benefits
Datumo’s team introduced many improvements:
- code refactoring that allowed faster execution of Spark jobs
- updated deployment scripts that made deployments easier
- migration to the Cloudera platform carried out in a fast, safe and secure manner
- reduced execution time through changes in the automation flows
- introduction and use of new features of the technologies already applied.