Big data project for banking
Why?
Due to global expansion, the Client wishes to create a centralised data platform that ensures smooth data flow and makes data more accessible to various teams. The key requirements are data integrity, compliance, accessibility and watertight security. The platform needs to be resilient and capable of handling a constantly growing transaction volume. It also has to generate dozens of reports for various clients, in both real-time and batch mode, in an optimised way, as the reports must adhere to strict delivery schedules.
What & how?
Datumo focuses on implementing and improving Hadoop-based solutions that collect data from various systems. Data is received through Apache Kafka and stored on HDFS, with data transfer managed by Apache Flume and Spark. Stored data is exposed as tables with complex schemas in Hive, Impala and HBase. Since the incoming data covers different domain areas, different strategies for storing and updating it are implemented, and Avro schemas are used to ensure data compatibility and smooth schema upgrades.

The report-generating code uses Apache Spark (on YARN) to query the data and produce various reports, which are then made available in both batch and real-time. Automation and orchestration of this process are handled in UC4, in a flow consisting of over 500 jobs that also supports late events and transactions. In addition, the platform shares data within the required security norms, allowing users to analyse it and make better business decisions. Great emphasis was placed on code quality, through detailed code reviews and test coverage of at least 90%.
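As an illustration of this report-generation path, the sketch below shows a minimal Spark batch job of the kind described above: it reads a Hive-registered table, aggregates transactions for a given business date, and writes the result to HDFS. The table, column and path names (raw.transactions, client_id, /data/reports/...) are hypothetical placeholders rather than the Client's actual schema, and the logic is only a simplified example of the approach.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyClientReport {
  def main(args: Array[String]): Unit = {
    // Typically submitted to YARN, e.g. spark-submit --master yarn --deploy-mode cluster ...
    val spark = SparkSession.builder()
      .appName("daily-client-report")   // hypothetical job name
      .enableHiveSupport()              // read tables registered in the Hive metastore
      .getOrCreate()

    val businessDate = args(0)          // e.g. "2023-10-01"

    // Hypothetical Avro-backed Hive table holding transaction events
    val transactions = spark.table("raw.transactions")

    // Filtering on the business date (not arrival time) lets late events
    // be picked up by rerunning the same date later
    val report = transactions
      .filter(col("business_date") === businessDate)
      .groupBy("business_date", "client_id", "product")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("tx_count"))

    // Hypothetical output location; one directory per business date keeps deliveries incremental
    report.write
      .mode("overwrite")
      .parquet(s"/data/reports/daily_client/business_date=$businessDate")

    spark.stop()
  }
}
```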
Datumo is responsible for:
- setting up environments with Ansible
- making code improvements and refactoring
- optimising Spark jobs (an example sketch follows this list)
- unifying data
- migrating to the Cloudera platform
- expanding the solution with new products and reports
- setting up Oozie workflows
- gathering and analysing logs with Splunk
- providing platform support within the agreed SLAs.
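As an example of the kind of Spark job optimisation mentioned in the list above, the sketch below demonstrates two common techniques: broadcasting a small reference table to avoid shuffling the large fact table during a join, and repartitioning by the aggregation key before a wide aggregation. All table, column and path names are hypothetical; the specific optimisations applied in the project are not shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EnrichedBranchTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("enriched-branch-totals")   // hypothetical job name
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical fact table and small reference (dimension) table
    val transactions = spark.table("raw.transactions")
    val branches     = spark.table("ref.branches")

    // Broadcasting the small reference table avoids shuffling the large fact table for the join
    val enriched = transactions.join(broadcast(branches), Seq("branch_id"))

    // Repartitioning by the aggregation key before the groupBy keeps the shuffle balanced
    val totals = enriched
      .repartition(col("branch_id"))
      .groupBy("branch_id", "branch_name")
      .agg(sum("amount").as("total_amount"))

    // Hypothetical output path
    totals.write.mode("overwrite").parquet("/data/reports/branch_totals")

    spark.stop()
  }
}
```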
Benefits
Datumo’s team introduced many improvements:
- code refactoring that allowed faster execution of Spark jobs
- updated deployment scripts that made deployments easier
- migration to the Cloudera platform carried out in a fast, safe and secure manner
- reduced execution time through changes in the automation flows
- introduction and use of new features of the technologies already applied.