Creating Big Data platforms using technologies from the Apache family
The aim of the training is to gain practical knowledge of Big Data solutions.
Purpose of training
You will learn how to use popular Big Data technologies: Apache Spark, Apache Kafka, Apache Airflow and Apache Druid. You will also discover how to build complex Big Data systems from scratch. Practical workshops are the main strength of the training.
50% theory, 50% practical workshops
The training can be conducted in the Client’s office or at another convenient location.
3 days. The training program is tailored to the needs of the group.
The training is aimed at programmers and business analysts who want to learn about Big Data tools. Basic knowledge of Java or Scala is recommended.
Module 1 Overview of Big Data solutions from the Apache family and introduction to data processing
1.1 Overview of Big Data solutions from the Apache family
1.2 Scala for Big Data
- 1.2.1 Case classes, traits
- 1.2.2 Tuples
- 1.2.3 Lazy evaluation
- 1.2.4 String interpolation
- 1.2.5 Pattern matching
- 1.2.6 Companion object
- 1.2.7 Collections and transformations
- 1.2.8 For comprehension, mapping
- 1.2.9 Try/Either/Option
- 1.2.10 Implicits
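Several of the Scala topics above can be previewed in one short sketch (all names and data are illustrative, not part of the course materials):

```scala
// Case class: immutable data with structural equality and a readable toString (1.2.1)
case class User(name: String, age: Int)

val users = List(User("Ada", 36), User("Linus", 17))

// Pattern matching with a guard (1.2.5), plus string interpolation (1.2.4)
def describe(u: User): String = u match {
  case User(n, a) if a >= 18 => s"$n is an adult"
  case User(n, _)            => s"$n is a minor"
}

// Option instead of null (1.2.9): the lookup may or may not succeed
def findAdult(us: List[User]): Option[User] = us.find(_.age >= 18)

// Lazy evaluation (1.2.3): the right-hand side runs only on first access
lazy val banner: String = "=" * 10

// For comprehension over a collection (1.2.8), desugars to map/withFilter
val adultNames = for (u <- users if u.age >= 18) yield u.name

// Try as a value (1.2.9): exception handling without try/catch blocks
import scala.util.Try
val parsed: Option[Int] = Try("42".toInt).toOption
```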
1.3 Apache Spark - introduction
- 1.3.1 RDD, DataFrame, Dataset
- 1.3.2 Lazy evaluation
- 1.3.3 Transformations and actions
- 1.3.4 Spark vs. Hadoop
- 1.3.5 DataFrame vs Dataset API
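Spark's core execution model — transformations build a lazy plan and only an action triggers computation (1.3.2–1.3.3) — can be previewed with a plain Scala view. This is a local analogy only, not Spark API; real Spark code runs through a SparkSession on an RDD, DataFrame, or Dataset:

```scala
// Local analogy for Spark's lazy evaluation using a Scala view:
// map and filter on a view are lazy (like Spark transformations);
// nothing runs until a terminal operation (like a Spark action).
var evaluations = 0

val pipeline = (1 to 10).view
  .map { n => evaluations += 1; n * 2 } // "transformation": recorded, not executed
  .filter(_ % 4 == 0)                   // another lazy "transformation"

val before = evaluations                // still 0: nothing has been forced yet

val result = pipeline.toList            // "action": forces the whole pipeline
```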
Module 2 Data processing using Apache Spark and a modern data warehouse - Apache Druid
2.1 Workshops: Spark - how to enrich your data?
2.2 Apache Spark - architecture and optimization
- 2.2.1 Architecture (driver, worker, executor...)
- 2.2.2 Optimization of jobs and parameters
- 2.2.3 Deployment
- 2.2.4 Shuffling
- 2.2.5 Common errors - key skew, serialization issues, OOM
- 2.2.6 Broadcast, repartition, caching, execution plans, optimization
- 2.2.7 Spark internals - joins, group by
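One common error listed above, key skew (2.2.5), is often mitigated by "salting": splitting a hot key into several sub-keys so its records can be processed in parallel rather than funneled into one task. A plain-Scala sketch of the idea — all names and sizes are illustrative; in Spark the salt would be added to the groupBy/join key column:

```scala
// Key-skew sketch: one "hot" key dominates the dataset, so a
// hash-partitioned groupBy/join sends most records to one task.
val records: Seq[String] = Seq.fill(97)("hotKey") ++ Seq("a", "b", "c")

// Without salting: 97 of 100 records share a single grouping key.
val perKey: Map[String, Int] =
  records.groupBy(identity).view.mapValues(_.size).toMap

// With salting: spread the hot key over saltFactor sub-keys by
// appending a deterministic suffix derived from the record index.
val saltFactor = 4
val salted: Seq[String] =
  records.zipWithIndex.map { case (k, i) => s"$k#${i % saltFactor}" }

val perSaltedKey: Map[String, Int] =
  salted.groupBy(identity).view.mapValues(_.size).toMap
// The heaviest group shrinks from 97 records to roughly 97 / 4.
```

The trade-off: after salting, an aggregation needs a second pass that strips the suffix and combines the partial results per original key.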
2.3 Apache Druid
- 2.3.1 Architecture
- 2.3.2 Data structures
- 2.3.3 Component management
- 2.3.4 Druid and Big Data platforms based on Apache Hadoop
- 2.3.5 Real-time and batch processing
Module 3 Streaming and orchestration
3.1 Apache Kafka
- 3.1.1 Pub/Sub pattern, difference between push and pull models
- 3.1.2 Architecture
- 3.1.3 Topics
- 3.1.4 Kafka producer & Kafka consumer
- 3.1.5 Analysis of the scalability of Apache Kafka-based systems
- 3.1.6 Consumer groups
- 3.1.7 Replication and retention
- 3.1.8 Zookeeper
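How Kafka routes records to topic partitions (3.1.3) can be sketched in plain Scala: hash the record key modulo the partition count, so records with the same key always land on the same partition and per-key ordering is preserved. This is a simplified stand-in — Kafka's actual default partitioner uses murmur2 hashing via the Java client:

```scala
// Simplified sketch of Kafka-style key partitioning.
// (String.hashCode stands in for Kafka's murmur2 hash.)
val numPartitions = 3

def partitionFor(key: String): Int =
  math.floorMod(key.hashCode, numPartitions)

val events = Seq(
  "user-1" -> "login",
  "user-2" -> "click",
  "user-1" -> "logout",
  "user-3" -> "login"
)

// Group events by their assigned partition, keeping (key, value).
// Within a partition, each key's events stay in send order.
val byPartition: Map[Int, Seq[(String, String)]] =
  events.groupBy { case (key, _) => partitionFor(key) }
```

The same property underpins consumer groups (3.1.6): each partition is consumed by at most one member of a group, so a consumer sees every event for a given key, in order.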
3.2 Apache Airflow
- 3.2.1 Processing automation
- 3.2.2 Creating a Data Pipeline - Defining Directed Acyclic Graphs (DAGs)
- 3.2.3 Architecture
For many years, I have been regularly conducting practical training in the area of Big Data. I conduct lectures and workshops with pleasure and a smile on my face, and I have received very positive feedback from my students. I have also lectured in postgraduate programmes. Let's talk!
I am a dedicated trainer specializing in Scala and Spark. I like sharing knowledge and conducting training while popularizing programming best practices. I develop code on a daily basis, so I am constantly growing my hands-on experience. Would you like to talk about programming practices? Write to me!
I have been running Big Data training courses since 2019. I relish the opportunity to meet interesting people and identify problems that Big Data technologies can solve. Let's talk!
I am a Big Data trainer. I effectively share knowledge in technological areas such as Apache Hadoop, Spark, NiFi, Airflow, Druid and Kafka. Training gives me an opportunity to spread interest in new technologies and, as a result, overcome business challenges. Let's talk!
I conduct training in Big Data technologies, focusing on Scala and Spark. My goal is to transfer knowledge in an effective and interesting way that attracts people to new technologies and solutions. Let's talk!
Since 2020, I have been co-creating materials and conducting theoretical and practical classes focused on Spark. I use this technology on a daily basis to process large data sets from various industries, both in on-premises environments and in the cloud. This makes me a reliable expert who builds training on experience.