How to become a junior Data Engineer? — my story
My name is Krzysztof Wasilewski and since February 2021 I have been a Junior Data Engineer at Datumo. I’d love to share my story with you!
Since childhood I have been interested in computers and new technology. After high school I started studying Electronics at Warsaw University of Technology. This was my first experience with programming, at a basic level. During the first year I learned the basics of Structured and Object Oriented Programming, and completed several projects that gave me a lot of useful experience. At that time I was interested mostly in the Internet of Things and Embedded Systems. My Bachelor’s thesis “Raspberry Pi as a cost-optimized network data acquisition system” was my first major project and I learned a lot about Linux device drivers in the course of it. The main aim of the project was to build a system that acquires data and records measurement timestamps as accurately as possible. The main challenge was to ensure that time-critical tasks completed on time in the multithreaded Linux operating system.
In my Master’s Thesis I implemented neural networks in an FPGA system. At the beginning of my journey I had little idea about Big Data.
Why Big Data?
It’s easy to see that nowadays we are generating ever more data and so we need more sophisticated ways of processing it efficiently. As the “Forecast Revenue Big Data Worldwide 2011–2027” shows, in the next few years the market is most likely going to grow substantially, so this is an ideal time to learn more about Big Data itself.
The Embedded Software market is considerably smaller and, in my experience, offers fewer interesting projects for Junior Engineers. While out running, my good friend Michał Misiewicz, Chief Technology Officer at Datumo, and I spent a lot of time talking about technology. It was Michał who convinced me that Big Data is well worth learning, and thanks to him I decided to change my specialization.
Challenges faced during learning basic Big Data skills
When I started learning Big Data I was initially shocked by how many different technologies and tools are commonly used. At the beginning it wasn’t easy to become familiar with all the different terms, especially since developers and architects tend to use acronyms. I remember having to google each one, so as not to interrupt the flow of their conversation too much. At the beginning I was unaware of tools which can be really helpful in software development and acquiring new skills.
Before I was recruited, Datumo gave me access to online courses and the necessary materials to prepare for the interview. After 2 months of learning I successfully passed the interview and started the onboarding process. Right now I’m working on a project in a strong team of 5 people.
During the onboarding period at Datumo I received a lot of help from other team members. I learned core data engineering technologies like Scala, Spark, Kafka and Docker, using workshops prepared by colleagues working as Senior Developers. Each coding exercise had tests, so I was able to check my solutions by myself. However, a working solution isn’t always a good one, so after completing each task I had a code review. On many occasions Senior Devs helped me find a simpler solution to a problem and directed my attention to optimization issues.
I guess this was the most valuable experience in the entire onboarding process. What I also learned at that time is that software development is teamwork and for Junior Data Engineers it’s really important to get in touch with more experienced developers.
My academic projects were mostly concerned with low-level programming. I used C/C++, assembly language and even VHDL (a hardware description language). I also learned the basics of Java and Python, and all of this came in handy when I set out to learn Big Data technologies. But sometimes I regret the habits I picked up from my low-level programming experience. I also regret that I started learning Object Oriented Programming using C++. In my opinion, it would have been much easier to learn basic OOP concepts using higher-level programming languages, like Python or Java. I could also have begun learning software design patterns earlier, but at the time I didn’t know how important this was. I learned about DDD (Domain-Driven Design) and hexagonal architecture, and then developed a project using these two approaches. Another highly valuable experience was learning about TDD (Test-Driven Development), which I practiced during the coding exercises.
Advice for people who want to become Data Engineers
The most important thing in learning programming is to practice, practice and practice! Particularly at the beginning, you should set out to do a lot of projects on your own.
For those who are confused by hearing so many different acronyms and new Big Data technologies, I suggest creating your own dictionary to keep track of them. In my case, this proved to be really helpful.
Another suggestion is to create your own wiki, where you can record all the dev problems you encountered, together with the solutions you came up with. I’ve lost count of how many times I came across the same kind of problem yet somehow forgot the solution that goes with it. Another nice idea is to have one common wiki for the whole team.
At Datumo we have developed a wiki portal based on MkDocs. Wiki content is written in Markdown and saved in a GitLab repository. Every team member can post their “piece” of knowledge, make a Pull Request and, after a code review, share their experience with others.
There are lots of technologies and tools used in the Big Data world and the situation is changing rapidly. I can’t mention all of them, but to start with I suggest learning the basics of Spark, a really powerful tool commonly used in lots of Big Data projects. Spark can be used with Java, Scala, Python, R, and SQL, but since it is written in Scala, it works best with Scala. Since we are dealing with data, SQL and relational database skills are also very useful.
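To give a feel for the kind of SQL worth practicing, here is a minimal sketch of a grouped aggregation. It is only an illustration, not code from any Datumo project: it uses Python's built-in sqlite3 module instead of Spark so that it runs without any setup, and the table, the column names and the sample readings are all made up for the example.

```python
import sqlite3

# Hypothetical sample data: (sensor name, measured value).
READINGS = [
    ("sensor_a", 21.5),
    ("sensor_a", 22.5),
    ("sensor_b", 19.0),
]


def average_per_sensor(rows):
    """Load rows into an in-memory SQLite table and return (sensor, average) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
        )
        return cursor.fetchall()
    finally:
        # Always release the connection, even if a query fails.
        conn.close()


if __name__ == "__main__":
    for sensor, average in average_per_sensor(READINGS):
        print(sensor, average)
```

Grouping and aggregating like this is a pattern you will meet again in Spark SQL, so practicing it on a small local database is a cheap way to get started.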
It’s also very important to have basic git skills at the beginning of learning software development. My advice is to ask one of your colleagues to show you how your team works on code development and then do the same. Configure your IDE to make the process fast and easy.
Last but not least, every time you encounter new technology don’t hesitate to spend some time reading through the documentation. Regarding this point, let me just leave you with a meme:
Working as a Junior Data Engineer has been a great experience for me. Processing and organizing data requires lots of technical skills, such as programming ability and computer science knowledge, but also some soft skills, such as sharing know-how with your colleagues and making the most of the data you collect. I’ve really enjoyed the onboarding period at Datumo, and would like to thank my colleagues for their patience and for sharing their experience with me.
I can’t wait to learn more about Big Data and I hope I’ve convinced you to find out more about it too.