I want to learn data engineering, and on the advice of Andreas Kretz, I’m going to document it.
Let me begin by introducing myself. First of all, who am I? At the time of writing, I’m a full stack developer in my early twenties. When I was younger, I never enjoyed learning in an academic environment. That feeling, coupled with the uncertainty I felt about my professional aspirations (I had no idea what I wanted to do), led me to decide it was best to avoid the debt associated with higher education unless I was certain about where I wanted to go.
How did I start coding?
The aforementioned period of uncertainty lasted a while, but uncertain as I was, I had always enjoyed technical subjects like mathematics, physics, and engineering. At some point during those directionless days, I discovered coding, and given my affinity for technical subjects (and my introversion), it naturally fit. First I was building simple, terrible sites with plain HTML from YouTube tutorials; then came more serious, structured material on sites like Udemy. I’m sure my story is no different from hundreds of thousands of others. Today I have built my skills up to the point where I can offer real value to real people as a full stack web developer. Along the way I have learned evergreen (well… as evergreen as anything in tech can be) programming principles, as well as frameworks and tools that may only be hot for a few years. Most of all, I have learned that self-education is powerful: you can take yourself from a complete beginner at something to someone with skills the market values highly.
It’s great, but what if, in addition to learning all that other stuff, you also learn that maybe you kinda sorta don’t like the job anymore?
My problem with web development
So, let me start by saying this is just my personal experience, YMMV, different strokes for different folks, etc…
My issue with web development has been that, while it is technically a STEM field, I personally never felt the level of challenge and stimulation that I got from maths, physics, or learning about topics like ML. Now, don’t get me wrong: there are without a doubt smarter people in this space working on problems too complicated for me to understand, so it’s nothing like that. I simply felt stuck. There comes a point where you get bored of centring a div or modifying a gatsby-config.js file. Once you can perform at a level that satisfies your clients’ requirements, there is ample room for stagnation, which then turns into boredom. One solution would be to improve your skill set and double down. With deeper skills and understanding, you can work on more complex problems and keep progressing! But in my opinion, going deeper is only worth it if you already enjoy the work you do and want to explore those topics further. I can’t imagine that forcing yourself to learn material you have no interest in is a path to excellence.
Enter data engineering
What did I tell myself when I looked to step away from web development? I know I like coding, and I know there can be more to it.
Early in my coding journey, I took a few courses on simple data manipulation with Python and spent a few months learning the basics of pandas and NumPy. I liked it, but at the time I never saw how it would benefit me. So although it kept me occupied and interested while I was doing it, it never really stuck, because I eventually stopped practicing.
More recently, I spent some of my free time early this year going through part of Andrew Ng’s introductory ML course, which was a thoroughly enjoyable experience.
Clearly, these interests all lean toward data, so I began to learn about the roles within the field: data engineers, data analysts, and data scientists.
From my research, I learned that data scientists are usually educated to a postgraduate level in statistics and have a deep understanding of ML… Not me! Analysts look to gather insights from the data; that sounds cool, but it also sounds like there’s lots of room for error. Finally, there’s data engineering: without the data engineer, neither the analyst nor the data scientist can do their job. Data is essential for so many things, and by working as close to the source as possible, you’re in the most secure position. Today data is an essential resource for so many businesses, and demand for it will only grow as more organisations decide to leverage it.
How am I going to become a data engineer?
First, I have to accept that there are things I don’t know. Then I have to make it my job to immerse myself and answer all of the beginner questions that come up. As my sphere of understanding grows, I’ll become better equipped to learn. A quick job search and some online research show me that some of the top skills for a data engineer are the following:
- SQL: PostgreSQL is popular
- Apache software such as Kafka, Spark, Airflow, and Hadoop
- Cloud skills: AWS, Azure, or GCP
So there’s clearly work to be done, but the first question, ‘what do I need to know?’, is now answered. My plan is to study each topic through books, courses, videos, and so on, and then build a project that puts that understanding to use.
The goal here is the process: learning every day and producing something. Along the way, I’ll be documenting it all here.
So let’s get started!